Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
When adding more than one ER field from node to term with the taxonomy_index behaviour, entries in {taxonomy_index}
are incorrectly inserted in more than one occurrence. Apart from the instrinsic error, this causes problems in views using this index, which receive duplicate results.
The problems comes from EntityReferenceBehavior_TaxonomyIndex::buildbuildNodeIndex()
being invoked multiple times from entityreference_entity_crud()
and inserting the correct data every time instead of merging them. I'm writing a test case to demonstrate the problem and prove the fix once someone writes it.
Comment | File | Size | Author |
---|---|---|---|
#57 | entity_reference-duplicate_tx_entries-1924444-57-d7.patch | 3.91 KB | eelkeblok |
|
Comments
Comment #1
fgmAnd here is the test showing the problem.
Comment #2
fgmPinging bot to show RED.
Comment #4
Damien Tournoud CreditAttribution: Damien Tournoud commentedIs this code in 8.x now? If so, please reassign to core.
Comment #5
fgmAlas no: 8.x only included the selection plugins, not the behaviors. I just tested and this index feature is not present at all in 8.x.
Comment #6
fgmPossible patch. Green ?
Note that, although this appears to work, it switches node saves from
#fields
multi-row Insert queries to#fields * #tids
single-row Merge queries. There might be a more efficient way to do this.Comment #7
fgmBetter with the patch.
Comment #8
Damien Tournoud CreditAttribution: Damien Tournoud commentedAlternatively, we could add a entityPostInsertMultiple() and entityPostUpdateMultiple() methods, that get called once per operation per entity, with a list of (field, instance) tuples.
Comment #9
fgmNot sure how to make this work: hook_entity_update() and hook_field_update() both passes a single entity, not an array, so we would need to build a queue somehow and find when to trigger its consumption.
Not worth it, IMO. Or at least not worth delaying a bug fix. We can always refactor later, especially when the time comes to kill taxonomy_term_reference in D8.
Comment #10
Damien Tournoud CreditAttribution: Damien Tournoud commentedNo, I suggest one call per operation, per entity. The "multiple" part is that it's called only once per behavior, even if there are multiple fields in the entity that uses the same behavior.
Comment #11
fgmThat's very likely to be a good idea, especially when looking at our current implementations, where we have probably more repeated loops than needed. I just won't have any time to work on it in the short term, so unless there is a problem with this fix, I'd rather see it merged and the performance issue addressed separately.
Comment #12
Frando CreditAttribution: Frando commentedAttached patch is the same as #7 but adds an update function to remove existing duplicates.
This can take a while for large sites, I am not sure how batch API handles very long queries. Large sites will likely run upates via drush updatedb anyway.
Comment #13
hawkeye217 CreditAttribution: hawkeye217 commentedWas running into this issue and I can confirm the patch in #12 works great.
Comment #14
eelkeblokSame here. Pure coincidence we both needed this patch today, I suppose. I'm putting this on RTBC (glancing over the patch I don't see anything weird code-wise either).
Comment #15
katannshaw CreditAttribution: katannshaw commentedI just tried patch #12, and it worked for me up until the job of clearing duplicate rows. This was the error I received:
Will the SQL query work only in mySQL? I'm using SQL Server, so I don't know if the syntax is what's causing the issue for me.
Comment #16
eelkeblokIt would appear that "CREATE TABLE ... LIKE" is indeed not supported by SQL Server (e.g. http://stackoverflow.com/questions/616104/what-is-the-equivalent-of-crea...). Maybe you can provide an alternative patch that also works on SQL Server? I'm not sure if this is reason enough to reset the issue status back to needs work.
Comment #17
katannshaw CreditAttribution: katannshaw commentedThanks for the quick reply. I can definitely look into doing that. It would be my first attempt at a real patch. If I'm having issues with it with SQL Server, then Postgres users are probably having the same issue as well.
In the meantime, I ran those 5 query jobs manually without an issue.
Comment #18
fgmI think that would have to be a patch against the MS SQL driver rather than EntityReference ? This EntityReference behavior relies on the sql_field_storage, but should not be incuding engine-specific code.
Comment #19
jelo CreditAttribution: jelo commentedI just ran into this as well. Will the patch be committed anytime soon? To me this sounds like a major issue as it seems rather common to use multiple entity references in views. I upgraded my site from D6 to D7 and every second view was suddenly showing duplicates because of this after I completed the field migration from nodereference to entityreference.
Comment #20
jelo CreditAttribution: jelo commentedI can confirm that the patch in #12 applied nicely and fixed the issue.
Comment #21
jelo CreditAttribution: jelo commentedWell, since I applied the patch any nodes being added with a single taxonomy term do not get indexed in taxonomy_index table. The values are stored in the field_data_taxonomy_vocabulary_x table, but not indexed which removes the entries from views as soon as one filters on it...
Comment #22
jelo CreditAttribution: jelo commentedI did some more digging and discovered the following. Not sure if this affects a clean install as my site is an upgraded site from D6. On content types with a single entity reference field the taxonomy index was not created if the field was NOT required (and hence not shown in views that used taxonomy index), i.e. this problem seems not to be caused by this patch. As soon as I switched the field to a required field, the taxonomy index entry was created correctly (with and without the patch being applied). If someone can confirm that it happens to them as well with the described setup, I suggest we create a separate issue...
Comment #23
AnybodyThis issue is now quite old but still seems to have cross-SQL-engine problems.
Could we place clear up, if the patch needs work or if the cross-sql-engine problems are outside of the patch and take this into the next .dev version of entity reference soon?
As said above some month ago this issue is really problematic and should not wait for a fix that long, I think. Is there someone with the required know-how and probably test environments to have a look at this?
Comment #24
AnybodyIsn't there a further little typo in patch #12?
I think there are closing brackets missing (eventhough the patch seems to work fine under MySQL):
In
See:
db_query("INSERT INTO {taxonomy_index} SELECT * FROM {tmp_taxonomy_index}");
db_query("DROP TABLE {tmp_taxonomy_index}");
--------
For MSSQL Server something like this will be needed:
http://stackoverflow.com/questions/15428168/sql-server-create-a-copy-of-...
Does somebody know if all other engines support that?
MySQL: https://dev.mysql.com/doc/refman/5.0/en/select-into.html
Comment #25
indytechcook CreditAttribution: indytechcook commented@anybody, nice catch. Here is a patch attached with the correct formatting.
Comment #27
indigoxela CreditAttribution: indigoxela commentedHi,
unfortunately the latest attached patch (#25) failed testing.
And #12 has typos.
@indytechcook: would you please have a look at this "unable to apply patch" issue?
regards,
indigoxela
Comment #28
zaporylieI just encountered this issue so here's rerolled version of #25 and test-only version to prove it works.
Comment #30
indigoxela CreditAttribution: indigoxela commented@ zaporylie: many thanks for the rerolled patch.
Tested on a fresh install with entityreference-7.x-1.x-dev on a node type with two different entityreference fields to taxonomy terms..
First without patch to reproduce the faulty behaviour. As expected: duplicates in database table taxonomy_index occured.
(Adding a third field to this node type resulted in three records per node.)
Then I applied the patch and ran "drush updb".
This removed the duplicates from database table.
Adding more terms to nodes now works as it should.
No more duplicates.
Comment #31
zaporylieIn that case RTBC?
Comment #32
indigoxela CreditAttribution: indigoxela commentedGood question...
Is entityreference module supposed to support database servers besides MySQL?
I just tested it on MySQL.
Comment #33
indigoxela CreditAttribution: indigoxela commentedWell, I guess, we should make the update compatible to other database servers.
Does any of the followers see a chance to adapt function entityreference_update_7003() to only use Drupal's db abstraction layer?
https://api.drupal.org/api/drupal/includes!database!database.inc/7.x
I'm not familiar with Postgres or MSSQL and couldn't even test it.
Comment #34
indigoxela CreditAttribution: indigoxela commentedThis could be a starting point for the update function without direct MySQL syntax usage.
Tested to work on MySQL.
People with better pdo skills: please have a look and improve the code.
Anyone willing to test on different database servers is welcome.
Comment #35
eelkeblokThanks for that. I would suggest to define a batch size and do this in several goes when there are a lot of entries. I'm a little worried this query could spin out of control if there are a lot of entries in the index table.
Comment #36
indigoxela CreditAttribution: indigoxela commentedGood point.
I just saw, that it is possible to directly feed a select to db_insert (didn't realize that yesterday). This query happens in db, so there's no danger, that php runs out of memory or something like that.
Comment #37
eelkeblokGood stuff. I've actually done a quick scan of how the various database drivers (including the non-core support for MS SQL, https://www.drupal.org/project/sqlsrv) and they all seem to build a construct of the form
At least, if it doesn't work, we've used the appropriate APIs and there is an opportunity to fix it in the database driver, where it belongs.
Comment #38
zaporylieIn that case I updated #28 with changes suggested by @indigoxela in #36
Comment #42
indigoxela CreditAttribution: indigoxela commentedLooks like we are stuck a bit here.
entityreference.admin.test keeps failing. This is not the test introduced with our patch.
Any ideas about how to solve this are welcome.
Comment #43
eelkeblokEdit: Obviously, first make sure the tests also fail without the patch applied.
I believe the correct procedure would be to identify the problem and create a critical issue for it (or raise an existing one to critical if it doesn't exist yet). See https://www.drupal.org/node/45111:
Comment #45
indigoxela CreditAttribution: indigoxela commentedWhatever the problem is - it is not reproducible on my local dev server.
All entityreference tests pass here - with or without this patch.
My setup:
PHP 5.4
MySQL 5.5
Drupal 7.43
Testing module on (obviously)
The test ran for 44 seconds.
Result: 126 passes, 0 fails, 0 exceptions and 12 debug messages
So unfortunately I'm still clueless.
Comment #46
indigoxela CreditAttribution: indigoxela commentedAhhh... reading my own post, I just realized the difference: Drupal core version.
Running the same tests on my local dev server right after the update to Drupal 7.x-dev fails.
Result: 102 passes, 31 fails, 6 exceptions und 6 debug messages
All fails are in the tests for the administrative UI - the same as online.
So very likely it is a problem with a drupal core code change and not with our test.
Comment #47
eelkeblokThanks for that. I did some git bisecting on the most recent changes to core. This is the result:
I'll create an issue for this and try my hand at a patch.
Comment #48
eelkeblok#2741801: Admin tests fail with latest core dev due to added "administer fields" permission
Comment #49
eelkeblokComment #50
deranga CreditAttribution: deranga commentedStill seems to be a problem with entityreference version 7.x-1.2 - Is the patch from this ticket to plugins/behavior/EntityReferenceBehavior_TaxonomyIndex.class.php going to be incorporated at some point into a release?
Comment #51
eelkeblokIf we, as a community can get it to RTBC status, I would expect that the maintainer is more than willing to incorporate it. Currently, the status is "Needs work", which means, well, that :) I've kicked of another test since the change that was causing trouble has long since been released in core.
Comment #52
eelkeblokComment #53
eelkeblokComment #54
deranga CreditAttribution: deranga commentedWow, quick response @eelkeblok, thx!
The patch seems sensible seeing as we can't instantiate the build index once regardless of number of fields associated to entity so we have to merge and I can confirm that this patch addresses the duplicate issue, although I'm not running the update, as I've manually addressed the data issue.
Comment #55
eelkeblokNow that you mention it, I expect we'll at least need to give the update hook a higher number (apart from that, its number is not according to the convention).
Comment #56
deranga CreditAttribution: deranga commentedLatest update hook in dev is 7002, so we shouldn't need to on that basis, but for convention I guess we could it change it to 7100?
Happy to re-roll patch if you think that would help. How do we move this forward from here?
I'll likely forget I have this patch applied at some point :/
Comment #57
eelkeblokI was writing a bunch of text and figured I might as well do what I was writing, was going to be quicker :) I wanted to have an accompanying tests-only patch as well. I chose to use 7100 voor the update hook, since the other two were done during the alpha phase of the module, so there is some sense there (and it would make sense to promote this one to 7100, since it's the first one done during the stable phase). Another option would have been 7102, since strictly speaking the other two should have been 7100 and 7101.
Renumbering the update hook will mean that it will run again for anyone who had this patch installed, but I don't think that is a huge problem; it will simply make another copy of the taxonomy index table.
Comment #59
eelkeblokLooking good. I'd say this is good for RTBC, but since I've supplied the latest version of the patch I'll leave that to someone else.
Comment #60
deranga CreditAttribution: deranga commentedChanging to RTBC, patch tested and works as exptected - entity_reference-duplicate_tx_entries-1924444-57-d7.patch
Separate test patch fails owing to patch not yet being applied so makes sense to keep as one? removed the rest of the patches from displaying in summary.
Comment #61
eelkeblokYes, the patch that will be applied should be a whole (it needs to include the test as well, to proof the test succeeds with the code changes). The test-only variant is only included to show that the test fails without the actual code changes (e.g. that it correctly tests for the error condition we are trying to fix).
Comment #62
eelkeblokComment #63
deranga CreditAttribution: deranga commentedAh, sure that makes sense.
Cheers @eelkeblok for picking this issue back up.
Hopefully the maintainer will include in next release of the project.
Comment #64
AnybodyWhao thank you very very much!
Confirming RTBC. It would be cool to have this in the next release :)
Comment #65
candelas CreditAttribution: candelas as a volunteer commentedI confirm that patch in #57 works. Thanks @eelkeblok
Comment #67
minorOffense CreditAttribution: minorOffense at Coldfront Labs Inc. commentedThanks everyone. Patch now in dev.
Comment #68
joelpittetI'm getting schema errors on this update not sure why... also the update hook looks a bit high did you mean to bump it to 71xx from 70xx?
Comment #69
eelkeblokWhat is line 86 of includes/database/mysql/schema.inc in your Drupal installation (and its immediate surroundings)?
The high version number is simply due to the numbering conventions as described in https://api.drupal.org/api/drupal/modules--system--system.api.php/functi..., which previously were not adhered to correctly in this module. Increasing by 100 or by 1 does not have any functional effect whatsoever. All that matters is that the number is higher than the previous schema version.
Comment #70
joelpittet@eelkeblok Fair point just noticed other patches were continuing with ty 70xx, so they will need to be aware on their rerolls or they won't be run.
The 86 is the ending ); in this:
Sorry that's less than helpful of a line number:(
Maybe worth using this instead?
https://api.drupal.org/api/drupal/includes%21database%21database.inc/fun...
Comment #71
joelpittetAh here's why, I don't have
taxonomy
module enabled, so the schema is not being returned for that table.Needs to be conditional of a
db_table_exists()
or something.Comment #73
minorOffense CreditAttribution: minorOffense at Coldfront Labs Inc. commentedWell that's some obvious I should have caught. Thanks for that. I've added a db_table_exists to the install file.
Comment #74
joelpittetThanks for the quick fix @minorOffense
Comment #75
eelkeblokAh. Of course. Sillyness.
Comment #77
willzyx CreditAttribution: willzyx commentedThe commit uin this issue cause #2877592: Update to 7.x-1.3 fails on MSSQL and PostgreSQL