When adding more than one ER field from node to term with the taxonomy_index behaviour, entries in {taxonomy_index} are incorrectly inserted multiple times. Apart from the intrinsic error, this causes problems in views using this index, which receive duplicate results.

The problem comes from EntityReferenceBehavior_TaxonomyIndex::buildNodeIndex() being invoked multiple times (once per ER field) from entityreference_entity_crud() and inserting its rows on every invocation instead of merging them. I'm writing a test case to demonstrate the problem and prove the fix once someone writes it.
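For illustration, here is a condensed sketch of the faulty flow (simplified hypothetical names, not the module's actual code), assuming a node with two ER fields pointing at terms:

<?php
// entityreference_entity_crud() invokes the behavior handler once per
// ER field instance, so each field re-inserts rows for its tids without
// checking what a previous field already wrote.
foreach ($er_fields as $field) {
  $query = db_insert('taxonomy_index')
    ->fields(array('nid', 'tid', 'sticky', 'created'));
  foreach ($tids_by_field[$field['field_name']] as $tid) {
    $query->values(array(
      'nid' => $node->nid,
      'tid' => $tid,
      'sticky' => $node->sticky,
      'created' => $node->created,
    ));
  }
  // Plain INSERT: two fields referencing the same term produce two
  // identical (nid, tid) rows.
  $query->execute();
}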

Files:
#12: duplicate-entries-in-taxo_index.patch (3.91 KB, by Frando)
     PASSED: [[SimpleTest]]: [MySQL] 123 pass(es).
#7: 0002-Issue-1924444-duplicate-entries-in-taxo_index.patch (3.68 KB, by fgm)
     PASSED: [[SimpleTest]]: [MySQL] 121 pass(es).
#1: 0001-Issue-1924444-duplicate-entries-in-taxonmy_index.patch (2.35 KB, by fgm)
     FAILED: [[SimpleTest]]: [MySQL] 120 pass(es), 1 fail(s), and 0 exception(s).

Comments

Attachment: new, 2.35 KB
FAILED: [[SimpleTest]]: [MySQL] 120 pass(es), 1 fail(s), and 0 exception(s).

And here is the test showing the problem.
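In essence, the test saves a node that references the same term through two ER fields, then counts rows in {taxonomy_index}. A minimal sketch of the key assertion (setup omitted; names hypothetical, and the attached test may differ):

<?php
$count = db_query('SELECT COUNT(*) FROM {taxonomy_index} WHERE nid = :nid AND tid = :tid', array(
  ':nid' => $node->nid,
  ':tid' => $term->tid,
))->fetchField();
// With the bug present this returns 2, one row per referencing field.
$this->assertEqual($count, 1, 'A single taxonomy_index row exists per (nid, tid) pair.');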

Status: Active » Needs review

Pinging bot to show RED.

Status: Needs review » Needs work

The last submitted patch, 0001-Issue-1924444-duplicate-entries-in-taxonmy_index.patch, failed testing.

Is this code in 8.x now? If so, please reassign to core.

Alas no: 8.x only included the selection plugins, not the behaviors. I just tested and this index feature is not present at all in 8.x.

Possible patch. Green?

Note that, although this appears to work, it switches node saves from one multi-row INSERT query per field (#fields queries in total) to one single-row MERGE query per field and tid (#fields * #tids queries in total). There might be a more efficient way to do this.
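For reference, a sketch of the MERGE-based shape described above (simplified, variable names hypothetical): one query per referenced tid, but idempotent, so re-indexing the same (nid, tid) pair updates the existing row instead of duplicating it.

<?php
foreach ($tids as $tid) {
  // MERGE on the (nid, tid) key: inserts the row if missing,
  // otherwise updates sticky/created in place.
  db_merge('taxonomy_index')
    ->key(array('nid' => $node->nid, 'tid' => $tid))
    ->fields(array(
      'sticky' => $node->sticky,
      'created' => $node->created,
    ))
    ->execute();
}

One possible middle ground would be collecting all tids across fields first, de-duplicating in PHP, and issuing a single multi-row INSERT.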

Status: Needs work » Needs review
Attachment: new, 3.68 KB
PASSED: [[SimpleTest]]: [MySQL] 121 pass(es).

Better with the patch.

Alternatively, we could add entityPostInsertMultiple() and entityPostUpdateMultiple() methods that get called once per operation per entity, with a list of (field, instance) tuples.

Not sure how to make this work: hook_entity_update() and hook_field_update() both pass a single entity, not an array, so we would need to build a queue somehow and decide when to trigger its consumption.

Not worth it, IMO. Or at least not worth delaying a bug fix. We can always refactor later, especially when the time comes to kill taxonomy_term_reference in D8.

Not sure how to make this work: hook_entity_update() and hook_field_update() both pass a single entity, not an array, so we would need to build a queue somehow and decide when to trigger its consumption.

No, I suggest one call per operation, per entity. The "multiple" part is that it's called only once per behavior, even if multiple fields on the entity use the same behavior.
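Purely as illustration, such an API might look like this (no such methods exist in the module today; the class name and signatures are hypothetical):

<?php
abstract class EntityReference_ExampleBehaviorHandler {

  /**
   * Called once per entity insert, for all fields using this behavior.
   *
   * @param array $items
   *   A list of array($field, $instance) tuples that use this behavior
   *   on the saved entity.
   */
  public function entityPostInsertMultiple($entity_type, $entity, array $items) {}

  /**
   * Same contract for updates.
   */
  public function entityPostUpdateMultiple($entity_type, $entity, array $items) {}

}

The taxonomy index behavior could then gather all referenced tids across fields in one pass and write each (nid, tid) pair exactly once.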

That's very likely a good idea, especially when looking at our current implementations, where we probably have more repeated loops than needed. I just won't have time to work on it in the short term, so unless there is a problem with this fix, I'd rather see it merged and the performance issue addressed separately.

Issue summary: View changes
Attachment: new, 3.91 KB
PASSED: [[SimpleTest]]: [MySQL] 123 pass(es).

The attached patch is the same as #7 but adds an update function to remove existing duplicates.

This can take a while on large sites, and I am not sure how the Batch API handles very long queries. Large sites will likely run updates via drush updatedb anyway.
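A sketch of what such a cleanup could look like, assuming the duplicate rows are otherwise identical (the update number and exact queries are placeholders, not necessarily what the patch in #12 does):

<?php
/**
 * Remove duplicate rows from {taxonomy_index}.
 */
function entityreference_update_7001() {
  // Find every (nid, tid) pair indexed more than once.
  $result = db_query('SELECT nid, tid FROM {taxonomy_index} GROUP BY nid, tid HAVING COUNT(*) > 1');
  foreach ($result as $row) {
    // Keep one representative row, delete the pair's rows, re-insert it.
    $keep = db_query_range('SELECT nid, tid, sticky, created FROM {taxonomy_index} WHERE nid = :nid AND tid = :tid', 0, 1, array(
      ':nid' => $row->nid,
      ':tid' => $row->tid,
    ))->fetchAssoc();
    db_delete('taxonomy_index')
      ->condition('nid', $row->nid)
      ->condition('tid', $row->tid)
      ->execute();
    db_insert('taxonomy_index')->fields($keep)->execute();
  }
}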