When adding more than one ER field from node to term with the taxonomy_index behaviour, entries in {taxonomy_index} are incorrectly inserted in more than one occurrence. Apart from the instrinsic error, this causes problems in views using this index, which receive duplicate results.

The problems comes from EntityReferenceBehavior_TaxonomyIndex::buildbuildNodeIndex() being invoked multiple times from entityreference_entity_crud() and inserting the correct data every time instead of merging them. I'm writing a test case to demonstrate the problem and prove the fix once someone writes it.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

fgm’s picture

And here is the test showing the problem.

fgm’s picture

Status: Active » Needs review

Pinging bot to show RED.

Status: Needs review » Needs work

The last submitted patch, 0001-Issue-1924444-duplicate-entries-in-taxonmy_index.patch, failed testing.

Damien Tournoud’s picture

Is this code in 8.x now? If so, please reassign to core.

fgm’s picture

Alas no: 8.x only included the selection plugins, not the behaviors. I just tested and this index feature is not present at all in 8.x.

fgm’s picture

Possible patch. Green ?

Note that, although this appears to work, it switches node saves from #fields multi-row Insert queries to #fields * #tids single-row Merge queries. There might be a more efficient way to do this.

fgm’s picture

Status: Needs work » Needs review
FileSize
3.68 KB

Better with the patch.

Damien Tournoud’s picture

Alternatively, we could add a entityPostInsertMultiple() and entityPostUpdateMultiple() methods, that get called once per operation per entity, with a list of (field, instance) tuples.

fgm’s picture

Not sure how to make this work: hook_entity_update() and hook_field_update() both passes a single entity, not an array, so we would need to build a queue somehow and find when to trigger its consumption.

Not worth it, IMO. Or at least not worth delaying a bug fix. We can always refactor later, especially when the time comes to kill taxonomy_term_reference in D8.

Damien Tournoud’s picture

Not sure how to make this work: hook_entity_update() and hook_field_update() both passes a single entity, not an array, so we would need to build a queue somehow and find when to trigger its consumption.

No, I suggest one call per operation, per entity. The "multiple" part is that it's called only once per behavior, even if there are multiple fields in the entity that uses the same behavior.

fgm’s picture

That's very likely to be a good idea, especially when looking at our current implementations, where we have probably more repeated loops than needed. I just won't have any time to work on it in the short term, so unless there is a problem with this fix, I'd rather see it merged and the performance issue addressed separately.

Frando’s picture

Issue summary: View changes
FileSize
3.91 KB

Attached patch is the same as #7 but adds an update function to remove existing duplicates.

This can take a while for large sites, I am not sure how batch API handles very long queries. Large sites will likely run upates via drush updatedb anyway.

hawkeye217’s picture

Was running into this issue and I can confirm the patch in #12 works great.

eelkeblok’s picture

Status: Needs review » Reviewed & tested by the community

Same here. Pure coincidence we both needed this patch today, I suppose. I'm putting this on RTBC (glancing over the patch I don't see anything weird code-wise either).

katannshaw’s picture

I just tried patch #12, and it worked for me up until the job of clearing duplicate rows. This was the error I received:

entityreference module
Update #7003

Failed: PDOException: SQLSTATE[42000]: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near the keyword 'LIKE'.: CREATE TABLE {tmp_taxonomy_index} LIKE {taxonomy_index}; Array ( ) in entityreference_update_7003() (line 170 of sites\all\modules\entityreference\entityreference.install).

Will the SQL query work only in mySQL? I'm using SQL Server, so I don't know if the syntax is what's causing the issue for me.

eelkeblok’s picture

It would appear that "CREATE TABLE ... LIKE" is indeed not supported by SQL Server (e.g. http://stackoverflow.com/questions/616104/what-is-the-equivalent-of-crea...). Maybe you can provide an alternative patch that also works on SQL Server? I'm not sure if this is reason enough to reset the issue status back to needs work.

katannshaw’s picture

Thanks for the quick reply. I can definitely look into doing that. It would be my first attempt at a real patch. If I'm having issues with it with SQL Server, then Postgres users are probably having the same issue as well.

In the meantime, I ran those 5 query jobs manually without an issue.

fgm’s picture

I think that would have to be a patch against the MS SQL driver rather than EntityReference ? This EntityReference behavior relies on the sql_field_storage, but should not be incuding engine-specific code.

jelo’s picture

I just ran into this as well. Will the patch be committed anytime soon? To me this sounds like a major issue as it seems rather common to use multiple entity references in views. I upgraded my site from D6 to D7 and every second view was suddenly showing duplicates because of this after I completed the field migration from nodereference to entityreference.

jelo’s picture

I can confirm that the patch in #12 applied nicely and fixed the issue.

jelo’s picture

Well, since I applied the patch any nodes being added with a single taxonomy term do not get indexed in taxonomy_index table. The values are stored in the field_data_taxonomy_vocabulary_x table, but not indexed which removes the entries from views as soon as one filters on it...

jelo’s picture

I did some more digging and discovered the following. Not sure if this affects a clean install as my site is an upgraded site from D6. On content types with a single entity reference field the taxonomy index was not created if the field was NOT required (and hence not shown in views that used taxonomy index), i.e. this problem seems not to be caused by this patch. As soon as I switched the field to a required field, the taxonomy index entry was created correctly (with and without the patch being applied). If someone can confirm that it happens to them as well with the described setup, I suggest we create a separate issue...

Anybody’s picture

This issue is now quite old but still seems to have cross-SQL-engine problems.
Could we place clear up, if the patch needs work or if the cross-sql-engine problems are outside of the patch and take this into the next .dev version of entity reference soon?

As said above some month ago this issue is really problematic and should not wait for a fix that long, I think. Is there someone with the required know-how and probably test environments to have a look at this?

Anybody’s picture

Isn't there a further little typo in patch #12?
I think there are closing brackets missing (eventhough the patch seems to work fine under MySQL):

In

  db_query("CREATE TABLE {tmp_taxonomy_index} LIKE {taxonomy_index}");
  db_query("INSERT INTO {tmp_taxonomy_index} SELECT * FROM {taxonomy_index} GROUP BY nid, tid");
  db_query("DELETE FROM {taxonomy_index}");
  db_query("INSERT INTO {taxonomy_index} SELECT * FROM {tmp_taxonomy_index");
  db_query("DROP TABLE {tmp_taxonomy_index");

See:
db_query("INSERT INTO {taxonomy_index} SELECT * FROM {tmp_taxonomy_index}");
db_query("DROP TABLE {tmp_taxonomy_index}");

--------

For MSSQL Server something like this will be needed:
http://stackoverflow.com/questions/15428168/sql-server-create-a-copy-of-...

SELECT *
INTO {tmp_taxonomy_index}
FROM {taxonomy_index};

Does somebody know if all other engines support that?
MySQL: https://dev.mysql.com/doc/refman/5.0/en/select-into.html

indytechcook’s picture

@anybody, nice catch. Here is a patch attached with the correct formatting.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 25: entityreference-tax_index_dup-1924444-25.patch, failed testing.

indigoxela’s picture

Hi,
unfortunately the latest attached patch (#25) failed testing.
And #12 has typos.

@indytechcook: would you please have a look at this "unable to apply patch" issue?

regards,
indigoxela

zaporylie’s picture

I just encountered this issue so here's rerolled version of #25 and test-only version to prove it works.

The last submitted patch, 28: duplicate_entries_with-TEST_ONLY-1924444-28.patch, failed testing.

indigoxela’s picture

@ zaporylie: many thanks for the rerolled patch.

Tested on a fresh install with entityreference-7.x-1.x-dev on a node type with two different entityreference fields to taxonomy terms..
First without patch to reproduce the faulty behaviour. As expected: duplicates in database table taxonomy_index occured.
(Adding a third field to this node type resulted in three records per node.)

Then I applied the patch and ran "drush updb".
This removed the duplicates from database table.

Adding more terms to nodes now works as it should.
No more duplicates.

zaporylie’s picture

In that case RTBC?

indigoxela’s picture

In that case RTBC?

Good question...
Is entityreference module supposed to support database servers besides MySQL?

I just tested it on MySQL.

indigoxela’s picture

Well, I guess, we should make the update compatible to other database servers.

Does any of the followers see a chance to adapt function entityreference_update_7003() to only use Drupal's db abstraction layer?
https://api.drupal.org/api/drupal/includes!database!database.inc/7.x

I'm not familiar with Postgres or MSSQL and couldn't even test it.

indigoxela’s picture

This could be a starting point for the update function without direct MySQL syntax usage.
Tested to work on MySQL.

<?php
function entityreference_update_7003() {
  $tx_schema = drupal_get_schema('taxonomy_index');
  $fields = array_keys($tx_schema['fields']);
  db_create_table('taxonomy_index_tmp', $tx_schema);
  $select = db_select('taxonomy_index', 'tx');
  $select->fields('tx');
  $select->groupBy('tx.nid');
  $select->groupBy('tx.tid');
  $result = $select->execute()->fetchAll(PDO::FETCH_ASSOC);
  $insert = db_insert('taxonomy_index_tmp')->fields($fields);
  foreach ($result as $record) {
    $insert->values($record);   
  }
  $insert->execute();
  db_drop_table('taxonomy_index');
  db_rename_table('taxonomy_index_tmp', 'taxonomy_index');
}
?>

People with better pdo skills: please have a look and improve the code.
Anyone willing to test on different database servers is welcome.

eelkeblok’s picture

Thanks for that. I would suggest to define a batch size and do this in several goes when there are a lot of entries. I'm a little worried this query could spin out of control if there are a lot of entries in the index table.

indigoxela’s picture

this query could spin out of control if there are a lot of entries in the index table.

Good point.

I just saw, that it is possible to directly feed a select to db_insert (didn't realize that yesterday). This query happens in db, so there's no danger, that php runs out of memory or something like that.

<?php
function entityreference_update_7003() {
  $tx_schema = drupal_get_schema('taxonomy_index');
  db_create_table('taxonomy_index_tmp', $tx_schema);
  $select = db_select('taxonomy_index', 'tx');
  $select->fields('tx');
  $select->groupBy('tx.nid');
  $select->groupBy('tx.tid');
  db_insert('taxonomy_index_tmp')->from($select)->execute(); 
  db_drop_table('taxonomy_index');
  db_rename_table('taxonomy_index_tmp', 'taxonomy_index');
}
?>
eelkeblok’s picture

Good stuff. I've actually done a quick scan of how the various database drivers (including the non-core support for MS SQL, https://www.drupal.org/project/sqlsrv) and they all seem to build a construct of the form

'INSERT INTO {' . $this->table . '}' . $insert_fields_string . $this->fromQuery;

At least, if it doesn't work, we've used the appropriate APIs and there is an opportunity to fix it in the database driver, where it belongs.

zaporylie’s picture

In that case I updated #28 with changes suggested by @indigoxela in #36

Status: Needs review » Needs work

The last submitted patch, 38: duplicate_entries_with-1924444-38.patch, failed testing.

The last submitted patch, 38: duplicate_entries_with-1924444-38.patch, failed testing.

The last submitted patch, 38: duplicate_entries_with-1924444-38.patch, failed testing.

indigoxela’s picture

Looks like we are stuck a bit here.

entityreference.admin.test keeps failing. This is not the test introduced with our patch.
Any ideas about how to solve this are welcome.

eelkeblok’s picture

Edit: Obviously, first make sure the tests also fail without the patch applied.

I believe the correct procedure would be to identify the problem and create a critical issue for it (or raise an existing one to critical if it doesn't exist yet). See https://www.drupal.org/node/45111:

Critical bugs include those that:

  • ...
  • Cause tests to fail in HEAD on the automated testing platform for any supported environment (including random failures), since this blocks all other work.

The last submitted patch, 28: duplicate_entries_with-1924444-28.patch, failed testing.

indigoxela’s picture

Whatever the problem is - it is not reproducible on my local dev server.
All entityreference tests pass here - with or without this patch.

My setup:
PHP 5.4
MySQL 5.5
Drupal 7.43
Testing module on (obviously)

The test ran for 44 seconds.
Result: 126 passes, 0 fails, 0 exceptions and 12 debug messages

So unfortunately I'm still clueless.

indigoxela’s picture

Ahhh... reading my own post, I just realized the difference: Drupal core version.

Running the same tests on my local dev server right after the update to Drupal 7.x-dev fails.
Result: 102 passes, 31 fails, 6 exceptions und 6 debug messages
All fails are in the tests for the administrative UI - the same as online.

So very likely it is a problem with a drupal core code change and not with our test.

eelkeblok’s picture

Thanks for that. I did some git bisecting on the most recent changes to core. This is the result:

c3bf7484b38a1821c1631033217c4290a591c0de is the first bad commit
commit c3bf7484b38a1821c1631033217c4290a591c0de
Author: David Rothstein <drothstein@gmail.com>
Date:   Sat May 28 14:43:25 2016 -0400

    Issue #611294 by swentel, klausi, David_Rothstein, tsphethean, jenlampton, attiks, yched, sun, benjy, dcam: Added a new "administer fields" permission for trusted users to use the field UI.

:100644 100644 e6e6e19fd0018f74d26777fac2bd181c9b0553e2 3ededdac75803b056384f9dab0f787867b280c56 M	CHANGELOG.txt
:040000 040000 0efb6b34b8d1de001d32c65a0a42f49eec420831 546c62b5ab10e462141264d1726a6d4a3d038111 M	modules

I'll create an issue for this and try my hand at a patch.

eelkeblok’s picture

eelkeblok’s picture

deranga’s picture

Still seems to be a problem with entityreference version 7.x-1.2 - Is the patch from this ticket to plugins/behavior/EntityReferenceBehavior_TaxonomyIndex.class.php going to be incorporated at some point into a release?

eelkeblok’s picture

If we, as a community can get it to RTBC status, I would expect that the maintainer is more than willing to incorporate it. Currently, the status is "Needs work", which means, well, that :) I've kicked of another test since the change that was causing trouble has long since been released in core.

eelkeblok’s picture

eelkeblok’s picture

Status: Needs work » Needs review
deranga’s picture

Wow, quick response @eelkeblok, thx!

The patch seems sensible seeing as we can't instantiate the build index once regardless of number of fields associated to entity so we have to merge and I can confirm that this patch addresses the duplicate issue, although I'm not running the update, as I've manually addressed the data issue.

eelkeblok’s picture

Now that you mention it, I expect we'll at least need to give the update hook a higher number (apart from that, its number is not according to the convention).

deranga’s picture

Latest update hook in dev is 7002, so we shouldn't need to on that basis, but for convention I guess we could it change it to 7100?

Happy to re-roll patch if you think that would help. How do we move this forward from here?

I'll likely forget I have this patch applied at some point :/

eelkeblok’s picture

I was writing a bunch of text and figured I might as well do what I was writing, was going to be quicker :) I wanted to have an accompanying tests-only patch as well. I chose to use 7100 voor the update hook, since the other two were done during the alpha phase of the module, so there is some sense there (and it would make sense to promote this one to 7100, since it's the first one done during the stable phase). Another option would have been 7102, since strictly speaking the other two should have been 7100 and 7101.

Renumbering the update hook will mean that it will run again for anyone who had this patch installed, but I don't think that is a huge problem; it will simply make another copy of the taxonomy index table.

eelkeblok’s picture

Looking good. I'd say this is good for RTBC, but since I've supplied the latest version of the patch I'll leave that to someone else.

deranga’s picture

Changing to RTBC, patch tested and works as exptected - entity_reference-duplicate_tx_entries-1924444-57-d7.patch

Separate test patch fails owing to patch not yet being applied so makes sense to keep as one? removed the rest of the patches from displaying in summary.

eelkeblok’s picture

Yes, the patch that will be applied should be a whole (it needs to include the test as well, to proof the test succeeds with the code changes). The test-only variant is only included to show that the test fails without the actual code changes (e.g. that it correctly tests for the error condition we are trying to fix).

eelkeblok’s picture

deranga’s picture

Ah, sure that makes sense.

Cheers @eelkeblok for picking this issue back up.

Hopefully the maintainer will include in next release of the project.

Anybody’s picture

Whao thank you very very much!
Confirming RTBC. It would be cool to have this in the next release :)

candelas’s picture

I confirm that patch in #57 works. Thanks @eelkeblok

minorOffense’s picture

Status: Reviewed & tested by the community » Fixed

Thanks everyone. Patch now in dev.

joelpittet’s picture

I'm getting schema errors on this update not sure why... also the update hook looks a bit high did you mean to bump it to 71xx from 70xx?

PHP Fatal error:  Unsupported operand types in includes/database/mysql/schema.inc on line 86
PHP Stack trace:
PHP   1. {main}() drush/drush/drush.php:0
PHP   2. drush_main() drush/drush/drush.php:16
PHP   3. _drush_bootstrap_and_dispatch() drush/drush/drush.php:61
PHP   4. drush_dispatch() drush/drush/drush.php:92
PHP   5. call_user_func_array:{drush/drush/includes/command.inc:182}() drush/drush/includes/command.inc:182
PHP   6. drush_command() drush/drush/includes/command.inc:182
PHP   7. _drush_invoke_hooks() drush/drush/includes/command.inc:214
PHP   8. call_user_func_array:{drush/drush/includes/command.inc:362}() drush/drush/includes/command.inc:362
PHP   9. drush_core_updatedb_batch_process() drush/drush/includes/command.inc:362
PHP  10. _update_batch_command() drush/drush/commands/core/core.drush.inc:1160
PHP  11. drush_batch_command() drush/drush/commands/core/drupal/update_7.inc:242
PHP  12. _drush_batch_command() drush/drush/includes/batch.inc:93
PHP  13. _drush_batch_worker() drush/drush/commands/core/drupal/batch.inc:99
PHP  14. call_user_func_array:{drush/drush/commands/core/drupal/batch.inc:149}() drush/drush/commands/core/drupal/batch.inc:149
PHP  15. drush_update_do_one() drush/drush/commands/core/drupal/batch.inc:149
PHP  16. entityreference_update_7100() drush/drush/commands/core/drupal/update_7.inc:73
PHP  17. db_create_table() sites/all/modules/contrib/entityreference/entityreference.install:172
PHP  18. DatabaseSchema->createTable() includes/database/database.inc:2776
PHP  19. DatabaseSchema_mysql->createTableSql() includes/database/schema.inc:662
eelkeblok’s picture

What is line 86 of includes/database/mysql/schema.inc in your Drupal installation (and its immediate surroundings)?

The high version number is simply due to the numbering conventions as described in https://api.drupal.org/api/drupal/modules--system--system.api.php/functi..., which previously were not adhered to correctly in this module. Increasing by 100 or by 1 does not have any functional effect whatsoever. All that matters is that the number is higher than the previous schema version.

joelpittet’s picture

@eelkeblok Fair point just noticed other patches were continuing with ty 70xx, so they will need to be aware on their rerolls or they won't be run.

The 86 is the ending ); in this:

  // Provide defaults if needed.
    $table += array(
      'mysql_engine' => 'InnoDB',
84      // Allow the default charset to be overridden in settings.php.
85      'mysql_character_set' => $this->connection->utf8mb4IsActive() ? 'utf8mb4' : 'utf8',
86   );

Sorry that's less than helpful of a line number:(

Maybe worth using this instead?
https://api.drupal.org/api/drupal/includes%21database%21database.inc/fun...

joelpittet’s picture

Status: Fixed » Needs work

Ah here's why, I don't have taxonomy module enabled, so the schema is not being returned for that table.

Needs to be conditional of a db_table_exists() or something.

minorOffense’s picture

Status: Needs work » Fixed

Well that's some obvious I should have caught. Thanks for that. I've added a db_table_exists to the install file.

joelpittet’s picture

Thanks for the quick fix @minorOffense

eelkeblok’s picture

Ah. Of course. Sillyness.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

willzyx’s picture