| Project: | Drupal core |
| Version: | 8.x-dev |
| Component: | taxonomy.module |
| Category: | bug report |
| Priority: | major |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | needs backport to D7, Needs issue summary update |
Issue Summary
Problem/Motivation
There is no primary key on the {taxonomy_index} table. Also taxonomy_index does not contain a vid column currently, so there is no way to index on that missing field.
Patches, that add primary key to the {taxonomy_index} will fail on sites that have duplicates, so we may need to delete those via a query before adding the index.
And the reason there are duplicates in the table: language code is not part of the primary key!
From comment #14:
Upon translating a node I got the "Integrity constraint violation: 1062 Duplicate entry '24-129' for key 'PRIMARY': INSERT INTO {taxonomy_index}".Some notes:
- I used the "old" method of translating, i.e making a new node and linking them via tnid. The fact that the fields of the new node have values in multiple languages looks like an error to me. I will file that separately.
- Because D7 also has translatable fields, this situation is to be expected and handled.
- This table is a denormalized copy of relationships between fields and terms. So it should contain the identifying parts of both:
- enity_type
- entity_id
- language
- delta
- deleted
- tid
- Restricting this to nodes (not other entities) this can be reduced to:
- nid
- language
- delta
- deleted
- tid
- Removing the "inner node" identifying parts, this can be further reduced to:
- nid
- language (strictly speaking this is inner node as well, but for this table to be usable it will be needed as most uses will want to find what nodes are related *given* the current language)
- tid
This means that duplicates remain possible and should be handled by the code inserting records.
Proposed resolution
It's a bug - once we have duplicates properly sorted out in the application code, the database should be enforcing it.
Comments to look #23, #20 and #14.
Remaining tasks
Patch is needed.
Original report by te-brian
I noticed that there is no primary key on the {taxonomy_index} table. I admit that I haven't followed the taxonomy to fields conversion very closely but I do not see a reason why there cannot be a primary key on (nid, tid).
I didn't mark this as "needs review" because I'm not sure if this change is desired or not. If someone has any thoughts the validity of this proposed key, please chime in.
Attached is a patch that adds the proposed key if we do want to add it.
(for legacy issues whose initial post was not the issue summary)
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| taxonomy_primary_key.patch | 924 bytes | Idle | Invalid patch format in taxonomy_primary_key.patch. | View details | Re-test |
Comments
#1
Needs to be tested, assuming the bot works.
Also, shouldn't this "shortcut" table be tid/nid/vid instead of just tid/nid, just like term_node was ?
#2
The last submitted patch, taxonomy_primary_key.patch, failed testing.
#3
Unless I'm going crazy, term_node just had nid, tid as primary keys.
Here is a reroll.
#4
#5
I won't say you're going crazy, but ... :-)
From http://drupalcode.org/viewvc/drupal/drupal/modules/taxonomy/taxonomy.ins...
$schema['term_node'] ... 'primary key' => array('tid', 'vid'),term_node linking by vid instead of nid was one of the changes from D6 to D6.
#6
I guess you are right. tid, vid it is.
#7
Problem, of course, is that taxonomy_index does not contain a vid column currently, so there's no way to index on that missing field.
#8
Doh! Alas you are right again. So are there any issues with nid, tid from #3. I guess it comes down to what the most common query on this table will be.
#9
+1, subscribing.
#10
re-rolled.
Using nid,tid. No vid, which is fine if I understand correctly. This is not like the old term_node table, which had to be revision-aware. This is a helper table to make queries more effiecent when displaying all nodes tagged with a given term.
#11
The last submitted patch, taxonomy_index_primary.diff, failed testing.
#12
The patch should be rolled from Drupal root.
Pretty sure this will fail on sites that have duplicates, we may need to delete those via a query before adding the index.
#13
I do have duplicates in my table, so the update does indeed fail with:
The following updates returned messages
taxonomy module
Update #7011
Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '7-87' for key 'PRIMARY': ALTER TABLE {taxonomy_index} ADD PRIMARY KEY (`nid`, `tid`); Array ( ) in db_add_primary_key() (regel 2762 van ...\includes\database\database.inc).
I extended the patch to remove duplicates, which wasn't easy as this cannot be done with just 1 query. Update and delete queries may not refer to itself in joins/subqueries.
#14
And now I now the reason why I had duplicates in my table: language code is not part of the primary key!
Upon translating a node I got the "Integrity constraint violation: 1062 Duplicate entry '24-129' for key 'PRIMARY': INSERT INTO {taxonomy_index}".
Some notes:
- I used the "old" method of translating, i.e making a new node and linking them via tnid. The fact that the fields of the new node have values in multiple languages looks like an error to me. I will file that separately.
- Because D7 also has translatable fields, this situation is to be expected and handled.
- This table is a denormalized copy of relationships between fields and terms. So it should contain the identifying parts of both:
* enity_type
* entity_id
* language
* delta
* deleted
* tid
- Restricting this to nodes (not other entities) this can be reduced to:
* nid
* language
* delta
* deleted
* tid
- Removing the "inner node" identifying parts, this can be further reduced to:
* nid
* language (strictly speaking this is inner node as well, but for this table to be usable it will be needed as most uses will want to find what nodes are related *given* the current language)
* tid
This means that duplicates remain possible and should be handled by the code inserting records.
I'm making this critical because I think that:
- Insert code is flawed.
- All read uses of this table are flawed in a multilingual situation.
- Repairing this might become a nightmare: the sooner the better.
Am I thinking right or not?
Should the title be changed or should this be a new issue?
#15
The "critical" status is reserved for things that render Drupal completely unusable. This sounds pretty solidly "major" though.
#16
subscribe
#17
#18
After applied #13, a
db_insert('taxonomy_index')(in taxonomy module) throws PDOException.#19
Don't use the patch as it does not take away the cause, so, eventually, you will indeed get errors on insert.
#20
This is blocked on #1050466: The taxonomy index should be maintained in a node hook, not a field hook, tagging as such. It's also a bug - once we have duplicates properly sorted out in the application code, the database should be enforcing it.
#21
#1050466: The taxonomy index should be maintained in a node hook, not a field hook was committed; this should be unblocked now.
#22
Does #1050466: The taxonomy index should be maintained in a node hook, not a field hook prevent duplicates being inserted (in multi lingual sites as well) or should we still check for that?
If so, this issue can be restricted to a hook_update that:
If not, the solution for this issue should be extended with preventing inserting duplicates by using a merge query instead.
Subsequently, if we want better support for multi lingual sites, we could (should) add a language column to the table and the primary index. questions:
If we can settle the scope of this issue, I will write a patch.
#23
See also #1343822: Autocreated taxonomy terms duplicated if they are added to multiple fields.. And yes, I think a merge query is a good idea.
I think anything for D8MI should be considered separately as there are much bigger implications for that.
#24
OK, I will post a patch for D8 containing:
The patch for D8 does not need to contain an update, assuming that the backport to D7 gets in and released in an official version before D8 is released... How to assure this?
The patch for D7 will contain:
I will create a new issue to find out if we want to add language to this table and its indexes.
#25
I don't understand this as a bug, if it is, can someone update the issue summary explaining it?
#26
Talked to @tim.plunkett about this issue in IRC. This is a gnarly bug, see #23 and #20 and #14. And yeah, an updated summary would be good.
#27
Also unassigning.
#28
FWIW, we also tackled that issue in entityreference: #1586428: Integrate entyreference from nodes to terms with core taxonomy_index, where it now seems to be solved, pending further tests.
#29
...friendly bump ;)