Hi,
while restructuring my site's taxonomy, I had to change the configuration of one main vocabulary from "multiple hierarchy" to "single hierarchy". That resulted in
* terms appearing multiple times in admin/content/taxonomy/2 (e.g. three times admin/content/taxonomy/edit/term/1236?destination=admin%2Fcontent%2Ftaxonomy%2F2), which in turn seems to have resulted in
* a vocabulary of 80 pages with 25 terms per page (approximately 2000 terms, I assume).
None of the terms in this vocabulary can be edited anymore. Also, it became impossible to use contributed modules like taxonomy_manager, or taxonomy_merge on this vocabulary.
When I try to edit a term, the Opera web browser claims to load several megabytes of data; this ran for 20+ minutes without timing out until I canceled the operation. This can be reproduced with any term in this vocabulary (Drupal is hosted on a dedicated server with a dual-core CPU; this is definitely not a hardware issue).
I consider this a critical bug, since taxonomy is one of the core functions of Drupal and IMHO must not break when using functions provided by Drupal core. Whatever happens when a vocabulary is modified in the described way, there should be some kind of safety check that prevents the vocabulary from becoming inaccessible through this operation (I guessed that subterms attached to multiple first-level terms would be converted to *one* top-level term). If the behaviour I ran into is intended, this appears to be a very fundamental design issue of the Drupal taxonomy module.
However, the essential question for me is how to recover my main vocabulary without breaking the classification of 26,000 nodes. I'm open to any suggestions.
Thanks & greetings,
-asb
Comments
Comment #1
keith.smith commented
Comment #2
drumm
I tested this with various small taxonomies, 5-10 terms each, and was not able to get any extra terms. Please post a procedure to reproduce this.
How many terms are actually in the database? How many parent relationships? Database queries for this:
SELECT count(*) FROM term_data;
SELECT count(*) FROM term_hierarchy;
The single/multiple hierarchy setting restricts only new & edited terms. Existing terms with multiple parents will stay in the same place, with multiple parents, until they are edited and saved.
The edit term page is inaccessible because there is a select menu of every term for parent selection. A similar problem might be happening with the contributed modules.
I would recommend:
* If you have a backup, roll back and edit every term to have one parent before switching the vocabulary.
* Try turning off contributed modules, maybe one had a bug which created the terms.
* Making a backup first, try manually removing unwanted term parents in the {term_hierarchy} table.
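A minimal sketch of the last suggestion, assuming the Drupal 5/6 layout in which {term_hierarchy} holds one (tid, parent) row per parent relationship and {term_data} holds tid, vid, and name. It uses Python with an in-memory SQLite database as a stand-in for the real tables; the sample rows, the term names, and the helper names multi_parent_terms and remove_parent are made up for illustration, and the SQL would need to be run against a backed-up copy of the real database.

```python
import sqlite3


def multi_parent_terms(conn):
    """Return (tid, name, parent_count) for every term with more than one parent."""
    return conn.execute(
        """
        SELECT h.tid, d.name, COUNT(*) AS parents
        FROM term_hierarchy h
        JOIN term_data d ON d.tid = h.tid
        GROUP BY h.tid, d.name
        HAVING COUNT(*) > 1
        ORDER BY h.tid
        """
    ).fetchall()


def remove_parent(conn, tid, parent):
    """Delete one (tid, parent) row, refusing to remove a term's last row."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM term_hierarchy WHERE tid = ?", (tid,)
    ).fetchone()
    if count <= 1:
        raise ValueError("term %d would be left without any hierarchy row" % tid)
    conn.execute(
        "DELETE FROM term_hierarchy WHERE tid = ? AND parent = ?", (tid, parent)
    )


# Stand-in data (hypothetical): term 1 has two parents, 63 and 879.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE term_data (tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT);
    CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER,
                                 PRIMARY KEY (tid, parent));
    INSERT INTO term_data VALUES
        (1, 2, 'dogs'), (63, 2, 'animals at home'),
        (879, 2, 'biological systematic');
    INSERT INTO term_hierarchy VALUES (1, 63), (1, 879), (63, 0), (879, 0);
    """
)

print(multi_parent_terms(conn))  # terms that still have several parents
remove_parent(conn, 1, 879)      # keep 63 as the only parent of term 1
print(multi_parent_terms(conn))
```

The guard in remove_parent is a design choice of this sketch: it only ever removes surplus parents, so a term cannot silently drop out of the hierarchy.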
Comment #3
asb commented
Hi drumm,
> How many terms are actually in the database? How many parent relationships?
> Try turning off contributed modules, maybe one had a bug which created the terms.
Very well possible; I'm using quite a bunch of tools related to taxonomy:
(a) for Display/Presentation:
* Taxonomy breadcrumb
* Taxonomy Lineage
* Tagadelic
* Directory
* Vocabulary Index
(b) Managing vocabularies and terms:
* Taxonomy import/export via XML
* Taxonomy Manager
* Taxonomy switch
* Term Merge
* Edit term
(c) Recently installed, but currently disabled:
* Taxonomy Fields (also uninstalled)
* Taxonomy Filter (nothing to uninstall)
(d) Installed some time ago, disabled for a while:
* Taxonomy context (nothing to uninstall)
* Taxonomy Menu (nothing to uninstall)
* Taxonomy Force all (nothing to uninstall)
To build the taxonomy and its term relations, I'm mostly using Taxonomy Manager; for quick edits, Edit term. Several vocabularies were created by Taxonomy import/export via XML; moving terms between vocabularies was done with Taxonomy switch. I tested those modules before using them on my production sites, but did not encounter errors or inconsistencies.
There is no backup to roll back to; it took too long to discover the problems accessing the taxonomy edit pages. The taxonomy itself is working, and was used to tag content for weeks after I started splitting the vocabularies. I started to split the vocabulary into smaller pieces when modules like Sitemap and Sitemenu started to time out when trying to display the full taxonomy. I guessed that the vocabulary had simply become too large, so I started splitting it into smaller vocabularies, since there are no recommendations on what size taxonomies can reach before collapsing. Since the migration of data between vocabularies was done by Taxonomy switch, maybe there is a bug there. I also edited, or at least tried to access, the new vocabularies with Taxonomy Manager; that might also have changed something in the taxonomy. Indeed, I remember that I could not merge several terms with Taxonomy Manager, while Term Merge did work in these cases. Possibly even Taxonomy Lineage might interfere.
Currently, Taxonomy Manager does work on the main taxonomy again, more or less. E.g., there is
* term 2/110
** subterm 2/365
*** subterm 2/75
** subterm 2/1336
*** subterm 2/75
subterm 2/365 and 2/1336 are identical, and currently can be merged. Maybe I can try this to clean up.
> try manually removing unwanted term parents in the {term_hierarchy} table.
Hm, e.g. there's a "tid" = "1" with a "parent" = "879", and a tid = 1 with a parent = 63; how do I work with this? If tid is unique, I can look up its vocabulary and name in the term_data table, but then?
Thank you very much for your help & greetings, -asb
Comment #4
ainigma32 commented
@asb: It's been a while... did you ever solve this?
- Arie
Comment #5
asb commented
> It's been a while... did you ever solve this?
Sorry, no, I can't offer any help here. Taxonomy is still degrading continuously. Hardware upgrades can compensate for a while, then it gets worse again. Also, other issues with Drupal core's taxonomy appeared in the meantime, e.g. on two D5 sites I can't access nodes tagged with certain terms anymore. Additionally I'm getting increasingly worried about these obscure size limitations of taxonomy some of us are running into.
Greetings, -asb
Comment #6
ainigma32 commented
@asb: Since it looks like this issue can't be reproduced (well, not easily anyway) and there doesn't seem to be any activity, I would like to set this to won't fix.
Do you agree with that or would you like to pursue this further?
- Arie
Comment #7
asb commented
Hi Arie,
thank you for the kind question. Since no one seems to have investigated this issue so far, and a core functionality of Drupal is affected, I wouldn't like to see this marked as won't fix. Of course I'm still willing to provide any help I can offer, but I would need precise instructions on what to do when it comes to the database level.
In particular, I'd like to leave this issue open until the affected sites have been migrated to D6; since there are some changes in taxonomy.module, there might be a slight chance that the problems will be repaired during migration.
However, if there's a policy that issues in Drupal core get a "won't fix" if nobody takes a look at them after a certain time (there are other issues similar to this, e.g. with the menu system), then I can't object, since I can't resolve those issues myself as a non-core-developer.
Thanks again for asking & greetings, -asb
Comment #8
ainigma32 commented
Not really. At least not that I'm aware of. I just like to keep the queue as short as possible ;-)
As for instructions: let's see what we can come up with.
First off: do you have a backup of the database from before the changes to the vocabulary?
And do you have a test/development environment (usually some old PC)?
- Arie
Comment #9
asb commented
Hi Arie,
> First off: do you have a backup of the database from before the changes to the vocabulary?
Yes, I've been keeping backups since 2006; and no, I have no idea which one is the "last known good". However, I could pick one from January or February 2008 at random and set it up, but both my vocabularies and my server infrastructure have changed significantly since then.
> And do you have a test/development environment (usually some old PC).
Sure, I can set up and break a recent backup ;)
Thanks & greetings, -asb
Comment #10
ainigma32 commented
Excellent! Could you try to set up a test environment and try to reproduce this problem?
If you can we can start to rule out causes.
- Arie
Comment #11
asb commented
Hi Arie,
> Could you try to set up a test environment and try to reproduce this problem? If you can we can start to rule out causes.
I'm currently migrating to new server hardware (dedicated server, dual core Opteron 1218HE, 4 GB RAM) where things behave a bit differently but are basically reproducible. If I access something like /admin/content/taxonomy/edit/term/1018, I get a blank screen (WSOD) that watchdog (/admin/logs/watchdog) doesn't notice. In sites/default/settings.php, I currently have "ini_set('memory_limit', '96M');".
After raising the PHP memory limit to 128M, I can access at least some terms; according to Opera, approximately 4 MB of data is loaded, taking 67 seconds; in the structure of the vocabulary I'm getting deeply nested and partially recursive "paths" like this:
Every operation on the client PC becomes painfully slow since the PC hangs for minutes, with 100% load on one core (hardware of the client: Intel Core2Duo 6700) and an unresponsive browser. Basically, the only feasible way to work with taxonomy terms is through the "taxonomy manager" module, which loads a vocabulary differently. Is there anything I can do about this? Anything else I can try out?
Thanks & greetings, -asb
Comment #12
dpearcefl commented
Considering the time elapsed between now and the last comment plus the fact that D5 is no longer supported, I am closing this ticket.
Comment #13
asb commented
Sorry, the issue still exists in the current Drupal 6.22 and Pressflow 6.22.x.
Comment #14
dpearcefl commented
No apologies needed for reopening the ticket.
However, as you may have gathered, no one is going to help solve this problem unless activity is shown.
Can you duplicate this problem on a clean 6.22 install? Or is it only your site?
If you can provide a way to duplicate the problem with a clean install, I might be able to help you.
Comment #15
asb commented
IMHO the problem is the age of the site (started around Drupal 4.4/4.5) in combination with the structure and size of the taxonomy. I have other sites of a similar age with a smaller taxonomy where the issue doesn't occur, and I have newer sites with larger taxonomies where the issue does not show up either, at least not in this critical manner. I believe the problem is caused by some kind of "degradation" resulting from the number of upgrades we ran through and from changes in the handling of multi-parent terms, so this probably cannot be reproduced on a fresh install (unless one tried to replicate it by running through the full line of core upgrades, starting with Drupal 4.x).
Size of taxonomy: 2401 terms + 1885 terms + 856 terms + 639 terms + 49 terms + 679 terms; this should definitely not cause any trouble per se. A similar taxonomy can be generated with 'devel' in a couple of seconds, and I assume that it won't cause any performance issues on a current and fresh Drupal installation. Current figures:
What we used years ago, and what probably caused the borked data structures, is the taxonomy feature that allows one term to have multiple parent terms. At some point this duplicated whole taxonomy term trees (the "same" term showed up with the same label, but a different tid and different parent terms). When we encountered the duplicates, we tried to merge them with 'Term Merge' and later 'Taxonomy Manager' to "repair" the increasing performance issues and the useless taxonomy structures (we need the term "dogs" to be the same for the parent terms "biological systematic" and "animals at home"; we do not want two different "dogs" terms).
The only thing worth investigating, independent of our problem, is how multiple-parent terms are supposed to behave and how they actually behave. What we expected was that, for example, tid 4 can be a child of tid 1, 2, and 3. What actually happened was that tid 4 could be a child of tid 1, but as soon as we added it as a child of tid 2 as well, we would get tid 5 (plus duplicates of the whole tree below tid 4). I never managed to figure out whether Drupal simply can't do this right, whether this was broken at some point, or whether we encountered a freak accident on this particular site.
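The two situations described above can be told apart at the database level. The following is only a sketch, assuming the Drupal 5/6 tables {term_data} (tid, vid, name) and {term_hierarchy} (tid, parent); it uses Python with an in-memory SQLite stand-in, and the sample rows and the helper names parents_of and duplicate_names are hypothetical. The sample data models the "actual" outcome, where "dogs" exists twice (tid 4 and tid 5) with one parent each, rather than once with two parents.

```python
import sqlite3
from collections import defaultdict


def parents_of(conn, tid):
    """Return the sorted list of parent tids recorded for one term."""
    rows = conn.execute(
        "SELECT parent FROM term_hierarchy WHERE tid = ? ORDER BY parent", (tid,)
    )
    return [parent for (parent,) in rows]


def duplicate_names(conn, vid):
    """Map each term name used more than once in a vocabulary to its tids."""
    by_name = defaultdict(list)
    rows = conn.execute(
        "SELECT name, tid FROM term_data WHERE vid = ? ORDER BY name, tid", (vid,)
    )
    for name, tid in rows:
        by_name[name].append(tid)
    return {name: tids for name, tids in by_name.items() if len(tids) > 1}


# Stand-in data (hypothetical): two separate "dogs" terms, one per parent.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE term_data (tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT);
    CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER,
                                 PRIMARY KEY (tid, parent));
    INSERT INTO term_data VALUES
        (1, 2, 'biological systematic'), (2, 2, 'animals at home'),
        (4, 2, 'dogs'), (5, 2, 'dogs');
    INSERT INTO term_hierarchy VALUES (1, 0), (2, 0), (4, 1), (5, 2);
    """
)

print(parents_of(conn, 4))       # a true multi-parent term would list both 1 and 2
print(duplicate_names(conn, 2))  # same label, different tids
```

If the expected behaviour had occurred, parents_of for tid 4 would return both parents and duplicate_names would find nothing for "dogs".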
Sorry for being so vague, but as you might have guessed by now I'm not a programmer.
Comment #16
dpearcefl commented
Not to worry. Not all of us are programmers. Good thing.
One thing does come to mind: It would seem that this problem may be specific to your site. As such, help may be very slow to come by.
How about this: You send me a list of the contrib modules you are using and a dump of your database and I'll do some testing.
Comment #17
cweagans
Support requests are never critical or major.
Comment #18
asb commented
As this was (and still should be) a bug report, and since it has caused a WSOD for the whole site starting with Drupal 6.24, I'm adjusting the issue's category and priority again.
The root cause of this issue is a circular reference in a hierarchical taxonomy. The procedure to reproduce would be to create a circular reference, if Drupal core still allows one to be created nowadays, or alternatively to go back to an earlier Drupal version where it was still possible; the site was started somewhere around Drupal 4.5 or 4.6, so this would be a pretty insane task. A workaround would be not to use unsafe administration tools for taxonomy. The contributed module 'Taxonomy Manager' is, for example, a safe administration tool for taxonomy, as it prevents circular references from being created. The proper fix for Drupal core would have been to check for circular references, if/when the issue was fixed, or at least to add a release note in 6.24, where some safeguards were removed so that it now results in a WSOD.
However, closing this issue since fixing inherent deficits of Drupal core's taxonomy is outside the scope of this issue in specific, or of Drupal 6 in general.
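For anyone who suspects the same root cause, a circular reference can be looked for directly in the hierarchy data. This is a minimal sketch, not a Drupal API: it assumes the Drupal 5/6 {term_hierarchy} table with (tid, parent) rows and parent = 0 for top-level terms, uses Python with an in-memory SQLite stand-in, and the sample rows and the helper name terms_in_cycles are made up for illustration.

```python
import sqlite3
from collections import defaultdict


def terms_in_cycles(conn):
    """Return the sorted tids that can reach themselves by walking up parents."""
    parents = defaultdict(set)
    rows = conn.execute("SELECT tid, parent FROM term_hierarchy WHERE parent <> 0")
    for tid, parent in rows:
        parents[tid].add(parent)

    looping = []
    for tid in sorted(parents):
        seen = set()
        todo = list(parents[tid])
        while todo:
            current = todo.pop()
            if current == tid:
                looping.append(tid)  # walked up and arrived back at the start
                break
            if current in seen:
                continue             # already explored this ancestor
            seen.add(current)
            todo.extend(parents.get(current, ()))
    return looping


# Stand-in data (hypothetical): 11 and 12 are each other's parent.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER,
                                 PRIMARY KEY (tid, parent));
    INSERT INTO term_hierarchy VALUES
        (10, 0), (11, 10), (12, 11), (11, 12), (13, 12);
    """
)

print(terms_in_cycles(conn))
```

The walk visits every ancestor of each term at most once, which is more than fast enough for a few thousand terms; term 13 in the sample sits below the loop but is not part of it, so it is not reported.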