I have been using drupal for quite some time and I thought of using drupal for a recent project I undertook. I have been trying for few weeks now to try and make it work hitting several bottlenecks.

So here it is, a system which would make use of taxonomy extensively. 2 million terms in terms_data recurring in a hundred thousand nodes with a term_node count above 100 million rows. These terms are either words or chemical iupac names.

The server I am using is pretty speedy with quite a lot of heavy machinery! Now, since hacking drupal isn't a nice thing I don't know what else to do.

Normal queries either time out or just kill my poor mysql daemon . Few might say that it is more a mysql issue than drupal issue. True, but there is a way around. Either by using, partitioning from mysql 5.1.6 version or splitting term tables in a query sensitive manner.

I am not sure but I think an alternate very complex module instead of taxonomy can be designed here. The mechanism used to insert values into table and table structure needs to be altered according to different vocabularies and terms e.g. 1,1,1-trichloro-2-methyl-2-propanol can go into term_data1 and methanoic acid could go in term_datam or all the terms for a specific node value in a term_node table could go in one cell. It would require two tables to be made then one being the node >>> terms collective and the second being term >>> nodes collective. This mechanism is not very resource friendly! Anyways,there can be several other methodologies but all would require examination of the term before insertion into mysql or any other database.

So far, I can only think of table division manually for mysql version earlier than 5.1 . I also tried the new partitioning feature in mysql 5.1.6 which is kinda cool! However, transferring the data required a day and then runinng optimization and alteration codes meant few more days! For drupal to handle few million rows with the new mysql version requires taxonomy hacking or a totally new invention using the core old taxonomy values!

I thought it necessary to communicate all this here at the community plumbing venue! I would be much glad if some one can suggest a better solution.

This future optimization or invention in my opinion can set drupal on top for large management platforms once and for all.

Comments

kbahey’s picture

Taxonomy has a many to many relationship between node and term, and this is implemented in the term_node table. So joins involve many tables and it will kill the server like what you have seen when you are talking about that many millions.

Even with far lower numbers than these, I have had to work around some bottlenecks, for example Bottleneck: Replacing taxonomy_term_count_nodes() in taxonomy DHTML.

I don't think that using taxonomy here is appropriate. For your volume and purpose, you may want to implement your own data model instead of trying to fit it in the fully normalized one that Drupal has.

mdupont’s picture

Status: Active » Closed (won't fix)

Old issue. Closing. For such a high volume of data a customized solution is more appropriate.