I'm running a medium-sized network of Drupal sites and I noticed some odd performance issues during traffic spikes that occasionally grind Apache to a halt.

A big part of the problem appears to be due to taxonomy_node_get_terms_by_vocabulary (hereafer tngtbv). Here's the issue: MySQL's optimizer decides the best way to run the query in that function is by copying the intermediate result of the join to a temporary table, but since we're requesting t.* (which includes the BLOB field t.description), MySQL is forced to create the temporary table on disk even though it's only a few rows.

I solved the problem for myself by changing Description from Longtext to Varchar(255) so that the Temp table is created in memory. Not a big deal for me since I don't description anyway. Another solution would be to reverse the join order in the query and use MySQL's STRAIGHT_JOIN directive to force it to do the join the other way, but that's not much of a solution either.

We use free tagging a lot, so tngtbv() gets called several times on just about every page. And when a crawler comes buy and hits all the uncached tag pages on all 18 of our sites at once, it requires a ton of disk access. According to the MySQL status variable "Created_tmp_disk_tables" we were writing something like 1 million extra files to disk per day because of this function.

I'm still running 5.1, but I'm pretty sure this is the same problem in 5.x-dev and 6.x-dev.

CommentFileSizeAuthor
#6 taxonomy_description_3.patch1.05 KBcatch
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

eli’s picture

Version: 5.1 » 6.x-dev
Category: feature » bug

I can confirm that this problem is still present in HEAD and that it also affects the query in taxonomy_node_get_terms(), which gets called when you view a node.

I can think of some less-than-ideal solutions: limiting the length of the tag Description to 255 chars or converting term_node to InnoDB (I think).

But unless there's a real SQL whiz out there who can find a way to optimize these queries so they don't require a temp table in the first place, I think the only solution is to rework Taxonomy so that the Description field is only queried if it is going to be used (like on the Edit Tag page).

I'm willing to code & submit a patch to do this, but I'd love to get some feedback from a more seasoned Drupaler first...

(Is it unfair to call this a bug? I'm not sure. It's definitely a very subtle flaw and has the potential to have a serious effect on performance -- these two functions had a huge effect on my server.)

moshe weitzman’s picture

i think it is fair to call this a bug. it might even be a critical.

catch’s picture

Priority: Normal » Critical

bumping this to critical - there's a few critical patches with temporary tables to disk so it seems consistent.

catch’s picture

Title: performance issue with taxonomy_node_get_terms_by_vocabulary() and MySQL » temporary tables and taxonomy_node_get_terms_by_vocabulary() in MySQL
Wim Leers’s picture

I think that limiting the description to 255 characters is acceptable? Taxonomy terms are not a content type, so if you need more space than that, you should be using a node of a certain content type anyway?

catch’s picture

I remember seeing a mention where a patch went through that removed all SELECT * is that right?

Here's the function: http://api.drupal.org/api/function/taxonomy_node_get_terms_by_vocabulary/6

It's only called by forum_form: http://api.drupal.org/api/function/forum_form/6

This doesn't display descriptions, so there seems no reason for the t.*

Attached a patch which does: SELECT t.tid, t.name, t.vid, t.weight instead of t.tid, t.*

Tested it by applying patch, making a new forum topic, editing it, moving it around. Seems fine. Not sure if this will actually fix the temporary tables issue but worth a shot.

Should be noted that SELECT t.tid, t.* is present in:
taxonomy_get_parents
taxonomy_get_tree
taxonomy_get_term_by_name

All of which do joins and order bys. Will mark this as needs review and see where these functions are getting called. Taxonomy descriptions aren't called in many places so if this is as much of a performance hit as eli suggests then it could be worth doing the same to those queries if they're not displaying descriptions.

catch’s picture

Status: Active » Needs review

changing status.

Wim Leers, that seems like good solution. Doesn't seem any reason to do t.* in at least the forum_form query anyway though.

jaydub’s picture

I noticed the query is a bit different in 5.3 but for the moment I am assuming this is the same in 5 and 6 but from a quick pass through the mysql query explain the only reason we use temporary tables is because of the ORDER BY sort. In the case of this query the sort is by weight. I don't know how important the sort is here but removing the ORDER BY gets rid of the temporary tables...perhaps it would be possible to simply sort the results by weight in PHP? Removing the description field still results in temporary tables but in memory instead of on disk.

If I get time maybe I can try benchmarking the various options...

eli’s picture

@catch - you can see if it's still generating temp tables by logging into the MySQL server and running show global status like '%tmp%'; If you see Created_tmp_disk_tables number going up after you run the query, then you still got a problem.

I'll try to test out the patch later this week, but I'm swamped.

Also there's a little bit of background on the problem here: http://www.mysqlperformanceblog.com/2007/08/16/how-much-overhead-is-caus...

JirkaRybka’s picture

Based on #8 and http://drupal.org/node/185126 , I would suggest to examine possible optimizations of indexes on the term_data table. I noticed that taxonomy uses quite much the same ORDER BY for 'weight' and 'name', plus somewhere the same also on the vocabulary table. So perhaps we should add a new index to these tables, containing weight and name?

This needs and eye of some SQL-guru (and I'm not one, unfortunately).

And BTW, the description should not be shortened if possible, it's the only way to explain your categories once http://drupal.org/node/180719 lands.

catch’s picture

Priority: Critical » Normal
Status: Needs review » Closed (duplicate)

I'm going to mark this as duplicate of: http://drupal.org/node/164532 so efforts can be concentrated in one place.

Second post in that thread has proposed indexes on taxonomy, will link back to here for reference.

eli’s picture

I'm not sure that's a good idea. The indexes proposed in that bug do not appear to fix the problem I'm describing. I hope a MySQL guru will prove me wrong... but I don't think it can be fixed with indexes at all.

eli’s picture

Whoops, just proved how little I know -- looks like that index does fix this problem, though somewhere else there's a function on the tag pages still generating temp tables on disk.

catch’s picture

Title: temporary tables and taxonomy_node_get_terms_by_vocabulary() in MySQL » temporary tables in taxonomy module queries
Status: Closed (duplicate) » Active

In that marking back to active so we can track down which function is causing the temporary tables in here.

giorgio79’s picture

I just stumbled upon this issue, I have a large free tagging plus a hierarchical vocab with a couple hundred thousand of terms, and I am getting 250 Meg temp tables in my tmp lib..

What happened to this issue? I can see the patch has not been committed to core. The SELECT t.* FROM {term_data} and the like queries are all around taxonomy functions like taxonomy_get_parents, taxonomy_get_children...

giorgio79’s picture

Priority: Normal » Critical
Status: Active » Needs review
Anonymous’s picture

Priority: Critical » Normal
Status: Needs review » Closed (duplicate)

Reset hijack

giorgio79’s picture

Version: 7.x-dev » 6.x-dev
Status: Postponed (maintainer needs more info) » Needs review

Please reconsider my change but not as a reset hijack.

Please check the patch provided by catch in #6.

The issue you point to as a duplicate talks about table indices, but catch talks about the actual taxonomy queries.

catch has noticed very nicely that taxonomy queries are written poorly

"I remember seeing a mention where a patch went through that removed all SELECT * is that right?

Here's the function: http://api.drupal.org/api/function/taxonomy_node_get_terms_by_vocabulary...

It's only called by forum_form: http://api.drupal.org/api/function/forum_form/6

This doesn't display descriptions, so there seems no reason for the t.*

Attached a patch which does: SELECT t.tid, t.name, t.vid, t.weight instead of t.tid, t.*"

This patch needs review as the taxonomy queries in Drupal 5 and 6 still do SELECT t.*

and I completely agree with catch.

Anonymous’s picture

Status: Closed (duplicate) » Needs work

I missed #14. And based on #14 it needs work anyway.

andypost’s picture

subscribe

grah’s picture

subscribing

catch’s picture

Version: 6.x-dev » 7.x-dev
Status: Needs work » Postponed (maintainer needs more info)

eli - did you ever track down the source of your on-disk temporary tables?

sun.core’s picture

Version: 6.x-dev » 7.x-dev
Status: Needs review » Closed (won't fix)

Sorry, without further information this issue can only be marked as won't fix.

Feel free to re-open this issue if you want to provide further information. Thanks.

mo6’s picture

Title: temporary tables in taxonomy module queries » Temporary tables in taxonomy module queries
Status: Closed (won't fix) » Active

On a D5 website (250+ terms, 28000+ nodes) we were confronted with lots of temporary tables created on disc (about 30% of all temporary tables) which was analyzed. This lead to the conclusion that (on our setup) every node_load created a temporary table. I implemented the database change:

alter table term_data modify description varchar(255) default '';

which totally eliminated the disc based temporary tables and sped up the database. I don't see the need for a description of more than 255 characters.

I'm re-opening this issue so this can be analysed and maybe re-engineered in D7?

JirkaRybka’s picture

Did anyone consider caching for Taxonomy? These queries are scary either way... In Category module (don't kick me for even mentioning it; it's pretty similar here), the caching (permanent, per-node) saved my day. I'm unsure where D7 currently is, but caching might be worth to explore. (For reference: #501378: PERFORMANCE! Central caching for category API functions, #587218: Performance follow-up: Convert Taxonomy wrapper to the new caching pattern)

pdrake’s picture

I applied this same table alter in D6 with great success. Our site has ~1,400 terms about ~20,000 nodes. This alteration reduced our temporary tables created on disk from 38% to less than 10%. Thanks for posting this.

a_c_m’s picture

Just to +1 this. A Rackspace DBA fixed this for us and it dropped the load by half of the DB (We have 500,000+ nodes and several million terms).

D6.

Ganginator’s picture

I also applied this with great success on my D6 install.

My process went in this order:

1 > Set site to Maintenance Mode
2 > Opened phpMyAdmin
3 > Found table term_data and went to Structure
4 > Edited description to varchar(255)
5 > Found table vocabulary and went to Structure
6 > Edited description to varchar(255)
7 > Restarted MySQL
8 > Brought site out of Maintenance Mode
9 > Flushed all caches

Today:

% temp disk tables: 0.144418267247

Compared to 12/27/11:

% temp disk tables: 34.0387913372

Finally, I took a look into my D7 install to see if this has been changed.

Under table taxonomy_term_data and taxonomy_vocabulary in Structure, description is still set to longtext?

While I have far less traffic and no issue on that site it still seems unnecessary to have these set to longtext.
I see no reason why a description needs to be larger than 255 characters.

All this makes me wonder what other unnecessary longtext types are hiding?
Does anyone know of any other tables that should be switched?

xjm’s picture

Status: Active » Postponed (maintainer needs more info)

Has anyone reproduced this in D7, or is this D6-only now?

markdorison’s picture

We have made this change on a D6 site and have seen a significant decrease in temporary tables written to disk.

rfay’s picture

This still seems to be an ongoing problem on D6 (6.24).

EXPLAIN SELECT n.nid, n.title, l.comment_count FROM node n INNER JOIN term_node tn ON tn.vid = n.vid INNER JOIN term_data td ON td.tid = tn.tid INNER JOIN node_comment_statistics l ON n.nid = l.nid LEFT JOIN node i18n ON n.tnid > 0 AND n.tnid = i18n.tnid AND i18n.language = 'fr' WHERE (n.language ='fr' OR n.language ='' OR n.language IS NULL OR n.language = 'en-working' AND i18n.nid IS NULL) AND ( n.status = 1 AND td.vid = 4 )ORDER BY n.nid DESC LIMIT 0, 5;
+----+-------------+-------+--------+----------------------------------+---------+---------+--------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                    | key     | key_len | ref                | rows | Extra                                        |
+----+-------------+-------+--------+----------------------------------+---------+---------+--------------------+------+----------------------------------------------+
|  1 | SIMPLE      | tn    | index  | PRIMARY,vid                      | nid     | 4       | NULL               | 1201 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | td    | eq_ref | PRIMARY,vid_name,taxonomy_tree   | PRIMARY | 4       | warmshowers.tn.tid |    1 | Using where                                  |
|  1 | SIMPLE      | n     | eq_ref | PRIMARY,vid,node_status_type,nid | vid     | 4       | warmshowers.tn.vid |    1 | Using where                                  |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY                          | PRIMARY | 4       | warmshowers.n.nid  |    1 |                                              |
|  1 | SIMPLE      | i18n  | ref    | tnid                             | tnid    | 4       | warmshowers.n.tnid |  211 | Using where                                  |
+----+-------------+-------+--------+----------------------------------+---------+---------+--------------------+------+----------------------------------------------+
xjm’s picture

Thanks for confirming. Can anyone confirm whether the problem exists in D7, though?

Ganginator’s picture

Like I stated above in #28 this is still an issue in D7.

xjm’s picture

@jgangi, all I see in #28 regarding D7 is this:

Finally, I took a look into my D7 install to see if this has been changed.

That doesn't say anything one way or the other. :) Did you test with the same steps in D7 and get the same results?

catch’s picture

The term and vocabulary descriptions are still type text, so any queries including those columns that require a temporary table will force it to be created on disk. iirc there shouldn't be any core queries with this any more, because we've shifted from including the description in the big SELECT queries to loading IDs then calling entity_load() (which is one of the main performance reasons to move to that pattern in general). How many contrib modules might be doing this I'm not sure though.

Anonymous’s picture

Perhaps the title needs to change to something like "Review longtext usage in core DB schema due to slow query response" and the version assigned to 8.x-dev?

xjm’s picture

That's a bit scope creep. :) If we have any of these bad queries in taxonomy in D7+, I'd like to know, and if not, we should fix the queries in D6. I'd say a separate issue for any meta discussion.

catch’s picture

We've somewhat had that discussion in the other direction due to #1284658: MySQL strict mode is causing PDO Exceptions when lengths of varchar are exceeded, but that was mainly me trying to hold back the tide of varchars being converted to text as opposed to doing it the other way 'round. Really it's fine for term description to be a varchar, it should just never be included in a query, ever, except for the one in entity_load().

Ganginator’s picture

@xjm

The next sentence states that it is still set to Long Text, which is the issue.

Finally, I took a look into my D7 install to see if this has been changed.

Under table taxonomy_term_data and taxonomy_vocabulary in Structure, description is still set to longtext?

But no stress.

I did test it.
It seems that only certain modules will exploit this also.

I think 8 core should definitely ditch any long text, and if possible cut it out of 7 core.

neRok’s picture

Status: Postponed (maintainer needs more info) » Active

Im just going over postponed core bugs.

Based on #39, can this one be closed?

I checked my 7.22 db, and table 'taxonomy_data' > column 'description' is longtext.

Ganginator’s picture

I think the longtext is the issue.
As long as 8 doesn't use longtext for these, then this shouldn't be an issue going forward.
I assume 7 will not receive an update to this, so my opinion is that it should be closed for 7, but should be verified, and changed for 8 if it hasn't been.

benjy’s picture

Title: Temporary tables in taxonomy module queries » Review longtext usage in core DB schema for Taxonomy term descriptions due to slow query response
Version: 7.x-dev » 8.x-dev

D8 is still using longtext however as mentioned, if the description is only loaded in entity_load() we shouldn't have any performance issues.

Ganginator’s picture

Why can't it be changed to VARCHAR 255?
Am I missing some important need of LONGTEXT?
It should be a simple modification that will prevent any possible issues.
And really, who needs a description longer than 255 characters?
IMHO it should be changed.

danblack’s picture

As a db guy relatively new to Drupal let me clarify a few things that have partially been asserted but not really believed here.

Yes, it is going to temp tables on disk because TEXT and BLOB types aren't supported by the Memory Engine which used as a temporary table in memory for joins and order by statements.

Yes, the attached patch is good because it removes unneeded data from the intermediate results. Keeping this up as a practice is a good thing. If all TEXT and BLOB tables are removed because of this then the query won't always go to temporary tables on disk. (large result sets may end up on disk but that's a different problem).

Yes, changing to a varchar will solve the problem but removing unneeded fields from the query is the first step. Also VARCHARS can contain more than 255 characters - 65535 actually (as of MySQL 5.0.3 and MySQL 5.0.15 is specified in D7/D8 as the minimum despite its old age). Don't create VARCHAR fields grossly larger than needed as the Memory Engine, the one used for non-disk based sorts/groups by/joins etc, will convert these to CHAR fields of the maximum varchar size and so will use more memory and hence possibly push the result to disk depending on data size.

Based off 8.x in core/modules/taxonomy/taxonomy.install I'm assuming this is now varchar(255)

 $schema['taxonomy_term_data'] = array(....
      'description__value' => array(
        'type' => 'text',
        'not null' => FALSE,
        'size' => 'big',
        'description' => 'A description of the term.',
      ),
      'description__format' => array(
        'type' => 'varchar',
        'length' => 255,
        'not null' => FALSE,
        'description' => 'The filter format ID of the description.',
      ),

If this is so, apart from applying the attached patch to D6, I don't think any more needs to be done here, but while I'm at it let me also dispel a few myths/misconceptions:

converting term_node to InnoDB (I think).

- Won't help as the join/sort used a temp table independent of what the original table type was.

I think 8 core should definitely ditch any long text,

is >64K text needed anywhere? if not its a fair call.

perhaps it would be possible to simply sort the results by weight in PHP

I'm not convinces moving an entire unsorted dataset from a DB to php to sort will give you a better results. Databases are designed and implemented to do joins and sort so I'd live that functionality there. What happens when a PHP gets too much data? oh, it exceeds its limit and gets killed and no result happens.

Berdir’s picture

#569434: Remove taxonomy term description field; provide description field for forum taxonomy would remove description as a base field, then everyone is free to use whatever configurable fields and lengths that he needs.

andypost’s picture