Taxonomy module doesn't use it's own APIs
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | taxonomy.module |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
| Issue tags: | Performance |
There's a lot of functions in taxonomy module which return an array of term objects - but which are loaded direct from the database instead of via taxonomy_term_load(). Here's a run on taxonomy_get_term_by_name() to start clearing that up now that the multiple load patch is in.
I had to make a couple of changes to taxonomy_term_load_multiple(), because get_term_by_name is case insensitive. So $conditions['name'] does a strcasecmp() in PHP, and a LIKE() in SQL (this is case insensitive in MySQL and ought to be mapped to ILIKE in postgresql by dbtng - need to check that). Changed from LOWER() to LIKE() since LOWER()'s not indexed or query cacheable at all, afaik LIKE is a bit better on that front. I can't think of a situation where you wouldn't want case insensitive matching, so seems OK to treat it in this way.
Comes with tests.
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| get_term_by_name.patch | 5.73 KB | Idle | Failed: Failed to install HEAD. | View details | Re-test |

#1
#2
Re-rolled to use drupal_strtolower() instead of strcasecmp() so things are multi-byte safe.
#4
Added taxonomy_term_select() too. These are the two places where we can convert fairly easily, some of the rest of taxonomy.module will need refactoring before it can be done there.
#5
Re-rolled for taxonomy table name changes.
#6
The last submitted patch failed testing.
#7
retesting.
#8
The last submitted patch failed testing.
#9
better re-roll this time.
#10
The last submitted patch failed testing.
#11
Re-titling, since this is a general cleanup of stuff we can simplify now multiple load is in, and re-rolled now the test has moved.
#12
why "// When name is passed as a condition use LIKE.". seems like new behavior that slows down all name queries a little. just wondering.
#13
The name queries currently use LOWER('name') in taxonomy_get_term_by_name() which isn't indexable at all - chx and Crell have promised that they'll fix the mappings in the postgres driver so we can have case insensitive LIKE behaviour and avoid the LOWER() - this may have already been done but I'm not sure either way. So the current version of the patch is partially dependent on that.
It's true that taxonomy_term_load_multiple() allows for a case sensitive name query, but we've not had that in previous versions of Drupal, so it wouldn't be an API change as such - or at least only within D7 itself. I think it's more important to have proper loading (and a shared static to prevent duplicate queries) than it is to support a case sensitive search.
#14
The last submitted patch failed testing.
#15
re-rolled.
#16
The last submitted patch failed testing.
#17
Currently failing due to #333054: Make all varchar columns case sensitive (regression) (was make page cache case sensitive)
#18
That patch has been reverted, so back to needs review.
#19
The last submitted patch failed testing.
#20
re-rolled after dbtng conversion.
#21
Trying a different title to see if I can get more eyes on the patch ;)
#22
The last submitted patch failed testing.
#23
re-rolled.
#24
Looks great by why
taxonomy_get_term_data()was removed?So only taxonomy_term_load_multiple() will be used to get term object?
#25
Yeah when taxonomy_term_load() went in (with load hook, before multiple), there were several places in taxonomy module where potentially dozens of terms were getting loaded only to get their titles and similar. _multiple() now covers those situations more efficiently, and _get_term_data() doesn't fire hooks - so you never get a proper term object if you use that. Additionally, if you have _load() and get_term_data() on the same page it causes duplicate database queries. The function is basically the same as taxonomy_get_term() in D6 but it took a couple of patches to get to a point where we could kill it.
#26
So suppose RTBC
I'm checking taxonomy_get_term_data() only used once and just eats memory with static
#27
Would it make sense to rename taxonomy_get_term_by_name?
We have http://api.drupal.org/api/function/user_load_by_name/7, so taxonomy_term_load_by_name() would be similiar...
Patch looks good, the broken notNull assertion is a good catch :)
#28
Patch looks good. Though I wonder if you need to explain the following comment further:
+ // Name matching is case insensitive, note that with some collations+ // LOWER() and drupal_strtolower() may return different results.
I'd be happy to set to RTBC assuming all tests pass.
#29
#30
This looks good to me, will commit later today. Some of these tests could probably become UnitTests instead of WebTests but we can leave that to a follow-up patch as it probably requires some additional refactoring of the tests.
#31
A few days late, but committed this to CVS HEAD. Thanks!
#32
Automatically closed -- issue fixed for 2 weeks with no activity.