I had some ugly high page loads lately and after some profiling with XHProf I found that multiple queries were not using indexes. This is bad because I have tables with more than 1 mil entries. One of this queries is in the function shurly_get_latest_short:
SELECT source FROM {shurly} WHERE destination = :long AND uid = :uid AND custom = 0 AND active = 1 ORDER BY rid DESC
This triggers a full table scan which is bad. I manage to optimize it a lot (10x) by just adding a combined index:
ALTER TABLE `shurly` ADD INDEX ( `uid` , `destination` , `custom` , `active` ) ;
From ~900ms execution time to much less than 50ms (I don't even know the new exec time because it is to deep in the exec tree of XHProf to search .... it used to be in the top 20).
I have more than 500k entries in that table.
Comments
Comment #1
Alex Savin commentedAlso I found another problem (at least I think it's a problem). Continuing my research on optimizing DB queries I found out that in shurly_get_redirect function you have this query
SELECT * FROM {shurly} WHERE BINARY source = :short AND active = 1. Using the BINARY switch here means that this will never use indexes. I did some tests and it seams that the same results (having case sensitive search) could be achieved by not using the BINARY switch in the query but use the utf8_bin collation and , in addition, adding a compound index on (source, active) and another one on source only (as the active filter is only used if the function is called with $check_active = TRUE). As I said before, I have more than 500k shortened URLs in my shurly table and this query ads quite a punch to my database as this is used for each new shortened URL.Can one of the maintainers please explain if there was a reason to why not using the utf8_bin collation instead of the switch to binary on every query call? Don't take me wrong but I don't know if I fully understand it either and I'm asking because I don't know how this collation change would affect the short URL creation on the long run.
In my short tests I'm not getting 2 sources evaluated as the same after changing the collation (I tested it before and I did get sources evaluated as equal even if cases differed).
Comment #2
jibus commentedHello!
Could you be more specific about the utf8_bin collation ?
I agree with the fact that using BINARY in a sql query is not a good solution since some SGBD don't support it (PostgreSQL for example).
Does this patch from https://drupal.org/node/1768672 solve this issue ?
Comment #3
jibus commentedThe last dev version include most of the optimisation you need