Shurly table optimisation [#1961232]

I had some ugly high page loads lately and after some profiling with XHProf I found that multiple queries were not using indexes. This is bad because I have tables with more than 1 mil entries. One of this queries is in the function shurly_get_latest_short:

SELECT source FROM {shurly} WHERE destination = :long AND uid = :uid AND custom = 0 AND active = 1 ORDER BY rid DESC

This triggers a full table scan which is bad. I manage to optimize it a lot (10x) by just adding a combined index:

ALTER TABLE `shurly` ADD INDEX ( `uid` , `destination` , `custom` , `active` ) ;

From ~900ms execution time to much less than 50ms (I don't even know the new exec time because it is to deep in the exec tree of XHProf to search .... it used to be in the top 20).

I have more than 500k entries in that table.

Comments

Comment #1

Alex Savin commented 5 April 2013 at 10:19

Also I found another problem (at least I think it's a problem). Continuing my research on optimizing DB queries I found out that in shurly_get_redirect function you have this query SELECT * FROM {shurly} WHERE BINARY source = :short AND active = 1. Using the BINARY switch here means that this will never use indexes. I did some tests and it seams that the same results (having case sensitive search) could be achieved by not using the BINARY switch in the query but use the utf8_bin collation and , in addition, adding a compound index on (source, active) and another one on source only (as the active filter is only used if the function is called with $check_active = TRUE). As I said before, I have more than 500k shortened URLs in my shurly table and this query ads quite a punch to my database as this is used for each new shortened URL.

Can one of the maintainers please explain if there was a reason to why not using the utf8_bin collation instead of the switch to binary on every query call? Don't take me wrong but I don't know if I fully understand it either and I'm asking because I don't know how this collation change would affect the short URL creation on the long run.

In my short tests I'm not getting 2 sources evaluated as the same after changing the collation (I tested it before and I did get sources evaluated as equal even if cases differed).

Comment #2

jibus commented 20 March 2014 at 23:21

Issue summary:

View changes

Hello!

Could you be more specific about the utf8_bin collation ?

I agree with the fact that using BINARY in a sql query is not a good solution since some SGBD don't support it (PostgreSQL for example).

Does this patch from https://drupal.org/node/1768672 solve this issue ?

Comment #3

jibus commented 30 May 2014 at 13:23

Status:

Active

» Closed (fixed)

The last dev version include most of the optimisation you need

Shurly table optimisation

Comments

Comment #1

Comment #2

Comment #3

News items

Our community

Documentation

Drupal code base

Governance of community