slow query in email_registration_user_login_validate

greggles - August 17, 2009 - 18:50
Project:Email Registration
Version:6.x-1.x-dev
Component:Code
Category:task
Priority:normal
Assigned:Unassigned
Status:needs work
Description

The query which does a lower() on the two columns can take a _long time_ if you have a big users table. The use of a function means it is unable to use an index (well, at least since there is no index on the column that uses the function as well).

So...we should see what we can do to make this faster.

1. Perhaps we can check if we are on mysql and do a query without the lower in that case.
2. Perhaps we could make it a configurable option (without any admin UI) that defaults to using the "lower" query but can be changed (i.e. via settings.php) to use the non-lower query. This would behave the same for small sites and then sites that really need performance would be able to set the variable.

3. I think we could add an index to the user table, but this would slow down inserts/updates.

#1

greggles - August 18, 2009 - 17:27

More specifically the query is:

<?php
   
if ($name = db_result(db_query("SELECT name FROM {users} WHERE LOWER(mail) = LOWER('%s')", $form_state['values']['name']))) {
?>

#2

greggles - August 28, 2009 - 14:47
Status:active» needs review

So, if we just get rid of the LOWER() functions this will 1) still work on mysql 2) get to use the index.

Here is the explain statement:

mysql> explain SELECT name FROM users where mail = 'greg@example.com';
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
|  1 | SIMPLE      | users | ref  | mail          | mail | 195     | const |    1 | Using where |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.13 sec)

mysql> explain SELECT name FROM users where lower(mail) = lower('greg@example.com');
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 3527497 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

Note how the first query without the lower gets a possible_key of mail and only has to scan 1 row. The second query has to scan all 3 million+ records :(

The attached patch creates a variable that has no user interface, but it can be used based on the documentation in the new README.txt

AttachmentSize
551626_slow_query_override.patch 1.63 KB
README.txt 987 bytes

#3

dww - November 6, 2009 - 22:43
Status:needs review» needs work

I appreciate the desire to fix the performance here. However, a few problems:

A) I'd rather we tested the DB type and if we're on mysql default to the right query automatically, instead of documenting how to solve this for the 99% of users that are on a mysql DB already.

B) This seems awkward:

+    if (variable_get('email_registration_alternate_query', FALSE)) {
+      $query = variable_get('email_registration_alternate_query', "SELECT name FROM {users} WHERE lower(mail) = lower('%s')");
+    }
+    else {
+      $query = "SELECT name FROM {users} WHERE lower(mail) = lower('%s')";
+    }

What's wrong with just:

    $query = variable_get('email_registration_alternate_query', email_registration_default_query());

? email_registration_default_query() would have the logic to default to the right thing if we're on mysql...

 
 

Drupal is a registered trademark of Dries Buytaert.