slow query in email_registration_user_login_validate
| Project: | Email Registration |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
The query which does a lower() on the two columns can take a _long time_ if you have a big users table. The use of a function means it is unable to use an index (well, at least since there is no index on the column that uses the function as well).
So...we should see what we can do to make this faster.
1. Perhaps we can check if we are on mysql and do a query without the lower in that case.
2. Perhaps we could make it a configurable option (without any admin UI) that defaults to using the "lower" query but can be changed (i.e. via settings.php) to use the non-lower query. This would behave the same for small sites and then sites that really need performance would be able to set the variable.
3. I think we could add an index to the user table, but this would slow down inserts/updates.

#1
More specifically the query is:
<?phpif ($name = db_result(db_query("SELECT name FROM {users} WHERE LOWER(mail) = LOWER('%s')", $form_state['values']['name']))) {
?>
#2
So, if we just get rid of the LOWER() functions this will 1) still work on mysql 2) get to use the index.
Here is the explain statement:
mysql> explain SELECT name FROM users where mail = 'greg@example.com';
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
| 1 | SIMPLE | users | ref | mail | mail | 195 | const | 1 | Using where |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.13 sec)
mysql> explain SELECT name FROM users where lower(mail) = lower('greg@example.com');
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 3527497 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
Note how the first query without the lower gets a possible_key of mail and only has to scan 1 row. The second query has to scan all 3 million+ records :(
The attached patch creates a variable that has no user interface, but it can be used based on the documentation in the new README.txt
#3
I appreciate the desire to fix the performance here. However, a few problems:
A) I'd rather we tested the DB type and if we're on mysql default to the right query automatically, instead of documenting how to solve this for the 99% of users that are on a mysql DB already.
B) This seems awkward:
+ if (variable_get('email_registration_alternate_query', FALSE)) {+ $query = variable_get('email_registration_alternate_query', "SELECT name FROM {users} WHERE lower(mail) = lower('%s')");
+ }
+ else {
+ $query = "SELECT name FROM {users} WHERE lower(mail) = lower('%s')";
+ }
What's wrong with just:
$query = variable_get('email_registration_alternate_query', email_registration_default_query());? email_registration_default_query() would have the logic to default to the right thing if we're on mysql...