better performance with indexes
dalin - June 1, 2009 - 12:36
| Project: | Troll |
| Version: | 5.x-2.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
Description
Better indexes, and queries to match. This is against 5.2, but should apply to 6 too. The update function requires some different SQL for PostgreSQL and I can only guess that I'm doing it right based on
http://api.drupal.org/api/function/hook_update_N/5
| Attachment | Size |
|---|---|
| troll_performance.diff | 4 KB |

#1
I'm not familiar with the SELECT 1 you used. You have an article on its advantages somewhere?
#2
SELECT 1 FROM {table} limit 1
is the fastest cross-db compatible way to determine if there is anything in the table. Basically it goes to fetch the first row from the table, but you aren't fetching any fields, just the integer "1". If there are any rows in the table, 1 is returned, if there is nothing, then db_result() will return FALSE.
#3
I'm still reading about SELECT 1. In the meantime, please use db_query_range for the LIMIT.
#4
Ah, that was my internal hack getting in for MySQL performance. Here's a version with db_query_range()
A variation on SELECT 1 is used in Drupal > 5 in drupal_is_denied().
It will also be in Drupal 7 in drupal_lookup_path().
#5
Looks like postgresql doesn't support IGNORE on inserts. The alternative is doing DELETE/INSERT.
You forgot to remove the LIMIT 1 from the range queries, and the 0, 1 should be the last, not the first arguments to the range function.
I'm not so sure the first, plain
SELECT 1 FROM {troll_blacklist}provides the shortcut it is apparently intended for. If the table is empty, the WHERE clause on the following query won't take any measurable time anyway. If the table has data, it's going to do the extended query anyway. If anything, I would think it slows down the execution on the PHP side for Drupal to go through all its processes of adding the range, searching for variables to scrub, etc.#6
I committed a change for troll_is_blacklisted(), so fix the indexing stuff, and it'll be closed.
#7
A first issue:
- db_query("INSERT INTO {troll_ip_track} (uid, ip_address, created, accessed) VALUES (%d, '%s', %d, %d)", $user->uid, troll_ip_address(), time(), time());+ db_query("INSERT IGNORE INTO {troll_ip_track} (uid, ip_address, created, accessed) VALUES (%d, '%s', %d, %d)", $user->uid, troll_ip_address(), time(), time());
INSERT IGNORE INTO does comply with SQL99 and therefore will fail for PostgreSQL.
Please refer to the guides:
Guidelines for writing MySQL and PostgreSQL compliant SQL
http://drupal.org/node/555514
In which you can read:
MySQLism: avoid using MySQL IGNORE syntax
http://drupal.org/node/561978
A second issue: indexes
This is an index on dual fields, which can lead to performance drops.
+ PRIMARY KEY (uid, ip_address)For performance, read:
Indexes are not symmetric
http://drupal.org/node/555558
Setting an index on dual fiels is okay for your queries.
- $blacklisted = db_result(db_query('SELECT COUNT(b.net) FROM {troll_blacklist} b WHERE b.net <= %d AND b.bcast >= %d', $longip, $longip));- $whitelisted = db_result(db_query('SELECT COUNT(w.net) FROM {troll_whitelist} w WHERE w.net <= %d AND w.bcast >= %d', $longip, $longip));
If you add some more functionality, you may need to add single indexes again, especially if you need to query ip_address idependently.
So, as fas as I a concerned, I never use indexes on dual fields.
I prefer a primary key + an single index.
So that other developers running JOIN queries on my table don't have to worry.
Run explain on your queries and verify that they do not lead to sequential scan:
Guidelines for writing efficient SQL code:
http://drupal.org/node/559302
Bye,
Jean-Michel