This morning it was brought to our attention, forcefully so, that the cron runs that clear out the watchdog and history tables take quite a while to complete. After tossing around several ideas, I eventually actually looked at the tables and noticed that while they are pruned with a DELETE FROM ... WHERE timestamp < X, there is no index on timestamp, so the delete does a full table scan. Below are some numbers with and without the timestamp index; I would request that there be one by default.
Without index:
mysql> delete from test1 where timestamp < 1185832802;
Query OK, 33896 rows affected (2 min 56.80 sec)
mysql> explain select * from test1 where timestamp < 1185832802;
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | test1 | ALL | NULL | NULL | NULL | NULL | 1137213 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
With index:
mysql> delete from test where timestamp < 1185832802;
Query OK, 33896 rows affected (20.13 sec)
mysql> explain select * from test where timestamp < 1185832802;
+----+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
| 1 | SIMPLE | test | range | timestamp | timestamp | 4 | NULL | 1 | Using where |
+----+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.02 sec)
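For reference, the index used in the "With index" run can be created with a statement along these lines (table and column names match the test tables above; the index name "timestamp" is taken from the EXPLAIN output, and the column is assumed to be the usual Drupal integer timestamp):

```sql
-- Add an index on the timestamp column so the pruning delete can use
-- a range scan instead of a full table scan.
ALTER TABLE test ADD INDEX timestamp (timestamp);
```

The same statement, pointed at watchdog and history, is what the request above amounts to.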
Insert benchmarks without index: (1032 watchdog inserts)
Run1: .797 seconds
Run2: .759 seconds
Run3: .612 seconds
With index: (1032 watchdog inserts)
Run1: 1.07 seconds
Run2: 1.32 seconds
Run3: 1.7 seconds
A performance hit, but an acceptable trade-off considering the gains on the delete side.
Comments
Comment #1
damien_vancouver CreditAttribution: damien_vancouver commented
I just ran into this too... deadlocking due to extra logging I added filling the watchdog tables, then the queries to
DELETE FROM watchdog WHERE timestamp < XXXXXX
were timing out due to the extra rows in watchdog, causing MySQL deadlocks, which then slowed down the site and caused monitoring errors... you get the idea. My mistake was definitely putting in way too much watchdog logging; my DB jumped in size from 3 gigs to over 20 (!). So, under normal circumstances this should not be a major problem. However, it is definitely still a problem in my opinion. Often the time you need this efficiency most is when something has gone wrong and you have lots of rows. It should be fixed, I think; Drupal should never be doing full table scans of a table, no matter what.
I've attached a patch for DRUPAL-5 (works fine against drupal 5.2) so indexes will be added on the following:
The patch also adds a new update for system.module, #1023, which creates the same indexes. If you are installing this patch onto an existing Drupal site, you will have to visit the update.php and run available updates (system #1023 will create your indexes).
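In MySQL terms, the indexes that the patch and system update #1023 add amount to something like the following (a sketch only; the actual patch goes through Drupal's schema/update functions rather than raw SQL, and the exact table list is in the attachment — watchdog and history are assumed from the discussion):

```sql
-- Sketch of the effect of system update #1023 on MySQL.
ALTER TABLE watchdog ADD INDEX timestamp (timestamp);
ALTER TABLE history ADD INDEX timestamp (timestamp);
```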
As for Drupal 6, HEAD seems to not have this problem for watchdog or history - but it does still apply for {flood}, which is deleting based on timestamp still. In theory {flood} should not get a huge number of rows so that it really matters, but again it probably should be fixed. If this patch makes any progress I can create one for HEAD with just the {flood} changes.
Comment #2
drumm
HEAD does have a query on {history}.timestamp, so I think it might be affected.
I am a bit cautious about applying this to 5.x. {watchdog} has many more inserts than deletes, and indexes slow down inserts.
Comment #3
Crell CreditAttribution: Crell commented
Indexes do add time on insert, but it's very spread out compared to the large whammy of a big select, update, or delete. I'm inclined to eat the time on insert and just add the indexes.
Comment #4
nnewton CreditAttribution: nnewton commented
They do slow down inserts. I am actually mildly mixed on this, but only mildly. Most people don't need this index at all; however, anyone using a replication setup that can't fall too far behind will require it. It is not an option for them. So, maybe this should just be rolled into the slave patch for D5.
Comment #6
Crell CreditAttribution: Crell commented
Narayan, is this performance-necessary for D7? I say we either move forward and just do it (since we DO support slaves in D7 natively) or mark it won't-fix and let the super-high-end sites figure it out themselves. Your call.
Comment #7
nnewton CreditAttribution: nnewton commented
For watchdog, yes. I'll have to test again with history. I'll do that.
Comment #8
catch
For watchdog we have syslog.module now - so I'd rather have the index there, if that's still valid, and avoid potential crashing, than have it optimized to be super-fast on inserts in general - since if you need that you can use another logging module anyway.
Comment #9
Crell CreditAttribution: Crell commented
What's the verdict, oh people who run bigger servers than I do? :-)
Comment #10
killes@www.drop.org CreditAttribution: killes@www.drop.org commented
I think we should have this in standard Drupal, even if it slows down sites a bit: if you have a small site you won't have that many inserts.
Comment #11
nnewton CreditAttribution: nnewton commented
I'd say this is worth it. If someone is that worried about a single index on the watchdog table, they probably shouldn't be using the db backend for watchdog.
-N
Comment #12
Crell CreditAttribution: Crell commented
OK, let's do it then. Refiling over to the right component, since this isn't really a MySQL driver question.
Comment #13
irandream CreditAttribution: irandream commented
"anyone using a replication setup that can't fall too far behind will require it. It is not an option for them."
Web site design
Comment #14
DanPir CreditAttribution: DanPir commented
subscribe
Comment #15
sun
subscribing
Comment #16
basic CreditAttribution: basic commented
Adding a timestamp index to the dblog and node modules for the watchdog and history tables.
The flood table now has an index including timestamp (added to D7 in #485974: Improved security: rate limit login attempts.).
Comment #17
Crell CreditAttribution: Crell commented
Looks like the line endings in #16 are AFU.
Also, update functions needed.
Comment #18
basic CreditAttribution: basic commented
Are updates required for D8?
Not sure what's going on with the line endings... oh odd, 'git diff > patch' kept the color codes. How nice.
Comment #19
Crell CreditAttribution: Crell commented
Unless I missed a memo, we still need to have an upgrade path from Drupal 7 sites, so yes.
Comment #20
catch
I think we should backport this to 7.x and add the update functions there tbh.
Comment #21
basic CreditAttribution: basic commented
Agreed. Here's the D8 patch without color codes.
Comment #22
basic CreditAttribution: basic commented
Here's an attempt at the D7 backport with updates.
Comment #23
Damien Tournoud CreditAttribution: Damien Tournoud commented
Those two updates need to be inside an
updates-7.x-extra
group. For example:
Comment #24
basic CreditAttribution: basic commented
Something like this? (edit: I'm not sure why git is mucking with my whitespace)
Comment #25
mgifford
#24: timestamp_indexes-165061_D7-24.patch queued for re-testing.
Comment #27
candelas CreditAttribution: candelas commented
Any news on this?
Comment #28
danblack CreditAttribution: danblack commented
Fixed - watchdog / flood
Watchdog table deletions have been improved without adding an index. The flood table in the system module now has an expiration column that is indexed; it could have been implemented similarly to watchdog, but I can't see anyone complaining about the performance of this table, and @damien_vancouver in comment #1 seems happy with indexes.
history table
History-based deletions in modules/node/node.module are still DELETE FROM history WHERE timestamp < X, and there is still no timestamp index. The trick used for the watchdog table won't work here, as history is a table that is updated, not just inserted into.
The first thing to check is whether this table is InnoDB. If it is MyISAM, every insert, update, and delete takes a table lock, so you won't get much throughput.
Options for improvement here (aside from adding a timestamp index) are:
Second, if the delete is running at the default transaction isolation level of REPEATABLE READ, it may be taking more locks than required. In this case (though not generally), a lower READ COMMITTED / READ UNCOMMITTED level (a transaction is only ever going to update the timestamp and not roll back, so uncommitted reads are probably safe) will restrict locks to the rows it intends to delete.
An isolation mechanism would need to be written in the db abstraction layer, as this is MySQL-specific: db_query("SET SESSION tx_isolation = 'READ-UNCOMMITTED'") (postgres has its own implementation).
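As a sketch of the MySQL-specific part (the variable name tx_isolation follows the comment above; on newer MySQL versions the variable is called transaction_isolation, and the cutoff timestamp is the one from the benchmarks earlier):

```sql
-- MySQL-specific: relax the isolation level for this session only,
-- so the pruning delete holds locks on just the rows it will remove.
SET SESSION tx_isolation = 'READ-UNCOMMITTED';
DELETE FROM history WHERE timestamp < 1185832802;
```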
Even with a limited number of row locks, the problem with a big delete is that every update of a row in the middle of being deleted will block. DELETE FROM supports DELETE FROM history WHERE timestamp < X LIMIT Y on MySQL and SQLite (if compiled with it), but not PostgreSQL; that would have been a nice way to delete in batches, giving other instances the opportunity to perform updates between loop iterations (perhaps the availability of DELETE ... LIMIT can be exposed from the driver, and places like this can use it?).
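Where DELETE ... LIMIT is available (MySQL, and SQLite when compiled with the delete-limit option), the batched loop would look roughly like this, re-run until no rows are affected (the batch size of 1000 and the cutoff timestamp are arbitrary illustrative values):

```sql
-- Delete expired history rows in batches; other sessions can update
-- rows between iterations. Repeat until ROW_COUNT() reports 0.
DELETE FROM history WHERE timestamp < 1185832802 LIMIT 1000;
```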
To perhaps reduce some locking, another approach is to first determine all rows where timestamp < X, and then delete those rows by primary key. My only other suggestion is to set a lower session lock timeout (also in need of abstraction for mysql/postgres) in node_tag_new(), and to catch and ignore any deadlock/lock-timeout errors, losing a bit of data.
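The two-step variant hinted at above could look roughly like this (a sketch only; uid and nid are assumed to be the primary key columns of {history}, and the temporary table name is made up for illustration):

```sql
-- Step 1: collect the keys of rows to prune, without delete locks.
CREATE TEMPORARY TABLE history_expired AS
  SELECT uid, nid FROM history WHERE timestamp < 1185832802;
-- Step 2: delete by primary key, locking only the targeted rows.
DELETE h FROM history h
  INNER JOIN history_expired e ON h.uid = e.uid AND h.nid = e.nid;
DROP TEMPORARY TABLE history_expired;
```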
Comment #29
dagmar