This morning it was brought to our attention, forcefully so, that the cron runs that clear out the watchdog and history tables take quite a while to complete. After tossing around several ideas, I eventually looked at the tables and noticed that while they are pruned with a DELETE FROM ... WHERE timestamp < X, there is no index on timestamp, so the query does a full table scan. Below are some numbers with and without a timestamp index; I would request that there be one by default.

Without index:
mysql> delete from test1 where timestamp < 1185832802;
Query OK, 33896 rows affected (2 min 56.80 sec)

mysql> explain select * from test1 where timestamp < 1185832802;
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | test1 | ALL  | NULL          | NULL | NULL    | NULL | 1137213 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

With Index:

mysql> delete from test where timestamp < 1185832802;
Query OK, 33896 rows affected (20.13 sec)

mysql> explain select * from test where timestamp < 1185832802;
+----+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
|  1 | SIMPLE      | test  | range | timestamp     | timestamp | 4       | NULL |    1 | Using where |
+----+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.02 sec)
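
For reference, the index used in the "with index" test would have been created with something like the statement below. This is a sketch: the index name is inferred from the EXPLAIN output above, not taken from the original session.

-- Index and table names inferred from the EXPLAIN output (assumption):
CREATE INDEX timestamp ON test (timestamp);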

Insert benchmarks without index: (1032 watchdog inserts)

Run1: .797 seconds
Run2: .759 seconds
Run3: .612 seconds

With index: (1032 watchdog inserts)

Run1: 1.07 seconds
Run2: 1.32 seconds
Run3: 1.7 seconds

A performance hit on inserts, but a worthwhile trade considering the gains on the delete side.


Comments

damien_vancouver’s picture

Category: feature » bug
Status: Active » Needs review
FileSize
2.92 KB

I just ran into this too... deadlocking due to extra logging I added filling the watchdog table. The queries to DELETE FROM watchdog WHERE timestamp < XXXXXX were timing out due to the extra rows in watchdog, then causing MySQL deadlocks, which then slowed down the site and caused monitoring errors... you get the idea. My mistake was definitely putting in way too much watchdog logging; my DB jumped in size from 3 gigs to over 20 (!).

SO - under normal circumstances this should not be a major problem. However, it is definitely still a problem in my opinion. Often the time you need this efficiency the most is when something has gone wrong and you have lots of rows. I think it should be fixed; Drupal should never be doing full table scans of a table, no matter what.

I've attached a patch for DRUPAL-5 (works fine against drupal 5.2) so indexes will be added on the following:

  • {flood}.timestamp
  • {history}.timestamp
  • {watchdog}.timestamp

The patch also adds a new update to system.module, #1023, which creates the same indexes. If you are installing this patch on an existing Drupal site, you will have to visit update.php and run the available updates (system update #1023 will create your indexes).
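
In SQL terms the update amounts to something like the statements below. This is a sketch only; the actual patch issues these through Drupal's update machinery rather than raw DDL.

-- Sketch only: the real patch goes through Drupal's update system,
-- and table names carry the site's prefix ({flood}, etc.).
ALTER TABLE flood ADD INDEX timestamp (timestamp);
ALTER TABLE history ADD INDEX timestamp (timestamp);
ALTER TABLE watchdog ADD INDEX timestamp (timestamp);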

As for Drupal 6, HEAD seems not to have this problem for watchdog or history - but it does still apply to {flood}, which still deletes based on timestamp. In theory {flood} should not accumulate enough rows for it to really matter, but it should probably be fixed anyway. If this patch makes any progress I can create one for HEAD with just the {flood} changes.

drumm’s picture

Title: New indexes on watchdog.timestamp and history.timestamp for cron runs. » New indexes on flood.timestamp and history.timestamp for cron runs.
Version: 5.x-dev » 7.x-dev
Status: Needs review » Needs work

HEAD does have a query on {history}.timestamp, so I think it might be affected.

I am a bit cautious about applying this to 5.x. {watchdog} has many more inserts than deletes, and indexes slow down inserts.

Crell’s picture

Indexes do add time on insert, but it's very spread out compared to the large whammy of a big select, update, or delete. I'm inclined to eat the time on insert and just add the indexes.

nnewton’s picture

They do slow down inserts. I am actually mildly mixed on this, but only mildly. Most people don't need this index at all, however, anyone using a replication setup that can't fall too far behind will require it. It is not an option for them. So, maybe this should just be rolled into the slave patch for D5.

Crell’s picture

Issue tags: +Performance

Narayan, is this Performance-necessary for D7? I say we either move forward and just do it (since we DO support slaves in D7 natively) or mark it won't fix and let the super-high-end sites figure it out themselves. Your call.

nnewton’s picture

For watchdog, yes. I'll have to test again with history; I'll do that.

catch’s picture

For watchdog we have syslog.module now - so I'd rather have the index there, if that's still valid, and avoid potential crashing, than have watchdog optimized to be super-fast on inserts in general - since if you need that, you can use another logging module anyway.

Crell’s picture

What's the verdict, oh people who run bigger servers than I do? :-)

killes@www.drop.org’s picture

I think we should have this in standard Drupal. Even if it slows down sites a bit: if you have a small site you won't have that many inserts.

nnewton’s picture

I'd say this is worth it. If someone is that worried about a single index on the watchdog table, they probably shouldn't be using the db backend for watchdog.

-N

Crell’s picture

Component: mysql database » dblog.module

OK, let's do it then. Refiling over to the right component, since this isn't really a MySQL driver question.

irandream’s picture

Assigned: Unassigned » irandream


DanPir’s picture

subscribe

sun’s picture

basic’s picture

Version: 7.x-dev » 8.x-dev
Status: Needs work » Needs review
FileSize
912 bytes

Adding timestamp index to dblog and node modules for watchdog and history tables.

The flood table now has an index including timestamp (added to D7 in #485974: Improved security: rate limit login attempts.).

Crell’s picture

Status: Needs review » Needs work

Looks like the line endings in #16 are AFU.

Also, update functions needed.

basic’s picture

Status: Needs work » Needs review

Are updates required for D8?

Not sure what's going on with the line endings... oh odd, 'git diff > patch' kept the color codes. How nice.

Crell’s picture

Unless I missed a memo we still need to have an upgrade path from Drupal 7 sites, so yes.

catch’s picture

Issue tags: +Needs backport to D7

I think we should backport this to 7.x and add the update functions there tbh.

basic’s picture

Agreed. Here's the D8 patch without color codes.

basic’s picture

Version: 8.x-dev » 7.x-dev
FileSize
1.27 KB

Here's an attempt at the D7 backport with updates.

Damien Tournoud’s picture

Assigned: irandream » Unassigned

Those two updates need to be inside an updates-7.x-extra group. For example:

/**
 * @defgroup updates-7.x-extra Extra system updates for 7.x
 * @{
 */

/**
 * @} End of "defgroup updates-7.x-extra"
 */
basic’s picture

Something like this? (edit: I'm not sure why git is mucking with my whitespace)

mgifford’s picture

Status: Needs review » Needs work
Issue tags: +Performance, +Needs backport to D7

The last submitted patch, timestamp_indexes-165061_D7-24.patch, failed testing.

candelas’s picture

any news on this?

danblack’s picture

fixed - watchdog / flood

Watchdog table deletions have been improved without adding an index. The flood table in the system module now has an expiration column that is indexed; it could have been implemented similarly to watchdog, but I can't see anyone complaining about the performance of this table, and @damien_vancouver in comment #1 seemed happy with indexes.

history table

History-based deletions in modules/node/node.module are still DELETE FROM history WHERE timestamp < X, and there is still no timestamp index. The trick used for the watchdog table won't work here, because history is updated in place rather than only inserted into.

First thing to check is whether this table is InnoDB. If it is MyISAM, every insert, update, and delete takes a table lock, so you won't get much throughput.

Options for improvement here (aside from adding a timestamp index) are:

Second, if the delete runs at the default transaction isolation level of REPEATABLE READ, it may take more locks than required. In that case (not in general), a lower READ COMMITTED / READ UNCOMMITTED level will restrict locks to the rows the delete intends to remove (a transaction here is only ever going to update the timestamp and never roll back, so uncommitted reads are probably fine). An isolation mechanism would need to be written behind a DB abstraction, since db_query("SET tx_isolation = 'read-uncommitted'") is MySQL-specific (PostgreSQL needs its own implementation).
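
In raw MySQL terms that might look something like the lines below (a sketch only; the tx_isolation variable is MySQL-specific, and X is the cutoff placeholder used throughout this comment):

-- Assumed MySQL syntax; PostgreSQL would need its own variant.
SET SESSION tx_isolation = 'READ-UNCOMMITTED';
DELETE FROM history WHERE timestamp < X;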

Even with a limited number of row locks, the problem with one big delete is that every update of a row in the middle of being deleted will block. DELETE supports DELETE FROM history WHERE timestamp < X LIMIT Y on MySQL and SQLite (if compiled with it), but not on PostgreSQL; that would have been a nice way to delete in batches, giving other connections the opportunity to perform updates between loop iterations (perhaps the availability of DELETE ... LIMIT could be exposed from the driver, so places like this can use it where it exists?).
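
A batched version would be something like the line below, re-run until it affects zero rows (MySQL-specific, and the batch size of 1000 is an arbitrary assumption):

-- MySQL-specific; batch size of 1000 is arbitrary. Re-run until 0 rows affected.
DELETE FROM history WHERE timestamp < X LIMIT 1000;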

To perhaps reduce some locking during the period it takes to determine all rows where timestamp < X:

CREATE TEMPORARY TABLE history_delete (`uid` int(11) NOT NULL, `nid` int(11) NOT NULL, PRIMARY KEY (`uid`, `nid`));
INSERT INTO history_delete SELECT uid, nid FROM history WHERE timestamp < X;

and then

DELETE FROM history USING history, history_delete WHERE history.uid = history_delete.uid AND history.nid = history_delete.nid;

This delete would reduce the locking to primary-key lookups, and would confine the deadlock exposure to the time the INSERT statement above takes.

My only other suggestion is to set a lower session lock timeout (also in need of abstraction between MySQL and PostgreSQL) in node_tag_new, and to catch and ignore any deadlock/lock-timeout errors, losing a bit of data.
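
On MySQL that could be as simple as the statement below (InnoDB only; the variable and the 5-second value are assumptions on my part, not something tested here):

-- Assumed variable; InnoDB only. The default wait is much higher.
SET SESSION innodb_lock_wait_timeout = 5;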

dagmar’s picture

Component: dblog.module » history.module