David Strauss reports that the Who's Online block in the user module was causing Wikimedia's Fundraising CORE grief when handling a load of around 1,200 concurrent users. One of MySQL's developers suggested it was "because it performs a full table scan to satisfy the query. We could add an index to the sessions table, but it would dramatically slow down table updates."

The query is run every time an authenticated user hits the website, which can be very frequent. The attached patch addresses this issue by caching the query result for 60 seconds, greatly reducing the number of times the query is executed on a busy website.
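A minimal sketch of the caching approach described above, using the `cache_get()`/`cache_set()` API as it existed in Drupal core of that era (the function name, variable name, and exact signatures here are assumptions reconstructed from memory, not the actual patch):

```php
<?php
// Sketch only: cache the Who's Online count for 60 seconds instead of
// hitting the sessions table on every authenticated page view.
// cache_get()/cache_set() signatures are assumed from Drupal 5 core.
function user_block_online_count() {
  $cache = cache_get('user_block_online_count');
  if ($cache && $cache->expire > time()) {
    return $cache->data;
  }
  // Fall through to the expensive query at most once per minute.
  $interval = time() - variable_get('user_block_seconds_online', 900);
  $count = db_result(db_query(
    'SELECT COUNT(DISTINCT uid) FROM {sessions} WHERE uid > 0 AND timestamp >= %d',
    $interval));
  cache_set('user_block_online_count', 'cache', $count, time() + 60);
  return $count;
}
?>
```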

Comments

jeremy’s picture

Status: Active » Needs review

Updating status to reflect that a patch is attached and needing review.

RobRoy’s picture

Status: Needs review » Needs work

I feel like the variable names should be more namespace-compliant, like whos_online_* or user_block_* instead of seconds_online_*.

jeremy’s picture

Status: Needs work » Needs review
File attached: 4.18 KB (new)

Variables updated. And while we're at it, we may as well cache the entire block -- there's nothing gained from re-creating the block text each time if the query results haven't changed.

RobRoy’s picture

While we're at it, should we look into a general per-block caching solution for D6 instead? I've heard talk of this somewhere.

david strauss’s picture

Be careful about caching the whole block. The text inside is internationalized.
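If the whole rendered block is cached, one way to address that concern is to key the cache per language. A sketch, assuming Drupal 5's convention of storing the active language code in the global `$locale` (the cache-ID name here is hypothetical):

```php
<?php
// Hypothetical: include the interface language in the cache ID, so a
// cached translation is never served to users of another language.
global $locale;
$cid = 'user_block_whos_online:' . $locale;
?>
```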

FiReaNGeL’s picture

Caching is good, but an optimized query is better. Adding a compound index on (timestamp, uid) to the sequence table yields the following EXPLAIN:

id  select_type  table  type    possible_keys  key        key_len  ref            rows  Extra
1   SIMPLE       s      range   uid,timestamp  timestamp  8        NULL           2     Using where; Using index
1   SIMPLE       u      eq_ref  PRIMARY        PRIMARY    4        michael.s.uid  1

Originally, without the index:

id  select_type  table  type    possible_keys  key      key_len  ref            rows  Extra
1   SIMPLE       s      range   uid,timestamp  uid      4        NULL           7     Using where; Using filesort
1   SIMPLE       u      eq_ref  PRIMARY        PRIMARY  4        michael.s.uid  1

Notice that we lose the 'Using filesort', which is what's causing the problem, and gain 'Using index' for the sort, which is much faster.

We already have an index on timestamp; just add the uid column to it. I have no way to benchmark this, but I believe it should fix the problem. I'm not at home and don't have my tools to make a patch right now (or for the days to come).
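The proposed change could be sketched in SQL roughly as follows (a sketch, assuming the table carries a single-column index literally named `timestamp`; check the actual index name with SHOW INDEX before running this):

```sql
-- Replace the single-column timestamp index with the compound one.
-- The index name 'timestamp' is an assumption; verify it first:
--   SHOW INDEX FROM sessions;
ALTER TABLE sessions DROP INDEX timestamp;
ALTER TABLE sessions ADD INDEX timestamp (timestamp, uid);
```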

FiReaNGeL’s picture

That being said, caching the block for 60 seconds is a good idea anyway, since we don't need very precise info in this block. Combined with an optimized query, this block will be pain-free!

moshe weitzman’s picture

@fireangel - you suggest adding an index to sessions, not sequences, I assume. As mentioned at the top of this issue, we are not inclined to slow down the many inserts into this table.

FiReaNGeL’s picture

Yes, that should read sessions (don't know how I ended up writing sequences).

It's adding a column to an already existing index, not creating a new one from scratch. I'll try to benchmark 100,000 inserts into this table with and without the extra column to see whether it has an impact.

FiReaNGeL’s picture

OK, I benchmarked 100,000 inserts into the sessions table, once with the existing timestamp index and once with the compound (timestamp, uid) index that I propose.

For benchmarking purposes, I deleted the primary key (the session hash), as it would cause collisions with that many inserts (and I didn't want to fiddle with rand() that much). I also emptied the table after each run.

I randomized both uid and timestamp, the two components of the proposed index, to simulate a worst-case scenario (no anonymous users, everyone logged in at different times).

Using:

<?php
// Record the start time. (microtime() returns "msec sec" as a string.)
$time = explode(' ', microtime());
$start = $time[1] + $time[0];

// Change the username/password for your MySQL database.
$link = mysql_connect('localhost', 'username', 'password');
if (!$link) {
  die('Could not connect: ' . mysql_error());
}
mysql_select_db('database');

for ($i = 0; $i < 100000; $i++) {
  // Randomize both components of the proposed (timestamp, uid) index.
  $uid = rand();
  $timestamp = rand();
  $query = "INSERT INTO `sessions` (`uid`, `sid`, `hostname`, `timestamp`, `cache`, `session`) VALUES
  ($uid, 'c684ca0dc88045529fb860f982ee4743', '', $timestamp, 0, 'messages|a:1:{s:6:\"status\";a:14:{i:0;s:57:\"There is new syndicated content from <em>Drupal</em>.\";}}')";
  mysql_query($query) or die('Insert failed: ' . mysql_error());
}

// Record the finish time and report the total.
$time = explode(' ', microtime());
$finish = $time[1] + $time[0];
$totaltime = $finish - $start;

echo "100000 rows inserted in $totaltime seconds";
?>

Results:

Original index: 100000 rows inserted in 14.8961410522 seconds
Compound index: 100000 rows inserted in 14.7611501217 seconds

So the proposed index causes effectively no insert performance loss.

killes@www.drop.org’s picture

So I guess we'd want a patch for a db change. Caching of blocks should, IMO, be done in a more general framework.

david strauss’s picture

I'm assuming the test was done with a MyISAM table. MyISAM tables can build their indexes asynchronously, so INSERTs seem artificially fast because the query returns before the index is truly up to date. I'm curious how it would behave with an InnoDB table, which is what I try to use on my systems.
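One way to rerun the benchmark above under InnoDB would be to check and convert the table's storage engine first (a sketch; the table and database names are the ones used in the benchmark script):

```sql
-- Check which storage engine the table currently uses (Engine column).
SHOW TABLE STATUS LIKE 'sessions';

-- Convert the table to InnoDB, then rerun the insert benchmark.
ALTER TABLE sessions ENGINE=InnoDB;
```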

FiReaNGeL’s picture

Yes, the test was done on a MyISAM table.

catch’s picture

Status: Needs review » Closed (duplicate)

duplicate of: http://drupal.org/node/80951

Block caching is in, but it excludes the Who's Online block due to a lack of cache-invalidation granularity.

catch’s picture

Split the part of that issue dealing with the who's online block over to http://drupal.org/node/186638