A client site has over 8 million rows in the {activity} table and node_save() is called very often on that site (especially updating user profiles).

The query in hook_nodeapi('update') nearly took the site down.

SELECT a.aid, u.status as user_status, a.status as activity_status FROM activity a INNER JOIN users u ON u.uid = a.uid WHERE a.uid <> 0 AND a.nid = 20132509;

This is what the explain looked like:

mysql> EXPLAIN SELECT a.aid, u.status as user_status, a.status as activity_status FROM activity a INNER JOIN users u ON u.uid = a.uid WHERE a.uid <> 0 AND a.nid = 20132509;
+----+-------------+-------+--------+---------------+---------+---------+---------------------+---------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                 | rows    | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+---------+-------------+
|  1 | SIMPLE      | a     | ref    | nid,uid,uid_2 | nid     | 5       | const               | 1734224 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | redacted.a.uid |       1 |             |

There are two issues with that query:

1. There is no index that spans both the uid and nid columns.

2. Having the uid first in the query means that MySQL needs to first find every row in the table where uid <> 0, then find the rows matching nid = 20132509

After running ALTER TABLE activity ADD INDEX nid_uid (nid, uid); and swapping the query clauses, I got it down to this:


mysql> EXPLAIN SELECT a.aid, u.status as user_status, a.status as activity_status FROM activity a INNER JOIN users u ON u.uid = a.uid WHERE a.nid = 20132509 AND a.uid <> 0;
+----+-------------+-------+--------+-----------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type   | possible_keys   | key     | key_len | ref                 | rows | Extra       |
+----+-------------+-------+--------+-----------------+---------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | a     | range  | nid,uid,nid_uid | nid_uid | 9       | NULL                |    3 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY         | PRIMARY | 4       | www_zinch_com.a.uid |    1 |             |
+----+-------------+-------+--------+-----------------+---------+---------+---------------------+------+-------------+
2 rows in set (0.00 sec)

Patch forthcoming.

Comments

catch’s picture

This looks problematic as well, no index spans all those rows either.


$rows = db_query("SELECT a.aid, a.status as activity_status, u.status as user_status, n.status as node_status FROM {activity} a INNER JOIN {users} u ON a.uid = u.uid INNER JOIN {node} n ON a.nid = n.nid WHERE a.uid <> 0 AND a.type = 'comment' AND a.eid = %d", $comment->cid);
catch’s picture

Status: Active » Needs review
StatusFileSize
new2.17 KB

Here's a patch.

rjbrown99’s picture

I'm now running with the patch from #2 in prod, I'll let you know if any issues arise.

rjbrown99’s picture

Almost a week here, no problems for me with the patch.

rjbrown99’s picture

Status: Needs review » Reviewed & tested by the community

I'll be bold and declare this RTBC. Simple change, helpful performance.

bibo’s picture

We're using this patch .. or actually, we just added the indexes manually for a site with about 160 000 rows in the activity table. The queries on it used to be very slow.

rho_’s picture

StatusFileSize
new2.04 KB

This patch worked well for me shaving some time off activity queries. The node update query went from .9583 ms to .0395ms after the applying the patch. That's only on about 1100 rows however, so not the best benchmark.

I've attempted to apply this patch to 6.x-2.x-dev but the line numbers do not appear to match up. I went ahead and re-rolled to update the new numbers. The patch is otherwise the same.

rjbrown99’s picture

#7 I noticed that the original SELECT query has $comment['cid'] and you are using $comment->cid in your patch. Is $comment supposed to be an array or an object? In patch #2 it's also an object. It seems to have been changed to an array to fix a bug here. In that case I suspect your patch should also be updated to use an array.

rjbrown99’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new2.44 KB

Here is #7 re-rolled to use an array instead of an object.

rjbrown99’s picture

FWIW, I have also found a HUGE benefit to adding an index on the activity table, uid field. I am using this with views to display specific activity feeds per user and ran into a large bottleneck with a table scan. Indexing uid fixed it.

jerdiggity’s picture

I had implemented this previously, but after hours of losing hair I also found that the activity_access table could definitely use some improvements as well. By adding separate indexes for each column my query times went from 1+ seconds to 0.05ish seconds.

Patch to follow (if it matters)... Any feedback as to why this might be "not good" would be most appreciated as well.

:)

jerdiggity’s picture

StatusFileSize
new411 bytes
rjbrown99’s picture

Thanks. I think you will also need a function activity_update_62XX() {} function for folks who already have the module installed.

_shy’s picture

Status: Needs review » Closed (outdated)

D6 reached its EOL back in February 2016, and there is no active release for D6 for this module anymore.
Development or support is not planned for D6. All D6-related issues are marked as outdated in a bunch.

If the issue remains relevant for D10+ versions, merge requests with proposed solutions for a new module version (D10+) are welcome in a new follow-up issue.

Thanks!

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.