A client site has over 8 million rows in the {activity} table and node_save() is called very often on that site (especially updating user profiles).
The query in hook_nodeapi('update') nearly took the site down.
SELECT a.aid, u.status as user_status, a.status as activity_status FROM activity a INNER JOIN users u ON u.uid = a.uid WHERE a.uid <> 0 AND a.nid = 20132509;
This is what the explain looked like:
mysql> EXPLAIN SELECT a.aid, u.status as user_status, a.status as activity_status FROM activity a INNER JOIN users u ON u.uid = a.uid WHERE a.uid <> 0 AND a.nid = 20132509;
+----+-------------+-------+--------+---------------+---------+---------+---------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+---------+-------------+
| 1 | SIMPLE | a | ref | nid,uid,uid_2 | nid | 5 | const | 1734224 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | redacted.a.uid | 1 | |
There are two issues with that query:
1. There is no index that spans both the uid and nid columns.
2. Having the uid first in the query means that MySQL needs to first find every row in the table where uid <> 0, then find the rows matching nid = 20132509
After running ALTER TABLE activity ADD INDEX nid_uid (nid, uid); and swapping the query clauses, I got it down to this:
mysql> EXPLAIN SELECT a.aid, u.status as user_status, a.status as activity_status FROM activity a INNER JOIN users u ON u.uid = a.uid WHERE a.nid = 20132509 AND a.uid <> 0;
+----+-------------+-------+--------+-----------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------+---------+---------+---------------------+------+-------------+
| 1 | SIMPLE | a | range | nid,uid,nid_uid | nid_uid | 9 | NULL | 3 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | www_zinch_com.a.uid | 1 | |
+----+-------------+-------+--------+-----------------+---------+---------+---------------------+------+-------------+
2 rows in set (0.00 sec)
Patch forthcoming.
| Comment | File | Size | Author |
|---|---|---|---|
| #12 | activityaccessindexes-1172732-11.patch | 411 bytes | jerdiggity |
| #9 | 1172732-9.patch | 2.44 KB | rjbrown99 |
| #7 | activity_1172732_7.patch | 2.04 KB | rho_ |
| #2 | activity_1172732.patch | 2.17 KB | catch |
Comments
Comment #1
catchThis looks problematic as well, no index spans all those rows either.
Comment #2
catchHere's a patch.
Comment #3
rjbrown99 commentedI'm now running with the patch from #2 in prod, I'll let you know if any issues arise.
Comment #4
rjbrown99 commentedAlmost a week here, no problems for me with the patch.
Comment #5
rjbrown99 commentedI'll be bold and declare this RTBC. Simple change, helpful performance.
Comment #6
bibo commentedWe're using this patch .. or actually, we just added the indexes manually for a site with about 160 000 rows in the activity table. The queries on it used to be very slow.
Comment #7
rho_ commentedThis patch worked well for me shaving some time off activity queries. The node update query went from .9583 ms to .0395ms after the applying the patch. That's only on about 1100 rows however, so not the best benchmark.
I've attempted to apply this patch to 6.x-2.x-dev but the line numbers do not appear to match up. I went ahead and re-rolled to update the new numbers. The patch is otherwise the same.
Comment #8
rjbrown99 commented#7 I noticed that the original SELECT query has $comment['cid'] and you are using $comment->cid in your patch. Is $comment supposed to be an array or an object? In patch #2 it's also an object. It seems to have been changed to an array to fix a bug here. In that case I suspect your patch should also be updated to use an array.
Comment #9
rjbrown99 commentedHere is #7 re-rolled to use an array instead of an object.
Comment #10
rjbrown99 commentedFWIW, I have also found a HUGE benefit to adding an index on the activity table, uid field. I am using this with views to display specific activity feeds per user and ran into a large bottleneck with a table scan. Indexing uid fixed it.
Comment #11
jerdiggity commentedI had implemented this previously, but after hours of losing hair I also found that the
activity_accesstable could definitely use some improvements as well. By adding separate indexes for each column my query times went from 1+ seconds to 0.05ish seconds.Patch to follow (if it matters)... Any feedback as to why this might be "not good" would be most appreciated as well.
:)
Comment #12
jerdiggity commentedComment #13
rjbrown99 commentedThanks. I think you will also need a function activity_update_62XX() {} function for folks who already have the module installed.
Comment #14
_shyD6 reached its EOL back in February 2016, and there is no active release for D6 for this module anymore.
Development or support is not planned for D6. All D6-related issues are marked as outdated in a bunch.
If the issue remains relevant for D10+ versions, merge requests with proposed solutions for a new module version (D10+) are welcome in a new follow-up issue.
Thanks!