After I turned on notifications/messaging/og content notifications I noticed certain pages on my site were never loading. I checked mysql usage, and on these page loads it was totally maxing out. I then enabled query logging and found that this query was holding up the processes for minutes at a time:
SELECT s.*, f.* FROM notifications s INNER JOIN notifications_fields f ON s.sid = f.sid LEFT JOIN term_node t ON f.field = 'tid' AND f.value = CAST(t.tid AS CHAR) WHERE s.uid = 2 AND event_type = 'node' AND ((f.field = 'nid' AND f.value = '62394') OR (f.field = 'type' AND f.value = 'program_producer') OR (f.field = 'author' AND f.value = '2') OR (t.nid = 62394))
This is a bit beyond my level of understanding, but turning off all notifications modules stopped the problem and my site is back to normal. Here are my versions:
Notifications 6.x-1.0-beta3
OG HEAD from Oct 9th
Comments
Comment #1
fractile81 commentedI've run into this problem as well, and I believe I have narrowed the problem down to the LEFT JOIN. If you do an EXPLAIN on that query, you will find that the LEFT JOIN on the term_node table has to parse through a lot more data than is needed because of the
CAST(t.tid AS CHAR)statement.I've tried moving the cast to f.value, but that throws warnings. Remove the cast completely, and there are no errors but I'm not sure if that's really going to work. This specific LEFT JOIN is coming from the notifications_tags.module file's notifications hook, the 'query' op. For performance, there has to be a better way to do this.
Comment #2
mehmeta commentedI'm having the same problem, and we've observed that removing the CAST (as in doing ...AND f.value = t.tid... instead) improves the speed significantly, seemingly without any side effects (select 123 = '123' returns 1 after all). Is the CAST really necessary?
Comment #3
jaydub commentedI can tell you right off the bat that if you remove the CAST then it's going to cause trouble with PostgreSQL at least 8.3 and higher. The JOIN is on two fields that are not the same type (int vs varchar) and you get away with the implicit CAST in MySQL but not in PostgreSQL.
It might be better off in the long run to have the notifications_fields table have columns for INT values and for VARCHAR values and write the JOIN queries to use the relevant data type for the JOIN. It's a bit hackish but it would allow for the most portable SQL while still allowing for optimal JOINs.
Comment #4
jose reyero commentedYes, I'm afraid that is a huge query and may cause this kind of issues. In order to get it fixed we'll need some data and help from people running into that issue.
Please, could you guys post some more performance data about that query on different conditions? (number of users/nodes/taxonomy terms/subscriptions, time the query takes). We'll be able to use it later for benchmarking.
That is kind of a magic query that gets the notifications_queue table populated in just one db hit, however it was to be expected that sooner or later we'd run into some performance issues. Considering some options to fix it:
Please, feedback.
Comment #5
moshe weitzman commentedFYI, I am not using taxonomy notifications on groups.drupal.org and still seeing a slow query. I sent an export of the relevant tables to Jose. I hope others do the same so he can try out alternatives.
Comment #6
jose reyero commentedJust some status update.
There are some performance improvements already committed in the 5.x branch. We are doing some testing, if successful I'll be porting them to the 6.x version right away.
From my tests, the real issue here seemed to be the table join with the 'cast', which is incredibly slow, so what I'm doing is replacing that table joins with IN(tid, tid2...) conditions.
On some quick tests, with the og_notifications module, this took the query time down from around 2 mins to around 2 secs!!!
I'll follow up on this when I get some more real life performance data. This seems to be a inmediate solution though, without discarding other possible future improvements.
Comment #7
kyle_mathews commentedJose,
I'll upgrade our site to the dev version and give a report.
Comment #8
moshe weitzman commentedThe fixes in D5 version completely resolved the performance problems on groups.drupal.org. Thanks Jose.
Comment #9
kyle_mathews commentedSame from my end. The offending query is still one of the slower queries but it's dropped from ~300ms to ~10-50ms. Thanks Jose.
Comment #10
jose reyero commentedOk, this seems to work, and already ported to 6.x
I'm closing this one, but more patches for further performance improvements will be welcomed any time.
Thanks for the feedback.
Comment #11
emilyf commentedJose, thanks. Looks good for me, too on rc-1.