After I turned on notifications/messaging/og content notifications I noticed certain pages on my site were never loading. I checked mysql usage, and on these page loads it was totally maxing out. I then enabled query logging and found that this query was holding up the processes for minutes at a time:

SELECT s.*, f.* FROM notifications s INNER JOIN notifications_fields f ON s.sid = f.sid LEFT JOIN term_node t ON f.field = 'tid' AND f.value = CAST(t.tid AS CHAR) WHERE s.uid = 2 AND event_type = 'node' AND ((f.field = 'nid' AND f.value = '62394') OR (f.field = 'type' AND f.value = 'program_producer') OR (f.field = 'author' AND f.value = '2') OR (t.nid = 62394))

This is a bit beyond my level of understanding, but turning off all notifications modules stopped the problem and my site is back to normal. Here are my versions:

Notifications 6.x-1.0-beta3
OG HEAD from Oct 9th

Comments

fractile81’s picture

I've run into this problem as well, and I believe I have narrowed the problem down to the LEFT JOIN. If you do an EXPLAIN on that query, you will find that the LEFT JOIN on the term_node table has to parse through a lot more data than is needed because of the CAST(t.tid AS CHAR) statement.

I've tried moving the cast to f.value, but that throws warnings. Remove the cast completely, and there are no errors but I'm not sure if that's really going to work. This specific LEFT JOIN is coming from the notifications_tags.module file's notifications hook, the 'query' op. For performance, there has to be a better way to do this.

mehmeta’s picture

I'm having the same problem, and we've observed that removing the CAST (as in doing ...AND f.value = t.tid... instead) improves the speed significantly, seemingly without any side effects (select 123 = '123' returns 1 after all). Is the CAST really necessary?

jaydub’s picture

I can tell you right off the bat that if you remove the CAST then it's going to cause trouble with PostgreSQL at least 8.3 and higher. The JOIN is on two fields that are not the same type (int vs varchar) and you get away with the implicit CAST in MySQL but not in PostgreSQL.

It might be better off in the long run to have the notifications_fields table have columns for INT values and for VARCHAR values and write the JOIN queries to use the relevant data type for the JOIN. It's a bit hackish but it would allow for the most portable SQL while still allowing for optimal JOINs.

jose reyero’s picture

Assigned: Unassigned » jose reyero
Category: bug » task

Yes, I'm afraid that is a huge query and may cause this kind of issues. In order to get it fixed we'll need some data and help from people running into that issue.

Please, could you guys post some more performance data about that query on different conditions? (number of users/nodes/taxonomy terms/subscriptions, time the query takes). We'll be able to use it later for benchmarking.

That is kind of a magic query that gets the notifications_queue table populated in just one db hit, however it was to be expected that sooner or later we'd run into some performance issues. Considering some options to fix it:

  • (@jaydub) Have the notifications_fields table have columns for INT values and for VARCHAR values and write the JOIN queries to use the relevant data type for the JOIN
  • Just storing the event on the page request and moving that query to cron processing
  • Reworking the notifications_taxonomy part (if it's the only one causing trouble) and loading the node tids then doing some 'IN (tid1, td2, tid3...)' insted of the table join.
  • Splitting the big query into two, so we can use an INNER JOIN for the taxonomy one (we repeat the query only for taxonomy terms)

Please, feedback.

moshe weitzman’s picture

FYI, I am not using taxonomy notifications on groups.drupal.org and still seeing a slow query. I sent an export of the relevant tables to Jose. I hope others do the same so he can try out alternatives.

jose reyero’s picture

Title: notifications query hanging up mysql, maxing processing » Performance issues, notifications queue query very slow

Just some status update.

There are some performance improvements already committed in the 5.x branch. We are doing some testing, if successful I'll be porting them to the 6.x version right away.

From my tests, the real issue here seemed to be the table join with the 'cast', which is incredibly slow, so what I'm doing is replacing that table joins with IN(tid, tid2...) conditions.

On some quick tests, with the og_notifications module, this took the query time down from around 2 mins to around 2 secs!!!

I'll follow up on this when I get some more real life performance data. This seems to be a inmediate solution though, without discarding other possible future improvements.

kyle_mathews’s picture

Jose,
I'll upgrade our site to the dev version and give a report.

moshe weitzman’s picture

The fixes in D5 version completely resolved the performance problems on groups.drupal.org. Thanks Jose.

kyle_mathews’s picture

Same from my end. The offending query is still one of the slower queries but it's dropped from ~300ms to ~10-50ms. Thanks Jose.

jose reyero’s picture

Status: Active » Fixed

Ok, this seems to work, and already ported to 6.x

I'm closing this one, but more patches for further performance improvements will be welcomed any time.

Thanks for the feedback.

emilyf’s picture

Jose, thanks. Looks good for me, too on rc-1.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.