The $mega_query in uc_followup_cron() is very aptly named. On a site with a lot of orders, it can take a LONG time to run. On a site that I'm currently maintaining, I'm using the Elysia Cron to restrict the UC Followup's cron to once a day at 2am, because it takes an average of 15 seconds to run.

Recently, it took 459 seconds to run, which combined with a large flush of traffic caused my database to spin out of control.

I'm creating this issue because I'm interested in exploring ways of improving uc_followup_cron() to make it more efficient.

Comments

DanGarthwaite’s picture

It has been a long time since I fought with this, but I know a little more about SQL pattterns.

What if the unix timestamp of the future date is calculated at the time it is entered, and then the query would look like:

SELECT * FROM {table} WHERE next_reminder < CURRENT_TIMESTAMP() AND already_sent != 1;

If you want to repeat sending, recalculate the next current_timestamp, otherwise set already_sent = 1.

I've feelings of anxiety just remembering all the date logic in that mega_query....

m.stenta’s picture

I found two small things that improve performance a bit:

For one, the module creates a combined index of order_id and followup_id in hook_schema:

'indexes' => array(
  'order_id' => array('order_id'),
  'order_id_followup_id' => array('order_id', 'followup_id'),
),

... but in the mega query, it does a GROUP BY with them switched (followup_id first, then order_id):

GROUP BY followup_id, order_id

In order for that GROUP BY to work with the MySQL index, it needs to be in the same order. So changing it to the following in the query helps a little:

GROUP BY order_id, followup_id

For more info, see:
http://www.ovaistariq.net/17/mysql-indexes-multi-column-indexes-and-orde...
http://stackoverflow.com/questions/3915574/mysql-optimization-group-by-m...

The second thing that would improve query speed is adding an index to the date column in the {uc_followup_sent} table. The mega query performs an ORDER BY using the date column, which can be improved by indexing it. Adding the following to hook_schema would help (a hook_update_n will also be required):

  'date' => array('date'),

Neither of these changes improved my queries significantly, but they did help a little. I think it's going to require more of an overhaul than just some improved indexing. I'm doing more digging, but will try to put together a patch that includes these improvements as well. Just thinking out loud for now...

m.stenta’s picture

Version: » 6.x-1.x-dev

Committed the two above-mentioned changes to the 6.x-1.x branch.

Still organizing my thoughts regarding overhauling the uc_followup_cron() function...

m.stenta’s picture

amorsent and myself spent a good bit of time yesterday (on the plane to DrupalCon) dissecting the query to better understand what it's actually doing. We found a number of issues with it, which I'm willing to bet are responsible for some of the other bug reports in the issue queue.

I copied the query in its entirety below, restructured it a bit for easier reading (without changing what it's doing), and added lots of comments to explain it step by step. I noted the possible issues where applicable.

<?php
# The main purpose of this query is to generate a list of emails that need to be sent. It does this by examining data in the {uc_orders} table, previously sent emails in the {uc_followup_sent} table, and joining things together with the {uc_followup} table, which defines the various followups and their rules.

# Select the order_id from the {uc_orders} table, everything from the {uc_followup} table, and the grouped "date", "manual", and "email" fields from the {uc_followup_sent} table (see the sub-query in the second JOIN for information about how the fields are being grouped).
SELECT o.order_id, fu.* , fus.date, fus.manual, fus.email

# Start by loading the followup rules.
FROM {uc_followup} fu

# Join all orders to the followup rules, based on order_status. This will create a big list of followups, one for each potential followup that needs to go out. The WHERE clauses below will be responsible for filtering them down to followups that ACTUALLY need to be sent (based on their date, whether they've been sent already, etc).
LEFT JOIN {uc_orders} o ON fu.order_status = o.order_status

# Join in previously sent followups, grouped by followup_id+order_id, selecting all fields, and creating a COUNT() field of emails sent for each followup_id+order_id combination.
/**
 * Issues:
 *   1) First and foremost, when performing a GROUP BY in SQL, all fields that are being returned are supposed to be aggregated using one of the aggregation functions (ie: COUNT(), SUM(), MAX(), etc) (the exception is values that are used in the GROUP BY clause itself, which in this case would be 'order_id' and 'followup_id'). In this query, it's returning all the fields, without aggregating them, and then also adding a count of the order ids using COUNT() (which is correct usage).  This results in the fields being grouped in unpredictable ways (see issue #2 below). MySQL will let you do this, and just assumes you won't mind unpredictably grouped results. PostgreSQL might allow this as well, but the fact is it's not correct SQL usage (and therefore may not work at all in some databases).
 *   2) The 'date' field is not being aggregated, so when the GROUP BY clause is performed, if more than one email has been sent out for a given order_id+followup_id, it will give you the date of the first one that's returned by the query... NOT the last email that was sent, as it's supposed to. It seems that the original intention was to use the 'ORDER BY date DESC' clause to sort all of the sent emails by date first, so that the aggregated date you end up with is the largest (most recent) one. This is flawed logic, however, because GROUP BY takes place before ORDER BY, so the date has already been aggregated by the time they are sorted. Naturally, this also means that any logic that uses the 'date' field elsewhere in this query is not using the correct date.
 *   3) Lastly, the query is selecting ALL fields from the sent followups table, which includes all the data in the 'subject' and 'body' fields of every email that's been sent out. That's a huge amount of data to be asking for, and none of it is being used anywhere else. This is terrible for the query's performance, and it drastically increases the amount of time it takes to run.
 */
LEFT JOIN (
  SELECT *, COUNT(order_id) as "sent_count"
    FROM {uc_followup_sent}
    GROUP BY order_id, followup_id
    ORDER BY date DESC
  ) fus ON fu.followup_id = fus.followup_id AND fus.order_id = o.order_id

# Limit the list of pending emails based on the following conditions. There are two overall scenarios that should trigger emails: 1) if no email has ever been sent, and the date meets the right criteria; and 2) if repeat emails need to be sent out.
WHERE
(
  # First scenario: if no emails have been sent, and the date meets the right criteria...
  (
    # If no email has ever been sent for the order_id+followup_id...
    fus.date IS NULL

    AND
    
    # If the 'hours_past' time is between now and X hours ago (where X is the same 'hours_past' value)...
    # For example, if 'hours_past' is set to 1, this will return true if the order modified date + 1 hour is between 1 hour ago and now.
    /**
     * Issue:
     *   I struggled for a while trying to understand what this meant. The logic seems to be inherently flawed, because it's essentially saying: if an email hasn't been sent, send it now, unless it's past the 'hours_past' time, in which case don't send it. So it's only sending emails if cron runs BEFORE the 'hours_past' time has passed, and won't send them after. In other words, it's doing exactly the opposite of what it should be doing.
     */
    (o.modified + fu.hours_past * 3600 BETWEEN %d - fu.hours_past * 3600 AND %d)
  )
  
  OR

  # Second scenario: repeat emails need to be sent out...
  (
    (
      # If the followup should be repeated...
      fu.repeat_after > 0

      AND

      # If the 'repeat_after' time is between now and X hours ago (where X is the 'hours_past' value).
      # For example, if 'hours_past' = 1, and 'repeat_after' = 6, then this condition will return true if the date that the last email was sent + 6 hours is between 1 hour ago and now.
      /**
       * Issues:
       *   1) This logic also took me a long time to understand, and I don't think it makes sense. Like the similarly flawed logic in the first scenario above, it seems to be saying that repeat emails should only be sent if cron runs between a limited set of time. Also, using 'hours_past' in this logic doesn't seem to make sense.
       *   2) It's also worth noting again that the 'date' value is most likely NOT correct here, because of the improper grouping that happens in the sub-query of the {uc_followup_sent} table.
       */
      (fus.date + fu.repeat_after * 3600 BETWEEN %d - fu.hours_past * 3600 AND %d)
    )

    AND

    (
      # If the followup should be repeated unlimited times...
      fu.repeat_max = 0

      OR

      # If the maximum number of repeat emails is less than the number that have been sent
      /**
       * Issue:
       *   The variables are reversed here, so it doesn't make any sense at all. It should be saying "if the number of emails sent is less than the number that should be sent", but instead it's saying "if the number that should be sent is less than the number that have been sent".
       */
      fu.repeat_max < fus.sent_count
    )
  )
)

AND 

# Only send emails for followups that are active.
/**
 *  Issue:
 *   It might make more sense to put this at the top of the WHERE clause, to eliminate all disabled followups right from the start. I'm not sure if the ordering of WHERE clauses actually affects the order in which they are performed, but if so then it will increase performance. But regardless, it would make the query a little easier to understand.
 */
fu.status = 1;

?>

Conclusion: this query needs some serious work. I'm actually surprised that none of this has been discovered sooner. I guess the fact that *some* emails were going out hid the fact that almost everything else about it is NOT working correctly.

I'd love to get some second opinions on this, to make sure that the issues described above actually are issues. I've gone over this query many times, to double-check my assumptions, so I'm fairly confident about it. But I may have made a mistake or misunderstood something, so any/all feedback is welcome!

I've started reworking the query from scratch, and also reworking the uc_followup_cron() function. I'll post my ideas as soon as they are fleshed out.

m.stenta’s picture

I think this issue is the cause of these other three issues:

#620800: Sending way too many emails!
#832416: Repeat send out emails every cron runs.
#724160: Followups not being sent

I'm going to redirect efforts from those issues to this one.

m.stenta’s picture

Possible issue: if this module is installed on a site that has a lot of past orders, there's the potential for unwanted followups to be sent out for them.
It seems that a previous issue (#620800: Sending way too many emails!) tries to fix this by adding the BETWEEN date logic to the query, but that logic doesn't seem right.

Two possible approaches:

1) Add an "effective" timestamp to followup rules, which determines when they should go into effect. This would allow you to only send out emails for orders AFTER a followup rule is added.

And/or...

2) Add a global setting for uc_followup that allows you to define a sort of "statute of limitation" for followups... meaning only orders that have been modified in the past X days would be included as possible followups.

m.stenta’s picture

Title: Increasing performance of the mega query » Reworking uc_followup_cron() and the $mega_query

Changing the name of this issue to better reflect it's new intentions... it's more than just a performance issue.

wqmeng’s picture

Hello,

As some other reporter said before, the logic of the send out should be totally rewrite.

I have already give up to use this module for a long time, I have created a custom module and with a cron_hook which works just as I expected .

m.stenta’s picture

wqmeng: Thanks for the confirmation that it needs rewriting. If you have any code that you think would be helpful, please feel free to share it! I am working on a rewrite right now, and I think I have it mostly figured out, but if you have any additional ideas they are certainly welcome. I will be posting my findings soon.

m.stenta’s picture

Status: Active » Needs review
StatusFileSize
new9.43 KB

Attached is a proposed patch to 6.x-1.x-dev that is a full rewrite of the query and some of uc_followup_cron().

It also adds an 'effective_date' field to followups so that you don't run into the issue where an existing site with lots of past orders wants to send out tons of unnecessary followups as soon as you install this module (see #620800). It defaults all existing followups to become effective when you run update.php (although this may require more thought... it could end up missing some emails that way... ideas?)

Please review!

Note, this only changes the query in uc_followup_cron(), which takes care of automatic followups (the bigger problem). I still need to fix the query for determining which manual emails need to be sent, because right now it's still using the old (incorrect) logic.

m.stenta’s picture

StatusFileSize
new9.53 KB

Couple of things wrong with the patch above, fixed in this new one:

-Forgot to put curly brackets around table names.
-Forgot to replace UNIX_TIMESTAMP() with %d placeholders.
-Moved $i++; out of the email sending "if" structure (see below).

I moved the $i++; because of the following scenario: imagine a site that, for whatever reason, ends up with orders that don't have email addresses. These will eventually build up in the query's results, resulting in needless uc_order_load()s. This is a problem regardless of where the $i++ is, but at least moving it down two lines will stop it from executing uc_order_load() more than the 'uc_followup_send_limit'.

I'm open to ideas on how to avoid that problem too.

m.stenta’s picture

StatusFileSize
new16.65 KB

Ok, here is a patch that includes the stuff in #11, but also fixes all the other queries in the module. I abstracted the query out into a helper function so that it can more easily be reused by other functions (and altered slightly depending on what they need).

Please test this patch. I'm starting to test it myself in a live site, so I'll let you know how that goes, but second opinions are welcome!

m.stenta’s picture

Just to give you an idea of the performance improvements... on the site that I originally reported this ticket for, the max cron execution time has dropped from 459 seconds to 2 seconds! YEA!!!

I'm keeping an eye on the sent emails after every cron run and it appears to be working as expected.

amorsent’s picture

Hurray!

m.stenta’s picture

Status: Needs review » Closed (fixed)

I'm committing this to 6.x-1.x-dev and closing this ticket as fixed. It's working well for me, which is 100% better than the current release of the module was, so I see no reason to wait for more reviews. Please submit issues if you find any bugs with it!

This update is also a great reason to make a 6.x-1.2 release (along with some of the other recent bugfixes). I might wait a bit on that to see if any issues are found with this... or I might not. At the very least I want to run a Code Review cleanup of the module to bring it all up to standards.