Problem/Motivation

To effectively send out digests we should be using queue workers. This patch should aim to add the base for this queue and worker. In an effort to keep patches small and contained, this should not include much if any of rendering / mailing the digests. Just queuing digests at the correct times.

Proposed resolution

Add a digest queue and worker. In the cron hook check which digests need to be sent and queue up these items to be processed.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

DerekCresswell created an issue. See original summary.

DerekCresswell’s picture

I need to write out my thoughts here as jumping into this I feel there is going to end up being too much going on at once. Here is the plan I hope to follow and I feel it will need a few separate issues to break this into in order to keep patches small and code testable.

* On a cron run we need to check each active digest to see if it is due and add it to the queue if it is.
* To check if it is due we need to know the last time we ran it.
* Since the digests are config entities we could store the last run with \Drupal::state().

I set out with TDD in mind and I ran into a few spots where I began to foresee creating too much code for just one issue.

To minimise the amount of code within the .module and make sure everything is split into testable chunks I created a class called Digests very similar to the Views class.
This contained a few static helper functions such as getEnabledDigests. This kept that code small and testable. It also kept code out of the .module file.

Then I wanted to take the checking of if a digest should run out of the cron function so I added a shouldSend function to the digests. This is where I began to feel the need for splitting this issue.

Sadly, the default functionality of the cron expression library from what I could see runs on too small of a granularity for digests to be sent out effectively.
To see if a digest should be sent I needed to compare the last time the digest should have been sent (according to cron-expression) and the actual time it was sent out. This way I could see if the digest could be sent like this :

if (lastActualSendDate < lastScheduledSendDate)
  Send();

This works fine, but in order to have this work properly I needed to know the last time a digest was sent out. So I went to set up the state for holding the last run time. This would be fine, simply sent it when the digest is being sent.
The next problem came here, what to do if the digest has just been created and hasn't been sent yet? A naive solution would be if the last run state hadn't been set yet just assume it is 'now'. The problem with that is if you made a digest that should send out on monday and created it on a tuesday, the next cron would send this digest.
The idea I came up with would be to set this last run date to the NEXT scheduled run date when creating a digest. This would mean a new digest doesn't send until a reasonable time after it should be, rather than randomly the first time.

To achieve that we would need to create a custom digest storage to set up this state variable. This definitely feels like it should be separated to keep everything small, review able, and testable. This would have to be done in a way to avoid breaking other tests (where digests are created without a send_at) which would likely boil down to use the current time if no send_at is set. Or it could mean creating a CreateTestDigestTrait which would spit out a valid / random digest. I might lean to this second option since in reality, a digest should not be created without a send_at.

TLDR:
Split this into two / three issues

* Create Digests static class for digest helper functions. Moves code out of .module and improves dev experience.
* Create DigestStorage to help with storing last time digests were sent out
* Perhaps this requires a test base or test trait be made to help with creating digests. ( this would be the / three)
* Finally, come back here and wire this all up to add digests to a queue.

This might sound like a big pile of gobbly guck, I will discuss it in person next week as I think it will come out better in person.

DerekCresswell’s picture

Here is a progress report more or less. This patch does not include the queue worker, just set up for it.

This is utilising a static Digests class to provide some helper functions for use in places like the .module. The Digest entity was also expanded to include some functions for detecting when to send it out.

The option was brought up to move the functionality of the Digests class into a DigestStorage class. Accomplishes the same, just depends what we want to use.

Unless this gets much too large for a single issue, it will stay here to keep context.

DerekCresswell’s picture

Status: Active » Needs review
FileSize
32.54 KB
22.24 KB
  • Added the Digests::queueDigests function (and tests).
  • Added the Digest->getSubscribedUsers function (and tests).
  • Added a (for now) empty queueworker.

This should be ready for scrutiny now. The functionality of actually sending the digest will come in the next issue, this is just for queuing.

DerekCresswell’s picture

FileSize
20.91 KB
10.68 KB

Whoops, the previous patch snuck itself in.

joshmiller’s picture

Status: Needs review » Needs work
  1. +++ b/digest.module
    @@ -27,6 +28,15 @@ function digest_help($route_name, RouteMatchInterface $route_match) {
    +  Digests::queueDigests();
    

    <3

  2. +++ b/digest.module
    @@ -44,30 +54,10 @@ function digest_field_formatter_info_alter(array &$info) {
    +  return Digests::getDigestsAsOptions();
    

    I wonder if our config of the field value can simply call this function?

  3. +++ b/src/Digests.php
    @@ -0,0 +1,108 @@
    + * Static wrapper for digests.
    

    This is more than a wrapper. It provides a single point to understand all digests in the system. I think you could add a sentence or two that would help others know more about how important this class has become.

  4. +++ b/src/Digests.php
    @@ -0,0 +1,108 @@
    +  public static function storage() {
    +    return \Drupal::entityTypeManager()->getStorage('digest');
    +  }
    

    Why is this static? If it wasn't static, you could do dependency injection.

    Also, every other function with one line has extra lines above and below. It doesn't matter, as long as you stay consistent.

  5. +++ b/src/Digests.php
    @@ -0,0 +1,108 @@
    +    return $digests;
    

    There is technical debt here. Consider adding a @todo to consider optimization strategies for when the number of digests passes thousands.

  6. +++ b/src/Digests.php
    @@ -0,0 +1,108 @@
    +    $queue = \Drupal::queue('digest_send');
    

    This static thing feels hacky.

  7. +++ b/src/Entity/Digest.php
    @@ -119,7 +128,8 @@ class Digest extends ConfigEntityBase implements DigestInterface {
    +    $this->state = Drupal::state();
    

    :thinking: Hmmm ... if we store a state value on the config entity ... this would potentially mean exporting config would nearly always be changing. I believe this is not a good place to store this value.

    This value might be able to be calculated by looking at the queue for the latest sent.

  8. +++ b/src/Entity/Digest.php
    @@ -248,4 +258,62 @@ class Digest extends ConfigEntityBase implements DigestInterface {
    +      return FALSE;
    

    This is known as an implicit feature. Be wary of implicit, these undocumented and highly presumptive features can be a real headache for end-users. It's often far better to make things explicit by making certain features boolean with a checkbox. In this case, I think a note in the form about what a blank value does would be "good enough."

  9. +++ b/src/Entity/Digest.php
    @@ -248,4 +258,62 @@ class Digest extends ConfigEntityBase implements DigestInterface {
    +    return $user_storage->loadMultiple($query->execute());
    

    This could be a problem with 10k+ records (not unheard of). I would, at the very least, note in the README.md and on the module page that "This module was designed with a reasonably small number of users and digests in mind. Consider sponsoring batching operation additions if you need to scale past thousands."

DerekCresswell’s picture

  • #6.2 Yes this works, but the callback is supposed to take in three parameters so it probably should be left as is. Also, if we would prefer switching to a non-static method we would need this call back again.
  • #6.4 It was made static to keep it more as a one use kind of class. It was simply a design choice. This class is not supposed to be used within the module itself (minus procedural files). It is supposed to act as an easy and quick way for a dev to wire up a module to this.
  • #6.5 Yes it would be a problem if this were to pass the thousands. But the first logical thought of this being a digest is that there wouldn't be thousands. I've never found a site that has that many subscription options. It's also premature optimisation; we can fix it if it becomes an issue.
  • #6.7 The state is not part of the configuration. I'm not sure we can query the last time a queue was run and I don't know how you could really classify what is the "last time" it's run as every digest is in the same worker and can be run at different times.
  • #6.8 I think the choice would be to then add this to documentation instead. It makes sense that a digest shouldn't send without a schedule. Also this would not stop you from manually sending a digest; this is just a check you can perform.
  • #6.9 To use your own words against you, Cadillac vs bike. While yes that could be an issue (like 6.5) it's not something we have come across yet. If the need arises we can deal with that.

I will create a patch that adds this functionality to a "DigestStorage" class instead as that was a possibility and we can judge them side by side. I can foresee that making the module functions less like hooks to a slight degree, but not much to worry about.

DerekCresswell’s picture

Comparison between Digests and DigestStorage

Note, these are minimums. Some functions needed for the storage method will simply be added later for the static method. Also I identified some slight improvements so the storage has that benefit. Most suggestions from #6 were left out until a method is chosen.

Digests :

 digest.module                                               |  34 ++--
 src/Digests.php                                             | 108 +++++++++++++
 src/Entity/Digest.php                                       |  70 +++++++-
 src/Entity/DigestInterface.php                              |  29 ++++
 src/Plugin/QueueWorker/DigestSendQueue.php                  |  97 +++++++++++
 tests/src/Kernel/DigestsTest.php                            | 225 ++++++++++++++++++++++++++
 tests/src/Kernel/Entity/DigestTest.php                      |  85 ++++++++++
 tests/src/Kernel/Plugin/QueueWorker/DigestSendQueueTest.php |  94 +++++++++++
 8 files changed, 719 insertions(+), 23 deletions(-)

DigestStorage :

 digest.module                                               |  34 ++----
 src/DigestStorage.php                                       |  95 +++++++++++++++
 src/DigestStorageInterface.php                              |  38 ++++++
 src/Entity/Digest.php                                       |  88 +++++++++++++-
 src/Entity/DigestInterface.php                              |  42 +++++++
 src/Plugin/QueueWorker/DigestSendQueue.php                  |  97 ++++++++++++++++
 tests/src/Kernel/DigestStorageTest.php                      | 161 ++++++++++++++++++++++++++
 tests/src/Kernel/Entity/DigestTest.php                      |  91 +++++++++++++++
 tests/src/Kernel/Plugin/QueueWorker/DigestSendQueueTest.php |  94 +++++++++++++++
 9 files changed, 717 insertions(+), 23 deletions(-)

Pretty comparable by this metric.

I do feel that something like queuing the digests does not belong on the storage class. It simply feels out of place.

DerekCresswell’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 8: 3162041-8-storage.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

DerekCresswell’s picture

FileSize
21.15 KB

Fixed .module file for the storage patch.

joshmiller’s picture

Status: Needs work » Needs review
DerekCresswell’s picture

FileSize
21.63 KB

We are giving this a go with the Digest class but instead as the DigestManager. This avoids naming confusions and will allow us to use dependency injection.

travis-bradbury’s picture

Status: Needs review » Needs work

I think there's two problems:

  1. +  /**
    +   * Gets a digest from ID.
    +   *
    +   * @param string $digestID
    +   *   The digest to find or load.
    +   *
    +   * @return \Drupal\digest\Entity\DigestInterface
    +   *   The loaded digest.
    +   */
    +  protected function getDigest(string $digestID) {
    +
    +    // If this digest has not been loaded already, add it to the cached digests.
    +    if (!isset($this->loadedDigests[$digestID])) {
    +
    +      $this->loadedDigests[$digestID] = $this->digestStorage->load($digestID);
    +
    +    }
    +
    +    return $this->loadedDigests[$digestID];
    +
    +  }
    

    Drupal already has a static cache of entities. I don't think you should also cache them, since your references won't be cleared when the normal entity cache is reset.

  2. +  /**
    +   * {@inheritDoc}
    +   */
    +  public function processItem($data) {
    +
    +    /** @var \Drupal\user\UserInterface $user */
    +    $user = $this->userStorage->load($data['user']);
    +    $digest = $this->getDigest($data['digest']);
    +
    +  }
    

    I think you should also have it do something, even just adjusting the datetime in state so that the tests can have some basic coverage of processing the items, not just checking that they're in the queue.

And a few nitpicks:

  1. +  /**
    +   * {@inheritDoc}
    +   */
    +  public function shouldSend($currentTime = 'now') {
    

    The current time is now; you can't change it. Maybe call the parameter `$time` instead.

  2. +  /**
    +   * {@inheritDoc}
    +   */
    +  public function getLastSentAt() {
    +
    +    // Return NULL if no send at is specified.
    +    if (!isset($this->send_at)) {
    +      return NULL;
    +    }
    +
    +    $from_state = $this->state->get($this->id() . self::STATE_POST_FIX);
    +    $date_time = new DateTime();
    +
    +    if (isset($from_state)) {
    +
    +      // Set the timestamp of the date time object and return it.
    +      return $date_time->setTimestamp($from_state);
    +
    +    }
    +
    +    // No time stamp exists so assume the digest has not run yet and return the
    +    // next scheduled run date.
    +    $set_to = $this->getSendAt()->getNextRunDate()->getTimestamp();
    +    return $date_time->setTimestamp($set_to);
    

    getLastSentAt returns something in the future if it hasn't ever run, but that's
    a lie. Why not return null? Is that so new digests wait for their schedule to come around instead of sending
    immediately?

    Would a digest ever send? Ie, wouldn't getNextRunDate() always give something in the future?

    What do you think about a digest's next run date being in state, so anything at or past its next run date runs? Then a newly created/updated digest could have its next run in state and you won't have this situation with a null run date.

  3. +++ b/src/Entity/Digest.php
    @@ -248,4 +258,62 @@ class Digest extends ConfigEntityBase implements DigestInterface {
    +    return $user_storage->loadMultiple($query->execute());
    

    This could be a problem with 10k+ records (not unheard of). I would, at the very least, note in the README.md and on the module page that "This module was designed with a reasonably small number of users and digests in mind. Consider sponsoring batching operation additions if you need to scale past thousands."

    #6.9 To use your own words against you, Cadillac vs bike. While yes that could be an issue (like 6.5) it's not something we have come across yet. If the need arises we can deal with that.

    Thousands of digests might be unlikely, but thousands of users isn't. I won't complain about leaving it as is since I don't think we have a use case for that yet, but an easy fix might be to return user IDs instead of loaded users, since this is just used to record IDs of users to send to anyway.

  4. +  /**
    +   * Sets the last sent at value for this digest.
    +   *
    +   * @param \DateTimeInterface|string $setTo
    +   *   The time to set to. Defaults to now.
    ...
    +   */
    +  public function setLastSentAt($setTo = 'now');
    

    The time to set what to?

DerekCresswell’s picture

For processing the digest, I am unsure where the best spot to update the run date would be. If we consider the queue reliable, we can sent the next date after queuing up the digest. The queueworker does not check if things should be sent, it merely sends brainlessly.
Perhaps this could mean shifting the name from last send to last queued.

Or we could update it each time the queueworker runs. This would mean it is constantly updating to match the last individual email that was sent out.

The reason I had the last send value be set to the next send date was to eliminate the problem of creating a digest on Tuesday that is scheduled for Monday and having it send out on Tuesday the first time. With the date in the future it allowed for the digest to wait until the next scheduled date instead of just running ASAP the first time.

I think the next run date might be the smoother option, that way we do not need to lie.

travis-bradbury’s picture

For processing the digest, I am unsure where the best spot to update the run date would be. If we consider the queue reliable, we can sent the next date after queuing up the digest. The queueworker does not check if things should be sent, it merely sends brainlessly.
Perhaps this could mean shifting the name from last send to last queued.

Or we could update it each time the queueworker runs. This would mean it is constantly updating to match the last individual email that was sent out.

Ahh, that makes sense. Just saving the time stuff was queued seems reasonable for that. In that case, we can probably wait for the actual email functionality to test that stuff worked correctly because then you can test that emails were sent.

DerekCresswell’s picture

Status: Needs work » Needs review
FileSize
22.79 KB
14.22 KB

I've updated this to use the next run date rather than the last. I do believe this has simplified the logic, but there still needs to be methods in place to catch things like the first time the state is accessed. I think what is here is better than something like setting one when a digest is saved as new.

I have not furthered the Queueworker tests here as that will come in the next major issue. I have added that the queuing of digests updates their send dates. I believe this makes sense as I've also made sure the digest queue is reliable. It would make sense to base this time off the queuing rather than the final sent email.

For setting the send date, I've made it so the time given is relative rather then what to set it to. This keeps the schedule on rails and makes it easy to update.

Other mentions in #14 were addressed.

travis-bradbury’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me.

  • DerekCresswell committed bea9887 on 1.0.x
    Issue #3162041 by DerekCresswell, tbradbury, joshmiller: Create queue...
DerekCresswell’s picture

Status: Reviewed & tested by the community » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.