Scenario: Drupal 6.x running in a shared-host environment, trying to send 20 messages (very few) correctly on every poormanscron run (~every 5 minutes).

I could log to watchdog, but instead I log to a CSV file using


// Note: the array key must be quoted; a bare "result" raises a notice
// about an undefined constant in PHP.
$session_log = fopen($_SERVER['DOCUMENT_ROOT'] . '/mail.csv', 'a+');
$log_line = date('Y-m-d') . ";" . date('H:i:s') . ";" . $subscription->mail . ";" . $message['result'];
fwrite($session_log, $log_line . "\n");
fclose($session_log);

The first error occurs on the second cron run:
19-01-09 11:25:52 dummy664@sutharsan-fanclub.org 1
19-01-09 11:25:52 dummy663@sutharsan-fanclub.org 1
19-01-09 11:25:53 dummy662@sutharsan-fanclub.org 1
19-01-09 11:25:57 dummy664@sutharsan-fanclub.org 1

Why, oh why?

I could understand multiple copies if the mail function were returning FALSE (not delivered). But that is not the case: $message['result'] is always 1 (TRUE).

Result: a newsletter sent to 1,000 subscribers is usually sent 1,500 times (in this case 1,537 times; check the attached files).

How do you stop Simplenews from behaving in this manner? 20 mails per cron run is rather low, yet still yields this strange error.

Comments

A few suggestions:
* run this test with cron.php instead of poormanscron
* the only way this can happen is if the foreach loop in simplenews_mail_spool is interrupted before it completes.
* check whether this is caused by a visitor interrupting the page load and re-requesting the page. Put a global holding a random number in the code and log it to the CSV too; the global gets a new random value each time a page is requested.
* debug simplenews_mail_spool and downstream (simplenews_mail_spool -> simplenews_mail_mail -> simplenews_mail) to find out if something is causing the repeats.
* log the start of the foreach loop and the simplenews_send_status_update() call.
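The random-global suggestion above could be sketched roughly like this (a hypothetical debugging snippet, not module code; the variable name is made up):

```php
// Hypothetical debugging sketch: log a per-request random value so that
// interrupted / restarted page loads become visible in the CSV log.
global $simplenews_debug_run_id;
if (!isset($simplenews_debug_run_id)) {
  // A new value on every page request: the same value repeating for one
  // address means the same request sent it twice; different values mean
  // a second, overlapping cron request re-sent it.
  $simplenews_debug_run_id = mt_rand();
}
$session_log = fopen($_SERVER['DOCUMENT_ROOT'] . '/mail.csv', 'a');
fwrite($session_log, date('Y-m-d') . ';' . date('H:i:s') . ';'
  . $subscription->mail . ';' . $message['result'] . ';'
  . $simplenews_debug_run_id . "\n");
fclose($session_log);
```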

I have made a few tests and can confirm that sending email using poormanscron may cause emails to be sent multiple times, especially in combination with large batches and slow servers. If a second (or third, etc.) visitor requests a page before the first poormanscron run is finished, a second (or third, etc.) cron batch is started.
What I cannot explain, however, is why the batches in your log are not always a multiple of the batch size; it seems cron runs are sometimes canceled before the whole batch is sent.

@MauMau: please test with poormanscron and a global random as I described above.
Did you change the cron batch size during the course of sending the emails? There were several successful batches of 50 emails.

For production environments I strongly discourage the use of poormanscron to send newsletters.

@MauMau: any test results?

Status:Active» Closed (fixed)

No response, closing the issue. Feel free to re-open if you have additional information.

Not using poormanscron - encountering this issue (duplicate messages sent).
Raised the message limit as high as possible to reduce message delay.
Unclear on the interrupt described above. Is this to say that if there is a page request before the cron job completes, there is a probability of duplication / job restart? That would be very bad... perhaps I misunderstand @sutharsan.

Anyone else seen this?

Status:Closed (fixed)» Needs review

How can one advise against poormanscron while some sites have a genuine need for that module, with sound reasons why people use it? Clearly crontab and cron.php don't do the work poormanscron does, which is necessary in some situations.

So let's focus on where the error comes from and how to resolve it: simplenews, poormanscron, or both?

Has there been an issue filed on poormanscron too?

The simplenews module's page still - after a year - refers to this issue page, so either this issue was never resolved, or someone forgot to remove that text?

Version:6.x-1.0-rc3» 6.x-1.x-dev
Attachment: new file, 2.51 KB

I have written a patch against 6.x-1.x-dev.

There is a structural issue in the current spool selection that pops up when simplenews_mail_spool is executed before the previous simplenews_mail_spool has finished. This is the case when using poormanscron: poormanscron is not able to detect fast enough that it has already run, so on busy sites it will start simplenews_mail_spool several times within a second.
The best solution is to never ever use poormanscron and get a proper host...

My patch tries to resolve the issue by introducing a "working" status in the mail spool. This is not 100% failsafe with poormanscron. A better solution would be to first update the spool with a unique working ID AND set the status to working (WHERE status = pending LIMIT 0, ...), and then select the msids by working status and that unique working ID. I'm not sure, but you can probably use the PHP function getmypid() to generate the unique working ID. It requires an adaptation of the db table (a workingid column), which is why I didn't create that patch... the current patch works for me, also because I never use poormanscron.

Plz test it with poormanscron... tnx!
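The claim-then-select idea described in the comment above could look roughly like this (a sketch only, not the attached patch; the workingid column and the SPOOL_* constants are assumptions):

```php
// Sketch of the claim-then-select idea (Drupal 6 query style).
// Assumes a 'workingid' column has been added to {simplenews_mail_spool}.
$workingid = uniqid(mt_rand(), TRUE);

// Step 1: atomically claim a batch of pending messages. A concurrent cron
// run doing the same UPDATE can only claim rows that are still pending.
db_query("UPDATE {simplenews_mail_spool}
  SET status = %d, workingid = '%s'
  WHERE status = %d LIMIT 100",
  SPOOL_WORKING, $workingid, SPOOL_PENDING);

// Step 2: select only the rows this run claimed, then send those.
$result = db_query("SELECT * FROM {simplenews_mail_spool}
  WHERE status = %d AND workingid = '%s'",
  SPOOL_WORKING, $workingid);
```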

"How can one negatively advice against poormanscron"

Simple... because it is not such an easy problem. It is not necessarily a problem in poormanscron, but depending on poormanscron for tasks that must be executed exactly once is unwise in general.

Isn't it true that poormanscron will be part of Drupal 7?
So it would be nice for this module (and necessarily the 7.x version) to work with it...

I disabled poormanscron and then sent my emails. Would it be okay to enable it again? I just don't want to suddenly see a mass of duplicate emails being sent to my subscribers.

This issue caught my attention because I'm working on a module that will allow Simplenews to send from multiple SMTP servers at once using several threads. I haven't really used Poormanscron, but this solution should fix the problem there too.

I added two columns to simplenews_mail_spool that are used to note which process is handling a message record. This is similar to the patch in comment #7, which marked them as working, but includes VDMi.Frans' idea of using getmypid().

When simplenews_get_spool() is called during a cron run, it will claim a batch of messages by setting the current PHP process ID on the records, along with the current timestamp. This timestamp is used to clean up after processes that fail: if the timestamp is more than an hour old, other processes may claim that message. Note that the function first sets the process ID on a batch of messages, and then selects those messages - this way, there is never a race condition which results in two processes sending the same batch of messages.

My threaded sending module is nothing more than a test module at this point, but I've attached it here so that others can try my testing method. There are usage notes in the comments on the module.
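The claim step described above might look roughly like this (a sketch under assumed column names, not the attached patch):

```php
// Sketch: claim a batch by process ID + timestamp, reclaiming stale rows.
$pid = getmypid();
$now = time();
$stale = $now - 3600;  // claims older than an hour are considered abandoned

// Claim unclaimed messages, plus any claims that appear abandoned.
db_query("UPDATE {simplenews_mail_spool}
  SET claimed_pid = %d, claimed_time = %d
  WHERE claimed_pid = 0 OR claimed_time < %d
  LIMIT 100", $pid, $now, $stale);

// Send only what this process actually claimed in this run.
$result = db_query("SELECT * FROM {simplenews_mail_spool}
  WHERE claimed_pid = %d AND claimed_time = %d", $pid, $now);
```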

BrockBoland, please note the PHP docs:
"Process IDs are not unique, thus they are a weak entropy source. We recommend against relying on pids in security-dependent contexts."

By default, Apache runs e.g. 20 scripts in parallel with the same PID. (See the Apache/PHP integration specs.)

Note that, by specification, this might still result in multiple sends:
If a cron.php hit lands on the same PID (while another cron.php run is in progress), it will claim another N items and resend N + (N - M) items, where M is the number of items the previous cron.php hit had already sent.

In the poormanscron case, the chance is high that this will happen at some point.

If we want to seriously support poormanscron and other runtime issues we should introduce something like:
http://api.drupal.org/api/function/lock_acquire/6
http://api.drupal.org/api/drupal/includes--lock.inc/6

Will one of you take the ball and implement this for 6.x-2.x so we can test it?
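A minimal sketch of what the lock-based approach could look like (the wrapper function is hypothetical; lock_acquire()/lock_release() are the core lock API functions linked above):

```php
// Sketch: guard spool allocation with the Drupal lock API so that two
// overlapping cron runs cannot claim the same batch of messages.
function simplenews_get_spool_locked($limit) {
  $messages = array();
  if (lock_acquire('simplenews_spool')) {
    // Critical section: only the allocation runs under the lock; the
    // (slow) sending happens after the lock is released.
    $messages = simplenews_get_spool($limit);  // existing spool selection
    lock_release('simplenews_spool');
  }
  // If the lock was not acquired, another process is already allocating;
  // simply skip this cron run.
  return $messages;
}
```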

Status:Needs review» Needs work

miro_dietiker: yikes, I didn't catch that. I'll do some more research and see if I can find a better way to handle that.

I'm leaning away from using the locking functions. Whether it winds up in the module or not, I need to find a solution that will allow for multiple processes to be sending messages at the same time, so I'm going to continue working down that path.

BrockBoland

Locking is generally not bad for concurrent sending processes. Don't lock during the time you send, but e.g. during the time you allocate. Well, depending on the implementation, it could also be possible without a lock.

If you have 10,000 mails, and you have "runners" that send e.g. 100 mails on each hit, then each run has an allocation phase.
Build a unique allocation ID (any unique ID you like). Lock briefly, allocate 100 emails, free the lock... and start to send.
There are also other ways I can think of...

Note that the Drupal (7) queue allows multiple runners and implements the concept on an abstract level.
http://drupal.org/project/drupal_queue
http://api.drupal.org/api/drupal/modules--system--system.queue.inc/7/source
http://api.drupal.org/api/group/queue/7

That sounds reasonable.

I haven't had a chance to work on this in the past few days, and won't for about two more weeks, unfortunately. But, I think I can just swap the session ID in for the process ID. The session ID will be unique to each request, which solves the issue you pointed out with the process ID not necessarily being unique. This would also remove any need for the locking functionality: each process would allocate its messages using a SQL UPDATE statement, and the DB will handle locking the table during that UPDATE.

Note that there are modules that might stop building session IDs for anonymous user sessions...
Since some systems ignore sessions, building sessions leads to an enormous number of sessions for one single client that e.g. crawls the site... I might be wrong, but I believe this is one of the features of the new session handling system in D7...

I still don't get why not to use a single unique ID, something like md5(mt_rand()).
BTW: if we add a table that has a primary key for each sending cron run (that sends data), we would have our unique cron-run primary ID.

Yes, a clean UPDATE will lock cleanly enough.

So here we go -- looking forward to hear from you in 3 weeks or so with something like an update? ;-)

Version:6.x-1.x-dev» 6.x-2.x-dev

Will someone help us introduce a lock to reduce race conditions?
I'd be happy to improve the situation, if possible, for the next 6.x-2.x release.

I'm finally doing some real work with this module and have made some tweaks to the previous patches I submitted. I'll post a new one next week.

Attachment: new file, 4.21 KB

Updated patch attached. Very similar to the previous patch, but uses a random number for each process.

I have tested this with multiple SMTP servers and a module I wrote to send simplenews mail to multiple servers at once. These two modules can be grabbed here if you'd like to try testing with them:

http://github.com/brockboland/multi_smtp
http://github.com/brockboland/simplenews_threaded_send

(once I finish and polish those two, I'll use them to apply for a CVS account, so hopefully they'll be on d.o soon)

Version:6.x-2.x-dev» 6.x-1.x-dev
Status:Needs work» Needs review

BrockBoland: Very nice! Well-abstracted setup. I'll need some time to review it.

Since you're changing the architecture (adding columns to the DB), I'd really like to fix this in 6.x-2.x. Did you switch back to 1.x intentionally?
This sending part is pretty critical and needs thorough testing before being replaced in 1.x. If possible, I'd like to avoid touching the database structure in 1.x in its present state.

miro_dietiker: I did switch it back to 1.x intentionally, because that's the version I've been working against. I haven't looked at 2.x at all yet, but it should be easy enough to port over. Should we open another ticket for 2.x? I'm not sure how this is usually managed, but since I'm launching a production site in a couple weeks, I need the 1.x fix for right now.

BrockBoland,
I just realized that there happened a complete overhaul for D7 in:
#338015: Improve simplenews throttle

I think we should combine efforts and think about the different approaches.

miro_dietiker: I don't have time available to work on the D7 version right now, sorry.

BrockBoland, i'm also interested in a suggestion for a D6 merge. We could port it to a D7 upgrade later.

Version:6.x-1.x-dev» 6.x-1.3

The same thing happens *without* using poormanscron. We use regular cron + cron.php and still all mails get sent twice (one to the user's mail and one to the site mail). So this might not only be an issue with poormanscron.

One to the site mail?
Simplenews doesn't send mails to the site mail. There must be some other module making this adjustment / redirection.

Note that there's an issue with subscription duplication on node edit after choosing "send" once. NEVER edit a node and hit save once you have started sending.

Hi,

Can you tell me if the latest version of your module still has the same problem of multiple email sending?

Thanks.

Version:6.x-1.3» 6.x-2.0-alpha1

Please check / test 6.x-2.x and report your experience.

The README for 6.x currently says to use poormanscron, but the project page says don't and refers here. Just thought a heads-up might be useful.

Version:6.x-2.0-alpha1» 6.x-2.x-dev

Issue tags:+patch
Attachment: new file, 4.86 KB

Hi,

Here is another patch against 6.x-2.x, using lock_acquire to create a real critical section for fetching the mails and setting an in-between status.

Note:

This patch is untested as I ran out of time with it, but it _should_ work and be pretty easy.

Please test it and apply it (if it works well enough).

Best Wishes,

Fabian

It goes too deep into the sending engine; I'll let someone more experienced with Simplenews review it.

@Miro, Sutharsan, whenever you have the time...

Doesn't look that bad to me.

However, there is some discussion in a separate issue about supporting multiple sender processes for horizontal scaling.
That needs an additional identifier for each sending process to make sure things proceed well when multiple servers acquire in-progress newsletter issues (horizontal scaling).

In any case this needs much more review since the sending process is very critical to simplenews.

Hallo Miro,

I do agree that this needs much more testing and careful consideration.

However, note that this will already scale very well with several processes, as each process will grab and mark a bunch of mails atomically (that is why a critical section is used).

Unless you need reporting on "who is processing what" an additional process identifier is not necessary and won't help anyway.

Best Wishes,

Fabian

Attachment: new file, 5.57 KB

Added some comments, removed some trailing whitespace and made some other small changes to the patch from comment #32.
Tested with 2 processes sending 20,000 mails concurrently, and it worked fine.

Issue tags:-patch

There seems to be some issue with your patch, as I can't download it (and therefore can't review it ;)). Maybe the # in the name?

Attachment: new file, 5.57 KB

Hi Simon, you are right.
Sorry that I used a character in the filename which should not be used.
So here is the patch from comment #36 again, but now without the '#' character.

Status:Needs review» Needs work

I didn't mention it earlier, but as the patch uses the lock API, shouldn't we add a requirement that Drupal 6.16 is needed (at least in the documentation or, better, using hook_requirements)? There are still a lot of older installations.

I now agree with this patch generally. (Already discussed with vasudeva here about it.)

Simon, I agree with adding that to the documentation.
A requirements hook would be an option, but it is still work. ;-)

I think if people update to SN 6.x-2.x we can expect/require them to update Drupal too, and not check everything everywhere.
(Drupal won't break; cron would break...)

Regarding $limit, I'd prefer $limit = NULL for the unlimited case.
And we should even switch to db_query_range() for limited results and db_query() without LIMIT to fetch everything.
Using LIMIT in the query string is NOT db-independent.

Note that the documentation is somewhat lacking. Please follow the standards for the doc notation:
(the first line describes the function in one sentence, then an empty line, then the details; the return format needs to be defined cleanly with @return.)
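The db_query_range() point could be illustrated like this (a sketch; the query and the $status variable are placeholders, not the patch code):

```php
// Sketch: DB-independent limiting with db_query_range() (Drupal 6 API)
// instead of appending LIMIT to the query string by hand.
if (isset($limit)) {
  $result = db_query_range("SELECT * FROM {simplenews_mail_spool}
    WHERE status = %d", $status, 0, $limit);
}
else {
  // $limit = NULL: fetch everything, no LIMIT clause at all.
  $result = db_query("SELECT * FROM {simplenews_mail_spool}
    WHERE status = %d", $status);
}
```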

@Miro, I agree with all of your remarks.
@Vasudeva, if you do the modifications suggested by Miro, you can count on me to commit it.

Attachment: new file, 5.83 KB

OK, the work described by Miro in comment #40 should be done.
There is also a change in the simplenews_count_spool() function: I removed the timestamp condition from the SQL statement because otherwise mails in the "in progress" state are counted as sent.
So here is the new patch.

Status:Needs work» Needs review

Needs review, then ! ;-)

Status:Needs review» Fixed

Committed to the repo after cleaning up some API documentation errors.

Please open a new issue if you still want to improve scaling with e.g. additional features.

Poormanscron should now be supported cleanly.

Thank you guys for your help on this one!

Version:6.x-2.x-dev» 7.x-1.x-dev
Status:Fixed» Patch (to be ported)

Shouldn't we port that ?

Yes, we should... but we currently don't have enough time to port it.

Your help appreciated.

Title: Multiple emails revisited - 50% too many » Use locking to eliminate duplicate emails from concurrent simplenews cron runs.
Status:Patch (to be ported)» Needs review
Attachment: new file, 33.21 KB

Applied the patch in #42 to a current 7.x-1.x checkout and made corrections, mostly in the doxygen comments.

Needs testing.

Great!

@Vasudeva: Thanks for the clean up work!

And @Miro: Thanks for commiting it!

Status:Needs review» Needs work

pillarsdotnet: thank you for the port. Could you please pull, merge and provide a new patch?
Please don't mix code formatting with added value; merges are very likely to fail and it's more work to merge manually.

Status:Needs work» Needs review
Attachments: new files, 25.02 KB and 5.26 KB

Re-rolled and broken into code-only and cleanup-only patches. Must be applied in that order.

Status:Needs review» Needs work

Pillarsdotnet, thank you very much.

I hope you can understand that this helps me a lot to review things in limited time.

Committed the functional code, but the cleanup patch failed to apply... What's wrong with it?
Can you please take another look at it?

BTW: Using a git sandbox project on d.o where you start as a fork would be VERY handy...
Without patching I could simply add your source as a remote and merge... you could even use branches for specific topics.

Re-merged.

The problem with sandbox projects is that they're forever. You can't delete them when you're done. I already have one sandbox project that is totally irrelevant, has no code and no releases, and I can't delete it. I find that very annoying and I don't want to repeat the experience.

However, if you give me access, I'd be willing to commit per-issue branches to the simplenews tree, which you could then merge or delete as you see fit.

Status:Needs work» Needs review

Status:Needs review» Fixed

Committed, pushed.

Thank you very much. I really hope we can improve the (self-)documentation of the module in the D7 release.

d.o git repositories and workflows need a lot of tweaking... I strongly hope we'll have improved workflows very soon. Please join the git improvement discussion and make suggestions to help us improve the platform experience.

"Please join the git improvement discussion and make suggestions to help us improve the platform experience."

Do you have a link?

Status:Fixed» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Version:7.x-1.x-dev» 6.x-2.0-alpha2
Category:bug» support

Hi guys...
Just a quick question: the date on the zip download for D6 is 2011-Apr-17, but I have read through this thread and your last message was written in June... I understand there is a transaction 'collision' between cron and SN. Anyway, is there any other problem with using the two modules in parallel? I need poormanscron and would like to use a mail module, especially this SN if possible :) Two outgoing mails would be bad, but using SN on a low-traffic site shouldn't be too much of a problem, if I understand correctly... anyway, two mails is better than nothing :)
Hm... I have done some DB work, and we usually use two fields to detect this type of problem: one for the state ('the record is being edited', e.g. sending) and one for the date/time when the editing started.
We usually start the job with a check of the edited states/times: if the date is older than X (1-2 hours?), the job probably isn't running anymore. But I don't know much about MySQL transactions; this may be a bad solution here...

Version:6.x-2.0-alpha2» 7.x-1.x-dev
Category:support» bug

This was a clear bug report with a fixing history, and it went through Drupal versions until it was finally fixed. Even if you think you have a support question about this issue (originally a bug), you're hijacking it!

Create a new issue with your support question!