Closed (fixed)
Project:
Hosting
Version:
6.x-0.4-alpha3
Component:
Code
Priority:
Critical
Category:
Bug report
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
9 Jun 2009 at 23:26 UTC
Updated:
18 Jun 2010 at 06:50 UTC
Comments
Comment #1
anarcat commented
Comment #2
anarcat commented
So this is rather weird. While trying to debug this, I was able to propagate a client_email setting from frontend to backend using the command line:
Yet when the same task is run through the cron job instead of being run directly, the proper client_email is not created in the drushrc:
Comment #3
anarcat commented
I tried to enable debugging in the backend to figure out what was going on. That had a really weird effect. Here's what I tried:
The backend command was called properly, but exhausted all available memory:
I tried doing an strace on it and it seemed to be locked in an infinite loop, but I am not sure. Here's the last snippet I got from it:
In fact, running in verbose mode, it seems the single_user_blog profile effectively loops:
So this is not necessarily related. Nevertheless: the install task runs fine (i.e. the install profile is not passed to the backend) when run without --quiet, so maybe the issue lies there.
Comment #4
anarcat commented
I confirm that the backend patch above fixes the issue. I'll try to figure out if it's just a matter of not using --quiet.
Comment #5
anarcat commented
A little context first... As you may well know, Aegir makes heavy use of the interprocess communication features to propagate information between drush calls. One of the things that is sent from the frontend (the hosting module) to the backend (the provision "command set") is the client_email when a site is created. The backend in turn uses it to send the welcome email. After upgrading the frontend from Drupal 5 to Drupal 6, I noticed sites were not created properly anymore. The welcome link wasn't sent, the chosen profile wasn't applied and a bunch of other weird things were happening.
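To make the frontend/backend hand-off described above concrete, here is a minimal sketch of one process passing options such as client_email to a child process. It assumes a plain JSON-over-pipes exchange, which is only an illustration and may not match how drush or Aegir actually implement their IPC; `call_backend`, `BACKEND_SOURCE` and the `mail_sent` field are hypothetical names, and this is not the patch discussed in this comment.

```python
import json
import subprocess
import sys

# Hypothetical backend: reads a JSON object of options from stdin and
# echoes back what it received plus a made-up "mail_sent" status field.
BACKEND_SOURCE = r"""
import json, sys
options = json.load(sys.stdin)
result = {"received": options, "mail_sent": "client_email" in options}
json.dump(result, sys.stdout)
"""


def call_backend(options):
    """Run the hypothetical backend in a child process and return its reply."""
    proc = subprocess.run(
        [sys.executable, "-c", BACKEND_SOURCE],
        input=json.dumps(options),
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    # With client_email propagated, the backend can act on it.
    print(call_backend({"client_email": "client@example.com", "profile": "default"}))
    # Without it, the backend silently has nothing to send a welcome mail to.
    print(call_backend({"profile": "default"}))
```

The second call mirrors the symptom in this issue: the child still runs and returns a reply, but the option it needed never arrived.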
I think this is due to the backend being too picky or the frontend changing in a radical way that the backend is not expecting. I quickly reviewed (again) the changes between Hosting D5 and D6 and couldn't find anything really related to IPC, so I don't know how to fix the problem in the frontend. I can fix it in the backend, however...
So here's the patch that fixes the frontend/backend communication for me. I think this would need to be reviewed by Adrian and the Drush team so I'll put this up in the Drush queue, but this is highly Aegir-specific.
I wonder what that --quiet was there for in the first place...
Comment #6
anarcat commented
Adrian says the fix shouldn't be in the backend, so I'm throwing this back at hosting.
Comment #7
adrian commented
This is working fine on my checkouts of HEAD hosting, HEAD provision, HEAD hostmaster and HEAD drush (although the changes in drush would not have any effect on this).
Comment #8
adrian commented
Comment #9
anarcat commented
I just updated to CVS HEAD everywhere (except drush, which is installed as my 2.0 Debian package) and I still get the bug:
Comment #10
anarcat commented
I've done what could be an interesting test: I applied the above patch and ran two more tasks, a verify of a site that was created before the patch (with client_email = NULL) and an install of a new site. The install succeeded and has the proper client_email, but the verify didn't fix the client_email.
Just a shot in the dark, I guess...
Comment #11
anarcat commented
I installed a new platform, verified it, and the issue went away.
Let's release this damn thing.
Comment #13
anarcat commented
This bug is still present in 0.3. It is intermittent: Steve Parks was able to reproduce it in #524916: User not defined on new platform, but only momentarily. Still, he did reproduce it, and whether it's a local problem, a documentation problem or an actual bug in Aegir or the IPC, the problem *does* exist and we need to find a way to fix it, or at the very least figure out exactly how and why it happens.
So I'm reopening that horrible issue again. It will manifest itself only by the login_url not showing up in the task log and the welcome email not being sent, amongst other things. There are various workarounds that can be used for the welcome email issue (such as #567094: Password reset link).
Comment #14
anarcat commented
Comment #15
omega8cc commented
Not sure if this is the proper thread, but I just noticed something similar when a new site is created using the signup form: the client receives the e-mail with the subject "Your new site - domain - has been created." and with the password reset link, but another e-mail (probably with Aegir account details) is sent empty (no body at all). This is when "Automatically create user accounts for new clients." is enabled.
We are using the latest 0.3 version.
~Grace
Comment #16
Anonymous (not verified) commented
So I've been tearing my hair out trying to debug this (especially hard since I've never reproduced it). I notice in Drupal core's modules/system/system.install, beginning at line 364 or so, we have this:
This is only midway through the install. Now, when this is occurring for users, the task output continues on, i.e. it doesn't break, but completes with missing steps.
The interesting part is that the above gets run during provision's install, where it runs profile-specific tasks etc. prior to doing other things (such as mailing the login URL and welcome message, which is being missed)... then it continues on to do other system-specific stuff like restarting Apache etc.
Where I am going with this is that a task log that 'completes' but with missing tasks is misleading: I am wondering if the 'install'-specific task is failing midway (such as max_execution_time or something being exceeded), and then regular tasks like the Apache restart carry on, since it is a linear series of function calls.
Or would we see the PHP exception for something like the above, max execution time / max memory limit, in the task log? I may be clutching at straws. But I've been thinking about this since debugging a user's problem where this occurred; he was using OpenPublish or one of those Drupal distributions, and his task output took over 80 seconds. It didn't occur (at least with me watching over the server's proverbial shoulder threateningly :) ) when provisioning a site on a regular Drupal 6.13 install. It made me wonder whether that's what we're seeing: the regular Drupal install borking midway and provision carrying on with its job.
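The hypothesis above (an install step dying midway while later steps carry on) can be sketched as follows. This is only an illustration under stated assumptions: the child script, the `login_url=` marker format and the `run_task` helper are all hypothetical and are not taken from provision or drush. The sketch shows that unless the caller checks the exit status and the expected output explicitly, the log ends with the later steps and looks complete.

```python
import subprocess
import sys

# Hypothetical install step: prints progress, then either finishes or dies
# partway through (standing in for a fatal error such as a resource limit).
INSTALL_SOURCE = r"""
import sys
print("profile tasks done")
if sys.argv[1] == "die":
    sys.exit(255)  # abrupt exit before the login URL is produced
print("login_url=http://example.com/user/reset/1")
"""


def run_task(mode):
    """Run the install step, then a later step; return (log, problems)."""
    log, problems = [], []
    proc = subprocess.run(
        [sys.executable, "-c", INSTALL_SOURCE, mode],
        capture_output=True,
        text=True,
    )
    log.extend(proc.stdout.splitlines())
    # Explicit checks: without these the failure would go unnoticed.
    if proc.returncode != 0:
        problems.append(f"install exited with status {proc.returncode}")
    if not any(line.startswith("login_url=") for line in log):
        problems.append("install produced no login_url")
    # The later step runs regardless, as in a linear series of calls.
    log.append("web server restarted")
    return log, problems


if __name__ == "__main__":
    for mode in ("ok", "die"):
        log, problems = run_task(mode)
        print(mode, log, problems)
```

In the "die" case the log still ends with the restart line, which matches the misleading "completed" task output described in this comment.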
Comment #17
omega8cc commented
My case http://drupal.org/node/486934#comment-2006872 is an Open Atrium platform with Pressflow core. Yes, provisioning takes more time than with vanilla Drupal, but I have high enough limits set for PHP timeouts and memory, so I doubt that is the reason for welcome e-mails being sent empty (only a header without an e-mail body), since it happens with every OA site created - 100% reproducible. I haven't had time to debug this yet, but I think it could be a fairly simple mistake - a typo, a wrong variable used, etc.
~Grace
Comment #18
Anonymous (not verified) commented
@omega8cc sorry, my musings were more on the 'placeholder-for-uid' and the total lack of a one-time login reset URL than on your issue of an empty e-mail. I've never actually seen that occur on any Open Atrium sites I've provisioned.
The blank email sounds like a case of drupal_mail being executed, but not finding the right module/hook to get the details from (install_welcome_send_mail() executes in provision/platform/drupal/install_6.inc but maybe doesn't get the variables from install_mail() ).
Out of curiosity, do your platforms live in /var/aegir/platforms/, i.e. /var/aegir/platforms/openatrium ?
I'd be interested if the blank e-mail still occurs on a platform that is in /var/aegir/openatrium ?
Comment #19
anarcat commented
@omega8cc - this seems to be a different issue. The one we are talking about here is that no email whatsoever is sent out. It also happens that the login link doesn't show up in the task log. So I think your issue is different.
Comment #20
adrian commented
Also, @omega8cc
I'm not sure pressflow's improvements offer any benefit to open atrium installs, as they are more oriented towards anonymous users, which open atrium doesn't really have.
Even the db replication stuff is of negligible use because Aegir does not yet allow you to manage that, and you would need to modify Aegir to configure those things.
Comment #21
omega8cc commented
@adrian
Before Pressflow I tried and used most of the performance improvements available as patches for core in Drupal 5.x, since I had to manage fairly big community sites - with more than 10k active users, more than 100k nodes and more than 1.5M comments.
I had to learn a lot from this and I tried almost every patch/advice available, so when I used Pressflow 6.x core the first time and just checked diff -urp on vanilla Drupal and Pressflow 6.x, I noticed all of these improvements are in place and there is more to scale Drupal.
Since I had to deal with many logged-in users, as is the case with OA, I know Pressflow helps to speed up this part as well. Even vanilla OA with Pressflow core (no users, no content) runs faster than vanilla OA. Even just one of those improvements, the removal of LOWER() from queries so that indexes and the MySQL query cache start to work far better, gives you better performance for logged-in users too. It is one of the most important improvements for sites with hundreds or thousands of users, but is also noticeable on a vanilla install.
~Grace
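The point about LOWER() and indexes in the comment above can be illustrated with a small query-plan comparison. This sketch uses SQLite from Python's standard library as a stand-in, which is an assumption for convenience: the comment is about MySQL, whose optimizer and query cache behave differently in detail. The table layout and the index name `users_name` are hypothetical.

```python
import sqlite3


def query_plans():
    """Return SQLite's query plans for a plain and a LOWER()-wrapped lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT, mail TEXT)"
    )
    conn.execute("CREATE INDEX users_name ON users (name)")
    conn.executemany(
        "INSERT INTO users (name, mail) VALUES (?, ?)",
        [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
    )

    def plan(sql):
        # The human-readable plan text is the fourth column of each row.
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user42",)).fetchall()
        return " | ".join(row[3] for row in rows)

    plain = plan("SELECT * FROM users WHERE name = ?")
    wrapped = plan("SELECT * FROM users WHERE LOWER(name) = LOWER(?)")
    conn.close()
    return plain, wrapped


if __name__ == "__main__":
    plain, wrapped = query_plans()
    print("plain:  ", plain)
    print("wrapped:", wrapped)
```

In this stand-in, the plain comparison can use the index on `name`, while wrapping the column in LOWER() forces the engine to evaluate the function for every row, so the plain index no longer helps.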
Comment #22
omega8cc commented
@mig5 and @anarcat
I understand, but I noticed this thread with the "welcome email" problem in its subject and thought it might be related. In any case, I never experienced a problem with system e-mails (with the password reset link when the site is created) and was only curious why another e-mail, with access details for the user/client account in Aegir, is sent blank. I will try to debug this later and post results, if any.
BTW: I don't use /var/aegir/* and all platforms are located in /data/distro/*, while the separate, Aegir-only platform is located in /data/aegir/. This setup is to unify installs in different datacenters, with data disks mounted differently, so we always use a /data symlink pointing to the data disk. And all symlinks created by the patched provision_apache.drush.inc - http://omega8.cc/dev/provision.patch - live in /data/u/*
~Grace
Comment #23
anarcat commented
For the record, I have also received empty emails recently. I have no idea if the two issues (empty and no email) are related.
Comment #24
anarcat commented
At any rate, the key idea here is that we need a way to reproduce interprocess issues much more easily. Normally, errors should be caught and returned to the task log, but for some reason, sometimes that doesn't happen. Long ago I saw memory issues that wouldn't show up in the task log, and there are probably other such conditions that are not properly handled.
Please do test this issue to death and find a way to reproduce it. Any information will help.
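One way to chase an intermittent failure like this is to run the same command repeatedly and record, for each run, the exit status and whether an expected marker appeared in the output. The sketch below is a generic harness under stated assumptions: `run_and_check` and `repeat` are hypothetical helpers, the `login_url=` marker is made up for illustration, and the demo uses a stand-in child script rather than any real drush or provision command.

```python
import subprocess
import sys


def run_and_check(command, marker):
    """Run a command once; report its exit status and whether marker was printed."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return {
        "status": proc.returncode,
        "has_marker": marker in proc.stdout,
        "stderr": proc.stderr.strip(),  # kept so silent errors can be inspected
    }


def repeat(make_command, marker, attempts):
    """Run the command `attempts` times and collect one result per run."""
    return [run_and_check(make_command(i), marker) for i in range(attempts)]


# Stand-in command that omits the marker on even-numbered runs.
FAKE = "import sys; print('login_url=x' if int(sys.argv[1]) % 2 else 'done')"

if __name__ == "__main__":
    results = repeat(
        lambda i: [sys.executable, "-c", FAKE, str(i)], "login_url=", 4
    )
    for i, result in enumerate(results):
        print(i, result["status"], result["has_marker"])
```

Note that every run of the stand-in exits with status 0, so only the marker check reveals the runs that went wrong; that is the kind of condition that would not surface as an error in a task log.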
Comment #25
steveparks commented
After a few weeks of not occurring at all, I've recently had the 'no admin user' problem a lot more frequently - but this time in a regular pattern.
It has happened on alternate installs of sites on d6-based platforms - so I install a site and the setup of the admin user fails, then I delete that site, install it again and the admin user is set up properly and the email is sent. This pattern repeats.
To test mig5's comment at #16 I've upgraded my VPS to have more resources, and will see if the pattern continues.
Comment #26
Anonymous (not verified) commented
Thanks Steve!
So when you say 'alternate installs of sites on d6 based platforms', do you mean using alternate 'distributions' such as OpenPublish, Pressflow, etc?
I did have a hunch it was specific to platforms with install profiles that are more intensive than a regular or even atrium install. Do let us know if the problem persists with giving your server more juice. See if raising max_execution_time etc helps (I'm mentioning that coz I'm not sure what tasks you did when giving your VPS 'more resources', but presume it was this and memory usage etc )
Comment #27
steveparks commented
Hi mig5
By alternate, I mean the sites, not the platforms - and it can be on any platform. It happens on vanilla d6 as much as on Atrium (I haven't tried with OpenPublish or Pressflow). It always fails at first, but then works when I delete the site and try again.
Could it be connected to the creation of site nodes (don't know how) - or something else that only happens the first time a site is installed?
steve
Comment #28
anarcat commented
I'm seeing this bug... again! It's intermittent: on the same platform, some sites get created properly, some don't. I stumbled upon this bug here: #597738: We shouldn't provision install more than once on the same site.
Comment #29
vitis commented
sub
Comment #30
adrian commented
If we don't find a way to reproduce this before the next alpha, I'm going to close this issue, or at the very least bump it to minor.
Comment #31
adrian commented
Yup, this is getting closed now.
If for some magical reason you run into it again, create a new issue and refer back to this one.