I tried to install Panopoly which failed (#1837312: Troubles Implementing Default Permissions for Panelizer).
The anoying thing is that log in hostmaster was not helping.

There are a lot of green lines, and some white. But none red...

After the standard site setup stuff it gets interesting with:

Backend invoke: php /usr/share/php/drush/drush.php --backend=2 --yes @panopoly2.dev.example.com provision-install-backend --client_email='webmaster@example.com' 2>&1

But then the inconspicuous line:
Undefined index: login_link install.provision.inc:60

And provision begins to clean up the site.

Is this related to #1166530: Aegir use of $integrate = FALSE in drush backend invoke?

If we don't want to integrate the full backend output, could we at least print a summary line to state a failure?

Comments

anarcat’s picture

We should get the full log in the frontend, that's for sure.

helmo’s picture

This is still bugging me every now and then.

A usecase:

The http://drupal.org/project/hosting_site_backup_manager module.
When you run a 'Export backup' task a second time it fails.

The reason is clear, the link this task is trying to create already exists. (I should aad a test for that ;))

In the aegir frontend I only see green and white lines. But still the task itself is 'Failed'.

When I run the same task on the commandline:

$ drush --debug @www.example.com provision-export_backup /var/aegir/backups/www.example.com-20130503.144907.tar.gz
Bootstrap to phase 0. [0.01 sec, 3.04 MB]                            [bootstrap]
Drush bootstrap phase : _drush_bootstrap_drush() [0.02 sec, 3.21 MB] [bootstrap]
Loading drushrc "/var/aegir/.drush/drushrc.php" into "home.drush" scope. [0.02 sec, 3.22 MB]                      [bootstrap]
Loaded alias @www.example.com from file /var/aegir/.drush/www.example.com.alias.drushrc.php [0.03 sec, 3.23 MB]       [notice]
Loading drushrc "/var/aegir/platforms/myplatform/sites/www.example.com/drushrc.php" [bootstrap]
into "site" scope. [0.03 sec, 3.23 MB]
Loading drushrc "/var/aegir/platforms/myplatform/sites/www.example.com/drushrc.php" [bootstrap]
into "site" scope. [0.05 sec, 4.34 MB]
Cache HIT cid: 5.8-commandfiles-0-79226bd154e24f6bec355c2411ac7322 [0.06 sec, 4.35 MB]                                                 [debug]
Bootstrap to phase 0. [0.15 sec, 9.65 MB]                                                                                          [bootstrap]
Bootstrap to phase 0. [0.21 sec, 9.65 MB]                                                                                          [bootstrap]
Found command: provision-export_backup (commandfile=hosting_site_backup_manager) [0.22 sec, 9.65 MB]                               [bootstrap]
Loaded alias @myserveralias from file                                                                               [notice]
/var/aegir/.drush/myserveralias.alias.drushrc.php [0.74 sec, 9.68 MB]
Loaded alias @server_master from file /var/aegir/.drush/server_master.alias.drushrc.php [1.17 sec, 9.7 MB]                [notice]
Loading mysql driver for the db service [1.3 sec, 9.82 MB]                                                                            [notice]
Loading apache driver for the http service [1.31 sec, 9.91 MB]                                                                        [notice]
Loading mysql driver for the db service [1.31 sec, 9.92 MB]                                                                           [notice]
Loading apache driver for the http service [1.31 sec, 9.92 MB]                                                                        [notice]
Loaded alias @myplatformalias from file                                                                     [notice]
/var/aegir/.drush/myplatformalias.alias.drushrc.php [1.57 sec, 9.93 MB]
Start export backup command [1.72 sec, 9.99 MB]                                                                                    [ok]
Calling chdir(/var/aegir/backup-exports/myclient) [1.72 sec, 9.99 MB]                                                   [debug]
Executing: ln /var/aegir/backups/www.example.com-20130503.144907.tar.gz .
  ln: creating hard link `./www.example.com-20130503.144907.tar.gz': File exists
Calling chdir(/var/aegir/backups) [1.72 sec, 9.99 MB]                                                                 [debug]
Drush was unable to export the backup.                                                                                             [error]
The specific errors are below:
ln: creating hard link `./www.example.com-20130503.144907.tar.gz': File exists [1.72 sec, 9.99 MB]
Command dispatch complete [1.72 sec, 9.94 MB]                                                                                         [notice]
Peak memory usage was 11.2 MB [1.73 sec, 9.94 MB] 

So the drush_shell_cd_and_exec() call gives an error, the else branch of the if logs it. But the output is lost....

anarcat’s picture

Priority: Normal » Critical

I wonder if this is related with #1982502: Reduce complexity of backend invoke; consider removing concurrency - it seems that between drush 5.4 and 5.5, DRUSH_BACKEND packets started being redirected to stderr (see comment #4). In the debian package, I fixed this problem by merging stderr to stdout (see #2002076: the debian package shows a lot of drush backend garbage).

Also, I believe this is a critical bug that should be fixed before 2.0 is released.

anarcat’s picture

Status: Active » Needs review

i got a patch that sends the drush packets in the right place in #1982502: Reduce complexity of backend invoke; consider removing concurrency

helmo’s picture

I also assumed that this was caused by #1982502: Reduce complexity of backend invoke; consider removing concurrency

I now have a Debian Wheezy, php 5.4, Drush 6.x-dev(with drush-correct-minor-defects-1982502-40.patch) , Aegir 6.x-2.x,

A task has a failed status, but no red log lines.

From the command line:
drush @hostmaster hosting-task <task-nid> --force

Gives no output lines there. Viewing the task node now does include the error lines.

A retry initiated from the frontend results in a log without the error lines.
---
Disclaimer: I'm messing with devshop and valkyrie

helmo’s picture

Running such a task via the regular cron hosting-dispatch also works OK.

So I guess it's a hosting-queued daemon issue

Cron does: drush.php '@hostmaster' hosting-dispatch

Queued does: php /usr/bin/drush @hostmaster hosting-queued

Do these invoke in a different way?

ergonlogic’s picture

In hosting-queued, we invoke like so:

drush_invoke_process('@self', 'hosting-task', array($task->nid));

whereas in hosting-dispatch, we invoke a queue command (hosting-tasks), which in turn does:

drush_invoke_process('@self', "hosting-task", array($task->nid), array('invoke' => TRUE), array('fork' => TRUE));
ergonlogic’s picture

First I tried just changing hosting-queued to invoke like hosting-dispatch in hosting_tasks_queue():

drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE), array('fork' => TRUE));

This resulted in getting the error output into the front-end task log, but resulted in an error in queued:

This task is not queued, use --force                                 [error]

So it looks like queued is trying to run the task twice. In fact, this is what happens when we pass '--force':

drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE, 'force' => TRUE), array('fork' => TRUE));

If we remove the 'fork' backend option, we lose the output in the front-end task, but the task doesn't try to run twice:

drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE));

If we instead set the 'fork' backend option to FALSE, we don't get the output in the front-end, but we get a bunch of output from hosting-queued:

drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE), array('fork' => FALSE));

I'm not really sure what's going on here, nor how to fix it.

ergonlogic’s picture

First I tried just changing hosting-queued to invoke like hosting-dispatch in hosting_tasks_queue():

drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE), array('fork' => TRUE));

This resulted in getting the error output into the front-end task log, but resulted in an error in queued:

This task is not queued, use --force                                 [error]

So it looks like queued is trying to run the task twice. In fact, this is what happens when we pass '--force':

drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE, 'force' => TRUE), array('fork' => TRUE));

If we remove the 'fork' backend option, we lose the output in the front-end task, but the task doesn't try to run twice:

drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE));

If we instead set the 'fork' backend option to FALSE, we don't get the output in the front-end, but we get a bunch of output from hosting-queued:

drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE), array('fork' => FALSE));

I'm not really sure what's going on here, nor how to fix it. The Drush documentation of these invoke commands is pretty opaque.

helmo’s picture

I just tried "changing hosting-queued to invoke like hosting-dispatch in hosting_tasks_queue():"

diff --git a/queued/hosting_queued.drush.inc b/queued/hosting_queued.drush.inc
index d4474c1..fe7abea 100644
--- a/queued/hosting_queued.drush.inc
+++ b/queued/hosting_queued.drush.inc
@@ -103,7 +103,7 @@ function drush_hosting_queued() {
       sleep(1);
 
       // Execute the task in the backend
-      drush_invoke_process('@self', 'hosting-task', array($task->nid));
+      drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE), array('fork' => TRUE));
       drush_log(dt('Finished executing task.'));
 
       // Delay for a configurable amount of time.

After starting the queue-daemon manually with --verbose:
I also saw the "This task is already running, use --force" error three times.. also three invokes for the task... odd, but the task succeeds.

Setting the hosting_queued_post_task_delay variable to 30 seconds, prevented these from occurring.

So... concurrency?

the queue daemon loops... finds a task, forks, and restarts the loop to look for a new one....
The "This task is already running, use --force" seems to be what's saving us from having the task executed multiple times... a good thing.

Setting the hosting_queued_post_task_delay variable to 2 seconds, reduced it to just one already running error.

So this side effect of forking seems to be just a race condition that's already being caught. drush_hosting_task_validate updates the task status, maybe that can be done earlier.... ? But more important, we do get the full log.

ergonlogic’s picture

Status: Needs review » Fixed

Okay, so I think we've identified the proper solution here. So this is fixed 850e53cb. We've also identified a new bug, which I've documented in #2025173: Possible race condition with _hosting_get_new_tasks().

Feel free to revert the commit and re-open this issue, if anyone feels we've taken the wrong approach.

anarcat’s picture

Status: Fixed » Needs review

So i changd that to --quiet instead of fork, because that's what we were looking for. This still needs some testing, but forking definitely is not the right way to do this. To quote my commit:

commit 8b85ad495fcb720fbd062c8098daf9a02a965b0b
Author: Antoine Beaupré <anarcat@koumbit.org>
Date:   Fri Jun 21 15:30:08 2013 -0400

    don't fork when we fire up tasks with the queue daemon

    the whole idea of the queue daemon is to process one task at a time
    and wait for them to be processed. if we fork, we lose track of them
    and they will pile up like crazy, and we really don't want that.

    instead, use --quiet to start the drush sub-process, which is the
    side-effect of fork=TRUE that we were looking for. masochists can look
    in _drush_backend_invoke() in drush 5's backend.inc (line 984) for
    this marvel of engineering.

diff --git a/queued/hosting_queued.drush.inc b/queued/hosting_queued.drush.inc
index fe7abea..855dd00 100644
--- a/queued/hosting_queued.drush.inc
+++ b/queued/hosting_queued.drush.inc
@@ -103,7 +103,7 @@ function drush_hosting_queued() {
       sleep(1);

       // Execute the task in the backend
-      drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE), array('fork' => TRUE));
+      drush_invoke_process('@self', 'hosting-task', array($task->nid), array('invoke' => TRUE, 'quiet' => TRUE));
       drush_log(dt('Finished executing task.'));

       // Delay for a configurable amount of time.
anarcat’s picture

Status: Needs review » Needs work

so this doesn't work, so there's got to be another magic thing going on here.

ergonlogic’s picture

Unfortunately, that doesn't work. Here's the output after that change:

php /usr/share/drush/drush.php  --backend=2 --invoke --quiet --verbose --debug --uri=aegir2.aegir2.local --root=/var/aegir/hostmaster-6.x-2.x  hosting-task 212 --context_type=site --platform='@platform_hostmaster' --server='@server_master' --db_server='@server_localhost' --site_path=/var/aegir/hostmaster-6.x-2.x/sites/aegir2.aegir2.local --site_enabled --language=en --client_name=admin --redirection= --cron_key= --profile=hostmaster 2>&1

and here's what was happening with 'fork':

Calling proc_open(php /usr/share/drush/drush.php  --invoke --verbose --debug --uri=aegir2.aegir2.local --root=/var/aegir/hostmaster-6.x-2.x  hosting-task 212   --context_type=site --platform='@platform_hostmaster' --server='@server_master' --db_server='@server_localhost' --site_path=/var/aegir/hostmaster-6.x-2.x/sites/aegir2.aegir2.local --site_enabled --language=en --client_name=admin --redirection= --cron_key= --profile=hostmaster 2>&1 --quiet &);

The proc_open() is apparently what's making this work. So here's what happens when we replace 'fork' with 'interactive':

Calling proc_open(php /usr/share/drush/drush.php  --invoke --verbose --debug --uri=aegir2.aegir2.local --root=/var/aegir/hostmaster-6.x-2.x  hosting-task 212   --context_type=site --platform='@platform_hostmaster' --server='@server_master' --db_server='@server_localhost' --site_path=/var/aegir/hostmaster-6.x-2.x/sites/aegir2.aegir2.local --site_enabled --language=en --client_name=admin --redirection= --cron_key= --profile=hostmaster > `tty` 2>&1);
sh: cannot create /dev/pts/1: Permission denied

This appears to fail because of the redirect to 'tty' in interactive. Commenting out the relevant bit from _drush_backend_generate_command() in includes/backend.inc in Drush makes this work properly:

  else {
    // TODO: `tty` is not usable on Windows.  Is this necessary at all, and if so, is there a better way to do it?
//    if (!drush_is_windows() && !empty($backend_options['interactive'])) {
//      $command .= ' > `tty`';
//    }
  }
anarcat’s picture

I have opened #2025527: > `tty` is evil and killed my cat for this, let's see if we can factor this into drush 5, even as a custom package or something...

So note that this affects only hosting-queued, and only if it is ran through sudo aegir. If you fix your terminal so that aegir can actually write to `tty`, you should be able to use #interactive without problems.

ergonlogic’s picture

Status: Needs work » Postponed

This is fixed in a33d62da4e by switching to use 'interactive' mode. Unfortunately, it requires the patch in #4 of #2025527: > `tty` is evil and killed my cat. Marking as 'postponed' pending resolution of that issue, so we don't lose track of this.

Steven Jones’s picture

Can we instead introduce something into Drush that does something like 'interactive' but only does the thing that we want, which I think it returning all the output at the end rather than sending packets as the command runs?

ergonlogic’s picture

Is there some benefit to getting all the output at the end? The AJAX stuff you did for Aegir 3.x, that allows our task log to be updated live is really cool. Wouldn't switching away from packets break this?

ergonlogic’s picture

Status: Postponed » Needs review

#2025527: > `tty` is evil and killed my cat has been committed to Drush 8.x-6.x, and I've requested it be backported to Drush 5. Perhaps we should revisit #2002804: [meta] depend on Drush 8.x-6.x?

anarcat’s picture

Status: Needs review » Postponed

I would prefer lobby for inclusion in drush 5.10.

anarcat’s picture

Status: Postponed » Active

so, what's the status here? we're waiting for drush 5.10?

ergonlogic’s picture

I believe so. Moshe want a 5.10 too, so it'll happen sooner or later. The issue to track is https://github.com/drush-ops/drush/issues/71

helmo’s picture

Status: Active » Fixed

The 5.10 drush release is there... so whatever drush major version we ship with this issue should be fixed now right?

anarcat’s picture

let's assume so, we'll see soon enough...

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

cleanup