You have VCS write permissions as well as other permissions to this project:
http://drupal.org/project/httprl

Comments

gielfeldt’s picture

Thanks for the access to the project. I also got your mail about this.

I look into this when I get the time, however other pressing matters have emerged, so it probably won't be in the near future :-(

I'll get back to you on this. I might create a new branch in HTTPRL, as I suspect my expections/intentions for the API in HTTPRL might cause a major revision bump.

I've also received concerns about a "dependency hell". Background Process was made by abstracting reusable compononents from Ultimate Cron, so for Ultimate Cron the dependency list currently looks like:

Ultimate Cron -> Background Process
Background Process -> Progress

Which would then turn into:
Ultimate Cron -> Background Process
Background Process -> HTTPRL
Background Process -> Progress

Still well within the spirit of reusing modular components IMO, but I have to consider pragmatic requests.

Nevertheless, I'll look at putting my ideas into HTTPRL to make it more flexible and to gain advantage of the focus on the http requesting itself, which that project has.

mikeytown2’s picture

As for some ideas with HTTPRL, I was thinking of a callback parameter (for you) at the bottom and an alter hook at the top (for this #1325662: Usage and/or caching of gethostbyname).

I've done most of my planed dev for HTTPRL now; just got this bug to fix (#1354282: HTTPS connections going missing) and I will be releasing a 1.3 version of HTTPRL. Also note that 6.x and 7.x are the same.

mikeytown2’s picture

More dev work has been done and HTTPRL looks a lot closer to this project now. It supports callbacks and background callbacks after retrieving a HTTP request; background callbacks can be blocking or non-blocking. I also support calling any function in a blocking or non-blocking manner; if using blocking, pass-by-reference works.

We should look into merging the 2 projects together as we have the same goals.

gielfeldt’s picture

I haven't looked at HTTPRL lately, but if they are nearly identical, I agree, we should join them.

Sounds nice with pass-by-reference when using blocking. This could allow callers to parallelize function calls in a very transparent manner.

mikeytown2’s picture

mikeytown2’s picture

I would like to put all of httprl into your module. My ultimate goal is to have the http client code get into Drupal 8. httprl can operate at the database bootstrap level; supporting this in background process would be a good thing.

Some of the things httprl is currently missing is Pool management & Load balancing. I think both of these use cases can be solved in a 3rd party code by using hook_httprl_request_alter(); which allows you to alter the request right before the connection is made.

Let me know how you wish to proceed. I've skimmed your code; if you can do the same that would help both of us out.

gielfeldt’s picture

I'll look through the HTTPRL code. I use mainly Background Process for Ultimate Cron, which were the reason Background Process was made. If we are to separate the pool management and load balancing from the main module, I might need to coordinate a bit first. hook_httprl_request_alter() sounds interesting though. I think I actually tried this approach too to begin with, but then ran into problems, though I can't remember what? Perhaps I was just attacking it from the wrong angle.

I think it would be nice to the see support for non-blocking requests in core. I would have begun looking joining our modules sooner, but haven't really had the time. HTTPRL does all what I need for the http request part and seems more maintained there. I would though like for the API to be a bit more like Background Process.

The HTTPRL code gave me some issues when nesting requests due to the static variable storage of the requests making it a bit global'ish.

    // Set the blocking mode.
    $options = array(
      'blocking' => FALSE,
    );
    // Queue up the requests.
    $max = 10;
    for ($i=1; $i <= $max; $i++) {
      // Build URL to a page that doesn't exist.
      $url = httprl_build_url_self('asdf-asdf-asdf-' . $i);
      httprl_request($url, $options);
    }
    // Execute requests.
    $request = httprl_send_request();

    // Echo out the results.
    echo httprl_pr($request);

vs

    // Set postpone mode.
    $options = array(
      'postpone' => TRUE,
    );
    // Queue up the requests.
    $max = 10;
    $r = array();
    for ($i = 0; $i < $max; $i++) {
      $r[] = background_process_http_request(url('asdf-asdf-asdf-' . $i, array('absolute' => TRUE), $options);
    }
    background_process_http_request_process($r, array('limit' => 5));
    print_r($r);

Another thing is the locking. I see that HTTPRL sets a lock, but I can't seem to figure out what it's used for? I'll look more at the code later. I'm not sure yet, but perhaps the locking should be taken out of Background Process?

Anyways, I'll skim HTTPRL some more and get back to you.

gielfeldt’s picture

BTW, I just added you as a maintainer.

mikeytown2’s picture

I set a lock when making a background process request. I then check to make sure the lock is taken in the new process. Being extra paranoid as this allows any function to run; I release the lock in /httprl_async_function_callback see httprl.async.inc

Switching over to object code is something that I plan on doing; that way I can get rid of the statics.

gielfeldt’s picture

I can't remember if I mentioned this before, but there might be a problem using the lock api across requests, even when using memcache. The reason is the removal of the lock in the shutdown handler.

Consider the following scenario:

1.) request "A" sets a lock and issues a non-blocking http request (request "B").
2.) after issuing request "B", request "A" finishes.
3.) request "A" has finished before request "B" could check the lock. The lock is gone, due to shutdown handler removal in request "A".

I've also recently been made aware of problems, if the lock is db-based and set inside a transaction in request "A". This will make the lock invisible to request "B" on MVCC systems such as InnoDB with isolation level "repeatable read" (which is more or less the default way to go now in D7).

mikeytown2’s picture

I take care of the shutdown function

  // Acquire lock for this run.
  $locked = FALSE;
  $counter = 0;
  while (!$locked && $counter < 20) {
    $id = 'httprl_' . hash('sha512', mt_rand() . time());
    $locked = lock_acquire($id);
  }

  // Make sure lock exists after this process is dead.
  if (empty($mode)) {
    global $locks;
    unset($locks[$id]);
  }

I wait ~2.5 seconds in menu_callback for locks to show up. If this is an issue with "repeatable read" then locking in Drupal is broken... this table might want to default to myisam in that case.

  // Exit if the temp_key does not match a lock that has been taken.
  // Wait up to 2.5 seconds for the lock to propagate out.
  $tries = 0;
  while (lock_may_be_available($_POST['temp_key'])) {
    $tries++;
    if ($tries > 5) {
      httprl_fast403();
    }
    usleep(500000); // Sleep for 500 miliseconds;
  }
mikeytown2’s picture

Title: Rely on HTTPRL instead of having your own version of it inside of Background Process » Merge Projects: Replace Background Process's HTTP client with HTTPRL's.
gielfeldt’s picture

I agree the locking API could be deemed broken in transactional envs. But seeing as it is only lock_may_be_available() that doesn't work, and that the naming of the function implies some sort of unreliability, it might be considered "working as designed".

I ended up using my own table as locking mechanism, due to missing features in the locking API. However, perhaps we should focus on using the locking API, and fix the things there? There are currently some discussion on improving the locking API in D8 #1225404: Modern event based lock framework proposal.

Btw, why do you wait for up to 2.5 seconds for the lock? The lock is set before the http request is issued. Which kinds of delay/latency have you experienced (or are expecting)?

mikeytown2’s picture

bummer... the way I'm using locking with memcache works. Using this with core and it fails. Created an issue to fix this #1555154: Fix locking with core's lock.inc.

mikeytown2’s picture

Core lock issue fixed. Did the 2.5 second delay as a just in case; don't know if it is truly needed.

mikeytown2’s picture

HTTPRL can now use hook_boot to run functions or methods.
D7: hook_boot bootstrap level takes about 25ms. Full level takes close to 300ms.

Passing the session works in D7; going to see how hard it is in D6. Checkout the last 2 examples in httprl's readme to see what it can do http://drupalcode.org/project/httprl.git/blob/refs/heads/7.x-1.x:/README...

gielfeldt’s picture

Very nice. Calling background processes with only bootstrap level should indeed be available where possible.

Another idea a had, was perhaps being able to use closures instead of array programming.

http://www.htmlist.com/development/extending-php-5-3-closures-with-seria...

For example in Background Process it would perhaps look like:

<?php

background_process_start(function() {
      $x = db_select('system', 's')
        ->fields('s', array('filename'))
        ->orderBy('filename', 'ASC')
        ->range(0, 1)
        ->execute()
        ->fetchField();
      $y = db_select('system', 's')
        ->fields('s', array('filename'))
        ->orderBy('filename', 'DESC')
        ->range(0, 1)
        ->execute()
        ->fetchField();
      echo $x . "<br \>\n" . $y . "<br \>\n";
      unset($x, $y);
});
?>

Regarding the core lock fix, doesn't the system module clean up the expired entries in the semaphore table?

gielfeldt’s picture

I also have plans for making Background Process postgres compatible in the 2.x (which also involves making the progress module postgres compatible). And perhaps even embedding the Progress module into Background Process module, as I think it might be a bit over engineered.

We could then at the same time merge HTTPRL into Background Process 2.x? What do you think?

Regarding the merge, I'm not sure where to begin. You? Perhaps we should outline the joined feature set, and then redesign based on the two modules?

mikeytown2’s picture

Just got back from vacation. Looking at the PHP closure example; this looks like it would need to use eval() which I'm trying to stay away from at all costs. Having an eval example in the readme might be a good idea so people know it is possible if they still wish to use it.

For core locks I update the entry in the database; only if its a background one and only if using core. I have a cron job that cleans up locks if the process happened to die at an unfortunate spot.

I think HTTPRL is already postgres compatible. What issues are you having? I don't see any bug reports in your issue queue.

Listing the features both have is a good start. Hopefully I'll do that later on today.

gielfeldt’s picture

Regarding postgres compatibility, it's not the Background Process module as such, that's the problem. It's my Progress module where I had a mild case of brainfart and thus created columns named "start" and "end". It propagates a little into Background Process module, but it should actually be possible to just fix this in the Progress abstraction layer.

I try to stay away from eval() too. However in this case, the code used in eval() has already been parsed and accepted by the php compiler, so it "should" be safe :-). In any case, it's just syntax sugarcoating really, as it is just as easy creating a new regular function and just handing that as callback.

I'll try to work out a list as well.

mikeytown2’s picture

HTTPRL Feature List

Parallel HTTP Requests:

  • More Complete/Better HTTP Client than what's in core.
  • Blocking (wait for it)
  • Non Blocking (do not wait for it)
  • Callbacks (code runs in the event loop; blocking)
  • Background Callbacks (code runs in a new process)
    • Blocking
    • Non Blocking





Parallel Function/Method Calls:

  • Blocking (wait for it)
  • Non Blocking (do not wait for it)
  • Chaining calls
  • Error reporting
  • Capture output
  • Timeouts
  • Pass-by-reference works
  • Operates at the hook_boot level





Other Features:
httprl_background_processing()

  • Output text, close connection, continue processing in the background.

httprl_strlen()

  • Get the length of a string in bytes.

httprl_glue_url()

  • Alt to http_build_url().

httprl_get_server_schema()

  • Return the server schema (http or https).

httprl_pr()

  • Pretty print data.

httprl_fast403()

  • Issue a 403 and exit.

httprl_set_user()

  • Set current user to given UID





Some notes:
Due to the way the code grew organically, the callback code is split up into 3 different ways to do it. I would like to standardize on one way, only using httprl_run_array() as this gives the most options and is not that complex.

Adding in more context helping functions is also a goal; so that background functions will have the correct context passed to them.

gielfeldt’s picture

Hi Mike

Sorry I haven't replied yet, I've been busy with other things. I'll try to make room for this.

Another thing I've been pondering about, is actually to decouple the HTTP stuff from background (exact opposite of merging our two projects). The reason for this is to abstract it in such a way that a background process can be started in several ways depending on your setup.

* Through HTTP request to a Drupal menu callback (like things are today)
* Through Drush
* Through Fast CGI server
* Other potential backends ...

If this is a way to go, then we would probably need to merge background processes http stuff into HTTPRL and extend HTTPRL to also implement a "backend" for Background Process.

What do you think of this idea?

mikeytown2’s picture

I've thought about this. stream_select() and proc_open() play together quite nicely; this gives us drush/shell parallel calls as well as HTTP parallelism (from stream_socket_client()).

Fast CGI server seems like a HTTP request, or did I miss something? As for other back ends, shell commands would make it not super difficult to do.

Given this, the core of what we're looking for revolves around stream_select(). I would opt to bring in support for proc_open() to HTTPRL's core. Not exactly sure how to implement it but here is how drush is doing it _drush_backend_proc_open().

gielfeldt’s picture

An overview of Background Process features:

* Execute arbitrary functions in background either through direct call (http request) or using queue.
* Near drop-in-replacement of drupal_http_request() for making parallel http requests supporting sliding window technique as well.
* Load balancing and pool management for background processes.
* Class API for starting more advanced background processes (such as setting user, session, etc.)
* Automatic cleanup of stale/dead processes using timeout and Apache Server Status.
* Drop-in-replacement for Drupal Batch API, making batch jobs run through background processes.
* Keepalive/restart support for processes, allowing creation of daemons.

Regarding Fast CGI, I must admit I don't know if it's an HTTP request as such. I was thinking more of the separation Fast CGI vs Drupal menu callback.

gielfeldt’s picture

Just thinking out loud:

<?php

// Load nodes parallel in background
$processes[] = BackgroundProcess::lock()->node_load(1);
$processes[] = BackgroundProcess::lock()->node_load(2);
$processes[] = BackgroundProcess::lock()->node_load(3);
$processes[] = BackgroundProcess::lock()->node_load(4);
$processes[] = BackgroundProcess::lock()->node_load(5);
$processes[] = BackgroundProcess::lock('test')->node_load(6);

// May or may not load depending on how fast node_load(6) finishes.
$processes[] = BackgroundProcess::lock('test')->node_load(7); 

// Wait for results
$nodes = background_process_wait($processes);
?>
gielfeldt’s picture

Just thinkin out load some more (this bit is not as "magical" as the previous post, but supports queueing and class/object calls):

<?php

// Load nodes parallel in background
$processes[] = BackgroundProcess::lock()->call('node_load', array(1));
$processes[] = BackgroundProcess::lock()->call('node_load', array(2));
$processes[] = BackgroundProcess::lock()->call('node_load', array(3));
$processes[] = BackgroundProcess::lock()->queue('node_load', array(4));
$processes[] = BackgroundProcess::lock()->queue('node_load', array(5));
$processes[] = BackgroundProcess::lock('test')->call(array('myclass', 'myfunction'), array(1, 2, 3));

// May or may not load depending on how fast previous myclass::myfunction(1,2,3) finishes.
$processes[] = BackgroundProcess::lock('test')->call(array('myclass', 'myfunction'), array(1, 2, 3));

// Wait for results
$nodes = background_process_wait($processes);
?>

Todo: How to fit in pass-by-reference vars in this scenario?

mikeytown2’s picture

HTTPRL does pass-by-reference for all variables. I have a couple bits of code that helps with this process:
httprl_recursive_array_reference_extract() & Pass by reference trick for call_user_func_array(). In short, don't worry about pass by reference, that issue has been solved (it wasn't easy though).

If we wanted something magical we would have to use something like DOH; although we might be able to steal some of it's magic and not require it. In the mean time what is the difference between call and queue and why would the second lock('test') call may or may not run?

Can you chain DB ops? That's the advantage I see with how httprl_run_array() is used.

gielfeldt’s picture

Cool what you've done with the reference stuff.

The example in post #25, I imagined using the __call method for intercepting. But that would limit us from calling class functions or object methods, and the only advantage of the __call() trick was syntax sugar. So I would still suggest something like post #26 for this.

Regarding the chain stuff ... well ... to be honest, it's a weird feature I think :-)
I mean why not just:

<?php
function _mymodule_clear_cache() {
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
  module_load_include('inc', 'system', 'system.admin');
  system_clear_cache_submit();
  watchdog('httprl-test', 'background cache clear done', array(), WATCHDOG_DEBUG);
}

BackgroundProcess::lock()->call('_mymodule_clear_cache');
?>

I think this is more readable/maintainable, and probably also faster because it can be compiled and opcode-cached.

Or (if we we're to do the closure stuff, which is probably unnecessary):

<?php

BackgroundProcess::lock()->call(function() {
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
  module_load_include('inc', 'system', 'system.admin');
  system_clear_cache_submit();
  watchdog('httprl-test', 'background cache clear done', array(), WATCHDOG_DEBUG);
});

?>
mikeytown2’s picture

Main issue is module code is not available at the hook_boot bootstrap level; that only happens once a full bootstrap has happened. We could use eval (closure stuff) but I try to avoid eval at all costs. The HTTPRL code could be refactored so it looks more like this.


$processes[] = BackgroundProcess::call()->function('node_load', array(1));
$processes[] = BackgroundProcess::call('a')->function('db_select', array('system', 's'));
$processes[] = BackgroundProcess::call('a')->method('fields', array('s', array('filename')));
$processes[] = BackgroundProcess::call('a')->method('orderBy', array('filename', 'ASC'));
$processes[] = BackgroundProcess::call('a')->method('range', array(0, 1));
$processes[] = BackgroundProcess::call('a')->method('execute', array());
$processes[] = BackgroundProcess::call('a')->method('fetchField', array());
$processes[] = BackgroundProcess::call()->method(array('myclass', 'myfunction'), array(1, 2, 3));

With your code, what are the differences between call and queue and why would the second lock('test') call may or may not run? Do we need call() & lock()? Anyway to make this as simple as possible? We should work on the ideal syntax :)

Also need to think about error handling and capturing printed output.

gielfeldt’s picture

call() sends an http requests for starting the background process. queue() creates an item in a queue which is then processed by cron (very useful, if not necessary, when having lots of background processes you want to start, and need to keep the server from congesting). Actually a bit like the job_queue module.

For each background process, I create a unique handle, so the process can be identified and maintained for cleanup, etc. The parameter in lock() makes you able to specify the handle, thereby disallowed further execution of the same background process if it is already running (thundering herd protection, basically).

We could make the lock() optional for the call(), I see no problem with that as such. call() could just do a lock() internally to make the inner workings more consistent.

About bootstrap. Yes, you're right. Unless the module implements hook_boot(). Then the code should be available I guess.

About the chaining. I really think the code has poor readability. I'm sad that serializing of closures isn't part of php, as this seems like something that should be default for such a feature. Regarding avoiding eval() at all cost, well then I partially agree. Eval() shouldn't be used lightly. However in this case where the code has been parsed and validated by the compiler before it's stored, it should be safe. And seeing now that it does serve a purpose, rather than just syntax sugar, I might lean more toward this.

I'll think some more about it. Perhaps I'll turn around and consider the chaining method after I've given it some more thought.

gielfeldt’s picture

What about being able to specify which file the code resides in? (like $menu['file'] and $menu['file path'] or similar). I think the oop chaining technique is easiest if we have a factory method. I'll use the lock() again as a factory method in this example:

<?php
BackgroundProcess::lock()
  ->phase(DRUPAL_BOOTSTRAP_PAGE_HEADER) // Defaults to DRUPAL_BOOTSTRAP_FULL
  ->include(drupal_get_path('module', 'mymodule') . '/mymodule.callbacks.inc')
  ->call('mymodule_callback_function', array(1, 2, 3));

?>

Thoughts?

mikeytown2’s picture

At this point why not just allow multiple calls? We could still have a phase and include method, but internally it would use the call method.

BackgroundProcess::lock()
  ->call('drupal_bootstrap', DRUPAL_BOOTSTRAP_PAGE_HEADER)
  ->call('include', drupal_get_path('module', 'mymodule') . '/mymodule.callbacks.inc')
  ->call('mymodule_callback_function', array(1, 2, 3));
gielfeldt’s picture

How does the pass-through of the results work? (Considering your previous example regarding the db_select()+chaining) ... And we beginning to reinvent the Batch API? (&$context stuff).

mikeytown2’s picture

In HTTPRL I use pass by reference to make it work. Hybrid example below.

$nodes = array();
$errors = array();
BackgroundProcess::lock()
  ->call('drupal_bootstrap', array(DRUPAL_BOOTSTRAP_FULL))
  ->call('node_load', array(1), array('return' => &$nodes[1], 'error' => &$errors[1]))
  ->call('node_load', array(2), array('return' => &$nodes[2], 'error' => &$errors[2]))
  ->call('node_load', array(3), array('return' => &$nodes[3], 'error' => &$errors[3]));

So the call method takes 3 parameters in this case. First one is the function name; second is the function arguments; and third are return values, errors thrown and printed output.

Mithrandir’s picture

$processes[] = BackgroundProcess::call()->function('node_load', array(1));
$processes[] = BackgroundProcess::call('a')->function('db_select', array('system', 's'));
$processes[] = BackgroundProcess::call('a')->method('fields', array('s', array('filename')));
$processes[] = BackgroundProcess::call('a')->method('orderBy', array('filename', 'ASC'));
$processes[] = BackgroundProcess::call('a')->method('range', array(0, 1));
$processes[] = BackgroundProcess::call('a')->method('execute', array());
$processes[] = BackgroundProcess::call('a')->method('fetchField', array());
$processes[] = BackgroundProcess::call()->method(array('myclass', 'myfunction'), array(1, 2, 3));

I don't understand this code. In which use case would you rather define a procedure to perform in a background process like this rather than with a wrapper function?

$processes[] = BackgroundProcess::call()->function('myFunction', array(1), array(1, 2, 3);
function myFunction($arg1, $arg2) {
  // Logic using the arguments
}

Advantages: Clear division of labour between the background process and the actual procedure performed.

I would hate to troubleshoot code that is coming from a list of function calls in the background process initiation. The whole idea of the background process is to take a simple input (what to call with what parameters) and offload it to a background process. The logic of "what to call" should (in my opinion - this is purely opinion-based arguing, I know) be a function or classname/method name and the arguments passed alongside it.
That way, it is easy to log, what is run in a background process, with which arguments, and easy to test run without a background process - just call the function with the same arguments.

I don't see any use cases for the BackgroundProcess::call('a')->method() calls above. In my opinion, they only add to confusion.

mikeytown2’s picture

Sounds good following the KISS principle is almost never a bad idea. So we have 3 methods.
phase() // Bootstrap level.
include() // File to include.
call() // Function to run.

With this we should be able to accomplish everything that is needed. I'm assuming that we are in agreement in that all parameters passed to the function will be handled like a pass by reference variable. Going back to #26 how do we want to handle output (returned value, errors, printed output) and multiple calls?

gielfeldt’s picture

I suggest we that the BackgroundProcess methods return an instance of an object.

I also suggest that the methods actually do what they imply, e.g. call() WILL perform an http-request. To this end we might need:

Initiate background process right away: $process->call();
Queue background process: $process->queue();
Initiate within in current request, but postpone for later to make use of e.g. sliding windows: $process->postpone();

<?php

// Initiate a background process in the three ways we can
$processes[] = BackgroundProcess::lock()->call('node_load', array(1));
$processes[] = BackgroundProcess::lock()->queue('node_load', array(2));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(3));

// Wait for processes to finish, so that we may get the requests.
// Note: the process using ->postpone() doesn't begin until background_process_wait() is called.
$nodes = background_process_wait($processes);

?>
<?php

// Load 10 nodes in background, but only 2 at a time
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(1));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(2));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(3));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(4));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(5));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(6));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(7));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(8));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(9));
$processes[] = BackgroundProcess::lock()->postpone('node_load', array(10));

// Wait for processes to finish, so that we may get the requests.
$nodes = background_process_wait($processes, array('limit' => 2));

?>
<?php

// Load 5 nodes in background using pass-by-reference, but only two at a time
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(1, &$nodesref));
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(2, &$nodesref));
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(3, &$nodesref));
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(4, &$nodesref));
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(5, &$nodesref));

// Wait for processes to finish, so that we may get the requests.
$nodes = background_process_wait($processes, array('limit' => 2));

// $nodesref will contain the nodes, $nodes will contain the return values of mymodule_load_the_node(), whatever that might be.

?>
<?php

// Fast db stuff in background
BackgroundProcess::lock()
  ->phase(DRUPAL_BOOTSTRAP_DATABASE)
  ->include('mymodule.sdfsdf.inc')
  ->call('write_to_db', array('myvar1'));

?>

For the phase() stuff, in order to support an arbitrary bootstrap level, we could perhaps create a script within the background_process module, that does:

<?php


define('DRUPAL_ROOT', $_POST['DRUPAL_ROOT']);
chdir(DRUPAL_ROOT);

require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap($_POST['bootstrap']);

.... invoke the callback somehow

?>
mikeytown2’s picture

What's the difference between queue and postpone? They seem the same to me for the most part. Is there ever a reason why we need call? I think queue is really all we need. Queue up some requests and run them. If some need to be ran before others, we could add weights to it and have an options array that gets passed in if needed (for other things). Can you give me one good example where queue with weights won't accomplish the same goal?

The example that uses &$nodesref (Load 5 nodes in background using pass-by-reference, but only two at a time) is not a good one. We will have multiple things trying to write to the same variable when all 5 background threads return. We need to document that in order for pass-by-reference to work every reference variable needs to be unique.

// This is bad
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(1, &$nodesref));
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(2, &$nodesref));
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(3, &$nodesref));

// This is good
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(1, &$nodesref[1]));
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(2, &$nodesref[2]));
$processes[] = BackgroundProcess::lock()->postpone('mymodule_load_the_node', array(3, &$nodesref[3]));



Being able to execute code at the hook_boot() level (6 of 8) is a big step in the right direction. The lowest level we can boot into will be _drupal_bootstrap_variables() (level 4 of 8) because we need to make sure this function call is authorized. Given this we should implement hook boot. See httprl_boot for an example; what I do is run the same codepath as if it was a menu callback, except it is running at a lower bootstrap level now.

I like where we're going with this thread :)

gielfeldt’s picture

queue() is meant to push the job into a drupal queue, and then let cron handle the processing. The use case would be: I want to defer the call to this function, but it doesn't have to be right away.

call() : I want to call this function right away, but I don't necessarily wan't to wait for it.

postpone() : I want to call this function "right away", but I wan't to pile up a bunch first.

The call() and postpone() somewhat maps 1-1 to the logic I currently have in background_process_http_request(), which is the drupal_http_request() replacement, i.e., just for calling an arbitrary url (could be fetching 10 images from a remote server 2 at a time). A use case for postpone() in background process context, could be that you wanted to call X number of functions and use the results, but still have some congestion prevention by limiting to Y number of functions running in parallel at any given time.

Yes, the sliding window (5 nodes, 2 at a time) example, not a good one ... it could produce strange results when using pass-by-ref, if the caller doesn't take precautions like your new example does.

I still think we could "authorize" the call for levels < 4, since we're using a server-side generated token.

Regarding if we need call() and postpone(): Perhaps not internally. I've also been thinking in the lines of perhaps reusing the Batch API. The Batch API takes a number of functions and adds them to a queue which is picked up by the Batch subsystem. We could do the same? The subsystem that picks these functions up could then just be something else, triggered by cron or a menu callback?

mikeytown2’s picture

I know postpone() is useful as I use that same model in HTTPRL; what is the advantage of using call()? Is call() what I call non-blocking, as in I run it and I don't care about the return value?

I do see the usefulness of queue() if the queue can be processed soon (like start up a background queue worker in a shutdown function).

In terms of sub 4 level bootstraps, how would you accomplish it?
configuration level 1
page cache level 2
database level 3
variables level 4
The best I can see is doing database level 3 with the lock system from level 4. One could come up with more interesting ways of doing this (allow verified level 0 booting) but that will require a lot of custom code (use filesystem for verification) IMHO. Using lock.inc makes this fairly simple to implement from my point of view.

gielfeldt’s picture

Yes, that call() could be considered analogous to the non-blocking concept in HTTPRL. (Of course, technically the postpone() will also perform "non-blocking" requests in order to execute them asynchronously).

It might not be possible to execute a background process in bootstrap levels < 3, if we store the info for a process in the db (don't know where else to store it). Regarding use of the lock subsystem, I'm not sure this is mature/good enough at this time (I know of the trick HTTPRL does regarding unsetting the global lock to avoid cleanup). Besides, if we are to store the process info in table anyway, this could work as a lock.

I also currently have an issue when starting a background process within a transaction due to the tokens not being available in the subrequests (see this thread for more info).

I do see the usefulness of queue() if the queue can be processed soon (like start up a background queue worker in a shutdown function).

To keep it simple, I was just thinking of putting it into a cron queue, and then let a cron queue worker handle it during next cron. If speeder cron queue processing is needed, I would think using something like Ultimate Cron or similar.

mikeytown2’s picture

non-blocking in HTTPRL means it doesn't wait for the response ever; it can send 1000 non-blocking requests in about 100ms and 100ms is how long the script takes to run; if I were to do this in blocking mode it would send requests off in parallel but it would wait (block) for all of them to return. Thus using non-blocking in HTTPRL you can send off a job that will take 20 seconds and the current script will finish fairly quickly as it doesn't care about what was returned. Does call() and postpone() both do this (non-blocking/no-waiting) or only one? It's still not 100% clear to me. Parallel is different from Blocking/Non-Blocking. HTTPRL operates in parallel 100% of the time. Making sure we are speaking the same lingo here :)

I like the idea of using locking because it can then utilize memcache, but I'm not stuck on the idea. I do think using lock.inc is the right way forward though as it's core drupal and maps 100% to what we are trying to do (make sure only authorized requests are ran). I would be in support of filing a core bug and modifying the semaphore table in the mean time if locking doesn't work how we need it to.

Looking forward, this is how things are looking in D8 #1599622: Run poormanscron via a non-blocking HTTP request. Having a small sub module that can fire off a background queue worker is something to look into as not everyone will want to install Ultimate Cron.

gielfeldt’s picture

Lingo/clarification:

non-blocking for me means what it does for stream_set_blocking(), namely that e.g. fread() doesn't wait for data.

In my example, background_process_wait($processes) waits for the $processes to finished and returns the results. It does this by polling. You can see the logic in background_process_http_request_process(). This is used for http requests, for example if you want to download 8 images in parallel (or asynchronously I guess is the proper term here). Is also possible to provide background_process_http_request() with a callback function, e.g.:

<?php
// Normal drupal way: 8 images in serial
$results[] = drupal_http_request('http://www.example.com/image1.png');
$results[] = drupal_http_request('http://www.example.com/image2.png');
$results[] = drupal_http_request('http://www.example.com/image3.png');
$results[] = drupal_http_request('http://www.example.com/image4.png');
$results[] = drupal_http_request('http://www.example.com/image5.png');
$results[] = drupal_http_request('http://www.example.com/image6.png');
$results[] = drupal_http_request('http://www.example.com/image7.png');
$results[] = drupal_http_request('http://www.example.com/image8.png');
foreach ($results as $result) {
  save_image($result);
}
// Images are now saved...



// Background process way: 8 images, 4 parallel at a time
$options = array(
  // If 'postpone' is FALSE, then the requests will be executed immediately when background_process_http_request()
  // is called, and consequently, the 'limit' => 4 in the background_process_http_request_process() will have no effect.
  'postpone' => TRUE, 
  'callback' => 'save_image'
);
$results[] = background_process_http_request('http://www.example.com/image1.png', $options);
$results[] = background_process_http_request('http://www.example.com/image2.png', $options);
$results[] = background_process_http_request('http://www.example.com/image3.png', $options);
$results[] = background_process_http_request('http://www.example.com/image4.png', $options);
$results[] = background_process_http_request('http://www.example.com/image5.png', $options);
$results[] = background_process_http_request('http://www.example.com/image6.png', $options);
$results[] = background_process_http_request('http://www.example.com/image7.png', $options);
$results[] = background_process_http_request('http://www.example.com/image8.png', $options);
background_process_http_request_process($results, array('limit' => 4));
// Images are now saved


?>

stream-wise, it is possible to manually do a blocking http request using the background_process_http_request(), as this is a replacement for drupal_http_request(). However the background processing of specific php functions are always non-blocking.

The good things about memcache as locking, is that it is not volatile to transactions or requests. It is however volatile in ways that are out of our control (restart, garbage collection, etc.) which potentially could become a problem. Also, if we are to store an entry in the db with our process information, we essentially do a double locking?

Btw, this might also be of interest: #962958: Run cron in a separate process to avoid a response delay

mikeytown2’s picture

In HTTPRL I set all streams to stream_set_blocking($stream, FALSE); so it can operate in parallel. In HTTPRL what I call non-blocking is that after fwrite is done it does a fclose on the stream. This prevents that request from blocking the the script as a whole. stream_set_blocking($stream, FALSE) is a requirement if you want to do parallel streams from my understanding, and HTTPRL assumes you want parallel 100% of the time.

Side note is I think that stream_set_blocking should be called stream_set_sync as that is a better way of describing what is happening IMHO; but then there might be some confusion with the STREAM_CLIENT_ASYNC_CONNECT flag.

Why would one want to do a non-parallel http request? Just issue a single request, or set the window size/limit to be one if needed. IMHO having a method just for this is wasteful, we can simplify here :)

gielfeldt’s picture

At the time it seemed more logical (simpler?) in my copy/paste of drupal_http_request() to implement support for non-blocking rather than to replace the current functionality.
One use case, though, could be the use of the callback option, i.e. a regular drupal_http_request() with a callback option.

However, this blocking/non-blocking part is completely decoupled from the high level API in Background Process which handles background processing of functions, so we don't necessarily have to take this in to consideration. I'm actually thinking more of using the code from httprl_request() instead of background_process_http_request(), as you have given/gotten more attention to the actual IO part, so we don't need to implement the blocking part as such. We need though to eliminate static vars in order to encapsulate it properly.

We also need to make a clear distinction between the API for actually issuing an http request, and the API for handling the execution of functions in the background via the http requests, e.g.:

1.) background_process_http_request(): Issue an http request.
2.) background_process_start(): Run an arbitrary function in the background (which "incidentally" uses a non-blocking http request to do so).

Don't know (yet) what the async-flag does, but I think the stream_set_blocking() function is named properly. It does what it does. Async operations is what you can use it for. Then again, we could probably talk terminology all day :-)

Do you watch the European Soccer Championship? We (Denmark) just lost a match :-(

mikeytown2’s picture

I do not watch it, but one of my coworkers is rooting for Italy. Sorry to hear that Denmark lost, good luck against Germany this weekend and hopefully Portugal will lose :)

In terms of workflow, I like the idea of a "local" job queue that gets populated and then a run function that will start to issue HTTP requests.

$job = array():
$job[] = create_http_request();
$job[] = create_http_request();
$job[] = create_http_request();
$results = run_job($job);

$job = array():
$job[] = create_thread();
$job[] = create_thread();
$job[] = create_thread();
$results = run_job($job);

No HTTP requests are issued until run_job() is ran. This removes the possibility of create_http_request() issuing a HTTP request by its self; it needs run_job() in order for it to work. Also noted is create_thread() is a special wrapper around create_http_request(). The returned value from create_http_request and create_thread should be of the same type. If one wishes to use a callback for a single http request simply do this:

$job = array():
$job[] = create_http_request();
$results = run_job($job );



In terms of supporting core's queue we can either pass in an argument to create_thread() and create_http_request() or we can create a new function to do this. If we are creating a new function/method this is how I see it working.

$job = array():
$job[] = create_http_request();
$job[] = create_http_request();
$job[] = create_http_request();
$job[] = create_thread();
$job[] = create_thread();
$job[] = create_thread();
// Do it.
$results = run_job($job);

$job = array():
$job[] = create_http_request();
$job[] = create_http_request();
$job[] = create_http_request();
$job[] = create_thread();
$job[] = create_thread();
$job[] = create_thread();
// Queue it.
$queue_id = queue_job($job);


// Issue a fwrite(), fclose() and startup the druapl queue in a background process that does not effect this scripts load time in any noticeable way.
run_drupal_queue();

Here I have 2 new functions queue_job() and run_drupal_queue(). queue_job will put this $job array into the drupal queue to be processed at a later date. run_drupal_queue() will issue a create_thread() that is special because it uses the HTTPRL non-blocking version of execution (fwrite, fclose with no fread). run_drupal_queue() will then process the queue. If we pass in the $queue_id to run_drupal_queue() then it will only process that queued item and ignore all others. This allows for certain things that have a high priority to be ran first and lower priority items to be ran on cron. An example of a higher priority queued item is building imagecache presets on file upload so they are ready by the time the node is saved. A lower priority item would be to ping google because your news sitemap has been updated.



I use the STREAM_CLIENT_ASYNC_CONNECT flag in httprl. It means that once DNS has been looked up, it will establish the TCP connection in the background and not block the script from running. If you connect with an IP address and set the host header, one can issue a lot of HTTP requests in a short amount of time due to the usage of this flag and no DNS lookup :)

gielfeldt’s picture

I mostly agree, except for the naming of the functions, which makes we wonder if we're talking past each other here.

My idea on the abstraction level. The caller starting a job, shouldn't necessarily concern himself with if it's an http request or not. As I see it the caller wants to start a job, either immediately or when possible in the nearest future. In our case, we'll use an http request for starting the job immediately (immediately as when run_job() is called), and a drupal queue for the queueing part.

But it could easily be another mechanism that does the work (a drush call through proc_open() or whatever). The main thing is that we decouple high level api of starting a background process from the actual work needed to do so. Ensuring that the right code is responsible for the right task.

gielfeldt’s picture

Should we create a new branch where we can begin some prototyping? Regarding naming, I think I prefer Background Process over HTTPRL, but I may be biased :-).

mikeytown2’s picture

Background Process is a better name than HTTPRL for this merged project.

But in terms of short names here are some I've come up with
BP - Background Process
BPRL - Background Process & Parallel Request Library
BPMRL - Background Process & Multi Request Library
PRBPL - Parallel Request & Background Process Library
TPRL - Threading & Parallel Request Library
TMRL - Threading & Multi Request Library
PRL - Parallel Process & Request Library
MPRL - Multi Process & Parallel Request Library
etc...

Also wondering if we should make httprl client code be a sub module or auto included. I would lean towards sub modules so we can make both stand on their own. And start to roll in some other useful things that modules that depend on Background Process do like cron, etc... similar to AdvAgg; have it all in one module.

What other modules use Progress?

gielfeldt’s picture

I have no other modules that use progress. The abstraction/separation is a bit over-engineered, so I definitly think it should be merged into Background Process. However, I do use Background Process for a couple og modules (ultimate cron, cache graceful, http accelerator). I dont believe in putting ultimate cron into background process, thats a bit too monolithic an approach for me. Keeping the http part in a sub-module, we could do that. We could just start by merging the http stuff into httprl and the background process stuff into Background Process?

Regarding shortnames, i dont see the need for this, i think it will only leed to confusion.

mikeytown2’s picture

Sounds good. We bring in progress and httprl as stand alone sub-modules; Keep the long names; and keep ultimate cron, cache graceful, http accelerator separate projects.

I'll see if I can get a new httprl version rolled out in the next day or two.

gielfeldt’s picture

Cool. I'll create a new branch for Background Process and merge in the Progress module. I meant to embed the Progress code into the Background Process module, as the Progress module doesn't really deserve a module of it's own. Also, There's some duplication of data between the Progress and Background Process module (start time), which speaks in favor of a normalization.

So to roughly sum up:
1.) Background Process + Progress + HTTPRL background process code => Background Process
2.) Background Process http code + HTTPRL => HTTPRL

Is this what we we mean?

mikeytown2’s picture

I was thinking:
Background Process + Progress + HTTPRL threading library & http client code => Background Process.

Thus the HTTP client is a stand alone inside of Background Process. Or do you want to rely on httprl? I was going to merge 100%. Reason being if your doing parallel HTTP requests, odds are you want to do a lot of "work" and can thus benefit from something like background process.

gielfeldt’s picture

Sure we, can put add the HTTPRL as a submodule to Background Process. But a separate Progress module still serves no purpose IMO. And it's just a matter of where the code is placed, so I think it's safe to "truly" merge Progress into, just to get rid of the unnecessary duplicate data in the tables. It would certainly make my life a bit simpler I think, and I know of no-one who uses the Progress module by itself.

I already have a couple of sub modules in Background Process. If we are to put HTTPRL in as a submodule, perhaps it should be renamed to conform to proper standards? (Is it proper standards to prefix submodules with the mother-module's name? I'm not so sure. I'm thinking panels, panels_mini, panels_page, etc.)

Anyways, currently I have:

background_process
background_process/background_process_ass
background_process/background_batch

I think we can ignore the two existing submodules for now and port them later or move them out.

Whether or not we place HTTPRL inside Background Process makes no difference to me. We can always move it around (though we might have to rename the module), at least until we create the first release.

mikeytown2’s picture

background_process_http_client work for you?

gielfeldt’s picture

Yes, sounds fine.

gielfeldt’s picture

Should we branch background_process 7.x-1.x into a new branch, e.g. httprl or someting, and work from there?

mikeytown2’s picture

background_process 7.x-2.x should be fine. Just hide the dev release till we get most of the code working together; once that's done make 7.x-2.x-dev available. Luckily all the projects we wish to merge have a lot of the bugs worked out of them so this shouldn't be insanely hard to do.

gielfeldt’s picture

Ok. Done. 7.x-2.x is ready.

gielfeldt’s picture

Shouldwe try to divide the workload somehow? I was thinking of beginning merging in the progress module.

mikeytown2’s picture

I'll add in httprl http client code. You can merge in progress module. Once that is done we can work on replacing old http code with new http code, pass-by-reference, API interfaces, and things of that nature.

mikeytown2’s picture

as a fallback if http requests to self do not work, we could implement something like this http://drupal.org/project/request_queue

gielfeldt’s picture

That would be equivelant of doing a:

<?php
BackgroundProcess::lock()->queue('drupal_http_request', array($url, $options));
?>

On that note, I've been asked if it would make sense to make the API in such a way that the caller doesn't have to take into account whether or not the jobs should be executed using an http request or through a queue, but rather that this is a setting.

e.g.:

<?php
BackgroundProcess::lock()->run('drupal_http_request', array($url, $options));
?>

and then it's the setting that determines whether it runs through a queue or an http request is made right away.

I'm not sure what I think of this. On one hand it makes sense, but on the other hand you could image that the caller actually wants to control it. Perhaps we could support both paradigms?

What do you think?

BTW, I haven't had much time to look into making new code, as I'm on vacation at the moment and will be for the next week or so.

mikeytown2’s picture

Tough question...

We have these different modes of operation from my understanding:
- queue the request (I want this done in the near future, but it doesn't have to be now)
- queue the request and start up background process to run queue (useful for node save)
- queue then execute the request in same process if doing a non-blocking request (useful for node save)
- queue then execute the request in same process (useful for cron)

I think we should add the request to the queue so that if it fails for some reason it will be ran in the near future. If there are too many worker processes preventing a queued item from being ran this will allow us to fail gracefully.

In terms of how the different modes of operation is controlled via a setting or how it's called I would opt for how it's called (different methods).

gielfeldt’s picture

Think it sounds like a good idea having a queue as fallback. We might need a separate queue for this, and we need to make sure that the job isn't run twice.

- queue the request and start up background process to run queue (useful for node save)
- queue then execute the request in same process if doing a non-blocking request (useful for node save)
- queue then execute the request in same process (useful for cron)

I'm not sure about these 3... Can you elaborate a bit perhaps with some examples?

mikeytown2’s picture

"queue the request and start up background process to run queue (useful for node save)" Example: Notify users that a node has been updated. Needs to be done soon but if delayed, not a big deal.

"queue then execute the request in same process if doing a non-blocking request (useful for node save)" Example: Generate image cache presets for newly created node by requesting them in a non blocking manner. If this fails, no big deal as the browser will be requesting these very soon.

"queue then execute the request in same process (useful for cron)" Example: Think of something like the feeds module.

Also there's been some developments in the direction of http clients in drupal and we should adopt guzzle instead of httprl #1447736: Adopt Guzzle library to replace drupal_http_request(). I've been working on stuff over here if you've been wondering :)

gielfeldt’s picture

Hi Michael

I've committed a prototype of a new API to 7.x-2.x. Here's what I've done.

* Removed lot's of old code and moved db-stuff to the Background Process class.
* Moved all http stuff to background_process.http.inc
* Refactored BackgroundProcess class and put it into background_process.inc (I'm not sure of the file convention here …)
* Added support for various dispatchers (see example).
* Added support for run in arbitrary bootstrap phase (though below DATABASE phase might not make sense).
* Pass-by-reference results
* Partially moved progress handling to background process

I've tried to isolate the inner workings of Background Process, so that actually running the process is taken care of by some "dispatcher". Currently implemented are:
* call() - via non-blocking http request
* queue() - add's item to queue (cron worker not implemented yet)
* runNow() - proof-of-concept/stub, just runs the function and returns the result in the current request.

For the arbitrary bootstrap phase, I've add background_process.bootstrap.php … it may be a bit insecure at this moment :-)

<?php

function test_callback1() {
  $results = array();
  $i = 0;

  // Start processes
  $processes[] = BackgroundProcess::lock()
                   ->resultStorage($results[$i++])
                   ->setCallback('_test_callback1', array('test', 4))
                   ->call();

  $processes[] = BackgroundProcess::lock()
                   ->resultStorage($results[$i++])
                   ->setCallback('_test_callback1', array('test', 2))
                   ->call();

  $processes[] = BackgroundProcess::lock()
                   ->resultStorage($results[$i++])
                   ->setInclude(__FILE__)
                   ->setPhase(DRUPAL_BOOTSTRAP_DATABASE)
                   ->setCallback('_test_callback1', array('test', 1))
                   ->call();

  // Two runNow() examples …
  $results[$i++] = BackgroundProcess::lock()
                     ->setCallback('_test_callback1', array('test', 3))
                     ->runNow();

  BackgroundProcess::lock()
    ->resultStorage($results[$i++])
    ->setCallback('_test_callback1', array('test', 3))
    ->runNow();

  // Wait for $processes to finish, polling every 200ms
  BackgroundProcess::waitForFinish($processes, array('interval' => 0.2));

  $msg = '<pre>';
  $msg .= print_r($results, TRUE);
  $msg .= '</pre>';
  return $msg;
}

function _test_callback1($msg, $seconds) {
  // Get current process
  $process = BackgroundProcess::currentProcess();

  // Set progress (so others may see how far we've come
  for ($i = 1; $i <= $seconds * 100; $i++) {
    $process->setProgress($i / ($seconds * 100));
    usleep(10000);
  }

  // Return message to caller
  return "Slept for $seconds seconds ($msg)";
}

?>

As you can see from the API above, it's a bit explicit with the setCallback() and all … however, I think it's good to have this clean and visible distinction, and it also makes it easier to separate runNow(), call() and queue(). Later, perhaps we could shortcut the API, e.g. passing $callback and $arguments directly to call() or so.

EDIT: Changed codestyle for better viewing pleasure ...

mikeytown2’s picture

I like what I see :)
Below are some comments.

Do we want to support this? I think the resultStorage method is all we need.

  $results[$i++] = BackgroundProcess::lock()
                     ->setCallback('_test_callback1', array('test', 3))
                     ->runNow();

In regards to waitForFinish. The current default is to wait 10 seconds, correct? Just making sure I'm reading the code correctly.

So if one cares about the results of call(), waitForFinish() is the answer. If one doesn't care, don't call waitForFinish() and we're good to go. Correct?

gielfeldt’s picture

Hi

1.) Perhaps not, it might be confusing with two ways of getting the variables, and runNow() not return the object is inconsistent. runNow() or any other dispatcher should probably return the object to support further chaining. And then we could provide a getResult() method if needed:

<?php
  $results[$i++] = BackgroundProcess::lock()
                     ->setCallback('_test_callback1', array('test', 3))
                     ->runNow()
                     ->getResult();

  $handle = BackgroundProcess::lock()
                   ->resultStorage($results[$i++])
                   ->setCallback('_test_callback1', array('test', 4))
                   ->call()
                   ->getHandle();
?>

2.) Correct, default wait is ten seconds. I'm not sure what a good default value is here nor with the polling interval.
3.) Correct again.

gielfeldt’s picture

Hi Mike

I've been working some more on this. Here's what I've been focusing on:

  • Stability and cleanup.
  • Finishing merge of Progress into Background Process
  • Background Batch (should work, you can try it out if you like)
  • NodeJS support for Background Process :-)

Known todo (from the top of my head):

  • Merge all the "good stuff" from HTTPRL into Background Process (background_process.http.inc)
  • Security (finish the token stuff, currently hardcoded)
  • Create dispatchers with fallback logic (e.g. try HTTP first, then queue or something like that)
  • Create a plugin-architecture for dispatchers?

Here's a "short" list of the Background Process API:

<?php
/**
 * ::load($pid) - load a process
 * ::loadByHandle($handle) - load a process
 * ::currentProcess() - get currently running process
 * ::lock($handle = NULL) - lock new process
 * ->getHandle() - get handle of process
 * ->getProgress() - get current progress of process
 * ->getProgressMessage() - get current progress message of process
 * ->calculateETA() - calculate the ETA of the process (if applicable)
 * ->setServiceHost($service_host) - set the service group to use
 * ->setServiceGroup($service_group) - set the service group to use
 * ->setCallback($callback, $args) - set callback and arguments
 * ->setProgress($progress) - set the current progress of the process (0..1)
 * ->setProgressInterval($interval) - set the interval in seconds (float) to write the status to the database
 * ->setInclude($file) - include specific file before executing (can be called multiple times)
 * ->setPhase($phase) - set the phase for which to run the process at
 * ->setResultStorage(&$result) - return result in variable passed-by-reference.
 * ->keepAlive($keepalive) - restart the process after completion
 * ->setShutdownCallback($callback, $args = array())  - register handler to be run when shutting down the process
 * ->unlock() - unlock the process
 * ->reDispatch($threshold = 10) - reDispatch (or cleanup) the process if applicable
 */
?>

To see the effect of the NodeJS stuff, you can try git checkout of Ultimate Cron 7.x-2.x which utilizes this. You of course need to have NodeJS setup. Here's what I did (on Ubuntu, from http://drupal.org/node/1713530 and https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager):

%> sudo apt-get install python-software-properties
%> sudo add-apt-repository ppa:chris-lea/node.js
%> sudo apt-get update
%> sudo apt-get install nodejs npm

%> cd /home/thomas/drupal/sites/all/modules/nodejs

%> sudo npm install
%> sudo npm install socket.io
%> sudo npm install request
%> sudo npm --node-version=0.4.12 install express@2.4.3
%> sudo npm --node-version=0.4.12 install connect@1.1.4

Then I:

  1. enabled "nodejs" and "nodejs_config" module.
  2. Went to: http://drupal.example.com/admin/config/nodejs/config
  3. changed hostname to "drupal.example.com" and press save
  4. Went to: http://drupal.example.com/admin/config/nodejs/js
  5. changed host to "drupal.example.com", backend to "drupal.example.com" and servicekey to "whatever" and press save
  6. Copied the text from the textarea and put into /home/thomas/drupal/sites/all/modules/nodejs/nodejs.config.js

Then I started the nodejs server:

%> cd /home/thomas/drupal/sites/all/modules/nodejs
%> node server.js

After this you should be ready to install Ultimate Cron and goto /admin/config/system/cron

What do you think? And how do you suggest we proceed?

gielfeldt’s picture

Did you have a chance to look at this yet?

gielfeldt’s picture

Hi Mike

I really want to take this further, and you're probably busy with other stuff. But do you have any immediate concerns or comments? Otherwise, my plan is to continue with the points I listed, perhaps waiting with the merge of HTTPRL till the end, as that part of the functionality is somewhat isolated in the http-part.

mikeytown2’s picture

Things look good from what I can see. Sorry for taking so long to get back to you.