http://drupal.org/project/httprl

I'm working through some issues that recently came up in HTTPRL, but it's something to look into using for your crawler. HTTPRL is a standalone module for the most part. The .module file can call these Drupal functions and constants:
drupal_convert_to_utf8() - Only if stream_socket_client failed (I can work around this, but an include of includes/unicode.inc is all that's needed).
drupal_generate_test_ua() - Only if $GLOBALS['drupal_test_info'] is set (I could kill this bit of code).
variable_get - in bootstrap.inc
request_uri - in bootstrap.inc
VERSION - in bootstrap.inc
HTTP_REQUEST_TIMEOUT in includes/common.inc. I can selectively define this if needed.
MENU_CALLBACK - in includes/menu.inc. Only called if httprl_menu() is called.
MENU_NORMAL_ITEM - in includes/menu.inc. Only called if httprl_menu() is called.
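
The constants in the list above could be defined conditionally in a small include file, so that httprl.module loads even when includes/common.inc and includes/menu.inc have not been included. This is only a sketch: the file is hypothetical (httprl does not ship it), and the values are assumed to match what Drupal 7 core defines, so check them against your core version before relying on them.

```php
<?php
// Hypothetical shim include; not part of httprl.
// Values are assumed to mirror Drupal 7 core; verify against
// includes/common.inc and includes/menu.inc for your version.
$httprl_shim_constants = array(
  'HTTP_REQUEST_TIMEOUT' => -1,  // Defined in includes/common.inc.
  'MENU_CALLBACK' => 0x0000,     // Defined in includes/menu.inc.
  'MENU_NORMAL_ITEM' => 0x0006,  // MENU_VISIBLE_IN_TREE | MENU_VISIBLE_IN_BREADCRUMB.
);

foreach ($httprl_shim_constants as $name => $value) {
  // Never redefine: if core is already loaded, its definition wins.
  if (!defined($name)) {
    define($name, $value);
  }
}
```

Because every definition is guarded by defined(), including the shim after a full bootstrap is harmless.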

Comments

perusio

Yes, but doesn't using the module require bootstrapping Drupal to a higher level? I thought about it and that was the main reason I left it out. I could do a simple require or include of the module files, but I'm uncertain which Drupal API functions it makes use of.

Are the two you state above the only Drupal API functions used?

It certainly would be nice to have it as a simple parallel option. The Nginx Lua module is very powerful but it's not the route most people feel comfortable following, I suspect.

EDIT: Can you create a version of the module that has an include file providing the needed stuff from the Drupal API, so that it doesn't require bootstrapping Drupal above the DB layer?

mikeytown2

Status: Active » Needs review

Module loading code for D7.

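// Run this from the Drupal root so getcwd() points at the installation.
define('DRUPAL_ROOT', getcwd());

// Bootstrap only as far as the database layer.
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE);

// Look up httprl.module in the {system} table and load it if it is there.
$result = db_query('SELECT filename FROM {system} WHERE name = :name', array(':name' => 'httprl'))->fetchAssoc();
if (!empty($result)) {
  require_once DRUPAL_ROOT . '/' . $result['filename'];
}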

You also need the patch from #1426886-1: Allow HTTPRL to operate at the database bootstrap level, or you can grab the latest version from git.
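
Once the module file is loaded, a crawler could queue its URLs and send them in one batch. The sketch below assumes httprl's documented pattern of httprl_request() to queue and httprl_send_request() to execute, and assumes a 'timeout' option as in drupal_http_request(); the wrapper crawler_fetch_parallel() is a hypothetical name, not part of httprl or of the crawler.

```php
<?php
/**
 * Hypothetical crawler helper: fetch a list of URLs in parallel via httprl.
 *
 * Returns NULL when httprl.module has not been loaded, so the caller can
 * fall back to another fetching strategy.
 */
function crawler_fetch_parallel(array $urls, $timeout = 30) {
  if (!function_exists('httprl_request') || !function_exists('httprl_send_request')) {
    return NULL;
  }
  foreach ($urls as $url) {
    // Queue the request; nothing is sent yet.
    httprl_request($url, array('timeout' => $timeout));
  }
  // Send all queued requests together and collect the responses.
  return httprl_send_request();
}
```

The function_exists() guard keeps the helper safe to call even when the {system} lookup found no httprl entry.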

perusio

Ok. Thanks. It's now on the TODO list and it will be part of the next release.

perusio

Assigned: Unassigned » perusio

Ok. Thinking out loud. Provide a --with-httprl option that can be empty or passed a /path/to/module, for people who don't wish to install another module but want to take advantage of httprl's parallel abilities just for the crawler.

When --with-httprl is empty it assumes that the module is installed.
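
The option handling described above could be sketched roughly as follows. The function name, the returned array shape, and the assumption that the path points at the directory containing httprl.module are all hypothetical; how the option value actually arrives depends on the command-line layer the crawler uses.

```php
<?php
/**
 * Hypothetical helper: decide where httprl.module should be loaded from.
 *
 * @param string|bool|null $option
 *   NULL or FALSE when --with-httprl was not passed, TRUE or '' when it was
 *   passed without a value, otherwise the path to the module directory.
 *
 * @return array
 *   'mode' is 'disabled', 'installed' or 'path'; 'file' is the module file
 *   to include, or NULL when it has to be looked up in the {system} table.
 */
function crawler_httprl_source($option) {
  if ($option === NULL || $option === FALSE) {
    // Option absent: do not use httprl at all.
    return array('mode' => 'disabled', 'file' => NULL);
  }
  if ($option === TRUE || $option === '') {
    // Option present but empty: assume the module is installed.
    return array('mode' => 'installed', 'file' => NULL);
  }
  // Option carries a path: load the module file straight from there.
  return array('mode' => 'path', 'file' => rtrim($option, '/') . '/httprl.module');
}
```

In 'installed' mode the crawler would fall back to the {system} table lookup; in 'path' mode it can require the file directly without the module being enabled.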

mikeytown2

Code for httprl has been fairly stable. Have you thought about implementing an option for using it?

perusio

I have, but I've been lacking time :( I recently moved from one country to another and I'm only now recovering my work rhythm. I'll do it ASAP.

Hopefully there will be a drupal meetup in April here in Paris and I will talk about microcaching with `httprl` as an option.

mikeytown2

The latest dev version of httprl now requires no Drupal bootstrap.

mikeytown2

Issue summary: View changes

add in more things that are in core