Ability to execute XMLPipe generator from PHP CLI

markus_petrux - October 29, 2008 - 20:45
Project: Sphinx search
Version: 6.x-1.x-dev
Component: XMLPipe generator
Category: task
Priority: normal
Assigned: Unassigned
Status: needs work
Description

Hi,

I tried drupal.sh to run the XMLPipe generator, but it doesn't work, so we need an alternative script that avoids the Apache overhead during the indexing process.

This can be done at any time. For the moment, though, I will focus on closing issues related to the D6 port and then on adding new features as described here: #306959: Porting sphinxsearch module to D6 (and Battle Plan)

#1

markus_petrux - October 29, 2008 - 20:47
Title: TODO: Ability to execute XMLPipe generator from PHP command line » Ability to execute XMLPipe generator from PHP CLI

#2

moshe weitzman - November 7, 2008 - 21:45

You have two options. One is to expose sphinxsearch_xmlpipe() as a regular menu callback, which drupal.sh can then call. You can put anything you want in the access callback, including requiring that it be run via CLI. I have some code for this.
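A minimal sketch of this first option, assuming Drupal 6's hook_menu(). The module name "example" and the function example_xmlpipe_access() are hypothetical, and this is not the code referred to above. It relies on php_sapi_name() returning 'cli' when PHP is started from the command line, which is how drupal.sh is normally run.

```php
<?php
// Sketch only: "example" is a hypothetical module name.

/**
 * Implementation of hook_menu() (Drupal 6).
 */
function example_menu() {
  $items = array();
  $items['xmlpipe'] = array(
    // Existing generator function mentioned in this thread.
    'page callback' => 'sphinxsearch_xmlpipe',
    // Hypothetical access callback defined below.
    'access callback' => 'example_xmlpipe_access',
    'type' => MENU_CALLBACK,
  );
  return $items;
}

/**
 * Access callback: allow the request only when PHP runs from the command line.
 */
function example_xmlpipe_access() {
  return php_sapi_name() === 'cli';
}
```

With this check, a browser request to the path would be refused, since the web server SAPI never reports 'cli'.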

The other option is a drush script which calls sphinxsearch_xmlpipe(). I can whip that up for you if you want to go this way. It is a pretty good idea because we can make drush commands for related operations like wiping the index, retrieving stats, reindexing a given node, etc.

#3

markus_petrux - November 8, 2008 - 10:25

The XMLPipe generator is invoked by the Sphinx indexer command, based on how the index sources are defined in sphinx.conf. Aside from that, Sphinx does not have an API that can be used to reindex documents. You can only update numeric attributes, or you have to rebuild indexes in batch using the Sphinx indexer command. I tried to explain this in the README.txt.
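For readers unfamiliar with that setup, an xmlpipe2 source in sphinx.conf looks roughly like the fragment below. The source and index names, the paths, and the generator script are placeholders for illustration, not values taken from the module's README.

```
# sphinx.conf (illustrative fragment; all names and paths are placeholders)
source drupal_main
{
    type            = xmlpipe2
    # The indexer runs this command and parses the XML it writes to stdout.
    xmlpipe_command = php /path/to/hypothetical-xmlpipe-generator.php
}

index drupal_main
{
    source = drupal_main
    path   = /path/to/sphinx/data/drupal_main
}
```

Rebuilding the index in batch then means running something like `indexer --config /path/to/sphinx.conf drupal_main`, which in turn launches the configured command.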

Coding wrappers for drush would not allow us to reindex a single document. So that leaves just one option. We need an entry point that will be invoked by the Sphinx indexer command.

Exposing sphinxsearch_xmlpipe() as a regular menu callback sounds good. But something should be done to prevent anyone from running this process from the browser. This process generates a huge XML stream that the Sphinx indexer parses to build an index.

#4

moshe weitzman - November 8, 2008 - 14:15
Status: active » needs review

This patch makes a menu callback for the main xmlpipe() function. IMO, we can now remove the whole scripts subdirectory (need to deal with .htaccess features?).

I've added an access callback for the menu item that checks IP just as before. Access callbacks must live in the .module so it has moved there. The $caller_version check is gone now since there is no xmlpipe.php and things cannot get out of sync.
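An IP check of this kind could be sketched as follows. The function name and the list of allowed addresses are hypothetical, and the check in the attached patch may differ.

```php
<?php
// Sketch only: the function name and the allowed-address list are
// hypothetical; the patch's actual check may differ.

/**
 * Returns TRUE when $remote_ip is one of the addresses allowed to index.
 */
function example_xmlpipe_ip_allowed($remote_ip, array $allowed_ips) {
  return in_array($remote_ip, $allowed_ips, TRUE);
}

// Inside a Drupal 6 access callback this could be called as:
//   example_xmlpipe_ip_allowed(ip_address(), array('127.0.0.1'));
```

Keeping the comparison in a small function that takes the address as an argument makes it easy to exercise without a full Drupal bootstrap.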

TODO: Update README-XMLPIPE.txt by removing the INSTALL section and stating that the URL for indexing is now http://www.example.com/xmlpipe?mode=main (for example). README.txt also mentions the scripts subdir, so that will need updating too. I'll do this if you agree with this proposal, or Markus is quite welcome to update the text as desired. I haven't played with this module yet and don't have the full understanding.

Attachment: mw.patch (5.43 KB)

#5

moshe weitzman - November 8, 2008 - 14:16

Oops. Now xmlpipe is a MENU_CALLBACK

Attachment: mw.patch (5.46 KB)

#6

markus_petrux - November 8, 2008 - 15:25

Thanks for chiming in. :)

For the menu callback I would prefer an access control method that allows everyone to use it. That way we can still keep the code related to XMLPipe generation in its own file, which is not needed during normal operation.

When an unauthorized request happens, it is important to send output that is as clean as possible, so it can be easily debugged when the Sphinx indexer is invoked. See this report for an example: #329509: how you authorize the indexer for requesting pages from drupal6-site?

If we send a Drupal 'access denied' page, then the error reported by the Sphinx indexer is hard to interpret.
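One way to keep that output clean is to print a single plain-text line and stop, instead of letting Drupal render its themed page. The sketch below assumes nothing about the module's existing code; both function names and the message wording are hypothetical.

```php
<?php
// Sketch only: function names and message wording are hypothetical.

/**
 * Builds the one-line message shown to an unauthorized caller.
 */
function example_xmlpipe_denied_message($remote_ip) {
  return "XMLPipe generator: access denied for $remote_ip\n";
}

/**
 * Prints the message as plain text and ends the request.
 */
function example_xmlpipe_deny($remote_ip) {
  if (!headers_sent()) {
    header('Content-Type: text/plain; charset=utf-8');
  }
  print example_xmlpipe_denied_message($remote_ip);
  exit;
}
```

A short text line like this is easier to spot in the indexer's error output than a full HTML page.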

Before removing the current XMLPipe script, we would have to test this under different scenarios (with or without CCK, taxonomies, etc.). drupal.sh may not reproduce all the environment variables required by the code that runs when the XMLPipe generator processes nodes to obtain the data to be indexed. I have never used drupal.sh, so I don't know.

Or maybe we can keep both methods, just in case.

#7

markus_petrux - November 8, 2008 - 15:25
Status: needs review » needs work
 
 
