Timeouts and unfinished operations
| Project: | Search and Replace |
| Version: | 6.x-1.1-beta4 |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Hi,
I'm encountering major problems with this module on a medium-sized site with a total of 27703 nodes; some very simple replace operations (removing an HTML comment like <!-- foo -->) led to timeouts, unfinished (and unfinishable) operations, and in the end to a heavily broken site.
The trouble started after upgrading from D5 to D6 a few days ago, when I noticed that HTML comments embedded in the node body were being displayed. I tried to replace those comments in the node bodies with the %NULL% operator provided by the "search and replace" module. First tests on a rarely used content type with 21 nodes seemed to work fine.
Timeouts started to occur on a content type with 209 nodes (counted by the "systeminfo" module). The site runs on a dedicated server with a dual-core CPU and 4 GB RAM, so I don't think the hardware is the limiting factor; when a tool times out with default settings on such a small subset of nodes, there might be a major design flaw in the code. With the PHP timeout set to the usual 30 seconds, I can import ~4000 images into image nodes through the "image import" module, or import the same amount of HTML files into "article" nodes through the "import html" module *without* timeouts. On the same machine, "search and replace" times out at roughly 1/20th of this "usual" processing volume, which limits the module's use quite heavily, imho. However, I ignored common sense (and did not immediately uninstall this module) and raised some PHP limits in settings.php:
ini_set('memory_limit', '256M');
ini_set('max_execution_time', '600');
ini_set('max_input_time', '600');
Then I tried to process two heavyweight content types with 12714 ("article") and 14723 ("image") nodes and left the module running untouched, with the browser open, for several hours. No errors or timeouts occurred, but no results were output either (something like "successfully processed 37 nodes"). Raising PHP limits is probably not a good idea at all; it might be better if the module used the Drupal Batch API or cron hooks, or implemented something like the "pending jobs" in MediaWiki.
Anyway, the results were very strange. The nodes that were touched seem to get a "new" flag at ./admin/content/node but receive no updated edit date (both are fine with me); but the images from all "image" nodes are gone, meaning they are no longer displayed, neither as thumbnails nor in the full node view. "Rebuild derivative images" gives me a message like
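To illustrate what I mean by batch-style processing, here is a rough sketch in plain PHP, not code from the module. It only mimics the shape of a Batch API operation: each call handles a small slice and records its progress in $context, so no single request has to run for minutes. The function names, the $bodies array, and the chunk size are all hypothetical; in a real Drupal 6 module the operation would be registered with batch_set() and would load and save actual nodes instead of array entries.

```php
<?php
// Hypothetical helper: remove every HTML comment from a node body.
function example_strip_html_comments($body) {
  // The "s" modifier lets "." match newlines, so multi-line comments go too.
  return preg_replace('/<!--.*?-->/s', '', $body);
}

// Hypothetical batch-style operation: processes at most $limit bodies per
// call and tracks progress in $context, similar to a Batch API callback.
function example_replace_operation(array &$bodies, $limit, array &$context) {
  if (!isset($context['sandbox']['position'])) {
    $context['sandbox']['position'] = 0;
  }
  $total = count($bodies);
  // Pick the next slice of keys, starting where the previous call stopped.
  $keys = array_slice(array_keys($bodies), $context['sandbox']['position'], $limit);
  foreach ($keys as $key) {
    $bodies[$key] = example_strip_html_comments($bodies[$key]);
    $context['sandbox']['position']++;
    $context['results'][] = $key;
  }
  // A value below 1 means "call me again"; 1 means the work is done.
  $context['finished'] = $total ? $context['sandbox']['position'] / $total : 1;
}

// Stand-in for node bodies, keyed by made-up node IDs.
$bodies = array(
  1 => 'a<!-- foo -->b',
  2 => 'c',
  3 => "<!-- x\ny -->d",
);
$context = array('sandbox' => array(), 'results' => array(), 'finished' => 0);

// In Drupal, each iteration would be a separate short request.
do {
  example_replace_operation($bodies, 2, $context);
} while ($context['finished'] < 1);

printf("successfully processed %d nodes\n", count($context['results']));
```

With a small chunk size, each step stays well below the default 30-second limit, and the final count gives the kind of "successfully processed N nodes" feedback that was missing in my runs.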
The derivative images for DSCN2722 have been regenerated.
Image DSCN2722 has been updated.
However, the image is still not displayed. I'll have to investigate in the database what might have happened to the image nodes (they did work before!).
Even worse affected are the "article" nodes that contain HTML links to other nodes and images inside the node body (input format: "full html"). For some bizarre reason, images are no longer displayed on the touched nodes either, and almost all links I have checked so far are broken (meaning they lead to a "page not found" message). The code in the node body seems to be all right, and if I paste the img src paths like ../../../../../../../../sites/default/files/legacy/at/reisen/2003/fotos/digital/dimage/2003-08-01/images/pict2267.jpg directly into the browser's location bar, the images are displayed correctly. It appears as if some mechanism that processes the node bodies has gone out of sync. Drupal never changes the content inside the node body by itself (that is handled through the chain of input filters), so maybe it has not been properly notified that the node bodies have changed? I absolutely do not understand this, since I did not replace any parts of the paths referencing nodes and images, and the paths in the HTML code itself still seem to be correct. Somehow it appears to be a display problem.
In D5, I did some major work over the past few years with the "Node find and replace" module and never experienced content destruction as massive as this, and I have absolutely no idea yet what might have happened. I'll try to investigate this and then most probably fall back to the last backup from a few days ago (my fault for not making a backup directly before using an unknown module).
I'm sorry for this vague bug report, but at the moment I can only describe the symptoms ;-/
After these experiences I'd strongly recommend (a) not using this module for any code/path replacements containing characters like .. / < or >; (b) making a backup before any operation; (c) never using the module for operations affecting more than a few dozen nodes; (d) enabling versioning on the content types you're processing (however, I don't know how this module handles versions).
Greetings, -asb

#1
Hi,
an update: the "images not displaying" problem seems to resolve itself; somehow a reindex of the whole site was triggered, and as the 27k nodes are indexed, the images reappear.
However, the problem with timeouts and unfinished operations remains (thus changing title and priority).
Greetings, -asb