Creation of index not proceeding due to memory limits
| Project: | Fuzzy Search |
| Version: | 5.x-1.2 |
| Component: | Code |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | active |
When generating the index tables (via cron), I get the following messages:
Fatal error: Allowed memory size of 41943040 bytes exhausted (tried to allocate 131 bytes) in /home/www/web14/html/drupal/includes/database.mysql.inc on line 157
Fatal error: Allowed memory size of 41943040 bytes exhausted (tried to allocate 24 bytes) in /home/www/web14/html/drupal/includes/database.mysql.inc on line 158
Fatal error: Allowed memory size of 41943040 bytes exhausted (tried to allocate 131 bytes) in /home/www/web14/html/drupal/includes/database.mysql.inc on line 157
Fatal error: Allowed memory size of 41943040 bytes exhausted (tried to allocate 131 bytes) in /home/www/web14/html/drupal/includes/database.mysql.inc on line 157
Fatal error: Allowed memory size of 41943040 bytes exhausted (tried to allocate 81 bytes) in /home/www/web14/html/drupal/includes/database.mysql.inc on line 399
The tables search_index_queue and search_fuzzy_index contain data, but they no longer change.
So the index appears to be incomplete.
Regards
Schildi

#1
How many nodes are being indexed on one cron run?
Also, I have it set up so that it queries the queue table to find out which nodes are tagged for indexing. The index function then indexes one node at a time, first deleting the previously indexed content for that node and then inserting the new indexed content.
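The per-node "delete, then insert" step described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the module's actual code: an in-memory array stands in for the index table, the function names are hypothetical, and lowercased words stand in for whatever tokens the real module stores.

```php
<?php
// Hypothetical sketch: $index is an in-memory stand-in for the index
// table, mapping each node ID to the tokens indexed for that node.

/**
 * Re-index one node: first delete whatever was previously indexed
 * for it, then insert the freshly computed tokens.
 */
function sketch_index_node(array &$index, int $nid, string $body): void {
  // Step 1: delete the previously indexed content for this node.
  unset($index[$nid]);

  // Step 2: insert the new indexed content. Lowercased alphanumeric
  // words are an assumed placeholder for the module's real tokens.
  $words = preg_split('/[^a-z0-9]+/', strtolower($body), -1, PREG_SPLIT_NO_EMPTY);
  $index[$nid] = $words;
}

/**
 * Index every queued node, one node at a time.
 * $nodes maps node ID => body text (hypothetical data source).
 */
function sketch_index_queued(array &$index, array $queue, array $nodes): void {
  foreach ($queue as $nid) {
    sketch_index_node($index, $nid, $nodes[$nid]);
  }
}
```

Because each node's old entries are removed before the new ones are written, re-indexing an edited node never leaves stale tokens behind.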
Blake
#2
How can I determine the number of nodes indexed in one run? It does not seem to be configurable.
The table search_index_queue contains a timestamp, but all 1449 entries have a value of either 1188986312 or 1188986313.
#3
One thing to try, just to see whether the problem is that it's indexing too many nodes at once, is to add a limit to the query in the function fuzzysearch_cron().
The query should look like this:
$query = db_query("SELECT nid FROM {search_index_queue} LIMIT 0, 100");
instead of this:
$query = db_query("SELECT nid FROM {search_index_queue}");
If you make that change and try again, that should help us confirm or rule out that indexing too many nodes at once is what's using too much memory. The only other cause I can think of is extremely long nodes, but filling 40 MB would take a lot of text, so that's probably not likely. Do you have any image/video processing running at cron? I know I've had some issues with imagecache when uploading images larger than about 6 megapixels (it requires a lot of memory to process the images). This is definitely a high priority for me to fix.
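The effect of the LIMIT is that each cron run handles only a bounded batch of the queue, and later runs pick up the rest. The sketch below illustrates that batching with an in-memory array instead of the search_index_queue table. The function names are hypothetical, and it assumes that processed entries are removed from the queue after indexing so the next run continues where the previous one stopped.

```php
<?php
// Hypothetical sketch of batched cron processing. Each run takes at
// most $limit node IDs from the front of the queue (the effect of
// "LIMIT 0, 100"), indexes them, and removes them from the queue.
// Removing processed entries is an assumption about how the queue drains.

function sketch_cron_run(array &$queue, int $limit, callable $indexNode): int {
  if ($limit < 1) {
    throw new InvalidArgumentException('limit must be at least 1');
  }
  $batch = array_slice($queue, 0, $limit);      // like LIMIT 0, $limit
  foreach ($batch as $nid) {
    $indexNode($nid);                           // index one node at a time
  }
  $queue = array_slice($queue, count($batch));  // drop processed entries
  return count($batch);
}

// Count how many cron runs it takes to drain a queue of $total nodes.
function sketch_runs_needed(int $total, int $limit): int {
  $queue = $total > 0 ? range(1, $total) : [];
  $runs = 0;
  while ($queue) {
    sketch_cron_run($queue, $limit, function (int $nid): void {});
    $runs++;
  }
  return $runs;
}
```

For example, `sketch_runs_needed(1449, 100)` returns 15: with 1449 queued nodes and batches of 100, the queue is drained after 15 cron runs. The trade-off is that a full re-index takes longer overall, but the work done in any single cron run stays bounded.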
#4
No, there is no video/image processing while the index is generated.
As you advised, I added the LIMIT clause to the query. Now I can see that indexing is done in batches of 100 nodes.
It will take a while before I can see whether the problem remains...
OK, it looks like the memory problem is solved with the current configuration. The process got past the 1449-node mark without further complaints.
But I'll have to wait until tomorrow to make sure that it finishes without problems.
#5
Done. No errors raised.
The bug is fixed and the issue can be closed.
Best regards
Schildi
#6
Great to hear. So adding the limit to the query solved the problem? If so, I'll patch it and update the release snapshot.
Also, how has your experience been so far with result relevancy? Any completely unexpected results, or has it been quite useful? I haven't had the opportunity to test it on a site as large as yours.
#7
Of course, you are welcome to check your fuzzy search using the (hopefully growing) archive at archiv.bgv-rhein-berg.de. Currently about 2/3 of the content is publicly visible. Please select "Unscharfe Suche" (fuzzy search) for fuzzy searching.
It would be great to have the matching words/phrases highlighted. As far as I can see, the displayed excerpt usually does not contain the matching word, so users have to open the original text and search for it themselves (which can be difficult if they are unsure of the spelling).
For example, when I search for "Otto-Hahn Schule" I get a long list of hits, but none of the first six or more entries has anything to do with this search phrase (maybe this is a bit unfair, since there is currently no entry matching the phrase). Could you output a numeric relevancy score to give the user a hint?
On the other hand, searching for the famous duke "Jan Wellem", who appears several times in the archive, yields results where I can't find the name in the first three entries (I did not check further). So there seem to be some glitches remaining.
Regards
Schildi
Please contact me for further testing. You can reach me by email (see the Impressum).