Running a pretty simple Drupal 6.16 site with the latest versions of the following modules:
- Contemplate
- Feedapi/Feedapi_eparser/feedapi_mapper
- Nodewords
- opensearch
- pathauto
- search_config (meanwhile disabled)
- site_verify
- token
- I had CCK enabled, but removed it after I read the issues with caching, search indexing and CCK...

The site imports feeds and turns them into blog entries. CRON is running fine. The Search setting shows new nodes coming in to be indexed, and each time CRON runs, it hits "100% indexed".

All went fine until a week ago, when I experimented with the standard cache module (enabled it, then set it to 'Aggressive')...

Now, no matter what I do, it seems the nodes are indexed but search results do not yield any nodes more recent than 1 week ago, when I enabled the cache.
I re-indexed the site (about 180,000 nodes, took 4 days of CRON runs, ouch)... but still no change: search results only go up to one week ago. I experimented by creating dummy blog entries manually and waiting for the next CRON run (100% indexed), but got the same problem.

Changing "Content ranking" in the Search settings yields different results, but still no posts more recent than 1 week ago.

I have no clue where to look anymore for any possible solution. Any help welcome!

Peter

Comments

cog.rusty:

It seems that the bigger issue with search will stay open even until Drupal 8.
http://drupal.org/node/286263

In threads about possible causes, people usually point to PHP code snippets inside node content that do redirects (drupal_goto(), header(), etc.), or even modules that do redirects inside node content.

Maybe you could examine your search_index and search_dataset tables, find the last nid (last sid of type 'node') in there and, if it is old, examine the body of that node and of the next node for anything out of the ordinary.
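For anyone who wants to run that check directly, here is a minimal sketch in MySQL. It assumes the stock Drupal 6 table names with no table prefix (add your prefix if your site uses one); the column aliases are only illustrative.

```sql
-- Highest node id that has made it into the search dataset.
SELECT MAX(sid) AS last_indexed_nid
FROM search_dataset
WHERE type = 'node';

-- Highest node id that exists at all, for comparison.
SELECT MAX(nid) AS last_nid
FROM node;

-- Nodes that are still flagged for reindexing (non-zero reindex value).
SELECT COUNT(*) AS pending_reindex
FROM search_dataset
WHERE type = 'node' AND reindex <> 0;
```

If the first two numbers are close and the third is near zero, indexing itself is probably keeping up and the problem is more likely on the query side.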

petercasier:

@cog.rusty:

Thanks for offering to help.

The last nids for both the search_index and search_dataset tables are very recent (a few minutes ago).
I am a novice, but to me it looks like everything is indexed, but there might be something wrong with the queries?

P.

cog.rusty:

In case it is an access permissions problem, are you doing the searches as a user who can access everything?

petercasier:

Yes, as the admin.

maciej.zgadzaj:

Following up from Twitter: you can try temporarily adding the following lines to your cron.php:

ini_set('error_reporting', E_ALL);
ini_set('display_errors', 1);

and then execute it directly from shell:

php cron.php

to see if you get any errors.

Do you have any specific content type permissions defined?
Are all those nodes published?
Do any of them contain PHP code which could throw an exception?

maciej.zgadzaj:

Or, you can use:

ini_set('error_reporting', E_ALL & ~E_NOTICE);

if you don't want your cron to display PHP notices (in case you get any that interfere with cron's proper execution).

petercasier:

BTW: I rebuilt the node permissions just in case.

After that, I started reindexing the site again.

I get errors like:

user warning: Duplicate entry '78-node' for key 1 query: INSERT INTO search_dataset (sid, type, data, reindex) VALUES (78, 'node', ' how nonprofit organizations etc etc etc ', 0) in /home/content/h/u/m/humanitarianne/html/modules/search/search.module on line 571.

...for quite a few nodes...
I don't know if this is relevant to the problem I am experiencing.

Peter

petercasier:

Hmm... apparently those errors only occurred for the first hundred or so nodes... after that, all is proceeding normally...

Peter

petercasier:

Thanks for following up.

On GoDaddy, I have shell access, but it seems I cannot execute PHP commands:

Unable to use the MySQLi database because the MySQLi extension for PHP is not installed...

Is there any other way I can get the errors displayed?

To answer your questions:
- no, I have no specific content type permissions defined. (I read about the CCK permissions problem with caching, but I disabled that module as I could never see the permissions setting in the user settings).
- all nodes are published
- there is no PHP code in any of the posts (input is from feeds, PHP is filtered out)

Peter

maciej.zgadzaj:

Frankly speaking, I do not have any good idea what your problem could be.

What I would probably do in this case is manually create a new page (in your own module) that calls a function which (at least partially) replicates the search functionality: have it reindex only this one (or a few) specific node(s), and see whether it works, how it works, and whether it returns any errors...

petercasier:

Thanks Maciej...

That was one of the first things I tried: to manually create a node (a blog entry in my case, same as the FeedAPI-imported nodes) and reindex it... But unfortunately, it does not appear in the search either...

In a last attempt, I reset the node permissions, and I am once more reindexing the search...

Peter

cog.rusty:

By the way, check if the collation of the index tables is the same as the collation of all the other tables, and especially the node revisions table (either all of them utf8_general_ci or all of them utf8_unicode_ci).

petercasier:

Sorry, beyond my capacity, I am afraid. What would be the commands to check that?

cog.rusty:

If you use phpMyAdmin, you will see them listed on the page that shows all the tables. There is a "Collation" column.
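Without phpMyAdmin, the same information can be read from MySQL itself. This is a rough sketch, assuming MySQL 5.0 or later (for information_schema), the stock Drupal 6 table names, and no table prefix:

```sql
-- Table-level collation for the search tables and the node tables.
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('search_index', 'search_dataset', 'node', 'node_revisions');

-- Individual columns can carry their own collation, so it may be
-- worth looking at those too for any table that seems suspicious.
SHOW FULL COLUMNS FROM search_index;
```

All rows should report the same collation (for example, all utf8_general_ci).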

petercasier:

Indeed, all tables are utf8_general_ci

Peter

maciej.zgadzaj:

Actually, what I had in mind was creating your own little test module with a simple hook_menu() calling a function that replicates the behavior of search's reindexing, but reindexing only that one specific node you have problems with. This way you could monitor the whole process step by step and probably find out at which point exactly it fails.

roadsideok:

This was happening to me too. I had recently added Panels and Chaos Tools, so (after trying many other things) I disabled those modules and now my search is working again. Fortunately I wasn't using those modules yet, and don't really need to. You don't seem to use those modules, but perhaps one of the ones you are using is new and causing problems?

petercasier:

@roadsideok:

When you disabled the modules, did you have to run a re-index on the search for your users to see the up-to-date posts, or did the results "reappear" as soon as you disabled the "offending" modules?

(and no, I don't run Panels or Chaos)

Peter.

petercasier:

The problem is related to this issue: http://drupal.org/node/488166

See node.module:

In the search, the weight of a node is calculated as:

if ($weight = (int)variable_get('node_rank_recent', 5)) {
        // Exponential decay with half-life of 6 months, starting at last indexed node
        $ranking[] = '%d * POW(2, (GREATEST(MAX(n.created), MAX(n.changed), MAX(c.last_comment_timestamp)) - %d) * 6.43e-8)';

However, if there is no entry in node_comment_statistics for a given node, c.last_comment_timestamp will be NULL. In my MySQL version, GREATEST(any number, NULL) returns NULL.
Thus the whole ranking expression is returned as NULL.
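A quick way to see this behavior and to estimate how many nodes are affected is sketched below. It assumes the stock Drupal 6 table names with no prefix, and the timestamps are made-up example values:

```sql
-- In current MySQL versions, GREATEST returns NULL as soon as
-- any one of its arguments is NULL.
SELECT GREATEST(1270000000, 1270000500, NULL) AS newest;

-- Count nodes that have no row in node_comment_statistics,
-- i.e. nodes for which c.last_comment_timestamp would be NULL
-- when the search query LEFT JOINs that table.
SELECT COUNT(*) AS nodes_without_comment_stats
FROM node n
LEFT JOIN node_comment_statistics c ON c.nid = n.nid
WHERE c.nid IS NULL;
```

If the second query returns a large number that roughly matches the nodes missing from the search results, that supports this explanation.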

So the patch would be: (all in node.module)

if ($weight = (int)variable_get('node_rank_recent', 5)) {
        // Exponential decay with half-life of 6 months, starting at last indexed node
        $ranking[] = '%d * POW(2, (GREATEST(MAX(n.created), MAX(n.changed), MAX(c.last_comment_timestamp) || 1) - %d) * 6.43e-8)';
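One caveat about the `|| 1` in this patch, offered as an observation rather than a tested fix: with MySQL's default sql_mode, `||` is a logical OR, so the third argument becomes 1 for every node, whether or not it has comments. That removes the NULL, but it also takes the comment timestamp out of the ranking. A possible alternative (my assumption, not something confirmed in the linked issue) is COALESCE, which only replaces the NULL case. The timestamps below are made-up example values:

```sql
-- Default sql_mode (no PIPES_AS_CONCAT): || is logical OR,
-- so both columns come back as 1.
SELECT NULL || 1 AS without_stats, 1270000000 || 1 AS with_stats;

-- COALESCE substitutes 0 only when the value is NULL, so an
-- existing comment timestamp would still take part in GREATEST.
SELECT GREATEST(1270000000, 1270000500, COALESCE(NULL, 0)) AS newest;
```

In the ranking expression this would mean writing COALESCE(MAX(c.last_comment_timestamp), 0) in place of MAX(c.last_comment_timestamp) || 1.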

My assumption (I am not sure of this) is that at a certain moment I switched off the comments module, and from that time on (which is when I discovered the search no longer working) c.last_comment_timestamp started to be NULL... so from then on, the search no longer returned the most recent results.

Thanks to Ema for the offline help in debugging!!!!

PS: I can see that the problem is there in 6.17 too.