Hi,

On our site we have this node:

http://www.freesoftwaremagazine.com/articles/beginners_guide_understandi...

Its title is "Having your cake and eating it" and it contains the word cake many times.

However, when I search for the term cake:

http://www.freesoftwaremagazine.com/search/node/cake

The article does not appear.

Is this a bug or do we need to do something to get search working properly?

This is not an isolated incident. I have noticed this on other occasions, when I searched for other terms and did not find articles that contain those terms many times.

Comments

dmguard’s picture

Anyone?

VM’s picture

First and foremost, you should be running Drupal 5.1.

That aside, are you sure you are indexing the necessary pages, in administer -> search settings?

mercmobily’s picture

Version: 5.0-rc2 » 5.1

Hi,

Just stepping in...
We are actually running Drupal 5.1 - sorry.

Now... the page in administer -> search settings doesn't actually allow you to select what to search for.
The page just says:

100% of the site has been indexed. There are 0 items left to index.

I couldn't find an "index this node type" option in http://www.freesoftwaremagazine.com/admin/content/types ... are we missing something?

Merc.

mercmobily’s picture

Hi,

Hummm it looks like this should be a bug report. I looked and looked, and some pages just aren't getting indexed...
The painful part is that reproducing this problem would be a huge pain.

Merc.

mercmobily’s picture

Title: Searching doesn't provide correct results » Searching doesn't provide correct results - some pages are not indexed
Category: support » bug

Changed into a bug report.
What can I actually do to see what's causing it?
The node 1740 has the word "cake" several times.
However, searching the index returns:

mysql> select * from search_index where word='cake';
+------+------+------+---------+----------+----------+
| word | sid  | type | fromsid | fromtype | score    |
+------+------+------+---------+----------+----------+
| cake | 1445 | node |       0 | NULL     | 0.166986 |
| cake | 1580 | node |       0 | NULL     | 0.504001 |
| cake | 1627 | node |       0 | NULL     |        1 |
| cake | 1630 | node |       0 | NULL     | 0.938382 |
| cake | 1675 | node |       0 | NULL     | 0.653974 |
| cake | 1367 | node |       0 | NULL     |  0.24279 |
| cake | 1508 | node |       0 | NULL     | 0.873132 |
| cake | 2052 | node |       0 | NULL     |  0.61241 |
| cake | 2168 | node |       0 | NULL     | 0.162972 |
+------+------+------+---------+----------+----------+
9 rows in set (0.00 sec)
mysql>

So, the problem is in the indexing.

Now... I ran a re-index of the whole site. The same query now returns:

mysql> select * from search_index where word='cake';
+------+------+------+---------+----------+----------+
| word | sid  | type | fromsid | fromtype | score    |
+------+------+------+---------+----------+----------+
| cake | 1580 | node |       0 | NULL     | 0.504001 |
| cake | 1627 | node |       0 | NULL     |        1 |
| cake | 1630 | node |       0 | NULL     | 0.938382 |
| cake | 1367 | node |       0 | NULL     |  0.24279 |
| cake | 1508 | node |       0 | NULL     | 0.873132 |
| cake | 2052 | node |       0 | NULL     |  0.61241 |
| cake | 2168 | node |       0 | NULL     | 0.162972 |
| cake | 1445 | node |       0 | NULL     | 0.166986 |
| cake | 1844 | node |       0 | NULL     | 0.730038 |
| cake | 1675 | node |       0 | NULL     | 0.653974 |
| cake | 1977 | node |       0 | NULL     |  3.81698 |
| cake | 2022 | node |       0 | NULL     | 0.736141 |
| cake | 1740 | node |       0 | NULL     |  45.8584 |
| cake | 2127 | node |       0 | NULL     | 0.707906 |
| cake | 2102 | node |       0 | NULL     | 0.533197 |
| cake | 2110 | node |       0 | NULL     | 0.246807 |
| cake | 1265 | node |       0 | NULL     | 0.430926 |
+------+------+------+---------+----------+----------+
17 rows in set (0.00 sec)
mysql>

1740 is there!
(And a lot of other stuff as well...)

Is this a known problem with Drupal's indexing system?
Or can I consider this a "one-off" thing?

Merc.

dropcube’s picture

Component: search.module » node.module
Assigned: Unassigned » dropcube

I think that the problem is in node.module, not in search module. Some nodes are not being indexed.

  $result = db_query_range('SELECT GREATEST(IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp), n.changed) as last_change, n.nid FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid WHERE n.status = 1 AND ((GREATEST(n.changed, c.last_comment_timestamp) = %d AND n.nid > %d) OR (n.changed > %d OR c.last_comment_timestamp > %d)) ORDER BY GREATEST(n.changed, c.last_comment_timestamp) ASC, n.nid ASC', $last, $last_nid, $last, $last, $last, 0, $limit);

The above query in node_update_index() collects the nodes to index. It does not work correctly (at least in my MySQL version, 5.0.15).

If a node has no comments, then c.last_comment_timestamp will have a NULL value in the resulting records. So GREATEST(n.changed, c.last_comment_timestamp) will return NULL if c.last_comment_timestamp is NULL, which causes some nodes to be skipped and never indexed.

A correct query may be like this:

  $result = db_query_range('SELECT GREATEST(IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp), n.changed) as last_change, n.nid FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid WHERE n.status = 1 AND ((GREATEST(n.changed, c.last_comment_timestamp) = %d AND n.nid > %d) OR (n.changed > %d OR c.last_comment_timestamp > %d)) ORDER BY last_change ASC, n.nid ASC', $last, $last_nid, $last, $last, $last, 0, $limit);

This has solved the problem on my websites; all nodes are being indexed correctly now.
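
To make the NULL behaviour concrete, here is a minimal sketch rather than anything from node.module. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL: SQLite's multi-argument MAX() returns NULL when any argument is NULL, which is how GREATEST() is described as behaving on this MySQL version. COALESCE(col, 0) plays the role of the IF(col IS NULL, 0, col) guard, and the two-node data set is made up for illustration.

```python
import sqlite3

# Assumption: SQLite's multi-argument MAX() stands in for MySQL's GREATEST().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, changed INTEGER);
    CREATE TABLE node_comment_statistics (
        nid INTEGER PRIMARY KEY,
        last_comment_timestamp INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO node VALUES (1, 100), (2, 100);
    -- Node 2 deliberately has no statistics row.
    INSERT INTO node_comment_statistics VALUES (1, 100);
""")

JOIN = "FROM node n LEFT JOIN node_comment_statistics c ON n.nid = c.nid"


def nids_matching(key_expr, timestamp):
    """Return the nids whose computed key equals the given timestamp."""
    sql = f"SELECT n.nid {JOIN} WHERE {key_expr} = ? ORDER BY n.nid"
    return [row[0] for row in conn.execute(sql, (timestamp,))]


# Without a guard, node 2's key is NULL, so "key = 100" is never true for it.
unguarded = nids_matching("MAX(n.changed, c.last_comment_timestamp)", 100)
# With the guard, the missing timestamp counts as 0 and node 2 matches.
guarded = nids_matching(
    "MAX(n.changed, COALESCE(c.last_comment_timestamp, 0))", 100)

print(unguarded, guarded)
```

The unguarded comparison finds only node 1, while the guarded one finds both nodes, which is the difference between a node being picked up by the indexer and being silently skipped.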

Also, the query at node_search() should be re-written in order to obtain correct statistics at search settings page (fixing the same bug).

    case 'status':
      $last = variable_get('node_cron_last', 0);
      $last_nid = variable_get('node_cron_last_nid', 0);
      $total = db_result(db_query('SELECT COUNT(*) FROM {node} WHERE status = 1'));
      $remaining = db_result(db_query('SELECT COUNT(*) FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid WHERE n.status = 1 AND ((GREATEST(IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp), n.changed) = %d AND n.nid > %d ) OR (n.created > %d OR n.changed > %d OR c.last_comment_timestamp > %d))', $last, $last_nid, $last, $last, $last));

mercmobily’s picture

Hi,

This is quite an important bug! For us, it means that some of the 2000+ articles we have published are not indexed...

Merc.

dropcube’s picture

Hi,

If your articles do not have comments, it is likely that they aren't indexed.

I have modified the module, and now it is working OK in my websites.

I don't know how to contribute to solve the problem....

PS: Let me know if you understand what I was explaining in my previous comment.

mercmobily’s picture

Hi,

Herc: what you write seems to make a lot of sense. However, I feel very uneasy modifying the core...
I would love to see a core developer step in and actually apply the relevant changes to Core!

(I wish I were able to...)

Merc.

MauMau’s picture

Title: Searching doesn't provide correct results - some pages are not indexed » A solution to: Searching doesn't provide correct results - some pages are not indexed

I had exactly the same problem. Lots of new content. None of it being indexed.

Changing the sql-query didn't work for me.

But telling Drupal to index a node on submit works wonders.

In node.module you put the hack just below

 // Clear the cache so an anonymous poster can see the node being added or updated.
  cache_clear_all();

(usually line 652)

The hack looks like this:

    // Build the node body.
    $node = node_build_content($node, FALSE, FALSE);
    $node->body = drupal_render($node->content);

    $text = '<h1>'. check_plain($node->title) .'</h1>'. $node->body;

    // Fetch extra data normally not visible
    $extra = node_invoke_nodeapi($node, 'update index');
    foreach ($extra as $t) {
      $text .= $t;
    }

    // Update index
    search_index($node->nid, 'node', $text);

Good things:

  • You will not notice that the Drupal search indexing is broken.
  • New content will show up in the search results immediately.

I wonder very much why Drupal just stops indexing new content (while having no problems reindexing the full site).
But I don't have the answer.

I wonder a bit why Drupal doesn't just index nodes on submit. Why wait for the cron-job?
But above you see my solution.

dropcube’s picture

After modifying the query, you should go to Administer > Site configuration > Search Settings and Re-Index site. Of course, also run cron until all nodes get indexed.

How can we help to fix this bug ????

robertDouglass’s picture

Version: 5.1 » 6.x-dev

Subscribing. Changing the version number because the proper workflow will be to fix it in D6 and then backport it. Most likely the bug (and the fix) will apply to 4-6, 4-7, 5-0 and 6.

deville’s picture

Version: 6.x-dev » 4.7.6

I have the same problem on two 4.7 sites - surprised this hasn't been noticed before.

I tried the new sql-query as above but it did not help.

And I can't make the index-on-submit mod either because I believe that is for 5.0 only.

Subscribing.

douggreen’s picture

Version: 4.7.6 » 6.x-dev

Subscribing and changing back to 6.x for reasons Robert mentioned.

I've run into indexing problems as well, this from the devel list: http://lists.drupal.org/pipermail/development/2007-April/023272.html

catch’s picture

This is a duplicate of http://drupal.org/node/57106 but leaving it active in case it's a different bug.

Out of 15,000 nodes on my site, I've only ever managed to successfully index the first 319. So I'm very interested in testing/reviewing to get it sorted out.

m3avrck’s picture

subscribing

catch’s picture

http://drupal.org/node/139537#comment-238296

is working for me - I've got 7% of my site indexed so far with the changes to node.module - will take a few hours to get everything done. status on search settings page also seems to be working fine.

Assuming it gets to 100% I'll roll it into a patch later tonight or tomorrow with a bit of luck.

catch’s picture

Status: Active » Needs review
FileSize
2.66 KB

OK I rolled http://drupal.org/node/139537#comment-238296 into a patch against HEAD.

This fix (manually applied) is working great on my 5.1 site as of today, I'm 15% through 14,000 nodes and still running - all other fixes I've seen haven't got me past 320.

catch’s picture

Title: A solution to: Searching doesn't provide correct results - some pages are not indexed » search indexing gets stuck: node_index()
Priority: Normal » Critical

updating title. Setting as critical.

robertDouglass’s picture

Do you end up with case 'status': twice in your code?

     case 'status':
+      case 'status':
m3avrck’s picture

Status: Needs review » Needs work

Yeah patch has duplicate "case: 'status'" in there...

catch’s picture

Status: Needs work » Needs review
FileSize
2.62 KB

Ack! My fault for testing first and patching after, instead of the other way 'round.

New patch without duplicate status attached.

Dries’s picture

When a node is created, comment_nodeapi does:

      db_query('INSERT INTO {node_comment_statistics} (nid, last_comment_timestamp, last_comment_name, last_comment_uid, comment_count) VALUES (%d, %d, NULL, %d, 0)', $node->nid, $node->created, $node->uid);

This is strange because herc reported that the problem is caused by the fact that last_comment_timestamp is NULL when the node has no comments yet.

Also, in system.install, last_comment_timestamp is defined as int NOT NULL default '0'.

Wouldn't this mean that the IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp) checks are redundant?

I'd like to see us look into this a bit better. I don't feel comfortable committing this patch until we/I really understand what is going on. Thanks.

catch’s picture

That makes sense, I don't have any null values in last_comment_timestamp at all. However that change has worked (up to 60% now, 5,000 odd to go).

dropcube’s picture

Dries, if a node has no comments, then c.last_comment_timestamp will have a NULL value in the resulting records. So GREATEST(n.changed, c.last_comment_timestamp) will return NULL if c.last_comment_timestamp is NULL, which causes some nodes to be skipped and never indexed.

Note that if the comments module is not enabled, comment_nodeapi is not executed when a new node is created.

So c.last_comment_timestamp is NULL in the resulting record set because of the LEFT JOIN clause, even though last_comment_timestamp is never NULL in the node_comment_statistics table itself. The LEFT JOIN finds no matching record in node_comment_statistics, so NULL is returned for that column.

This only happens when the comments module is not enabled.
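
The point that a NOT NULL column can still come back as NULL through a LEFT JOIN can be checked with a small sketch. It uses SQLite through Python's sqlite3 as a stand-in for MySQL, and the three-node data set is invented for illustration; only node 1 has a statistics row, as if the other two were created while comment.module was disabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY);
    CREATE TABLE node_comment_statistics (
        nid INTEGER PRIMARY KEY,
        last_comment_timestamp INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO node VALUES (1), (2), (3);
    -- Hypothetical data: nodes 2 and 3 never got a statistics row.
    INSERT INTO node_comment_statistics (nid, last_comment_timestamp)
    VALUES (1, 0);
""")

# Looking at the table alone, there are no NULL timestamps at all.
nulls_in_table = conn.execute(
    "SELECT COUNT(*) FROM node_comment_statistics "
    "WHERE last_comment_timestamp IS NULL"
).fetchone()[0]

# Through the LEFT JOIN, every node without a matching row yields NULL.
nulls_in_join = conn.execute(
    "SELECT COUNT(*) FROM node n "
    "LEFT JOIN node_comment_statistics c ON n.nid = c.nid "
    "WHERE c.last_comment_timestamp IS NULL"
).fetchone()[0]

print(nulls_in_table, nulls_in_join)
```

This would also be consistent with a site reporting no NULL values in the table while the indexing query still sees NULLs for some nodes.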

If a node has not comments yet, then

David Lesieur’s picture

Since herc mentioned he was using MySQL 5.0.15, this bug seems related to this issue.

Dries’s picture

If a node has not comments, then c.last_comment_timestamp will have a NULL value in the resulting records.

I don't see how that is possible because we execute the code below when the node is created:

 case 'insert':
      db_query('INSERT INTO {node_comment_statistics} (nid, last_comment_timestamp, last_comment_name, last_comment_uid, comment_count) VALUES (%d, %d, NULL, %d, 0)', $node->nid, $node->created, $node->uid);
      break;
Dries’s picture

So, this means that the problem only triggers when the comment.module is disabled?

catch’s picture

I have comment module enabled on my site (although probably not while the first 2,000 or so nodes were added, where the indexing was getting stuck on). No NULL values in last_comment_timestamp though.

Can confirm that the patch I posted at #25 has taken my search index from less than 1% to 100% indexed though.

Dries’s picture

I looked at this some more, and I now understand what you were saying. (It's Monday morning! I'm slow to boot.)

It's a bit of an awkward situation. Both the node module and the comment module manipulate the node_comment_statistics table. They have a 1-to-1 relationship and sometimes that relationship is not maintained properly, it seems.

Three possible solutions:

1. Merge the node_comment_statistics table into the node table -- it would eliminate many joins and avoid problems like this. It might also result in poor cache performance.

2. Update the node_comment_statistics table in the node module.

3. The proposed patch to work around the inconsistencies in the mapping.

catch’s picture

Although I'd love my patch to get committed ;) node_comment_statistics seems to be causing problems all over the place at the moment, and since it's generally joins, then a loss in caching performance might be outweighed by eliminating the joins?

tracker: http://drupal.org/node/105639#comment-246235
search results: http://drupal.org/node/106659

Dries’s picture

I think it would, yes -- but benchmarks could help make the point. ;-)

Dries’s picture

See also http://drupal.org/node/148849: merge {node_comment_statistics} and {node_counter} into {node}.

robertDouglass’s picture

arjenk’s picture

There is still a small problem with the last patch: the second reference to last_comment_timestamp, which is not wrapped in IF(), should also be taken care of. On my site this caused some nodes to fall between two cron runs and never get indexed. So I would propose to replace:

$result = db_query_range('SELECT GREATEST(IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp), n.changed) as last_change, n.nid FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid WHERE n.status = 1 AND ((GREATEST(n.changed, c.last_comment_timestamp) = %d AND n.nid > %d) OR (n.changed > %d OR c.last_comment_timestamp > %d)) ORDER BY last_change ASC, n.nid ASC', $last, $last_nid, $last, $last, $last, 0, $limit);

by

$result = db_query_range('SELECT GREATEST(IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp), n.changed) as last_change, n.nid FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid WHERE n.status = 1 AND ((GREATEST(n.changed, IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp)) = %d AND n.nid > %d) OR (n.changed > %d OR c.last_comment_timestamp > %d)) ORDER BY last_change ASC, n.nid ASC', $last, $last_nid, $last, $last, $last, 0, $limit);

This problem occurs when several nodes share the same changed date and the cron job processes only some of them (because of the $limit setting). The next run will incorrectly skip the remaining nodes.
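
The skipping between cron runs can be reproduced with a small simulation. This is a sketch under stated assumptions, not the node.module code: SQLite's multi-argument MAX() stands in for GREATEST() as it behaves on MySQL 5.0.13 and later, COALESCE stands in for the IF() guard, the status filter is left out, and the five nodes with identical timestamps are made-up data. The function name run_cron_batches is hypothetical.

```python
import sqlite3

UNGUARDED = "MAX(n.changed, c.last_comment_timestamp)"
GUARDED = "MAX(n.changed, COALESCE(c.last_comment_timestamp, 0))"


def run_cron_batches(where_key, limit=2, max_runs=10):
    """Simulate repeated cron runs; return the nids that got indexed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (nid INTEGER PRIMARY KEY, changed INTEGER);
        CREATE TABLE node_comment_statistics (
            nid INTEGER PRIMARY KEY,
            last_comment_timestamp INTEGER NOT NULL DEFAULT 0
        );
        -- Five nodes imported with the same timestamp, no statistics rows.
        INSERT INTO node VALUES (1, 100), (2, 100), (3, 100), (4, 100), (5, 100);
    """)
    query = f"""
        SELECT {GUARDED} AS last_change, n.nid
        FROM node n
        LEFT JOIN node_comment_statistics c ON n.nid = c.nid
        WHERE ({where_key} = ? AND n.nid > ?)
           OR (n.changed > ? OR c.last_comment_timestamp > ?)
        ORDER BY last_change ASC, n.nid ASC
        LIMIT ?
    """
    last, last_nid, indexed = 0, 0, []
    for _ in range(max_runs):
        batch = conn.execute(
            query, (last, last_nid, last, last, limit)).fetchall()
        if not batch:
            break  # The indexer believes it is finished.
        for last_change, nid in batch:
            indexed.append(nid)
            # Remember where this run stopped, like the cron bookkeeping.
            last, last_nid = last_change, nid
    return indexed


print(run_cron_batches(UNGUARDED))  # tie-breaker compares against NULL
print(run_cron_batches(GUARDED))    # tie-breaker sees the real timestamp
```

With the unguarded key in the WHERE clause, the first run indexes two nodes and the second run finds nothing, leaving three nodes behind. With the guarded key, the tie-breaker on nid works and all five are indexed over three runs.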

Of course i would like to see a better solution (see prev. comment), but for back porting in older versions this could be useful.

catch’s picture

That might explain at least some of my initial problem - I had about 5,000 nodes imported from phpbb.

Do you want to roll that into a new patch? If not I'll try to do that later.

arjenk’s picture

FileSize
2.71 KB

ok, here is the patch against cvs-head.

btw: the following query gives the number of nodes skipped by the search index (regardless of whether your site says 100% indexed...), so you can see if this patch could (perhaps) solve your problem. A problem is only possible after a mass import.

select count(nid) from node n left outer join search_dataset sd on n.nid=sd.sid where n.status=1 and sd.sid is null;
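
As a worked example of what that query counts, here is a minimal sketch run against SQLite through Python's sqlite3 rather than MySQL. The table contents are invented: three published nodes, one unpublished node, and a search_dataset that only knows about node 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE search_dataset (sid INTEGER, type TEXT);
    -- Hypothetical data: node 4 is unpublished, only node 1 is indexed.
    INSERT INTO node VALUES (1, 1), (2, 1), (3, 1), (4, 0);
    INSERT INTO search_dataset VALUES (1, 'node');
""")

# Same shape as the query above: published nodes with no search_dataset row.
skipped = conn.execute("""
    SELECT COUNT(n.nid)
    FROM node n
    LEFT OUTER JOIN search_dataset sd ON n.nid = sd.sid
    WHERE n.status = 1 AND sd.sid IS NULL
""").fetchone()[0]

print(skipped)
```

Nodes 2 and 3 are counted because they are published and have no index entry; node 4 is ignored because it is unpublished.
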
catch’s picture

ok none showed up. I used phpbb2drupal which I believe uses a node_save though, so it might not have been an issue in the first place.

arjenk’s picture

The patch still stands though...

The problem is caused by a change in the behaviour of the GREATEST() function in MySQL since 5.0.13. From the manual:

Before MySQL 5.0.13, GREATEST() returns NULL only if all arguments are NULL. As of 5.0.13, it returns NULL if any argument is NULL.

How to reproduce:

  • start with a fresh cvs checkout (using a MySQL database >= 5.0.13)
  • disable comments module
  • enable search module
  • install devel module
  • generate 1000 nodes without comments (devel module can do this for you)
  • run cron.php 10 times (hint: ab -c1 -n10 http://localhost/drupal/cron.php)
mysql> select count(nid) from node n left outer join search_dataset sd on n.nid=sd.sid where n.status=1 and sd.sid is null;
+------------+
| count(nid) |
+------------+
|        114 |
+------------+

114 (!) nodes are not included in the search index, even though the admin interface says "100% indexed"...

applying the patch will remove this problem.

I haven't tested Postgresql, but according to the manual
GREATEST(TRUE, NULL) will return TRUE, so likely no problem here.
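
The two behaviours quoted from the manual can be modelled in a few lines of plain Python, with None standing in for SQL NULL. This is only an illustrative model with hypothetical function names, not database code.

```python
def greatest_before_5_0_13(*args):
    """Model of the older rule: NULL only if all arguments are NULL."""
    present = [a for a in args if a is not None]
    return max(present) if present else None


def greatest_as_of_5_0_13(*args):
    """Model of the newer rule: NULL if any argument is NULL."""
    if any(a is None for a in args):
        return None
    return max(args)


changed, last_comment_timestamp = 100, None  # node without a statistics row

print(greatest_before_5_0_13(changed, last_comment_timestamp))
print(greatest_as_of_5_0_13(changed, last_comment_timestamp))
```

Under the older rule the node keeps a usable key of 100; under the newer rule the key becomes NULL, so any comparison against it fails. The older rule also matches what the PostgreSQL manual is reported to say above.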

douggreen’s picture

Can't we use COALESCE(col, 0) rather than IF(col IS NULL, 0, col)?

arjenk’s picture

Since the solution of a (similar) tracker problem with the comment join also uses coalesce(), i would say yes. http://drupal.org/node/87590

douggreen’s picture

FileSize
2.62 KB

re-rolled with COALESCE instead of IF.

Gábor Hojtsy’s picture

Shouldn't both of the places this patch affects use COALESCE?

m3avrck’s picture

I see... "GREATEST(COALESCE(" in both queries...

Gábor Hojtsy’s picture

Yes, there is a COALESCE at the beginning of both queries, but the change is not there... This is added to the second query, (the first COALESCE was already there): IF(c.last_comment_timestamp IS NULL, 0, c.last_comment_timestamp) which could be COALESCEd, as suggested before.

Dries’s picture

I want to postpone this patch until the merging of node and node_comment_statistics has taken place. Thanks.

nedjo’s picture

The issue Dries referred to, http://drupal.org/node/148849, looks to be stalled. Likely we should apply this patch now (when it has been tested and marked RTBC) in case the table merging doesn't happen for D6. In any case, the change is minor and won't significantly increase the work in http://drupal.org/node/148849.

catch’s picture

Status: Needs review » Needs work

Per Gábor's comments, marking as code needs work.

m3avrck’s picture

For when the patch is rerolled, this should likely go back into 5 as a bug fix, correct?

What is the README that goes along with this... "reindex" your site to catch the missing nodes?

catch’s picture

Yes, it should definitely be backported - fixed my search index which had been broken for months. And a re-index is required - if people have it as bad as I did, there's not much index to re-index in the first place, I got about 300 out of 11,000 nodes for a long time.

arjenk’s picture

Status: Needs work » Needs review
FileSize
2.66 KB

rerolled patch with all IF() replaced by COALESCE().

douggreen’s picture

This may be obsolete by the end of the day. See #146666.

catch’s picture

Doug, surely the wrong nid?
http://drupal.org/node/146666

douggreen’s picture

Yep, sorry. 146466.

catch’s picture

Priority: Critical » Normal

Assuming http://drupal.org/node/146466 gets in, there's no reason at all for this to be marked as critical.

douggreen’s picture

Status: Needs review » Fixed

The above patch (146466) did get in, this patch no longer applies, and I think that the problem is now fixed. The solution in the other issue was to add a search_dataset.reindex column that gets explicitly set when the node needs to be reindexed. This may end up with its own set of issues, but for now, I think the problem has been addressed.
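
The idea of an explicit reindex flag can be sketched as follows. This is an illustration only, run against SQLite through Python's sqlite3; apart from the search_dataset.reindex column named above, every function name and the sample data are assumptions, and the committed Drupal 6 code may differ in its details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE search_dataset (
        sid INTEGER, type TEXT, reindex INTEGER NOT NULL DEFAULT 0
    );
    -- Hypothetical data: nodes 1 and 2 are indexed, node 3 never was.
    INSERT INTO node VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO search_dataset VALUES (1, 'node', 0), (2, 'node', 0);
""")


def mark_for_reindex(conn, nid, now):
    """Flag an already indexed node as stale (hypothetical helper)."""
    conn.execute(
        "UPDATE search_dataset SET reindex = ? WHERE sid = ? AND type = 'node'",
        (now, nid))


def nodes_to_index(conn, limit=100):
    """Published nodes that are missing from the index or flagged as stale."""
    rows = conn.execute("""
        SELECT n.nid FROM node n
        LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid
        WHERE n.status = 1 AND (d.sid IS NULL OR d.reindex > 0)
        ORDER BY n.nid ASC LIMIT ?
    """, (limit,))
    return [row[0] for row in rows]


def mark_indexed(conn, nid):
    """Clear the flag, creating the index row if the node had none."""
    cur = conn.execute(
        "UPDATE search_dataset SET reindex = 0 WHERE sid = ? AND type = 'node'",
        (nid,))
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO search_dataset (sid, type, reindex) "
            "VALUES (?, 'node', 0)", (nid,))


before = nodes_to_index(conn)          # only the never-indexed node
mark_for_reindex(conn, 2, now=500)     # node 2 was edited
after_edit = nodes_to_index(conn)
for nid in after_edit:                 # a cron run processes the batch
    mark_indexed(conn, nid)
after_cron = nodes_to_index(conn)
print(before, after_edit, after_cron)
```

The design point is that selection no longer depends on comparing timestamps across a join, so nodes with identical or missing timestamps cannot fall between two cron runs.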

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

DamienMcKenna’s picture

Version: 6.x-dev » 5.x-dev
Status: Closed (fixed) » Active

Am currently experiencing this problem on a Drupal 5.10 install with 57,000 nodes. I ran the SQL query above and I get: 55064 - obviously the problem still exists. I've reindexed several times to no effect, it's set to do 100 at a time but each time I run cron it makes these massive multi-thousand node leaps. I started a thread (#300420) before being referred to this issue. Any suggestions would be appreciated. I'm going to test out some of the patches, will post an update if anything solves the problem.

douggreen’s picture

Status: Active » Closed (fixed)

This was fixed in 6.x and might get ported back to 5.x, see #146466: D6 search refactoring (backport to 5).

DamienMcKenna’s picture

FYI the node-search-coalesce patch works great, and I wrote a little module to churn through the reindexing (it uses JS to reload the page after each block of 100), so it's working well. Thanks for the fixes, everyone.

OwnSourcing’s picture

Status: Closed (fixed) » Active

I'm having search indexing problems with 2000 nodes. Same problem on 5.7 and 5.10. DamienMcKenna, can you post your module (and/or provide link here)?

DamienMcKenna’s picture

wbmangy: FYI I've just successfully gotten the D6 engine (#146466: D6 search refactoring (backport to 5)) to completely reindex 100% of the content so definitely recommend doing that.

After fiddling with the module I ended up backtracking to just the following:

<?php
// cron_manual.php, based on:
// Id: cron.php,v 1.36 2006/08/09 07:42:55 dries Exp

/**
 * @file
 * Run a specific module's cron task.
 */

include_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Make sure that a task was specified.
if (isset($_GET['task']) && trim($_GET['task']) != '') {
  $task = trim($_GET['task']);

  // Make sure the task exists, i.e. the module implements hook_cron().
  if (module_hook($task, 'cron')) {
    watchdog('manual_cron', "{$task} began");
    module_invoke($task, 'cron');
    watchdog('manual_cron', "{$task} finished");
  }
  // Task doesn't exist.
  else {
    watchdog('manual_cron', "'{$task}' isn't a module or doesn't have a cron function");
  }
}
// No task specified.
else {
  watchdog('manual_cron', 'No task specified');
}

Then call it like this: http://mysite.com/cron_manual.php?task=search

I then used a bash loop to run it a few hundred times to reindex our data:

COUNTER=0;while [  $COUNTER -lt 100 ]; do echo $COUNTER; wget --spider "http://mysite/cron_manual.php?task=search"; let COUNTER=COUNTER+1; done
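
A fixed number of iterations works, but a loop that stops when nothing is left, or when a run makes no progress, can save some guessing. The following is a minimal sketch with hypothetical names: trigger_cron and count_unindexed are callables you would supply yourself (for example one that requests the cron_manual.php URL and one that runs a count query), and the demo below uses fakes instead of a real site.

```python
from typing import Callable


def reindex_until_done(trigger_cron: Callable[[], None],
                       count_unindexed: Callable[[], int],
                       max_runs: int = 100) -> str:
    """Trigger cron until nothing is left, progress stalls, or max_runs is hit."""
    remaining = count_unindexed()
    for _ in range(max_runs):
        if remaining == 0:
            return "done"
        trigger_cron()
        now_remaining = count_unindexed()
        if now_remaining >= remaining:
            # No progress: more runs are unlikely to help.
            return "stuck"
        remaining = now_remaining
    return "done" if remaining == 0 else "gave up"


# Fake site for demonstration: each cron run indexes up to 100 of 250 nodes.
state = {"left": 250}


def fake_cron():
    state["left"] = max(0, state["left"] - 100)


result = reindex_until_done(fake_cron, lambda: state["left"])

# Fake site where cron never makes progress.
stuck_result = reindex_until_done(lambda: None, lambda: 5)
print(result, stuck_result)
```

A "stuck" result is the cue to look at which nodes remain unindexed rather than to keep running cron.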

If, after running it X billion times it still reports items it won't index, try the following query to see if there are nodes unindexed:

SELECT * FROM node n LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid WHERE n.status = 1 AND (d.sid IS NULL OR d.reindex > 0)

That will be a dump of the nodes that haven't been indexed. The first thing to check is whether any nodes are configured to not be indexed via the Settings preference page, e.g. I've set feeds to not be indexed so I'll always have a few that don't get indexed. So, if there are unindexed nodes that you would like to have indexed, what I ended up doing was to delete their records in search_dataset, e.g.:

DELETE FROM search_dataset where search_dataset.type = 'node' AND (search_dataset.sid IS NULL OR search_dataset.reindex > 0)

Once I had removed those, I was able to have it finish off the reindexing, and the site is now fully indexed.

Hope this helps some of you.

Damien

Jaapx’s picture

I also had the problem that the search index was incomplete (less than 10% was indexed of the 10.000 nodes).
Site: Drupal 5.10 with the modules 'Faceted search' and 'Field indexer'.

The nodes were imported with the same creation and change date/time settings.

A simple, not so elegant but effective workaround was:
1) changing the creation and change time of the node with the following commands:
update cia_node set changed = (changed+ nid)
update cia_node set created = (created-nid)
2) hit the re-index site button on admin/settings/search.

jgoldfeder’s picture

@DamienMcKenna

very helpful.......thanks!

catch’s picture

Status: Active » Closed (fixed)

This doesn't look like the same bug as the original one this issue was about - so please post a new bug report if you're still having issues.

spjsche’s picture

I was having this same problem, as I do not have comments module enabled. Applying the above change to the node module has done the trick.

Thanks for the informative change.

Stephen

gpk’s picture

Title: search indexing gets stuck: node_update_index() - if comment.module is disabled » search indexing gets stuck: node_index()

Since #146466: D6 search refactoring (backport to 5) never got backported to 5.x, this was never fixed. And remains a problem.

#434900: Search indexing of nodes not getting correct nodes to index if comment module is disabled is essentially a duplicate but has a slightly different patch from those here and might need consideration.

Does anyone know if enabling comment.module, but leaving comments disabled everywhere, works as a workaround?

[edit] Another issue with similar/identical solution: #42277: Incorrect loop logic in node_update_index.
Also: #26025: Search never indexes some nodes.

Also the calculation of search indexing status is slightly off (inconsistent treatment of unpublished nodes - #239196: Indexing status shown on search settings page is incorrect). That one affects 5.x and 6.x (it's now been fixed in 7.x,) but is not directly related to this.

gpk’s picture

Title: search indexing gets stuck: node_index() » search indexing gets stuck: node_update_index() - if comment.module is disabled
Status: Closed (fixed) » Needs review

Amplify title.

cleanthes’s picture

gpk: I can confirm that if the comments module is enabled but comments on a Content Type are disabled, the Content Type is still indexed as you'd expect, so this is a good workaround.

TR’s picture

Title: search indexing gets stuck: node_index() » search indexing gets stuck: node_update_index() - if comment.module is disabled
Status: Needs review » Closed (won't fix)

The original issue was marked as fixed in 6.x by comment #57. This issue is currently open only to deal with the backport to 5.x. 5.x is no longer supported, so I'm marking this as a won't fix.