The status page indicated that the sitemap was out of date, and since running cron more frequently did not improve the situation, I went for a manual rebuild.

The rebuild ran for almost a day without finishing. Then I noticed that the sequential number of the node being processed is higher than the count of all nodes. Something like this:

Remaining 10 of 14.
Now processing node 298652 (142409 of 114079).

That was on beta3, so I installed 6.x-2.x-dev, ran /update.php, and even deleted the sitemap via /admin/settings/xmlsitemap before running the rebuild again. But I am in the same situation: the rebuild goes on forever, and the number of nodes processed is higher than the total count of nodes.

Ideas about what might be going on here?

Comments

Anonymous’s picture

Are the so-called "nodes" really links? What xmlsitemap modules do you have installed?

Vacilando’s picture

Version: 6.x-2.x-dev » 6.x-2.0-beta3

Still happening. Specifically in 6.x-2.0-beta3, but that's currently the same as 6.x-2.x-dev.

The following submodules are on:

  • XML sitemap
  • XML sitemap custom
  • XML sitemap engines
  • XML sitemap internationalization
  • XML sitemap menu
  • XML sitemap modal UI
  • XML sitemap node
  • XML sitemap taxonomy
  • XML sitemap user

Vacilando’s picture

Tried again today and let it run the whole day. See the attachments. Note that the status page says the "last attempted generation" was 9 hours ago, so at some point the module registered a failed attempt, yet the batch kept going beyond the assumed total.

giorgio79’s picture

Could they be node revisions?

Anonymous’s picture

I'm going to guess that something, either on the client browser side or on the server side, is not allowing the page to complete. If, for instance, you have a network lapse, it could cause the server to think your client hung up the connection. I don't have a solution and can't think of a way around that scenario. You should look in your server logs, as well as the watchdog log, to see if you can find anything relevant. Good luck.

Vacilando’s picture

Version: 6.x-2.0-beta3 » 6.x-2.x-dev

After a long time, I went on to rebuild sitemaps, using the batch process, on another D6 site. And I've encountered the same problem.

As a test, I disabled several content types, to make the count of nids to process smaller -- and the sitemap finished in a short time just fine.
So it is clear that the batch works fine for relatively small sitemaps and the problem described here happens only for the larger ones.

I again added all content types that need to be included in the sitemap and got into the same situation as before:

Rebuilding Sitemap
83%
Remaining 9 of 13.
Now processing node 520346 (552290 of 80921).

This time, though, I let it continue, for days in fact.
The nid keeps changing regularly, and so does the count of processed pages. (I do not think the total count changes, but I am not sure.)

One thing I've noticed is the nid seems to jump back sometimes. It's like the batch goes back to re-visit the same nodes!
So my question now is... is it possible that the batch process begins from the start if it runs too long? E.g. if it runs longer than the minimum sitemap lifetime (which I've got set at 3 hours), or if some caches are emptied? Any thoughts?

Dave Reid’s picture

Category: bug » support

Until we can get hard proof about what the actual bug is, moving to a support request.

Anonymous’s picture

Status: Active » Postponed (maintainer needs more info)

During the rebuild is the site in maintenance mode or active mode? I would think it would need to be in maintenance mode so that no node can be modified, added or commented on and no other maintenance work should happen either.

Vacilando’s picture

Both sites where this happens are live. I cannot run the rebuild in maintenance mode because the batch process, due to the number of nodes, takes several days to complete.

Yes, nodes are being added (Drupal commenting is off) during the rebuild. Also, of course, the relevant XML sitemap cron jobs run during the process. Do you think that could cause the issue?

Dave Reid’s picture

Yeah if nodes are being added while the batch runs, then that would definitely cause the problem.
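
To illustrate how a counter like "142409 of 114079" can come about, here is a rough Python simulation. It is not the module's code: `run_batch` and all its parameters are made up for this sketch, and it assumes that the total is counted once when the batch starts while new nodes keep joining the queue on a live site.

```python
def run_batch(initial_nodes, chunk_size, added_per_pass, max_passes):
    """Simulate a batch whose total is fixed at start while nodes keep arriving.

    All names and numbers are hypothetical; this only models the counting.
    """
    total = initial_nodes      # snapshot taken once, when the batch starts
    pending = initial_nodes    # nodes still waiting to be processed
    processed = 0

    for _ in range(max_passes):
        if pending == 0:
            break              # nothing left: the batch can really finish
        step = min(chunk_size, pending)
        processed += step
        pending -= step
        pending += added_per_pass  # live site: new nodes arrive meanwhile

    return processed, total


if __name__ == "__main__":
    # Quiet site: the counter stops exactly at the total.
    print("Processed %d of %d" % run_batch(100, 10, 0, 1000))
    # Live site: the counter overtakes the total taken at start.
    print("Processed %d of %d" % run_batch(100, 10, 5, 30))
```

With no nodes added, the processed count ends at the total. As soon as nodes arrive during the run, the processed count climbs past the total that was recorded at the start.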

Dave Reid’s picture

Status: Postponed (maintainer needs more info) » Fixed

OK, I think I have fixed this in 7.x-2.x-dev and 6.x-2.x-dev by adding an extra check for whether the batch operation has actually finished.
http://drupalcode.org/project/xmlsitemap.git/commit/c5d6a29
http://drupalcode.org/project/xmlsitemap.git/commit/ba66fc6

Dave Reid’s picture

Category: support » bug

Please test the dev release once it is rebuilt (about 12 hours from now).

Dave Reid’s picture

Note that I think this was a bug with Drupal 6 only, as it seems that #600836: Batch API never terminates if you set $context['finished'] > 1 was fixed only in Drupal 7 and above. It's a shame that it was never backported to D6 core.
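
For readers who want to see why a "finished" value above 1 matters, here is a minimal Python sketch of the termination logic. It assumes, based on the title of #600836, that the D6 batch engine treats an operation as done only when the value equals exactly 1. The function names are hypothetical and this is not the actual xmlsitemap or core code.

```python
def batch_is_done_strict(finished):
    """Completion test that accepts only exactly 1 (assumed D6 behaviour)."""
    return finished == 1


def batch_is_done_tolerant(finished):
    """Completion test that also accepts an overshoot above 1."""
    return finished >= 1


def clamp_finished(progress, total):
    """Hypothetical guard for an operation callback: never report more than 1."""
    if total <= 0:
        return 1  # nothing to process counts as finished
    return min(progress / total, 1)


if __name__ == "__main__":
    # Figures from the original report: 142409 processed of 114079 total.
    overshoot = 142409 / 114079
    print(batch_is_done_strict(overshoot))    # the strict check never fires
    print(batch_is_done_tolerant(overshoot))  # the tolerant check does
    print(batch_is_done_strict(clamp_finished(142409, 114079)))
```

Either approach ends the loop: the engine can accept any value of 1 or more, or the callback can cap what it reports. On D6, only the second option is available to a contributed module.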

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Vacilando’s picture

Thanks for working on this, @Dave Reid. Unfortunately, I cannot confirm whether your fix solves my problem, because in the meantime I cleared the xmlsitemap table for all affected sites. A rebuild is no longer necessary, and when I run one, it finishes quickly and without "imaginary" items (because the items that are only generated by hook_xmlsitemap_links() are not there yet). I'll re-open this if the problem reappears later.