I've seen some other bug reports about this, but none that quite matched my issue, and I thought it might help track this down since I'm using a fresh install of this module, with no upgrading.

Details:

Drupal 5.3
XML Sitemap 5.x-1.4

No upgrades of Drupal or XML Sitemap (never used GSitemap on this site). The site's been up and running for a few weeks; I just installed XML Sitemap this afternoon.

When I installed XML Sitemap and created my first sitemap, I noticed that hardly any path aliases appear.

I'm including nodes and taxonomy, but not users, in my sitemap.

When I first generated the sitemap, there were

205 lines containing 'html' (so, using the alias I'd assigned)
2452 lines containing '/node' (so, not finding an alias)

Taking a quick look at the database:

node has 2781 rows
url_alias has 2371 rows
xmlsitemap_node has 231 rows

So a large number of url_alias entries exist that are not getting picked up by xmlsitemap.

Reading through other comments, it looks like the problem may be related to joins to the comments table, so I unchecked the 'Count comments in change date and frequency' option and saved the configuration settings, then visited my sitemap again.

This time, there were

1759 lines containing 'html' (so, using the alias I'd assigned)
0 lines containing '/node' (so, no unaliased paths left)
902 lines that look like this:

http://dev.mysite.com/ 2007-12-19T20:02:07+00:00 monthly 0.4
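For anyone who wants to repeat these counts, here is a minimal sketch in Python that sorts the `<loc>` entries of a sitemap into the three kinds described above: aliased paths, unaliased `/node/N` paths, and entries that carry only the site root. It is not part of XML Sitemap; the function name `classify_sitemap` and the sample URLs `about.html` and `node/12` are made up for illustration, and it assumes clean URLs are enabled.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse


def classify_sitemap(xml_text):
    """Count sitemap <loc> entries by the kind of path they carry."""
    counts = Counter()
    for element in ET.fromstring(xml_text).iter():
        # Ignore the XML namespace so any sitemap schema version is accepted.
        if element.tag.rsplit("}", 1)[-1] != "loc":
            continue
        path = urlparse((element.text or "").strip()).path
        if path in ("", "/"):
            counts["bare_root"] += 1   # site root only, no page path
        elif path.startswith("/node/"):
            counts["unaliased"] += 1   # internal node/N path, no alias applied
        else:
            counts["aliased"] += 1     # anything else is treated as an alias
    return counts


# Hypothetical three-entry sitemap, one entry of each kind.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://dev.mysite.com/about.html</loc></url>
  <url><loc>http://dev.mysite.com/node/12</loc></url>
  <url><loc>http://dev.mysite.com/</loc></url>
</urlset>"""

print(dict(classify_sitemap(sample)))
```

One bare-root entry is normal, since the front page itself belongs in the sitemap; it is a count in the dozens or hundreds that points to a problem.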

Does this info help track down the issue?

Thanks!

Kristi

Comments

darren oh

Category: bug » support
Status: Active » Closed (works as designed)

Loading cron.php multiple times should fix the problem, unless your issue is a duplicate of 198173.

kristi wachter

Category: support » feature
Status: Closed (works as designed) » Active

Ah. I see.

Cron has run several times since I posted this last night, and there are now 2781 rows in xmlsitemap_node.

However, I'm still seeing a few dozen lines that just have

http://dev.mysite.com/ 2007-12-19T20:02:07+00:00 monthly 0.4

in them, and that doesn't change if I go back to the config page and select the 'Count comments in change date and frequency' option.

Would it be possible to display a status on the Config page, next to the "Your site map is at http://dev.example.com/sitemap.xml." link, that says "The site map is not yet ready to be submitted" and changes to "The site map is ready for submission now", so people know when all the data has been added?
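To make the request concrete, here is a rough sketch in Python of the kind of check I have in mind. It is not how the module works; `sitemap_status` is a hypothetical helper, and it assumes that readiness can be judged by comparing the row count of `node` with the row count of `xmlsitemap_node`, which ignores any nodes deliberately excluded from the sitemap.

```python
def sitemap_status(node_rows, indexed_rows):
    """Return a status message based on how many nodes have been indexed.

    node_rows:    number of rows in the node table
    indexed_rows: number of rows in the xmlsitemap_node table
    """
    if node_rows < 0 or indexed_rows < 0:
        raise ValueError("row counts must be non-negative")
    if indexed_rows >= node_rows:
        return "The site map is ready for submission now"
    remaining = node_rows - indexed_rows
    return (
        "The site map is not yet ready to be submitted "
        f"({remaining} nodes still to be added)"
    )


# Row counts from this report: right after install, then after several cron runs.
print(sitemap_status(2781, 231))
print(sitemap_status(2781, 2781))
```

A count-based check like this would not catch the remaining lines that contain only the site URL, since those still appear once every node has a row, so it would need to be combined with a check on the generated links themselves.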

Also, could the documentation contain some info about this, so users know that the module needs several cron cycles before the map is ready to submit to search engines?

Since submitting a sitemap with the wrong data (whether it's the plain 'http://dev.example.com/' entries with no page path, or the 'node/x' entries) can cause serious problems, it seems like there should be an option to not submit the sitemap to search engines whenever either of these problems appears in it.

Thanks!

darren oh

Status: Active » Closed (duplicate)

Work has already been done toward this. Duplicate of issue 198173.