Sitemap is submitted on every cron run with "update when new content" and no new content is added.
expandonline - August 13, 2009 - 09:46
| Project: | XML sitemap |
| Version: | 6.x-1.1 |
| Component: | Code |
| Category: | bug report |
| Priority: | minor |
| Assigned: | earnie |
| Status: | active |
Description
I have a problem with my site. I have
'Submit the sitemap when content changes' set to 'TRUE'
and
'Frequency of sitemap submission' set to '15 days'.
However, I notice in the log that the sitemap is being submitted at every cron run even though no content has changed.
With a bit of debugging, I found that (line 28 of xmlsitemap_engines/xmlsitemap_engines.module)
<?php
$content_changed = variable_get('xmlsitemap_sitemap_is_changed', FALSE);
?>
is returning TRUE, although looking in the database it appears to be 'b:0' before and after the cron run,
so I assume some other process in the cron run is setting it to TRUE.
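For reference, 'b:0' is how PHP serializes a boolean FALSE (the full form ends in a semicolon), which is the format Drupal 6 uses for values in the variable table. A minimal check in plain PHP, runnable outside Drupal:

```php
<?php
// A stored value of 'b:0;' is the serialized form of boolean FALSE.
var_dump(serialize(FALSE));     // string(4) "b:0;"
var_dump(unserialize('b:0;'));  // bool(false)
```

So a stored FALSE before and after cron is consistent with something setting the variable to TRUE and back again during the run.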
I don't fully understand the code in xmlsitemap_node_cron(), but I suspect it is causing the problem: it calls xmlsitemap_flag_sitemap() whenever the database query is valid (maybe the intention was 'more than zero results').
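The difference between the two conditions can be sketched in plain PHP. This is not the module's actual code: fake_query(), should_flag_on_valid_query() and should_flag_on_rows() are hypothetical stand-ins, and the result set is modelled as an array so the snippet runs outside Drupal.

```php
<?php
// Hypothetical stand-in for a database query: returns an array of rows on
// success (possibly empty) or FALSE on failure. Not a Drupal API.
function fake_query(array $rows, $ok = TRUE) {
  return $ok ? $rows : FALSE;
}

// Suspected behaviour: flag the sitemap whenever the query itself succeeded.
function should_flag_on_valid_query($result) {
  return $result !== FALSE;
}

// Possible intention: flag only when the query returned at least one row.
function should_flag_on_rows($result) {
  return $result !== FALSE && count($result) > 0;
}

// A successful query that finds no changed nodes.
$no_changes = fake_query(array());
var_dump(should_flag_on_valid_query($no_changes)); // bool(true)
var_dump(should_flag_on_rows($no_changes));        // bool(false)
```

With the first check, a cron run that finds nothing to update would still mark the sitemap as changed, which would match the symptom described.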
Is anyone else seeing this? (I haven't done a clean install to test this yet).
Peter

#1
#545392: Move "update when new content is added" checkbox into the frequency selection list.
Your sitemap is submitted for each cron run because your content changed.
#2
I can verify that this module sends the sitemap even when no content changes have been made, when that checkbox is checked.
#3
I will give it a test.
#4
This report has caused the attached patch to be submitted.
#5
The real patch.
#6
I'm seeing this too. It's causing some issues because I'm using an outside monitoring service that also runs cron, and that will sometimes send me timeout reports because one of the search engines takes forever to respond.
It's also wasteful to keep submitting every hour when there are no changes (I have a site that changes about once a month, my other one maybe once or twice a week). I wonder if Google et al. are going to start blacklisting people at some point who keep submitting the same sitemap over and over ...
#7
@eagereyes: Thanks for confirming the bug. Use the recently released 6.x-1.0 version to correct it.
#8
Seems to work. Great, thanks!
#9
I seem to have been a bit too quick there: it's still submitting the sitemap on every cron run. I checked that it's set to "Never" and saved the settings, and it just ran again as part of my hourly cron job. Anything I can do to help diagnose the problem?
#10
"Submit the sitemap when content changes" takes precedence over "Frequency of sitemap submission".
#11
Makes sense, but there are no content changes. Not even new comments. It's still submitting every hour.
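The precedence rule stated in #10 can be sketched as follows. This is an illustration only, not the module's code: should_submit() and its parameters are hypothetical names, and the fallback to the schedule when nothing has changed is an assumption.

```php
<?php
// Hypothetical decision helper, not the module's actual function.
// $submit_on_change: the "Submit the sitemap when content changes" setting.
// $content_changed:  whether the sitemap has been flagged as changed.
// $frequency:        seconds between scheduled submissions (0 = "Never").
// $last_submitted and $now are Unix timestamps.
function should_submit($submit_on_change, $content_changed, $frequency, $last_submitted, $now) {
  // The content-change setting is checked first, so it wins over the schedule.
  if ($submit_on_change && $content_changed) {
    return TRUE;
  }
  // Assumed fallback: otherwise follow the schedule, if one is set.
  return $frequency > 0 && ($now - $last_submitted) >= $frequency;
}

$now = time();
$fifteen_days = 15 * 24 * 60 * 60;
// Flagged as changed: submitted even though the schedule is "Never".
var_dump(should_submit(TRUE, TRUE, 0, $now, $now));                     // bool(true)
// Not flagged, last submission an hour ago, 15-day schedule: nothing sent.
var_dump(should_submit(TRUE, FALSE, $fifteen_days, $now - 3600, $now)); // bool(false)
```

Under this reading, a sitemap that is wrongly flagged as changed on every cron run would be submitted every time, whatever the frequency setting says.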
#12
What modules do you have active? What aggregation do you do? If a module does a node_save() even if nothing changed then xmlsitemap thinks it did.
#13
I turned off submission on changed content, but it still submits every hour. I have also removed the Aggregator module from the one site where I was using it, but it's still submitting. I don't get it. There is no other module that calls node_save() (except CCK) on one of my sites.
The other one had no aggregator to begin with, but same thing. There are a few modules that call node_save(), like biblio, geshifilter, and scheduler, but none of these should do anything to the content every hour.
#14
I didn't get a list of modules.
#15
Automatically closed -- issue fixed for 2 weeks with no activity.
#16
@earnie: I can verify this behavior. I just went through a D5 to D6 upgrade on a site with only nodewords and xmlsitemap.
Nodewords is @ 6.x-1.2 and I have "Nodewords" and "Basic meta tags" enabled.
I seem to recall xmlsitemap submodules showed up different between 6.x-1.1 and the version that is successful which is 6.x-1.0-beta0. Basically I enable core, engines, and nodes on xmlsitemap.
I thought I had it configured to send sitemaps when content changes, and content rarely changes on this particular site. I was seeing no submission when content changed, and did see submission during cron runs.
Since these search engines get ticked off by submissions that are too frequent (Yahoo will not even accept notifications from the websites I take care of at present!), I propose an option along the lines of "notify when content changes AND the last notification is more than one hour ago". That way, if a particular page needs to be saved multiple times within a short time period, the search engines would only be notified once.
I have it enabled to log when search engines download the sitemap. I see search engines checking in even when no content has changed, thus no notifications have gone out. Thus the basis for my proposal.
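The option proposed in #16 could look roughly like this. It is a sketch of the idea only: should_notify() and MIN_NOTIFY_INTERVAL are hypothetical names, and nothing here exists in the module.

```php
<?php
// Hypothetical throttle for the proposal above; all names are illustrative.
define('MIN_NOTIFY_INTERVAL', 3600); // one hour, in seconds

// Notify only if content changed AND the last notification is more than
// one hour old. Both time arguments are Unix timestamps.
function should_notify($content_changed, $last_notified, $now) {
  return $content_changed && ($now - $last_notified) > MIN_NOTIFY_INTERVAL;
}

$now = time();
// Page saved again ten minutes after the last notification: stay quiet.
var_dump(should_notify(TRUE, $now - 600, $now));   // bool(false)
// Content changed and the last notification was two hours ago: notify.
var_dump(should_notify(TRUE, $now - 7200, $now));  // bool(true)
// Nothing changed: never notify, however long it has been.
var_dump(should_notify(FALSE, $now - 7200, $now)); // bool(false)
```

For the change not to be lost, the "changed" flag would have to stay set until a notification is actually sent, so that a later cron run picks it up once the hour has passed.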
#17
My solution to this problem is to use the supercron module. That lets me exclude the search engine submission job from the regular cron runs, but I can still run it manually when I have added new stuff. More control, fewer timeouts.