Sitemap is submitted on every cron run with "update when new content" and no new content is added.

expandonline - August 13, 2009 - 09:46
Project:XML sitemap
Version:6.x-1.1
Component:Code
Category:bug report
Priority:minor
Assigned:earnie
Status:active
Description

I have a problem with my site. I have the
'Submit the sitemap when content changes' set to 'TRUE'
and
'Frequency of sitemap submission' set to '15 days'

However, I notice in the log that the sitemap is being submitted on every cron run even though no content has changed.

With a bit of debugging, I found that (line 28 of xmlsitemap_engines/xmlsitemap_engines.module)

<?php
$content_changed = variable_get('xmlsitemap_sitemap_is_changed', FALSE);
?>

is returning TRUE, although looking in the database it appears to be 'b:0' before and after the cron run,
so I assume some other process in the cron run is setting it to TRUE.

I don't fully understand the code in xmlsitemap_node_cron(), but I suspect it is causing the problem, as it calls xmlsitemap_flag_sitemap() whenever the database query is valid (maybe the intention was 'more than zero results').
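
To illustrate the suspicion, here is a minimal sketch in plain PHP, not the module's actual code. FakeResult, fake_query() and the two should_flag_* helpers are hypothetical stand-ins. The assumption is that the query helper returns a truthy result handle whenever the SQL is valid, even when it matches zero rows.

```php
<?php
// Hypothetical stand-in for a database result handle; NOT the Drupal API.
final class FakeResult {
  private $rows;

  public function __construct(array $rows) {
    $this->rows = $rows;
  }

  public function rowCount() {
    return count($this->rows);
  }
}

// Hypothetical query helper: always "succeeds" and wraps the given rows.
// A failed query would be represented by FALSE instead of an object.
function fake_query(array $rows) {
  return new FakeResult($rows);
}

// Suspected pattern: flag the sitemap whenever the query itself is valid.
// An object is truthy in PHP, so this is TRUE even for an empty result.
function should_flag_on_valid_query($result) {
  return (bool) $result;
}

// Possibly intended pattern: flag only when at least one row came back.
function should_flag_on_rows($result) {
  return $result !== FALSE && $result->rowCount() > 0;
}

$empty = fake_query(array());
var_dump(should_flag_on_valid_query($empty)); // bool(true)
var_dump(should_flag_on_rows($empty));        // bool(false)
```

If the cron code follows the first pattern, the "changed" flag would be set on every run regardless of whether any node was actually updated.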

Is anyone else seeing this? (I haven't done a clean install to test this yet).

Peter


#1

earnie - August 13, 2009 - 12:04
Category:bug report» support request
Status:active» duplicate

#545392: Move "update when new content is added" checkbox into the frequency selection list.

Your sitemap is submitted for each cron run because your content changed.

#2

dirksonii - August 19, 2009 - 21:34
Status:duplicate» active

I can verify that this module sends the sitemap even when no content changes have been made, when that checkbox is checked.

#3

earnie - August 17, 2009 - 13:19
Title:sitemap is submitted on every cron run» sitemap is submitted on every cron run with "update when new content" and no new content is added.
Category:support request» bug report
Priority:normal» minor
Assigned to:Anonymous» earnie

I will give it a test.

#4

earnie - August 21, 2009 - 14:27
Title:sitemap is submitted on every cron run with "update when new content" and no new content is added.» Sitemap is submitted on every cron run with "update when new content" and no new content is added.
Component:xmlsitemap_engines.module» Code
Status:active» fixed

This report has caused the attached patch to be submitted.

Attachment          Size
issue-548064.patch  0 bytes

#5

earnie - August 21, 2009 - 14:28

The real patch.

Attachment          Size
issue-548064.patch  3.21 KB

#6

eagereyes - August 24, 2009 - 11:53

I'm seeing this too. It's causing some issues because I'm using an outside monitoring service that also runs cron, and that will sometimes send me timeout reports because one of the search engines takes forever to respond.

It's also wasteful to keep submitting every hour when there are no changes (I have a site that changes about once a month, my other one maybe once or twice a week). I wonder if Google et al. are going to start blacklisting people at some point who keep submitting the same sitemap over and over ...

#7

earnie - August 24, 2009 - 15:13

@eagereyes: Thanks for confirming the bug. Use the recently released 6.x-1.0 version to correct it.

#8

eagereyes - August 26, 2009 - 01:49

Seems to work. Great, thanks!

#9

eagereyes - August 26, 2009 - 12:30
Version:6.x-1.0-rc2» 6.x-1.1
Status:fixed» active

I seem to have been a bit too quick there; it's still submitting the sitemap on every cron run. I checked that it's set to "Never" and saved the settings, and it just ran again as part of my hourly cron job. Anything I can do to help diagnose the problem?

#10

earnie - August 26, 2009 - 13:13
Status:active» fixed

"Submit the sitemap when content changes" takes precedence over "Frequency of sitemap submission".

#11

eagereyes - August 26, 2009 - 22:47

Makes sense, but there are no content changes. Not even new comments. It's still submitting every hour.

#12

earnie - August 27, 2009 - 13:08
Status:fixed» postponed (maintainer needs more info)

What modules do you have active? What aggregation do you do? If a module calls node_save() even when nothing has changed, xmlsitemap still treats the content as changed.

#13

eagereyes - August 28, 2009 - 00:33

I turned off submission on changed content, but it still submits every hour. I have also removed the Aggregator module from the one site where I was using it, but it's still submitting. I don't get it. There is no other module that calls node_save() (except CCK) on one of my sites.

The other one had no aggregator to begin with, but same thing. There are a few modules that call node_save(), like biblio, geshifilter, and scheduler, but none of these should do anything to the content every hour.

#14

earnie - October 5, 2009 - 14:39
Status:postponed (maintainer needs more info)» fixed

I didn't get a list of modules.

#15

System Message - October 19, 2009 - 14:40
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

#16

mdlueck - October 21, 2009 - 14:12
Status:closed» active

@earnie: I can verify this behavior. I just went through a D5 to D6 upgrade on a site with only nodewords and xmlsitemap.

Nodewords is @ 6.x-1.2 and I have "Nodewords" and "Basic meta tags" enabled.

I seem to recall that the xmlsitemap submodules are listed differently in 6.x-1.1 than in the version that works for me, which is 6.x-1.0-beta0. Basically, I enable core, engines, and nodes on xmlsitemap.

I thought I had it configured to send sitemaps when content changes, and content rarely changes on this particular site. I was seeing no submission when content changed, and did see submission during cron runs.

Since these search engines get ticked off by overly frequent submissions (Yahoo will not even accept notifications from the websites I take care of at present!), I propose an option along the lines of "notify when content changes AND the last notification was more than one hour ago". That way, if a particular page needs to be saved multiple times within a short period, the search engines would only be notified once.

I have it enabled to log when search engines download the sitemap. I see search engines checking in even when no content has changed, thus no notifications have gone out. Thus the basis for my proposal.
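
The proposed rule could be sketched roughly as follows. This is plain PHP under stated assumptions, not code from the module: should_notify() and its parameters are hypothetical names, timestamps are Unix seconds, and the one-hour minimum is the figure suggested above.

```php
<?php
// Hypothetical helper: notify only if content changed AND the last
// notification was sent more than $min_interval seconds ago.
function should_notify($content_changed, $last_notified, $now, $min_interval = 3600) {
  return $content_changed && ($now - $last_notified) > $min_interval;
}

$now = time();

// Content changed, last notification two hours ago: notify.
var_dump(should_notify(TRUE, $now - 7200, $now));  // bool(true)

// Content changed, but last notification ten minutes ago: hold off.
var_dump(should_notify(TRUE, $now - 600, $now));   // bool(false)

// No content change: never notify, however long it has been.
var_dump(should_notify(FALSE, $now - 86400, $now)); // bool(false)
```

With a check like this, several saves of the same page within an hour would produce at most one notification, and the "changed" state would simply wait for the next cron run after the interval has passed.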

#17

eagereyes - December 5, 2009 - 03:55

My solution to this problem is to use the supercron module. That lets me exclude the search engine submission job from the regular cron runs, but I can still run it manually when I have added new stuff. More control, fewer timeouts.

 
 
