Sitemap.xml is working wonderfully. But when Google Webmaster Tools analyzes the sitemap.xml, I get the following error from Google for each URL in the sitemap:
"Invalid date - An invalid date was found. Please fix the date or formatting before resubmitting. - Parent tag: url"
My url tags in the sitemap look like this:
<url>
<loc>http://info.ulrich-schrader.de/node/603</loc>
<lastmod>2009-04-15T13:23:26+0000</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
I looked up protocol 0.9, which Google uses for the sitemap format, on sitemaps.org. It refers to the datetime format defined by the W3C at http://www.w3.org/TR/NOTE-datetime. The error seems to be caused by a different encoding of the time zone. The sitemap.xml uses the format +0000, whereas the W3C format looks like:
TZD = time zone designator (Z or +hh:mm or -hh:mm)
It looks to me like there should be a ':' between the hours and minutes indicated in the time zone.
Thanks for an otherwise great module
Ulrich
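The time zone designator rule quoted in the report can be checked mechanically. The following is a minimal sketch, assuming only the complete date-plus-time form of the W3C note (YYYY-MM-DDThh:mm:ssTZD) needs to be matched; the function name is_w3c_datetime() is hypothetical and not part of the module.

```php
<?php
// Hypothetical helper (not part of the module): checks the complete
// date-plus-time form YYYY-MM-DDThh:mm:ssTZD, where TZD is Z, +hh:mm or -hh:mm.
function is_w3c_datetime($value) {
  $pattern = '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})$/';
  return preg_match($pattern, $value) === 1;
}

var_dump(is_w3c_datetime('2009-04-15T13:23:26+0000'));   // bool(false): no colon in the offset
var_dump(is_w3c_datetime('2009-04-15T13:23:26+00:00'));  // bool(true)
var_dump(is_w3c_datetime('2009-04-15T13:23:26Z'));       // bool(true)
```

The pattern only checks the shape of the string, not whether the date itself is a real calendar date.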
Comments
Comment #1
avpaderno
This is caused by the use of the constant DATE_W3C; I checked its definition and noticed that in PHP 5.x it is defined as 'Y-m-d\TH:i:sP'. Naturally, that is not the desired value to use. I will fix it in the development snapshot first; the fixed code can then be used to create another official release.
Thanks for pointing out this problem.
Comment #2
dave reid
Hmm... That should work just fine, actually. I'm also using define('DATE_W3C', 'Y-m-d\TH:i:sP'); in 6.x-2.x, and my sitemap.xml is generating dates like 2009-02-24T03:59:02+00:00. This needs some more investigation.
Comment #3
dave reid
@Kiam: Your current define is using 'Y-m-d\TH:i:s+00:00'. It should probably be 'Y-m-d\TH:i:sP'.
Comment #4
dave reid
OK, I found what the problem is: according to the PHP documentation for date(), the 'P' format character was added in PHP 5.1.3, so we can't use that. Grr.
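One possible workaround for PHP versions without 'P' is to build the colon form from the older 'O' format character, which yields +hhmm. This is only a sketch of that idea, not the fix the module adopted; the function name example_w3c_date() is hypothetical.

```php
<?php
// Hypothetical helper, not the module's code: builds the W3C time zone
// designator (+hh:mm) from the 'O' format character (+hhmm), avoiding 'P',
// which needs PHP 5.1.3 or later.
function example_w3c_date($timestamp) {
  $offset = date('O', $timestamp);  // for example "+0000" or "-0500"
  $tzd = substr($offset, 0, 3) . ':' . substr($offset, 3, 2);
  return date('Y-m-d\TH:i:s', $timestamp) . $tzd;
}

// Fix the time zone so the example does not depend on server settings.
date_default_timezone_set('UTC');
echo example_w3c_date(1239801806), "\n";  // 2009-04-15T13:23:26+00:00
```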
Comment #5
dave reid
And there also appears to be a bug in PHP 5.1.1 and PHP 5.1.2 when using the DATE_W3C constant:
http://bugs.php.net/bug.php?id=36599
This sounds exactly like the original problem. ulrich, which version of PHP are you using?
Comment #6
avpaderno
The code has been fixed in CVS.
The same fix must be done for the 6.x-2 branch, which uses the following code to set the date in the sitemap:
Comment #7
avpaderno
That is at least compatible with the definition of the time zone designator. I think that Google doesn't make any distinction between +00:00 and Z; if it complains about the +00:00, I will change the definition of the XMLSITEMAP_DATE_W3C constant to 'Y-m-d\TH:i:s\Z'.
Comment #8
avpaderno
Note that, as the date is generated by gmdate(), the time refers to the GMT time zone; there is no need to generate the last part dynamically, as we already know which time zone it is.
Comment #9
avpaderno
I changed the definition of the custom constant used by the module in the 6.x-1 branch; it now uses the Z, which has a slightly different meaning and is shorter.
Thanks again to ulrich for the report.
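The approach described in comments #8 and #9 can be sketched as follows: because gmdate() always formats in GMT, the time zone designator can be a literal Z in the format string. The constant and function names here are hypothetical stand-ins, not the module's actual identifiers.

```php
<?php
// Hypothetical constant for illustration; \T and \Z are escaped so that
// they are emitted as literal characters rather than format codes.
define('EXAMPLE_SITEMAP_DATE', 'Y-m-d\TH:i:s\Z');

// Hypothetical helper: formats a Unix timestamp for a <lastmod> element.
function example_sitemap_lastmod($timestamp) {
  // gmdate() ignores the server time zone and always formats in GMT.
  return gmdate(EXAMPLE_SITEMAP_DATE, $timestamp);
}

echo example_sitemap_lastmod(1239801806), "\n";  // 2009-04-15T13:23:26Z
```

This avoids both the 'P' format character and the DATE_W3C constant, so it does not depend on the PHP 5.1.x behavior discussed above.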
Comment #10
dave reid
@Kiam/8: That's a good point. I'm going to change the 6.x-2.x define to define('DATE_W3C', 'Y-m-d\TH:i:s+00:00');. I'm not inclined, however, to work around a *very* buggy PHP 5.1. That's a big reason why Drupal 7 is requiring PHP 5.2 instead of just PHP 5.0 or PHP 5.1.
Comment #11
dave reid
@Kiam/9: The timestamps with \Z will now look like 2009-05-04T23:56:17Z. Not correct at all. We should just use 'Y-m-d\TH:i:s+00:00', which is the format required by the protocol.
Comment #12
avpaderno
See the examples in http://www.w3.org/TR/NOTE-datetime; they clearly report the following ones:
The difference is slight, but the Z means that the time is UTC, while any other value would mean that the time is a local time that differs from UTC by hh hours and mm minutes.
It all depends on what Google accepts; if it also accepts the Z, then there is no reason not to use it.
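To illustrate the point about designators, the sketch below parses three spellings of one moment with strtotime() and compares the resulting Unix timestamps. It only shows how PHP interprets the strings; it says nothing about what Google accepts.

```php
<?php
// Illustrative check only: three spellings of the same instant.
$with_z      = strtotime('2009-05-04T23:56:17Z');
$with_offset = strtotime('2009-05-04T23:56:17+00:00');
$with_local  = strtotime('2009-05-04T18:56:17-05:00');

var_dump($with_z === $with_offset);  // bool(true)
var_dump($with_z === $with_local);   // bool(true)
```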
Comment #13
dave reid
Ah. Got it! Let me know if Google verifies it!
Comment #14
avpaderno
I will surely do it. I plan to add a secondary, hand-written sitemap to my site and see if Google says anything about the date format.
Comment #15
avpaderno
Google seems to accept the date format using the Z as the time zone designator; it accessed the additional sitemap I created and reported the correct number of links, without any error about an incorrect (or unknown) date format.
Comment #16
ulrich commented
I use PHP 5.1.2. Thanks for all your help!
Ulrich
Comment #17
avpaderno
The simplest way to resolve the problem is to not use the DATE_W3C constant.
Comment #18
dave reid
The constant is in PHP for this very purpose. It's not my fault that it has a bug that was fixed over two years ago. As such, I am marking this as fixed. I have also documented this in the 'Known Issues' section of the README.txt file in 6.x-2.x-dev. See http://drupal.org/cvs?commit=215670.
Comment #19
avpaderno
That is true, but it's also true that the constant is of no use in versions of PHP before 5.1.3.
If the code must be compatible with PHP 4, then it must either not use that constant or define a constant with the same value; after all, Drupal 6 is compatible with PHP 4.
Comment #20
Anonymous (not verified) commented
That doesn't mean that a module has to be. In fact, modules are encouraged to create a PHP 5 requirement.
Comment #21
avpaderno
If all modules required PHP 5, why would Drupal stay compatible with PHP 4?