Sitemap.xml is working wonderfully. But when Google Webmaster Tools analyzes the sitemap.xml, it reports the following error for each URL in the sitemap:
"Invalid date - An invalid date was found. Please fix the date or formatting before resubmitting. - Parent tag: url"

My url tags in the sitemap look like this:

<url>
  <loc>http://info.ulrich-schrader.de/node/603</loc>
  <lastmod>2009-04-15T13:23:26+0000</lastmod>
  <changefreq>monthly</changefreq> 
  <priority>0.8</priority>
</url>

I looked up protocol 0.9, which Google uses for the sitemap format, on sitemaps.org. It refers to the datetime format defined by the W3C at http://www.w3.org/TR/NOTE-datetime . The error seems to be caused by a different encoding of the time zone: the sitemap.xml uses the format +0000, while the W3C format is:

TZD = time zone designator (Z or +hh:mm or -hh:mm)

It looks to me like there should be a ':' between the hours and minutes indicated in the time zone.
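For illustration, a minimal PHP sketch that checks a lastmod value against the full date-and-time form of the W3C profile, YYYY-MM-DDThh:mm:ssTZD. The function name is hypothetical (not part of the module), and the pattern deliberately ignores the shorter forms and fractional seconds the note also allows:

```php
<?php
// Hypothetical helper: accepts only YYYY-MM-DDThh:mm:ssTZD,
// where TZD is "Z", "+hh:mm" or "-hh:mm" (W3C NOTE-datetime).
function is_w3c_datetime($value)
{
    $pattern = '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})$/';
    return preg_match($pattern, $value) === 1;
}

var_dump(is_w3c_datetime('2009-04-15T13:23:26+0000'));  // bool(false): no colon in the offset
var_dump(is_w3c_datetime('2009-04-15T13:23:26+00:00')); // bool(true)
var_dump(is_w3c_datetime('2009-04-15T13:23:26Z'));      // bool(true)
```

The value from the sitemap above fails only because of the missing ':' in the offset.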

Thanks for an otherwise great module.
Ulrich

Comments

avpaderno

Version: 6.x-1.0-beta3 » 6.x-1.x-dev

This is caused by the use of the constant DATE_W3C; I checked its definition, and I noticed that in PHP 5.x it is defined as 'Y-m-d\TH:i:sP'. Naturally, that is not the desired value.

I will fix it in the development snapshot first, since that is where it needs to be fixed; the code can then be used to create another official release.

Thanks for pointing out this problem.

dave reid

Hmm... That should work just fine actually. I'm also using define('DATE_W3C', 'Y-m-d\TH:i:sP'); in 6.x-2.x and my sitemap.xml is generating date formats like 2009-02-24T03:59:02+00:00. This needs some more investigation.

dave reid

@Kiam: Your current define is using 'Y-m-d\TH:i:s+00:00'. It should probably be 'Y-m-d\TH:i:sP'.

dave reid

Ok, I found the problem: according to the PHP documentation for date(), the 'P' format character was added in PHP 5.1.3, so we can't use it. Grr.

dave reid

There also appears to be a bug in PHP 5.1.1 and PHP 5.1.2 affecting the DATE_W3C constant:
http://bugs.php.net/bug.php?id=36599

This sounds exactly like the original problem. ulrich, which version of PHP are you using?
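One possible workaround, sketched under assumptions rather than taken from either branch: test at runtime whether DATE_W3C actually yields a +hh:mm offset, and fall back to a literal offset otherwise. On PHP older than 5.1.3 (no 'P') or on the versions affected by the bug above, the probe should fail. The constant and function names are hypothetical:

```php
<?php
// Hypothetical helper: trust DATE_W3C only if it really prints "+hh:mm".
function example_pick_lastmod_format()
{
    if (defined('DATE_W3C')) {
        $probe = gmdate(DATE_W3C, 0);
        // A correct result ends in an offset with a colon, e.g. "+00:00".
        if (preg_match('/[+-]\d{2}:\d{2}$/', $probe) === 1) {
            return DATE_W3C;
        }
    }
    // gmdate() always formats in UTC, so a literal offset is safe here.
    return 'Y-m-d\TH:i:s+00:00';
}

define('EXAMPLE_LASTMOD_FORMAT', example_pick_lastmod_format());
echo gmdate(EXAMPLE_LASTMOD_FORMAT, 1239801806), "\n"; // 2009-04-15T13:23:26+00:00
```

The fallback is only correct together with gmdate(); with date() the local offset would be mislabeled.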

avpaderno

Version: 6.x-1.x-dev » 6.x-2.x-dev

The code has been fixed in CVS.

The same fix must be done for the 6.x-2 branch, which uses the following code to set the date in the sitemap:

  $output .= '<lastmod>'. gmdate(DATE_W3C, $link['lastmod']) .'</lastmod>';

avpaderno

Your current define is using 'Y-m-d\TH:i:s+00:00'. It should probably be 'Y-m-d\TH:i:sP'.

That is at least compatible with the definition of the time zone designator. I think Google doesn't make any distinction between +00:00 and Z; if it complains about +00:00, I will change the definition of the XMLSITEMAP_DATE_W3C constant to 'Y-m-d\TH:i:s\Z'.

avpaderno

Note that, as the date is generated with gmdate(), the time is always in the GMT time zone; there is no need to generate the offset dynamically, as we already know which time zone it is.

avpaderno

I changed the definition of the custom constant used by the module in the 6.x-1 branch; it now uses Z, which has a slightly different meaning and is shorter.

Thanks again to ulrich for the report.

dave reid

@Kiam/8: That's a good point. I'm going to change the 6.x-2.x define to define('DATE_W3C', 'Y-m-d\TH:i:s+00:00');. I'm not inclined, however, to work around a *very* buggy PHP 5.1. That's a big reason why Drupal 7 requires PHP 5.2 instead of just PHP 5.0 or PHP 5.1.

dave reid

@Kiam/9: With \Z, the timestamps will now look like 2009-05-04T23:56:17Z. Not correct at all. We should just use 'Y-m-d\TH:i:s+00:00', which is the format required by the protocol.

avpaderno

See the examples in http://www.w3.org/TR/NOTE-datetime; they clearly list the following:

1994-11-05T08:15:30-05:00 corresponds to November 5, 1994, 8:15:30 am, US Eastern Standard Time.
1994-11-05T13:15:30Z corresponds to the same instant.

The difference is slight: Z means that the time is UTC, while any other value means that the time is a local time that differs from UTC by hh hours and mm minutes.

It all depends on what Google accepts; if it also accepts Z, there is no reason not to use it.
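To make the equivalence of the two W3C examples concrete, a small PHP sketch (not from the thread) that parses both strings with the built-in DateTime class and compares the instants they denote:

```php
<?php
// Parse the two W3C examples quoted above.
$eastern = new DateTime('1994-11-05T08:15:30-05:00');
$utc     = new DateTime('1994-11-05T13:15:30Z');

// Same Unix timestamp: only the notation of the offset differs.
var_dump($eastern->getTimestamp() === $utc->getTimestamp()); // bool(true)
echo $utc->getTimestamp(), "\n"; // 784041330
```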

dave reid

Ah. Got it! Let me know if Google verifies it!

avpaderno

I certainly will. I plan to add a hand-written secondary sitemap to my site and see whether Google says anything about the date format.

avpaderno

Google seems to accept the date format that uses Z as the time zone designator; it accessed the additional sitemap I created and reported the correct number of links, but no error about an incorrect (or unknown) date format.

ulrich

I use PHP 5.1.2. Thanks for all your help!
Ulrich

avpaderno

I'm not inclined however, to work around a *very* buggy PHP 5.1.

The simplest way to resolve the problem is not to use the DATE_W3C constant.

dave reid

Status: Active » Fixed

The constant is in PHP for this very purpose. It's not my fault that it had a bug, one that was fixed over two years ago. As such, I am marking this as fixed. I have also documented this in the 'Known Issues' section of the README.txt file in 6.x-2.x-dev. See http://drupal.org/cvs?commit=215670.

avpaderno

The constant is in PHP for this very purpose.

That is true, but it's also true that the constant is of no use in versions of PHP before 5.1.3.
If the code must be compatible with PHP 4, then it should not use that constant, or it should create its own constant with an equivalent value; after all, Drupal 6 is compatible with PHP 4.

Anonymous

after all, Drupal 6 is compatible with PHP 4.

That doesn't mean that a module has to be. In fact, modules are encouraged to declare a PHP 5 requirement.

avpaderno

If all modules required PHP 5, why would Drupal stay compatible with PHP 4?

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.