I have Drupal 6.14 and xmlsitemap 6.x-2.x-dev.
I have set up a cron job to run cron.php every day.
The only problem is that after cron runs, the generated sitemap.xml file shows incorrect links.

Instead of the links being
http://example.com/sitedir/path-to-node,
the links generated in the xml file read
http:///path-to-node.

The base domain and the Drupal install dir don't get listed in the link.

If I rebuild links, the sitemap.xml that gets generated has all the correct links.
But the next time cron runs after this, the links devolve into the form mentioned above, i.e. without the base domain and the install dir.
Help.

Comments

v-a-1’s picture

Update:
I checked the log entries under Drupal's reports, and this is what the entry states:

Details
Type xmlsitemap
Date Tuesday, November 16, 2010 - 14:10
User Anonymous
Location http:///
Referrer
Message Finished XML sitemap generation in 0 sec. Memory usage: 12.25 MB.
Severity notice
Hostname
Operations

Even here the base directory and domain are missing.
I checked the xmlsitemap settings, and the base path is shown correctly in the format example.com/installdir/

dave reid’s picture

Status: Active » Postponed (maintainer needs more info)

By chance did you happen to submit the settings page yet? Try that and then regenerate the sitemap files and report back.

The settings form shows the default values of the variables until you submit - then it shows you the actual values you have saved.

v-a-1’s picture

Status: Fixed » Postponed (maintainer needs more info)

Thanks!! That worked.

I submitted the settings page as it was.
I added a new node and ran cron. Now the log shows:

Type xmlsitemap
Date Wednesday, November 17, 2010 - 06:06
User admin
Location http://xxxxx.xxxx/xxxxxx/cron.php
Referrer
Message Finished XML sitemap generation in 0 sec. Memory usage: 12.25 MB.
Severity notice
Hostname xxx.xxx.xxx.xxx
Operations

v-a-1’s picture

Status: Postponed (maintainer needs more info) » Fixed

laurie112’s picture

Version: 6.x-2.x-dev » 6.x-2.0-beta1
Assigned: Unassigned » laurie112
Category: support » feature
Status: Postponed (maintainer needs more info) » Needs review

Hi

I too recently experienced this issue.

In my case it was related to moving the site in question from a development server to its live server.

This involved taking a mysqldump of the site's database and replacing all instances of the development server URL with the live server URL before loading the database onto the live server. This created the issue of xmlsitemap "forgetting" its base URL.

As Dave suggested, resubmitting the settings page did resolve the issue. This was only because it brought my attention to the base URL setting that was hidden in the collapsed "Advanced Settings" area.

I'd not noticed this section before, and I can only assume this crucial piece of information was loaded into the base URL field automatically on installation. This seems to me incorrect behaviour.

Would it not be correct to have the module use the site's $_SERVER['SERVER_NAME'] if the base URL is left empty, and only use "Base URL" if it has a value? This would give the option of overriding the default via "Advanced Settings", which does seem correct.
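
A minimal sketch of the fallback proposed here, for illustration only. The function name `proposed_base_url()` and its parameters are hypothetical and are not part of xmlsitemap; the sketch handles only the host name, not HTTPS or an install directory.

```php
<?php
/**
 * Hypothetical helper illustrating the proposal; not xmlsitemap code.
 *
 * @param string|null $configured The "Base URL" setting, possibly empty.
 * @param array $server A $_SERVER-style array for the current request.
 */
function proposed_base_url($configured, array $server) {
  $configured = trim((string) $configured);
  if ($configured !== '') {
    // An explicit "Base URL" setting always wins.
    return rtrim($configured, '/');
  }
  // Otherwise fall back to the host name reported for this request.
  $host = isset($server['SERVER_NAME']) ? $server['SERVER_NAME'] : '';
  return 'http://' . $host;
}
```

One caveat with this design: if cron.php is run from the command line instead of over HTTP, SERVER_NAME may not be set at all, so the fallback could still produce a URL with no host.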

Also, expanding the "Advanced Settings" section by default would make it clearer. Or even moving this setting out of "Advanced Settings", as clearly it is not an advanced setting but a fundamental one.

Great module, love the work and keep it up...

v-a-1’s picture

Version: 6.x-2.0-beta1 » 6.x-2.x-dev

j_byrd’s picture

After a cron run, Google Webmaster Tools showed the following message for a site link:

Errors 4
Invalid URL
This is not a valid URL. Please correct it and resubmit.
URL:
http:///home/gypsy/www/www/contact
Parent tag: url
Tag: loc
Problem detected on: Jan 20, 2011

The correct url was: http://gypsycampcreations.com/contact
The URL they got was from the host's file tree.
Is there a way to fix this other than generating the sitemap myself?

asmaka’s picture

Hello,

I have a sitemap.xml file generated by the XML sitemap module.
When I try to validate the XML file using schemaValidate in PHP, I see a lot of errors.
The XSD file that I am using to validate sitemap.xml is http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd

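For your reference, below is the code snippet:
// Collect libxml errors instead of emitting PHP warnings;
// without this, libxml_get_errors() returns nothing.
libxml_use_internal_errors(true);
$xmlDom = new DOMDocument();
// '/path/to/sitemap.xml' is a placeholder for the real file path.
if (!$xmlDom->load('/path/to/sitemap.xml')) {
  // The file could not be read or is not well-formed XML.
  $errors = libxml_get_errors();
  libxml_clear_errors();
}
elseif (!$xmlDom->schemaValidate('http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd')) {
  // The file parsed but does not conform to the sitemap schema.
  $errors = libxml_get_errors();
  libxml_clear_errors();
}
Let me know if there is something wrong with the way I am trying to validate.
I am using Drupal 6.x.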

http://sitemaps.org/protocol.php#validating says the XML file should have a predefined header.
The sitemap.xml file I have has the following header:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.mywebsite.com/sitemap.xsl"?>

Thanks.

racheljohnson188’s picture

Hello All,

I see the same issue as discussed in #8.
Does anybody have any comments or suggestions on this? Is there a workaround for XML validation using PHP?
Can anyone from the xmlsitemap team clarify this issue?

Regards

dave reid’s picture

If you enable the 'developer mode' in the XML sitemap advanced settings, it should add the required XML for validation to the generated sitemap XML files. You'll need to manually re-generate the output files though after making the change.

dave reid’s picture

Status: Needs review » Active

No patches to review in this issue, moving to active.

AceMesh’s picture

How do I get to the Advanced Settings of xmlsitemap?

spidersilk’s picture

I just encountered the same problem described in #1-7 above (#8-10 seem to be describing a completely different problem and should probably have been a separate issue). But the module in this case was not newly installed, nor had the site recently been moved. The settings page had been submitted, probably multiple times as various values were adjusted.

The sitemap seemed fine yesterday, and the person who's assisting with SEO on this site submitted it to the engines, but when she checked Google Webmaster Tools this morning, it was full of errors - it said that every URL in the sitemap was missing the domain name, and just had http:/// + the path. But then when she checked the actual sitemap, it seemed fine, though she wasn't sure whether that meant it always had been, or whether it had been regenerated (via cron I assume) between when Google checked it and when she did.

So it looks like at some point during a cron run the domain name was stripped out of all the URLs in the sitemap, and then during another it was put back in? I don't really understand how that would have happened - she and I are the only ones working on the site at present and no changes were made to it during that time. I've checked the module settings page and it's showing the default base URL correctly.

The site map now seems OK, but Google is showing us as having 0 urls in the web index, presumably because of the invalid URLs they encountered before. And it's kind of worrying that the sitemap file could spontaneously become corrupted that way, even if it did appear to fix itself later.

Any ideas?

spidersilk’s picture

Category: feature » bug
Priority: Normal » Major

Just a follow-up to my comment above - after posting that, I tried re-submitting the settings page (without actually making any changes), and it seemed to fix the problem at the time.

But now, a couple of weeks later, and after submitting the sitemap to all the search engines, it's just happened again! Our SEO person went into Google's Webmaster Tools and once again all the URLs were broken - the domain name was missing again, as in comments #1-7 and #13 above.

I've resubmitted the settings page again (after checking to make sure all settings were correct), and hopefully that will fix it temporarily, but I'm very concerned that this is just going to keep happening. For some reason, the domain seems to keep disappearing from the URLs in the sitemap, even though it's entered correctly in the Default base URL field on the settings page.

This is a really nasty bug, that borders on making the module unusable. The main reason for using a module like this is to help search engines index your site, but if the sitemap keeps getting spontaneously corrupted so that all the URLs in it are broken, then it's useless - or worse than useless, since Google may actually drop or at least penalize sites whose sitemaps are full of broken URLs. As it is, with the site I'm experiencing this with, we had an SEO specialist do a ton of work on the site, and now it may be all for nothing, since this module screwing up again means that Google may now be convinced that none of the pages on our site actually exist.

It's obvious from the posts here that a number of people are experiencing this problem - something really needs to be done about it! We've disabled search engine submission of the sitemap for now until we can find a fix for this, but obviously that's not a good long-term solution.

Anonymous’s picture

Category: bug » feature
Priority: Major » Normal

See #5 for a possible resolution and the feature being requested.

Anonymous’s picture

spidersilk’s picture

The situation described in #5 is not the same as what I've been experiencing. In the case of this site, it did not happen after any sort of transfer or migration, in either case - both times, it occurred spontaneously, when nothing more was going on on the site in question than ordinary content updates.

And the Base URL value in the settings page was not empty, in either case - the URL was still showing up fine there, just not generating into the actual site map file. So even when the module has the Base URL value, it seems to occasionally strip it out of all the URLs in the sitemap for no apparent reason.

And the 404 problem in the other issue is something totally different - the sitemap itself is not missing, it just has the URLs in it missing their domain name.

Anonymous’s picture

You missed:

This was only because it brought my attention to the base URL setting that was hidden in the collapsed "Advanced Settings" area.

Which is what I had hoped you would notice.

spidersilk’s picture

No, I didn't miss that. I specifically said that the Base URL setting was NOT empty. It contained the appropriate Base URL just as it should. I wouldn't know that if I hadn't clicked to expand it and checked to see that the field was filled correctly.

The problem is that the base URL is still being randomly stripped out of the URLs in the sitemap, despite the fact that it DOES appear correctly in the field on the settings page.

Anonymous’s picture

Just for grins, submit (save) that form. I'm guessing the value you see on the admin screen is calculated, and that a different value gets calculated during the cron run for some reason. If the value exists in the variable table, then it should always be the same, since it is no longer calculated.
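
To illustrate how a calculated value could differ between the admin screen and a cron run, here is a rough sketch. It assumes cron.php is invoked from the command line (so no HTTP host is available), which nobody in this thread has confirmed. The functions `guess_base_url()` and `sitemap_base_url()` are hypothetical, only loosely modelled on how Drupal 6 guesses $base_url when settings.php does not define it; they are not the module's actual code.

```php
<?php
/**
 * Hypothetical: derive a base URL from a $_SERVER-style array.
 */
function guess_base_url(array $server) {
  // Over HTTP the host comes from the request; on the command line
  // there is typically no HTTP_HOST, so the host part ends up empty.
  $host = isset($server['HTTP_HOST']) ? $server['HTTP_HOST'] : '';
  $base_url = 'http://' . $host;
  $script = isset($server['SCRIPT_NAME']) ? $server['SCRIPT_NAME'] : '';
  // Append the directory the script lives in, if any.
  $dir = trim(dirname($script), '\\/');
  if ($dir !== '') {
    $base_url .= '/' . $dir;
  }
  return $base_url;
}

/**
 * Hypothetical: prefer a saved value, otherwise calculate one.
 */
function sitemap_base_url($saved, array $server) {
  return $saved !== NULL ? $saved : guess_base_url($server);
}
```

Under these assumptions, a web request with HTTP_HOST `example.com` and SCRIPT_NAME `/sitedir/cron.php` yields `http://example.com/sitedir`, while a command-line run whose SCRIPT_NAME is a filesystem path and which has no HTTP_HOST yields a URL of the `http:///...` form reported earlier in this thread. A saved value is returned unchanged in both cases.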

monil-dupe’s picture

If you mean the value of the links in the sitemap, it doesn't affect how search engines index or rank your links at all. As far as I know, Google at least doesn't use them for faster indexing or higher ranking.

nkirlew@bigpond.net.au’s picture

I am fairly sure this is the same problem.
I am using an AWS server, which gets a fairly long AWS domain name.
It is a clean Drupal 7 build from the last two months or so.
I have my own domain, which is working fine.
Yesterday I installed the dev version of XML sitemap and generated the sitemap after setting the advanced setting mentioned above; all was good.
One day later, the sitemap is pointing to the AWS name for the server, not the correct name.
After rebuilding the sitemap, all is good again.

Anonymous’s picture

@#22: This isn't the same issue and frankly I'm leaning toward marking this request as won't fix. For D7 you might try using the sites.php file to map the directory but make sure your settings.php file makes use of the $base_url variable. I have no clue why you would be having an issue showing the link to the older server unless the variable in the advanced settings is wrong.

subir_ghosh’s picture

I have faced this issue a number of times. Every time, I forget what it was that I did to fix it the previous time ;)

Well, I just make sure that $base_url = 'http://www.myurl.com'; is there in the settings.php file.

gobinathm’s picture

Status: Active » Closed (outdated)

As D6 has reached EOL, I believe there won't be any support or fixes going forward for problems identified in 6.x, so I guess this issue can be closed.

Changing the status; if this is incorrect, please revert it.