If I have the gsitemap and i18n modules installed, the sitemap document at the root level contains only one item, a reference to another URL.

The URL submitted to Google (without i18n) is
http://www.domainname.com/sitemap.xml (which is OK).
But when i18n is installed, the sitemap is stored at a language-prefixed URL such as
http://www.domainname.com/nl/sitemap.xml

Example
All three URLs:
http://www.fire-proof.be/sitemap.xml
http://www.fire-proof.be/nl/sitemap.xml (as mentioned in admin / settings / xml sitemap) (in the nl section)
http://www.fire-proof.be/fr/sitemap.xml (as mentioned in admin / settings / xml sitemap) (in the fr section)
refer to
http://www.fire-proof.be/fr/sitemap0.xml (which contains the correct sitemap)

BUT
http://www.fire-proof.be/fr/sitemap0.xml (fr)
http://www.fire-proof.be/nl/sitemap0.xml (nl)
contain the same sitemap (with only the fr links to the documents)

When submitting any of the following sitemaps to Google:
http://www.fire-proof.be/nl/sitemap.xml
http://www.fire-proof.be/fr/sitemap.xml
http://www.fire-proof.be/fr/sitemap0.xml
Google reports this error:

The Sitemap must be located at http://www.fire-proof.be/. To add a Sitemap at http://www.fire-proof.be/nl/, first add that site to your account and then click the Add a Sitemap link beside it.

I added http://www.fire-proof.be/ but not http://www.fire-proof.be/nl/ (it is not my intention to do so, because it is the same site)

Any feedback on this?

Comment   File                    Size        Author
#14       gsitemap-149693.patch   1.94 KB     darren oh
#8        gsitemap_10.patch       997 bytes   voidberg

Comments

melon’s picture

Priority: Normal » Critical
Status: Needs work » Active

subscribing

rondev’s picture

What I noticed is:
If you modify the xmlsitemap module's parameters while the fr language is active, the sitemap is linked to the /fr/ site. If you do so while nl is active, the sitemap refers to /nl/.
There is only one sitemap active. I don't know how to make the other sitemaps active.
I added these lines to my .htaccess:

# Modified for gsitemap
RewriteRule ^frgsitemap fr/sitemap0.xml [L,QSA]
RewriteRule ^engsitemap en/sitemap0.xml [L,QSA]
RewriteRule ^esgsitemap es/sitemap0.xml [L,QSA]
# Modified for gsitemap

And I told Google to go to domain.com/frgsitemap, domain.com/engsitemap, and domain.com/esgsitemap.
But this is not needed, as there is only one sitemap active. All those sitemaps refer to the same thing.
Does anyone have a solution?

Ronan

cburschka’s picture

Hardcoded Rewrite? The path module does an excellent job at providing alias tables.

I'd suggest auto-aliasing [lang]/sitemap.xml to lang-sitemap.xml rather than langsitemap.xml, simply for readability.

The problem that remains is figuring out where to hook into the i18n module so that the alias can be created when a locale is added.

Perhaps a better way to do this would be to add an option to auto-alias all existing locale sitemaps in the sitemap settings page. This would have to be revisited whenever new locales are added, but you'll probably go to that page anyway when submitting a new sitemap to Google.

rondev’s picture

Thank you for the tip. I didn't think about that, as I use fewer modules for easier maintenance. An upgrade can sometimes be delayed if a module is no longer supported.
As for the rest of your comments, I can't say more, as my level is closer to that of a simple end user than a developer. Help is much appreciated. It would be very nice if xmlsitemap and i18n worked well together. I hope it will happen before Drupal 6 (which I heard will support multilingual sites) is released.
Ronan

rondev’s picture

Another thing I noticed: if changes are made in the xmlsitemap settings, the sitemap contains only the pages of the active language (83 references in my case). If I change something in my module activation list, the sitemap contains pages of all languages (113 pages in my case). That seems strange to me. Now in xmlsitemap, pages of every language are referenced in one XML page; fr/sitemap0.xml and en/sitemap0.xml give the same result. I don't know whether this matters to Google for a better PR.
Ronan

PixelClever’s picture

I have been dealing with this exact same problem. Is there any news as to how to patch it? I am tinkering with the code myself, but I am no expert at this level of programming.

PixelClever’s picture

Title: gsitemap and i18n conflict? » 100$ to who ever can fix this

I am willing to offer a bounty of $100 to anyone who can fix this module so that internationalization and google sitemap generator can work together. I know it's not much, but that's what I can afford, and I think a fix will benefit the project as a whole. Maybe if someone else is willing to add to it we can afford to pay more.

voidberg’s picture

Title: 100$ to who ever can fix this » I think I fixed it
Status: Active » Needs review
Status   File                Size
new      gsitemap_10.patch   997 bytes

The problem is that gsitemap uses the url function to generate links to chunks. i18n adds the current language code to all links generated by url and this is where the problem is.

The solution is the following: remove the call to url from the generation of the links to chunks and generate the url with another method.

The patch was created under OS X so beware of the line endings. However it's small enough that it can be applied manually.

One could take this further and use the same code in the chunk generation code to generate the links in the sitemap without the 'en/' or 'fr/' since i18n is smart enough to prepend the language code.
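
As a rough illustration of the idea (this is not the attached patch), a chunk address can be assembled by plain string concatenation, so that nothing hooked into url() gets a chance to add a language prefix. The function name example_chunk_url and its parameters are hypothetical, and the clean-URL handling is an assumption about how such a helper might look:

```php
<?php
// Hypothetical helper (not taken from the patch): builds the address of one
// sitemap chunk without calling url(), so no language prefix is inserted.
function example_chunk_url($base_url, $chunk, $clean_urls = TRUE) {
  $file = 'sitemap' . (int) $chunk . '.xml';
  // Without clean URLs, Drupal paths are passed in the q query parameter.
  $path = $clean_urls ? $file : '?q=' . $file;
  return rtrim($base_url, '/') . '/' . $path;
}

// Prints http://www.example.com/sitemap0.xml
echo example_chunk_url('http://www.example.com', 0), "\n";
```

The trade-off is that bypassing url() also bypasses any other rewriting that modules apply through it.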

PixelClever’s picture

Title: I think I fixed it » Good enough for me

It works. The URL that is listed for pinging Google still shows the "en" in it, but that is easily changed by typing in the correct URL. That's just a detail, but it would be good to have that fixed for the long term.
What you did justifies the bounty... so should we set something up through paypal?

voidberg’s picture

The easiest way to remove the language prefix from the URLs generated in the chunks is to remove the following code from _gsitemap_get_path_alias, which is located at the end of gsitemap.module:

  if (function_exists('custom_url_rewrite')) {
    $result = custom_url_rewrite('alias', $result, $path);
  }

Warning: this also breaks any other custom URL rewrites that other modules could supply.

vinayakaya, I wrote you an email regarding the bounty.

darren oh’s picture

Title: Good enough for me » gsitemap and i18n conflict?

SubZero5’s picture

The patches will not fix anything, as far as I can see...
and there might be a problem here:

$script = (strpos($_SERVER['SERVER_SOFTWARE'], 'Apache') === FALSE) ? 'index.php' : '';

This code might break things. Many hosts seem to hide the Apache tag. Devs should rely on the Clean URLs setting instead, because "This option makes Drupal emit 'clean' URLs (i.e. without ?q= in the URL.)".

cburschka’s picture

clean_url doesn't enter into it - "index.php" is here used for the purpose of generating an absolute link on non-Apache webservers. Drupal's url() function does the same thing, by the way:

  if (!isset($script)) {
    // On some web servers, such as IIS, we can't omit "index.php". So, we
    // generate "index.php?q=foo" instead of "?q=foo" on anything that is not
    // Apache.
    $script = (strpos($_SERVER['SERVER_SOFTWARE'], 'Apache') === FALSE) ? 'index.php' : '';
  }

I'm not sure how the circumvention of Drupal's url() function solves the original problem (since that is all the patch seems to do), but this line is used in the heart of Drupal's core.

darren oh’s picture

Assigned: Bart Van Herreweghe » darren oh
Status   File                    Size
new      gsitemap-149693.patch   1.94 KB

The attached patch tries to prevent i18n from rewriting XML Sitemap links while allowing other modules to do so. Please test.

SubZero5’s picture

Status: Needs review » Needs work

It did not do anything... Just in case, I have reverted to a previous backup.

dsp1’s picture

what exactly is the patch supposed to fix? having a sitemap for each language?
making it so there is not sitemap.xml and sitemap0.xml?

i did not notice any difference after patching.

SubZero5’s picture

Hold on a sec..

> having a sitemap for each language?

Why require a sitemap for each language? It is meaningless. To be precise, each path must be preceded by its own language prefix and included in the single sitemap. :)

> making it so there is not sitemap.xml and sitemap0.xml?

This is one of the greatest requests I have ever seen. If there is only sitemap0.xml, why on earth do we have the sitemap.xml sitemap index?

> i did not notice any difference after patching.

I cannot access the error (or PHP warning) logs on my server. If you can access those, can you please take a look at them? It might not permit you to redefine a function for some odd reason. :)

darren oh’s picture

The patch is supposed to prevent i18n from adding the language prefix to site map URLs.

SubZero5’s picture

It did not work (it did nothing) on my site... It still has links like www.abc.com/en/sitemap0.xml

darren oh’s picture

Status: Needs work » Needs review

Have you tried deleting your cached site map files?

darren oh’s picture

By the way, the files would be in your temp directory.

SubZero5’s picture

Status: Needs review » Needs work

Yes darren, I have simply cleared them all from the tmp folder.

BTW, are you going to use xmlsitemap.module or will you still use the gsitemap.module? When will there be a complete move?

darren oh’s picture

There will be a complete move as soon as we finish working the bugs out. I've found the solution to this problem and will be posting details shortly.

darren oh’s picture

Status: Needs work » Fixed

I got the problem fixed in CVS commit 83768. For the fix to work, gsitemap must be loaded before i18n. i18n is a bit unstable at the moment, so for now this has to be done by setting the module weights manually. Please check issue 111047 if this causes problems.
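
To illustrate why the weights matter, here is a minimal sketch, assuming modules are loaded in ascending order of weight with ties broken by name. The helper example_load_order and the weight values are hypothetical and only show the ordering, not how gsitemap or i18n are actually loaded:

```php
<?php
// Hypothetical helper: returns module names in load order, lowest weight
// first, ties broken alphabetically by name (an assumption for this sketch).
function example_load_order(array $weights) {
  $names = array_keys($weights);
  usort($names, function ($a, $b) use ($weights) {
    if ($weights[$a] != $weights[$b]) {
      return $weights[$a] < $weights[$b] ? -1 : 1;
    }
    return strcmp($a, $b);
  });
  return $names;
}

// Illustrative weights only: gsitemap is lighter, so it comes first.
// Prints gsitemap, i18n
echo implode(', ', example_load_order(array('i18n' => -10, 'gsitemap' => -11))), "\n";
```

Under that assumption, giving gsitemap a lower weight than i18n is what puts it first.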

SubZero5’s picture

Strange. It did not fix my issue; same as the old version. :(

darren oh’s picture

Step by step:

  1. Replace gsitemap.module with the new file from CVS.
  2. Make sure that i18n has a greater module weight than gsitemap (use Module Weight if you don't know how).
  3. Delete your temp files.
  4. Make sure your search engine URLs are correct.
  5. View the site map at /sitemap.xml.

dsp1’s picture

SubZero5, why not have a sitemap for each language? Currently my sitemap contains only English nodes, no other language. Is it supposed to put all nodes from all languages in the one sitemap?

SubZero5’s picture

Status: Fixed » Needs work

Yes dsp1, currently that part is faulty. For n = 0, 1, 2, ..., the sitemap(n).xml files together hold all the links of the site; a further file is started only when the number of links exceeds the chunk size (currently 50k). That means all the /en/... links, /fr/... links, /de/... links, and non-language-related links must be included in the sitemap. If a chunk is not filled, there will be no sitemap(n+1).xml. :D
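
The chunking described here can be sketched as follows. This is only an illustration, not the module's code: the helper name example_sitemap_chunks is hypothetical, and the tiny chunk size in the usage lines is chosen so the split is visible:

```php
<?php
// Hypothetical helper: splits one flat list of links (all languages mixed)
// into numbered chunk files of at most $chunk_size links each.
function example_sitemap_chunks(array $links, $chunk_size = 50000) {
  $files = array();
  foreach (array_chunk($links, $chunk_size) as $n => $chunk) {
    $files['sitemap' . $n . '.xml'] = $chunk;
  }
  return $files;
}

// Demonstration with a chunk size of 2 instead of 50000.
$links = array('en/node/1', 'fr/node/1', 'de/node/1', 'node/2', 'en/node/3');
$files = example_sitemap_chunks($links, 2);
// Prints sitemap0.xml, sitemap1.xml, sitemap2.xml
echo implode(', ', array_keys($files)), "\n";
```

With the default chunk size, the same five links would all land in sitemap0.xml and no sitemap1.xml would exist.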

Thanks darren, the module weight was set to -10. To fix this, I have given xmlsitemap a weight of -11. That fixed the issue... but it caused another one: my site links now do not have the language prefixes, and the sitemap itself has only the English (my primary language) pages... :(

darren oh’s picture

Status: Needs work » Fixed

It seems like the best solution would be to generate and submit separate site maps for each language. The current code is not capable of determining what language content is in. Please open a new issue if you would like to discuss adding that ability.

Anonymous’s picture

Status: Fixed » Closed (fixed)