If I have the gsitemap and i18n modules installed, the sitemap document at the root level contains only one item: a reference to another URL.
The URL submitted to Google (without i18n):
http://www.domainname.com/sitemap.xml (is OK)
But when i18n is installed, the sitemap is stored at
http://www.domainname.com/nl/sitemap.xml
or
Example
All three urls:
http://www.fire-proof.be/sitemap.xml
http://www.fire-proof.be/nl/sitemap.xml (as mentioned in admin / settings / xml sitemap) (in the nl section)
http://www.fire-proof.be/fr/sitemap.xml (as mentioned in admin / settings / xml sitemap) (in the fr section)
refer to
http://www.fire-proof.be/fr/sitemap0.xml (which contains the correct sitemap)
BUT
http://www.fire-proof.be/fr/sitemap0.xml (fr)
http://www.fire-proof.be/nl/sitemap0.xml (nl)
contain the same sitemap (with only the fr links to the documents)
When submitting any of the following sitemaps to Google:
http://www.fire-proof.be/nl/sitemap.xml
http://www.fire-proof.be/fr/sitemap.xml
http://www.fire-proof.be/fr/sitemap0.xml
the Google error is:
The Sitemap must be located at http://www.fire-proof.be/. To add a Sitemap at http://www.fire-proof.be/nl/, first add that site to your account and then click the Add a Sitemap link beside it.
I added http://www.fire-proof.be/ but not http://www.fire-proof.be/nl/ (I do not intend to, because it is the same site).
Any feedback on this?
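The condition implied by the quoted error can be modelled with a small check. This is only a rough sketch of what the message suggests (a sitemap has to sit directly in the directory of the registered site), not Google's actual logic; the function name `sitemap_is_at_site_root` is hypothetical.

```php
<?php
// Hypothetical helper: returns TRUE when $sitemap_url sits directly in the
// directory of $site_url, which is the condition the quoted error implies.
function sitemap_is_at_site_root($site_url, $sitemap_url) {
  // Normalize the site URL so it always ends in exactly one slash.
  $site = rtrim($site_url, '/') . '/';
  // Directory part of the sitemap URL, including the trailing slash.
  $dir = substr($sitemap_url, 0, strrpos($sitemap_url, '/') + 1);
  return $dir === $site;
}
```

Under this model, http://www.fire-proof.be/sitemap.xml passes for the site http://www.fire-proof.be/, while http://www.fire-proof.be/nl/sitemap.xml does not, which matches the behaviour reported above.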
| Comment | File | Size | Author |
|---|---|---|---|
| #14 | gsitemap-149693.patch | 1.94 KB | darren oh |
| #8 | gsitemap_10.patch | 997 bytes | voidberg |
Comments
Comment #1
melon commented
subscribing
Comment #2
rondev commented
What I noticed is:
If you modify the xmlsitemap module's parameters while the fr language is active, the sitemap is linked to the /fr/ site. If you do so while nl is active, the sitemap refers to /nl/.
There is only one sitemap active. I don't know how to make the other sitemaps active.
I added those lines in my .htaccess:
And told Google to go to domain.com/frgsitemap, domain.com/engsitemap, and domain.com/esgsitemap.
But it is not needed, as there is only one sitemap active. All those sitemaps refer to the same thing.
Does anyone have a solution?
Ronan
Comment #3
cburschka
Hardcoded rewrite? The path module does an excellent job at providing alias tables.
I'd suggest auto-aliasing [lang]/sitemap.xml to lang-sitemap.xml rather than langsitemap.xml, simply for readability.
The problem that remains is figuring out where to hook into the i18n module so that the alias can be created when a locale is added.
Perhaps a better way to do this would be to add an option to auto-alias all existing locale sitemaps in the sitemap settings page. This would have to be revisited whenever new locales are added, but you'll probably go to that page anyway when submitting a new sitemap to Google.
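The aliasing idea in this comment could look roughly like the following. It is a minimal sketch that only builds the path-to-alias pairs for a list of language codes; the function name `build_sitemap_aliases` is hypothetical, and actually storing the aliases through the path module is left out.

```php
<?php
// Hypothetical helper: maps each "[lang]/sitemap.xml" path to the
// "lang-sitemap.xml" alias suggested in the comment above.
function build_sitemap_aliases(array $langcodes) {
  $aliases = array();
  foreach ($langcodes as $lang) {
    $aliases[$lang . '/sitemap.xml'] = $lang . '-sitemap.xml';
  }
  return $aliases;
}
```

Calling it again with the full list of locales whenever one is added would cover the "revisit when new locales are added" case mentioned above.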
Comment #4
rondev commented
Thank you for the tip. I didn't think of that, as I use fewer modules for easier maintenance. An upgrade can sometimes be delayed if a module is no longer supported.
As for the rest of your comments, I can't say more, as my level is closer to that of a simple end user than a developer. Help is much appreciated. It would be very nice if xmlsitemap and i18n worked well together. I hope it will happen before Drupal 6 (which I heard will support a multilingual system) is released.
Ronan
Comment #5
rondev commented
Another thing I noticed: if changes are made in the xmlsitemap settings, the sitemap contains only the pages of the current language (83 references in my case). If I change something in my module activation list, the sitemap contains pages of all languages (113 pages in my case). That seems strange to me. Now in xmlsitemap, pages of every language are referenced in one XML page; fr/sitemap0.xml and en/sitemap0.xml give the same result. I don't know whether this matters to Google for a better PR.
Ronan
Comment #6
PixelClever commented
I have been dealing with this exact same problem. Is there any news as to how to patch it? I am tinkering with the code myself, but I am no expert at this level of programming.
Comment #7
PixelClever commented
I am willing to offer a bounty of $100 to anyone who can fix this module so that internationalization and the Google sitemap generator can work together. I know it's not much, but that's what I can afford, and I think a fix will benefit the project as a whole. Maybe if someone else is willing to add to it we can afford to pay more.
Comment #8
voidberg commented
The problem is that gsitemap uses the url() function to generate links to chunks. i18n adds the current language code to all links generated by url(), and this is where the problem is.
The solution is the following: remove the call to url from the generation of the links to chunks and generate the url with another method.
The patch was created under OS X so beware of the line endings. However it's small enough that it can be applied manually.
One could take this further and use the same code in the chunk generation code to generate the links in the sitemap without the 'en/' or 'fr/' since i18n is smart enough to prepend the language code.
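As an illustration of generating a chunk link "with another method", here is a minimal sketch that assembles the URL by plain string concatenation so that nothing can inject a language prefix. It is not the attached patch; the function name `build_chunk_url` and its parameters are hypothetical, and the clean-URL switch is an assumption added for completeness.

```php
<?php
// Hypothetical helper: builds the absolute URL of sitemap chunk $chunk
// without going through url(), so no language prefix is added.
function build_chunk_url($base_url, $chunk, $clean_urls) {
  $path = 'sitemap' . (int) $chunk . '.xml';
  $base = rtrim($base_url, '/');
  if ($clean_urls) {
    // Clean URLs: the path follows the base URL directly.
    return $base . '/' . $path;
  }
  // Without clean URLs the path is passed in the q parameter.
  return $base . '/?q=' . $path;
}
```

With `$base_url` set to http://www.fire-proof.be and clean URLs enabled, chunk 0 resolves to http://www.fire-proof.be/sitemap0.xml rather than the /fr/ or /nl/ variant.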
Comment #9
PixelClever commented
It works. The URL that is listed for pinging Google still shows the "en" in it, but that is easily changed by typing in the correct URL. That's just a detail, but it would be good to have it fixed for the long term.
What you did justifies the bounty... so should we set something up through paypal?
Comment #10
voidberg commented
The easiest way to remove the language prefix from the URLs generated in the chunks is to remove the following code from _gsitemap_get_path_alias, which is located at the end of gsitemap.module:
Warning: this also breaks any other custom URL rewrites that other modules could supply.
vinayakaya, I wrote you an email regarding the bounty.
Comment #11
darren oh
Comment #12
SubZero5 commented
The patches will not fix anything, as far as I can see...
and there might be a problem here:
$script = (strpos($_SERVER['SERVER_SOFTWARE'], 'Apache') === FALSE) ? 'index.php' : '';
This code might break things. Many hosts hide the Apache tag. Devs should rely on the Clean URLs setting instead, because "This option makes Drupal emit 'clean' URLs (i.e. without ?q= in the URL.)".
Comment #13
cburschka
clean_url doesn't enter into it - "index.php" is used here to generate an absolute link on non-Apache web servers. Drupal's url() function does the same thing, by the way:
I'm not sure how the circumvention of Drupal's url() function solves the original problem (since that is all the patch seems to do), but this line is used in the heart of Drupal's core.
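To make the disagreement concrete, the check quoted in comment #12 can be isolated into a small function and tried with different server strings. This is only a sketch; the wrapper name `script_name_for` is hypothetical, and it takes the server string as a parameter instead of reading `$_SERVER` so it can be run anywhere.

```php
<?php
// Hypothetical wrapper around the quoted check: returns 'index.php' unless
// the server software string mentions Apache.
function script_name_for($server_software) {
  return (strpos($server_software, 'Apache') === FALSE) ? 'index.php' : '';
}
```

If a host hides the Apache tag, the function returns 'index.php', so the generated link simply names the script explicitly (index.php?q=...) instead of omitting it.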
Comment #14
darren oh
The attached patch tries to prevent i18n from rewriting XML Sitemap links while allowing other modules to do so. Please test.
Comment #15
SubZero5 commented
It did not do anything... just in case, I have reverted to a previous backup.
Comment #16
dsp1 commented
What exactly is the patch supposed to fix? Having a sitemap for each language?
Making it so there is not both a sitemap.xml and a sitemap0.xml?
I did not notice any difference after patching.
Comment #17
SubZero5 commented
Hold on a sec..
> having a sitemap for each language?
Why require a sitemap for each language? It is meaningless. To be precise, each path must be preceded by its own language and included in the single sitemap. :)
> making it so there is not sitemap.xml and sitemap0.xml?
This is one of the greatest requests I have ever seen. If there is only sitemap0.xml, why on earth do we have the sitemap.xml sitemap index?
> i did not notice any difference after patching.
I cannot access the error (or PHP warning) logs on my server. If you can access those, can you please take a look at them? It might not permit you to redefine a function for some odd reason.. :)
Comment #18
darren oh
The patch is supposed to prevent i18n from adding the language prefix to site map URLs.
Comment #19
SubZero5 commented
It did not work (it did nothing) on my site... it still has links like www.abc.com/en/sitemap0.xml
Comment #20
darren oh
Have you tried deleting your cached site map files?
Comment #21
darren oh
By the way, the files would be in your temp directory.
Comment #22
SubZero5 commented
Yes darren, I have simply cleared them all from the tmp folder.
BTW, are you going to use xmlsitemap.module, or will you still use gsitemap.module? When will there be a complete move?
Comment #23
darren oh
There will be a complete move as soon as we finish working the bugs out. I've found the solution to this problem and will be posting details shortly.
Comment #24
darren oh
I got the problem fixed in CVS commit 83768. For the fix to work, gsitemap must be loaded before i18n. i18n is a bit unstable at the moment, so for now this has to be done by setting the module weights manually. Please check issue 111047 if this causes problems.
Comment #25
SubZero5 commented
Strange. It did not fix my issue. Same as the old version.. :(
Comment #26
darren oh
Step by step:
Comment #27
dsp1 commented
SubZero5, why not have a sitemap for each language? Currently my sitemap only contains English nodes, no other language. Is it supposed to put all nodes from all languages in the one sitemap?
Comment #28
SubZero5 commented
Yes dsp1, currently that part is faulty. For n from 0 upward, sitemap(n).xml has all the links of the site when the number of links exceeds the chunk size (currently 50k), which means all the /en/.. links, /fr/... links, /de/... links, and non-language-related links must be included in the sitemap. If a chunk is not filled, there will be no sitemap(n+1).xml :D
Thanks darren, the module was set to -10. To fix this, I gave xmlsitemap a weight of -11. That fixed the issue, but it caused another one: my site links now do not have the language prefixes, and the sitemap itself only has the English (my primary language) pages... :(
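The effect of the weights mentioned here can be sketched as a simple sort. This assumes that modules load in ascending order of weight with ties broken by name, and that the -10 above is i18n's weight; both are assumptions, and `module_load_order` is a hypothetical helper, not a Drupal function.

```php
<?php
// Hypothetical helper: returns module names in the order they would load,
// assuming ascending weight with ties broken alphabetically by name.
function module_load_order(array $weights) {
  $names = array_keys($weights);
  usort($names, function ($a, $b) use ($weights) {
    if ($weights[$a] != $weights[$b]) {
      return ($weights[$a] < $weights[$b]) ? -1 : 1;
    }
    return strcmp($a, $b);
  });
  return $names;
}
```

With xmlsitemap at -11 and i18n at -10, xmlsitemap sorts first, which is the "loaded before i18n" condition described in comment #24.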
Comment #29
darren oh
It seems like the best solution would be to generate and submit separate site maps for each language. The current code is not capable of determining what language content is in. Please open a new issue if you would like to discuss adding that ability.
Comment #30
(not verified) commented