Hi,

I have my website translated into several languages.
English - www.example.com/
French - www.example.com/fr/
Chinese - www.example.com/ch/

I would like to create a sitemap for each language, and I successfully did so with this module.
English - www.example.com/sitemap.xml
French - www.example.com/fr/sitemap.xml
Chinese - www.example.com/ch/sitemap.xml

Now I would like to register the sitemaps with google.com, google.fr, and google.cn, but I'm not sure how. Does submitting all 3 sitemaps at www.google.com/webmasters/tools, or via this module, make them show up in the French and Chinese search engines as well? Or do I submit the English one at google.com and find somewhere else to submit the French and Chinese versions?

Also, does this module add the sitemap links to robots.txt? I read that search engines will pick up the sitemaps if they are included in the robots.txt file. Do we need to indicate somehow that one sitemap is for French and another is for Chinese?

I am new to sitemaps and would really appreciate some guidance on using this module.

Thanks!!


Comments

Dave Reid

Glad to hear you're finding the module useful!

I'm not sure about the process for submitting multilingual sitemaps with Google. Is there a French or Chinese version of Google Webmaster Tools? I'll have to look into it more.

As for robots.txt, if you have the RobotsTxt module, the XML sitemap module (beta1) will automatically add the appropriate language sitemap to the robots.txt file. For example:
www.example.com/robots.txt would reference www.example.com/sitemap.xml
www.example.com/fr/robots.txt would reference www.example.com/fr/sitemap.xml

robby.smith

Hi Dave,

Thanks for the reply. I am not too clear on what the RobotsTxt module does.
If I add the following lines to my current robots.txt file, would that be enough?

# Sitemaps
Sitemap: http://www.yourDrupalsite.com/sitemap.xml
Sitemap: http://www.yourDrupalsite.com/fr/sitemap.xml
Sitemap: http://www.yourDrupalsite.com/ch/sitemap.xml

Just wondering, as adding the lines would be a one-time thing vs. adding another module to my site =)

Thanks for your help!

P.S. I will also ask the maintainers of the RobotsTxt module: #787678: adding xml sitemaps to robot.txt

Dave Reid

Status: Active » Fixed

Well, the robotstxt.module allows other modules (like XML sitemap) to automatically 'hook' into your robots.txt file, so you don't have to edit that file whenever changes happen. Let's say you add a new language to your site. Without the module, you'd have to remember to add it to the robots.txt file manually. With the module, it happens automatically.

It's really up to you, since both manually editing the file and running the RobotsTxt module work just fine.
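
To illustrate the hook mechanism, here is a minimal sketch (mymodule is a hypothetical module name, not the actual XML sitemap implementation). The RobotsTxt module invokes hook_robotstxt() and appends each returned line to the robots.txt it generates:

/**
 * Implements hook_robotstxt() (invoked by the RobotsTxt module).
 */
function mymodule_robotstxt() {
  // Advertise this site's sitemap with an absolute URL; the RobotsTxt
  // module appends the returned lines to its robots.txt output.
  return array(
    'Sitemap: ' . url('sitemap.xml', array('absolute' => TRUE)),
  );
}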

robby.smith

Thanks Dave! I will most likely try the manual route, as the site I'm working on now will be small.
Have a good day!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

michaelpporter

Category: support » feature
Status: Closed (fixed) » Needs review

> As for robots.txt, if you have the RobotsTxt module, the XML sitemap module (beta1) will automatically add the appropriate language sitemap to the robots.txt file. For example:
> www.example.com/robots.txt would reference www.example.com/sitemap.xml
> www.example.com/fr/robots.txt would reference www.example.com/fr/sitemap.xml

It is my understanding that this is not how robots.txt is supposed to work. The robots.txt file must reside in the root of the domain. A robots.txt file located in a subdirectory isn't valid, as bots only check for this file in the root of the domain. For instance, http://www.example.com/robots.txt is a valid location, but http://www.example.com/mysite/robots.txt is not. If you don't have access to the root of a domain, you can restrict access using the Robots META tag.

I propose a patch that loops over all of the XML sitemaps and adds them to the root robots.txt file.

michaelpporter

This is the function that worked for us. I am sure the code could be cleaner; I pieced this together in a few minutes.

function xmlsitemap_robotstxt() {
  $robotstxt = array();
  // Loop over every generated sitemap, whatever its language context.
  $query = db_query("SELECT * FROM {xmlsitemap_sitemap}");
  while ($sitemap = db_fetch_array($query)) {
    // The sitemap's context (e.g. its language) is stored serialized.
    $sitemap['context'] = unserialize($sitemap['context']);
    $uri = xmlsitemap_sitemap_uri($sitemap);
    // Emit one Sitemap: line per sitemap into the root robots.txt.
    $robotstxt[] = 'Sitemap: ' . url($uri['path'], $uri['options']);
  }
  return $robotstxt;
}

michaelpporter

Component: Other » xmlsitemap.module

Attached is a patch that addresses what I feel is an issue with adding the XML sitemaps to the robots.txt file via the RobotsTxt module. Per the note above, there should only be a robots.txt file in the root of the domain, not in a subfolder for each language; the search engines do not look for robots.txt in subfolders.

YK85

Subscribing - I am interested in learning more.

mrfelton

The previous patch didn't work properly, as the path prefix wasn't included. Revised patch attached.
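
For reference, the gist of including the prefix is something like this (a hedged sketch, assuming the sitemap context stores its language code under a 'language' key; the actual patch may differ). Passing a language object to url() makes it prepend that language's path prefix (e.g. /fr):

// Inside the loop from the function above, after unserializing the context.
if (!empty($sitemap['context']['language'])) {
  $languages = language_list();
  if (isset($languages[$sitemap['context']['language']])) {
    // With a language object in the options, url() adds the path prefix.
    $uri['options']['language'] = $languages[$sitemap['context']['language']];
  }
}
$robotstxt[] = 'Sitemap: ' . url($uri['path'], $uri['options']);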

roderik

Status: Needs review » Needs work

I have a multilanguage site with domain name (not path prefix) language selection.

The current unpatched code adds one line to each domain's robots.txt, using that domain:

www.mydomain.nl/robots.txt:

User-agent: *
Disallow: /stuff
Sitemap: http://www.mydomain.nl/sitemap.xml

www.mydomain.com/robots.txt:

User-agent: *
Disallow: /stuff
Sitemap: http://www.mydomain.com/sitemap.xml

This is exactly what I want, right? I don't want to include sitemaps for other domains.

So basically,
- if your multilanguage site has 'path prefix' selection, you want to include sitemaps for all contexts (as per the above patches);
- if your multilanguage site has 'domain name' selection, you want to include only the sitemap for the current context (see the sketch after this list).
(And I don't know enough about the xmlsitemap 'contexts' concept to know whether both of these statements always hold true :) )
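
The domain-based case could be handled with something like this (a hedged sketch with illustrative names; I'm assuming the sitemap URIs resolve to absolute URLs): compare each sitemap's URL against the domain serving the current robots.txt request and skip the rest.

global $base_url;

// Only emit Sitemap: lines for sitemaps that live on the domain
// answering this robots.txt request; skip other languages' domains.
foreach ($sitemaps as $sitemap) {
  $uri = xmlsitemap_sitemap_uri($sitemap);
  $url = url($uri['path'], $uri['options'] + array('absolute' => TRUE));
  if (strpos($url, $base_url . '/') === 0) {
    $robotstxt[] = 'Sitemap: ' . $url;
  }
}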

Jawi

Has anyone found a way to prevent the automatic creation of sitemap lines in the robots.txt file?

leymannx

I opened a feature request and provided a patch that adds an option to completely disable RobotsTxt support: #3020155: Add config to disable RobotsTxt support.
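
For anyone stuck on an older branch in the meantime, the gist of such an option is just a switch around the hook (a hedged sketch; the variable name is hypothetical, and the real patch adds a proper config option instead):

function xmlsitemap_robotstxt() {
  // Hypothetical switch: set to FALSE to suppress the Sitemap: lines.
  if (!variable_get('xmlsitemap_robotstxt_enabled', TRUE)) {
    return array();
  }
  // ... build and return the Sitemap: lines as before ...
}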

leymannx

Status: Needs work » Closed (outdated)

Oops, wrong version. This issue is actually outdated.