I know there is a separate issue where people want a separate sitemap per i18n language (#182442: sitemaps for each language - i18n); this one is a bit different. I'd like a single sitemap that includes all translated nodes.

Whenever the sitemap is regenerated, only nodes with no language and nodes in the currently selected language (the language of the user who triggered the regeneration) are included. A quick example:

1. Create a node and specify it is in English and save it.
2. The sitemap is regenerated with all "no language" nodes, and all English nodes. No other language nodes are included.
3. Now translate that new English node to French and save the translated node.
4. The sitemap is regenerated with all "no language" nodes and all French nodes included. All the English nodes are removed and lost from the sitemap.

From what I can see after a little debugging, _xmlsitemap_node_links() selects all eligible nodes, including translations. However, somewhere later the translated nodes are filtered out and never reach the sitemap.
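To make the symptom concrete, here is a minimal PHP sketch of a filter that keeps only language-neutral nodes and nodes in the current interface language, which reproduces steps 2 and 4 above. The function `filter_by_current_language()` and the node arrays are hypothetical; this is not the actual i18n or xmlsitemap code, only an illustration of the behaviour being reported.

```php
<?php
// Hypothetical illustration (not the real i18n code): a language filter
// whose result depends on the language of whoever triggers regeneration.
function filter_by_current_language(array $nodes, $current_lang) {
  $kept = array();
  foreach ($nodes as $node) {
    // Keep language-neutral nodes ('' = no language) and nodes in the
    // current language; every other translation is silently dropped.
    if ($node['lang'] === '' || $node['lang'] === $current_lang) {
      $kept[] = $node['nid'];
    }
  }
  return $kept;
}

$nodes = array(
  array('nid' => 1, 'lang' => ''),    // no language
  array('nid' => 2, 'lang' => 'en'),  // English original
  array('nid' => 3, 'lang' => 'fr'),  // French translation of node 2
);

print_r(filter_by_current_language($nodes, 'en')); // nids 1 and 2
print_r(filter_by_current_language($nodes, 'fr')); // nids 1 and 3
```

The sitemap content changes with the caller's language even though the set of published nodes is the same.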

This leads me to:

Darren Oh says this here: http://drupal.org/node/182442#comment-676408

i18n was adding the language prefix to every URL, so we use i18n_get_lang_prefix() to get the URL without a language prefix.

This will be the situation until someone who knows the i18n code well provides a patch that can ensure that only the appropriate links have a language prefix, or can split the languages into separate site maps.

I don't understand why you are doing this. By default, i18n prefixes all paths with a language prefix, so the search engine should be looking for the path with the language prefix. If you don't want language prefixes for your default language, you can patch i18n not to add them (#208712: Option to not prefix paths with the default language). I have applied that patch, so my English nodes have no language prefix in their paths, and the URLs of articles I wrote before installing i18n did not change, so I don't have any SEO problems. I then always put the language code in the translated nodes' path aliases, so it is stored in the url_alias table.

Example path aliases on some nodes:
English node: /articles/new-car
German node: /de/articles/new-car (just an example; I didn't bother translating the words)

So I tried this experiment:
I commented out the i18n_get_lang_prefix($result, TRUE); line that strips the language prefixes and regenerated the sitemap.
Results:
1. When English is selected, the sitemap looks correct, with no language prefixes (because of the i18n patch above). But no translated nodes are included. This proves that stripping language prefixes is a mistake: i18n should control this completely, and you can get a sitemap without language prefixes without stripping them manually.
2. When French is selected, every path in the sitemap has an FR language prefix (except the front page), but no English nodes are included. Again, this is correct behaviour apart from the missing English nodes: when browsing the site in French, you want untranslated nodes to carry an FR prefix so that your language setting is maintained.

But the big problem is obvious: the output of the sitemap depends on the language you happen to be using when you trigger the regeneration.

Ideally, the sitemap should be generated using the default language, and should include all translated nodes with their language prefixes as supplied by Drupal core + i18n.
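The ideal behaviour can be sketched as computing each URL from the node's own language, never from the language of the user who triggered the rebuild. This is a minimal sketch assuming the setup described above (default language unprefixed thanks to the #208712 patch, other languages prefixed with their code, and some aliases already containing the code); `sitemap_loc()` and the node arrays are hypothetical, not xmlsitemap or i18n functions.

```php
<?php
// Hypothetical sketch: build a sitemap URL from the node's own language,
// independent of the current user's language.
function sitemap_loc($base_url, array $node, $default_lang) {
  $path = $node['alias'] !== '' ? $node['alias'] : 'node/' . $node['nid'];
  $lang = $node['lang'];
  // Language-neutral and default-language nodes get no prefix.
  if ($lang === '' || $lang === $default_lang) {
    return $base_url . '/' . $path;
  }
  // Other languages get "xx/", unless the alias already carries it
  // (e.g. an alias stored as "de/articles/new-car").
  $prefix = $lang . '/';
  if (strpos($path, $prefix) !== 0) {
    $path = $prefix . $path;
  }
  return $base_url . '/' . $path;
}

$nodes = array(
  array('nid' => 10, 'lang' => 'en', 'alias' => 'articles/new-car'),
  array('nid' => 11, 'lang' => 'de', 'alias' => 'de/articles/new-car'),
  array('nid' => 12, 'lang' => 'fr', 'alias' => ''),
);
foreach ($nodes as $node) {
  echo sitemap_loc('http://example.com', $node, 'en'), "\n";
}
// http://example.com/articles/new-car
// http://example.com/de/articles/new-car
// http://example.com/fr/node/12
```

Because the current language never enters the calculation, every regeneration produces the same set of URLs.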


Comments

mr.j

Aaagh, I missed a step when describing how to reproduce the problem. It should be:

1. Create a node and specify it is in English and save it.
2. The sitemap is regenerated with all "no language" nodes, and all English nodes. No other language nodes are included.
3. Switch to another language (e.g. French).
4. Now translate that new English node to French and save the translated node.
5. The sitemap is regenerated with all "no language" nodes and all French nodes included. All the English nodes are removed and lost from the sitemap.

mr.j

Oh, and one more thing I just remembered: because this module strips the language prefix from path aliases, I cannot put my site's per-language homepages into the sitemap.

i.e.
mysite.com/ <-- English homepage
mysite.com/us <-- USA homepage
mysite.com/fr <-- French homepage

If you manually add /us or /fr as additional links to the sitemap, they are changed to / as soon as you save the sitemap configuration.

My workaround was to add the panels path I use for each homepage (/enhome, /ushome, /frhome) to the sitemap and to use .htaccess to 301-redirect requests for those paths to /, /us, and /fr. Ugly but effective.
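Why /us and /fr collapse to / can be seen with a naive prefix stripper: once the language code is removed, a path that consists only of a prefix becomes the empty path, i.e. the front page. `strip_language_prefix()` is hypothetical and only approximates the behaviour described above; it is not i18n's `i18n_get_lang_prefix()`.

```php
<?php
// Hypothetical approximation of stripping a language prefix from a path.
function strip_language_prefix($path, array $languages) {
  $parts = explode('/', $path, 2);
  if (in_array($parts[0], $languages, TRUE)) {
    // Drop the prefix; a bare prefix leaves nothing, i.e. the front page.
    return isset($parts[1]) ? $parts[1] : '';
  }
  return $path;
}

$languages = array('en', 'us', 'fr');
var_dump(strip_language_prefix('fr/articles/new-car', $languages)); // string(16) "articles/new-car"
var_dump(strip_language_prefix('us', $languages));                  // string(0) ""
```

Any language-specific homepage therefore ends up as the same empty path, which is why the per-language homepages cannot coexist in the sitemap.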

lazly

Hi!

I have a site where I ran into this problem. I'm using xmlsitemap, i18n and other modules...
Hungarian --> site.eu/ (default)
English --> site.eu/en (in the future this will be site.com)

When I look at site.eu/sitemap.xml, it displays English and untranslated content. I tried hu/sitemap.xml, en/sitemap.xml, en-sitemap.xml, ensitemap.xml and others, but none of them work.
How can I make a hu/sitemap.xml with Hungarian content and an en/sitemap.xml with English content (and a sitemap.xml with all content)?

Thanks

Darren Oh

Google rejects site maps that are not in the site root directory, so we prevent them from being generated.
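For context, the sitemaps.org protocol ties a sitemap's scope to its location: a file at http://example.com/catalog/sitemap.xml may only list URLs that start with http://example.com/catalog/. So a sitemap at /hu/sitemap.xml could only list /hu/... URLs, while one in the site root may list the whole site. The helper below is a minimal, hypothetical check of that rule; it ignores the cross-host submission the protocol allows via robots.txt.

```php
<?php
// Hypothetical check of the sitemaps.org location rule: a URL may appear
// in a sitemap only if it starts with the sitemap's directory.
function in_sitemap_scope($sitemap_url, $url) {
  $dir = substr($sitemap_url, 0, strrpos($sitemap_url, '/') + 1);
  return strpos($url, $dir) === 0;
}

var_dump(in_sitemap_scope('http://site.eu/sitemap.xml', 'http://site.eu/en/node/1'));    // bool(true)
var_dump(in_sitemap_scope('http://site.eu/hu/sitemap.xml', 'http://site.eu/en/node/1')); // bool(false)
var_dump(in_sitemap_scope('http://site.eu/hu_sitemap.xml', 'http://site.eu/en/node/1')); // bool(true)
```

A root-level file name, whatever it is called, keeps every language prefix in scope.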

lazly

Thanks for your answer.

What if the file name is hu_sitemap.xml instead of hu/sitemap.xml? ;)
And the problem (which is why I'm writing now) is that sitemap.xml includes only English and untranslated content, even though the default language is Hungarian.

Have a nice day!

Jose Reyero

Subscribing

caktux

Here's how I got all nodes, with language prefixes, into one sitemap.xml (I also commented out the i18n_get_lang_prefix() line):

/**
 * Get all site map links.
 * @return An array of links from hook_xmlsitemap_links().
 */
function _xmlsitemap_links() {
  static $links;
  if (!isset($links)) {
    $entries = module_invoke_all('xmlsitemap_links');

// Added code here: add the French translation's URL for every link.
    global $base_url;
    // Initialize so array_merge() gets an array even if nothing is added.
    $frlinks = array();
    foreach ($entries as $frentry) {
      // Map the link back to its internal path, then look up the alias of its French translation.
      $path = drupal_get_normal_path(str_replace($base_url .'/', '', $frentry['#loc']));
      $frloc = $base_url .'/'. drupal_get_path_alias(translation_url($path, 'fr'));
      if ($frentry['#loc'] != $frloc) {
        $frlinks[] = array(
          '#loc' => str_replace('/en/', '/fr/', $frloc),
          '#lastmod' => $frentry['#lastmod'],
          '#changefreq' => $frentry['#changefreq'],
          '#priority' => $frentry['#priority'],
        );
      }
    }
    $entries = array_merge($entries, $frlinks);
// End added code

    if (!empty($entries)) {
      $lastmod = array();
      foreach ($entries as $key => $link) {
        $lastmod[$key] = $link['#lastmod'];
      }
      // Sort all entries by last modification time.
      array_multisort($lastmod, $entries);
    }
    $links = $entries;
  }
  return $links;
}

I know it's a really dirty solution, but it works for me... I hope it can help someone or lead to a better solution. [edit] Nodes are also affected, so I changed the prefix with another str_replace()... so dirty lol...

claudiu.cristea

Title: i18n & localizer: translated nodes not included in sitemap » i18n: translated nodes not included in sitemap
Priority: Critical » Normal

This bug is also present on sites where nodes are translated with the localizer module. On my site I have Localizer 5.x-1.10 and XML Sitemap 5.x-1.4. The sitemap results are restricted to language-neutral pages and default-language pages, with no translations in the XML Sitemap results...

So... this bug is not specific to i18n. It affects all multilingual sites, and I think that is critical.

claudiu.cristea

Title: i18n: translated nodes not included in sitemap » i18n & localizer: translated nodes not included in sitemap
Priority: Normal » Critical

Also changing the issue title to reflect #8 and switching the priority to critical.

Florian

Title: i18n: translated nodes not included in sitemap » i18n & localizer: translated nodes not included in sitemap
Priority: Normal » Critical

I am sure the answer is to use a sitemapindex.xml (http://www.puzzle.ro/sitemapindex.xml) that includes all the per-language sitemaps. This has proved to be a good method: everything has been indexed correctly for years, with no duplicate content penalty from Google.
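The sitemap index format is defined by the sitemaps.org protocol: a `<sitemapindex>` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace with one `<sitemap><loc>` entry per child sitemap. Below is a minimal PHP sketch that writes one for per-language sitemaps; the `sitemap-xx.xml` naming scheme and `build_sitemap_index()` are hypothetical, not something XML Sitemap provides.

```php
<?php
// Hypothetical sketch: build a sitemap index pointing at one sitemap per language.
function build_sitemap_index($base_url, array $langs) {
  $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
  $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
  foreach ($langs as $lang) {
    // Child sitemaps sit in the site root so each may list any prefixed URL.
    $loc = htmlspecialchars($base_url . '/sitemap-' . $lang . '.xml', ENT_QUOTES, 'UTF-8');
    $xml .= "  <sitemap><loc>$loc</loc></sitemap>\n";
  }
  $xml .= "</sitemapindex>\n";
  return $xml;
}

echo build_sitemap_index('http://www.example.com', array('en', 'ro'));
```

Each child sitemap can then be generated for one language without depending on the language of whoever triggered the rebuild.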

I used to have multisites for multilingual content, and now I am moving them to a single multilingual site using the i18n module, so the behavior of the XML Sitemap module does affect how search engines see the content.

Comitto


I suggest using the patch I have attached. It does not fix the xmlsitemap_user module (we don't use it), but the rest works fine.

mr.j

I have tested Comitto's patch and can confirm it mostly works on Drupal 5.7. However, the xmlsitemap_node.module part did not apply cleanly, so I had to apply it manually (we're using the release version 5.x-1.4). Maybe you created it against the dev version, Comitto?

Also, it does not preserve the patch that I have applied against i18n to remove the language prefix on default language nodes - i.e. any node with the default language set will always show the prefix in the sitemap. But I am prepared to live with that as it can be solved with .htaccess if necessary.

I likewise don't use the xmlsitemap_user module, but I am sure someone does, so it should really be fixed too (if necessary).

I urge the devs to clean up and commit this patch ASAP, because right now this problem really is a major PITA. Any time we save a node in a language other than the default EN, the sitemap is regenerated with the current language prefix applied to all URLs. So we then have to switch back to English, edit the English translation and save it again; otherwise Google Webmaster Tools starts logging errors, as the URL of every page on the site is in constant flux. And since we have a forum, if any user adds a topic while using a language other than EN, bang, there goes our sitemap until we notice and fix it.

Thank goodness this patch fixes that problem! On further inspection, no, it doesn't. Very annoying problem.

claudiu.cristea

Status: Active » Needs work

I could not apply this patch file to xmlsitemap_node.module, so I did it manually. Now the _xmlsitemap_node_links() function is (is it okay, Comitto?):

function _xmlsitemap_node_links($excludes = array()) {
  $links = array();
  // Extra columns and joins, added only when the providing module is enabled.
  $select_ext = '';
  $join_ext = '';
  if (module_exists('i18n')) {
    $select_ext .= ", i18n.language AS lang";
    $join_ext .= " LEFT JOIN {i18n_node} i18n ON n.nid = i18n.nid";
  }
  if (module_exists('comment')) {
    $select_ext .= ", s.comment_count, s.last_comment_timestamp, xn.previous_comment";
    $join_ext .= " LEFT JOIN {node_comment_statistics} s ON s.nid = n.nid";
  }
  $sql = "
    SELECT n.nid, n.type, n.promote, n.changed, xn.previously_changed, xn.priority_override, ua.dst AS alias $select_ext
    FROM {node} n
    LEFT JOIN {xmlsitemap_node} xn ON xn.nid = n.nid
    LEFT JOIN {url_alias} ua ON ua.pid = xn.pid $join_ext
    WHERE n.status > 0
    AND (n.type NOT IN ('". implode("', '", $excludes) ."') AND xn.priority_override IS NULL OR xn.priority_override >= 0)
    AND n.nid <> %d";
  $result = db_query(db_rewrite_sql($sql), _xmlsitemap_node_frontpage());
  while ($node = db_fetch_object($result)) {
    $priority = xmlsitemap_node_priority($node);
    if ($priority > -1) {
      // Prefix with the node's own language (e.g. "fr/"), not the current user's.
      $lang_prefix = !empty($node->lang) ? $node->lang .'/' : '';
      if ($node->alias) {
        // Don't double the prefix if the alias already starts with it. Checking the
        // whole prefix (not just 3 characters) also handles codes such as "pt-br".
        if ($lang_prefix === '' || strpos($node->alias, $lang_prefix) === 0) {
          $alias = $node->alias;
        }
        else {
          $alias = $lang_prefix . $node->alias;
        }
      }
      else {
        $alias = '';
      }
      $links[] = array(
        'nid' => $node->nid,
        '#loc' => xmlsitemap_url($lang_prefix .'node/'. $node->nid, $alias, NULL, NULL, TRUE),
        '#lastmod' => variable_get('xmlsitemap_node_count_comments', TRUE) ? max($node->changed, $node->last_comment_timestamp) : $node->changed,
        '#changefreq' => xmlsitemap_node_frequency($node),
        '#priority' => $priority,
      );
    }
  }
  return $links;
}

However, this patch does not fix the sitemap when XML Sitemap is used with the localizer module. I'm not very familiar with the core mechanisms of i18n and localizer... Should I assume that the if (module_exists('i18n')) check can also be applied to localizer, like if (module_exists('localizer'))?

attiks

Just to confirm that the node part is working; I haven't tested (or needed) the taxonomy part.

Thanks for the patch

Peter

Comitto

Patch against 5.x-1.4

I do not use localizer. The patch uses code specific to the i18n module, so it does not work with the localizer module.

attiks

The patch is working great, but I had duplicates in my sitemap.

Solved by adding DISTINCT to the SELECT query.
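If you would rather not touch the query, a different technique is to deduplicate the finished link array by `#loc`, keeping the first occurrence. This is an alternative to the DISTINCT fix, not what was done here; `dedupe_links()` is a hypothetical helper, not part of XML Sitemap.

```php
<?php
// Hypothetical alternative to SELECT DISTINCT: drop links whose #loc was already seen.
function dedupe_links(array $links) {
  $seen = array();
  $unique = array();
  foreach ($links as $link) {
    if (!isset($seen[$link['#loc']])) {
      $seen[$link['#loc']] = TRUE;
      $unique[] = $link;
    }
  }
  return $unique;
}

$links = array(
  array('#loc' => 'http://example.com/node/1', '#priority' => 0.5),
  array('#loc' => 'http://example.com/node/1', '#priority' => 0.5),
  array('#loc' => 'http://example.com/fr/node/2', '#priority' => 0.5),
);
echo count(dedupe_links($links)), "\n"; // 2
```

DISTINCT in SQL is usually cheaper, since the duplicates never leave the database; the PHP version only helps when the duplicate rows differ in a column DISTINCT would compare.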

Peter

Darren Oh

I believe this issue is being confused with issue 182442. Please go there if you want language prefixes in your site map. The issue here is that some multi-language settings prevent translated nodes from appearing in the site map at all. The patches offered here do not seem to address this issue.

Darren Oh

Version: 5.x-1.4 » 5.x-2.x-dev

Darren Oh

Status: Needs work » Fixed

Fixed in CVS commit 117947.

Comitto

Darren: now I'm really confused. mr.j, who created this issue, wrote that the attached patch is almost OK. So what is the problem? The issue and the posted patch are aimed at the 1.x version, so why did you change the version to 2.x-dev?! Some people still use the 1.x version. I think you should just post that the issue is fixed in the 2.x version and leave this issue open, or, if you are the only person who can fix the 1.x version, just set the status to "won't fix". Moreover, if you want people to use the 2.x version, mark it as stable, not development.

Darren Oh

Version: 5.x-2.x-dev » 5.x-1.x-dev
Status: Fixed » Patch (to be ported)

The patch addressed a completely different issue. That's why I provided a link to the correct issue. We can port the fix for this issue to the 5.x-1.x version.

Darren Oh

Status: Patch (to be ported) » Fixed

Fixed in CVS commit 119033.

Anonymous

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

claudiu.cristea

Version: 5.x-1.x-dev » 6.x-1.x-dev
Status: Closed (fixed) » Active

Reopened because this error is present in 6.x-1.x-dev.

When used with i18n, the sitemap is generated only for the default language.

For example, if we have a site in English (EN) and Italian (IT) and i18n is configured with path prefixes, a request for http://www.example.com/sitemap.xml redirects to http://www.example.com/en/sitemap.xml and only the EN nodes are listed in the sitemap. Even if you try to access http://www.example.com/it/sitemap.xml, you get only the EN-localized nodes.

apaderno

Title: i18n & localizer: translated nodes not included in sitemap » Support for translated nodes
Category: bug » support
Status: Active » Closed (duplicate)

The 6.x-1.x-dev version doesn't even have the code for that yet. IMHO, i18n is not even necessary with Drupal 6, as Drupal already comes with the Locale and Content translation modules.

I am changing the category to support request; there is already an issue report on a similar topic, so I am setting this as a duplicate of #349406: Localized aliases don't appear in the site map.