The site I'm working on have 7 different languages and I'm facing an issue where I only want the node translations with an actual translation available to be indexed.

Example of what I'm getting now:

  <xhtml:link rel="alternate" hreflang="en" href="http://site.dev/some-page"/>
  <xhtml:link rel="alternate" hreflang="sv" href="http://site.dev/sv/node/366"/>
  <xhtml:link rel="alternate" hreflang="fr" href="http://site.dev/fr/node/366"/>
  <xhtml:link rel="alternate" hreflang="de" href="http://site.dev/de/node/366"/>

As the content is only available in english it should look like this:

  <xhtml:link rel="alternate" hreflang="en" href="http://site.dev/some-page"/>

Is there a way to achieve this?

Thanks in advance.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

sebastianwestberg created an issue. See original summary.

gbyte’s picture

Version: 8.x-2.3 » 8.x-2.x-dev
Component: User interface » Code
Category: Support request » Feature request

In #2707557: Sitemap contains untranslated nodes I have argued that providing language specific links to content that is not translated would not harm SEO. I am not sure about this however; if you can find someone stating that sitemap hreflang links pointing to non-translated content are harmful, I will gladly take a closer look at this.

The implementation is not difficult at all, we just need a check if an entity has a translation:

if ($entity->hasTranslation($language->getId())) {
  $url_object->setOption('language', $language);
  $urls[$language->getId()] = $url_object->toString();
}

Where I see a problem with your approach is that a page is not made up exclusively of the entity (eg node) in question. It may also have blocks/views around it (sidebar, footer) with content that may be translated and this content would not be indexed if we exclude entity pages which do not have a translation.
Another problem is posed by mixed pages like the homepage and panel pages which do not have that 'main entity' which could be checked for translations. Maybe a configuration option to choose one approach or the other is needed?

I'm changing this to 'feature request' and invite everyone to share their thoughts on this issue.

matsbla’s picture

I don't have any data to support that including hreflang links to pages without translation is bad for SEO, but I made a website with 300 languages where obviously most content did not have translations in all languages. In Google webmaster I got very many error messages about dead hreflang links.

I guess it can make sense for a website with only some few languages and where translations are frequently added and updated, to have it though.

Maybe we could make it a configuration option and then website builders can choose to turn it on or off depending on their own need?

gbyte’s picture

As a temporary solution you guys can implement hook_simple_sitemap_links_alter in a custom module to exclude some languages completely, something along these lines:

function mymodule_simple_sitemap_links_alter(&$links) {
  foreach($links as &$link) {
    foreach($link['urls'] as $lang => $url) {
      if ($lang != 'en')
        unset($link['urls'][$lang]);
    }
  }
}

Obviously you could do some more magic by loading the entity from the path, checking if it is translated and so on, but this will cost some performance.

matsbla’s picture

I don't think that is a good solution. Different nodes have different translations. I don't want to hard-code languages to exclude them for hreflang liks for the whole site! I simply want to remove the hreflang links when there doesn't exist a translation

goodwillhinton’s picture

I agree that it is not a good solution to hard code the exclusion of individual languages. And it is a problem to submit a sitemap with inaccurate links to non-translated nodes.

You might want to take a look at this core patch that provides similar functionality to include hreflang tags only for those nodes that are published and translated: https://www.drupal.org/node/2521782

gbyte’s picture

Thanks @goodwillhinton

I'm going to look into it soon, but would be happy to review any patches (the change should be implemented in \Drupal\simple_sitemap\Batch::generateBundleUrls). If you guys can help out this may make it into the next stable version.

guilopes’s picture

Hello, I did an patch for it, could you guys review that?

It add an checkbox on content type to user select if he want or not check the translation before generate the link

matsbla’s picture

Thanks for helping to push for this, I tested it and here are my feedbacks:

- Why should this be configured for each content type? For me it makes more sense to have a global setting for this at the sitemap configuration page (path: admin/config/search/simplesitemap). I think to make different settings for this for each content type is confusing, and also we run the risk of many people getting inconsistent xml sitemaps as they have inconsistent configurations for each content type. However would be interesting to hear if there are any specific use case for this configuration by content-type?

- When I tested it doesn't seem to work. It now seems like the frontpage is always indexed with hreflang links (no matter if it has translation or not), but content doesn't have the hreflang links no matter if I check the box "Check Translation" is used or not.

- The text is a little bit confusing: "It will check if exists translation of this content type before index it". It should express more clear if it will include these translations or not, like "Only index published translation of this content"

gbyte’s picture

Status: Needs review » Needs work

Hi guilopes, After taking a brief look at your code, I agree with matsbla's review. A global setting on the setting page seeems to be the right place for this feature. If you guys can come up on a clean implementation, I will be happy to push this.

guilopes’s picture

hey, It's just for Content Types that's why not work on front page. I think all the languages have the front page so it should show all the hreflang.

I used the configuration on Content Type because it's more flexible, I can select the bundle that I want to select not for all bundles, it makes sense when I have Blog Posts for example and I don't wan't to show hreflang because it's not translated in the other hang I have a basic page, just with file attach that is the same for all the languages and I want to have the hreflang, make sense? I don't think that it's confuse because we already have some configurations in each bundle type. Do you guys agree with that?

I agree with the message and I can change it, just let me know if you guys agree with the rest of configuration and I do these changes

guilopes’s picture

To hreflang work with different languages, we need to translate the content and Check "Check Translation" on content type, it should work

guilopes’s picture

I found one issue in the patch, when the language is not on default language it's not generating the properly link.

I fixed it and also the message on the following patch

guilopes’s picture

I forgot to change the text on check translation.

Again:

guilopes’s picture

Status: Needs work » Needs review
Fidelix’s picture

It seems @guilopes made per-content type settings to address the scenario described in #2.

That is: If a content type doesn't need to be translatable but still needs to be displayed in different languages, there should be an option for that.
ie. A user page with blocks that are translatable, the home page (if it were a non-translatable entity), file entities (or a similar node content type), etc.

Yes, it is slightly less straightforward for the user. The extra flexibility may be worth it, though. It covers a lot more use cases IMO.

We can decide on some sane defaults to reduce friction for the user. I think everything should be "language-checked" by default so as not to generate incorrect links.

guilopes’s picture

I forgot to sent one thing on batch.php to pass the default langcode. I'm sending it in attach

sgurlt’s picture

Awesome, I was hoping for a patch like this and their it is. Works perfect, thanks :)

gbyte’s picture

Status: Needs review » Needs work

Hi guilopes, appreciate your effort and I'm also looking forward for it to be pushed.

I have tested #17 and found a few problems:

  1. After adding the 'page' bundle to index, I got this error during generation:
    Notice: Undefined index: und in Drupal\simple_sitemap\SitemapGenerator::generateSitemapChunk() (line 169 of modules/contrib-dev/simple_sitemap/src/SitemapGenerator.php). I have English set as default language and German.
  2. admin/config/search/simplesitemap/entities
    The 'check translation' checkbox value for non-bundle (atomic) entity types (like user or file) does not get saved.
  3. admin/config/search/simplesitemap/entities
    The JS inside atomic entity type fieldsets e.g (like user or file) is not hiding the 'check translation' fields dynamically.
  4. simple_sitemap.schema.yml
    The check_translation key has the wrong label.
  5. admin/structure/types/manage/page
    The summary inside the tab doesn't reflect the bundle's 'check translation' setting (JS).

Beside that, I think I know how we can vastly improve user friendliness in this feature:

a) either implement a global setting 'check translation' on the module setting for all types and another setting 'Override setting on a per-bundle basis' which will then show the 'check translation' checkboxes on entity bundle edit pages as well as on admin/config/search/simplesitemap/entities for atomic (non-bundle) entity types.

b) If the above is too complex, we could start by simply implementing one global setting 'Show translation settings on entity type pages' which will expose the checkbox you implemented on the entity bundle edit pages as well as on admin/config/search/simplesitemap/entities for atomic entity types. In other words, this checkbox would enable the feature you implemented and would be unchecked by default.

Any thoughts? I'm looking forward to review the next iteration. Please do not forget to pull the new dev version first, as there has been lots of code refactoring to improve testability.

meramo’s picture

Doesn't apply against the latest dev (f07ee24)

guilopes’s picture

it's based on b4ba757c138301ac86a471029e3b8520be2be458

sic’s picture

I'd love to apply the patch but how can I do that? It doesnt work with the latest dev or stable version.

Untranslated nodes/ entites really should be excluded. I tend to have a global setting for that, and not per content-type.

  • gbyte.co committed 60cdc00 on 8.x-2.x
    Issue #2736583 by guilopes, sebastianwestberg: Exclude entries without...
gbyte’s picture

Status: Needs work » Fixed

Alright I just pushed code adding a global 'Skip non-existent translations' setting. This does not allow to set the setting on a per-bundle basis, but all the problems from #19 should be removed plus the setting also works on custom links which point to entities.

@guilopes Sorry I couldn't use your code, but feel free to take what I wrote and create a further patch allowing setting this on a per-entity type (on non-bundle entities) and per-bundle basis. I'll be happy to review and push.

Please test and reopen issue if you find problems.

Status: Fixed » Needs work
gbyte’s picture

Status: Needs work » Fixed
guilopes’s picture

hey, I created a new patch. I hope it can help :)

shadcn’s picture

Status: Needs review » Active

Hi Pawel, thanks for the fix.

We were having the same issue on one of our sites. I just tested the 8.x.2.6 version with the fix but I'm still seeing a few issues.

Here's some more information about our setup. We have 2 languages on the site: English (default) and German.

Test A: Success

  1. Create a node in the default English language.
  2. Regenerate the sitemap.
  3. Output is correct. Only hreflang="en" is in sitemap.xml
<url>
  <loc>http://example.com/english-only-content</loc>
  <xhtml:link rel="alternate" hreflang="en" href="http://example.com/english-only-content"/>
  <priority>0.5</priority>
 </url>

Test B: Failure

  1. Create a node in the German language.
  2. Regenerate the sitemap.
  3. Output is not correct. Both hreflang="en" and hreflang="ge" in sitemap.xml.
<url>
  <loc>http://example.com/node/1002</loc>
  <xhtml:link rel="alternate" hreflang="en" href="http://example.com/node/1002"/>
  <xhtml:link rel="alternate" hreflang="de" href="http://example.com/de/german-only-content"/>
  <priority>0.5</priority>
 </url>

Can you help us? Thanks.

guilopes’s picture

have you test the #27 patch ?

gbyte’s picture

@arshadcn
It appears English is your default language.

Drupal\simple_sitemap\Batch\BatchUrlGenerator:

          if (!$this->batchInfo['skip_untranslated'] || $language->isDefault() || $entity->hasTranslation($langcode)) {
            $url_object->setOption('language', $language);
            $alternate_urls[$langcode] = $url_object->toString();
          }

The condition $language->isDefault() adds the link to the sitemap for the default language regardless whether the node exists in this language. This was added because the condition $entity->hasTranslation($langcode) returned TRUE only for translations of the default language.

The question is, is it incorrect for the sitemap to always return a link to the default language; if not, the above code is what needs to be tweaked?

guilopes’s picture

with the patch #27 the issue about default languages always have the href is solved. it's make sense because we need to check only if the current entity have the language

shadcn’s picture

@gbyte.co, yes this is what we expected: if we create a node in one language, regardless if the language is default or not, only 1 entry for the node in that language should be in the sitemap. Is this the correct behavior?

guilopes’s picture

@arshadcn same bahavior we have here. the #27 should work but you should apply based on c8633a94226de25e72cd324338e19999b7d1688b something like this:

projects[simple_sitemap][download][type] = git
projects[simple_sitemap][download][url] = https://git.drupal.org/project/simple_sitemap.git
projects[simple_sitemap][download][branch] = 8.x-2.x
projects[simple_sitemap][download][revision] = "c8633a94226de25e72cd324338e19999b7d1688b"
projects[simple_sitemap][patch][] = https://www.drupal.org/files/issues/simple_sitemap-adding_option_to_check_translation-2736583-27-D8.patch
guilopes’s picture

@gbyte.co any chance of my patch to be merged?

guilopes’s picture

Status: Active » Needs review
shadcn’s picture

Thanks @guilopes. I'll take a look. The patch didn't apply to the latest 8.x-2.x.

guilopes’s picture

@arshadcn yeah, I know. I don't know if the mantainer of the module will apply my patch so I can't update it every new dev commit. You need to apply on top of this git hash c8633a94226de25e72cd324338e19999b7d1688b

shadcn’s picture

Thanks @guilopes. I'll test it.

gbyte’s picture

@guilopes Thanks a lot for your effort! The problem with your patch is that non-translated entities (where language is set to 'not specified'), are not indexed at all. That would often be most pages, as most editors leave the content language on 'not specified' until a translation is made, so that the content can be reached regardless of the current website language setting.
Also, the 'regenerate sitemap' checkbox on setting pages does not appear dynamically when you toggle the 'skip non-existing translations' checkbox.
The third thing I found is that your code does not work with custom links (admin/config/search/simplesitemap/custom).

@arshadcn I've modified my approach to prevent the default language to always appear in URLs. Could you please test for me the most recent dev which should be up in a few minutes? Please close the issue if all is working well.

  • gbyte.co committed fc3f808 on 8.x-2.x
    Issue #2736583 by guilopes, arshadcn: Exclude entries without...
shadcn’s picture

Status: Needs review » Fixed

@gbyte.co this is perfect. Thanks.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

guilopes’s picture

gbyte’s picture

Revisiting this issue.

Any of you guys think the here implemented skip_untranslated setting should be ticked by default for new installations of the module? Thanks for your feedback everyone.

guilopes’s picture

@gbyte.co yes, I think it should be the default. Make more sense since in most of the cases you want to expose just the nodes that you have translated.

mbovan’s picture

I just found this issue when I was looking for a way to exclude non-existent translations.

+1 for making this default behavior (#44).

I quickly created a follow-up #2989456: Skip non-existent translations by default.

mbovan’s picture

Oops #46 was already fixed in #2961233: Allow for multiple sitemap types on one Drupal instance and comitted to 3.x branch only. Closed #2989456: Skip non-existent translations by default as duplicate.