I am building a multilingual site with English as the default language, and I need my nodes' paths to be in this form:

../en/node-name/title
../de/node-name/title
../fr/node-name/title

...and so on.

I use the "Path prefix with language fallback" language negotiation setting plus Pathauto automatic aliases for nodes with the pattern [type]/[title-raw], and I do achieve exactly this. There is a catch, though. Take, for example, a page titled 'News'. I get URLs like these:

../en/page/news
../de/page/nachrichten
../fr/page/nouvelles

This would be OK for most languages, but for locales whose titles contain non-ASCII (UTF-8) characters I get very long, weird-looking URLs in IE, because IE shows each such character as the percent-encoded (%XX) values of its UTF-8 bytes. So, until this is (ever) fixed in IE, or most site visitors and people in general start using decent browsers, I need the URLs to be in this form:

../en/page/news
../de/page/news
../fr/page/news

In other words, I need the /[title-raw] part of the URL to remain the same for all translations of a page. This should be the node title in the default language (English in my case).

I understand that if this were implemented with a new pattern somehow, there would still be an issue when a page has no translation in the site's default language. So either I force people who create content to create the node in the default language first (even a node with an empty body that can be translated later on will do), or the implementation uses a fallback. So, on each node save or update:

Step 1: Check whether the node's language is the site default. If so, go on and create the URL using the node's own title.
Step 2: If not, check whether a translation of this node in the site's default language exists. If so, use that translation's title to form the URL instead.
Step 3: If the node's language differs from the site default and there is no translation in the default language either, create the URL using /[title-raw], and let it be updated to the default-language title on the next translation sync.
Step 4: Once the URL is computed, check whether it is the same as the current one (on update, not during initial node creation) and update it only if it differs.
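The four steps can be sketched as two small PHP functions. This is a minimal, Drupal-independent sketch that assumes plain arrays for the node and its translation set; the example_* function names are hypothetical and not part of Token or Pathauto.

```php
<?php
// Hypothetical sketch of steps 1-4; not Token or Pathauto code.
// A node is a plain array: ['language' => 'de', 'title' => 'Nachrichten'].
// $translations maps langcode => title for every node in the same
// translation set.

function example_alias_title(array $node, array $translations, string $default_langcode): string {
  // Step 1: the node is in the default language (or language neutral).
  if ($node['language'] === $default_langcode || $node['language'] === '') {
    return $node['title'];
  }
  // Step 2: a default-language translation exists, so use its title.
  if (isset($translations[$default_langcode])) {
    return $translations[$default_langcode];
  }
  // Step 3: no default-language version yet; fall back to the node's own
  // title until the next translation sync.
  return $node['title'];
}

// Step 4: only write the alias when it differs from the current one.
// $current_alias is NULL for a node that has no alias yet.
function example_alias_needs_update(?string $current_alias, string $new_alias): bool {
  return $current_alias !== $new_alias;
}
```

For the German 'Nachrichten' node whose set contains an English 'News' node, example_alias_title() returns 'News', so the alias becomes de/page/news.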

If there already is a workaround/patch/hack to achieve what I am after, please point me to it.

I hope I made myself understood and the problem I am facing is clear to everyone. Thanks in advance to anyone who takes a look at this.


Comments

klonos’s picture

I know the token name is not perfect, but a name that really made sense would be something like [title-of-node-with-site-default-language].

As you can all see that is looong and ugly ;)

If anyone else has a better name for it, please feel free to suggest it.

Dave Reid’s picture

Why not just use the Transliteration module to help get rid of the ugly characters?

klonos’s picture

OK, I already use Transliteration for sanitizing file uploads, but the thing is that this only solves the ugly characters in IE URLs for languages that use a Latin-like alphabet, where, for example, I would simply transliterate 'ö' to 'o' and it wouldn't make much of a difference. What about, say, Greek, Arabic or Hebrew? Transliterating titles in these languages (if and where possible) would produce URLs that make no sense.

In the meantime I re-thought the whole thing and realized that it is better to have:

../en/page/news
../de/page/news
../fr/page/news

rather than:

../en/page/news
../de/page/nachrichten
../fr/page/nouvelles

I cannot figure out a way to achieve that with current modules and configuration, so I need a way to have the default-language title in my URLs. Still, thank you, because what you propose is a valid workaround I'll have to settle for... until my request gets implemented ;)

klonos’s picture

I would also settle for a setting in the translation sync options of each content type that disables URL synchronization. But I think there's another issue for something like that somewhere...

greggles’s picture

Status: Active » Closed (won't fix)

Thanks for submitting this idea.

I think it's pretty site specific and shouldn't be in Pathauto itself.

However, this could be done in your own token. See the API.txt and token_starterkit.module for examples.

Dave Reid’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev
Assigned: Unassigned » Dave Reid
Status: Closed (won't fix) » Active

I think this actually is a good idea for a D7 token, since we have chaining. Core provides 'tnid', the NID of the source node of the current translated node, so why can't we provide a [node:source] token that could be chained, like [node:source:title]?

greggles’s picture

Project: Pathauto » Token
Component: Tokens » Code

OK.

Dave Reid’s picture

Status: Active » Needs review

Patch for review with included test.

Dave Reid’s picture

Title: define a new [site-default-language-title] token » Add a [node:source] token for source node of a translated node
Status: Needs review » Fixed

Passed the bot so committed to CVS!
http://drupal.org/cvs?commit=339194

klonos’s picture

@ Dave: can I has a D6 version of the patch? ...pretty please??

At least will this be ported to D6? ...pretty please again?? Or is the tnid D7 core specific?

Dave Reid’s picture

Version: 7.x-1.x-dev » 6.x-1.x-dev
Status: Fixed » Patch (to be ported)

Chaining tokens like that is D7-only. I'd like to leave backporting [source-nid] and [source-title] tokens to D6 up for discussion with the other maintainers.

klonos’s picture

Ok, thanx in advance. Please consider it people.

Till then, I am curious how others have solved this issue, because it seems so common that I can't be the only one with it... Anyone?

klonos’s picture

To answer my question above...

#737746: using language English as for all other languages as the url alias
#164709: Pathauto does not support unicode (Hebrew, Arabic) Title

So, I guess a lot of people need this one (kinda already knew). Once this is fixed the token should be made available to pathauto's replacement patterns.

@Dave: I leave it up to you to mark these issues as duplicates if you agree(?)

adrianmak’s picture

Besides Greek, Arabic and Hebrew, Asian languages such as Chinese, Japanese and Korean get no help from the Transliteration module either.
As klonos said, it should fall back to the title of the source node in the default language, English in most cases.

klonos’s picture

I am copy-pasting some comments over from those duplicate issues in order to reply here and to keep the conversations in one place...

klonos’s picture

Freso - March 10, 2010 - 13:40

"Transliteration is the practice of converting a text from one writing system into another in a systematic way." (Wikipedia)
In the case of the Transliteration module, it systematically converts non-Latin (really non-US-ASCII) text into Latin (or US-ASCII). So yes, the module is also for languages that do not use a "Latin-like alphabet". (It could be argued that it is particularly for those.)

However, the original request of using the source node's title is indeed a duplicate of #736178: Add a [node:source] token for source node of a translated node - so let's mark it as such.

klonos’s picture

adrianmak - March 10, 2010 - 14:17

I tried the Transliteration module on another project and know what it does with non-Latin alphabets such as those of Asian languages.

Take the Chinese 聯絡我們, meaning "contact us": the Transliteration module converts it to something like lian-luo-wo-men. And for Japanese, since Japanese uses kanji (Chinese characters), the conversion mixes Japanese pronunciation (i.e. hiragana readings) with Chinese pronunciation (i.e. Mandarin).

To be honest, I don't like this kind of conversion. For a multilingual site I prefer using English with a language prefix, e.g. cht/contact-us.

klonos’s picture

@Freso: I am aware of the term ;)

The thing is, though, as you already noted, that transliteration for those languages produces nonsensical strings that end up as part of the URL. So yes, it is an issue.

Luckily, in Greek we already have an established way of writing Greek in Latin characters. We even have a name for this 'language': we call it Greeklish ;) Still, that remains simply a workaround, and in a lot of cases we need really complex transliteration rules to achieve a decent, understandable result.

@adrianmak: you'll find that the only browser not handling non-US-ASCII characters correctly is IE. Still, most users (and site visitors) use it, so I know how you feel. What can I say: be patient and wait until this issue gets in. In the meantime, you can help test any patches the maintainers provide.

adrianmak’s picture

Since the discussion has moved to the Token module, that means the non-US-ASCII characters will be handled with a token rather than in the Pathauto module itself, right?

I'm pleased to help for testing. :)

klonos’s picture

pathauto depends on the token module for its replacement patterns. So, I guess this needs to be addressed in token first.

klonos’s picture

I believe we need to also do this for taxonomy terms and vocabularies as well. I mean once #290421: pathauto patch to provide localized and entity translated taxonomy through i18n gets in.

What do you people think?

Dave Reid’s picture

Then maybe i18n should be the one implementing these tokens, not token.module.

klonos’s picture

I couldn't possibly know that. I trust your judgment.

Dave Reid’s picture

Token should basically support whatever capabilities core supports. Nodes can have translations and sources in core, so that's why I'm open to supporting it in Token. Taxonomies and terms are not translatable in core, so those should be supported in i18n.tokens.inc.

klonos’s picture

I see your point, but as I said, I am not familiar enough with the code to have a say. Sorry.

As for i18n.tokens... are you perhaps referring to either of these two(?):

#305639: i18n support for the tokens?
#535522: Using translated content type from i18ncontent as a token for pathauto

Or is it another issue I'm missing?

zbb’s picture

*Bump*

I have the same problem. Can we expect a patch for D6 or is it more prudent to not count on the pathauto module (since it's practically useless for a multilingual site without being able to deliver predictable, readable urls).

Please let me know either way.

greggles’s picture

@zbb - if you have a need you should work on it. "Bumping" an issue and complaining that a module is "useless" when you are talking to people who put thousands of hours of volunteer effort into it is only likely to frustrate them.

The patch needs to be ported - why don't you work on porting it?

adrianmak’s picture

I'm just a newbie Drupal module developer. I'm now studying the source of the Token module and trying to create a new token that refers to the title of the node in the source language.

How do I check (is there a helper function?) whether a node has a translation?

Update:
After digging through the $node variable, I found that if a translated node has a source node, its tnid property points to the node ID of the source-language node. In other words, a non-zero tnid means there is a source node.

Can I reliably use it as an indicator?
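In core's translation module, tnid follows this convention: 0 means the node is not in any translation set, tnid equal to nid marks the source node, and any other value is the nid of the source node. A minimal sketch of helpers built on that convention, using plain stdClass objects instead of loaded nodes; the example_* names are hypothetical. One caveat: while a brand-new translation is being inserted, its tnid may not be set yet, which is why the code in a later comment falls back to $node->translation_source.

```php
<?php
// Hypothetical helpers; plain objects stand in for node_load() results.
//   tnid == 0    -> not part of any translation set
//   tnid == nid  -> source node of its translation set
//   otherwise    -> a translation; tnid is the source node's nid

function example_in_translation_set(stdClass $node): bool {
  return !empty($node->tnid);
}

function example_is_translation_source(stdClass $node): bool {
  return !empty($node->tnid) && (int) $node->tnid === (int) $node->nid;
}

// The nid whose title a "source title" token should use: the source node
// for translations, the node itself when it has no translation set.
function example_source_nid(stdClass $node): int {
  return !empty($node->tnid) ? (int) $node->tnid : (int) $node->nid;
}
```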

adrianmak’s picture


Need help!!

Based on the assumption in my last post, I wrote a simple custom module that adds a new token for this job.
There is NO error checking at this point; it is just to prove whether it works or not.

Here is the custom module:

function custom_token_values($type, $object = NULL, $options = array()) {
  $values = array();
  switch ($type) {
    case 'node':
      $node = $object;
      $values['title-source-language'] = node_get_source_language_title($node);
      break;
  } 
  return $values;
}

function custom_token_list($type = 'all') {
  if ($type == 'node' || $type == 'all') {
    $tokens['node']['title-source-language'] = t('Node title of source language');
    
    return $tokens;
  }
}

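function node_get_source_language_title($node = NULL) {
  // Nodes outside a translation set have tnid 0; use their own title.
  if (empty($node->tnid)) {
    return $node->title;
  }
  $translations = translation_node_get_translations($node->tnid);
  // The array is keyed by language code, so 'en' must be a quoted string.
  return isset($translations['en']) ? $translations['en']->title : $node->title;
}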

My new token shows up on the Pathauto settings page (admin/build/path/pathauto): capture.png

And I put the new token in the node path pattern for my language: capture2.png

Then I created an English node and translated it. It did NOT work: the URL of the translated node still showed up as cht/node/13.

But if I use "Bulk generate aliases for nodes that are not aliased" in Pathauto, the alias is generated and picks up the source-language title (i.e. English).

klonos’s picture

@Dave, #11: Can we have an update, please? Did you have that discussion with the other maintainers about backporting [source-nid] and [source-title] tokens to D6? ...anyway, just pinging you for an update.

klonos’s picture

Ping?

John Carbone’s picture

Needed this same functionality for a site I'm working on. Picking up where #29 leaves off, here's what I came up with. It adds a new token for the source node title and syncs the URLs when a translated node is created. It doesn't update a translated node's URL when the parent is updated, though; I wasn't sure whether it should. If I remember correctly, that's Pathauto's default behavior, for instance when a path uses menu paths and menu items get moved. Running "Update URL alias" does update both, though. Anyway, here's my working version. I haven't decided on a name, so I just called the module "path_test"... lol.

function path_test_token_list($type = 'all') {
  if ($type == 'node' || $type == 'all') {
    $tokens['node']['title-source-language'] = t('Node title of source language');
    return $tokens;
  }
}

function path_test_token_values($type, $object = NULL, $options = array()) {
  $values = array();
  switch ($type) {
    case 'node':
      $node = $object;
      $values['title-source-language'] = path_test_node_get_source_title($node);
      break;
  } 
  return $values;
}

function path_test_node_get_source_title($node = NULL) {
  $default_language = language_default();
  $default_lang_code = $default_language->language;

  // Untranslated node, node in the default language, or language-neutral node.
  if (empty($node->language) || $node->language == $default_lang_code) {
    return pathauto_cleanstring($node->title);
  }

  // Translated node: look up the rest of its translation set.
  $translations = translation_node_get_translations($node->tnid);

  // A brand-new translation may not be linked to its set yet,
  // so use the node it was translated from.
  if (empty($translations) && !empty($node->is_new) && !empty($node->translation_source)) {
    return pathauto_cleanstring($node->translation_source->title);
  }

  if (isset($translations[$default_lang_code])) {
    return pathauto_cleanstring($translations[$default_lang_code]->title);
  }

  // No default-language version exists: fall back to the node's own title.
  return pathauto_cleanstring($node->title);
}

Dave Reid’s picture

This would require an additional node_load() every single time node tokens are generated, which I am very, *very* hesitant to do, since 6.x-1.x is already a big mess performance-wise.

donquixote’s picture

For Drupal 6 I would prefer [tnid] instead of [node:source].
Here is how to do this in a custom module:

<?php
function xxx_token_list($type = 'all') {
  if ($type === 'node' || $type === 'all') {
    $tokens['node']['tnid'] = t('Node translation nid (tnid)');
    return $tokens;
  }
}

function xxx_token_values($type, $object = NULL) {
  $values = array();
  if ($type === 'node') {
    $values['tnid'] = $object->tnid ? $object->tnid : $object->nid;
  }
  return $values;
}
?>

I hope it helps for creating a patch.

yub_yub’s picture

subscribing

pingwin4eg’s picture


Hmm, there seems to be one more problem with the node:source token in 7.x (I don't know if I should start another issue).
If I set [node:source:title] as the path pattern (for example), the source node does not get a URL alias at all.
I thought the original idea of this post was to get the same URL aliases for all nodes in a translation set, including the source node.
(Screenshot: URL Aliases / Patterns)
Right now, as a workaround, I can set [node:title] for English paths and [node:source:title] for all others. But what if the source node is not English? It won't get a URL alias.
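A likely cause of the missing alias, consistent with the duplicate #2476845 ("node:source tokens don't work if node has no translations"), is that a node with no translations has tnid 0, so there is no source node to take the title from. The toy model below is not Token module code: it resolves a chained token such as [node:source:title] against an in-memory set of nodes and, like donquixote's D6 [tnid] token, falls back to the node itself when tnid is 0. The function name and data layout are hypothetical.

```php
<?php
// Toy model, not Token module code. $nodes is keyed by nid.
// Returns NULL for tokens it does not handle or unknown properties.
function example_resolve_source_token(string $token, stdClass $node, array $nodes): ?string {
  // Only handle chained tokens of the form [node:source:<property>].
  if (!preg_match('/^\[node:source:([a-z_]+)\]$/', $token, $m)) {
    return null;
  }
  $property = $m[1];
  // tnid 0 means "no translation set": treat the node as its own source.
  $source_nid = !empty($node->tnid) ? (int) $node->tnid : (int) $node->nid;
  $source = $nodes[$source_nid] ?? $node;
  return isset($source->$property) ? (string) $source->$property : null;
}
```

With this fallback, the source node and untranslated nodes resolve [node:source:title] to their own title, so a single pattern could serve every language.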

RogerRogers’s picture

@pingwin4eg I'm trying to achieve what you have done, but with your approach my non-source nodes don't get the path alias; I still end up with the translated node's own path. I've tried both [node:source:title] and [node:source:title-field]. It just isn't working. I've tried clearing the cache too.

I can't believe there isn't a way to support Chinese site translations without using the non-latin Chinese characters. Frustrating!

drupalfan81’s picture

Oh boy, I'm glad I finally found this post after hours of searching around and trying to troubleshoot this issue myself. I haven't made much progress, but I have EXACTLY the same concern and issue klonos started this post with. I have a site with English as the source language and Japanese and Chinese available. I think the easiest solution would be to use transliteration and somehow disable the darn feature that causes every translated version of a node to update its URL alias every time one of the versions is updated. Is this something in Drupal itself or in this module?

I think ALL sites should have English-only URLs. The reason: yes, modern browsers can display the URL properly in the user's web browser. So, for example, a page on my site loaded in a browser looks like this: http://www.sitename.com/ja/united-states/listing/united-states/food-drinks/1344/キングス ハワイアン ベーカ

But the moment a user copies that url from their browser window and pastes it on their facebook wall, it looks like this: http://www.sitename.com/ja/united-states/listing/united-states/food-drinks/1344/%EF%BC%88%E3%82%AD%E3%83%B3%E3%82%B0%E3%82%B9%E3%80%80%E3%83%8F%E3%83%AF%E3%82%A4%E3%82%A2%E3%83%B3%E3%80%80%E3%83%99%E3%83%BC%E3%82%AB

Yeah...U-G-L-Y!
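What the pasted URL shows is percent-encoding: each non-ASCII character is written as the %XX codes of its UTF-8 bytes, so the path grows to several times its displayed length. A small PHP illustration using the built-in rawurlencode(), assuming the file is saved as UTF-8:

```php
<?php
// Each katakana character below takes 3 bytes in UTF-8, and every byte
// becomes a 3-character %XX escape, so 3 characters turn into 27.
$segment = 'キング';               // first characters of the example title
$encoded = rawurlencode($segment);

echo $encoded, "\n";               // %E3%82%AD%E3%83%B3%E3%82%B0
echo strlen($segment), ' bytes -> ', strlen($encoded), " chars\n"; // 9 bytes -> 27 chars
echo rawurlencode('news'), "\n";   // plain ASCII letters are left unchanged
```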

Anyway, I was able to follow this guide (https://drupal.org/node/185664) and upload the FULL translation file, which covers all the UTF-8 characters my site would encounter (basically all my Chinese and Japanese characters). So now the URLs have only English in them and they look better. I agree with klonos, though: it would be nice to just base the path on the source language. However, on my site users could be English, Japanese or Chinese speakers, so the source language of the content they post could be English, Japanese or Chinese, and that method wouldn't work for me. So I'm okay with using Pathauto with transliteration. That's fine, but I still have an issue: my site updates the darn path alias on all translated pages that are linked together.

Since I use taxonomy terms, which have built-in translations, the token values differ depending on whether the user is editing the node in English, Japanese or Chinese. Once that user saves the node, bam, the path aliases update, and now the English, Chinese and Japanese versions of the node listing all have new paths in whatever language the user was editing in. Does this make sense? I can give examples if it's not clear. But this has basically got me stuck, and I don't know where to go from here.

Has anyone solved klonos's problem for Drupal 6, or found a way to disable the automatic updating of the translated nodes? If the automatic path alias could be limited to the node itself, ignoring what happens to the other nodes in its translation set, that would solve all my problems. It's like klonos mentioned: "I would also settle for a setting in the translation sync options of each content type that disables URL synchronization. But I think there's another issue for something like that somewhere..."

I think this would be the easiest solution to all our problems. Yes, some languages would have paths that aren't as helpful, but I couldn't care less at this point, as long as things just work automatically and without my intervention. THANKS in ADVANCE!

mvlabat’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
Category: Feature request » Bug report
Status: Patch (to be ported) » Needs work

There is a real bug in this feature described in #36.
Marked #2476845: node:source tokens don't work if node has no translations as duplicate of this issue.

liquidcms’s picture

Is this only for titles, and only for node translation (i.e. not field translation with Entity Translation, ET)?

I am trying to use this with ANT (Automatic Nodetitles), and I set my title to be replaced by [node:source:field-url:title], where field-url:title is the title portion of a Link field with the machine name field_url.

With Token 7.x-1.6 this does not work.

I'll likely just make my own custom token, but I wanted to ask/report my findings.

liquidcms’s picture

There is a sandbox project, though, that supplies ET tokens: https://www.drupal.org/sandbox/villette/2485083