Download & Extend

Use Transliteration module for transliteration

Project:Pathauto
Version:7.x-1.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:Unassigned
Status:closed (fixed)

Issue Summary

It would be nice if pathauto uses existing transliteration module for easier maintenance and broader character support.

Comments

#1

I agree with this idea (note that Pathauto predates that module by a few years...). Thanks for submitting it. I do not have time to create a patch for this now but if nobody gets around to it in the next few months then I may do so.

#2

Version:5.x-2.x-dev» 6.x-1.x-dev

Based on the Transliteration project's documentation, it shouldn't be that hard to implement. But. Should support for i18n-ascii.txt go out the plank, or should it be kept as a backup? (I'd definitely be in favour of making it go out the plank, but I'm not either module's maintainer. :))

I'm also thinking that it should probably go against the latest code, with the possibility of being backported.

#3

@Freso - great questions/points. I think that if the Transliteration project is used for Pathauto then yes, the i18n-ascii.txt file will be removed and would no longer be supported. Further, if I implement this feature I don't plan on providing an "upgrade path" aside from users manually reconfiguring the Transliteration module. That is to say, if someone has custom rules based on the i18n-ascii.txt file then they will have to figure out how to implement them using the Transliteration project manually and the rules won't be automatically created by any transition code.

Also, definitely this should be done in the 6.x branch of code and, almost definitely in a 6.x-2.x.

#4

Status:active» needs review

This is way too simple for me to believe one bit that it works as-is. But I don't have time to test this right now. Hm. And I should probably also provide a simple patch that doesn't include INSTALL.txt changes and i18n-ascii.example.txt removal... but not tonight.

Also, we probably want something other than "?" to replace unknown characters in the aliases... should I just make it use "_" hardcoded, or should something like variable_get('pathauto_separator', '-'); be used? (I prefer "_" to "-", but then, it's likely for another purpose as well... but maybe not?)

AttachmentSizeStatusTest resultOperations
pathauto_using_transliterate.patch10.38 KBIgnored: Check issue status.NoneNone

#5

Status:needs review» needs work

Indeed the patch needs a little help to separate those extra little changes.

Also, if possible, the transliteration_get() should respect the language that a node is written in as mentioned in #111110: i18n-ascii.txt does not respect section i.e. if I write a node in German then the German transliteration rules should be used to transliterate.

So...who can test? ;)

#6

For the language awareness, I've opened #255646: Add a language parameter to transliteration_get() over in the Transliteration queue.

I'll make a double patch later today then, for easier review. And I'll try to test it as well. :)

#7

Status:needs work» needs review

Actually, I figured just not removing the i18n file in the patch would improve its readability greatly. So here we go.

AttachmentSizeStatusTest resultOperations
pathauto_using_transliterate_no_rm.patch4.42 KBIgnored: Check issue status.NoneNone

#8

Status:needs review» needs work

Two small issues:
1. pathauto_cleanstring() should additionally validate the presence of transliteration.module using module_exists(), since Transliteration could've been uninstalled in the meantime.
2. Be nice to translators: when transliteration.module isn't active, the disabled message should be appended to $translation_help instead of duplicating it.

#9

  1. Agreed. Will incorporate in next patch.
  2. Two things, 1) in this case I think I agree with you, but there are many similar cases where some languages are forced to making bad translations due to strings being split up in the source code, thus removing power from the translator to actually do a proper translation. Oh, and 2) this won't make it to this patch, but I'll add it to the list in #256340: String clean up. :)

#10

Status:needs work» needs review

Here's take two, which validates the presence of transliteration.module before calling transliteration_get().

AttachmentSizeStatusTest resultOperations
pathauto_using_transliterate_no_rm.patch4.39 KBIgnored: Check issue status.NoneNone

#11

Forgot to include a comment update.

AttachmentSizeStatusTest resultOperations
pathauto_using_transliterate_no_rm.patch4.51 KBIgnored: Check issue status.NoneNone

#12

And another re-roll after a wee batch of code cleaning (not much more left to clean!). :)

AttachmentSizeStatusTest resultOperations
pathauto_using_transliterate_no_rm.patch4.52 KBIgnored: Check issue status.NoneNone

#13

Making a node (article) with the title "Lill' Mari, / prøve vi / javnlet å tænk' o det høj'," produces "content/lill_mari_prove_vi_javnlet_a_taenk_o_det_hoj" as expected. (Unpatched Transliteration module, so no attempt at language awareness.)

#14

Title:Use existing transliteration module» Use existing Transliteration module
Version:6.x-1.x-dev» 7.x-1.x-dev

Code applied with some offset. Attached is an identical re-roll, just with fixed offset. :)

Also, should module_exists('transliteration') perhaps be replaced with function_exists('transliteration_get')? If either is true, the other should hopefully also be, but... *shrugs* Just thought I'd toss the thought out there.

This also doesn't implement language awareness which is in Transliteration's HEAD. However, this might also be unneeded, as Transliteration will use the current/content language if a language isn't specified, so... we'll have to test that.

AttachmentSizeStatusTest resultOperations
247758_pathauto_using_transliterate_no_rm-14.patch4.49 KBIgnored: Check issue status.NoneNone

#15

Alright, it's definitely not language aware. However, this is not a regression, so... could this be checked in and we could then work on implementing the language aware transliteration later.

#16

New patch adds language awareness. However, I'm not shure if this condition really is enough:

$language = isset($object->language) ? $object->language : NULL;

where $object is the node/user/category/foo object passed to token.

Things that are unclear:
1. Can this lead to unexpected results? I.e., can there be a language property in some object that doesn't denote the object's language (but something completely different)?
2. Will $node->language always be set to a valid, two letter code, or only if the translation module is enabled?

Since this patch needs testing feel free to commit #14 first, will happily re-roll my patch later.

AttachmentSizeStatusTest resultOperations
pathauto.patch6.62 KBIgnored: Check issue status.NoneNone

#17

Title:Use existing Transliteration module» Use Transliteration module for transliteration
Status:needs review» needs work

@smk-ka: Awesome! Thanks for the patch! Since you're on it, I decided to check in the patch in #14, as it will then be easier to review your language changes on their own. So, yeah, please re-roll. :)

#18

Status:needs work» needs review

No problem, here you go.

AttachmentSizeStatusTest resultOperations
pathauto.patch2.97 KBIgnored: Check issue status.NoneNone

#19

Actually, after realising I was testing with an outdated version of Transliteration (ie., I was testing with Danish transliterations in a branch where the patch with the Danish stuff wasn't in...), I found out that the code I checked in earlier today, actually seems to be language aware. There are a few... "funny" cases, but they seem to be the same with and without your patch from #18.

If anyone could confirm this, I'd be thrilled.

#20

I started documentation on this at http://groups.drupal.org/node/12685 - though obviously it could be expanded ;)

#21

Status:needs review» fixed

Okay, I think this one can be closed now. Transliteration seems to both work and work per-locale. Thank you smk-ka, for your awesome responsiveness and work in fixing the small issues there was! =)

#22

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

nobody click here