Unfortunately this doesn't work (at least for me) when the transliteration option is checked because the transliteration happens first and converts the ® mark into an 'r' character before the punctuation code has a chance to remove it. Any removal operations should occur before transliteration.
In pathauto.inc I just rearranged the transliteration section with the punctuation section so the punctuation section runs first:
// Replace or drop punctuation based on user settings
$output = strtr($output, $cache['punctuation']);
// Optionally transliterate (by running through the Transliteration module)
if ($cache['transliterate']) {
  $output = transliteration_get($output);
}
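To see why the order matters outside of Drupal, here is a minimal standalone sketch. The function transliterate_stub() and the $punctuation map are hypothetical stand-ins for transliteration_get() and $cache['punctuation']; the single ® to 'r' mapping is taken from the report above, not from the Transliteration module's data.

```php
<?php
// Hypothetical stand-in for transliteration_get(); the one mapping below is
// the behaviour reported in this thread, not the module's real table.
function transliterate_stub($string) {
  return strtr($string, array('®' => 'r'));
}

// Hypothetical punctuation map in the shape strtr() expects: '' means remove.
$punctuation = array('®' => '');

$title = 'Acme® widgets';

// Transliteration first: the ® is already an 'r' when strtr() runs,
// so the punctuation map has nothing left to match.
$translit_first = strtr(transliterate_stub($title), $punctuation);

// Punctuation first: the ® is removed before transliteration sees it.
$punct_first = transliterate_stub(strtr($title, $punctuation));

echo $translit_first, "\n"; // Acmer widgets
echo $punct_first, "\n";    // Acme widgets
```

With transliteration first the stray 'r' ends up in the alias; with punctuation first it does not.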
temkin (as a volunteer) commented
Status: Needs review » Closed (works as designed)
Comments
Comment #1
legolasbo
Attached patch adds the punctuation settings.
Comment #2
legolasbo
Comment #3
legolasbo
In related issue #1986530: Add emdash and endash to list of punctuation characters, the addition of emdash and endash was requested. I've also added these characters to the patch.
I've changed the title of this issue to a more general one.
Comment #4
klonos
...seems a no-brainer to me. It only adds these lines:
...and gets the job done. Thanx ;)
Comment #5
klonos
...ok minor nit-pick: the first letter of the label should be capitalized:
t('registered trade mark sign')
Sorry ;)
Comment #6
legolasbo
No need to apologise klonos, I should have paid better attention to detail ;)
Attached patch fixes this enormous error ;)
Comment #7
klonos
...aaand back to RTBC. Thanx ;)
Comment #8
klonos
...can we get this in please?
Comment #9
klonos
...another friendly ping.
Comment #10
GregSmith104
Unfortunately this doesn't work (at least for me) when the transliteration option is checked because the transliteration happens first and converts the ® mark into an 'r' character before the punctuation code has a chance to remove it. Any removal operations should occur before transliteration.
In pathauto.inc I just rearranged the transliteration section with the punctuation section so the punctuation section runs first:
Comment #11
legolasbo
Updated the patch to incorporate the suggestion from #10.
Comment #12
klonos
...works as before for me + it includes what @GregSmith104 suggested in #10. So back to RTBC.
Would love it if this was committed so that it is included at least in the latest dev.
Comment #13
Dave Reid
Is the purpose to remove these characters? If so, why not enable the 'Reduce strings to letters and numbers' option?
Comment #14
klonos
@Dave Reid: using the 'Reduce strings to letters and numbers' option does not work with © for example. Did you mean that you'd prefer to update the patch to make such characters work with that option instead of adding it in the punctuation settings?
Basically yes, but the point is to allow users to decide if they want to remove or replace these chars with the placeholder. Having these under the punctuation settings gives people a more fine-grained control of what happens with specific chars.
Does that answer your concerns or did you mean something else?
Comment #15
Dave Reid
I just tested the 'Reduce strings to letters and numbers' option with all the punctuation in this patch, and it did in fact remove them correctly. Can you please retry this?
Comment #16
klonos
Here's what I did:
1. http://simplytest.me/project/drupal/7.x?add[]=pathauto&add[]=token
2. Once the sandbox at simplytest.me is launched, go to the Modules admin page (admin/modules) -> enable both Pathauto and Token -> Save configuration.
3. Then go to Configuration -> Search and metadata -> URL aliases -> 'Settings' tab ->enable the 'Reduce strings to letters and numbers' checkbox -> Save configuration.
4. Finally go to Add content -> Basic page and enter the following as the title:
test " \ ` , . - _ : ; | { [ } ] + = * & % ^ $ # @ ! ~ ( ) ? < > / \ © ™ ® – — title
-> Save
Results:
- "Basic page test " \ ` , . - _ : ; | { [ } ] + = * & % ^ $ # @ ! ~ ( ) ? < > / \ © ™ ® – — title has been created."
- the path of the page is
node/test-title
So yes, it does work. I don't know what I might have done wrong when testing back in #14.
Still, the point of this patch is to offer people a more fine-grained control over each and every one of these characters separately (for example to replace them instead of remove them).
Comment #17
rpayanm
In English, "!" and "?" appear only at the end of a sentence, for example:
That is!
What?
but in Spanish there are also "¡" and "¿" at the start, for example:
¡Así es!
¿Qué?
So my suggestion is to add the characters "¡" and "¿" to the punctuation settings.
Greetings.
Comment #18
Dave Reid
@rpayanm: I think enabling the Transliteration module and subsequent support in Pathauto would cover that use case - but it would be good to confirm this behavior. I don't think we want to be adding every single possible character.
Comment #19
rpayanm
I installed Transliteration, but without result :(
Isn't Transliteration meant for file names?
Here is my case:
Title: ¿Esto es un ejemplo?
Pathauto result: ¿esto-es-un-ejemplo
Desirable result: esto-es-un-ejemplo
Greetings.
Comment #20
temkin (as a volunteer)
I don't know if we want to keep adding special characters to the list of already existing ones. Over the years it may grow to include each and every possible character. There is already an API to expand that list (hook_pathauto_punctuation_chars_alter), and that should be the preferred way to handle special cases on a case-by-case basis. Unless you guys think otherwise.
Comment #21
Pobtastic (at ArcadeGeek LTD)
I'm not going to reopen this, as it's maybe off topic ~ but the suggestions around moving the
// Replace or drop punctuation based on user settings
...block, are probably sound? The reason is as is said above, when you have a trademark symbol/ copyright/ etc ... it's converted to letters, i.e. "tm" so any processing then can't filter it out... For us it means that our URLs just can't be sanitised using the alter hook...
edit: The issue in case it isn't clear, is that we need both? We have an i18n site, so we require transliteration, but we also require to drop items such as the trademark symbol mentioned previously.
Comment #22
arakwar
@temkin
I think otherwise. The problem isn't whether or not to have a list of punctuation, but the timing of when it is applied. When transliteration happens before any other step, we can't target some elements, like the "registered" sign.
I had this problem in Drupal 7, and I currently have it in Drupal 8. Making the punctuation change happen before any other step would solve this issue.
Comment #23
james.williams (at ComputerMinds)
Switching the order of transliteration and punctuation replacement/removal could be good, but that will then 'break' other cases. For example, if transliteration changes ¡ to !, it would then be too late for that to be removed in the same way as !. Or for the case of the copyright symbol, which gets transliterated to '(c)', the parentheses would not get removed/replaced.
Would it be safe/sensible to run the punctuation replacement before and after the transliteration perhaps?! Or provide some other way to allow altering transliterated replacements in this specific context?
(My particular interest is that I want transliteration, but I also want to strip £ symbols entirely rather than allow them to be transliterated to 'ps' in URLs.)
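The "before and after" idea can be sketched in a few lines of plain PHP. This is only an illustration, not the patch itself: transliterate_stub(), clean_two_pass() and the $punctuation map are hypothetical names, and the three mappings are the ones mentioned in this thread rather than the Transliteration module's actual data.

```php
<?php
// Hypothetical stand-in for transliteration_get(), using only the
// conversions reported in this thread.
function transliterate_stub($string) {
  return strtr($string, array('¡' => '!', '©' => '(c)', '£' => 'ps'));
}

// Hypothetical punctuation map in the shape strtr() expects ('' = remove).
$punctuation = array('£' => '', '!' => '', '(' => '', ')' => '');

// Run the punctuation map on both sides of transliteration.
function clean_two_pass($string, array $punctuation) {
  $string = strtr($string, $punctuation);  // Pass 1: drop symbols such as £.
  $string = transliterate_stub($string);   // ¡ becomes !, © becomes (c).
  return strtr($string, $punctuation);     // Pass 2: drop what that produced.
}

$title = '¡£5 © Acme!';

$after_only  = strtr(transliterate_stub($title), $punctuation);
$before_only = transliterate_stub(strtr($title, $punctuation));
$both        = clean_two_pass($title, $punctuation);

echo $after_only, "\n";  // ps5 c Acme
echo $before_only, "\n"; // !5 (c) Acme
echo $both, "\n";        // 5 c Acme
```

Only the two-pass variant removes the £ outright and also cleans up the '!' and parentheses that transliteration introduces.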
Comment #24
james.williams (at ComputerMinds)
Here's a patch that does run the punctuation replacement before as well as after transliteration, to solve these cases where some symbols need removing entirely rather than transliterating, whilst still using transliteration for other characters.
Given that hook_pathauto_punctuation_chars_alter() could be used, I don't see a huge benefit in extending the list of punctuation explicitly supported out-of-the-box.
Comment #25
Omar Alahmed
The below code adds the Arabic diacritics and special symbols to the punctuation list using hook_pathauto_punctuation_chars_alter:
I hope this would help.
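For readers who have not used the hook discussed throughout this thread, a generic Drupal 7 sketch of an implementation might look like the following. It is not the snippet from comment #25: the module name mymodule, the array keys and the chosen characters are assumptions for illustration, and the t() fallback exists only so the example runs outside Drupal.

```php
<?php
// Stand-in for Drupal's t() so this sketch runs outside a Drupal bootstrap.
if (!function_exists('t')) {
  function t($string) {
    return $string;
  }
}

/**
 * Implements hook_pathauto_punctuation_chars_alter().
 *
 * "mymodule" and the keys below are hypothetical; each entry follows the
 * 'value' / 'name' shape used by Pathauto's punctuation list.
 */
function mymodule_pathauto_punctuation_chars_alter(array &$punctuation) {
  $punctuation['copyright'] = array('value' => '©', 'name' => t('Copyright sign'));
  $punctuation['trademark'] = array('value' => '™', 'name' => t('Trade mark sign'));
  $punctuation['registered'] = array('value' => '®', 'name' => t('Registered trade mark sign'));
}
```

Each added entry then appears in the punctuation settings, where it can be set to be removed, replaced by the separator, or left alone. As noted earlier in the thread, this only helps if the punctuation step sees the character before transliteration has converted it.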