Comments

legolasbo’s picture

Attached patch adds the punctuation settings

legolasbo’s picture

Status: Active » Needs review
legolasbo’s picture

Title: Add '™', '©' and '®' to the punctuation settings with default action to remove. » Add several characters to the punctuation settings.
FileSize
7.01 KB

In related issue #1986530: Add emdash and endash to list of punctuation characters, the addition of emdash and endash was requested. I've also added these characters to the patch.
I've changed the title of this issue to a more general one.

klonos’s picture

Status: Needs review » Reviewed & tested by the community

...seems a no-brainer to me. It only adds these lines:

$punctuation['copyright']             = array('value' => '©', 'name' => t('Copyright sign'));
$punctuation['trademark']             = array('value' => '™', 'name' => t('Trade mark sign'));
$punctuation['registered_trademark']  = array('value' => '®', 'name' => t('registered trade mark sign'));
$punctuation['en_dash']               = array('value' => '–', 'name' => t('En dash'));
$punctuation['em_dash']               = array('value' => '—', 'name' => t('Em dash'));

...and gets the job done. Thanx ;)

klonos’s picture

Status: Reviewed & tested by the community » Needs work

...ok minor nit-pick: the first letter of the label should be capitalized: t('registered trade mark sign')

Sorry ;)

legolasbo’s picture

Status: Needs work » Needs review
FileSize
7.01 KB

No need to apologise klonos, I should have paid better attention to detail ;)

Attached patch fixes this enormous error ;)

klonos’s picture

Status: Needs review » Reviewed & tested by the community

...aaand back to RTBC. Thanx ;)

klonos’s picture

...can we get this in please?

klonos’s picture

...another friendly ping.

GregSmith104’s picture

Unfortunately this doesn't work (at least for me) when the transliteration option is checked because the transliteration happens first and converts the ® mark into an 'r' character before the punctuation code has a chance to remove it. Any removal operations should occur before transliteration.

In pathauto.inc I just rearranged the transliteration section with the punctuation section so the punctuation section runs first:

  // Replace or drop punctuation based on user settings
  $output = strtr($output, $cache['punctuation']);

  // Optionally transliterate (by running through the Transliteration module)
  if ($cache['transliterate']) {
    $output = transliteration_get($output);
  }
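
To illustrate why the order matters, here is a minimal sketch. It assumes a punctuation map in the character => replacement shape that strtr() expects, and it uses a hypothetical stand-in (example_transliterate) rather than the real transliteration_get(), covering only the characters needed for the demonstration:

```php
<?php
// Hypothetical stand-in for transliteration_get(); the real module
// handles far more characters than this three-entry map.
function example_transliterate(string $text): string {
  return strtr($text, ['®' => 'r', '™' => 'tm', 'é' => 'e']);
}

// Punctuation map: an empty replacement means "remove the character".
$punctuation = ['®' => '', '™' => ''];

$title = 'Café® Brand™';

// Transliterate first: the symbols have already become letters,
// so the punctuation map no longer finds anything to remove.
$transliterate_first = strtr(example_transliterate($title), $punctuation);

// Punctuation first: the symbols are dropped before transliteration runs.
$punctuation_first = example_transliterate(strtr($title, $punctuation));
```

With transliteration first, the stray 'r' and 'tm' stay in the string; with punctuation first, only the accented letter is transliterated.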
legolasbo’s picture

Issue summary: View changes
Status: Reviewed & tested by the community » Needs review
FileSize
7.93 KB

Updated the patch to incorporate the suggestion from #10

klonos’s picture

Status: Needs review » Reviewed & tested by the community

...works as before for me + it includes what @GregSmith104 suggested in #10. So back to RTBC.

Would love it if this was committed so that it is included at least in the latest dev.

Dave Reid’s picture

Is the purpose to remove these characters? If so, why not enable the 'Reduce strings to letters and numbers' option?

klonos’s picture

@Dave Reid: using the 'Reduce strings to letters and numbers' option does not work with © for example. Did you mean that you'd prefer to update the patch to make such characters work with that option instead of adding it in the punctuation settings?

Is the purpose to remove these characters?

Basically yes, but the point is to allow users to decide whether they want to remove these chars or replace them with the placeholder. Having these under the punctuation settings gives people more fine-grained control over what happens with specific chars.

Does that answer your concerns or did you mean something else?

Dave Reid’s picture

using the 'Reduce strings to letters and numbers' option does not work with © for example

I just tested the 'Reduce strings to letters and numbers' option with all the punctuation in this patch, and it did in fact remove them correctly. Can you please retry this?

klonos’s picture

Here's what I did:

1. http://simplytest.me/project/drupal/7.x?add[]=pathauto&add[]=token
2. Once the sandbox at simplytest.me is launched, go to the Modules admin page (admin/modules) -> enable both Pathauto and Token -> Save configuration.
3. Then go to Configuration -> Search and metadata -> URL aliases -> 'Settings' tab -> enable the 'Reduce strings to letters and numbers' checkbox -> Save configuration.
4. Finally go to Add content -> Basic page -> enter the following as the title: test " \ ` , . - _ : ; | { [ } ] + = * & % ^ $ # @ ! ~ ( ) ? < > / \ © ™ ® – — title -> Save

Results:

- "Basic page test " \ ` , . - _ : ; | { [ } ] + = * & % ^ $ # @ ! ~ ( ) ? < > / \ © ™ ® – — title has been created."
- the path of the page is node/test-title

So yes, it does work. I don't know what I might have done wrong when testing back in #14.

Still, the point of this patch is to offer people more fine-grained control over each of these characters separately (for example, to replace them instead of removing them).

rpayanm’s picture

Status: Reviewed & tested by the community » Needs review

In English, "!" and "?" appear only at the end of a sentence, for example:
That is!
What?

But in Spanish there are also "¡" and "¿" at the start, for example:
¡Así es!
¿Qué?

So my suggestion is to add the characters "¡" and "¿" to the punctuation settings.

Greetings.

Dave Reid’s picture

@rpayanm: I think enabling the Transliteration module and subsequent support in Pathauto would cover that use case - but it would be good to confirm this behavior. I don't think we want to be adding every single possible character.

rpayanm’s picture

I installed Transliteration, but it made no difference :(
Isn't Transliteration meant for file names?

Here is my case:

Title: ¿Esto es un ejemplo?
Pathauto result: ¿esto-es-un-ejemplo
Desired result: esto-es-un-ejemplo

Greetings.

temkin’s picture

Status: Needs review » Closed (works as designed)

I don't know if we want to keep adding special characters to the existing list. Over the years it could grow to include every possible character. There is already an API to expand that list (hook_pathauto_punctuation_chars_alter), and that should be the preferred way to handle special cases on a case-by-case basis. Unless you guys think otherwise.
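
For the Spanish case raised above, a hook implementation might look like the following sketch. The module name "example" and the array keys are placeholders; the entry format is assumed to match the 'value'/'name' arrays used by the patch earlier in this issue, and the t() fallback is only there so the snippet runs outside Drupal:

```php
<?php
// Fallback so the sketch also runs outside Drupal; on a real site
// t() is provided by core and this block is skipped.
if (!function_exists('t')) {
  function t($string) {
    return $string;
  }
}

/**
 * Implements hook_pathauto_punctuation_chars_alter().
 *
 * "example" is a placeholder module name.
 */
function example_pathauto_punctuation_chars_alter(array &$punctuation) {
  // Keys are arbitrary identifiers chosen for this sketch.
  $punctuation['inverted_exclamation'] = array('value' => '¡', 'name' => t('Inverted exclamation mark'));
  $punctuation['inverted_question'] = array('value' => '¿', 'name' => t('Inverted question mark'));
}
```

Whether the added characters are actually removed still depends on the order of transliteration and punctuation handling discussed in this issue.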

Pobtastic’s picture

I'm not going to reopen this, as it's maybe off topic, but the suggestions around moving the
// Replace or drop punctuation based on user settings

...block are probably sound? The reason, as said above, is that when you have a trademark symbol, a copyright symbol, etc., it's converted to letters (e.g. "tm"), so any later processing can't filter it out. For us it means that our URLs just can't be sanitised using the alter hook.

edit: The issue, in case it isn't clear, is that we need both. We have an i18n site, so we require transliteration, but we also need to drop items such as the trademark symbol mentioned previously.

arakwar’s picture

Status: Closed (works as designed) » Active

@temkin

I think otherwise. The problem isn't whether or not to have a list of punctuation, but the timing of when it is applied. When transliteration happens before any other step, we can't target some elements, like the "registered" sign.

I had this problem in Drupal 7, and I currently have it in Drupal 8. Making the punctuation change happen before any other step would solve this issue.

james.williams’s picture

Switching the order of transliteration and punctuation replacement/removal could be good, but that will then 'break' other cases. For example, if transliteration changes ¡ to !, it would then be too late for that to be removed in the same way as !. Or for the case of the copyright symbol, which gets transliterated to '(c)', the parentheses would not get removed/replaced.

Would it be safe/sensible to run the punctuation replacement before and after the transliteration perhaps?! Or provide some other way to allow altering transliterated replacements in this specific context?

(My particular interest is that I want transliteration, but I also want to strip £ symbols entirely rather than allow them to be transliterated to 'ps' in URLs.)

james.williams’s picture

Here's a patch that does run the punctuation replacement before as well as after transliteration, to solve these cases where some symbols need removing entirely rather than transliterating, whilst still using transliteration for other characters.

Given that hook_pathauto_punctuation_chars_alter() could be used, I don't see a huge benefit in extending the list of punctuation explicitly supported out-of-the-box.
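
The before-and-after idea can be sketched roughly as follows. This is not the patch itself: sketch_transliterate is a hypothetical stand-in for the Transliteration module, sketch_clean is an illustrative name, and the punctuation map is assumed to be in the character => replacement shape that strtr() expects:

```php
<?php
// Hypothetical stand-in for the Transliteration module, limited to
// the characters used in this example.
function sketch_transliterate(string $text): string {
  return strtr($text, ['£' => 'ps', '¡' => '!', 'ñ' => 'n']);
}

// Applies the punctuation map on both sides of transliteration.
function sketch_clean(string $text, array $punctuation): string {
  // First pass: drop symbols that should never reach transliteration.
  $text = strtr($text, $punctuation);
  $text = sketch_transliterate($text);
  // Second pass: catch punctuation that transliteration itself produced.
  return strtr($text, $punctuation);
}

// An empty replacement means "remove the character".
$punctuation = ['£' => '', '!' => ''];
$result = sketch_clean('¡Niño £5!', $punctuation);
```

Here the first pass removes '£' before it can become 'ps', and the second pass removes the '!' that '¡' was transliterated into.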

Omar Alahmed’s picture

The code below adds the Arabic diacritics and special symbols to the punctuation list using hook_pathauto_punctuation_chars_alter():

function mymodule_pathauto_punctuation_chars_alter(array &$punctuation) {
  // Add the Arabic diacritics and special symbols.
  $punctuation['fatha'] = ['value' => 'َ', 'name' => t('Fatha symbol')];
  $punctuation['damma'] = ['value' => 'ُ', 'name' => t('Damma symbol')];
  $punctuation['ksrah'] = ['value' => 'ِ', 'name' => t('Ksrah symbol')];
  $punctuation['tanween_fateh'] = ['value' => 'ًِ', 'name' => t('Tanween fateh symbol')];
  $punctuation['tanween_dumm'] = ['value' => 'ٌِ', 'name' => t('Tanween dumm symbol')];
  $punctuation['tanween_kser'] = ['value' => 'ٍِ', 'name' => t('Tanween kser symbol')];
  $punctuation['shaddah'] = ['value' => 'ّ', 'name' => t('Shaddah symbol')];
  $punctuation['sokon'] = ['value' => 'ْ', 'name' => t('Sokoon symbol')];
  $punctuation['maddah'] = ['value' => 'ِ~', 'name' => t('Maddah symbol')];
  $punctuation['tamdeed'] = ['value' => 'ِـ', 'name' => t('Tamdeed symbol')];
  $punctuation['right_guillemet'] = ['value' => '»', 'name' => t('Right Guillemet symbol')];
  $punctuation['left_guillemet'] = ['value' => '«', 'name' => t('Left Guillemet symbol')];
  $punctuation['question_mark_rtl'] = ['value' => '؟', 'name' => t('Question mark rtl')];
}

I hope this helps.