Our clients' newspaper site posted a story earlier today that used ‘curly single quotes’ in the title. PathAuto dutifully created a URL from it, curly quotes intact.

For whatever reason, the link worked in the taxonomy listing page, but $node->path was definitely returning the path with the unescaped quote characters, causing a 404 when someone tried to access the story through a custom-written Flash rotator on our site. The fix was easy enough -- manually removing the quotes from the path -- but I think PathAuto, if it's removing "straight" apostrophe characters, should be smart enough to remove their curly brethren as well.

Resolution

See comment #29 below for a code sample on how to add your own punctuation characters to pathauto.

Comments

greggles’s picture

Can you look at the function at the bottom of pathauto.inc and please provide a patch?

Garrett Albright’s picture

Could you be more specific, please? Which function?

If it's pathauto_punctuation_chars(), is it a matter of just adding the characters to the $punctuation array?

greggles’s picture

You got it - pathauto_punctuation_chars is the place to add it. Just look at the examples, copy, paste, add your code and descriptive text and we should be all set.

Garrett Albright’s picture

Status: Active » Needs review
FileSize
6.57 KB

Okay, here's a patch. Brief testing shows it works well, though I must admit I'm unfamiliar with any considerations I must keep in mind when using non-ASCII characters in PHP strings -- it has always just worked for me, including in this case. I had to adjust the spacing of the lines a bit in order to keep things lining up in nice pretty columns…

I must say, though, that I think the way this is done -- specifying each individual character that should/could be filtered out -- is somewhat untenable. My patch handles English quotes (and apostrophes), but what about quote characters in languages other than English -- and that's just quote characters… In other words, there seems to be no limit to how large this $punctuation array can grow!

mlsamuelson’s picture

Tried to apply the patch, but it fails:

Hunk #1 FAILED at 389.

mlsamuelson

Garrett Albright’s picture

I hate to say it worked for me, but… It worked for me. Are you patching against the most recent version of Pathauto? What command are you using?

Garretts-iMac:~ Albright$ patch /Users/Albright/Desktop/pathauto/pathauto.inc /Users/Albright/Desktop/left-right-quotes.patch.txt 
patching file /Users/Albright/Desktop/pathauto/pathauto.inc
Garretts-iMac:~ Albright$
mlsamuelson’s picture

Status: Needs review » Reviewed & tested by the community

I had the 2.x version instead of 2.0. So... when I used the correct version of the module, the patch applied correctly. Funny, that.

Not only did it apply cleanly, but it also worked as advertised.

As a test, I left the pathauto settings at default, and then created a story node with the title quotation marks ( ‘ ’, “ ” ). Pathauto aliased it as content/quotation-marks.

Then I tested setting the action as "do nothing" for left quotes in the punctuation settings of Pathauto, and left quotes appeared in the URL as expected.

Looks good.

mlsamuelson

greggles’s picture

Priority: Minor » Normal

Well, I would like to commit this, but it fails for me.

So...what am I doing wrong now?

greg$ cat CVS/Tag 
TDRUPAL-5--2
greg$ patch < left-right-quotes.patch
patching file pathauto.inc
Hunk #1 FAILED at 389.
1 out of 1 hunk FAILED -- saving rejects to file pathauto.inc.rej
greg$ cvs stat pathauto.inc
===================================================================
File: pathauto.inc      Status: Up-to-date

   Working revision:    1.1.2.24
   Repository revision: 1.1.2.24        /cvs/drupal-contrib/contributions/modules/pathauto/pathauto.inc,v
   Commit Identifier:   44cb4766e78d4567
   Sticky Tag:          DRUPAL-5--2 (branch: 1.1.2)
   Sticky Date:         (none)
   Sticky Options:      (none)

greg$
Garrett Albright’s picture

FileSize
6.52 KB

Hmm. Well, I must admit that I'm not entirely experienced in the art of creating patches for others' consumption. I'm wondering if the problem is a (filesystem) path problem, since the opening lines of the patch do reference absolute paths to the files as they are on my drive. Greg, could you maybe try explicitly specifying the file to patch using the syntax I used (`patch path/to/pathauto.inc path/to/patch.patch`)?

EDIT: I re-RTFM'd the "Creating patches" page. Here's a new patch created from the Drupal root. Hopefully this'll work better for you folks. Sorry for my n00bishness.

mlsamuelson’s picture

I have no idea how I got that first patch to work. Weird.

The newest patch worked great, however. I ran it through the same tests as before, and it checked out.

mlsamuelson

Cameron Tod’s picture

Add this line to pathauto_punctuation_chars() to get rid of Word's irritating long hyphens:

 $punctuation['long_hyphen']         = array('value' => "–", 'name' => t('Long hyphen –'));
Freso’s picture

Version: 5.x-2.0 » 7.x-1.x-dev
  1. This isn't going in for any 5.x branch. Moving to 6.x-2.x.
  2. I think this would be better accomplished by using transliteration, rather than adding a bunch of Unicode characters to Pathauto's punctuation characters to check.
greggles’s picture

If Transliteration module provides support for stuff like this then that sounds excellent to me and I agree that it should be marked duplicate in favor of that.

Yet another alternate (more scalable) proposal is to provide 3 textboxes (one per action) where people can put their own punctuation and have it provide some default values.

Freso’s picture

Status: Reviewed & tested by the community » Closed (duplicate)

Well, the goal and purpose of Transliteration is to change Unicode stuff into ASCII/ANSI stuff. Which includes punctuation. I'm not sure whether curly quotes are currently in Transliteration's tables, but they could easily be added. (See my #257041: More transliterations (x21??) for adding/updating one of the tables to include transliterating of more characters.)

I just went and looked at x20.php (Unicode hyphens are x2010 and x2012-15), and it looks like they're already taken care of. I don't know where the curly quotes are, but chances are they're already in there as well.

So, in short: Take a look at #247758: Use Transliteration module for transliteration and test it if you can! (Note that Transliteration kicks in before the character rules played with so far in this issue, so when Transliteration has transliterated “” to "" and – to --, Pathauto will then use its settings to determine whether to remove or replace or ignore these in the alias.)

therzog’s picture

Hi: I tried this patch and it didn't work on my test string. Instead, I had to use the pack function to specify the matching strings like this. It might help others, and it seems more portable:

RCS file: /cvs/drupal/contributions/modules/pathauto/pathauto.inc,v
retrieving revision 1.1.2.51
diff -u -r1.1.2.51 pathauto.inc
--- pathauto.inc        18 Jun 2008 20:05:08 -0000      1.1.2.51
+++ pathauto.inc        19 Dec 2008 20:13:20 -0000
@@ -466,9 +466,13 @@
 function pathauto_punctuation_chars() {
   $punctuation = array();

-  // Handle " ' ` , . - _ : ; | { [ } ] + = * & % ^ $ # @ ! ~ ( ) ? < > \
+  // Handle " ' ` , . - _ : ; | { [ } ] + = * & % ^ $ # @ ! ~ ( ) ? < > \ and curly quotes
   $punctuation['double_quotes']      = array('value' => '"', 'name' => t('Double quotes "'));
-  $punctuation['quotes']             = array('value' => "'", 'name' => t("Single quotes (apostrophe) '"));
+  $punctuation['left_double_quotes']  = array('value' => pack('ccc', 0xe2, 0x80, 0x9c), 'name' => t('Left double quotes “'));
+  $punctuation['right_double_quotes'] = array('value' => pack('ccc', 0xe2, 0x80, 0x9d), 'name' => t('Right double quotes ”'));
+  $punctuation['quotes']              = array('value' => "'", 'name' => t("Single quotes (apostrophe) '"));
+  $punctuation['left_quotes']         = array('value' => pack('ccc', 0xe2, 0x80, 0x98), 'name' => t('Left single quotes ‘'));
+  $punctuation['right_quotes']        = array('value' => pack('ccc', 0xe2, 0x80, 0x99), 'name' => t('Right single quotes ’'));
   $punctuation['backtick']           = array('value' => '`', 'name' => t('Back tick `'));
   $punctuation['comma']              = array('value' => ',', 'name' => t('Comma ,'));
   $punctuation['period']             = array('value' => '.', 'name' => t('Period .'));
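
For reference, the pack() calls in the diff above simply build the UTF-8 byte sequence for each character. The following is a minimal standalone sketch, not part of the patch, comparing three spellings of the same string. The variable names are made up for illustration, and the "\u{...}" form assumes PHP 7.0 or later.

```php
<?php
// Three ways to spell U+201C (left double quotation mark) as UTF-8 bytes.
$packed  = pack('C3', 0xe2, 0x80, 0x9c); // unsigned-char variant of the patch's pack('ccc', ...)
$escaped = "\xE2\x80\x9C";               // hex escapes, independent of the file's encoding
$unicode = "\u{201C}";                   // code point escape, PHP 7.0+ only

var_dump($packed === $escaped, $escaped === $unicode); // bool(true) bool(true)
echo bin2hex($packed), "\n";                           // e2809c

// The signed 'c' format used in the patch yields the same bytes.
var_dump(pack('ccc', 0xe2, 0x80, 0x9c) === $packed);   // bool(true)
```

Because the escaped forms never depend on how the source file was saved, they are one way to sidestep editor encoding problems.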

jromine’s picture

subscribe

maijs’s picture

subscribe

jwilson3’s picture

Status: Closed (duplicate) » Needs review
FileSize
379 bytes

This is a fairly major oversight neither handled in the transliteration file (i18n-ascii.example.txt) nor the confounded ui for specifying to strip quotes.

I tested this and fixed it easily by adding both of the curly quotes (opening and closing) to the i18n-ascii.txt file and enabling the Transliteration option in pathauto (which is recommendable for ANY website).

I'd prefer that these two make their way into both D7 and D6, but I don't have time to download and test this for D7. I'm providing a patch for D6 for anyone inclined to run with the ball and test / reroll for D7.

Status: Needs review » Needs work

The last submitted patch, pathauto-n207840.patch, failed testing.

Dave Reid’s picture

Status: Needs work » Closed (duplicate)

If curly quotes aren't already handled by the Transliteration module then you need to file an issue in its issue queue.

jakew’s picture

I'm having this issue with 6.x-1.5.

balsama’s picture

Status: Closed (duplicate) » Needs review
FileSize
1.03 KB
1.02 KB

@Dave Reid, am I missing something? Isn't the original request to add support for curly quotes to pathauto?

Patches attached for 6.x-2.x-dev and 7.x-1.x-dev versions of the module.

Both patches add support for the following characters:

  • Left Double Quotation Mark “ “
  • Right Double Quotation Mark ” ”
  • Left Single Quotation Mark ‘ ‘
  • Right Single Quotation Mark ’ ’
balsama’s picture

Last patch had syntax error. Reattaching.

Status: Needs review » Needs work

The last submitted patch, pathauto-207840-23-7.x-1.x-dev.patch, failed testing.

balsama’s picture

One more try.

balsama’s picture

Status: Needs work » Needs review
balsama’s picture

Status: Needs review » Closed (duplicate)

Ok. Now that I read this a little closer, it looks like the official stance of pathauto is that "if your site is likely contain characters beyond ASCII 128" you should just use the transliteration module.

I still think it would be a good idea to include the above patch in pathauto OR make transliteration a requirement since, sooner or later, a large percentage of sites are likely to have an editor paste a curly quote into a title used for pathauto. But since it looks like the module maintainers are going a different route, I don't want to clutter the issue queue.

fletchgqc’s picture

Status: Closed (duplicate) » Needs review

Use Case Overlooked

I apologise for re-opening this issue, but I believe a reasonable use-case has been overlooked. If I'm wrong just close it again. Here is the use-case: I want to use native characters for Japanese, Russian, whatever (i.e. I don't want to transliterate), but I don't want punctuation in my URLs. How do I do this?

The Transliteration module is not easily customisable. You can't decide to only transliterate punctuation and nothing else. So transliteration can't do the job, but neither can pathauto, currently.

You might say: "if you are happy with native characters, then be happy with curly apostrophes". But this is not fair. The point is that I don't want any punctuation at all. Now, admittedly we are sort of opening a can of worms to start supporting the removal of every kind of punctuation, because sooner or later someone will ask you to remove the ¿ character, etc.

So far I put my punctuation into the box marked "Strings to remove. Don't use this for punctuation" :-). I don't know why I'm not allowed to use this for punctuation, because it seems to work. But it doesn't work for curly speech marks pointing right; I know because I just tried it and ended up finding this issue.

A More Scalable Solution?

So I'm fully in support of the proposed patches. But I propose a more scalable solution. Scrap the long list of punctuation and just have two boxes:

  • Punctuation to remove
  • Punctuation to replace by separator

There is a third option on the punctuation select lists: No action. But obviously this can be implemented by just not including that in either of the boxes.

With this solution, the subject will never need to be discussed again. Everyone will just put the punctuation they need into these boxes. And the screen will get a lot shorter :-). Default content of the boxes could be based on the current default settings of the select boxes. Does this sound reasonable?
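
A rough sketch of how the two-box idea could behave, for illustration only. This is not Pathauto code: the function name clean_with_boxes(), its parameters and the default separator are all hypothetical.

```php
<?php
/**
 * Hypothetical helper: cleans a title using two free-form character lists.
 * Characters in neither list are left alone ("No action").
 */
function clean_with_boxes(string $title, string $remove, string $replace, string $separator = '-'): string {
  // Split each box into individual UTF-8 characters.
  $remove_chars  = preg_split('//u', $remove, -1, PREG_SPLIT_NO_EMPTY);
  $replace_chars = preg_split('//u', $replace, -1, PREG_SPLIT_NO_EMPTY);

  $title = str_replace($remove_chars, '', $title);
  $title = str_replace($replace_chars, $separator, $title);

  // Turn whitespace into separators, collapse repeats, then trim the ends.
  $title = preg_replace('/\s+/u', $separator, $title);
  $title = preg_replace('/' . preg_quote($separator, '/') . '+/', $separator, $title);
  return trim($title, $separator);
}

// Native Cyrillic letters survive; only the listed punctuation is touched.
echo clean_with_boxes("\u{201C}Привет\u{201D} \u{2013} мир", "\u{201C}\u{201D}", "\u{2013}"), "\n";
// Привет-мир
```

The point of the design is that the character lists become site configuration, so supporting a new mark needs no code change.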

codesidekick’s picture

Side note to this issue:

Don't feel like patching Pathauto or installing another module? You can implement the alter hook pathauto_punctuation_chars_alter() like so:

function MYMODULE_pathauto_punctuation_chars_alter(&$punctuation) {
    $punctuation['double_curly_left']  = array('value' => '“', 'name' => t('Double curly left'));
    $punctuation['double_curly_right'] = array('value' => '”', 'name' => t('Double curly right'));
    $punctuation['single_curly_left']  = array('value' => '‘', 'name' => t('Single curly left'));
    $punctuation['single_curly_right'] = array('value' => '’', 'name' => t('Single curly right'));
}

Add any more punctuation you don't want in paths, clear cache and enjoy.
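
To see the mechanism outside Drupal, here is a standalone sketch under some loud assumptions: t() is stubbed out, the hook is called by hand rather than through Drupal's alter system, and every registered character is simply removed (real Pathauto applies a per-character action chosen in its settings).

```php
<?php
// Stand-in for Drupal's t() so the sketch runs outside Drupal.
if (!function_exists('t')) {
  function t($string) { return $string; }
}

function MYMODULE_pathauto_punctuation_chars_alter(&$punctuation) {
  $punctuation['single_curly_right'] = array('value' => "\u{2019}", 'name' => t('Single curly right'));
}

// The list is passed by reference, so the implementation adds to it in place.
$punctuation = array(
  'quotes' => array('value' => "'", 'name' => t('Single quotes (apostrophe)')),
);
MYMODULE_pathauto_punctuation_chars_alter($punctuation);

print_r(array_keys($punctuation)); // quotes, single_curly_right

// Simplified "remove" action applied to every registered character.
$values = array_column($punctuation, 'value');
echo str_replace($values, '', "Editor\u{2019}s pick"), "\n"; // Editors pick
```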

jwilson3’s picture

Status: Needs review » Closed (works as designed)

^ That's a great suggestion. Thank you. I think with this we can probably re-close this?

jwilson3’s picture

Issue summary: View changes

Added a reference to comment #29.

jberg1’s picture

Version: 7.x-1.x-dev » 7.x-1.2

This doesn't seem to work for me. I patched the pathauto.inc. Now on URL Aliases->Setting, I see the new Double curly left, Double curly right, Single curly right, Single curly left but in the ( ) it is empty. And it is not removing those punctuations from the auto generated path.

Am I doing something wrong?
Thanks for any help.

millenniumtree’s picture

Be sure to clear your cache, and also make sure your editor isn't messing up the 'fancy' characters you have to enter into the hook definition function.

I added the unicode 'dash' to mine.

function MYMODULE_pathauto_punctuation_chars_alter(&$punctuation) {
  $punctuation['unicode_dash'] = array('value' => '–', 'name' => t('Unicode Dash'));
}

Someone entered one of these into a node title and pathauto actually killed the whole page with a PHP error. :P

jberg1’s picture

I made sure to clear the cache, and I'm just using a plain text editor to add the characters (not sure how else to place them). I'm also running into the same issue with "é". The parentheses are just empty and it doesn't remove the character. How else could I enter those "fancy" characters into the function so it recognizes them?

  $punctuation['accent_e']           = array('value' => 'é', 'name' => t('Accent e'));
  $punctuation['double_curly_left']  = array('value' => '“', 'name' => t('Double curly left'));
  $punctuation['double_curly_right'] = array('value' => '”', 'name' => t('Double curly right'));
  $punctuation['single_curly_left']  = array('value' => '‘', 'name' => t('Single curly left'));
  $punctuation['single_curly_right'] = array('value' => '’', 'name' => t('Single curly right'));

Thanks for any help.

jwilson3’s picture

@jberg1, you will be better off using the Transliteration module, which could help you do clever things like convert the "é" to a regular "e", to create a legible url.

jwilson3’s picture

Issue summary: View changes

Link to comment #29.

jenlampton’s picture

Category: Feature request » Bug report
Issue summary: View changes
Status: Closed (works as designed) » Active

I understand that transliteration is the recommended solution here, but I don't have any reason to use transliteration on my site other than to remove the curly quotes from URLs, and it *appears* that there's already an option for removing them right here in pathauto. I see an option for Reduce strings to letters and numbers which says "Filters the new alias to only letters and numbers found in the ASCII-96 set."

Are curly quotes in the ASCII-96 set? I checked online and it didn't look like it.

I'm going to re-open this issue because it appears as though the solution to removing curly quotes from URLs should be as simple as checking this option. But that's not working. (I'm also changing the status of this issue to a bug.)

jenlampton’s picture

Okay, I see what's going on. Instead of removing the characters that aren't in the ASCII-96 set, it looks like pathauto is replacing them with separators.

It's certainly not clear from the checkbox description that this is what will happen, and I think the expected behavior (removal) is more likely the intended feature. I'm going to write a patch that strips them out instead of adding the separator, but if the current behavior is in fact what's intended, the description should be updated instead.
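
The difference between the two behaviours can be sketched as follows. The regular expressions are an assumption about how such a filter might work, written for illustration, and are not quoted from Pathauto's source.

```php
<?php
$separator = '-';
$alias = "editor\u{2019}s-pick";

// Behaviour described above: each run of characters outside a-z, A-Z, 0-9
// and "/" becomes a single separator.
$replaced = preg_replace('/[^a-zA-Z0-9\/]+/', $separator, $alias);

// Behaviour the checkbox text suggests: drop those characters instead,
// leaving existing separators in place.
$removed = preg_replace('/[^a-zA-Z0-9\/' . preg_quote($separator, '/') . ']+/', '', $alias);

echo $replaced, "\n"; // editor-s-pick
echo $removed, "\n";  // editors-pick
```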

jenlampton’s picture

Status: Active » Needs review
FileSize
450 bytes
greggles’s picture

What would you change the description to?

Dave Reid’s picture

I would still think that the punctuation solution seems the best, compared to changing the reduce ascii method, although I agree that the description of that feature could be improved (but filed as a separate issue).

temkin’s picture

Status: Needs review » Closed (won't fix)

Agreed that changing the 'reduce_ascii' logic may be unexpected for site owners who already rely on the current implementation. The suggested solution is to use hook_pathauto_punctuation_chars_alter.

I also created a follow-up ticket to improve the description of 'reduce_ascii' option to avoid the confusion in future - #2905169.

Changing this ticket to "Won't fix", but please re-open if there are any objections.

jwilson3’s picture

Have a library client with a lot of extra punctuation marks in their titles. Maybe this would be useful for someone else (or even myself) in the future:

/**
 * Implements hook_pathauto_punctuation_chars_alter().
 *
 * Strip Dashes, Curly quotes and Angle quotes from URL alias generation.
 *
 * These will appear in the UI at /admin/config/search/path/settings under the
 * "Punctuation" section.
 */
function MYMODULE_pathauto_punctuation_chars_alter(&$punctuation) {
  // Unicode dashes & hyphens
  $punctuation['horizontal_bar'] = ['value' => '―', 'name' => t('Horizontal bar')];
  $punctuation['figure_dash'] = ['value' => '‒', 'name' => t('Figure dash')];
  $punctuation['em_dash'] = ['value' => '—', 'name' => t('Em dash')];
  $punctuation['en_dash'] = ['value' => '–', 'name' => t('En dash')];
  $punctuation['two_em_dash'] = ['value' => '⸺', 'name' => t('Two-em dash')];
  $punctuation['three_em_dash'] = ['value' => '⸻', 'name' => t('Three-em dash')];
  $punctuation['small_em_dash'] = ['value' => '﹘', 'name' => t('Unicode Small em dash')];
  $punctuation['unicode_hyphen'] = ['value' => '‐', 'name' => t('Unicode hyphen')];
  $punctuation['nonbreaking_hyphen'] = ['value' => '‑', 'name' => t('Non-breaking hyphen')];
  $punctuation['small_hyphen_minus'] = ['value' => '﹣', 'name' => t('Small hyphen minus')];
  $punctuation['fullwidth_hyphen_minus'] = ['value' => '－', 'name' => t('Fullwidth hyphen minus')];

  // Curly quotes
  $punctuation['fullwidth_quote'] = ['value' => '＂', 'name' => t('Fullwidth quotation mark')];
  $punctuation['double_curly_left']  = ['value' => '“', 'name' => t('Double curly left quotation mark')];
  $punctuation['double_curly_left_high']  = ['value' => '‟', 'name' => t('Double curly left high quotation mark')];
  $punctuation['double_curly_right'] = ['value' => '”', 'name' => t('Double curly right quotation mark')];
  $punctuation['double_curly_right_low'] = ['value' => '„', 'name' => t('Double curly right low quotation mark')];
  $punctuation['single_curly_left']  = ['value' => '‘', 'name' => t('Single curly left quotation mark')];
  $punctuation['single_curly_left_high'] = ['value' => '‛', 'name' => t('Single curly left high quotation mark')];
  $punctuation['single_curly_right'] = ['value' => '’', 'name' => t('Single curly right quotation mark')];
  $punctuation['single_curly_right_low'] = ['value' => '‚', 'name' => t('Single curly right low quotation mark')];

  // Angle quotes
  $punctuation['double_angle_left'] = ['value' => '«', 'name' => t('Double angle left quotation mark')];
  $punctuation['double_angle_right'] = ['value' => '»', 'name' => t('Double angle right quotation mark')];
  $punctuation['single_angle_left'] = ['value' => '‹', 'name' => t('Single angle left quotation mark')];
  $punctuation['single_angle_right'] = ['value' => '›', 'name' => t('Single angle right quotation mark')];
}
codewatson’s picture

For myself and others, @jwilson3 #41 provided a good start (thank you!), but I found that just putting the characters in the value did not work; I had to convert them to UTF-8 code units using PHP's pack() function:

/**
 * Implements hook_pathauto_punctuation_chars_alter().
 *
 * Strip Dashes, Curly quotes and Angle quotes from URL alias generation.
 *
 * These will appear in the UI at /admin/config/search/path/settings under the
 * "Punctuation" section.
 */
function MYMODULE_pathauto_punctuation_chars_alter(&$punctuation) {
  // Unicode dashes & hyphens
  $punctuation['horizontal_bar'] = ['value' => pack('ccc', 0xe2, 0x80, 0x95), 'name' => t('Horizontal bar')];
  $punctuation['figure_dash'] = ['value' => pack('ccc', 0xe2, 0x80, 0x92), 'name' => t('Figure dash')];
  $punctuation['em_dash'] = ['value' => pack('ccc', 0xe2, 0x80, 0x94), 'name' => t('Em dash')];
  $punctuation['en_dash'] = ['value' => pack('ccc', 0xe2, 0x80, 0x93), 'name' => t('En dash')];
  $punctuation['two_em_dash'] = ['value' => pack('ccc', 0xe2, 0xb8, 0xba), 'name' => t('Two-em dash')];
  $punctuation['three_em_dash'] = ['value' => pack('ccc', 0xe2, 0xb8, 0xbb), 'name' => t('Three-em dash')];
  $punctuation['small_em_dash'] = ['value' => pack('ccc', 0xef, 0xb9, 0x98), 'name' => t('Unicode Small em dash')];
  $punctuation['unicode_hyphen'] = ['value' => pack('ccc', 0xe2, 0x80, 0x90), 'name' => t('Unicode hyphen')];
  $punctuation['nonbreaking_hyphen'] = ['value' => pack('ccc', 0xe2, 0x80, 0x91), 'name' => t('Non-breaking hyphen')];
  $punctuation['small_hyphen_minus'] = ['value' => pack('ccc', 0xef, 0xb9, 0xa3), 'name' => t('Small hyphen minus')];
  $punctuation['fullwidth_hyphen_minus'] = ['value' => pack('ccc', 0xef, 0xbc, 0x8d), 'name' => t('Fullwidth hyphen minus')];

  // Curly quotes
  $punctuation['fullwidth_quote'] = ['value' => pack('ccc', 0xef, 0xbc, 0x82), 'name' => t('Fullwidth quotation mark')];
  $punctuation['double_curly_left'] = ['value' => pack('ccc', 0xe2, 0x80, 0x9c), 'name' => t('Double curly left quotation mark')];
  $punctuation['double_curly_left_high']  = ['value' => pack('ccc', 0xe2, 0x80, 0x9f), 'name' => t('Double curly left high quotation mark')];
  $punctuation['double_curly_right'] = ['value' => pack('ccc', 0xe2, 0x80, 0x9d), 'name' => t('Double curly right quotation mark')];
  $punctuation['double_curly_right_low'] = ['value' => pack('ccc', 0xe2, 0x80, 0x9e), 'name' => t('Double curly right low quotation mark')];
  $punctuation['single_curly_left'] = ['value' => pack('ccc', 0xe2, 0x80, 0x98), 'name' => t('Single curly left quotation mark')];
  $punctuation['single_curly_left_high'] = ['value' => pack('ccc', 0xe2, 0x80, 0x9b), 'name' => t('Single curly left high quotation mark')];
  $punctuation['single_curly_right'] = ['value' => pack('ccc', 0xe2, 0x80, 0x99), 'name' => t('Single curly right quotation mark')];
  $punctuation['single_curly_right_low'] = ['value' => pack('ccc', 0xe2, 0x80, 0x9a), 'name' => t('Single curly right low quotation mark')];

  // Angle quotes
  $punctuation['double_angle_left'] = ['value' => pack('cc', 0xc2, 0xab), 'name' => t('Double angle left quotation mark')];
  $punctuation['double_angle_right'] = ['value' => pack('cc', 0xc2, 0xbb), 'name' => t('Double angle right quotation mark')];
  $punctuation['single_angle_left'] = ['value' => pack('ccc', 0xe2, 0x80, 0xb9), 'name' => t('Single angle left quotation mark')];
  $punctuation['single_angle_right'] = ['value' => pack('ccc', 0xe2, 0x80, 0xba), 'name' => t('Single angle right quotation mark')];
}

This was helpful for finding the correct codes: https://r12a.github.io/app-conversion/

jwilson3’s picture

Interesting, @codewatson. Maybe your copy/paste into your code editor didn't work because the file was opened and then saved with the wrong file encoding? I use Sublime Text, which has an option to File > Save with Encoding > UTF-8.
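
One way to test the encoding theory is to look at the bytes PHP actually sees. This is a small sketch with a hypothetical helper, describe_bytes(); paste the literal from your module file in place of the escaped examples to check it.

```php
<?php
// Hypothetical helper: report a string's bytes and whether they are valid UTF-8.
function describe_bytes(string $value): string {
  // preg_match() with the /u flag fails on subjects that are not valid UTF-8.
  $valid = preg_match('//u', $value) === 1;
  return bin2hex($value) . ($valid ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

echo describe_bytes("\u{2019}"), "\n"; // e28099 (valid UTF-8)
echo describe_bytes("\x92"), "\n";     // 92 (not valid UTF-8), the Windows-1252 byte for ’
```

If a pasted ’ does not come out as e28099, the file was probably saved in some other encoding, and the pack() or "\u{...}" spellings are the safer choice.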