How to filter typographic apostrophes from URLs?

ArjanLikesDrupal - December 21, 2008 - 21:39
Project:Pathauto
Version:6.x-2.x-dev
Component:I18n stuff
Category:support request
Priority:critical
Assigned:Unassigned
Status:closed
Description

In the Punctuation Settings I have 'remove' enabled for apostrophe and I use [title-raw] to create the alias. The apostrophe remains in the aliased path.
I'm using Pathauto 6.x-2.x-dev (2008-Dec-14).

#1

JaredAM - December 31, 2008 - 16:51

I've also noticed this in 5.x-2.3.

The punctuation list should also be expanded to catch the left/right single quotation marks (&#8216; and &#8217;) and the left/right double quotation marks (&#8220; and &#8221;). If people cut and paste into a text field, these come across but aren't caught.

Thanks for the fantastic module.

#2

tobyspark - December 31, 2008 - 19:30

...was just going to report this, on my site running 5.x-2.3.

Adding the " ’ " character (not " ' ") to the "Strings to Remove" list doesn't have any effect.

#3

ArjanLikesDrupal - January 6, 2009 - 23:37

I just noticed that the double quote (") does not get filtered either, and letters with accents, like á, don't get transliterated.
I have the transliteration option enabled (using Transliteration 6.x-2.0). I'm not sure whether the second point is a separate issue or the same problem as the punctuation not being removed.

#4

mokargas - April 2, 2009 - 04:47

This is still an issue. I've tried modifying the module to remove the "typographic" apostrophe "’", but with no success. Maybe I'm putting the code in the wrong place.

This code works as it should if I just make a page and try, yet adding similar code to pathauto gives no love.

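<?php
  // "’" is the typographic apostrophe (U+2019), not the ASCII "'".
  $string = "mike o’leary";
  // Remove it with str_replace(), used here instead of the deprecated ereg_replace().
  $new_string = str_replace("’", "", $string);
  echo $new_string;
?>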

Any solutions as yet? Is there something I'm overlooking? Regards, MK
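
A byte-oriented character class such as "[’]" may match the individual UTF-8 bytes of the character rather than the character as a whole. The following is a minimal sketch of a UTF-8-aware alternative using preg_replace() with the "u" modifier. The function name is hypothetical and the snippet is not taken from Pathauto.

```php
<?php
// Hypothetical helper for illustration; not part of Pathauto.
function sketch_strip_typographic_quotes($text) {
  // The "u" modifier makes PCRE treat pattern and subject as UTF-8, so
  // \x{2019} matches the whole character, not its individual bytes.
  return preg_replace('/[\x{2018}\x{2019}\x{201C}\x{201D}]/u', '', $text);
}

echo sketch_strip_typographic_quotes("mike o\xe2\x80\x99leary"), "\n";  // mike oleary
```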

#5

Freso - April 2, 2009 - 05:51

The regular apostrophe "'" is handled by the punctuation settings. Apostrophes (and any other characters) outside ASCII are handled by the transliteration settings, including the left/right single quotation marks (&#8216; and &#8217;) and the left/right double quotation marks (&#8220; and &#8221;). Transliteration is run before punctuation cleaning, so it should be safe to transliterate to punctuation characters.
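
The order of operations described here (transliterate first, then clean punctuation) can be sketched roughly as follows. This is an illustration only: the function name and the character map are assumptions, and Pathauto's real cleaning code does more than this.

```php
<?php
// Hypothetical helper, not Pathauto's actual code: it only illustrates the
// "transliterate, then clean punctuation" order.
function sketch_clean_alias($title) {
  // Stage 1: transliteration. Map typographic quotes (UTF-8) to ASCII.
  $transliterate = array(
    "\xe2\x80\x98" => "'",  // U+2018 left single quotation mark
    "\xe2\x80\x99" => "'",  // U+2019 right single quotation mark
    "\xe2\x80\x9c" => '"',  // U+201C left double quotation mark
    "\xe2\x80\x9d" => '"',  // U+201D right double quotation mark
  );
  $ascii = strtr($title, $transliterate);

  // Stage 2: punctuation cleaning. Remove the ASCII quotes produced above.
  $cleaned = str_replace(array("'", '"'), '', $ascii);

  // Lowercase and join words with the separator.
  return strtolower(preg_replace('/\s+/', '-', trim($cleaned)));
}

echo sketch_clean_alias("Mike O\xe2\x80\x99Leary"), "\n";  // mike-oleary
```

Because the typographic quote becomes an ASCII quote in stage 1, the ordinary "remove" punctuation setting is enough to drop it in stage 2.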

#6

mokargas - April 2, 2009 - 05:54

Funny I just figured that out and was about to report! Thanks Freso.

Additionally, as a quick fix I added the following to the punctuation array at the bottom of the pathauto.inc file, with no problems:

<?php
  // 0xe2 0x80 0x98 is the UTF-8 encoding of U+2018 (left single quotation mark).
  $punctuation['left_quotes'] = array('value' => pack('ccc', 0xe2, 0x80, 0x98), 'name' => t('Left single quotes ‘'));
  // 0xe2 0x80 0x99 is the UTF-8 encoding of U+2019 (right single quotation mark).
  $punctuation['right_quotes'] = array('value' => pack('ccc', 0xe2, 0x80, 0x99), 'name' => t('Right single quotes ’'));
?>

#7

joostvdl - June 16, 2009 - 08:32
Priority:normal» critical

I installed the 6.x-2.x-dev release and an apostrophe still causes a problem: [title-raw] is ignored, resulting in an empty path.

Problem was also in the 6.x-1.1 release.

#8

Sam-Inet - July 25, 2009 - 21:11

Uhg, Okay, I'm just not getting it. And this has been bugging me since 4.x????

[Quote Freso - April 2, 2009 - 05:51]
regular apostrophe "'" is handled by the punctuation settings. Apostrophe (and any other) characters outside ASCII are handled by the transliteration settings
[/Quote]

How and where exactly do you get rid of a regular apostrophe? Seemingly no matter what I try I still get a Title field of "Title's Test" turned into "/title039s-test"

I tried these which didn't work:

In admin/build/path/pathauto
- Punctuation settings >> Single quotes (apostrophe) ': Set to both "Remove" and "Replace by separator"
- General settings >> Strings to Remove: Set to 039,'
- Added to i18n-ascii.txt: ' = "XXX"

#####################################

Sum-of-a-Twit!@

Found it, but I doubt that this is a legitimate "fix"

In your i18n-ascii.txt file add this line:

039 = ""

I'll take a wild a** guess that you can add the other odd ball numbers punctuation gets converted into in the same fashion.

Anyone who actually knows pathauto care to comment on the possibility of adding "039" (etc.) to the i18n file causing other issues?

Wish I'd gotten frustrated enough 3 years ago to try this,
Sam

-----------
Works in Versions:
Drupal 6.13
Pathauto 6.x-1.1

Partially works in (returns "/title_s_test" but better than 039 showing up)
Drupal 5.2
Pathauto 5.x-2.0-beta2
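
One possible explanation for the "039", which this thread does not confirm: if the title is HTML-escaped before Pathauto cleans it (as a non-raw token would be), the apostrophe becomes the entity &#039;, and removing the punctuation around it leaves the digits behind. A minimal sketch, with the punctuation step reduced to a hypothetical str_replace() call:

```php
<?php
// Assumption: the title was HTML-escaped before cleaning.
$escaped = htmlspecialchars("Title's Test", ENT_QUOTES, 'UTF-8');
// $escaped is now "Title&#039;s Test".

// Hypothetical punctuation step: drop '&', '#', ';' and the apostrophe itself.
$stripped = str_replace(array('&', '#', ';', "'"), '', $escaped);

$alias = strtolower(str_replace(' ', '-', $stripped));
echo $alias, "\n";  // title039s-test
```

If that is what is happening, building the alias from the raw token (for example [title-raw], as in the original report) would sidestep the entity instead of mapping "039" away.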

#9

mokhin - August 6, 2009 - 07:11

I vote for adding the option of filtering out left/right quotation marks, too.

It's not nice to see, for example, http://..../content/kde-430-«Caizen» in URLs

#10

greggles - August 7, 2009 - 23:22
Status:active» fixed

You vote to add something that already exists? Great!

Use 6.x-2.x. Configure transliteration.

The regular apostrophe "'" is handled by the punctuation settings. Apostrophes (and any other characters) outside ASCII are handled by the transliteration settings, including "the left/right single quotation marks (&#8216; and &#8217;) and the left/right double quotation marks (&#8220; and &#8221;)." Transliteration is run before punctuation cleaning, so it should be safe to transliterate to punctuation characters.

#11

mokhin - August 10, 2009 - 13:48

Sorry and thanks for fixing that in dev branch. Unfortunately I'm still using the stable version, because http://drupal.org/project/pathauto and especially http://drupal.org/node/95354 clearly warn against using 2.x-dev in production.

#12

System Message - August 24, 2009 - 13:50
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

#13

colincalnan - October 19, 2009 - 20:34
Status:closed» patch (to be ported)

Here's a patch I made that will handle both single quotes and double quotes.

Attachment: pathauto-349882-13.patch (1.36 KB)

#14

greggles - October 19, 2009 - 21:58
Status:patch (to be ported)» closed

Thanks, but the rest of the issue makes it pretty clear: This is considered to be "fixed" already.

#15

aasarava - October 28, 2009 - 23:03
Title:PathAuto: apostrophe not filtered from path alias» How to filter typographic apostrophes from URLs?
Component:Code» Documentation
Category:bug report» support request
Status:closed» active

Greggles, I'm reopening this as a support request / documentation issue. Pathauto is an awesome module, but clearly several people are unsure how to configure it to remove typographic apostrophes (&#8216; and &#8217;). I too am having this problem, and I've been developing complex Drupal sites for years.

Thanks to your feedback and Freso's feedback in this thread, I understand that I first need to transliterate the typographic apostrophes into standard (single quote) apostrophes, and then pathauto punctuation settings will take over from there. But what steps should one take to implement transliteration of the typographic apostrophes? Do we edit the i18n-ascii.txt file? If so, what do we add to the bottom of the file? (I tried 8216, 8217, the hex equivalents, and pasting in the apostrophes directly, but wasn't able to make it work.)

I think a quick post with specific steps on how to do this will put an end to questions about this issue. Thanks so much.

#16

greggles - October 28, 2009 - 23:08
Component:Documentation» I18n stuff

I don't know how to use the system. I'm relying on Freso to handle this whole section.

#17

meerkat - October 31, 2009 - 12:37

I encountered the problem of apostrophes in paths this morning, googled for a solution and found this thread. These are the steps I took to get the transliteration working:

Copy modules/pathauto/i18n-ascii.example.txt to modules/pathauto/i18n-ascii.txt

Open i18n-ascii.txt in a text editor

Enter the following at the end of the file, one for each character that needs to be replaced

{char} = ""

where {char} is a character pasted in from a suitable source (I used the Mac character viewer).

To test, I entered some quoted text in a Word doc, e.g. Lorem 'ipsum' dolor, let Word autocorrect the quotes, pasted the result into the title field of a new node and saved. The path generated by pathauto had stripped out the quotes.

Drupal 6.14, pathauto 6.x-1.1
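
To illustrate what the `{char} = ""` lines do, here is a rough sketch that treats them as INI-style key/value pairs and applies them as a character map. parse_ini_string() and strtr() are standard PHP functions, but whether Pathauto reads the file with exactly these calls is an assumption, and the four entries are just examples.

```php
<?php
// Example lines as they might be appended to i18n-ascii.txt.
$ini = <<<'INI'
‘ = ""
’ = ""
“ = ""
” = ""
INI;

// Each line becomes "character => replacement"; here every replacement is empty.
$map = parse_ini_string($ini);

// Apply the map the way a transliteration table would be applied.
$cleaned = strtr("Lorem ‘ipsum’ dolor", $map);
echo $cleaned, "\n";  // Lorem ipsum dolor
```

Mapping a character to "'" instead of "" would hand it over to the regular punctuation settings rather than deleting it outright.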

#18

greggles - October 31, 2009 - 17:00
Status:active» fixed

Excellent. Thanks for the update meerkat.

#19

System Message - November 14, 2009 - 17:10
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

 
 
