Break URLs at separator character and remove the separator

tcconway - June 12, 2008 - 19:51
Project:Pathauto
Version:6.x-2.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:Unassigned
Status:closed
Description

Often times, we are forced to have a very long title name. When that happens, PathAuto often trunkates the path as it should, but it sometimes keeps the trailing space (and converts it to a - as it should)

Possible for PathAuto to delete the trailing slash (trailing -, when its converted)
An example url that PathAuto generates is:
http://www.example.com/press_releases/2008/05/aol-awards-ohio-university...

Steps to repeat:
1. Create a posting. For the title, give it a REALLLYYY LOOONNGGG name.
2. Submit it.

Thanks!

#1

Freso - June 12, 2008 - 22:36
Version:5.x-2.2» 5.x-2.x-dev
Status:active» active (needs more info)

I just added a lot of pattern trimming to both the 5.x-2.x and 6.x-1.x branches. Could please confirm this is still happening with the 5.x-2.x-dev version? (You may have to wait a bit though, until the "Last updated" field says "June 13" (or later).)

#2

tcconway - June 24, 2008 - 16:02

Hey Freso,
Just installed 5.x-2.x-dev and still having the same issue. It still creates a url that ends in "-".
Thanks.

#3

greggles - June 24, 2008 - 16:15
Title:Long Title Names - PathAuto trunkates and includes a trailing space» Punctuation character at end of url is left as separator
Status:active (needs more info)» active

Steps taken:

1. Create a node with the title Often times, we are forced to have a very long title name. When that happens, PathAuto often trunkates the path as it should, bu
2. Resulting alias was often-times-we-are-forced-to-have-very-long-title-name-when-that-happens-pathauto-often-trunkates-th

That seems as expected to me.

So, I tried
1. Create a node with the title Often times, we are forced to have a very long title name. When that happens, PathAuto often trunkates            the path as it (note all of the spaces after trunkates, I expected pathauto to cut the line there and replace the space with hyphen) but...
2. Resulting alias was often-times-we-are-forced-to-have-very-long-title-name-when-that-happens-pathauto-often-trunkates-th

So I tried
1. Create a node with the title Often times, we are forced to have a very long title name. When that happens, PathAuto often trunkates t he path as it (note the space in the middle of the word "the" in "trunkates t he")
2. Resulting alias was often-times-we-are-forced-to-have-very-long-title-name-when-that-happens-pathauto-often-trunkates-t-

So, I'd say that is the bug, right?

#4

tcconway - June 24, 2008 - 17:38

Well played. That's exactly what we are experiencing as well.

To me, there are two issues:

  • If the 100th caracter is a space, it doesnt trim it off, but replaces it with a "-" (or whatever the administrator defines as the separator).
  • To me, it should cut off at the last word(s) that go beyond the 100character cutoff...and prevent things like often-times-we-are-forced-to-have-very-long-title-name-when-that-happens-pathauto-often-trunkates-th. Resulting in: often-times-we-are-forced-to-have-very-long-title-name-when-that-happens-pathauto-often-trunkates

#5

Freso - June 24, 2008 - 17:56

Hohum. I'm inclined to re-categorise into a feature request for what's described in #4. This also means that I'm inclined to not want to spend time fixing this for 5.x, but have it moved to 6.x-2.x instead.

#6

greggles - June 25, 2008 - 14:44
Title:Punctuation character at end of url is left as separator» Break URLs at separator character and remove the separator

@Freso - Agreed.

I've also received several requests that when urls are shortened for whatever reason they should be shortened to a good logical point (i.e. separators) instead of just the exact point where 100 characters lands. Since those seem similar in my mind, I'll just bundle that one in here.

#7

greggles - July 21, 2008 - 22:17

Something like http://api.drupal.org/api/function/truncate_utf8 will sure help when we need to do this.

#8

Freso - July 28, 2008 - 10:12
Version:5.x-2.x-dev» 6.x-2.x-dev

#9

Freso - July 28, 2008 - 13:12
Category:bug report» feature request
Status:active» patch (code needs review)

This should work, but I haven't tested it. So please do.

AttachmentSize
269929_Break_alias_at_word-8.patch742 bytes

#10

Freso - August 4, 2008 - 15:59
Status:patch (code needs review)» patch (code needs work)

Just tested it, and it doesn't seem to work. Bugger.

#11

greggles - August 4, 2008 - 16:28
Status:patch (code needs work)» patch (code needs review)

How about this?

AttachmentSize
269929_Break_alias_at_word-11.patch1.75 KB

#12

Freso - August 4, 2008 - 16:38
Status:patch (code needs review)» patch (reviewed & tested by the community)

Yay! It works! :D

It had a line of pure whitespace though, so fixed that. I also added a line documenting what that block of code does. Should be good to go.

AttachmentSize
269929_Break_alias_at_word-12.patch1.4 KB

#13

greggles - August 4, 2008 - 16:45
Status:patch (reviewed & tested by the community)» fixed

Awesome - thanks, Freso.

http://drupal.org/cvs?commit=131526

#14

greggles - August 4, 2008 - 17:09
Status:fixed» patch (code needs work)

Per Freso, this needs a little more testing...

#15

greggles - August 4, 2008 - 18:45
Status:patch (code needs work)» patch (code needs review)

The problem was that if you created a node with a short title this still lopped off the last word.

1. Create a node with the title "this is a short title"
Expect results:
aliased as content/this-short-title

Actual results:
aliased as content/this-short

The attached patch fixes that problem, generalizes this code so it can be re-used elsewhere in pathauto, and then re-uses it where individual tokens are built and shortened to the appropriate length.

#16

greggles - August 4, 2008 - 18:46

Attached...now...

AttachmentSize
269929_Break_alias_at_word-15.patch2.21 KB

#17

Freso - August 5, 2008 - 00:41

+1 for separating the logic to its own function, but perhaps call it _pathauto_truncate_url() or truncate_chars instead? (FWIW, the 7.x version of truncate_utf8() will be drupal_truncate_chars().)

Also "A Pathauto friendly version of truncate_utf8" should be finished off with a ".". ;)

Apart from this, the patch looks good. Looking forward to test it in the morrow!

#18

greggles - August 5, 2008 - 02:56

Good points.

AttachmentSize
269929_Break_alias_at_word-18.patch2.26 KB

#19

Freso - August 5, 2008 - 15:39
Status:patch (code needs review)» patch (reviewed & tested by the community)

Works now with both long and short aliases! Yay! And I see no more nits to pick at (well, perhaps truncate_utf8 could be referenced with a pair of parentheses... but I'm somewhat indifferent on that). :p

#20

Freso - August 5, 2008 - 15:51
Status:patch (reviewed & tested by the community)» patch (code needs work)

It breaks the tests though, but this is likely because the tests themselves need to be updated for this new logic.

#21

Freso - August 5, 2008 - 16:25
Status:patch (code needs work)» patch (reviewed & tested by the community)

Actually, without this follow-up patch, it'll break two tests, but with it, it'll only break one. So it's an improvement. The test still breaking is "[testPathAuto]: Node accessible through alias at [[...]/sites/all/modules/pathauto/tests/pathauto.test line 84]", which tries to fetch the node using the alias.

#22

Freso - August 5, 2008 - 16:27

And even reverting the originally committed patch from this issue doesn't take the fail away.

#23

greggles - August 5, 2008 - 17:02
Status:patch (reviewed & tested by the community)» fixed

And fixed - http://drupal.org/cvs?commit=131753

Thanks for the reviews/testing.

#24

Anonymous (not verified) - August 19, 2008 - 17:03
Status:fixed» closed

Automatically closed -- issue fixed for two weeks with no activity.

 
 

Drupal is a registered trademark of Dries Buytaert.