Download & Extend

Remove leading and trailing slashes in path aliases (and removing dots)

Project:Drupal core
Version:8.x-dev
Component:path.module
Category:bug report
Priority:normal
Assigned:rocket_nova
Status:needs work
Issue tags:GoogleUX2012, needs backport to D7, Needs tests, Usability

Issue Summary

A contributor on one of my sites just discovered what seems a bug. Here is how to reproduce it:

  1. work on a drupal site, say http://www.drupal.org/
  2. create a post, say a story
  3. prior to submitting, define the "URL path settings" : do not read the description that says these must be local, and type in some absolute URL, like http//:www.example.com/.
  4. submit

At this point, you find yourself on http://www.drupal.org/http%3A//www.example.com, which causes a "page not found" error.

This seems to be a bug: absolute paths should probably cause a post validation error. Just because users can't read doesn't mean sites shouldn't give some clues instead of bailing out.

Comments

#1

A possible solution would be to sanitize aliases so that incorrect urls like absolute ones are converted to something that will actually be recognizable when requested and point to the same page.

#2

Title:Absolute path not avoided» Disallow absolute urls as path aliases

Agreed that this should be fixed; I had it happen to me once or twice when I just started using drupal.

#3

In ver 4.6 I was using absolute urls as path aliases for some static html pages. When I upgraded to 4.7.4 that no longer worked but there was no help given to fix it.
What I found that worked for me is this.
Drupal resides in http:www.mysite.com/drupal. The static page resides at http://www.mysite.com.
The old absolute url path was http://mysite.com/index.html
The new url path that works is ../index.html

#4

Status:active» needs review

Rather than return an error, this patch silently removes any leading or trailing forward slashes. I think this is the best solution in terms of usability. Without this patch it can be troublesome to fix nodes given an incorrect alias, because the links to the aliased url will be broken too.

I tested with http://www.example.com and it worked fine on FF and IE without any patch, so I didn't include a fix for that input.

#5

Dang, forgot to attach the patch again.

AttachmentSizeStatusTest resultOperations
path_absolute.patch749 bytesIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch path_absolute.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.View details | Re-test

#6

So am I being told that there is no way to do an absolute URL to another site?
I tried putting a link to another site of mine using fill HTML and it was converted to a local link.
What is going on?????

#7

Forgot to say I am using 4.7.4.
At this rate, every time I upgrade to the latest version I loose at least one thing that I need. My last upgrade I lost the album module. This time I lost absolute URLs, authenticated user and all of the usable info under the status column in the users list.
I am beginning to think that upgrading is not a good idea.

#8

Darrell, this is not what is at point here: the problem being discussed is that you should not create a page on site A and set for it an alias residing on site B, since site B would normally have no way to redirect you to the original page on site A.

This has nothing to do with links in post content.

#9

Version:5.0-beta2» 6.x-dev

#10

Status:needs review» needs work

patch no longer applies

# patch -p0 < path_absolute.patch
patching file modules/path/path.module
Hunk #1 FAILED at 128.
1 out of 1 hunk FAILED -- saving rejects to file modules/path/path.module.rej

#11

Status:needs work» needs review

New patch for HEAD. This patch trims leading or trailing forward slashes, and removes the warning from the field description. It seems that absolute paths no longer break anything anyway.

AttachmentSizeStatusTest resultOperations
trim-slashes.patch2.28 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch trim-slashes.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.View details | Re-test

#12

looks good. i have not tested.

#13

Status:needs review» reviewed & tested by the community

Verified issue
Changed url to /my/page/
Upon viewing, was directed to my/page
Editing the page also shows corrected my/page URL
Warning message did not appear in URL setting directions

Ready for commit

#14

Status:reviewed & tested by the community» needs review

Erm, what happens to http://www.example.com/ when trim-ed with /? IMHO http://example.com Better? How does this solve the issue?

#15

Title:Disallow absolute urls as path aliases» Remove leading and trailing slashes in path aliases
Status:needs review» reviewed & tested by the community

I might misunderstand your question but I think maybe the confusion is that the issue has morphed and is now mistitled.

The original issue was about absolute urls, but in head that issue is already fixed and no longer breaks anything. So actually this patch is to fix relative aliases that are misentered, like if someone enters a path /node/14, so that it will just fix it rather than having to include fineprint to warn you not to do it.

#16

Status:reviewed & tested by the community» needs review

Ah, good titles matter a lot. So what about adding a comment to that effect to the trim(). I remember we were puzzled on some parts of this function before, so it is always good to have this documented. I'd add something along the lines of

// Remove possible slashes from both ends to avoid invalid aliases.

#17

Yep, I had a comment in my original patch...

#18

Status:needs review» reviewed & tested by the community

I'll get to massaging this a bit later, but it looks good (although it changes a string, it fixes a quite often encountered usability issue).

#19

If we anyway have to change the string, we should put improve on the wording:

In path_admin_form():

  • "Specify an alternative path by which this data can be accessed." sounds too technical. Instead of "data", we could actually use "page", which would IMHO still be correct and a lot more specific and understandable.
  • "For example, type "about" when writing an about page." is misleading here: the about page must already be written, when being aliased here! We could either say "when aliasing an about page" or "to refer to an about page".

In both occurrences:

  • I would reorder the sentences and move the example to the end. The quotes might also be misleading, maybe <kbd>about</kbd> is better... (Looks like about then)?
  • On the admin page we use the term "path", in nodeapi we use "URL". We should stick with one.

Proposal:
"Specify a path alias by which this page can be accessed. Use a relative path. For example, type <kbd>about</kbd> to refer to an about page."

and

"Optionally specify a path alias by which this node can be accessed. Use a relative path. For example, type <kbd>about</kbd> when writing an about page."

Also it is very misleading that the module is called "Path" (path.module) and the admin page on the other hand "URL aliases". I always intuitively search for a "path" menu entry, then scroll the list down and once I find "URL aliases", I remember: "Ahh, that was the misworded one".

#20

Status:reviewed & tested by the community» needs review

Back to discussion then.

#21

Some comments would be great, since we are mostly done!
All that's left is the correct naming conventions and the wording of two strings (see #19).

So let's start with the most general issue:
The module's name is "path". Right now, the functionality of path aliasing is inconsistently named "URL alias" (mostly), "Path alias", "alternative path".
While core shouldn't generally follow contrib - the important pathauto module is not called URLauto, but pathauto and it consistently uses the terminology "path alias".
Now I think we should make up a decision whether in the functionality be consistently called "Path alias" or "URL alias" in core... some opinions on this, please!

#22

I made a new patch with Pancho's text changes. I agree with his points about the text.

AttachmentSizeStatusTest resultOperations
trim-slash.patch2.28 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch trim-slash.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.View details | Re-test

#23

+1 for #22, though this only covers part of the naming issues.

Mainly I was criticizing the inconsistant usage of "URL alias" vs. "Path alias". We would need to unify this to "Path alias" IMO. The enclosed patch covers this. Unfortunately we'd need to rename two permissions, change four more strings and remove one. While this might sound inacceptable, this would IMHO be a large UI improvement and finally streamline the inconsistant naming.

Don't know if we should split that off this issue here, as we moved away from the starting point.
As the trim-slash part we all agree upon should be committed asap, this shouldn't hold back the patch in #22 anyway.

AttachmentSizeStatusTest resultOperations
trim-slash-and-streamline-naming.patch8.24 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch trim-slash-and-streamline-naming.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.View details | Re-test

#24

Version:6.x-dev» 7.x-dev

#25

Status:needs review» needs work

No longer applies to HEAD:

patching file modules/path/path.module
Hunk #1 FAILED at 21.
Hunk #2 FAILED at 37.
Hunk #3 succeeded at 60 with fuzz 2 (offset -4 lines).
Hunk #4 succeeded at 84 (offset -1 lines).
Hunk #5 succeeded at 119 (offset -4 lines).
Hunk #6 FAILED at 126.
Hunk #7 succeeded at 183 (offset 2 lines).
Hunk #8 succeeded at 186 (offset -4 lines).
Hunk #9 FAILED at 201.
4 out of 9 hunks FAILED -- saving rejects to file modules/path/path.module.rej
patching file modules/path/path.admin.inc
Hunk #2 succeeded at 53 (offset 2 lines).
Hunk #4 succeeded at 80 (offset 2 lines).
Hunk #6 succeeded at 126 (offset 2 lines).
Hunk #8 succeeded at 158 (offset 2 lines).
Hunk #10 succeeded at 182 (offset 2 lines).

This is also a large patch that is hard to review, please consider splitting up this patch to make it easier to review quickly.

Robin

#26

Agreed, Robin. The purpose of this patch is to remove leading and trailing slashes in user-entered paths, which is a usability improvement.

Pancho's naming consistency issue should be a separate issue.

The patch in #22 should be rerolled and retested for HEAD.

#27

Status:needs work» needs review

Patch removes leading and trailing slashes in user-entered path aliases and adjusts path alias form description text to remove the no-trailing-slash requirement.

AttachmentSizeStatusTest resultOperations
103680.patch2.45 KBIdleFailed: 8987 passes, 0 fails, 1 exceptionView details | Re-test

#28

Status:needs review» needs work

The last submitted patch failed testing.

#29

Status:needs work» needs review

I couldn't reproduce the test problem, resubmitting.

AttachmentSizeStatusTest resultOperations
103680.patch2.35 KBIdleFailed: Failed to install HEAD.View details | Re-test

#30

Status:needs review» needs work

The last submitted patch failed testing.

#31

tagging, failing test

#32

Marked #250525: Disallow trailing slashes in path aliases as a duplicate of this issue.

#33

Status:needs work» needs review

reroll

AttachmentSizeStatusTest resultOperations
trim-alias.patch2.4 KBIdlePassed: 11845 passes, 0 fails, 0 exceptionsView details | Re-test

#34

Just reviewing this - if you enter an alias with a . after the / you get the same page not found issue.

Try with alias "http://example.com/."

#35

Updated patch to fix dot issue noted in #34

AttachmentSizeStatusTest resultOperations
trim-alias.patch2.49 KBIdleFailed: Failed to apply patch.View details | Re-test

#36

Status:needs review» needs work

The last submitted patch failed testing.

#37

Resubmitting for testing, incorporates changes made in #33

AttachmentSizeStatusTest resultOperations
trim-alias-no-dots.patch2.08 KBIdleFailed: Failed to apply patch.View details | Re-test

#38

Removing all dots is a bad idea, for example a person might want to use an alias of test.html to match old paths. Seems better to use trim to remove dots from the beginning and end.

#39

Title:Remove leading and trailing slashes in path aliases» Remove leading and trailing slashes in path aliases (and removing dots)
Status:needs work» needs review

Agree - patch modified to trim dots from either end.

Also removes "./" and "/." from aliases which cause page not found errors.

AttachmentSizeStatusTest resultOperations
trim-alias-no-dots.patch2.17 KBIdleFailed: Failed to apply patch.View details | Re-test

#40

Status:needs review» needs work

The last submitted patch failed testing.

#41

Status:needs work» needs review

re-rolling for another test...

AttachmentSizeStatusTest resultOperations
trim-alias-no-dots.patch2.15 KBIdlePassed: 11845 passes, 0 fails, 0 exceptionsView details | Re-test

#43

Status:needs review» needs work

The description changes and trimming the slash is great. However, str_replace is not, what if you have something like foo/bar/../baz then you can't just str_replace those pieces out , you need a validation error instead. Also, needs a test.

#44

If we took out the 2 str_replace lines in the patch and just submit the trim slashes functionality, would this then be able to get into D7?

#45

Assigned to:Anonymous» Xano

We had some discussion on this in IRC and I'll finish this tonight. I'll merge this issue with #394402: Path field needs better help text and validation.

#46

Version:7.x-dev» 8.x-dev
Status:needs work» needs review
Issue tags:+needs backport to D7

Since #332333: Add a real API to path.module resolved the trim() and str_replace() issues by including trim() in all relevant cases. I believe what's left here is to the update the help text and to check in the alias validation for non-relative URLs. Patch attached to that effect.

Also tagging for backport to D7, though the string updates should probably be left out due to translation issues.

AttachmentSizeStatusTest resultOperations
path-alias-103680-46.patch2.39 KBIdlePASSED: [[SimpleTest]]: [MySQL] 33,287 pass(es).View details | Re-test

#47

Status:needs review» needs work
Issue tags:+Needs tests, +Novice

#46 looks good to me, and I agree with the point about a backport without the string changes. We'll also want to add some tests for this.

Note that the Drupal 8.x patch will need to be rerolled, because the core directory structure for Drupal 8 has now changed. (For more information, see #22336: Move all core Drupal files under a /core folder to improve usability and upgrades). When the patch has been rerolled, please set the issue back to "Needs Review."

Tagging as novice for the task of rerolling the Drupal 8.x patch.

#48

Here is the rerolled patch, for further work.

AttachmentSizeStatusTest resultOperations
path-alias-103680-48.patch2.43 KBIdlePASSED: [[SimpleTest]]: [MySQL] 33,992 pass(es).View details | Re-test

#49

Status:needs work» needs review
Issue tags:-Novice

Thanks! Sending to testbot.

#50

Status:needs review» needs work

Still needs tests.

And the patch submitted in #46 and rerolled in #48 is not correct:
It check the alias is not an external url, and set an error if it is. But why? This issue is about trailing slashes. Why is it problem to set an external url for alias? It will works without this patch, I think it isn't a problem.

But the patch don't do anything with trailing slashes, and if we set an alias with them, Drupal will still give a page not found page.

#51

Assigned to:Xano» rocket_nova
Status:needs work» needs review

@Désiré you're right. I re-read the history of this issue and agree with you. Here is a test for leading and trailing slashes and a patch to fix leading and trailing slashes.

AttachmentSizeStatusTest resultOperations
path-alias-test-only-103680-51.patch2.38 KBIdleFAILED: [[SimpleTest]]: [MySQL] 34,227 pass(es), 6 fail(s), and 0 exception(es).View details | Re-test
path-alias-with-test-103680-51.patch3.83 KBIdlePASSED: [[SimpleTest]]: [MySQL] 34,226 pass(es).View details | Re-test

#52

Thanks for the tests. Not sure about the new patch, though... What happened to validation errors for /. in the middle and the updated UI text, though? That's what the url_is_external() bit was for.

#53

@xjm I'm feeling a bit confused about what the issue is and what the solution should be for it. It seems like external URLs are handled ok by the system, so I didn't address that in the patch. Should this issue be about removing leading and trailing slashes and dots or about external URLs?

#54

Hmmm, I don't think external URLs "work." You end up with URLs like:
http://localhost:8888/d7/http%3A//localhost%3A8888/d7/foo

And it is related. E.g., for especial fun, create a node with this path alias if your site's domain is mysite.com:
http://mysite.com/foo/../

A user might enter an exernal URL for the same reason he or she might enter a relative path using . or .., and I think it makes sense to fail validation on that.

Maybe we just need to expand the scope of this issue?

In any case, the current patch above also does not fix other patterns that result in a 404, like:
/d7/foo/../bar
/d7/foo/./bar
...and I think these at least were in the original scope of the issue.

#55

Status:needs review» needs work

Marking NW for the last case above, at least. We can strip leading or trailing slashes, sure, but I think we should be adding better validation on the field.

#56

ran into this again at usability testing.

#57

Tagging.

#58

Wait. Why removing dots? "index.html" is a perfectly valid path alias. If the concern is that people put an entire URL in there, then prefix the field with the site address. (e.g. http://example.com/[_______________________])

But +1000 for stripping out leading/trailing slashes. That's just silly to do to people.

#59

Not removing all dots... just path components with only dots and no alphanumeric characters. See #54. We need a more sophisticated regex.

#60

Issue tags:+GoogleUX2012

updating tags

And here's a painful video showing the Google Usability study where someone entered a path with a leading slash and got an error.