Replace regular internal paths with URL aliases
hass - January 16, 2007 - 22:03
| Project: | Path Filter |
| Version: | HEAD |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | duplicate |
Description
i'm using your nice module, but one thing that comes up more and more for me is - why do i have to insert an "internal:" string before every URL? Why isn't this checked with a regexp for href="node/ and href="taxonomy/? Checking for "admin/" makes no sense from my point of view, but maybe i'm wrong. Not using internal: sounds simpler... what's the idea behind it?

#1
Aliases.
One good reason is that with path aliases internal paths can be anything, not just node/..., taxonomy/..., etc. Another reason is that you might want to translate internal paths in contexts other than <a href="..."> (e.g. <img src="...">), so I think it's best to keep it explicit.
#2
Without the "internal:" i'm able to upgrade to drupalX if the filter is not upgraded... i'd prefer to do generic string matches. searching for "href" and "src" sounds like anything, while these are all well known HTML attributes... and if they do not start with "http?" it should be an internal path, shouldn't it?
#3
@hass wrote:
"Without the "internal:" i'm able to upgrade to drupalX if the filter is not upgraded"
I'm not sure I follow you. If you're saying that your links will not break if you disable the filter, then I would argue that you don't need the filter in the first place.
If I understand your suggestion, you would like the filter to handle anything that looks like a relative URL, right? I think this confuses things. While something like <a href="some/page"> looks like a relative URL, it would really be an absolute URL to something like "/mydrupalsite/some/page" (depending on the path to your installation). I think it's much better to be explicit about that by using something like "internal:".
For what it's worth, if the filter worked like you suggest, it would be impossible to use a true relative URL for anything (though I confess that I can't think of a good reason you should want to).
In the end, it may be a matter of preference. I prefer to be explicit about what I want to be treated as an internal drupal path. Others may prefer to be able to use internal drupal paths as if they were URL's and have them converted to URLs automatically. It should be pretty trivial to modify the code to do that for anyone who is any good with regexp's.
#4
today, if i disable the filter my site is completely broken. Now if i don't have the filter, while you don't update the filter, my site gets broken on an update. i'm using the "internal:" not only for the switch from dev to live... the major reason is the aliases! The dev/live switch isn't related here. So, if my aliases are broken, this is nothing totally bad for the function of the site. the site is working as before - however bad it is to lose the aliases from the SEO side...
if you write your pages you only add links like "path/file", don't you? if we know what we know from the drupal side - this url will start with "node/" or generally it is a relative drupal path - every time. drupal generates absolute paths if you have a base_path variable, but this is only visible on the client, not in the filter process, is it? the url function attaches the base_path, not the filter itself.
i hope this makes things clearer...
#5
Even if the filter worked as you suggest, I believe your links will still break if you disable the filter. For example, suppose your $base_url='http://www.example.com' and node/1 is a page whose content includes <a href="node/2">. With the filter enabled, the link will point to the absolute URL "http://www.example.com/node/2". With the filter disabled, the link will point to "node/2", which your browser will resolve to "http://www.example.com/node/node/2" (broken), since it is relative to the current page (node/1).
It's this confusion between relative URLs (relative to the currently viewed page, resolved by the browser) and relative drupal paths (relative to $base_url, handled by the filter) that leads me to prefer the more explicit "internal:" for this filter to clearly identify it as the latter.
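The browser-side resolution described above can be checked with a small standalone sketch. It is written in Python rather than Drupal's PHP so that it runs on its own; the variable names are illustrative, and the values are the ones from the example (site at http://www.example.com, currently viewing node/1).

```python
from urllib.parse import urljoin

# Values taken from the example above; the names are illustrative only.
base_url = "http://www.example.com"
current_page = base_url + "/node/1"

# Filter disabled: the browser resolves href="node/2" against the page
# it is currently showing, following the standard relative-URL rules.
browser_resolved = urljoin(current_page, "node/2")

# Filter enabled: the same string is read as a Drupal path, i.e. relative
# to the site base rather than to the current page.
filter_resolved = base_url + "/" + "node/2"

print(browser_resolved)  # http://www.example.com/node/node/2
print(filter_resolved)   # http://www.example.com/node/2
```

The same string "node/2" ends up at two different addresses depending on who resolves it, which is the ambiguity the explicit prefix is meant to avoid.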
#6
it sounds like we are talking about different things or i simply don't understand you.
Step-by-step:
1. we have a production drupal install under domain http://www.example.com
2. base_path = /, therefore drupal is not installed inside a subdirectory
3. we create some dummy text pages or articles with auto generated drupal url's "node/1", "node/2". Additionally a "node/3" with an alias "test/example".
4. now we add an url to "node/2" body
<a href="internal:node/3">Aliastest</a>
5. now your filter comes into action, it sees "internal:node/3" and calls the drupal url function with the url
<?php url('node/3') ?>
6. all that your filter has done is grab/catch/extract the drupal url "node/3" out of the HTML body code
7. drupal's function url() does all the rest, looks up the alias table to find out the real path alias "test/example"
8. your filter replaces the "internal:node/3" string with the variable returned by drupal's url() function. whatever this variable contains at the end of the day... you get "test/example" on the client HTML code, isn't it?
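The steps above can be sketched roughly as follows. This is not Path Filter's actual code: it is a standalone Python illustration, and the `ALIASES` table, the toy `url()` function, and `filter_internal()` are hypothetical stand-ins for Drupal's alias table, Drupal's url(), and the filter.

```python
import re

# Hypothetical alias table: internal path -> alias (step 3).
ALIASES = {"node/3": "test/example"}
BASE_PATH = "/"  # step 2: Drupal is not installed in a subdirectory


def url(path):
    """Toy stand-in for Drupal's url(): alias lookup plus base path."""
    return BASE_PATH + ALIASES.get(path, path)


# Matches a double-quoted attribute value of the form "internal:<path>".
INTERNAL = re.compile(r'"internal:([^"]*)"')


def filter_internal(html):
    """Steps 5-8: extract each internal path and substitute url(path)."""
    return INTERNAL.sub(lambda m: '"' + url(m.group(1)) + '"', html)


body = '<a href="internal:node/3">Aliastest</a>'
print(filter_internal(body))  # <a href="/test/example">Aliastest</a>
```

Because the pattern keys on the prefix rather than on the attribute name, the same substitution applies to src as well as href.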
So i don't see why this *must* be named "internal:", if we do a regex search (or something different) for src="node/" we are fine. To make this generic, as you said there are some different, BUT well known internal (!) url names - it makes sense to put something like a config option to the filter to extend the regexp for e.g. . This follows all the same way, how you are able to configure a block for e.g. to be displayed on a specific page. I'm sure you have seen this administer > blocks > config named Show block on specific pages: with a textarea Pages, where i'm able to add the following list of URLs with wild cards.
node/*
taxonomy/*
blog/*
this way i'm able to specify which urls are looked up in the URL alias table and i'm fine. everything on the website is intact, whether the filter works or is broken, aside from the broken url aliases if the filter gets broken or is not installed on a system.
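A rough sketch of what such a wildcard-driven mode could look like, again in standalone Python rather than Drupal's PHP. The `PATTERNS` setting, the `ALIASES` table, and the functions are all hypothetical; no such option exists in the module as discussed here.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical filter setting: one wildcard pattern per entry,
# in the style of the block visibility "Pages" textarea.
PATTERNS = ["node/*", "taxonomy/*", "blog/*"]

# Hypothetical alias table: internal path -> alias.
ALIASES = {"node/3": "test/example"}


def url(path):
    """Toy stand-in for Drupal's url(), assuming base_path = "/"."""
    return "/" + ALIASES.get(path, path)


# Only href and src attributes with double-quoted values are considered.
ATTR = re.compile(r'\b(href|src)="([^"]*)"')


def rewrite(match):
    attr, value = match.group(1), match.group(2)
    # Rewrite only values that match one of the configured patterns.
    if any(fnmatchcase(value, pattern) for pattern in PATTERNS):
        return '%s="%s"' % (attr, url(value))
    return match.group(0)  # leave everything else untouched


def filter_paths(html):
    return ATTR.sub(rewrite, html)


print(filter_paths('<a href="node/3">Aliastest</a>'))
```

Anything not listed in the patterns (external URLs, admin/ paths, true relative URLs) passes through unchanged, so the list decides how much of the ambiguity raised in the earlier comments remains.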
#7
minor typo:
you get "/test/example" on the client HTML code, isn't it?
#8
The main reason in my view for explicitness is that we can much more easily find out where to apply PathFilter. My reason for starting to work on this module was for the aliases as well. If I want to change an alias I can use internal:node/8 and change the alias as much as my heart desires and not worry about it. The initial version of PathFilter just found all href="" and changed all relative links, but we found out that using internal: was the way to go for a few of the reasons RayZ pointed out. Also, remember that other pages besides node/X are aliased.
@hass - Open a feature request (even better a patch) for that blog/*, node/* alias and we'll look at that. If you open separate feature requests for any of this stuff it will help nail down specifically how you'd like PathFilter to change. We're always open to improvements.
I should be branching for D5 soon, not sure there are even any code changes though.
#9
I am working on a filter for cleaning up URLs as well as training users to input correct URLs. I don't like the idea of having to train my thousands of users to use "internal:", so that is clean out. In fact, I can only ever imagine developers using that syntax reliably, so your filter is only useful for sites on which content is created exclusively by developers.
The filter should handle URLs of the following common forms:
path
/path
http://base/url/path
https://base/url/path
Users should be encouraged to enter internal URLs in the canonical form, which I take to be the true, non-aliased path for the resource with no leading slash (ie, node/3). Aliases can't be relied upon because they may change, leaving a broken link. Whenever a URL is entered which is not in the canonical form for the resource, the filter should try to handle the URL correctly, but should ask the user to fix their error:
Or something. I'm not a copywriter, but I'll get someone to write the help text.
The user should be encouraged to use the canonical form, but a pretty, aliased form (the path above filtered through url()) should be the output. Comments, suggestions before implementing?
Eric
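One way the four URL forms listed in this comment might be reduced to a canonical internal path, as a standalone Python sketch. The `BASE_URL` value, the `ALIAS_TO_PATH` table, and `canonical_path()` are assumptions made for illustration and are not taken from any existing filter.

```python
from urllib.parse import urlsplit

# Hypothetical site base and alias table (alias -> real path).
BASE_URL = "http://www.example.com/drupal"
ALIAS_TO_PATH = {"test/example": "node/3"}


def canonical_path(href):
    """Return the canonical internal path for href, or None if external."""
    base = urlsplit(BASE_URL)
    parts = urlsplit(href)
    base_path = base.path.rstrip("/") + "/"

    if parts.scheme:
        # http://base/url/path and https://base/url/path
        if parts.scheme not in ("http", "https") or parts.netloc != base.netloc:
            return None
        path = parts.path
    elif parts.netloc:
        return None  # //host/path is treated as external in this sketch
    elif parts.path.startswith("/"):
        path = parts.path  # /path
    else:
        # Bare "path": read as a Drupal path relative to the site base.
        path = base_path + parts.path

    if not path.startswith(base_path):
        return None  # same host, but outside the Drupal installation
    internal = path[len(base_path):]
    # Map an alias back to the real path; real paths pass through as-is.
    return ALIAS_TO_PATH.get(internal, internal)


for href in ("node/3", "/drupal/test/example",
             "https://www.example.com/drupal/node/3"):
    print(href, "->", canonical_path(href))
```

Comparing the result with what the user typed would be one way to decide when to show the "please fix your link" hint.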
#10
nope, this is what i'm talking about... however i'm interested how you solve this with very good performance and 120.000 URL aliases - what i have :-).
#11
The filter would only affect performance when it is run (on add/edit submit/preview, and whenever the stored text is re-filtered for display), and the number of aliases shouldn't have a major impact on performance. There are lookups both ways (from alias to real and back) but the url_aliases table is indexed in both directions, so those lookups are fast enough. This filter won't fix up your giant url_aliases table.
#12
This filter is running on every page request as far as i know... not only on a submit/preview with edit/add! therefore this may impact performance very much! it makes no sense to have this filter only running on a save - if you change a path inside your site you don't want to save, let's say, 1000 nodes linking to this one...
#13
I second this request.
@RobRoy/RayZ: Is there a possibility to develop this within Path Filter module or should this be a new module? Your previous comments are implying that you rather would not like to see this in Path Filter.
@edrex: Do you already have code?
I would be happy to maintain such a filter.
http://drupal.org/node/201286 has been marked as duplicate of this issue.
#14
While I would not personally use the option, I wouldn't have a problem with someone else implementing this as an optional mode for Path Filter under one condition ...
Someone writes up a clear, comprehensive list of pros and cons to each mode to be included in the documentation. Even better would be to distill some sort of "best practices" document for handling linking to internal content, with input from the dev list, explaining why one of the modes is the default and under what circumstances you might want to use the other mode.
#15
Has there been any movement on this? I'd be very much in favour of a filter that checked every href="" in a filtered textarea for preferred paths (in addition to the existing "internal:" pattern).
#16
I'm pretty sure that what is being discussed here is the same as what has been developed over at #148300: [Push #1]: Automatic url conversion to internal links. Marking this as a duplicate.