Relative urls in feeds
| Project: | Drupal |
| Version: | 8.x-dev |
| Component: | aggregator.module |
| Category: | feature request |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | needs work |
The feeds Drupal creates include node->teaser or node->body (depending on configuration) in the description fields items. This text can contain relative links to other content on the same domain. Many aggregators, including Drupal's own and bloglines.com, can not handle these relative links. The RSS specification specifies that all links should be absolute.
The attached patch fixes the feed descriptions in format_rss_items (common.inc). It adds a function named 'filter_absolute_urls' to common.inc, which can also be used by aggregator.module and other modules that need absolute urls (such as modules that send out emails).
There are alternative approaches to this fix. One could be to use nodeapi 'rss items' hook, using a contrib module. However in the current implementation of this hook in node.module, modules can not modify the body and/or teaser of items. The cleanfeeds module uses this approach, and needs a patch for node.module to get it to work.
Another approach would be to enable the filter system for feeds, and make a filter which creates absolute urls. Having a 'filter_absolute_urls' function in common.inc would still be very usefull in this scenario.
The 'filter_absolute_urls' function can probably be made a little bit more efficient; I currently have two regexps, one for relative urls ("path/to/somewhere") and one for urls relative to the domain root ("/path/to/somewhere").
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| filter_absolute_urls.patch | 1.28 KB | Ignored | None | None |

#1
Same problem here...
when content is published on other sites, links are wrong as they don't contain a full url...
#2
I guess this issue is not getting much attention because its against 4.7.x as opposed to CVS.
I'll see if I can find time to install CVS and see if the problem persists in the current CVS. Nevertheless, this is an ugly issue in 4.7.x too, because the aggregator in Drupal is not even displaying its Drupal generated feeds properly.
#3
1. create a node
2. insert a relative link (eg. href="node/add")
3. access the rss.xml
4. links are still relative and must e absolute
#4
#5
This is a pretty major issue. Any update on it?
#6
The added function needs some phpDOC comments. This hasn't seen the light of day because the status is set to patch (code needs work) meaning that someone has reviewed the code and decided that it needed work. Others will not look until it is patch (code needs review). So once you've made the changes, change the status.
#7
Just pointing out that it seems like a relatively big issue to leave open. The RSS spec asks for absolute URLs and Drupal is producing relative ones -- so Drupal RSS feeds won't validate (or, in some cases, work at all).
FYI: to those that are really stuck with this problem, you can set the $base_url in settings.php and that will add a absolute path to all of your RSS feed links.
Phillip.
#8
I have copied this function on my common.inc (drupal 5.2) and after add the call on format_rss_item function in that way:
$output .= ' <description>'. check_plain(filter_absolute_urls($description)) ."</description>\n";and works!!! Now no problem with feed and feedburner!
Thanks!!!
Saluti
BES
#9
@ahoeben: Are you planning to provide a modified patch to provide the comments?
#10
I would like to see this get into 6.x first. Here are problems I see:
- Function documentation is incomplete. Need more @param and @return.
- Second return statement is never reached.
#11
Reroll, with some comments.
#12
#13
The token amount of documentation is just not enough.
Rename the parameter $url to $html, as you're not passing in one url at a time to this function, you're passing in html.
Change the documentation to something like:
@param $html
A block of html containing relative and absolute urls in 'src' and 'href' attributes
@param $absolute_url
An absolute URL that points to this Drupal website.
Is the second parameter really needed for this function? why can't we just use url('')?
#14
Cleanup and comments. Changed the name of the function -- you must not call this filter* something as it's outside of filter module namespace.
#15
tested with relative and absolute paths (e.g.: /node/add/story, node/add/story, fullurl.com/node/add/story), all three gives the same, correct results in the feed as absolute url.
#16
does it work relatively in a subdirectory?
If your install is in
drupaland you link to/drupal/node/adddoes it work?#17
my drupal installation was located under http://localhost/projects/drupal/drupal6 and it worked
#18
Thanks for picking this up! Had to take a break from Drupal for a while, sorry 'bout disappearing... Patch looks good, works for me.
#19
Conceptual issues:
- I can't sew how does this handle some situations, like if I have a link from http://example.com/news/local to "worldwide", which is a relative link to "http://example.com/news/worldwide".
- I see that the second preg matches a link to '/', but I found it hard to come up with examples myself for cases when the first matches or when the second matches. In other words: why do we need two different matchers?
- This function only matches lower case attribute names, which might not be that big of a problem for those following XHTML rules, but frankly, not everyone cares. I am not too comfortable with doing a case insensitive match, because it is a lot sloooooooower, then what we have now, but this still need to be on our mind.
Cosmetic issues:
- "html" should be "HTML" in comments
- The function does not concentrate on "a href", but rather "... href", so this should be corrected. Other tags have href attributes, so this is not exclusive to "a".
#20
Reroll of my original patch, with cosmetic changes and documentation.
The second argument to the function is useful to have:
#21
woops
#22
,$base_document misses a space.
#23
Before I roll a new patch for that space, any more opinions on whether or not we should do case insensitive pregs instead? I am partial to case insensitivity, since the result of this function will likely be cached anyway...
#24
New patch
#25
DrupalTestbedBot tested ahoeben's patch (http://drupal.org/files/issues/drupal_force_absolute_urls_1.patch), the patch failed. For more information visit http://testing.drupal.org/node/130
#26
Thank you, mister DrupalTestbedBod. Is this patch better?
#27
This isn't a bug, IMO. I'm not a big fan of regex'ing things into place.
The XML specification provides a clean solution for this; the xml:base attribute. The feeds generated by Drupal support include this attribute. What needs fixing is the aggregator module, not the feed generation.
In documentation, we write 'URL', not 'url'.
#28
Point taken for outgoing RSS feeds.
However, there are still situations where the suggested 'drupal_force_absolute_urls' function is useful; One such situation would be for applying the xml:base attribute to relative links on incoming RSS feeds.
IMHO it is still a critical bug for 6.x that Drupal can not correctly aggregate its own feeds. I'm a bit reluctant to mark this as an 'aggregator.module' bug, because there are other situations where the same problem exists and where xml:base is not a solution. Links in outgoing emails for example.
#29
subscribing
oh I just changed the version because I thought people wouldn't be working on it, if it's not the current version. n00b mistake apparently sorry :(
this is beyond my current capabilities and I hope it gets fixed!
#30
Version shouldn't have been changed.
#31
Is this still a problem? I bet it is! Someone please reroll it and add a test.
#32
Per Dries' comment #27, moving to aggregator.module. And, features are really 8.x now. Though if someone wants to try to argue that not implementing the aforementioned "xml:base attribute" part of the RSS spec is a bug, this might have a shot at 7.x. (This is frustrating, not a day goes by without broken links and images on Drupal Planet.)
Also marked #395764: Aggregator: Convert all relative URLs to absolute URLs in feed items as a duplicate.