Relative urls in feeds

ahoeben - October 9, 2006 - 08:52
Project:Drupal
Version:8.x-dev
Component:aggregator.module
Category:feature request
Priority:critical
Assigned:Unassigned
Status:needs work
Description

The feeds Drupal creates include node->teaser or node->body (depending on configuration) in the description fields items. This text can contain relative links to other content on the same domain. Many aggregators, including Drupal's own and bloglines.com, can not handle these relative links. The RSS specification specifies that all links should be absolute.

The attached patch fixes the feed descriptions in format_rss_items (common.inc). It adds a function named 'filter_absolute_urls' to common.inc, which can also be used by aggregator.module and other modules that need absolute urls (such as modules that send out emails).

There are alternative approaches to this fix. One could be to use nodeapi 'rss items' hook, using a contrib module. However in the current implementation of this hook in node.module, modules can not modify the body and/or teaser of items. The cleanfeeds module uses this approach, and needs a patch for node.module to get it to work.

Another approach would be to enable the filter system for feeds, and make a filter which creates absolute urls. Having a 'filter_absolute_urls' function in common.inc would still be very usefull in this scenario.

The 'filter_absolute_urls' function can probably be made a little bit more efficient; I currently have two regexps, one for relative urls ("path/to/somewhere") and one for urls relative to the domain root ("/path/to/somewhere").

AttachmentSizeStatusTest resultOperations
filter_absolute_urls.patch1.28 KBIgnoredNoneNone

#1

ednique - October 19, 2006 - 16:39

Same problem here...

when content is published on other sites, links are wrong as they don't contain a full url...

#2

ahoeben - October 22, 2006 - 15:13

I guess this issue is not getting much attention because its against 4.7.x as opposed to CVS.

I'll see if I can find time to install CVS and see if the problem persists in the current CVS. Nevertheless, this is an ugly issue in 4.7.x too, because the aggregator in Drupal is not even displaying its Drupal generated feeds properly.

#3

magico - January 18, 2007 - 10:55
Version:4.7.3» 5.x-dev

1. create a node
2. insert a relative link (eg. href="node/add")
3. access the rss.xml
4. links are still relative and must e absolute

#4

dmitrig01 - February 17, 2007 - 20:44
Version:5.x-dev» 6.x-dev
Category:bug report» feature request
Priority:normal» critical

#5

phillipadsmith - March 2, 2007 - 01:20

This is a pretty major issue. Any update on it?

#6

earnie - March 2, 2007 - 12:44

The added function needs some phpDOC comments. This hasn't seen the light of day because the status is set to patch (code needs work) meaning that someone has reviewed the code and decided that it needed work. Others will not look until it is patch (code needs review). So once you've made the changes, change the status.

#7

phillipadsmith - March 2, 2007 - 13:33

Just pointing out that it seems like a relatively big issue to leave open. The RSS spec asks for absolute URLs and Drupal is producing relative ones -- so Drupal RSS feeds won't validate (or, in some cases, work at all).

FYI: to those that are really stuck with this problem, you can set the $base_url in settings.php and that will add a absolute path to all of your RSS feed links.

Phillip.

#8

H-BES - September 6, 2007 - 15:40
Version:6.x-dev» 5.2
Category:feature request» bug report

I have copied this function on my common.inc (drupal 5.2) and after add the call on format_rss_item function in that way:

$output .= ' <description>'. check_plain(filter_absolute_urls($description)) ."</description>\n";

and works!!! Now no problem with feed and feedburner!

Thanks!!!

Saluti
BES

#9

earnie - September 6, 2007 - 16:04

@ahoeben: Are you planning to provide a modified patch to provide the comments?

#10

drumm - September 12, 2007 - 06:29
Version:5.2» 6.x-dev

I would like to see this get into 6.x first. Here are problems I see:
- Function documentation is incomplete. Need more @param and @return.
- Second return statement is never reached.

#11

Wim Leers - September 19, 2007 - 19:19

Reroll, with some comments.

AttachmentSizeStatusTest resultOperations
filter_absolute_urls_0.patch1.68 KBIgnoredNoneNone

#12

Wim Leers - September 19, 2007 - 19:21
Assigned to:Anonymous» Wim Leers
Status:needs work» needs review

#13

Steven Jones - September 19, 2007 - 22:18
Status:needs review» needs work

The token amount of documentation is just not enough.

Rename the parameter $url to $html, as you're not passing in one url at a time to this function, you're passing in html.

Change the documentation to something like:

@param $html
A block of html containing relative and absolute urls in 'src' and 'href' attributes
@param $absolute_url
An absolute URL that points to this Drupal website.

Is the second parameter really needed for this function? why can't we just use url('')?

#14

chx - September 20, 2007 - 22:02
Assigned to:Wim Leers» chx
Status:needs work» needs review

Cleanup and comments. Changed the name of the function -- you must not call this filter* something as it's outside of filter module namespace.

AttachmentSizeStatusTest resultOperations
absolute-urls-88183-14.patch1.4 KBIgnoredNoneNone

#15

snufkin - September 21, 2007 - 12:55
Status:needs review» reviewed & tested by the community

tested with relative and absolute paths (e.g.: /node/add/story, node/add/story, fullurl.com/node/add/story), all three gives the same, correct results in the feed as absolute url.

#16

dmitrig01 - September 22, 2007 - 17:02

does it work relatively in a subdirectory?
If your install is in drupal and you link to /drupal/node/add does it work?

#17

snufkin - September 23, 2007 - 09:35

my drupal installation was located under http://localhost/projects/drupal/drupal6 and it worked

#18

ahoeben - September 23, 2007 - 09:59

Thanks for picking this up! Had to take a break from Drupal for a while, sorry 'bout disappearing... Patch looks good, works for me.

#19

Gábor Hojtsy - September 25, 2007 - 13:05
Status:reviewed & tested by the community» needs review

Conceptual issues:

- I can't sew how does this handle some situations, like if I have a link from http://example.com/news/local to "worldwide", which is a relative link to "http://example.com/news/worldwide".
- I see that the second preg matches a link to '/', but I found it hard to come up with examples myself for cases when the first matches or when the second matches. In other words: why do we need two different matchers?
- This function only matches lower case attribute names, which might not be that big of a problem for those following XHTML rules, but frankly, not everyone cares. I am not too comfortable with doing a case insensitive match, because it is a lot sloooooooower, then what we have now, but this still need to be on our mind.

Cosmetic issues:

- "html" should be "HTML" in comments
- The function does not concentrate on "a href", but rather "... href", so this should be corrected. Other tags have href attributes, so this is not exclusive to "a".

#20

ahoeben - September 26, 2007 - 08:47

Reroll of my original patch, with cosmetic changes and documentation.

The second argument to the function is useful to have:

  • The ' http://example.com/news/local to "worldwide" ' scenario: Relative urls in a Drupal page are not relative to $base_url. It's bad practice to use relative urls in Drupal pages (we should use relative-to-the-document-root urls instead), but some people still do. Having the link to the drupal page as a 'relative' instead of $base_url at least makes the rss feed behave the same as the original document.
  • When the function is used in an aggregator module with incoming relative links, the links are relative to the source specified in the incoming feed, not to the Drupal site.
AttachmentSizeStatusTest resultOperations
drupal_force_absolute_urls.patch1.63 KBIgnoredNoneNone

#21

ahoeben - September 26, 2007 - 08:51

woops

AttachmentSizeStatusTest resultOperations
drupal_force_absolute_urls_0.patch1.63 KBIgnoredNoneNone

#22

chx - September 26, 2007 - 18:45

,$base_document misses a space.

#23

ahoeben - September 27, 2007 - 16:01

Before I roll a new patch for that space, any more opinions on whether or not we should do case insensitive pregs instead? I am partial to case insensitivity, since the result of this function will likely be cached anyway...

#24

ahoeben - October 11, 2007 - 14:08

New patch

  • Combines two regexps in one preg_replace call
  • Makes regexps case sensitive for people using HreF= or SRC=
  • Fixes relative links to http://example.com/node/1 (makes them relative to http://example.com/node)
  • Adds missing space (thanks chx ;-)
AttachmentSizeStatusTest resultOperations
drupal_force_absolute_urls_1.patch1.71 KBIgnoredNoneNone

#25

System Message - October 11, 2007 - 17:31

DrupalTestbedBot tested ahoeben's patch (http://drupal.org/files/issues/drupal_force_absolute_urls_1.patch), the patch failed. For more information visit http://testing.drupal.org/node/130

#26

ahoeben - October 12, 2007 - 09:08

Thank you, mister DrupalTestbedBod. Is this patch better?

AttachmentSizeStatusTest resultOperations
drupal_force_absolute_urls_2.patch1.67 KBIgnoredNoneNone

#27

Dries - October 12, 2007 - 14:59
Version:6.x-dev» 7.x-dev
Category:bug report» feature request
Status:needs review» needs work

This isn't a bug, IMO. I'm not a big fan of regex'ing things into place.

The XML specification provides a clean solution for this; the xml:base attribute. The feeds generated by Drupal support include this attribute. What needs fixing is the aggregator module, not the feed generation.

In documentation, we write 'URL', not 'url'.

#28

ahoeben - October 12, 2007 - 15:47

Point taken for outgoing RSS feeds.

However, there are still situations where the suggested 'drupal_force_absolute_urls' function is useful; One such situation would be for applying the xml:base attribute to relative links on incoming RSS feeds.

IMHO it is still a critical bug for 6.x that Drupal can not correctly aggregate its own feeds. I'm a bit reluctant to mark this as an 'aggregator.module' bug, because there are other situations where the same problem exists and where xml:base is not a solution. Links in outgoing emails for example.

#29

wolfderby - December 15, 2007 - 04:37
Version:7.x-dev» 5.x-dev

subscribing

oh I just changed the version because I thought people wouldn't be working on it, if it's not the current version. n00b mistake apparently sorry :(

this is beyond my current capabilities and I hope it gets fixed!

#30

Wim Leers - December 14, 2007 - 12:36
Version:5.x-dev» 7.x-dev

Version shouldn't have been changed.

#31

chx - May 22, 2009 - 18:32
Assigned to:chx» Anonymous

Is this still a problem? I bet it is! Someone please reroll it and add a test.

#32

grendzy - October 16, 2009 - 04:42
Version:7.x-dev» 8.x-dev
Component:base system» aggregator.module

Per Dries' comment #27, moving to aggregator.module. And, features are really 8.x now. Though if someone wants to try to argue that not implementing the aforementioned "xml:base attribute" part of the RSS spec is a bug, this might have a shot at 7.x. (This is frustrating, not a day goes by without broken links and images on Drupal Planet.)

Also marked #395764: Aggregator: Convert all relative URLs to absolute URLs in feed items as a duplicate.

 
 

Drupal is a registered trademark of Dries Buytaert.