Project:Drupal core
Version:8.x-dev
Component:aggregator.module
Category:bug report
Priority:normal
Assigned:Unassigned
Status:needs review

Issue Summary

The feeds Drupal creates include node->teaser or node->body (depending on configuration) in the description fields items. This text can contain relative links to other content on the same domain. Many aggregators, including Drupal's own and bloglines.com, can not handle these relative links. The RSS specification specifies that all links should be absolute.

The attached patch fixes the feed descriptions in format_rss_items (common.inc). It adds a function named 'filter_absolute_urls' to common.inc, which can also be used by aggregator.module and other modules that need absolute urls (such as modules that send out emails).

There are alternative approaches to this fix. One could be to use nodeapi 'rss items' hook, using a contrib module. However in the current implementation of this hook in node.module, modules can not modify the body and/or teaser of items. The cleanfeeds module uses this approach, and needs a patch for node.module to get it to work.

Another approach would be to enable the filter system for feeds, and make a filter which creates absolute urls. Having a 'filter_absolute_urls' function in common.inc would still be very usefull in this scenario.

The 'filter_absolute_urls' function can probably be made a little bit more efficient; I currently have two regexps, one for relative urls ("path/to/somewhere") and one for urls relative to the domain root ("/path/to/somewhere").

AttachmentSizeStatusTest resultOperations
filter_absolute_urls.patch1.28 KBIdleFAILED: [[SimpleTest]]: [MySQL] Invalid patch format in filter_absolute_urls.patch.View details | Re-test

Comments

#1

Same problem here...

when content is published on other sites, links are wrong as they don't contain a full url...

#2

I guess this issue is not getting much attention because its against 4.7.x as opposed to CVS.

I'll see if I can find time to install CVS and see if the problem persists in the current CVS. Nevertheless, this is an ugly issue in 4.7.x too, because the aggregator in Drupal is not even displaying its Drupal generated feeds properly.

#3

Version:4.7.3» 5.x-dev

1. create a node
2. insert a relative link (eg. href="node/add")
3. access the rss.xml
4. links are still relative and must e absolute

#4

Version:5.x-dev» 6.x-dev
Category:bug report» feature request
Priority:normal» critical

#5

This is a pretty major issue. Any update on it?

#6

The added function needs some phpDOC comments. This hasn't seen the light of day because the status is set to patch (code needs work) meaning that someone has reviewed the code and decided that it needed work. Others will not look until it is patch (code needs review). So once you've made the changes, change the status.

#7

Just pointing out that it seems like a relatively big issue to leave open. The RSS spec asks for absolute URLs and Drupal is producing relative ones -- so Drupal RSS feeds won't validate (or, in some cases, work at all).

FYI: to those that are really stuck with this problem, you can set the $base_url in settings.php and that will add a absolute path to all of your RSS feed links.

Phillip.

#8

Version:6.x-dev» 5.2
Category:feature request» bug report

I have copied this function on my common.inc (drupal 5.2) and after add the call on format_rss_item function in that way:

$output .= ' <description>'. check_plain(filter_absolute_urls($description)) ."</description>\n";

and works!!! Now no problem with feed and feedburner!

Thanks!!!

Saluti
BES

#9

@ahoeben: Are you planning to provide a modified patch to provide the comments?

#10

Version:5.2» 6.x-dev

I would like to see this get into 6.x first. Here are problems I see:
- Function documentation is incomplete. Need more @param and @return.
- Second return statement is never reached.

#11

Reroll, with some comments.

AttachmentSizeStatusTest resultOperations
filter_absolute_urls_0.patch1.68 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch filter_absolute_urls_0.patch.View details | Re-test

#12

Assigned to:Anonymous» Wim Leers
Status:needs work» needs review

#13

Status:needs review» needs work

The token amount of documentation is just not enough.

Rename the parameter $url to $html, as you're not passing in one url at a time to this function, you're passing in html.

Change the documentation to something like:

@param $html
A block of html containing relative and absolute urls in 'src' and 'href' attributes
@param $absolute_url
An absolute URL that points to this Drupal website.

Is the second parameter really needed for this function? why can't we just use url('')?

#14

Assigned to:Wim Leers» chx
Status:needs work» needs review

Cleanup and comments. Changed the name of the function -- you must not call this filter* something as it's outside of filter module namespace.

AttachmentSizeStatusTest resultOperations
absolute-urls-88183-14.patch1.4 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch absolute-urls-88183-14.patch.View details | Re-test

#15

Status:needs review» reviewed & tested by the community

tested with relative and absolute paths (e.g.: /node/add/story, node/add/story, fullurl.com/node/add/story), all three gives the same, correct results in the feed as absolute url.

#16

does it work relatively in a subdirectory?
If your install is in drupal and you link to /drupal/node/add does it work?

#17

my drupal installation was located under http://localhost/projects/drupal/drupal6 and it worked

#18

Thanks for picking this up! Had to take a break from Drupal for a while, sorry 'bout disappearing... Patch looks good, works for me.

#19

Status:reviewed & tested by the community» needs review

Conceptual issues:

- I can't sew how does this handle some situations, like if I have a link from http://example.com/news/local to "worldwide", which is a relative link to "http://example.com/news/worldwide".
- I see that the second preg matches a link to '/', but I found it hard to come up with examples myself for cases when the first matches or when the second matches. In other words: why do we need two different matchers?
- This function only matches lower case attribute names, which might not be that big of a problem for those following XHTML rules, but frankly, not everyone cares. I am not too comfortable with doing a case insensitive match, because it is a lot sloooooooower, then what we have now, but this still need to be on our mind.

Cosmetic issues:

- "html" should be "HTML" in comments
- The function does not concentrate on "a href", but rather "... href", so this should be corrected. Other tags have href attributes, so this is not exclusive to "a".

#20

Reroll of my original patch, with cosmetic changes and documentation.

The second argument to the function is useful to have:

  • The ' http://example.com/news/local to "worldwide" ' scenario: Relative urls in a Drupal page are not relative to $base_url. It's bad practice to use relative urls in Drupal pages (we should use relative-to-the-document-root urls instead), but some people still do. Having the link to the drupal page as a 'relative' instead of $base_url at least makes the rss feed behave the same as the original document.
  • When the function is used in an aggregator module with incoming relative links, the links are relative to the source specified in the incoming feed, not to the Drupal site.
AttachmentSizeStatusTest resultOperations
drupal_force_absolute_urls.patch1.63 KBIdleFAILED: [[SimpleTest]]: [MySQL] Invalid patch format in drupal_force_absolute_urls.patch.View details | Re-test

#21

woops

AttachmentSizeStatusTest resultOperations
drupal_force_absolute_urls_0.patch1.63 KBIdleFAILED: [[SimpleTest]]: [MySQL] Invalid patch format in drupal_force_absolute_urls_0.patch.View details | Re-test

#22

,$base_document misses a space.

#23

Before I roll a new patch for that space, any more opinions on whether or not we should do case insensitive pregs instead? I am partial to case insensitivity, since the result of this function will likely be cached anyway...

#24

New patch

  • Combines two regexps in one preg_replace call
  • Makes regexps case sensitive for people using HreF= or SRC=
  • Fixes relative links to http://example.com/node/1 (makes them relative to http://example.com/node)
  • Adds missing space (thanks chx ;-)
AttachmentSizeStatusTest resultOperations
drupal_force_absolute_urls_1.patch1.71 KBIdleFAILED: [[SimpleTest]]: [MySQL] Invalid patch format in drupal_force_absolute_urls_1.patch.View details | Re-test

#25

DrupalTestbedBot tested ahoeben's patch (http://drupal.org/files/issues/drupal_force_absolute_urls_1.patch), the patch failed. For more information visit http://testing.drupal.org/node/130

#26

Thank you, mister DrupalTestbedBod. Is this patch better?

AttachmentSizeStatusTest resultOperations
drupal_force_absolute_urls_2.patch1.67 KBIdleFAILED: [[SimpleTest]]: [MySQL] Invalid patch format in drupal_force_absolute_urls_2.patch.View details | Re-test

#27

Version:6.x-dev» 7.x-dev
Category:bug report» feature request
Status:needs review» needs work

This isn't a bug, IMO. I'm not a big fan of regex'ing things into place.

The XML specification provides a clean solution for this; the xml:base attribute. The feeds generated by Drupal support include this attribute. What needs fixing is the aggregator module, not the feed generation.

In documentation, we write 'URL', not 'url'.

#28

Point taken for outgoing RSS feeds.

However, there are still situations where the suggested 'drupal_force_absolute_urls' function is useful; One such situation would be for applying the xml:base attribute to relative links on incoming RSS feeds.

IMHO it is still a critical bug for 6.x that Drupal can not correctly aggregate its own feeds. I'm a bit reluctant to mark this as an 'aggregator.module' bug, because there are other situations where the same problem exists and where xml:base is not a solution. Links in outgoing emails for example.

#29

Version:7.x-dev» 5.x-dev

subscribing

oh I just changed the version because I thought people wouldn't be working on it, if it's not the current version. n00b mistake apparently sorry :(

this is beyond my current capabilities and I hope it gets fixed!

#30

Version:5.x-dev» 7.x-dev

Version shouldn't have been changed.

#31

Assigned to:chx» Anonymous

Is this still a problem? I bet it is! Someone please reroll it and add a test.

#32

Version:7.x-dev» 8.x-dev
Component:base system» aggregator.module

Per Dries' comment #27, moving to aggregator.module. And, features are really 8.x now. Though if someone wants to try to argue that not implementing the aforementioned "xml:base attribute" part of the RSS spec is a bug, this might have a shot at 7.x. (This is frustrating, not a day goes by without broken links and images on Drupal Planet.)

Also marked #395764: Aggregator: Convert all relative URLs to absolute URLs in feed items as a duplicate.

#33

Version:8.x-dev» 7.x-dev

Oh, its totally a bug. Even to be backported to Drupal 6 if at all possible.

#34

Category:feature request» bug report

#35

#395764: Aggregator: Convert all relative URLs to absolute URLs in feed items has a working patch for Drupal 7 that only needs tests. Asked @brianV to move the patch over.

#36

As requested by Gabor, here is the patch from the other issue.

Note that this patch applied cleanly in March, 2009, so is going to need to be updated to current HEAD. Also, it needs a test case.

I would appreciate if someone else take it over from here, as I don't have the free time to focus on it for the next while.

AttachmentSizeStatusTest resultOperations
aggregator-fix-relative-links-rev-5.patch2.74 KBIdlePASSED: [[SimpleTest]]: [MySQL] 22,861 pass(es).View details | Re-test

#37

Priority:critical» major

+++ modules/aggregator/aggregator.parser.inc 24 Mar 2009 06:34:05 -0000
@@ -325,3 +330,40 @@ function aggregator_parse_w3cdtf($date_s
+ * Convert all relative links contained in the src and href elements within ¶
+ * a feed item's description to absolute links.

1) Trailing white-space.

2) See http://drupal.org/node/1354 for code documentation standards.

+++ modules/aggregator/aggregator.parser.inc 24 Mar 2009 06:34:05 -0000
@@ -325,3 +330,40 @@ function aggregator_parse_w3cdtf($date_s
+  $item_url_parts = parse_url($link);

You likely want to use drupal_parse_url(), which probably simplifies this code a lot.

Powered by Dreditor.

#38

Status:needs work» needs review

I found some time after all to produce an updated patch w/ sun's feedback. Note that there is still no test case.

AttachmentSizeStatusTest resultOperations
aggregator-fix-relative-links-88183-rev6.patch2.19 KBIdlePASSED: [[SimpleTest]]: [MySQL] 22,855 pass(es).View details | Re-test

#39

Since I know it's only a matter of time until it's pointed out - rev7 is attached which fixes the extra whitespace...

AttachmentSizeStatusTest resultOperations
aggregator-fix-relative-links-88183-rev7.patch2.18 KBIdlePASSED: [[SimpleTest]]: [MySQL] 22,862 pass(es).View details | Re-test

#40

Status:needs review» needs work

Setting to CNW. I realize that I forgot about the case where things are relative to the root - they should be against the working dir.

#41

Status:needs work» needs review

Here is a new revision. I went back to the old parse_url() logic, as that drupal_parse_url() doesn't give the base url of the link, which is needed in one of the two use cases.

AttachmentSizeStatusTest resultOperations
aggregator-fix-relative-links-88183-rev8.patch4.78 KBIdlePASSED: [[SimpleTest]]: [MySQL] 22,847 pass(es).View details | Re-test

#42

*facepalm*

Ignore that last patch. Sorry about wasting your time and attention. I'll have a new roll on Monday when I'm a little less burnt out.

#43

Priority:major» normal

This bug seems to exists for a long time already, so the major priority isn't qualified.

+++ aggregator-fix-relative-links-88183-rev7.patch
@@ -0,0 +1,56 @@
+ ¶
++    // Try to replace all relative links with absolute links.

1) Trailing white-space.

2) Duplicate + signs indicate a wrongly re-rolled patch.

Powered by Dreditor.

#44

Version:7.x-dev» 8.x-dev
nobody click here