A module of mine had a bug in it and, after sometime I decided to fix it. The problem was my code failed to handle redirects correctly. So I decided to switch to using drupal_http_request(). Useful little function indeed. But then, after deleting all my old cak code I was bemused, it still failed to handle redirects. Odd, what was going on, was the real bug somewhere else?

Well, no, as it turns out (after some print_r madness) it appears drupal_http_request() wasn't handling redirects either (for totally different reasons to my own cak handed code).

Here's what I found. Using this as a test URL http://www.alliance-leicester.co.uk which will redirect you to http://www.alliance-leicester.co.uk/home/index.asp?

Using wget -O > /dev/null this is what I get:-

Resolving www.alliance-leicester.co.uk... 194.130.105.121
Connecting to www.alliance-leicester.co.uk|194.130.105.121|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /home/index.asp [following]
--18:56:05--  http://www.alliance-leicester.co.uk/home/index.asp
           => `-'
Reusing existing connection to www.alliance-leicester.co.uk:80.
HTTP request sent, awaiting response... 200 OK
Length: 28,976 (28K) [text/html]

Notice Location: /home/index.asp isn't a fully qualified URL, it's relying on on the parent/original URL for scheme and host. Now look at what drupal_http_request() does, it attempts to go straight there. But drupal_http_request() (recursively) requires a fully quallified URL.

The attached patch checks the URI in the Location for scheme and host. If they are missing it uses the original scheme/host details supplied.

After applying this patch/mod my module began working just fine and redirects are handled.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Gábor Hojtsy’s picture

Status: Needs review » Needs work

This looks straightforward and logical, BUT:

1. HTTP 1.1 defines Location to be an absolute URI, so it should be that: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30
2. Even if it could be a relative URI, if we don't find a protocol, we should only use the path from the response IMHO, as returning a domain and a path without a protocol seems like an extremely broken response.

AjK’s picture

  1. Yup, I understand about the standard but there's obviously a lot of "non-standard" sites out there. If everyone insisted on using standards only systems no one would use IE ;)
  2. Not sure I understand your second point. The patch ensures a protocol is used, either the supplied in the Location value or, if missing, that of the parent URL. If you can explain your concerns a bit more I'll redo that to address them.
Gábor Hojtsy’s picture

To clarify (2): Now you use the original protocol if a new one was not provided, use the original domain if a new one was not provided (what about the port??), and you use the path if a new one was not provided. We'd like to support cases where only a path was provided. Is is realistic to support the case when a domain and a path was provided, but not a protocol? That was the concern.

AjK’s picture

Status: Needs work » Needs review
FileSize
1.35 KB

Hopefully a better one that addresses (2).

This patch tests for scheme and host in the Location header. If these two compontents are not provided attempt to prepend sheme/host/port based on the parent URL and use that.

Dries’s picture

- We'll want to document this better in the code. Please sprinkle with some code comments.

- I wonder if that code is secure. Can we use the raw data like that?

AjK’s picture

Status: Needs review » Needs work

- I will add some comments to make it clearer.

- From a "secure" point of view the old code did basically the same thing (passed the user input "Location" field) recursively along. I'll have a closer look at what possibilites exist to exploit the code "as it is now" and how it would become after the patch.

janusman’s picture

subscribing

Damien Tournoud’s picture

Title: drupal_http_request() does fully handle redirects » drupal_http_request() does handle (invalid) non-absolute redirects
Version: 6.x-dev » 7.x-dev

Bumping to 7.x now.

hass’s picture

Uhhh yeah that's also one of my problems - I expected this issue when I started developing of Linkchecker module with drupal_http_request() function, but haven't yet found the time to test. Thank you very much for saving time.

@AjK: Are you working on the patch? We should also backport this patch to D5 and D6 asap.

hass’s picture

I have found another funny server with it's own rules...

--00:00:00--  http://www.oldiemuseum.de/
           => `index.html'
Resolving www.oldiemuseum.de... 212.80.228.99
Connecting to www.oldiemuseum.de|212.80.228.99|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: ../cms [following]
--00:00:00--  http://www.oldiemuseum.de/../cms
           => `cms'
Connecting to www.oldiemuseum.de|212.80.228.99|:80... connected.
HTTP request sent, awaiting response... 400 Bad Request
00:00:00 ERROR 400: Bad Request.
janusman’s picture

FileSize
1.02 KB
janusman’s picture

Status: Needs work » Needs review
FileSize
1.02 KB

Drat, forgot to mark as needs review.

Status: Needs review » Needs work

The last submitted patch failed testing.

Kjartan’s picture

Version: 7.x-dev » 8.x-dev

If you are adding support for Location: /path you probably need to make a full path handler as if you are allowing values that don't respect the standard you should support other relative paths such as Location: ../path and Location: ./path. This is possible, but as an alternative we can just return an error when the redirect isn't valid.

hass’s picture

hass’s picture

Can someone verify if Guzzle has support for this? Than we can move the case back to D7.

mikeytown2’s picture

hass’s picture

Cool, how about HTTPRL?

mikeytown2’s picture

hass’s picture

Thanks. I will update linkchecker readme to point this out. Have you seen #10? Codewise HTTPRL looks not able to resolve this, but it maybe really an edge case that we don't need to worry about.

jcisio’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
1.12 KB

Reroll for D7

Status: Needs review » Needs work

The last submitted patch, 21: 164365-drupal_http_request-relative-21.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 21: 164365-drupal_http_request-relative-21.patch, failed testing.

jcisio’s picture

Status: Needs work » Needs review
FileSize
1.12 KB

Fixed tests.

Status: Needs review » Needs work

The last submitted patch, 24: 164365-drupal_http_request-relative-23-D7.patch, failed testing.

jcisio’s picture

I think we should remove failed tests, because those tests suppose that we can't use relative urls.

amitaibu’s picture

Status: Needs work » Needs review
FileSize
754 bytes

Coming here from #2164219: Meta: Feeds "looking" broken due to -1002 missing schema..

Attached a simpler patch.

Status: Needs review » Needs work

The last submitted patch, 27: 164365-drupal_http_request-relative-27.patch, failed testing.

amitaibu’s picture

Oops, fix typo in variable name.

amitaibu’s picture

Status: Needs work » Needs review

... and correct status.

Status: Needs review » Needs work

The last submitted patch, 29: 164365-drupal_http_request-relative-29.patch, failed testing.

amitaibu’s picture

Status: Needs work » Needs review
FileSize
2.37 KB

Adapting tests to the change

Status: Needs review » Needs work

The last submitted patch, 32: 164365-drupal_http_request-relative-32.patch, failed testing.

amitaibu’s picture

Status: Needs work » Needs review
FileSize
2.4 KB
590 bytes

This patch should fix the failing tests.

amitaibu’s picture

Tests are passing :)

hass’s picture

Status: Needs review » Needs work

This looks not like a correct patch as it ignores the 3 existing cases.

This get's broken by latest patch (status -1002 is no longer tested):

function system_test_redirect_noscheme() {
  header("Location: localhost/path", TRUE, 301);
  exit;
}

New examples for case #164365:

function system_test_redirect_relative_path() {
  header("Location: ../path", TRUE, 301);
  exit;
}

function system_test_redirect_relative_path_traversal1() {
  header("Location: ../../../../../path", TRUE, 301);
  exit;
}

function system_test_redirect_relative_path_traversal2() {
  header("Location: ../foo/../../../path", TRUE, 301);
  exit;
}

function system_test_redirect_absolute_path() {
  header("Location: /foo/path", TRUE, 301);
  exit;
}

And we may also need to think about protocol relative urls. I believe a fallback to http:// should be useful or we need a new parameter like default_scheme.

Dave Reid’s picture

Title: drupal_http_request() does handle (invalid) non-absolute redirects » drupal_http_request() does handle (invalid) non-absolute redirects (RFC 7231)

Adding the specific RFC keyword since it was missing in my search results

pacproduct’s picture

Following patch does not address @hass's remarks. It's a simple reroll of #34 so the patch can be applied on Drupal 7.67.

pacproduct’s picture

apaderno’s picture

Issue tags: +affects drupal.org

This bug affects Drupal.org too, for example Planet Drupal, which uses the Aggregator module.