A module of mine had a bug in it and, after sometime I decided to fix it. The problem was my code failed to handle redirects correctly. So I decided to switch to using drupal_http_request(). Useful little function indeed. But then, after deleting all my old cak code I was bemused, it still failed to handle redirects. Odd, what was going on, was the real bug somewhere else?
Well, no, as it turns out (after some print_r madness) it appears drupal_http_request() wasn't handling redirects either (for totally different reasons to my own cak handed code).
Here's what I found. Using this as a test URL http://www.alliance-leicester.co.uk which will redirect you to http://www.alliance-leicester.co.uk/home/index.asp?
Using wget -O > /dev/null this is what I get:-
Resolving www.alliance-leicester.co.uk... 194.130.105.121
Connecting to www.alliance-leicester.co.uk|194.130.105.121|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /home/index.asp [following]
--18:56:05-- http://www.alliance-leicester.co.uk/home/index.asp
=> `-'
Reusing existing connection to www.alliance-leicester.co.uk:80.
HTTP request sent, awaiting response... 200 OK
Length: 28,976 (28K) [text/html]
Notice Location: /home/index.asp isn't a fully qualified URL, it's relying on on the parent/original URL for scheme and host. Now look at what drupal_http_request() does, it attempts to go straight there. But drupal_http_request() (recursively) requires a fully quallified URL.
The attached patch checks the URI in the Location for scheme and host. If they are missing it uses the original scheme/host details supplied.
After applying this patch/mod my module began working just fine and redirects are handled.
Comment | File | Size | Author |
---|---|---|---|
#39 | 164365-drupal_http_request-relative-39.patch | 2.36 KB | pacproduct |
#34 | interdiff.txt | 590 bytes | amitaibu |
#34 | 164365-drupal_http_request-relative-34.patch | 2.4 KB | amitaibu |
#32 | 164365-drupal_http_request-relative-32.patch | 2.37 KB | amitaibu |
#29 | 164365-drupal_http_request-relative-29.patch | 754 bytes | amitaibu |
Comments
Comment #1
Gábor HojtsyThis looks straightforward and logical, BUT:
1. HTTP 1.1 defines Location to be an absolute URI, so it should be that: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30
2. Even if it could be a relative URI, if we don't find a protocol, we should only use the path from the response IMHO, as returning a domain and a path without a protocol seems like an extremely broken response.
Comment #2
AjK CreditAttribution: AjK commentedComment #3
Gábor HojtsyTo clarify (2): Now you use the original protocol if a new one was not provided, use the original domain if a new one was not provided (what about the port??), and you use the path if a new one was not provided. We'd like to support cases where only a path was provided. Is is realistic to support the case when a domain and a path was provided, but not a protocol? That was the concern.
Comment #4
AjK CreditAttribution: AjK commentedHopefully a better one that addresses (2).
This patch tests for scheme and host in the Location header. If these two compontents are not provided attempt to prepend sheme/host/port based on the parent URL and use that.
Comment #5
Dries CreditAttribution: Dries commented- We'll want to document this better in the code. Please sprinkle with some code comments.
- I wonder if that code is secure. Can we use the raw data like that?
Comment #6
AjK CreditAttribution: AjK commented- I will add some comments to make it clearer.
- From a "secure" point of view the old code did basically the same thing (passed the user input "Location" field) recursively along. I'll have a closer look at what possibilites exist to exploit the code "as it is now" and how it would become after the patch.
Comment #7
janusman CreditAttribution: janusman commentedsubscribing
Comment #8
Damien Tournoud CreditAttribution: Damien Tournoud commentedBumping to 7.x now.
Comment #9
hass CreditAttribution: hass commentedUhhh yeah that's also one of my problems - I expected this issue when I started developing of Linkchecker module with
drupal_http_request()
function, but haven't yet found the time to test. Thank you very much for saving time.@AjK: Are you working on the patch? We should also backport this patch to D5 and D6 asap.
Comment #10
hass CreditAttribution: hass commentedI have found another funny server with it's own rules...
Comment #11
janusman CreditAttribution: janusman commentedComment #12
janusman CreditAttribution: janusman commentedDrat, forgot to mark as needs review.
Comment #14
Kjartan CreditAttribution: Kjartan commentedIf you are adding support for
Location: /path
you probably need to make a full path handler as if you are allowing values that don't respect the standard you should support other relative paths such asLocation: ../path
andLocation: ./path
. This is possible, but as an alternative we can just return an error when the redirect isn't valid.Comment #15
hass CreditAttribution: hass commentedMarked #1212740: Relative URIs as redirects as duplicate.
Comment #16
hass CreditAttribution: hass commentedCan someone verify if Guzzle has support for this? Than we can move the case back to D7.
Comment #17
mikeytown2 CreditAttribution: mikeytown2 commentedhttps://github.com/guzzle/guzzle/blob/master/src/Guzzle/Http/RedirectPlu...
Comment #18
hass CreditAttribution: hass commentedCool, how about HTTPRL?
Comment #19
mikeytown2 CreditAttribution: mikeytown2 commentedWorks in HTTPRL as well :)
http://drupalcode.org/project/httprl.git/blob/d023e1f82c18df277f31f4532b...
Comment #20
hass CreditAttribution: hass commentedThanks. I will update linkchecker readme to point this out. Have you seen #10? Codewise HTTPRL looks not able to resolve this, but it maybe really an edge case that we don't need to worry about.
Comment #21
jcisio CreditAttribution: jcisio commentedReroll for D7
Comment #24
jcisio CreditAttribution: jcisio commentedFixed tests.
Comment #26
jcisio CreditAttribution: jcisio commentedI think we should remove failed tests, because those tests suppose that we can't use relative urls.
Comment #27
amitaibuComing here from #2164219: Meta: Feeds "looking" broken due to -1002 missing schema..
Attached a simpler patch.
Comment #29
amitaibuOops, fix typo in variable name.
Comment #30
amitaibu... and correct status.
Comment #32
amitaibuAdapting tests to the change
Comment #34
amitaibuThis patch should fix the failing tests.
Comment #35
amitaibuTests are passing :)
Comment #36
hass CreditAttribution: hass commentedThis looks not like a correct patch as it ignores the 3 existing cases.
This get's broken by latest patch (status
-1002
is no longer tested):New examples for case #164365:
And we may also need to think about protocol relative urls. I believe a fallback to
http://
should be useful or we need a new parameter likedefault_scheme
.Comment #37
Dave ReidAdding the specific RFC keyword since it was missing in my search results
Comment #38
pacproduct CreditAttribution: pacproduct commentedFollowing patch does not address @hass's remarks. It's a simple reroll of #34 so the patch can be applied on Drupal 7.67.
Comment #39
pacproduct CreditAttribution: pacproduct commentedSame patch, fixed filename...
Comment #40
apadernoThis bug affects Drupal.org too, for example Planet Drupal, which uses the Aggregator module.