If there isn't one already, it would be great to supply an array of allowed content types, or a way to limit requests to only text/html, so that if the URL points to a PDF, ZIP file, etc., it doesn't try to download it.
It would also be good to be able to specify a maximum content length, so it doesn't try to download a 500 MB file, for example.
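The request above amounts to a pre-flight check on the response headers before committing to the body. A minimal Python sketch of that idea (hypothetical helper and names, not part of httprl, which is PHP):

```python
# Hypothetical helper: decide from response headers alone whether to fetch the body.
ALLOWED_TYPES = ("text/html", "application/xhtml+xml")  # example allow-list
MAX_LENGTH = 1024 * 1024  # 1 MB cap, arbitrary for illustration

def should_download(headers):
    """headers: dict of response header name -> value (e.g. from a HEAD request)."""
    ctype = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if ctype not in ALLOWED_TYPES:
        return False  # e.g. application/pdf or application/zip
    length = headers.get("Content-Length")
    if length is not None and int(length) > MAX_LENGTH:
        return False  # refuse a 500 MB file up front
    return True
```

Note the caveat raised in comment #1: with chunked transfer encoding there is no Content-Length header, so the size check cannot fire.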
Comment | File | Size | Author
---|---|---|---
#18 | httprl-1426854-18-open-ended-range.patch | 4.54 KB | mikeytown2
#17 | httprl-1426854-17-fix-edge-case.patch | 2.28 KB | mikeytown2
#14 | httprl-1426854-14-range-only-get.patch | 1.07 KB | mikeytown2
#13 | httprl-1426854-13-strict-range-request.patch | 5.3 KB | mikeytown2
#9 | httprl-1426854-9-add-docs.patch | 1.16 KB | mikeytown2
Comments
Comment #1
mikeytown2 CreditAttribution: mikeytown2 commented
Being able to enforce a maximum content length is difficult due to chunked transfer encoding. The connection will time out, so I'm not too worried about this one.
Having an allowed_content_types array is possible. Feel free to add it to the options under httprl_request(); be sure not to set a default in this case. It would get enforced in httprl_send_request().
Comment #2
hass CreditAttribution: hass commented
In linkchecker I added 'Range' support a few weeks ago. I mostly use HEAD (no range required), but users are able to use GET, and then I force downloading only the first 0-1024 bytes... This works great. This is just a hint at how you could solve it for every content type, on any server that supports ranges (which should be the norm today).
Just keep in mind you will not get "200 OK"; it's "206 Partial Content".
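hass's approach boils down to building a Range header and treating 206 as the success status. A small Python sketch of the header arithmetic (hypothetical helper names, not linkchecker's API; note the inclusive end, so bytes=0-1024 yields 1025 bytes):

```python
def range_header(first, last=None):
    """Range header value; last is inclusive, None means open-ended."""
    if last is None:
        return "bytes=%d-" % first
    return "bytes=%d-%d" % (first, last)

def bytes_expected(first, last):
    """Size of a satisfied bytes=first-last response (both ends inclusive)."""
    return last - first + 1

def range_honored(status):
    """A range-aware server answers 206 Partial Content, not 200 OK."""
    return status == 206
```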
Comment #3
mikeytown2 CreditAttribution: mikeytown2 commented
See httprl_send_request(). I already read 1024-byte chunks until I have all the headers. If I see a redirect, I kill the connection right there. This is a fairly easy problem to solve for HTTPRL.
Comment #4
hass CreditAttribution: hass commented
I haven't understood the details behind your chunk logic. When I implemented it I was not aware of httprl; I made it for core, to prevent the 500 MB or 5 GB downloads with the GET method :-). But without a range limit httprl must download everything, which is all correct.
Comment #5
mikeytown2 CreditAttribution: mikeytown2 commented
Limiting the number of bytes downloaded can be done. Now that I think about the requirements of a link checker, this would be a nice feature. I can limit the total bytes transferred, just not the message size, due to chunked transfer encoding.
Comment #6
hass CreditAttribution: hass commented
Nothing required from you... the 'Range' header is the way all link-check modules should go if they need to limit transferred bytes. It's the standard way web servers work. Why should we add any other stuff? :-)
Comment #7
mikeytown2 CreditAttribution: mikeytown2 commented
Ah nice, just set the Range header: http://stackoverflow.com/questions/716680/difference-between-content-ran....
The other one is the Accept header, but most web servers seem to ignore it, hence the need for the array of content types we wish to download. I should add some of the more useful headers to the documentation of httprl_request().
Comment #8
hass CreditAttribution: hass commentedComment #9
mikeytown2 CreditAttribution: mikeytown2 commented
This has been committed. If servers do not respect the Accept header, let me know; I'll implement strict enforcement if needed. http://www.gethifi.com/blog/browser-rest-http-accept-headers
Comment #10
mikeytown2 CreditAttribution: mikeytown2 commented
Closing this issue. Open to patches though.
Comment #11
mikeytown2 CreditAttribution: mikeytown2 commented
I will be creating a patch for servers that do not accept the Range header. This is useful because anything downloaded in httprl gets loaded into memory; requesting a URL that returns a lot of data could cause PHP to run out of memory. In this case, if the Range header is sent out and the server replies with a 200 instead of a 206, httprl will download only up to the last byte of data needed to fulfill the range request and then close the connection, turning the 200 back into a 206.
Functions I have so far for parsing the Range header
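The attachment with those functions is not reproduced in this thread; as a rough Python sketch (the real code is PHP), parsing the Range forms and finding the cut-off offset for a 200-to-206 conversion might look like:

```python
import re

def parse_range(value):
    """Parse a Range header value into (first, last); either may be None.
    Handles "bytes=0-1023", open-ended "bytes=1024-" and suffix "bytes=-500".
    For a suffix range the second element is a length (final N bytes), not an
    offset. Returns None for anything malformed. Hypothetical sketch only."""
    m = re.match(r"^bytes=(\d*)-(\d*)$", value.strip())
    if not m or (not m.group(1) and not m.group(2)):
        return None
    first = int(m.group(1)) if m.group(1) else None
    last = int(m.group(2)) if m.group(2) else None
    return (first, last)

def last_byte_needed(rng):
    """Highest absolute offset needed to satisfy a parsed range, or None when
    the whole resource must be read (open-ended or suffix range). This is the
    offset at which a 200 response can be cut and treated as a 206."""
    first, last = rng
    if first is not None and last is not None:
        return last
    return None
```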
Comment #12
mikeytown2 CreditAttribution: mikeytown2 commented
This will go in the 1.9 release.
Comment #13
mikeytown2 CreditAttribution: mikeytown2 commented
Range headers are now strict. If a 200 is returned when a 206 was expected, httprl will turn the 200 into a 206 when doing so lets us cut the connection to the server sooner.
This patch has been committed.
Comment #14
mikeytown2 CreditAttribution: mikeytown2 commented
Forgot to make sure this only runs on a GET request.
This patch has been committed to 6.x & 7.x
Comment #15
mikeytown2 CreditAttribution: mikeytown2 commented
Tested, and this breaks with chunked transfer encoding.
I need to decode the chunks first and then split up the data based off of the byte range.
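The decode-then-slice step described here can be sketched in Python (a minimal illustration: assumes well-formed chunked input and ignores trailers; the real fix is in the PHP patch below):

```python
def decode_chunked(raw):
    """Decode a chunked transfer-encoded body (bytes) into its payload."""
    out = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol].split(b";")[0], 16)  # chunk-size line, in hex
        if size == 0:
            break  # terminating zero-length chunk
        start = eol + 2
        out += raw[start:start + size]
        pos = start + size + 2  # skip chunk data plus its trailing CRLF
    return bytes(out)

def apply_range(body, first, last):
    """Slice the decoded body to the requested inclusive byte range."""
    return body[first:last + 1]
```

Cutting the raw stream at a byte offset before decoding would land mid-chunk, which is exactly why the earlier patch broke.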
Comment #16
mikeytown2 CreditAttribution: mikeytown2 commented
Thinking about this more: cutting the download off when Transfer-Encoding or Content-Encoding is in use would be a bad idea. Will move forward with this in mind.
Comment #17
mikeytown2 CreditAttribution: mikeytown2 commented
The following patch has been committed to 6.x & 7.x.
Comment #18
mikeytown2 CreditAttribution: mikeytown2 commented
Support for bytes=1024- and bytes=-1024 has been added.
This patch has been committed to 6.x & 7.x.