Only allow X number of read/write streams to be open at a time. While you can issue over a thousand requests in under a second, that is not necessarily a good idea.
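As an illustration of the idea in the issue summary, here is a minimal sketch (in Python rather than the module's PHP, and not HTTPRL's actual code) of capping concurrency with a counting semaphore: at most MAX_STREAMS requests run at once and the rest wait their turn. The cap of 8 and the simulated request function are assumptions for the example.

```python
# A counting semaphore caps how many fetches are in flight at once;
# MAX_STREAMS is an illustrative value, not one taken from the module.
import threading
import concurrent.futures

MAX_STREAMS = 8
slots = threading.Semaphore(MAX_STREAMS)

def simulated_request(url):
    # stand-in for a real HTTP call
    return "200 OK for " + url

def fetch(url):
    with slots:  # blocks while MAX_STREAMS fetches are already in flight
        return simulated_request(url)

urls = ["http://example.com/page%d" % i for i in range(1000)]
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(fetch, urls))
# all 1000 URLs get fetched, but never more than 8 concurrently
```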
Comment | File | Size | Author |
---|---|---|---|
#6 | httprl-1268096-6-implement-rate-limiter.patch | 2.93 KB | mikeytown2 |
Comments
Comment #1
hass CreditAttribution: hass commented
Not only a good idea - per the RFC you need to limit it to only 8 requests per server. We may use the module in linkchecker, but then, yes - it will push 1000 URLs into the array and hammer servers...
Comment #2
mikeytown2 CreditAttribution: mikeytown2 commented
Which RFC, so I can reference it in a code comment?
Comment #3
hass CreditAttribution: hass commented
Puhh... I cannot remember - it's the limit that browsers have implemented... IE had a limit of 2 in the past... You can change it via the registry. There have been some Firefox add-ons that raised it to 20, but with a note that this may cause trouble. Maybe some sites lock down after 8 connections...
Just to clarify: this does not mean 8 in general, it's 8 per domain or hostname... So you could run 1000 requests from your server, as long as you make sure a single hostname like www.example.com only gets hit by 8 simultaneous requests.
Comment #4
mikeytown2 CreditAttribution: mikeytown2 commented
RFC 2616 (HTTP 1.1)
Looks like it is not in RFC 1945 (HTTP 1.0)
http://forums.mozillazine.org/viewtopic.php?p=546087&sid=fee0f75ad18d72f...
HTTPRL should define 2 limits: a global max and a per-domain max. A domain max of 8 works for me. The global max would be something like 128, meaning it will only keep 128 connections open at a time; non-blocking requests close the connection, so it is still possible to flood a server (there is no good way to get around this issue). Something to note is that HTTPRL does not use persistent connections and is an HTTP 1.0 client; it sets "Connection: closed". A persistent connection would look like "Connection: keep-alive". Being nice to servers is generally a good idea, so setting this to 8 sounds like a plan.
Comment #5
mikeytown2 CreditAttribution: mikeytown2 commented
Comment #6
mikeytown2 CreditAttribution: mikeytown2 commented
This patch has been committed.
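The two limits described in #4 - a small per-domain max plus a larger global max - can be sketched roughly as follows. This is a hypothetical Python illustration, not the committed patch: the function names and counters are invented for the example, and only the limit values (8 and 128) come from the discussion.

```python
# Hypothetical bookkeeping for the two limits: DOMAIN_MAX open
# connections per hostname, GLOBAL_MAX across all hosts combined.
from urllib.parse import urlparse
from collections import Counter

DOMAIN_MAX = 8     # per-domain cap from the discussion
GLOBAL_MAX = 128   # global cap from the discussion

open_total = 0
open_per_host = Counter()

def acquire(url):
    """Try to claim a connection slot; return True on success."""
    global open_total
    host = urlparse(url).hostname
    if open_total >= GLOBAL_MAX or open_per_host[host] >= DOMAIN_MAX:
        return False
    open_total += 1
    open_per_host[host] += 1
    return True

def release(url):
    """Give a slot back once the request finishes."""
    global open_total
    open_total -= 1
    open_per_host[urlparse(url).hostname] -= 1
```

A request loop would call acquire() before opening a stream, queue the URL for later if it returns False, and call release() when the response completes.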
Comment #8
mc0e CreditAttribution: mc0e commented
I think this per-server limit should be revised downwards. If I saw a process hitting my server at this sort of speed, I'd block it, and many sites have mechanisms to do so automatically.
The limits referred to in the RFCs are for browsers and are not appropriate in this context: most of a browser's requests are for static files, and the usage pattern is characterised by brief bursts of activity with long gaps in between.
In this context it's more appropriate to refer to recommendations for crawler behaviour, which generally specify how many seconds to wait between fetches for a given domain, not how many fetches to do in parallel. Major crawlers like Google, Yahoo, etc. typically wait about 3 seconds between hits on a site with a few hundred thousand pages, and they mostly go slower for smaller sites.
https://en.wikipedia.org/wiki/Web_crawler#Politeness_policy
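The crawler-style politeness policy mc0e describes - a fixed wait between consecutive fetches to the same host, rather than a parallelism cap - can be sketched like this. It is a hypothetical Python illustration: seconds_to_wait is an invented helper, and the 3-second delay is taken from the figure in the comment, not from the module.

```python
# Hypothetical per-host politeness delay: report how long to sleep
# before the next fetch to a host, spacing hits CRAWL_DELAY apart.
from urllib.parse import urlparse

CRAWL_DELAY = 3.0   # seconds; roughly the crawler figure cited above
_last_hit = {}      # host -> time the previous fetch was (or will be) made

def seconds_to_wait(url, now):
    """Return how long to sleep before fetching `url` at time `now`."""
    host = urlparse(url).hostname
    due = _last_hit.get(host, float("-inf")) + CRAWL_DELAY
    _last_hit[host] = max(now, due)
    return max(0.0, due - now)

# a caller would do: time.sleep(seconds_to_wait(url, time.monotonic()))
```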
Comment #9
hass CreditAttribution: hass commented
The rate limit has now been reduced to only two requests per domain.
Comment #10
mikeytown2 CreditAttribution: mikeytown2 commented
Created this feature request: #2032523: Add in domain sleep and global sleep parameters