Right now, the result of every URL requested by anonymous users gets stored in cache_page - including invalid URLs (page not found). This is a possible attack vector for DoS attacks in two ways:

1) If an attacker just calls random pages, or the same page with varying bogus GET params, they can quickly produce many thousands of cache entries, which can potentially clog up the system.

2) Since only known URLs are fetched from cache, an attacker can produce increased system load by submitting random GET params with each request, thus eluding the cache entirely.

An effective but not very pretty countermeasure (e.g. implemented by Typo3) is to append every URL with a salted hash string of the URL. If the system receives a request in which the hash does not match the preceding params, the link has obviously been tampered with, and the result is just an error message.
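The hash-append idea can be sketched roughly as follows. This is a minimal illustration, not Typo3's actual scheme; the secret key, the query-param name `h`, and the digest length are all assumptions for the example.

```python
import hashlib
import hmac

SECRET = b"site-private-salt"  # hypothetical server-side secret, never exposed

def sign_url(path: str) -> str:
    """Append an HMAC of the path so tampered URLs can be rejected."""
    digest = hmac.new(SECRET, path.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{path}?h={digest}"

def verify_url(path: str, h: str) -> bool:
    """Recompute the HMAC and compare in constant time."""
    expected = hmac.new(SECRET, path.encode(), hashlib.sha256).hexdigest()[:16]
    return hmac.compare_digest(expected, h)
```

A request whose hash does not verify would get the error response instead of being rendered and cached, so bogus URL variations never reach the cache.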

There's probably other ways of doing this as well.

Comments

joachim’s picture

> An effective but not very pretty countermeasure (e.g. implemented by Typo3) is to append every URL with a salted hash string of the URL.

So you'd have http://example.com/node/1/randomstring ? That's indeed not very pretty!

At first glance, the ways of countering attack methods 1 and 2 are mutually exclusive. To avoid clogging up the cache with 404 pages for bogus paths, we shouldn't store 404 pages in the cache.

But if we don't store 404s in the cache, then http://example.com/bogus-url-number-765 has to be generated fully rather than drawn from cache.

However, we could make sure we store just one 404 page, at a canonical path. Also, do we have anything that throttles requests from the same IP, or even adds them to the block list?

ralf.strobel’s picture

> So you'd have http://example.com/node/1/randomstring ? That's indeed not very pretty!

Yes. Though I just remembered a prettier variation of it. You store every generated outgoing URL (or a hash of it) in a database table and when an incoming URL is not in there, you reject it or at least disable caching. That may of course create a relatively large table over time.

---------------

Blacklisting IPs is a good idea, and shouldn't even have to happen within PHP but can theoretically be done by the server firewall. Not sure how effective it is against DDoS, though.

Storing only one 404 is something that should probably be done anyway. There's also a module fast_404, which aims to do something similar in Drupal 7.

---------------

However, that doesn't solve the problem of fake GET params like "http://example.com/node/1/?random=76528373". This would currently lead to the same page being re-rendered and re-cached on every request.

The only effective way to approach this I could think of is to whitelist individual GET params. I guess this would call for another info hook, in which modules have to declare their use of URL params, including: Their names, the basic path on which they appear and a validation callback.

When the system receives a request for "http://example.com/search/?word=hello", it would check which module has declared that it uses the param "word" on the path "search/", and then send off the value "hello" to the validation callback. Unless the module confirms that this is a valid value, the param would be deleted. If it is valid, the callback can additionally tell the system that this is a one-time result and should not be cached.
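The proposed whitelist-and-validate step might look something like this. The registry structure and callback signature are hypothetical stand-ins for the info hook described above:

```python
# Hypothetical registry mapping (path, param name) -> validation callback,
# i.e. what modules would declare via the proposed info hook.
PARAM_REGISTRY = {
    ("search", "word"): lambda v: v.isalnum() and len(v) <= 64,
}

def filter_params(path: str, params: dict) -> dict:
    """Drop any GET param that is undeclared for this path or fails validation."""
    kept = {}
    for name, value in params.items():
        validate = PARAM_REGISTRY.get((path, name))
        if validate is not None and validate(value):
            kept[name] = value
    return kept
```

Params stripped here would never produce distinct cache entries, which defeats the `?random=76528373` trick.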

joachim’s picture

> The only effective way to approach this I could think of is to whitelist individual GET params. I guess this would call for another info hook, in which modules have to declare their use of URL params, including: Their names, the basic path on which they appear and a validation callback.

That's a potentially huge list. Think of a View with several exposed filters, for instance.
I don't think you'd want to do it with an info hook. Rather, it could happen as part of the Context system, and modules could 'claim' a parameter as it's passed through the context-building process.

andy inman’s picture

Assigned: Unassigned » andy inman
Priority: Minor » Normal
Issue tags: +vulnerability

I found this issue by Googling for drupal page cache dos attack - it occurred to me today that URL variation could be a very easy way to bring a site down by using up excessive disk space.

GET parameters are not the only problem. Standard Drupal behaviour is to try to find a matching URL path, and serve the page which is the closest fit. So, this issue page is at http://drupal.org/node/1245482 but I can also get to it at http://drupal.org/node/1245482/something-here and http://drupal.org/node/1245482/something-else-here

So, on a standard installation, all variations of a URL would get cached separately - a (D)DoS attack could very easily start filling the cache (a database table in the case of a basic installation), loading mysql or whatever, potentially filling the disk.

But, the problem is even more extensive: Many modules and Drupal internals store objects in the cache, not just pages. Any time that an object is cached, and the cache-id is dependent on URL parameters, the same issue comes into effect - multiple cache entries of the *same* cacheable object.

So, in my mind, the correct solution would be to detect that an object is already cached: take an md5 (etc.) of the object and keep a list of md5 values we have already stored in the cache. When requested to cache an object, check whether the same object (by md5) has already been cached. If it has, then store a *pointer* to that cached object rather than storing the entire object in cache again. The space required to store the pointer could be very small - an index into an array of something like $cache_index[$index_value][$md5][$target_cache_id].

Then there is the issue that we might just happen to get an md5 collision. So we might need a "check for collision" option, which would mean retrieving the existing cached object and checking if it's identical to the object currently requested to be cached. If it is not identical, we need to cache the new object separately. So then we would need $cache_index[$index_value][$md5][$variant_id][$target_cache_id]. How to generate the variant_id? I don't know. Some other hash algorithm maybe. But what about a double-collision? So, a much simpler alternative would be to just flag that md5 as "not cacheable", and resort to old behaviour (always store object in cache) for any object which had a matching md5 - in practice this would "never" happen.

Some other complexities: We might need an "instance count" for the referenced cached object. Then, on a cache_clear request, clear the target object from the cache when the instance count reaches zero. We might choose to update the expiry date on an existing cached object when receiving a request to cache an identical object. Or probably better, we could store the new expiry in the cache index item. Some module somewhere might need to reliably *read* the timestamp for items which it has cached - this is possible with the current cache system, so should probably still be supported in any enhanced cache system.

Finally, we need to prevent the cache index from growing excessively (potentially the same DOS vulnerability that we started with). That's not too difficult - give it a maximum size, and clear out "least-used" values when it gets full. Ok, now we need an algorithm for that, LRU or LFU etc. - http://en.wikipedia.org/wiki/Least_recently_used - LRU is simple to implement, and probably adequate.
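The core of the scheme above - deduplicating identical objects by content hash, keeping per-object reference counts, and capping the pointer index with LRU eviction - can be sketched like this. The class and field names are hypothetical; collision handling and persistence are omitted for brevity:

```python
import hashlib
from collections import OrderedDict

class DedupCache:
    """Sketch: each distinct object is stored once, keyed by its md5;
    cache ids become small pointers into that store. The pointer index
    is bounded and evicted in LRU order, as proposed above."""

    def __init__(self, max_index=1000):
        self.objects = {}            # md5 -> (object, reference count)
        self.index = OrderedDict()   # cache id -> md5 pointer, LRU order
        self.max_index = max_index

    def set(self, cid, obj):
        h = hashlib.md5(obj.encode()).hexdigest()
        if h in self.objects:
            stored, refs = self.objects[h]
            self.objects[h] = (stored, refs + 1)  # identical object: bump refcount
        else:
            self.objects[h] = (obj, 1)
        self.index[cid] = h
        self.index.move_to_end(cid)
        if len(self.index) > self.max_index:      # evict least-recently-used pointer
            _, old_h = self.index.popitem(last=False)
            stored, refs = self.objects[old_h]
            if refs <= 1:
                del self.objects[old_h]           # last reference: drop the object too
            else:
                self.objects[old_h] = (stored, refs - 1)

    def get(self, cid):
        h = self.index.get(cid)
        if h is None:
            return None
        self.index.move_to_end(cid)               # mark as recently used
        return self.objects[h][0]
```

Note that a flood of distinct URLs producing the *same* rendered page would cost one stored object plus one small pointer per URL, rather than one full copy per URL.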

Additionally, we could potentially *detect* DoS attacks. If we find we are getting large numbers of requests at distinct URLs which result in *identical* objects passed to cache_set(), then we apparently have a DoS attempt. Offending requests could be ignored, redirected, returned a 404, etc.

Ok, this would all be considerable processing overhead, but in my mind it's the only way to address the underlying issue. Probably too much overhead to be included in core, but perhaps as a module, for those that wanted it. The alternative approach of somehow detecting "valid" URLs is probably not feasible, for the reasons that others have already mentioned above (Views filters, etc.).

A much simpler alternative - just limit the total number of entries or total size of the cache, probably with independent limits per "bin". Use LRU or whatever to clear-out "old" items. The problem with that approach is that an attacker could effectively defeat the cache, leading to increased load, similar to point (2) in the original post above. But, it would at least solve the issue of using excessive disk space (which could otherwise crash the server etc). A simple mechanism like this could, perhaps should, be in core. "Advanced" users would be using Varnish, APC, etc, which can provide their own limits on storage size, but the majority of Drupal sites use the core system cache with database tables and so would be vulnerable. Currently such sites running D5/D6/D7 *are* vulnerable to this type of attack.

I've taken the liberty of assigning this issue to myself. I'm now thinking of writing a module to implement at least a simple limit on cache size, and maybe with a longer term aim of extending it to provide the more complex functionality I've described.

ralf.strobel’s picture

Nice to see this old issue finally got some attention again.

I've been reading up on this problem some more since I first posted it. Many experts seem to agree that DoS attacks should be handled not at the cache level, but at the request level. I have to agree with that, because even if you counteract the cache flooding effect, the attack would still potentially slow the site down to being practically unusable.

A common DoS protection at the request level is to apply a request limit per IP on a rolling time window (like 30 per minute), which is what I am doing now on all of my sites, because it's very easy to set up using nginx. A more sophisticated way is to detect and count only suspicious requests and then to block only the source IP specifically.
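The rolling-window limit described above can be sketched as follows. This is an in-process illustration of the idea, not the nginx implementation (nginx's `limit_req` actually uses a leaky-bucket algorithm); the class name and defaults are assumptions:

```python
import time
from collections import defaultdict, deque

class RollingWindowLimiter:
    """Per-IP request limit over a rolling time window, e.g. 30 per minute."""

    def __init__(self, limit=30, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Return True if this request is under the limit, else False."""
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] > self.window:  # drop hits outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False                       # over the limit: reject (or blacklist)
        q.append(now)
        return True
```

The "more sophisticated" variant mentioned above would call `allow()` only for suspicious requests (e.g. cache misses or 404s) and block the source IP when it returns False.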

It could still be useful to implement these strategies as a Drupal module as well, for all site owners who don't have access to their server configuration. What you could do is count the number of cache entries generated per IP per time window and then just blacklist an IP if it exceeds a threshold.

andy inman’s picture

I agree with ralf, the "right" approach is to detect attacks somehow - in an ideal world, you would spot attacker activity and block their IP. Blocking based on IP address is relatively ineffective against DDoS. Reliably detecting an attack as compared to legitimate use is not easy.

I've spent some time thinking about how the suggestions I made might be implemented. I think it would be feasible, but far from simple. So, whilst I'm still hoping to have time to start working on a module of some sort, I'm nowhere close yet. Further input appreciated.

ralf.strobel’s picture

Trying to fortify Drupal against large-scale DDoS attacks is pointless anyway, because they would overload your server already on a network level. Someone with enough resources to do so would not bother to target specific weaknesses in the Drupal cache implementation but just go for a brute force UDP/SYN flood using existing DDoS tools.

What I had in mind when I created this thread was more of a small-scale attack, which anyone could execute from a single or very few IPs.

betoaveiga’s picture

Issue summary: View changes

I read every comment carefully and I want to contribute an idea.

I think it would be good to have a "smaller system" or mechanism to put in front of requests in case of an attack.

First, we need to detect a burst of traffic to our site. Knowing how much our server can handle, we could determine when we are getting close to a DoS. This detection could happen at a higher level than Drupal, or even at the Drupal level (the worst protection is having none at all). Either way, the technical criteria for detecting a DoS could be worked out somehow.

Second: when the DoS flag is raised, every IP becomes suspicious (every IP is blacklisted until proven innocent).

Third: A small system (inside Drupal or not) would ask for a captcha. If the captcha can't be solved, the IP doesn't get access to our site. Captchas could be pre-generated (say, 5000 of them). The requests that check whether the captcha was correctly solved would hit a URL that does only this check (without loading Drupal at any phase). If a captcha gets solved, the IP is whitelisted.

Probably the web server must be part of this mechanism in order to redirect requests (with Apache we would probably need to change .htaccess when the attack is detected).

I know this idea lacks details, but it could be a start, or part of something better.

Finally, my opinion: preventing a DoS or DDoS attack requires multiple lines of defense at various levels of software & hardware. At some point an attacker could win, but it's important that Drupal websites provide OUR line of defense against this threat.

Thanks.

mgifford’s picture

Assigned: andy inman » Unassigned
andy inman’s picture

@mgifford Thanks for taking a load off my mind :)

mgifford’s picture

Anything to help... :)

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

devarch’s picture

Version: 8.6.x-dev » 8.8.x-dev
Category: Feature request » Bug report

This is strange. After 8 years, there is still no resolution for a really problematic issue. Even though some attempts to solve this were made, there's still no definitive solution.

The database_cache_max_rows setting introduced in Drupal 8.4 to limit the growth of the cache bins is NOT a hard limit; it is only "enforced", meaning the number of rows in a bin is reduced to max_rows when cron runs. Between cron runs, the cache tables grow without limit.

So if I generate a flood of bad requests to a Drupal site, they will all be inserted into the cache_page table, which in turn will grow huge and fill up the database in which it resides. Some hosting providers put a limit on database size, some are limited by the allotted disk quota, but either way it fills up pretty quickly.

In a recent deployment that replaced an old CMS with a Drupal 8 instance (a small-traffic site), the page_cache table filled up the 1GB database limit in two days with old URLs that returned 404. When the Google bot started reindexing the site, the 1GB limit was reached in a couple of hours. So even without badly behaving users - just normal traffic, in the case described - a Drupal site can nose-dive into the dust.

In my opinion there are two things to fix:
1. Implement max_rows as a hard limit: no records allowed above the stated maximum.
2. Improve the cache_page bin to key entries not by the source (requested) URL but by the destination, so that whatever URL generates a 404 gets the 404 served from cache. That way the 404 is cached only once, not thousands of times.
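The hard-limit behavior in point 1 amounts to checking capacity *before* inserting, instead of pruning on cron after the table has already grown. A minimal sketch of that check (function and parameter names are hypothetical, and the dict stands in for the cache table):

```python
def cache_set_hard_limit(table: dict, cid: str, value: str, max_rows: int) -> bool:
    """Insert into the cache only if the hard row limit would not be exceeded.
    Updates to existing entries are always allowed."""
    if cid not in table and len(table) >= max_rows:
        return False  # refuse the insert; a real system might evict an old row instead
    table[cid] = value
    return True
```

Refusing the insert means a flood of bogus URLs costs render time but can never fill the disk; evicting instead would preserve caching for new legitimate pages at the cost of extra bookkeeping.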

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

dakwamine’s picture

Hello. :)

The issue is still active (at least, on Drupal 8.9). We are getting websites with thousands of calls on the home page with non-existing query arguments, and for each of them, a row is added in cache_page, containing the rendered page.

A hard max_rows limit would hurt site performance too much if the table is quickly filled with garbage before relevant requests arrive. But I guess it's still better than a full disk causing a potential database lock. So I'm okay with this idea as a last line of defense.

Caching the 404 page once seems a nice idea. But it does not prevent the cache_page table from filling up (https://homepage/ and https://homepage/?toto lead to the same valid 200 page but each get their own row in the cache_page table).

Here is another issue which seems related: #3011426: Page cache size is unlimited when arbitrary query parameters are requested. We may link them together.

My (naive) idea would be the following: for each cache entry, we add a counter recording how many times the entry has been requested.
Then, on cron, when the cache is full, we remove the least-requested entries (maybe 10%, which could be configurable). Popular requests are kept, unpopular ones discarded. This creates new "space" for newer entries in the cache, giving new pages a chance to be cached.
Finally, when cron ends, we reduce the score of all cache entries (exact math to be determined), or increment a "current minimum score" - imagine a water level below which entries are discarded - so that entries which were popular in the past but get no attention in recent times are discarded over time. (Or, instead of the water-level idea, keep a dedicated table holding request counts per cache entry, and truncate it when cron ends.)
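The prune-and-decay step described above might look like this on cron. The halving of counts is one possible choice for the "reduce the score" math; the function name and 10% default are assumptions:

```python
def prune_least_requested(counts: dict, fraction: float = 0.10) -> dict:
    """Drop the least-requested `fraction` of cache entries, then halve the
    remaining counts so that past popularity decays over time.
    `counts` maps cache id -> request count; returns the surviving entries."""
    if not counts:
        return {}
    by_count = sorted(counts, key=counts.get)          # least requested first
    for cid in by_count[: max(1, int(len(counts) * fraction))]:
        del counts[cid]                                # evict the unpopular tail
    return {cid: n // 2 for cid, n in counts.items()}  # decay survivors' scores
```

Halving on every cron run means an entry that stops being requested loses its accumulated score within a few runs, which approximates the "water level" effect without tracking a separate threshold.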

fgm’s picture

An approach I've seen elsewhere is to move the problem one step away from Drupal (typically into Varnish), by rewriting query URLs to remove any unexpected arguments and ordering the rest, which neatly removes the problem and significantly reduces the key space. The drawback, of course, is that rules need to be defined for every page as to which arguments are accepted, which can be a lot of initial setup work. But on heavily trafficked sites, or those that are frequent victims of these attempts, it's really efficient and cheap.

dakwamine’s picture

@fgm The external solution is interesting. The downside is the time and effort to micromanage URL arguments (time is money 😛) and to install another service on the server, which requires some system administration knowledge and know-how. But for big websites, it sure is a good idea.

Following your input, I had another idea: we could try to "sanitize" the URL arguments using the rules you mentioned, such as order enforcement and argument filtering, right before building the cid which page_cache uses to determine whether the URL is already cached (I guess that's how it works today). The filtering rules could come from a config form, from default config shipped with modules, or even better, from automated detection involving a learning phase in which the navigation history of specific trusted user roles is analyzed (this last one would be really cool imo). Is there an existing service which could provide this sort of mechanism?

What do you think?

fgm’s picture

One thing which can be done is:

  • annotate all your public routes in *.routing.yml files with an extra parameter listing accepted query params
  • modify your Routing providers, if any, with the same data
  • add a Routing listener service that will parse all the routes, gather these data, and store them in a KV system (remember, Drupal core has a builtin KV; don't forget default args like q and page)
  • create an `http.middleware` tagged service coming in before the internal page cache, that will hit this KV and rewrite the inbound request accordingly, stripping unexpected args and ordering the others

You'll also probably want to use the module which removes almost all core routes, to limit the ones you'll actually have to handle.

Just an idea off the top of my head, it probably needs refining.

catch’s picture

Component: cache system » page_cache.module
Status: Active » Closed (duplicate)
Issue tags: +Bug Smash Initiative

#2526150: Database cache bins allow unlimited growth: cache DB tables of gigabytes! stops this being a DDoS issue - as long as you run cron frequently enough.

There are other solutions for it too:

1. Switch to memcache or redis caching - then you have an LRU.

2. Use varnish caching and disable the internal page cache.

3. Use dynamic page cache and disable the internal page cache.

Marking this as duplicate.

codebymikey’s picture

The ideas from #27 are interesting enough (especially from an optimization standpoint if you don't want to integrate a reverse proxy for caching purposes).

The https://www.drupal.org/project/page_cache_query_ignore module seems to address a subset of the functionality, but without the granularity needed to control which query strings are supported for specific routes. The ideas proposed here could potentially be incorporated into that module.

fgm’s picture

@catch Do you have the reference of any issue of which this might be a duplicate ?

I agree that the mechanisms you mention may be part of the solution to the problem, but they are not sufficient on their own, and they do not reduce load as much as this proposal would without adding a lot more off-Drupal configuration.

If this issue was marked as a duplicate only because of those mechanisms, it does not seem to actually be a duplicate, and it would merit being pursued on its own.

dakwamine’s picture

If this issue can be resolved only by requiring external systems, I agree with @fgm that it should not be marked as a duplicate.

The module shared by @codebymikey in the comment #29 implements some of the ideas from here, and can be augmented with other ones.

Listing some of the ideas gathered here so far for a quick recap:

- work within an early http.middleware service.
- ensure a consistent order for the query parameters.
- ignore query parameters based on an allow/disallow list.
- read this allow/disallow list set by the routing definition, gathered by a Routing listener service into KV records for quick checks.
- perform a single (one per request?) preemptive check (on DB table size [better IMO] or row count) before inserting an item into the cache, and if full, either ignore inserts until the next cron run or prune immediately, instead of pruning AFTER the limit has been exceeded.
- partial pruning using a popularity index, instead of the current max rows approach.

Maybe some of them have been addressed in the recent core releases.