Active
Project:
Varnish
Version:
7.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
27 Aug 2012 at 17:06 UTC
Updated:
18 Jun 2020 at 22:20 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
heddnPatches rolled against both 7.x-1.x-dev & 7.x-1.0-beta1.
Comment #2
cmlaraHello heddn,
I believe addslashes() is too vauge for this:
From PHP Manual
addslashes() Returns a string with backslashes before characters that need to be quoted in database queries etc. These characters are single quote ('), double quote ("), backslash (\) and NUL (the NULL byte).
Your not to backslash a single quote in varnish.
You are to backslash a double quote and a backslash.
Wouldn't expect to see NULL byte however so not relevant.
The Varnish console is a bit odd that you cant 'raw' any character by using \ direct. I will have to test if \\a means "a" or "\a" on banning to see how this is interpreted.
Above this a backslash is not even permitted as part of a URL which means it really should never have made it this far down the chain in the first place.
req.url for "www.example.com/tes\t/" would be "/tes%5Ct/" your browser may show it as /tes\t/ but it sends it as /tes%5Ct/ per the RFC's.
Question is whos responsibility is it to urlencode() a URL for expiration purging?? Expire module or Varnish? (I'm not that familiar with the cache api's to say who is suppose to do this)
It does seem though we do have a issue with slashes being allowed through though obviously that needs a bit more of a fix then just addslashes()
Comment #3
heddnVarnish seems to support backslash in the URL using the CLI. However, they need to be properly escaped. Since many folks call varnish directly to flush cache, I'd lean towards including that logic in varnish. However, I could be convinced differently.
Yeah, I'm pretty sure that addslashes was a bad call. How about addcslashes as a better option?
Comment #4
cmlaraHello heddn,
Ah much better, wasn't even thinking about addcslashes so rarely use I had forgotten about it.
This would solve the console refusing the command issue.
I'll set it to Needs Review.
This patch does affect downstream though as it significantly changes how data is parsed by varnish_purge(); Notably this affects the expire module and others.
I would advise we not set this reviewed until downstream has been researched a bit more to make sure we will not break anything. (fabsor comments?)
It is better a poorly formated ban be rejected instead of a correctly formated ban that worked before acting different (regression)
I suspect not much is being done however considering a backslash made it this far down the chain in the first place but we should look it up anyways.
Comment #5
cmlaraHello all,
I've done a review of upstream on Expire module.
The big question for this patch:
Should varnish_purge assume everything coming to it is pre-escaped.
If Yes, then this patch should not be applied.
If No, then we should RTBC this patch as it looks clean and does its job correctly and it will no harm current in use upstream.
the addcslashes() in varnish_purge would make sure the data is always clean to varnishd, but if we clean up the data before varnish_purge() then any unescaped data would be considered a bug in the upstream function and that varnish rejected it would be considered 'as designed'
The bigger picture for this issue:
The Expire module says it will pass 'URL's' to hook_expire_cache it does not specify (and obviously per this test case they are not) Encoded URL's
varnish_expire_cache trims the list and puts it into an array. and then calls varnish_purge();
If urlencode() is not done in Expire then we should do it in varnish_expire_cache() That would remove the issue of the \ getting through. Then we only need to worry about properly formating our own data before passing to varnish_purge()
Comments?
Comment #6
heddnI agree that varnish_expire_cache is probably a better place to put this logic, if we keep it in varnish. Apparently lots of folks call varnish_expire_cache directly, see https://drupal.org/node/1323418#comment-5753202. Does that push us towards putting it here or moving it to expire?
Alternatively, from the expire project page,
If expire wants to support multiple caching engines beside varnish, would they want to urlencode so they have better universal support? Or maybe they don't if some random cache doesn't support escaped urls... I'm pretty sure that they would want to urlencode since I can't image any reverse proxy caches wouldn't support urlencoded urls.
Comment #7
cmlaraHello Heddn,
That is the part I'm not sure of on all the modules that use expire if its better to have the logic there or if its better to have it down stream.
It might be good to open a ticket under Expire and find out from mikeytown2 on this one to see what his views are. If he says it should be done in Expire then it gets added there. If not then we need to put it in varnish_cache_expire() to do urlencode(); I honestly don't have any alias in my lab system so if you can duplicate the test again it might be better if you opened this case upstream.
Personally I want to consider varnish_purge() a 'low level' function that all data going to it should be 'clean' by the time it gets there. That way upstream (that knows full urls and host domains, etc) can make the decision of what needs to be escaped and how. This would become especially important if wildcards ever became available in varnish or new control characters, down at varnish_purge() we wouldn't know what to escape and what not to escape.
(Obviously fabsor as the project manager has the final say on that)
Comment #8
heddnsee #1780144: Should expire.module urlencode urls. waiting for a response from expire before proceeding with a solution
Comment #9
heddnWhile waiting to hear back from upstream, here's an improved patch per comments in #5. I really feel this should go into varnish and not upstream, so that's why I put in the extra work to re-work the patch.
Comment #10
heddnComment #11
cmlaraHello heddn,
Thanks for coming back on this one. I meant to get back to it a couple months ago as well.
Considering upstream has not responded in this time I would have to agree that we would want to code a fix ourselves.
Per my #7 I think urlencode() would be better.
we would urlencode() $paths before we join it into $purge instead of using addcslahses() on $purge at the end.
$paths is an array so I believe a forloop through each item would be needed for urlencode() to properly apply.
urlencode() would ensure that backslashes coming in from other functions get converted to the %5C URL format, along with any other special character.
This has the added advantage that URL's being sent to Varnish will actually match whats in the varnish cache eg:
Currently:
http://www.example.org/Test Space Document.txt
is stored in the backend of varnish as:
http://www.example.org/Test%20Space%20Document.txt
but Expire would pass it with the spaces so the document never expires.
urlencode() on $paths would solve the root issue of backslash and other issues of upstream passing unencoded links (such as spaces)
Downside is if anyone else is using this function (fabsor has said people do) and IF they are correctly urlencode() (I am not sure they are or are not) we would be breaking their submissions with urlencode();
Comment #12
heddnHad some time today to rework this per #11.
Comment #13
cmlaraHello heddn,
My dev lab is offline at the moment right now so I can't test this personally but I believe this looks good.
I will have to test it to mark it however.
Comment #14
bibo commentedThis issues progress seems to have stopped over a year ago. I still think it is relevant, mostly I just worked on a patch on expire-module, where I had to do some custom escaping for regex that is sent to Varnish CLI.
I have a few comments, which might or might not be new information to you.
Expire developer Spleshka answered on June 26, 2013 at 3:45pm:
So, they expire-module is going to react to whatever is decided here. Which seems to be to include the escaping in Varnish-module. In my opinion the best solution is quite simple: just add an admin-option with a checkbox to either do the string escaping in the varnish-module, or not. Defaulting to on or not. This would make it easy for any other modules to rely on varnish or do their own thing. And just instruct users to enable it in Varnish or not to.
Apart from the mentioned Varnish-issue ( https://www.varnish-cache.org/trac/ticket/755 ), this issue seem relevant too: https://www.varnish-cache.org/trac/ticket/900.
cmlara:
The #715 issue example says:
I tested it myself and "purge req.url ~ "test\\.css" does indeed purge "test.css"-files, and not files named "test\.css". So, it looks weird on command line, but works.
The issue is marked as fixed in Varnish 3.0 some 3 years ago, but the double backslash is still required (which may be not well documented). Since I made my own expire drush extension to purge urls with or without regex, I've noticed that I have to triple-backslash it(!). (One extra backslash for any escape chars on command line/or string inside PHP. Oh, and 4 backslashed resulted in same (succesfull) ban as 3. Interesting:
My own Varnish regex handles requests either as "urls" without regex, or with regex. The normal urls get formatted like this:
$url = '^/' . addcslashes(preg_quote($url), '\\\\"') . '$';From what I've tested, urlencode() (on urls that are intended not to include regex) doesn't actually help with this backslash-issue. It may be useful for matching the correct url in some cases, but it's not adding the extra "\" that varnish CLI wants. At the moment, people who create Varnish cache_page-purges (for example with cache_actions), to purge urls with ".", will purge more than they expect. That is, "test.css" purges of course "test{ALMOSTANYCHARACTER}css".
Purging with regex is fine (or great even), but I for claritys sake I would prefer the Varnish module to offer both a purge-function that supports and expects regex, and a function that expects only urls without regex. The latters ban commands being automatically double-escaped, and the first used as is.
As an example, this is a bigger partition of the drush extension I wrote, which supports both approaches (based on boolean $is_pattern):
I'm also using _varnish_terminal_run()-there directly, because I want to flush urls regardless of hostname, which is currently not possible with varnish_purge(). I'm also hoping that choosing to filter with base_url/hostname will become an admin option sooner or later.
EDIT: added some extra backslashes to code examples, because d.org PHP-filter likes to filter them out.
EDIT: I'd also like to add that addcslashes() and preg_quote() together seem to really do the trick, at least with Varnish 3, which uses PCRE regexes.
Comment #15
mgiffordPatch no longer applies.
Comment #16
matthijsThe attached patch should fix these issues.
Comment #17
matthijsComment #18
matthijsI've updated the patch to fix an issue with wildcards in combination with query parameters.
Comment #19
jelle_sRerolled patch.
Comment #20
misc commentedThanks, but is this really needed - backslashes in URLs should not be there - seems that we are solving the problem in the wrong place.
Comment #21
misc commentedNo more info in ten months. Closing the issue for now.
Comment #22
donquixote commentedWe had problems with malformed urls: A space in one url (coming from menu_links table) causes the entire ban command to fail.
We should treat these urls as untrusted data.
We need to encode/escape/sanitize in a way that does not break the command no matter what is in the urls.
Comment #23
donquixote commentedAn alternative would be to validate the url and reject malformed/broken urls.
Or perhaps a combination. Malformed urls should be rejected, well-formed urls still need some kind of encoding or escaping for use in a cli command.
Comment #24
donquixote commentedI just checked:
There seems to be no validation in the menu link edit form to prevent spaces in the menu link path.