So, I don't know why one of our editors added a redirect with a backslash, but they did. So, we had a url that drupal were sending over to varnish with a goofy url. And varnish kept croaking on it. See https://www.varnish-cache.org/trac/ticket/755 & https://www.varnish-cache.org/trac/ticket/917

WD varnish: Recieved status code 100 running ban req.http.host ~ localhost && req.url ~ [error]
"^/about/aboutUs/sites\/all\/modules\/google_analytics$". Full response text:
Unknown request.
Type 'help' for more info.
Syntax Error: Invalid backslash sequence

Comments

heddn’s picture

Patches rolled against both 7.x-1.x-dev & 7.x-1.0-beta1.

cmlara’s picture

Status: Needs review » Needs work

Hello heddn,

I believe addslashes() is too vauge for this:

From PHP Manual
addslashes() Returns a string with backslashes before characters that need to be quoted in database queries etc. These characters are single quote ('), double quote ("), backslash (\) and NUL (the NULL byte).

Your not to backslash a single quote in varnish.
You are to backslash a double quote and a backslash.
Wouldn't expect to see NULL byte however so not relevant.

The Varnish console is a bit odd that you cant 'raw' any character by using \ direct. I will have to test if \\a means "a" or "\a" on banning to see how this is interpreted.

Above this a backslash is not even permitted as part of a URL which means it really should never have made it this far down the chain in the first place.

req.url for "www.example.com/tes\t/" would be "/tes%5Ct/" your browser may show it as /tes\t/ but it sends it as /tes%5Ct/ per the RFC's.

Question is whos responsibility is it to urlencode() a URL for expiration purging?? Expire module or Varnish? (I'm not that familiar with the cache api's to say who is suppose to do this)

It does seem though we do have a issue with slashes being allowed through though obviously that needs a bit more of a fix then just addslashes()

heddn’s picture

Varnish seems to support backslash in the URL using the CLI. However, they need to be properly escaped. Since many folks call varnish directly to flush cache, I'd lean towards including that logic in varnish. However, I could be convinced differently.

Yeah, I'm pretty sure that addslashes was a bad call. How about addcslashes as a better option?

cmlara’s picture

Status: Needs work » Needs review

Hello heddn,

Ah much better, wasn't even thinking about addcslashes so rarely use I had forgotten about it.

This would solve the console refusing the command issue.

I'll set it to Needs Review.

This patch does affect downstream though as it significantly changes how data is parsed by varnish_purge(); Notably this affects the expire module and others.

I would advise we not set this reviewed until downstream has been researched a bit more to make sure we will not break anything. (fabsor comments?)

It is better a poorly formated ban be rejected instead of a correctly formated ban that worked before acting different (regression)

I suspect not much is being done however considering a backslash made it this far down the chain in the first place but we should look it up anyways.

cmlara’s picture

Hello all,

I've done a review of upstream on Expire module.

The big question for this patch:
Should varnish_purge assume everything coming to it is pre-escaped.

If Yes, then this patch should not be applied.
If No, then we should RTBC this patch as it looks clean and does its job correctly and it will no harm current in use upstream.

the addcslashes() in varnish_purge would make sure the data is always clean to varnishd, but if we clean up the data before varnish_purge() then any unescaped data would be considered a bug in the upstream function and that varnish rejected it would be considered 'as designed'

The bigger picture for this issue:

The Expire module says it will pass 'URL's' to hook_expire_cache it does not specify (and obviously per this test case they are not) Encoded URL's

varnish_expire_cache trims the list and puts it into an array. and then calls varnish_purge();

If urlencode() is not done in Expire then we should do it in varnish_expire_cache() That would remove the issue of the \ getting through. Then we only need to worry about properly formating our own data before passing to varnish_purge()

Comments?

heddn’s picture

I agree that varnish_expire_cache is probably a better place to put this logic, if we keep it in varnish. Apparently lots of folks call varnish_expire_cache directly, see https://drupal.org/node/1323418#comment-5753202. Does that push us towards putting it here or moving it to expire?

Alternatively, from the expire project page,

This module acts as a grid to detect and act upon events that will expire URLs from caches like reverse proxy caches.

If expire wants to support multiple caching engines beside varnish, would they want to urlencode so they have better universal support? Or maybe they don't if some random cache doesn't support escaped urls... I'm pretty sure that they would want to urlencode since I can't image any reverse proxy caches wouldn't support urlencoded urls.

cmlara’s picture

Hello Heddn,

That is the part I'm not sure of on all the modules that use expire if its better to have the logic there or if its better to have it down stream.

It might be good to open a ticket under Expire and find out from mikeytown2 on this one to see what his views are. If he says it should be done in Expire then it gets added there. If not then we need to put it in varnish_cache_expire() to do urlencode(); I honestly don't have any alias in my lab system so if you can duplicate the test again it might be better if you opened this case upstream.

Personally I want to consider varnish_purge() a 'low level' function that all data going to it should be 'clean' by the time it gets there. That way upstream (that knows full urls and host domains, etc) can make the decision of what needs to be escaped and how. This would become especially important if wildcards ever became available in varnish or new control characters, down at varnish_purge() we wouldn't know what to escape and what not to escape.

(Obviously fabsor as the project manager has the final say on that)

heddn’s picture

Status: Needs review » Postponed

see #1780144: Should expire.module urlencode urls. waiting for a response from expire before proceeding with a solution

heddn’s picture

StatusFileSize
new717 bytes

While waiting to hear back from upstream, here's an improved patch per comments in #5. I really feel this should go into varnish and not upstream, so that's why I put in the extra work to re-work the patch.

heddn’s picture

Status: Postponed » Needs review
cmlara’s picture

Hello heddn,

Thanks for coming back on this one. I meant to get back to it a couple months ago as well.

Considering upstream has not responded in this time I would have to agree that we would want to code a fix ourselves.

Per my #7 I think urlencode() would be better.

we would urlencode() $paths before we join it into $purge instead of using addcslahses() on $purge at the end.

$paths is an array so I believe a forloop through each item would be needed for urlencode() to properly apply.

urlencode() would ensure that backslashes coming in from other functions get converted to the %5C URL format, along with any other special character.

This has the added advantage that URL's being sent to Varnish will actually match whats in the varnish cache eg:
Currently:
http://www.example.org/Test Space Document.txt
is stored in the backend of varnish as:
http://www.example.org/Test%20Space%20Document.txt
but Expire would pass it with the spaces so the document never expires.

urlencode() on $paths would solve the root issue of backslash and other issues of upstream passing unencoded links (such as spaces)

Downside is if anyone else is using this function (fabsor has said people do) and IF they are correctly urlencode() (I am not sure they are or are not) we would be breaking their submissions with urlencode();

heddn’s picture

StatusFileSize
new855 bytes

Had some time today to rework this per #11.

cmlara’s picture

Hello heddn,

My dev lab is offline at the moment right now so I can't test this personally but I believe this looks good.

I will have to test it to mark it however.

bibo’s picture

Issue summary: View changes

This issues progress seems to have stopped over a year ago. I still think it is relevant, mostly I just worked on a patch on expire-module, where I had to do some custom escaping for regex that is sent to Varnish CLI.

I have a few comments, which might or might not be new information to you.

Expire developer Spleshka answered on June 26, 2013 at 3:45pm:

So let's leave this issue postponed until #1759040: Varnish.module doesn't properly escape backslashes in url is solved. More over, pay attention at 7.x-2.x branch.

So, they expire-module is going to react to whatever is decided here. Which seems to be to include the escaping in Varnish-module. In my opinion the best solution is quite simple: just add an admin-option with a checkbox to either do the string escaping in the varnish-module, or not. Defaulting to on or not. This would make it easy for any other modules to rely on varnish or do their own thing. And just instruct users to enable it in Varnish or not to.

Apart from the mentioned Varnish-issue ( https://www.varnish-cache.org/trac/ticket/755 ), this issue seem relevant too: https://www.varnish-cache.org/trac/ticket/900.

cmlara:

The Varnish console is a bit odd that you cant 'raw' any character by using \ direct. I will have to test if \\a means "a" or "\a" on banning to see how this is interpreted.

Above this a backslash is not even permitted as part of a URL which means it really should never have made it this far down the chain in the first place.

The #715 issue example says:

There's double-escaping going on (I'll track it down when this course is over) when it comes to purges. Example:

purge req.url ~ "\.css"
100 85
Unknown request.
Type 'help' for more info.
Syntax Error: Invalid backslash sequence

purge req.url ~ "\\.css"
200 0

I tested it myself and "purge req.url ~ "test\\.css" does indeed purge "test.css"-files, and not files named "test\.css". So, it looks weird on command line, but works.

The issue is marked as fixed in Varnish 3.0 some 3 years ago, but the double backslash is still required (which may be not well documented). Since I made my own expire drush extension to purge urls with or without regex, I've noticed that I have to triple-backslash it(!). (One extra backslash for any escape chars on command line/or string inside PHP. Oh, and 4 backslashed resulted in same (succesfull) ban as 3. Interesting:

  default git:(develop) drush expire-regex "^/test.css$"
Executed varnish cli with command: ban req.url ~ "^/test.css$"                                                                                 [status]
➜  default git:(develop) drush expire-regex "^/test\.css$"
WD varnish: Recieved status code 100 running ban req.url ~ "^/test\.css$". Full response text: Unknown request.                                [error]
Type 'help' for more info.
Syntax Error: Invalid backslash sequence

Executed varnish cli with command: ban req.url ~ "^/test\.css$"                                                                                [status]
➜  default git:(develop) drush expire-regex "^/test\\.css$"
WD varnish: Recieved status code 100 running ban req.url ~ "^/test\.css$". Full response text: Unknown request.                                [error]
Type 'help' for more info.
Syntax Error: Invalid backslash sequence

Executed varnish cli with command: ban req.url ~ "^/test\.css$"                                                                                [status]
➜  default git:(develop) drush expire-regex "^/test\\\.css$"
Executed varnish cli with command: ban req.url ~ "^/test\\.css$"                                                                               [status]
➜  default git:(develop) drush expire-regex "^/test\\\\.css$"
Executed varnish cli with command: ban req.url ~ "^/test\\.css$"                                                                               [status]

My own Varnish regex handles requests either as "urls" without regex, or with regex. The normal urls get formatted like this:
$url = '^/' . addcslashes(preg_quote($url), '\\\\"') . '$';

From what I've tested, urlencode() (on urls that are intended not to include regex) doesn't actually help with this backslash-issue. It may be useful for matching the correct url in some cases, but it's not adding the extra "\" that varnish CLI wants. At the moment, people who create Varnish cache_page-purges (for example with cache_actions), to purge urls with ".", will purge more than they expect. That is, "test.css" purges of course "test{ALMOSTANYCHARACTER}css".

Purging with regex is fine (or great even), but I for claritys sake I would prefer the Varnish module to offer both a purge-function that supports and expects regex, and a function that expects only urls without regex. The latters ban commands being automatically double-escaped, and the first used as is.

As an example, this is a bigger partition of the drush extension I wrote, which supports both approaches (based on boolean $is_pattern):

 // Make sure normal (internal) urls are not interpreted as regex.
  if(!$is_pattern){
    // More info on: http://fi2.php.net/preg_quote and http://www.php.net/manual/en/function.addcslashes.php
    // ...and in this varnish issue: https://drupal.org/node/1759040 & https://www.varnish-cache.org/trac/ticket/900
    $url = '^/' . addcslashes(preg_quote($url), '\\\\"') . '$';
  }

  // Run the command (defined in varnish-module).
  $ret = _varnish_terminal_run(array("$command req.url ~ \"$url\""));
  if(is_array($ret)){
     drupal_set_message(t("Executed varnish cli with command:") . " $command req.url ~ \"$url\"");
  }


I'm also using _varnish_terminal_run()-there directly, because I want to flush urls regardless of hostname, which is currently not possible with varnish_purge(). I'm also hoping that choosing to filter with base_url/hostname will become an admin option sooner or later.

EDIT: added some extra backslashes to code examples, because d.org PHP-filter likes to filter them out.

EDIT: I'd also like to add that addcslashes() and preg_quote() together seem to really do the trick, at least with Varnish 3, which uses PCRE regexes.

mgifford’s picture

Status: Needs review » Needs work

Patch no longer applies.

matthijs’s picture

Status: Needs work » Needs review
StatusFileSize
new5.25 KB

The attached patch should fix these issues.

matthijs’s picture

matthijs’s picture

StatusFileSize
new5.25 KB

I've updated the patch to fix an issue with wildcards in combination with query parameters.

jelle_s’s picture

StatusFileSize
new6.02 KB

Rerolled patch.

misc’s picture

Status: Needs review » Postponed (maintainer needs more info)

Thanks, but is this really needed - backslashes in URLs should not be there - seems that we are solving the problem in the wrong place.

misc’s picture

Status: Postponed (maintainer needs more info) » Closed (works as designed)

No more info in ten months. Closing the issue for now.

donquixote’s picture

Status: Closed (works as designed) » Active

We had problems with malformed urls: A space in one url (coming from menu_links table) causes the entire ban command to fail.

backslashes in URLs should not be there

We should treat these urls as untrusted data.
We need to encode/escape/sanitize in a way that does not break the command no matter what is in the urls.

donquixote’s picture

We need to encode/escape/sanitize

An alternative would be to validate the url and reject malformed/broken urls.

Or perhaps a combination. Malformed urls should be rejected, well-formed urls still need some kind of encoding or escaping for use in a cli command.

donquixote’s picture

I just checked:
There seems to be no validation in the menu link edit form to prevent spaces in the menu link path.