Caching, such as the block cache, filter cache, field cache, etc doesn't respect protocol (http vs https) when caching. So, you can run into a couple problems.

  1. If you don't have a valid ssl cert, the cached item was generated under ssl by someone, and a visitor hits a non-ssl version of the page the item included will not be displayed because the invalid cert has not been accepted.
  2. If a cached item was generated on a non-ssl page and is then displayed in an ssl page a ssl error will be displayed for including non-ssl assets in the page.

Code to come soon.

Comments

mfer’s picture

Priority: Normal » Major
pwolanin’s picture

Priority: Major » Normal

To clarify the problem here's an example - the filter cache may return content a rendered IMG tag (rendered, for example, by media module) that references a non-https URL if it was cached while begin viewed over http. This would lead to mixed content warnings for the page.

mfer’s picture

After working on an alternative database caching backend as an option in the interim I think I've come to the conclusion that core shouldn't attempt to automate checking if a cache should have a marker to signify https. It can work as a way around but isn't clean.

Instead, when something caches html that has absolute links in content or has the possibility of absolute links it should properly handle the cache id in a manner to reflect https if present. We should fix core to work this way, document this in the handbook/api docs, and file bugs against any modules we know violating this.

If that works as an idea I can start working on the patch.

drumm’s picture

For the media example, I think a good solution is to avoid the problem and use a schemeless URI. I haven't seen a lot of info about them, I encountered them at #1180646: Linking to google_service.js with https protocol when SSL enabled. Seems like they are good to go, except for referencing style sheets, http://stackoverflow.com/questions/2181207/is-it-safe-to-use-schemeless-....

mfer’s picture

@drumm Drupal 7 core uses an absolute URL for image style urls which is where media module gets the absolute url on images. This is a core behavior. Going schemaless would be nice and maybe something we should try getting into Drupal 8 and Drupal 7 contrib.

mfer’s picture

To add a little more detail and context, file_create_url() takes internal Drupal paths (e.g., admin/modules) - that is those that don't start with a full domain or a / - and prepends them with the $base_url. This is how CSS and JavaScript end up with a full URL when rendered.

mfer’s picture

@drumm I wrote up a description of how to implement protocol relative URLs in D7. That advice was very useful. http://engineeredweb.com/blog/11/12/protocol-relative-urls-drupal-7

dalin’s picture

HTTPS isn't the only case where this can rear its head. If for example you want to route authenticated users through a separate domain for added security (via HTTP authentication, VPN, or something else), then you are also likely to see cached data containing the private domain. This can lead to random HTTP authentication popups, or just plain missing content.

Rob C’s picture

Confirming #8 Images uploaded via SSL are NOT visible on the Non-SSL site, link to the images is https://site/image.png while the request is done over http://site/image.png. (and we run on a self-signed certificate... boem... error) So, let's do some heavy url rewriting for now...

seanburlington’s picture

I'm running into this both for the http/https issue

But also because I'm have a cluster of servers, and depending on whether I want to access a single server or a load balancer I use different hostnames - but then I'm seeing these stored in the cache

I really want to just have relative urls in the page

effulgentsia’s picture

Marked #1032210: drupal_render_cid_parts() should add http vs. https protocol to cache keys a duplicate, so bringing over its tags to here.

I think we have these questions to figure out:
- For D8, should we switch to protocol relative URLs for everything except CSS?
- If yes to above, do we want to by default add protocol to the keys for caches mentioned in the issue summary anyway, so that we work with protocol absolute URLs added by contrib/custom code?
- If yes to above, do we want a way to easily toggle that behavior so that sites that stick to protocol relative URLs can get more cache reuse?

chemical’s picture

As stated in #5:

Going schemaless would be nice and maybe something we should try getting into Drupal 8 and Drupal 7 contrib.

is definitely the right way to do it.

#6 is spot on that this is an issue in file_create_url(). So it isn't a cache system issue but more a file system issue.

I have a patch made for D7 (see #1633558: file_create_url() changes relative URL to absolute URL for shipped files) which I am rewriting for D8 at the moment (at DrupalCon Munich).

chemical’s picture

Status: Active » Needs review
StatusFileSize
new2.77 KB

This patch change file_create_url() so schemeless urls are kept schemeless by expanding a path relative to the Drupal installation to absolute path of the web server.

Status: Needs review » Needs work
chemical’s picture

Status: Needs work » Needs review
StatusFileSize
new4.77 KB

Updated patch to fix the 3 failed tests by using $GLOBALS['base_path'] instead of $GLOBALS['base_url'] . '/' and creating an absolute url for curl from a path (relative url) to reflect change of file_create_url() which no longer will add scheme to schemeless urls.

effulgentsia’s picture

Status: Needs review » Needs work
+++ b/core/includes/file.inc
@@ -460,9 +460,12 @@ function file_create_url($uri) {
+      // with the base path prepended. Base path is used because there should be
+      // no need to make a relative URI absolute which the use of base URL
+      // would do.
+      return $GLOBALS['base_path'] . drupal_encode_path($uri);

I'm not sure about this. This would fail to include the host name, so wouldn't be usable in an email or anywhere else absolute URLs are needed.

chemical’s picture

@effulgentsia you are absolutely right that the use file_create_url() wouldn't work where absolute URLs are required when the function is used on schemeless URLs (paths).

The original first part of file_create_url():

function file_create_url($uri) {
  // Allow the URI to be altered, e.g. to serve a file from a CDN or static
  // file server.
  drupal_alter('file_url', $uri);

  $scheme = file_uri_scheme($uri);

  if (!$scheme) {
    // Allow for:
    // - root-relative URIs (e.g. /foo.jpg in http://example.com/foo.jpg)
    // - protocol-relative URIs (e.g. //bar.jpg, which is expanded to
    //   http://example.com/bar.jpg by the browser when viewing a page over
    //   HTTP and to https://example.com/bar.jpg when viewing a HTTPS page)
    // Both types of relative URIs are characterized by a leading slash, hence
    // we can use a single check.
    if (drupal_substr($uri, 0, 1) == '/') {
      return $uri;
    }
    else {
      // If this is not a properly formatted stream, then it is a shipped file.
      // Therefore, return the urlencoded URI with the base URL prepended.
      return $GLOBALS['base_url'] . '/' . drupal_encode_path($uri);
    }
  }

Either all schemeless URLs passed to file_create_url() should be made absolute to ensure that they can be used in emails which would make caching very difficult to implement correctly or you should use the function url() to generate absolute URLs for emails which will give you complete control over how that resulting URLs look.

If schemeless URLs must be made absolute to make them useful in emails which protocol should added (http or https) and how to determine that—at the moment this depends on whether the action creating the email is done through https or not.

The real question is: When does Drupal Core need to generate absolute URLs for files inside the Drupal installation?

  • Not for javascript files
  • Not for files containing CSS
  • Not for image files
  • Not for emails because Core only sends emails in plain text
chemical’s picture

Updated patch with some documentation about the return value.

I am still convinced that the api of file_create_url() is broken and schemeless paths should not have a scheme added. The behavior is a reminiscence from D6 where all file path are expanded to absolute urls. Further more the documentation of the function does not describe whether or not paths for shipped files are changed.

For a backport to D7 a variable could be used for switching between behavior. If the variable is set file_create_url() behaves as the old broken api if actually needed by any module.

chemical’s picture

Strange the same patch has been attached twice!?

liam morland’s picture

I have been working on a similar patch in #1494670: References to CSS, JS, and similar files should be root-relative URLs: avoids mixed content warnings & fewer bytes to send. Mine adds a parameter to file_create_url() allowing one to control whether or not an absolute URL is generated.

liam morland’s picture

Status: Needs review » Needs work
liam morland’s picture

Status: Needs work » Needs review
StatusFileSize
new6.95 KB

Status: Needs review » Needs work
liam morland’s picture

Status: Needs work » Needs review
StatusFileSize
new6.97 KB
JeremyFrench’s picture

I think in general schemaless is ok, but it won't solve the issue if you need to server https content from a different domain than you do http. As could be the case if you were using a CDN, which didn't support SSL.

liam morland’s picture

liam morland’s picture

Status: Needs review » Needs work
liam morland’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
idebr’s picture

Status: Needs work » Needs review
StatusFileSize
new4.67 KB

Straight reroll.

Status: Needs review » Needs work
idebr’s picture

Status: Needs work » Needs review
StatusFileSize
new1.75 KB
new6.41 KB

I updated the tests added in #2276203: CSS Aggregation breaks URLs with Query String to assert a relative url.

Status: Needs review » Needs work
deepakaryan1988’s picture

Assigned: Unassigned » deepakaryan1988
Issue tags: +india, +SprintWeekend2015
deepakaryan1988’s picture

StatusFileSize
new3.49 KB

Re-rolling the patch

deepakaryan1988’s picture

Assigned: deepakaryan1988 » Unassigned
deepakaryan1988’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 39: re-rolling-patch#35-1377840-39.patch, failed testing.

deepakaryan1988’s picture

Issue tags: -india, -SprintWeekend2015

Removing sprint weekend tag!!
As suggested by @YesCT

mfb’s picture

Priority: Normal » Major

This honestly seems like a major bug for sites that are available on both HTTP and HTTPS. For example, the logo in the branding block could use an HTTP URL on HTTPS pages, which causes a mixed content warning.

mfb’s picture

I didn't yet have a chance to read thru all of the above discussion, but I think adding url.site to the required_cache_contexts would be a reasonable solution for the problem I'm seeing: required_cache_contexts: ['languages:language_interface', 'theme', 'user.permissions', 'url.site']

dawehner’s picture

Interesting. I'm wondering whether we could switch to protocol relative URLs everywhere. This would allow us to not care about HTTP/HTTPs ... but yeah in reality url.site doesn't add a incredible big amount of additional cache entries, especially when in the future HTTPs will be the de facto standard

liam morland’s picture

Even better than protocol-relative URLs would be server-relative. There is a patch for that in #1494670: References to CSS, JS, and similar files should be root-relative URLs: avoids mixed content warnings & fewer bytes to send.

mfb’s picture

Status: Needs work » Needs review
StatusFileSize
new2.24 KB

Let's see if this passes tests.

berdir’s picture

It will certainly not pass since tons of tests are looking for cache contexts in various places.

Can't we instead make sure that we add that cache context where we don't already? I know that every absolute url generated through the UrlGenerator does add this, so those should be fine.

Status: Needs review » Needs work

The last submitted patch, 48: add_url.site_to_required_cache_contexts-1377840-48.patch, failed testing.

mfb’s picture

46 tests to be exact. That sounds like a reasonable approach, if we can easily add cache context everywhere that the base URL is used.

webchick’s picture

This got brought up in the context of #2606918: [securelogin] Secure Login so calling this a contrib project blocker.

wim leers’s picture

Issue tags: +HTTP/2

I didn't yet have a chance to read thru all of the above discussion, but I think adding url.site to the required_cache_contexts

-1. We should only do that if absolutely necessary.

Can't we instead make sure that we add that cache context where we don't already? I know that every absolute url generated through the UrlGenerator does add this, so those should be fine.

+1.

This will also be a problem for HTTP/2 support.

wim leers’s picture

This issue is equivalent with #1494670: References to CSS, JS, and similar files should be root-relative URLs: avoids mixed content warnings & fewer bytes to send. This issue is older, so strictly speaking we should be solving it here. But, #1494670: References to CSS, JS, and similar files should be root-relative URLs: avoids mixed content warnings & fewer bytes to send was much further along with a fix. So, closing this issue in favor of that one. A patch that is ready for final review is awaiting you there. Hopefully see you there!