This error is being displayed to users when using the latest 7.x-1.x-dev release.

purge.inc can be hacked as follows to get around the issue:

function purge_urls($purge_urls) {
[...]
    $purge_url_parts = parse_url($purge_url);
    // Determine the host

-    $purge_url_host = $purge_url_parts['host'];
-    // Add portnames to the host if any are set
-    if (array_key_exists('port', $purge_url_parts)) {
-      $purge_url_host = $purge_url_host . ":" . $purge_url_parts['port'];
-    }

+    $purge_url_host = '';
+    if (isset($purge_url_parts['host'])) {
+      $purge_url_host = $purge_url_parts['host'];
+      // Add portnames to the host if any are set
+      if (array_key_exists('port', $purge_url_parts)) {
+        $purge_url_host = $purge_url_host . ":" . $purge_url_parts['port'];
+      }
+    }

[...]

      if ($method == 'purge') {
        // Make it a PURGE request (not GET or POST)
        $purge_requests[$current_purge_request]['request_method'] = 'PURGE';
        // Construct a new url
-        $proxy_url_base = $proxy_url_parts['scheme'] . "://" . $proxy_url_parts['host'];
-        if (array_key_exists('port', $proxy_url_parts)) {
-          $proxy_url_base = $proxy_url_base . ":" . $proxy_url_parts['port'];
-        }

+        $proxy_url_base = '';
+        if (isset($proxy_url_parts['host'])) {
+          $proxy_url_base = $proxy_url_parts['scheme'] . "://" . $proxy_url_parts['host'];
+          if (array_key_exists('port', $proxy_url_parts)) {
+            $proxy_url_base = $proxy_url_base . ":" . $proxy_url_parts['port'];
+          }
+        }

[...]

        $purge_requests[$current_purge_request]['purge_url'] = $proxy_url_base . $purge_path;
        // Set the host header to the sites hostname
-      $purge_requests[$current_purge_request]['headers'] = array("Host: " . $purge_url_host);

+        // $purge_url_host is initialised to '' above, so test for an empty
+        // value; isset() would always be TRUE here.
+        if ($purge_url_host === '') {
+          // Fall back to the site's own host. Assumes "global $base_url;" has
+          // been declared earlier in this function.
+          $base_url_parts = parse_url($base_url);
+          $purge_url_host = $base_url_parts['host'];
+        }
+        $purge_requests[$current_purge_request]['headers'] = array("Host: " . $purge_url_host);
      }

Errors will continue to be recorded in watchdog, but will no longer be displayed to users.

The problem might relate to the "Include base URL in expires" setting in the Content Expiration module; this is unchecked in our case. (I haven't investigated.)


Comments

anavarre’s picture

Was about to submit this issue too. Confirmed.

Rj-dupe-1’s picture

Status: Active » Needs review
SqyD’s picture

Version: 7.x-1.x-dev » 7.x-2.x-dev
Status: Needs review » Needs work

I'm sure this is related to unchecking the "Include base URL in expires" option in expire. I see two separate options:
- Add a check for that checkbox in expire and notify the user when its setting conflicts. I'll add a bug for this in the current stable branch.
- Work on #1442996: Remove expire dependency, and include something similar to your code to make sure purge can provide its own hostname in case none is provided. This will go into the 2.x branch.

SqyD’s picture

The solution to this problem that I'm implementing in the 2.x branch:
- Dropping the hard dependency on expire by inserting base_url when no host is found.
- Adding configuration options to add domains manually.
- Adding code to manipulate the option through a drupal_alter implementation.

Might take some time. Still doing this in my own time.

carteriii’s picture

Version: 7.x-2.x-dev » 7.x-1.5-rc1
Priority: Normal » Major

I too confirm the issue, and will add a few more related details. Some of this might be caused by the expires module, depending on how the URLs to purge are being handed to the purge module.

When "Include base URL in expires" is checked, the URLs are incorrectly formed when they get into purge_urls(). I have not debugged the stack above that call (to see whether this is the fault of expires), but I am seeing $purge_url values that do not include a separator between the domain and path. For example,

http://www.mydomain.comnode/114 (should be http://www.mydomain.com/node/114)
http://www.mydomain.comtest (should be http://www.mydomain.com/test)

In my second example, this causes an undefined index error on line 65 when $purge_url_parts['path'] is being referenced since the 'path' does not exist ("www.mydomain.comtest" is considered the entire domain).

Then, when not checking "Include base URL in expires" (as originally reported in this issue), the 'host' becomes missing.
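To make the two failure modes concrete, here is a small sketch of what PHP's parse_url() returns for each shape of URL. The www.example.com host is a placeholder rather than anything from the module, and the isset() guards at the end are only one possible way to avoid the notices.

```php
<?php
// Placeholder host; any domain behaves the same way.

// No separator and no other slash: everything after "http://" is the host.
$no_separator = parse_url('http://www.example.comtest');
// array('scheme' => 'http', 'host' => 'www.example.comtest'), no 'path' key.

// No separator, but a slash later on: the host swallows the first segment.
$partial = parse_url('http://www.example.comnode/114');
// 'host' is 'www.example.comnode', 'path' is '/114'.

// Base URL not included: only a path comes back.
$relative = parse_url('node/114');
// array('path' => 'node/114'), no 'host' key.

// Reading a missing key is what raises the "Undefined index" notice,
// so each access can be guarded with isset().
$host = isset($relative['host']) ? $relative['host'] : '';
$path = isset($no_separator['path']) ? $no_separator['path'] : '/';
```

With the guards in place, $host ends up as an empty string and $path as '/', and the caller can decide what to do with a missing host.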

SqyD, you mention the 2.x branch, but I don't see a 2.x dev release on drupal.org. Is that only in git? Is it something anyone should consider using at this point?

This issue would seem to prevent anyone from using this module "as is" in any situation. I see 451 sites using the module, but that would imply everyone has modified the module to get it to work, which seems unlikely.

carteriii’s picture

The lack of separation between the host and path is a known problem in the expires module:

http://drupal.org/node/1471926: Invalid expire URLs when "Include base URL in expires" is enabled.

SqyD’s picture

Priority: Major » Normal

As you have found out, these issues are caused by bugs in expire, and a solution should be implemented there. As it stands now, the Purge 1.x branch will not receive the solution I implemented in the upcoming 2.x branch, which gives purge its own routines to determine hostnames.

At this moment the 2.x branch is still nowhere near production quality, but you're free to check it out: http://drupal.org/node/1488648. I expect it to become usable in the first weeks of 2013.

For now I will keep this issue here for reference to other users encountering this.

iSoLate’s picture

Version: 7.x-1.5-rc1 » 7.x-1.x-dev
Status: Needs work » Needs review

I think this is actually an issue with the purge module itself.

"Dropping hard dependency on expire through inserting base_url when no host is found."

Why would you want to force the use of base_url? In some cases you don't want that. You might be banning/purging on a hosting ID that multiple domains point to. In that case no "host" needs to be set, because you're using a different variable/header to ban/purge on rather than a direct URL.

oskar_calvo’s picture

The last patch (#8) doesn't work with the latest version (7.x-1.x-dev, 2015-Mar-03).

iSoLate’s picture

Updated patch naming.

sjancich’s picture

This patch didn't work for me. As it turned out, I had some entities enabled that do not have URIs, for example Search API entities. That was what was causing the PHP warning in my case.

There's a note to this effect on the Varnish Tag Invalidate module page (https://www.drupal.org/project/varnish_tag_invalidate).

michel.g’s picture

In my opinion, your warning is not caused by the purge module. The purge_expire_cache function calls the purge_urls function even though your entities do not have any URIs. I think the cause lies in the Expire module, since that module calls the expire_cache hook.

ron_s’s picture

I've investigated this issue quite a bit, and I believe I have the proper patch.

This issue has to do with a conflict between the Expire and Purge modules. Expire recommends disabling the "Include base URL in expires" checkbox if using Varnish or Acquia Purge. This works fine for Expire, but causes a problem for Purge.

The issue is the use of PHP's parse_url function. When the checkbox is not selected, Expire is passing a URL fragment (for example, "node/219") rather than a full URL ("https://localhost/node/219"). The code in the Purge module expects $purge_url_parts['path'] to return a standard parse_url "path", which will include a leading slash.

However, when a fragment is passed to parse_url, the result returned has no leading slash for "path". This causes the purge request URL to have a missing slash (for example, https://localhostnode/219).
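The difference is easy to reproduce outside Drupal. The sketch below is only an illustration: $proxy_url_base is hard-coded to the localhost value from the example above, whereas the module builds it from the proxy URL.

```php
<?php
// Stand-in for the base the module builds from the proxy URL.
$proxy_url_base = 'https://localhost';

// A full URL: parse_url() returns the path with its leading slash.
$full = parse_url('https://localhost/node/219');

// A bare path, as passed when the checkbox is off: no leading slash.
$relative = parse_url('node/219');

// Plain concatenation only works in the first case.
$from_full = $proxy_url_base . $full['path'];
// 'https://localhost/node/219'
$from_relative = $proxy_url_base . $relative['path'];
// 'https://localhostnode/219'
```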

The way to fix this is to check the setting for the Expire module's "Include base URL in expires" setting, and determine if a slash needs to be added or not.

Please review the attached patch; it is working for us with 7.x-1.x-dev. Thanks.

ron_s’s picture

Sorry, just realized there is another issue with the patch in #14. Will post an update shortly.

ron_s’s picture

Ok here is an update. This should take care of it.

We need to handle both parts of $purge_url_parts: 'host' and 'path'. This ensures the purge is done correctly when using the default PURGE method, and that the URL is displayed as it should be in the headers.

Please review, thanks.

donquixote’s picture

I think the solution should be more agnostic of the expire module and global settings.
The purge_urls() function should handle any relative paths or absolute paths sent to it.

There are 3 cases:
- The purge url is a full url including the host.
- The purge url is just a path, without the slash.
- The purge url is just the path but contains a leading slash for some reason. This could happen if someone passes the 'path' part from a parse_url() to the purge_urls() function. I am not aware of any contrib module which does this, but better safe than sorry.

We can use '/' . ltrim($purge_url_parts['path'], '/') to normalize the path part.
We can use isset($purge_url_parts['host']) to check if it is a relative or absolute url.
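A minimal sketch of how those two checks could be combined to cover all three cases. The function name example_normalize_purge_url() and the $fallback_host parameter are hypothetical and not part of the Purge module. Falling back to a default host is an assumption, which may not suit setups that purge on something other than a host. Query strings are ignored for brevity.

```php
<?php
/**
 * Hypothetical helper, not part of the Purge module.
 *
 * Splits a purge URL into a host and a path with exactly one leading slash.
 * $fallback_host stands in for whatever default the caller prefers, for
 * example the host taken from Drupal's $base_url.
 */
function example_normalize_purge_url($purge_url, $fallback_host) {
  $parts = parse_url($purge_url);
  if ($parts === FALSE) {
    // parse_url() gives up on seriously malformed URLs.
    return NULL;
  }

  // An absolute URL carries its own host; a relative path does not.
  if (isset($parts['host'])) {
    $host = $parts['host'];
    if (isset($parts['port'])) {
      $host .= ':' . $parts['port'];
    }
  }
  else {
    $host = $fallback_host;
  }

  // Normalize to a single leading slash, whether or not one was passed in.
  $path = '/' . ltrim(isset($parts['path']) ? $parts['path'] : '', '/');

  return array('host' => $host, 'path' => $path);
}

// The three cases from the list above, with placeholder hosts.
$absolute = example_normalize_purge_url('http://www.example.com:8080/node/114', 'fallback.example');
$bare = example_normalize_purge_url('node/114', 'fallback.example');
$slashed = example_normalize_purge_url('/node/114', 'fallback.example');
```

All three calls yield the path '/node/114'. Only the first keeps its own host; the other two use the fallback.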