I had issues with Drupal's caching of input filter results on a site that is accessible, behind a reverse proxy, through several different URLs:
- external users: http://domain.tld/sitename1 and http://domain.tld/sitename2
- internal users: https://internaldomain.tld/sitename3

I was hoping to use Pathologic to prevent incorrect URLs in content, but because content is cached after it has been filtered once, the URLs in it match whichever URL was used to access the node the first time (e.g. http://domain.tld/sitename2/mylink). I know hook_filter() lets a filter disable caching through its "no cache" operation. Would it be possible to add that as an option, with something like this in pathologic_filter():

  elseif ($op === 'no cache') {
    // TRUE disables the filter cache for any format that uses Pathologic.
    return variable_get('pathologic_no_cache', FALSE);
  }
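
To make the failure mode concrete, here is a small, self-contained PHP sketch (plain PHP, not Drupal code). It assumes a hypothetical rewrite_links() filter that turns "[link:path]" tokens into absolute URLs built from the current request's base URL, and a hypothetical render() helper whose cache is keyed only on the format and the text, which is roughly how, as far as I can tell, Drupal 6's check_markup() builds its cache ID. The second visitor, arriving through a different base URL, gets the first visitor's links.

```php
<?php
// Hypothetical filter: turn "[link:path]" tokens into absolute URLs
// built from the base URL of the *current* request.
function rewrite_links(string $text, string $base_url): string {
    return preg_replace_callback(
        '/\[link:([^\]]+)\]/',
        fn(array $m): string => $base_url . '/' . $m[1],
        $text
    );
}

// Hypothetical renderer with a naive cache keyed on format + text only:
// the request's base URL plays no part in the cache ID.
function render(string $text, int $format, string $base_url, array &$cache): string {
    $cache_id = $format . ':' . md5($text);
    if (isset($cache[$cache_id])) {
        return $cache[$cache_id];          // Cache hit: filters are skipped.
    }
    $filtered = rewrite_links($text, $base_url);
    $cache[$cache_id] = $filtered;         // Cache miss: filter, then store.
    return $filtered;
}

$cache = [];
$text = 'See [link:mylink].';

// The first visitor comes in through sitename2 and primes the cache...
$first = render($text, 1, 'http://domain.tld/sitename2', $cache);
// ...so an internal visitor is served sitename2 links as well.
$second = render($text, 1, 'https://internaldomain.tld/sitename3', $cache);

echo $first, "\n";   // See http://domain.tld/sitename2/mylink.
echo $second, "\n";  // See http://domain.tld/sitename2/mylink.
```

Whichever base URL filters a piece of text first is what every later visitor sees, until that cache entry is cleared.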

Comment | File | Size | Author
#1 | pathologic_no_cache-0.patch | 1.47 KB | mdupont

Comments

mdupont

Status | File | Size
new | | 1.47 KB

Quick patch attached.

Garrett Albright

I don't see how your patch changes Pathologic's or the filter system's behavior… Maybe you could spell it out for me?

mdupont

Sure. According to the API documentation, hook_filter() accepts 'no cache' as a value of its $op parameter. When a filter returns TRUE for it, caching is disabled for the text format that uses the filter, so the filters run against the content on every request. You can find an example implementation at http://api.drupal.org/api/drupal/developer--examples--filter_example--fi...
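
A rough model of that behavior, written as plain PHP rather than Drupal API calls: a hypothetical render() helper consults a per-format $allow_cache flag (standing in for whatever the 'no cache' operation decided) and, when caching is off, runs the filter on every request. The hypothetical url_filter() and its run counter exist only to make the difference visible.

```php
<?php
// Hypothetical filter that depends on the current request's base URL.
// $runs counts how many times the filter actually executes.
function url_filter(string $text, string $base_url, int &$runs): string {
    $runs++;
    return str_replace('[base]', $base_url, $text);
}

// Hypothetical renderer: the cache is used only if the format allows it,
// i.e. no filter in the format answered TRUE to the 'no cache' operation.
function render(string $text, string $base_url, bool $allow_cache, array &$cache, int &$runs): string {
    $cache_id = md5($text);
    if ($allow_cache && isset($cache[$cache_id])) {
        return $cache[$cache_id];
    }
    $out = url_filter($text, $base_url, $runs);
    if ($allow_cache) {
        $cache[$cache_id] = $out;
    }
    return $out;
}

$text = '<a href="[base]/mylink">link</a>';
$bases = ['http://domain.tld/sitename1', 'https://internaldomain.tld/sitename3'];
$results = [];

// Serve the same text to both base URLs, once with caching, once without.
foreach ([true, false] as $allow_cache) {
    $cache = [];
    $runs = 0;
    $outputs = [];
    foreach ($bases as $base) {
        $outputs[] = render($text, $base, $allow_cache, $cache, $runs);
    }
    $results[$allow_cache ? 'cached' : 'no cache'] = ['runs' => $runs, 'outputs' => $outputs];
}
```

With caching the filter runs once and the second visitor gets sitename1 links; without it the filter runs per request and each visitor gets links for the URL they used, at the cost of filtering every time.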

mdupont

Status: Active » Needs review

Garrett Albright

Okay, I get it. I was actually unfamiliar with the "no cache" $op value…

But have you tried unchecking the "Output full absolute URLs" option (and then clearing your site cache)? I think that might solve the problem you outline in the original post. Give it a try and let me know if it works for you; if not, I'll consider this patch further (probably with some more dire text warning people that they should not enable it unless they know what they're doing).

mdupont

Actually, the examples I gave in the original post weren't quite right: depending on the domain name used to access the site, the site's path under that domain is not the same either. The setup is more like this: you can reach the site at http://subdomain1.domain1.tld, at http://internaldomain.tld/subfolder1, or at http://internaldomain.tld/subfolder2. Relative URLs don't work either, because the subfolder is not always the same.
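
The point about relative URLs can be illustrated with a hypothetical make_link() helper (an assumption for illustration, not Pathologic's code) that builds either an absolute URL or a root-relative path from the entry point the visitor used; the $absolute flag loosely mimics the "Output full absolute URLs" setting, and node/5 is an arbitrary example path. Both forms embed the subfolder, so whichever one ends up cached is wrong for visitors coming through another subfolder.

```php
<?php
// Hypothetical link builder. $entry is the URL the visitor used to reach
// the site; $absolute loosely mimics "Output full absolute URLs".
function make_link(string $entry, string $path, bool $absolute): string {
    $parts = parse_url($entry);
    // Base path: the subfolder (if any) followed by a trailing slash.
    $base_path = rtrim($parts['path'] ?? '', '/') . '/';
    if ($absolute) {
        return $parts['scheme'] . '://' . $parts['host'] . $base_path . $path;
    }
    return $base_path . $path;
}

$entries = [
    'http://subdomain1.domain1.tld',
    'http://internaldomain.tld/subfolder1',
    'http://internaldomain.tld/subfolder2',
];

// Print both link forms for each entry point.
foreach ($entries as $entry) {
    printf("%-38s absolute: %-45s relative: %s\n",
        $entry, make_link($entry, 'node/5', true), make_link($entry, 'node/5', false));
}
```

The root-relative form only drops the scheme and host; the subfolder is still baked in, which is why unchecking the absolute-URL option does not help in this setup.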

Garrett Albright

I opted not to include this code in the new release I just made. I think it could be there in the future, though. When I can find the time, I'll set up my local dev machine to be accessible from many URLs like that and do some testing.

mdupont

OK, so let's keep it in "Needs review" for now. Thanks for taking the time to look into this issue, even though it's a corner case.

apemantus

As an update/datapoint: I'm using Domain Access and i18n together.

Use case: if I have example.com (in English) and example.cn (in Chinese, with English as a second language under a path prefix), then on example.com/foo I want to link to bar, and on example.cn/en/foo I want to link to en/bar, foo and bar being the same nodes shared across both domains by Domain Access.

Because of caching, which URL gets generated depends on which site is hit first.

In an ideal world, caching would work separately for each domain, but I have no idea whether that is even possible. As it is, turning off the cache sounds like a decent compromise (I'm using Boost to cache pages, since traffic is anonymous apart from admins, so I'm not too worried about the performance hit at this stage).
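
One way to picture the "ideal world" option is a cache ID that depends on the request context as well as the text. The sketch below is hypothetical and, as far as I know, not something Drupal 6's filter cache does: a hypothetical context_cache_id() mixes the host and language prefix into the key, and a hypothetical prefix_links() filter adds the prefix, so each domain gets its own cached copy while caching stays on.

```php
<?php
// Hypothetical filter: prefix internal links with the language prefix
// used on the current domain ('' on example.com, 'en/' on example.cn).
function prefix_links(string $text, string $prefix): string {
    return str_replace('href="/', 'href="/' . $prefix, $text);
}

// Hypothetical cache ID that includes the request context, so the same
// text filtered for two domains is stored under two different keys.
function context_cache_id(int $format, string $context, string $text): string {
    return $format . ':' . md5($context) . ':' . md5($text);
}

// Hypothetical renderer: filter on a miss, otherwise serve the cached copy.
function render(string $text, int $format, string $host, string $prefix, array &$cache): string {
    $cache_id = context_cache_id($format, $host . '/' . $prefix, $text);
    if (!isset($cache[$cache_id])) {
        $cache[$cache_id] = prefix_links($text, $prefix);
    }
    return $cache[$cache_id];
}

$cache = [];
$text = '<a href="/bar">bar</a>';
$com = render($text, 1, 'example.com', '', $cache);
$cn = render($text, 1, 'example.cn', 'en/', $cache);
$com_again = render($text, 1, 'example.com', '', $cache);  // served from cache
```

The trade-off is one cache entry per piece of text per context, so the cache grows with the number of domains and prefixes.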

I'll set up a test installation and try this patch out.

apemantus

I've only run a quick test, and I'm not sure this patch fully works: right after I applied it, it had no effect.

Uninstalling the module, reinstalling it and then saving the format did make it work. However, it looks like (I could be wrong) $op === 'no cache' is only evaluated once, either when the format is saved or when the filter is installed (I'm not sure which), so you can't toggle it on and off. I could easily be wrong about this, as I've already spent a fair amount of time uninstalling, reinstalling and re-saving on my slow local machine.
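
If I read Drupal 6's filter module correctly, that matches the observed behavior: the 'no cache' answers are collected when the input format is saved on its configuration page and stored as a single cache flag for the format, instead of being asked on every request. The sketch below models only that timing; the TextFormat class and the $settings array (standing in for variable_get()) are hypothetical, not Drupal APIs.

```php
<?php
// Hypothetical model: a text format whose cacheability is computed once,
// when the format is saved, and then stored (not re-asked per request).
class TextFormat {
    private $noCache;              // callable: the filter's 'no cache' answer
    private bool $cacheable = true;

    public function __construct(callable $noCache) {
        $this->noCache = $noCache;
    }

    // Asks the filter once and stores the result, as on the format's save.
    public function save(): void {
        $this->cacheable = !($this->noCache)();
    }

    // What each request sees: the stored flag, not a fresh answer.
    public function isCacheable(): bool {
        return $this->cacheable;
    }
}

$settings = ['pathologic_no_cache' => false];  // hypothetical stand-in for variable_get()
// Capture $settings by reference so later changes are visible to the filter.
$format = new TextFormat(function () use (&$settings) {
    return $settings['pathologic_no_cache'];
});
$format->save();
$before = $format->isCacheable();          // true: caching allowed

$settings['pathologic_no_cache'] = true;   // the admin turns the option on...
$stale = $format->isCacheable();           // ...still true: flag not recomputed

$format->save();                           // re-saving the format
$after = $format->isCacheable();           // false: caching now disabled
```

Under this model the option can be toggled, but a change only takes effect once the format is saved again; text already sitting in the filter cache may also keep being served until that cache is cleared.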

Maybe you can handle the issue this way, but we (people who want this feature) may end up having to hardcode a hack into every release of Pathologic.

apemantus

Another update: it looks like if one input filter in a format requests "no cache", then none of the other filters in that format are cached either (which I guess makes sense). So rather than hack Pathologic in its current state, I'll create a custom filter to be enabled alongside it that does nothing except return TRUE for "no cache".
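
The "one filter disables caching for the whole format" behavior can be sketched as follows. The two functions are hypothetical stand-ins written loosely in the hook_filter($op, ...) style: pathologic_like_filter() never asks to skip the cache, while nocache_filter() answers TRUE to 'no cache' and otherwise passes text through. format_allows_cache() is an illustrative helper, not a Drupal function.

```php
<?php
// Hypothetical stand-in for Pathologic: never asks to bypass the cache.
function pathologic_like_filter(string $op, string $text = '') {
    switch ($op) {
        case 'no cache':
            return false;
        case 'process':
            return $text;  // real link rewriting omitted
        default:
            return null;
    }
}

// Hypothetical dummy filter: its only job is to answer TRUE to 'no cache'.
function nocache_filter(string $op, string $text = '') {
    switch ($op) {
        case 'no cache':
            return true;
        case 'process':
            return $text;  // passes the text through untouched
        default:
            return null;
    }
}

// A format is cacheable only if none of its enabled filters opts out.
function format_allows_cache(array $filters): bool {
    foreach ($filters as $filter) {
        if ($filter('no cache')) {
            return false;
        }
    }
    return true;
}

$alone = format_allows_cache(['pathologic_like_filter']);
$with_dummy = format_allows_cache(['pathologic_like_filter', 'nocache_filter']);
```

This is why enabling a separate dummy filter in the same format is enough, without patching Pathologic; the cost is that every filter in that format then runs on every request.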

mdupont

Makes sense, maybe we could create such a dummy filter module.

apemantus

If it's of any use to anybody, this is what I stuck in my custom module:

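// Drupal 6 hook_filter() for a module named "example": the filter leaves
// text untouched and only disables caching for formats it is enabled in.
function example_filter($op, $delta = 0, $format = -1, $text = '', $cache_id = 0) {
  switch ($op) {
    case 'list':
      return array(0 => t('No cache filter'));
    case 'description':
      return t('Prevents input filter results from being cached; used alongside Pathologic to help multisite links.');
    case 'no cache':
      // TRUE disables the filter cache for the whole format.
      return TRUE;
    case 'prepare':
    case 'process':
      // Pass the text through unchanged.
      return $text;
    default:
      return NULL;
  }
}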

mdupont

Status: Needs review » Fixed

Marking this issue fixed, as there is a solution and it is not specific to Pathologic.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

mdupont

A small utility module that does this is available at http://drupal.org/project/no_cache