Hi, would it be possible to implement a wildcard when defining additional paths to be considered local?
We use tons of subdomains, and it is painful to enter them all in the path box and keep them updated.

Instead of writing
http://sub0.example.com
http://sub1.example.com
http://sub2.example.com
http://sub3.example.com

writing
http://*.example.com

Or just
*.example.com

would be great
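
A minimal sketch of what such a wildcard could mean in practice, assuming the entry is turned into a regular expression. The function name `example_wildcard_to_regex` is hypothetical and this is not Pathologic's actual code; it treats `*` as exactly one host label.

```php
<?php
// Hypothetical helper (not part of Pathologic): turn a local-path entry
// that may contain "*" into an anchored regular expression.
function example_wildcard_to_regex($local_path) {
  // Escape every regex metacharacter, including the "~" delimiter.
  $quoted = preg_quote($local_path, '~');
  // preg_quote() turned "*" into "\*"; replace it with "one or more
  // characters that are neither a dot nor a slash" (a single host label).
  $pattern = str_replace('\\*', '[^./]+', $quoted);
  // The lookahead stops "example.com" from matching "example.com.evil.org".
  return '~^' . $pattern . '(?=[/?#]|$)~i';
}

$regex = example_wildcard_to_regex('http://*.example.com');
var_dump(preg_match($regex, 'http://sub1.example.com/node/1') === 1); // bool(true)
var_dump(preg_match($regex, 'http://example.org/node/1') === 1);      // bool(false)
```

Under this assumption one entry covers every subdomain, but http and https would still need separate entries.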

THANKS, GREAT MODULE!

Comments

Garrett Albright

An interesting idea, but I'm not sure I see the practical application. Are you really sharing content among several subdomains like that?

An option to allow absolute paths to be output without the server fragment may help here if you're only concerned about web output (as opposed to feeds).

jm.federico

Hi Garrett

We do have a complex setup: multi-site with one domain and multiple subdomains, shared content, a different template per subdomain, secure sites, pff, it is a long list. We are using absolute paths with no server fragment, but the problem is that when content gets created, people sometimes manage to put in the full path (with subdomain and everything*), and those links are not being converted by Pathologic. So some links redirect visitors from one subdomain to another and the whole site changes.

*I know how they accomplish that: when they are linking to internal pages they just copy/paste from the address bar, and even though we've asked them not to do that, or at least to remove the server fragment, surprise, they still do it.

Finally I gave up. I figure I need to find a solution, which in this case is writing the subdomains as local paths, but it is a looooong list and I have to add the http and https versions for all of them.

That's why a wildcard would just make my life easier.

I took a look at the code, but even though I'm OK with PHP, regexes are a different story, and I just don't get them.

Anyway, not a matter of life or death, but it would be nice to have it there.

Cheers ;)

dkruglyak

Category: feature » task

This feature is really needed. We have a very similar setup and are facing similar configuration headaches.

I also strongly suggest breaking out "paths to be considered local" configuration into separate domain and path components. If you have X domains and Y possible paths under those domains right now you have to manually enter X*Y combinations!

We ran into this problem with combinations because we had the http://domain.com line precede http://domain.com/extra-path. Because the regex parsing happens only once, and only the first match is applied, we ended up with incorrect /extra-path links.

So right now even the *order* of "paths to be considered local" is significant and it should not really be. Regex should make two passes, first to match domains ONLY, second to match sub-paths ONLY...
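
One way the order dependence could be removed, sketched under the assumption that local paths are treated as plain prefixes: try longer (more specific) entries before shorter ones. The function name `example_strip_local_prefix` is hypothetical, and this is not how Pathologic itself matches paths.

```php
<?php
// Hypothetical helper: strip the first matching local prefix from a URL,
// trying longer prefixes first so the configured order no longer matters.
function example_strip_local_prefix($url, array $local_paths) {
  // Sort by string length, longest first.
  usort($local_paths, function ($a, $b) {
    return strlen($b) - strlen($a);
  });
  foreach ($local_paths as $prefix) {
    if (strpos($url, $prefix) === 0) {
      // Drop the prefix and any slash left at the start of the remainder.
      return ltrim(substr($url, strlen($prefix)), '/');
    }
  }
  // No local prefix matched; leave the URL untouched.
  return $url;
}

// Same entries as in the problem described above, in the "wrong" order.
$paths = array('http://domain.com', 'http://domain.com/extra-path');
echo example_strip_local_prefix('http://domain.com/extra-path/node/1', $paths), "\n"; // node/1
```

With the entries in the original order and no sorting, the same URL would come out as extra-path/node/1, which is the incorrect link described above.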

Garrett Albright

Hmm.

"and I have to add the http and https versions for all of them."

In case you weren't aware, I recommend against using Pathologic on sites where HTTP and HTTPS are mixed. There may be a solution to this in the future, but it will be a trade-off.

"I also strongly suggest breaking out "paths to be considered local" configuration into separate domain and path components. If you have X domains and Y possible paths under those domains right now you have to manually enter X*Y combinations!"

So are you saying your sites look something like:

http://server1.xyz/foo/
http://server1.xyz/bar/
http://server2.xyz/foo/
http://server2.xyz/bar/

Frankly, I'm not going to take your idea of using two path fields, as I think that's just complicating things for this particular edge case. But perhaps permitting asterisks in this field is not a bad idea.

dkruglyak

@Garrett: You are correct, this is exactly what our sites look like. Even if you do not want to go with two path fields, just allowing wildcards will go a long way toward solving the problem we are having.

It would also be incredibly helpful to expose Pathologic's services so they can be called from other modules, not just as an input filter. This is immediately necessary in #477332: Skip root-relative and protocol-relative URLs and support the Pathologic module. I have not yet rolled the patch for that module, but right now I basically have to copy/paste the code from Pathologic, ending up with something really ugly like this:

  // If Pathologic is enabled, use its settings to perform similar processing on the path.
  if (module_exists('pathologic')) {

    // Retrieve the configured local paths (hard-coded to format 1, Filtered HTML).
    $paths = array();
    $directive = trim(variable_get('filter_pathologic_abs_paths_1', ''));
    if ($directive !== '') {
      $paths = array_map('trim', explode("\n", $directive)); // strip whitespace from each line
    }
    $paths[] = _pathologic_url('<front>');
    $paths = array_unique($paths);

    // Build the regexp, match twice (with or without a leading slash), then trim slashes.
    // Each path is escaped with preg_quote() so dots in host names match literally.
    $path_suffix = '/?(index\.php)?(\?q=)?'; // the optional suffix to match
    $quoted_paths = array();
    foreach ($paths as $local_path) {
      $quoted_paths[] = preg_quote($local_path, '#') . $path_suffix;
    }
    $path_regexp = '#(' . implode('|', $quoted_paths) . ')#';
    $original_path = $path;
    $path = preg_replace($path_regexp, '', $path);
    $path = preg_replace($path_regexp, '', '/' . $path);
    $path = trim($path, '\\/');
  }

I hope you could instead reorganize the module to make it easy to call from outside.

P.S. There should at least be documentation explaining that the order of configured paths matters. It took me some trial and debugging to find out that this was the problem.

Garrett Albright

Okay, I committed asterisk support to the D7 branch of Pathologic. It turned out to be suspiciously easy… so it probably introduced some bugs or broke some edge cases or something. If you have a D7 install, please give it a try.

This doesn't help the D6 branch, but the D7 branch is pretty much a complete rewrite, so a simple backport won't be possible. That being said, it would probably be good to just backport the entire D7 branch as a new D6 release one of these days. Maybe even tonight… but I'm done coding for the moment.

dkruglyak

What do you mean "simple backport won't be possible" ??? D7 is not an option for most people for at least a year.

Seems like the backport should be a simple matter of moving the regex. Right?

Garrett Albright

No, because as I said, the D7 port is a complete rewrite. There is almost no code shared between the two. Part of this is due to the inherent differences between how D6 and D7 handle input filters, and part of it is due to challenging myself to find a more efficient way to do what Pathologic does.

Garrett Albright

Version: 6.x-2.0-beta23 » 6.x-3.x-dev

Okay, I've backported the 7.x-1.x branch into a 6.x-3.x branch which contains this fix. If you're feeling up to it, please back up your database and try upgrading to 6.x-3.x (it should correctly update your settings). Thanks.

dkruglyak

Awesome, thanks. Did you do at least some basic testing? How ready is the 6.x-3.x branch in general?

jm.federico

@Garrett Albright
Thanks for that.

@dkruglyak
will test and report back.

Garrett Albright

Status: Active » Fixed

dkruglyak

Title: wild card for Additional paths » Wildcards & URL rewriting hooks
Status: Fixed » Needs work
Attached file: status new, size 1.17 KB

I thought I'd reopen this issue and post here, given that I commented on integration with CDN module in #5.

The Pathologic module should not only process HREF / SRC attributes as an input filter, but also provide rewriting / streamlining services for any URL generated in Drupal. The immediate use case for this is integration with the CDN module, which can be accomplished by adding one simple hook, pathologic_file_url_alter.

Initial patch is attached, though I should say I only implemented the bare minimum regex that does not take care of wildcards and might be missing some other use cases (e.g. non-clean URLs). Hopefully this patch can be adopted here, corrected as needed and ultimately integrated into the module.

Garrett Albright

That is really a separate issue, though. It's also sort of outside the scope of what Pathologic as an input filter can do, as well. Note that you've hard-coded the input format ID in order to load the variable with the paths in it ('filter_pathologic_local_paths_1' - in real-world use, that 1 at the end could be hundreds of different integers). This hook implementation doesn't provide any context as to which input format is in use - understandable, as under normal cases, it probably doesn't matter.

Perhaps this could work if there were an admin interface so admins could select which input format to pretend is in use when this hook is called.

But all that being said, it's going to be impossible for me to test this, as I don't use CDN myself. Perhaps you could consider spinning that patch off into your own module which depends on Pathologic and upload it to Drupal.org?

dkruglyak

Correct, this could be a separate issue - I posted it here temporarily to start discussion about possible solutions.

The real question is whether we should view the Pathologic module as just an input filter or as an all-purpose module that ensures all URLs cranked out by Drupal are properly localized. Right now these URLs either reside in content (which is why an input filter is a good start) or are generated as links.

I believe it would make great sense to share the core config and regex functionality between input filters and basic URL choppers. My patch here simply takes a first crack at solving this in a way I could use right away. Definitely, we could have a different config for this instead of hardcoding it to the default filter. This, by the way, raises the question of whether Pathologic should duplicate URL processing rules for different filters or take that configuration outside.

What I would like to do above all is to have a shared regex for processing URLs, whether they are found in content or passed on a standalone basis. Unfortunately, with the current code structure there was no clean way to call your existing code. Presumably, though, I could use a URL I want to localize to generate a "fake" HTML mask, like <a href="myurl"></a>, to pass to your existing code, though maybe there could be a more organic API/solution...
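
A rough sketch of the "fake HTML mask" idea, for illustration only. Both function names are hypothetical, and `example_fake_filter` is just a stand-in for whatever callback actually filters the HTML; it is not Pathologic's API.

```php
<?php
// Hypothetical adapter: reuse an HTML-only filter callback for a single URL
// by wrapping the URL in a throwaway anchor tag and reading the href back.
function example_localize_url($url, $html_filter) {
  $html = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '"></a>';
  $filtered = call_user_func($html_filter, $html);
  if (preg_match('~<a href="([^"]*)"~', $filtered, $matches)) {
    // Undo the entity encoding applied when building the fake markup.
    return html_entity_decode($matches[1], ENT_QUOTES, 'UTF-8');
  }
  // Fall back to the untouched URL if the filter changed the markup shape.
  return $url;
}

// Stand-in for the real filter: strips one hard-coded host.
function example_fake_filter($html) {
  return str_replace('http://sub1.example.com/', '/', $html);
}

echo example_localize_url('http://sub1.example.com/node/1?a=1&b=2', 'example_fake_filter'), "\n";
// /node/1?a=1&b=2
```

The round trip through markup works, but it depends on the filter leaving the anchor tag in a recognizable shape, which is one reason a direct API would be cleaner.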

As far as the hook/function being added, I believe it will be present in core D7, put there specifically to accommodate CDN requirements... Adding this chunk of code to Pathologic in D6 should not imperil its existing functionality as input filter. I might just suggest restructuring it a little to better manage configs and regex code.

So what do you recommend we do for the ultimate solution here?

Garrett Albright

Title: Wildcards & URL rewriting hooks » wild card for Additional paths
Status: Needs work » Fixed

Branching off to another issue: #820910: Non-filter URL rewriting

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

leon kessler

Version: 6.x-3.x-dev » 8.x-1.x-dev
Category: Task » Feature request
Attached file: status new, size 2.84 KB

It seems as though this feature got lost somehow on the journey to 8/9.

This is something we needed for our project, for the same reasons as those who posted here 12 years ago (can't quite believe how old some of this stuff on Drupal is).

I've updated the ticket to the 8.x-1.x version, although I'm unable to reopen it (as only maintainers can do this).

The attached patch uses fnmatch to accommodate wildcards. Although fnmatch is specifically designed for filename matching, I think it works just as well in this case (and is less confusing to use than regex).
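
For readers unfamiliar with fnmatch, a small sketch of how shell-style matching could be applied to URLs. This is not the attached patch; `example_is_local` is a hypothetical name, and appending `*` to turn each entry into a prefix match is an assumption.

```php
<?php
// Hypothetical check (not the attached patch): is $url covered by any of
// the configured local-path patterns, using shell-style wildcards?
function example_is_local($url, array $patterns) {
  foreach ($patterns as $pattern) {
    // Append "*" so each entry acts as a prefix match on the full URL.
    if (fnmatch($pattern . '*', $url)) {
      return TRUE;
    }
  }
  return FALSE;
}

$patterns = array('http://*.example.com');
var_dump(example_is_local('http://sub1.example.com/node/1', $patterns)); // bool(true)
var_dump(example_is_local('http://example.org/node/1', $patterns));      // bool(false)
```

Note that without the FNM_PATHNAME flag `*` also matches dots and slashes, so it can span more than one host label, and characters such as `?` and `[` in an entry are treated as wildcards too.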

eli-t

Update of the patch in #18 rerolled against 2.0.0-alpha2