Running 2 websites:
A: http://www.mywebsite.com (Drupal)
B: http://www.mywebsite.com/mysubsite (non-Drupal)

The Drupal website is multilanguage. Currently I'm unable to link to pages of the subsite because Pathologic is rewriting those URL's & adding the language-code to it.

For example: a link to http://www.mywebsite.com/mysubsite/my-page get's rewritten to http://www.mywebsite.com/fr/mysubsite/my-page .

What are my options?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Garrett Albright’s picture

Arg, yeah, that's a tough case. Could you maybe have the non-Drupal site use a separate subdomain instead of just a directory under your main domain name - so something like mysubsite.mywebsite.com?

rp7’s picture

Tought about that - but unfortunately not possible in my scenario (or atleast for the client's scenario). It's not 1 subsite, it's dozens.

How do you like to idea of implementing a hook, giving the possibility to skip pathologic rewrites (say for example only if the hook implementation returns FALSE)?

I'm willing to write a patch for this.

freblasty’s picture

Version: 7.x-2.1 » 7.x-2.x-dev
Status: Active » Needs review
FileSize
2.45 KB

This patch will allow modules to implement their own url generator logic using the hook_pathologic_url(). If a module has no interest in a particular url then it can return NULL which will cause pathologic to move on to the next hook.

The Pathologic url generator logic is used as fallback if none of the implementing modules generated an url. This will ensure that websites already using the pathologic module won't break.

rp7’s picture

+1 for commit

Garrett Albright’s picture

I'm just not sold on the idea for this yet… It seems like an unnecessary complication. Can you change my mind?

Vacilando’s picture

+1 for @freblasty's patch!

rp7’s picture

One use case is the one I described in my original post. I can imagine there are more websites which are set up like this.

Implementing this hook, I could just return the same URL & stop Pathologic from further processing it, which solves my problem.

rp7’s picture

against latest dev

freblasty’s picture

Another use case is contextual link generators: adding node references to a text using linkit will cause the text to contain anchor tags to node/nid. However when the pathologic filter processes the text it will ignore the language of the node and apply the default display language to the url (depending on your language detection setting).

Using the patch mentioned in #3 en #8 will allow module developers to write contextual link generators. In the above situation a link generator could check the node language and generate the approriate language url.

In short: pathologic can gain a contextual knowledge about the urls it is processing without having the worry about the implementation details.

Please note that the patch currently uses a hook but this could also be implemented as a plugin for pathologic.

Andrey Inkin’s picture

The alternative solution would be to create a setting field 'Exclude URLs' and check for that field before processing a URL.
Here I've done it with regular expressions:
in settings you specify /\/mysubsite\// and in _pathologic_replace before it goes into all the processing it runs the URL agains this regex:

  // Check if the path should be excluded from processing
  if (!empty($settings['current_settings']['exclude_regex'])) {
    if (preg_match($settings['current_settings']['exclude_regex'], $matches[2])) {
      return $matches[0];
    }
  }
freblasty’s picture

I still prefer a hook or plugin over the regex solution because it allows more flexibility from a developer standpoint.

Garrett Albright’s picture

Category: support » task
Status: Needs review » Needs work

RaF and freblasty, instead of hooks which returned a completed URL, would it be acceptable for hooks that altered the parameters that Pathologic eventually passes to url() itself? So the code I'd add to Pathologic would look something like…

// Params passed to url()
$url_params = array(
  'path' => $parts['path'],
  'options' => array(
    'query' => $parts['qparts'],
    // …etc
  ),
);

// Call implementations of hook_pathologic_alter()
drupal_alter('pathologic', $url_params, $parts, $settings);

$url = url($url_params['path'], $url_params['options']);

And your modules would implement something that looked like this:

function mymodule_pathologic_alter(&$url_params, $parts, $ettings) {
  // if my setting is set, add a "foo" query parameter
  if ($settings['current_settings']['my_setting']) {
    $url_params['options']['query']['foo'] = 'bar';
  }
}

I think this is a bit more subtle than the approach in the patch, but will likely still allow you to accomplish what you need to. Please let me know.

freblasty’s picture

@Garrett the patch sole purpose was to give a clear view of the functionality we need from pathologic. The code changes mentioned in #12 should provide the same functionality.

Just got one remark/question, what if two or more modules are rewriting the url params? This could result in some weird looking urls. In the patch #3 and #8 only the first hook that returns a value different from null was processed and all the others get ignored.

Andrey Inkin’s picture

According to the issue description, you need to exclude certain URLs from parsing, and with the #10 approach it's done in a more efficient way - in the beginning of the function rather than deconstructing the URL and then constructing it back with no change.
Besides if the subsite URL has 'q' parameter in it, then the $url_params['path'] variable is going to have wrong value, which is essentially why we want to exclude the URL from processing.

Garrett Albright’s picture

Just got one remark/question, what if two or more modules are rewriting the url params? This could result in some weird looking urls. In the patch #3 and #8 only the first hook that returns a value different from null was processed and all the others get ignored.

It's a valid concern, but not one I expect to happen very often - I'll bet that a year after this code gets in the module, probably less than 5% going to be using any modules which implement this alter hook at all, much less more than one. And in the rare cases where more than one will be necessary, they won't necessarily conflict with each other. Still, the precedent of "let whatever modules want to alter this go ahead and do so, regardless of potential conflicts" is a more Drupally concept than "see if any modules want to alter this and stop after the first one."

According to the issue description, you need to exclude certain URLs from parsing, and with the #10 approach it's done in a more efficient way - in the beginning of the function rather than deconstructing the URL and then constructing it back with no change.
Besides if the subsite URL has 'q' parameter in it, then the $url_params['path'] variable is going to have wrong value, which is essentially why we want to exclude the URL from processing.

Pathologic does account for the fact that the path might be in the 'q' query parameter;

<?php
  // Examine the query part of the URL. Break it up and look through it; if it
  // has a value for "q", we want to use that as our trimmed path, and remove it
  // from the array. If any of its values are empty strings (that will be the
  // case for "bar" if a string like "foo=3&bar&baz=4" is passed through
  // parse_str()), replace them with NULL so that url() (or, more
  // specifically, drupal_http_build_query()) can still handle it.
  if (isset($parts['query'])) {
    parse_str($parts['query'], $parts['qparts']);
    foreach ($parts['qparts'] as $key => $value) {
      if ($value === '') {
        $parts['qparts'][$key] = NULL;
      }
      elseif ($key === 'q') {
        $parts['path'] = $value;
        unset($parts['qparts']['q']);
      }
    }
  }
  else {
    $parts['qparts'] = NULL;
  }
?>

…So if that's the reason that you're wanting to exclude paths that Pathologic processes, then either that's not a real problem, or there's a bug going on - perhaps do further testing and create another issue if things still seem broken.

That being said, perhaps there could be a way that an alter hook implementation says "just return the original path" and url() is bypassed.

Andrey Inkin’s picture

What it's doing there with the 'q' parameter {$parts['path'] = $parts['qparts']['q'];} is correct, because in Drupal 'q' parameter can only be a path. But in other applications 'q' can be anything, so there is a case where the URL is parsed wrongly:

http://my-drupal-site.com/non-drupal-subsite/?q=some-parameter
it becomes
http://my-drupal-site.com/some-parameter

I still think it's better to not go into URL parsing if you just need the original URL.

freblasty’s picture

It's a valid concern, but not one I expect to happen very often - I'll bet that a year after this code gets in the module, probably less than 5% going to be using any modules which implement this alter hook at all, much less more than one. And in the rare cases where more than one will be necessary, they won't necessarily conflict with each other. Still, the precedent of "let whatever modules want to alter this go ahead and do so, regardless of potential conflicts" is a more Drupally concept than "see if any modules want to alter this and stop after the first one."

You're right it's a Drupally concept. When you think of it if a developer implements an alter hook then he/she is implementing some sort of special case and will therefor check for specific conditions which should reduce the chances of a conflict from happening.

That being said, perhaps there could be a way that an alter hook implementation says "just return the original path" and url() is bypassed.

Do mean like some sort of "do not process url" functionality?

Garrett Albright’s picture

http://my-drupal-site.com/non-drupal-subsite/?q=some-parameter
it becomes
http://my-drupal-site.com/some-parameter

This will be a case where a module that implements the hook I'm proposing could step in and stop Pathologic from inserting an altered URL (see below). No, it won't stop Pathologic from picking apart and trying to process the URL in the first place, but I don't really care - performance has always been somewhat of a secondary concern for Pathologic, since the output of text formats is (almost) always cached and only one node's content is filtered at a time in (almost) all cases anyway.

Do mean like some sort of "do not process url" functionality?

Something like that. I'm thinking that maybe we create a $url_params['options']['use_original'] which is FALSE by default, but we'll check it after the alter hooks run, and if it's true, we just return the original URL without running anything through url(). I'll add a $parts['original'] parameter too which will have the original unadulterated URL. So a hook implementation could do something like this:

<?php
function mymodule_pathologic_alter(&$url_params, $parts, $settings) {
  // Don't alter the URL if it has "foo" in it
  $url_params['options']['use_original'] = preg_match('/foo/', $parts['original']);
}
?>
dankobiaka’s picture

+1 for Andrey Inkin's solution.

The hook / alter approach is great for more complex problems, but why not use both? Being able to input a simple regex using the administration form is much simpler and adequate for most situations.

The issue Andrey has described with regards to the "q" parameter is big concern for us. It is commonplace for search engines to use "q" for their query parameter, but Pathologic strips it out.

Can we get a solution put together ASAP?

Garrett Albright’s picture

The hook / alter approach is great for more complex problems, but why not use both? Being able to input a simple regex using the administration form is much simpler and adequate for most situations.

Because implementing both means implementing both and manually testing both and writing automated tests for both and supporting both. And seeing as how the hook approach encompasses the regex text box approach, I'd rather just implement/test/etc that. If you know how to write regular expressions, creating a Drupal module with a hook implementation should be well within your abilities as well, at any rate. So that is what I shall do, probably over the weekend.

Can we get a solution put together ASAP?

"We?" Oh, sure! Let me just email you my bank transfer details, and I'll start coding as soon as I see the deposit come in. :P

dankobiaka’s picture

"We?" Oh, sure! Let me just email you my bank transfer details, and I'll start coding as soon as I see the deposit come in. :P

Seems like there is several people in this thread alone, myself included, that'd be more than happy to help you get this solution developed. My team have created a workaround this issue in our local version, I'd just like to have the module properly updated so we can continue to get updates for this useful module.

freblasty’s picture

+1 for Garrett Albright's alter hook implementation.

Something like that. I'm thinking that maybe we create a $url_params['options']['use_original'] which is FALSE by default, but we'll check it after the alter hooks run, and if it's true, we just return the original URL without running anything through url(). I'll add a $parts['original'] parameter too which will have the original unadulterated URL.

Like the idea and should definitely be added to the hook alter functionality.

Garrett Albright’s picture

Title: Stop Pathologic from replacing certain URL's » Allow modules to alter Pathologic's behavior - hook_pathologic_alter()
Status: Needs work » Fixed

Bagged and tagged! I'll make a new release soon.

I've added a pathologic.api.php file which documents the new hook pretty well and has a couple examples of what you should be able to do with it. Everyone, please give it a try and report back here if you spot anything not working as expected. Also added a couple of tests to the .test file, if anyone cares.

Thanks to all.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.