Note: This documentation is targeted for the 7.x-3.x branch of Pathologic. The 7.x-2.x branch works similarly, except that the global configuration method is not available; only per-format configuration is possible. The 6.x-3.x branch is “current” for those still using Drupal 6; however, due to fundamental changes in the way the current Drupal 7 code works, this documentation won’t exactly match.

Pathologic is an input filter which attempts to alter paths in your content so that they are correct in situations which would otherwise cause them to “break;” for example, if the URL of the site changes, or the content was moved to a different server. Pathologic can also solve the problem of missing images and broken links in your site’s RSS feeds.

Example use cases

Here are some hypothetical situations in which Pathologic can save the day.

  • Links and/or images in your site content use relative paths (e.g., <a href="tag/food/pizza"> instead of <a href="http://example.com/tag/food/pizza">) which work fine for people reading content on your site, but break gracelessly when content is syndicated via RSS, Atom, or maybe even a REST interface or the like. Pathologic can ensure that those paths are always output as full URLs, including the protocol and server name, so that they work no matter how or where the content is consumed.
  • The address of your site has changed. Perhaps you moved to a shiny new domain name, or perhaps you moved the Drupal installation from one subdirectory to another. Now all the images and internal links in your content don’t work. Using Pathologic is an alternative to going through all of the site’s content and correcting the paths manually.
  • Your site has more than one copy at separate URLs; for example, testing and production servers. Or perhaps it is accessible via both HTTP and HTTPS, and when links or images switch between the two on the same page, web browsers throw scary warnings at visitors. Perhaps copy-editors edit content on the testing server, and that content eventually gets pushed over to the production server. When the editors link to other content on the site, they sometimes link to it using the test server’s URL; these links break when the content is published to the production server. Pathologic can correct those paths so that they’re always pointing at the “current” correct URL.
  • Your Drupal site has been up for a while, but you’ve recently discovered the Clean URLs feature and enabled it. Your links still work, but they still have that ugly ?q= thing in them, and you have better things to do with your time than go through all your content to prettify the links. Or maybe you’re going the other way; you used to have Clean URLs enabled, but you’ve had to disable it, and now your links are broken. Pathologic to the rescue!

Installation

Pathologic is an input filter, so getting it installed and configured is a little bit more difficult than standard modules, but the instructions below will walk you through the process.

  1. Install the Pathologic module as normal. (If you’re a total Drupal newbie, you can read up on how to install Drupal modules to your site – and welcome to the community, by the way!)
  2. Go to Administration » Configuration » Content authoring » Text formats (admin/config/content/formats). A list will appear of the various input formats your site uses. Find one in the list with which you want to use Pathologic, and click the “configure” link for that format. If you’re unfamiliar, you can learn more about text formats and input filters.
  3. On the next page, find the section labeled “Enabled filters.” Check the box next to “Pathologic.” Scroll down a bit to the “Filter processing order” section and ensure that Pathologic is at the bottom of the list; if it is not, rearrange the filters using the draggable arrows in the left column (or the Weight menus, if you have JavaScript disabled) so that it is. Click the “Save configuration” button at the bottom. (If your browser has JavaScript disabled, you’ll have to click “Save configuration” between each step.)
  4. If you wish to use Pathologic with other input formats, go back to step 2 and repeat the process.
  5. Pathologic is now working on all old and new content which uses the input format(s) you added it to.

Pathologic should almost always be the last input filter to run on the text because it only works properly on pure HTML; any input filters which convert some sort of non-HTML markup (BBCode, Markdown, Textile, etc.) to HTML need to run first.

How Pathologic works

Depending on how you intend to use Pathologic and how the paths in your existing content are formed, further configuration may not be necessary. To help you decide whether further configuration is necessary in your case, and how to go about it, allow me to take a moment to explain how Pathologic works.

Pathologic looks at paths that are located in href attributes of links (<a> tags), as well as the src attributes of image tags and tags for other embedded media (<img>, <embed>, etc.). After finding a path in an attribute, Pathologic then determines whether the path is “local.” It does its magic on local paths, but leaves other paths alone.

Let’s assume that your Drupal site is up and running at http://example.com/drupal/. Pathologic considers a path local if:

  • The path is a relative path. That is, it does not have a protocol fragment (such as http://) and does not begin with a slash. For example, tags/food/pizza will be considered a local path, but /tags/food/pizza and http://drupal.org/tags/food/pizza are not.
  • The path is an absolute path that points to a resource located within your Drupal installation. Our example is located at http://example.com/drupal/, so http://example.com/drupal/tags/food/pizza is considered a local path. However, while http://example.com/not_drupal/ points to a resource on the same domain name, it points to something outside of the Drupal installation, so it is not considered local.
  • The path contains only an anchor fragment, such as #pizza.
  • The path is an absolute path which begins with a URI of another Drupal installation which you’ve instructed Pathologic to consider local.

Aha! That last one is where things start getting interesting. Let’s say you’ve grown tired of using http://example.com/drupal/, so you’ve moved your site over to http://example.net/. (For those interested in using Drupal in a test/production server paradigm, imagine that example.com is the test server and example.net is the production server.) If all the paths in your content are relative paths, then Pathologic will handle them perfectly – no need for further configuration. However, if they are absolute paths that begin with http://example.com/drupal/, then Pathologic will not consider them local paths and will ignore them. However, we can tell Pathologic to consider such paths as local paths and to fix them.
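
As a rough illustration of the rules above, here is a minimal Python sketch of the local/non-local decision. It is not Pathologic’s actual code (the module itself is written in PHP); the function name is_local and its parameters are hypothetical, and matching bases by simple string prefix is an assumption made for the sake of the example.

```python
from urllib.parse import urlsplit


def is_local(path, site_base, also_local=()):
    """Return True if `path` counts as local under the rules described above.

    `site_base` is the URL of the current Drupal installation; `also_local`
    lists other installation URLs that should be treated as local.
    (Both names are hypothetical, chosen for this sketch.)
    """
    # Absolute URLs inside this installation, or inside another installation
    # that has been declared local, are matched by prefix.
    for base in (site_base, *also_local):
        if path.startswith(base):
            return True
    parts = urlsplit(path)
    # Any other path with a protocol or a host points somewhere else.
    if parts.scheme or parts.netloc:
        return False
    # What remains is an anchor-only path (local), a relative path (local),
    # or a path that begins with a slash (not local).
    return not path.startswith("/")


site = "http://example.net/"  # the site after the move
old = "http://example.com/drupal/tags/food/pizza"

print(is_local("tags/food/pizza", site))                   # True
print(is_local("#pizza", site))                            # True
print(is_local("/tags/food/pizza", site))                  # False
print(is_local(old, site))                                 # False
print(is_local(old, site, ["http://example.com/drupal/"])) # True
```

The last two calls show the situation described above: after the move to http://example.net/, a path beginning with http://example.com/drupal/ is only treated as local once that old base has been declared local.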

Configuring Pathologic

Pathologic stores configuration in two ways: globally, and per-text format. By default, when you add the Pathologic filter to a format, it will use the global configuration unless you configure it otherwise. In most cases, just sticking with global configuration will work fine, and reduces the potential for confusion resulting when Pathologic works one way with one text format and a different way with another. However, you may want to use per-format configuration in certain cases; for example, if you want content on your site to use protocol-relative URLs, but content syndicated via RSS to use absolute URLs.

To modify the global configuration for Pathologic:

  1. Go to Administration » Configuration » Content authoring » Pathologic (admin/config/content/pathologic).
  2. Select the desired output format of Pathologic-processed paths from the “Processed URL format” field. The explanatory text for the field should explain the consequences of each option.
  3. Enter the paths of other/previous Drupal installations which should be considered local in the “Also considered local” text field. Enter one path per line. For the above example, we’d want to enter http://example.com/drupal/.
  4. Click the “Save configuration” button when done.

(Note that it’s fine to put the path for the “current” server in the “Also considered local” field. Pathologic will simply remove it when it does its trick. In other words, both the example.com and example.net servers can have both http://example.com/drupal/ and http://example.net/ in the field. This means that each server can be configured identically. This will make life easier if you’re using Features to manage configuration.)
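
To see why the same list can be used on every server, consider this minimal Python sketch of the re-rooting step. It is an illustration only, not Pathologic’s implementation: the function name to_current_site is hypothetical, and the sketch always produces a full URL, whereas the real output depends on the “Processed URL format” setting.

```python
def to_current_site(path, current_base, also_local):
    """Re-root `path` on `current_base` if it starts with a known base.

    All names here are hypothetical; matching by string prefix is an
    assumption made for this sketch.
    """
    # Try longer bases first so a more specific base wins over a shorter one.
    for base in sorted(also_local, key=len, reverse=True):
        if path.startswith(base):
            return current_base + path[len(base):]
    # Paths that match no configured base are returned unchanged.
    return path


# The same list is configured on both servers.
bases = ["http://example.com/drupal/", "http://example.net/"]

# On the example.net server:
print(to_current_site("http://example.com/drupal/tags/food/pizza",
                      "http://example.net/", bases))
# http://example.net/tags/food/pizza

# On the example.com server:
print(to_current_site("http://example.net/tags/food/pizza",
                      "http://example.com/drupal/", bases))
# http://example.com/drupal/tags/food/pizza
```

Whichever server processes the content, a path that begins with any listed base ends up pointing at the server doing the processing, so listing the “current” server’s own base is harmless.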

To set per-text format options for Pathologic:

  1. Go to Administration » Configuration » Content authoring » Text formats (admin/config/content/formats). Find the format you wish to set Pathologic options for, and click the corresponding “configure” link.
  2. Find Pathologic’s settings in the “Filter settings” section of the format settings form. You’ll see a “Settings source” radio button with two options: “Use global Pathologic settings” and “Use custom settings for this text format.” Select the latter option.
  3. In the “Custom settings for this text format” section, configure Pathologic as above.
  4. Click the “Save configuration” button when done.
  5. Should you decide you want Pathologic to again use the global settings for this text format, simply edit the format settings again and change “Settings source” back to “Use global Pathologic settings.” The other local settings will be ignored and Pathologic will return to using the global settings.

Now sit back and enjoy the fruits of Pathologic’s labor.

WYSIWYG editor compatibility

If the site is using a WYSIWYG content editor such as CKEditor or TinyMCE and Pathologic doesn’t seem to be doing anything, it may be because such editors often output paths which begin with a slash character. Such paths are usually ignored by Pathologic, because it considers them to be absolute. However, you can trick Pathologic into working with such paths by using the “Also considered local” field (which may be labeled “All base paths for this site”). If the Drupal installation is at the root level of a web site (such as http://example.com/), simply enter a single slash into that field. If it’s in a subdirectory (such as http://example.com/foo/drupal/), enter the full subdirectory path, with slashes at both the beginning and end (so /foo/drupal/ in this case). See the “Configuring Pathologic” section above for more information.

Migrating from Path Filter

Path Filter is an input filter which works similarly to Pathologic, but requires one to type a prefix of “internal:” or “files:” before every internal path Path Filter should act on. A downside to this is that a site’s content becomes strewn with these bits, and if Path Filter is disabled, those “internal:” prefixes are going to be spat out to web browsers that won’t know what to do with them. That’s one of the reasons I avoided using such “hints” in Pathologic.

If you are interested in migrating from Path Filter to Pathologic, be aware that Pathologic will automatically look for a prefix of “internal:” or “files:” in your paths, and behave appropriately. This means you should be able to use Pathologic as a drop-in replacement for Path Filter, with no additional configuration.
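
As a rough picture of what “behave appropriately” could mean, here is a minimal Python sketch of prefix handling. It is an assumption-laden illustration rather than Pathologic’s actual logic: the function name expand_path_filter_prefix is hypothetical, and so is the files_dir default, which simply uses Drupal’s conventional public files directory; this page does not say how “files:” paths are resolved.

```python
def expand_path_filter_prefix(path, files_dir="sites/default/files/"):
    """Turn a Path Filter style path into an ordinary relative path.

    `files_dir` is a hypothetical parameter for this sketch; the real
    location of the files directory depends on the site's configuration.
    """
    if path.startswith("internal:"):
        # "internal:node/1" becomes the plain Drupal path "node/1".
        return path[len("internal:"):]
    if path.startswith("files:"):
        # "files:images/pizza.png" is assumed to live under the files directory.
        return files_dir + path[len("files:"):]
    # Paths without a Path Filter prefix are left untouched.
    return path


print(expand_path_filter_prefix("internal:node/1"))
# node/1
print(expand_path_filter_prefix("files:images/pizza.png"))
# sites/default/files/images/pizza.png
```

Once the prefix is gone, the result is an ordinary relative path, which is then handled like any other local path.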

Alter Pathologic’s behavior - hook_pathologic_alter()

If you are a developer, you may be interested to know that Pathologic implements a hook which allows you to alter how it will construct a new URL, or even bypass constructing a new URL entirely. Check out the pathologic.api.php file in the module directory for documentation and example code for hook_pathologic_alter(). Some examples of things you could do by implementing this hook include:

  • Have Pathologic bypass constructing a new URL if the path would be to a particular file, or to a file in a particular subdirectory (handy if you have a non-Drupal directory under your root Drupal directory which you want to link to).
  • Have paths to images altered so that they point to a copy of the image on your site’s CDN instead of its main server.
  • Remove or add query parameters to the URL that will be generated.
  • Alter older path structures to reflect newer ones. For example, if your articles used to have paths like articles/new-pizza-trends.html, but your paths now look like magazine/articles/new-pizza-trends, that alteration could be done in a hook_pathologic_alter() implementation so that links in the old format in site content would continue to work.

Caching issues

Drupal caches the output of input formats for speed. This can cause stale-data problems with the paths that Pathologic creates if circumstances change to make those paths incorrect. See this issue and this issue for examples of this sort of problem which have come up in real-world use. Unfortunately, there’s no really good way to fix this without making Pathologic something other than a standard (and cacheable) input filter. To avoid these sorts of problems, consider these tips:

  • Do not change the URL path of established nodes, particularly if you have linked to them in your site content. Decide on a good URL path when the node is created and keep it. (If changing the path is truly necessary, change the path on the node editing form as normal, then go to Administration » Configuration » Search and metadata » URL aliases (admin/config/search/path) and create a new alias which points the “old” path to the node to avoid breaking both internal and external links.)
  • When migrating Drupal database contents from one site to another, exclude the contents of the cache tables (basically, all tables with names which begin with “cache”). This is actually a good idea whether you’re using Pathologic or not. If you are unable to exclude cached data from your dumps or otherwise avoid migrating cache data, you should clear your site’s cache after importing the data; you can do this by going to Administration » Configuration » Development » Performance (admin/config/development/performance) and clicking the “Clear all caches” button near the top of the page, or by running drush cc all if your server has Drush installed.

Upgrading from Pathologic 7.x-2.x

If you're already familiar with the 7.x-2.x branch of Pathologic, this brief section of the manual will cover important changes in the 7.x-3.x branch, and, by extension, the 8.x-1.x branch. If you intend to upgrade a Drupal 7 site using Pathologic to Drupal 8, you must upgrade Pathologic to 7.x-3.x first; an upgrade path from 7.x-2.x will not be supported.

As of Pathologic 7.x-3.x, Pathologic can now be configured both on a per-text format basis, as in 7.x-2.x, and globally. The global configuration allows several or all instances of Pathologic across different text formats to easily be configured with the same settings without having to configure every instance of the filter across every format the same way. By default, your Pathologic instances on your site’s currently-existing text formats will continue to use per-text format settings, so behavior shouldn’t change at all between 7.x-2.x and 7.x-3.x just by upgrading (if it does, please consider it a bug and report it in Pathologic’s issue queue). However, you can convert those filter instances to use the global configuration instead, and adding Pathologic to a new format will cause it to use the global settings by default.

To modify the global settings, check out the new form at Administration » Configuration » Content authoring » Pathologic (admin/config/content/pathologic). The form is pretty similar to the per-text format form as existed before.

To toggle Pathologic between using global and per-format settings for a format, edit the Pathologic settings for the format. You’ll see a new “Settings source” radio button with two options: “Use global Pathologic settings” and “Use custom settings for this text format.” Select the former option if you want Pathologic to use the global settings for that format, or the latter one to use the per-format settings and thus behave as it did in 7.x-2.x.

Questions? Suggestions? Need help?

Please open an issue on Pathologic’s issue queue or contact the author and I’ll get back to you soon. Thanks for trying Pathologic!

Comments

Nicolas Bouteille

Just want to give some feedback about a problem I just fixed. Why post here? Because I really thought Pathologic would help, when in the end it couldn't.
I need to add images dynamically from a JavaScript file. On my production server the site is at the root, whereas on my local machine it is located at localhost:8888/mysite/.

Pathologic doesn't seem to be able to do its magic on JavaScript files, only on Drupal's text formats. Because the paths in my JavaScript files were wrong, the images simply couldn't load, and Pathologic never had the chance to correct anything.

The solution I found is to use the Drupal variable Drupal.settings.basePath just like that :
$("li.custom_item").prepend('<img class="custom_item" src="'+Drupal.settings.basePath+'sites/all/themes/ etc...

Now on my production site it automatically generates http://www.mysite.com/sites and on my local machine : http://localhost:8888/mysite/sites

The other problem I had was with CKEditor when giving the path to my custom CSS file. I used to write /sites/all/themes/.../css/myfile.css and it worked on my production site. But on my local machine it was looking for http://localhost:8888/sites instead of localhost:8888/mysite/sites.

Here again, CKEditor was looking for the CSS path dynamically in JavaScript, but I couldn't use the magical Drupal basePath variable anymore. However, I saw that CKEditor provides a "host" placeholder, "%h". I didn't see the use of it when I was working on my production server, but now it became clear. So now my path is %hsites/all/themes/.../myfile.css (note that there isn't any space between %h and sites).

Hope this will help anybody one day !

Cheers
Drupal rocks

reblutus

I have a problem with Pathologic which I will describe in the forum. But I wanted to view/edit the Pathologic settings and can't find them. I tried the instructions on this page (in Configuring Pathologic, point 1). On my installation (D7), there is no "Pathologic" option in Content authoring.

I have the Pathologic module enabled...

Have the settings moved, and this page has not been updated?

Garrett Albright

reblutus, global settings for Pathologic, which are what you'll find in the "Pathologic" section under "Content authoring," are only available in the 7.x-3.x branch of Pathologic, so if you can't find them, I'm guessing you're still using the 7.x-2.x branch.

freelylw

The reason I installed this module is that I am using the "forward" module to let people email nodes out, but there is a problem with the links for the imagefield and the viewfield included inside the node.

After installing Pathologic, I put the site URL "http://mysite.com/" in "All base paths for this site", but when I email the nodes out using the forward module, the links for the imagefield and viewfield are still broken; the site URL part is missing from the front.

Does Pathologic not support imagefields, or is there anything I can do to fix this? Thanks.

Garrett Albright

freelylw, sorry, but I'm not going to continue to provide support in this comment thread; it will be too messy. If you're still having trouble, please create an issue in the issue queue for the project. I don't always respond quickly, but I will eventually.

esuarez

I recently migrated my site to a new server and found problems with the paths of my images. I installed Pathologic to try to resolve this, and it works fine, but when I add a new image, part of the image path is duplicated:
http://www.anep.edu.uy/anep2/anep2/sites/default/files/images/2018/notic...
instead of http://www.anep.edu.uy/anep2/sites/default/files/images/2018/noticias/18...
I use Drupal 7.59 and I have installed pathologic 7.x-3.1

thanks

fkelly12054@gmail.com

I let myself be misled by the documentation: "Pathologic stores configuration in two ways: globally, and per-text format. By default, when you add the Pathologic filter to a format, it will use the global configuration unless you configure it otherwise. In most cases, just sticking with global configuration will work fine, and reduces the potential for confusion resulting when Pathologic works one way with one text format and a different way with another"

I first set up the global configuration and was puzzled that it wasn't doing anything. Then I realized that, while you can use the global configuration, for it to have any effect you also have to activate it within a text format (full html in my case). Within that format you can tell it to use the global configuration but, for it to have any effect it needs to be set in BOTH places. Hope this is correct and saves someone else an hour or two of hacking around. Thanks for the module.

solideogloria

Note: If you use the content filter restricting images to the current site:

Disallows usage of <img> tag sources that are not hosted on this site by replacing them with a placeholder image.

In order for Pathologic to work, the Pathologic filter must be applied before the image restriction filter. Otherwise, the placeholder image (like this: Only local images are allowed.) will show instead for images with base paths that aren't the current site's base path, even if you have it in the settings.