Pathologic

Last updated February 1, 2011. Created by Garrett Albright on May 10, 2008.
Edited by Glottus, not_Dries_Buytaert, deviantintegral, hass.

Note: This documentation is in need of a bit of updating. Your experience may vary somewhat from what is described below. Sorry about that. Please contact me or file an issue if you encounter any confusion.

Pathologic is an input filter which can correct paths in links and images in your Drupal content in situations which would otherwise cause them to “break”; for example, if the URL of the site changes, or the content is moved to a different server. Pathologic can also solve the problem of missing images and broken links in your site’s RSS feeds. This module also automatically transforms unaliased (local/internal) paths to aliased paths where an alias exists, as explained here.

Example use cases

Here are some hypothetical situations in which Pathologic can save the day.

  • You run a personal site, and the address of your site has recently changed. Perhaps you moved to a shiny new domain name, or perhaps you moved the Drupal installation from one subdirectory to another. Now all the images and internal links in your content don't work. You could go through your site node by node and update all the paths… or you could install Pathologic.
  • You oversee a site which has testing and production servers at separate URLs. Copy-editors (and/or you) edit content on the testing server, and that eventually gets pushed over to the production server. When those darn editors link to other content on the site, they sometimes link to content using the test server's URL; these links break when the content is published to the production server. You could get frustrated at your editors (and/or yourself) when this happens… Or you could install Pathologic and never have to worry about it again.
  • Your Drupal site has been up for a while, but you've recently discovered the Clean URLs feature and enabled it. Your links still work, but they still have that ugly "?q=" thing in them, and you have better things to do with your time than go through all your content to prettify the links. Or maybe you're going the other way; you used to have Clean URLs enabled, but you've had to disable it, and now your links are broken. Pathologic to the rescue!
  • Links and/or images in your site content use relative paths (eg, <a href="tag/food/pizza">) which work fine for people reading content on your site, but break gracelessly for people reading the content through RSS or some other sort of external feed. You could just start using absolute paths instead, but you're too set in your ways and would rather have a tool like Pathologic do it for you.

Installation

Pathologic is an input filter, so getting it installed and configured is a little bit more difficult than standard modules, but the instructions below will walk you through the process.

  1. Install the Pathologic module as normal. (If you’re a total Drupal newbie, these instructions for installing modules may be helpful – and welcome to the community, by the way!)
  2. Go to Administration > Site configuration > Input formats (Drupal 6) or Administration > Configuration > Content authoring > Text formats (Drupal 7). A list will appear of the various input formats your site uses. Find one in the list which you want to use Pathologic with, and click the “configure” link for that format.
  3. On the next page, find the section labeled “Filters” or “Enabled filters.” Check the box next to “Pathologic.” All other options on this page can be left alone. If you're using Drupal 7, rearrange the filters in the “Filter processing order” section so that Pathologic is at the bottom. Click the “Save configuration” button at the bottom.
  4. Drupal 6 only: This will take you back to the same page, with a message telling you “The input format settings have been updated.” Now, find the “Rearrange” tab at the top of the page and click it. This will bring you to a list of filters that this input format uses, in the order that they are executed. Ninety-nine-point-nine percent of the time, Pathologic should be the last input filter run on the text, so it should be at the bottom of this list. It may already be at the bottom; if it is not, drag Pathologic to the bottom of the list or adjust the values in the “Weight” column so that Pathologic has the highest value. Click “Save configuration” when done. (Exception: If you use the Image Resize Filter, it may need to run after Pathologic; see this issue. In this case, the Image Resize Filter should be below Pathologic in this list.)
  5. If you wish to use Pathologic with other input formats, go back to step 2 and repeat the process.
  6. Pathologic is now working on all old and new content which uses the input format(s) you added it to.

Pathologic should almost always be the last input filter to run on the text (steps 3 and 4) because it only works properly on pure HTML, so any input filters which convert some sort of non-HTML markup (BBCode, Markdown, Textile, etc) to HTML need to run first.

Is configuration necessary?

Depending on how you intend to use Pathologic and how the paths in your existing content are formed, further configuration may not be necessary. To help you decide whether further configuration is necessary in your case, and how to go about it, allow me to take a moment to explain how Pathologic works.

Pathologic looks at paths that are located in href attributes of links (<a> tags), as well as the src attributes of image tags and tags for other embedded media (<img>, <embed>, etc). After finding a path in an attribute, Pathologic then determines if a path is “local.” It does its magic on local paths, but leaves other paths alone.

Let's assume that your Drupal site is up and running at http://example.com/drupal/. Pathologic considers a path local if:

  • The path is a relative path. That is, it does not have a protocol prefix (such as http://) and does not begin with a slash. For example, tags/food/pizza will be considered a local path, but /tags/food/pizza and http://drupal.org/tags/food/pizza will not.
  • The path is an absolute path that points to a resource located within your Drupal installation. Our example is located at http://example.com/drupal/, so http://example.com/drupal/tags/food/pizza is considered a local path. However, while http://example.com/not_drupal/ points to a resource on the same domain name, it points to something outside of the Drupal installation, so it is not considered local.
  • The path contains only an anchor fragment, such as #pizza.
  • The path is an absolute path which begins with a URI of another Drupal installation which you’ve instructed Pathologic to consider local.

Aha! That last one is where things start getting interesting. Let’s say you’ve grown tired of using http://example.com/drupal/, so you’ve moved your site over to http://example.net/. (For those interested in using Drupal in a test/production server paradigm, imagine that example.com is the test server and example.net is the production server.) If all the paths in your content are relative paths, then Pathologic will handle them perfectly – no need for further configuration. However, if they are absolute paths that begin with http://example.com/drupal/, then Pathologic will not consider them local paths and will ignore them, unless we tell Pathologic to consider such paths local and to fix them.
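
The four locality rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described on this page, not Pathologic's actual code (the module itself is written in PHP); the function name is_local and its parameters are made up for the example.

```python
from urllib.parse import urlsplit


def is_local(path, base_url, also_local=()):
    """Return True if `path` counts as local under the four rules above.

    `base_url` is the URL of the current Drupal installation and
    `also_local` stands in for the "Also considered local" field.
    Both names are hypothetical, chosen for this sketch.
    """
    # Anchor-only paths such as "#pizza" are local.
    if path.startswith("#"):
        return True
    # Absolute paths inside this installation, or under a prefix that was
    # explicitly listed as local, are local.
    for prefix in (base_url, *also_local):
        if path.startswith(prefix):
            return True
    # Relative paths: no protocol prefix and no leading slash.
    has_protocol = bool(urlsplit(path).scheme)
    return not has_protocol and not path.startswith("/")
```

With base_url set to "http://example.net/", the old path http://example.com/drupal/tags/food/pizza is rejected until "http://example.com/drupal/" is passed in also_local, which mirrors the move from example.com to example.net described above.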

Configuring Pathologic

If you've determined that configuring Pathologic may be necessary, here's how to go about it.

  1. Go to Administration > Site configuration > Input formats (Drupal 6) or Administration > Configuration > Content authoring > Text formats (Drupal 7). A list will appear of the various input formats your site uses. Find one in the list which you are using Pathologic with, and click the “configure” link for that format. (Note that if you are using Pathologic with more than one input format, you will have to repeat this configuration process for each input format.)
  2. Drupal 6 only: Click the “Configure” tab at the top of the next page. Find the Pathologic section on the next page – it should be near the bottom.
  3. Enter the paths of other/previous Drupal installations which should be considered local in the “Also considered local” text field. Enter one path per line. For the above example, we’d want to enter http://example.com/drupal/.
  4. Click the “Save configuration” button when done.

(Note for those using testing and production servers; in cases where it would be inconvenient to have separate settings on each server, it’s safe to put the path for the “current” server in the “Also considered local” field. Pathologic will simply remove it when it does its trick. In other words, both the example.com and example.net servers can have both http://example.com/drupal/ and http://example.net/ in the field.)

Rule Order

Note that the order in which you include the rules may be important. In a scenario with three servers (for example, development on http://local.example.com/drupal, testing on http://example.com/drupal, and production on http://example.com) and using a WYSIWYG editor (as noted below), the following rules (paths to be matched) will work in order:

http://local.example.com/drupal/
http://local.example.com/
http://example.com/drupal/
http://example.com/
/drupal/
/

Pathologic works by stripping out the path as defined above and ensuring that what remains will work as a path relative to the site's $base_url (the URL you see in the location bar when looking at your <front> page), no matter which server it is on. It works down the list in order and only applies the first rule that matches a given link. Placing http://local.example.com/ ahead of http://local.example.com/drupal/, for instance, will rewrite http://local.example.com/drupal/about as /drupal/about, and this will NOT be rewritten further by the /drupal/ rule. All possible permutations should therefore be included in order from deepest to shallowest matching path. Some of the example rules may not be necessary for your situation, but it is safe to include even unlikely paths that you would like corrected (removed).
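
The first-match behavior can be illustrated with a short Python sketch. It models only the "strip the first matching prefix and stop" step described above; the function name strip_first_match is invented for the example, and the real module goes on to build a full URL from what remains.

```python
# The rule list from the three-server example, deepest paths first.
RULES = [
    "http://local.example.com/drupal/",
    "http://local.example.com/",
    "http://example.com/drupal/",
    "http://example.com/",
    "/drupal/",
    "/",
]


def strip_first_match(path, rules):
    """Strip the first rule that prefixes `path`; later rules are never tried."""
    for rule in rules:
        if path.startswith(rule):
            return path[len(rule):]
    return path  # no rule matched, so the path is left alone


# Deepest-first order: the subdirectory is removed together with the host.
print(strip_first_match("http://local.example.com/drupal/about", RULES))  # about

# Shallow rule first: the subdirectory survives and is not stripped afterwards.
bad_order = ["http://local.example.com/", "http://local.example.com/drupal/"]
print(strip_first_match("http://local.example.com/drupal/about", bad_order))  # drupal/about
```

In this simplified model the leftover is printed without a leading slash, but the point is the same as in the text: once a shallow rule has matched, the drupal/ segment stays in the path.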

Now sit back and enjoy the fruits of Pathologic’s labor.

WYSIWYG editor compatibility

If the site is using a WYSIWYG content editor such as FCKeditor, TinyMCE, etc and Pathologic doesn’t seem to be doing anything, it may be due to the fact that such editors often try to output paths which begin with a slash character. Such paths are usually ignored by Pathologic, because Pathologic considers such paths to be absolute. However, you can trick Pathologic into working with such paths by using the “Also considered local” field. If the Drupal installation is at the root level of a web site (such as http://example.com/), simply enter a single slash into the “Also considered local” field. If it's in a subdirectory (such as http://example.com/foo/drupal/), enter the full subdirectory path, with slashes at both the beginning and end (so /foo/drupal/ in this case). See the “Configuring Pathologic” section above for more information.

Site copies within a subdirectory

For local development copies of a website, it is common to place them in their own subdirectory, so that each website is in http://localhost/project1, http://localhost/project2, and so on. To fix links beginning with a slash, use the same configuration as for WYSIWYG editors by entering a single slash into the “Also considered local” field. Depending on the site configuration, the Pathologic filter may need to be enabled for all input formats, including those without a WYSIWYG editor associated with them (such as Full HTML).

Migrating from Path Filter

Path Filter is an input filter which works similarly to Pathologic, but requires one to type a prefix of “internal:” before all internal paths they want Path Filter to function on. A down side to this is that a site’s content becomes strewn with these bits, and if Path Filter is disabled, those “internal:” prefixes are going to be spat out to web browsers that won’t know what to do with them. That’s one of the reasons I avoided using such “hints” in Pathologic.

If you are interested in migrating from Path Filter to Pathologic, be aware that Pathologic will automatically look for a prefix of “internal:” in your paths, and will ignore it if found. This means you should be able to use Pathologic as a drop-in replacement for Path Filter, with no additional configuration.
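
As a rough picture of what “ignoring” the prefix means, here is a tiny Python sketch. The function name is hypothetical and this is not the module's own code; it only shows a leading “internal:” being dropped before the rest of the path is handled as usual.

```python
def strip_internal_prefix(path):
    """Drop a leading "internal:" marker left over from Path Filter, if present."""
    prefix = "internal:"
    return path[len(prefix):] if path.startswith(prefix) else path
```

A path written for Path Filter as internal:node/1 would thus be treated like the plain relative path node/1.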

Caching issues

Drupal caches the output of input formats for speed. This can cause some stale data problems with the paths that Pathologic creates if circumstances change to make those paths incorrect. See this issue and this issue for examples of this sort of problem which have come up in real-world use. Unfortunately, there’s no real good way to fix this without making Pathologic something other than a standard input filter (and cacheable). To avoid these sorts of problems, consider these tips:

  • Do not change the URL path of established nodes, particularly if you have linked to them in your site content. Decide on a good URL path when the node is created and keep it. (If changing the path is truly necessary, change the path on the node editing form as normal, then go to Administer > Site building > URL aliases and create a new path which points the “old” path to the node to avoid breaking both internal and external links.)
  • Pathologic may behave unpredictably if only part of your site or some of your users are connected via an HTTPS connection; namely, some of the links will have an https:// protocol prefix and some will have an http:// one, depending on which sort of connection the user is using when the content is run through the input format. To avoid this, I suggest that HTTPS support be all-or-nothing on your site; either all connections use it, or none. Also, if your site did not previously use HTTPS connections but you’ve recently enabled this (or vice versa), flush your site’s cache so that Pathologic rebuilds paths in your content to use (or not use) the https:// prefix. To flush the site’s cache, go to Administer > Site configuration > Performance and click the “Clear cached data” button near the bottom of the page.
  • When migrating Drupal database contents from one site to another, exclude the contents of the cache tables (basically, all tables with names which begin with “cache”). This is actually a good idea whether you're using Pathologic or not. On a Unix-like system, the following shell command can strip data out of a database dump which generally shouldn’t be migrated when moving a Drupal database from one site to another.

    sed -E -e "/^INSERT INTO \`(cache|watchdog|sessions)/d" < /path/to/dump.sql > /path/to/dump-stripped.sql

    You will need to tweak the regular expression a bit if your database uses a table prefix. If you are unable to run this command or otherwise avoid migrating cache data, you should clear your site’s cache after importing the data; you can do this by going to Administer > Site configuration > Performance and clicking the “Clear cached data” button near the bottom of the page.
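
If you would rather not hand-edit the regular expression for a table prefix, the same filtering can be sketched in Python. This is a rough equivalent of the sed command above, under the same assumption that each INSERT statement in the dump starts on its own line; the function name and the "drupal_" prefix in the comment are made up for illustration.

```python
import re


def strip_volatile_inserts(lines, table_prefix=""):
    """Yield dump lines, skipping INSERTs into cache*, watchdog and sessions tables.

    `table_prefix` is the database table prefix, if the site uses one.
    """
    pattern = re.compile(
        r"^INSERT INTO `" + re.escape(table_prefix) + r"(cache|watchdog|sessions)"
    )
    for line in lines:
        if not pattern.match(line):
            yield line


# Example use with the placeholder paths from the sed command and a
# hypothetical "drupal_" table prefix:
#
#   with open("/path/to/dump.sql") as src, \
#           open("/path/to/dump-stripped.sql", "w") as dst:
#       dst.writelines(strip_volatile_inserts(src, table_prefix="drupal_"))
```

Only the INSERT statements are dropped; the CREATE TABLE statements stay in the dump, so the cache tables still exist, empty, after the import.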

Questions? Suggestions? Need help?

Please open an issue on Pathologic’s issue queue or contact the author and I’ll get back to you soon. Thanks for trying Pathologic!

About this page

Drupal version
Drupal 6.x, Drupal 7.x

Drupal’s online documentation is © 2000-2012 by the individual contributors and can be used in accordance with the Creative Commons License, Attribution-ShareAlike 2.0. PHP code is distributed under the GNU General Public License.