Note: This documentation is targeted for the 7.x-2.x branch of Pathologic. The 6.x-3.x branch is also “current” for those still using Drupal 6; however, due to fundamental changes in the way the current Drupal 7 code works, this documentation won’t exactly match.
Pathologic is an input filter which attempts to alter paths in your content so that they are correct in situations which would otherwise cause them to “break”; for example, if the URL of the site changes, or the content is moved to a different server. Pathologic can also solve the problem of missing images and broken links in your site’s RSS feeds.
Example use cases
Here are some hypothetical situations in which Pathologic can save the day.
- The address of your site has changed. Perhaps you moved to a shiny new domain name, or perhaps you moved the Drupal installation from one subdirectory to another. Now all the images and internal links in your content don’t work. Using Pathologic is an alternative to going through all of the site’s content and correcting the paths manually.
- Your site has more than one copy at separate URLs; for example, testing and production servers. Or perhaps it is accessible via both HTTP and HTTPS, and when links or images switch between the two on the same page, web browsers throw scary warnings at visitors. Perhaps copy-editors edit content on the testing server, and that content eventually gets pushed over to the production server. When the editors link to other content on the site, perhaps they sometimes link to content using the test server’s URL; these links break when the content is published to the production server. Pathologic can correct those paths so that they’re always pointing at the “current” correct URL.
- Your Drupal site has been up for a while, but you’ve recently discovered the Clean URLs feature and enabled it. Your links still work, but they still have that ugly ?q= thing in them, and you have better things to do with your time than go through all your content to prettify the links. Or maybe you’re going the other way; you used to have Clean URLs enabled, but you’ve had to disable it, and now your links are broken. Pathologic to the rescue!
- Links and/or images in your site content use relative paths (e.g., <a href="tag/food/pizza"> instead of <a href="http://example.com/tag/food/pizza">) which work fine for people reading content on your site, but break gracelessly for people reading the content through RSS or some other sort of external feed. Pathologic can ensure that those paths are always full paths with a server fragment so that the paths always work.
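To make the last two use cases concrete, here is a minimal Python sketch of turning a relative Drupal path into a full URL, with or without Clean URLs. It is an illustration only: Pathologic itself is a Drupal (PHP) module, and the function name `to_full_url` and its parameters are made up for this example.

```python
def to_full_url(path, site_base, clean_urls=True):
    """Build a full URL for a relative Drupal path (hypothetical helper).

    Assumes site_base ends with a slash, e.g. "http://example.com/".
    """
    # Keep any query string apart from the Drupal path itself.
    drupal_path, _, query = path.partition("?")
    if clean_urls:
        url = site_base + drupal_path
        return url + ("?" + query if query else "")
    # Without Clean URLs, the Drupal path travels in the q parameter.
    url = site_base + "?q=" + drupal_path
    return url + ("&" + query if query else "")


print(to_full_url("tag/food/pizza", "http://example.com/"))
print(to_full_url("tag/food/pizza", "http://example.com/", clean_urls=False))
```

The first call yields a Clean URL and the second the `?q=` form, both with the server part that feed readers need.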
Pathologic is an input filter, so getting it installed and configured is a little bit more difficult than standard modules, but the instructions below will walk you through the process.
- Install the Pathologic module as normal. (If you’re a total Drupal newbie, you can read up on how to install Drupal modules to your site – and welcome to the community, by the way!)
- Go to Administration » Configuration » Content authoring » Text formats (
admin/config/content/formats). A list will appear of the various input formats your site uses. Find one in the list which you want to use Pathologic with, and click the “configure” link for that format. If you’re unfamiliar, you can learn more about text formats and input filters.
- If you wish to use Pathologic with other input formats, go back to step 2 and repeat the process.
- Pathologic is now working on all old and new content which uses the input format(s) you added it to.
Pathologic should almost always be the last input filter to run on the text because it only works properly on pure HTML; any input filters which convert some sort of non-HTML markup (BBCode, Markdown, Textile, etc.) to HTML need to run first.
How Pathologic works
Depending on how you intend to use Pathologic and how the paths in your currently-existing content are formed, further configuration may not be necessary. To help you decide whether further configuration is necessary in your case, and how to go about it, allow me to take a moment to explain how Pathologic works.
Pathologic looks at paths that are located in
href attributes of links (
<a> tags), as well as the
src attributes of image tags and tags for other embedded media (
<embed>, etc.). After finding a path in an attribute, Pathologic then determines whether the path is “local.” It does its magic on local paths, but leaves other paths alone.
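As an illustration of the first step (finding candidate paths), here is a small Python sketch that collects the values of href and src attributes from an HTML fragment. This is not how Pathologic is implemented (it is PHP, and its internals are not described here); the names `PathCollector` and `find_paths` are hypothetical.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collect (tag, attribute, value) for every href and src attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value is not None:
                self.found.append((tag, name, value))


def find_paths(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.found


sample = '<a href="tag/food/pizza">Pizza</a> <img src="files/pizza.png">'
for tag, attribute, value in find_paths(sample):
    print(tag, attribute, value)
```

Each collected value would then be checked against the “local” rules before anything is rewritten.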
Let’s assume that your Drupal site is up and running at
http://example.com/drupal/. Pathologic considers a path local if:
- The path is a relative path. That is, it does not have a protocol fragment (such as
http://) and does not begin with a slash. For example,
tags/food/pizza will be considered a local path, but
- The path is an absolute path that points to a resource located within your Drupal installation. Our example is located at
http://example.com/drupal/tags/food/pizza is considered a local path. However, while
http://example.com/not_drupal/ points to a resource on the same domain name, it points to something outside of the Drupal installation, so it is not considered local.
- The path contains only an anchor fragment, such as
- The path is an absolute path which begins with a URI of another Drupal installation which you’ve instructed Pathologic to consider local.
Aha! That last one is where things start getting interesting. Let’s say you’ve grown tired of using
http://example.com/drupal/, so you’ve moved your site over to
http://example.net/. (For those interested in using Drupal in a test/production server paradigm, imagine that
example.com is the test server and
example.net is the production server.) If all the paths in your content are relative paths, then Pathologic will handle them perfectly – no need for further configuration. However, if they are absolute paths that begin with
http://example.com/drupal/, then Pathologic will not consider them local paths and will ignore them. However, we can tell Pathologic to consider such paths as local paths and to fix them.
- Go to Administration » Configuration » Content authoring » Text formats (
admin/config/content/formats). A list will appear of the various input formats your site uses. Find one in the list which you are using Pathologic with, and click the “configure” link for that format. (Note that if you are using Pathologic with more than one input format, you will have to repeat this configuration process for each input format.)
- Select the desired output format of Pathologic-processed paths from the “Processed URL format” field. The explanatory text for the field should explain the consequences of each option.
- Enter the paths of other/previous Drupal installations which should be considered local in the “Also considered local” text field. Enter one path per line. For the above example, we’d want to enter
- Click the “Save configuration” button when done.
(Note that it’s fine to put the path for the “current” server in the “Also considered local” field. Pathologic will simply remove it when it does its trick. In other words, both the
example.net servers can have both
http://example.net/ in the field. This means that each server can be configured identically. This will make life easier if you’re using Features to manage configuration.)
Now sit back and enjoy the fruits of Pathologic’s labor.
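The locality rules and the “Also considered local” setting described above can be sketched roughly as follows. This is a simplified Python model, not Pathologic’s actual code: the function names are hypothetical, the output is always a full URL (only one of the possible “Processed URL format” choices), and leaving anchor-only paths untouched is an assumption of the sketch.

```python
from urllib.parse import urlsplit


def local_remainder(path, current_base, also_local=()):
    """Return the part of path below the Drupal root, or None if not local."""
    # Absolute paths are local only if they start with a known installation base.
    for base in (current_base, *also_local):
        if path.startswith(base):
            return path[len(base):]
    parts = urlsplit(path)
    # Relative paths: no protocol, no host, and no leading slash.
    if not parts.scheme and not parts.netloc and not path.startswith("/"):
        return path
    return None


def fix_path(path, current_base, also_local=()):
    """Rewrite a local path so it points at current_base; leave others alone."""
    if path.startswith("#"):
        return path  # assumption: anchor-only paths are left as they are
    remainder = local_remainder(path, current_base, also_local)
    if remainder is None:
        return path
    return current_base + remainder


current = "http://example.net/"
old_bases = ["http://example.com/drupal/"]
print(fix_path("http://example.com/drupal/tags/food/pizza", current, old_bases))
print(fix_path("http://example.com/not_drupal/", current, old_bases))
```

Because the current base is always checked too, listing it among the extra bases is harmless, which mirrors the note about configuring every server identically.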
WYSIWYG editor compatibility
If the site is using a WYSIWYG content editor such as CKEditor, TinyMCE, etc. and Pathologic doesn’t seem to be doing anything, it may be because such editors often try to output paths which begin with a slash character. Such paths are usually ignored by Pathologic, because Pathologic considers such paths to be absolute. However, you can trick Pathologic into working with such paths by using the “Also considered local” field. If the Drupal installation is at the root level of a web site (such as
http://example.com/), simply enter a single slash into the “All base paths for this site” field. If it’s in a subdirectory (such as
http://example.com/foo/drupal/), enter the full subdirectory path, with slashes at both the beginning and end (so
/foo/drupal/ in this case). See the “Configuring Pathologic” section above for more information.
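To see why a base path with slashes at both ends does the trick, consider this small Python sketch. It only models the prefix matching; the function name `strip_base_path` is hypothetical, and skipping protocol-relative paths (those starting with two slashes) is an assumption made here.

```python
def strip_base_path(path, base_paths):
    """Return path relative to the Drupal root, or None if no base path matches."""
    if path.startswith("//"):
        return None  # protocol-relative URL pointing at some host, not a base path
    # Try longer base paths first so "/foo/drupal/" wins over "/".
    for base in sorted(base_paths, key=len, reverse=True):
        if path.startswith(base):
            return path[len(base):]
    return None


print(strip_base_path("/foo/drupal/node/1", ["/foo/drupal/"]))
print(strip_base_path("/node/1", ["/"]))
```

Once the base path has been stripped, what remains is an ordinary relative Drupal path that can be handled like any other local path.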
Migrating from Path Filter
Path Filter is an input filter which works similarly to Pathologic, but requires one to type a prefix of “
internal:” or “
files:” before all internal paths they want Path Filter to function on. A down side to this is that a site’s content becomes strewn with these bits, and if Path Filter is disabled, those “
internal:” prefixes are going to be spat out to web browsers that won’t know what to do with them. That’s one of the reasons I avoided using such “hints” in Pathologic.
If you are interested in migrating from Path Filter to Pathologic, be aware that Pathologic will automatically look for a prefix of “
internal:” or “
files:” in your paths, and behave appropriately. This means you should be able to use Pathologic as a drop-in replacement for Path Filter, with no additional configuration.
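One plausible reading of “behave appropriately” is sketched below in Python: an “internal:” prefix is dropped to leave a normal Drupal path, and a “files:” prefix is mapped into the site’s files directory. Both the function name `expand_prefix` and the `sites/default/files/` location are assumptions for illustration; the exact handling is up to Pathologic and your site’s file settings.

```python
def expand_prefix(path, files_dir="sites/default/files/"):
    """Translate Path Filter style prefixes into plain relative paths."""
    if path.startswith("internal:"):
        # "internal:node/99" refers to the Drupal path "node/99".
        return path[len("internal:"):]
    if path.startswith("files:"):
        # "files:images/pizza.png" refers to a file in the files directory
        # (files_dir is an assumed location, not read from Drupal).
        return files_dir + path[len("files:"):]
    return path


print(expand_prefix("internal:node/99"))
print(expand_prefix("files:images/pizza.png"))
```

The resulting relative paths can then go through the same local-path handling as any other path in the content.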
Alter Pathologic’s behavior
If you are a developer, you may be interested to know that Pathologic implements a hook which allows you to alter how it will construct a new URL, or even bypass constructing a new URL entirely. Check out the
pathologic.api.php file in the module directory for documentation and example code for
hook_pathologic_alter(). Some examples of things you could do by implementing this hook include:
- Have Pathologic bypass constructing a new URL if the path would be to a particular file, or to a file in a particular subdirectory (handy if you have a non-Drupal directory under your root Drupal directory which you want to link to).
- Have paths to images altered so that they point to a copy of the image on your site’s CDN instead of its main server.
- Remove or add query parameters to the URL that will be generated.
- Alter older path structures to reflect newer ones. For example, if your articles used to have paths like
articles/new-pizza-trends.html, but your paths now look like
magazine/articles/new-pizza-trends, that alteration could be done in a
hook_pathologic_alter() implementation so that links in the old format in site content would continue to work.
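The path-structure example in the last item boils down to a small mapping, sketched here in Python to show the logic only. A real implementation would be written in PHP inside hook_pathologic_alter(), whose signature is documented in pathologic.api.php and is not reproduced here; `modernize_path` and the pattern name are hypothetical.

```python
import re

# Old-style article paths such as "articles/new-pizza-trends.html".
OLD_ARTICLE = re.compile(r"^articles/(?P<slug>[^/]+)\.html$")


def modernize_path(path):
    """Map an old-format article path to the new structure; pass others through."""
    match = OLD_ARTICLE.match(path)
    if match:
        return "magazine/articles/" + match.group("slug")
    return path


print(modernize_path("articles/new-pizza-trends.html"))
```

Paths that do not match the old pattern are returned unchanged, so the mapping is safe to apply to every local path.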
Drupal caches the output of input formats for speed. This can cause some stale data problems with the paths that Pathologic creates if circumstances change to make those paths incorrect. See this issue and this issue for examples of this sort of problem which have come up in real-world use. Unfortunately, there’s no really good way to fix this without making Pathologic something other than a standard (and cacheable) input filter. To avoid these sorts of problems, consider these tips:
- Do not change the URL path of established nodes, particularly if you have linked to them in your site content. Decide on a good URL path when the node is created and keep it. (If changing the path is truly necessary, change the path on the node editing form as normal, then go to Administration > Configuration > Search and metadata > URL aliases (
admin/config/search/path) and create a new path which points the “old” path to the node to avoid breaking both internal and external links.)
- When migrating Drupal database contents from one site to another, exclude the contents of the cache tables (basically, all tables with names which begin with “cache”). This is actually a good idea whether you’re using Pathologic or not. If you are unable to exclude cached data from your dumps or otherwise avoid migrating cache data, you should clear your site’s cache after importing the data; you can do this by going to Administration > Configuration > Development > Performance (
admin/config/development/performance) and clicking the “Clear all caches” button near the top of the page, or by running
drush cc all if your server has Drush installed.