If you run a large site (mine is around 200,000 pages), search engine traffic and the associated server load are significant issues, because search engines revisit each page periodically. It's best to direct search engines to read only the versions of pages you want indexed. Telling them not to index the content only after they have already fetched the URL means a lot of wasted traffic.

As such, it's best if the URLs for print pages, email-to-a-friend pages, and any other per-node accessory pages can be excluded as a set using robots.txt.

Currently the Print Friendly Pages module uses URLs like http://www.example.com/node/766/print, which can't be excluded as a set by robots.txt because they lack a common prefix. By contrast, if the module used a URL like http://www.example.com/print/node/766, then robots.txt could exclude everything under http://www.example.com/print/

If people specify URLs using the Path module, it would be nice to have the print module use corresponding URLs like http://www.example.com/print/user/specified/path/ that could be caught by the same robots.txt exclusion.
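The prefix argument above can be checked with a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rule shown here is an assumption about what a site admin would write; it is not taken from the module or from any patch in this issue.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: a single prefix rule covering all print pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url):
    """Return True if a generic crawler may fetch the given URL."""
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/node/766/print",              # current format
        "http://www.example.com/print/node/766",              # proposed format
        "http://www.example.com/print/user/specified/path/",  # proposed alias format
    ):
        print(url, "->", "allowed" if allowed(url) else "blocked")
```

With this rule, only the two URLs that start with /print/ are blocked. The current node/766/print format stays crawlable because the rule matches on the leading part of the path.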

Files:
Comment   File                 Size      Author
#1        print.module.patch   1.19 KB   ngaur

Comments

Version: 4.6.x-1.x-dev » 4.7.x-1.x-dev
Status   File                 Size
new      print.module.patch   1.19 KB

The attached patch changes the URL format to a robots.txt-friendly one.

I figured that if the site admin doesn't want print URLs indexed, then the search engine doesn't need to visit them at all. Hence this patch also adds rel="nofollow" to the links to the printer-friendly pages where appropriate.
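As a rough illustration of the resulting markup (not the patch's actual code, since the module itself is PHP), the link might be built along these lines. The function name `print_link`, its parameters, and the link text are all hypothetical.

```python
from html import escape


def print_link(path, title="Printer-friendly version", nofollow=True):
    """Build an anchor tag pointing at the /print/ version of a page.

    `path` is the site-relative path of the page, e.g. "node/766".
    This helper is hypothetical and only illustrates the markup.
    """
    href = "/print/" + path.strip("/")
    rel = ' rel="nofollow"' if nofollow else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(title)}</a>'


if __name__ == "__main__":
    print(print_link("node/766"))
```

The `nofollow` flag stands in for the "where appropriate" condition, which the comment does not spell out.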

The patch applies against the 4.7.0 version of the module. Backporting should be straightforward.

Maybe an important thing to remember: printer-friendly pages are also search-engine-friendly pages!

Status: Active » Closed (fixed)

I have changed the URL to print/nid in HEAD.

Changes to add a robots noindex, nofollow were already done some time ago.