The default robots.txt file provided by Drupal disallows crawling of search/* paths. However, the current 7.x versions of advanced search modules such as Apache Solr Search Integration and Search API enable site builders to create search pages via the admin interface which may (and usually do) exist outside of the standard search/* path. This causes SEO headaches when faceted navigation is used because the crawlers have a tendency to get stuck in loops and index loads of duplicate data. The conversation that brought this information to light is posted at #1370342: Implement a setting to add "rel=nofollow" to facet links. The following articles provide some supporting evidence:

The approach taken by Facet API is to provide site builders with a checkbox option to add the rel="nofollow" attribute to facet links. The downside is that faceted navigation is sometimes used as part of a site's information architecture (IA), in which case nofollow would prevent valuable content from being crawled. I wanted to check with the SEO experts as to whether or not this is the correct approach, and if so whether it would make sense to add an item to the checklist instructing people to make sure this setting is enabled.

Other issues on this topic posted against other projects are listed below for reference:
#197783: Module makes database balloon in size - avoid logging the guided searches (Faceted Search)
#371542: Ponder facets acting as link spam, add rel=nofollow? (Apache Solr Search Integration)
#1370342: Implement a setting to add "rel=nofollow" to facet links (Facet API)

Comments

cpliakas commented:

Issue summary: View changes

Updated issue summary.

Dave Reid commented:

I'd also recommend that Facet API integrate with RobotsTxt and implement hook_robotstxt().

cpliakas commented:

That's a great recommendation. Issue posted at #1376494: Integrate with the RobotsTxt module. Thanks, Dave!

seoegghead commented:

Nofollow is really not the best approach. I think even Matt Cutts has said as much, but I'd have to dig up the source.

There are a few valid approaches I've outlined online, as well as at conferences (email me and I can send you the slides). Among the correct techniques is using "selective robots.txt," which is also the most straightforward technique, and the one I advise most of the time. Mike P. from Distilled surveyed/outlined a few of the methods here, including that one:
http://www.seomoz.org/blog/building-faceted-navigation-that-doesnt-suck

We mostly agree. I wrote about it more over here:
http://www.clickz.com/clickz/column/2035517/robots-faceted-navigation, and you can see what we do on these example sites: http://www.seoegghead.com/our-work/ (link is not spam, it's a list of sites you can play with that apply the selective robots.txt technique).
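As a rough illustration of what selective rules could look like (not necessarily the exact technique from the articles above), here is a minimal Python sketch. It assumes clean URLs and that facet filters are passed as query parameters on the search page; the function name and the example paths are hypothetical. It relies only on plain prefix matching, so the bare search page stays crawlable while its query-string variants are blocked.

```python
def facet_disallow_rules(search_paths):
    """Return robots.txt lines that block query-string variants of each
    search page while leaving the bare page crawlable.

    `search_paths` is a hypothetical list of site-specific search page
    paths. Assumes clean URLs and facets passed as query parameters.
    """
    rules = ["User-agent: *"]
    for path in search_paths:
        clean = path.strip("/")
        if not clean:
            continue  # skip empty paths so we never emit "Disallow: /?"
        # Prefix match: "/products?" covers "/products?f[0]=..."
        # but does not cover "/products" itself.
        rules.append(f"Disallow: /{clean}?")
    return rules


print("\n".join(facet_disallow_rules(["products", "/library/search/"])))
```

Note that a rule this blunt also blocks pagination and any other query parameter on those pages, so a real site would likely need finer-grained rules for the facet combinations it does want crawled.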

There is also a new feature in Webmaster Tools that hints to Google that a parameter is a filter, but it's Google-proprietary. I haven't used it, but it doesn't require changes to your link generation functions. Since you may have to generate hundreds of links for faceted navigation (esp. with multi-select), you have to be careful how you generate them, generate them consistently (order of parameters etc.), and quickly (if you profile your link generation function, you'll find out you have to optimize it).
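To make the point about consistent generation concrete, here is a small Python sketch that builds a facet link with a stable parameter order. The `filters` structure, the function name, and the URL layout are assumptions for illustration only; this is not Facet API's actual URL format.

```python
from urllib.parse import urlencode


def facet_url(base_path, filters):
    """Build a facet link with a stable parameter order.

    `filters` maps a facet name to its selected values (a hypothetical
    structure). Sorting names and values means the same selection always
    yields the same URL, regardless of the order the user clicked in.
    """
    pairs = [
        (name, value)
        for name in sorted(filters)
        for value in sorted(set(filters[name]))  # set() drops repeated values
    ]
    query = urlencode(pairs)
    return f"{base_path}?{query}" if query else base_path


# Two click orders, one URL.
a = facet_url("/products", {"color": ["red", "blue"], "brand": ["acme"]})
b = facet_url("/products", {"brand": ["acme"], "color": ["blue", "red"]})
print(a)
print(a == b)
```

Without the sorting step, every permutation of the same filters would be a distinct URL to a crawler, which multiplies the duplicate-content problem.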

The main reason you can't use something like meta noindex is that it's already too late: the crawler has to fetch the page before it can see the tag. It's already a spider-trap, and then it misses the good stuff. Over ~2000 documents with a few facets and you have a problem. Even some major big-box stores have a spider-trap problem. Fortunately for them, robots will tolerate more duplicate content. Smaller (as in less important) sites need to be more worried.
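A quick back-of-the-envelope sketch in Python shows why a few facets are enough to create a trap. The facet sizes below are made-up numbers, and the function name is hypothetical; the count covers distinct filter states only, before pagination or parameter reordering multiply it further.

```python
from math import prod


def filter_states(values_per_facet, multi_select=False):
    """Count the distinct filter combinations a crawler could reach.

    `values_per_facet` lists how many values each facet offers.
    The count includes the unfiltered state.
    """
    if multi_select:
        # Each value is independently on or off.
        return prod(2 ** n for n in values_per_facet)
    # Each facet is either unset or set to exactly one value.
    return prod(n + 1 for n in values_per_facet)


facets = [5, 4, 3]  # e.g. 5 colors, 4 brands, 3 sizes (illustrative)
print(filter_states(facets))                     # single-select
print(filter_states(facets, multi_select=True))  # multi-select
```

With these illustrative numbers, three small single-select facets give 120 states, and the same facets with multi-select give 4096, each of which is a crawlable URL showing a subset of the same documents.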

Hope this helps. We've used Drupal for a few CMS projects (not involving faceted search), and even if I don't contribute the code, I can probably help. Feel free to say hi.

Cheers,
Jaimie @ http://www.seoegghead.com/

seoegghead commented:

Issue summary: View changes

Updated issue summary.

cpliakas commented:

Thanks for the excellent overview, Jaimie! Non-code contributions are incredibly valuable and very welcome. I am working through the resources you posted, and I welcome any feature requests or advice in issues posted against the Facet API module. I look forward to the continued conversation, because I feel that if we come up with a good solution then Drupal could be the de facto platform for SEO-friendly faceted navigation with interesting user experiences.

~Chris

Ben Finklea commented:

Status: Active » Closed (outdated)