URL parameters like &quicktabs_1=0 create duplicate content in search engines

bille - January 6, 2009 - 04:13
Project: Quick Tabs
Version: 6.x-2.0-rc3
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: postponed
Description

The parameters that Quick Tabs adds to the tab links cause search engines to index duplicate pages, which isn't great for SEO.

My quick and dirty fix was to add 'rel' => 'nofollow' as an attribute within the _quicktabs_construct_tab_attributes function like so

$attributes = drupal_attributes(array(
  'id' => $id,
  'class' => $class,
  'href' => $href,
  // Ask search engines not to follow the tab link.
  'rel' => 'nofollow',
));

Of course, that might not be the right fix, so I'm open to other suggestions. I did add a robots.txt entry to keep the spiders away from those links, but not all search engines support wildcards in robots.txt.

Oh, and thanks for Quick Tabs! Great module. Dropped right in, and the Zen-themed tabs looked great on my site with no CSS tweaks. This module "just works".

#1

Pasqualle - February 13, 2009 - 16:13
Version: 6.x-2.0-rc1 » 6.x-3.x-dev
Category: bug report » feature request
Status: active » postponed

The solution is wrong, as it adds 'nofollow' to all tab pages, when only the first tab page link could be considered duplicate content.
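A minimal sketch of that idea, in plain PHP rather than the module's actual code: the helper name example_tab_attributes and its $tab_index parameter are hypothetical, and it assumes the first tab has index 0 and is the one shown by default.

```php
<?php
/**
 * Hypothetical helper (not part of Quick Tabs): builds the attribute array
 * for one tab link and adds rel="nofollow" only for the first tab.
 */
function example_tab_attributes($id, $class, $href, $tab_index) {
  $attributes = array(
    'id' => $id,
    'class' => $class,
    'href' => $href,
  );
  // Assumption: index 0 is the default tab, so its link duplicates the
  // plain page URL. The other tabs keep normal, followable links.
  if ($tab_index === 0) {
    $attributes['rel'] = 'nofollow';
  }
  return $attributes;
}
```

In the module itself the resulting array would still have to be passed on to drupal_attributes() as in the original snippet.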

I think solving issue #361114: Theme templates would be sufficient to let you change the links as you like. Marking as postponed until then.

#2

corbacho - March 22, 2009 - 13:43

Thanks bille,
I think I will apply that solution. I don't agree with Pasqualle. In Google Webmaster Tools, all the Quick Tabs URLs seem to be reported as having duplicate meta descriptions, and they also produce duplicate titles.

I have attached a screenshot showing one page with 11 duplicate meta tags, caused by the forum sort & order links as well as by Quick Tabs.

For sort & order I will use this solution http://tips.webdesign10.com/drupal/forum-module-330.html

I will report back to confirm whether these solutions work once GWT updates its status.

Attachment: Picture.png (22.73 KB)

#3

tsi - May 28, 2009 - 17:20
Version: 6.x-3.x-dev » 6.x-2.0-rc3

This is truly a problem for me too. How can I add 'nofollow' in 6.x-2.0-rc3?
I couldn't find this function anywhere (quicktabs_construct_tab_attributes).
Thanks.

#4

highvoltage - June 16, 2009 - 12:59

Can we add something to robots.txt to solve this?

#5

paardje - July 23, 2009 - 14:47

Is there any solution?

#6

gregarios - August 5, 2009 - 14:42

I'd like to see Quick Tabs module add a robots.txt entry to fix this.

This is an acceptable fix for Google at least. The original robots.txt standard doesn't support wildcards, but Google specifically allows this:

Disallow: /*?quicktabs_*

Can the module automatically create this entry in the robots.txt file?

Update: After adding the entry above to my robots.txt file a few days ago, Google is now correctly reading it and is restricting itself from following the Quick Tabs links. Google is not giving an error on the syntax either.

#7

nikmahajan - September 5, 2009 - 03:30

I am facing a similar issue. The path to my article looks like this:
http://example.com/content/abc-def-ghi-jkl

After using quicktabs module, it started generating duplicate paths like this one:
http://example.com/content/abc-def-ghi-jkl?quicktabs_2=1
http://example.com/content/abc-def-ghi-jkl?quicktabs_2=2
http://example.com/content/abc-def-ghi-jkl?quicktabs_3=1 , etc.

My question is: what should I put in the robots.txt file to stop Googlebot from indexing such pages? Should it be something like

Disallow: /content/*?quicktabs_*/

or something else?

#8

gregarios - November 4, 2009 - 00:37

Use exactly what I stated before. It is tried and tested through Google's Webmaster Tools. It works and follows Google's own guidelines on the subject:

Disallow: /*quicktabs_*

This will block any URL with a quicktabs_ anywhere in it.

#9

nikmahajan - September 5, 2009 - 05:33

As I am displaying titles as /content/xxxxxxxxxxxxxxxxxxxxx,
wouldn't it be
Disallow: /content/*?quicktabs_*

instead of /*?quicktabs_* alone?

#10

gregarios - November 4, 2009 - 00:38

> As I am displaying titles as /content/xxxxxxxxxxxxxxxxxxxxx
> so wouldn't it be
> Disallow: /content/*quicktabs_*
>
> instead of /*quicktabs_* alone

Nope.

Asterisks are wildcards in this case. The first one stands for anything before the quicktabs_ and the one at the end stands for everything afterward.
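One rough way to see how these wildcards behave is to translate a pattern into a regular expression and try it against a few paths. The sketch below is plain PHP and is not Google's implementation; the function name example_robots_match is made up for illustration, and it models only the asterisk wildcard plus the trailing $ anchor that Googlebot understands.

```php
<?php
/**
 * Hypothetical helper: checks whether a URL path (including any query
 * string) matches a Googlebot-style robots.txt pattern.
 */
function example_robots_match($pattern, $path) {
  // A trailing $ means the pattern must match up to the end of the URL.
  $anchored = substr($pattern, -1) === '$';
  if ($anchored) {
    $pattern = substr($pattern, 0, -1);
  }
  // Escape regex metacharacters, then turn each escaped "*" into ".*".
  $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
  // Rules are matched from the start of the path, like a prefix.
  return preg_match('#^' . $regex . ($anchored ? '$' : '') . '#', $path) === 1;
}

$path = '/content/abc-def-ghi-jkl?quicktabs_2=1';
var_dump(example_robots_match('/*quicktabs_*', $path));              // bool(true)
var_dump(example_robots_match('/*?quicktabs_*', $path));             // bool(true)
var_dump(example_robots_match('/content/*?quicktabs_*/', $path));    // bool(false)
var_dump(example_robots_match('/*quicktabs_*', '/content/abc-def-ghi-jkl')); // bool(false)
```

The third call shows why a trailing slash in the pattern is a problem: the URL would need a literal "/" somewhere after quicktabs_ to match.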

Google has thoughtfully hidden this from people in their help section. You have to go to:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
Then click on the "Manually create a robots.txt file" link partway down the page.
In there, you'll see a section with the following:

Pattern matching

Googlebot (but not all search engines) respects some pattern matching.

* To match a sequence of characters, use an asterisk (*). For instance, to block access to all subdirectories that begin with private:

User-agent: Googlebot
Disallow: /private*/

* To block access to all URLs that include a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):

User-agent: Googlebot
Disallow: /*?

* To specify matching the end of a URL, use $. For instance, to block any URLs that end with .xls:

User-agent: Googlebot
Disallow: /*.xls$

You can use this pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain them to ensure Googlebot doesn't crawl duplicate pages. But URLs that end with a ? may be the version of the page that you do want included. For this situation, you can set your robots.txt file as follows:

User-agent: *
Allow: /*?$
Disallow: /*?

The Disallow: /*? directive will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).

The Allow: /*?$ directive will allow any URL that ends in a ? (more specifically, it will allow any URL that begins with your domain name, followed by a string, followed by a ?, with no characters after the ?).
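To make the Allow/Disallow interplay concrete, here is a small standalone PHP sketch. It is not Google's code: both function names are hypothetical, and it assumes the precedence rule Google documents, where the longest matching pattern wins and Allow wins a tie.

```php
<?php
// Hypothetical helper: true when $path matches a robots.txt $pattern,
// where * matches any sequence and a trailing $ anchors the end.
function example_rule_matches($pattern, $path) {
  $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
  if (substr($regex, -2) === '\$') {
    // Turn the escaped trailing "$" back into a regex end anchor.
    $regex = substr($regex, 0, -2) . '$';
  }
  return preg_match('#^' . $regex . '#', $path) === 1;
}

// Hypothetical helper: decides whether $path may be crawled.
// Each rule is array('allow' or 'disallow', pattern).
function example_is_allowed(array $rules, $path) {
  $best = NULL;
  foreach ($rules as $rule) {
    list($type, $pattern) = $rule;
    if (!example_rule_matches($pattern, $path)) {
      continue;
    }
    $longer = $best === NULL || strlen($pattern) > strlen($best[1]);
    $tie = $best !== NULL && strlen($pattern) === strlen($best[1]);
    // Assumed precedence: longest pattern wins, Allow wins a tie.
    if ($longer || ($tie && $type === 'allow')) {
      $best = $rule;
    }
  }
  // With no matching rule, crawling is allowed by default.
  return $best === NULL || $best[0] === 'allow';
}

$rules = array(
  array('allow', '/*?$'),
  array('disallow', '/*?'),
);
var_dump(example_is_allowed($rules, '/page?'));       // bool(true)
var_dump(example_is_allowed($rules, '/page?sid=1'));  // bool(false)
var_dump(example_is_allowed($rules, '/page'));        // bool(true)
```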

#11

nikmahajan - September 5, 2009 - 07:34

Thank you so much, gregarios. I got it.

#12

jonnyp - October 9, 2009 - 11:44

I'm not happy with using robots.txt to fix this, as each tab link will still consume some trust and link juice, even if it is blocked by robots.txt.

I looked in quicktabs.module and at line 340 found this:

  $link_options = array(
    'attributes' => array(
      'id' => $id,
      'query' => $querystring,
      'class' => $class,
    ),

If you remove 'query' => $querystring you will lose the ?quicktabs_1= ... from your URLs, leaving the # fragment intact. This does not adversely affect pre-loaded tabs, but I have not tried it with Ajax, which I presume might need the parameter.
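To illustrate the effect on the generated links, here is a standalone sketch in plain PHP. It does not use the module's code or Drupal's url() function: the helper name example_tab_url is hypothetical, and the "quicktabs-N" fragment name is an assumption made for the example.

```php
<?php
/**
 * Hypothetical helper: builds a tab link with or without the
 * quicktabs_N query parameter, always keeping the # fragment.
 */
function example_tab_url($path, $qtid, $tab_index, $include_query = TRUE) {
  $url = $path;
  if ($include_query) {
    // Produces e.g. "?quicktabs_2=1", the part search engines treat
    // as a separate URL.
    $url .= '?' . http_build_query(array('quicktabs_' . $qtid => $tab_index));
  }
  // Assumed fragment name; fragments are not sent to the server.
  return $url . '#quicktabs-' . $qtid;
}

echo example_tab_url('/content/abc-def-ghi-jkl', 2, 1), "\n";
// /content/abc-def-ghi-jkl?quicktabs_2=1#quicktabs-2
echo example_tab_url('/content/abc-def-ghi-jkl', 2, 1, FALSE), "\n";
// /content/abc-def-ghi-jkl#quicktabs-2
```

Without the query string every tab link points at the same URL as far as a crawler is concerned, which is what removes the duplicates.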

I've not patched this, as I don't know the module author's purpose in using the query string, but if you are concerned about duplicate content etc., it's an easy fix to make.

#13

Pasqualle - October 9, 2009 - 15:58

The query string is required for Quick Tabs to work without JavaScript.

#14

superflyman - October 27, 2009 - 00:51

I am using the Disallow: /*?quicktabs_* workaround, and it takes care of the majority of the duplicates (about 6000!!), but for some reason 100 or more pages still show up with duplicate tags in Google. Any thoughts?

#15

gregarios - November 4, 2009 - 00:39

> I am using the Disallow: /*?quicktabs_* workaround and for the most part this takes care of the majority of the duplications (about 6000!!) but for some reason there are about 100 or more that show up as pages with duplicate tags in google. Any thoughts?

I modified the code in my posts above; this version gets rid of more of them. Apparently some of the URLs don't have question marks in them:
Disallow: /*quicktabs_*

#16

superflyman - November 7, 2009 - 23:28

@gregarios

Thanks, I'll give it a go and check the analytics...

 
 
