Problem/Motivation


Google indexes multiple versions of the same page because Quick Tabs uses its own URLs to display the tabs.
Example
Site quicktabs is on: http://www.texturecase.com
Quicktab link: http://www.texturecase.com/welcome?quicktabs_1=0#quicktabs-1

Google treats the second URL (the Quick Tabs link) as a different page, so it indexes both URLs as two separate pages with the same content.

Proposed resolution

First, blog posts and articles that discuss the problem:
http://www.propdrop.com/blog/duplicate-content-problems-drupal-quicktabs...
http://2bits.com/bing/how-google-and-bing-crawlers-was-confused-quicktab...

Solution 1
Using robots.txt:
Disallow: /*quicktabs_*
Disadvantage: Google still sees the URLs and may index them. Not all search engines support wildcards in robots.txt.
Solution 2
A bit risky, as it is against Google's terms of service. Khalid suggested using settings.php to redirect the crawlers; see his blog post (2bits) for the full details.
Solution 3 (seems to be the best solution)
Using canonical tags
More info: http://googlewebmastercentral.blogspot.de/2007/09/google-duplicate-conte...
Solution 4
Not supporting users without javascript, see: http://drupal.org/node/354867#comment-5204212
Additional measures
Add rel="nofollow" to the Quick Tabs links

Remaining tasks

(reviews needed, tests to be written or run, documentation to be written, etc.)

User interface changes

(new or changed features/functionality in the user interface, modules added or removed, changes to URL paths, changes to user interface text)

API changes

(API changes/additions that would affect module, install profile, and theme developers, including examples of before/after code if appropriate)

Original report by [bille]

My quick and dirty fix was to add 'rel' => 'nofollow' as an attribute within the _quicktabs_construct_tab_attributes function like so

  $attributes = drupal_attributes(array(
    'id' => $id,
    'class' => $class,
    'href' => $href,
    'rel' => 'nofollow',
  ));

Of course, that might not be the right fix, so I'm open to other suggestions. I did add a robots.txt entry to keep the spiders away from those links, but not all search engines support wildcards in robots.txt.

Oh, and thanks for Quick Tabs! Great module. Dropped right in, and the zen themed tabs looked great on my site with no css tweaks. This module "just works".

The parameters that Quick Tabs adds to the tab links causes search engines to create duplicate indexed pages, which isn't great for SEO.

Comment  File         Size      Author
#2       Picture.png  22.73 KB  corbacho

Comments

pasqualle’s picture

Version: 6.x-2.0-rc1 » 6.x-3.x-dev
Category: bug » feature
Status: Active » Postponed

The solution is wrong because it adds 'nofollow' to all tab links, when only the first tab's link could be considered duplicate content.

I think solving issue #361114: Theme functions or template files would be sufficient to let you change the links as you like. Marking as postponed until then.

corbacho’s picture

Thanks bille,
I think I will apply that solution. I don't agree with Pasqualle. Google Webmaster Tools seems to report all the Quick Tabs URLs as having duplicate meta descriptions and duplicate titles.

I have attached a screenshot, where I have 1 page with 11 duplicate meta tags. These are due to the Sort & order forum links as well as to Quicktabs.

For sort & order I will use this solution http://tips.webdesign10.com/drupal/forum-module-330.html

I will report soon to confirm these solutions when GWT updates the state.

tsi’s picture

Version: 6.x-3.x-dev » 6.x-2.0-rc3

This is truly a problem for me too. How can I add 'nofollow' in 6.x-2.0-rc3?
I couldn't find this function anywhere (quicktabs_construct_tab_attributes).
Thanks.

highvoltage’s picture

Can we add something to robots.txt to solve this?

paardje’s picture

is there any solution??

gregarios’s picture

I'd like to see Quick Tabs module add a robots.txt entry to fix this.

This is an acceptable fix for Google at least. The official robots.txt standard doesn't support wildcards, but Google specifically allows this:

Disallow: /*?quicktabs_*

Can the module automatically create this entry in the robots.txt file?

Update: After adding the entry above to my robots.txt file a few days ago, Google is now correctly reading it and is restricting itself from following the Quick Tabs links. Google is not giving an error on the syntax either.

nikmahajan’s picture

I am facing a similar issue. The path to my article looks like this:
http://example.com/content/abc-def-ghi-jkl

After using quicktabs module, it started generating duplicate paths like this one:
http://example.com/content/abc-def-ghi-jkl?quicktabs_2=1
http://example.com/content/abc-def-ghi-jkl?quicktabs_2=2
http://example.com/content/abc-def-ghi-jkl?quicktabs_3=1 , etc.

My question is what should I put in the robots.txt file to stop Google robot from indexing such pages. Should it be like

Disallow: /content/*?quicktabs_*/

or something else.

gregarios’s picture

Use exactly what I stated before. It is tried and tested through Google's Webmaster Tools. It works and follows Google's own guidelines on the subject:

Disallow: /*quicktabs_*

This will block any URL with a quicktabs_ anywhere in it.

nikmahajan’s picture

As I am displaying titles as /content/xxxxxxxxxxxxxxxxxxxxx
so wouldn't it be
Disallow: /content/*?quicktabs_*

instead of /*?quicktabs_* alone

gregarios’s picture

"As I am displaying titles as /content/xxxxxxxxxxxxxxxxxxxxx
so wouldn't it be
Disallow: /content/*quicktabs_*

instead of /*quicktabs_* alone"

Nope.

Asterisks are wildcards in this case. The first one stands for anything before the quicktabs_ and the one at the end stands for everything afterward.

Google has thoughtfully hidden this from people in their help section. You have to go to:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
Then click on the "Manually create a robots.txt file" link partway down the page.
In there, you'll see a section with the following:

Pattern matching

Googlebot (but not all search engines) respects some pattern matching.

* To match a sequence of characters, use an asterisk (*). For instance, to block access to all subdirectories that begin with private:

User-agent: Googlebot
Disallow: /private*/

* To block access to all URLs that include a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):

User-agent: Googlebot
Disallow: /*?

* To specify matching the end of a URL, use $. For instance, to block any URLs that end with .xls:

User-agent: Googlebot
Disallow: /*.xls$

You can use this pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain them to ensure Googlebot doesn't crawl duplicate pages. But URLs that end with a ? may be the version of the page that you do want included. For this situation, you can set your robots.txt file as follows:

User-agent: *
Allow: /*?$
Disallow: /*?

The Disallow: /*? directive will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).

The Allow: /*?$ directive will allow any URL that ends in a ? (more specifically, it will allow any URL that begins with your domain name, followed by a string, followed by a ?, with no characters after the ?).
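
To check how these wildcard rules apply to Quick Tabs URLs, here is a minimal PHP sketch of the matching described above. It is only an approximation of what Googlebot does, based on the quoted help text; the function name robots_pattern_matches() is hypothetical and is not part of Quick Tabs, Drupal, or any Google tool.

```php
<?php
// Hypothetical helper (an assumption, not a real API): tests whether a
// Googlebot-style robots.txt pattern matches a URL path plus query string.
function robots_pattern_matches($pattern, $url) {
  // A trailing "$" anchors the pattern to the end of the URL.
  $anchored = substr($pattern, -1) === '$';
  if ($anchored) {
    $pattern = substr($pattern, 0, -1);
  }
  // Quote each literal piece, then let every "*" match any run of characters.
  $pieces = array_map(function ($piece) {
    return preg_quote($piece, '#');
  }, explode('*', $pattern));
  // Patterns always match from the start of the path.
  $regex = '#^' . implode('.*', $pieces) . ($anchored ? '$' : '') . '#';
  return preg_match($regex, $url) === 1;
}

// The site-wide rule catches a Quick Tabs URL under any path.
var_dump(robots_pattern_matches('/*quicktabs_*', '/content/abc-def?quicktabs_2=1'));
// A rule that starts with /content/ misses Quick Tabs URLs elsewhere on the site.
var_dump(robots_pattern_matches('/content/*?quicktabs_*', '/welcome?quicktabs_1=0'));
```

The first call reports a match and the second does not, which is why the pattern without a path prefix is the safer choice.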

nikmahajan’s picture

Thank you so much, gregarios. I got it.

jonnyp’s picture

I'm not happy with using robots.txt to fix this as each tab link will still consume some trust + link juice, even if it is blocked by robots.txt

I looked in quicktabs.module and at line 340 found this:

  $link_options = array(
    'attributes' => array(
      'id' => $id,
      'query' => $querystring,
      'class' => $class,
    ),

If you remove 'query' => $querystring, you will lose the ?quicktabs_1= ... part of your URLs, leaving the # fragment intact. This does not adversely affect pre-loaded tabs, but I have not tried it with Ajax, which I presume might need the parameter.

I've not patched this, as I don't know the module writer's purpose in using the querystring, but if you are concerned about duplicate content etc. it's an easy fix to make.

pasqualle’s picture

The querystring is required for QT to use it without javascript.

superflyman’s picture

I am using the Disallow: /*?quicktabs_* workaround and for the most part this takes care of the majority of the duplications (about 6000!!) but for some reason there are about 100 or more that show up as pages with duplicate tags in google. Any thoughts?

gregarios’s picture

"I am using the Disallow: /*?quicktabs_* workaround and for the most part this takes care of the majority of the duplications (about 6000!!) but for some reason there are about 100 or more that show up as pages with duplicate tags in google. Any thoughts?"

I modified the code in the posts above... this code gets rid of more. Apparently some don't have question-marks in them:
Disallow: /*quicktabs_*

superflyman’s picture

@gregarios

Thanks, I'll give it a go and check the analytics....

Anonymous’s picture

subscribing

pazzypunk’s picture

I'm going to give it a try ... looks easy enough ...

naden’s picture

You had better override the themeable function theme_quicktabs_tabs(...) instead, so you can update the module without hassle.

jrwyse’s picture

I did a combination of everything in this thread, in panic mode today, as my site is brand new and being indexed quickly. The * character in robots.txt only works for Google. For extra protection, it's important to use rel='nofollow' in the <a> tags. The directions below will implement both solutions.

Add the following in robots.txt:

Disallow: /*quicktabs_*

Search quicktabs.module for the following:

  $link_options = array(
    'attributes' => array(
      'id' => $id,
      'class' => $class,

Add the following line below it:

      'rel' => 'nofollow',

I'm relatively new to Drupal. It's amazing how much I've been able to do simply by searching Google and reading on these forums. So, I hope this post helps someone!

peterjlord’s picture

function yourtheme_quicktabs_tabs($quicktabs, $active_tab = 'none') {
  $output = '';
  $tabs_count = count($quicktabs['tabs']);
  if ($tabs_count <= 0) {
    return $output;
  }

  $index = 1;
  $output .= '<ul class="quicktabs_tabs quicktabs-style-'. drupal_strtolower($quicktabs['style']) .'">';
  foreach ($quicktabs['tabs'] as $i => $tab) {
    $class = 'qtab-'. $i;
    // Add first, last and active classes to the list of tabs to help out themers.
    $class .= ($i == $active_tab ? ' active' : '');
    $class .= ($index == 1 ? ' first' : '');
    $class .= ($index == $tabs_count ? ' last': '');
    $attributes_li = drupal_attributes(array('class' => $class));
    $options = _quicktabs_construct_link_options($quicktabs, $i);

    // Added these 2 lines.
    $options['html'] = TRUE; /* Just for CSS styling: lets l() output the <span> unescaped */
    $options['attributes']['rel'] = 'nofollow';  /* Stop the robots */

    $output .= '<li'. $attributes_li .'>'. l('<span>' . $tab['title'] . '</span>', $_GET['q'], $options) .'</li>';
    $index++;
  }
  $output .= '</ul>';
  return $output;
}

caominh’s picture

Quicktabs Duplicated URLs and Baidu

I am having a similar problem with search engines indexing duplicate Quicktabs URLs on my Chinese site. In fact, most of my highest-ranked pages on the Chinese search engine, Baidu, come from these Quicktabs URLs.
I would like to eliminate these duplicate URLs, but I have some questions about potential solutions.

Best Solution Seems to Be to Use the Nodewords Canonical URL Tag

This fix is already referred to here: http://groups.drupal.org/node/27194
I have already created more than 5000 pages that contain the Quicktabs code
Is there any way to apply the canonical URL tag to existing pages without editing each page manually?

Alternate Solution: Edit the Robots.txt File

This solution is referred to above
However, the primary search engine involved is Baidu, not Google, and wildcards do not work for Baidu
Is there a way to implement this solution without the wildcards?
Also, with so many Quicktabs pages already indexed by Baidu, would excluding them through robots.txt lead to a drop in search rank now that all of those ranked pages will be missing?

Third Solution: Edit the .htaccess File

Is it possible to resolve this issue by redirecting references to the quicktabs urls to the standard url through a redirect in the htaccess file?

Any feedback on these questions would be most appreciated by this noob - Thanks!

Andrew Gorokhovets’s picture

subscribing

sk33lz’s picture

I worked in search engine optimization for 3 years. I would look at the comment here by JennySmith and draw your own conclusions, but gregarios is correct in #6 here. Nofollow would also be advisable, as spiders will still pass link juice to pages they don't index because of your robots.txt. Stop the link and stop the leak.

ressa’s picture

I much prefer adding rel="nofollow" to Quick Tabs links over editing robots.txt. I don't think people should have to edit their robots.txt file to use Quick Tabs.

ressa’s picture

To remove the quicktabs query string, and redirect to the same page with the standard URL, add this to your .htaccess file:

# remove query string ?page=1&quicktabs_2=1 from web crawlers 
RewriteCond %{QUERY_STRING} ^(.*)quicktabs_(.*)$
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]

This will remove the query string from any URL containing the string "quicktabs_", for example:
http://example.com/content/example-page?page=13&quicktabs_2=1
http://example.com/content/example-page?quicktabs_1=1

...will both be redirected to:
http://example.com/content/example-page

gregarios’s picture

@ressa:

You don't think people should have to edit their robots.txt file, but they should edit their .htaccess file? I don't see the logic in that.

Also, shouldn't the http://example.com/content/example-page?page=13&quicktabs_2=1 in your example really be redirected to http://example.com/content/example-page?page=13, not http://example.com/content/example-page?

ressa’s picture

No, I don't think people should have to edit either the robots.txt or the .htaccess file, but the Quick Tabs module has caused Google to identify hundreds of identical titles and meta tags on my site.

I probably should have written in my post that it is an attempt to remove these identical posts from the search engines index.

I don't think http://example.com/content/example-page?page=13&quicktabs_2=1 should be redirected to http://example.com/content/example-page?page=13.

If it has "quicktabs_" in the query string, it's because it has arrived from a Quick Tabs link, right? If that's the case I do not want the web crawler to index that URL, but redirect it to the canonical version.

gregarios’s picture

"I don't think http://example.com/content/example-page?page=13&quicktabs_2=1 should be redirected to http://example.com/content/example-page?page=13.

If it has "quicktabs_" in the query string, it's because it has arrived from a Quick Tabs link, right? If that's the case I do not want the web crawler to index that URL, but redirect it to the canonical version."

I don't believe QuickTabs adds pages to links, so in this case, http://example.com/content/example-page?page=13 would be the "canonical" version of http://example.com/content/example-page?page=13&quicktabs_2=1, would it not?

ressa’s picture

It does if you have a pager at the bottom of the QT block.

I am not even sure the above .htaccess is necessary: I just did a "Fetch as Googlebot" test, and the crawler seems to respect the Disallow: /*quicktabs_ in the robots.txt:

The page could not be crawled because it is blocked by robots.txt.

gregarios’s picture

"I am not even sure the above .htaccess is necessary: I just did a "Fetch as Googlebot" test, and the crawler seems to respect the Disallow: /*quicktabs_ in the robots.txt:"

As I first suggested in comment #6. ;-)

gregarios’s picture

"It does if you have a pager at the bottom of the QT block."

And "if" it doesn't have a pager on the QT block, but rather on the actual page, then you've broken the link with that .htaccess entry.

dotpex’s picture

Version: 6.x-2.0-rc3 » 6.x-2.0-rc5
Priority: Normal » Minor
Status: Postponed » Needs review

Try this solution (a fix for the active class on the <a> element, made by changing the link):
http://drupal.org/node/547586#comment-3560266

llite’s picture

Hi Katbailey and Pasqualle,

Can you give us an option to opt out of non-JavaScript support in exchange for getting rid of the query string?
Users surfing with a browser that doesn't support JavaScript are likely aware of the possible poor experience, and they usually account for just a small portion of visitors.
I believe most webmasters won't compromise their SEO efforts just to cater to this small audience.
Even Techcrunch doesn't care, so why should we go through the imperfect methods above, like robots.txt and .htaccess?

Appreciated if this feature request could be considered.

Regards,

XiaN Vizjereij’s picture

Priority: Minor » Major

I agree with the consensus that SEO >> non-JS support. And this is far from a minor issue for almost every website.

philbar’s picture

Title: url parameters like &quicktabs_1=0 create duplicate content in search engines » SEO: Add Canonoical Tags to Quicktab Pages
Version: 6.x-2.0-rc5 » 6.x-3.x-dev
Category: feature » task

The problem with restricting it with robots.txt or using nofollow is that it excludes all the Quick Tabs parameter links from SEO ranking. While this is better than being hurt by appearing to have duplicate content, it is not the best solution.

Google recommends using canonical urls for solving this problem.

We need a mechanism to add a canonical url tag to the <head> of pages with quicktabs on them. This should be the page URL with all quicktab parameters stripped.
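
As a rough sketch of what such a mechanism could compute, the plain PHP below strips every quicktabs_* parameter from a URL and builds the canonical tag from the result. The function names are hypothetical (they are not part of Quick Tabs or Nodewords), and the code assumes an absolute URL with a scheme and host and no port or credentials. How the tag actually gets into the <head> in Drupal is left out.

```php
<?php
// Hypothetical helper (an assumption, not a module API): returns $url with
// all quicktabs_* query parameters removed. Other parameters, such as a
// pager's "page", are kept. The #fragment is dropped.
function example_quicktabs_canonical_url($url) {
  $parts = parse_url($url);
  $query = array();
  if (isset($parts['query'])) {
    parse_str($parts['query'], $query);
  }
  foreach (array_keys($query) as $key) {
    if (strpos((string) $key, 'quicktabs_') === 0) {
      unset($query[$key]);
    }
  }
  $path = isset($parts['path']) ? $parts['path'] : '/';
  $canonical = $parts['scheme'] . '://' . $parts['host'] . $path;
  if ($query) {
    $canonical .= '?' . http_build_query($query, '', '&');
  }
  return $canonical;
}

// Hypothetical helper: wraps the canonical URL in a <link> tag for <head>.
function example_quicktabs_canonical_tag($url) {
  $href = htmlspecialchars(example_quicktabs_canonical_url($url), ENT_QUOTES, 'UTF-8');
  return '<link rel="canonical" href="' . $href . '" />';
}

echo example_quicktabs_canonical_tag('http://example.com/content/example-page?page=13&quicktabs_2=1'), "\n";
```

Keeping the pager parameter means each page of a paged listing gets its own canonical URL, which avoids the problem raised earlier with stripping the whole query string.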

RobertOak’s picture

The fix above, modifying the code, seems to be the one that works. Why this isn't rolled into the latest dev release is a good question. Quicktabs destroyed my SEO before I realized what was going on, and it also screwed up my read statistics.

sockah’s picture

Subscribing. Just realized today this is really mucking up SEO on a site I have using quicktabs.

carlos.macao’s picture

Subscribing

bleen’s picture

subscribing

Jessica A’s picture

subscribing, removing this module until this is addressed, bummer :(

geek-merlin’s picture

AJAX & NON-AJAX mode...?

hi there,

i suppose this whole discussion is based on the assumption that people use (1) NON-AJAX mode, where everything is loaded on one page and thus a spider sees duplicated content.
(and i share the opinion that adding a canonical url should be the cure)

but what about (2) AJAX mode (which i think about using on one site for some reasons)?

do i get it right that in AJAX mode there is NO seo problem, as "each tab has its url anyways"?

do i get it right that a cure for the (1) NON-AJAX mode (maybe the "canonical url" thing) should NOT be applied to (2) AJAX mode?

just wanna get this clear...

zincdesign’s picture

subscribing

ParisLiakos’s picture

Title: SEO: Add Canonoical Tags to Quicktab Pages » SEO: Add Canonical Tags to Quicktab Pages

andrew_mallis’s picture

subscribe

bleen’s picture

FYI: you no longer need to "Subscribe" ... there is a shiny new "Follow" button at the top of this page.

DEATH TO SUBSCRIBES

EvanDonovan’s picture

This issue badly needs a summary of proposed solutions, if someone could write one. Hopefully, my notes below could be helpful in doing so.

I already implemented canonical link tags months ago; however I just determined today that pages with query strings like ?page=75&quicktabs_16=2 are still included in Google's index (since I was able to find them when I searched Google using the "site:" parameter).

Thus, I am moving to the robots.txt solution, i.e., Disallow: /*quicktabs_*. I am not going to modify .htaccess for now, as was suggested in #26, since the potential for improperly rewriting URLs with query strings in them (such as ?page=#) is high. (The .htaccess code in #26 is incorrect.)

Since adding canonical link tags via Nodewords is apparently not sufficient to address this problem, a solution should come from this module. I think the suggestion from #34 makes sense: provide a checkbox in the main Quicktabs configuration that allows people to opt out of the support for Quicktabs when JavaScript is not available.

What the checkbox would do is make it so that the query string parameter would not get set in line 372 of quicktabs.module (the _quicktabs_construct_link_options() function).
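
A minimal sketch of that idea in plain PHP, assuming the checkbox value is passed in as a boolean. The function below is a hypothetical stand-in, not the module's real _quicktabs_construct_link_options(), and the id and fragment formats are only illustrative.

```php
<?php
// Hypothetical stand-in for the module's link-option builder (an assumption,
// not the real implementation). $no_js_fallback represents the proposed
// checkbox: TRUE keeps the query string for visitors without JavaScript.
function example_quicktabs_link_options($qtid, $tabkey, $no_js_fallback = TRUE) {
  $options = array(
    // Illustrative id format only.
    'attributes' => array('id' => "quicktabs-tab-$qtid-$tabkey"),
    'fragment' => "quicktabs-$qtid",
  );
  if ($no_js_fallback) {
    // Only this parameter creates the extra URLs that crawlers index.
    $options['query'] = "quicktabs_$qtid=$tabkey";
  }
  return $options;
}

// With the fallback switched off, the link carries only the #fragment.
print_r(example_quicktabs_link_options(1, 0, FALSE));
```

Defaulting the flag to TRUE would keep existing sites working as before, so only site owners who tick the box would lose the non-JavaScript fallback.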

It should be noted that this SEO issue with Quicktabs has been mentioned in two separate blog posts: http://www.propdrop.com/blog/duplicate-content-problems-drupal-quicktabs... & now also http://2bits.com/bing/how-google-and-bing-crawlers-was-confused-quicktab....

It should also be noted that I believe this issue applies in all branches of the Quicktabs module (6.x-2.x & 7.x as well as 6.x-3.x).

EvanDonovan’s picture

Here is a theme function which you can use to unset the Quicktabs query strings, if you aren't concerned about supporting users without Javascript:

function THEMENAME_quicktabs_tabs($quicktabs, $active_tab = 'none') {
  $output = '';
  $tabs_count = count($quicktabs['tabs']);
  if ($tabs_count <= 0) {
    return $output;
  }

  $index = 1;
  $output .= '<ul class="quicktabs_tabs quicktabs-style-'. drupal_strtolower($quicktabs['style']) .'">';
  foreach ($quicktabs['tabs'] as $tabkey => $tab) {
    $class = 'qtab-'. $tabkey;
    // Add first, last and active classes to the list of tabs to help out themers.
    $class .= ($tabkey == $active_tab ? ' active' : '');
    $class .= ($index == 1 ? ' first' : '');
    $class .= ($index == $tabs_count ? ' last': '');
    $attributes_li = drupal_attributes(array('class' => $class));
    $options = _quicktabs_construct_link_options($quicktabs, $tabkey);
    // Remove the querystring option, which is only needed to support users without JavaScript.
    unset($options['query']);
    // Support for translatable tab titles with i18nstrings.module.
    if (module_exists('i18nstrings')) {
      $tab['title'] = tt("quicktabs:tab:$quicktabs[qtid]--$tabkey:title", $tab['title']);
    }
    $output .= '<li'. $attributes_li .'>'. l($tab['title'], $_GET['q'], $options) .'</li>';
    $index++;
  }
  $output .= '</ul>';
  return $output;
}

corbacho’s picture

I saw that Khalid has written a post about fixing this issue with robots.txt.
It contains nothing new that was not suggested here, but I thought it would be nice to have for reference:
http://2bits.com/bing/how-google-and-bing-crawlers-was-confused-quicktab...

#51: D'oh ... true ;)

EvanDonovan’s picture

I mentioned that in #48 :)

eule’s picture

hey,

This issue is from 2009. Is any working fix available to set the links to nofollow in the D7 version?

marcoka’s picture

Issue summary: View changes

starting summary

marcoka’s picture

I started to write a summary. A more complete version will be ready soon.

manuel garcia’s picture

I'm not sure if anyone has checked this out yet, but Google has a page with instructions for making Ajax applications crawlable:

https://developers.google.com/webmasters/ajax-crawling/

Thoughts?

Exploratus’s picture

Any work on this? Seems like a pretty big problem, and after years, nobody has a solution to this critical aspect of web design. I am planning on using quicktabs for a site, but now I am rethinking my decision based on this discussion.

avpaderno’s picture

Status: Active » Closed (outdated)

I am closing this issue, since it's for a Drupal version no longer supported.