Will Global Redirect help with duplicate content in the case of multiple aliases and taxonomies?

dsamuel - May 1, 2007 - 18:13
Project: Global Redirect
Version: 5.x-1.1
Component: Miscellaneous
Category: support request
Priority: normal
Assigned: Unassigned
Status: won't fix
Description

I am noticing a problem with how Google indexes my allergy website, and I wonder if this module will help.

When it comes to duplicate content, an article could exist on the front page (for a while, until it is pushed on to the next page by additional content) and under any of the taxonomy pages it is listed under (I often use multiple tags for a single article) and under any multiple URL aliases that might exist in the system.

For example, this allergy books entry also appears under the allergy section and the food allergy section. Of course as I add tagged articles, old ones will get pushed on to a new page, so if a search engine leads a visitor to the taxonomy-based page, they won't be able to find the article.

To further complicate things, I may sometimes feel I want to change the URL, but still keep the old one so as not to break any links.

What I am seeing is that Google is directing search results to one of the taxonomy summary pages rather than to the clean URL for the article itself. The main problem with this is that the article may have been pushed off the first page of the taxonomy listing, and besides, visitors have to hunt for it, rather than just arriving at the article they were looking for.

Will this module help? If not, how can I make sure that search results are always directed at the clean URL for the article, not to any other page?

Thanks

#1

nicholasThompson - May 2, 2007 - 08:12

Thanks for reposting - there are a number of issues you have here. In my opinion, most are related to site structure.

You're displaying the Full Node on the frontpage + term pages + the page itself.

Personally, I've never been a fan of the full node view. The way I see it is that I should read a little about the article and then if I'm interested I'll click on it - however there are many other groups of people who believe entirely the opposite - I gather you fall into the "other groups" category ;-)

The reason your term pages are doing so well is that you've inadvertently done some SEO on your website. Far fewer people are likely to search for "Wheat-free Milk-free Pan-Fried Granola" than for "Wheat Allergy Details" (one of your term pages). A term page is one of the most important pages on your website. Do some searches on Google for things like "Running Training", "Swimming Training" and "Boxing Training" and you should see a site called www.pponline.co.uk appear on the first page of Google for all those terms. I suggest you install the Page Title module too - it allows you to control the <title> tag of a page separately from the node title. This is hugely important, as it's the <title> tag which controls what the link reads in search engines.

Personally, I'm not sure what to do about your frontpage issue. All I can suggest is that you list more than 1 item (maybe 3?) so that the page itself doesn't strongly resemble its source page... Or simply move to Teaser Views for the frontpage.

Another thing you need to bear in mind is that if you use Full Node View on your term pages then a user can read the entire article on a term page. Why would they need to click through to the article? That might be why little of your traffic is targeting your articles compared to term pages.

Finally, I don't think Global Redirect will help with this issue in particular. Its main goal is to stop people accessing source URLs when that source URL has an alias.

#2

gurukripa - May 3, 2007 - 04:51
Title: Will Global Redirect help with duplicate content in the case of multiple aliases and taxonomies? » Nicholas

Hi

This is a little unrelated, but you mentioned how he has done SEO. Sorry, I didn't understand... could you give some tips on SEO? Seems you know quite a bit ;)

#3

nicholasThompson - May 3, 2007 - 08:28
Title: Nicholas » Will Global Redirect help with duplicate content in the case of multiple aliases and taxonomies?

Firstly - it's a little "taboo" to change the subject from something meaningful to something meaningless like "Nicholas" ;-)

Secondly, your question is slightly off topic and you might get a better response from the forum - however basically, it comes down to the term pages.

Generally speaking, when you search for something, you rarely type in the precise title of the document you're looking for. Mostly you'll target specific keywords. If you have a term page on your site which is titled "Wheat Allergies" then that's far more likely to be searched for than a page titled "Wheat-free Milk-free Pan-Fried Granola", which is a much more specific page. That's the basics of it.

My advice to you is that if you'd like more information on SEO, I'd open a forum post in the Post Installation Forum or maybe jump onto IRC support, as this issue thread is more related to dsamuel's problem with his site.

#4

dsamuel - May 4, 2007 - 03:11

Thanks for all the useful information, Nicholas.

I actually don't mind if people read an entire article on the term page, and attracting traffic to that page is on its own a good thing. The problem is if Google sends them to the term page, meanwhile the article has been pushed off of term page 1, so now they can't find it and wonder why they landed there!

Your point about full posts on the front and term pages is well taken. Sometimes I forget to put in the <!--break--> tag, or if the post is short, it seems silly to have 100 words on the front page and 25 more in the full article. Most of the time, though, I should take care to put in the <!--break--> tags. Thanks for the suggestion/reminder.

I'll check out the Page Title module too. The more SEO techniques the better!

I now have an idea for a module to solve this sort of problem if it is pervasive enough: the module could take the search terms from the referring search engine and use them to list related articles in a block, so you'd get a tailored list of related articles listed off to the side. This could be generated based on the site search function and/or based on manually set keywords, with the webmaster noticing search patterns and deciding which articles should be listed for those particular terms ... if that makes sense.
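A rough sketch of that idea, in Python rather than as a Drupal module, just to show the moving parts. It assumes the search engine puts the phrase in a "q" parameter of the referrer URL, and it ranks by simple word overlap with article titles; the function names and the `articles` mapping are hypothetical, not part of any existing module.

```python
from urllib.parse import urlparse, parse_qs


def search_terms_from_referrer(referrer):
    """Return the lowercase search words found in a referrer URL, or []."""
    query = parse_qs(urlparse(referrer).query)
    # Assumption: the engine passes the search phrase in a "q" parameter.
    phrase = query.get("q", [""])[0]
    return phrase.lower().split()


def related_articles(referrer, articles):
    """Rank articles by how many search words appear in their titles.

    `articles` is a hypothetical dict mapping URL path -> article title.
    Articles with no matching words are left out.
    """
    terms = set(search_terms_from_referrer(referrer))
    scored = []
    for path, title in articles.items():
        overlap = len(terms & set(title.lower().split()))
        if overlap:
            scored.append((overlap, path))
    # Highest overlap first; ties are broken by path for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored]
```

A real version would probably lean on the site search index or the manually set keywords rather than title overlap, but the flow (read referrer, pull out terms, look up matches, show them in a block) would be the same.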

-Doug

#5

dsamuel - May 15, 2007 - 13:54

I've finally installed global redirect.

I find that it doesn't help with navigating the page numbers along the bottom of the front page. They point to domain/?page=2, domain/?page=3, etc. Of course I don't have aliases set up for them.

I was hoping there might be a better answer. Is there a module that automatically renames /?page=n into something else (like /pagen or /userdefined/n)?

Would it make any sense for Global Redirect to change /?page=n into /pagen if there isn't already a module out there to fix this problem?
Thanks!

-Doug

#6

nicholasThompson - May 15, 2007 - 14:31

You can't simply tag the page number onto the end - there is a reason it's in a separate argument. You can't 100% guarantee that the number in arg(3) is going to be the page.

This is, in my opinion, out of the scope of Global Redirect and should be, if even possible, done in a different module.

I also believe this isn't really that big a problem. On your own site you simply shouldn't have any links to pages which don't exist, especially links to pages with "random" query string key/value pairs. Other sites doing this to you are outside of your control, but I believe the big search engines are intelligent enough to handle this kind of thing. Otherwise you could make a page on your site with links to another big site, like flickr, and just pass shed loads of random query string key/value pairs over to make it look to Google like you've found thousands of dupe pages.

This is very similar to another issue open in the Global Redirect queue - #143987. You could apply a similar rule in your .htaccess file using mod_rewrite: after the Drupal rewrite has turned the URL into an argument called 'q', you can check for other arguments. E.g., if the q argument has node in it and there is another argument called "page", then return a 404.
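To make the logic of that rule concrete, here is a minimal Python sketch of the check itself. It is not mod_rewrite syntax, only an illustration of the decision, and it assumes "has node in it" means the first segment of the q path is "node". The function name is made up for the example.

```python
from urllib.parse import parse_qs


def status_for_query(query_string):
    """Return 404 for node paths carrying a stray `page` argument, else 200."""
    args = parse_qs(query_string, keep_blank_values=True)
    # After the Drupal rewrite, the requested path arrives in the "q" argument.
    path = args.get("q", [""])[0]
    segments = path.strip("/").split("/")
    # Assumption: a "node" path is one whose first segment is "node".
    is_node_path = segments[0] == "node"
    if is_node_path and "page" in args:
        return 404
    return 200
```

So a query string of "q=node/44&page=4" would be rejected, while "q=taxonomy/term/5&page=2" and the paged front page ("page=4" with no node path) would pass through untouched.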

#7

dsamuel - May 16, 2007 - 01:54

Hi Nicholas,

The reason I found out about this problem is that someone googled (in Google, just to be clear) the phrase "spelt is wheat - why can i eat it if i am allergic to wheat" – which is a perfectly reasonable thing to wonder if you have that sort of a problem. And in fact I wrote about just that on my website, which is why Google sent them to my website. It's just that it sent them to /?page=4 instead of the original article. This is very strange, since http://www.allergy-details.com/ ?page=4 (full URL intentionally broken so as not to give Google the wrong idea, rel=nofollow notwithstanding) has a page rank of 0 and http://www.allergy-details.com/44-spelt-safe-wheat-free-or-gluten-free-d... (the original article) has a page rank of 2. Go figure. In two more articles, it will get pushed to /?page=5, but Google will still think it's at /?page=4.

I'm beginning to think I should ban bots from looking at /?page=anything, since this is a navigational convenience for visitors. On the other hand, it does help boost page rank for the original article, by providing an additional internal link to it. Not that it's doing me any good! More to the point, it's not doing my visitors any good. Probably few of them scroll to the bottom of the page where this article just happens to be just now.

So I guess what I am really saying is that whether a visitor arrives at /page4 or /?page=4, there is still the problem that Google is sending them to the wrong place. And I can see how that is not a Global Redirect thing at all. I guess I can try putting a <!--break--> in the already very short article, although there is still the strongly pulling <H1> to attract Google's attention!

#8

nicholasThompson - May 16, 2007 - 08:18

That's an interesting issue... Clearly Google thinks your main article is less relevant than your "category" page (in this case the category would be 'time', and it's just sorted by created date, descending).

One thing I noticed... Your main article doesn't seem to have an H1... Well, it does... It's your site's title. The main article's title is an H2 - ALWAYS an H2.

This is really more a site structure/Google problem than a Global Redirect issue, and I can't really see any solution. As long as Google thinks your category page is more relevant to a search phrase than the article itself, Google will send users to that page. And it's likely to happen a lot: your category pages are going to be VERY keyword heavy (by nature).

#9

nicholasThompson - July 25, 2007 - 21:25
Status:active» won't fix

#10

bobthecow - August 30, 2007 - 01:34

the way i see it, you have two options: you can remove the duplicate content from your category pages (i.e. by using a teaser), or you can keep google from indexing the category pages.

personally, i'd block all the category pages in robots.txt. that would force google to return the main article pages instead of category pages. you would take a bit of a hit on internal links, and you'd have to make sure your article pages were still getting spidered. but you might be able to offset that with a second category type page that just has the article titles and links to each article. i dunno...
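For what it's worth, a rule like that can be sanity-checked before it goes live. The sketch below uses Python's standard urllib.robotparser against a made-up robots.txt. It assumes the category pages live under Drupal's default /taxonomy/term/ path and uses example.com as a stand-in domain; term pages reached through URL aliases would need their own Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes category pages sit under Drupal's
# default /taxonomy/term/ path. Aliased term pages are NOT covered.
ROBOTS_TXT = """\
User-agent: *
Disallow: /taxonomy/term/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With these rules, is_crawlable("http://example.com/taxonomy/term/5") comes back False, while an article URL such as "http://example.com/node/44" is still allowed, which is the split described above.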
