Drupal SEO is driving me crazy

Hello all,

I have been following one SEO tip after another and so far it has only made my Google ranking WORSE, so I was wondering what I have been doing wrong. My website is www.edutube.org and most hits come from the Google search for 'edutube'. Soon after launching the site it was the number one hit. Then it gradually declined as Google crawled my site and I implemented new SEO measures. Now it stands at number 11, on page 2. For a while, a website that merely talks about edutube even ranked higher than edutube itself! So what went wrong?

I use the following SEO modules: pathauto, nodewords, xmlsitemap

I also modified robots.txt extensively following SEO tips.

The problem might be duplicate content: I have different types of views for the same content, as well as sorting options, all of which leads to duplicate content. I also have the site in multiple languages for which the interface changes but not the content - all this leads to tens of thousands of pages seen by Google as duplicates.

To solve these problems I restricted access to the sorting options and the other languages in robots.txt. I thought this would fix it, but now Google Webmaster Tools reports a new problem: 155138 URLs restricted by robots.txt! I also get 3180 "Not found" errors, all for pages in different languages with URLs like http://www.edutube.org/ar/additional_info-0?sort=asc&order=Submitted
How does Google come up with something like that?

Anyway, this has been driving me crazy. I would love to have some advice on this, as all the advice I have been following so far has clearly made things worse!

Thanks in advance,

Frank

Comments

There is no point in complaining about it. There are many other Drupal and non-Drupal sites that are doing very well ranking on Google.

In my opinion, these modules are only a fraction of what SEO is all about. Perhaps you should start by looking at your target audience and the keywords in your content. Then there is SEM (Search Engine Marketing): you should spend some time marketing your site. Code alone won't cut it for you.

Good luck,

P.S. On duplicate content: have you looked into your .htaccess file? There are ways to stop your site from being reachable both with and without www (which may cause duplicate content). You can also use http://drupal.org/project/path_redirect to redirect broken or unknown URIs.

Hi,
Thanks for your message and the path_redirect tip. The duplicate content is caused by the reasons described above, and I could block it through robots.txt. That is what leads to the more than 155,000 'URLs restricted by robots.txt' that Google Webmaster Tools lists as errors. I think the problem may be Google and how it deals with Drupal sites that have multiple languages and multiple views of the same content. The only solution I see now is to remove all the languages, which would be a shame.

As for the SEO modules, I think they are great; they are not the problem. There are of course other ways of improving SEO, but first I want to deal with the duplicate content problem above, which I think is responsible for the tenfold reduction in Google hits. I will continue to look for a solution and post it here when I find one.

multiple languages

The only thing that is changing when a visitor clicks on a translation is the Drupal menu items. If the <title> element and body text of each page were also translated, then it would be beneficial to let Google crawl your translations. If just the menu text is translated, then it's duplicate content that is better off blocked.

PageRank

The Google Toolbar shows no PageRank for your site and no links to it. Is this correct? If so, the simplest move may be to get links to your site.
-----
Just learning Drupal
www.technologyquestions.com

google bot

There are a few links to the site, but maybe the site is too new to have PageRank. The main issue for me, though, is the decrease in ranking: I used to get lots of hits from Google and I just want it back to the way it was. One strange thing is that there is 2.88 GB of traffic just for the month of June from Google bot visits. I heard that even large and popular websites do not have anything close to that amount of traffic from the Google bot. The Google bot actually slurps up most of the bandwidth from my site. There's no custom code in my site, yet something must be going wrong somewhere.

Drupal robots.txt

Hi,
I wrote that tutorial on robots.txt that you mentioned in your robots.txt file.
It's not the number of blocked URLs that is the problem -- it's which URLs are blocked.

I would add this back to your robots.txt file:

Disallow: /*sort=

That will get rid of those dynamic URLs that are indexed.
Example: edutube.org/ar/additional_info-0?sort=asc&order=Submitted

Did you have these blocked with robots.txt? That might be where a lot of those 155,000 URLs came from:

# Language pages
#Disallow: /en/
#Disallow: /ar/
#Disallow: /bg/
#Disallow: /zh-hant/
#Disallow: /cs/
#Disallow: /da/
#Disallow: /nl/
#Disallow: /fr/
#Disallow: /de/
#Disallow: /el/
#Disallow: /he/
#Disallow: /it/
#Disallow: /ja/
#Disallow: /mr/
#Disallow: /pl/
#Disallow: /pt-br/
#Disallow: /pt/
#Disallow: /ro/
#Disallow: /es/
#Disallow: /tr/
#Disallow: /vi/

It looks like that translation module is creating tons of duplicate content. If you click any of the language links you get about 20 different URLs for the same page of content. I only took a quick look at the site, but those might need to be blocked, with an extra rule for each language prefix like this:

#Disallow: /tr/
Disallow: /tr$
#Disallow: /vi/
Disallow: /vi$
#etc.
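
To illustrate how these patterns behave, here is a minimal Python sketch of Google-style robots.txt matching. It assumes that `*` matches any run of characters and that a trailing `$` anchors the end of the URL. The helper names (`pattern_to_regex`, `is_blocked`) and the sample paths other than the sorted URL quoted above are made up for the example, and this is only an approximation, not Google's actual crawler logic.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' is a wildcard and a trailing '$' means "URL ends here".
    Everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# Rules discussed in this thread: sorted tables plus one language prefix.
DISALLOW = ["/*sort=", "/tr/", "/tr$"]

for path in [
    "/ar/additional_info-0?sort=asc&order=Submitted",  # sorted table
    "/tr",          # language front page, caught by /tr$
    "/tr/node/1",   # hypothetical translated page, caught by /tr/
    "/trains",      # hypothetical unrelated page
    "/node/1",      # hypothetical untranslated page
]:
    print(path, "->", "blocked" if is_blocked(path, DISALLOW) else "crawlable")
```

The hypothetical `/trains` path stays crawlable, which is the reason for using the pair `/tr/` and `/tr$` rather than a bare `Disallow: /tr`.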

Count the number of videos that you imported and then multiply by 21 or 22 (the number of translations). If you were blocking the translations, you should expect at least that many URLs to be showing up as being blocked by robots.txt.
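
As a quick sanity check of that estimate, the multiplication can be sketched in Python. The video count of 5000 is purely hypothetical (the real number is not given in this thread); only the 21 or 22 translations come from the paragraph above.

```python
def minimum_blocked_urls(video_count, language_count):
    """Lower bound on blocked URLs: one copy of every video page per language."""
    return video_count * language_count


# 5000 is a hypothetical video count, used only to show the scale.
for languages in (21, 22):
    print(languages, "languages ->", minimum_blocked_urls(5000, languages), "URLs")
```

Sorted-table URLs inside each language would push the real total higher, so this is a floor rather than a prediction.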

Is the text automatically taken from YouTube's video description? I would modify the text if possible so that it's not a duplicate of the YouTube text content.

155,000 URLs is a lot for a new site that doesn't have many inbound links. What content is on those 155,000 URLs listed in the Google Control Panel?

Also, the site will benefit from more links pointed at your site. (They have to be clean links without rel="nofollow" on them.)

About the 404 errors:

Try running this sitemap generator on your site, but be careful to log out of your site first because it obeys cookies:
http://www.auditmypc.com/free-sitemap-generator.asp

Also be sure to check the boxes for "obey robots.txt" and "obey robots meta tags".

The sitemap generator will spider your site like a search engine and tell you if there are links pointing at those weird URLs you mention.

Another thing to try would be to go into the Google Webmaster Tools and look at the internal link and external links section. Are there any links pointing to those weird URLs?

EDIT:
Check out this Google search. What is that site at the bottom? Is it a scraper script that might have created links to weird pages on your site that then caused Google to spider "not found" errors?

Thanks

Thanks very much for all your advice, I will try it out. I undid the blocking of duplicate content through robots.txt because my Google ranking went down afterwards, but I don't know if that was the cause; perhaps it was a reaction to some earlier changes I made. I am back to blocking duplicate content using your suggestions and I'll see what happens.

You are right that many of the titles and descriptions were copied and pasted from YouTube. The purpose of EduTube was simply to better organize educational videos from YouTube and similar sites, not to create unique content. Later I realized this would harm the search engine ranking, so I started writing my own text. I still have to go back and change the ones for which I kept the original title/description.

Here are a few examples of the 155,000 URLs blocked by robots.txt:

http://www.edutube.org/ar/additional-i...titles-non-english?sort=asc&ord... URL restricted by robots.txt Jul 1, 2008
http://www.edutube.org/ar/additional-i...-non-english?sort=asc&order=Vie... URL restricted by robots.txt Jul 1, 2008
http://www.edutube.org/ar/additional-i...download-available?sort=asc&ord... URL restricted by robots.txt Jul 2, 2008
http://www.edutube.org/ar/additional-i...nload-available?sort=asc&order=... URL restricted by robots.txt Jul 3, 2008
http://www.edutube.org/ar/additional-i...ownload-available?sort=asc&orde... URL restricted by robots.txt Jul 2, 2008
http://www.edutube.org/ar/additional-i...nload-available?sort=asc&order=... URL restricted by robots.txt Jul 2, 2008
http://www.edutube.org/ar/additional-i...download-available?sort=asc&ord... URL restricted by robots.txt Jul 2, 2008
http://www.edutube.org/ar/additional-i...ownload-available?sort=asc&orde... URL restricted by robots.txt Jul 2, 2008
http://www.edutube.org/ar/additional-i...load-available?sort=asc&order=S... URL restricted by robots.txt Jul 3, 2008
http://www.edutube.org/ar/additional-i...download-available?sort=asc&ord... URL restricted by robots.txt Jul 2, 2008
http://www.edutube.org/ar/additional-i...-available?sort=asc&order=Type%... URL restricted by robots.txt Jul 2, 2008
http://www.edutube.org/ar/additional-i...-available?sort=asc&order=Type%... URL restricted by robots.txt Jul 2, 2008
http://www.edutube.org/ar/additional-i...ownload-available?sort=asc&orde... URL restricted by robots.txt Jul 2, 2008
http://www.edutube.org/ar/additional-i...ad-available?sort=asc&order=Vie... URL restricted by robots.txt Jul 5, 2008

blocked URLs

Looks good -- those are URLs that should be blocked because they are sorted tables (duplicate content).

On a complex site with a lot of URLs, I think the first stage of SEO is making sure that spiders can get a "clean crawl". After that, start doing the on-page optimization and link building.

If you build links, I think you will be on page #1 soon.

I will let you know how it goes. As of now I get virtually no hits from Google; hopefully it will go back to the level it was at a month ago.

update

I still have 1054 "URLs not found" in Google Webmaster Tools. All of them were supposed to be restricted by robots.txt.
That's not too good, but still an improvement over the 3180 errors a week ago.

I updated robots.txt again, this time using some syntax directly from Google itself:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367

Hopefully it will go better this time...

By the way this seems like a good robots.txt syntax checker: http://tool.motoricerca.info/robots-checker.phtml

"URLS not found" finally cleared

Following changes to robots.txt it took 3 weeks for Google to register the changes. Oddly enough it updated its index bit by bit, removing a few dozen problems a day. If it had immediately matched robots.txt with its index it should have removed all problems in one go. I wonder if the Google bot really follows robots.txt - perhaps it crawls the site as usual and only uses robots.txt to guide what it does with the information it has gathered.

What also does not make sense is the following: two websites featuring higher than the EduTube website for the keyword "edutube" are actually forum posts about the EduTube website. A smarter bot would have been able to see that these posts are references to a site and should not be ranked higher than the site itself.

"trust"

It may be that Google "trusts" those sites more than edutube. You can build "trust" by getting more links to your site from other trusted websites.

"trust"

I agree it makes sense from that perspective, but a smarter bot (with a little bit of artificial intelligence) still would not have done such a thing. Also, why would Google in this case place more trust in comments which anyone can post? It should be able to distinguish between actual website content and user comments.

site ranking improved

Thanks everyone for the SEO tips. It looks as if robots.txt fixed the site ranking by resolving the problems shown in Google Webmaster Tools. Just a few days after they cleared, the site ranking improved significantly.
The following optimization tips were implemented:

1. Those described here: http://drupalzilla.com/robots-txt

Such as this line to block duplicate content caused by sortable tables (from the Views module):

Disallow: /*sort=

2. The following rules for multiple languages, because non-translated content = duplicate content:
(two lines for each language, as recommended above by guitarmiami)

Disallow: /el/
Disallow: /el$

3. And finally, the following very general rule from Google which could probably replace many of the other rules: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367

Allow: /*?$
Disallow: /*?

This blocks access to all URLs that include a question mark (?), but allows access to URLs ending with a question mark.
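
To show how such an Allow/Disallow pair might be resolved, here is a small Python sketch. It assumes the precedence Google documents for its crawler: the longest matching pattern wins, and Allow wins a tie. The function names and the sample paths (apart from the sorted URL from earlier in the thread) are invented for illustration, and other crawlers may resolve conflicts differently.

```python
import re


def to_regex(pattern):
    # Assumption: '*' is a wildcard and a trailing '$' anchors the URL end.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


# The two rules quoted above.
RULES = [("allow", "/*?$"), ("disallow", "/*?")]


def is_allowed(path, rules=RULES):
    """Apply the longest matching rule; Allow wins when lengths are equal."""
    matches = [
        (len(pattern), kind == "allow")
        for kind, pattern in rules
        if to_regex(pattern).match(path)
    ]
    if not matches:
        return True  # no rule applies, so the URL may be crawled
    # Tuples compare by length first, then True (allow) beats False.
    return max(matches)[1]


for path in [
    "/node/1",                                   # hypothetical clean URL
    "/videos?",                                  # hypothetical URL ending in '?'
    "/videos?sort=asc&order=Title",              # hypothetical sorted view
    "/ar/additional_info-0?sort=asc&order=Submitted",
]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions the first two paths are allowed and the last two are blocked, which matches the description of the rule pair above.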

It seemed at first that optimizing robots.txt didn't work - but that's because it took about a month for Google to completely update its index according to robots.txt - even though it was crawling the site almost daily.

Thanks

guitarmiami, thank you so much for posting the advice on how to remove the dynamic URLs which contain the sort function. That's been giving me bother with Google for some time now.
-Tom

Thanks

Great, we've been wondering how to get rid of those URLs.
