Hi,
After surfing through the forums looking for info about Drupal, SEO, and best practices, I followed the steps below, and I am curious whether I have covered all, or at least most, of the Drupal-specific bases for helping Google index my Drupal site. The site has yet to be indexed by Google (or any other search engine); I only did this this evening. The following is a how-to based on the steps I took and the info gleaned from posts to the forum:

1) Create an account and register a URL with Google Webmaster Tools (http://google.com/webmasters/tools/). You will be asked to verify your site by uploading a specific file to the document root. We will simply copy the file name provided by Google into a field of the XML Sitemap module you will be installing next.

2) Get the modules, install them, and activate them. You will also have to grant access to a manager-type user if you are not logged in as the superuser:
XML Sitemap (formerly Google Sitemap) http://drupal.org/project/gsitemap
Global Redirect http://drupal.org/project/globalredirect
Path Auto http://drupal.org/project/pathauto
Meta Tags http://drupal.org/project/nodewords

3) Configure XML Sitemap. Go to /admin/settings/gsitemap and expand the 'Other Settings' section. Paste in the name of the verification file you were instructed to create, and verify your site with Google Webmaster Tools. (There is some debate as to whether you should create multiple accounts with Google Webmaster Tools or list all of your URLs under one account. I chose to register all URLs in one place. I just tried to find that thread again, but couldn't...)

4) Now that Google Webmaster Tools has verified your site, set the preferred domain to be indexed to either "http://www.example.com" or "http://example.com", but not both. I chose "http://www.example.com".

5) Add or edit the rewrite rules and conditions in the .htaccess file located in the document root so that your site always comes up with 'www' (you would use the opposite rules if you didn't want 'www' in your URL). I am using Drupal's multi-site installation capabilities, so I added multiple rewrite rules and conditions:

    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

    RewriteCond %{HTTP_HOST} ^exampleTwo\.com$ [NC]
    RewriteRule ^(.*)$ http://www.exampleTwo.com/$1 [L,R=301]

    RewriteCond %{HTTP_HOST} ^exampleThree\.com$ [NC]
    RewriteRule ^(.*)$ http://www.exampleThree.com/$1 [L,R=301]

    RewriteCond %{HTTP_HOST} ^exampleFour\.com$ [NC]
    RewriteRule ^(.*)$ http://www.exampleFour.com/$1 [L,R=301]
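If you would rather not repeat that pair for every domain, the same redirect can be written once with a back-reference. This is a sketch, not from the original post; %1 reuses the host captured by the second condition, so verify it against your setup before relying on it:

    # Redirect any bare (non-www) host to its www. equivalent in one rule.
    # %1 refers to the group captured by the last matching RewriteCond.
    RewriteCond %{HTTP_HOST} !^www\. [NC]
    RewriteCond %{HTTP_HOST} ^(.+)$
    RewriteRule ^(.*)$ http://www.%1/$1 [L,R=301]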

6) Configure Pathauto. Go to /admin/settings/pathauto and expand 'General Settings'. Select 'Verbose' so you can see what takes place right away when you click the 'Save configuration' button. The rest of the settings are up to you, but I used the defaults for all but the node path settings, and chose to do 'bulk' updates. In the Node Path Settings section, I added node/[nid]/[title] so that Pathauto would produce the same URLs that Global Redirect creates.

7) Go to /admin/content/nodewords/frontpage and add your Meta Tags description and keywords.

8) Repeat these steps for each additional site (I have only done the one for now).

9) Wait. Up to a few months, apparently... But while you are waiting, you can register your site with other search engines and with directories such as DMOZ (http://dmoz.org/about.html), "the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors."

Comments

Yura Filimonov’s picture

Though using the modules you listed for Drupal should help, don't be overly obsessed with making the site search-engine friendly. You'd do better to focus on people by:

- using the right keywords (from freekeywords.wordtracker.com, for example)
- having the words in the titles, in subheadings
- using lists, shorter sentences and paragraphs.

Also, DMOZ isn't the first thing you need to do. Besides the fact that you need an already-established website to submit to DMOZ, you'd do better simply creating content for your visitors. That will make the promotion process easy.

Just putting up the website without anything interesting for your visitors will not get itself noticed.

I'd instead recommend using various modules to facilitate visitor-contributed content, such as blogs, socializer links (via the Service Links module), etc. Drupal gives you an excellent opportunity to create any kind of pages for your visitors to submit: classifieds, stories, product listings, job-wanted postings via CCK and associated modules. Community-building modules should help, too.

kpm’s picture

I agree with everything mentioned. This post/question is primarily about Drupal-specific SEO. I have read that if you use any sort of aliasing, Google can penalize you fairly heavily for duplicate posts, since the actual node URL and the alias URL will both work. This is primarily what I want to ensure I did correctly, so the Google bots won't find duplicates. It is my belief that the order of importance in any SEO exercise is:

1) Content, content, and content.
2) Cross site linking, which relies heavily on 1.
3) Web site meeting the W3C guidelines (coding and accessibility).
4) Keeping up to date and testing what search engines are looking for.

So this post is primarily about a portion of number 3. And I am quite behind on number 4!

Thanks for the response,
Cheers.

kpm’s picture

Somehow it got turned away at the door. The Google Webmaster Tools 'Diagnostics' tab shows one error: a '403 Forbidden' for the base URL. I have pasted the robots.txt content below... Can anyone explain what I have to do to allow the Google bot to index the site?

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
Yura Filimonov’s picture

I have had the same issue lately and didn't look for the cure, unfortunately. The bot will not touch the homepage, indeed.

Try removing Crawl-delay. Not sure if it helps, since only MSN honors it anyway.
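One way to keep the delay for MSN while taking it out of the record that every other bot reads is to give msnbot its own section. A sketch only; Crawl-delay is a non-standard directive, and since a crawler follows only the most specific matching record, the msnbot section needs its own copies of any Disallow lines you care about:

    # msnbot honors Crawl-delay; give it its own record
    User-agent: msnbot
    Crawl-delay: 10
    Disallow: /admin/

    # Everyone else, including Googlebot: no Crawl-delay line at all
    User-agent: *
    Disallow: /admin/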

Z2222’s picture

Use the Live HTTP Headers extension for Firefox and look at the headers that are being sent. Is your home page sending a 403 header?

kpm’s picture

I installed the Live HTTP Headers plugin, but, as far as I can tell, everything looks fine. I asked my shared host whether they are blocking Google's bots, or whether they can think of any reason Google Webmaster Tools is reporting a 403 Forbidden error. No word back yet.
Thanks

Z2222’s picture

Are you sure that Google didn't just hit a 403 error on one occasion and that it was a one-time accident?

If you want to see what Googlebot sees, you can use the User-agent Switcher extension and visit your site as Googlebot.

kpm’s picture

On the advice of http://blamcast.net/articles/drupal-seo, I removed Global Redirect and added another rewrite rule that removes trailing slashes (since 'www.site.com/pageOrDirectory/' is the same as 'www.site.com/pageOrDirectory' in the eyes of Drupal with Pathauto, those would be read as duplicate posts). I also changed Pathauto back to the default mapping (I had made it match Global Redirect) so I could 'Disallow: /node/'. I also added 'Disallow: /views/', since views are repeated content on the site, just reorganized.
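For reference, a trailing-slash-stripping rule of the kind that article describes looks roughly like this. This is an assumption based on the description above, not the article's exact rule; the directory check keeps Apache's normal handling of real directories intact:

    # Redirect /path/ to /path, but leave real directories alone
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.+)/$ /$1 [L,R=301]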
Still not indexed by Google though...

JohnForsythe’s picture

The fastest way to get indexed is to get linked by other sites. The more popular the site, the quicker Googlebot will find you.

--
John Forsythe
Need reliable Drupal hosting?

kpm’s picture

But as mentioned in a comment above, I am interested in finding out what non-content-related Drupal chores and settings are required, aside from cross-site linking. Just a learning exercise. I wanted to see how long it would take to get indexed on its own, but I was not sure whether I had configured Drupal optimally with the proper modules and settings. (Which is why I posted... I learned I didn't get everything, corrected it, and the site was indexed about a week after the robots.txt file allowed bots in.) I could now be evil, take part in the 'tragedy of the commons' that the WWW is becoming, and create a link farm pointing at a specific site to bump up its search engine ranking, but as mentioned, this was all strictly a learning process...

Cheers.

JohnForsythe’s picture

Link farms haven't really been effective since 2003, when Google rolled out the "Florida Update". Evil SEO has moved on to more complex territory, like XSS...

--
John Forsythe

kpm’s picture

2003, that sounds just about right... As in, the last time I investigated any SEO info!

Thanks again for all the info.

nicholasThompson’s picture

That site doesn't actually advise you not to install it...

Personally, I prefer setting up .htaccess and robots.txt, rather than having a module parse the URL on every page load. It requires less overhead and there's one less module to have to keep up to date. Furthermore, you avoid any potential conflicts with other modules.

That's just that guy's preference. Yes, his method does strip the slashes more efficiently than Global Redirect does, but that's all his method does. Global Redirect does two other tasks which Apache cannot do:

  1. It checks whether the current URL has an alias; if it does, it does a 301 (permanent) redirect to it. This helps if someone creates a link to mysite.com/node/123 rather than the node's alias (or if you later create an alias), as the module will send visitors to the correct URL. It can also help if you want to give out a shorter link (node/123 is much shorter and easier to remember than articles/october/how-to-write-concise-examples-about-web-2.0-stuff).
  2. If you put a view or node on the front page (e.g. you want your front page to be powered by "views/book/october"), Global Redirect makes sure that anyone who tries to access your front-page path directly is redirected to the front page itself (again, avoiding duplicate content).

One final point... In terms of slash removal, Global Redirect doesn't actually REMOVE slashes blindly. If you access node/123/, there is no direct match for an alias, which is likely mapped to node/123 (i.e. no slash), so a normal redirect won't work. Global Redirect does an "optional" check: if the last character of the URL is a slash, it removes that slash and checks for an alias (and if one exists, redirects).

What Blamcast suggests is the "blind" removal of ALL trailing slashes, regardless of intent. Global Redirect only removes a slash when it is causing a mismatch in an alias search, so it is at least partially selective about what it strips a slash from. Blamcast's rule isn't.

Blamcast's is a more efficient way of removing slashes and, when used in conjunction with Global Redirect, could save on server overhead. But on its own it doesn't do as much.

Another point to raise: if you DO have a slash on the end of a URL, e.g. node/123/, and you install Global Redirect AND Blamcast's Apache rules, then you could end up with two redirects for one URL. First, Apache notices the slash and redirects to node/123; then Global Redirect is finally allowed to redirect to the alias. If you only used Global Redirect, you'd have one redirect only (i.e. one server hit).

At the end of the day it's your choice. Personally, I chose Global Redirect.

marknunney’s picture

Some good stuff here. I really owe the Drupal community a great big How To SEO with Drupal and it's coming, I promise. I'll put a few things down here:

You missed the Page title module, developed for SEO, originally by Rob Douglass to my spec. (Can we rename it 'Title tag module' to avoid some confusion?) The title tag is the most important place on the page for SEO. Control your title tags.

http://drupal.org/project/page_title

The keywords tag is not important for SEO, but don't stuff it.

The description tag is not important for SEO either, but it is important for getting clickthroughs from search engine results pages (SERPs): include your page's main target keyword and add some words that will make readers click.

The Google sitemap is not important. Most SEO pros don't use one, because if their pages aren't getting indexed without a sitemap, they want to know why and fix it. One day Google might mark down sites without one, but not yet.

Global Redirect rules, no question. I think you need to have other big problems before the extra load becomes an issue. Nicholas outlines its benefits well, but I'd add that the most important thing it does is make sure all inbound link power goes where you want it. I'll explain:

You need link power to do well in search engines for competitive terms. A site gets it from inbound links, and it is distributed around your site via internal links. If you ever show Google more than one URL for the same page, then Google might (it tries not to) treat them as different pages, and therefore sees a dupe. Only one of those pages will be displayed clearly in the SERPs; the other gets buried. If any of your links go to the buried duplicate URL, that link power is lost...

Global Redirect makes sure that never happens. Even if, as Nick points out, you change the page you present as your home page.

I'll move on to the Big Problem with Drupal for SEO: term pages made by Taxonomy (category pages, for Category module users). For search-engine success, these pages are crucial; here's why:

By definition they can target the most popular search terms and...

...they can be 'closest' to your home page as measured in clicks. This is important because of that link-power thing. Most inbound links come to the home page, and there is only so much power to go around; it is shared among the pages linked to. Those linked to directly (hello, term pages) get the most, those two clicks away get a fraction, and those three clicks away are really struggling.

So the most important pages for SEO are your term (category) pages, even if only because they pass on their own link power and reputation for various keyphrases (the two get combined). And yet, in Drupal, we have little control over them; we want to control the title tag and meta tags, and add copy to the page, i.e. treat them like nodes.

I only know two solutions to this: the Category module (which I use with crossed fingers), and a module (a work in progress) that Nicholas Thompson (again) has written which stays with Taxonomy.

Hope this helps and doesn't just tease. Lots and lots of details and more will be given to all Drupalers soon.

LeonidShamis’s picture

We are trying to get our new site verified with Google Webmaster Tools and Yahoo Site Explorer, both of which require either uploading an HTML file to the site's root directory or adding a META tag with a special value.

Both ways seem to have some obstacles:

1) Uploading HTML file:
- In a multi-site configuration, which folder should the file be placed in: ~/public_html, ~/public_html/sites, or ~/public_html/sites/ and ~/public_html/sites/?
- We get a "Page not found" error when trying to access the file uploaded via FTP to ~/public_html, which is the document root for the site's web server.
- Should the file be uploaded via FTP, or should the Upload module be used? Maybe this makes a difference?
2) Adding the META tag:
- The Meta tags (http://drupal.org/project/nodewords) module allows adding pre-defined types of meta tags, but does not allow adding an arbitrary tag, for example the one Google requires for verification.
3) The method mentioned in the original post, adding the uploaded verification file name to the Google Sitemap (XML Sitemap) module, doesn't help: even though the file name appears in the generated sitemap XML, accessing it still returns a "Page not found" error.

Could the above be related somehow to the rules defined in .htaccess? We use the file from the default Drupal 5.2 installation with only the "no-www" change.

Quint’s picture

Just upload the file into the main domain's root. That's where Google is looking for it. If your main domain is

maindomain.com and your multi-site domains are
otherdomain.com
betadomain.com

just uploading one file into the root will cover all of them.

Did you EXCLUDE the period at the end of the file name? Google's instructions show the name as "... google12b65f6a90g6d71xf.html." in a sentence; don't include the dot at the end when creating the file. (The file can be empty.)

You can upload with cPanel, FTP, even Internet Explorer (if you know how). If you use the Upload module you might have trouble putting it in the root, but that should work too.
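If you have shell access, creating the empty verification file is a one-liner. The filename below is the example quoted above; yours will differ:

```shell
# Run from the main domain's document root (e.g. ~/public_html).
# Note: no trailing period on the filename; an empty file is fine.
touch google12b65f6a90g6d71xf.html
```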

ipwa’s picture

I tried uploading the HTML file to the sites folder and to the root, and I still can't get Google to verify my site. I also added the meta tag to my theme template, and it still can't verify. Does anyone know why this might be happening?

Nicolas
-------------------------
http://nic.ipwa.net


Z2222’s picture

What's the site? http://nic.ipwa.net/ doesn't look like Drupal...

open-keywords’s picture

The default rewrite rules will never work to give access to a file at the root of a site.

I suggest using the following in your Drupal folder's .htaccess:

    # Force mapping of files that exist in the site's folder tree
    # but not in the Drupal root folder tree
    RewriteCond %{DOCUMENT_ROOT}/sites/%{HTTP_HOST}%{REQUEST_URI} -f
    RewriteRule ^(.*)$ sites/%{HTTP_HOST}%{REQUEST_URI} [L]

    # Drupal standard: rewrite current-style URLs of the form 'index.php?q=x'
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

open-keywords’s picture

The XML Sitemap module also lets you fill in the verification file name: go to /admin/settings/xmlsitemap/engines and expand the Google options.

http://drupal.org/project/xmlsitemap

SocialNicheGuru’s picture

Subscribing.

http://SocialNicheGuru.com
Delivering inSITE(TM), we empower you to deliver the right product and the right message to the right NICHE at the right time across all product, marketing, and sales channels.

danfinney’s picture

Having a bit of a problem with the XML in the story RSS. I posted a new thread so as not to hijack this topic.

http://drupal.org/node/446280

Mark.lynn123456’s picture

Hi kpm,

Thanks for this nice post. I would like to suggest a free SEO tool for your website: Colibri Tool. For your Drupal website, Colibri Tool can provide a lot of traffic for very little effort. Result-oriented inbound marketing and SEO always need a good, user-friendly SEO tool, and Colibri Tool is one of them.

Thanks,
Mark