Does Drupal get indexed by Google?!!
philipk - May 21, 2006 - 18:44
I have no idea why my site isn't being indexed :(
The site is here:
http://www.playstationteam.com
http://www.google.com/search?q=site%3Aplaystationteam.com
I've used clean URLs as well and got a few inbound links.
Although when I log in to the admin area I see lots of 'robots.txt not found.' by Anon... could this have something to do with it?

It could take some time to
It could take some time to get indexed by google. Installing the google sitemap module may help get more of your pages indexed.
Robots.txt tells a search engine not to spider your site. Since you have page not found errors for robots.txt, that is a good thing.
I read somewhere that Google
I read somewhere that Google also has some guidelines for webmasters... those may be useful to follow. :)
Anisa.
-----------------------------------------------------------
Kindness builds stronger bonds than necessity.
www.animecards.org - 16,000 card scans and counting!
-----------------------------------------------------------
"Robots.txt tells a search
"Robots.txt tells a search engine not to spider your site."
That's a bit of a simplistic explanation. You can put an empty robots.txt file in place to remove the entries in the watchdog log, and your site will be indexed just as well.
especially for new sites
My devbee site was brand new in march. It took 4 months for the site to be included in search results.
Now, thanks to Drupal (and probably gsitemap), I have spectacular search placement.
Drupal sites are *extremely* Google-SEO friendly.
--
Drupal tips, tricks and services
http://devbee.com/ - Effective Drupal
obviously yes
drupal.org is just a Drupal install just like any other. Googlebot eats quite a lot of resources around :)
and also, do you think Drupal would be popular if it somehow blocked Googlebot??
--
The news is Now Public | Drupal development: making the world better, one patch at a time.
Drupal works wonders when it
Drupal works wonders when it comes to getting into Google's index. Just check out drupal.org's indexing:
http://www.google.com/search?q=site:drupal.org
Over 2,300,000 results!
Google and Drupal
The Gsitemap module is good. I'm not sure if it is necessary, but I make a URL alias from 'gsitemap' to 'sitemap.xml' since I think that is where Google normally expects the sitemap to be. Then go here and make an account so you can see if Google has any spidering errors on your site: http://www.google.com/webmasters/sitemaps/
Google often takes 6 to 8 months to include your site in the results. It looks like they have some of your pages in their index already, but aren't showing a lot of them. It looks like they are indexing both the www version and non-www version of your domain name. You could add this to your .htaccess file (somewhere under RewriteEngine On) and only use the www version of your domain name. In your post above, you used the www version for your link to your site, but the non-www version when you linked to the Google cache. They are different.
This will tell the search engines that only the www version of your domain exists -- replace the your_site with your domain name:
# Changes to www form of domain
RewriteCond %{HTTP_HOST} ^your_site.com
RewriteRule (.*) http://www.your_site.com/$1 [R=301,L]
A robots.txt file like this is handy if you want to keep the search engines out of certain places on your site, for example, you don't want them indexing both the regular version of your page and the print-friendly version also. I found this somewhere on Drupal.org:
User-agent: *
Disallow: /aggregator
Disallow: /tracker
Disallow: /comment/reply
Disallow: /node/add
Disallow: /user
Disallow: /files
Disallow: /search
Disallow: /book/print
Disallow: /admin
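If you want to check what a file like this actually blocks before you upload it, here is a minimal sketch using Python's standard urllib.robotparser module. The host www.example.com and the sample paths are only placeholders, and Python's parser does simple prefix matching, so real crawlers may interpret some rules a little differently.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the post above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /aggregator
Disallow: /tracker
Disallow: /comment/reply
Disallow: /node/add
Disallow: /user
Disallow: /files
Disallow: /search
Disallow: /book/print
Disallow: /admin
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths on a placeholder host; swap in your own.
for path in ("/node/1", "/user/login", "/book/print/5", "/node/add/story"):
    allowed = parser.can_fetch("*", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Regular content such as /node/1 stays crawlable, while anything starting with one of the Disallow prefixes is reported as blocked.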
I'll just wait and see I
I'll just wait and see I guess...
One of the reasons for using Drupal was its SEO features. I just hope it's all spidered in time for the PS3 launch.
Evidence of Drupal's Search Engine Friendliness
We had a similar discussion in the past and have some interesting information detailing various experiences that prove the point that a well configured Drupal set up is well indexed by Google and other major search engines.
http://drupal.org/node/20033
http://www.cmsproducer.com/search-engine-optimization-seo-google-msn
http://drupal.org/node/36726
-----
iDonny - Web Content Management System Design, Development. & CRM
mine is well configured :P
mine is well configured :P
SEO and Drupal
Drupal is very search engine friendly. If you put those lines in your .htaccess file it should fix the Google spidering errors.
Good pointers
This is useful information for preventing the indexing of unnecessary stuff, like links to RSS feeds treated as regular pages, or the comment links. Also, using a consistent domain format (with or without www) makes sure that the Google index does not think that you are posting duplicate content. (You can also use the Drupal multi-site feature to point all your domain paths to one of them.)
Thank you for this post
Thank you for this post, Guitarmiami. A few dozen of my pages had been indexed in Google two months ago. Then, after I read this thread, I checked again and all had been removed! I believe this happened because both the www and non-www versions of the site were accessible, and Google determined that the sites had duplicate content. Drupal, by default, lets one access both versions of a website. Therefore, MAKE SURE you put this rewrite code into your .htaccess file or you may encounter the same problem.
# Changes to www form of domain
RewriteCond %{HTTP_HOST} ^your_site.com
RewriteRule (.*) http://www.your_site.com/$1 [R=301,L]
Different rewrite code for me
The code for redirection found above does not work for me. I had to insert the following instead:
RewriteCond %{HTTP_HOST} !^www\.mysite\.com$ [NC]
RewriteRule .* http://www.mysite.com%{REQUEST_URI} [L,R=301]
Could anyone please tell me if this code is good for SEO purposes anyway?
thx
redirects
I learned a better way to do it than in my earlier post.
If you want it to redirect to the "www" version, use this:
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
If you want the "no-www" version, try this:
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]
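To see which hosts a negated condition like the ones above would send to the redirect, here is a rough Python sketch. It only emulates the host check (the "!" negation, the ^...$ anchors and the [NC] case-insensitive flag); it is not how Apache itself evaluates rules, and the function name needs_redirect is made up for illustration.

```python
import re


def needs_redirect(host, canonical_host):
    """Emulate `RewriteCond %{HTTP_HOST} !^<canonical_host>$ [NC]`.

    Returns True when the request host is anything other than the
    canonical host, compared case-insensitively.
    """
    pattern = "^" + re.escape(canonical_host) + "$"
    return re.search(pattern, host, re.IGNORECASE) is None


# Canonical host is the "www" version.
for host in ("example.com", "www.example.com", "WWW.Example.com"):
    print(host, needs_redirect(host, "www.example.com"))
```

Because the condition is negated, a request that already uses the canonical host does not match, so it is not redirected again.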
my robots.txt
User-agent: *
Crawl-Delay: 10
Disallow: /aggregator/
Disallow: /tracker/
Disallow: /comment/reply/
Disallow: /node/add/
Disallow: /taxonomy/
Disallow: /user/
Disallow: /files/
Disallow: /search/
Disallow: /book/print/
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /admin/
User-agent * and Googlebot
In addition to the above robots.txt specification, I keep another set of specifications just for Googlebot:
User-agent: Googlebot
In my case, it's a duplicate of the generic specifications, but I maintain the duplicate for Googlebot just to make sure that it sees my specifications every time.
You can use the engine-specific declarations to cater for engine particularities (some engines may or may not support given features such as URLs with a query string, or the presence of a session ID in the URL, if you cannot use cookies to maintain state).
robots.txt
Drupal should already block access to the contents of directories like /database, /modules, /includes, etc. You will already get a 404 error if you try to access a URL like example.com/includes/common.inc.
If you look at the default .htaccess file (4.7.1) it includes this:
# Protect files and directories from prying eyes.
<FilesMatch "(\.(engine|inc|install|module|sh|.*sql|theme|tpl(\.php)?|xtmpl)|code-style\.pl|Entries.*|Repository|Root)$">
Order deny,allow
Deny from all
</FilesMatch>
Robots.txt advertises your directory structure to the world, so better not to put more in there than necessary. In this case it doesn't really matter because anyone can download Drupal and find out your directory structure, but it's something to keep in mind about robots.txt in general.
How many nodes does google know about? oh about 25million
http://www.google.com.au/search?hs=V3j&hl=en&safe=active&client=firefox&...
Results 1 - 10 of about 25,000,000 for inurl:node/1..999999999 with Safesearch on. (0.34 seconds)
google knows about 25 million drupal nodes? cute :D
and considering...
and considering how many people use path_auto.module, and thus don't have links to any "node/*" pages, the count is probably much much higher.
It might be fitting to use a comment from someone once I explained and showed drupal to them -
"drupal should run the internet!!"
--Ryan
The code to only use your
The code to only use your site URL with or without the 'www' (never both ways) is already in the .htaccess file. Starting around line 54 there is some info on it and the option to uncomment code to either use only with 'www' or only without 'www', so it's literally a 30-second fix.
.htaccess
That default .htaccess code in Drupal 4.7 is not good to use. I reported it as a bug, but never heard anything more about it. I wish someone would fix it.
If you use that default .htaccess comment it will rewrite something like example.com/page1 to www.example.com (home page) -- when it should rewrite to www.example.com/page1.
This is the correct code to use:
# This is the better way to do it:
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
This default version from 4.7.3 is not good. I highly recommend not using it because it will not redirect correctly:
# This is NOT a good way to do it:
# RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# RewriteRule .* http://www.example.com/ [L,R=301]
Thankyou so much
I had shelved this until I had time to nut out the correct syntax. Once again the Drupal community contributes and saves me time. Thanks again.
Not entirely sure about your RewriteCond regex though?? Needs a "\" before the "." eh? And what does the "$1" in the RewriteRule refer back to?
$1
The $1 refers back to the first set of parentheses in a regular expression.
In the default Drupal .htaccess you have this problem:
http://example.com/page1 redirects to http://www.example.com/ -- this is a bad redirect because it doesn't take the visitor (or search engine) to the correct page.
The $1 adds the correct page:
http://example.com/page1 redirects to http://www.example.com/page1
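As a rough illustration of what the $1 does, here is a small Python sketch. It is only an emulation: it assumes the rule runs from .htaccess, where the pattern is matched against the path without its leading slash, and the rewrite helper is made up for this example.

```python
import re


def rewrite(path, substitution):
    """Apply a `RewriteRule (.*) <substitution>` style rule to a path.

    The parentheses capture the requested path, and every "$1" in the
    substitution is replaced with that captured text.
    """
    match = re.match(r"(.*)", path)
    return substitution.replace("$1", match.group(1))


# With $1 the visitor lands on the page that was asked for.
print(rewrite("page1", "http://www.example.com/$1"))

# Without $1 every request ends up on the home page.
print(rewrite("page1", "http://www.example.com/"))
```

The second call shows the problem with the default rule: the captured path is simply thrown away.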
Thanks. You are totally
Thanks. You are totally right. Hadn't noticed that. So to use the non-www version I would place this in my .htaccess:
# This is the better way to do it:
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://example.com/$1 [R=301,L]
Correct?
Shouldn't it be: # This is
Shouldn't it be:
# This is the better way to do it:
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule (.*) http://example.com/$1 [R=301,L]
??
Backslash
Yeah, that is probably better.
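For anyone wondering what the backslash changes: in a regular expression an unescaped "." matches any single character, not just a literal dot. A quick Python sketch (the hostname example-com is a made-up test value):

```python
import re

# Same pattern as in the RewriteCond, with and without the escaped dot.
unescaped = re.compile(r"^example.com")
escaped = re.compile(r"^example\.com")


def matches(pattern, host):
    """Return True when the pattern matches at the start of the host."""
    return pattern.search(host) is not None


for host in ("example.com", "example-com"):
    print(host, matches(unescaped, host), matches(escaped, host))
```

In practice the unescaped version rarely causes trouble, because the odd hostnames it would also match are unlikely to reach your server, but the escaped form says exactly what is meant.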
Errors
These are giving me redirect errors. Here is how I have it now, redirecting to the non-www:
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule ^(.*) http://example.com/$1 [L,R=301]
Is this correct? I have been having trouble getting indexed properly, so I'd like to make sure.
Thanks.
error
There is an error. You are telling the server to redirect http://example.com to http://example.com (same url).
Use this and you should be fine:
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
Then test it. You should get the following redirect behavior:
http://example.com redirects to http://www.example.com
http://example.com/page1 redirects to http://www.example.com/page1
What differences would have
What differences would have to be made so that instead of redirecting to www.example.com it redirects to example.com sans www?
Thanks.
Drupal SEO
If you want it without the www, then use this:
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
More info here:
http://tips.webdesign10.com/drupal-seo-404-ok-and-htaccess
guide to setup drupal to be search engine friendly
here's the guide I wrote for making your drupal site friendly for search engine crawlers:
http://www.smorgasbord.net/how_to_optimize_drupal_web_site_for_google_ya...
it certainly made my page indexing and page ranking higher.
I can't help with all of
I can't help with all of that code stuff, but getting indexed might be helped if you're backlinked from a site that is regularly indexed by Google.
Currently I have a site that the spiders LOVE. When I launch a new site, I add a link at the old site, and the new site ends up in Google by the end of the week.
Can't get Google to index my site
I submitted my site to Google several months ago. I got the home page indexed, but that is as far as I got. I have the gmap module and clean URLs, and I have read all the suggestions on this page. I cannot get the rest of the pages indexed. I recently set up another site and in just a few weeks the whole thing was indexed, but I can't figure out what I did differently. As a matter of fact, the site that gets indexed is almost an out-of-the-box Drupal installation; it doesn't even have the gsite module.
I must say that it could be because the site that gets indexed has a link on a heavily visited site, but can it make that much of a difference?
My site is
http://www.100x100electronica.com.ar
the other site that indexes is
http://comunidad.demotores.com.ar
Thank you in advance for any help.
http://www.100x100electronica.com.ar
yes, links make that much of a difference!
Yes, the links make a major difference! However, read the 10 Step Search Optimization Guide I just posted in another issue - because it exactly addresses your problem.
Also - two other guides / pages I've written that can help you are:
http://www.smorgasbord.net/how_to_optimize_drupal_web_site_for_google_ya...
and
http://www.search-optimization-school.com
Links
Links pointing to your site from other related sites are very important.
Also, did you install the Nodewords module? Install that and then make sure that the meta description is different on every page. It looks like your page is outputting "n/d" for the meta description on many (all?) pages.
I will check meta descriptions
I have the nodewords module installed. I will check the settings to see what is going on.
Thanks guitarmiami!!
http://www.100x100electronica.com.ar
I can't get meta description to work
I have nodewords installed and can't get it to insert the description on every node. I have enabled "Use the teaser of the page if the meta description is not set." but I get "n/d" in the meta description. If I don't enable it, I don't get a description at all. Where is the teaser of the page? Is it generated automatically, or do I have to write it? Could it be that the category module I have installed is not compatible with nodewords?
Thanks
http://www.100x100electronica.com.ar
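One way to spot pages that share the same meta description (such as "n/d") is to pull the description out of each page's HTML and group pages by it. Below is a minimal sketch using only Python's standard library. It assumes you have already saved or fetched the HTML yourself; the paths and HTML snippets are invented sample data, and the function names are made up for this example.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remember the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""


def meta_description(html):
    """Return the page's meta description, or None if it has none."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description


def find_repeated_descriptions(pages):
    """Map each description used on more than one page to those pages."""
    seen = {}
    for path, html in pages.items():
        seen.setdefault(meta_description(html), []).append(path)
    return {desc: sorted(paths) for desc, paths in seen.items() if len(paths) > 1}


# Invented sample data standing in for real pages.
pages = {
    "/node/1": '<head><meta name="description" content="n/d" /></head>',
    "/node/2": '<head><meta name="description" content="n/d" /></head>',
    "/node/3": '<head><meta name="description" content="Digital cameras" /></head>',
}
print(find_repeated_descriptions(pages))
```

Any description that shows up for more than one path is a candidate for a closer look in the Nodewords settings.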
Spidering your site
You have in excess of 3000 links which, though not a problem specifically, might have pushed the site into an aggressive (Google) filter if the site just popped-up from nowhere... best to grow a site organically rather than just appear in an "un-natural" big bang.
In addition, there are 130 broken links.
What would really worry me, is the following link:
http:/ /www. 100x100electronica.com.ar/node (Note how I didn't make that a link!)
I expected to see this, and I wasn't "disappointed" - This is a problem for many Drupal sites... if my spider can pick it up, then Google will too.
It is a doorway into supplemental hell, ie... Duplicate content!
I don't know which of your pages linked to it but if I were you, I'd plug it as soon as possible, either by hacking the module responsible for producing it, and/or, use absolute links... oh, and the robots.txt file.
(ADDED: Ah yes, it's the breadcrumb - Click on a category page and mouse-over your "Principal" breadcrumb; there you will see:
http ://www. 100x100electronica.com.ar/node
I notice also that the "categorias" has been duplicated: http :// www. 100x100electronica.com.ar/ categorias-0 - Note the "-0" which is a classic indication.
So, if Google finds a link to the intended link, http ://www. 100x100electronica.com.ar/categorias, you'll see that on your site, there is no content at that page.
I also noticed that "Camaras digitales" has the same "-0" too.
It can get ugly when you have all this to contend with, best of luck)
Mike
------------------------------------------------------------------------------------------
A simple thanks to those that help, a price worth paying for future wealth.
Can u clarify a bit please?
Hi Mike
I don't understand much of what you say :-)
Are you saying there is something wrong with having ".../node" links? What do you mean by duplicate content?
Cheers
Are you saying there is
Hi Twohills
OK, in a nutshell, Google uses their PageRank system to determine a site's authority. As part of that, they clearly have to make a decision as to what content should rightfully be credited with that authority. One of the biggest factors that play a part in this calculation is, links... mainly inbound, but also internal too.
Google also has a problem: spammers... not just spammers but also "innocent" people who might unwittingly find your content sufficiently interesting to copy and publish in a forum/website, or wherever. Clearly, Google has to then try and make sure that your content is still correctly credited, and not the copy.
Again, part of how they do that is, links (and age).
Keep in mind that all this is calculated by computers... and computers are dumb.
The link I highlighted that has "/node" appended points to your home page... but everyone can also get to your home page via: www. 100x100electronica.com.ar - Clearly, with two different paths to the same content, we have a duplication issue where Google (or any of today's influential search engines (SEs)) has to decide which is the credible authority and which gets "kicked-to-the-side" (it doesn't want it indexed twice after all).
Having a duplication issue with a home page is the worst case scenario, as it sets the path into your site. This problem is compounded even further if your site has relative internal links and other duplicate paths, such as, in your case, the "/categorias-0" link I mentioned.
I said "worst case scenario", didn't I!... well, not quite. Drupal also allows an even worse path trail which makes the problem exponentially more "dangerous" - I've raised it elsewhere, here is not the place.
All the above is why I daren't include the "/node" link to your site from here for fear that Google would crawl it.
I hope that is a little clearer.
Mike
Found the solution to /node
Thanks a lot. I have been seeing for quite some time that Google was spidering my home/node and had no idea where it was picking it up. Damn breadcrumbs. It seems to be a problem with Drupal 4.7. I applied the patch and now it is working fine. Hope this helps:
You can find the solution here: http://drupal.org/node/78129
Thanks all for your help :-)
http://www.100x100electronica.com.ar