Google seems not to be indexing my site, http://communitycommitment.net/index.php . I am a newbie to this and I was wondering if anyone can diagnose my problem? The site has been up for about a month. I've submitted the URL to Google several times.

When I try to set up Google site search ( http://services.google.com/cobrand/free_select ), Google gives me the following message: "There are no pages in our index for "communitycommitment.net". We cannot perform SiteSearch on this domain(s)."

I have Clean URLs enabled.

My webhost is cclhosting.com. Mod_rewrite is running according to the web host.

My .htaccess says the following:

RewriteEngine on

# Modify the RewriteBase if you are using Drupal in a subdirectory and the
# rewrite rules are not working properly:
RewriteBase /
......
# Rewrite URLs of the form 'index.php?q=x':
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

When I go to a story, the URL in the browser window looks like this: "http://communitycommitment.net/story/118".

I have URL aliasing with the following code:

function conf_url_rewrite($path, $mode = 'incoming') {
  if ($mode == 'incoming') { // URL coming from a client
    return preg_replace('!^story/(\d+)$!', 'node/view/\1', $path);
  }
  else { // URL going out to a client
    $aliased = preg_replace('!^node/view/(\d+)$!', 'story/\1', $path);
    if ($aliased != $path) { return $aliased; }
  }
}
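
To make the two steps above easier to follow, here is a rough Python sketch of what happens to a request for /story/118. It is only an illustration, not Drupal or Apache code: `rewrite_to_index` is a hypothetical stand-in for the RewriteRule (it ignores the !-f / !-d file checks and the QSA flag), and the Python `conf_url_rewrite` simply mirrors the PHP function shown above.

```python
import re
from typing import Optional


def rewrite_to_index(request_path: str) -> str:
    """Hypothetical stand-in for the RewriteRule in .htaccess.

    Simplification: the real rule only fires when the request does not
    match an existing file or directory, and QSA also appends any
    original query string. Both are ignored here.
    """
    return "index.php?q=" + request_path.lstrip("/")


def conf_url_rewrite(path: str, mode: str = "incoming") -> Optional[str]:
    """Mirror of the PHP alias function from the post."""
    if mode == "incoming":
        # URL coming from a client: story/118 -> node/view/118
        return re.sub(r"^story/(\d+)$", r"node/view/\1", path)
    # URL going out to a client: node/view/118 -> story/118
    aliased = re.sub(r"^node/view/(\d+)$", r"story/\1", path)
    # Like the PHP version, return nothing when no alias applies.
    return aliased if aliased != path else None


print(rewrite_to_index("/story/118"))
print(conf_url_rewrite("story/118"))
print(conf_url_rewrite("node/view/118", "outgoing"))
```

If the sketch behaves the same way as the real setup, a crawler following a link to /story/118 ends up at the same node as a visitor does, so the aliasing by itself should not hide pages from Google.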

The path module is enabled.

Drupal is installed in public_html on my hosting account.

I also have a robots.txt file in the public_html directory.

I would appreciate any help in correcting this problem. (I've tried to read all of the information in the forums and docs, but since I'm not a programmer some of it just goes over my head.)

Comments

Ken Daniszewski’s picture

I also eliminated another possibility. Apparently a similar problem can occur if the web host does not allow Apache settings to be modified from .htaccess (see the quote below). However, I checked with my web host and they tell me that they do permit this, so this does not appear to be my problem.

THE FOLLOWING IS THE POST REFERRED TO ABOVE:
"I know your problem well, as I was a customer of "1und1", the German counterpart of 1and1.
They don't let you modify Apache settings via .htaccess. Their support person told me that they won't make exceptions for single customers.

So the only solution I found was to change the provider, and so I did.
Now clean URLs work like a charm" (END OF QUOTE)

robertdouglass’s picture

I can't offer you any concrete advice, but I can tell you that it took several months before I could get any results for my site from Google. Yahoo was much quicker, and I thought for a while that maybe they were doing an overall better job at crawling, spidering and indexing than Google. Now, however, I get tons of hits from Google (without taking any explicit steps to change anything), and about the same as before from Yahoo. I actually think that having 'visit me at RobsHouse.net' at the bottom of my mails and in my signature on forums like this has helped. I even have top ranking for a Google search (without trying!) - "Danjulo Ishizaka" - which is really funny because there are so many other sites and articles about him (Danjulo) which are more relevant.

So a question: do you have access to your site statistics, and do they show the Googlebot crawling it? If so, you're all set, really. If not, I have no suggestions. Google has got to be smart enough not to need the www in your URLs.

Good luck,

- Robert Douglass

-----
visit me at www.robshouse.net

Ken Daniszewski’s picture

There doesn't seem to be any trace of Googlebot in the site statistics.

If I go to Google and enter the domain name "communityCommitment.net" in the search box, Google returns only one hit, which is a link to www.communitycommitment.net/ (with no other text or site description at all). Those searches of mine do show up in the site statistics, but as far as I can see the stats give no indication that Googlebot has visited the site.

Also, as mentioned above, when I try to set up Google site search it won't let me, and gives me a message which says "There are no pages in our index for "communitycommitment.net" We cannot perform SiteSearch on this domain(s)."

As you say, though, it might be just a question of time. The site's been up for about a month, but recently I have been trying various tweaks to try to get Google to find it, like adding robots.txt. Maybe in time it will be indexed by Google if I just wait.

Thanks very much for your help!
Ken

SupaDucta’s picture

I have sent Alexa to crawl your site. Results won't appear for at least a few weeks, but within the next couple of days you should see the ia_archiver crawler in your logs; that's Alexa. If it works, then it's really just a matter of time for Google.

If there was a configuration in your .htaccess not allowing Google to index your site, you would see one of Google's IPs in your error log with the message that access was denied due to server config. If you have no such messages, probably Google hasn't crawled your site yet.

Additionally, it may be irrelevant to Google, but try adding the following meta tag to HEAD:

<meta name="Robots" content="All">

or

<meta name="Robots" content="Index, Follow">.
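
If you want to check which robots meta directives a page actually contains, something like the following Python sketch could help. It is only an illustration: the names `robots_directives` and `blocks_indexing` are made up for this example, and it only looks at HTML you pass in as a string (it does not fetch anything).

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The name is matched case-insensitively ("Robots" == "robots").
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> list:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


def blocks_indexing(html: str) -> bool:
    # "none" is shorthand for "noindex, nofollow".
    found = robots_directives(html)
    return "noindex" in found or "none" in found


page = '<html><head><meta name="Robots" content="Index, Follow"></head></html>'
print(robots_directives(page))
print(blocks_indexing(page))
```

A page with no robots meta tag at all yields an empty list here, and crawlers normally treat that the same as "index, follow".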

Ken Daniszewski’s picture

Great, thanks very much!!! I'll keep checking the logs.

dshah’s picture

How do you do that?

I have also created a new site with Drupal and Google doesn't seem to be interested in it.

My site is www.algogeeks.com

Thanks in advance

Ken Daniszewski’s picture

As of today Googlebot seems to be crawling my site. Yesterday it wasn't. I can't tell if Googlebot was on the site before yesterday, because I didn't have log archiving enabled through cPanel (and I wasn't aware that I should be checking this). But since Googlebot is crawling the site now, hopefully it is only a matter of time before I can set up Google Site Search on my site. Thanks everyone for your help! Drupal reigns!

(the following are the log entries I found)
64.68.82.18 - - [30/Jun/2004:18:32:58 -0700] "GET /robots.txt HTTP/1.0" 200 915 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.18 - - [30/Jun/2004:18:32:59 -0700] "GET /index.php HTTP/1.0" 200 33869 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
216.218.233.37 - - [30/Jun/2004:19:00:01 -0700] "GET /cron.php HTTP/1.0" 200 0 "-" "Lynx/2.8.5dev.7 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.7"
64.68.82.10 - - [30/Jun/2004:19:34:32 -0700] "GET /robots.txt HTTP/1.0" 200 915 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.10 - - [30/Jun/2004:19:34:33 -0700] "GET / HTTP/1.0" 200 33869 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
207.46.98.104 - - [30/Jun/2004:19:59:18 -0700] "GET /robots.txt HTTP/1.0" 200 915 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
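
For anyone who would rather not read the raw log by eye, here is a small Python sketch that counts crawler hits in log lines like the ones above. It assumes the combined log format seen in these entries; the function name `crawler_hits` and the list of markers are my own choices for illustration, and keep in mind that a user-agent string can be faked.

```python
import re
from collections import Counter

# Matches the combined log format used in the entries above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Substrings to look for in the user-agent (an illustrative list only).
CRAWLER_MARKERS = ("googlebot", "msnbot", "ia_archiver")


def crawler_hits(lines):
    """Count requests per crawler marker; unparseable lines are skipped."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        agent = match.group("agent").lower()
        for marker in CRAWLER_MARKERS:
            if marker in agent:
                hits[marker] += 1
    return hits


# Three of the entries quoted above.
sample = [
    '64.68.82.18 - - [30/Jun/2004:18:32:58 -0700] "GET /robots.txt HTTP/1.0" 200 915 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '216.218.233.37 - - [30/Jun/2004:19:00:01 -0700] "GET /cron.php HTTP/1.0" 200 0 "-" "Lynx/2.8.5dev.7 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.7"',
    '207.46.98.104 - - [30/Jun/2004:19:59:18 -0700] "GET /robots.txt HTTP/1.0" 200 915 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
]
print(crawler_hits(sample))
```

To run it against a real log you would pass an open file instead of the sample list, for example `crawler_hits(open("access_log"))`, where the file name depends on your host.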

SupaDucta’s picture

Glad it's working - thumbs up! ;)

aaanativearts’s picture

It can take Google up to 3 weeks to find your site, depending on the geographical region you're in and when they last updated their index.

pss0ft’s picture

Hi,

I have the opposite problem. Google visits my site several times a day, which is strange and a little annoying, because it inflates my visitor counter. How can I prevent this without excluding my site from Google's index? I just want less traffic from them.

By the way, the BBclone option for Drupal-based websites is very interesting. Install it and enjoy! Do not forget to download and install BBclone first. It is a PHP-based freeware statistics program.

Just look at my site. The more option is not complete yet and I am afraid we have to do it ourselves. The author has a lack of time. Don't we all?

Anyway, as a bonus, here is my BBclone as an example:
http://pd5dp.ham-radio.ch/bbclone/index.php

Have fun, and yes, I am available for questions. It is wonderful to see what the traffic is. Web crawlers are logged too.

Grtz. Henk

WebSite powered by Drupal: http://pd5dp.ham-radio.ch
Other WebSite : http://www.qsl.net/pd5dp

Email:
pd5dp@amsat.org

hedr’s picture

You should use a counter that excludes bots such as Googlebot.
That way you will always have accurate numbers.

If your counter program does not do this automatically, you can exclude IPs manually.
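
As a rough idea of what such a counter does, here is a Python sketch. Everything in it is hypothetical: the names `is_bot` and `count_visits`, the marker list, and the sample visits (the 192.0.2.x addresses are documentation-only IPs). A real statistics program will have its own, much longer bot list.

```python
# Substrings that identify some well-known crawlers (illustrative only).
BOT_MARKERS = ("googlebot", "msnbot", "ia_archiver", "slurp")


def is_bot(user_agent: str) -> bool:
    """True if the user-agent string contains a known crawler marker."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def count_visits(visits, excluded_ips=frozenset()):
    """Count (ip, user_agent) pairs, skipping bots and excluded IPs."""
    return sum(
        1
        for ip, agent in visits
        if ip not in excluded_ips and not is_bot(agent)
    )


# Hypothetical visits: two browsers and one crawler.
visits = [
    ("192.0.2.10", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
    ("64.68.82.18", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"),
    ("192.0.2.20", "Mozilla/5.0 (X11; U; Linux i686) Gecko/20040626"),
]

print(count_visits(visits))
print(count_visits(visits, excluded_ips={"192.0.2.20"}))
```

Filtering on the user-agent catches crawlers that identify themselves; the manual IP list is for everything else, such as your own visits.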

Mark
.hu domain registration - Domains

mcduarte2000’s picture

It takes some time. With my new website it was the same. First Yahoo, then MSN and finally, two weeks later, Google.

Websites with many visits from Google are probably websites that Google recognizes as having many updates (actualizations).

Miguel Duarte

Webmaster of: Lisbon Guide & Love Poems

pss0ft’s picture

Hi,

Could be. But do you have some more information about the actualisation (update) part? How does Google recognize that?

Really... most of the visits to my website come from Google.

Google 80% and all others 20%... That's not normal, is it?

Grtz. Henk

WebSite powered by Drupal: http://pd5dp.ham-radio.ch
Other WebSite : http://www.qsl.net/pd5dp

Email:
pd5dp@amsat.org

mcduarte2000’s picture

Well, it's really a question of waiting, and having links to your website from other websites on the web (mainly relevant ones).

If you want to learn more about this topic, I advise you to visit Web Masters World.

Miguel Duarte

Webmaster of: Lisbon Guide & Love Poems

aaanativearts’s picture

Well, since about 80% of all people on the web use Google as their search engine, 80% visits from Google vs. 20% for the rest sounds about right. Google will show up in your stats when people find you from the Google search engine, not just the Google Bot visits, so I would think that's a good thing. It means people are finding your site in the search engines, Google in particular.

philipk’s picture

I've found in the past that MovableType is amazing for Google's spiders.

Is there much difference between the news story pages that Drupal produces (with path alias on) and MovableType's pages?

sepeck’s picture

Google hits my pages with or without clean urls. With or without path alias on. The advantage of path alias is that the url contains (hopefully) relevant names to terms on the page.

-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide

green monkey’s picture

here google, google, google ... come get me and lets get the sandbox thingy started. site phase one is complete.

http://pets.cybereze.com

dazzz168’s picture

Good one lol

legacyb4’s picture

I've noticed that while Google is very quick to pick up and index story content (I put up something at 9.30 pm last night on www.lumine.net and selective search terms are already bringing it up), it seems to take a while before Google fully indexes and shows the full URL when using pathauto.

Having said that, once you are picked up by the Googlebot, it's pretty much a guarantee you will be indexed fully.

sangamreddi’s picture

Hi,

I am facing the same problem here. I am using the gsitemap, nodewords and googleaddsense modules. After a couple of months, Google still hasn't indexed my site www.gleez.com (hostony.com), even though Google bots are crawling my site regularly, as I can see from the awstats analyser.

Moreover, I just launched another site with the 4.7 codebase, www.sandeepone.com, on a different server (servage.net). It got indexed without my submitting it.

I don't know what to do, any help appreciated.

Sunny
www.gleez.com

jbernat’s picture

Sunny,

I have read numerous times that until you have inbound links (that Google deems as having merit) Google will not index your site. Frequently, the symptom is that only the root page is indexed despite deeper crawling on a regular or semi-regular basis.

I experienced this and within 1 week after Google recognized an inbound link about a dozen pages appeared.

The next hurdle often becomes the "Supplemental Result" designation. These pages are indexed, but not part of the primary search index. (The designation is described somewhere on the Google webmaster site, if you become interested in the details.) Until the pages come out of the "Supplemental" category, very few if any search phrases will call up your page in search results.

Also, do some web searches on "Google Sandbox." Many believe there is an intentional delay or penalty imposed on new sites by Google. Still others say it is a myth. I'll let you decide for yourself. :)

I believe link relationships are updated in the Google database once every month or quarter when they recalculate page rank (a.k.a. the "Google Dance"?) so the effect of your new inbound links will not be immediate. Get inbound links where you can, post quality content, and be patient. Google will index you eventually.

jim.bernatowicz.net

sangamreddi’s picture

Hi,

Detailed and useful information. Thank you very much.

I'll do some research on google.

Sunny
www.gleez.com

fletcherson’s picture

Never submit your site to search engines several times in the same month!
Google may then blacklist your domain, and you would have to purchase another domain to have a chance of getting into Google.

If you do link exchanges, for example, you've got to be careful about which sites you exchange links with, because if those sites are already blacklisted by Google, there is a big chance yours will be as well.

jbernat’s picture

"Never submit your site to search engines several times in the same month! Google may then blacklist your domain, and you would have to purchase another domain to have a chance of getting into Google."

I have heard reasoning that repeated submissions never result in penalties, because it would provide an easy way for your competitors to delist your site. Once the site is submitted, more frequent submissions are not likely to accelerate any positive results, however.

Also, I would only promote one domain to concentrate all inbound links to that domain to maximize the page rank. Multiple domains will dilute your page rank.

A new domain will restart the "sandbox" clock as well, if you believe in such things. :)

If you feel you were blacklisted, however, read up on the Google webmasters site (http://www.google.com/webmasters) to see if you have unknowingly engaged in a practice that Google deems unsavory. Clean up any misdeeds and then try to submit an appeal.

"... you've got to be careful about which sites you exchange links with, because if those sites are already blacklisted by Google, there is a big chance yours will be as well."

Excellent point. I have heard this as well, and follow this advice myself.

jim.bernatowicz.net

trumanCodes1’s picture

FYI,

I am having the same problem as well. Only 3 of my urls have been indexed. But now that Jim mentioned it, I think I might have accidentally submitted my site map a number of times while trying to configure the xmlSiteMap module. Then again the site is only a few weeks old. I will keep you folks posted and let you know when the site is eventually fully indexed.

dazzz168’s picture

My site does not come up under a google search at all. :-(

fletcherson’s picture

Do some link exchanges, and the chance of getting indexed will increase immensely.

fletcherson’s picture

Be careful with what you have in your robots.txt file.
One line too many and you kick Google out.

ideviate’s picture

Is having an empty robots.txt file uploaded a problem, and should I modify it?

powered by Drupal www.universideliyiz.biz

jbernat’s picture

"Is having an empty robots.txt file uploaded a problem, and should I modify it?"

Actually, I do not know if an empty robots.txt file implies "exclude everything," or "allow everything." Either way, this is not the best behavior for a Drupal site.

Case 1: Yours is a private site and you want to be sure everything is excluded:

User-agent: *
Disallow: /

Case 2: Yours is a public site and you want as much traffic as possible from the search engines. Drupal has many ways to get to the same information, and search engines really dislike indexing duplicate content. The more duplication they detect in your site, the poorer your results will be. You can Google "seo duplicate content penalties" or similar keyphrases for the nitty gritty.

Read this post http://drupal.org/node/22265 for a discussion on what to put in your robots.txt file.

Visit this site to read more about the robots.txt exclusion standard to see how to customize your robots.txt file.
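
On the question of what an empty robots.txt means: under the robots exclusion standard a file with no rules allows everything. One way to see how a parser treats a given file is Python's standard-library `urllib.robotparser`, as in this small sketch. The helper name `allowed` and the example.com URL are placeholders, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "http://example.com/node/1"  # placeholder URL

EMPTY = ""
PRIVATE = "User-agent: *\nDisallow: /\n"      # Case 1 above
ADMIN_ONLY = "User-agent: *\nDisallow: /admin/\n"  # hypothetical partial exclusion

print(allowed(EMPTY, URL))
print(allowed(PRIVATE, URL))
print(allowed(ADMIN_ONLY, URL))
print(allowed(ADMIN_ONLY, "http://example.com/admin/settings"))
```

With this parser the empty file and the /admin/-only file both let the node URL through, while the Case 1 file blocks it.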

Jim Bernatowicz
Photography Tips

bollywoodtalking’s picture

Visit my new site for latest bollywood celebrities pics and wallpapers at Bollywoodtalking. I also have the same problem

hschmid’s picture

I have been submitting my site www.suzannecalvert.com.au and trying everything I know to get it indexed by Google.

It could be that the site is in a WordPress blog format with its URL at www.suzannecalvert.com.au/blog

I 'point' to the blog, so if anyone just types www.suzannecalvert.com.au they are automatically taken to the blog extension.

This must be the problem. I have had bots visit... but no indexing????

Please offer thoughts or ideas

thank you

hschmid’s picture

BTW, www.suzannecalvert.com.au is not indexed by Google but is linked through hotfrog

It has been 3 months since I submitted this site to google and still no indexing