By isaacbowman.com on
Okay, I have been trying to figure this out since yesterday. I have gsitemap (the CVS version works on 4.7) and urllist, but I get errors on both after repeated submissions to Google.
I have clean URLs and I can see both maps fine using my browser.
I have submitted the following sitemaps to Google (their response follows each URL):
http://isaacbowman.com/?q=gsitemap and http://isaacbowman.com/gsitemap - URL not allowed (Line 1) with URL http://www.isaacbowman.com/ This url is not allowed for a Sitemap at this location. (I got this error on every page when I submit this sitemap)
http://isaacbowman.com/urllist - 5xx error Network unreachable
Definitions by Google:
URL not allowed - Your Sitemap contains a URL that is not allowed based on the Sitemap's location.
Network unreachable - We encountered a network error when we tried to access the page.
5xx error - See RFC 2616 for a complete list of these status codes. Likely reasons for this error are an internal server error or a server busy error. If the server is busy, it may have returned an overloaded status to ask the Googlebot to crawl the site more slowly. In this case, we'll return again later to crawl additional pages.
Here is my robots.txt, which I have checked with Google:
User-agent: *
Crawl-Delay: 10
Disallow: /aggregator
Disallow: /tracker
Disallow: /comment/reply
Disallow: /node/add
Disallow: /user
Disallow: /files
Disallow: /search
Disallow: /book/print
Disallow: /admin/
I saw a post about .htaccess and tried this with no success (placed at the top of the file)
AddType text/xml .xml
AddType text/xml .xs
Comments
Another issue on sitemap with Google
Now Google is giving me a different error on /gsitemap - 5xx error Network unreachable
/?q=gsitemap still gets - URL not allowed (Line 1) with URL http://www.isaacbowman.com/ This url is not allowed for a Sitemap at this location. (I got this error on every page when I submit this sitemap)
Isaac Bowman
www.isaacbowman.com
dumb question,
but could this be related to www.isaacbowman.com vs plain isaacbowman.com ?
www
Your sitemap was submitted for http://isaacbowman.com/, but every URL in the sitemap is for http://www.isaacbowman.com/
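A minimal sketch of that check in Python, in case it helps anyone else hitting "URL not allowed". It only compares scheme and host; Google's location rule also considers the sitemap's directory, which is not covered here. The function name and the /node/1 URL are made up for illustration.

```python
from urllib.parse import urlparse


def urls_outside_sitemap_host(sitemap_url, page_urls):
    """Return the page URLs whose scheme or host differs from the sitemap's."""
    site = urlparse(sitemap_url)
    mismatched = []
    for url in page_urls:
        page = urlparse(url)
        # Host names are case-insensitive, so compare them lowercased.
        if (page.scheme, page.netloc.lower()) != (site.scheme, site.netloc.lower()):
            mismatched.append(url)
    return mismatched


# The sitemap was submitted without "www", but its entries use "www".
print(urls_outside_sitemap_host(
    "http://isaacbowman.com/gsitemap",
    ["http://www.isaacbowman.com/", "http://isaacbowman.com/node/1"],  # second URL is hypothetical
))
```

Any URL this returns is served from a different host than the sitemap itself, which is the situation described above.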
Fixed Google Sitemap
I resubmitted with www and the sitemaps were accepted just minutes ago. I had read through Google's help and did not catch this issue. I am going to see if any other errors pop up after the spiders have finished the site.
Thanks!
Isaac Bowman
www.isaacbowman.com
Redirecting non-www to www
Hi Isaac,
If you haven't already done so, I would recommend redirecting your non-www pages to your www pages, so that both versions don't get accessed and indexed, which risks losing PageRank.
Add this to your .htaccess file right after the line #RewriteBase /drupal, and replace domain.com with your domain:
# custom redirects
# Stop rewriting once a request has already been internally redirected (avoids loops)
RewriteCond %{ENV:REDIRECT_STATUS} =200
RewriteRule ^ - [L]
# Redirect non-www to www
RewriteCond %{HTTP_HOST} !^www\..*
RewriteRule ^.*$ http://www.domain.com%{REQUEST_URI} [R=permanent,L]
# end custom redirects
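To make the intent of the redirect rule easier to follow, here is a rough Python sketch of the decision it makes. It is only an illustration of the logic, not how Apache evaluates it; the function name `www_redirect` and its `canonical_host` parameter are made up for this example.

```python
def www_redirect(host, request_uri, canonical_host="www.domain.com"):
    """Return (status, location) for a non-www host, or None if no redirect applies."""
    # Mirrors RewriteCond %{HTTP_HOST} !^www\..* (case-sensitive, like the rule).
    if host.startswith("www."):
        return None
    # Mirrors R=permanent: a 301 to the same path on the www host.
    return 301, "http://" + canonical_host + request_uri


print(www_redirect("domain.com", "/node/1"))
print(www_redirect("www.domain.com", "/node/1"))
```

Note that, like the rule itself, this sends every host that does not start with www. to the www host, including any other subdomains pointed at the same site.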
Hope this helps,
Alex
Contract Web Development
Still having trouble....
I've been following this thread and some other similar ones. I've made the changes to the rewrite rules in .htaccess, mod_rewrite is running, and as one post suggested, I made a URL alias of sitemap.xml to gsitemap because Google seemed to prefer it.
I still get errors of "This is not a valid URL. Please correct it and resubmit" from Google Sitemaps. The URLs in the XML are in the format of /home, for example. I don't know if Google is looking for a more complete address or /home/.
The generated sitemap is at http://www.251northriverroad.com/sitemap.xml
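For reference, the sitemap protocol expects each <loc> entry to be a full URL including the protocol and host, so a relative entry like /home would be a likely cause of this error. A minimal Python sketch for spotting such entries follows; the function name and the sample XML are made up for illustration, and it matches on the tag's local name so the exact sitemap namespace does not matter.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def relative_locs(sitemap_xml):
    """Return the <loc> values that are not absolute URLs (missing scheme or host)."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    for el in root.iter():
        # Namespaced tags look like "{namespace}loc"; compare the local name only.
        if el.tag.rsplit("}", 1)[-1] == "loc":
            loc = (el.text or "").strip()
            parts = urlparse(loc)
            if not (parts.scheme and parts.netloc):
                bad.append(loc)
    return bad


# Hypothetical sample with one relative and one absolute entry.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>/home</loc></url>"
    "<url><loc>http://www.example.com/about</loc></url>"
    "</urlset>"
)
print(relative_locs(sample))
```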
Restricting access for node/ or based on robots.txt?
Never mind - I just answered my own question - I see that I can specify by editing a content item whether or not it appears in the sitemap :)
Thanks for an awesome module!!
Alex
Contract Web Development
hi, why Disallow: /admin/
Hi,
Why Disallow: /admin/ and Disallow: /files ?
When do we use the trailing slash?
powered by Drupal www.universideliyiz.biz
When to Use Slash
Hi There,
You shouldn't have to disallow /admin/ - that is blocked automatically from my understanding. /files is meant for your files, kept separate from Drupal core files. You might want to read this on keeping your Drupal site tidy.
As far as the trailing slash in robots.txt, I believe the difference is that /files/ will block everything within the directory /files/, whereas /files (no trailing slash) will also block the path /files itself and anything else that begins with /files.
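A quick way to see the difference is Python's standard urllib.robotparser module, which applies the same prefix matching. This is just a sketch; the helper name `blocked` and the www.example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules, path, agent="*"):
    """Return True if the given robots.txt lines disallow the path for the agent."""
    parser = RobotFileParser()
    parser.parse(rules)
    return not parser.can_fetch(agent, "http://www.example.com" + path)


without_slash = ["User-agent: *", "Disallow: /files"]
with_slash = ["User-agent: *", "Disallow: /files/"]

for path in ["/files", "/files/a.png", "/files-old/x"]:
    # Show how each rule treats the same path.
    print(path, blocked(without_slash, path), blocked(with_slash, path))
```

With "Disallow: /files" all three paths are blocked; with "Disallow: /files/" only /files/a.png is.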
Cheers,
Alex
----------
Contract Web Development