| Project: | Leech |
| Version: | 5.x-1.6 |
| Component: | leech |
| Category: | support request |
| Priority: | normal |
| Assigned: | wisdom |
| Status: | closed (fixed) |
Issue Summary
It seems I have to manually update the published status of feed items whose URLs are aliased before search engines will index them, even though the feed items are configured to be published when the feed is set up. I can see Googlebot trying to index the non-aliased URLs, because the /node/nodeid paths corresponding to the feed items are listed as blocked URLs in the Google Webmaster Tools report. That part is expected, because robots.txt disallows /node.
However, the aliased URLs are not indexed automatically, and there is no report saying they were excluded either. When a feed item with an aliased URL is saved manually, it does get indexed.
By default the feed items are set to published, but this does not seem to take effect. How can I make feed items with aliased URLs get indexed automatically by search engines, or at least make them visible to crawlers?
Comments
#1
Are the items not accessible when you try to address them with their aliased URLs?
#2
The items are accessible when I address them by their aliased URLs.
#3
Somewhere they must be linked as node/[nid]. Could you find the links that refer to the feed items without using the URL alias?
#4
"node/nodeid" and "node/nodeid/feed" both point to the same feed items. The first is aliased to the plain clean URL, while the second is aliased to the clean URL with /feed appended, as in the following:
a380-superjumbo-lands-sydney node/4147
a380-superjumbo-lands-sydney/feed node/4147/feed
Other than this, I do not see any other place where an association is defined. The associations above are from the URL aliases listing page.
#5
If there are node/[nid] URLs turning up in your Google report there must be somewhere publicly accessible references to those URLs that Google picks up, right?
So, if there aren't references to those URLs from within leech-created links, this issue is not related to the leech module.
Alex
#6
I also see a significant drop in the number of feed items with node/nodeid URLs listed under restricted URLs.
#7
I see that the sitemap at http://mydomain/sitemap0.xml lists the pages generated by the leech module as http://mydomain/node/nodeid rather than by their clean URLs. I assume the sitemap is the reason Google lists those URLs as not accessible: robots.txt disallows /node/, while the sitemap submits the same URLs for indexing.
Why is http://mydomain/node/nodeid in the sitemap instead of the clean URLs of the feed items? Most importantly, how can I get the clean URLs of the feed items into the XML sitemap?
For feed items, the clean URLs are generated automatically by the leech module when the pathauto module is enabled, so, as with other node types, it is the clean URLs that should be in the sitemap.
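For anyone who wants to confirm the same conflict on their own site, here is a rough Python sketch of the check: it lists the sitemap entries that robots.txt blocks. The function name `blocked_sitemap_urls`, the `Googlebot` user agent string, and the sample robots.txt and sitemap contents are all made up for illustration; only the /node/ rule and the example URLs come from this thread.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """Return the sitemap <loc> URLs that robots.txt disallows for user_agent."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    root = ET.fromstring(sitemap_xml)
    # Match <loc> elements whatever XML namespace the sitemap declares.
    urls = [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]
    return [url for url in urls if not robots.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical inputs modelled on the situation described above.
    sample_robots = "User-agent: *\nDisallow: /node/\n"
    sample_sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://mydomain/node/4147</loc></url>"
        "<url><loc>http://mydomain/a380-superjumbo-lands-sydney</loc></url>"
        "</urlset>"
    )
    for url in blocked_sitemap_urls(sample_sitemap, sample_robots):
        print("listed in sitemap but blocked by robots.txt:", url)
```

With the sample data, only the node/4147 entry is reported, and the aliased URL passes. To run it against a real site, fetch the contents of robots.txt and the sitemap file yourself and pass them in as strings.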
#8
The problem is now fixed. It was caused by the Google Sitemap module. I upgraded that module, and now all the feed items appear in the sitemap with clean URLs.
#9
Automatically closed -- issue fixed for two weeks with no activity.