By geoffb on
I think I have found out why, since moving to Drupal, our content is no longer getting indexed as part of the Google News Crawl.
The reason seems to be that, according to Google, the URL for each article must contain a unique number consisting of at least three digits. As we are using the pathauto module to give us clean URLs, our article paths no longer include such a number.
Any suggestions on ways around that besides turning off pathauto?
Thanks,
Geoff
Comments
The reason seems to be that
There was a forum posting months ago that discounted this, but that's beside the point.
Try the year/month/day patterns of pathauto. These will put digits into the URL, which is what Google is supposed to be looking for.
----
Previously user Ramdak.
Creating a News Sitemap
Thanks for the reply and the ideas, ventat-rk.
I did a search of drupal.org but couldn't find the forum posting you mentioned.
Google's own support page at http://www.google.com/support/news_pub/bin/answer.py?answer=40741 says the URL for each article must contain a unique number consisting of at least three digits.
It also notes that "this number can't consist of only an isolated four-digit number that resembles a year."
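For anyone who wants to audit existing paths against the two rules quoted above, here is a minimal sketch. It is only my reading of the wording, not Google's actual check: the function name `has_news_style_id` and the 1900-2099 "looks like a year" range are assumptions, and it cannot tell whether the number is unique across articles.

```python
import re


def has_news_style_id(url_path: str) -> bool:
    """Return True if the path has a run of 3+ digits that is not a lone year.

    Assumption: a four-digit run between 1900 and 2099 "resembles a year"
    and therefore does not count on its own.
    """
    for run in re.findall(r"\d+", url_path):
        if len(run) < 3:
            continue  # too short to satisfy the three-digit rule
        if len(run) == 4 and 1900 <= int(run) <= 2099:
            continue  # isolated year-like number, skipped
        return True
    return False
```

Running every aliased path through a check like this would give a rough count of how many articles are affected before deciding on a fix.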
Since my first post, I found another Google page (http://www.google.com/support/webmasters/bin/answer.py?answer=42738) that talks about creating a News Sitemap. As we have over 20,000 articles where the path would likely need changing, I'm thinking maybe this is the way to go.
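A rough sketch of what generating such a sitemap could look like, for illustration only. The element names follow the sitemap-news 0.9 schema as I understand it, which may differ from what the support page above describes; the function name `build_news_sitemap` and the `loc`/`date`/`title` dictionary keys are made up for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: the sitemap-news 0.9 namespace and element names.
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language="en"):
    """Build a News Sitemap string from dicts with 'loc', 'date' and 'title'."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        # Dates are passed through as-is, e.g. "2008-03-05".
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article["date"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="unicode")
```

The appeal of this route is that the existing paths stay untouched; only the sitemap has to be generated and kept current.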
But, again, I'm still open to suggestions.
Geoff
Hi Geoff, Pathauto has the
Hi Geoff,
Pathauto has the following patterns that generate numbers in the urls:
By combining one of the other placeholders, [title], with a couple of these, say [yyyy], [mm] and [dd], you would get URLs with multiple numbers rather than just an isolated four-digit number. For example,
[title]/[yyyy]/[mm]/[dd]
Perhaps that would solve your problem?
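To show roughly what a path built from that pattern looks like, here is a small sketch that mimics the placeholder substitution. It is not Pathauto's code: `expand_pattern` is a hypothetical helper, the slug rule is simplified, and the zero-padding of the month and day is an assumption about how [mm] and [dd] are rendered.

```python
import re
from datetime import date


def expand_pattern(pattern: str, title: str, created: date) -> str:
    """Fill [title], [yyyy], [mm] and [dd] in a Pathauto-style pattern."""
    # Simplified slug: lowercase, non-alphanumeric runs become hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    values = {
        "title": slug,
        "yyyy": f"{created.year:04d}",
        "mm": f"{created.month:02d}",  # assumed two-digit month
        "dd": f"{created.day:02d}",    # assumed two-digit day
    }
    # Placeholders this sketch does not know are left untouched.
    return re.sub(r"\[(\w+)\]", lambda m: values.get(m.group(1), m.group(0)), pattern)


print(expand_pattern("[title]/[yyyy]/[mm]/[dd]", "Council Approves Budget", date(2008, 3, 5)))
# prints: council-approves-budget/2008/03/05
```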
----
Previously user Ramdak.
Hi, I had the next pattern:
Hi,
I had the following pattern: os/[termpath-raw]/[yyyy]/[mm]/[dd]/[hour]/[min]
But after upgrading to pathauto-5.x-2.2 and token-5.x-1.11, the tokens [hour] and [min] no longer work correctly.
What should I do?
Please help.