Posted by Jean-Philippe Fleury on February 28, 2009 at 7:22pm
| Project: | Global Redirect |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | postponed |
Issue Summary
Hi.
I would like to know if it would be possible to redirect a URL that contains a bad GET page variable. For example, say we have a node:
mysite.com/a-test
If this page doesn't have pagination and we add a GET page variable:
mysite.com/a-test?page=4
there is no redirect back to mysite.com/a-test, and no 404 either.
Is it possible to know whether a page has pagination and, if it doesn't or if the GET page value isn't valid, to redirect or return a 404?
Thanks.
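The check being asked for can be sketched roughly as below. This is a minimal illustration, not Global Redirect's code or a Drupal API: the function name `check_page_param` and the `page_count` argument (how many pages the content really has, 1 meaning no pagination) are hypothetical, and treating `page` as a zero-based index is an assumption. An invalid or out-of-range `page` value leads to a 301 redirect to the same URL with `page` stripped. Returning a 404 instead would be an equally valid policy.

```python
import re
from typing import Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def check_page_param(url: str, page_count: int) -> Tuple[int, str]:
    """Return (status, location): 200 = serve as-is, 301 = redirect to location.

    page_count is a hypothetical input: the number of pages the content
    actually has (1 means the page is not paginated at all).
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    page_values = [v for k, v in params if k == "page"]
    if not page_values:
        return 200, url  # no page parameter, nothing to validate

    value = page_values[-1]
    # Assumption: valid only if it is an ASCII integer in [0, page_count).
    if re.fullmatch(r"[0-9]+", value) and int(value) < page_count:
        return 200, url

    # Drop every 'page' parameter but keep any other query parameters.
    kept = [(k, v) for k, v in params if k != "page"]
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return 301, clean
```

For the example above, `check_page_param("http://mysite.com/a-test?page=4", 1)` would send a 301 to `http://mysite.com/a-test`. The hard part, as the first reply points out, is knowing `page_count` (or whether anything on the page uses `page` at all) before the page is built.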
Comments
#1
Technically this is possible.
The problem is how to define a bad entry in the query string. Maybe a module on the page requires it? It could be anything...
#2
Good idea!
I just noticed in Google Webmaster Tools that one of my simple node pages was reported for duplicate title tags. When I checked, the URLs looked like this:
mysite.com/mynode
mysite.com/mynode?page=1
mysite.com/mynode?page=2
mysite.com/mynode?page=1205
Weird. I have no clue how Google picked those up, as those page variables do not exist; it is simply "mynode".
#3
Google Webmaster Tools will always report those pages as duplicates, whether or not the page actually uses the query string.
The only solution to that problem is to add a meta tag to those pages.
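One way to combine this suggestion with the question in #1 (what counts as a "bad" query entry) is a per-page allowlist of query keys. The sketch below assumes that model and assumes the meta tag meant is a robots `noindex` tag; the function `robots_meta_for` and the `allowed_keys` argument are hypothetical and not part of Drupal or Global Redirect.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlsplit

# Assumption: the tag to emit is a robots noindex tag that still lets links be followed.
NOINDEX = '<meta name="robots" content="noindex, follow">'


def robots_meta_for(url: str, allowed_keys: Iterable[str]) -> str:
    """Return a noindex meta tag if the URL carries query keys the page does not use."""
    allowed = set(allowed_keys)
    keys = {k for k, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    unexpected = keys - allowed
    return NOINDEX if unexpected else ""
```

A plain node like "mynode" would pass an empty allowlist, so `mysite.com/mynode?page=1205` gets the tag while `mysite.com/mynode` does not. A paginated view would allow `page`.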
#4
Thanks Kiam, I think I understand, but the problem is that those ?page=xxx pages don't exist!
I have no idea how Google picked those up, as they all show the same page!
mysite.com/mynode?page=1
mysite.com/mynode?page=2
mysite.com/mynode?page=1205
is the equivalent of mysite.com/mynode
This is not a views page with paging, it is a simple node page :)
#5
That is really odd. Google should pick up links used in Drupal nodes, not attach random strings to the URLs.
#6
I see the same in the web server log files. Google is crawling pages with ?page=xxx added out of the blue, and could thus, in theory, crawl the same pages over and over again indefinitely. What a bottomless mess! I wonder how this brain-dead Googlebot is/was picking these up.
For example, this very page can be called with any nonsensical query string, like http://drupal.org/node/386928?page=123, and Drupal silently ignores it. This can lead to significant overhead and wasted bandwidth.
The issue is also discussed at http://drupal.org/node/309804
I was hoping that this module could do something about it, but I understand it is a much wider problem and not at all limited to Drupal. One can add ?page=123 to the URLs of probably most websites without any consequence at all.