Hi.

I would like to know whether it is possible to redirect a URL that contains a bad GET page variable. For example, say we have a node:

mysite.com/a-test

If this page doesn't have pagination and we add a GET page variable:

mysite.com/a-test?page=4

No 404 is returned, and there is no redirect to mysite.com/a-test

Is it possible to detect whether a page has pagination and, if it does not or if the GET page value is invalid, return a 404 or redirect?
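
To make the check concrete, here is a minimal sketch in Python rather than Drupal code. It assumes the application can tell how many pages a path really has; `check_page_param`, `total_pages` and the `redirect` flag are hypothetical names, and page numbers are assumed to be zero-based.

```python
import re
from urllib.parse import parse_qsl, urlencode


def check_page_param(path, query_string, total_pages, redirect=True):
    """Return (status, location) for a request to `path`.

    total_pages is a hypothetical input: how many pages the handler for
    `path` really offers (1 when there is no pager). Page numbers are
    assumed to be zero-based.
    """
    params = parse_qsl(query_string, keep_blank_values=True)
    page_values = [value for key, value in params if key == "page"]
    if not page_values:
        return 200, None  # no page variable, nothing to check

    value = page_values[-1]
    # Accept only short runs of ASCII digits, then range-check the number.
    is_number = re.fullmatch(r"[0-9]{1,9}", value) is not None
    if total_pages > 1 and is_number and int(value) < total_pages:
        return 200, None  # the page variable points at a real page

    if not redirect:
        return 404, None

    # Redirect to the same path with every "page" entry removed.
    kept = [(key, val) for key, val in params if key != "page"]
    location = path + ("?" + urlencode(kept) if kept else "")
    return 301, location
```

With this sketch, a request for /a-test?page=4 on a page without a pager would yield a 301 to /a-test, or a 404 when `redirect=False`.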

Thanks.

Comments

nicholasthompson’s picture

Title: Redirect url with a bad GET page variable? » Redirect url with a bad/spammed query string?
Version: 6.x-1.2 » 6.x-1.x-dev
Category: support » feature
Status: Active » Postponed

Technically this is possible.

The problem is "how do you define a bad entry in the query string". Maybe a module on the page requires it? It could be anything...
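
One possible way to define "bad" is an allowlist: each path declares the query keys it is known to use, and anything else is flagged. The sketch below is Python, not Drupal code, and the `ALLOWED_QUERY_KEYS` registry, its example paths and the function name are all hypothetical. Paths missing from the registry are deliberately not judged, since a module on the page may need keys nobody declared.

```python
from urllib.parse import parse_qsl

# Hypothetical registry: the query keys each path is known to use.
ALLOWED_QUERY_KEYS = {
    "/a-test": set(),             # plain node, no pager
    "/blog": {"page"},            # paginated listing
    "/search": {"keys", "page"},  # search form with a pager
}


def unexpected_query_keys(path, query_string, registry=ALLOWED_QUERY_KEYS):
    """Return the query keys on this request that the path never declared."""
    allowed = registry.get(path)
    if allowed is None:
        return []  # unknown path: make no judgment at all

    unexpected = []
    for key, _value in parse_qsl(query_string, keep_blank_values=True):
        if key not in allowed and key not in unexpected:
            unexpected.append(key)
    return unexpected
```

A caller could then decide per site what to do with a non-empty result: ignore it, redirect to the clean URL, or return a 404.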

giorgio79’s picture

Good idea!

I just noticed in Google Webmasters that one of my simple node pages was reported for duplicate title tags. When I checked, it looked like this:

mysite.com/mynode
mysite.com/mynode?page=1
mysite.com/mynode?page=2
mysite.com/mynode?page=1205

Weird. No clue how Google picked that up, as those page variables do not exist. It is simply "mynode".

avpaderno’s picture

Google Webmaster Tools will always report those pages as duplicates, whether or not the passed query string is used.

The only solution to that problem is to add a meta tag to those pages.

giorgio79’s picture

Thanks Kiam, I think I understand, but the problem is that those ?page=xxx pages don't exist!

I have no idea how Google picked those up, as they all show the same page!
mysite.com/mynode?page=1
mysite.com/mynode?page=2
mysite.com/mynode?page=1205

are all equivalent to mysite.com/mynode

This is not a views page with paging, it is a simple node page :)

avpaderno’s picture

That is really odd. Google should pick up links used in Drupal nodes, not attach random strings to the URLs.

hd’s picture

I see the same in the webserver logfiles. Google is crawling pages with ?page=xxx added out of the blue, and so could in theory keep crawling the same pages over and over indefinitely. What a bottomless mess! I wonder how this brain-dead Googlebot is/was picking these up.
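
For anyone who wants to check their own logs for this pattern, here is a rough Python sketch. It assumes access log lines that contain a quoted request such as "GET /path?query HTTP/1.1" (as in the common and combined log formats); `REQUEST_RE` and `page_values_by_path` are hypothetical names.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# Assumed log shape: the request appears quoted, e.g. "GET /mynode?page=2 HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[0-9.]+"')


def page_values_by_path(log_lines):
    """Map each requested path to the set of ?page= values seen for it."""
    seen = defaultdict(set)
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a GET/HEAD request line in the assumed format
        target = urlsplit(match.group(1))
        for value in parse_qs(target.query).get("page", []):
            seen[target.path].add(value)
    return dict(seen)
```

A simple node path that shows up with many different page values is a likely sign of the crawling described above.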

For example, this very page can be called with any nonsensical query string, such as http://drupal.org/node/386928?page=123, and Drupal silently ignores it. This can lead to significant overhead and wasted bandwidth.

The issue is also discussed at http://drupal.org/node/309804

I was hoping that this module could do something about it, but I understand it is a much wider problem and not at all limited to Drupal. One can add ?page=123 to the URLs of perhaps most websites without any consequence at all.

avpaderno’s picture

Status: Postponed » Closed (outdated)

I am closing this issue, since it is for an unsupported Drupal version.