node/32?page=32&from432..... [#143987]

Comment #1

commented 14 May 2007 at 22:31

It's possible, but VERY hard.

A better solution than redirecting would be to produce a 404 message for those non-existent dupe pages, for example:
EXISTS: node/123?page=1
DOESN'T EXIST: node/123?page=1&from=456

Is that what you mean?

The difficulty is that Global Redirect has next to no way of knowing whether this is a dupe page or whether another module actually intends it to be there.
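
For sites where the stray argument is known, the 404 idea can be sketched outside the module with mod_rewrite. The fragment below is only an illustration under several assumptions: the duplicate marker is a numeric `from` argument as in the example above, no other module uses `from` on node pages, the rules sit in the site's .htaccess ahead of Drupal's own rewrite to index.php, and the Apache version accepts a non-3xx status in the R flag.

```apache
RewriteEngine On

# Condition: the query string carries a numeric "from" argument,
# either at the start or after an "&".
RewriteCond %{QUERY_STRING} (^|&)from=[0-9]+(&|$)

# Rule: only plain node paths such as node/123. In .htaccess context the
# pattern is matched without the leading slash. With a non-3xx status the
# substitution ("-") is not used and rewriting stops.
RewriteRule ^node/[0-9]+$ - [R=404,L]
```

With this in place node/123?page=1 is left alone, while node/123?page=1&from=456 gets a 404. It still cannot tell whether some other module legitimately uses `from`, which is the difficulty described above.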

Comment #2

nicholasthompson commented 15 May 2007 at 10:45

A thought....
1) Do all these "dupe" URLs share the same argument?
2) Are you using Apache with mod_rewrite enabled?

You could do a redirect that would be SOMETHING like this (but not this):

  RewriteCond %{QUERY_STRING} from=([0-9]+)
  RewriteRule (.*) $1? [R=301,L]

I think your problem is too specific for Global Redirect - however it should be solvable somehow.
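
A slightly more defensive variant of the same idea, offered as a sketch only: it assumes the duplicates are always node/<number> paths with a numeric `from` argument, that the site lives at the document root, and that dropping the whole query string (including any `page` argument) is acceptable.

```apache
RewriteEngine On

# Match "from=<digits>" only as a whole argument, not as the tail of
# another name (a hypothetical "datefrom", for instance).
RewriteCond %{QUERY_STRING} (^|&)from=[0-9]+(&|$)

# Redirect to the same node path; the trailing "?" discards the query string.
RewriteRule ^(node/[0-9]+)$ /$1? [R=301,L]
```

Restricting the rule pattern to node paths keeps the redirect from firing on unrelated URLs that happen to carry a `from` argument.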

Comment #3

cybe commented 15 May 2007 at 11:59

Well, actually node/123?page=123 does not exist, but just like on any(?) server (http://www.washingtonpost.com/wp-dyn/content/article/2007/05/11/AR200705...) it doesn't return a 404.

category?page=123 does exist though.

For some strange reason Yahoo Slurp thinks those pages do exist. Perhaps they even differ in some minute way, which is why Slurp keeps going through them?

I've been trying to figure out a mod_rewrite rule that answers Slurp with a "410 Gone", but I've not been successful, even though I've already got a huge .htaccess full of tricks (some of the rules took me days to figure out).

It's probably the question mark that's causing trouble. The rule should probably be built with RewriteCond, but I'm not skillful enough and your example doesn't help me much, so please help if you are able to.

The two examples below do not work.

RewriteRule ^node/(.*)?page - [G,L]

  RewriteCond %{QUERY_STRING} node/([0-9]+)\?page
  RewriteRule - [G,L]
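
A likely reason both attempts fail (offered as an explanation, not something confirmed in this thread): a RewriteRule pattern is matched against the URL path only, never the query string, and an unescaped `?` in a regex is a quantifier rather than a literal question mark. In the second attempt, %{QUERY_STRING} holds only the part after the `?`, so it never contains `node/123`, and the RewriteRule line is missing an argument: Apache would read `-` as the pattern and `[G,L]` as the substitution. A minimal sketch of the split, assuming the rules live in .htaccess:

```apache
# The path goes in the RewriteRule pattern, the query string in a RewriteCond.
# The condition is evaluated only after the rule pattern has matched.
RewriteCond %{QUERY_STRING} page=
RewriteRule ^node/ - [G]
```

This is deliberately coarse: it answers 410 Gone for any path starting with node/ whose query string contains "page=".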

Comment #4

cybe commented 15 May 2007 at 12:00

I've now banned pages like these in robots.txt but Slurp seems to read it very seldom.

Comment #5

nicholasthompson commented 15 May 2007 at 12:06

Try SOMETHING like this? (I say something with such emphasis as I'm not a Jedi Rewrite Master yet)

  RewriteCond %{QUERY_STRING} q=node/([0-9]+)
  RewriteCond %{QUERY_STRING} page=([0-9]+)
  RewriteRule - [G,L]

That would need to go after the line which rewrites the URL into a neat one, which I think is one of the last things to happen...

Comment #6

cybe commented 15 May 2007 at 16:24

Thanks for the suggestion, but you are not a Jedi yet - it didn't work, nor did the modifications I made. Looks like it's time to visit the mod_rewrite forum.

Comment #7

cybe commented 15 May 2007 at 22:19

Wonderful! I got it from someone on the mod_rewrite forum:

Options +FollowSymLinks

RewriteEngine On

RewriteCond %{QUERY_STRING} ^(.*&)?page=[0-9]+(&.*)?$ [NC]
RewriteRule ^node(/.*)?$ - [G,L]

Why not put this rewrite rule into this module?

Eat 410, Yahoo Slurp!

Comment #8

cybe commented 15 May 2007 at 22:25

What a weird bot that Yahoo Slurp is, by the way: now it is accessing my robots.txt.old

Comment #9

cybe commented 15 May 2007 at 22:49

I just noticed that the rule also rewrites http://site/?page=2 and http://site/node/add so it still needs some modification.
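
One possible tightening, sketched here as an assumption rather than a tested fix: restrict the rule pattern to numeric node paths so that the front page and paths such as node/add can no longer match. It assumes the rules sit in .htaccess, where the pattern is matched without a leading slash.

```apache
RewriteEngine On

# Same query-string test as before: a numeric "page" argument anywhere.
RewriteCond %{QUERY_STRING} ^(.*&)?page=[0-9]+(&.*)?$ [NC]

# Only node/<digits>, e.g. node/123. An empty path (the front page) and
# node/add do not match this pattern.
RewriteRule ^node/[0-9]+$ - [G,L]
```

The trade-off is that deeper paths such as node/123/edit are no longer covered either, which is probably what is wanted here.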

Comment #10

cybe commented 16 May 2007 at 07:10

I've added a

RewriteCond %{HTTP_USER_AGENT} Slurp

so it only applies to Yahoo Slurp. It will still find all the content without visiting any "page=num" pages.
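
Put together, the Slurp-only variant might look like the sketch below. It reuses the rule posted earlier in this thread and assumes the crawler's User-Agent string contains "Slurp". Consecutive RewriteCond lines are ANDed by default; an [OR] flag would turn the user-agent test into an alternative to the next condition, so the sketch leaves it off.

```apache
Options +FollowSymLinks
RewriteEngine On

# Both conditions must hold (implicit AND): the client identifies itself
# as Slurp, and the query string carries a numeric "page" argument.
RewriteCond %{HTTP_USER_AGENT} Slurp
RewriteCond %{QUERY_STRING} ^(.*&)?page=[0-9]+(&.*)?$ [NC]
RewriteRule ^node(/.*)?$ - [G,L]
```

Ordinary visitors and other crawlers pass through untouched, since their User-Agent fails the first condition.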

Comment #11

nicholasthompson commented 16 May 2007 at 08:11

On this line:

RewriteCond %{QUERY_STRING} ^(.*&)?page=[0-9]+(&.*)?$ [NC]

You don't need most of that regex - it can be simplified to:

RewriteCond %{QUERY_STRING} page=[0-9]+ [NC]

Now, seeing as it was applying itself on non "node/123" style pages, you need to apply more filtering... Something like this?

RewriteCond %{REQUEST_URI} ^/node/[0-9]+$ [NC]

Put them all together and you get....

RewriteCond %{QUERY_STRING} page=[0-9]+ [NC]
RewriteCond %{REQUEST_URI} ^/node/([0-9]+)$ [NC]
RewriteRule .* /node/%1? [G,L]

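That will answer ANY node/123 URL that carries a page argument with 410 Gone (with [G] the substitution is not used, so nothing is actually redirected)... this might break Book content types though.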
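
One caveat on the simplification, as a hedged aside: the unanchored pattern is not quite equivalent to the original. Without the `^(.*&)?` and `(&.*)?$` parts, `page=[0-9]+` also matches arguments that merely end in "page" (a hypothetical `subpage=2`) and values that only start with digits (`page=2abc`). If that matters, a middle ground is to anchor on argument boundaries:

```apache
# Matches page=<digits> only as a whole argument: at the start of the
# query string or after "&", and followed by "&" or the end.
RewriteCond %{QUERY_STRING} (^|&)page=[0-9]+(&|$) [NC]
RewriteCond %{REQUEST_URI} ^/node/[0-9]+$ [NC]
RewriteRule .* - [G,L]
```

The substitution is written as "-" here because [G] returns 410 Gone without rewriting the URL.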

Comment #12

nicholasthompson commented 25 July 2007 at 21:28

Status: Active » Closed (won't fix)

Comment #13

SlyK commented 8 July 2009 at 10:05

I've got many URLs with various parameters in my search engine index stats. Too bad you can't fix it.
It seems that if someone leaves a link to:
www.site.com/?page=1&some=sh*t

the whole page with those parameters will be indexed :(
