Problem with random crud URL gibberish actually working on taxonomy/term when it should not. Help!

Rhino - May 28, 2009 - 11:00

How do I get some pages (which Drupal mysteriously generates fine, but whose URLs are wrong) out of Google? Similar, but not identical, to "how do I get my site out of Google": http://drupal.org/node/245173

I have a bunch of mystery pages listed at Google. I have no idea where Google gets the URLs from, and I have combed the site trying to find them without luck. There might be pages out on the web pointing to these URLs for some reason. The URLs look like this:

example.com/TOPIC/%3Cbr+/%3Ehttp:/www.linkedin.com/www.brandchannel.com/node/relaunched?page=160
example.com/TOPIC/node/%3Cbr+/node/:http:/www.ananova.com/news/story/sm_908920.html%22?page=1
example.com/TOPIC/www.myspace.com/www.myspace.com/node/%3Cbr+/modules.php?page=6&op=modload&name=Sections&file=index&req=viewarticle&artid=100
example.com/

Now this is the strange part: THEY WORK. They shouldn't. What they do is bring up the "topic" page (a taxonomy term), when they shouldn't be doing that at all. How do I prevent this from happening? A 404 would quickly get Google sorted, but for some mysterious reason this URL rubbish generates the taxonomy/term page just fine. I think I'm being penalized for duplicate content because of these URLs and need to get rid of them, stat.

A similar problem: with pathauto, every node has a readable URL. Now things like this are showing up at Google:

example.com/topic/human-readable-headline?page=1

Where is page=1 coming from there, and how do I make it stop? It goes to the exact same article as example.com/topic/human-readable-headline, so again, duplicated content in Google's eyes.

I use pathauto. I currently do not have anything in robots.txt.
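To see which of the indexed URLs collapse onto the same article, a small helper can strip the query parameter that makes no difference. This is only a sketch: the function name `canonical_url` is made up for illustration, and it assumes that `page` has no meaning on a single-node page (it does have meaning on paginated listings, so it should not be applied blindly).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, drop_params=("page",)):
    """Return url with the query parameters named in drop_params removed.

    Hypothetical helper: assumes the dropped parameters do not change
    what the page shows, which is only true for non-paginated pages.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(canonical_url("http://example.com/topic/human-readable-headline?page=1"))
```

Two URLs that give the same result here are candidates for being treated as duplicates of one another.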

Anyone? Bueller? Does anyone

Rhino - June 1, 2009 - 10:15

Anyone? Bueller? Does anyone out there know why these URLs work when they clearly should not?

These people have posted about the same problem.

Rhino - July 18, 2009 - 07:47

http://groups.drupal.org/node/14809

"&from=1289 and node?page= produces multiple pages and fictional pages" - this discussion is about Drupal 5; I'm on Drupal 6.

Also, it was filed as a bug report in Sept 2008:

http://drupal.org/node/307244

I am still riddled with

Rhino - August 29, 2009 - 12:54

I am still riddled with fictional (but working) URLs.

Examples:

DOMAIN/SECTION/node/%3Cbr+/%3Ehttp:/wistechnology.com/www.myspace.com/modules.php?page=7&op=modload&name=Search&file=index

DOMAIN/SECTION/node/<br+/>http:/node/v

The above examples show up on Google when searching for my site (on the first page, no less), and both simply lead to /section (as in "news") of the site.

How do I get rid of these URLs that should not be working in the first place? And where the hell are they coming from?

The <br+> is a real mystery.

Yeah. No biggie

dman - August 29, 2009 - 13:14

An explanation, although not a solution, if you find it a problem.

Your incoming URLs containing <br /> come from inline text-processing on the site that links to you. Y'know how a word processor can automatically convert an address into a link, and will get it wrong if you end the link with a "."? Some WYSIWYG editor embedded a break where a newline should go, and that break incorrectly became part of the URL.
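The encoding side of this can be checked directly. The sketch below only shows that the `%3Cbr+/%3E` seen in the indexed URLs is the encoded form of `<br />`; the helper names are made up for illustration, and it says nothing about which editor or site produced the links.

```python
from urllib.parse import quote_plus, unquote_plus


def decode_path(path):
    """Undo percent-encoding and '+'-for-space encoding in a URL path."""
    return unquote_plus(path)


def encode_fragment(text):
    """Encode text as it appears in the indexed URLs, leaving '/' literal."""
    return quote_plus(text, safe="/")


if __name__ == "__main__":
    # One of the path fragments reported in the thread.
    print(decode_path("TOPIC/%3Cbr+/%3Ehttp:/www.linkedin.com"))
    print(encode_fragment("<br />"))
```

Decoding the reported paths this way makes it easier to see that they look like several addresses and a line break run together into one link.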

.dan.

I find it a

Rhino - August 29, 2009 - 14:06

I find it a problem.

example.com/node/IcanwritehwateverthehellIwanthere does the right thing and sends me to the 404 I should be getting

but

example.com/TOPIC/node/IcanwritehwateverthehellIwanthere does not. I think this is a problem.

Just found this fictional url in google

FlemmingLeer - August 31, 2009 - 01:17

Hi Rhino,

Your post made me look more closely at Google, and I found this fictional URL there, and it works.

example.com/comment/924/function.mysql-query

I use Comment Page 5.x-1.1 to display the URLs of comments.

Found here:
http://drupal.org/project/comment_page

But it's not specific to Comment Page. The tracker module also produces fictional URLs.

example.com/tracker//function.mysql-query

with added //

and without
example.com/tracker/function.mysql-query

And with 3 / it still works.

example.com/tracker///function.mysql-query

So it is somehow connected to Drupal core.

Even a turtle reaches it´s goal...
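One way to picture why the tracker examples above still resolve is a router that falls back to the longest registered prefix of the requested path and hands the leftover segments to the page as extra arguments. The sketch below is a simplified model under that assumption, not Drupal's actual code; `find_route` and the route set are hypothetical.

```python
def find_route(routes, path):
    """Return (registered_path, extra_segments) for the longest registered
    prefix of path, or None if no prefix is registered.

    Simplified model: a real router may also handle wildcards and
    access checks, which are left out here.
    """
    segments = path.strip("/").split("/")
    # Try the full path first, then drop one trailing segment at a time.
    for count in range(len(segments), 0, -1):
        candidate = "/".join(segments[:count])
        if candidate in routes:
            return candidate, segments[count:]
    return None


if __name__ == "__main__":
    routes = {"tracker"}  # hypothetical set of registered paths
    print(find_route(routes, "tracker/function.mysql-query"))
    print(find_route(routes, "tracker///function.mysql-query"))
    print(find_route(routes, "nothing/here"))
```

In this model, a page that ignores its extra arguments renders as usual whatever follows its registered path, and only a path with no registered prefix at all ends in a 404.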

Exactly. I tried that

Rhino - September 2, 2009 - 07:36

Exactly. I tried that fictional URL on my Drupal site(s) and it worked there as well. Not good.

 
 
