Problem with random crud URL gibberish actually working on taxonomy/term when it should not. Help!
How do I get some pages (which Drupal mysteriously generates without complaint, but at the wrong URLs) out of Google? This is similar to, but not the same as, "how do I get my site out of Google": http://drupal.org/node/245173
I have a bunch of mystery pages listed at Google. I have no idea where the URLs come from, and I have combed the site trying to find them without luck. There might be pages out on the web pointing to these URLs for some reason. The URLs look like this:
example.com/TOPIC/%3Cbr+/%3Ehttp:/www.linkedin.com/www.brandchannel.com/node/relaunched?page=160
example.com/TOPIC/node/%3Cbr+/node/:http:/www.ananova.com/news/story/sm_908920.html%22?page=1
example.com/TOPIC/www.myspace.com/www.myspace.com/node/%3Cbr+/modules.php?page=6&op=modload&name=Sections&file=index&req=viewarticle&artid=100
example.com/
Now this is the strange part: THEY WORK. They shouldn't. They bring up the "topic" page (a taxonomy term) when they shouldn't be doing that at all. How do I prevent this from happening? A 404 would quickly get Google sorted out, but for some mysterious reason this URL rubbish generates the taxonomy/term page just fine. I think I'm being penalized for duplicate content because of these URLs, and I need to get rid of them stat.
A similar problem: with pathauto, every node has a readable URL. Now things like this are showing up at Google:
example.com/topic/human-readable-headline?page=1
Where is page=1 coming from, and how do I make it stop? It goes to the exact same article as example.com/topic/human-readable-headline, so again, duplicated content in Google's eyes.
I use pathauto. I currently do not have anything in robots.txt.
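To at least see how many of the listed URLs are duplicates of the ?page= kind, something like the following Python sketch could group them. It is not a fix for Drupal itself. It assumes `page` is the only query parameter that should be ignored, and the names `strip_pager` and `group_duplicates` are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_pager(url, ignored=("page",)):
    """Return url with the query parameters named in `ignored` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored  # assumption: only the pager parameter is noise
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def group_duplicates(urls):
    """Map each stripped URL to the listed URLs that collapse onto it."""
    groups = {}
    for url in urls:
        groups.setdefault(strip_pager(url), []).append(url)
    # Keep only the groups where more than one listed URL shares a target.
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    listed = [
        "http://example.com/topic/human-readable-headline",
        "http://example.com/topic/human-readable-headline?page=1",
    ]
    for target, found in group_duplicates(listed).items():
        print(target, "<-", found)
```

Feeding it the URLs copied from a site: search would show which articles are listed more than once.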

Anyone? Bueller? Does anyone out there know why these URLs work when they clearly should not?
These people have posted about the same problem.
http://groups.drupal.org/node/14809
"&from=1289 and node?page= produces multiple pages and fictional pages" - this discussion is about Drupal 5; I'm on Drupal 6.
Also, it was filed as a bug report in Sept 2008:
http://drupal.org/node/307244
I am still riddled with fictional (but working) URLs.
Example;
DOMAIN/SECTION/node/%3Cbr+/%3Ehttp:/wistechnology.com/www.myspace.com/modules.php?page=7&op=modload&name=Search&file=index
DOMAIN/SECTION/node/<br+/>http:/node/v
The above examples show up on Google when searching for my site (on the first page, no less), and both simply lead to /section (as in "news") of the site.
How do I get rid of these URLs, which should not be working in the first place? And where the hell are they coming from?
The <br+> is a real mystery.
Yeah. No biggie
An explanation, although not a solution, if you find it a problem.
Your incoming URLs ending with <br /> come from inline text processing on the offsite site. Y'know how a word processor can automatically convert an address into a link, and will get it wrong if you end the link with a "."? Some WYSIWYG editor embedded a break where a newline should go, and that incorrectly became part of the URL.
.dan.
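To see that the gibberish really is an escaped break tag, the segment can be decoded with Python's standard library. This is only a sketch; it assumes the + stands for a space, as it does in form-style encoding, and `decode_segment` is a name invented here.

```python
from urllib.parse import unquote_plus


def decode_segment(segment):
    """Decode %XX escapes and treat '+' as a space (form-style encoding)."""
    return unquote_plus(segment)


if __name__ == "__main__":
    # Segment copied from the URLs quoted earlier in the thread.
    print(decode_segment("%3Cbr+/%3E"))  # prints: <br />
    print(decode_segment("TOPIC/%3Cbr+/%3Ehttp:/www.linkedin.com"))
```

The second call shows the break tag sitting directly in front of the next address, which fits the idea that a line break was glued onto a link by whatever produced the offsite page.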
I find it a problem.
example.com/node/IcanwritehwateverthehellIwanthere does the right thing and sends me to the 404 I should be getting
but
example.com/TOPIC/node/IcanwritehwateverthehellIwanthere does not. I think this is a problem.
Just found this fictional url in google
Hi Rhino,
Your post made me look closer at Google, and I found this fictional URL there. It works:
example.com/comment/924/function.mysql-query
I use Comment Page 5.x-1.1 to display URLs of comments.
Found here:
http://drupal.org/project/comment_page
But it's not specific to Comment Page. The tracker module also produces fictional URLs:
example.com/tracker//function.mysql-query
with an added //
and without:
example.com/tracker/function.mysql-query
And with three slashes it still works:
example.com/tracker///function.mysql-query
So it is somehow connected to Drupal core.
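One possible explanation, offered as a guess rather than a confirmed diagnosis: if the router falls back to the longest registered path that is a prefix of the request and hands the leftover pieces to the page as extra arguments, then all of the tracker URLs above would land on the tracker page. The Python sketch below is a simplified model of that idea, not Drupal's actual code. The route table is hypothetical, dropping empty segments is an assumption, and wildcard routes that validate their argument (such as a node ID lookup) are not modeled.

```python
def match_route(path, registered):
    """Return (matched_route, extra_args) using longest-prefix matching.

    Simplified model only: wildcard routes and access checks are ignored.
    """
    # Assumption: empty segments produced by repeated slashes are dropped.
    parts = [part for part in path.split("/") if part]
    # Try the full path first, then progressively shorter prefixes.
    for length in range(len(parts), 0, -1):
        candidate = "/".join(parts[:length])
        if candidate in registered:
            return candidate, parts[length:]
    return None, parts  # nothing matched: this would be the 404 case


if __name__ == "__main__":
    routes = {"tracker"}  # hypothetical route table for illustration
    for path in (
        "tracker/function.mysql-query",
        "tracker//function.mysql-query",
        "tracker///function.mysql-query",
        "nosuchpage/function.mysql-query",
    ):
        print(path, "->", match_route(path, routes))
```

Under this model a page that ignores its extra arguments would render normally for any trailing junk, and only a path with no registered prefix would get a 404.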
Even a turtle reaches its goal...
Exactly. I tried that fictional URL on my Drupal site(s) and it worked there as well. Not good.