Return 404: Not found on illegal node url

gerd riesselmann - October 23, 2006 - 09:10
Project:Drupal
Version:6.x-dev
Component:node system
Category:bug report
Priority:critical
Assigned:Unassigned
Status:closed
Description

I get a lot of Google requests like "/node/comment/reply/13". This URL is invalid (it should be "/comment/reply/13"), yet invoking it returns the main page and a status of 200: OK. This leads to duplicated content, which is penalized by Google and others.

I suggest to instead return a 404: Not Found, this is because the page requested was - well - not found, actually.

To achieve this, lines 2109p in node.module have to be changed from

<?php
>
drupal_set_title('');
return
node_page_default();
?>

to

<?php
drupal_not_found
();
return;
?>

#1

gerd riesselmann - October 23, 2006 - 20:00

Silly me, the case of /node needs to be handled. Thats what the code should look like:

<?php
   
case '':
     
drupal_set_title('');
      return
node_page_default();
    default:
     
drupal_not_found();
      return;
?>

#2

owahab - December 6, 2006 - 13:01

I had this issue too.
I think the way drupal handles the second argument (node/$arg) is confusing for me, I tried fixing this in the node.module file but it's not an easy job, at least for me.
Dozens of dependancies I can't foresee.

#3

gerd riesselmann - December 6, 2006 - 22:36
Status:active» patch (code needs review)
AttachmentSize
node.module.404.4-7.patch333 bytes

#4

killes@www.drop.org - December 13, 2006 - 23:35
Version:4.7.4» 5.x-dev

looks good, moving to HEAD for more reviews.

#5

m3avrck - December 14, 2006 - 02:01

Would like to see this backported to 4.7 as well--- once it gets committed.

Code looks good and makes sense to me.

#6

ChrisKennedy - December 14, 2006 - 09:17
Version:5.x-dev» 6.x-dev
Category:bug report» feature request
Priority:critical» normal

#7

m3avrck - December 23, 2006 - 01:28
Version:6.x-dev» 5.x-dev
Category:feature request» bug report
Status:patch (code needs review)» patch (code needs work)

This is a bug, if a page is not found, it should not redirect to another page claiming it still exists :-p

This should go into 5 and be backported to 4.7. However, this patch doesn't apply to HEAD because node_page() is gone now.

#8

m3avrck - January 6, 2007 - 02:39
Version:5.x-dev» 6.x-dev
Component:node system» menu system
Priority:normal» critical

IMO, this is indeed a critical bug. It would be trivial to create a spamming solution using this. Easy enough to ruin a sites reputation in a search engine too.

Bumping this to 6 though, this would not be a trivial patch. This would be great to incorporate into the thinking going behind the new menu system proposed for 6 too.

#9

gerd riesselmann - January 18, 2007 - 21:45

I don't know about 5 and 6, but in the meantime: can somebody just apply the existing patch to the 4.7 branch? ;-)

#10

moshe weitzman - January 22, 2007 - 05:05

i proposed a more generic solution for this at http://drupal.org/node/74347. i think this is a duplicate of that one, but i'll wait for someone else to change status.

#11

gerd riesselmann - January 22, 2007 - 18:12
Version:6.x-dev» 4.7.x-dev
Component:menu system» node system
Status:patch (code needs work)» patch (reviewed & tested by the community)

Yes, your patch handles the same problem, but I doubt it will be incorporated in the foreseeable future, since it proposes a rather big change. At least it won't be incorporated into 4.7.

Since everybody commenting here actually

  • acknowledged the current behavior is a bug and
  • my patch does solve this for the 4.7 branch,

I'd like to ask again for someone to apply it to this branch - please.

#12

moshe weitzman - January 24, 2007 - 15:57
Version:4.7.x-dev» 6.x-dev

i think this is a pretty good fix. this has to be applied to HEAD before it can be backported. moving ...

#13

gerd riesselmann - January 24, 2007 - 17:21

As m3avrck pointed out in #7, this patch cannot be applied to HEAD, since HEAD menu system works different in many ways. So what's the sense of porting this patch to HEAD, which means to completely rewrite it, just to port it back afterwards?

My point of view is that this patch solves a 4.7-specific critical bug in a 4.7-specific way. If there's are similar unsolved problem in HEAD or 5.0, so it may be. But this cannot be an argument to not fix the original bug!

#14

gerd riesselmann - January 24, 2007 - 17:34

I've investigated 5.0 and the patch does not apply to this, either, since a URL like http://drupal.org/node/ispamdrupalorg gets routed to node_default_page() without further argument checking. URLs with patter node/id however are mapped to node_page_view().

In 4.7 node_page() does handle both cases and does internal argument checking - and that is where my patch comes in: It modifies the argument checking case statement.

Therefore, for 5 or HEAD a completely different solution needs to be found.

#15

gerd riesselmann - January 24, 2007 - 18:09

So, to end moving responsibilities around, here is a proposal to fix this for 5.0. In HEAD however, menu system has changed again and seems to be in the flow, so I second the patch proposed by moshe in #10.

I suggest somebody applies the 4.7-fix, and afterwards we discuss the 5.0 patch ;-). Agree?

AttachmentSize
node404.5.0.patch633 bytes

#16

m3avrck - January 24, 2007 - 18:49

I think this might be a better general fix:

http://drupal.org/node/74347

So this might be a duplicate of that as Moshe pointed out above.

#17

moshe weitzman - January 24, 2007 - 19:03
Version:6.x-dev» 5.x-dev

seems that i've made this too complicated. lets fix DRUPAL-5 with ted's patch and then 4.7 with gerd's patch. reassigning accordingly.

#18

BioALIEN - January 25, 2007 - 12:30

I agree with Moshe, lets get this applied for 4.7.x and 5.x and then concentrate on D6 maybe using the new menu system. Drupal is widely used for its good SEO features. This patch addresses a major bug with regards to SEO. I've seen Google and Yahoo! bots send out test checks on some of our client sites in the form of google+hash.html so they do carry out 404 tests on websites.

+1 to getting this committed asap.

#19

m3avrck - January 25, 2007 - 16:29

I agree with the said comments :-)

However, I didn't write a patch yet :-p

Moshe, can you clarify *which* comments have the correct patches to be committed? I'm sure one of the core committers would ask this question, so might as well ask it for them :-p

#20

moshe weitzman - January 25, 2007 - 17:19

oops again. gerd's most recent patches are the ones to go with. one for 5.0 and one for 4.7

#21

dww - January 25, 2007 - 19:20
Status:patch (reviewed & tested by the community)» patch (code needs work)

gah, hate to do this, but http://drupal.org/files/issues/node404.5.0.patch from #15 contains drupalnot_found(). someone should re-roll that.

http://drupal.org/files/issues/node.module.404.4-7.patch is indeed RTBC.

oh, and +1 for fixing bugs in older versions, even if HEAD is in flux or bugs aren't present in newer versions. ;)

#22

dww - January 25, 2007 - 19:24
Status:patch (code needs work)» patch (reviewed & tested by the community)

here you go. ;)

to repeat, http://drupal.org/files/issues/node.module.404.4-7.patch from #3 is also RTBC for 4.7.x.

cheers,
-derek

AttachmentSize
node404.5.0.patch_1.txt881 bytes

#23

dww - January 26, 2007 - 03:14
Status:patch (reviewed & tested by the community)» patch (code needs work)

alas, whoops. previous patch (and therefore, my latest for 5.x) had syntax errors, too. hehe, guess i was a little hasty with that RTBC. ;)

#24

dww - January 26, 2007 - 03:34
Status:patch (code needs work)» patch (code needs review)

i couldn't get empty() to play nicely with arg(). this seemed like the simplest approach. another pair of eyes, please. ;)

AttachmentSize
node404.5.0.patch_2.txt880 bytes

#25

m3avrck - January 26, 2007 - 14:38
Status:patch (code needs review)» patch (code needs work)

I'm not clear where this $validateargs comes from -- maybe a note about that in a comment? Should it be called $validate_args for clarity as well? Lowercase "true" should be "TRUE".

Wouldn't !isset(arg(1)) be faster than != as well?

#26

dww - January 27, 2007 - 00:07

FYI: i was just making a quick pass through the RTBC queue, and came upon this. i don't personally care much, and i have no time to work on it, which is why it's not Assigned to dww. ;) someone who cares will have to pick up the torch here. just wanted to make sure i hadn't created any false expectations...

cheers,
-derek

#27

gerd riesselmann - February 12, 2007 - 09:08
Version:5.x-dev» 4.7.x-dev
Status:patch (code needs work)» patch (reviewed & tested by the community)

Since everyone agrees that 4.7-patch is RTBC, I marked this bug accordingly. I suggest to set it to 5.x.dev and status "code needs review" after the 4.7-patch has been applied.

#28

RobRoy - February 12, 2007 - 09:11
Version:4.7.x-dev» 6.x-dev
Status:patch (reviewed & tested by the community)» patch (code needs review)

Should go on HEAD first and then be backported. Marking back to review to make sure this applies cleanly to HEAD.

#29

gerd riesselmann - February 12, 2007 - 09:32
Version:6.x-dev» 4.7.x-dev
Status:patch (code needs review)» patch (reviewed & tested by the community)

Aargh! Please do not start moving responsibilities around, again! There is a solution of a critical bug for 4.7 around for 4 months, and it doesn't get applied because everyones wants it to be ported to higher versions first. This is not good.

Rob, please read a thread before changing status: porting this to 5.0, or HEAD means complete rewriting, backporting therefore obviously means rewriting the rewritten - or applying the actual patch, that everyone already agrees on.

#30

chx - February 12, 2007 - 09:49
Status:patch (reviewed & tested by the community)» patch (code needs review)

Who is the everyone that agreed this is RTBC? The code is downright ugly. Why the attached patch is not enough?

AttachmentSize
node_64.patch852 bytes

#31

BioALIEN - February 12, 2007 - 10:37

I didn't have enough time to test chx's approach but once again,

lets get this applied for 4.7.x and 5.x and then concentrate on D6 maybe using the new menu system.

#32

gerd riesselmann - February 12, 2007 - 10:40
Status:patch (code needs review)» patch (reviewed & tested by the community)

Oooookay. Let's get things straight.

  1. There is a patch for 4.7 given in #3. This patch is ready to be committed
  2. There are several patches available for 5.0, which need to be reviewed
  3. There is a patch by chx that I don't know what version it applies to (6.4?) - but it is sparkling and shiny. This patch needs to be reviewed, too

However, since menu system has changed from 4.7 to 5.0 patches to either version are completely different. This means that talking about "backporting" is downright nonsense, since "porting" in this special case means "rewriting".

Now, since 5.0 and HEAD still need to be discussed but 4.7 is ready, and unfortunately the bug reporting system doesn't allow us to express this situation properly, I proposed in #27 to mark this bug as RTBC for 4.7, wait till it gets committed, and afterwards mark it as 5.0, "code needs review".

Can we agree on this, please? I fear this bug else will never get fixed for 4.7 but endlessly discussed.

Or maybe two bug should be filed, one for 4.7 and one for 5.0?

#33

BioALIEN - February 13, 2007 - 10:48

Someone please commit patch in #3 to 4.7.x branch. Then we'll move on to the 5.x discussions.

#34

killes@www.drop.org - March 8, 2007 - 08:11
Version:4.7.x-dev» 6.x-dev

gerd, that's not the workflow we follow. We follow the sequence HEAD, 5, 4.7 in order to avoid inconsistencies.

#35

Dries - March 8, 2007 - 19:41

When I try to access 'http://localhost/drupal-cvs/node/comment/reply/13' as per the original bug report, I get an access denied message. Could it be that this is no longer an issue for CVS HEAD?

Update: I also get an SQL error, I'm afraid. Not sure if that's related to the 'access denied' but I'm guessing it might be. We'll want to fix the 'access denied' message in CVS HEAD before we can properly test this. :/

#36

Wesley Tanaka - March 15, 2007 - 11:14

"Yes Please" (subscribing)

I've found a bunch of hits from googlebot of the form:

/node/?page=&

because the default 'node' page has a pager at the bottom......

#37

Wesley Tanaka - March 15, 2007 - 11:41

Err, that was supposed to say:
/node/[some string]?page=[large number]&[maybe other stuff]

This bug appears to be a duplicate of http://drupal.org/node/82764

To clarify the 5.x status:

#38

Wesley Tanaka - March 15, 2007 - 12:07

Sorry. http://drupal.org/files/issues/node-not-found.2.diff doesn't work on non-numeric node paths.

So the only patch which works for me on 5.1 is http://drupal.org/files/issues/node404.5.0.patch_2.txt

(and to answer the question in #24, you could assign the return from arg() to a variable and test that variable with empty())

#39

matt_paz - April 12, 2007 - 22:42

subscribing

#40

Dries - April 13, 2007 - 08:51

This is a bit unwieldy. So, what is considered to be the best patch for CVS HEAD? :)

#41

fractile81 - April 13, 2007 - 14:54

http://drupal.org/files/issues/node404.5.0.patch_2.txt

This patch appears to work for me as well. It appears to handle this problem on my site correctly (5.1, had to apply by hand and move the changes to line 2372). Will keep testing it, of course.

#42

kbahey - April 15, 2007 - 21:35

As per Dries in #35, I can't reproduce the original issue with today's HEAD.

If you type ?q=node/something, you get an access denied, plus the following errors:

    * warning: Invalid argument supplied for foreach() in /drupal/HEAD/drupal/modules/node/node.module on line 567.
    * notice: Undefined variable: cond in /drupal/HEAD/drupal/modules/node/node.module on line 571.
    * warning: implode() [function.implode]: Bad arguments. in /drupal/HEAD/drupal/modules/node/node.module on line 571.
    * user warning: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 query: SELECT n.nid, n.vid, n.type, n.status, n.created, n.changed, n.comment, n.promote, n.sticky, r.timestamp AS revision_timestamp, r.title, r.body, r.teaser, r.log, r.format, u.uid, u.name, u.picture, u.data FROM node n INNER JOIN users u ON u.uid = n.uid INNER JOIN node_revisions r ON r.vid = n.vid WHERE in /drupal/HEAD/drupal/includes/database.mysqli.inc on line 151.

If you type a URL that references a non existent node such as q=comment/reply/123456, you get a clean access denied.

If you type an invalid URL such as q=node/comment/reply/123456, you get the access denied plus the above error.

So, what is the problem we are trying to solve? Doesn't exist anymore?

#43

drumm - April 18, 2007 - 04:53

I went ahead and reviewed and committed #30 to Drupal 5 at chx's advice. In the future, I think multiple issues for different versions of Drupal with significantly different patches is okay.

#44

chx - April 18, 2007 - 06:01
Version:6.x-dev» 4.7.x-dev

Gerhard, #3 is OK and HEAD needs a totally different patch (menu.inc level, I am thinking allowing callback arguments to be FALSE not an array in cases like this).

#45

Gerhard Killesreiter - April 29, 2007 - 21:59
Version:4.7.x-dev» 6.x-dev
Status:patch (reviewed & tested by the community)» active

Applied #3 to 4.7.
moving back to HEAD

#46

m3avrck - July 2, 2007 - 15:55

Where's the 6.x patch? Fixed in 4.7, 5.x but not 6.x? *doh*

#47

xqus - July 8, 2007 - 15:50

As far as I can see this is not an issue in 6.x.

#48

chx - August 29, 2007 - 11:19
Status:active» fixed

indeed. if node/dfghjk is not defined that any path beginning with that will come back with 404 thanks to node_load returning FALSE thus translate returning FALSE thus menu_get_item returning FALSE

#49

Anonymous - September 12, 2007 - 11:21
Status:fixed» closed
 
 

Drupal is a registered trademark of Dries Buytaert.