Problem with 404 when PHP is CGI / Rewrite rules

guitarmiami - May 20, 2006 - 16:55
Project: Drupal
Version: 4.7.0
Component: base system
Category: support request
Priority: critical
Assigned: Unassigned
Status: closed
Description

When running Drupal on a server with PHP as CGI you have to change line 288(?) in /includes/common.inc from

drupal_set_header('HTTP/1.0 404 Not Found');

to

drupal_set_header('Status: 404 Not Found');

Otherwise Drupal will not send the correct 404 Not Found header. This affects popular hosts that run PHP as CGI, such as Site5 and Bluehost. More information can be found here: http://us3.php.net/header
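For context, the CGI specification (RFC 3875) has a script report its result through a `Status:` header field, and the web server builds the actual HTTP status line from it. The Python sketch below only illustrates that convention; it is not Drupal or PHP code, and the function name `status_line_from_cgi` and the sample header lists are made up for this example.

```python
def status_line_from_cgi(headers, protocol="HTTP/1.1"):
    """Build the HTTP status line a server would send for CGI output.

    `headers` is a list of (name, value) pairs printed by the script.
    Hypothetical helper for illustration; real servers do much more.
    """
    for name, value in headers:
        # Header field names are case-insensitive.
        if name.lower() == "status":
            return f"{protocol} {value.strip()}"
    # Without a Status field, a CGI document response defaults to 200 OK.
    return f"{protocol} 200 OK"


# Sample script output, invented for this sketch.
with_status = [("Status", "404 Not Found"), ("Content-Type", "text/html")]
without_status = [("Content-Type", "text/html")]

print(status_line_from_cgi(with_status))
print(status_line_from_cgi(without_status))
```

This does not reproduce what PHP does internally with an `HTTP/1.0 ...` line under CGI; it only shows why the `Status:` form is the one a CGI server is guaranteed to understand.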

Also, the rewrite rules included in Drupal 4.7 do not work properly. When I uncommented the rules that redirect from the non-www to the www form of the domain name, a request for a page like this:
http:// my_site.com/asdf (a nonexistent page)
was simply redirected to the home page with the www, like this:
http:// www. my_site.com

[spaces inserted to prevent linking]

This is bad for search engines: it can cause serious problems and send your site into the supplemental results. I'm speaking from experience with a badly configured server that did exactly this, redirecting 404 errors to the home page instead of sending correct 404 errors.

This is the faulty default version:

  # If you want the site to be accessed WITH the www. only, adapt and uncomment the following:
  RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
  RewriteRule .* http://www.example.com/ [L,R=301]

I don't understand rewrite rules well, but the following version is the one that I use, and it prevents the problem mentioned above. It redirects something like http:// example.com/asdf to http:// www. example.com/asdf, leading the visitor either to the correct page (instead of the home page) or to a correct 404 Not Found error:

# Redirect to www
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

If that doesn't make sense let me know and I'll try to explain it better.
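
The difference between the two rule sets can be modelled in a few lines of Python. This is a rough sketch, not a mod_rewrite implementation: the `rewrite` function is invented for illustration, flags such as [NC] and [L,R=301] are simplified away (host matching is always case-insensitive here), and it assumes the .htaccess behaviour of matching the path without its leading slash.

```python
import re


def rewrite(host, path, cond, pattern, substitution):
    """Return the redirect target, or None if the rule does not apply.

    Rough model of one RewriteCond on %{HTTP_HOST} plus one RewriteRule.
    """
    negate = cond.startswith("!")
    host_matches = re.search(cond.lstrip("!"), host, re.IGNORECASE) is not None
    if host_matches == negate:
        return None  # condition failed, so the rule is skipped
    # In .htaccess context the leading slash is stripped before matching.
    match = re.search(pattern, path.lstrip("/"))
    if match is None:
        return None
    # Expand $1 with the first captured group, or "" if there is none.
    first_group = match.group(1) if match.re.groups >= 1 else ""
    return substitution.replace("$1", first_group or "")


# Default 4.7 rule: the requested path is thrown away.
default = rewrite("example.com", "/asdf",
                  r"!^www\.example\.com$", r".*", "http://www.example.com/")

# Rule from this post: the captured path is appended via $1.
fixed = rewrite("example.com", "/asdf",
                r"^example.com", r"(.*)", "http://www.example.com/$1")

print(default)
print(fixed)
```

With the default rule every non-www request ends up at the home page, while the second rule carries the requested path over to the www host, where Drupal can answer it or return a 404.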

#1

bradlis7 - May 21, 2006 - 04:25
Title: Two Drupal bugs and fixes » Problem with 404 when PHP is CGI

I think I have this problem. Google won't stop hitting my xtracker page, even though I took out the module.

#2

bradlis7 - May 21, 2006 - 04:41

Another way to avoid having to edit code is to add it to a custom page, and use it as the default 404 in admin->settings.

Does anyone know how to get the original URL when you get to the 404 page? If I print out $_GET['q'], it gives me /node/47, the 404 node, instead of /xtracker, which is what I want. I was trying to do a redirect to /tracker, but this makes it complicated.

#3

tenrapid - May 21, 2006 - 11:42

The original value of 'q' is saved in $_REQUEST['destination'].

#4

guitarmiami - May 21, 2006 - 22:34

"Another way to avoid having to edit code is to add it to a custom page, and use it as the default 404 in admin->settings."

Will that work? I tried it, and it takes my 404 page and 'themes' it so that it sits within the overall Drupal theme. Doesn't the 404 header have to be the first thing sent?

#5

guitarmiami - May 21, 2006 - 22:52

"I think I have this problem. Google won't stop hitting my xtracker page, even though I took out the module."

To find out, use Firefox with the Live HTTP Headers extension. After installing the extension, restart the browser. Use Alt-l (lowercase letter L) to open Live HTTP Headers in the sidebar. Then load a nonexistent URL from your site in the browser. If the header in the sidebar says something like 'HTTP 1.x 404 OK', then you have this problem. It should say '404 Not Found', not '404 OK'.
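
The same check can be scripted instead of done in the browser. The following is a minimal sketch using Python's standard `http.client`; the helper names `fetch_status` and `diagnose` are invented here, and it assumes the site is reachable over plain HTTP/1.x.

```python
import http.client


def fetch_status(host, path, timeout=10):
    """Request `path` from `host` over plain HTTP; return (code, reason)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()


def diagnose(code, reason):
    """Classify the status line returned for a page that should not exist."""
    if code != 404:
        return f"unexpected status {code} {reason}"
    if reason.strip().lower() != "not found":
        return f"404 sent with wrong reason phrase: {reason!r}"
    return "correct 404 Not Found"


# The classifier can be tried without any network access:
print(diagnose(404, "OK"))
print(diagnose(404, "Not Found"))
```

To test a real site, something like `diagnose(*fetch_status("www.example.com", "/asdf"))` should work, with the host and path replaced by your own domain and a URL that does not exist.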

I think the .htaccess problem that I mentioned above is a pretty serious bug. If someone activates those rewrite rules and gets the wrong URL spidered, it could wreck their search engine rankings. When my server did something like this to me (due to bad server configuration by the hosting company), it killed my rankings in Google and MSN Search on that particular site. With the new Drupal 4.7 rewrite rules, instead of sending a 404 error, the server tells search engines that your nonexistent page has moved to your home page and that everything is OK. Search engines don't seem to be able to handle that. I hope someone will take a look at it and fix the .htaccess file. Is this the right place to be reporting these kinds of bugs?

Example: visit http:// www. drupal.org/asdf (a nonexistent page; I added spaces to prevent creating a link).
It redirects to http:// drupal.org/asdf (no www), which then sends a 404 error. That is the best way. If this site were using the 4.7 .htaccess file, it would send the browser to the home page and say everything is OK. Does that make sense?

#6

bradlis7 - May 22, 2006 - 18:59

Yep, it works. I checked it using my Web Developer extension.

I also checked another install of Drupal, and it gave me 404 OK, so I know that the change made the difference.

#7

guitarmiami - May 22, 2006 - 19:40

The Web Developer Toolbar does everything. I use the toolbar all the time but didn't know it could check headers so I was using the LiveHTTPheaders extension.

Thanks for checking. I'll try the 404 page again. Where did you put the 404 page? In the root directory? 404.html or something like that?

#8

bradlis7 - May 22, 2006 - 23:59

No, I created a drupal node, and set it to the default 404 page in administer->settings. And you can look at header information on the toolbar at Information->View Response Headers.

#9

guitarmiami - May 23, 2006 - 00:12

That must have been my mistake — I created a 404 html page.
:S

#10

guitarmiami - May 27, 2006 - 16:44

bradlis7, were you getting "404 Ok" header errors before you added the 404 page through the Drupal admin? I tried the Drupal admin settings and still get the "404 Ok" error. I think if you are running PHP as CGI you must use the following syntax in PHP, which means changing common.inc:

drupal_set_header('Status: 404 Not Found');

#11

magico - September 16, 2006 - 17:50
Category: bug report » support request

I run Drupal on a server with lighttpd and FastCGI, and 404s work fine without needing to change that line.

@guitarmiami: any news?

#12

guitarmiami - September 22, 2006 - 16:40

I still have to make the change to common.inc on all of my sites to avoid the "404 Ok" header problem.

http://tips.webdesign10.com/drupal-seo-404-ok-and-htaccess

#13

magico - January 15, 2007 - 19:52
Priority: normal » critical

Trying to get some attention for this, so a senior developer can make a decision.

#14

guitarmiami - January 20, 2007 - 20:44

I've mentioned it a few times but never got a reply.

I've also recently pointed out a similar "301 Ok" error on my blog post here:
http://tips.webdesign10.com/drupal-seo-404-ok-and-htaccess

Another user there mentions a "403 Ok" error.

#15

guitarmiami - June 3, 2007 - 01:29

I just upgraded a site to Drupal 5.1 (from 4.7.3) and it looks like the common.inc file was updated to have this line:

drupal_set_header('Status: 404 Not Found');

... but I'm getting the 404 OK problem still.

Any ideas?

#16

mdlueck - July 10, 2007 - 18:18
Title: Problem with 404 when PHP is CGI » Problem with 404 when PHP is CGI / Rewrite rules

About your suggestion concerning the URL rewrite rules in .htaccess:

I added $1 to the stock Drupal 5.1 .htaccess file, and that alone is enough to keep the rest of the URL from being dropped when the URL is rewritten. I noticed that bad behavior too. I will open a bug report against it suggesting simply adding the $1.

So my working solution is as follows:

# If you want the site to be accessed WITH the www. only, adapt and
# uncomment the following:
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule .* http://www.example.com/$1 [L,R=301]

#17

mdlueck - July 10, 2007 - 18:50

Well, my proposed fix did not work at one domain / hosting provider, but your proposed solution does work.

I created and updated a bug report, which is as follows:

http://drupal.org/node/158224

#18

ScoutBaker - February 11, 2008 - 21:46
Status: active » fixed

The fix for this is documented in http://drupal.org/node/109150 and committed to D5 and D6.

#19

Anonymous (not verified) - February 25, 2008 - 21:51
Status: fixed » closed

Automatically closed -- issue fixed for two weeks with no activity.
