When running Drupal on a server with PHP as CGI, you have to change line 288(?) in /includes/common.inc from

drupal_set_header('HTTP/1.0 404 Not Found');

to

drupal_set_header('Status: 404 Not Found');

Otherwise Drupal will not send the correct 404 Not Found header. This applies to popular hosts that run PHP as CGI, such as Site5 and Bluehost. More information can be found here: http://us3.php.net/header

Also, the rewrite rules included in Drupal 4.7 are not good. When I uncommented the rules that redirect from the non-www to the www form of the domain name, they did not work properly. If you requested a page like this:
http:// my_site.com/asdf (a nonexistent page)
it would just redirect to the home page with the www, like this:
http:// www. my_site.com

[spaces inserted to prevent linking]

This is bad for search engines and can push your site into the supplemental results. I speak from experience: a badly configured server did exactly this to me, redirecting 404 errors to the home page instead of sending correct 404 errors.

This is the badly-working default version:

  # If you want the site to be accessed WITH the www. only, adapt and uncomment the following:
  RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
  RewriteRule .* http://www.example.com/ [L,R=301]

I don't understand rewrite rules well, but the following version is the one I use, and it prevents the problem described above. It redirects something like http:// example.com/asdf to http:// www. example.com/asdf, so the visitor lands either on the correct page (instead of the home page) or on a correct 404 Not Found error:

# Redirect to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

If that doesn't make sense let me know and I'll try to explain it better.
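
To make the difference between the two rules concrete, here is a minimal Python sketch that models how a single RewriteRule builds its redirect target. It is only a model under stated assumptions, not mod_rewrite itself: the function name apply_rewrite_rule is hypothetical, the path is given without its leading slash (which is what a rule in .htaccess is matched against), and a $N that refers to a group the pattern never captured is assumed to expand to an empty string.

```python
import re


def apply_rewrite_rule(pattern, substitution, path):
    """Model one RewriteRule: return the redirect target, or None if no match.

    `path` is the requested path without its leading slash.
    """
    match = re.match(pattern, path)
    if match is None:
        return None

    def expand(ref):
        index = int(ref.group(1))
        # Assumption: a backreference to a group that was never captured
        # expands to an empty string.
        if index > match.re.groups:
            return ""
        return match.group(index) or ""

    # Replace every $N in the substitution with the text of group N.
    return re.sub(r"\$(\d)", expand, substitution)


requested = "asdf"

# Default 4.7 rule: no capture group and no $1, so the path is dropped.
default_target = apply_rewrite_rule(r".*", "http://www.example.com/", requested)

# Rule from this post: (.*) captures the path and $1 appends it again.
fixed_target = apply_rewrite_rule(r"(.*)", "http://www.example.com/$1", requested)

print(default_target)  # http://www.example.com/
print(fixed_target)    # http://www.example.com/asdf
```

With the default rule, every request on the non-www host ends up at the home page, which is the behavior described above. With the capturing version, the nonexistent path is carried over to the www host, which can then answer it with a 404.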

Comments

bradlis7’s picture

Title: Two Drupal bugs and fixes » Problem with 404 when PHP is CGI

I think I have this problem. Google won't stop hitting my xtracker page, even though I took out the module.

bradlis7’s picture

Another way to avoid having to edit code is to add it to a custom page, and use it as the default 404 in admin->settings.

Does anyone know how to get the original URL when you get to the 404 page? If I print out $_GET['q'], it gives me /node/47, the 404 node, instead of /xtracker, which is what I want. I was trying to do a redirect to /tracker, but this makes it complicated.

tenrapid’s picture

The original value of 'q' is saved in $_REQUEST['destination'].

Z2222’s picture

"Another way to avoid having to edit code is to add it to a custom page, and use it as the default 404 in admin->settings."

Will that work? I tried it and it takes my 404 page and 'themes' it so that it is within the overall Drupal theme. Doesn't the 404 header have to be the first thing sent?

Z2222’s picture

"I think I have this problem. Google won't stop hitting my xtracker page, even though I took out the module."

To find out, use Firefox with the Live HTTP Headers extension. After installing the extension, restart the browser. Use Alt-l (letter L, lowercase) to open LiveHTTPheaders in the sidebar. Then load a nonexistent URL from your site in the browser. If the header in the sidebar says something like 'HTTP/1.x 404 OK' then you have this problem. It should say '404 Not Found', not '404 OK'.
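
The same check can be scripted without a browser. The following is a minimal Python sketch using the standard http.client module; the function names fetch_status and diagnose_404 and the result strings are my own and purely illustrative. fetch_status does not follow redirects, so a 301 to the home page shows up as a 301 rather than as the page it leads to.

```python
import http.client


def fetch_status(host, path, port=80):
    """Request `path` from `host` and return (status_code, reason_phrase)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()


def diagnose_404(status, reason):
    """Classify the status line returned for a URL that should not exist."""
    if status != 404:
        return "not a 404 response"
    if reason.strip().lower() == "not found":
        return "correct"
    # For example '404 OK', the symptom described in this thread.
    return "mismatched reason phrase"


# Offline demonstration with the status lines discussed in this thread.
print(diagnose_404(404, "Not Found"))          # correct
print(diagnose_404(404, "OK"))                 # mismatched reason phrase
print(diagnose_404(301, "Moved Permanently"))  # not a 404 response
```

For a live site, the two values would come from something like fetch_status("example.com", "/asdf"), with the host and path replaced by your own.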

I think the .htaccess problem that I mentioned above is a pretty serious bug. If someone activates those rewrite rules and gets the wrong URL spidered, it could wreck their search engine rankings. When my server did something like this to me (due to bad server configuration by the hosting company), it killed my rankings in Google and MSN Search on that particular site. With the new Drupal 4.7 rewrite rules, instead of sending a 404 error, the server tells search engines that your nonexistent page has moved to your home page and that everything is OK. Search engines don't seem to be able to handle that. I hope someone will take a look at it and fix the .htaccess file. Is this the right place to be reporting these kinds of bugs?

Example: visit http:// www. drupal.org/asdf (a nonexistent page. I added spaces to prevent creating a link.)
It redirects to http:// drupal.org/asdf (no www), which then sends a 404 error. That is the best way. If this site were using the 4.7 .htaccess file, it would send the browser to the home page and say everything is OK. Does that make sense?

bradlis7’s picture

Yep, it works. I checked it using my Web Developer extension.

I also checked another install of Drupal, and it gave me 404 OK, so I know that that made the difference.

Z2222’s picture

The Web Developer Toolbar does everything. I use the toolbar all the time but didn't know it could check headers so I was using the LiveHTTPheaders extension.

Thanks for checking. I'll try the 404 page again. Where did you put the 404 page? In the root directory? 404.html or something like that?

bradlis7’s picture

No, I created a drupal node, and set it to the default 404 page in administer->settings. And you can look at header information on the toolbar at Information->View Response Headers.

Z2222’s picture

That must have been my mistake: I created a 404 HTML page.
:S

Z2222’s picture

bradlis7, were you getting "404 Ok" header errors before you added the 404 page through the Drupal admin? I tried the Drupal admin settings and still get the "404 Ok" error. I think if you are running PHP as CGI you must use the following syntax in PHP, which means changing common.inc:

drupal_set_header('Status: 404 Not Found');

magico’s picture

Category: bug » support

I run Drupal on a server with lighttpd and FastCGI, and 404s work fine without any need to change that line.

@guitarmiami: any news?

Z2222’s picture

I still have to make the change to common.inc on all of my sites to avoid the "404 Ok" header problem.

http://tips.webdesign10.com/drupal-seo-404-ok-and-htaccess

magico’s picture

Priority: Normal » Critical

Trying to get some attention for this, so a senior developer can make a decision.

Z2222’s picture

I've mentioned it a few times but never got a reply.

I've also recently pointed out a similar "301 Ok" error on my blog post here:
http://tips.webdesign10.com/drupal-seo-404-ok-and-htaccess

Another user there mentions a "403 Ok" error.

Z2222’s picture

I just upgraded a site to Drupal 5.1 (from 4.7.3) and it looks like the common.inc file was updated to have this line:

drupal_set_header('Status: 404 Not Found');

... but I'm getting the 404 OK problem still.

Any ideas?

mdlueck’s picture

Title: Problem with 404 when PHP is CGI » Problem with 404 when PHP is CGI / Rewrite rules

About your suggestion concerning the URL rewrite rules in .htaccess:

I added $1 to the stock Drupal 5.1 .htaccess file, and that is enough tinkering to prevent the elimination of the rest of the URL when the URL is rewritten. I noticed that bad behavior too. I will open a bug report against it suggesting simply adding the $1.

So, my working solution is as follows:

# If you want the site to be accessed WITH the www. only, adapt and
# uncomment the following:
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule .* http://www.example.com/$1 [L,R=301]

mdlueck’s picture

Well, my proposed fix did not work at one domain / hosting provider, but your proposed solution does work.

I created and updated a bug report, which is as follows:

http://drupal.org/node/158224
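
One plausible explanation for why adding $1 alone was not enough: the pattern .* contains no parentheses, so there is no first captured group for $1 to refer to. This is an assumption on my part, not something confirmed in this thread. The short Python sketch below uses the re module as a stand-in for the regex engine and only shows how many groups each pattern captures; the variable names are illustrative.

```python
import re

# Pattern from the stock rule: no parentheses, so nothing is captured.
stock_pattern = re.compile(r".*")

# Pattern from the original post: the parentheses capture the whole path.
capturing_pattern = re.compile(r"(.*)")

path = "asdf"

print(stock_pattern.groups)                    # 0
print(capturing_pattern.groups)                # 1
print(capturing_pattern.match(path).group(1))  # asdf
```

If that is what happened, then changing .* to (.*) in the rule above, as in the version from the original post, is what makes $1 carry the requested path.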

scoutbaker’s picture

Status: Active » Fixed

The fix for this is documented in http://drupal.org/node/109150 and committed to D5 and D6.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.