I thought I knew how Drupal's "clean" URLs interacted with Apache's mod_rewrite and the file system. The logical flow seems obvious after observing the following code in .htaccess:

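# Rewrite only if the request does not resolve to an existing regular file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and does not resolve to an existing directory.
RewriteCond %{REQUEST_FILENAME} !-d
# Pass the path to Drupal as index.php?q=<path>; QSA keeps the original query string.
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]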

The Apache 1.3 documentation for RewriteCond says:

'-d' (is directory) Treats the TestString as a pathname and tests if it exists and is a directory.
'-f' (is regular file) Treats the TestString as a pathname and tests if it exists and is a regular file.

The code checks for a file or directory that matches the request and, if neither is found, rewrites the clean URL such that Drupal can handle it. But all is not as it seems. I expected RewriteCond %{REQUEST_FILENAME} !-f to require a complete match between the file system and the URL before failing.

Here is the caveat:
Apache will not rewrite a clean URL if it finds a file at the base of the Drupal installation whose name matches everything up to the first slash (/) in the request. As an added twist, Apache does the same if the matching file name also carries an extension such as .txt or .diff, and perhaps some other common extensions.

Here is how to demonstrate the quirk:

  1. Create an empty file called node.txt at the base of the Drupal installation.
  2. Browse to the Drupal installation via a clean URL such as node/1.
  3. Notice the 404 error response from Apache (not drupal_not_found()).

Because this is a quirk in Apache, I don't expect the Drupal team can fix this. Would documentation be appropriate?

Nic

Comments

EffieRover

I've hit that before in non-Drupal situations. My random testing showed that mod_rewrite treats everything up to a '.' as a directory name, then tests the filename properly. The front end of the filename is essentially subject to both sets of rules, regardless of what the Apache docs say.

Sucks. No, I don't have a fix.

njivy

It looks to be a bit more complicated than that in two respects.

1. RewriteCond %{REQUEST_FILENAME} !-f fails when a file at the base of the Drupal installation partially matches the request.

For example, a request for node/5 will return a 404 error if there is a file called node. A directory of the same name does not yield a 404, however.

2. Not all file extensions are treated equally. It appears that files with common extensions like .txt will also produce the unexpected behavior.

Continuing the same example, a file with any of the following names will result in a 404 error:

node.txt
node.diff
node.jpg
node.pdf
node.doc
node.tgz
node.xyz

But a file with any of the following names will not result in a 404 error:

node.abc
node.dump
node.pdq

Does that reflect your experience? Is this behavior intentional?

Nic

morbus iff

This is due to Apache's Options +MultiViews, which lets a file such as "node.txt" be requested without its extension, as "node", to make a cleaner URL. (The feature ultimately comes from language negotiation, where index.html.en or index.html.de is served depending on the language preferences the browser sends.) That would explain why a request for node/1 ends up matching node.txt.

MultiViews is, in my opinion, quite a nice feature, and I recommend that nearly anyone interested in "clean URLs", or in the tenets behind "Cool URIs don't change", turn it on and use it. I, for example, have used http://www.disobey.com/about/morbus since time began to refer to, originally, morbus.htm, then morbus.html, then morbus.shtml, and finally a Drupal node that has been path aliased. The URL has never changed even though my technology has, and that's one of the prime benefits of MultiViews.

You should be able to turn off this behavior by changing the line in Drupal's default .htaccess from "Options -Indexes" to "Options -Indexes -MultiViews".
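
For anyone who wants to see the change spelled out, here is a minimal sketch of the edited line. It assumes the stock .htaccess has the single line "Options -Indexes" quoted above, and that the server's AllowOverride setting permits Options in .htaccess files. That second point is an assumption about your host, not something confirmed in this thread.

```apache
# Before (Drupal's default, as quoted above):
#   Options -Indexes

# After: keep directory listings off and also disable MultiViews,
# so a file such as node.txt no longer matches a request for node/1.
Options -Indexes -MultiViews
```

If the server rejects the directive or starts returning a 500 error, the host probably does not allow Options to be overridden in .htaccess, and the same line would have to go in the main server or virtual host configuration instead.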

http://www.disobey.com/
http://www.gamegrene.com/
Developer of Drupal's GameAPI

reed.r

Turning off the MultiViews option fixed my clean URLs problem very neatly, after nasty hours of trying to find the reason for the 403 "permission denied" error. THANKS!!!!

One question, though: is MultiViews treating every subdirectory in the URL as a file if it is not a directory? With directory browsing turned off, it seems that when the server tests a clean URL it looks for a file, or something in the subdirectory, and then returns a 403 error because it cannot find that file.

I am not sure I got this right but I was tearing my hair at this problem and now it's fixed. Thank god...