Drupal should not handle 404 for certain files

kbahey - August 3, 2006 - 03:31
Project:Drupal
Version:7.x-dev
Component:base system
Category:task
Priority:normal
Assigned:Unassigned
Status:needs work
Issue tags:Performance
Description

Drupal handles 404s for certain files that are not Drupal content.

Examples are .js, .css, .jpg, .gif, ...etc.

This has two disadvantages:

- We do an expensive bootstrap for every 404 for such files.

- We send far more bytes for that 404 than if we let the web server standard error page get sent

Here is a patch that incorporates this.

Note that .png is intentionally left out so that the clean URL test passes.

Based on this handbook page http://drupal.org/node/56773

AttachmentSizeStatusTest resultOperations
htaccess.patch.txt939 bytesIgnoredNoneNone

#1

kbahey - August 3, 2006 - 13:35

Here is another version.

It adds .ico to the list.

AttachmentSizeStatusTest resultOperations
htaccess_7.patch947 bytesIgnoredNoneNone

#3

pwolanin - August 3, 2006 - 15:35

Quick question based on the mailing list discussion-

Assume I want to be able to set up path aliases ending in .html or .htm becuase I've replaced static site content with Drupal content. Will this patch prevent that from working? It seems that it will, so I'd suggest separating out these sorts of endings (shtml, html, htm) into a separate rewrite statement and making it commented out by default.

#4

agentrickard - August 3, 2006 - 18:35

I opened another issue before getting to the end of the thread....

The original handbook page was written with reference to Drupal 4.6.x.

As discussed at http://drupal.org/node/56773, in 4.6, we made some edits to the default .htaccess file to allow the web server to handle bad file requests for most common file types.

Before this is standardized, some more testing is needed. Attached is out current .htaccess file.

AttachmentSizeStatusTest resultOperations
htaccess_new_0.txt2.59 KBIgnoredNoneNone

#5

pwolanin - August 3, 2006 - 21:12

@agentrickard - that attached file does not correspond to the current .htaccess. Did you add more to it?

#6

agentrickard - August 4, 2006 - 14:04

Let me clarify:

That file is the .htaccess file that we use in production (Drupal 4.6.9), which inspired the original Handbook page. It has been modified since the original handbook page was written.

I added it for reference.

#7

kbahey - August 4, 2006 - 14:22

For what its worth, I am using the patch attached in #1 on my live sites, and there is no ill effect at all ...

It does not seem to catch them all though ...

mysql> select wid, from_unixtime(timestamp) as time,  location from watchdog where type = 'page not found' and location like '%gif' order by time desc limit 5;
| 178866 | 2006-08-04 08:48:20 | http://example.com/sites/default/themes/example/images/menu-expanded.gif   |
| 178865 | 2006-08-04 08:47:16 | http://example.com/sites/default/themes/example/images/menu-leaf.gif       |
| 178864 | 2006-08-04 08:12:02 | http://example.com/sites/default/themes/example/images/sidebar_bg_left.gif |
| 178860 | 2006-08-04 07:52:26 | http://example.com/sites/default/themes/example/images/menu-expanded.gif   |
| 178850 | 2006-08-04 05:44:12 | http://example.com/images/header_bg.gif                                       |

I changed it to this, but don't know if it works or not yet.

Index: .htaccess
===================================================================
RCS file: /cvs/drupal/drupal/.htaccess,v
retrieving revision 1.73
diff -u -r1.73 .htaccess
--- .htaccess   14 Apr 2006 09:08:26 -0000      1.73
+++ .htaccess   4 Aug 2006 14:20:13 -0000
@@ -8,6 +8,12 @@
   Deny from all
</FilesMatch>

+# Handle files that should not be handled by Drupal,
+# so a) no bootstrapping happens, and b) 404 error is less bytes
+<FilesMatch "\.(gif||jpg|jpeg|css|js|ico|cgi)$">
+  ErrorDocument 404 default
+</FilesMatch>
+
# Set some options.
Options -Indexes
Options +FollowSymLinks
@@ -82,6 +88,8 @@
   # Rewrite current-style URLs of the form 'index.php?q=x'.
   RewriteCond %{REQUEST_FILENAME} !-f
   RewriteCond %{REQUEST_FILENAME} !-d
+  # Handle 404s for non Drupal files. See above.
+  RewriteCond %{REQUEST_FILENAME} !\.(gif||jpg|jpeg|css|js|ico|cgi)$
   RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
</IfModule>

#8

robertDouglass - August 10, 2006 - 13:39

@agentrickard: I don't see how your .htaccess (from #4) is really applicable to Drupal core since you simply disable Drupal's 404 handling altogether:

# Customized error messages.
#ErrorDocument 404 /index.html
#ErrorDocument 403 /index.php?q=FileNotFound

# This overrides the Drupal 404 handler for files that should never be handled by Drupal
#<FilesMatch "\.(gif|jpe?g|png|s?html|css|js|dll|exe|asp)$">
#  ErrorDocument 404 "File Not Found
#</FilesMatch>

#9

robertDouglass - August 10, 2006 - 14:39

The original version of this patch (and that which is in the handbook) works for nonexistant files requested from directories that actually exist in the filesystem. It does something like this:

Request: http://site.com/misc/image.gif

1. Look for directory /misc/.
2. /misc/ is found, so look for image.gif inside of it.
3. image.gif is not found, and image.gif matches the FilesMatch regexp, so return Apache default 404.

Request: http://site.com/some/image.gif

1. Look for the directory /some/.
2. Directory not found, so the URL gets rewritten to http://site.com/index.php?q=some/image.gif
3. Bootstrap Drupal, look in menu, not found, return Drupal 404 page.

The only fix I'm able to think of for this would be to modify index.php to intercept files using the same regexp from the FilesMatch directive, and return 404 from index.php. I don't know if the powers that be will buy such a solution.

#10

kbahey - August 10, 2006 - 15:01

Oh, I think they will, given large sites having lots of 404 really suffer because of that false bootstrap, and too many of them stressing the system.

I don't have the time to devise a patch, but will be happy to test one on my sites, and perhaps a large client site too.

#11

robertDouglass - August 12, 2006 - 12:21

Here is what seems like an optimal solution. We keep the stuff in .htaccess as it is because having Apache handle things without invoking PHP is ideal. For the cases that Apache can't get to because of the rewrite rules, we run the same regular expression as the first line of index.php and return our own 404 if it matches. The results are as follows:

RequestResult
/misc/blog.pngApache will serve the image because it exists.
/misc/blogxyz.pngApache will serve a 404 based on the .htaccess rules because the directory exists but the file doesn't.
/foo/bar.gifDrupal's index.php will serve a 404 based on the rules there because the request ends with .gif. The very fact that we are in index.php means Apache has already decided it is a 404 case, with or without the modification to .htaccess (so the modification to index.php is not an Apache specific hack).

As long as developers don't start making paths in the menu that end in .gif, .png, .swf etc., we're fine. Does anyone know of any cases where such paths are generated? Perhaps by CAPTCHA or watermarking?

AttachmentSizeStatusTest resultOperations
htaccess-index.patch1.55 KBIgnoredNoneNone

#12

robertDouglass - August 12, 2006 - 12:28

So much for my nice table.

/misc/blog.png
Apache will serve the image because it exists.

/misc/blogxyz.png
Apache will serve a 404 based on the .htaccess rules because the directory exists but the file doesn't.

/foo/bar.gif
Drupal's index.php will serve a 404 based on the rules there because the request ends with .gif. The very fact that we are in index.php means Apache has already decided it is a 404 case, with or without the modification to .htaccess (so the modification to index.php is not an Apache specific hack).

#13

kbahey - August 12, 2006 - 20:14

Here is an updated patch.

It picks up from what Robert did, and changes the following:

- Adds .ico file as a type to be ignored.
- Makes .html as a commented out line with instructions. The reason is, some sites use the .html in URL aliases for various purposes (e.g. content migrated from older systems, making awstats give better stats by type, ...etc.)

I tested this and it works as expected. Should cut down resource usage on some busy sites with lots of flat files.

So, +1 from me.

AttachmentSizeStatusTest resultOperations
htaccess-404.patch.txt1.97 KBIgnoredNoneNone

#14

Dries - August 13, 2006 - 09:17

What if I'm serving images from the database, using the 'private file method'?

What if, in the near future, we want to do access control on images associated with nodes? (Something we should do.)

#15

robertDouglass - August 13, 2006 - 10:55

This breaks private files. Ugh.

Of course, we could update the regexp to accommodate the private files path... but what about filemanager files? And ecommerce files?

Yuck.

I don't know how we're going to introduce the access permissions on files attached to nodes (filemanager/attachment already has a solution for this), and don't think that it should be part of this patch.

Is the best thing to do to make the change in .htaccess and leave it at that? I suspect that we're only covering 5-10% of the target cases if that's the route we go. The other option would be to build the check into the bootstrap process and hope that we could do a partial bootstrap for 404 images instead of a full bootstrap, but this is clearly inferior.

#16

forngren - August 13, 2006 - 10:57

I'm writing something maybe related, http://drupal.org/node/77514. It performs a search for the missing URL with certain words stripped (eg html, php etc).

#17

robertDouglass - August 13, 2006 - 11:01

I wouldn't be surprised if we end up introducing a second entry point to the bootstrap process, something like files.php, that handles any URL that has a file ending. That file would initiate an alternate bootstrap process where one of the first checks would be "Is this really a file that exists?", and right after that would check "Is this a path that is handled by a module?".

Haven't Walkah and Co. been doing some thinking on the file bootstrap issue? What is their concensus?

#18

moshe weitzman - August 16, 2006 - 02:34

i posted some thoughts on private files at http://www.tejasa.com/node/113. no code though.

#19

Dries - August 16, 2006 - 13:17

IMO, the best we can do is to try and serve a lightweight page. Dropping the sidebars on 404 pages could save a fair amount of resources; both bandwidth as CPU cycles.

#20

moshe weitzman - August 16, 2006 - 15:58
Assigned to:Anonymous» moshe weitzman

i guess that will have to do for now ... i've been meaning to add a block-less page anyway. needed for panels.module ... if anyone has ideas on how to implement this, please chime in. assigning to self.

#21

kbahey - August 16, 2006 - 16:19

Introduce a new theme function? mini_page based on theme_page() in theme.inc?

#22

Dries - August 16, 2006 - 16:55

theme_maintenance does something similar. That or provide a drupal_is_404() function for the template files? Food for thought. :)

#23

moshe weitzman - August 16, 2006 - 22:07

Thats was easy. i just added minor conditionals in phptemplate.engine and chameleon.theme. Is used by 404 page, and will certainly be used by panels.module - http://angrydonuts.com/the_dashboard_revisited

To omit blocks, just pass FALSE as 3rd argument - for example theme('page', $output, FALSE)

#24

moshe weitzman - August 16, 2006 - 22:44

oops - here is the patch

AttachmentSizeStatusTest resultOperations
patch_953.35 KBIgnoredNoneNone

#25

Dries - August 17, 2006 - 08:55

It begs the question: do we need to redo the theme_maintenace() page using this system? Can, and if so, should we merge them?

#26

robertDouglass - August 17, 2006 - 09:13

Here's the .htaccess portion of the patch. We should adopt this part regardless of the solution that we come to for the rest because it is the most effecient way to handle requests for files that don't exist.

AttachmentSizeStatusTest resultOperations
htaccess-404.patch_0.txt1.19 KBIgnoredNoneNone

#27

moshe weitzman - August 17, 2006 - 11:46

i looked into maintainence_page and install_page but they are specialized and merit own function. They avoid calls to functions that haven't been loaded yet. I think we can leave them alone.

#28

Dries - August 18, 2006 - 09:08

I'm ok with Moshe's patch. My original thought was to introduce a helper function (drupal_is_404()) but I guess this works too. A drupal_is_404() might be a little more powerful, and more transparent though.

Then again, the extra paramater passed to theme_page() might, for example, be useful to simply certain pages (like the welcome page after installation).

#29

moshe weitzman - August 18, 2006 - 11:21
Status:needs review» reviewed & tested by the community

i'll take some more feedback from cvs commit team, or proceed with commit. the extra param on theme_page() is vital for panels.module (whichis super cool if you haven't looked yet - http://drupal.org/files/issues/htaccess-404.patch_0.txt)

#30

drumm - August 21, 2006 - 06:34
Status:reviewed & tested by the community» needs review

Which patch are we talking about? I can't tell from moshe's review if he thought his own patch was ready for commit or Robert's.

#31

moshe weitzman - August 22, 2006 - 21:29

i am talking about my patch. i have not tested robert's patch.

#32

m3avrck - August 23, 2006 - 03:05

Updated patch... includes Moshe's patch + Roberts + update for HEAD.

Tested and working well. I really like the fact that blocks go off when you get a 404, that really can save a lot of bandwith + CPU cycles.

Also, the Apache 404 for actual files is *very* necessary. If you have a link to http://www.example.com/somethingFunnyLooking.gif ... you are linking to a file directly--you are expecting either the file or a 404, *not* a full Drupal boostrap to give you a 404. 99% of the time that link is going to be in the theme or some link posted by someone and they'll never see that 404... so let's give them the absolute least amount of overhead with that 404.

I'd say this is ready to go, nothing new here and we solve 2 problems at once: 404s for content in Drupal (no blocks, less CPU, SQL, etc..) and 404s for files (*nothing* at all).

AttachmentSizeStatusTest resultOperations
404_0.patch5.24 KBIgnoredNoneNone

#33

m3avrck - August 23, 2006 - 03:17

Also when/if this goes in, we need to update the theming docs about this new $show_blocks variable so themers can be sure to use it so save some CPU cycles on 404s.

#34

m3avrck - August 23, 2006 - 04:06
Status:needs review» reviewed & tested by the community

Talked with Moshe in IRC, we both agreed this is RTBC. It was ready before Neil just didn't know what to commit (now it's both).

#35

drumm - August 23, 2006 - 04:07
Status:reviewed & tested by the community» needs work

This is two separate issues. Make a separate issue for one half of this.

#36

m3avrck - August 23, 2006 - 04:13

Neil, ok here is the first patch then, this is for the blocks. Note, this will need a documentation update.

This patch is the same from before, just synced with HEAD.

AttachmentSizeStatusTest resultOperations
block_9.patch4.05 KBIgnoredNoneNone

#37

m3avrck - August 23, 2006 - 04:13
Status:needs work» reviewed & tested by the community

#38

m3avrck - August 23, 2006 - 04:37

After the patch in #36 is commited (same as in #24 just updated for HEAD), here is the next patch to address the .htaccess and 404.

AttachmentSizeStatusTest resultOperations
404_1.patch1.2 KBIgnoredNoneNone

#39

drumm - August 23, 2006 - 06:01
Status:reviewed & tested by the community» needs work

Please post different patches on /separate/ issues.

#40

robertDouglass - August 23, 2006 - 08:37

I moved the most recent version of Moshe's patch to a different issue since the title of this issue fits my patch the best:
http://drupal.org/node/80201

#41

m3avrck - August 23, 2006 - 15:22
Status:needs work» reviewed & tested by the community

Here's the 404 patch which is for this issue :-)

Thoroughly tested, this is ready to go. Like I said in #32, this is needed for files, it can save a ton of wasted Drupal bootstraps and server overheads.

AttachmentSizeStatusTest resultOperations
404_2.patch1.2 KBIgnoredNoneNone

#42

forngren - August 23, 2006 - 20:01

Because of this (two) patches I think the need for http://drupal.org/node/76824 has increased, since the 404 page just "blanks" ATM. Most users wont configure a 404 page...

#43

drumm - August 24, 2006 - 08:19
Status:reviewed & tested by the community» needs work

The nearly-identical commented lines next to the uncommented need a better explanation.

From what I can tell this still breaks private file downloads.

#44

moshe weitzman - November 6, 2006 - 15:47

sad to see this one languishing. if it cam edown to it, i would add this and put some doc for private files tha tthis must be commented out.

#45

coreb - November 28, 2006 - 18:15
Version:x.y.z» 6.x-dev

moving from x.y.z to 6.x-dev version queue. Hopefully this will also bump the feature back into people's attention.

#46

moshe weitzman - December 4, 2006 - 03:44

color.module also does not work for private files and it is a quite prominent part of our default theme. see http://drupal.org/node/92059

i think private file incompatibility is acceptable for drupal 5. we will fix that in 6. anyone want to address drumm's last comment and resubmit?

#47

dopry - December 20, 2006 - 20:24

-1 on this addition to the .htaccess, It will break any dynamic generation of the restricted file types & drupal based file access control.

If it is added I'd like to see it commented out by default, so people who have concerns about bandwidth and server load can uncomment them. We could also choose a special case for files/ or system/files.

#48

m3avrck - December 20, 2006 - 22:19

Here's and updated patch which cleans up some stuff.

Known issues:

  • Regexes need to be updated to ignore files that live in files/ and system/files/ --- this will fix both private downloads and modules like imagecache which make special use of the files/
  • Directories that do not exist (e.g., themes/abc/something.gif) cause an Apache 404 error but then it says it can't find the proper 404 document. However, if use a directory that exists /themes/something.gif with an image that doesn't exist, it gets the proper 404
AttachmentSizeStatusTest resultOperations
d_3.patch1010 bytesIgnoredNoneNone

#49

dopry - December 21, 2006 - 05:39

The problem with 'files' is that it can change and we don't have access to file_directory_path from the .htaccess file...

#50

moshe weitzman - December 21, 2006 - 12:45

it is reasonable to force folks to have 'files' in the difrectory path, like we do for modules and themes. but if not, i htink this can go in commented out and ask folks to enter their files path.

#51

jvandyk - January 4, 2007 - 03:23

Here's an updated patch. Moved definition of ErrorDocument below our existing definition (otherwise it'll just get stomped on by the later ErrorDocument 404 index.php line).

Changed the description a bit. Commented the lines out per #50, except the RewriteCond.

AttachmentSizeStatusTest resultOperations
htaccessjv.patch1.08 KBIgnoredNoneNone

#52

RobRoy - January 4, 2007 - 03:30

Don't have time to test this just now, but will not handling rewriting for .html files mess when you have node/2 aliased to about.html, for example?

#53

RobRoy - January 4, 2007 - 03:31

Also, in those new comments it might help to have a reason why one might NOT want to uncomment those lines. For example, why does this come commented-out by default.

#54

m3avrck - January 4, 2007 - 03:58
Version:6.x-dev» 5.x-dev

Here's an updated patch with a regex from Steven on IRC to ignore those files when they exist in /files which can interfere with private downloads and imagecache module.

Regex needs some tweaking though, almost there.

AttachmentSizeStatusTest resultOperations
d_5.patch1.19 KBIgnoredNoneNone

#55

m3avrck - January 4, 2007 - 04:01

Also uncommented the regex because if it can ignore anything in files/ it shouldn't break anything. Feel free to comment out though if I missed a use case.

#56

Morbus Iff - January 4, 2007 - 04:35

m3vrck: FilesMatch does absolutely nothing with Directories. Scanning against files/ won't work. The only way forward on that avenue is to get the default handler into the .htaccess that is generated into the files/ directory.

#57

drumm - January 4, 2007 - 05:54

And files isn't always named 'files'.

#58

kbahey - January 4, 2007 - 06:06

How about a new approach to this issue that has been going back and forth for so long, rolled back, ...etc?

Instead of doing things in .htaccess, requiring people to uncomment things, ..etc., can we move this into Drupal itself?

Can we do a light bootstrap (as early as possible), read a variable (perhaps in settings.php) that has a pattern, try to match it, then exit early?

This is of course more overhead than letting Apache handle the whole thing, but it would still be better that what we have now, provided we do it as early as possible.

Comments?

#59

moshe weitzman - January 4, 2007 - 14:34

@kbahey - now is not the time to switch gears on this issue. thats a radical change and quite debateable.

@drumm - i think a code comment in htaccess and possibly where we set the files directory is enough to satisfy the rare case where files dir does not end in 'files'.

#60

kbahey - January 4, 2007 - 15:36

moshe, the reason I suggested an alternative is that this is dragging on, due to various issues that are pointed out by various people. So, getting consensus on it will be difficult.

So, in my view, it may not make it to 5.0.

My suggestion was to move it inside drupal, so we can manage the patterns to exclude, ...etc. from inside Drupal with a little bit of sacrifice on the overhead side.

If others think it will get in core via htaccess soon, then I am happy to go with that too.

#61

moshe weitzman - February 2, 2007 - 16:46

anyone care to carry this forward?

#62

kbahey - May 19, 2007 - 02:39
Version:5.x-dev» 6.x-dev

Anyone willing to work on this performance enhancing patch?

The code freeze is a few weeks away ...

#63

RobRoy - May 19, 2007 - 03:26

Can someone answer my question at http://drupal.org/node/76824#comment-173567, which is the same as #3 submitted by pwolanin? What if I have an alias for .html pointing to a missing node? Not that that is a huge deal, just would like clarification. Also, the hardcoding /files issue is still there. What about for multisite?

@kbahey, I agree that moving this inside Drupal would save us a lot as we could get the right files dir and some of the other issues might be alleviated. But, maybe that would soften the performance gain that this is trying to achieve. Since this is a bit funky, I think the easiest way is to put this in .htaccess and comment it out initially, allowing users to turn on this boost if they want. What do you think?

#64

kbahey - May 19, 2007 - 04:11

Many of my site has .html as the suffix, mainly so awstats can give good stats on them.

I think that this patch is for performance more than anything else. If we avoid a Drupal bootstrap, we waste less resources when we have a 404.

So, the options are:

1. We add the stuff in .htaccess and index.php and comment it out. This is the path of least resistance and would give us what we need for D6.

2. Have .htaccess generated automatically from Drupal like we do with the CSS cache stuff. A default setting provides which file extensions to use and the admin can override them (e.g. remove html or add xyz)

The .htaccess need not be writable. We can take a non-fancy approach and just display the .htaccess contents on the screen and ask the admin to copy/paste, or we can write to the tmp directory and ask the admin to move the file manually. Since this is an advanced feature, I think this is acceptable.

3. Move parts of that in a lighweight bootstrap, as I suggested earlier.

I think 1 will have the best performance. If Dries is amenable to this, I can work on this towards end of next week.

#65

moshe weitzman - June 11, 2007 - 02:03

I agree that #1 is best for now. I hope kbahey or other have a chance to revive this.

#66

kbahey - June 11, 2007 - 02:21
Assigned to:moshe weitzman» kbahey
Status:needs work» needs review

I hate to see Drupal 6 go out the door without this simple yet effective feature.

Here is a patch for current HEAD.

AttachmentSizeStatusTest resultOperations
htaccess-404.patch2.15 KBIgnoredNoneNone

#67

kbahey - June 11, 2007 - 03:03

Here is a patch that moves the index.php stuff to settings.php, and comment it out.

This way, we have no cruft in index.php, and those who need this can just uncomment it (albeit in 3 different places, but better than a whole Drupal bootstrap for a measly 404).

AttachmentSizeStatusTest resultOperations
files-404.patch2.49 KBIgnoredNoneNone

#68

moshe weitzman - June 11, 2007 - 03:27
Status:needs review» reviewed & tested by the community

works as advertised. i think this is a good compromise, and all the lines are needed.

#69

Steven - June 14, 2007 - 01:39
Status:reviewed & tested by the community» needs work

One of the chunks is double commented, the other is not.

#70

kbahey - June 14, 2007 - 01:53
Status:needs work» needs review

Here it is, fresh from the microwave ... ;-)

AttachmentSizeStatusTest resultOperations
files-404-2.patch2.56 KBIgnoredNoneNone

#71

agentrickard - July 1, 2007 - 14:17

I think this patch is important (I wrote the original handbook page).

Circling back, have you tested against http://drupal.org/node/56773#comment-243479? And against file uploads in general?

#72

Gábor Hojtsy - July 1, 2007 - 22:22

Reread the htaccess comments, there are typos and repeated stuff as mentioned by Steven. Also if the functionality in htaccess is effective, there would not be a way to get to the default settings.php change. Are three three marketed as an 'all or nothing' solution to uncomment all three to cover all bases?

#73

kbahey - July 1, 2007 - 22:56

The 3 are needed but for different reasons. When a directory exists but a file in it does not, then Apache is able to handle the 404, and hence the .htaccess changes.

However, if the directory does not exist, then Drupal is still bootstrapped, so we exit as early as we can, and hence the settings.php change.

Rerolled the patch. Hopefully comments are OK now. I also changed it to use REQUEST_URI instead of QUERY_STRING, since that is what we are checking in .htaccess.

Also added a small "Page not found" message.

AttachmentSizeStatusTest resultOperations
files-404-3.patch2.44 KBIgnoredNoneNone

#74

Morbus Iff - July 2, 2007 - 12:02
Status:needs review» needs work

* What's a "flat file"? I know what a flat file is. It's not what the documentation prefaces.
* None of the sentences that end with "settings.php" have a period at the end.
* None of these lines are commented (and they should be).
* "corresponding the RewriteCond" has an extra "the".
* I can agree that we'll save "server" resources, but not "bandwidth" resources.
* "corresponding FilesMatch line" - it's actually three line[s], not just one "line".
* Technically, if a number is less than 10, it should be written out ("three lines", not "3 lines").

I'm not sure I like this third check. I worry about dynamically generated images (ala CAPTCHA or various statistics) - I suspect, just off the top of my head, that one of the "make an image from text" modules will fail, like textimage.module. This third check also defeats a request for an image uploaded to a node when private downloads are enabled (URL path is ?q=system/files/1608.png, which triggers your check).

#75

Morbus Iff - July 2, 2007 - 12:06

And note that #74 and the "private downloads" issue was brought up as far back as #14.

#76

Arancaytar - July 2, 2007 - 12:15

I have trouble believing that the performance gain is worth completely eliminating file extensions in the virtual file system. The whole simple beauty of the "no file named $1? let index.php?q=$1 handle it" method is that there is no difference from the outside between the Drupal handler and the files served directly.

What performance gain is this supposed to give, anyway - it seems to speed up only 404 errors. For a site without broken links and sensible, persistent URLs, shouldn't those be the exception rather than the norm?

Just my two cents there...

#77

Morbus Iff - July 2, 2007 - 12:32

What performance gain is this supposed to give, anyway - it seems to speed up only 404 errors. For a site without broken links and sensible, persistent URLs, shouldn't those be the exception rather than the norm? - Not really. Given enough time, your site will be constantly hit by broken spiders, endless requests for favicon.ico (which may not exist), and various cracking attempts via known vulnerabilities in IIS and other webservers. My top WEEKLY 404s are 14045 /favicon.ico(has never existed and never will), 2296 /popular (never existed. broken bot), 1668 /ghostsites/images/pixel.gif (probably once existed.), 1414 /popular/alltime (never existed. broken bot), 1025 /images/pixel.gif (probably once existed), 467 /cgi-bin/isp-post/isp-post.cgi (last existed 7 years ago. bot attempting exploits), 409 /files/users/picture-11.jpg (existed once. user, and all his content, was deleted.), 178 /db/zone.htm (never existed. no idea where this comes from).

#78

agentrickard - July 2, 2007 - 13:33

@Arancaytar

Just for background: This is a massive problem for sites that migrate to Drupal from a flat-file system. In the case of SavannahNOW, we had thousands of now dead links in Google, Yahoo, et. al. We were basically getting DOS'ed by 404 errors.

Plus, if someone requests a page that no longer exists, and that .html file contains two .css and 10 .gif files, that turns into 13 404 errors that "fall through" to Drupal. In effect, Drupal spends all of its time searching for files instaed of serving pages.

It is also a generic problem with bots and spiders -- particularly spambots and hackers. Our traffic logs show repeated requests for .dll and .cgi files that point to known or suspected exploits in IIS.

All that aside, the case may be that the best path here is a well-tested and documented handbook page, with a reference in the core INSTALL.txt that points to those instructions. I say that simply because edge-cases seem to be drowning this issue. If we can't fix all the edge cases, I doubt this will get into core.

#79

kbahey - July 2, 2007 - 17:04

Here is the wording, before I roll a patch again:

.htaccess part1:

The following lines prevents Drupal from handling 404 errors for certain
types of static files, such as images and css.

This can help reduce the server's load because 404s for such files do not
cause a full Drupal bootstrap.

You must uncomment the corresponding RewriteCond lines later in this file as
well as those in the settings.php file.

.htaccess part2:

The following lines prevents Drupal from handling 404 errors for certain
types of static files, such as images and css.

This can help reduce the server's load because 404s for such files do not
cause a full Drupal bootstrap.

You must uncomment the corresponding FilesMatch lines later in this file as
well as those in the settings.php file.

settings.php:

The following lines prevents Drupal from handling 404 errors for certain
types of static files, such as images and css.

You must uncomment the corresponding RewriteCond and FilesMatch lines in
the .htaccess file as well.

Regarding private download, we can add a comment that this does not work in such a case. How about that? Any wording suggestions Morbus?

#80

Morbus Iff - July 2, 2007 - 17:18

CSS is a acronym and should be capitalized. I'd much rather these sentences be merged into one paragraph (for each block).

As for the settings.php, I can't support this patch with that block in - adding wording-to-the-wise about private downloads doesn't solve the issue of textimage (which creates dynamic images) or any other module that attempts to do something dynamic with normally static files. It's something that would cause more confusion and endless "your module is broken!" requests.

I support the .htaccess additions.

#81

kbahey - July 3, 2007 - 01:46
Assigned to:kbahey» Anonymous

Morbus is right about private file download. This is something we need to think about. However, that should not stop the .htaccess parts.

Not sure what merging sentences in one block for each paragraph means.

I am unassigning myself from this issue. Someone else can take it and run with it.

#82

moshe weitzman - July 27, 2007 - 16:07

In order not to break textimage and imagecache, I just added this condition in the settings.php if block:

&& strpos($_SERVER['REQUEST_URI'], 'files') === FALSE

i know that one could use a files dir that doesn't begin with 'files' but thats a bug in core IMO. we require modules dir to be modules, themes to be themes, and we have every right to require that files be files. i just need to submit a patch for that. if folks agree, i'll get started.

note that color.module does not work with private files and thats a far more important feature than these chunks of commented out code that are proposed.

#83

Arancaytar - August 2, 2007 - 11:04

Given that any module (or even custom site code) might attempt to "fake" a file extension in the URL, it seems like this fixes one issue (404 overload) at the cost of introducing an uncounted number of problems whose cause cannot not readily identified.

"Why is such-and-such a page suddenly returning 404 errors?" seems like a far more difficult support issue to solve than "Why is my Drupal site constantly running into my host's server load limits?". When the server overloads, you have a very obvious problem that will probably be handled by an admin who can poke around in .htaccess files. The other one is encountered by users who have no idea what happened.

Because of this, I'd propose that the relative addition/change in .htaccess be commented out with a description of how to activate it, and document this as a performance tweak in the handbook...

#84

Michael Curry - August 8, 2007 - 11:49

subscribe

#85

Christefano - August 23, 2007 - 19:52

Where has this issue been all my life? Subscribing.

#86

Owen Barton - October 8, 2007 - 22:12

Subscribing

#87

juerg - October 26, 2007 - 15:39

Subscribing

#88

Pancho - February 8, 2008 - 00:07
Version:6.x-dev» 7.x-dev

Moving feature requests to D7 queue.

#89

zeezooz - April 21, 2008 - 17:42
Version:7.x-dev» 5.7
Status:needs work» needs review

For handle file in non-existent directory I use rewrite to /404.ext in .htaccess. Put

  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_URI} !^/files/
  RewriteCond %{REQUEST_URI} \.(png|gif|jpe?g|s?html?|css|js|cgi|ico|swf|flv|dll)$
  RewriteCond %{REQUEST_URI} !^404.%1$
  RewriteRule ^(.*)$ 404.%1 [L]

before last RewriteConds/RewriteRule and
<FilesMatch "\.(png|gif|jpe?g|s?html?|css|js|cgi|ico|swf|flv|dll)$">
  ErrorDocument 404 "<html><head><title>404 Not Found</title></head><body><h1>Not Found</h1><p>The requested URL was not found on this server.</p></body></html>
</FilesMatch>

just after
ErrorDocument 404 /index.php

It does not require to patch other files and seems to work well in 5.7 (including imagefield preview), but error message contain 404.ext filename so I use custom message.

#90

christefano - April 22, 2008 - 03:03
Version:5.7» 7.x-dev
Status:needs review» needs work

@zeezooz, can you roll a patch?

New features are going into 7.x-dev, not 5.7.

#91

Owen Barton - April 22, 2008 - 03:55

The trickiest thing to work around here are modules like imagecache that rely on the 404 handling to auto-generate versions of files that don't yet exist. I wonder if we could get around this by having them add a querystring paramete (e.g. 404handler=drupal) that will cause .htaccess to use the Drupal 404 for these files, rather than the lightweight static 404 page. This way we could potentially have the rule enabled by default, and still allow these modules to work without having people hack the (fragile!) .htaccess regexi.

Anyone fancy taking a shot at this?

#92

zeezooz - April 22, 2008 - 19:19

patch for #89

AttachmentSizeStatusTest resultOperations
htaccess.patch1.58 KBIdlePassed on all environments.View details | Re-test

#93

Gábor Hojtsy - July 10, 2008 - 14:46

Note: Favicon.ico just got such special treatment in Drupal 6.3.

#94

m3avrck - July 11, 2008 - 02:20

patch #92 works great!

We are using this on MothersClick with imagecache & imagefield with *no* problems at all, test it yourself:
http://www.mothersclick.com/no-directory/this-really-works.css

The patch that went into D7 and D6 for favico should be reverted in favor of this patch which works great.

#95

catch - July 11, 2008 - 13:16
Category:feature request» task
Status:needs work» needs review

Still applies with fuzz, back to needs review.

I agree 100% that a more generic solution would be great for this issue rather than special casing favicon.ico - it's a big performance issue on some sites (including one I run) that have old legacy linkrot for images etc.

#96

Arancaytar - July 11, 2008 - 15:34

Could we leave default behavior as it is? This will break functionality on a great number of sites - whether you are using a private file system, or any module that dynamically generates file downloads, or even if you are in the habit of adding ".html" to the URL aliases of your nodes.

It's a fine tweak, but I don't see that we should knowingly and without warning or explanation break the sites of one part of the user base in order to improve performance for another. These lines should be commented out initially, so you can uncomment them if you need the performance. We don't default to aggressive caching either, for the same reasons.

#97

m3avrck - July 11, 2008 - 15:38

This patch DOES NOT break dynamically generated files like images with imagecache. Like I said, we're using it on MothersClick and we rely on imagecache and everything works a-ok. Before saying a patch doesn't work, test it. There is a line in there that says ^/files/ ...

#98

Arancaytar - July 11, 2008 - 17:27

I'm sorry, I often get mod_rewrite rules wrong. Does the following prevent the file being handled by Drupal, or is it an exception for files that are handled by Drupal?

+  RewriteCond %{REQUEST_URI} !^404.%1$
+  RewriteRule ^(.*)$ 404.%1 [L]

#99

gpk - August 15, 2008 - 18:46

Related: #295456: Bug in .htaccess file.
Also the patch Gábor mentions in #93: #174940: Root /favicon.ico accounts for majority of 'Page not found' errors.

Is it proposed that *by default* Drupal should not handle 404s for the image file types? Two comments:
- if Drupal doesn't handle them then I lose an easy way of finding broken img tags etc. on my sites
- the comment sections e.g. # The following lines prevents [sic] ... # You must uncomment the corresponding RewriteCond and RewriteRule lines later in this file. suggests that by default these lines will be commented out??

@98, and for my own edification: What I think those 2 lines do is the following:
- first of all, %1 is a backreference to the file extension of the image file (as matched the line before in the RewriteCond)
- so RewriteCond %{REQUEST_URI} !^404.%1$ makes sure that the request was not for 404.jpg say (this is needed so that the next line doesn't create a loop)
- RewriteRule ^(.*)$ 404.%1 [L] then rewrites the current request (e.g. picture.jpg) to 404.jpg and ends ReWrite processing (so that the following rule - Drupal's normal one - doesn't get applied).
- only then does the FilesMatch directive get applied

Actually I'm not sure the request URI needs to be rewritten ... just need to make sure that Drupal's normal RewriteRule doesn't get run?

#100

System Message - November 16, 2008 - 21:40
Status:needs review» needs work

The last submitted patch failed testing.

#101

gpk - November 17, 2008 - 10:44
Status:needs work» needs review

#100 was the testbot problem #335122: Test clean HEAD after every commit.

#102

Owen Barton - November 18, 2008 - 01:04

Here is a rerolled patch that does the following:
- Reverts the favicon.ico rules, because they now are handled by these rules
- Adds some additional extensions for common 404s (mostly exploit-bots)
- Changes the 'files' rule so it does not need to be at the start of a line. This is needed so we can catch sites/default/files paths (the default since Drupal 6). This also fixes (AFAICS) the case where private files are enabled.
- Adds a help text and a validation rule to the file system settings to ensure the file system path contains the word 'files' somewhere (as suggested by moshe in #82).
- Removes handling for htm/html files by default, because migrating from a static html site is a pretty common use case. I have added comments and sample lines for people who know they don't/won't use these aliases, but by default you don't have to do this (and you still save a large number of CPU cycles).
- Makes this enabled by default. I don't think it's worth including this if it is going to be disabled by default, especially if it means uncommenting 2 far apart sections of code in a hard for newbies-to-grok file. If we are not going to enable it by default this could live just as happily in a documentation section for high traffic domains.

Some thoughts:
- This opens up the possibility of a contrib module to save static versions of the error pages as 404.whatever (a bit like boost module does for regular pages), or introduce some lightweight way of logging these errors (404-handler.php) with a minimal change to the rewrite rules.

AttachmentSizeStatusTest resultOperations
76824.patch4.5 KBIdleFailed: Failed to apply patch.View details | Re-test

#103

Arancaytar - November 18, 2008 - 02:09

this could live just as happily in a documentation section for high traffic domains.

+1, though leaving it in as a comment will make it far easier to enable - just tell them to remove the comment on the documentation page.

Again: This change will speed things up for some people and break the sites of everyone using private file downloads, since you're not even using files/ or system/ exceptions, as well as anyone generating dynamic images with extensions, which can't be fixed with any additional special exception. Is it too much to ask not to break people's site by default?

#104

Owen Barton - November 18, 2008 - 13:54

@Arancaytar - it has been stated multiple times in the thread that this patch does not break imagecache or any similar modules and has been used successfully on large production sites that also use imagecache. If you read my post you would also note that this should now work with private file downloads. If you would like to find something that it does break, then please by all means try the patch and report back, but repeating "this breaks imagecache" over and over is neither accurate nor helpful :)

#105

System Message - November 29, 2008 - 07:00
Status:needs review» needs work

The last submitted patch failed testing.

#106

Owen Barton - November 30, 2008 - 23:09

It occurred to me that this would be a great candidate for some tests (which would help settle the what this does/does not break debates!) - any takers?

#107

m3avrck - December 11, 2008 - 02:01

The issue I have with this patch is treats 404s for files differently than 404s for something in Drupal's menu system.

IMO these should be treated the same. The attached patch is how we handle on MothersClick.com we a single configurable file that both Apache and Drupal can use.

While not core worthy, hopefully this sparks proper direction for this patch.

AttachmentSizeStatusTest resultOperations
patch12.patch2.29 KBIdleFailed: Failed to apply patch.View details | Re-test

#108

moshe weitzman - December 23, 2008 - 20:13

What a cursed issue we have here.

#109

sun - January 31, 2009 - 09:20

Patch in #102 should not throw a form validation error. Instead, just use a note for the form element description, so site admins who need to use a path without "files" do not have to hack core.

Patch #107 looks interesting. Just an idea: Could we use a separate 404.php in the root directory, which performs a minimal bootstrap, checks the configured file directory path for the site in question, and short-circuits to a (optionally configurable) 404 output - or - if Drupal is supposed to handle the request, performs a full bootstrap?

#110

kbahey - June 13, 2009 - 21:03
Status:needs work» needs review

I hate that D7 would be released still handling all 404s via an expensive bootstrap. This is a big performance suck.

The attached patch solves this in two ways:

- Adds rules to .htaccess to bypass 404s for certain files.
- Adds a safeguard in settings.php to stop very early if we are handling a static file.

Please please let us have this in D7. We can refine later if needed, but let us ship with something that prevents Drupal from handling static files.

AttachmentSizeStatusTest resultOperations
76824-110.patch6.01 KBIdleFailed: 11098 passes, 18 fails, 0 exceptionsView details | Re-test

#111

System Message - June 13, 2009 - 21:20
Status:needs review» needs work

The last submitted patch failed testing.

#112

chx - June 14, 2009 - 02:04

You want to write a test module which defines a callback in files, say 'test.gif' and check the return of it -- stuff a random string in a variable in the test and print variable_get in the test module.

#113

Owen Barton - June 14, 2009 - 23:54
Status:needs work» needs review

Here is an updated patch, with 3 new assertions to ensure that the web server is passing "fast 404" responses to specific file type requests outside the files directory, and that Drupal can still handle requests within the files directory (system_test module now registers 2 imagecache style handlers for this purpose) for both public and private file serving.

I have also tried to make the comment help as clear as possible, although it is hard to balance clarity against excess verbosity - feedback is welcome.

I also took the settings.php fallback out - I am happy to add this back in if someone can explain why we should. This would only be reached on non-Apache webservers. We generally don't duplicate .htaccess functionality in php (especially for "optional" functionality such as this) - for example the www/no-www redirection could easily be done in settings.php, but is not. I think it is reasonable to request that people set up equivalent functionality in their non-Apache webserver configurations, and/or add code to their settings.php to do the same - we have places in the manual to document these.

#114

Owen Barton - June 14, 2009 - 23:55
AttachmentSizeStatusTest resultOperations
76824-113.patch8.87 KBIdleFailed: 11124 passes, 1 fail, 0 exceptionsView details | Re-test

#115

System Message - June 15, 2009 - 00:10
Status:needs review» needs work

The last submitted patch failed testing.

#116

Owen Barton - June 15, 2009 - 01:17

Hmm, I can't reproduce the test fail locally - anyone else care to test?

GET http://default.head/s120547Pa34UmLhrr.ico returned 404 (139 bytes). Browser system.test 613 PageNotFoundTestCase->testPageNotFound()
Valid HTML found on "http://default.head/s120547Pa34UmLhrr.ico" Browser system.test 613 PageNotFoundTestCase->testPageNotFound()
404s for specific (fast 404) file types outside the files directory are returning without Drupal bootstrap. Other system.test 614 PageNotFoundTestCase->testPageNotFound()
GET http://default.head/sites/default/files/simpletest120547/system-test/404... returned 200 (38 bytes). Browser system.test 615 PageNotFoundTestCase->testPageNotFound()
Valid HTML found on "http://default.head/sites/default/files/simpletest120547/system-test/404-handling.jpg" Browser system.test 615 PageNotFoundTestCase->testPageNotFound()
Drupal is handling requests for specific (fast 404) filetypes when within a public files directory. Other system.test 616 PageNotFoundTestCase->testPageNotFound()
GET http://default.head/system/files/system-test/404-handling.jpg returned 200 (38 bytes). Browser system.test 618 PageNotFoundTestCase->testPageNotFound()
Valid HTML found on "http://default.head/system/files/system-test/404-handling.jpg" Browser system.test 618 PageNotFoundTestCase->testPageNotFound()
Drupal is handling requests for specific (fast 404) filetypes when within a private files directory. Other system.test 619 PageNotFoundTestCase->testPageNotFound()

#117

chx - June 15, 2009 - 04:05

Owen, I do not get this, the rewrite rule changes the request to 404.gif or similar? Or what is the 404.%1 wanted to be? And, why wont the filesmatch grab your request, even in the files dir before any rewrite?

#118

Owen Barton - June 15, 2009 - 05:42

The rewrite rule simply prevents the request from being rewritten and passed to Drupal (if the file does not exist, the file is outside */files/* and has a specified extension). This causes Apache to 404 the request and look up the appropriate ErrorDocument for the request, which has been previously set for files with this extension. The filesmatch does not do anything at all unless Apache looks up an ErrorDocument for the request.

There perhaps could be problems with this if clean URLs are disabled - kbahey, did you check this at all?

#119

killes@www.drop.org - June 15, 2009 - 12:21

I am wondering if this is a good idea. While it might be good for performance, you will never notice that some files are missing if you do nto additionally look into the apache log.

#120

Morbus Iff - June 15, 2009 - 12:37

Yeah, honestly, I'm not hugely fond of this functionality - it's pretty hackish, and severly cripple's Drupal's own log functionality (which, over the many years, people have probably come to expect to be a literal equivalent to server logfiles). I don't doubt that heavy 404s can cause performance issues - I just don't feel this is the answer.

#121

catch - June 15, 2009 - 12:45

I'm wondering if we should just do either extension matching or file directory matching in drupal_not_found() - so log the 404 for site admins, but then return straight away if it's a file rather than a page. That'd save all the page building and rendering for the 404, but still give us logs.

#122

moshe weitzman - June 15, 2009 - 13:41

It simply is not Drupal's job to catch all possible requests under its tree and log them. I like this patch a lot. catch's idea still requires a bootstrap which is a big problem. some contrib module can implement an apache log viewer if people need to view these 404's inside of drupal. or make a contrib module which lets you add a simple link to awstats/webalizer the reports block. web hosts could even make their own distro with that link pre-configured.

#123

killes@www.drop.org - June 15, 2009 - 14:36

It simply is not Drupal's job to catch all possible requests under its tree and log them.

It has done so for years and considering everything this behaviour has caused very little problems.

I like this patch a lot. catch's idea still requires a bootstrap which is a big problem. some contrib module can implement an apache log viewer if people need to view these 404's inside of drupal. or make a contrib module which lets you add a simple link to awstats/webalizer the reports block. web hosts could even make their own distro with that link pre-configured.

I think it would be bad user experience to change this behaviour.

Catch's idea sounds interesting, but I am not sure how it could work.

Summary: I'd won't fix this if I were a core committer.

#124

Morbus Iff - June 15, 2009 - 14:46

I don't like, at all, how the patch redirects the user to 404.$fileextension. This is *really* damaging - that's the actual entry that will get logged into the server log file, meaning that we are *crippling* a traditional error_log by filling it with garbage. How can I fix my broken links or embeds (etc.)., if Drupal is filling my log with "404.jpg" and "404.html"? This'll corrupt not just the error_log, but also anything that works on error_logs, such as Webalizer and so forth.

#125

Morbus Iff - June 15, 2009 - 14:56

To amend my previous comment, here's what the error log looks like with this patch:

[Mon Jun 15 10:52:35 2009] [error] [client 75.67.50.34] File does not exist: /var/www/vhosts/disobey.com/httpdocs/404.jpg
[Mon Jun 15 10:52:37 2009] [error] [client 75.67.50.34] File does not exist: /var/www/vhosts/disobey.com/httpdocs/404.ico, referer: http://www.disobey.com/ahem.jpg

The first entry is the rewritten file - I had requested ahem.jpg. The second entry is a modern browser's standard attempt to get the favicon.ico (which, naturally, we can't really tell here, since it's been rewritten).

Huge -1.

#126

catch - June 15, 2009 - 15:22

Also has anyone tested this with imagecache?

#127

Morbus Iff - June 15, 2009 - 15:32

catch: assuming that the Drupal 'files' directory has been set to 'files', then imagecache would be fine. If the 'files' directory was named something not 'files', then imagecache, private downloads, and other dynamic generations would fail. They could be made accessible again only if the .htaccess and Drupal source code was modified (fun, right?).

#128

catch - June 15, 2009 - 15:35

Should've read the patch again, thanks though Morbus.

#129

moshe weitzman - June 15, 2009 - 16:44

@Morbus - could you recommend an approach then? Right now, a site can nearly crumble if a few bad invalid images get posted to the home page (for example)

#130

catch - June 15, 2009 - 16:54

Just a note I think we could do my approach via #444756: Add a page_not_found() hook to Drupal with a few modifications.

#131

Morbus Iff - June 15, 2009 - 17:07

@moshe: I don't have a suggestion you'd like, no - I don't think it can be cleanly fixed at the .htaccess level (i.e., meaning we'd need to do /some/ sort of bootstrap; ala @catch's point). However, I also disagree entirely with your assertion that "a site can nearly crumble with a few invalid images" - that's ridiculous: as someone who has been running the same site for more than 10 years, and gets dozens of 404s every minute, (historical, rootkitish, and accidental, etc.) it's a bit FUDdish. To think that Drupal could get to where it is now, with such a easy 404 DoS, is inane.

I will agree, however, that certain shared hosts can have such a problem with a popular site and sliced resources. I just don't think, however, it's so incredibly widespread that Drupal sites everywhere are being crippled because of it. Do I think something should be done about it? Sure, I have no problem with that. Do I think it's critical? Nope. Do I think it's critical enough to a) strip an existing expectation from Drupal (404s in the reports) and b) break the equivalent functionality at the webserver/log level? Most definitely not.

#132

killes@www.drop.org - June 15, 2009 - 17:12

Also, putting the notice about the missing images into the watchdog will allow site maintainers to create the missign images or to remove the broken node/template/whatever.

This issue is pure sillyness.

#133

moshe weitzman - June 15, 2009 - 17:15

@catch - in this issue, we want to handle 404 without a bootstrap. really we want to handle a 404 without any php at all but definately no bootstrap. a 404 hook completely the wrong direction IMO. a 404 hook for the files directory might be fine and can be pursued in that issue.

#134

catch - June 15, 2009 - 17:16

Also we haven't been doing a full bootstrap for favicon.ico 404s since 28th June 2008, so this patch only fixes image files which aren't favicon.ico and aren't in sites/*/files. I think with something in drupal_not_found() we could skip rendering the page for files in sites/*/files as well.

#135

gpk - June 15, 2009 - 17:20

>skip rendering the page
If page caching is on then the response will be from the page cache anyway, so there will only be a full bootstrap/rendering of the page if the request hasn't already been cached.

#136

killes@www.drop.org - June 15, 2009 - 17:25

cool, so I need to check if favicon.ico exists myself? What a truly great idea. I'd expect something like this from WP or Joomla, but not from Drupal...

#137

Owen Barton - June 15, 2009 - 17:30

Comment 127 is not correct - the file directory path does not need to be set to "files", but simply needs to include the string "files" somewhere within it. There is a validation on the setting in the patch.

I agree that the current patch breaks Apache logging (404.ext) in an unacceptable way and is not committable. Possibly there is a way to fix this - I tried briefly to no avail. I think I am also leaning towards a php based solution, ideally with optional Drupal logging (and hence DB bootstrap). Either way, I DO think we need an improvement here - I have seen several site outages caused by 404s when assets on a single browser page view creates many Drupal hits, or the page has repeating AJAX callbacks that are 404ing.

Either way, I think 2 of the 3 tests I created are still worthwhile, since they ensure the imagecache style facility is functional (which I think could pick up some subtle breakages in the file path/url functions).

#138

Morbus Iff - June 15, 2009 - 17:37

Owen - the RewriteCond specifically looks for "/files/" (unbounded), but the validation code only looks for "files". This would allow me to use "drupal_files" inside Drupal, and pass the validation, but lose the "fast" 404s.

#139

killes@www.drop.org - June 15, 2009 - 17:50

I actually like to translate the default files path to use "dateien"...

If your system is prone to meltdown because of a few missing files I tend to think that this is not a Drupal problem per se.

You should evaluate your hardware and possibly buy more.

You also should test your site on a staging system to catch such 404s in advance.

#140

Owen Barton - June 15, 2009 - 17:51

@Morbus - correct, this was a bug - however the symptoms would be broken imagecache, rather than lost fast 404s.

Attached patch fixes this, and also fixes the issue with the tests failing when mod_rewrite is not enabled. It doesn't fix the Apache 404.ext logging issue however.

#141

Owen Barton - June 15, 2009 - 17:59

Sigh

AttachmentSizeStatusTest resultOperations
76824-4.patch8.97 KBIgnoredNoneNone

#142

moshe weitzman - June 15, 2009 - 18:14

Well, 5 broken images on your home page can be a nearly a 5x multiplier on your traffic. It leads to 5x on the number of bootstraps that your web and DB servers have to service.

#143

killes@www.drop.org - June 17, 2009 - 10:38

Excellent reason to just fix the stupid images. Thanks to Drupal (as it is now) you'll at least be alerted to the problem.

#144

Owen Barton - June 18, 2009 - 18:22

You won't be alerted to the problem if the site is down due to 5x traffic

#145

Morbus Iff - June 18, 2009 - 20:10

You can't solve the issue if your error log shows 5 entries for 404.jpg.

(See, we can do snarky replies too!) ;)

#146

Owen Barton - June 18, 2009 - 21:21

Snarky replies aside, I think everyone agrees that logging 404.ext for missing files is not acceptable.

What we need is either an alternate .htaccess approach (that logs 404s correctly, and works acceptably both with mod_rewrite enabled and disabled) - OR a PHP based approach that is sufficiently faster than a cached bootstrap to make it worth doing.

#147

m3avrck - July 17, 2009 - 20:58

FWIW, I do this on my sites:

<?php
/**
* Generates a 404 error if the request can not be handled.
*/
function drupal_not_found() {
 
drupal_set_header('HTTP/1.1 404 Not Found');

 
watchdog('page not found', check_plain($_GET['q']), NULL, WATCHDOG_WARNING);

  global
$conf;
  include_once
$conf['404_file'];
  exit;
}
?>

Than I usually set the path to sites/whatever/theme/whatever/404.php and include my logic there.

This is in addition to setting the htaccess to point to this same 404 file as well (and having similar htaccess work).

 
 

Drupal is a registered trademark of Dries Buytaert.