
Too many files in 'node' folder slow down the web server

Project: Boost
Version: 4.7.x-1.x-dev
Component: Caching logic
Category: support request
Priority: normal
Assigned: Arto
Status: closed (fixed)

Issue Summary

There are thousands and thousands of files under the cache/*/0/node folder, so the 'node' folder is not very efficient. All files are kept on remote file servers, not on the web server, which means that every time the folder is accessed, every single file has to be accessed over the remote LAN connection.
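
To see how large a folder has actually become, a count along these lines can help. This is only a minimal sketch: count_files is a hypothetical helper name, the path in the usage comment is a placeholder, and it assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
#!/usr/bin/env bash
# Count the regular files directly inside a directory, ignoring sub-folders.
# Usage: count_files DIR
count_files() {
  find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Placeholder path; replace it with the real cache folder:
# count_files /path/to/cache/example.com/0/node
```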

I think it would help to divide those files among many sub-folders, with only 100 files in each sub-folder, to speed up site access.

For example:
Use a path like cache/*/0/node/1/2/3/4/5/6/7.html to save cached files instead of cache/*/0/node/1234567.html.
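
The mapping in this example can be sketched in bash as follows. digit_path is a hypothetical helper name, and the sketch assumes the node ID is a plain string of digits.

```shell
#!/usr/bin/env bash
# Map a node ID to the one-digit-per-level layout from the example:
# every digit except the last becomes a folder, the last one names the file.
# Usage: digit_path 1234567   (gives 1/2/3/4/5/6/7.html)
digit_path() {
  local id=$1 path="" i
  for (( i = 0; i < ${#id} - 1; i++ )); do
    path+="${id:i:1}/"
  done
  printf '%s%s.html\n' "$path" "${id: -1}"
}
```

With one digit per level, each folder holds at most ten files and ten sub-folders. Using two digits per level instead would give folders of up to 100 files.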

But I don't know how to modify the RewriteCond and RewriteRule directives in the .htaccess file to do that.

DreamHost has suspended my website because it is overloading and slowing down the whole web server.

DreamHost told me that anything I can do to make the folders smaller would speed up site access.

Please help me out. Thank you very much.

Comments

#1

Title: too many files in 'node' folder to slow down the web server » Too many files in 'node' folder slow down the web server
Assigned to: Anonymous » Arto

Hmm, interesting problem. Your proposed solution is reasonable, but I don't think mod_rewrite will allow it, and I can't immediately think of a workable alternate workaround. Suggestions welcome.

#2

I finally figured out a way to solve this problem.

First, add the following lines to your .htaccess file to redirect requests to the sub-folders.

  #RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789][0123456789][0123456789]+
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html -f
  RewriteRule ^node/([0123456789][0123456789])([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html [L]

  #RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789]+
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$1$2.html -f
  RewriteRule ^node/([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$1$2.html [L]

Then I set up a cron job to run the following bash script every 10 minutes. The script moves cached files from 'node' into the corresponding sub-folders.

### Stop if the cache folder is missing, so nothing is moved elsewhere
cd /home/wbj123/wbj123.com/cache/wbj123.com/0/node || exit 1

for file in ./*.html
do
  ### Skip the unexpanded pattern when there are no .html files
  [ -f "$file" ] || continue

  ### $file looks like ./filename.html; strip the leading ./
  fileName=${file##*/}

  ### Length of the name without the 5-character ".html" suffix
  length=${#fileName}
  length=$((length - 5))

  ### Names with 3 or more characters, for example: 123.html
  if [ "$length" -ge 3 ]
  then
    ### Level 1 folder is the first two characters, for example: 12
    level1FolderName=${fileName:0:2}

    mkdir -p "$level1FolderName"
    mv --target-directory="$level1FolderName" "$fileName"

    ### Names with 5 or more characters, for example: 12345.html
    if [ "$length" -ge 5 ]
    then
      ### Level 2 folder is the next two characters, for example: 34
      level2FolderName=${fileName:2:2}

      mkdir -p "$level1FolderName/$level2FolderName"
      mv --target-directory="$level1FolderName/$level2FolderName" "$level1FolderName/$fileName"
    fi
  fi
done

I have tested this solution on my website wbj123.com, and it works well.
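
To check where a given node's cached file should end up, the layout shared by the rewrite rules and the script in this comment can be mirrored in a small function. This is a sketch only: sharded_path is a hypothetical helper name, and the path is relative to the cache/<server name>/0 folder.

```shell
#!/usr/bin/env bash
# Mirror the layout used by the two rewrite rules and the move script:
#   5 or more characters -> node/12/34/12345.html
#   3 or 4 characters    -> node/12/123.html
#   fewer                -> node/12.html (stays in the flat folder)
# Usage: sharded_path 12345
sharded_path() {
  local id=$1
  if [ "${#id}" -ge 5 ]; then
    printf 'node/%s/%s/%s.html\n' "${id:0:2}" "${id:2:2}" "$id"
  elif [ "${#id}" -ge 3 ]; then
    printf 'node/%s/%s.html\n' "${id:0:2}" "$id"
  else
    printf 'node/%s.html\n' "$id"
  fi
}
```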

#3

Has this solution been added to the new release? I have the same problem on my server.

Thanks for this great module

#4

No, the release does not have any new features per se. I still need to review your solution.

#5

OK, I've tested this.

I added the .htaccess bits to the end of the boost part of the .htaccess file. The rewrite part now looks like this:

</IfModule>
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{REQUEST_URI} ^/$
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/index.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/index.html [L]
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{REQUEST_URI} !^/cache
  RewriteCond %{REQUEST_URI} !^/user/login
  RewriteCond %{REQUEST_URI} !^/admin
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI} -d
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}/index.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1/index.html [L]
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{REQUEST_URI} !^/cache
  RewriteCond %{REQUEST_URI} !^/user/login
  RewriteCond %{REQUEST_URI} !^/admin
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1.html [L]
  #RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789][0123456789][0123456789]+
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html -f
  RewriteRule ^node/([0123456789][0123456789])([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html [L]
  #RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789]+
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$1$2.html -f
  RewriteRule ^node/([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$1$2.html [L]
  # BOOST END

The bash script works as advertised, and moves pages from /node/ into subfolders.

However, after I've run the bash script, /node/12345.html is moved to /node/12/34/12345.html, and whenever I look at a page as user 0 (= anonymous), a new page gets generated under cache/$server/0/node/12345.html.
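
One way to confirm the regeneration is to list the files that exist both in the flat folder and in a sub-folder. This is a rough sketch: list_duplicates is a hypothetical helper name, and it assumes the two-characters-per-level layout produced by the bash script from #2.

```shell
#!/usr/bin/env bash
# Print the names of cache files that are present in the flat folder
# and also in the sub-folder where the move script would have put them.
# Usage: list_duplicates NODE_DIR
list_duplicates() {
  local dir=$1 f name
  for f in "$dir"/*.html; do
    # Skip the unexpanded pattern when the flat folder has no .html files.
    [ -f "$f" ] || continue
    name=${f##*/}
    if [ -f "$dir/${name:0:2}/$name" ] ||
       [ -f "$dir/${name:0:2}/${name:2:2}/$name" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```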

That's not what I'd call caching ... @bingjiw, what else did you do, to get this to work?

Thanks!

#6

Yes, so you have to run that bash script often. For my website, I have set cron to run it every ten minutes. That's my solution.
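
For reference, a ten-minute schedule could be expressed roughly as below. This is a sketch under assumptions: cron_line is a hypothetical helper, the script path in the comment is a placeholder, and the */10 step syntax assumes a Vixie-style cron as found on most Linux hosts.

```shell
#!/usr/bin/env bash
# Build the crontab line that runs a script every ten minutes.
# Usage: cron_line /path/to/script.sh
cron_line() {
  printf '*/10 * * * * /bin/bash %s\n' "$1"
}

# To install it (placeholder script path; appends to the current user's crontab):
# ( crontab -l 2>/dev/null; cron_line "$HOME/move_node_cache.sh" ) | crontab -
```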

#7

The problem is that this defeats the whole idea of caching, and thus is a non-solution to this particular problem.

Have you set the expiry for your files to 10 minutes, too? Personally, I think the longer the better, so I'd love to have a week in there ... 1 day max isn't all that much, for a mostly static site.

#8

You cannot expect a real solution without modifying the core code of this module. The 12/34/12345.html idea needs to be written into the module to solve this problem at the root. For now, my "solution" avoids the heavy load on the web server. That's it.

#9

Component: Code » Caching logic
Priority: critical » normal
Status: active » closed (fixed)

Moved to #410730: System limits: Number of files in a single directory. We are no longer dealing with the node folder, and running a separate cron job doesn't sound like an ideal solution.
