| Project: | Boost |
| Version: | 4.7.x-1.x-dev |
| Component: | Caching logic |
| Category: | support request |
| Priority: | normal |
| Assigned: | Arto |
| Status: | closed (fixed) |
Issue Summary
There are thousands and thousands of files under the cache/*/0/node folder, so the 'node' folder is not very efficient. All files are kept on remote file servers rather than on the web server, so every time a folder is accessed, every single file has to be accessed over the remote LAN connection.
I think it would help to divide those files into many sub-folders, with only about 100 files in each sub-folder, to speed up access to the site.
For example:
use path like cache/*/0/node/1/2/3/4/5/6/7.html to save cached files instead of cache/*/0/node/1234567.html
But I don't know how to modify the RewriteCond and RewriteRule in the .htaccess file to do that.
My website has been stopped by DreamHost because it is overloading and slowing down the whole web server.
DreamHost told me: Anything you can do to make smaller folders would speed up your sites access.
Please help me out. Thank you very much.
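The one-digit-per-folder layout proposed above can be sketched as a small shell function. This is only an illustration of the path scheme, not part of Boost; the function name digit_path is made up for this example.

```shell
# Hypothetical helper (not part of Boost): turn a numeric node ID into the
# one-digit-per-folder path proposed above, e.g. 1234567 -> 1/2/3/4/5/6/7.html
digit_path() {
  # Put a slash after every character, then swap the trailing slash for ".html"
  printf '%s\n' "$1" | sed -e 's|.|&/|g' -e 's|/$|.html|'
}

digit_path 1234567   # prints 1/2/3/4/5/6/7.html
```

With this layout no folder holds more than ten sub-folders and ten files, at the cost of one directory level per digit.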
Comments
#1
Hmm, interesting problem. Your proposed solution is reasonable, but I don't think mod_rewrite will allow it, and I can't immediately think of a workable alternate workaround. Suggestions welcome.
#2
I finally figured out a way to solve this problem.
First, add the following lines to your .htaccess file to redirect requests to the sub-folders.
#RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789][0123456789][0123456789]+
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html -f
RewriteRule ^node/([0123456789][0123456789])([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html [L]
#RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789]+
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$1$2.html -f
RewriteRule ^node/([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$1$2.html [L]
Then I set up a cron job to run the following bash script every 10 minutes. The script moves cached files from 'node' into the corresponding sub-folders.
#!/bin/bash
### Stop if the cache folder is missing, so no files are moved from the wrong place
cd /home/wbj123/wbj123.com/cache/wbj123.com/0/node || exit 1
for file in ./*.html
do
  ### Skip the unexpanded pattern when there are no .html files
  [ -f "$file" ] || continue
  ### now $file is something like: ./filename.html
  ### To let fileName = filename.html
  fileName=${file##*/}
  ### Get length of file name
  length=${#fileName}
  length=$((length - 5))
  ### now, length contains the count of characters in main file name
  ### for a filename with 3 or more characters, for example: 123.html
  if [ "$length" -ge 3 ]
  then
    ### get folder name for level 1, for example: 12
    level1FolderName=${fileName:0:2}
    mkdir -p "$level1FolderName"
    mv --target-directory="$level1FolderName" "$fileName"
    ### for a filename with 5 or more characters, for example: 123456.html
    if [ "$length" -ge 5 ]
    then
      ### get folder name for level 2, for example: 34
      level2FolderName=${fileName:2:2}
      mkdir -p "$level1FolderName/$level2FolderName"
      mv --target-directory="$level1FolderName/$level2FolderName" "$level1FolderName/$fileName"
    fi
  fi
done
I have tested this solution on my website wbj123.com, and it works well.
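To check that the rewrite rules and the cron script agree on where a file ends up, the mapping can be written down as a small function. This is a sketch based on my reading of the script above; the name shard_path is made up for this example.

```shell
# Hypothetical helper mirroring the layout used by the script above:
#   names of 1-2 characters stay in place,
#   names of 3-4 characters get one folder level,
#   names of 5 or more characters get two folder levels.
shard_path() {
  name=${1%.html}                       # strip the extension, e.g. 12345
  len=${#name}
  l1=$(printf '%s' "$name" | cut -c1-2) # first two characters, e.g. 12
  l2=$(printf '%s' "$name" | cut -c3-4) # next two characters, e.g. 34
  if [ "$len" -ge 5 ]; then
    printf '%s/%s/%s.html\n' "$l1" "$l2" "$name"
  elif [ "$len" -ge 3 ]; then
    printf '%s/%s.html\n' "$l1" "$name"
  else
    printf '%s.html\n' "$name"
  fi
}

shard_path 12345.html   # prints 12/34/12345.html
shard_path 123.html     # prints 12/123.html
```

The first output matches the target of the first RewriteRule (node/$1/$2/$1$2$3.html) and the second matches the target of the second RewriteRule (node/$1/$1$2.html).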
#3
Has this solution been added to a new release or not? I have the same problem on my server.
Thanks for this great module
#4
No, the release does not have any new features per se. I still need to review your solution.
#5
OK, I've tested this.
I added the .htaccess bits to the end of the boost part of the .htaccess file. The rewrite part now looks like this:
</IfModule>
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/index.html -f
RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/index.html [L]
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{REQUEST_URI} !^/cache
RewriteCond %{REQUEST_URI} !^/user/login
RewriteCond %{REQUEST_URI} !^/admin
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI} -d
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}/index.html -f
RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1/index.html [L]
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{REQUEST_URI} !^/cache
RewriteCond %{REQUEST_URI} !^/user/login
RewriteCond %{REQUEST_URI} !^/admin
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}.html -f
RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1.html [L]
#RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789][0123456789][0123456789]+
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html -f
RewriteRule ^node/([0123456789][0123456789])([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html [L]
#RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789]+
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$1$2.html -f
RewriteRule ^node/([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$1$2.html [L]
# BOOST END
The bash script works as advertised and moves pages from /node/ into subfolders.
However, after I've run the bash script, /node/12345.html is moved to /node/12/34/12345.html, and whenever I look at a page as user 0 (= anonymous), a new page gets generated under cache/$server/0/node/12345.html.
That's not what I'd call caching ... @bingjiw, what else did you do to get this to work?
Thanks!
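One way to see the regeneration described above is to list the files in the flat 'node' folder that already have a copy in a subfolder. This is a rough diagnostic sketch, assuming the two-level layout from #2; the function name list_regenerated and the folder argument are made up for this example.

```shell
# Hypothetical diagnostic: print every NAME.html in the flat folder that
# also exists in its subfolder (12/34/12345.html or 12/123.html).
list_regenerated() {
  dir=$1
  for f in "$dir"/*.html; do
    [ -f "$f" ] || continue             # no .html files at all
    name=$(basename "$f" .html)
    len=${#name}
    l1=$(printf '%s' "$name" | cut -c1-2)
    l2=$(printf '%s' "$name" | cut -c3-4)
    if [ "$len" -ge 5 ]; then
      moved="$dir/$l1/$l2/$name.html"
    elif [ "$len" -ge 3 ]; then
      moved="$dir/$l1/$name.html"
    else
      continue                          # short names are never moved
    fi
    # A file in both places was generated again after the last move
    if [ -f "$moved" ]; then
      printf '%s\n' "$name.html"
    fi
  done
}
```

It would be called with the node cache folder as its only argument, for example list_regenerated cache/example.com/0/node (the host name here is a placeholder).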
#6
Yes. So you have to run that bash script often. For my website, I have set it up in cron to run every ten minutes; that's my solution.
#7
The problem is that this defeats the whole idea of caching, and thus is a non-solution to this particular problem.
Have you set the expiry for your files to 10 minutes, too? Personally, I think the longer the better, so I'd love to have a week in there ... 1 day max isn't all that much, for a mostly static site.
#8
You cannot expect a real solution without modifying the core code of this module. The 12/34/12345.html scheme needs to be written into the module to solve this problem at the root. For now, my "solution" avoids the heavy load on the web server. That's it.
#9
Moved to #410730: System limits: Number of files in a single directory. We are no longer dealing with the node folder, and running a separate cron doesn't sound like an ideal solution.