As with everything, a work in progress. Just a sort of bug report, not a call to arms.. lol
Is anyone else having issues with RC4? There isn't anything in the queue, so is it just me? Great.
Am I jumping in too quickly to install new things? Hmmm, maybe it's time to crawl back into my cave.

In the process of several back-steps on different sites, I did get a bit confused at times, so I hope I don't leave anything out. I get busy changing things to make something work and don't document the details as well as I should.

Installed RC4 on 5 sites across 2 different servers.
4 sites on shared host Bizland: normal install in root, single-domain setup, slow database response times.
1 site on shared/managed VPS HotDrupal, in a sub-folder (not a typical setup, although Drupal is a normal full install with its own database). Servers are fast and database response is very good.

The shared host (Apache on Debian, MySQL 5.0.45, PHP 5.2.2)
gave me 404s, but did not display my custom Drupal 404 page.

Changing this in .htaccess solved the 404s (the trailing empty alternative in the original pattern made it match every file):
- <FilesMatch "(\.html\.gz|\.html|\.xml\.gz|\.xml|)$">
+ <FilesMatch "(\.html\.gz|\.html)$">
Changing .htaccess just made the sites viewable again; the crawler is still not starting, but Boost will cache pages after an anonymous visit.
...These sites are currently running RC4, but may change one back to Aug31-dev to regain precache...

The vps Apache/2.2.11 (Unix), MySQL 5.0.75, PHP 5.2.9
worked with no changes to RC4, but there were errors and the crawler did not precache pages.

Due to my setup, the .htaccess rules use the actual filesystem path for the document root and the server_name to make Boost serve cached pages.

# NORMAL
  RewriteCond /home/buckeyel/public_html/daveb/cache/www.davebeall.com%{REQUEST_URI}_%{QUERY_STRING}\.html -s
  RewriteRule .* cache/www.davebeall.com%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html]
  RewriteCond
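For comparison, here is a sketch of what Boost's stock cache-serving rules look like for a root install (written from memory, not copied from the module, so the cookie name and exact conditions are assumptions; the hardcoded /home/... path above stands in for %{DOCUMENT_ROOT} because this site lives in a sub-folder):

```apache
# Sketch only -- typical Boost cache-serving rules for a root install.
# Serve the cached copy for non-POST requests from anonymous users
# (no DRUPAL_UID cookie) when a non-empty cached file exists (-s test).
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html -s
RewriteRule .* cache/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html]
```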

Error or notice in Firebug:
new watch expression

Error on each page in Firebug:
syntax error

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML...3.org/TR/XHTML/strict.dtd">\n
send-message (Line 1)

(A DOCTYPE showing up as a script syntax error usually means an HTML document was returned where JavaScript was expected.)

It did serve cached pages after an anonymous visit to a page.
To stop the errors, I changed the VPS site back to the Boost Aug31-dev build, and it works as expected: serves boosted pages and the crawler precaches pages. I would call it normal.

@mikeytown2, maybe the best thing to do is wait for some others to chime in here. This may be several issues, may need additional information, and maybe I just have weird servers. You work like crazy and I don't like clogging up your day. My sites are not at issue here due to scope and size.

Comments

dbeall’s picture

Hold on a second... I have been messing with this for a few hours, both before and after submitting the bug.
I have done something that made one of the shared-host sites start precaching. Now, if I can just figure out what I did. Or maybe I have a site visitor hitting pages.

Froggie-2’s picture

@dbeall
Visit your admin > top visitors section and check whether it is your IP or a visitor's IP, to be sure who is causing your pages to cache; that is, your server or your visitor.

dbeall’s picture

The installs were done 24 hours ago.
Shared Host:
I remember now what I did a few hours ago: I changed poormanscron to 4 hours and raised the crawl time to 10000000 microseconds, i.e. 10 seconds between URLs.

@Froggie, I had a look. It shows my host IP with the most hits. The DB log is showing notices of 'crawler already running'. Recent hits shows what looks to be the crawler, page after page as 'anonymous' (didn't check the IPs).
cache/ looks like it holds the precached files, but some are missing (which triggered my question about a visitor hitting pages). Maybe it's the slow database and I need a longer crawl delay (makes me wonder what planet that MySQL server is on).
I will be watching it.

mikeytown2’s picture

@dbeall
Odds are if you're having issues with it, someone else will as well. As for the old version working and this one not... that's very strange. Here's the commit log, and nothing groundbreaking:
http://drupal.org/project/cvs/89309

The only thing that stands out from that list is this patch, which I did mainly for people running servers on Nginx/Lighttpd:
http://drupal.org/node/547944#comment-1987746

mikeytown2’s picture

Thanks for pointing out the error with the htaccess rules.
#567690: typo in htacceess rules

mikeytown2’s picture

Status: Active » Needs review
File (new): 2.38 KB

Patch that should make the crawler work better with APC.
See http://drupal.org/node/244072#comment-2003750

mikeytown2’s picture

Status: Needs review » Active

committed

mikeytown2’s picture

@dbeall
Use only a single thread if you want to limit CPU usage; and if it's still too much, then add a delay of 0.3 seconds or so. 10 seconds is way too long, IMHO.

mikeytown2’s picture

"Preemptive Cache" and "Crawl All URL's in the url_alias table" are in their default unchecked state. Is there a way to disable the boost crawler? - brianmercer

Disable the 'Preemptive Cache: Crawl URL's with the boost block setting to yes' checkbox... the latest commit should make that clearer.

dbeall’s picture

File (new): 41.55 KB

Agreed, 10 seconds is way too long. I was trying many things, have since reset these now that the testing time has passed, and have set 1 thread on the 2 smaller sites. I am sure this is a user-settings issue and I will keep after it until it's working.
When you're not the wizard writing the code, it just takes longer to figure out what's happening (trial and error). (I hate being stupid.) The good part is that when you finally figure things out, helping others get things working is easy (been there, done that). And I am committed to helping people when I know how to direct them.

(Newbie question) I see this has php.ini things in it. Hmmm. I have set some PHP variables on my server to allow larger uploads and longer timeouts (my shared host does have some good features). Does this actually change those settings in my server's php.ini?
Have started notes for the performance pages, but they're not complete yet: http://drupal.org/node/565796
Will apply the patch now.

dbeall’s picture

I have gone to CVS and added all available patches from Sep. 4th to RC4. http://drupal.org/project/cvs/89309
Steps.
Shared Host, 1 site, normal install, single domain in root:
Disabled RC4, uninstalled, cleared all caches, ran cron, ran updates (just for fun), removed RC4, removed the /cache directory and all its contents.

Uploaded and installed RC4 (with all patches from CVS Sep. 4th), enabled basic settings:
(do not cache) user/* & users/*, set crawler time to 20000, 1-hour page expire (default), enabled all cache settings (core and Boost), (enabled) purge cached pages on cron, "Enable the cron cralwer" (spelling), (enabled) Crawl All URL's in the url_alias table, 2 threads @ 25 each, saved all settings, added the updated .htaccess, ran updates (just for fun), ran cron before logout.

! It's like magic !

I think we might be fine to mark this as fixed. Will continue installing on 4 more sites today.
Will watch as cron runs its schedule (4 hours) for regeneration of cache files.

Thank you, mikeytown2 and Froggie.

mikeytown2’s picture

@dbeall
You could have downloaded the latest dev (goes out every 12 hours, usually around 5am/5pm PST); no need to get it from CVS unless you're playing around ;)

Thanks for checking the spelling, fixed it.

mikeytown2’s picture

ini_set() only changes the setting in memory for the current request; it doesn't actually change the ini file.

dbeall’s picture

mikeytown2, in 12 hours you'll change it again, lol. Besides, I always learn something.
I got the stuff from CVS so I wouldn't miss anything, still missed 2 small patches, and have updated my installs.
Thank you for the php.ini info; now I know, and I will never forget.

2 more shared-host sites, normal Drupal, single domain, installed in root. Fresh install of RC4 with all patches, no issues to report. All is well: preemptive cache working, cron clear and regenerate working as expected.

1 shared-host site, normal Drupal, single domain, installed in a sub-folder: the crawler is not cooperating yet, but I am almost positive it's the .htaccess. Working on that.

mikeytown2’s picture

Status: Active » Needs review
File (new): 6.52 KB

@dbeall, @Froggie, @omega8cc, @brianmercer
Thanks for all your testing; this is groundbreaking stuff here. I've never seen vanilla PHP used as a crawler like this, threading and all.

I did some thrashing of the crawler, trying to put it into very odd states, and I think this latest patch brings some big improvements in terms of stability and bug fixes.

mikeytown2’s picture

scratch the above patch... trying a different way that should be cleaner.

dbeall’s picture

Well, the patch did work.
Maybe a different issue: I use project/faq. The crawler creates the faq folder with the nodes, but does not create the base /faq page, which is the one everybody looks at.

dbeall’s picture

Should have added: the faq thing is not related to this patch.

mikeytown2’s picture

Wow, OK, that wasn't easy... a newer version of the crawler is on CVS.

mikeytown2’s picture

@dbeall
The crawler only hits what's already been boosted or is in the url_alias table. Future plans: #363077: Add spider to crawler - Cache entire site with new install.

dbeall’s picture

CVS, 3 patches: #568122: Various fixes to the crawler. I confirm these 3 patches work as expected.
1 shared-host site, normal Drupal, single domain, installed in root. Fresh install of RC4 with all patches, no issues to report. All is well: preemptive cache working, cron clear and regenerate working as expected.

Froggie-2’s picture

#15: Thanks Mikeytown! I shall test the patch in #15 this coming week, as soon as I return home from an official tour. Thanks for all your hard work!
Best regards

mikeytown2’s picture

@Froggie
Just grab the latest dev, it's been committed.

dbeall’s picture

RC4 with all patches including:
CVS:
#566808: Cleanup css/js code; smaller & more accurate.
#244072: Better support for servers other then apache.
boost-565796.patch /node/565796; Better Explanation of Settings

Fresh install.
Shared Host site, normal Drupal, single domain, installed in root.
Apache Debian, MySQL 5.0.45, PHP 5.2.2
No issues to report. All is well, preemptive cache working.
- Waiting on cron clear and regenerate; will report back if any issues are found -

Off Topic 1: Just found project/css_gzip; this was installed as well. mikeytown2 is a busy wizard!
Off Topic 2: Noticed the .htaccess from the generator is different compared to the one in the included htaccess folder.

mikeytown2’s picture

@dbeall
css_gzip is not needed if you're using Boost; Boost will gzip all CSS files, unlike css_gzip (which only does aggregated ones). I can do this because I've got htaccess rules! Having both is overkill, but it won't harm anything. One project that might make your site faster is http://drupal.org/project/parallel, if you're looking for more speed. As with anything, test it: some report huge speedups, while others report it actually slows their site down. I put css and img on different domains (see the demo site), leaving js blank since I only have 1 JS file, and that JS file doesn't load other files the way CSS does (img backgrounds).

I write the htaccess rules in PHP and then update the htaccess text files. If the difference has to do with .js ajax/json files, then it's a minor thing; see #567808: Write update function to change .js to .json. Are there any other differences?

dbeall’s picture

Status: Needs review » Fixed

The install in #24 (bexleymarine.com) has gone through several cron runs and all is well; it clears and regenerates as expected. Will continue to install on 4 more sites this weekend, as this looks good. Will have a second look at the htaccess in that process.
Thanks for the info on css_gzip; pulled it. I looked at parallel, but I don't think I need it.
The site is serving very fast. Thank you. I am very happy with this and I just know others are too.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.