Setting to grab url's from url_alias table. [#337391]

Comment	File	Size	Author
#26	boost-337391.1.patch	5.29 KB	mikeytown2
#25	boost-337391.patch	5.29 KB	mikeytown2
#21	boost-337391.patch	2.46 KB	mikeytown2
#17	boost_batch_export_3.patch	4.97 KB	capellic
#11	boost_batch_export_2.patch	4.93 KB	capellic
#3	boost_batch_export.patch	5.17 KB	swentel

Comment #1

moshe weitzman commented 21 November 2008 at 16:06

Very nice work! For those who can't be bothered to follow links, this code uses the batch api to request each node page and write it to cache.

To me, this should be part of core boost - probably in a boost.pages.inc file.

Comment #2

swentel commented 22 November 2008 at 14:26

Updated the code, cf. http://drupalbin.com/4138. It is more mature already: it includes terms, provides hook_boost_export_operations support, and uses the boost_is_cacheable() and boost_is_cached functions to determine whether a page has to be cached. I'll post a patch (probably tomorrow) against 6.x-dev that creates a boost.pages.inc file.

Comment #3

swentel commented 22 November 2008 at 21:34

Status	File	Size
new	boost_batch_export.patch	5.17 KB

Patch attached against DRUPAL-6--1. All batch functionality is in boost.pages.inc. Other changes are:
- boost.module requires boost.pages.inc (inspired by devel_generate batch functions)
- the boost.admin.inc form alter function includes the button and its submit callback

Comment #4

swentel commented 23 December 2008 at 20:56

Bumping, any update on this? Again, I'd be happy to start a separate project for this. We are also currently testing a few patches at work that support views and panels, so there could be lots of patches and I wouldn't want to bother you guys with them every time :)

Comment #5

rsvelko commented 20 April 2009 at 11:24

Title:	Boost export module	» Auto Regenerate Cache (pre-caching) ( formerly "Boost export module" )

A similar issue was started on 23 January this year: #363077: Add spider to crawler - Cache entire site with new install. We finally found this one and will use it as the main thread for this task.

@swentel: If you have newer code, please post it here so we can use it.

Comment #6

swentel commented 20 April 2009 at 11:45

@rsvelko: I have no newer code right now, so that patch is ready to use.

Comment #7

mikeytown2 commented 20 April 2009 at 22:11

Status:	Active	» Needs review

The problem with the patch right now is, I think, that it will grab the node/* paths and only those will be cached, which is not that useful anymore. Getting the list of URLs to hit from the url_alias table would be a better idea. It would cache the first page of views as well :)
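The idea of taking crawl targets from url_alias instead of node/* paths can be sketched roughly as follows. This is only an illustration in Python against an in-memory SQLite stand-in, not Boost's code: the function name alias_urls and the base URL are hypothetical, and the table is reduced to the pid, src and dst columns.

```python
import sqlite3


def alias_urls(conn, base_url):
    """Return one absolute URL per alias row (hypothetical helper)."""
    rows = conn.execute("SELECT dst FROM url_alias ORDER BY pid")
    return [base_url.rstrip("/") + "/" + dst for (dst,) in rows]


# In-memory stand-in for the url_alias table (assumed columns: pid, src, dst).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT)"
)
conn.executemany(
    "INSERT INTO url_alias (src, dst) VALUES (?, ?)",
    [("node/1", "about"), ("taxonomy/term/3", "topics/news")],
)

# The crawler would request the aliased paths, not node/1 etc.
urls = alias_urls(conn, "http://example.com/")
```

Because the list comes from the alias table, anything that has an alias (not just nodes) ends up in the crawl list.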

Comment #8

mikeytown2 commented 8 June 2009 at 08:24

Look up URLs for aliases via url($path, array('absolute' => true)). Get the list of previous pages in the cache once #453426: Merge Cache Static into boost - Create GUI for database operations is done.

Comment #9

mikeytown2 commented 17 June 2009 at 19:27

Need to break this up, as sites with LOTS of URLs take too long to generate the initial array. Do a count; if there are more than 5k URLs, go into super batch mode and use db_query_range(*,0,1). It then looks up the URL on each batch run, going off of the count to keep track of the total progress.
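A rough sketch of that count-then-range approach, written in Python against SQLite purely for illustration. The function name crawl_steps, the table layout and the LIMIT/OFFSET query are assumptions standing in for db_query_range(); only the 5k threshold comes from the comment above.

```python
import sqlite3

THRESHOLD = 5000  # the "5k" cut-off mentioned above


def crawl_steps(conn, threshold=THRESHOLD):
    """Yield (alias, progress) pairs; progress is a fraction of the count."""
    (total,) = conn.execute("SELECT COUNT(*) FROM url_alias").fetchone()
    if total <= threshold:
        # Small site: building the whole list up front is cheap enough.
        rows = conn.execute("SELECT dst FROM url_alias ORDER BY pid").fetchall()
        for done, (dst,) in enumerate(rows, start=1):
            yield dst, done / total
        return
    # Large site ("super batch mode"): fetch one row per step and use the
    # count taken above to report progress.
    for offset in range(total):
        row = conn.execute(
            "SELECT dst FROM url_alias ORDER BY pid LIMIT 1 OFFSET ?",
            (offset,),
        ).fetchone()
        if row is None:  # rows were deleted since the count was taken
            break
        yield row[0], (offset + 1) / total


# Demo with a tiny table and a lowered threshold to force batch mode.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, dst TEXT)")
conn.executemany("INSERT INTO url_alias (dst) VALUES (?)", [("a",), ("b",), ("c",)])
steps = list(crawl_steps(conn, threshold=2))
```

Both branches yield the same sequence; the second one just trades one large query for many small ones so that no single request has to build the full array.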

Comment #10

ferrangil commented 23 June 2009 at 10:52

Subscribing!!

Comment #11

capellic commented 18 July 2009 at 13:51

Status	File	Size
new	boost_batch_export_2.patch	4.93 KB

I have a huge need for pre-caching and have been looking for a solution to my issue for MONTHS. I simply didn't have the right keyword in Google: pre-caching. I have a lot of small, low-volume sites that have performance problems because every visitor is regenerating the cache due to cron clearing it every 15 minutes. I get about 20 visitors a day, so you can see how this is a problem.

I applied the patch in #3 to the latest dev release (7/18). The patch mostly applied; the only thing that didn't make it was the "export pages to html now" button. I manually applied that, put it in its own field group and added a description to explain what "pre-cache" would do.

I also added some code to the boost_export_done() function to properly output errors and notices through drupal_set_message() and added watchdog logging.

I've attached a new version of the patch so that it works with the 7/18 version of the dev release.

The patch file for pre-caching does a couple of things:

- Adds a button to the Boost settings (boost.admin.inc) page that allows us to run the script that will cache all nodes
- Creates a file called boost.pages.inc that holds all the operations necessary for pre-caching
- A function is added (boost.admin.inc) that invokes boost_collect_pages() in boost.pages.inc
- A require() for boost.admin.inc in boost.module

I have read through the comments on this thread http://drupal.org/node/363077 and I agree -- more configuration items would be great including being able to declare only menu items -- and the thread is focused on my greatest need -- generating pre-cache on cron -- so that after Drupal clears the cache, Boost pre-cache will regenerate it. Of course, you should be able to toggle this on/off on the settings page. I haven't really looked at the code too closely to see how easy it would be to bring the cron functionality to this patch, but I will be doing so within the next week. Let me know if you have any tips.

I am also thinking that we should be able to configure whether you want nodes, taxonomy and eventually views, etc. I, for example, don't use taxonomy lists, so caching those isn't interesting to me. I've got some more ideas for Pre-Caching, so maybe you should create a new component for this feature set? This is really cool stuff.

I am new to module dev, but will likely be submitting patches to move this along. I see that this feature is a bit farther out on your roadmap.

#7: I don't see what the problem is here, but maybe I just don't understand what you mean. The cache files are written in the same way when I tested the patch.

Comment #12

mikeytown2 commented 18 July 2009 at 18:41

Status:	Needs review	» Needs work

#7 is only an issue if Global Redirect is not installed.

Thanks for showing some interest in Boost by writing some code! Sorry to be picky, but you should use 2 spaces instead of a tab for indentation. This page helped me with writing code for Drupal: http://drupal.org/coding-standards. Also, if you're developing on a Windows box (like me), this is how I currently write code for Drupal: http://drupal.org/node/505974. Marking this as "needs work" until the tab issue is taken care of.

Heads up: the boost_cache database table has a column in it called push. This will be used to "push" the content out so it is pre-cached. That table will also record the page generation time so one can crawl the slow pages first. This is why the crawler is at the bottom of the list, because it will be that awesome!
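As an illustration of "crawl the slow pages first", a crawler could order its queue along these lines. This is a Python sketch under assumptions: only the push column is named in the comment above, while the gen_ms field, the record layout and the function crawl_order are hypothetical.

```python
def crawl_order(pages):
    """Return URLs flagged for push, slowest recorded generation time first."""
    flagged = [p for p in pages if p["push"]]
    flagged.sort(key=lambda p: p["gen_ms"], reverse=True)
    return [p["url"] for p in flagged]


# Hypothetical rows, as they might be read from a boost_cache-style table.
pages = [
    {"url": "about", "push": 1, "gen_ms": 120},
    {"url": "big-view", "push": 1, "gen_ms": 2400},
    {"url": "user/login", "push": 0, "gen_ms": 900},
]
order = crawl_order(pages)
```

Pages that are not flagged for push are skipped entirely, and the most expensive pages are regenerated before a visitor has to wait for them.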

As for running this code on cron, it might not work; see #229905: Batch API assumes client's support of meta refresh. It could eventually work with #363077: Add spider to crawler - Cache entire site with new install., if I set that up to use a database connection, as that code doesn't need a browser window in order to call itself. The only problem is that I can't make it run on all systems out of the box; it needs to be customized to match your server's setup.

Comment #13

capellic commented 18 July 2009 at 20:00

@mikeytown2

Please be picky, I'll fix the code. Thanks for the guides.

Yes, this feature does sound great, your plans sound nice. I know it's down on the list, but is there a reasonable ETA?

As for the cron, it warms my heart to see that this has been thoroughly researched and that it will be difficult to provide a "Drupal solution" that will work for everybody. For that reason, it might be best if people roll their own helper module that hooks into cron? I'll be strolling over to 363077 to see what I can cobble together with the code you've posted there.

Another consideration should be, "Do I really need to run cron every 15 minutes?" I didn't know, until halfway through yesterday, that cron cleared cache. I think I'll run it a couple of times a day instead on brochure sites. But, if I am running Notification on cron, then every 5 to 15 minutes is mandatory.

Thanks for this module, it's really nice.

Comment #14

mikeytown2 commented 18 July 2009 at 20:44

@capellic
Set the page expiration time to a higher value; then your pages won't all be expired on cron. The nodes have a hook, so if you edit/delete them, the cache for that page gets flushed; same with comments. Views, taxonomy and other content types do not have this, so if your site relies upon views, the current option is to use a lower expiration time. Try this patch, as it allows for different expiration times for each page:
http://drupal.org/node/453426#comment-1817832
And here's one that works with the promote checkbox:
#459956: Flush front page when node is edited/created with promote to front page selected.

Comment #15

capellic commented 19 July 2009 at 02:38

@mikeytown2
I've tried to set the expiration time to a higher value, but if my node is included in a view, that content doesn't update and that's a deal breaker for me.

Not sure how setting different expiration times for each page will help me out. If that is something required by someone editing content, then it's going to be a bit too technical.

I don't use the "promote" checkbox.

Thanks for the tips.

Comment #16

mikeytown2 commented 19 July 2009 at 08:23

@capellic
You can make it so nodes expire in like a week, while views expire in 5 min. The interface for this is still a little clunky, so my recommendation to you is to run the crawler right after you run Drupal cron. That's the simple fix for your current setup.
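The per-type lifetimes described above (a week for nodes, 5 minutes for views) amount to a lookup like the following. This is a Python sketch, not Boost's settings code: the names LIFETIMES and is_expired and the one-hour default for other page types are assumptions.

```python
WEEK = 7 * 24 * 3600

# Lifetimes in seconds per page type; the values follow the comment above.
LIFETIMES = {"node": WEEK, "view": 5 * 60}

DEFAULT_LIFETIME = 3600  # assumed fallback for types not listed


def is_expired(page_type, cached_at, now, lifetimes=LIFETIMES):
    """True if a page cached at `cached_at` should be regenerated at `now`."""
    return now - cached_at >= lifetimes.get(page_type, DEFAULT_LIFETIME)


# Ten minutes after caching, the view is stale but the node is still fresh.
view_stale = is_expired("view", cached_at=0, now=600)
node_stale = is_expired("node", cached_at=0, now=600)
```

With a setup like this, a cron run every 15 minutes would only flush the short-lived view pages and leave the node pages in the cache.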

Comment #17

capellic commented 30 July 2009 at 14:16

Status	File	Size
new	boost_batch_export_3.patch	4.97 KB

Sorry for the long absence, but here's the new patch with tabs converted into two spaces.

Comment #18

mikeytown2 commented 3 August 2009 at 07:05

This is the future direction of this thread.
#538460: Auto Regenerate Cache (pre-caching) Preemptive Cron Cache - throttle & crawl rate stats
But this thread may still be useful on its own...

Comment #19

mikeytown2 commented 8 August 2009 at 08:14

Status:	Needs work	» Needs review

Is this still useful with the cron crawler?

Comment #20

mikeytown2 commented 10 August 2009 at 01:27

Title:	Auto Regenerate Cache (pre-caching) ( formerly "Boost export module" )	» Add start btn to crawler; grab url's from other tables.
Component:	Miscellaneous	» Cron Crawler
Status:	Needs review	» Active

Going to kill the export module, since it is limited by the batch api; sad.

The cron crawler is going to replace it. Having a start button and selecting URLs from other tables like node, taxonomy, user and url_alias is the next step for the crawler, and should replace the export module's functionality. Marking this as active.

Comment #21

mikeytown2 commented 11 August 2009 at 02:59

Status:	Active	» Needs review

Status	File	Size
new	boost-337391.patch	2.46 KB

Going to skip the start button; cron can start this. Gets URLs from the url_alias table.

Comment #22

mikeytown2 commented 11 August 2009 at 04:57

Status:	Needs review	» Fixed

committed

Comment #23

capellic commented 11 August 2009 at 12:43

@mikeytown2 Great! Just trying to understand what you've implemented. So this is a crawler that kicks off on cron. Is there any sort of control to set a limit on the number of pages, or to define the type or priority of certain pages? If I deserve a kick because I didn't install it and look for myself, by all means do so. ;-)

Comment #24

mikeytown2 commented 13 August 2009 at 07:25

Status:	Fixed	» Needs work

Need to make sure the URL is published. Also a way to turn this off. Have a crawler field on the boost settings page.
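One way to picture the "make sure the URL is published" check is a join between the alias table and the node table. This is a Python/SQLite sketch and not the committed patch: the function published_aliases is hypothetical, the tables are reduced stand-ins (node with nid and status, where 1 is assumed to mean published), and only node/N aliases are covered.

```python
import sqlite3


def published_aliases(conn):
    """Return aliases whose source path is a published node."""
    rows = conn.execute(
        "SELECT a.dst FROM url_alias a "
        "JOIN node n ON a.src = 'node/' || n.nid "
        "WHERE n.status = 1 "
        "ORDER BY a.pid"
    )
    return [dst for (dst,) in rows]


# In-memory stand-ins for the two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, status INTEGER)")
conn.execute("CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT)")
conn.executemany("INSERT INTO node (nid, status) VALUES (?, ?)", [(1, 1), (2, 0)])
conn.executemany(
    "INSERT INTO url_alias (src, dst) VALUES (?, ?)",
    [("node/1", "about"), ("node/2", "draft-page")],
)

to_crawl = published_aliases(conn)
```

Aliases that point at something other than a node (taxonomy terms, users) would need their own check; this sketch simply leaves them out.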

Comment #25

mikeytown2 commented 19 August 2009 at 21:32

Title:	Add start btn to crawler; grab url's from other tables.	» Setting to grab url's from url_alias table.
Status:	Needs work	» Needs review

Status	File	Size
new	boost-337391.patch	5.29 KB

Comment #26

mikeytown2 commented 19 August 2009 at 21:35

Status	File	Size
new	boost-337391.1.patch	5.29 KB

A FALSE should have been a TRUE.

Comment #27

mikeytown2 commented 21 August 2009 at 02:48

Status:	Needs review	» Fixed

committed

Comment #28

4 September 2009 at 02:50

Status:	Fixed	» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
