I think it might be worthwhile to have a drush or batch command that pre-caches the entities configured to be cached. It would help make a site quite fast. I will try and see if I can get something up and running.

Comments

btmash

Here is the drush file in patch format (as a note, this is via svn diff, not git).

There are a fair number of todos which may make this more challenging, but trying this out on my site (800ish nodes, 23 users, 10 terms, 2300ish files), it seems to fare very well so far.

btmash

Playing around some more, I tried calling entity_load with multiple IDs at once, and it's a fair bit faster than running entity_load on a single item at a time. On my site, peak run time went from 23 seconds loading one at a time to 14 seconds, roughly a 40% decrease in processing time, though maybe that's just my site. Attaching a patch which has the change. I'm not sure if this is the best approach (I still have to figure out what should be done on a large site with a much larger number of entities).

swentel

Shameless subscribe - really cool!

catch

Status: Active » Needs work

If it's drush the whole operation doesn't need to be in a batch, but you'll need to add a range to EntityFieldQuery and reset the entity controller static cache every so often so there's a fixed limit on memory usage.

I think entity_get_controller('foo')->resetCache() will reset the cache OK, but I'll need to double check that entitycache isn't also resetting the persistent cache when you do that 'cos that would be bad.. if it doesn't allow you to reset the static only, we can easily change that.
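To make the memory point concrete, here is a minimal toy model of the idea, not the patch and not Drupal code. `ToyController`, `chunked`, and `warm_in_chunks` are hypothetical names invented for this sketch: the slice stands in for a ranged EntityFieldQuery, and `reset_static()` stands in for a reset that only touches the static cache (whether entitycache's resetCache() behaves that way is exactly the open question above).

```python
from typing import Dict, Iterator, List


class ToyController:
    """Hypothetical stand-in for an entity controller with a static cache."""

    def __init__(self, storage: Dict[int, dict]) -> None:
        self.storage = storage                 # pretend database
        self.static_cache: Dict[int, dict] = {}
        self.peak_static_size = 0              # largest static cache seen

    def load(self, ids: List[int]) -> Dict[int, dict]:
        # Loading keeps every entity in the static cache until it is reset.
        for entity_id in ids:
            self.static_cache[entity_id] = self.storage[entity_id]
        self.peak_static_size = max(self.peak_static_size, len(self.static_cache))
        return {entity_id: self.static_cache[entity_id] for entity_id in ids}

    def reset_static(self) -> None:
        # Assumption: only the in-memory cache is dropped here.
        self.static_cache.clear()


def chunked(all_ids: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield fixed-size slices of the ID list (the 'range' on the query)."""
    for start in range(0, len(all_ids), chunk_size):
        yield all_ids[start:start + chunk_size]


def warm_in_chunks(controller: ToyController, all_ids: List[int], chunk_size: int) -> int:
    """Load every ID in chunks, resetting the static cache after each chunk."""
    loaded = 0
    for chunk in chunked(all_ids, chunk_size):
        loaded += len(controller.load(chunk))
        controller.reset_static()
    return loaded


if __name__ == "__main__":
    storage = {i: {"id": i} for i in range(1, 801)}
    controller = ToyController(storage)
    total = warm_in_chunks(controller, sorted(storage), chunk_size=50)
    print(total, controller.peak_static_size)
```

In this model the static cache never holds more than one chunk at a time, which is the fixed memory limit being asked for; the chunk size of 50 is arbitrary.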

Whether this is useful is going to depend on the site, but seems worth having so happy to commit as long as we can resolve the memory issues.

btmash

When I run the entity_load() function on the entity IDs, I set the reset parameter to TRUE so that the memory gets cleared (at least as far as the memory used by the prior entities is concerned). Would that be OK, or is there another area where static memory needs to get reset?

EDIT: never mind, I just got the gist of what you mean. Yes, I will look into that as well.

catch

Because you have both $ids and $reset, it's going to clear the persistent cache for those $ids before loading the entities. See resetEntityCache().

It's a bit of extra work clearing the caches when we'll just set them again later, but might not be worth trying to work around.
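As a rough illustration of why the two kinds of reset differ in cost, here is a toy two-tier cache. It is a sketch under assumptions, not the entitycache implementation: `TwoTierCache`, `reset_static`, and `reset_both` are hypothetical names, and `reset_both` only models the behaviour described above, where the persistent entries for the given IDs are cleared as well.

```python
from typing import Dict, List


class TwoTierCache:
    """Hypothetical model: static (per-request) cache over a persistent cache."""

    def __init__(self, database: Dict[int, str]) -> None:
        self.database = database
        self.static: Dict[int, str] = {}
        self.persistent: Dict[int, str] = {}
        self.db_reads = 0  # counts how often we fall through to the database

    def load(self, ids: List[int]) -> Dict[int, str]:
        result = {}
        for entity_id in ids:
            if entity_id not in self.static:
                if entity_id not in self.persistent:
                    # Full miss: read from the database and fill the persistent tier.
                    self.db_reads += 1
                    self.persistent[entity_id] = self.database[entity_id]
                self.static[entity_id] = self.persistent[entity_id]
            result[entity_id] = self.static[entity_id]
        return result

    def reset_static(self) -> None:
        # Frees memory only; the persistent tier is kept.
        self.static.clear()

    def reset_both(self, ids: List[int]) -> None:
        # Models a reset that also clears the persistent entries for these IDs.
        for entity_id in ids:
            self.static.pop(entity_id, None)
            self.persistent.pop(entity_id, None)


if __name__ == "__main__":
    cache = TwoTierCache({1: "a", 2: "b", 3: "c"})
    cache.load([1, 2, 3])
    cache.reset_static()
    cache.load([1, 2, 3])      # served from the persistent tier
    cache.reset_both([1, 2, 3])
    cache.load([1, 2, 3])      # has to go back to the database
    print(cache.db_reads)
```

In this model a static-only reset costs nothing on the next load, while a reset that also clears the persistent tier forces a second round of database reads. That is the "extra work" in question, and it only matters if the same IDs are loaded again.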

That mainly leaves splitting the EntityFieldQuery into chunks I think.

btmash

OK, I have taken a look at resetCache. It only clears the static memory for the IDs you are requesting. The persistent cache does not get cleared out, so any items that are in the entity cache tables should still get retrieved from there. I remember testing this part out: with empty entity cache tables the run took 14 seconds for the data described above, and under one second when all the data was already in cache.

However, I was also taking a look at how apachesolr does it (which is via batch api) and that approach might be a better long term solution anyways (since that would also allow usage via a web ui if wanted).

btmash

It took a long while to realize that I shouldn't just be looking at the resetCache in the various entity controllers; I should be looking at what entitycache does instead, given that entitycache takes over the caching mechanism for those entities. I took a look and realized that entity_load calls resetCache without actually passing any IDs (the array is NULL), and since your function looks at the IDs to decide what to remove, the IDs never got removed :) Hence why my code seemed to run just as fast with the caches there.

I have updated the code so that it runs entity_load and then calls resetCache() afterwards, so the memory is freed up by the end of this set of operations.

The batch stuff is a little overwhelming and I'm still learning how to implement it in this scenario, since all the entity IDs get retrieved up front and I need to figure out how to do properly ranged EntityFieldQuery calls (though will that make the code *not* work with backends like MongoDB? I'm not sure :( ).

btmash

Status: Needs work » Needs review
Attachment: status new, size 3.04 KB

Ack..didn't include patch :(

Status: Needs review » Needs work

The last submitted patch, entitycache_drush_support_1212488_btmash_9.patch, failed testing.

btmash

Status: Needs work » Closed (won't fix)

I should have followed up on this long ago. I ended up making various changes to the patch above, which I was using on a number of sites (along with adding the batch functionality). I recently changed it so that it runs the entity_load functions on any entity to cache whatever it can (so while entitycache is not required, it is recommended) and put it up as a drush plugin at http://drupal.org/project/drush_ecl.