| Project: | Drupal core |
| Version: | 8.x-dev |
| Component: | entity system |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | D8 cacheability, Needs profiling, Performance |
Issue Summary
Spinning this out of #1302378: Use multiple specialized entity controllers.
The base EntityController class currently handles keeping it's own 'static' cache internal to the class, or always fetching from the database all the time.
In Drupal 7, this means that http://drupal.org/project/entitycache needs to provide it's own controller class to provide persistent caching for entities.
We can make this a lot more flexible by just replacing the custom caching within the Entity controller class, with a configurable cache controller key instead of the current 'static cache' key. There's no need for entity-specific cache controllers, since any class implementing CacheBackendInterface will do fine.
To allow for 'static' caching, I've added a new ArrayBackend which simply stores cache items in a php array. Apart from expires which didn't seem worth working on until #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support is in, this (untested) backend supports everything the database backend does, including tags etc., meaning it should be useful for unit testing eventually.
For no static caching at all, the NullBackend works fine.
To actually implement entitycache in D8 (whether core or contrib), you'd need a backend that proxies the MemoryBackend + whichever one is implemented for the cache_entity bin, since we're placing responsibility for both in a single class doing things this way. There's some inconsistencies with resetCache() etc. that would need to be sorted out for that.
Completely untested patch that will likely blow up, but just to show what it looks like.
One feature gets dropped here, which is fetching from cache when $conditions is passed to entity_load(). That's already a bit wonky as it is, and the $conditions parameter is already on its way out and documented as deprecated in 7.x per #1184272: Remove deprecated $conditions support from entity controller.
Change records for this issue
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| entitycache.patch | 11.72 KB | Idle | FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in core/lib/Drupal/Core/Cache/ArrayBackend.php. | View details | Re-test |
Comments
#1
The last submitted patch, entitycache.patch, failed testing.
#3
Without the syntax error..
#4
The last submitted patch, entitycache.patch, failed testing.
#5
Sounds like a good improvement, the ArrayCache implementation is a good idea!
class DatabaseBackend implements CacheBackendInterface {Wrong name, right? Should beArrayBackend?'cache controller' => 'Drupal\Core\Cache\MemoryBackend',Sounds wrong too, is itArrayBackendorMemoryBackend? I'm fine with both, but you have to agree with yourself about this one!#6
If you don't react before I'm getting back home, I'll try to fix your patch.
#7
I think this needs more architectural planning.
The 'static cache' flag in the entity (crud) controller is vastly more intelligent than a completely separate memory cache backend.
It is true that we could attempt to split and separate that backend, but we don't want to lose the static caching within the crud controller entirely.
This patch adds "cache controller", but actually means "static cache controller".
Let's also bear in mind that there's a cost of instantiating one more object (including config + DI) for every single entity type.
#8
We can chain backends if we want both static and a real cache, sounds like an easy win: we could get rid of this awful loading logic that do both, and rely on a unique object with a clean interface that would do both at the same time. It's a pattern we could also re-use at various places in Drupal too.
Instanciating a single object, you won't notice it on your runtime, it's not significant, even with 10 or 20 different entity controllers, this won't hurt a bit your execution time, it's really nothing compared to what does a single Views, yet most people continue to put up to 10 Views on the same page.
In Drupal 7 the static cache is quite dumb, all you have is a cacheGet() and cacheSet() internal methods, which acts like a basic key/value store (if we omit the deprecated conditions array). If it is smarter than this, please enlighten me because I probably missed it.
#9
Hmm I thought I'd fixed the silly naming issues already but apparently only in my head rather than the actual patch. I think we should go for ArrayBackend since then there's no chance of confusing it with an actual static, or APC or something, MemoryBackend sounds a bit more clever than it is.
@sun: the intention here is to allow the cache controller to do either no caching, static caching, persistent caching, or static + persistent caching (via chaining the two together), and also to start the process of pushing small bits of the entity code back into core/lib where they're generically useful. I'm not sure how viable that will actually be - often static and persistent caches need to be treated differently, but when I looked at offering the potential for both, at least within the controllers the access points are the same.
#10
Some other ideas arround the idea:
- We could, in the future, share cache chains to entity controllers that needs the same chain (we should then prepare some kind of mass deregistration mecanism in the array cache for static entries to act upon controller destructor)
- This pattern, with a clean cache chain definition mecanism, could be used in a lot of other places where the code actually emulate this behavior, for example a lot of info hooks
Using it massively would remove a lot of duplicated code everywhere, it might even be useful in conjunction with the actual DrupalCacheArray implementation that could evolve to use this base.
Using the chain of command (for saving), chain of responsability (for loading) and proxy patterns (basically using a chain towards the objects behind) for this seems really clean in term of architecture because the final object only rely on the cache interface, without worrying about what's behind, it's actually maintainer's even somehow.
#11
I like the idea of having a dedicated cache system, which has a single consistent interface to developers. Having a NullCacheBackend + a static cache backend that proxies another permanent backend makes much sense too me too.
I don't like having the EntityStorageController directly talking to the cache system though. Imho, we loose some flexibility here.
I'd prefer the approach taken by #1302378: Use multiple specialized entity controllers: Have a dedicated entity cache controller class, which is invoked by entity_load() and handles everything related to caches - see #1302378-4: Use multiple specialized entity controllers (comment #4).
Reasons:
#12
#13
The patch already has a dedicated entity cache controller, it's just it only requires that to match the core cache interface. Custom entity-type-specific controllers would still be possible of course. For a controller that does persistent caching that's the most likely to need something custom. I'm not sure from your post what you think that's needed that's not in the cache interface already. Apart from the conditions stuff but I'd prefer to press on with the issue that removes it. Almost did that before starting this one but wanted to get this out for discussion.
#14
I think that the conditions (or properties later) loading wouldn't be affected that much by replacing the internal cacheGet() with a backend directly: you still control what happens in your load() function and you are not forced to cache entities by id, you can do something else instead. You can still add a layer for frequently used queries in specific controllers.
But, to be honest, I think that trying to implement such layer would be more than difficult, the load() method may still be optimized later for very specific cases if needed. I even have a hunch that if the controller has direct access to its own cache backend, it would be easier to implement such frequently used queries caching: the controller knows its own schema and fields.
The approach in #1302378: Use multiple specialized entity controllers suppose that there is an upper layer that needs to combine itself the cache and "main" controller, which means that the entry point of the API is in another piece of code (in the exemple in procedural code), I think this would be prone to errors: some developers will continue to hit the controllers directly (which makes a lot of sense) instead of calling entity_load(). I don't think this a good idea until there is a front controller for all entity operations.
I think that in the end, both provide the same feature and generalize caching method, but I prefer catch's method: it allows the entity "main" controller to implement its own load/save etc.. methods and have a finer control over its own cache if needed, while most of the others would probably continue to use the generic one.
Regarding specific entity type cache configuration, weither it's a cache controller or a chain set inside the main controller, I think both solutions are equivalent in term of configuration, especially if we can create the cache chain outside and inject it when building the controller instance: instead of creating a cache controller from configuration, we create a cache chain from configuration, doesn't seem too different in the end, it just moves some responsabilities.
And, latest note, I think that even if the cache controller is the way to go, this isntance can still benefit from using the cache chain catch is proposing, it's just a matter of moving it into another objet.
#15
That gets the point. I've aimed for moving that cache-control into a separate class, such that it stays separated from the storage class.
However, I just realized this thinking is deprecated, as we won't have "pluggable" storage controller anyways (sry, I'm still used to that thinking). As discussed in the entity-interface introduction issue, they already implement the per entity-type storage logic, so any simple storage-pluggability (=without repeating entity-type specific customizations) would have to happen on a lower level. In that light, having the cache-control in the storage-class makes sense to me too. :)
Great, I'm about to look at that as well :)
@patch: Having a permanent cache renders hook_entity_load() useless. If possible I'd prefer to drop it in the long road anyways, an entity really shouldn't be altered by page request. If it needs to be changed, re-save it with new property values.
#16
There's a point I agree with is that we shouldn't alter this kind of stuff on every load, it can get quite CPU intensive if you have a lot of modules playing with this. But, in the other end, having the hook_entity_load() can be useful for some modules to set business data which are not properties, but that needs to be on the entity and cached along with it. Concrete use case: I could load data from business tables and set them on a node, but I don't want to store this data in actual node properties because I don't want to duplicate it and handle the full synchronization between my source business data and node properties. If I can still attach arbitrary data on the object I can then just handle nice synchronisation by clearing a few cache on my business table updates. Right now, I actually achieve this using the virtual_field module, that provides a null field storage backend, but defining some virtual fields allows me to actually store precomputed data in field cache along with my commerce products, line items and stuff, without worrying about a complete field storage layer.
EDIT: Sorry, a bit off topic, and that's an akward and quite ugly way of doing this, but in my case, it's mainly for denormalizing some data for performances. I don't know where the property system is going to right and if it's going to help me with those kind of use cases.
What matters in here, is that by caching entities, we cache the hook_entity_load() result with it, and that's why I really like the idea, because if we do this I can then remove my need for hacks such as virtual field etc.
#17
Yeah, if there remains a hook_entity_load() it has to be cacheable. Altering stuff or changing stuff on a per request base there is a bad idea, and we should stop supporting that.
The discussion of removing it probably belongs more into #1497374: Switch from Field-based storage to Entity-based storage though, as it depends on whether we want to support a "modular entity storage". However, even with a centralized entity storage I agree that it might be worthwhile to keep it for easily adding non-queryable stuff as you've depict in your use-case.
#18
Yes, I honestly think those kind of non queryable stuff are really ugly, but sometime you just don't really have other straightforward choices. I don't know what will be final API for properties, but properties could solve that somehow.
#19
The ArrayCache implementation is a great idea indeed. However, I want to propose some changes and ideas that came up today while we were discussing the Entity Property API and waiting for fago to write down the initial interface definitions :).
The CacheChainInterface
Since we have multiple caching layers we might end up with various combinations of them (static cache only, static + persistent cache, persistent cache only, etc.). We could prevent that from happening if we had a way to chain cache backends in some way. This could be implemented in a generic fashion as a CacheChainInterface that would extend the CacheBackendInterface. This interface would extend the constructor definition for the CacheBackendInterface with an additional argument: An array of cache backends that the should be queried sequentially in order to retrieve an object from the cache.
The CacheChain class would iterate over the CacheBackend implementations that it has attached to it and forward the get / getMultiple queries to them. Once we have a full result for a query we exit the loop and, if there are any cache backends that could not return a result for the query before, we use the set() method on them to fill them with everything that they don't have cached yet. The next time the same query hits the CacheChain the first backend in the chain will now have element(s) that are subject to the query and return them right away.
This way we would be able to easily stack caching layers on top of each other and also replace single layers independently.
#20
@fubhy that sounds just like http://drupal.org/sandbox/catch/1153584 which I've never got around to implementing. I agree that'd be a good idea but I think we should add a separate issue for it. I also want to break the ArrayCache backend into a separate patch probably since that's useful for unit testing (and we also need to look at re-organising the cache tests so they're more re-usable with alternate cache backends).
#21
Oh great. I will take a look at your sandbox then. I think this can work really well with the PropertyContainerInterface that is part of the 3rd iteration of the Entity Property API for defining the tag-properties of the cache items (Note: the PropertyContainerInterface is generic and Entity agnostic so it would be completely generic).
Anyways... I agree this should be in a separate issue. I will create one later today and maybe already provide some code.
Sorry for off-topic :)
#22
Not that off-topic, the cache chain interface is the idea behing this issue, and IMHO it should be done before we can actually continue this issue. It is a good reusable pattern that could be used at many places.
#23
@fubhy, the sandbox is empty, I really didn't do anything with it.
I'll try to make the array backend issue soon and upload a separate patch, new issue for cache chain sounds good too, then marking this postponed on them also sounds good :)
#24
Opened #1637478: Add a PHP array cache backend.
#25
Opened and implemented and unit tested #1651206: Added a cache chain implementation.
#26
A small changes here could have significant ramifications (for better or worse), so I am adding a tag as a reminder to check performance.
I nearly committed a very basic EntityCache controller for the Countries module before doing some simple tests to discover that this caused a 20 fold decrease in speed. On the plus side of this testing, reading in an array of Country entities from the database was 3 times faster than including iso.inc and doing a single drupal_alter() on the countries list. So performance gains could be significant here :)
#27
I just committed #1651206: Added a cache chain implementation.
#28
Awesome. Now we can get this rolling again. I worked on this a bit during DCon Munich already but was not really satisfied with the outcome because it was still hard-wired with the storage controller. My wish would be to make the caching layer independent of the storage system. I am not entirely sure how we would approach that yet, though. I do have a crazy suggestion though which I have been playing around with in my head lately but it would be a bit drastic to say the least. Will post more about that soon.
#29
Anyone up for reviving this? Next steps?
#30
#3 woops
root@pubuntu:/var/www/d8/drupal# git apply --check entitycache_0.patcherror: patch failed: core/modules/comment/comment.module:116
error: core/modules/comment/comment.module: patch does not apply
error: patch failed: core/modules/entity/entity.module:69
error: core/modules/entity/entity.module: patch does not apply
error: core/modules/entity/lib/Drupal/entity/EntityController.php: No such file or directory
error: patch failed: core/modules/system/system.module:273
error: core/modules/system/system.module: patch does not apply
a fast look to all the changes already in repository says this implementation very different against repo one so not possible for a fast reroll
#31
Related #1497374: Switch from Field-based storage to Entity-based storage.
#32
#33
Tagging.
#34
Weird, I could swear the the tags list was empty when I added those tags...