Updated: Comment #260
Problem/Motivation
Loading entity from the database is slow. Configurable fields have their own cache handing, but entities can now have up to 4 base tables, from which multiple translations might be loaded.
Content entities are optimized for the case when they are initialized with all the values, base fields and configurable fields, which are then stored in the internal ->values array. The necessary field classes are only created when the fields are accessed through the API. Even though configurable fields are cached, they need to be set on already created entity objects, which has to initialize all those field objects.
Proposed resolution
The basic idea is to generalize the concept of the configurable field cache and apply it to the entity itself and all values within.
For now, the entity is built just like before, we query the different base tables, create the entity objects, then call the methods to load the field items/values. #2137801: Refactor entity storage to load field values before instantiating entity objects will eventually try to call the field methods before constructing the entity objects.
The entity itself now implements the PrepareCacheInterface, so that we can ask it to return all cacheable data. This also allows to return values added to entities that are not defined fields, so we can cache them too. Those are typically values added in hook_entity_load() implementations. Internally, it has more or less the same code as we currently have for configurable fields, just applied to all fields. We then save those values in the persistent cache. (Explicitly kept to highlight the difference to the old implementation)
Now that PrepareCacheInterface has been removed, there is no special processing necessary, instead, the $entity object can be stored in the cache directly, which will then serialize it, which will call __serialize() on it, which already ensures that the entity is optimized and no unnecessary definitions/objects are serialized. The only exception are language objects, which are currently added as an optimization. @todo: Open issue to add langcode() that should make that unnecessary.
When loading the entity from the persistent entity, we can just unserialize. At this point, only the field item objects that are explicitly requested are created. With the entity key cache (#2182239: Improve ContentEntityBase::id() for better DX) and the render cache, this means almost none, and some of the remaining ones will hopefully go away soon. Only loading the values on the first access would only be useful for cache misses with this system, in which case it is very likely that it will be accessed as depending caches will likely also have cache misses.
Because some hook_entity_load() implementations are fairly dynamic (comment statistics, translation metadata), the patch introduces a hook_entity_storage_load() which is cached, so implementations can chose which one they want.
The flow of loadMultiple now looks like this:
- Get already loaded entities from the static cache.
-- For remaining entities, check the persistent cache
--- Create entity objects for entities loaded from the persistent cache as explained above.
-- Load entities that were not found in the persistent cache from the database.
--- Create entities with the base field values
--- Load configurable field values and add them to entity objects
--- Call storage load hook.
--- Put entities loaded from storage into the persistent cache.
-- Call hook_entity_load() on entities from database/persistent cache. This allows modules that want to attach non-cacheable (dynamic/personalized?) values to an entity. Also calls ::postLoad().
-- Put entities from persistent cache and database into static cache
- Combine all requested entities together, ensure order, return
Additional changes in this patch:
- Unify the code that ensures that entities are returned in the requested order into a single helper method
- Remove cache handling from all somethingFieldItems() methods, they no longer need to worry about this.
- Drop the doSomethingFieldItems() pattern and rename them to somethingFieldItems(), as the wrapper methods only cared about caching, which they no longer have to.
- Comment and user storage controllers have special data manipulation that needs to run before entity objects are created. They now override mapFromStorageRecords() instead of postLoad(), as that now receives entity objects.
Remaining tasks
User interface changes
-
API changes
- Refactoring of loadMultiple(), that should however not affect anyone
- Introduction of hook_entity_load_uncached(), modules that attach non-cacheable values in hook_entity_load() need to switch to that.
- Removal of the doSomethingFieldItems(), would only affect subclasses of FieldableStorageControllerBase, which I do not think exist, non-database storage controllers will use completely different patterns anyway, those methods always were database specific.
Original report by @catch
This is a long delayed followup to http://drupal.org/node/111127#comment-1494790
Reasons I'm posting this issue:
1. Despite converting lots of things to fields, caching at the node level rather than the field level gives a 10% performance improvement on a single node view with only the default profile and database caching. So the field cache does not 'do the same job' as an entity level cache as was suggested by Dries and bjaspan in that issue - it can't remove the cost of node_load() itself, the query building, the field_attach_load() calls etc.
2. I posted a proof-of-concept module at http://drupal.org/project/entitycache which implements node, taxonomy term and comment caching from contrib. This works, when hacked into the default profile, all but a couple of tests pass, and the entire .module is 14k - some of which is dead code for handling users which should probably be abandoned, some of which could probably be tidied up (pretty much all the NodeController / TaxonomyTermController overrides don't need to be there if we can call back to those classes directly), so potentially < 10k.
So, I'm proposing that we add this module to core, enabled in the default install profile. By having it as a separate module, we're able to maintain all the cache clearing code in one central place, so it doesn't have the messiness of putting node cache clears in taxonomy module, or taxonomy cache clears in image module or whatever. However we'll get real performance benefits in core rather than having to schlep off to contrib to get them.
We have real performance issues in Drupal 7, and this is one possible way to ameliorate those without changing any APIs (it adds a total of one hook). I'm not posting a patch here, since the code is in CVS, and this is more of an architecture/policy decision at this stage. Abusing status though to get it in the real issue queue.
Comment | File | Size | Author |
---|---|---|---|
#299 | drupal_597236_299-interdiff.txt | 701 bytes | Berdir |
#299 | drupal_597236_299.patch | 61.74 KB | Berdir |
#297 | drupal_597236_297-interdiff.txt | 2.28 KB | Berdir |
#297 | drupal_597236_297.patch | 61.74 KB | Berdir |
#296 | drupal_597236_296.patch | 61.63 KB | Wim Leers |
Comments
Comment #1
sunYou say that this only works at Drupal installation time - why is that?
Comment #2
catchTo clarify, the only way I know of to enable a contrib module throughout all core tests is to hack it into default.profile - that way all tests get run with the caching enabled and you can see what breaks. entitycache itself doesn't care when it's installed.
Comment #3
David StraussEntity caching belongs in core.
Comment #4
catchBenchmarked node/1 with Drupal 6 vs. Drupal 7 vs. Drupal 7 plus entitycache module. Numbers were pretty consistent between multiple runs of ab.
6: 17.48 #/sec
7: 11.49 #/sec
7 with entity caching: 14.18 #/sec
I make that a 20% performance gain - this using database caching and very few modules acting on hook_node_load(). It also clears just under ~40% of the gap between D7 and D6 on single node views.
Drupal 6:
Drupal 7:
Drupal 7 + entity caching:
Comment #5
Frando CreditAttribution: Frando commentedHuge +1 to entity caching in core. Let's make use of what we gained through the new entity API. On an actual production site with many contrib modules, gains through entity caching will even be much bigger. No reason at all to not do this in core, IMO. And gets us quite a bit closer to solving our perfomance problems in HEAD, as the benchmarks in #4 show.
Comment #6
catchAlright, let's do a patch then.
This is a straight cp of the contrib module to core directory. What to do about poll/book which do things we don't want to cache inside hook_node_load() needs discussion since the approach taken in the module is a bit of a sledgehammer. There's existing code for that in #111127: Cache node_load though.
The only place where I found core test failures which might require some changes to something in core outside the new module due to this, was in translation module - but that's likely due to a lack of hook invocations in the right places.
Going to let the test bot have a crack at it (by adding to default profile we get all tests run with entity caching for free) - last time I tried to run tests locally, apart from finding unreported fatal errors in HEAD, things went surprisingly well. That was a couple weeks ago though.
Comment #7
Frando CreditAttribution: Frando commentedWe just need a second hook here, that's also the result of #111127: Cache node_load. Only the individual modules can decide whether their data is cacheable at the entity level or not. Note that this is not necessarily an API change but merely the addition of a new hook. So I would say we just need to decide on naming.
I'd vote for
hook_ENTITY_TYPE_load()
for everything that's cacheable andhook_ENTITY_TYPE_load_uncached()
orhook_ENTITY_TYPE_postload()
, as it was suggested in the original node_load cache issue. Which one I don't care, with proper documentation both are fine.Comment #9
yched CreditAttribution: yched commentedIf we do that, let's rename hook_field_info()'s 'cacheable' property into 'field cache'.
It should be done nonetheless as a WTF fix, but this change would make the WTF even worse.
Comment #10
yched CreditAttribution: yched commented... or better: keep 'cacheable' (or 'cache'), but turn it into 'TRUE = yes, I have entity caching' instead of currently 'TRUE = yes, use the field cache because I have no entity caching' (meaning field API should take it backwards) ?
Comment #11
catchAgreed on the 'cacheable' => 'field cache' change. We also have a 'static cache' property which is already used by the entity API.
Attached patch just removes bogus hook_user() implementations which are causing those exceptions. Should be down to genuine bugs / fails with this one. Help identifying/fixing those welcome of course - as is code review (my OOP is extremely basic, so there may well be better/more concise ways to do what I'm doing in EntityCacheBaseController). Leaving CNR for the bot.
Comment #12
sunHmmm... not sure - somehow I preferred the more fine-grained options at first sight (i.e. entity cache, field cache, static cache). I think that would allow for some very granular performance tuning on sites that need it? Just my $0.02
Comment #13
catchWrong patch sorry.
Comment #15
yched CreditAttribution: yched commentedre #11-12: 'entity cache, field cache, static cache'.
It's just that entity cache + field cache is useless, the two should be mutually exclusive.
Comment #16
Crell CreditAttribution: Crell commentedSubscribing. I'll try to have a look at the latest patch from an OO perspective soon.
Comment #17
yched CreditAttribution: yched commentedCache clearing sounds like fun :-/
Field API will need an API function to clear the cache for a selected set of entities - like nodes 12, 13, 14, users 5, 7, 9...
Use case: clear the cached check_markup() for text fields when a format definition changes.
More fun: for some entity types, that data will be in the field cache, for others it will be in the entity-level cache...
Comment #18
catchIt looks like this is the text module issue for check_plain() - #556022: When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do. Just posted in there, but I think for that kind of update, you're looking at a general cache clear anyway - it's a rare enough operation, and expensive enough to try to figure it out granularly.
For general reference fields like terms / nodereference / userreference, we already have #493314: Add multiple hook for formatters / #606114: taxonomy_field_formatter_load() should use taxonomy_term_load_multiple() which bypasses the infinite recursion and/or stale data issues we'd get from trying to cache other entities within the field or entity caches. The advantage here, is that taxonomy_field_formatter() load only ends up a cache hit in most cases. If a term has actually been deleted, but the tid is still in the field cache, then we handle missing terms already - since the neither the term field nor the field cache has garbage collection for missing references anyway.
Comment #19
yched CreditAttribution: yched commentedcatch: agreed on those cases.
I'm a little worried that we have no API way to clear the entity cache. Field API has field_cache_clear() or knows how to clear it's own cache for a specific object when it needs. If data can be cached either in the field cache or 'somewhere else', it becomes more problematic.
Er, I might be just rehashing #111127: Cache node_load, which I don't remember that well :-).
in comment #207 over there, I wrote: "Field API needs to know the name of the cache table (or assume 'cache_[obj_type]' ?) and the structure of the cids (or assume [id] ?)." - or get an API function from entity API ?
Comment #20
catchWe could very easily add a drupal_get_entity_loader('entity')->flushCache() style thing, similar to how the static flushing happens now. That seems like a good idea in general. fwiw all the tables are cache_$obj_type and all the cids are $id - so it can also just guess, and it'd be right ;)
Comment #21
catchThis should fix most of the fails, and get us down to a couple of exceptions - which are down to the hack in entitycache_entitycache_node_load().
However fixing that hack means deciding on whether to keep this as a distinct module or merge it back into the entity loader itself, and how we deal with those uncachable hook implementations in either case. So I'll stop here for a bit until that discussion's been had.
Comment #22
catchDown to translation locally, CNR for the bot.
Comment #24
yched CreditAttribution: yched commented"deciding on whether to keep this as a distinct module or merge it back into the entity loader itself"
Possibly related: isn't it a bit limitative that another module (e.g contrib) willing to extend entityController for whatever reason will basicaly forbid the use of the entity-level cache for lack of multiple inheritance ?
Comment #25
catchI think you could do this:
hook_entity_info_alter() same as entitycache does.
Require entitycache module.
Your class extends EntityCacheController instead of the default.
Or just copy the code and then work off that.
But if everyone does that, then there's no point separating them, no.
Comment #27
fago>>"deciding on whether to keep this as a distinct module or merge it back into the entity loader itself"
>Possibly related: isn't it a bit limitative that another module (e.g contrib) willing to extend entityController for whatever reason will basicaly forbid the use of the entity-level cache for lack of multiple inheritance ?
I've extended it already, see this. So requiring entitycache module would be the way out for me there. However I'd prefer to see that baken in the DefaultEntityController class. So why do we need a separate module at all?
>By having it as a separate module, we're able to maintain all the cache clearing code in one central place, so it doesn't have the messiness of putting node cache clears in taxonomy module, or taxonomy cache clears in image module or whatever.
Having the comment cache clear in comment_save makes a lot of sense. Node cache clears in taxonomy module won't fly anyway, because entitycache can't support entity X introduced in contrib. But taxonomy module knows it provides a field type needing a cache refresh, so it should look up the created fields and clear the caches as appropriate.
Comment #28
catchThe reason there's a separate module is because the node_load() caching patch sat at RTBC for six months then was postponed by Dries indefinitely, to be revisited when we'd converted some stuff to fields and compare the field cache. We've done that, I've compared them, this is still a very significant improvement, so this is the same thing revamped. However I have no intention of spending time rolling a proper patch and keeping it up to date until there's some indication that it'll even be considered given what happened last time, new modules don't get CVS conflicts. Once that indication is given either way (or if someone else wants to step in), then we should definitely roll it into DrupalDefaultEntityController.
Comment #29
Dries CreditAttribution: Dries commentedI'll do my best to revisit this patch/approach tomorrow, and to provide some direction.
Comment #31
Anonymous (not verified) CreditAttribution: Anonymous commentedRight now I'm playing with an interesting bug in the attached patch. If the poll module is enabled, then all nodes that are loaded from the cache are passed through poll_load. Since most nodes are, in fact, not polls, this seems less than ideal behaviour. In the following code, it $this->entityType = 'node' AND the poll module is enabled...
then poll_entitycache_node_load is called, which in turn calls poll_load. The same happens for books as well. I'm trying to find a good way to fix it AND clean up some of the testbot failures.
Comment #32
catchProbably the easiest way to fix it would be for poll_entitycache_node_load() to check for node type before executing poll_load() on each node it gets.
However for a proper core patch, I would split the user-specific stuff in poll_load() into poll_node_view() or similar - see the original patch in #111127: Cache node_load (and we'd probably need to do a similar thing for translation.module as was done for node_load() caching too).
@Dries - did you get a chance to review this issue yet?
Comment #33
Anonymous (not verified) CreditAttribution: Anonymous commentedI wonder if this might be done more cleanly. Just off the top of my head, what if implementations of hook_load were free to add '#cachable' => FALSE or '#cachable' => TRUE (defaulting to TRUE)? This would allow most everything to decide whether or not to cache, removing the problem that hook_entitycache_node_load appears to be addressing.
If we take this one (logical) step further and add '#cached' => TRUE to all nodes loaded from the cache, then we could use hook_nodeapi easily to update anything that needs to be changed. We could use poll_load as per usual and then only use poll_nodeapi to update the data that should not be cached. For larger nodes where most of the data will remain the same and only a very little will need to be updated, I think that this might be more efficient (since several queries could potentially be saved.)
I believe this might be a little easier / more versatile. (Or I could be completely and totally wrong. I've done that before too.)
Comment #34
catchI'm reluctant to do that, because I actually think poll_load() shouldn't be cramming user-specific stuff about which links get displayed etc. into node_load() anyway. Doing it in poll_node_view() (is there even a hook_view()? - could happen there) or wherever solves the problem without adding extra complexity for well-behaved modules - a single foreach() to filter out non-poll node types isn't expensive at all.
The trickier one here is book_node_load() - where it's relatively inexpensive, and we don't want to cache it, but it's much more arguably canonical data about the node than poll permissions is. But even then, either re-using an existing hook that's not cached, or adding new hook which isn't persistently cached as the node_load() caching patch did seems safer to me than trying to conditionally call some hook implementations on some nodes some of the time depending on various flags.
Comment #35
Dries CreditAttribution: Dries commentedI reviewed the patch, and I like the idea of entity caching. I'd certainly want to explore this more as I agree that this is one of the most important performance patches in the queue. Sorry it took me so long to get to this issue.
1. I'm not a big fan of making this a module. Why aren't we integrating this better into the default implementation? Worst case, I'd prefer to make this an include file. A module called entitycaching is too low-level for my liking -- especially for core.
2. Does this allow us to remove some other caching code?
3.
Reading the patch top to buttom, it is unclear what this cacheable variable does. Could use a code comment, because I don't understand why it matters. Given that this is a caching module, I'd expect those to be set to TRUE.
4.
I don't like how ids can be FALSE. It seems like we're overloading the variable ids to mean a number of different things.
5.
_comment_helper() sounds like a really poor function name for what it does. Also, the function is small enough that I wouldn't bother with the helper function to begin with.
6. I'd be interested to see actual benchmark results. You mention 10%, but is that documented somewhere?
Comment #36
yched CreditAttribution: yched commentedre Dries
That's the variable the controls *field* level caching. Was originally part of hook_fieldable_info(), and thus takes things from the field end. 'cacheable' = FALSE means 'I have an entity-level cache of some kind, don't bother caching at field level.
It should obviously be renamed to 'field_cache' now (API change...). Been mentioned several times in the past, but it we never got actually done :-/.
Comment #37
catchYeah we need cacheable / field_cache, cache / static_cache and add entity_cache to make this work and sensible. That plus one new hook should be the only API change though I hope.
I don't have a bunch of time this week, but I'll try to have something to look at in the next week-ish.
Comment #41
catchinitial patch to move this to entity.inc and make it more of a proper API.
Done:
Makes all the necessary changes to entity load, hook_entity_info() and adds cache tables for nodes, files, taxonomy and comments. i.e. enough for benchmarking and a look at the API additions / changes - both of those welcome.
Not done (hint, these are in order for anyone who wants to help):
cache clearing in node_save() and similar places.
book_node_load() -> book_node_load_uncached()
half of poll_load() -> poll_node_load_uncached()
fixing tests (there's a method name change for resetCache())
Comment #42
catchSlight cleanup in entity.inc, not got to the TODOs yet.
Comparing node/n, article, default profile, anonymous user (no page caching), only user and comment modules implementing hook_node_load().
Short version, HEAD 10.4 #/sec, Patch: 12.75 #/sec -). That's an 18% performance improvement on single node views.
As usual, devel numbers given only for sanity checking and query counts, I tried to get the most representative of a few requests, but they're all over the place for just one request.
HEAD:
devel says: Executed 49 queries in 45.64 milliseconds. Page execution time was 175.59 ms.
Patch:
Devel says: Executed 45 queries in 33.78 milliseconds. Page execution time was 160.92 ms.
I'd expect the results to be even better with a non-database caching backend.
It'd be good to get benchmarks with taxonomy terms and comments attached to nodes - since these would also be cached, to see if we continue to see 15-20% gains or it starts flattening out. Would also be good to see what the change is on a node listing - I'm pretty sure it drops to 5-10% for those since we're already efficient in terms of SQL using multiple load.
Comment #43
carlos8f CreditAttribution: carlos8f commentedcatch: bravo, this is very cool and exciting stuff.
Comment #44
catchSide note, I've opened #665618: Add entity caching for users - having tried to do this briefly in entitycache.module, it was tough enough that I don't think it's a good idea to tackle here. However if we can keep this issue focused, it'd be worth trying to get that working in separate patch, then maybe working it in here later if it's not too hellish. Various issues with doing user_load() all over the place at #636992: Entity loading needs protection from infinite recursion, #638070: Router loaders causing a lot of database hits for access checks and #490108: user_load() should add session properties when loading the currently logged-in user.
In answer to Dries' #2. Not really. Both the field cache and entity-level caching should allow us to get rid of the filter cache in Drupal 8, assuming most diy formatted textareas get converted to text fields. If we're able to successfully apply entity caching for users, then that points towards possibly removing the field cache since we'd have no use case in core, and core entities cover a lot of use cases - but again I think that's a D8 thing.
Comment #45
catchOK there's still a bunch of test failures to fix, but all the 'core' tests like comment, taxonomy, node, user, file are passing now - so the rest is cleanup, plus poll/book/translation which need special treatment. However I'm going to mark this as needs review, since all the API changes are now in the patch, along with all the changes to entity.inc, and it needs a wider variety of benchmarks run on it.
Side note, I noticed that we clean the field cache when a taxonomy term is deleted, however our taxonomy field formatter handles missing terms. Also we don't cache any term information, only the reference to the term, in either the field or entity cache. So that's a lot of unnecessary cache clearing. This should be removed in a separate issue probably, but I've rolled it in here for now.
Comment #46
catchAnd a summary of the changes:
This patch now adds a cache_get_multiple() and cache_set() stage to the default entity controller, this is in between fetching from static cache and from the database. Both persistent and static caching remain optional.
The 'cacheable' key in hook_entity_info() has been changed to 'field cache', this remains the default.
Two new keys are added: 'entity cache', and 'entity cache bin'. Both are required if you want to use entity caching, at the moment all core entities create their own cache bins, with taxonomy sharing one bin between vocabularies and terms. It would be feasible to do this with dynamic schema at some point, but the module install system is still pretty fragile at the moment, and there's various opportunities for race conditions (esp user_load() if we're able to do that), enough to make this not worth automating.
Modules are responsible for clearing the entity cache in their update and delete operations - when/if we eventually have an entity API which provides save / update / delete as well as load, that'll be handled centrally.
Field API modules don't need to know anything about the cache. Modules using _load() / _save() / _delete() hooks to interact with entities may need to, but only if they're doing something user / language specific in those hooks, or if they're altering data which goes into _load() outside of the entities own API functions - i.e. Poll and comment.
There are a couple of API changes we could avoid making if we really wanted to, but I've tried to keep variable names etc. as self-explanatory as possible rather than keeping them exactly as is (given changes like that still seem to be acceptable for new APIs).
I'm going to keep working on tests, and hopefully some new benchmarks over the next day or so. But reviews in the meantime are welcome.
Comment #47
Dries CreditAttribution: Dries commentedThis looks very promising, but haven't looked at the latest patch yet. Thanks for running with this, catch.
Comment #48
Anonymous (not verified) CreditAttribution: Anonymous commentednitpicks below.
should be $ids no $id.
should be 'is' not 'are'.
is that hunk from another patch?
possibly bikeshedding, but i think resetCache should be clearCache, as that seems to be more inline with what its does and the terminology we use elsewhere?
Comment #49
scroogie CreditAttribution: scroogie commentedSorry for poluting the issue, but catch, can you point me to your benchmark/profiling doc again? I can't find it anymore :/
Or is it enough to bench with ab?
Comment #50
catchFixed field, poll, translation, book and comment tests, should be close to 100% pass but I'll leave the entire test suite to the bot when it's back up and running.
Remaining TODOs:
1. Benchmark single node with taxonomy terms / comments etc.
2. Benchmark node listing
3. API documentation for hook_ENTITY_load_uncached() - although we have book and poll examples for this already.
Comment #51
catchAnd the patch.
Comment #52
catchCrossposted with Justin and scroogie:
Justin: resetCache/clearCache - yeah that seems like a reasonable change. It also means that if anyone is already calling resetCache() with no arguments, they'll get a proper error instead of unknowingly clearing the entire cache bin for that entity. Have made that change and fixed the typos.
@scroogie - ab is fine.
Another thing we need to figure out. When modules are installed or disabled, if they include any hook_ENTITY_load() implementations, those won't get picked up until the cache is flushed.
I think we can do two things here potentially - 1. since hook_entity_info() has the name of the hooks (hook_node_load(), hook_comment_load() etc., we could only do this when modules implement one of those hooks. 2. We can only flush the cache tables for the entity types which are affected. I'm not sure if we'll have all the information at the right time to do this, but seems worth a try. If not we may just have to flush all entity caches everywhere on module disable / enable.
Comment #53
Anonymous (not verified) CreditAttribution: Anonymous commentedcut and paste of #drupal discussion about hook_ENTITY_load between catch and i:
(18:25:15) beejeebus: catch: cool. on the hook_ENTITY_load question, creating new entities doesn't clear the cache right?
(18:25:49) beejeebus: catch: so we could get some entities with the new load hook called on them, and some not, until the cache is cleared? that feels messy
(18:26:17) catch: beejeebus: creating new entities?
(18:26:42) beejeebus: catch: if we have a bunch of foo entities cached
(18:27:05) beejeebus: catch: and we enable a module that implements the hook_foo_load
(18:27:28) beejeebus: catch: new foos will pass through that, but existing cached foos wont?
(18:27:36) catch: beejeebus: OK yeah, in the current patch, that is broke.
(18:27:59) catch: beejeebus: but I'm wondering if we can only clear the cache if the module implements a hook_foo_load(), and only for foo. Not just clear everything.
(18:28:10) beejeebus: catch: that seems reasonable
(18:28:45) beejeebus: catch: perhaps that's should be an implementation of hook_modules_enabled ?
(18:29:00) catch: beejeebus: ooh that would be a nice place to put it yeah.
(18:29:08) catch: It'll have to be in system.module but that's OK.
Comment #54
scroogie CreditAttribution: scroogie commentedSorry, I'm new to this. I didn't know what configuration I should test this with, so I just created a node with some comments, tags and terms and got these results.
Configuration is Windows XP, Nginx 0.8.30, PHP-CGI 5.3.1, APC enabled.
Comment #57
Jeremy CreditAttribution: Jeremy commentedThe following notices are introduced by this patch (and happen dozens of times with each page load on my test website):
Comment #58
catch@Jeremy, you'll need to clear the {cache} table before testing.
@scroogie - thanks, those results are still an improvement, but a much smaller relative improvement than mine (and 30 requests a second seems a lot, maybe nginx really is that fast though). Couple of things - could you check that anonymous users have been given access to view comments, that's off by default. Also please run with -c1 -n1000 to give things time to warm up. And if benchmarking with a lot of comments, it might be worth applying #636992: Entity loading needs protection from infinite recursion to both the before and after version of HEAD, since that's such a massive slowdown at the moment it'll skew results..
Benchmarked a node with a comment and some taxonomy terms this time. This is probably the most favourable example towards caching, since we don't get any advantages of multiple load apart from on the terms. Will try some more unfavourable examples next.
Just a note I got almost exactly the same result the first time I ran the benchmarks, then cleared caches and re-ran the benchmarks both times, then consistently got these, this also happened once before. Not sure if it's just me benchmarking the wrong thing first time or if there's a weird side-effect when applying or un-applying the patch in either direction.
HEAD:
Executed 68 queries in 64.01 milliseconds. Page execution time was 190.02 ms.
Patch:
Executed 60 queries in 53.31 milliseconds. Page execution time was 148.35 ms.
Comment #60
catchOne node, 50 comments. Results are not what I expected - ~12% improvement which is too high for one query and a bit of PHP shortcutting.
There seems to be two reasons for this:
1. In devel query log, I'm seeing a 17ms query caused by field_sql_storage_schema() - this should never be called on a regular page load in normal circumstances, needs it's own bug report. Opened #667098: drupal_get_bootstrap_phase() is broken.
2. The regular comment_load_multiple() query takes 25.09ms on my machine, whereas the cache_get_multiple() is > 1ms. This seems to be simply due to the joins it does, there's no filesort or anything. We might be able to optimize this in the uncached case, but it's a genuine improvement as things stand. Opened #667100: comment_load_multiple() query joins users table and is slow.
Patch:
Executed 110 queries in 42.58 milliseconds. Page execution time was 262.16 ms.
I can't get devel_generate to generate 10 nodes with proper node bodies yet, might need an update for recent changes in HEAD, but if you ignore the fact that caching appears to allow us to gloss over some existing performance bugs in HEAD, then this looks like consistently between 10-20% with database caching in various scenarios. Would be very happy if we could get one or two more indpendent benchmarks though.
Comment #61
Jeremy CreditAttribution: Jeremy commentedI'm seeing an overall slowdown with this patch. My fastest page load time is a little faster, and my slowest page load time is a little faster, but the average page load time is slower. It's possible I didn't let the tests run long enough to see full gains from this patch, as after it had run with entity caching enabled I only had 141 cached nodes and 169 cached comments. Or perhaps I'll need to set a minimum cache lifetime to see any gains?
I'm using a large database for testing: ~50,000 nodes, 10,000 users, and ~500,000 comments (all data was generated with the devel module -- I was unable to get taxonomy generation working.) Before each test, I dropped / reloaded the database from a backup, restarted mysqld and apache, and flushed the cache or restarted memcache as applicable. Page caching, block caching, and JS and CSS aggregation were all enabled during all tests. I manually loaded the front page 1 time before each test.
The actual testing was done with jMeter (all tests were developed thanks to the Examiner/NowPublic). I simultaneously ran 6 different tests at the same time to try and generate more realistic traffic: 1) anonymous spiders of the entire website, 2) anonymous visitors favoring the front page and spidering links from the front page, 3) logged in visitors favoring the front page and spidering links from the front page, 4) logged in user posting nodes, 5) logged in user posting comments, and 6) anonymous user creating user accounts. Test 6 is currently adding comments to the same node repeatedly. All tests are run together for 5 minutes, and then all are automatically stopped and the data is collected.
(I also tested the same website with Plain Drupal 7 + memcache to confirm that I saw a significant speedup.)
--
Test 1: Plain Drupal 7
Test 2: Drupal 7 + entity caching patch
Comment #62
catchWith that schema bug patched, here's more realistic results for HEAD in the 50 comments case. That's more like a 2% improvement which is closer to what I was expecting.
Patch:
Even more fun, it turns out that bug is largely responsible for the speedup with just a single node as well - basically, if something stupid is happening during node_load(), and you cache node_load(), then you ameliorate the effects of the stupidity. But at least with database caching, the actual caching itself isn't buying us all that much in itself. This might well change with an alternate caching backend, or with more going on in hook_node_load(), but not as exciting :(
Comment #63
catchCrossposted with Jeremy. Yeah this patch allowed us to mask #667098: drupal_get_bootstrap_phase() is broken - which happens on pretty much any situation after cache clear, except for visiting the front page, and was causing at least a 15ms slowdown (easily enough to account for 20% of the page request). Results seemed a bit too good to be true - node caching was more like 10% a few months back, but took a few long looks at the devel query log to figure out what was going on. Bumping this down to normal and CNW :(
Comment #65
scroogie CreditAttribution: scroogie commentedcatch: Do I need to do anything else besides enabling Page Caching to activate the entitity cache? Or shouldn't I activate page caching? I'm still a bit confused. :)
Comment #67
catchEntity caching will only make any difference with the page cache disabled - the page cache bypasses just about everything. Generally when benchmarking core, unless you're making a change to very low-level bootstrap or page caching itself, you should never enable page caching.
Also until the bootstrap issue is solved, you'll need to visit the front page before benchmarking to allow caches to rebuild properly.
Comment #68
Anonymous (not verified) CreditAttribution: Anonymous commentedcatch: want me to work on #53? don't want to duplicate work if you're already on it.
Comment #70
rickvug CreditAttribution: rickvug commentedCan this be assumed to be D8 work now?
Comment #71
catchYeah we don't get real gains with out of the box Drupal, so that leaves sites with lots of hook_node_load() implementation, or using non-SQL caching, which is contrib for D7 in http://drupal.org/project/entitycache, and we can see how that goes for inclusion for D8.
Comment #72
catchI'm really sorry to move this back down a day before alpha, but I just realised that entitycache has no clean way to override the base class. I previously thought this was my lack of OOP-fu, but a conversation with Damien indicated it's actually not possible to do nicely in PHP. A quick patch to make entity.inc swappable is a no-go since we have registry breakage if you try to do that as far as I know. So if we want to properly support caching in contrib, core has to support caching even if it doesn't use it.
This patch is exactly the same as the previous patches, except it doesn't enable caching for any core modules. The only thing entity cache would then do from contrib is hook_entity_info_alter() and create cache bins, plus probably a couple of hook implementations to clear translation caches since those changes are specific to having node caching enabled.
Comment #73
catchFixed one missed find/replace and what looks to be an uncaught test failure in node.test - in that it shouldn't be passing in HEAD either.
Comment #75
catchHEAD moving fast today.
Comment #77
catchNot that fast, uploaded the same patch :(
Comment #79
catchComment #80
catchLooking at all this, apart from #611752: taxonomy_field_validate calls taxonomy_allowed_values and #684202: Entity insert/delete/update hooks are missing, we can make do for now, back to D8.
Comment #81
pwolanin CreditAttribution: pwolanin commentedmostly looks good - but I'd like some minor clarification of the variable/key names.
E.g. 'field cache' doesn't suggest to me that it's a Boolean unlike the 'cacheable' it replaced. Maybe 'field caching' ?
Comment #83
moshe weitzman CreditAttribution: moshe weitzman commentedD8 is open for business. Folks think we should move this to core?
Comment #85
catchfwiw:
- http://drupal.org/project/entitycache has 183 users at the moment (not loads, but it's also increasing each week).
- Most core entity tests pass with it enabled, in the project there's a script to generate the tests which also documents each one that fails and related core issues.
- Last time we benchmarked this, it didn't make much difference, at least with database caching. That might be worth revisiting though.
Comment #86
carlos8f CreditAttribution: carlos8f commentedalso fwiw:
I use the Pressflow entity cache patch (https://code.launchpad.net/~catch-drupal/pressflow/load_cache/+merge/39241) on a fairly large (75k users, 100k nodes) D6 site, with great results. Probably more dramatic in D6 since D6 lacks static caching.
The real benefit is for complex custom sites, with expensive hook_user_load/hook_node_load() implementations which call many queries. For them, entity caching is a huge bonus. It seems like having full support in core for this is better in the long run than having half-core half-contrib like we do now :)
Comment #88
mitchell CreditAttribution: mitchell commentedPerhaps ArrayCache DatabaseBackend PERMANENT Bins could be used as an entity storage backend for parts or even collections of entities.
This optimization seems equivalent to views materialization.
Alternatively, database tools like collections and indexes (foreign keys) would have significant performance benefits without the need for caching.
#1869338: Support ETag + If-Match for PATCH requests to prevent concurrent updates is tangentially related.
Comment #92
DamienMcKennaShouldn't this have the "D8 cacheability" tag too?
Comment #93
catchIt should, but note that a lot of the activity on getting this actually into core has been on #1596472: Replace hard coded static cache of entities with cache backends.
Comment #94
yched CreditAttribution: yched commented[edit: crosspost with #93 - haven't looked at #1596472: Replace hard coded static cache of entities with cache backends yet, but the remark below has nothing to do with the static cache of entities]
So right now:
- we have a field cache that is a direct descendant of CCK / D7 Field API and only covers configurable fields, not base fields, which is absurd.
- FieldItem classes can provide a prepareCache() method, but it's only triggered if the class is used for a configurable field, which is inconsistent / misleading.
The field cache logic should move from FieldableEntityStorageControllerBase::[load|save|...]FieldItems() methods, that are only about configurable fields, to "somewhere higher up (TBD) in the storage controller flow", at a point that encompasses the loading/saving of both base and configurable fields.
We should probably try put that in a place that's easily overridable, since it's likely that alternate storages (mongo...) might have no need for a field cache in the first place ?
Comment #95
DamienMcKennayched: silly question - is there a need for field cache when there'll be an entity cache?
Comment #96
yched CreditAttribution: yched commented@DamienMcKenna: Given the current Entity API ("everything is a field"), a "persistent cache for entities" would mostly be a "cache for all fields, not just configurable fields". There might be a couple extra bits that would need to be in the cache as well, dunno for sure, but this doesn't invalidate #94 IMO.
Static cache is probably different, that's about statically persisting already instantiated $entity objects.
Comment #97
fagoYeah, we really really should do that. As yched points out in #94 the current approach of just caching configurable fields is obsolete, misleading and sub-optimal - caching all fields instead is of course better. So yeah this removes the need for field cache then.
Comment #98
fagoYeah, we really really should do that. As yched points out in #94 the current approach of just caching configurable fields is obsolete, misleading and sub-optimal - caching all fields instead is of course better. So yeah this removes the need for field cache then.
Comment #99
yched CreditAttribution: yched commentedWe should settle the "configurable field cache" code in #2083785: Add support for determining which field properties are cacheable first, and then move it up the stack to encompass all fields.
Comment #100
BerdirQuick proof of concept to do entity field caching for all fields.
Seems to be working quite fine for me. Tests will of course explode. Also didn't remove the nested field caching.
Comment #101
BerdirOk, the prepareCache issue went in, let's see what happens...
Comment #103
BerdirRe-roll, removing configurable field cache, which means that those configurable field helpers and the do* pattern can and imho should be removed.. I don't think we need to leave anything in the base class of those. They're useless for e.g. MongoDB.
Comment #105
BerdirComment #106
yched CreditAttribution: yched commentedFully agreed that the methods stack around "extensible" sql field storage can be greatly simplified after field cache is moved upstream, that was totally the plan in the "switch to entity storage" issue.
Comment #108
BerdirOk, so a lot of those fails are because the we lose things that are just added to entities but not defined, e.g. $entity->translations and the book stuff.
We have an issue to convert the first to a field, and issues to clean up book as well, but we still might need a solution for this. I'm not yet sure what do it, as we'd need access to $entity->values. We could add a getCacheData() method there, which works similar to the one in PrepareCacheInterface. Not sure.
Or we say that we don't support caching things that aren't defined, but then we need to solve the core problems.
Comment #109
plachIf we could do this I'd be very happy :)
Comment #110
BerdirForgot the use for the PrepareCacheInterface.
@plach: I'd love that too, but might not be so easy. For starters, you need to work on your translation/field issue :p
Maybe we could still invoke hook_entity_load() for the time being when loading an entity from the cache? Just to see how that goes...
Comment #112
BerdirA bit complicated to do that, #2095283: Remove non-storage logic from the storage controllers would help here, so please help to decide what to do there.
Untested, no idea what's going to happen with this...
Comment #113
BerdirRemoving the do* pattern, the field methods are now an implementation detail of the fieldable database storage controller . Which means we can more easily refactor them and e.g. make delete work on a list of entities or load just load the values.
Comment #115
catchI'd be OK with saying we don't support things at all that aren't defined assuming we can get core to that point, but not caching them is OK too.
Comment #116
BerdirFixed a number of tests, array_merge() doesn't work like that...
The remaining ones I either didn't figure out or are actual problems. Like the FilterSecurityTest, do we need to clear the entity cache when changing filter settings? I guess so, but who is responsible for it? text.module, as it puts cached filtered things in there?
Comment #118
BerdirSome more fixes. The remaining ones are... weird.
Comment #119
BerdirThe previous patch had some debug left-overs, the interdiff is against #116.
Comment #120
yched CreditAttribution: yched commentedText module currently has code to clear the field cache, so yes, if field cache becomes entity cache, text module should clear that one instead ?
Comment #122
plachFYI, I am currently working on #1916790: Convert translation metadata into regular entity fields :)
Comment #123
yched CreditAttribution: yched commentedAs a side effect, this places field data in a different cache bin than field metadata (plugin definitions, field definitions), which I've been willing to do for ages :)
Comment #124
BerdirYeah, I'm not sure yet about how exactly the bins should be set up (entitycache 7.x uses a bin per entity type, which would be really easy in 8.x but that might make "clear the cache of all entity types" harder.
Comment #125
yched CreditAttribution: yched commentedThis raises the question of field_cache_clear() - the current patch doesnt seem to touch it, meaning it still empties the "old" bin instead of the new one ? Should it become a method on the storage controller ?
But yes, the ability to empty the cache for a single entity type would be cool, those are our main use cases. This could also be done just through the cid in a single bin.
Comment #126
plachFYI (again :) we should be mostly done with #1916790: Convert translation metadata into regular entity fields, reviews welcome.
Comment #127
jibranIs it not nice to have a helper function for this?
more then 80 chars.
More then 80 chars.
Comment #128
BerdirTracking down those test fails. Crazy stuff. Two things fixed:
1. configurable entity references consider target_id = 0 as empty, so it didn't cache it and the default value is NULL (weird on it's own but that's a different story), result in those notices.
2. This one was reaally fun to track down. What happens was this:
a) user with picture is saved
b) after base fields were saved, cache is reset.
c) then, field item list update runs, and adds a new file usage
d) that somehow results in loading the entity again.
e) Now, at this point, we actually load from the database because the static cache was cleared. But the field values have not yet been saved. So we load the entitiy without the updated field values and put it in the static cache.
For now, I just moved the resetCache() down below after saving the fields. Really wondering if more in that order there isn't wrong. Shouldn't we save fields *before* we start invoking postSave()?
Comment #130
fagoA related point that came to my point recently: What about also caching when entities do not exist, i.e. loading fails? When working with remote entities you'd like to cache that also, but I think it would be reasonable to do that always? Having to clear the cache on creation also shouldn't be a big deal anyway.
Comment #131
fagoAlso see #2135169: Support caching of failed load operations
Comment #132
yched CreditAttribution: yched commentedNo db hit on entity load - awesome !
NestedArray doesn't seem to be used anymore either ?
Don't we use [] typehints now ? (or is it only for @return docs ?)
Leftover
Nitpick, but IMO we should make it cleaner in code comments that parent::cacheXxx() handle static caching.
The $this->cache flag controls both the static and persistent cache - is that ok ? Don't we want to be more granular ?
(If not - probably debatable, but I'd find it clearer if the if ($this->cache) checks were made by the calling code rather than inside cacheGet() / cacheSet())[edit: can of worms, scratch that]
It would be handy to have cacheGet($cids) take $cids by ref and remove the successfully loaded entries, like CacheBackendInterface::getMultiple() does.
Currently we have three layers (static cache, persistent cache, load from db), where each layer needs to array_diff to figure out the misses of the previous layer.
Copy/pasted from the old field cache code, but misleading now - the code doesn't put data from the cache into pre-existing $entity objects, but creates fresh objects from the cache data.
Comment looks weird. "Populate the static cache" ?
+ The code here is implementation logic leaking out of EntityStorageControllerBase::cacheSet() - should this call parent::cacheSet() instead ?
Also, could be for a followup, but properties about caching are a bit confusing:
$this->cacheBackend is the 'cache.entity' service, $this->entityCache is an array of statically cached entities
$entity->getUntranslated()->language()->id should be computed once, before looping on langcodes ?
No if ($this->cache) check here ? (parent has one).
Also, not introduced here, but we have cacheGet() / cacheSet(), so the method would be more consistent if named cacheReset() ?
Nice move. However:
- It would make more sense to move it higher up to EntityStorageControllerBase, next to invokeHook() ?
(configStorageController::attachLoad() would need to be split similarly - not even sure config entities have a need for this method)
- It is weird to call invokeLoadHook() from attachLoad() and cacheGet() separately. The same entity_load_multiple() results in two separate hook_entity_load($entities) invocations (one for the cache hits, one for the cache misses).
Can't loadMultiple() do "get entities (from wherever, db or cache), then call invokeLoadHook() on them" ?
Looks like it's the whole of FieldAttachOtherTest::testFieldAttachCache() that should move to a testEntityCache() method somewhere in Entity component ?
Also, that's for a separate issue, but the FieldableESC / FieldableESCBase pair looks really confusing after the doXxxFieldItems()/xxxFieldItems() swapping, not clear what is in one vs the other, nor which one should extend the other.
FESC really looks like a "ContentESC" now, and invokeTranslationHooks() / invokeFieldMethods() have no business staying in FieldableESCBase.
Do we already have an issue to clean that up ?
Comment #133
yched CreditAttribution: yched commentedre @Berdir #128 :
Hm, yes, that would definitely seem reasonable, the current order is weird to say the least...
Comment #134
BerdirThanks for the review!
- Yes, we should probably have separate flags for static and persistent cache. I was just too lazy to introduce that :) The persistent cache setting would probably just replace the existing field_cache one.
There are a lot of related issues now...
#2095283: Remove non-storage logic from the storage controllers moves a lot of attachLoad() stuff around, we also have agreed there to move hook invocations to the entity classes, but in a follow-up that doesn't exist. Having the hooks there should also simplify creating entities, so less code duplication.
#1867228: Make EntityTypeManager provide an entity factory should also help with that.
#221081: Entity cache out of sync and inconsistent when saving/deleting entities cleans up the order of cache/hook/saving.
6. Sounds interesting, wondering how bigger that the patch would get by that.
8. The thing is that cacheSet() writes back into the persistent cache. What we want to achieve here is to add it to the static cache so that the next call to cacheGet() could receive it from there. Calling the parent would work, but not sure about directly calling parent implementations of other methods.
10. Renaming resetCache() would result in a large patch, as that is called a lot. I actually prever the way that is named and I'd rather rename the other two to getCache/getCachedEntities or something like that.
11. No it can't because there are three options. static cache, persistent cache and db. We don't want to execute it for the static cache. However, I think that once those related issues are in, it will come down to two lines: instantiate through factory + $entityClass::postLoad().
12. Yes, probably makes sense.
Comment #135
yched CreditAttribution: yched commented8. 10. :
Then I think maybe the current approach of dealing with persistent / static cache by layering overrides of cacheGet() / cacheSet() is brittle. Having explicit methods for "fetch from static cache", "fetch from persistent cache"... might be cleaner.
Comment #136
pounardUsing a cache chain implementation would be good here? I don't know if this old patch ever reached core...
Comment #137
pounardhttps://api.drupal.org/api/drupal/core!lib!Drupal!Core!Cache!BackendChai...
Comment #138
BerdirThere's a separate issue where that is discussed: #1596472: Replace hard coded static cache of entities with cache backends.
I decided to move forward here, because I don't think a cache chain works here. It's a nice idea, but in practice, we do need to handle static cache differently, static cache contains the entity objects as-is, for the persistent cache, we prepare them and store the values, we invoke the load hook when we get it from there and so on.
Comment #139
pounardOk I didn't catch that point where the persistent cache was only a store for values and post hooks were invoked.
Comment #140
BerdirRe-roll and fixed the use statement.
Haven't addressed yched's review yet.
Comment #142
BerdirOk, working on @yched's feedback.
1. Removed.
2. Updated some @params and @returns to use the [] syntax, yes we use that for both.
3. Removed.
4. Worked a bit on the comments, hope that it's clearer now. Completely overriding it is still an option.
5. Switched to use the existing field_cache property, do we want to rename that? see @todo.
6. Implemented. The only downside of that is that it makes mocking of those methods impossible.
7. Comment updated.
8. Updated that comment too, see 6.
9. Improved.
10. Added.
11. As responded already, currently has to be like this, but added a @todo to investigate ways to avoid that completely.
Also started using cache tags instead of emptying the whole cache bin.
Comment #144
amateescu CreditAttribution: amateescu commentedFixed a bunch of test fails, couldn't figure out the one about feed items yet..
Comment #146
jhedstromI haven't figured out exactly why feed tests are failing, but quickly looking through the way those tests are written, they use a mix of aggregator module api functions, and direct database manipulation, which is most likely causing the caching to get out of sync during the test.
Comment #147
BerdirYeah, that is quite messy. Might be useful to have a decision for #2127725: Remove category handling from aggregator, if we'd remove it then we won't have to mess with it here. We might be able to fix that test by adding some manual cache clears, but I'm not sure if that's the right approach.
Comment #148
amateescu CreditAttribution: amateescu commentedI tried manual cache clears when I was working on #144, no go :) Or maybe I didn't try hard enough..
Comment #149
yched CreditAttribution: yched commenteds/ID's/IDs/, no ?
(in a couple places in the patch)
So $storageController->cache controls static caching, and $entity_info['field_cache'] controls persistent caching ? Not too intuitive :-)
In general, looks like the names of properties and flags around "static cache' and "persistent cache" could use some harmonization.
Current patch:
- Static cache:
controlled by $this->cache
stores in $this->entityCache
- Persistent cache:
controlled by $entity_info['field_cache']
stores in $this->cacheBackend, which is 'cache.entity' (kind of clashes with $this->entityCache)
Suggestion:
- Static cache:
controlled by $this->staticCache
stores in $this->entities
- Persistent cache:
controlled by $this->persistentCache
stores in $this->cacheBackend, which is 'cache.entity'
?
I'd tend to think it would make the code easier to grasp, yes. Forwarding to the parent for static logic brings little code saves, and there is still parent implementation logic leaking in the child implementation ($this->entityCache)
Also, you pointed that a cache chain wouldn't work here because of different contents and the need for hook_entity_load() in one case & not the other, but it seems what the current patch does is actually a one-off implementation of a chained cache ? cacheGet() encapsulates stacked static & persistent cache, and thus we have split calls to hook_entity_load().
Hence my proposal to split into explicit methods for "fetch from static cache", "fetch from persistent cache", explictly called from loadMultiple(), which can then control when to fire hook_entity_load() ?
Comment #150
msonnabaum CreditAttribution: msonnabaum commentedReroll of #144.
Comment #153
jhedstromThe wrong language code was being used in those failing tests.
Comment #157
BerdirRe-rolling the re-roll.
Comment #161
jhedstromDrupal\system\Tests\Form\FormTest
seems to consistently fail with an out of memory error:Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 268435456 bytes) in /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Connection.php on line 336
Comment #162
BerdirRe-roll. Can't reproduce the FormTest fail and that wasn't there before the re-test. Will try to run it again with the same memory settings if it still fails.
Comment #163
BerdirYay first green patch.
Ok, started to implement @yched's suggestions. Was quite a bit of work, but I agree it's way cleaner.
- Completely moved away from using the cache related stuff from the base class, I think it was a mistake to attempt and extract that there, caching is different for every storage controller. Reverted the changes in the other classes as we no longer use them here
- Introduced staticCache and persistentCache flags, not sure what to do about the entity annotation keys.
- Have two separate dumb methods to get/set static and persistent cache. resetCache() handles both.
- Refactored loadMultiple() to track the caching logic and also moved the mapping/loading fields directly into it and loadRevision(). That allows to to just call to postLoad() for both persistent entities and those from the database. The goal is still to elliminate the first, maybe we could make the entity itself implement PrepareCacheInterface, then it could have a logic similar to the serialization patch and return the array including non-defined values?
- The change to postLoad() to receive entity objects means that the user and comment overrides no longer work there, so I moved that to mapFromStorageRecord(). Also cleaned up a lot of weird variable names there, but didn't go inside that method, which is still a very dark place. We'll get to that in https://drupal.org/node/2137801 i think.
Might introduce some new fails, let's see.
Comment #165
BerdirShould fix the fails.
Comment #166
msonnabaum CreditAttribution: msonnabaum commentedThis was my first thought. setPersistentCache is doing much more than it should be doing. It contains entity serialization logic, which belongs on the entity itself. It needs to have a method that returns an array that can be used as the "$values" argument in the constructor.
The changes in #163 are an improvement, but loadMultiple is still pretty complex and hard to follow. Attached patch is my attempt to simplify it futher:
- Moved logic specific to storage or cache into loadMultipleFrom* methods to isolate them.
- Moved static-cache-miss logic to doLoadMultiple, similar to how we handle caching in other classes. This also creates a logical separation between entities loaded from an external source and those that have already been loaded and are in memory. Entites from cache or storage need hooks run on them, so this seems like a natural boundry to create.
- postLoad gets run on the results of doLoadMultiple, which includes entities from cache and storage. I saw the todo to get rid of that, but I don't really understand why. I'd think we could move anything expensive to a new hook/event and just run the postLoad hooks on a hit.
- Moved all conditional checks for whether or not to cache in the methods themselves. Simplifies the caller and prevents duplicate conditional logic.
There's more ugliness I wanted to get rid of, but couldnt because of loadMultiple's multiple argument types. Started #2157241: Add loadAll() to EntityStorageInterface to fix that.
Comment #168
BerdirI want to get rid of postLoad() because that invokes hook_entity_load(), which in turn a lot of modules use to load and attach a lot of stuff to those entities. Right now, we don't put that in a cache because we can't extract those
modulesedit: values. Adding a method to get the values would allow allow to get those too.It would be a similar logic as #2027795: Optimize content entity serialization.
Not sure I see the advantages of your changes, need to look at the result and not just the diff, seems to have very deep nesting of method calls, not sure if that helps to understand it. Also I don't see why static cache happens outside and persistent cache inside the helper method.
Comment #169
msonnabaum CreditAttribution: msonnabaum commentedThis should fix the failure I introduced.
The goal I had was to split up the loadMultiple method into distinct chunks and isolate each responsibility. This reduces the number of temporary variables and conditionals needed, as well as reducing the scope of the remaining variables, resulting in code that's more composable and less error prone. The number of methods and instance variables that loadMultiple depends on is dramatically reduced, so it will have fewer reasons to change in the future, making the code more maintainable/refactorable. We should never look at "very deep nesting of method calls" as a bad thing on its own.
I think it's important here because this *should* be refactored into a different class, sooner rather than later. Caching has nothing to do with a class that expects to be swapped out when the storage backend changes. That said, we don't yet have that class (entity repositories), so this is fine where it is for the time being.
Comment #171
BerdirApparently not :) Also seeing that fix (interdiff) would be helpful ;)
In theory yes, but I'm not so sure if it really works. From what I've seen so far in entity caching issues is that caching is tightly coupled with the type of entity we're working with (config, content, ...) and the backend. For example, config entities currently (not sure how that the config context system changes affect this, but it will need something similar) need to be keyed by config context in the static cache, because depending on the currently active language, entity_load('node_type', 'article') will *not* return the same information. And persistent cache makes no sense for them, the config system takes care of that. See #1885830: Enable static caching for config entities.
Another example is that I'd love to see a FieldableMongoDbStorageController. I expect that caching will work quite differently there, I think that the values that are extracted right now in setPersistentCache() would be the thing that's actually written to the storage, which would make the loading similar to getFromPersistentCache() , I've no idea how useful a persistent cache is then, as that sounds quite fast already.
There's #1596472: Replace hard coded static cache of entities with cache backends, but we more or less agreed that the static cache chain doesn't work, as that would serialize the entities too, which is obviously not something we want. Some of the others still want to pursue the idea of a separate class to handle caching, but I don't see the advantages of that so far.
Comment #174
msonnabaum CreditAttribution: msonnabaum commentedHere's the interdiff, which does fix the previous failure, but I can't recreate these new ones locally.
Comment #175
BerdirRe-roll, see interdiff, that was responsible for those test fails as it killed the $ids === NULL special case.
IMHO, there's too much nesting in those method calls and combined with the by reference arguments, it can result in weird side effects as the interdiff shows that quite nicely.
Yes, my loadMultiple() implementation was a bit long, but IMHO had the advantage that you could read through it and understand how the two caches and loading from the work played together, I think it's distributed too much now. And undocumented :p Maybe we can find a way to do something inbetween?
Would be great to hear some other opinions,
Comment #176
BerdirHidding irrelevant patches.
Comment #177
dawehnerCompareing #175 and #165 I have to admit that #175 is easier to understand especially because we have more methods you can ignore on the first read. It is kind of good that it just covers the flow of static, then cache then actual loading.
I wonder whether we ever had time to think more about the chainable cache, which would potentially allow us to move the static cache out of the class? This is just something we could research as a follow up.
It is interesting that we don't serialize the actual objects, but just the values, so we don't run into serialization issues, as well as memory problems.
Comment #179
BerdirThanks for the feedback.
See #171 re chainable cache.
This code and the __sleep() implementation of ContentEntityBase is now actually very similar, the main difference being the PrepareCacheInterface part.
Comment #180
yched CreditAttribution: yched commentedCan't seem to find where, but I think the question of "data for entity cache == data for entity serialization ?" was risen earlier.
Do you think we should unify that ? And run getCacheData() when serializing too ? (no strong opinion here, just wondering)
Also, unadressed from #125:
field_cache_clear(), which was so far supposed to clear the "field data cache", still clears the old 'field' cache bin instead of the one that now holds the data. Meaning it's basically useless ? Yet it's still called in like 20 places in core, so it's surprising that we're green ?
- For a couple cases, the call to field_info_cache_clear() might actually happen to be enough; although it's hard to predict when it's ok to rebuild field definitions without refreshing values too...
- Uses in tests are often to clear the tester side after a drupalPostForm($entity), so yeah, not sure how those pass.
- the uses in text_filter_format_update() / text_filter_format_disable() are important though. We cache check_markup()ed strings in the entity/field cache, so we need to clear the cache when a filter_format changes.
Bottomline is: strictly speaking, it's now the EntityStorage that knows if & where field values are cached. So how do we do "clear all cached field values" ?
foreach ($fieldable_entity_type) { $entity_type_storage->resetCache() } ?
Comment #181
yched CreditAttribution: yched commentedOther than that, I think I'm fine with the methods too, but IMO :
- loadMultipleFromCache() should be loadMultipleFromPersistentCache() (other methods are explicit about which cache level they involve)
- but do we really need both loadMultipleFromCache() and getFromPersistentCache() ? Can't we remove one mental nesting level and just keep getFromPersistentCache() ?
- then rename loadMultipleFromStorage() to getFromStorage() ?
Then we'd have:
, which is quite easy to follow IMO ?
Comment #182
Berdir@yched
- I don't think we should combine it, prepare cache is about doing extra processing with the assumption that you won't have to do it again. Seems pointless to process markup just because a node is put into the form_state "cache". I do think it should be moved into the class, basically "ContentEntityInterface extends PrepareCacheInterface" :)
- Yeah, those renames/merges make a lot of sense to me, was thinking something similar. I don't like doSomething() helper methods though, wondering if there isn't a better name? Unlike the removed doField*(), this isn't something that you want to implement differently in a subclass, so I still don't really get why we can't inline those 5 lines of code but anyway, a better name would be nice already because I currently have no idea how to document it :)
- Yes, we need to clean up the whole cache clearing, the reason we don't need many cache clears here is drupalPostForm() problem, it's mostly the static cache that's problematic, so our tests are already full of cache clear calls, the persistent cache works mostly transparent. Will have a closer look at this :)
Comment #183
yched CreditAttribution: yched commentedrename / inline doLoadMultiple():
Well, I started off writing #181 with that in mind too - "let's inline it or at least rename it, Verb() + doVerb() bleh". But looking more closely I'm actually fine with it :-)
In this case, doLoadMultiple() is not actually such a bad name, since this is the part of loadMultiple() that is actually about loading, as opposed to "yay it's already in memory". And the results are what we call postLoad() on...
Inlinling would mean :
Well, matter of taste, both work... :-)
Comment #184
BerdirOk, trying to implement those suggestions, leaving that method. Haven't touched the cache clear stuff yet. Interdiff is quite large because I moved some of the new methods around to keep all that stuff together.
Comment #185
yched CreditAttribution: yched commentedInterdiff looks lovely :-)
Comment #186
fagohm, I don't see either why doLoadMultiple() would be separated. I wanted to suggested inlining it before reading the comments above :-) Separating getFromStroage() is imho the important part, but not a big deal.
I think the method names are pretty good now, I'd just keep "loading" in regard to the storage as usually we call "getting" from storage "loading", so shouldn't it be loadFromStorage() also? Caches usually use "get", storage usually uses "load".
The whole loadMultiple() method should probably live in the base class as it seems to be storage independent?
you mean @return.
minor: would be $ids === array() easier to read?
Is there a reason we cannot go with serializing?
Yes, agreed.
However, it seems a bit odd to have the logic duplicated - but I've not been able to come up with an approach that works with cache-data but re-uses serialization logic and is nicer either.
Comment #187
fagooh, and I forgot two things:
- the issue summary could need an "update"
- great work guys!!!
Comment #188
yched CreditAttribution: yched commentedre #186.1 :
well, "loading" here means both FromPersistentCache() + FromStorage(), and we call PostLoad() on both. So I don't think one should be getXxx() and the other loadXxx().
Comment #189
BerdirI'd still love to move postLoad() into getFromStorage() (= only call it on those), so I wouldn't count on that staying there.
I'm also not sure about the language logic that I currently have in there, I thought that the check there should no longer be necessary now that we simplified the language handling, but it didn't work at all...
Comment #190
fagoThat's what I'd expect. If we really need a hook for operating on entities fetched from cache it should probably be a separate one, as entitycache module in d7 has it.
I see - yes loading is all, but load-from-storage is not necessarily all. Having postLoad() on storage + persistent cache, why is it missing from the static cache then? It's also part of loading :-) Imho, post-load refers to loading from storage - but obviously this makes only sense if we can move postLoad over.
Comment #191
BerdirRe-roll and fixed the easy parts from @fago's review.
Not sure about applying the doLoadMultiple() and generic loadMultiple() in the base class to all storage controller, maybe look into that in a follow-up?
Comment #193
BerdirOk, yet another fun re-roll and implemented the PrepareCacheInterface idea that I was talking about multiple times.
As explained before, this allows us to also cache non-field values, examples are content translation and book. Tested with the node translation UI tests and it seems to be working.. Due to that, we can make postLoad() a part of getFromStorage().
I think this now better explains fago's and my thinking about the method naming and grouping, it doesn't make much sense now to call postLoad() in getFromStorage() and have doLoadMultiple().
Also the PrepareCacheInterface should now probably move into the Entity namespace.
Left it for now, though.
Comment #195
BerdirAh, those test fails mean that it's actually working now for hook_entity_load() :)
Time for hook_entity_load_uncached() then I guess, converted some cases that that. Some will probably need follow-ups to try and make it part of the cached load hook.
Comment #196
BerdirOh nice, didn't expect that this would be enough to fix all those tests :)
Had to re-roll after #2165155: Change $entity_type to $entity_type_id and $entity_info to $entity_type/$entity_types.
This might need some more cleanup, but feedback on the PrepareCacheInterface approach and the method names as they are now would be great :)
Comment #197
jibranActs
Comment #200
tim.plunkettNo, when in *.api.php docs, you would use "Act" not "Acts".
Comment #201
BerdirAnother re-roll, core/lib/Drupal/Core/Field/ConfigEntityReferenceItemBase.php has been removed. Let's see if the isEmpty() methods are now correct :)
Also, 200+ comments now...
Comment #202
moshe weitzman CreditAttribution: moshe weitzman commentedNeeds issue summary update please.
Comment #203
swentel CreditAttribution: swentel commentedNeeded a re-roll again .. hope I got everything right.
Comment #204
msonnabaum CreditAttribution: msonnabaum commentedI thought the staticCache and persistentCache variables were confusing because they suggested that those variables *are* the caches rather than a bool for whether to use caches.
This patch is the same as #203, but removes those variables in favor of method calls to $this->entityType as discussed with berdir in IRC.
The rest looks good to me.
Comment #205
BerdirI think this is a reset that we shouldn't have to do based on what @yched said, so we need to update text.module to clear the correct caches. And probably all other places to call field_cache_clear() right now.
Comment #206
BerdirRe-roll.
Starting to work on the issue summary, then change record...
Comment #207
moshe weitzman CreditAttribution: moshe weitzman commentedGreat issue summary!
I'd feel more comfortble if we have an open issue for "Clean up the field cache handling, introduce a way to do that for the entity cache instead.". I have special disdain for redundant caches.
Comment #208
BerdirSorry, that line was slightly wrong, it should read "Field cache *clear* handling....". There is no field (value) cache anymore, that has been replaced/merged into the entity cache. Updated in the issue summary.
The only thing left of the field cache is the field_cache_clear() function, which is called in ~20 places throughout core and empties the cache_field table + calls field_info_cache_clear().
That cache clear is now useless, because the values are stored in a different table. We only had a single issue in the tests due to that, that is the manual cache clear mentioned in #205. This is a problem and is blocking this issue (security relevant), so I need to address it anyway in here, will do that asap.
Comment #209
BerdirOk, this fixes that test without the need for the manual cache clear.
Two problems with that:
1. It's a performance regression, because previously, it was a simple TRUNCATE on the cache_field table, now it's a deleteTags() per enabled entity_type. However, I don't think we can change that without making assumptions about entity cache implementations. We could consider to only call it for content entities or even only for fieldable entities, but it's now possible a non-fieldable entity type uses a text field or there's another reason why we'd need to unset the entity/field cache.
2. It's still something in field.module, not sure about that. We could move it to a method on the EntityManager (clearStorageCache() ?) or we could say that we try to untangle the field_cache_clear()/field_info_cache_clear() mess in a follow-up? We already have #2116363: Unified repository of field definitions (cache + API) and #2148723: Clean up field_info_cache_clear() for example which are related to that.
Comment #210
BerdirNow with the patch.
Comment #211
BerdirRe-roll after #2192259: Remove ConfigField.. Item and List classes + interfaces. Any feedaback on #209? :)
Comment #212
andypostAbout #209:
1) is this regression measurable? suppose cleaner API in this case better because cache tags are in progress to rework anyway
2) better address in follow-up, this is not in scope
just minor nitpick:
Is this a test for resetCache()? so better to pass array($entity->id())
Comment #213
BerdirRe-roll.
Comment #214
swentel CreditAttribution: swentel commentedAgree with Andy re #209, we've got #2116363: Unified repository of field definitions (cache + API) and #2148723: Clean up field_info_cache_clear() and had a meeting today with berdir, fago and yched to go forward on the first one, so I would untangle that in those issues.
Comment #217
BerdirRe-roll.
Comment #219
BerdirAw, createInstance() stuff...
Comment #220
BerdirRe-roll after the storage rename.
Comment #222
BerdirMissed two new getStorageController() calls.
Comment #223
fagoThis is going to instantiate field objects, so why bother with reading $data out of $this->values ?
$data should get started with $this->toArray() and only improve values for everything that implements prepareCache(). This also means that stuff sitting in $item that has no metadata won't be kept, but I think that's the behaviour we want. There is no point in caching craft modules throw on the objects.
IDs
from the storage
IDs
This needs a rename. Can be a separate issue.
Other type-hint to ContentEntityInterface... That said, shouldn't the caching logic be generally usable and go to the general base storage class - while e.g. being not implemented for config entities for now? It seems weird to have a different loading flow for both. Not sure it's easy to move things around, else it might fit for a follow-up.
hm, the name confused me. I'd expected uncached to operated only on uncached entities, i.e. only on entities loaded from storage.
what about
hook_entity_load() (runs always) +
hook_entity_storage_load() ?
(runs on storage load only)
I could see us adding more storage hooks to achieve #1729812: Separate storage operations from reactions also.
Comment #224
BerdirThanks for the review.
1. As discussed, tried an unset() instead of the array trick and commented it a bit better.
2. - 4. Fixed
5. Yeah, discussed name with @xjm and decided to do it here, it's only used in a very few places. It's now isPersistentlyCacheable() which is crazy but consistent with others and persistent_cache. Could also be storage_cache and a corresponding method but the static_cache is also storage related...
6. Not visible in the context which method that is, but I added it to one. Yeah, not sure about the whole Entity/ContentEntity thing. @tim is working on moving more stuff to the base storage controller in #2225955: Improve the DX of writing entity storage classes, we agreed that we will this first. Maybe that can move some things back to Entity and check for e.g. PrepareCacheInterface for the save to persistent cache method and so on? But for now I think ContentEntityInterface is OK.
7. Yeah.... As a start, I'm converting comment to use the normal load hook and replaced the ->private hacks with an actual field, which removes the need for that. That just leaves us with book and it shouldn't be doing what it is anyway.
Comment #226
BerdirFixing test fails. MapItem => sad face. I think we killed the ability to dynamically define properties based on the current values, so toArray() doesn't work for it. propertyDefinitions() is wrong for it IMHO, as that's the storage, not the properties.
Comment #228
BerdirRe-roll.
Comment #230
BerdirOpened a follow-up to fix MapItem properly: https://drupal.org/node/2229181
Comment #231
BerdirToo much sprinting isn't good....
Comment #232
fagoThanks, this looks pretty good already.
hm, that's still here and confusing :-/ Maybe could we just go with
hook_entity_load() (runs always) +
hook_entity_storage_load()
instead?
One new line too much.
No table any more as this is a configurable field now.
Comment #233
Wim LeersLink to an issue?
80 col rule.
Comment says "configurable fields", method name says "fields", which implies both base + configurable fields.
Same elsewhere.
Properties are always camelCased? i.e. s/persistent_cache/persistentCache/.
Shouldn't
$entities
be passed by reference?Comment #234
Wim LeersAlso: now that #2217749: Entity base class should provide standardized cache tags and built-in invalidation has landed: is there a specific reason why this patch does not use cache tags? (As long as there is a good reason, that's fine — I just want that to be a conscious choice.)
Comment #235
BerdirWell, cache tags for what exactly? We're storing separate entities here and we don't care about referenced entities as we don't cache them, the only exception are special cases like text where we want to clear the cache when a format changes. We have no API to get those tags, with or without your issue.. And I'm already using cache tags to clear by entity_type and I'm rather unhappy about that, I'd rather have bins per entity_type, at least optionally, but that's not something that the current API easily allows to do. with hardcoded cache bin services (we can request any type of bin but then cache bin removal doesn't work).
But we said that's something we can look at in a follow-up..
And it doesn't apply to all other cases where we call that cache clear function which are like, we changed *something* (module installed/removed, and so on), so lets make sure and drop the cache in case it might be affecting something. Some of those calls might not be necessary but they exist in HEAD (we're just making them slower because cache tags and many calls) and it's nothing that cache tags could help with AFAIK.
Thanks for the reviews, I'll try to re-roll and update the patch asap.
In regards to hook_entity_storage_load(), I'm unsure about a few things: a) It introduces another difference between config and content entities, and while we've discussed having separate hooks for them before, I'm not sure this is a good way to start that. and b) entity_storage_load() is the one you should be using, not entity_load() and I'm worried that people will not bother when porting code because it will work. So after-cache load should be the one with the non-obvious name IMHO. c) Looking at core, book.module is now the *only* place where we need to call the after-cache one, so I would have to rename all the other ones instead of just that.
Comment #236
BerdirPostponing on #2225955: Improve the DX of writing entity storage classes, moved a lot of changes that are in here over there and it will conflict a lot, so let's wait until that is done (hopefully soon), then this issue can focus on introducing the persistent cache and doesn't need to refactor as much stuff as it does now.
Comment #237
BerdirMinimal re-roll after that issue went in. Note that this currently doesn't call the uncached hook anymore but uses the normal one as the uncached, as that's what the base storage class now does. the storage hook would be easier to implement now but my arguments from above remain. also the entity constructor change is ugly, again, minimal change to see what happens.
Comment #239
BerdirFixing tests.
Comment #245
BerdirWow, file_get_file_references() is such a broken and stupid function :( It overwrites $field_type within the loop, so it depended on the order of fields if it was still the correct one and ended up filtering out the field because the new integer field used by the test module was last in the loop....
Comment #246
BerdirAdding hook_entity_storage_load().
Currently doing some profiling on this, seeing a ~15% performance improvement with the previous patch on a view with 100 nodes with 14 fields. This patch should do even better. The biggest difference is that we no longer have to initialize the field objects just to assign the loaded values, with this, we just have to pass it to the constructor, and fields are then only loaded on demand, when needed.
With the current field loading implementation, I consider this to be critical for performance.
Comment #248
Berdir#2167167: Remove field_info_*() will hopefully resolve the cache clear questions.
And #2217877: Text filters should be able to add #attached, #post_render_cache, and cache tags is on a good way to kill the need for either at least the core implementations of the PrepareCacheInterface or even the whole API. Because the render cache allows us to not having to calculate the processed text on cache hits. That would remove quite a bit of complexity.
Comment #249
BerdirStupid typo is stupid.
Comment #251
fagoYes, the current logic is totally not ideal, so it's great to see the clean-up happen just as well as having entity caching!
Not sure whether we discussed that already in Szeged, but do we need the is_array() check here?
This is hook_entity_storage_load() now. yay for the new hook name!
Other than that, this looks RTBC to me!
Comment #252
BerdirRe-roll and updated for the field_info and related caching stuff removal.
Related: #2274517: Remove \Drupal\Core\Field\PrepareCacheInterface
#251:
1. Yeah, I don't like that either, but not sure if there's a better way.
2. Yep, still need to update the documentation, and there's also still #235 to consider for the hook.
Comment #253
BerdirNow with patches.
Comment #255
BerdirFixing the cache tag structure...
Comment #256
BerdirPrepareCacheInterface is gone, re-rolling after that...
Note that I left getCachedData() on the entity, because we still need to be able to get a) the values of all languages and b) values of undefined fields from the entity. But I guess we can do this now with an entity specific method (will need to add this to the interface), or we could even go back to make this the __serialize() implementation, as serialize($entity) should now basically work as well?
Also, getting close to 300 comments... :(
Comment #257
BerdirWow.
Comment #258
Wim LeersIt'd be great if these were all consistent.
typehint mismatch.
s/entity cache/persistent cache/
s/from/in/
Shouldn't this go away? :)
$persistentCache
?Why not
Cache::deleteTags($comment->getCommentedEntity()->getCacheTag())
andCache::deleteTags($account->getCacheTag())
?If we're consciously choosing not to use cache tags, then we should document why.
Comment #260
Berdir1. Better now?
2. "It's complicated" ;) Can be array or null. should be correctly documented now.
3. Changed.
4. Changed.
5. Yes.
6. This is an annotation key, we currently don't use camel case for those. Would be inconsistent with everything else.
7. I disagree :)
a) Cache tags deletion is a low-level API that should not be used directly if there's an API. This is the entity storage, we have an explicit API to clear the cache, calling the cache tag deletion manually is like doing a db_query() against an entity table ;)
b) We currently don't do by-entity cache tags, because we don't need it and if we would, it would be the job of the resetCache() implementation to do that because some entities might not use a persistent cache
c) cache tags can't clear static caches, which is a very important part of resetCache().
So IMHO, it's the other way round, every manual cache tag deletion should explain why there's no better way :) Cache tags are very powerful, but they're also as spaghetti as it gets, as they by design breach every layer that your application has ;)
Also in this interdiff:
- Added some unit tests for cache hit/miss/disabled, hoped to replace the tests in FieldAttachOtherTest but save()/delete() would be very very complicated, so kept that.
- Fixed the test fails. The missing language is because we're now also caching the currently available languages and that path language test apparently manages to add a new language and then change the default language of that entity to that language, because for non-default languages, we already had that check. Will open an issue to add EntityInterface::langcode(), because then we can replace all those language()->id with langcode(), which will avoid the need for the language objects in most cases. Then we can hopefully remove that optimization.
- Updated hook documentation in entity.api.php, still not 100% convinced that this is the correct way round, especially because now we call postLoad() on the entity class as well for entities from the cache. There are no implementations of that for content entities right now but I know some contrib implementations that this will mess up (but they're workarounds for core bugs, like broken unserialize handling for the link field).
- Patch size increased due to unit tests.
Either way, I really hope this is getting close, updated the issue summary again.
Comment #261
BerdirComment #262
Wim Leers1. Yes :)
2. I knew that that was the other possibility — thanks for confirming/clarifying in the docs :)
5. It's still there…
6. Aha — ok!
7. Well, if you're modifying an entity, cache tags are invalidated automatically. These are things that are not part of the entity (i.e. extrinsic versus entity fields which are intrinsic), hence there's no way that the entity system can know that it should invalidate the cache tags! So I think that it makes sense that if this should use cache tags for invalidation, that we'd need to call it manually.
You're right that a manual cache tag deletion would alsomerit extra explanation.
You're also right that they "breach" (invalidate) every cache layer. But IMO the fact in this case you don't want that to happen, is the exceptional thing. And hence merits an extra explanation.
So: I still think that an extra comment here is warranted, though it can be argued that the lack of cache tag invalidation (of the affected entity) is a pre-existing problem.
It's a detail, but it confused me. Your call :)
Comment #263
BerdirRe-roll, messy merge conflict in content entity database storage unit tests.
7. I think we discussed this through in IRC. the storage cache does not use cache tags because we only cache single entities without ID's, using cache tags would be overhead we don't want. And the first example already clears the cache tag somewhere (this only happens after comments are saved) or we'd know and the second is not relevant because this is not information that's displayed as part of a cached entity view.
Comment #266
BerdirReroll.
Comment #269
XanoThis should fix the minor test failure. I know how to fix the other ones, but I have to do some other work first. I should have a patch ready by tomorrow.
Comment #270
XanoDid I mention the last patch was delayed?
Comment #272
XanoGrrreeeeeeen!
Comment #274
XanoThis is simpler.
Comment #276
XanoRe-roll.
Comment #278
BerdirThat test fail had me worried for a minute, but it was just a wrong merge.
Comment #279
Berdir278: drupal_597236_278.patch queued for re-testing.
Comment #281
BerdirRe-roll.
#2144263: Decouple entity field storage from configurable fields is more important, but I would love love love to get this in after that...
Comment #282
Wim LeersWe briefly discussed this during the D8 performance call. We thought it would be good to have profiling data to see what we actually gain by doing this. I vaguely recall that profiling has happened, but I can't find it. Perhaps it happened in another issue?
Comment #283
BerdirI did not post detailed profiling data, see #246 for one data point. That was a single with view 100 nodes.
Note that profiling this heavily depends on a) the amount of entities you load on a page, the more the more difference this will make, c) Depends on the amount of fields, because every field right now adds overhead when loading and c) it also depends on the amount of entities you have in your database, as the executed queries will get slower while it will not affect a cache get that much, especially on a non-database cache backend.
I just repeated this on a site I'm building, again seeing a ~15% increase of the total time, the time spent in ContentEntityDatabaseStorage::doLoadMultiple() goes away almost completely. Note that I did those tests against a cache backend mix of redis (entities) and APC (field definitions, as they're in the discovery cache bin). That doesn't make much of a difference as long as mysql isn't under load and the query cache is warm, though.
Total difference:
Difference in doLoadMultiple():
Those are the entities that have been loaded, grouped by call:
- 1 user
- 2 nodes
- 5 nodes
- 5 comments
- 5 nodes
- 2 shortcuts
- 1 term
- 1 term
- 1 term
=> 23 content entities grouped in 9 calls.
Note that part of doLoadMultiple() moves to other places, because so far, this also triggers the loading of the field definitions for the relevant entity types from cache, which moved to a different place as that happens on the first time we access a field, which does not happen while loading an entity with the latest patch. So that part still has to be done, just a bit later. Was 7ms (2.6%) in my case.
That said, I'm working on getting rid of most of those calls, possibly even avoiding them completely in some cases.
Comment #286
BerdirRe-rolled now that #2144263: Decouple entity field storage from configurable fields is in, only a single conflict, yay.
Comment #287
plachWhat are we missing for this to be ready? Reviews? A final sign-off?
Comment #288
Fabianx CreditAttribution: Fabianx commentedNice work! :)
Comment #289
corvus_ch CreditAttribution: corvus_ch commentedReroled against latest HEAD.
Comment #291
BerdirNew try at re-rolling this, the previous patch had some wrong merge conflicts on hook documentations and a .htaccess change, was 20kb bigger.
Comment #292
Fabianx CreditAttribution: Fabianx commentedIsn't the 2nd isset() check enough?
Where would it get the language from directly?
Just asking, but the order is undefined here in which entities are returned?
Even though a loadAll will be rare and expensive anyway. Can't we get the IDs to load in that case?
This could run a query and populate $ids.
Micro-optimization, but still:
This saves the 2nd lookup to buildCacheId, where if cacheID was more complicated would save real time ...
It would be clearer to define cache tags before the loop as the arguments in the set() are static:
$cache_tags = array(
$entity_type . '_values' => TRUE,
);
etc.
Also do we have a place where one can lookup all defined cache tags besides the source?
Nit, just from a code perspective, I would decouple this into:
resetCache and resetPersistentCache to make the code easier to read.
Currently this inherits docs, so I don't know if it says that this also deletes the persistent cache.
Also I don't assume there is a base class, which this caching class is inherited from?
---
Seems not to be the case. Is this function defined in the interface? Why was it not needed before? Or is this missing doxygen?
Don't get this, but that means if any entity is another revision it loads for all entities from revision storage?
How can you _set_ this property?
Overall looks absolutely great to me. Tests pass and my comments are mostly nits. This is a weak
RTBC
from me.
Comment #293
Wim LeersI think poor Berdir has had to reroll this one for far too long already. So I tried to add more to Fabianx's review in #292. And I already answered/fixed the things I could, so that Berdir can focus on the most important questions.
$ids === NULL
, i.e. ifempty($ids) === TRUE
, then we don't load anything from the persistent cache. I think the answer has multiple parts: 1) we don't have aCacheBackendInterface::getAll()
, 2) even if we had that, this would be a subset of "all" entities, and hence we'd have to still load the remainder from entity storage, but we wouldn't know what the remainder is. The only way to implement this is to 1) have a very cheap way to get the IDs of *all* entities, 2) have a way for entity storage to "load all entities except for X, Y …", which is definitely an edge case. So I think all that has to happen here is for a comment to be added, explaining why the "load all" behavior is not supported by the persistent cache.EntityInterface::getCacheTag()
:)@EntityType
annotation'spersistent_cache
property.This one should be removed :) Did that in this reroll.
(Already mentioned in #258.5.)
Is this @todo still relevant?
AFAICT the only change here is to remove the helper variable
$field_type
in favor of calling the method on the object between 2 and 4 times. Reverted this hunk.@Berdir: that leaves everything from #292 except #6 and #9, and point 2 in my code review.
Comment #294
Fabianx CreditAttribution: Fabianx commentedRegarding #292.4:
Can't we just do:
The double query should be:
a) cached by MySQL in that short time frame anyway
b) faster as most of the entities should be retrieved from cache - when it is warm
Comment #295
BerdirThanks for the review, some additional answers.
1. I'm unhappy about the code there, the constructor has to care about too many variations of values. I think we can move forward with what we have, maybe #1867228: Make EntityTypeManager provide an entity factory would help, but that has been fighting with the very same problem for quite some time now.
2. The language manager, instead of caching the available languages within the entity object, we should request them from the language manager so they would always be up to date. This is a performance optimization that we basically have because there is no langcode() method, only language(), so we always have to return a language object while only needing an ID in 90%+ of the cases. Core is full of language()->getId() code.
3. Being able to load *all* entities is a feature that only makes sense for config entities IMHO, it shouldn't even be supported for content entities (if you really want that, load the ID's yourself and load it explicitly). So not really interested in supporting that. Possibly that should be separated into a loadAll() method.
7. The public API makes no difference between static and persistent cache. resetCache() does exist on the interface and it is implemented in the base class but only for the static cache, because that's all the base class knows about. I'm not sure if we should really expose this in the API. It will never make sense to have a persistent API for config entities for example, because they're already persistently cached as configuration.
8. This is not visible from the API, but there is no loadMultiple() for revisions nor a way to mix loading by revision and not. So if goes into that condition, it simply means that there is only one entity. And even if we would add a way to load multiple revisions, I don't think we ever want to support loading a mix.
The reason I did this change is that this overwrites the $field_type argument passed in to this function, so reverting this should fail again.
Search for this function in this issue, there's a lengthy explanation what's going on and why it's only failing here and not on HEAD.
Comment #296
Wim Leers#295.2: I wanted to add docs for this/open an issue for this, but I don't fully grok what you're getting at, so you'll have to do that.
#295.3 actually answers #295.4, not #295.3. Berdir, shall we remove the ability to load *all* content entities? Then this point of confusion disappears. I definitely agree with you that it doesn't make sense that we have this capability at all.
#295.8: plus, this is a pre-existing code smell, it's just been moved around.
RE:
$field_type
: wow. Sorry :( And +1 to It'd be funny if it weren't so painful. Hunk restored.With #295, we now only have to address #292.3, #292.4, #292.5 and #293.2, then this is RTBC.
Comment #297
Berdir2. #2303877: Remove cached languages from ContentEntityBase
3. The order of returned entities is ensured at the end of loadMultiple() in the base class. No need to do it multiple times.
4. (sorry for messing up the order) Not in this issue. It's not related to this, except it's a use case that is not supported for the persistent cache. There might even be an issue for loadAll() already.
5. Sure, can do that, doubt it changes much ;)
#293.2: Still true, but it's not really an actionable @todo, just that some of it are cases that we currently can't test with unit tests. And as @catch so nicely put, @todo's without an issue reference are just excuses. So, rewritten to just describe what it is ;)
I think that addresses everything.
Comment #299
BerdirUps... That's what happens if you copy code without thinking about it ;)
Not sure why nothing else failed, probably because there are no tests that verify that we load from cache, only that we write into cache other than the unit tests.
Comment #300
Fabianx CreditAttribution: Fabianx commentedRTBC - if tests pass as all points have been addressed by patches.
Comment #301
catchiirc this was added explicitly for taxonomy vocabularies in Drupal 7, which are now config entities, so opened #2304935: Deprecate ability to load all content entities at once from Entity::loadMultiple() to kill that off.
Also opened #2304939: Stop loading comment statistics into entity object, not really sure why we do that still.
Patch looks good to me, leaving RTBC a bit longer so I can give it another once over, but won't complain if someone beats me to committing. Nice to see this RTBC after nearly 5 (!) years.
Comment #302
Wim LeersIf #2304939: Stop loading comment statistics into entity object lands, then we'll only have one use case left for
hook_entity_storage_load()
in core: translation metadata. Perhaps we can find a way to make that work differently in a follow-up, so we can remove this hook? :)Comment #303
catchAlso correction, it's seven years if you count from #111127: Cache node_load.
Comment #304
Berdir#1916790: Convert translation metadata into regular entity fields will help with that. Not sure, there might still be a need for it in contrib, we'll see.
Comment #305
plachThere's already an issue for that, which I am planning to work on as soon as I am done with the entity storage one :)
#1916790: Convert translation metadata into regular entity fields
Comment #307
catchI was wrong about 7 years, it's actually 10 years!
#111127: Cache node_load - that issue originally wanted to add persistent caching too, eventually added the static cache only.
Committed/pushed to 8.x, thanks!
Comment #309
yched CreditAttribution: yched commentedTake a bow, @Berdir :-)
Comment #310
Fabianx CreditAttribution: Fabianx commentedYes! This is great! Congrats, @Berdir, et. al!
Comment #311
moshe weitzman CreditAttribution: moshe weitzman commented10 years! Thanks so much, Berdir.
Comment #312
Wim LeersThank you, Berdir!
Comment #313
cosmicdreams CreditAttribution: cosmicdreams commentedHey gang, writing a blog post about this. Is there somewhere that talks about the performance gains of this patch? #283 ( https://www.drupal.org/node/597236#comment-8914931 ) seems out of date give the modifications that happened after it, and I don't see another review of the performance wins after it.
Comment #314
Wim Leers#283 is still accurate; only minor changes happened after it.
Comment #316
chx CreditAttribution: chx commentedAs far as I can see UserData is no longer injected into UserStorage at all. I haven't read all ~300 comments (sorry) but a few searches haven't shown anything. The issue summary says
But in fact the patch committed does not contain any mapFromStorageRecords overrides. Could you please clarify this? We are considering putting user data in the user destination in #2268897: Write per user contact settings D6=>D8 migration but this whole issue gives me pause and I would like to understand better the UserData situation before going ahead.
Comment #317
jenlamptonWas there ever a change record made for entity cache being added to core? The only Change Record I found referenced here was https://www.drupal.org/added-d8-modules, but that doesn't contain any information about entity cache. If anyone knows of one, can we add a link?