Updated: Comment #260

Problem/Motivation

Loading entity from the database is slow. Configurable fields have their own cache handing, but entities can now have up to 4 base tables, from which multiple translations might be loaded.

Content entities are optimized for the case when they are initialized with all the values, base fields and configurable fields, which are then stored in the internal ->values array. The necessary field classes are only created when the fields are accessed through the API. Even though configurable fields are cached, they need to be set on already created entity objects, which has to initialize all those field objects.

Proposed resolution

The basic idea is to generalize the concept of the configurable field cache and apply it to the entity itself and all values within.

For now, the entity is built just like before, we query the different base tables, create the entity objects, then call the methods to load the field items/values. #2137801: Refactor entity storage to load field values before instantiating entity objects will eventually try to call the field methods before constructing the entity objects.

The entity itself now implements the PrepareCacheInterface, so that we can ask it to return all cacheable data. This also allows to return values added to entities that are not defined fields, so we can cache them too. Those are typically values added in hook_entity_load() implementations. Internally, it has more or less the same code as we currently have for configurable fields, just applied to all fields. We then save those values in the persistent cache. (Explicitly kept to highlight the difference to the old implementation)

Now that PrepareCacheInterface has been removed, there is no special processing necessary, instead, the $entity object can be stored in the cache directly, which will then serialize it, which will call __serialize() on it, which already ensures that the entity is optimized and no unnecessary definitions/objects are serialized. The only exception are language objects, which are currently added as an optimization. @todo: Open issue to add langcode() that should make that unnecessary.

When loading the entity from the persistent entity, we can just unserialize. At this point, only the field item objects that are explicitly requested are created. With the entity key cache (#2182239: Improve ContentEntityBase::id() for better DX) and the render cache, this means almost none, and some of the remaining ones will hopefully go away soon. Only loading the values on the first access would only be useful for cache misses with this system, in which case it is very likely that it will be accessed as depending caches will likely also have cache misses.

Because some hook_entity_load() implementations are fairly dynamic (comment statistics, translation metadata), the patch introduces a hook_entity_storage_load() which is cached, so implementations can chose which one they want.

The flow of loadMultiple now looks like this:

- Get already loaded entities from the static cache.
-- For remaining entities, check the persistent cache
--- Create entity objects for entities loaded from the persistent cache as explained above.
-- Load entities that were not found in the persistent cache from the database.
--- Create entities with the base field values
--- Load configurable field values and add them to entity objects
--- Call storage load hook.
--- Put entities loaded from storage into the persistent cache.
-- Call hook_entity_load() on entities from database/persistent cache. This allows modules that want to attach non-cacheable (dynamic/personalized?) values to an entity. Also calls ::postLoad().
-- Put entities from persistent cache and database into static cache
- Combine all requested entities together, ensure order, return

Additional changes in this patch:
- Unify the code that ensures that entities are returned in the requested order into a single helper method
- Remove cache handling from all somethingFieldItems() methods, they no longer need to worry about this.
- Drop the doSomethingFieldItems() pattern and rename them to somethingFieldItems(), as the wrapper methods only cared about caching, which they no longer have to.
- Comment and user storage controllers have special data manipulation that needs to run before entity objects are created. They now override mapFromStorageRecords() instead of postLoad(), as that now receives entity objects.

Remaining tasks

User interface changes

-

API changes

- Refactoring of loadMultiple(), that should however not affect anyone
- Introduction of hook_entity_load_uncached(), modules that attach non-cacheable values in hook_entity_load() need to switch to that.
- Removal of the doSomethingFieldItems(), would only affect subclasses of FieldableStorageControllerBase, which I do not think exist, non-database storage controllers will use completely different patterns anyway, those methods always were database specific.

Original report by @catch

This is a long delayed followup to http://drupal.org/node/111127#comment-1494790

Reasons I'm posting this issue:

1. Despite converting lots of things to fields, caching at the node level rather than the field level gives a 10% performance improvement on a single node view with only the default profile and database caching. So the field cache does not 'do the same job' as an entity level cache as was suggested by Dries and bjaspan in that issue - it can't remove the cost of node_load() itself, the query building, the field_attach_load() calls etc.

2. I posted a proof-of-concept module at http://drupal.org/project/entitycache which implements node, taxonomy term and comment caching from contrib. This works, when hacked into the default profile, all but a couple of tests pass, and the entire .module is 14k - some of which is dead code for handling users which should probably be abandoned, some of which could probably be tidied up (pretty much all the NodeController / TaxonomyTermController overrides don't need to be there if we can call back to those classes directly), so potentially < 10k.

So, I'm proposing that we add this module to core, enabled in the default install profile. By having it as a separate module, we're able to maintain all the cache clearing code in one central place, so it doesn't have the messiness of putting node cache clears in taxonomy module, or taxonomy cache clears in image module or whatever. However we'll get real performance benefits in core rather than having to schlep off to contrib to get them.

We have real performance issues in Drupal 7, and this is one possible way to ameliorate those without changing any APIs (it adds a total of one hook). I'm not posting a patch here, since the code is in CVS, and this is more of an architecture/policy decision at this stage. Abusing status though to get it in the real issue queue.

CommentFileSizeAuthor
#299 drupal_597236_299-interdiff.txt701 bytesBerdir
#299 drupal_597236_299.patch61.74 KBBerdir
#297 drupal_597236_297-interdiff.txt2.28 KBBerdir
#297 drupal_597236_297.patch61.74 KBBerdir
#296 interdiff.txt1.52 KBWim Leers
#296 drupal_597236_296.patch61.63 KBWim Leers
#293 interdiff.txt3.99 KBWim Leers
#293 drupal_597236_293.patch60.2 KBWim Leers
#291 drupal_597236_291.patch62.04 KBBerdir
#289 drupal_597236_289.patch86.32 KBcorvus_ch
#286 drupal_597236_286.patch61.83 KBBerdir
#283 entity_cache_total_difference.png46.67 KBBerdir
#283 entity_cache_load_difference.png101.98 KBBerdir
#281 drupal_597236_281.patch61.69 KBBerdir
#278 drupal_597236_278-interdiff.txt728 bytesBerdir
#278 drupal_597236_278.patch61.65 KBBerdir
#276 drupal_597236_276.patch61.75 KBXano
#274 drupal_597236_274.patch61.75 KBXano
#274 interdiff.txt2.38 KBXano
#272 drupal_597236_272.patch62.19 KBXano
#272 interdiff.txt2.89 KBXano
#270 drupal_597236_269.patch61.09 KBXano
#270 interdiff.txt740 bytesXano
#266 entitycache-core-597236-265.patch61.08 KBBerdir
#263 entitycache-core-597236-263.patch61.26 KBBerdir
#260 entitycache-core-597236-260-interdiff.txt21.8 KBBerdir
#260 entitycache-core-597236-260.patch63.31 KBBerdir
#257 entitycache-core-597236-257-interdiff.txt3.38 KBBerdir
#257 entitycache-core-597236-257.patch50.42 KBBerdir
#256 entitycache-core-597236-256-interdiff.txt1.74 KBBerdir
#256 entitycache-core-597236-256.patch52.2 KBBerdir
#255 entitycache-core-597236-255-interdiff.txt890 bytesBerdir
#255 entitycache-core-597236-255.patch56.33 KBBerdir
#253 entitycache-core-597236-251-interdiff.txt861 bytesBerdir
#253 entitycache-core-597236-251.patch56.33 KBBerdir
#249 entitycache-core-597236-249-interdiff.txt834 bytesBerdir
#249 entitycache-core-597236-249.patch57.99 KBBerdir
#246 entitycache-core-597236-246-interdiff.txt3.76 KBBerdir
#246 entitycache-core-597236-246.patch57.99 KBBerdir
#245 entitycache-core-597236-245-interdiff.txt1.43 KBBerdir
#245 entitycache-core-597236-245.patch54.77 KBBerdir
#239 entitycache-core-597236-239-interdiff.txt1.45 KBBerdir
#239 entitycache-core-597236-239.patch53.25 KBBerdir
#237 entitycache-core-597236-237.patch52.97 KBBerdir
#231 entitycache-core-597236-231-interdiff.txt619 bytesBerdir
#231 entitycache-core-597236-231.patch65.03 KBBerdir
#228 entitycache-core-597236-228.patch65.04 KBBerdir
#226 entitycache-core-597236-226-interdiff.txt2.52 KBBerdir
#226 entitycache-core-597236-226.patch65.06 KBBerdir
#224 entitycache-core-597236-224-interdiff.txt10.07 KBBerdir
#224 entitycache-core-597236-224.patch63.97 KBBerdir
#222 entitycache-core-597236-222-interdiff.txt1.32 KBBerdir
#222 entitycache-core-597236-222.patch50.6 KBBerdir
#220 entitycache-core-597236-220.patch50.62 KBBerdir
#219 entitycache-core-597236-219-interdiff.txt1.79 KBBerdir
#219 entitycache-core-597236-219.patch50.86 KBBerdir
#217 entitycache-core-597236-217.patch49.39 KBBerdir
#213 entitycache-core-597236-213.patch50.98 KBBerdir
#211 entitycache-core-597236-211.patch50.96 KBBerdir
#210 entitycache-core-597236-209-interdiff.txt2.09 KBBerdir
#210 entitycache-core-597236-209.patch51.16 KBBerdir
#206 entitycache-core-597236-206.patch51.69 KBBerdir
#204 interdiff.txt3.19 KBmsonnabaum
#204 entitycache-core-597236-204.patch51.55 KBmsonnabaum
#203 entitycache-core-597236-203.patch51.75 KBswentel
#201 entitycache-core-597236-201.patch49.88 KBBerdir
#196 entitycache-core-597236-196.patch50.53 KBBerdir
#184 entitycache-core-597236-184-interdiff.txt16.49 KBBerdir
#184 entitycache-core-597236-184.patch43.61 KBBerdir
#179 entitycache-core-597236-179-interdiff.txt662 bytesBerdir
#179 entitycache-core-597236-179.patch40.42 KBBerdir
#175 entitycache-core-597236-175-interdiff.txt735 bytesBerdir
#175 entitycache-core-597236-175.patch40.42 KBBerdir
#174 interdiff-66-69.txt513 bytesmsonnabaum
#169 entitycache-core-597236-169.patch40.38 KBmsonnabaum
#166 entitycache-core-597236-166-interdiff.txt8.51 KBmsonnabaum
#166 entitycache-core-597236-166.patch40.5 KBmsonnabaum
#165 entitycache-core-597236-165-interdiff.txt1.45 KBBerdir
#165 entitycache-core-597236-165.patch37.86 KBBerdir
#163 entitycache-core-597236-163-interdiff.txt21.77 KBBerdir
#163 entitycache-core-597236-163.patch37.77 KBBerdir
#162 entitycache-core-597236-162.patch38.99 KBBerdir
#157 entitycache-core-597236-157.patch39.59 KBBerdir
#153 entitycache-core-597236-153.patch39.61 KBjhedstrom
#153 interdiff.txt792 bytesjhedstrom
#150 entity-field-cache-597236-150.patch39.15 KBmsonnabaum
#144 interdiff.txt3.46 KBamateescu
#144 entity-field-cache-597236-144.patch39.05 KBamateescu
#142 entity-field-cache-597236-142-interdiff.txt10.08 KBBerdir
#142 entity-field-cache-597236-142.patch38.28 KBBerdir
#140 entity-field-cache-597236-140-interdiff.txt674 bytesBerdir
#140 entity-field-cache-597236-140.patch33.76 KBBerdir
#128 entity-field-cache-597236-128-interdiff.txt1.54 KBBerdir
#128 entity-field-cache-597236-128.patch33.76 KBBerdir
#119 entity-field-cache-597236-119-interdiff.txt1.98 KBBerdir
#119 entity-field-cache-597236-119.patch32.54 KBBerdir
#118 entity-field-cache-597236-118-interdiff.txt3.85 KBBerdir
#118 entity-field-cache-597236-118.patch34.41 KBBerdir
#116 entity-field-cache-597236-116-interdiff.txt7.44 KBBerdir
#116 entity-field-cache-597236-116.patch31.68 KBBerdir
#113 entity-field-cache-597236-113-interdiff.txt1.32 KBBerdir
#113 entity-field-cache-597236-113.patch24.95 KBBerdir
#112 entity-field-cache-597236-112-interdiff.txt7.71 KBBerdir
#112 entity-field-cache-597236-112.patch18.76 KBBerdir
#110 entity-field-cache-597236-110-interdiff.txt605 bytesBerdir
#110 entity-field-cache-597236-110.patch12.46 KBBerdir
#105 entity-field-cache-597236-105-interdiff.txt3.23 KBBerdir
#105 entity-field-cache-597236-105.patch12.4 KBBerdir
#103 entity-field-cache-597236-103.patch10.15 KBBerdir
#100 entity-field-cache-poc-597236-100.patch3.62 KBBerdir
#79 entity_cache_api_changes_only.patch27.37 KBcatch
#77 entity_cache_api_changes_only.patch26.59 KBcatch
#75 entity_cache_api_changes_only.patch27.52 KBcatch
#73 entity_cache_api_changes_only.patch27.52 KBcatch
#72 entity_cache_api_changes_only.patch25.7 KBcatch
#52 entitycache.patch34.49 KBcatch
#51 entitycache.patch35.2 KBcatch
#45 entitycache.patch27.36 KBcatch
#42 entitycache.patch23.23 KBcatch
#41 entitycache.patch18.7 KBcatch
#22 entitycache.patch10.91 KBcatch
#21 entitycache.patch11.04 KBcatch
#13 entitycache.patch11.08 KBcatch
#11 entitycache.patch11.38 KBcatch
#6 entitycache.patch11.38 KBcatch
#195 entitycache-core-597236-195-interdiff.txt5.95 KBBerdir
#195 entitycache-core-597236-195.patch50.53 KBBerdir
#193 entitycache-core-597236-193-interdiff.txt5.7 KBBerdir
#193 entitycache-core-597236-193.patch45.33 KBBerdir
#191 entitycache-core-597236-191-interdiff.txt860 bytesBerdir
#191 entitycache-core-597236-191.patch43.8 KBBerdir
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

sun’s picture

Issue tags: +API clean-up

You say that this only works at Drupal installation time - why is that?

catch’s picture

To clarify, the only way I know of to enable a contrib module throughout all core tests is to hack it into default.profile - that way all tests get run with the caching enabled and you can see what breaks. entitycache itself doesn't care when it's installed.

David Strauss’s picture

Entity caching belongs in core.

catch’s picture

Benchmarked node/1 with Drupal 6 vs. Drupal 7 vs. Drupal 7 plus entitycache module. Numbers were pretty consistent between multiple runs of ab.

6: 17.48 #/sec
7: 11.49 #/sec
7 with entity caching: 14.18 #/sec

I make that a 20% performance gain - this using database caching and very few modules acting on hook_node_load(). It also clears just under ~40% of the gap between D7 and D6 on single node views.

Drupal 6:

Concurrency Level:      1
Time taken for tests:   57.223 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      7954000 bytes
HTML transferred:       7388000 bytes
Requests per second:    17.48 [#/sec] (mean)
Time per request:       57.223 [ms] (mean)
Time per request:       57.223 [ms] (mean, across all concurrent requests)
Transfer rate:          135.74 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    40   57  12.9     56     246
Waiting:       40   57  12.7     56     230
Total:         40   57  12.9     56     246

Drupal 7:

Server Hostname:        d7.7
Server Port:            80

Document Path:          /node/1
Document Length:        6313 bytes

Concurrency Level:      1
Time taken for tests:   86.995 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6800000 bytes
HTML transferred:       6313000 bytes
Requests per second:    11.49 [#/sec] (mean)
Time per request:       86.995 [ms] (mean)
Time per request:       86.995 [ms] (mean, across all concurrent requests)
Transfer rate:          76.33 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:    75   87  12.6     82     144
Waiting:       74   87  12.5     82     144
Total:         75   87  12.6     82     144

Drupal 7 + entity caching:

Document Path:          /node/1
Document Length:        6313 bytes

Concurrency Level:      1
Time taken for tests:   70.537 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6800000 bytes
HTML transferred:       6313000 bytes
Requests per second:    14.18 [#/sec] (mean)
Time per request:       70.537 [ms] (mean)
Time per request:       70.537 [ms] (mean, across all concurrent requests)
Transfer rate:          94.14 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    60   70   9.9     67     109
Waiting:       60   70   9.9     67     109
Total:         60   70   9.9     67     109
Frando’s picture

Huge +1 to entity caching in core. Let's make use of what we gained through the new entity API. On an actual production site with many contrib modules, gains through entity caching will even be much bigger. No reason at all to not do this in core, IMO. And gets us quite a bit closer to solving our perfomance problems in HEAD, as the benchmarks in #4 show.

catch’s picture

Status: Needs work » Needs review
FileSize
11.38 KB

Alright, let's do a patch then.

This is a straight cp of the contrib module to core directory. What to do about poll/book which do things we don't want to cache inside hook_node_load() needs discussion since the approach taken in the module is a bit of a sledgehammer. There's existing code for that in #111127: Cache node_load though.

The only place where I found core test failures which might require some changes to something in core outside the new module due to this, was in translation module - but that's likely due to a lack of hook invocations in the right places.

Going to let the test bot have a crack at it (by adding to default profile we get all tests run with entity caching for free) - last time I tried to run tests locally, apart from finding unreported fatal errors in HEAD, things went surprisingly well. That was a couple weeks ago though.

Frando’s picture

We just need a second hook here, that's also the result of #111127: Cache node_load. Only the individual modules can decide whether their data is cacheable at the entity level or not. Note that this is not necessarily an API change but merely the addition of a new hook. So I would say we just need to decide on naming.

I'd vote for hook_ENTITY_TYPE_load() for everything that's cacheable and hook_ENTITY_TYPE_load_uncached() or hook_ENTITY_TYPE_postload(), as it was suggested in the original node_load cache issue. Which one I don't care, with proper documentation both are fine.

yched’s picture

Status: Needs work » Needs review

If we do that, let's rename hook_field_info()'s 'cacheable' property into 'field cache'.
It should be done nonetheless as a WTF fix, but this change would make the WTF even worse.

yched’s picture

... or better: keep 'cacheable' (or 'cache'), but turn it into 'TRUE = yes, I have entity caching' instead of currently 'TRUE = yes, use the field cache because I have no entity caching' (meaning field API should take it backwards) ?

catch’s picture

FileSize
11.38 KB

Agreed on the 'cacheable' => 'field cache' change. We also have a 'static cache' property which is already used by the entity API.

Attached patch just removes bogus hook_user() implementations which are causing those exceptions. Should be down to genuine bugs / fails with this one. Help identifying/fixing those welcome of course - as is code review (my OOP is extremely basic, so there may well be better/more concise ways to do what I'm doing in EntityCacheBaseController). Leaving CNR for the bot.

sun’s picture

Hmmm... not sure - somehow I preferred the more fine-grained options at first sight (i.e. entity cache, field cache, static cache). I think that would allow for some very granular performance tuning on sites that need it? Just my $0.02

catch’s picture

FileSize
11.08 KB

Wrong patch sorry.

yched’s picture

re #11-12: 'entity cache, field cache, static cache'.
It's just that entity cache + field cache is useless, the two should be mutually exclusive.

Crell’s picture

Subscribing. I'll try to have a look at the latest patch from an OO perspective soon.

yched’s picture

Cache clearing sounds like fun :-/
Field API will need an API function to clear the cache for a selected set of entities - like nodes 12, 13, 14, users 5, 7, 9...
Use case: clear the cached check_markup() for text fields when a format definition changes.
More fun: for some entity types, that data will be in the field cache, for others it will be in the entity-level cache...

catch’s picture

It looks like this is the text module issue for check_plain() - #556022: When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do. Just posted in there, but I think for that kind of update, you're looking at a general cache clear anyway - it's a rare enough operation, and expensive enough to try to figure it out granularly.

For general reference fields like terms / nodereference / userreference, we already have #493314: Add multiple hook for formatters / #606114: taxonomy_field_formatter_load() should use taxonomy_term_load_multiple() which bypasses the infinite recursion and/or stale data issues we'd get from trying to cache other entities within the field or entity caches. The advantage here, is that taxonomy_field_formatter() load only ends up a cache hit in most cases. If a term has actually been deleted, but the tid is still in the field cache, then we handle missing terms already - since the neither the term field nor the field cache has garbage collection for missing references anyway.

yched’s picture

catch: agreed on those cases.
I'm a little worried that we have no API way to clear the entity cache. Field API has field_cache_clear() or knows how to clear it's own cache for a specific object when it needs. If data can be cached either in the field cache or 'somewhere else', it becomes more problematic.

Er, I might be just rehashing #111127: Cache node_load, which I don't remember that well :-).
in comment #207 over there, I wrote: "Field API needs to know the name of the cache table (or assume 'cache_[obj_type]' ?) and the structure of the cids (or assume [id] ?)." - or get an API function from entity API ?

catch’s picture

We could very easily add a drupal_get_entity_loader('entity')->flushCache() style thing, similar to how the static flushing happens now. That seems like a good idea in general. fwiw all the tables are cache_$obj_type and all the cids are $id - so it can also just guess, and it'd be right ;)

catch’s picture

FileSize
11.04 KB

This should fix most of the fails, and get us down to a couple of exceptions - which are down to the hack in entitycache_entitycache_node_load().

However fixing that hack means deciding on whether to keep this as a distinct module or merge it back into the entity loader itself, and how we deal with those uncachable hook implementations in either case. So I'll stop here for a bit until that discussion's been had.

catch’s picture

Status: Needs work » Needs review
FileSize
10.91 KB

Down to translation locally, CNR for the bot.

yched’s picture

"deciding on whether to keep this as a distinct module or merge it back into the entity loader itself"
Possibly related: isn't it a bit limitative that another module (e.g contrib) willing to extend entityController for whatever reason will basicaly forbid the use of the entity-level cache for lack of multiple inheritance ?

catch’s picture

I think you could do this:

hook_entity_info_alter() same as entitycache does.
Require entitycache module.
Your class extends EntityCacheController instead of the default.

Or just copy the code and then work off that.

But if everyone does that, then there's no point separating them, no.

fago’s picture

>>"deciding on whether to keep this as a distinct module or merge it back into the entity loader itself"
>Possibly related: isn't it a bit limitative that another module (e.g contrib) willing to extend entityController for whatever reason will basicaly forbid the use of the entity-level cache for lack of multiple inheritance ?

I've extended it already, see this. So requiring entitycache module would be the way out for me there. However I'd prefer to see that baken in the DefaultEntityController class. So why do we need a separate module at all?

>By having it as a separate module, we're able to maintain all the cache clearing code in one central place, so it doesn't have the messiness of putting node cache clears in taxonomy module, or taxonomy cache clears in image module or whatever.

Having the comment cache clear in comment_save makes a lot of sense. Node cache clears in taxonomy module won't fly anyway, because entitycache can't support entity X introduced in contrib. But taxonomy module knows it provides a field type needing a cache refresh, so it should look up the created fields and clear the caches as appropriate.

catch’s picture

The reason there's a separate module is because the node_load() caching patch sat at RTBC for six months then was postponed by Dries indefinitely, to be revisited when we'd converted some stuff to fields and compare the field cache. We've done that, I've compared them, this is still a very significant improvement, so this is the same thing revamped. However I have no intention of spending time rolling a proper patch and keeping it up to date until there's some indication that it'll even be considered given what happened last time, new modules don't get CVS conflicts. Once that indication is given either way (or if someone else wants to step in), then we should definitely roll it into DrupalDefaultEntityController.

Dries’s picture

I'll do my best to revisit this patch/approach tomorrow, and to provide some direction.

Anonymous’s picture

Right now I'm playing with an interesting bug in the attached patch. If the poll module is enabled, then all nodes that are loaded from the cache are passed through poll_load. Since most nodes are, in fact, not polls, this seems less than ideal behaviour. In the following code, it $this->entityType = 'node' AND the poll module is enabled...

  protected function attachAfterLoad($new_entities) {
    foreach (module_implements('entitycache_' . $this->entityType . '_load') as $module) {
      $function = $module . '_entitycache_' . $this->entityType . '_load';
      $function($new_entities);
    }
  }

then poll_entitycache_node_load is called, which in turn calls poll_load. The same happens for books as well. I'm trying to find a good way to fix it AND clean up some of the testbot failures.

catch’s picture

Probably the easiest way to fix it would be for poll_entitycache_node_load() to check for node type before executing poll_load() on each node it gets.

However for a proper core patch, I would split the user-specific stuff in poll_load() into poll_node_view() or similar - see the original patch in #111127: Cache node_load (and we'd probably need to do a similar thing for translation.module as was done for node_load() caching too).

@Dries - did you get a chance to review this issue yet?

Anonymous’s picture

I wonder if this might be done more cleanly. Just off the top of my head, what if implementations of hook_load were free to add '#cachable' => FALSE or '#cachable' => TRUE (defaulting to TRUE)? This would allow most everything to decide whether or not to cache, removing the problem that hook_entitycache_node_load appears to be addressing.

If we take this one (logical) step further and add '#cached' => TRUE to all nodes loaded from the cache, then we could use hook_nodeapi easily to update anything that needs to be changed. We could use poll_load as per usual and then only use poll_nodeapi to update the data that should not be cached. For larger nodes where most of the data will remain the same and only a very little will need to be updated, I think that this might be more efficient (since several queries could potentially be saved.)

I believe this might be a little easier / more versatile. (Or I could be completely and totally wrong. I've done that before too.)

catch’s picture

I'm reluctant to do that, because I actually think poll_load() shouldn't be cramming user-specific stuff about which links get displayed etc. into node_load() anyway. Doing it in poll_node_view() (is there even a hook_view()? - could happen there) or wherever solves the problem without adding extra complexity for well-behaved modules - a single foreach() to filter out non-poll node types isn't expensive at all.

The trickier one here is book_node_load() - where it's relatively inexpensive, and we don't want to cache it, but it's much more arguably canonical data about the node than poll permissions is. But even then, either re-using an existing hook that's not cached, or adding new hook which isn't persistently cached as the node_load() caching patch did seems safer to me than trying to conditionally call some hook implementations on some nodes some of the time depending on various flags.

Dries’s picture

I reviewed the patch, and I like the idea of entity caching. I'd certainly want to explore this more as I agree that this is one of the most important performance patches in the queue. Sorry it took me so long to get to this issue.

1. I'm not a big fan of making this a module. Why aren't we integrating this better into the default implementation? Worst case, I'd prefer to make this an include file. A module called entitycaching is too low-level for my liking -- especially for core.

2. Does this allow us to remove some other caching code?

3.

+++ modules/entitycache/entitycache.module	28 Oct 2009 15:48:01 -0000
@@ -0,0 +1,289 @@
+  $entity_info['node']['cacheable'] = FALSE;

Reading the patch top to buttom, it is unclear what this cacheable variable does. Could use a code comment, because I don't understand why it matters. Given that this is a caching module, I'd expect those to be set to TRUE.

4.

+++ modules/entitycache/entitycache.module	28 Oct 2009 15:48:01 -0000
@@ -0,0 +1,289 @@
+    // Load any remaining entities from the database. This is the case if $ids
+    // is set to FALSE (so we load all entities), if there are any ids left to
+    // load, if loading a revision, or if $conditions was passed without $ids.
+    if ($this->ids === FALSE || $this->ids || $this->revisionId || ($this->conditions && !$passed_ids)) {

I don't like how ids can be FALSE. It seems like we're overloading the variable ids to mean a number of different things.

5.

+++ modules/entitycache/entitycache.module	28 Oct 2009 15:48:01 -0000
@@ -0,0 +1,289 @@
+function entitycache_comment_helper($comment) {
+  cache_clear_all($comment->cid, 'cache_entity_comment');
+  cache_clear_all($comment->nid, 'cache_entity_node');
+}

_comment_helper() sounds like a really poor function name for what it does. Also, the function is small enough that I wouldn't bother with the helper function to begin with.

6. I'd be interested to see actual benchmark results. You mention 10%, but is that documented somewhere?

yched’s picture

re Dries

3. $entity_info['node']['cacheable'] = FALSE;. Given that this is a caching module, I'd expect those to be set to TRUE.

That's the variable the controls *field* level caching. Was originally part of hook_fieldable_info(), and thus takes things from the field end. 'cacheable' = FALSE means 'I have an entity-level cache of some kind, don't bother caching at field level.
It should obviously be renamed to 'field_cache' now (API change...). Been mentioned several times in the past, but it we never got actually done :-/.

catch’s picture

Yeah we need cacheable / field_cache, cache / static_cache and add entity_cache to make this work and sensible. That plus one new hook should be the only API change though I hope.

I don't have a bunch of time this week, but I'll try to have something to look at in the next week-ish.

catch’s picture

FileSize
18.7 KB

initial patch to move this to entity.inc and make it more of a proper API.

Done:
Makes all the necessary changes to entity load, hook_entity_info() and adds cache tables for nodes, files, taxonomy and comments. i.e. enough for benchmarking and a look at the API additions / changes - both of those welcome.

Not done (hint, these are in order for anyone who wants to help):
cache clearing in node_save() and similar places.
book_node_load() -> book_node_load_uncached()
half of poll_load() -> poll_node_load_uncached()
fixing tests (there's a method name change for resetCache())

catch’s picture

FileSize
23.23 KB

Slight cleanup in entity.inc, not got to the TODOs yet.

Comparing node/n, article, default profile, anonymous user (no page caching), only user and comment modules implementing hook_node_load().

Short version, HEAD 10.4 #/sec, Patch: 12.75 #/sec -). That's an 18% performance improvement on single node views.

As usual, devel numbers given only for sanity checking and query counts, I tried to get the most representative of a few requests, but they're all over the place for just one request.

HEAD:
devel says: Executed 49 queries in 45.64 milliseconds. Page execution time was 175.59 ms.

Concurrency Level:      1
Time taken for tests:   96.150 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      7986000 bytes
HTML transferred:       7434000 bytes
Requests per second:    10.40 [#/sec] (mean)
Time per request:       96.150 [ms] (mean)
Time per request:       96.150 [ms] (mean, across all concurrent requests)
Transfer rate:          81.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    82   96  11.3     94     289
Waiting:       82   96  11.2     94     282
Total:         82   96  11.3     94     289

Patch:
Devel says: Executed 45 queries in 33.78 milliseconds. Page execution time was 160.92 ms.

Document Path:          /node/209
Document Length:        7434 bytes

Concurrency Level:      1
Time taken for tests:   78.422 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      7986000 bytes
HTML transferred:       7434000 bytes
Requests per second:    12.75 [#/sec] (mean)
Time per request:       78.422 [ms] (mean)
Time per request:       78.422 [ms] (mean, across all concurrent requests)
Transfer rate:          99.45 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    68   78   8.9     76     137
Waiting:       68   78   8.9     76     136
Total:         68   78   8.9     76     137

I'd expect the results to be even better with a non-database caching backend.

It'd be good to get benchmarks with taxonomy terms and comments attached to nodes - since these would also be cached, to see if we continue to see 15-20% gains or it starts flattening out. Would also be good to see what the change is on a node listing - I'm pretty sure it drops to 5-10% for those since we're already efficient in terms of SQL using multiple load.

carlos8f’s picture

catch: bravo, this is very cool and exciting stuff.

catch’s picture

Side note, I've opened #665618: Add entity caching for users - having tried to do this briefly in entitycache.module, it was tough enough that I don't think it's a good idea to tackle here. However if we can keep this issue focused, it'd be worth trying to get that working in separate patch, then maybe working it in here later if it's not too hellish. Various issues with doing user_load() all over the place at #636992: Entity loading needs protection from infinite recursion, #638070: Router loaders causing a lot of database hits for access checks and #490108: user_load() should add session properties when loading the currently logged-in user.

In answer to Dries' #2. Not really. Both the field cache and entity-level caching should allow us to get rid of the filter cache in Drupal 8, assuming most diy formatted textareas get converted to text fields. If we're able to successfully apply entity caching for users, then that points towards possibly removing the field cache since we'd have no use case in core, and core entities cover a lot of use cases - but again I think that's a D8 thing.

catch’s picture

Status: Needs work » Needs review
FileSize
27.36 KB

OK there's still a bunch of test failures to fix, but all the 'core' tests like comment, taxonomy, node, user, file are passing now - so the rest is cleanup, plus poll/book/translation which need special treatment. However I'm going to mark this as needs review, since all the API changes are now in the patch, along with all the changes to entity.inc, and it needs a wider variety of benchmarks run on it.

Side note, I noticed that we clean the field cache when a taxonomy term is deleted, however our taxonomy field formatter handles missing terms. Also we don't cache any term information, only the reference to the term, in either the field or entity cache. So that's a lot of unnecessary cache clearing. This should be removed in a separate issue probably, but I've rolled it in here for now.

catch’s picture

And a summary of the changes:

This patch now adds a cache_get_multiple() and cache_set() stage to the default entity controller, this is in between fetching from static cache and from the database. Both persistent and static caching remain optional.

The 'cacheable' key in hook_entity_info() has been changed to 'field cache', this remains the default.

Two new keys are added: 'entity cache', and 'entity cache bin'. Both are required if you want to use entity caching, at the moment all core entities create their own cache bins, with taxonomy sharing one bin between vocabularies and terms. It would be feasible to do this with dynamic schema at some point, but the module install system is still pretty fragile at the moment, and there's various opportunities for race conditions (esp user_load() if we're able to do that), enough to make this not worth automating.

Modules are responsible for clearing the entity cache in their update and delete operations - when/if we eventually have an entity API which provides save / update / delete as well as load, that'll be handled centrally.

Field API modules don't need to know anything about the cache. Modules using _load() / _save() / _delete() hooks to interact with entities may need to, but only if they're doing something user / language specific in those hooks, or if they're altering data which goes into _load() outside of the entities own API functions - i.e. Poll and comment.

There are a couple of API changes we could avoid making if we really wanted to, but I've tried to keep variable names etc. as self-explanatory as possible rather than keeping them exactly as is (given changes like that still seem to be acceptable for new APIs).

I'm going to keep working on tests, and hopefully some new benchmarks over the next day or so. But reviews in the meantime are welcome.

Dries’s picture

This looks very promising, but haven't looked at the latest patch yet. Thanks for running with this, catch.

Anonymous’s picture

nitpicks below.

+  /**
+   * Clear the persistent cache for an entity.
+   *
+   * @param $ids
+   *   A single entity ID or array of IDs, if present clear the entity cache for
+   *   these specific entities. If no argument is given, the entire bin
+   *   specified in hook_entity_info() will be flushed.
+   */
+  public function resetCache($id = NULL);

should be $ids no $id.

+    // Call hook_TYPE_load(). The first argument for hook_TYPE_load() are

should be 'is' not 'are'.

-  if (isset($form['identity']['type'])) {
+  if (isset($form['type'])) {

is that hunk from another patch?

possibly bikeshedding, but i think resetCache should be clearCache, as that seems to be more inline with what its does and the terminology we use elsewhere?

scroogie’s picture

Sorry for poluting the issue, but catch, can you point me to your benchmark/profiling doc again? I can't find it anymore :/
Or is it enough to bench with ab?

catch’s picture

Fixed field, poll, translation, book and comment tests, should be close to 100% pass but I'll leave the entire test suite to the bot when it's back up and running.

Remaining TODOs:

1. Benchmark single node with taxonomy terms / comments etc.

2. Benchmark node listing

3. API documentation for hook_ENTITY_load_uncached() - although we have book and poll examples for this already.

catch’s picture

FileSize
35.2 KB

And the patch.

catch’s picture

FileSize
34.49 KB

Crossposted with Justin and scroogie:

Justin: resetCache/clearCache - yeah that seems like a reasonable change. It also means that if anyone is already calling resetCache() with no arguments, they'll get a proper error instead of unknowingly clearing the entire cache bin for that entity. Have made that change and fixed the typos.

@scroogie - ab is fine.

Another thing we need to figure out. When modules are installed or disabled, if they include any hook_ENTITY_load() implementations, those won't get picked up until the cache is flushed.

I think we can do two things here potentially - 1. since hook_entity_info() has the name of the hooks (hook_node_load(), hook_comment_load() etc., we could only do this when modules implement one of those hooks. 2. We can only flush the cache tables for the entity types which are affected. I'm not sure if we'll have all the information at the right time to do this, but seems worth a try. If not we may just have to flush all entity caches everywhere on module disable / enable.

Anonymous’s picture

cut and paste of #drupal discussion about hook_ENTITY_load between catch and i:

(18:25:15) beejeebus: catch: cool. on the hook_ENTITY_load question, creating new entities doesn't clear the cache right?
(18:25:49) beejeebus: catch: so we could get some entities with the new load hook called on them, and some not, until the cache is cleared? that feels messy
(18:26:17) catch: beejeebus: creating new entities?
(18:26:42) beejeebus: catch: if we have a bunch of foo entities cached
(18:27:05) beejeebus: catch: and we enable a module that implements the hook_foo_load
(18:27:28) beejeebus: catch: new foos will pass through that, but existing cached foos wont?
(18:27:36) catch: beejeebus: OK yeah, in the current patch, that is broke.
(18:27:59) catch: beejeebus: but I'm wondering if we can only clear the cache if the module implements a hook_foo_load(), and only for foo. Not just clear everything.
(18:28:10) beejeebus: catch: that seems reasonable
(18:28:45) beejeebus: catch: perhaps that's should be an implementation of hook_modules_enabled ?
(18:29:00) catch: beejeebus: ooh that would be a nice place to put it yeah.
(18:29:08) catch: It'll have to be in system.module but that's OK.

scroogie’s picture

Sorry, I'm new to this. I didn't know what configuration I should test this with, so I just created a node with some comments, tags and terms and got these results.
Configuration is Windows XP, Nginx 0.8.30, PHP-CGI 5.3.1, APC enabled.

node/1 without patch
===============
Finished 300 requests


Server Software:        nginx/0.8.30
Server Hostname:        d7.localhost
Server Port:            80

Document Path:          /node/1
Document Length:        105085 bytes

Concurrency Level:      1
Time taken for tests:   9.391 seconds
Complete requests:      300
Failed requests:        0
Write errors:           0
Total transferred:      31665900 bytes
HTML transferred:       31525500 bytes
Requests per second:    31.95 [#/sec] (mean)
Time per request:       31.302 [ms] (mean)
Time per request:       31.302 [ms] (mean, across all concurrent requests)
Transfer rate:          3293.04 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    16   31   2.4     31      47
Waiting:       16   30   4.4     31      47
Total:         16   31   2.4     31      47

Percentage of the requests served within a certain time (ms)
  50%     31
  66%     31
  75%     31
  80%     31
  90%     31
  95%     31
  98%     31
  99%     47
 100%     47 (longest request)

node/1 with patch:
=============
Finished 300 requests


Server Software:        nginx/0.8.30
Server Hostname:        d7.localhost
Server Port:            80

Document Path:          /node/1
Document Length:        105085 bytes

Concurrency Level:      1
Time taken for tests:   9.281 seconds
Complete requests:      300
Failed requests:        0
Write errors:           0
Total transferred:      31665900 bytes
HTML transferred:       31525500 bytes
Requests per second:    32.32 [#/sec] (mean)
Time per request:       30.938 [ms] (mean)
Time per request:       30.938 [ms] (mean, across all concurrent requests)
Transfer rate:          3331.85 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    16   31   2.2     31      31
Waiting:       16   30   4.8     31      31
Total:         16   31   2.2     31      31

Percentage of the requests served within a certain time (ms)
  50%     31
  66%     31
  75%     31
  80%     31
  90%     31
  95%     31
  98%     31
  99%     31
 100%     31 (longest request)
Jeremy’s picture

The following notices are introduced by this patch (and happen dozens of times with each page load on my test website):

Notice: Undefined index: field cache in field_attach_load() (line 543 of /var/www/html/drupal-7.x-dev-entitycache/modules/field/field.attach.inc).
Notice: Undefined index: field cache in entity_extract_ids() (line 6358 of /var/www/html/drupal-7.x-dev-entitycache/includes/common.inc).
catch’s picture

@Jeremy, you'll need to clear the {cache} table before testing.

@scroogie - thanks, those results are still an improvement, but a much smaller relative improvement than mine (and 30 requests a second seems a lot, maybe nginx really is that fast though). Couple of things - could you check that anonymous users have been given access to view comments, that's off by default. Also please run with -c1 -n1000 to give things time to warm up. And if benchmarking with a lot of comments, it might be worth applying #636992: Entity loading needs protection from infinite recursion to both the before and after version of HEAD, since that's such a massive slowdown at the moment it'll skew results..

Benchmarked a node with a comment and some taxonomy terms this time. This is probably the most favourable example towards caching, since we don't get any advantages of multiple load apart from on the terms. Will try some more unfavourable examples next.

Just a note I got almost exactly the same result the first time I ran the benchmarks, then cleared caches and re-ran the benchmarks both times, then consistently got these, this also happened once before. Not sure if it's just me benchmarking the wrong thing first time or if there's a weird side-effect when applying or un-applying the patch in either direction.

HEAD:
Executed 68 queries in 64.01 milliseconds. Page execution time was 190.02 ms.

Concurrency Level:      1
Time taken for tests:   103.548 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      9455000 bytes
HTML transferred:       8929000 bytes
Requests per second:    9.66 [#/sec] (mean)
Time per request:       103.548 [ms] (mean)
Time per request:       103.548 [ms] (mean, across all concurrent requests)
Transfer rate:          89.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    89  103  15.0     98     159
Waiting:       82   96  14.2     90     150
Total:         89  103  15.0     98     159

Patch:
Executed 60 queries in 53.31 milliseconds. Page execution time was 148.35 ms.

Concurrency Level:      1
Time taken for tests:   82.455 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      9455000 bytes
HTML transferred:       8929000 bytes
Requests per second:    12.13 [#/sec] (mean)
Time per request:       82.455 [ms] (mean)
Time per request:       82.455 [ms] (mean, across all concurrent requests)
Transfer rate:          111.98 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    71   82  10.7     78     129
Waiting:       65   75  10.1     71     119
Total:         71   82  10.7     79     129
catch’s picture

One node, 50 comments. Results are not what I expected - ~12% improvement which is too high for one query and a bit of PHP shortcutting.

There seems to be two reasons for this:

1. In devel query log, I'm seeing a 17ms query caused by field_sql_storage_schema() - this should never be called on a regular page load in normal circumstances, needs it's own bug report. Opened #667098: drupal_get_bootstrap_phase() is broken.

2. The regular comment_load_multiple() query takes 25.09ms on my machine, whereas the cache_get_multiple() is > 1ms. This seems to be simply due to the joins it does, there's no filesort or anything. We might be able to optimize this in the uncached case, but it's a genuine improvement as things stand. Opened #667100: comment_load_multiple() query joins users table and is slow.

Concurrency Level:      1
Time taken for tests:   214.880 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      124642000 bytes
HTML transferred:       124114000 bytes
Requests per second:    4.65 [#/sec] (mean)
Time per request:       214.880 [ms] (mean)
Time per request:       214.880 [ms] (mean, across all concurrent requests)
Transfer rate:          566.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   195  215  15.4    210     299
Waiting:      187  206  15.1    202     290
Total:        195  215  15.4    210     299

Patch:
Executed 110 queries in 42.58 milliseconds. Page execution time was 262.16 ms.

Concurrency Level:      1
Time taken for tests:   189.897 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      124642000 bytes
HTML transferred:       124114000 bytes
Requests per second:    5.27 [#/sec] (mean)
Time per request:       189.897 [ms] (mean)
Time per request:       189.897 [ms] (mean, across all concurrent requests)
Transfer rate:          640.98 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   175  190  13.8    186     304
Waiting:      166  181  13.4    177     295
Total:        175  190  13.8    186     304

I can't get devel_generate to generate 10 nodes with proper node bodies yet, might need an update for recent changes in HEAD, but if you ignore the fact that caching appears to allow us to gloss over some existing performance bugs in HEAD, then this looks like consistently between 10-20% with database caching in various scenarios. Would be very happy if we could get one or two more indpendent benchmarks though.

Jeremy’s picture

I'm seeing an overall slowdown with this patch. My fastest page load time is a little faster, and my slowest page load time is a little faster, but the average page load time is slower. It's possible I didn't let the tests run long enough to see full gains from this patch, as after it had run with entity caching enabled I only had 141 cached nodes and 169 cached comments. Or perhaps I'll need to set a minimum cache lifetime to see any gains?

I'm using a large database for testing: ~50,000 nodes, 10,000 users, and ~500,000 comments (all data was generated with the devel module -- I was unable to get taxonomy generation working.) Before each test, I dropped / reloaded the database from a backup, restarted mysqld and apache, and flushed the cache or restarted memcache as applicable. Page caching, block caching, and JS and CSS aggregation were all enabled during all tests. I manually loaded the front page 1 time before each test.

The actual testing was done with jMeter (all tests were developed thanks to the Examiner/NowPublic). I simultaneously ran 6 different tests at the same time to try and generate more realistic traffic: 1) anonymous spiders of the entire website, 2) anonymous visitors favoring the front page and spidering links from the front page, 3) logged in visitors favoring the front page and spidering links from the front page, 4) logged in user posting nodes, 5) logged in user posting comments, and 6) anonymous user creating user accounts. Test 6 is currently adding comments to the same node repeatedly. All tests are run together for 5 minutes, and then all are automatically stopped and the data is collected.

(I also tested the same website with Plain Drupal 7 + memcache to confirm that I saw a significant speedup.)

--

Test 1: Plain Drupal 7

  • Authenticated users, 2.1 front page views per second
  • Authenticated users, 1.1 non-front page views per second
  • Anonymous users, 18.1 front page views per second
  • Anonymous users, 9.0 non-front page views per second
  • Anonymous spider, 2.3 page views per second
  • Comments created, .9 per second
  • Minimum page load time, 34 ms
  • Maximum page load time, 21,200 ms
  • Average page load time, 4,772 ms
  • 90% average page load time, 5,459 ms
  • Overall, 4,559.1 KB/sec

Test 2: Drupal 7 + entity caching patch

  • Authenticated users, 2.0 front page views per second
  • Authenticated users, 1.0 non-front page views per second
  • Anonymous users, 18.0 front page views per second
  • Anonymous users, 9.0 non-front page views per second
  • Anonymous spider, 2.2 page views per second
  • Comments created, .8 per second
  • Minimum page load time, 31 ms
  • Maximum page load time, 19,646 ms
  • Average page load time, 4,866 ms
  • 90% average page load time, 9,043 ms
  • Overall, 4,509.5 KB/sec
catch’s picture

With that schema bug patched, here's more realistic results for HEAD in the 50 comments case. That's more like a 2% improvement which is closer to what I was expecting.

Concurrency Level:      1
Time taken for tests:   194.905 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      124642000 bytes
HTML transferred:       124114000 bytes
Requests per second:    5.13 [#/sec] (mean)
Time per request:       194.905 [ms] (mean)
Time per request:       194.905 [ms] (mean, across all concurrent requests)
Transfer rate:          624.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   178  195  13.4    191     285
Waiting:      170  186  13.0    182     272
Total:        178  195  13.4    191     285

Patch:

Concurrency Level:      1
Time taken for tests:   192.491 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      124642000 bytes
HTML transferred:       124114000 bytes
Requests per second:    5.20 [#/sec] (mean)
Time per request:       192.491 [ms] (mean)
Time per request:       192.491 [ms] (mean, across all concurrent requests)
Transfer rate:          632.34 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   179  192  12.2    189     311
Waiting:      170  183  11.7    180     301
Total:        179  192  12.2    189     311

Even more fun, it turns out that bug is largely responsible for the speedup with just a single node as well - basically, if something stupid is happening during node_load(), and you cache node_load(), then you ameliorate the effects of the stupidity. But at least with database caching, the actual caching itself isn't buying us all that much in itself. This might well change with an alternate caching backend, or with more going on in hook_node_load(), but not as exciting :(

catch’s picture

Priority: Critical » Normal
Status: Needs review » Needs work

Crossposted with Jeremy. Yeah this patch allowed us to mask #667098: drupal_get_bootstrap_phase() is broken - which happens on pretty much any situation after cache clear, except for visiting the front page, and was causing at least a 15ms slowdown (easily enough to account for 20% of the page request). Results seemed a bit too good to be true - node caching was more like 10% a few months back, but took a few long looks at the devel query log to figure out what was going on. Bumping this down to normal and CNW :(

scroogie’s picture

catch: Do I need to do anything else besides enabling Page Caching to activate the entitity cache? Or shouldn't I activate page caching? I'm still a bit confused. :)

catch’s picture

Entity caching will only make any difference with the page cache disabled - the page cache bypasses just about everything. Generally when benchmarking core, unless you're making a change to very low-level bootstrap or page caching itself, you should never enable page caching.

Also until the bootstrap issue is solved, you'll need to visit the front page before benchmarking to allow caches to rebuild properly.

Anonymous’s picture

catch: want me to work on #53? don't want to duplicate work if you're already on it.

rickvug’s picture

Can this be assumed to be D8 work now?

catch’s picture

Version: 7.x-dev » 8.x-dev

Yeah we don't get real gains with out of the box Drupal, so that leaves sites with lots of hook_node_load() implementation, or using non-SQL caching, which is contrib for D7 in http://drupal.org/project/entitycache, and we can see how that goes for inclusion for D8.

catch’s picture

Version: 8.x-dev » 7.x-dev
Priority: Normal » Critical
Status: Needs work » Needs review
FileSize
25.7 KB

I'm really sorry to move this back down a day before alpha, but I just realised that entitycache has no clean way to override the base class. I previously thought this was my lack of OOP-fu, but a conversation with Damien indicated it's actually not possible to do nicely in PHP. A quick patch to make entity.inc swappable is a no-go since we have registry breakage if you try to do that as far as I know. So if we want to properly support caching in contrib, core has to support caching even if it doesn't use it.

This patch is exactly the same as the previous patches, except it doesn't enable caching for any core modules. The only thing entity cache would then do from contrib is hook_entity_info_alter() and create cache bins, plus probably a couple of hook implementations to clear translation caches since those changes are specific to having node caching enabled.

catch’s picture

Fixed one missed find/replace and what looks to be an uncaught test failure in node.test - in that it shouldn't be passing in HEAD either.

catch’s picture

Status: Needs work » Needs review
FileSize
27.52 KB

HEAD moving fast today.

catch’s picture

Status: Needs work » Needs review
FileSize
26.59 KB

Not that fast, uploaded the same patch :(

catch’s picture

Status: Needs work » Needs review
FileSize
27.37 KB
catch’s picture

Version: 7.x-dev » 8.x-dev
Priority: Critical » Normal
pwolanin’s picture

Version: 8.x-dev » 7.x-dev
Priority: Normal » Critical

mostly looks good - but I'd like some minor clarification of the variable/key names.

E.g. 'field cache' doesn't suggest to me that it's a Boolean unlike the 'cacheable' it replaced. Maybe 'field caching' ?

moshe weitzman’s picture

D8 is open for business. Folks think we should move this to core?

catch’s picture

fwiw:
- http://drupal.org/project/entitycache has 183 users at the moment (not loads, but it's also increasing each week).
- Most core entity tests pass with it enabled, in the project there's a script to generate the tests which also documents each one that fails and related core issues.
- Last time we benchmarked this, it didn't make much difference, at least with database caching. That might be worth revisiting though.

carlos8f’s picture

also fwiw:

I use the Pressflow entity cache patch (https://code.launchpad.net/~catch-drupal/pressflow/load_cache/+merge/39241) on a fairly large (75k users, 100k nodes) D6 site, with great results. Probably more dramatic in D6 since D6 lacks static caching.

The real benefit is for complex custom sites, with expensive hook_user_load/hook_node_load() implementations which call many queries. For them, entity caching is a huge bonus. It seems like having full support in core for this is better in the long run than having half-core half-contrib like we do now :)

mitchell’s picture

Perhaps ArrayCache DatabaseBackend PERMANENT Bins could be used as an entity storage backend for parts or even collections of entities.

This optimization seems equivalent to views materialization.

Alternatively, database tools like collections and indexes (foreign keys) would have significant performance benefits without the need for caching.

#1869338: Support ETag + If-Match for PATCH requests to prevent concurrent updates is tangentially related.

DamienMcKenna’s picture

Issue tags: +D8 cacheability

Shouldn't this have the "D8 cacheability" tag too?

catch’s picture

It should, but note that a lot of the activity on getting this actually into core has been on #1596472: Replace hard coded static cache of entities with cache backends.

yched’s picture

Priority: Normal » Major

[edit: crosspost with #93 - haven't looked at #1596472: Replace hard coded static cache of entities with cache backends yet, but the remark below has nothing to do with the static cache of entities]

So right now:
- we have a field cache that is a direct descendant of CCK / D7 Field API and only covers configurable fields, not base fields, which is absurd.
- FieldItem classes can provide a prepareCache() method, but it's only triggered if the class is used for a configurable field, which is inconsistent / misleading.

The field cache logic should move from FieldableEntityStorageControllerBase::[load|save|...]FieldItems() methods, that are only about configurable fields, to "somewhere higher up (TBD) in the storage controller flow", at a point that encompasses the loading/saving of both base and configurable fields.
We should probably try put that in a place that's easily overridable, since it's likely that alternate storages (mongo...) might have no need for a field cache in the first place ?

DamienMcKenna’s picture

yched: silly question - is there a need for field cache when there'll be an entity cache?

yched’s picture

@DamienMcKenna: Given the current Entity API ("everything is a field"), a "persistent cache for entities" would mostly be a "cache for all fields, not just configurable fields". There might be a couple extra bits that would need to be in the cache as well, dunno for sure, but this doesn't invalidate #94 IMO.

Static cache is probably different, that's about statically persisting already instantiated $entity objects.

fago’s picture

Yeah, we really really should do that. As yched points out in #94 the current approach of just caching configurable fields is obsolete, misleading and sub-optimal - caching all fields instead is of course better. So yeah this removes the need for field cache then.

fago’s picture

Yeah, we really really should do that. As yched points out in #94 the current approach of just caching configurable fields is obsolete, misleading and sub-optimal - caching all fields instead is of course better. So yeah this removes the need for field cache then.

yched’s picture

Status: Needs work » Postponed

We should settle the "configurable field cache" code in #2083785: Add support for determining which field properties are cacheable first, and then move it up the stack to encompass all fields.

Berdir’s picture

Status: Postponed » Active
FileSize
3.62 KB

Quick proof of concept to do entity field caching for all fields.

Seems to be working quite fine for me. Tests will of course explode. Also didn't remove the nested field caching.

Berdir’s picture

Status: Active » Needs review

Ok, the prepareCache issue went in, let's see what happens...

Berdir’s picture

Status: Needs work » Needs review
FileSize
10.15 KB

Re-roll, removing configurable field cache, which means that those configurable field helpers and the do* pattern can and imho should be removed.. I don't think we need to leave anything in the base class of those. They're useless for e.g. MongoDB.

Berdir’s picture

Status: Needs work » Needs review
FileSize
3.23 KB
12.4 KB
yched’s picture

Fully agreed that the methods stack around "extensible" sql field storage can be greatly simplified after field cache is moved upstream, that was totally the plan in the "switch to entity storage" issue.

Berdir’s picture

Ok, so a lot of those fails are because the we lose things that are just added to entities but not defined, e.g. $entity->translations and the book stuff.

We have an issue to convert the first to a field, and issues to clean up book as well, but we still might need a solution for this. I'm not yet sure what do it, as we'd need access to $entity->values. We could add a getCacheData() method there, which works similar to the one in PrepareCacheInterface. Not sure.

Or we say that we don't support caching things that aren't defined, but then we need to solve the core problems.

plach’s picture

Or we say that we don't support caching things that aren't defined, but then we need to solve the core problems.

If we could do this I'd be very happy :)

Berdir’s picture

Status: Needs work » Needs review
FileSize
605 bytes
12.46 KB

Forgot the use for the PrepareCacheInterface.

@plach: I'd love that too, but might not be so easy. For starters, you need to work on your translation/field issue :p

Maybe we could still invoke hook_entity_load() for the time being when loading an entity from the cache? Just to see how that goes...

Berdir’s picture

Status: Needs work » Needs review
FileSize
7.71 KB
18.76 KB

A bit complicated to do that, #2095283: Remove non-storage logic from the storage controllers would help here, so please help to decide what to do there.

Untested, no idea what's going to happen with this...

Berdir’s picture

Removing the do* pattern, the field methods are now an implementation detail of the fieldable database storage controller . Which means we can more easily refactor them and e.g. make delete work on a list of entities or load just load the values.

catch’s picture

I'd be OK with saying we don't support things at all that aren't defined assuming we can get core to that point, but not caching them is OK too.

Berdir’s picture

Status: Needs work » Needs review
FileSize
7.44 KB
31.68 KB

Fixed a number of tests, array_merge() doesn't work like that...

The remaining ones I either didn't figure out or are actual problems. Like the FilterSecurityTest, do we need to clear the entity cache when changing filter settings? I guess so, but who is responsible for it? text.module, as it puts cached filtered things in there?

Berdir’s picture

Status: Needs work » Needs review
FileSize
3.85 KB
34.41 KB

Some more fixes. The remaining ones are... weird.

Berdir’s picture

The previous patch had some debug left-overs, the interdiff is against #116.

yched’s picture

Text module currently has code to clear the field cache, so yes, if field cache becomes entity cache, text module should clear that one instead ?

plach’s picture

yched’s picture

As a side effect, this places field data in a different cache bin than field metadata (plugin definitions, field definitions), which I've been willing to do for ages :)

Berdir’s picture

Yeah, I'm not sure yet about how exactly the bins should be set up (entitycache 7.x uses a bin per entity type, which would be really easy in 8.x but that might make "clear the cache of all entity types" harder.

yched’s picture

This raises the question of field_cache_clear() - the current patch doesnt seem to touch it, meaning it still empties the "old" bin instead of the new one ? Should it become a method on the storage controller ?

But yes, the ability to empty the cache for a single entity type would be cool, those are our main use cases. This could also be done just through the cid in a single bin.

plach’s picture

FYI (again :) we should be mostly done with #1916790: Convert translation metadata into regular entity fields, reviews welcome.

jibran’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +          $cids[] = "entity:{$this->entityType}:$id";
    ...
    +          $cid = "entity:{$this->entityType}:$id";
    ...
    +      $cids = array();
    +      foreach ($ids as $id) {
    +        $cids[] = "entity:{$this->entityType}:$id";
    +      }
    

    Is it not nice to have a helper function for this?

  2. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +        // Put the cached field values back into the entities and remove them from
    

    more then 80 chars.

  3. +++ b/core/modules/block/custom_block/lib/Drupal/custom_block/CustomBlockStorageController.php
    @@ -19,21 +19,19 @@
    +    // hook_custom_block_load(), containing a list of block types that were loaded.
    

    More then 80 chars.

Berdir’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
33.76 KB
1.54 KB

Tracking down those test fails. Crazy stuff. Two things fixed:

1. configurable entity references consider target_id = 0 as empty, so it didn't cache it and the default value is NULL (weird on it's own but that's a different story), result in those notices.

2. This one was reaally fun to track down. What happens was this:
a) user with picture is saved
b) after base fields were saved, cache is reset.
c) then, field item list update runs, and adds a new file usage
d) that somehow results in loading the entity again.
e) Now, at this point, we actually load from the database because the static cache was cleared. But the field values have not yet been saved. So we load the entitiy without the updated field values and put it in the static cache.
For now, I just moved the resetCache() down below after saving the fields. Really wondering if more in that order there isn't wrong. Shouldn't we save fields *before* we start invoking postSave()?

fago’s picture

A related point that came to my point recently: What about also caching when entities do not exist, i.e. loading fails? When working with remote entities you'd like to cache that also, but I think it would be reasonable to do that always? Having to clear the cache on creation also shouldn't be a big deal anyway.

fago’s picture

yched’s picture

No db hit on entity load - awesome !

  1. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -7,11 +7,12 @@
     use Drupal\Component\Utility\NestedArray;
    -use Drupal\Component\Uuid\Uuid;
    

    NestedArray doesn't seem to be used anymore either ?

  2. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -827,9 +828,23 @@ public function getQueryServiceName() {
    +   * @param array $entities
    +   *   An array of entities keyed by entity ID.
    

    Don't we use [] typehints now ? (or is it only for @return docs ?)

  3. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -989,9 +1012,16 @@ protected function doDeleteFieldItems(EntityInterface $entity) {
    +   * This method is a wrapper that handles the field data cache. Subclasses
    +   * need to implement the doDeleteFieldItemsRevision() method with the actual
    +   * storage logic.
    

    Leftover

  4. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +    $entities = parent::cacheGet($ids);
    +    if ($this->cache) {
    

    Nitpick, but IMO we should make it cleaner in code comments that parent::cacheXxx() handle static caching.

    // Read from the static cache.
    $entities = parent::cacheGet($ids);
    
    // Read from the persistent cache.
    (...)
    
  5. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +    if ($this->cache) {
    

    The $this->cache flag controls both the static and persistent cache - is that ok ? Don't we want to be more granular ?
    (If not - probably debatable, but I'd find it clearer if the if ($this->cache) checks were made by the calling code rather than inside cacheGet() / cacheSet())
    [edit: can of worms, scratch that]

  6. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +      // Build the list of cache entries to retrieve.
    +      $cids = array();
    +      foreach ($ids as $id) {
    +        if (!isset($entities[$id])) {
    

    It would be handy to have cacheGet($cids) take $cids by ref and remove the successfully loaded entries, like CacheBackendInterface::getMultiple() does.
    Currently we have three layers (static cache, persistent cache, load from db), where each layer needs to array_diff to figure out the misses of the previous layer.

  7. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +        // Put the cached field values back into the entities and remove them from
    +        // the list of entities to query.
    

    Copy/pasted from the old field cache code, but misleading now - the code doesn't put data from the cache into pre-existing $entity objects, but creates fresh objects from the cache data.

  8. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +            // Already put the loaded entity into the cache.
    +            $this->entityCache[$id] = $entities_from_cache[$id];
    

    Comment looks weird. "Populate the static cache" ?
    + The code here is implementation logic leaking out of EntityStorageControllerBase::cacheSet() - should this call parent::cacheSet() instead ?

    Also, could be for a followup, but properties about caching are a bit confusing:
    $this->cacheBackend is the 'cache.entity' service, $this->entityCache is an array of statically cached entities

  9. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +          if ($entity->getUntranslated()->language()->id == $langcode) {
    

    $entity->getUntranslated()->language()->id should be computed once, before looping on langcodes ?

  10. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +  public function resetCache(array $ids = NULL) {
    +    parent::resetCache($ids);
    +    if ($ids) {
    

    No if ($this->cache) check here ? (parent has one).
    Also, not introduced here, but we have cacheGet() / cacheSet(), so the method would be more consistent if named cacheReset() ?

  11. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1497,134 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +  protected function invokeLoadHook($entities) {
    

    Nice move. However:

    - It would make more sense to move it higher up to EntityStorageControllerBase, next to invokeHook() ?
    (configStorageController::attachLoad() would need to be split similarly - not even sure config entities have a need for this method)

    - It is weird to call invokeLoadHook() from attachLoad() and cacheGet() separately. The same entity_load_multiple() results in two separate hook_entity_load($entities) invocations (one for the cache hits, one for the cache misses).
    Can't loadMultiple() do "get entities (from wherever, db or cache), then call invokeLoadHook() on them" ?

  12. --- a/core/modules/field/lib/Drupal/field/Tests/FieldAttachOtherTest.php
    +++ b/core/modules/field/lib/Drupal/field/Tests/FieldAttachOtherTest.php
    

    Looks like it's the whole of FieldAttachOtherTest::testFieldAttachCache() that should move to a testEntityCache() method somewhere in Entity component ?

Also, that's for a separate issue, but the FieldableESC / FieldableESCBase pair looks really confusing after the doXxxFieldItems()/xxxFieldItems() swapping, not clear what is in one vs the other, nor which one should extend the other.
FESC really looks like a "ContentESC" now, and invokeTranslationHooks() / invokeFieldMethods() have no business staying in FieldableESCBase.
Do we already have an issue to clean that up ?

yched’s picture

re @Berdir #128 :

Shouldn't we save fields *before* we start invoking postSave()?

Hm, yes, that would definitely seem reasonable, the current order is weird to say the least...

Berdir’s picture

Thanks for the review!

- Yes, we should probably have separate flags for static and persistent cache. I was just too lazy to introduce that :) The persistent cache setting would probably just replace the existing field_cache one.

There are a lot of related issues now...

#2095283: Remove non-storage logic from the storage controllers moves a lot of attachLoad() stuff around, we also have agreed there to move hook invocations to the entity classes, but in a follow-up that doesn't exist. Having the hooks there should also simplify creating entities, so less code duplication.

#1867228: Make EntityTypeManager provide an entity factory should also help with that.

#221081: Entity cache out of sync and inconsistent when saving/deleting entities cleans up the order of cache/hook/saving.

6. Sounds interesting, wondering how bigger that the patch would get by that.

8. The thing is that cacheSet() writes back into the persistent cache. What we want to achieve here is to add it to the static cache so that the next call to cacheGet() could receive it from there. Calling the parent would work, but not sure about directly calling parent implementations of other methods.

10. Renaming resetCache() would result in a large patch, as that is called a lot. I actually prever the way that is named and I'd rather rename the other two to getCache/getCachedEntities or something like that.

11. No it can't because there are three options. static cache, persistent cache and db. We don't want to execute it for the static cache. However, I think that once those related issues are in, it will come down to two lines: instantiate through factory + $entityClass::postLoad().

12. Yes, probably makes sense.

yched’s picture

8. 10. :
Then I think maybe the current approach of dealing with persistent / static cache by layering overrides of cacheGet() / cacheSet() is brittle. Having explicit methods for "fetch from static cache", "fetch from persistent cache"... might be cleaner.

pounard’s picture

Using a cache chain implementation would be good here? I don't know if this old patch ever reached core...

pounard’s picture

Berdir’s picture

There's a separate issue where that is discussed: #1596472: Replace hard coded static cache of entities with cache backends.

I decided to move forward here, because I don't think a cache chain works here. It's a nice idea, but in practice, we do need to handle static cache differently, static cache contains the entity objects as-is, for the persistent cache, we prepare them and store the values, we invoke the load hook when we get it from there and so on.

pounard’s picture

Ok I didn't catch that point where the persistent cache was only a store for values and post hooks were invoked.

Berdir’s picture

Status: Needs work » Needs review
FileSize
33.76 KB
674 bytes

Re-roll and fixed the use statement.

Haven't addressed yched's review yet.

Berdir’s picture

Status: Needs work » Needs review
FileSize
38.28 KB
10.08 KB

Ok, working on @yched's feedback.

1. Removed.
2. Updated some @params and @returns to use the [] syntax, yes we use that for both.
3. Removed.
4. Worked a bit on the comments, hope that it's clearer now. Completely overriding it is still an option.
5. Switched to use the existing field_cache property, do we want to rename that? see @todo.
6. Implemented. The only downside of that is that it makes mocking of those methods impossible.
7. Comment updated.
8. Updated that comment too, see 6.
9. Improved.
10. Added.
11. As responded already, currently has to be like this, but added a @todo to investigate ways to avoid that completely.

Also started using cache tags instead of emptying the whole cache bin.

amateescu’s picture

Fixed a bunch of test fails, couldn't figure out the one about feed items yet..

jhedstrom’s picture

I haven't figured out exactly why feed tests are failing, but quickly looking through the way those tests are written, they use a mix of aggregator module api functions, and direct database manipulation, which is most likely causing the caching to get out of sync during the test.

Berdir’s picture

Status: Needs review » Needs work
Related issues: +#2127725: Remove category handling from aggregator

Yeah, that is quite messy. Might be useful to have a decision for #2127725: Remove category handling from aggregator, if we'd remove it then we won't have to mess with it here. We might be able to fix that test by adding some manual cache clears, but I'm not sure if that's the right approach.

amateescu’s picture

I tried manual cache clears when I was working on #144, no go :) Or maybe I didn't try hard enough..

yched’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/DatabaseStorageController.php
    @@ -117,13 +117,9 @@ public function loadMultiple(array $ids = NULL) {
    +    // static caching. This will remove ID's that were loaded from $ids.
    

    s/ID's/IDs/, no ?
    (in a couple places in the patch)

  2. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -1467,4 +1488,158 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
    +    // @todo: Rename field_cache?
    +    if ($this->entityInfo['field_cache'] && $ids) {
    

    So $storageController->cache controls static caching, and $entity_info['field_cache'] controls persistent caching ? Not too intuitive :-)

    In general, looks like the names of properties and flags around "static cache' and "persistent cache" could use some harmonization.

    Current patch:
    - Static cache:
    controlled by $this->cache
    stores in $this->entityCache
    - Persistent cache:
    controlled by $entity_info['field_cache']
    stores in $this->cacheBackend, which is 'cache.entity' (kind of clashes with $this->entityCache)

    Suggestion:
    - Static cache:
    controlled by $this->staticCache
    stores in $this->entities
    - Persistent cache:
    controlled by $this->persistentCache
    stores in $this->cacheBackend, which is 'cache.entity'

    ?

Completely overriding [cacheGet() / cacheSet()...] it is still an option

I'd tend to think it would make the code easier to grasp, yes. Forwarding to the parent for static logic brings little code saves, and there is still parent implementation logic leaking in the child implementation ($this->entityCache)

Also, you pointed that a cache chain wouldn't work here because of different contents and the need for hook_entity_load() in one case & not the other, but it seems what the current patch does is actually a one-off implementation of a chained cache ? cacheGet() encapsulates stacked static & persistent cache, and thus we have split calls to hook_entity_load().

Hence my proposal to split into explicit methods for "fetch from static cache", "fetch from persistent cache", explictly called from loadMultiple(), which can then control when to fire hook_entity_load() ?

msonnabaum’s picture

Reroll of #144.

jhedstrom’s picture

Status: Needs work » Needs review
FileSize
792 bytes
39.61 KB

The wrong language code was being used in those failing tests.

Berdir’s picture

Status: Needs work » Needs review
FileSize
39.59 KB

Re-rolling the re-roll.

jhedstrom’s picture

Drupal\system\Tests\Form\FormTest seems to consistently fail with an out of memory error:

Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 268435456 bytes) in /var/lib/drupaltestbot/sites/default/files/checkout/core/lib/Drupal/Core/Database/Connection.php on line 336

Berdir’s picture

Status: Needs work » Needs review
FileSize
38.99 KB

Re-roll. Can't reproduce the FormTest fail and that wasn't there before the re-test. Will try to run it again with the same memory settings if it still fails.

Berdir’s picture

Yay first green patch.

Ok, started to implement @yched's suggestions. Was quite a bit of work, but I agree it's way cleaner.

- Completely moved away from using the cache related stuff from the base class, I think it was a mistake to attempt and extract that there, caching is different for every storage controller. Reverted the changes in the other classes as we no longer use them here
- Introduced staticCache and persistentCache flags, not sure what to do about the entity annotation keys.
- Have two separate dumb methods to get/set static and persistent cache. resetCache() handles both.
- Refactored loadMultiple() to track the caching logic and also moved the mapping/loading fields directly into it and loadRevision(). That allows to to just call to postLoad() for both persistent entities and those from the database. The goal is still to elliminate the first, maybe we could make the entity itself implement PrepareCacheInterface, then it could have a logic similar to the serialization patch and return the array including non-defined values?
- The change to postLoad() to receive entity objects means that the user and comment overrides no longer work there, so I moved that to mapFromStorageRecord(). Also cleaned up a lot of weird variable names there, but didn't go inside that method, which is still a very dark place. We'll get to that in https://drupal.org/node/2137801 i think.

Might introduce some new fails, let's see.

Berdir’s picture

msonnabaum’s picture

The goal is still to elliminate the first, maybe we could make the entity itself implement PrepareCacheInterface, then it could have a logic similar to the serialization patch and return the array including non-defined values?

This was my first thought. setPersistentCache is doing much more than it should be doing. It contains entity serialization logic, which belongs on the entity itself. It needs to have a method that returns an array that can be used as the "$values" argument in the constructor.

The changes in #163 are an improvement, but loadMultiple is still pretty complex and hard to follow. Attached patch is my attempt to simplify it futher:

- Moved logic specific to storage or cache into loadMultipleFrom* methods to isolate them.
- Moved static-cache-miss logic to doLoadMultiple, similar to how we handle caching in other classes. This also creates a logical separation between entities loaded from an external source and those that have already been loaded and are in memory. Entites from cache or storage need hooks run on them, so this seems like a natural boundry to create.
- postLoad gets run on the results of doLoadMultiple, which includes entities from cache and storage. I saw the todo to get rid of that, but I don't really understand why. I'd think we could move anything expensive to a new hook/event and just run the postLoad hooks on a hit.
- Moved all conditional checks for whether or not to cache in the methods themselves. Simplifies the caller and prevents duplicate conditional logic.

There's more ugliness I wanted to get rid of, but couldnt because of loadMultiple's multiple argument types. Started #2157241: Add loadAll() to EntityStorageInterface to fix that.

Berdir’s picture

I want to get rid of postLoad() because that invokes hook_entity_load(), which in turn a lot of modules use to load and attach a lot of stuff to those entities. Right now, we don't put that in a cache because we can't extract those modules edit: values. Adding a method to get the values would allow allow to get those too.

It would be a similar logic as #2027795: Optimize content entity serialization.

Not sure I see the advantages of your changes, need to look at the result and not just the diff, seems to have very deep nesting of method calls, not sure if that helps to understand it. Also I don't see why static cache happens outside and persistent cache inside the helper method.

msonnabaum’s picture

This should fix the failure I introduced.

Not sure I see the advantages of your changes, need to look at the result and not just the diff, seems to have very deep nesting of method calls, not sure if that helps to understand it. Also I don't see why static cache happens outside and persistent cache inside the helper method.

The goal I had was to split up the loadMultiple method into distinct chunks and isolate each responsibility. This reduces the number of temporary variables and conditionals needed, as well as reducing the scope of the remaining variables, resulting in code that's more composable and less error prone. The number of methods and instance variables that loadMultiple depends on is dramatically reduced, so it will have fewer reasons to change in the future, making the code more maintainable/refactorable. We should never look at "very deep nesting of method calls" as a bad thing on its own.

I think it's important here because this *should* be refactored into a different class, sooner rather than later. Caching has nothing to do with a class that expects to be swapped out when the storage backend changes. That said, we don't yet have that class (entity repositories), so this is fine where it is for the time being.

Berdir’s picture

This should fix the failure I introduced.

Apparently not :) Also seeing that fix (interdiff) would be helpful ;)

I think it's important here because this *should* be refactored into a different class, sooner rather than later. Caching has nothing to do with a class that expects to be swapped out when the storage backend changes. That said, we don't yet have that class (entity repositories), so this is fine where it is for the time being.

In theory yes, but I'm not so sure if it really works. From what I've seen so far in entity caching issues is that caching is tightly coupled with the type of entity we're working with (config, content, ...) and the backend. For example, config entities currently (not sure how that the config context system changes affect this, but it will need something similar) need to be keyed by config context in the static cache, because depending on the currently active language, entity_load('node_type', 'article') will *not* return the same information. And persistent cache makes no sense for them, the config system takes care of that. See #1885830: Enable static caching for config entities.

Another example is that I'd love to see a FieldableMongoDbStorageController. I expect that caching will work quite differently there, I think that the values that are extracted right now in setPersistentCache() would be the thing that's actually written to the storage, which would make the loading similar to getFromPersistentCache() , I've no idea how useful a persistent cache is then, as that sounds quite fast already.

There's #1596472: Replace hard coded static cache of entities with cache backends, but we more or less agreed that the static cache chain doesn't work, as that would serialize the entities too, which is obviously not something we want. Some of the others still want to pursue the idea of a separate class to handle caching, but I don't see the advantages of that so far.

msonnabaum’s picture

FileSize
513 bytes

Here's the interdiff, which does fix the previous failure, but I can't recreate these new ones locally.

Berdir’s picture

Status: Needs work » Needs review
FileSize
40.42 KB
735 bytes

Re-roll, see interdiff, that was responsible for those test fails as it killed the $ids === NULL special case.

IMHO, there's too much nesting in those method calls and combined with the by reference arguments, it can result in weird side effects as the interdiff shows that quite nicely.

Yes, my loadMultiple() implementation was a bit long, but IMHO had the advantage that you could read through it and understand how the two caches and loading from the work played together, I think it's distributed too much now. And undocumented :p Maybe we can find a way to do something inbetween?

Would be great to hear some other opinions,

Berdir’s picture

Hidding irrelevant patches.

dawehner’s picture

Compareing #175 and #165 I have to admit that #175 is easier to understand especially because we have more methods you can ignore on the first read. It is kind of good that it just covers the flow of static, then cache then actual loading.

I wonder whether we ever had time to think more about the chainable cache, which would potentially allow us to move the static cache out of the class? This is just something we could research as a follow up.

+++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
@@ -1480,4 +1547,150 @@ static public function _fieldColumnName(FieldInterface $field, $column) {
+          $values = $cache[$cid]->data['values'];
...
+          $entities[$id] = new $this->entityClass($values, $this->entityType, $bundle, $translations);

It is interesting that we don't serialize the actual objects, but just the values, so we don't run into serialization issues, as well as memory problems.

Berdir’s picture

Thanks for the feedback.

See #171 re chainable cache.

This code and the __sleep() implementation of ContentEntityBase is now actually very similar, the main difference being the PrepareCacheInterface part.

yched’s picture

This code and the __sleep() implementation of ContentEntityBase is now actually very similar, the main difference being the PrepareCacheInterface part

Can't seem to find where, but I think the question of "data for entity cache == data for entity serialization ?" was risen earlier.
Do you think we should unify that ? And run getCacheData() when serializing too ? (no strong opinion here, just wondering)

Also, unadressed from #125:
field_cache_clear(), which was so far supposed to clear the "field data cache", still clears the old 'field' cache bin instead of the one that now holds the data. Meaning it's basically useless ? Yet it's still called in like 20 places in core, so it's surprising that we're green ?

- For a couple cases, the call to field_info_cache_clear() might actually happen to be enough; although it's hard to predict when it's ok to rebuild field definitions without refreshing values too...
- Uses in tests are often to clear the tester side after a drupalPostForm($entity), so yeah, not sure how those pass.
- the uses in text_filter_format_update() / text_filter_format_disable() are important though. We cache check_markup()ed strings in the entity/field cache, so we need to clear the cache when a filter_format changes.

Bottomline is: strictly speaking, it's now the EntityStorage that knows if & where field values are cached. So how do we do "clear all cached field values" ?
foreach ($fieldable_entity_type) { $entity_type_storage->resetCache() } ?

yched’s picture

Other than that, I think I'm fine with the methods too, but IMO :

- loadMultipleFromCache() should be loadMultipleFromPersistentCache() (other methods are explicit about which cache level they involve)
- but do we really need both loadMultipleFromCache() and getFromPersistentCache() ? Can't we remove one mental nesting level and just keep getFromPersistentCache() ?
- then rename loadMultipleFromStorage() to getFromStorage() ?

Then we'd have:

function loadMultiple() {
  $static = getFromStaticCache();
  //
  $loaded = doLoadMultiple();
  postLoad($loaded);
  setStaticCache($loaded);
  //
  return $static + $loaded; 
}

function doLoadMultiple() {
  $from_cache = getFromPersistentCache();
  //
  $from_storage = getFromStorage();
  setPersistentCache($from_storage);
  //
  return $from_cache + $from_storage;
}

, which is quite easy to follow IMO ?

Berdir’s picture

@yched

- I don't think we should combine it, prepare cache is about doing extra processing with the assumption that you won't have to do it again. Seems pointless to process markup just because a node is put into the form_state "cache". I do think it should be moved into the class, basically "ContentEntityInterface extends PrepareCacheInterface" :)

- Yeah, those renames/merges make a lot of sense to me, was thinking something similar. I don't like doSomething() helper methods though, wondering if there isn't a better name? Unlike the removed doField*(), this isn't something that you want to implement differently in a subclass, so I still don't really get why we can't inline those 5 lines of code but anyway, a better name would be nice already because I currently have no idea how to document it :)

- Yes, we need to clean up the whole cache clearing, the reason we don't need many cache clears here is drupalPostForm() problem, it's mostly the static cache that's problematic, so our tests are already full of cache clear calls, the persistent cache works mostly transparent. Will have a closer look at this :)

yched’s picture

rename / inline doLoadMultiple():

Well, I started off writing #181 with that in mind too - "let's inline it or at least rename it, Verb() + doVerb() bleh". But looking more closely I'm actually fine with it :-)
In this case, doLoadMultiple() is not actually such a bad name, since this is the part of loadMultiple() that is actually about loading, as opposed to "yay it's already in memory". And the results are what we call postLoad() on...

Inlinling would mean :

function loadMultiple() {
  $from_static = getFromStaticCache();
  //
  $from_cache = getFromPersistentCache();
  //
  $from_storage = getFromStorage();  
  setPersistentCache($from_storage);
  //
  $loaded = $from_cache + $from_storage ;
  postLoad($loaded);
  setStaticCache($loaded);
  //
  return $from_static + $loaded;
}

Well, matter of taste, both work... :-)

Berdir’s picture

Ok, trying to implement those suggestions, leaving that method. Haven't touched the cache clear stuff yet. Interdiff is quite large because I moved some of the new methods around to keep all that stuff together.

yched’s picture

Interdiff looks lovely :-)

fago’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -195,60 +226,243 @@ public function create(array $values) {
    +    $entities = $this->doLoadMultiple($ids);
    

    hm, I don't see either why doLoadMultiple() would be separated. I wanted to suggested inlining it before reading the comments above :-) Separating getFromStroage() is imho the important part, but not a big deal.

    I think the method names are pretty good now, I'd just keep "loading" in regard to the storage as usually we call "getting" from storage "loading", so shouldn't it be loadFromStorage() also? Caches usually use "get", storage usually uses "load".

    The whole loadMultiple() method should probably live in the base class as it seems to be storage independent?

  2. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -195,60 +226,243 @@ public function create(array $values) {
    +   * @param \Drupal\Core\Entity\ContentEntityInterface[]
    

    you mean @return.

  3. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -195,60 +226,243 @@ public function create(array $values) {
    +    if (is_array($ids) && empty($ids)) {
    

    minor: would be $ids === array() easier to read?

  4. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -195,60 +226,243 @@ public function create(array $values) {
    +          $entities[$id] = new $this->entityClass($values, $this->entityType, $bundle, $translations);
    

    Is there a reason we cannot go with serializing?

    Seems pointless to process markup just because a node is put into the form_state "cache". I do think it should be moved into the class, basically "ContentEntityInterface extends PrepareCacheInterface" :)

    Yes, agreed.

    However, it seems a bit odd to have the logic duplicated - but I've not been able to come up with an approach that works with cache-data but re-uses serialization logic and is nicer either.

fago’s picture

oh, and I forgot two things:

- the issue summary could need an "update"
- great work guys!!!

yched’s picture

re #186.1 :
well, "loading" here means both FromPersistentCache() + FromStorage(), and we call PostLoad() on both. So I don't think one should be getXxx() and the other loadXxx().

Berdir’s picture

I'd still love to move postLoad() into getFromStorage() (= only call it on those), so I wouldn't count on that staying there.

I'm also not sure about the language logic that I currently have in there, I thought that the check there should no longer be necessary now that we simplified the language handling, but it didn't work at all...

fago’s picture

I'd still love to move postLoad() into getFromStorage() (= only call it on those), so I wouldn't count on that staying there.

That's what I'd expect. If we really need a hook for operating on entities fetched from cache it should probably be a separate one, as entitycache module in d7 has it.

well, "loading" here means both FromPersistentCache() + FromStorage(), and we call PostLoad() on both

I see - yes loading is all, but load-from-storage is not necessarily all. Having postLoad() on storage + persistent cache, why is it missing from the static cache then? It's also part of loading :-) Imho, post-load refers to loading from storage - but obviously this makes only sense if we can move postLoad over.

Berdir’s picture

Re-roll and fixed the easy parts from @fago's review.

Not sure about applying the doLoadMultiple() and generic loadMultiple() in the base class to all storage controller, maybe look into that in a follow-up?

Berdir’s picture

Status: Needs work » Needs review
FileSize
45.33 KB
5.7 KB

Ok, yet another fun re-roll and implemented the PrepareCacheInterface idea that I was talking about multiple times.

As explained before, this allows us to also cache non-field values, examples are content translation and book. Tested with the node translation UI tests and it seems to be working.. Due to that, we can make postLoad() a part of getFromStorage().

I think this now better explains fago's and my thinking about the method naming and grouping, it doesn't make much sense now to call postLoad() in getFromStorage() and have doLoadMultiple().

Also the PrepareCacheInterface should now probably move into the Entity namespace.

Left it for now, though.

Berdir’s picture

Status: Needs work » Needs review
FileSize
50.53 KB
5.95 KB

Ah, those test fails mean that it's actually working now for hook_entity_load() :)

Time for hook_entity_load_uncached() then I guess, converted some cases that that. Some will probably need follow-ups to try and make it part of the cached load hook.

Berdir’s picture

Oh nice, didn't expect that this would be enough to fix all those tests :)

Had to re-roll after #2165155: Change $entity_type to $entity_type_id and $entity_info to $entity_type/$entity_types.

This might need some more cleanup, but feedback on the PrepareCacheInterface approach and the method names as they are now would be great :)

jibran’s picture

+++ b/core/modules/system/entity.api.php
@@ -266,18 +266,55 @@ function hook_entity_create(\Drupal\Core\Entity\EntityInterface $entity) {
+ * Act on entities when loaded from the storage or cache.
...
+ * Act on entities of the given type when loaded from the storage or cache.

Acts

tim.plunkett’s picture

No, when in *.api.php docs, you would use "Act" not "Acts".

Berdir’s picture

Status: Needs work » Needs review
FileSize
49.88 KB

Another re-roll, core/lib/Drupal/Core/Field/ConfigEntityReferenceItemBase.php has been removed. Let's see if the isEmpty() methods are now correct :)

Also, 200+ comments now...

moshe weitzman’s picture

Needs issue summary update please.

swentel’s picture

Needed a re-roll again .. hope I got everything right.

msonnabaum’s picture

I thought the staticCache and persistentCache variables were confusing because they suggested that those variables *are* the caches rather than a bool for whether to use caches.

This patch is the same as #203, but removes those variables in favor of method calls to $this->entityType as discussed with berdir in IRC.

The rest looks good to me.

Berdir’s picture

+++ b/core/modules/filter/lib/Drupal/filter/Tests/FilterSecurityTest.php
@@ -83,6 +83,7 @@ function testDisableFilterModule() {
     );
     $this->drupalPostForm('admin/config/content/formats/manage/' . $format_id, $edit, t('Save configuration'));
+    \Drupal::entityManager()->getStorageController('node')->resetCache(array($node->id()));
 
     // Verify that filter_test_replace filter replaced the content.

I think this is a reset that we shouldn't have to do based on what @yched said, so we need to update text.module to clear the correct caches. And probably all other places to call field_cache_clear() right now.

Berdir’s picture

moshe weitzman’s picture

Great issue summary!

I'd feel more comfortble if we have an open issue for "Clean up the field cache handling, introduce a way to do that for the entity cache instead.". I have special disdain for redundant caches.

Berdir’s picture

Issue summary: View changes

Sorry, that line was slightly wrong, it should read "Field cache *clear* handling....". There is no field (value) cache anymore, that has been replaced/merged into the entity cache. Updated in the issue summary.

The only thing left of the field cache is the field_cache_clear() function, which is called in ~20 places throughout core and empties the cache_field table + calls field_info_cache_clear().

That cache clear is now useless, because the values are stored in a different table. We only had a single issue in the tests due to that, that is the manual cache clear mentioned in #205. This is a problem and is blocking this issue (security relevant), so I need to address it anyway in here, will do that asap.

Berdir’s picture

Issue summary: View changes

Ok, this fixes that test without the need for the manual cache clear.

Two problems with that:

1. It's a performance regression, because previously, it was a simple TRUNCATE on the cache_field table, now it's a deleteTags() per enabled entity_type. However, I don't think we can change that without making assumptions about entity cache implementations. We could consider to only call it for content entities or even only for fieldable entities, but it's now possible a non-fieldable entity type uses a text field or there's another reason why we'd need to unset the entity/field cache.

2. It's still something in field.module, not sure about that. We could move it to a method on the EntityManager (clearStorageCache() ?) or we could say that we try to untangle the field_cache_clear()/field_info_cache_clear() mess in a follow-up? We already have #2116363: Unified repository of field definitions (cache + API) and #2148723: Clean up field_info_cache_clear() for example which are related to that.

Berdir’s picture

Berdir’s picture

andypost’s picture

About #209:
1) is this regression measurable? suppose cleaner API in this case better because cache tags are in progress to rework anyway
2) better address in follow-up, this is not in scope

just minor nitpick:

+++ b/core/modules/text/lib/Drupal/text/Tests/TextWithSummaryItemTest.php
@@ -126,38 +126,36 @@ function testProcessedCache() {
-    \Drupal::cache('field')->set("field:$entity_type:" . $entity->id(), $data);
-    $entity = entity_load($entity_type, $entity->id(), TRUE);
+    \Drupal::entityManager()->getStorageController($entity_type)->resetCache();
+    \Drupal::cache('entity')->set("values:$entity_type:" . $entity->id(), $data);
+    $entity = entity_load($entity_type, $entity->id());

Is this a test for resetCache()? so better to pass array($entity->id())

Berdir’s picture

Re-roll.

swentel’s picture

Agree with Andy re #209, we've got #2116363: Unified repository of field definitions (cache + API) and #2148723: Clean up field_info_cache_clear() and had a meeting today with berdir, fago and yched to go forward on the first one, so I would untangle that in those issues.

Berdir’s picture

Status: Needs work » Needs review
FileSize
49.39 KB

Re-roll.

Berdir’s picture

Status: Needs work » Needs review
FileSize
50.86 KB
1.79 KB

Aw, createInstance() stuff...

Berdir’s picture

Re-roll after the storage rename.

Berdir’s picture

Status: Needs work » Needs review
FileSize
50.6 KB
1.32 KB

Missed two new getStorageController() calls.

fago’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -964,6 +965,40 @@ public function referencedEntities() {
    +        if (!$items->isEmpty()) {
    

    This is going to instantiate field objects, so why bother with reading $data out of $this->values ?
    $data should get started with $this->toArray() and only improve values for everything that implements prepareCache(). This also means that stuff sitting in $item that has no metadata won't be kept, but I think that's the behaviour we want. There is no point in caching craft modules throw on the objects.

  2. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -134,60 +147,241 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +    // Attempt to load entities from the persistent cache. This will remove ID's
    

    IDs

  3. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -134,60 +147,241 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +   *   Array of entities from the entity cache.
    ...
    +  protected function getFromStorage(array $ids = NULL) {
    +    $entities = array();
    

    from the storage

  4. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -134,60 +147,241 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +   *   If not empty, return entities that match these IDs. ID's that were found
    

    IDs

  5. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -134,60 +147,241 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +    if (!$this->entityType->isFieldDataCacheable()) {
    

    This needs a rename. Can be a separate issue.

  6. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -134,60 +147,241 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +   * @param \Drupal\Core\Entity\EntityInterface[] $entities
    

    Other type-hint to ContentEntityInterface... That said, shouldn't the caching logic be generally usable and go to the general base storage class - while e.g. being not implemented for config entities for now? It seems weird to have a different loading flow for both. Not sure it's easy to move things around, else it might fit for a follow-up.

  7. +++ b/core/modules/system/entity.api.php
    @@ -267,18 +267,55 @@ function hook_entity_create(\Drupal\Core\Entity\EntityInterface $entity) {
    +function hook_entity_load_uncached(array $entities, $entity_type) {
    

    hm, the name confused me. I'd expected uncached to operated only on uncached entities, i.e. only on entities loaded from storage.

    what about
    hook_entity_load() (runs always) +
    hook_entity_storage_load() ?
    (runs on storage load only)
    I could see us adding more storage hooks to achieve #1729812: Separate storage operations from reactions also.

Berdir’s picture

Thanks for the review.

1. As discussed, tried an unset() instead of the array trick and commented it a bit better.
2. - 4. Fixed
5. Yeah, discussed name with @xjm and decided to do it here, it's only used in a very few places. It's now isPersistentlyCacheable() which is crazy but consistent with others and persistent_cache. Could also be storage_cache and a corresponding method but the static_cache is also storage related...
6. Not visible in the context which method that is, but I added it to one. Yeah, not sure about the whole Entity/ContentEntity thing. @tim is working on moving more stuff to the base storage controller in #2225955: Improve the DX of writing entity storage classes, we agreed that we will this first. Maybe that can move some things back to Entity and check for e.g. PrepareCacheInterface for the save to persistent cache method and so on? But for now I think ContentEntityInterface is OK.
7. Yeah.... As a start, I'm converting comment to use the normal load hook and replaced the ->private hacks with an actual field, which removes the need for that. That just leaves us with book and it shouldn't be doing what it is anyway.

Berdir’s picture

Status: Needs work » Needs review
FileSize
65.06 KB
2.52 KB

Fixing test fails. MapItem => sad face. I think we killed the ability to dynamically define properties based on the current values, so toArray() doesn't work for it. propertyDefinitions() is wrong for it IMHO, as that's the storage, not the properties.

Berdir’s picture

Status: Needs work » Needs review
FileSize
65.04 KB

Re-roll.

Berdir’s picture

Opened a follow-up to fix MapItem properly: https://drupal.org/node/2229181

Berdir’s picture

Status: Needs work » Needs review
FileSize
65.03 KB
619 bytes

Too much sprinting isn't good....

fago’s picture

Status: Needs review » Needs work

Thanks, this looks pretty good already.

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -134,60 +147,241 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +   * Invokes hook_entity_load_uncached().
    +   *
    

    hm, that's still here and confusing :-/ Maybe could we just go with

    hook_entity_load() (runs always) +
    hook_entity_storage_load()

    instead?

  2. +++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/MapItem.php
    @@ -97,4 +97,13 @@ public function __set($name, $value) {
    +
    

    One new line too much.

  3. +++ b/core/modules/node/lib/Drupal/node/Tests/NodeAccessBaseTableTest.php
    @@ -85,14 +87,13 @@ function testNodeAccessBasic() {
    +        $this->assertEqual($is_private, (int)$node->private->value, 'The private status of the node was properly set in the node_access_test table.');
    

    No table any more as this is a configurable field now.

Wim Leers’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -964,6 +965,45 @@ public function referencedEntities() {
    +    // @todo: Remove support for this when all entity values are fields.
    

    Link to an issue?

  2. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -964,6 +965,45 @@ public function referencedEntities() {
    +            // If the field item needs to be prepare the cache data, call
    +            // the corresponding method, otherwise use the values as cache
    +            // data.
    

    80 col rule.

  3. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -773,9 +946,26 @@ public function getQueryServiceName() {
    +   * Loads values of configurable fields for a group of entities.
    ...
    +  protected function loadFieldItems(array $entities) {
    

    Comment says "configurable fields", method name says "fields", which implies both base + configurable fields.

    Same elsewhere.

  4. +++ b/core/lib/Drupal/Core/Entity/EntityType.php
    @@ -33,7 +33,7 @@ class EntityType implements EntityTypeInterface {
    +  protected $persistent_cache;
    

    Properties are always camelCased? i.e. s/persistent_cache/persistentCache/.

  5. +++ b/core/modules/system/entity.api.php
    @@ -268,18 +268,55 @@ function hook_entity_create(\Drupal\Core\Entity\EntityInterface $entity) {
    +function hook_entity_load_uncached(array $entities, $entity_type) {
    ...
    +function hook_ENTITY_TYPE_load_uncached(array $entities) {
    

    Shouldn't $entities be passed by reference?

Wim Leers’s picture

Also: now that #2217749: Entity base class should provide standardized cache tags and built-in invalidation has landed: is there a specific reason why this patch does not use cache tags? (As long as there is a good reason, that's fine — I just want that to be a conscious choice.)

Berdir’s picture

Well, cache tags for what exactly? We're storing separate entities here and we don't care about referenced entities as we don't cache them, the only exception are special cases like text where we want to clear the cache when a format changes. We have no API to get those tags, with or without your issue.. And I'm already using cache tags to clear by entity_type and I'm rather unhappy about that, I'd rather have bins per entity_type, at least optionally, but that's not something that the current API easily allows to do. with hardcoded cache bin services (we can request any type of bin but then cache bin removal doesn't work).

But we said that's something we can look at in a follow-up..

And it doesn't apply to all other cases where we call that cache clear function which are like, we changed *something* (module installed/removed, and so on), so lets make sure and drop the cache in case it might be affecting something. Some of those calls might not be necessary but they exist in HEAD (we're just making them slower because cache tags and many calls) and it's nothing that cache tags could help with AFAIK.

Thanks for the reviews, I'll try to re-roll and update the patch asap.

In regards to hook_entity_storage_load(), I'm unsure about a few things: a) It introduces another difference between config and content entities, and while we've discussed having separate hooks for them before, I'm not sure this is a good way to start that. and b) entity_storage_load() is the one you should be using, not entity_load() and I'm worried that people will not bother when porting code because it will work. So after-cache load should be the one with the non-obvious name IMHO. c) Looking at core, book.module is now the *only* place where we need to call the after-cache one, so I would have to rename all the other ones instead of just that.

Berdir’s picture

Status: Needs work » Postponed

Postponing on #2225955: Improve the DX of writing entity storage classes, moved a lot of changes that are in here over there and it will conflict a lot, so let's wait until that is done (hopefully soon), then this issue can focus on introducing the persistent cache and doesn't need to refactor as much stuff as it does now.

Berdir’s picture

Status: Postponed » Needs review
FileSize
52.97 KB

Minimal re-roll after that issue went in. Note that this currently doesn't call the uncached hook anymore but uses the normal one as the uncached, as that's what the base storage class now does. the storage hook would be easier to implement now but my arguments from above remain. also the entity constructor change is ugly, again, minimal change to see what happens.

Berdir’s picture

Status: Needs work » Needs review
FileSize
53.25 KB
1.45 KB

Fixing tests.

Berdir’s picture

Status: Needs work » Needs review
FileSize
54.77 KB
1.43 KB

Wow, file_get_file_references() is such a broken and stupid function :( It overwrites $field_type within the loop, so it depended on the order of fields if it was still the correct one and ended up filtering out the field because the new integer field used by the test module was last in the loop....

Berdir’s picture

Adding hook_entity_storage_load().

Currently doing some profiling on this, seeing a ~15% performance improvement with the previous patch on a view with 100 nodes with 14 fields. This patch should do even better. The biggest difference is that we no longer have to initialize the field objects just to assign the loaded values, with this, we just have to pass it to the constructor, and fields are then only loaded on demand, when needed.

With the current field loading implementation, I consider this to be critical for performance.

Berdir’s picture

#2167167: Remove field_info_*() will hopefully resolve the cache clear questions.

And #2217877: Text filters should be able to add #attached, #post_render_cache, and cache tags is on a good way to kill the need for either at least the core implementations of the PrepareCacheInterface or even the whole API. Because the render cache allows us to not having to calculate the processed text on cache hits. That would remove quite a bit of complexity.

Berdir’s picture

Status: Needs work » Needs review
FileSize
57.99 KB
834 bytes

Stupid typo is stupid.

fago’s picture

Status: Needs review » Needs work

With the current field loading implementation, I consider this to be critical for performance.

Yes, the current logic is totally not ideal, so it's great to see the clean-up happen just as well as having entity caching!

+++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
@@ -156,7 +157,14 @@ public function __construct(array $values, $entity_type, $bundle = FALSE, $trans
+          if (is_array($this->values[$field_name][Language::LANGCODE_DEFAULT])) {

Not sure whether we discussed that already in Szeged, but do we need the is_array() check here?

+++ b/core/modules/system/entity.api.php
@@ -269,18 +269,55 @@ function hook_entity_create(\Drupal\Core\Entity\EntityInterface $entity) {
+function hook_entity_load(array $entities, $entity_type_id) {

This is hook_entity_storage_load() now. yay for the new hook name!

Other than that, this looks RTBC to me!

Berdir’s picture

Status: Needs work » Needs review
Related issues: +#2274517: Remove \Drupal\Core\Field\PrepareCacheInterface

Re-roll and updated for the field_info and related caching stuff removal.

Related: #2274517: Remove \Drupal\Core\Field\PrepareCacheInterface

#251:

1. Yeah, I don't like that either, but not sure if there's a better way.

2. Yep, still need to update the documentation, and there's also still #235 to consider for the hook.

Berdir’s picture

Berdir’s picture

Status: Needs work » Needs review
FileSize
56.33 KB
890 bytes

Fixing the cache tag structure...

Berdir’s picture

PrepareCacheInterface is gone, re-rolling after that...

Note that I left getCachedData() on the entity, because we still need to be able to get a) the values of all languages and b) values of undefined fields from the entity. But I guess we can do this now with an entity specific method (will need to add this to the interface), or we could even go back to make this the __serialize() implementation, as serialize($entity) should now basically work as well?

Also, getting close to 300 comments... :(

Berdir’s picture

Wim Leers’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -72,13 +74,21 @@ class ContentEntityDatabaseStorage extends ContentEntityStorageBase {
    +   * Cache backend.
    
    @@ -91,12 +101,15 @@ public static function createInstance(ContainerInterface $container, EntityTypeI
    +   *   Cache backend instance to use.
    
    @@ -121,13 +134,158 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +   * Gets entities from the persistent cache.
    ...
    +   * Stores entities in the persistent entity cache.
    

    It'd be great if these were all consistent.

  2. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -121,13 +134,158 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +   * @param array &$ids
    ...
    +  protected function getFromPersistentCache(&$ids) {
    

    typehint mismatch.

  3. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -121,13 +134,158 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +   *   Array of entities from the entity cache.
    

    s/entity cache/persistent cache/

  4. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -121,13 +134,158 @@ public function __construct(EntityTypeInterface $entity_type, Connection $databa
    +      // Get the entities that were found from the cache.
    

    s/from/in/

  5. +++ b/core/lib/Drupal/Core/Entity/ContentEntityInterface.php
    @@ -7,6 +7,7 @@
    +use Drupal\Core\Field\PrepareCacheInterface;
    

    Shouldn't this go away? :)

  6. +++ b/core/lib/Drupal/Core/Entity/EntityType.php
    @@ -38,7 +38,7 @@ class EntityType implements EntityTypeInterface {
    +  protected $persistent_cache;
    

    $persistentCache?

  7. +++ b/core/modules/comment/src/CommentStatistics.php
    @@ -244,6 +244,10 @@ public function update(CommentInterface $comment) {
    +
    +    // Reset the cache of the commented entity so that when the entity is loaded
    +    // the next time, the statistics will be loaded again.
    +    $this->entityManager->getStorage($comment->getCommentedEntityTypeId())->resetCache(array($comment->getCommentedEntityId()));
    
    +++ b/core/modules/user/src/UserStorage.php
    @@ -152,6 +142,8 @@ public function updateLastLoginTimestamp(UserInterface $account) {
    +    // Ensure that the entity cache is cleared.
    +    $this->resetCache(array($account->id()));
    

    Why not Cache::deleteTags($comment->getCommentedEntity()->getCacheTag()) and Cache::deleteTags($account->getCacheTag())?

    If we're consciously choosing not to use cache tags, then we should document why.

Berdir’s picture

1. Better now?
2. "It's complicated" ;) Can be array or null. should be correctly documented now.
3. Changed.
4. Changed.
5. Yes.
6. This is an annotation key, we currently don't use camel case for those. Would be inconsistent with everything else.
7. I disagree :)

a) Cache tags deletion is a low-level API that should not be used directly if there's an API. This is the entity storage, we have an explicit API to clear the cache, calling the cache tag deletion manually is like doing a db_query() against an entity table ;)
b) We currently don't do by-entity cache tags, because we don't need it and if we would, it would be the job of the resetCache() implementation to do that because some entities might not use a persistent cache
c) cache tags can't clear static caches, which is a very important part of resetCache().

So IMHO, it's the other way round, every manual cache tag deletion should explain why there's no better way :) Cache tags are very powerful, but they're also as spaghetti as it gets, as they by design breach every layer that your application has ;)

Also in this interdiff:
- Added some unit tests for cache hit/miss/disabled, hoped to replace the tests in FieldAttachOtherTest but save()/delete() would be very very complicated, so kept that.
- Fixed the test fails. The missing language is because we're now also caching the currently available languages and that path language test apparently manages to add a new language and then change the default language of that entity to that language, because for non-default languages, we already had that check. Will open an issue to add EntityInterface::langcode(), because then we can replace all those language()->id with langcode(), which will avoid the need for the language objects in most cases. Then we can hopefully remove that optimization.
- Updated hook documentation in entity.api.php, still not 100% convinced that this is the correct way round, especially because now we call postLoad() on the entity class as well for entities from the cache. There are no implementations of that for content entities right now but I know some contrib implementations that this will mess up (but they're workarounds for core bugs, like broken unserialize handling for the link field).
- Patch size increased due to unit tests.

Either way, I really hope this is getting close, updated the issue summary again.

Berdir’s picture

Status: Needs work » Needs review
Wim Leers’s picture

1. Yes :)
2. I knew that that was the other possibility — thanks for confirming/clarifying in the docs :)
5. It's still there…
6. Aha — ok!
7. Well, if you're modifying an entity, cache tags are invalidated automatically. These are things that are not part of the entity (i.e. extrinsic versus entity fields which are intrinsic), hence there's no way that the entity system can know that it should invalidate the cache tags! So I think that it makes sense that if this should use cache tags for invalidation, that we'd need to call it manually.
You're right that a manual cache tag deletion would alsomerit extra explanation.
You're also right that they "breach" (invalidate) every cache layer. But IMO the fact in this case you don't want that to happen, is the exceptional thing. And hence merits an extra explanation.
So: I still think that an extra comment here is warranted, though it can be argued that the lack of cache tag invalidation (of the affected entity) is a pre-existing problem.

It's a detail, but it confused me. Your call :)

Berdir’s picture

Re-roll, messy merge conflict in content entity database storage unit tests.

7. I think we discussed this through in IRC. the storage cache does not use cache tags because we only cache single entities without ID's, using cache tags would be overhead we don't want. And the first example already clears the cache tag somewhere (this only happens after comments are saved) or we'd know and the second is not relevant because this is not information that's displayed as part of a cached entity view.

Berdir’s picture

Reroll.

Xano’s picture

Assigned: Unassigned » Xano
Status: Needs work » Needs review

This should fix the minor test failure. I know how to fix the other ones, but I have to do some other work first. I should have a patch ready by tomorrow.

Xano’s picture

FileSize
740 bytes
61.09 KB

Did I mention the last patch was delayed?

Xano’s picture

Status: Needs work » Needs review
FileSize
2.89 KB
62.19 KB

Grrreeeeeeen!

Xano’s picture

Status: Needs work » Needs review
FileSize
2.38 KB
61.75 KB

This is simpler.

Xano’s picture

Status: Needs work » Needs review
FileSize
61.75 KB

Re-roll.

Berdir’s picture

Status: Needs work » Needs review
FileSize
61.65 KB
728 bytes

That test fail had me worried for a minute, but it was just a wrong merge.

Berdir’s picture

278: drupal_597236_278.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 278: drupal_597236_278.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
61.69 KB

Re-roll.

#2144263: Decouple entity field storage from configurable fields is more important, but I would love love love to get this in after that...

Wim Leers’s picture

We briefly discussed this during the D8 performance call. We thought it would be good to have profiling data to see what we actually gain by doing this. I vaguely recall that profiling has happened, but I can't find it. Perhaps it happened in another issue?

Berdir’s picture

I did not post detailed profiling data, see #246 for one data point. That was a single with view 100 nodes.

Note that profiling this heavily depends on a) the amount of entities you load on a page, the more the more difference this will make, c) Depends on the amount of fields, because every field right now adds overhead when loading and c) it also depends on the amount of entities you have in your database, as the executed queries will get slower while it will not affect a cache get that much, especially on a non-database cache backend.

I just repeated this on a site I'm building, again seeing a ~15% increase of the total time, the time spent in ContentEntityDatabaseStorage::doLoadMultiple() goes away almost completely. Note that I did those tests against a cache backend mix of redis (entities) and APC (field definitions, as they're in the discovery cache bin). That doesn't make much of a difference as long as mysql isn't under load and the query cache is warm, though.

Total difference:

Difference in doLoadMultiple():

Those are the entities that have been loaded, grouped by call:
- 1 user
- 2 nodes
- 5 nodes
- 5 comments
- 5 nodes
- 2 shortcuts
- 1 term
- 1 term
- 1 term

=> 23 content entities grouped in 9 calls.

Note that part of doLoadMultiple() moves to other places, because so far, this also triggers the loading of the field definitions for the relevant entity types from cache, which moved to a different place as that happens on the first time we access a field, which does not happen while loading an entity with the latest patch. So that part still has to be done, just a bit later. Was 7ms (2.6%) in my case.

That said, I'm working on getting rid of most of those calls, possibly even avoiding them completely in some cases.

Status: Needs review » Needs work

The last submitted patch, 281: drupal_597236_281.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
61.83 KB

Re-rolled now that #2144263: Decouple entity field storage from configurable fields is in, only a single conflict, yay.

plach’s picture

What are we missing for this to be ready? Reviews? A final sign-off?

Fabianx’s picture

Nice work! :)

corvus_ch’s picture

FileSize
86.32 KB

Reroled against latest HEAD.

Status: Needs review » Needs work

The last submitted patch, 289: drupal_597236_289.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
62.04 KB

New try at re-rolling this, the previous patch had some wrong merge conflicts on hook documentations and a .htaccess change, was 20kb bigger.

Fabianx’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -158,7 +158,14 @@ public function __construct(array $values, $entity_type, $bundle = FALSE, $trans
    +          if (is_array($this->values[$field_name][LanguageInterface::LANGCODE_DEFAULT])) {
    +            if (isset($this->values[$field_name][LanguageInterface::LANGCODE_DEFAULT][0]['value'])) {
    

    Isn't the 2nd isset() check enough?

  2. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -563,6 +570,11 @@ public function language() {
    +      // @todo Avoid this check by getting the language from the language
    +      //   manager directly.
    +      if (!isset($this->languages[$this->defaultLangcode])) {
    

    Where would it get the language from directly?

  3. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -342,13 +354,158 @@ public function getTableMapping() {
    +
    +    return $entities_from_cache + $entities_from_storage;
    

    Just asking, but the order is undefined here in which entities are returned?

  4. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -342,13 +354,158 @@ public function getTableMapping() {
    +    if (!$this->entityType->isPersistentlyCacheable() || empty($ids)) {
    +      return array();
    +    }
    

    Even though a loadAll will be rare and expensive anyway. Can't we get the IDs to load in that case?

    This could run a query and populate $ids.

  5. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -342,13 +354,158 @@ public function getTableMapping() {
    +    foreach ($ids as $id) {
    +      $cids[] = $this->buildCacheId($id);
    +    }
    +    if ($cache = $this->cacheBackend->getMultiple($cids)) {
    +      // Get the entities that were found in the cache.
    +      foreach ($ids as $index => $id) {
    +        $cid = $this->buildCacheId($id);
    

    Micro-optimization, but still:

    foreach ($ids as $id) {
      $cid_map[$id] = $this->buildCacheId($id);
    }
    
    $cids = array_keys($cid_map);
    
    ->getMultiple($cids);
    //...
    foreach ($ids as $index => $id) {
      $cid = $cid_map[$id];
    }

    This saves the 2nd lookup to buildCacheId, where if cacheID was more complicated would save real time ...

  6. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -342,13 +354,158 @@ public function getTableMapping() {
    +    foreach ($entities as $id => $entity) {
    +      $this->cacheBackend->set($this->buildCacheId($id), $entity, CacheBackendInterface::CACHE_PERMANENT, array($this->entityTypeId . '_values' => TRUE, 'entity_field_info' => TRUE));
    +    }
    

    It would be clearer to define cache tags before the loop as the arguments in the set() are static:

    $cache_tags = array(
    $entity_type . '_values' => TRUE,
    );

    etc.

    Also do we have a place where one can lookup all defined cache tags besides the source?

  7. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -342,13 +354,158 @@ public function getTableMapping() {
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function resetCache(array $ids = NULL) {
    +    if ($ids) {
    +      $cids = array();
    

    Nit, just from a code perspective, I would decouple this into:

    resetCache and resetPersistentCache to make the code easier to read.

    Currently this inherits docs, so I don't know if it says that this also deletes the persistent cache.

    Also I don't assume there is a base class, which this caching class is inherited from?

    ---

    Seems not to be the case. Is this function defined in the interface? Why was it not needed before? Or is this missing doxygen?

  8. +++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
    @@ -931,9 +1088,26 @@ public function getQueryServiceName() {
    +      if (!$entity->isDefaultRevision()) {
    +        $age = static::FIELD_LOAD_REVISION;
    +        break;
    

    Don't get this, but that means if any entity is another revision it loads for all entities from revision storage?

  9. +++ b/core/lib/Drupal/Core/Entity/EntityType.php
    @@ -40,7 +40,7 @@ class EntityType implements EntityTypeInterface {
    -  protected $field_cache;
    +  protected $persistent_cache;
    

    How can you _set_ this property?

Overall looks absolutely great to me. Tests pass and my comments are mostly nits. This is a weak

RTBC

from me.

Wim Leers’s picture

FileSize
60.2 KB
3.99 KB

I think poor Berdir has had to reroll this one for far too long already. So I tried to add more to Fabianx's review in #292. And I already answered/fixed the things I could, so that Berdir can focus on the most important questions.


  • #292.1: I can only confirm that this code seems to be more difficult than it should be.
  • #292.3: I think so, and I think ideally this would be documented, but then again we don't really do that elsewhere.
  • #292.4: Good catch. I had difficulty understanding what you were saying, so I'll rephrase this. What FabianX was getting at here, is that if $ids === NULL, i.e. if empty($ids) === TRUE, then we don't load anything from the persistent cache. I think the answer has multiple parts: 1) we don't have a CacheBackendInterface::getAll(), 2) even if we had that, this would be a subset of "all" entities, and hence we'd have to still load the remainder from entity storage, but we wouldn't know what the remainder is. The only way to implement this is to 1) have a very cheap way to get the IDs of *all* entities, 2) have a way for entity storage to "load all entities except for X, Y …", which is definitely an edge case. So I think all that has to happen here is for a comment to be added, explaining why the "load all" behavior is not supported by the persistent cache.
  • #292.6: Agreed with the suggestion (and applied), and now, we don't have such a directory/index… Most cache tags originate from entities though, and for those we have EntityInterface::getCacheTag() :)
  • #292.9: It's set by the @EntityType annotation's persistent_cache property.

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityInterface.php
    @@ -7,6 +7,7 @@
    +use Drupal\Core\Field\PrepareCacheInterface;
    

    This one should be removed :) Did that in this reroll.

    (Already mentioned in #258.5.)

  2. +++ b/core/modules/field/src/Tests/FieldAttachOtherTest.php
    @@ -170,17 +170,18 @@ function testEntityDisplayViewMultiple() {
    +   * @todo Remove this when there is equal unit test coverage.
    

    Is this @todo still relevant?

  3. +++ b/core/modules/file/file.module
    +++ b/core/modules/file/file.module
    @@ -1892,15 +1892,14 @@ function file_get_file_references(FileInterface $file, FieldDefinitionInterface
    
    @@ -1892,15 +1892,14 @@ function file_get_file_references(FileInterface $file, FieldDefinitionInterface
               $file_fields[$entity_type_id][$bundle] = array();
               // This contains the possible field names.
               foreach ($entity->getFieldDefinitions() as $field_name => $field_definition) {
    -            $field_type = $field_definition->getType();
                 // If this is the first time this field type is seen, check
                 // whether it references files.
    -            if (!isset($field_columns[$field_type])) {
    -              $field_columns[$field_type] = file_field_find_file_reference_column($field_definition);
    +            if (!isset($field_columns[$field_definition->getType()])) {
    +              $field_columns[$field_definition->getType()] = file_field_find_file_reference_column($field_definition);
                 }
                 // If the field type does reference files then record it.
    -            if ($field_columns[$field_type]) {
    -              $file_fields[$entity_type_id][$bundle][$field_name] = $field_columns[$field_type];
    +            if ($field_columns[$field_definition->getType()]) {
    +              $file_fields[$entity_type_id][$bundle][$field_name] = $field_columns[$field_definition->getType()];
                 }
               }
             }
    

    AFAICT the only change here is to remove the helper variable $field_type in favor of calling the method on the object between 2 and 4 times. Reverted this hunk.


@Berdir: that leaves everything from #292 except #6 and #9, and point 2 in my code review.

Fabianx’s picture

Regarding #292.4:

Can't we just do:

 if (!$this->entityType->isPersistentlyCacheable()) {
  return array();
}

if (empty($ids)) {
  $query_result = $this->buildQuery($ids)->execute();
  $records = $query_result->fetchAllAssoc($this->idKey);
  $ids = array_keys($records);
  unset($records);
}

The double query should be:

a) cached by MySQL in that short time frame anyway
b) faster as most of the entities should be retrieved from cache - when it is warm

Berdir’s picture

Thanks for the review, some additional answers.

1. I'm unhappy about the code there, the constructor has to care about too many variations of values. I think we can move forward with what we have, maybe #1867228: Make EntityTypeManager provide an entity factory would help, but that has been fighting with the very same problem for quite some time now.

2. The language manager, instead of caching the available languages within the entity object, we should request them from the language manager so they would always be up to date. This is a performance optimization that we basically have because there is no langcode() method, only language(), so we always have to return a language object while only needing an ID in 90%+ of the cases. Core is full of language()->getId() code.

3. Being able to load *all* entities is a feature that only makes sense for config entities IMHO, it shouldn't even be supported for content entities (if you really want that, load the ID's yourself and load it explicitly). So not really interested in supporting that. Possibly that should be separated into a loadAll() method.

7. The public API makes no difference between static and persistent cache. resetCache() does exist on the interface and it is implemented in the base class but only for the static cache, because that's all the base class knows about. I'm not sure if we should really expose this in the API. It will never make sense to have a persistent API for config entities for example, because they're already persistently cached as configuration.

8. This is not visible from the API, but there is no loadMultiple() for revisions nor a way to mix loading by revision and not. So if goes into that condition, it simply means that there is only one entity. And even if we would add a way to load multiple revisions, I don't think we ever want to support loading a mix.

+++ b/core/modules/file/file.module
@@ -1892,14 +1892,15 @@ function file_get_file_references(FileInterface $file, FieldDefinitionInterface
           // This contains the possible field names.
           foreach ($entity->getFieldDefinitions() as $field_name => $field_definition) {
+            $field_type = $field_definition->getType();
             // If this is the first time this field type is seen, check
             // whether it references files.
-            if (!isset($field_columns[$field_definition->getType()])) {
-              $field_columns[$field_definition->getType()] = file_field_find_file_reference_column($field_definition);
+            if (!isset($field_columns[$field_type])) {
+              $field_columns[$field_type] = file_field_find_file_reference_column($field_definition);
             }
             // If the field type does reference files then record it.
-            if ($field_columns[$field_definition->getType()]) {
-              $file_fields[$entity_type_id][$bundle][$field_name] = $field_columns[$field_definition->getType()];
+            if ($field_columns[$field_type]) {
+              $file_fields[$entity_type_id][$bundle][$field_name] = $field_columns[$field_type];
             }

The reason I did this change is that this overwrites the $field_type argument passed in to this function, so reverting this should fail again.

Search for this function in this issue, there's a lengthy explanation what's going on and why it's only failing here and not on HEAD.

Wim Leers’s picture

FileSize
61.63 KB
1.52 KB

#295.2: I wanted to add docs for this/open an issue for this, but I don't fully grok what you're getting at, so you'll have to do that.

#295.3 actually answers #295.4, not #295.3. Berdir, shall we remove the ability to load *all* content entities? Then this point of confusion disappears. I definitely agree with you that it doesn't make sense that we have this capability at all.

#295.8: plus, this is a pre-existing code smell, it's just been moved around.

RE: $field_type: wow. Sorry :( And +1 to Wow, file_get_file_references() is such a broken and stupid function :( It'd be funny if it weren't so painful. Hunk restored.


With #295, we now only have to address #292.3, #292.4, #292.5 and #293.2, then this is RTBC.

Berdir’s picture

2. #2303877: Remove cached languages from ContentEntityBase

3. The order of returned entities is ensured at the end of loadMultiple() in the base class. No need to do it multiple times.

4. (sorry for messing up the order) Not in this issue. It's not related to this, except it's a use case that is not supported for the persistent cache. There might even be an issue for loadAll() already.

5. Sure, can do that, doubt it changes much ;)

#293.2: Still true, but it's not really an actionable @todo, just that some of it are cases that we currently can't test with unit tests. And as @catch so nicely put, @todo's without an issue reference are just excuses. So, rewritten to just describe what it is ;)

I think that addresses everything.

Status: Needs review » Needs work

The last submitted patch, 297: drupal_597236_297.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
61.74 KB
701 bytes

Ups... That's what happens if you copy code without thinking about it ;)

Not sure why nothing else failed, probably because there are no tests that verify that we load from cache, only that we write into cache other than the unit tests.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC - if tests pass as all points have been addressed by patches.

catch’s picture

3. Being able to load *all* entities is a feature that only makes sense for config entities IMHO, it shouldn't even be supported for content entities (if you really want that, load the ID's yourself and load it explicitly). So not really interested in supporting that. Possibly that should be separated into a loadAll() method.

iirc this was added explicitly for taxonomy vocabularies in Drupal 7, which are now config entities, so opened #2304935: Deprecate ability to load all content entities at once from Entity::loadMultiple() to kill that off.

Also opened #2304939: Stop loading comment statistics into entity object, not really sure why we do that still.

Patch looks good to me, leaving RTBC a bit longer so I can give it another once over, but won't complain if someone beats me to committing. Nice to see this RTBC after nearly 5 (!) years.

Wim Leers’s picture

If #2304939: Stop loading comment statistics into entity object lands, then we'll only have one use case left for hook_entity_storage_load() in core: translation metadata. Perhaps we can find a way to make that work differently in a follow-up, so we can remove this hook? :)

catch’s picture

Also correction, it's seven years if you count from #111127: Cache node_load.

Berdir’s picture

#1916790: Convert translation metadata into regular entity fields will help with that. Not sure, there might still be a need for it in contrib, we'll see.

plach’s picture

There's already an issue for that, which I am planning to work on as soon as I am done with the entity storage one :)

#1916790: Convert translation metadata into regular entity fields

catch queued 299: drupal_597236_299.patch for re-testing.

catch’s picture

Status: Reviewed & tested by the community » Fixed

I was wrong about 7 years, it's actually 10 years!

#111127: Cache node_load - that issue originally wanted to add persistent caching too, eventually added the static cache only.

Committed/pushed to 8.x, thanks!

  • catch committed 5dad0b8 on 8.x
    Issue #597236 by Berdir, catch, msonnabaum, Xano, Wim Leers, jhedstrom,...
yched’s picture

Take a bow, @Berdir :-)

Fabianx’s picture

Yes! This is great! Congrats, @Berdir, et. al!

moshe weitzman’s picture

10 years! Thanks so much, Berdir.

Wim Leers’s picture

Thank you, Berdir!

cosmicdreams’s picture

Hey gang, writing a blog post about this. Is there somewhere that talks about the performance gains of this patch? #283 ( https://www.drupal.org/node/597236#comment-8914931 ) seems out of date give the modifications that happened after it, and I don't see another review of the performance wins after it.

Wim Leers’s picture

#283 is still accurate; only minor changes happened after it.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

chx’s picture

As far as I can see UserData is no longer injected into UserStorage at all. I haven't read all ~300 comments (sorry) but a few searches haven't shown anything. The issue summary says

Comment and user storage controllers have special data manipulation that needs to run before entity objects are created. They now override mapFromStorageRecords() instead of postLoad(), as that now receives entity objects.

But in fact the patch committed does not contain any mapFromStorageRecords overrides. Could you please clarify this? We are considering putting user data in the user destination in #2268897: Write per user contact settings D6=>D8 migration but this whole issue gives me pause and I would like to understand better the UserData situation before going ahead.

jenlampton’s picture

Was there ever a change record made for entity cache being added to core? The only Change Record I found referenced here was https://www.drupal.org/added-d8-modules, but that doesn't contain any information about entity cache. If anyone knows of one, can we add a link?