Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
In #1983554-30: Remove BC-mode from EntityNG Berdir wrote:
The EntityResolverTest is ugly, that's an endless loop because the entity apparently references itself and getValue(TRUE) there causes it to be loaded again and again. And we don't want to cache the computed entity referenced, but we do want to cache the computed processed texts for example.
Maybe introduce a new method on Field(Item)Interface, something like getCache() or getCacheValues() ? And introduce a new cacheable key for property definitions?
Comments
Comment #1
plachMarking critical as this is blocking #1983554: Remove BC-mode from EntityNG .
Comment #2
yched CreditAttribution: yched commentedadjusting title
Comment #3
catchWhy are we allowing an entity to reference itself?
Comment #4
plachThat's a good question :)
However the problem here is that we are serializing full entity objects, no matter which ones.
Comment #5
yched CreditAttribution: yched commentedCopy/pasting what I wrote in #1983554-35: Remove BC-mode from EntityNG :
Does text 'processed' really have to be a 'computed' property ?
We'd need to make sure it's always present and to-date with the raw text values, but if it was a regular property autopopulated on load / read from cache / value change, then maybe we could stick with "cached values do not include computed properties" ?
Comment #6
Berdir#1983554: Remove BC-mode from EntityNG currently removes support for caching computed properties completely, which introduces a serious performance regression that was fixed a short time ago in #2026339: Remove text_sanitize() since we have TextProcessed as an EntityNG computed property. The test coverage added there does not correctly catch that case (as it injects the cache manually), so we need additional test coverage to verify the creation of caches.
This already is critical (due to the known performance regression, could also be classified as a bug).
Tagging as API change, not exactly sure yet what it will require (or if anything at all), but it shouldn't be anything big, possibly only an API addition/additional method on an interface.
@yched: Not sure. processed seems like the perfect example of a computed property, that is computed based on other properties and not stored anywhere (except cache).
@catch: Apart from there being nothing wrong with a self-reference, that's just an example. The same would happen if you have a back reference like A -> B, B -> A. And even that is irrelevant, we don't want to cache referenced entity objects in the first place, not as part of another entity, as @plach said. That's the same craziness as we already have, where we end up caching usernames in nodes and comments.
Comment #7
yched CreditAttribution: yched commentedYeah, agreed that "computed, cacheable" & "computed, not cacheable" are both valid cases we should support.
Adding a 'cacheable' flag in getPropertyDefinition() (probably defaulting to FALSE ?) looks easy, the issue is "how so we get only the set of properties we want when building data for the cache".
Maybe we could simply not use getValue(TRUE) there, and instead manually iterate on properties and individually fetch each one that is not 'computed + non cacheable'. I mean, other than building field cache data in loadFieldItems(), I'm not sure we have other use cases that would make it worth scratcing our heads to make that fit into some combination of getValue() params ?
Comment #8
BerdirYes, that's exactly what I was considering too. It would make cache writing a bit slow but it will only be called for writing caches for cacheable entities, which shouldn't happen often. Eventually when we start caching all fields of an entity, we could put that logic in __sleep() and just serialize/unseralize $entity.
Talking a bit with @msonnabaum in IRC led to the conclusion that a marker interface or a method could be an alternative to a magic definition key but the problem is that we need to check the magic computed key anyway (or instantiate all non-computed properties too), so checking a second is easier than getting the object and check it with instanceof.
Comment #9
yched CreditAttribution: yched commentedI guess we could even start with getValue(FALSE) to get all non computed properties in one shot, and then iterate / fetch individually just for computed cacheable ?
__sleep() : sounds sexy :-) I wonder if that would be good enough for all the cases where we serialize entities though. (e.g entity forms...)
Comment #10
yched CreditAttribution: yched commentedUsing a marker interface or a method means cacheabilty is per property clsss, not per individual properties. I guess there could be cases for two properties using the same class, but one cacheable and not the other.
So +1 on staying with a flag in the definition.
Comment #11
yched CreditAttribution: yched commentedSomething like this ?
Comment #12
BerdirLooks about right to me :)
We should probably add that key to TypedDataManager::create(), as weird as that is, that seems to be where all those keys are defined. Or do do you have a better suggestion? (As it's only relevant to field items, that's where it would make sense, but that doesn't define getPropertyDefinitions(). I know what you're going to say ;))
And some tests would be good to have, I guess we can put them in TextSummaryItemTest as well, basically the inverse of the existing tests there, instead of injecting a fake cache, make sure that loading an entity resulted in the correct caches being written.
Comment #13
msonnabaum CreditAttribution: msonnabaum commentedCould we at least encapsulate the knowledge of the those flags in methods? Getting an array back and then making decisions based on its keys is very procedural :(
Comment #14
BerdirMethods on what? We'd have to instantiate all properties of a field item to be able to call a method on them (or check an interface), that messes with the lazy-instantiation logic of those classes.
Comment #15
yched CreditAttribution: yched commentedActually, this should let us do less iterations in the most frequent case where there are no cacheable computed properties.
+ Added doc in TypedDataManager::create() - I won't say what you know I'm going to say since you know I'm going to say it :-)
+ updated testProcessedCache()
@msonnabaum: agreed with @Berdir, + this is consistent with how the rest of core deals with definitions from getPropertyDefinitions(), changing that seems way out of the scope of this issue.
Comment #16
BerdirI'd suggest to add additional tests, not replace them. There are two different things here, a) that things are written into cache (currently not covered by tests), b) that writing them back from the cache actually works as expected and they're not re-computed.
Comment #18
yched CreditAttribution: yched commented@Berdir: right, added back the existing tests.
The warnings in NodeValidationTest were caused by a non-set body field.
Empty fields still have an item object with NULL properties, and the code populating the cache in loadFieldItems() tried to get the processed values for that.
This then failed in weird ways because check_markup() with a NULL format triggers filter_fallback_format(), which requires the filter config to be installed, which is not the case in NodeValidationTest (DrupalUnitTest).
Updated patch includes two fixes (each fixes the fails separately, but each make sense on their own IMO):
- run $items->isEmpty() before trying to put stuff in the cache
- optimize the case of NULL / empty string in TextProcessed (D7's text_field_load() had a similar stopgap optim)
Comment #19
yched CreditAttribution: yched commentedComment #21
yched CreditAttribution: yched commented#18: field_cache_computed_properties-2083785-18.patch queued for re-testing.
Comment #22
BerdirThis looks good to me. Looking the code and xhprof, I think this pretty well optimized and it only runs when entities are put into the cache, compared to actually loading the values, this takes no time at all.
@msonnabaum was talking to me about possibly moving the "get cacheable fields into a methods" of FieldInterface $items, but I'm not sure about that. It's not really business logic to me, it's storage logic. The business logic is what each computed property does. Such a method would be on FieldInterface (or ConfigFieldInterface), where it would be visible to anyone (e.g. widgets and formatters, which receive FieldInterface $items).
If a method, then I rather think it would be a protected helper method in FieldableEntityStorageControllerBase, available only to subclasses of that.
Either way, I'd vote to get this in like this and discuss making it re-usable/encapsulate it in a follow-up issue. I'd like to see for example a MongoDB storage controller implementing this logic (not sure where, possibly write cacheable computed properties right into the storage?) and then check if there is something that can be re-used?
Comment #23
yched CreditAttribution: yched commentedAgreed. #597236-94: Add entity caching to core points out refactorings that should happen around the field cache - with similar questions like "what would Mongo storage do ?". Let's see where they put us and reconsider if / where to put a helper method then.
Comment #24
fagoIt's not the whole field item object that's cached, but the field item properties - not? So the docs are misleading here. -> Needs work.
Howsoever, extending this key here which only makes sense in the field item property context sucks - imho we should avoid doing something like that.
So why not go with a method on the field item class that controls which properties can be cached? By default it would be all non-computed properties, but implementations can easily override it to add stuff like the processed property.
We've got prepareCache() on field item classes already as well as other storage related stuff like presave(), insert() and update() methods. It's not storage-engine dependent logic though so imho it's ok.
Comment #25
yched CreditAttribution: yched commentedAgreed that is sucks, but "everything is a TypedData" :-). getFieldDefinition() is the place where you provide the metadata about your properties. Having to declare the specific "is cacheable" metadata in a different method seems sad.
However, it's true that there might be something to be done with FieldItem::prepareCache() - what we're doing here is exactly "preparing the cache". The problem is that what we currently do in prepareCache() is also alter the current $item by preparing the loaded values for the entity in the current request. We could repurpose prepareCache() slightly to both return an array of data to cache (a base implementation would return the "non computed + cacheable computed") and alter the $item properties if needed. Might be a bit confusing though.
This would also mean we can stop special casing prepareCache() in a separate PrepareCacheInterface, since there would always be a prepareCache implementation and we would create the Field/FieldItem objects anyway on cache miss - but that is what the current patch is doing anyway, and I don't see how we can avoid that if we want to access computed properties to put them in the cache...
Comment #26
andypostAlso I'd like to mention about the need to load multiple calculated properties for: comment statistics and history markers
Comment #27
BerdirWow. See, this is why fago is the maintainer and am I only co-maintainer ;)
I *really* like this.. Yes, the separate interface for prepare cache was a nice idea, but right now unnecessary as we initialize anything anyway, especially on a cache miss (we want to avoid doing it on a cache hit but we don't either)
Two remarks:
1. Not sure if I got the legacy implementation right, the only implementation left is field_test_field_load (please RTBC that :)), which will probably fail anyway due to the second point.
2. This will only call the prepare cache stuff on cached entity types. That will very likely cause some fails, hopefully only in test_field, because it's doing weird stuff in there. We could still call the method for entity types that don't use the cache, or we could just document it to work like this? After all, they're class now, can compute on-demand and what not.. Seems better to not try to calculate stuff in advance that we might not need and won't cache?
Comment #29
BerdirOk, we had a look at those test fails:
- As expected, the additional key stuff failed as it's only called when an entity is cached and the logic explicitly checks loading revisions too. That's no longer necessary, as you can implemented computed properties if you need something like that. So we agreed that this is by design and removed that assertion. Also changed another test to use a cached entity type.
- DateTimeItem is ugly, as it's relying on a date property that is not defined. Added some hacks to try to get it to work somehow, but doesn't quite work yet. Will need a computed property ...
Comment #30
yched CreditAttribution: yched commentedYeah, those date fixes spilling in the various widgets & formatters are a bit ugly right now. We might push that to a followup, but then at least we should denote those places with a @todo marker
Comment #31
fagoI'm not sure about removing the optional interface + instantiating all the objects during case miss.
Yes, that is what is happening right now - but I think we should/can fix/optimize the code to not do that. It should be perfectly fine to load all the values, build a flat array of values and pass it on to entity object + cache things. The only field item objects we really have to instantiate are those with custom-caching behaviour -> so we check for the interface.
Also, I do not think the possibility to customize caching should be limited to configurable fields as field caching will have to be replaced by entity caching - #597236: Add entity caching to core.
Comment #33
BerdirYeah, I think we should fix date properly here, not exactly sure how yet.
Yeah, but it's the field item that implements the interface, not the field. I don't see how that could possibly work.
Absolutely. But not here. So adding that method here and then move it up to FieldItemInterface over there...:)
Comment #34
fagoThat's the way it already works in HEAD? It just gets the class and checks for the interface - cannot we keep doing that?
Comment #35
BerdirYes, but we either need to call getValue() anyway so doing this separately seems useless to me, it would be a condition of calling getValue() (on the field, but that wouldn't change anything, the items are still created) or prepareCache(). While we should possibly update the documentation or even rename the method, this seems way easier.
And even after we have a single $values array, we still need to instantiate the entity object to get those that are computed and we very likely need to run his code after we call the load hooks, so I expect it could be quite confusing if we use the original $values array and mix in stuff from other calls.
I can change it back if you insist on it, but I'm not sure to what exactly, to be honest? Mixing getValue() and prepareCache() (or getCacheableValues() or whatever) seems confusing to me.
Added a datetime_computed field, I think our getDateTime() stuff is broken and should return a Drupal date time object, I think the error comes from there, but the I don't know how we could handle the storage stuff anyway.. that's weird.
Comment #36
BerdirConclusion from the discussion with @fago:
- We will keep the interface, that it's easier to optimize it later again
- Rename prepareCache() to getCacheValues()
- We don't need bother with checking the interface on the class name right now, because we lost some hidden optimization when we removed the BC decorator ( I think that caused some of the confusion in here)
- We'll however keep the simple logic for now
We should also try to get rid of getValues(TRUE), theme function #items should switch to use the field item list, the legacy stuff is going away, the only remaining case that uses it is then some sorting and I don't think that's necessary there.
Will re-roll tonight.
Comment #37
BerdirLet's see if this works. Interdiff would probably bigger than the patch, so useless.
Comment #39
BerdirAccidently implemented writing to cache for all fields, not just configurable :)
I hope the other test fails are a testbot problem.
Comment #40
BerdirUpdated after the field item change.
Comment #42
Berdir#40: field_cache_computed_properties-2083785-40.patch queued for re-testing.
Comment #44
yched CreditAttribution: yched commentedSuggestion :
"If a field type implements this interface, this method will be used instead of the regular getValue() to collect the data to include in the cache of field values."
+ extraneous empty line
That whole docblock can be refactored IMO.
Suggestion:
(the last paragraph is can still be enhanced IMO, but this is what I have for now)
Should we dig on this here, or in a followup ?
Not sure $values is the best param name here. It's about return data to cache.
maybe prepareCacheData() / $cache_data ?
Do we really need to provide code for this ? isn't it provided in a parent implementation ?
'date' is a computed property, right ? so when would $item->date == FALSE ?
same here (and in DateTimeDefaultWidget)
Why do we need that implementation if it just does as the "default" behavior ?
Comment #45
BerdirThanks, great review as usual :)
1. Fixed.
2. Updated accordingly. Reads quite fine to me like this.
3. I just moved this from the previous prepareCache() implementation. I don't think it's something we need to fix here.
4. Not sure, prepareCacheData() a) returns a new term (data, we use values elsewhere) and b) doesn't describe that it returns something IMHO
5. Yes, we probably can, makes sense.
6. & 7. Yes, but the field item can be empty.
8. Left-over, forgot to delete this, removed.
Comment #46
andypostneeds issues link
Comment #48
Berdir#45: field_cache_computed_properties-2083785-45.patch queued for re-testing.
Comment #50
BerdirWrong interface check.
Comment #51
yched CreditAttribution: yched commented@Berdir: yes, sorry, agreed about *prepare*CacheData() being less clear, getCacheData() was what I actually had in mind.
Then I think the method name *not* using "Value/Values()", as we do in other places, is actually a feature :-), because what we do here is different from what we do in other places. We're building data for the cache, not values for the runtime FieldItem.
So, still +1 on getCacheData() / $cache_data (or $data ?) as a "typical" var name in implementations.
Comment #52
BerdirOk, let's do that then.
Comment #53
yched CreditAttribution: yched commentedtypo / language - ".. needs to prepare..." ?
s/$values/$data ? :-)
Just nitpicking - this is RTBC anyway.
Comment #54
das-peter CreditAttribution: das-peter commentedAddressed ycheds nitpicks ;) Should remain RTBC.
Comment #55
das-peter CreditAttribution: das-peter commentedyched just told me there's another nitpick in
DateTimeItem
- so here's the adjustment.Still keeping it RTBC.
Comment #56
yched CreditAttribution: yched commented;-)
Comment #57
BerdirThat's what I get for messing with the same functions in 3 (or 4?) different criticals :)
Re-roll, re-removed the invoke prepare cache method. Patch is smaller because there's less code to remove.
Comment #58
catchLooks great. Committed/pushed to 8.x, thanks!
Comment #59
BerdirRemoving sprint tag.
Comment #61
Wim LeersThe changes introduced in this issue are potentially causing some unforeseen problems, see #2972988-15: Error when saving a denormalized entity with text fields with "processed" property.