Updated: Comment #0
Problem/Motivation
The Entity Field API is designed to lazy-instantiate field item and field item list objects only when actually being used. Right now, the current storage code assigns them to loaded entity objects, which means they have to be instantiated just to be be able to load it. In all available languages.
Proposed resolution
Refactor and rename attachPropertyData/loadFieldItems()
to loadFromSharedTables() / loadFromDedicatedTables() (and also the corresponding save methods) and refactor the second to work on the raw values array instead of already created entity objects.
That has a number of advantages:
- Simplify the flow and only have a single place where we actually create the entity objects, after we loaded all the values (Right now, we have two places and different flows, depending on whether you have a translatable entity type or not)
- Improve performance, because not having to create all those objects means this is much faster and needs less memory.
- Clean up the nameing of those methods, making it more consistent with how we name them elsewhere, resulting in code that is easier to understand.
Remaining tasks
User interface changes
None
API changes
None. The patch is almost completely contained in SqlContentEntityStorage
, with one minor exception, which is a very exotic field test that was relying on an internal implementation detail to function.
Beta phase evaluation
Issue category | Task because nothing is broken. |
---|---|
Issue priority | Major because this is a considerable performance improvement |
Prioritized changes | Improves performance |
Disruption | Not disruptive, no API changes, nobody should notice anything, except *maybe* alternative entity storage backends |
Comment | File | Size | Author |
---|---|---|---|
#34 | load-values-2137801-34.patch | 18.93 KB | Berdir |
#28 | mapFromStorageRecord.png | 283.11 KB | Berdir |
#18 | load-values-2137801-18-interdiff.txt | 6.8 KB | Berdir |
#18 | load-values-2137801-18.patch | 18.9 KB | Berdir |
#15 | load-values-2137801-15-interdiff.txt | 1.66 KB | Berdir |
Comments
Comment #1
BerdirComment #2
yched CreditAttribution: yched commentedRelated: #2142549: Support multi-column field types in base tables
Comment #3
fagoI opened #2144631: Add a revisionable key to field definitions, which might help to know what's revisionable.
Comment #4
plachComment #5
BerdirWhat can I say, seemed like a great issue to work on at 1am :p
Refactored the load methods to have only a single place where we create entity objects instead of two, and loading all the data upfront. Also renamed some variables, like making sure thatn when we have $entities, they are actually entity objects, and when not, renamed to $values or so. Downside is some places where we foreach over $values, went with $entity_values for the inner variables, better suggestions are welcome.
Some parts are not so pretty yet, maybe we can
EntityFieldTest passes, but that's likely not covering fun parts like delta/revision loading.
Comment #7
yched CreditAttribution: yched commentedAw, this is a beauty - and a much needed overhaul, the method names in there are sooo 2013 :-p (phpdocs will need to be adjusted accordingly)
Comment #8
BerdirWill try to fix those tests later. BulkDeleteTest is very funny test fail and proves that it actually works. This is what happens:
1. We create the fields and entities, and then load them. We only access entity key values like the bundle, that we have fast-paced through $this->entity keys and don't need the field definitions to access them. So we never actually load them.
2. Then we delete the field.
3. Now we load the values from the field table, and try to compare with the entity.
BUT, we never loaded the field definitions in the entity. so trying to access to deleted field loads them *now*, and now the field doesn't exist anymore and we get those errors.
I can see two ways to fix this:
a) Quickfix: Force a fetch of the field definitions in the entity, so that they are loaded before we delete the field.
b) Rewrite it to do the same query before and after and then compare the results of the query.
other ideas?
Comment #9
BerdirThis should fix the test fails. was missing support for not having a language key defined, looks like we only have accidental test coverage for that, e.g. because EntityTestLabel has no langcode key defined.
Yeah, docblocks need to be updated, also wasn't sure about the method names yet. maybe LoadSharedFields() (or FieldValues() and loadDedicatedFields()? but it's not the field that's dedicated or not, it's the storage...
Comment #11
plachThe langcode key is mandatory only if the entity type is translatable, maybe that's the reason?
Comment #12
BerdirWrong langcode checked, this should be better.
Comment #13
BerdirYay green. Some documentation improvements.
Comment #14
amateescu CreditAttribution: amateescu commentedThe actual logic of this code is a bit above my head but here's a small nitpick review:
Missing (optional) and "Defaults to FALSE." stuff.
Looks like the doc should mention that $translation is by-reference.
lol! Why is that? Can we improve it in this patch?
Same as above for the by-reference part. Also, is it intentional that $load_from_revision doesn't have a default?
Are these constants used anywhere now or can they be removed?
Comment #15
BerdirThanks.
3. I already did, just forgot to remove that comment ;) See interdiff for the last patch.
4. I'm not sure. It's not called anywhere else, so we always call it with a value anywhere.
5. They're still used in a few places, e.g. in entity query. There's afaik an existing issue for removing it.
I'm not sure if we should also update the corresponding save methods. Not the implementations, just the method names.
Comment #16
plachThis is awesome! Did you already try some profiling to see how much we are gaining here? Anyway, even if the gain were close to 0, this is a very nice refactoring :)
I'm wondering whether we really need the
$translations
array. I suspect when it was introduced we did not support langcode-keyed$values
arrays, so we needed a way to tell the entity constructor which translation were available. Couldn't we just derive the available translations from the$values
array inside the constructor?If we want to be fully consistent with the new terminology this should be
::loadFromSharedTables()
.Lastly, I think we should also rename the save methods for consistency.
Comment #17
BerdirDidn't profile this yet, but I think this is a nice performance improvement when loading entities from the storage.
1. I think we still need it, but I'm happy to be convinced otherwise. Note that languages are keyed below the field name in the values array still, so we'd need to loop over all fields to find the complete list of languages. If it were the other way round then it would be easy.
Comment #18
BerdirRenamed a few more methods for save and delete, better suggestions are welcome.
Comment #19
yched CreditAttribution: yched commented+1 on the new (loadFrom|SaveTo|deleteFrom)(Shared|Dedicated)Tables() method names, maps how the StorageSchema names them, which only makes sense.
As an additional step, would be nice to group them a bit more consistently in the class, current order seems a bit random. Probably best in a followup, since it would make the diff harder to review.
Comment #20
BerdirYes, that's exactly why I didn't do it yet :)
Comment #21
yched CreditAttribution: yched commentedJust deleteFrom*() doesn't seem to be like the other ones: there is a deleteFromDedicatedTables() but no deleteFromSharedTables() ?
Comment #22
BerdirCorrect, that all happens in doDelete(), would be overkill to move that to a separate method I think, it's just an if (hasTable) { delete query }
What we could do is make the delete in the dedicated table multiple, but again, not sure if we should do more refactoring here.
Comment #23
plach+1 on the interdiff :)
About #16.1: we are already looping on the
$values
array in the constructor, so collecting language codes there wouldn't be problematic, I think. Moreover this would be less logic to be repeated in alternative storage implementations.Comment #24
fagoAmazing clean-up!
I know this is pre-existing, but loading more stuff in "mapFromStorageRecords" is not what would one expect.
Can we move that loading to an earlier step?
Yeah, sounds like we should try that in dedicated issue.
Comment #25
plachWe could repurpose #2430673: ContentEntityBase::__construct() lacks documentation for that, but since we are touching that code not sure it's worth.
Comment #26
BerdirYes, it doesn't. But it *does* belong right between the two steps that *do* belong into mapFromStorageRecords() (converting $records into a values array that can be passed into the entity class and creating that entity class). I honestly don't know how to change that in something that makes more sense.
Same for translations, I had another look and I'm not sure. Yes, we already loop over $values, but we would have to loop over languages inside each $value array as well, and find any language that we don't know about yet. That feels like it might be slower than what we have now. I think the issue to do this would be #2142885: Simplify ContentEntityBase internal field storage by removing special treatment for LANGCODE_DEFAULT, because that would make it trivial to get rid of the $translations argument.
Udated the issue summary and added a beta eval.
Still want to some performance comparisons.
Comment #27
BerdirOk. Was running the following script on a real site:
The site only has a single language, but ~25 configurable fields and *a lot* of content (320k nodes), so a fair amount of that time is probably spent on the actual queries.
HEAD: 0.52s
Patch: 0.39s
I think that's pretty nice :)
Did not yet do an actual profiling of where that time is spent, that would probably also be good to know. I'v planned to do a fair amount of profiling with that site, to see how it performs.
Comment #28
BerdirOk, repeated the same with xhprof, see attached screenshot.
mapFromStorageRecords() is 64% faster and uses 44% less memory (A part of that memory will then instead be used later, but only the fields that are actually being accessed).
Also, as usual, the uprofiler/xdebug overhead is huge (even if just one of those extensions is enabled). The same script with without either of those extensions enabled is 0.28s vs 0.22s, so almost twice as fast.
Comment #29
yched CreditAttribution: yched commentedYay, impressive numbers :-)
What is left to be done here, then ?
Comment #30
BerdirNot much I think? @plach already mentioned that he's OK with changing/removing $translation later, I think it's up to @fago to RTBC this, assuming he's OK with keeping the map/load method names/calls as it is for now.
Comment #31
BerdirComment #32
fago>I honestly don't know how to change that in something that makes more sense.
Fair enough. I do not see a good nor easy way either, so let's leave anything else #2142885: Simplify ContentEntityBase internal field storage by removing special treatment for LANGCODE_DEFAULT. This is a large clean-up anyway! So RTBC then.
>I think that's pretty nice :)
I'd rephrase that and say it's pretty awesome! :)
Comment #34
BerdirSmall conflicts with #2431329: Make (content) translation language available as a field definition.
Comment #35
plachComment #36
alexpottCommitted fcc8056 and pushed to 8.0.x. Thanks!
Thanks for adding the beta evaluation to the issue summary.
Comment #38
yched CreditAttribution: yched commentedAwesome !