Updated: Comment #0

Problem/Motivation

The Entity Field API is designed to lazy-instantiate field item and field item list objects only when actually being used. Right now, the current storage code assigns them to loaded entity objects, which means they have to be instantiated just to be be able to load it. In all available languages.

Proposed resolution

Refactor and rename attachPropertyData/loadFieldItems() to loadFromSharedTables() / loadFromDedicatedTables() (and also the corresponding save methods) and refactor the second to work on the raw values array instead of already created entity objects.

That has a number of advantages:

- Simplify the flow and only have a single place where we actually create the entity objects, after we loaded all the values (Right now, we have two places and different flows, depending on whether you have a translatable entity type or not)
- Improve performance, because not having to create all those objects means this is much faster and needs less memory.
- Clean up the nameing of those methods, making it more consistent with how we name them elsewhere, resulting in code that is easier to understand.

Remaining tasks

User interface changes

None

API changes

None. The patch is almost completely contained in SqlContentEntityStorage, with one minor exception, which is a very exotic field test that was relying on an internal implementation detail to function.

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Task because nothing is broken.
Issue priority Major because this is a considerable performance improvement
Prioritized changes Improves performance
Disruption Not disruptive, no API changes, nobody should notice anything, except *maybe* alternative entity storage backends
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Berdir’s picture

Priority: Normal » Major
yched’s picture

fago’s picture

I opened #2144631: Add a revisionable key to field definitions, which might help to know what's revisionable.

plach’s picture

Issue tags: +entity storage
Berdir’s picture

Status: Active » Needs review
FileSize
11.97 KB

What can I say, seemed like a great issue to work on at 1am :p

Refactored the load methods to have only a single place where we create entity objects instead of two, and loading all the data upfront. Also renamed some variables, like making sure thatn when we have $entities, they are actually entity objects, and when not, renamed to $values or so. Downside is some places where we foreach over $values, went with $entity_values for the inner variables, better suggestions are welcome.

Some parts are not so pretty yet, maybe we can

EntityFieldTest passes, but that's likely not covering fun parts like delta/revision loading.

Status: Needs review » Needs work

The last submitted patch, 5: load-values-2137801-5.patch, failed testing.

yched’s picture

Aw, this is a beauty - and a much needed overhaul, the method names in there are sooo 2013 :-p (phpdocs will need to be adjusted accordingly)

Berdir’s picture

Will try to fix those tests later. BulkDeleteTest is very funny test fail and proves that it actually works. This is what happens:

1. We create the fields and entities, and then load them. We only access entity key values like the bundle, that we have fast-paced through $this->entity keys and don't need the field definitions to access them. So we never actually load them.
2. Then we delete the field.
3. Now we load the values from the field table, and try to compare with the entity.

BUT, we never loaded the field definitions in the entity. so trying to access to deleted field loads them *now*, and now the field doesn't exist anymore and we get those errors.

I can see two ways to fix this:
a) Quickfix: Force a fetch of the field definitions in the entity, so that they are loaded before we delete the field.
b) Rewrite it to do the same query before and after and then compare the results of the query.

other ideas?

Berdir’s picture

Status: Needs work » Needs review
FileSize
13.1 KB
3.24 KB

This should fix the test fails. was missing support for not having a language key defined, looks like we only have accidental test coverage for that, e.g. because EntityTestLabel has no langcode key defined.

Yeah, docblocks need to be updated, also wasn't sure about the method names yet. maybe LoadSharedFields() (or FieldValues() and loadDedicatedFields()? but it's not the field that's dedicated or not, it's the storage...

Status: Needs review » Needs work

The last submitted patch, 9: load-values-2137801-9.patch, failed testing.

plach’s picture

The langcode key is mandatory only if the entity type is translatable, maybe that's the reason?

Berdir’s picture

Status: Needs work » Needs review
FileSize
13.09 KB
1.06 KB

Wrong langcode checked, this should be better.

Berdir’s picture

Yay green. Some documentation improvements.

amateescu’s picture

The actual logic of this code is a bit above my head but here's a small nitpick review:

  1. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -610,18 +610,20 @@ protected function buildCacheId($id) {
    +   * @param bool $load_from_revision
    +   *   Flag to indicate whether revisions should be loaded or not.
    

    Missing (optional) and "Defaults to FALSE." stuff.

  2. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -630,37 +632,42 @@ protected function mapFromStorageRecords(array $records) {
    +   * @param array $translations
    +   *   List of translations, keyed on the entity ID.
    ...
    +  protected function loadFromDataTables(array &$values, array &$translations) {
    

    Looks like the doc should mention that $translation is by-reference.

  3. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -696,36 +703,30 @@ protected function attachPropertyData(array &$entities) {
    +        // This is not nice.
    

    lol! Why is that? Can we improve it in this patch?

  4. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -1214,36 +1215,28 @@ protected function getQueryServiceName() {
    +   * @param array $values
    +   *   An array of values keyed by entity ID.
    +   * @param bool $load_from_revision
    +   *   Flag to indicate whether revisions should be loaded or not.
    ...
    +  protected function loadFromDedicatedTables(array &$values, $load_from_revision) {
    

    Same as above for the by-reference part. Also, is it intentional that $load_from_revision doesn't have a default?

  5. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -1214,36 +1215,28 @@ protected function getQueryServiceName() {
    -    $age = static::FIELD_LOAD_CURRENT;
    ...
    -        $age = static::FIELD_LOAD_REVISION;
    

    Are these constants used anywhere now or can they be removed?

Berdir’s picture

Thanks.

3. I already did, just forgot to remove that comment ;) See interdiff for the last patch.

4. I'm not sure. It's not called anywhere else, so we always call it with a value anywhere.

5. They're still used in a few places, e.g. in entity query. There's afaik an existing issue for removing it.

I'm not sure if we should also update the corresponding save methods. Not the implementations, just the method names.

plach’s picture

This is awesome! Did you already try some profiling to see how much we are gaining here? Anyway, even if the gain were close to 0, this is a very nice refactoring :)

  1. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -630,37 +632,42 @@ protected function mapFromStorageRecords(array $records) {
    +    $translations = array_fill_keys(array_keys($values), array());
    

    I'm wondering whether we really need the $translations array. I suspect when it was introduced we did not support langcode-keyed $values arrays, so we needed a way to tell the entity constructor which translation were available. Couldn't we just derive the available translations from the $values array inside the constructor?

  2. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -630,37 +632,42 @@ protected function mapFromStorageRecords(array $records) {
    +    $this->loadFromDataTables($values, $translations);
    

    If we want to be fully consistent with the new terminology this should be ::loadFromSharedTables().

Lastly, I think we should also rename the save methods for consistency.

Berdir’s picture

Didn't profile this yet, but I think this is a nice performance improvement when loading entities from the storage.

1. I think we still need it, but I'm happy to be convinced otherwise. Note that languages are keyed below the field name in the values array still, so we'd need to loop over all fields to find the complete list of languages. If it were the other way round then it would be easy.

Berdir’s picture

Renamed a few more methods for save and delete, better suggestions are welcome.

yched’s picture

+1 on the new (loadFrom|SaveTo|deleteFrom)(Shared|Dedicated)Tables() method names, maps how the StorageSchema names them, which only makes sense.

As an additional step, would be nice to group them a bit more consistently in the class, current order seems a bit random. Probably best in a followup, since it would make the diff harder to review.

Berdir’s picture

As an additional step, would be nice to group them a bit more consistently in the class, current order seems a bit random. Probably best in a followup, since it would make the diff harder to review.

Yes, that's exactly why I didn't do it yet :)

yched’s picture

Just deleteFrom*() doesn't seem to be like the other ones: there is a deleteFromDedicatedTables() but no deleteFromSharedTables() ?

Berdir’s picture

Correct, that all happens in doDelete(), would be overkill to move that to a separate method I think, it's just an if (hasTable) { delete query }

What we could do is make the delete in the dedicated table multiple, but again, not sure if we should do more refactoring here.

plach’s picture

+1 on the interdiff :)

About #16.1: we are already looping on the $values array in the constructor, so collecting language codes there wouldn't be problematic, I think. Moreover this would be less logic to be repeated in alternative storage implementations.

fago’s picture

Amazing clean-up!

+++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
@@ -630,37 +632,42 @@ protected function mapFromStorageRecords(array $records) {
+    // Load values from shared and dedicated tables.
+    $this->loadFromSharedTables($values, $translations);
+    $this->loadFromDedicatedTables($values, $load_from_revision);

I know this is pre-existing, but loading more stuff in "mapFromStorageRecords" is not what would one expect.
Can we move that loading to an earlier step?

1. I think we still need it, but I'm happy to be convinced otherwise. Note that languages are keyed below the field name in the values array still, so we'd need to loop over all fields to find the complete list of languages. If it were the other way round then it would be easy.

Yeah, sounds like we should try that in dedicated issue.

plach’s picture

Yeah, sounds like we should try that in dedicated issue.

We could repurpose #2430673: ContentEntityBase::__construct() lacks documentation for that, but since we are touching that code not sure it's worth.

Berdir’s picture

I know this is pre-existing, but loading more stuff in "mapFromStorageRecords" is not what would one expect.
Can we move that loading to an earlier step?

Yes, it doesn't. But it *does* belong right between the two steps that *do* belong into mapFromStorageRecords() (converting $records into a values array that can be passed into the entity class and creating that entity class). I honestly don't know how to change that in something that makes more sense.

Same for translations, I had another look and I'm not sure. Yes, we already loop over $values, but we would have to loop over languages inside each $value array as well, and find any language that we don't know about yet. That feels like it might be slower than what we have now. I think the issue to do this would be #2142885: Simplify ContentEntityBase internal field storage by removing special treatment for LANGCODE_DEFAULT, because that would make it trivial to get rid of the $translations argument.

Udated the issue summary and added a beta eval.

Still want to some performance comparisons.

Berdir’s picture

Ok. Was running the following script on a real site:


use Drupal\node\Entity\Node;

\Drupal::entityManager()->getStorage('node')->resetCache();
$nids = \Drupal::entityQuery('node')
  ->condition('type', '*****')
  ->range(0, 100)
  ->execute();

$start = microtime(TRUE);
Node::loadMultiple($nids);
$end = microtime(TRUE);

var_dump($end - $start);

The site only has a single language, but ~25 configurable fields and *a lot* of content (320k nodes), so a fair amount of that time is probably spent on the actual queries.

HEAD: 0.52s
Patch: 0.39s

I think that's pretty nice :)

Did not yet do an actual profiling of where that time is spent, that would probably also be good to know. I'v planned to do a fair amount of profiling with that site, to see how it performs.

Berdir’s picture

FileSize
283.11 KB

Ok, repeated the same with xhprof, see attached screenshot.

mapFromStorageRecords() is 64% faster and uses 44% less memory (A part of that memory will then instead be used later, but only the fields that are actually being accessed).

Also, as usual, the uprofiler/xdebug overhead is huge (even if just one of those extensions is enabled). The same script with without either of those extensions enabled is 0.28s vs 0.22s, so almost twice as fast.

yched’s picture

Yay, impressive numbers :-)

What is left to be done here, then ?

Berdir’s picture

Not much I think? @plach already mentioned that he's OK with changing/removing $translation later, I think it's up to @fago to RTBC this, assuming he's OK with keeping the map/load method names/calls as it is for now.

Berdir’s picture

Assigned: Unassigned » fago
fago’s picture

Assigned: fago » Unassigned
Status: Needs review » Reviewed & tested by the community

>I honestly don't know how to change that in something that makes more sense.
Fair enough. I do not see a good nor easy way either, so let's leave anything else #2142885: Simplify ContentEntityBase internal field storage by removing special treatment for LANGCODE_DEFAULT. This is a large clean-up anyway! So RTBC then.

>I think that's pretty nice :)
I'd rephrase that and say it's pretty awesome! :)

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 18: load-values-2137801-18.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
18.93 KB
plach’s picture

Status: Needs review » Reviewed & tested by the community
alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed fcc8056 and pushed to 8.0.x. Thanks!

Thanks for adding the beta evaluation to the issue summary.

  • alexpott committed fcc8056 on 8.0.x
    Issue #2137801 by Berdir: Refactor entity storage to load field values...
yched’s picture

Awesome !

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.