Problem/Motivation
The Workspace module (#2784921: Add Workspaces experimental module) will be making heavy use of entity revisions by swapping each loaded entity with the revision assigned to a specific workspace. This will add one database query for each entity that is swapped, because we only have a "single load" method for revisions.
Proposed resolution
Add a loadMultipleRevisions()
method to the entity storage.
Remaining tasks
Review.
User interface changes
Nope.
API changes
API addition: a new loadMultipleRevisions()
method is added to \Drupal\Core\Entity\EntityStorageInterface
and all the storages provide a base implementation for it.
Data model changes
Nope.
Original issue summary
The ::load method always for any number of conditions, but if a condition match the revision id field, it is treated special and converted to the $revision_id variable. This is then passed off to buildQuery. This is then used to join in the revision table.
The problem is that the current way allows for only a single revision id to be pass. I feel this is contrary to the fact that the load method is a mulitple entity load method. If we allow an array for revision_id, in the buildQuery method, this would allow for a multiple load, when load is called with many revision_ids.
Our particular case, is that we are implementing a wrapper controller class, that first looks up what revision should be queried, before passing it off to the particular entities load. and we need to be able to do this for multiple entities at one time.
Comment | File | Size | Author |
---|---|---|---|
#51 | interdiff-51.txt | 6.74 KB | amateescu |
#51 | 1730874-51.patch | 26.46 KB | amateescu |
#46 | interdiff-46.txt | 3.31 KB | amateescu |
#46 | 1730874-46.patch | 26.01 KB | amateescu |
#44 | interdiff-43.txt | 4.4 KB | amateescu |
Comments
Comment #1
e2thex CreditAttribution: e2thex commentedfirst patch
Comment #2
e2thex CreditAttribution: e2thex commentedHere is a clean version
Comment #3
e2thex CreditAttribution: e2thex commentedtagging
Comment #4
e2thex CreditAttribution: e2thex commentedadd lsd tag
Comment #5
BerdirNew features need to be added to 8.x first and then backported.
Note that #1184272: Remove deprecated $conditions support from entity controller completely revamps how revisions are loaded but it doesn't allow for multiple revisions either.
Comment #6
Berdir#2: 1730874.patch queued for re-testing.
Comment #8
googletorp CreditAttribution: googletorp commentedIt seems like there is something wrong with the patch format from #2, so tried to remake the patch.
Comment #9
BerdirIt's not that easy anymore. There's now loadRevision() on the controller, which returns a single revision explicitly.
I was considering making it a multiple operation, but decided against that as the required change would imho become too big for that other issue.
So, what needs to happen here to make this possible:
- loadRevision() should accept an array of revision id's, with a type hint.
- entity_revision_load_multiple() should be added, entity_revision_load() should become a wrapper of that function
- The attachLoad() stuff probably needs to be refactored, the revision argument removed, we should instead check each entity if it's the current revision (isCurrentRevision()), create two arrays of entities and call the field attach function accordingly.
- And of course, this needs tests.
Comment #10
disasm CreditAttribution: disasm commentedIf I understand this issue correctly, the entity field query should take an array of revision id's. As such, I've created a simple test that passes an array of 2 revisions to EntityFieldQuery and expects an array of 2 entities to be returned. If I understand Berdir correctly, the current attempt at writing patches falls short of implementing it, so I'm not including the original in my patch, just a test only.
Comment #11
disasm CreditAttribution: disasm commentedComment #13
indytechcook CreditAttribution: indytechcook commentedThis is a start but not working yet.
Comment #14
BerdirShould be "IN (:revision_ids)"
You're looping over the entities, check the one you loaded and then call field_attach_load() for all entities. Which means that it will be executed multiple times.
What you probably should do is loop over them, create two arrays, one that is revision-specific and one that isn't. Then call the correct field attach function for each if the array is not empty.
In the long term, we should probably get rid of the second function and deal with this within field_attach_load().
Remember to set issues to needs review if you upload a patch. If you don't want to have tests run on it, you should append the -do-not-test.patch suffix. The problem with keeping it at needs work is that all patches will be sent to the testbot once it's set to needs review so you will have to wait longer once you're actually interested in the results :)
Comment #16
indytechcook CreditAttribution: indytechcook commentedThanks @Berdir.
I've updated the patch. I wasn't ready to have it reviewed but having the tests run by the bot instead of locally is a good idea. Let's see where we are at. I'm sure tests will need to be updated and the test from #10 needs to be added
Comment #18
attiks CreditAttribution: attiks commentedNeeds backport to 7.x
Comment #18.0
attiks CreditAttribution: attiks commentedfilled out desc
Comment #19
Berdir#2095283: Remove non-storage logic from the storage controllers removes the weird $revision_id argument from the attach function, would however still only support loading either revisions or the latest version of all entities, although that might actually be fine.
If someone wants to work on this then this should keep the loadRevision() method and instead add a new loadRevisions or loadRevisionMultiple() method (kind of weird here, but it's a common pattern, so not sure)
Comment #24
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedHere's a fresh patch, written from scratch in #2912562: Add support for loading multiple entity revisions at once so there's no interdiff. Includes test coverage and everything, so that tag is no longer needed.
Comment #25
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedOops, forgot to attach the patch :/
Comment #26
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedI just looked at the original patches from this issue and I like this hunk more than what I had.
Comment #27
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedAny chance someone wants to review this patch? It would be a big performance improvement for #2784921: Add Workspaces experimental module.
Comment #28
hchonov@amateescu, it looks good and is pretty straightforward change.
I've encountered just some minor nits ...
What about naming the parameter $entity_key?
I would use "multiple entity revisions".
"..or an empty array..."
Could we move this method to ContentEntityStorageInterface? I know it is probably placed there because of the method ::loadRevision() living in this interface as well, but actually regular entities have nothing to do with revisions.
instead of negating we could exchange the places in the ternary operator?
As this is a protected method we are allowed to enforce that the parameter is an array. Why do we have to support passing a single value?
Comment #29
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented@hchonov, thanks for the review!
Re #28:
1, 2, and 3: Fixed.
4: IMO it's better to be consistent and offer the method at the same place where we have
loadRevision()
. We could open a followup if you want to move them out to a separate entity storage interface :)5: I've used the negated condition because I wanted to be consistent with all the other similar checks that we have in that file..
6: That's because I wanted to provide full BC support for contrib/custom code which might be calling that method. However, after looking a bit more into it I noticed that there was a flaw in the return value, easy to fix though.
Comment #30
hchonov4. I thought so :). I guess we'll do this in D9, so nothing to do or discuss here.
5. Ok.
6. Let's then trigger a deprecated error in case the method isn't called with an array e.g.:
Comment #31
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedSure, let's do that :)
Comment #32
hchonovI am sorry, but I've noticed now that the last part of the sentence doesn't sound that good - "do one by one.", but we'll have to rewrite anyway because of the next point.
Let's put the entity revisions in an array keyed by the entity ID and execute both
::invokeStorageLoadHook
and::postLoad
with that array, as this way we'll havecount($revisions) - 1
less calls to both methods, which means<code>count($revisions) - 1
less hook invocations and entity class postLoad invocations :). Otherwise we'll invoke all the methods for each entity revision one by one, which doesn't make sense as the executed methods expect an array of entities anyway. Needs work for this.Comment #33
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedWe can't do that because the hook implementations expect a structure like this:
and with your proposal we would give them a structure like this:
That would break every hook_entity_load() implementation..
I've re-worded the comment, hope it's more clear now :)
Comment #34
hchonovOh ok, this is the case where we are loading multiple revisions for the same entity. But what if we are not? Could we build groups of entities for which to call the hooks together? What I mean based on your example would result into:
and
This will be still faster as in this case we'll execute the hook logic only 2 times instead of 4 times.
Comment #35
hchonovI suggest something like this:
Comment #36
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedTBH I think that's an unnecessary optimization and it makes the revisions you receive in the hook implementation look a bit arbitrary. Consider this case:
The hook implementations will receive two objects on the first call and just one on the second call simply because they are revisions of a different entity. That looks a bit awkward IMO but I'm interested to hear what others think about it.
Comment #37
hchonovWith your last example we'll still have one hook invocation less. But this is actually a rare use case.
I was thinking that this functionality makes perfect sense for the module entity_reference_revisions and the use case there is that you usually reference different entities, in which case this optmization makes perfect sense. I see the use case of loading different revisions for the same entity as the use case that will not occur that often like the other one.
Paragraphs relies on entity_reference_revisions and there this optimization makes perfekt sense as well.
Comment #38
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedRight, the entity_reference_revisions use case is covered pretty well by your approach. Let's do it then :)
Comment #39
hchonovThank you! It looks better now and ready to go :)
Comment #40
larowlanminor nit: but thing readability would be aided here if we named the variable revision_id rather than id, so it is clear what id we're keying it by
We need a change record and this needs to reference it
Comment #41
BerdirThe implementation here seems fine. I'm a bit worried about the minor changes to the protected methods. As mentioned by @larowlan, we should have a change record that lists all those changes, in case someone has been overriding the methods. See also below.
Worth documenting explicitly that they will be keyed by revision ID?
I see you're also added the deprecation exception for tests, we should possibly also have a follow-up to remove that, I think that's what's been suggested in #2607260: [meta] Core deprecation message introduction, contrib testing etc..
Also, this deprecation message only covers one side of the equation, in case someone calls it with a single revision ID. but if there's an implementation expecting a single value, it would fail.
The thing is, we have two options:
1. What this patch does, make those slight changes to the arguments, which could break subclasses if they rely on those arguments.
2. Add new methods, deprecate the old. But that would mean we would no longer call those methods (at least for loading revisions, maybe both). and that would break overriddes even more.
Given that, I think those changes are acceptable (aka option 1), changing protected methods is allowed, even though most of those methods are specifically designed to be overridden. I think the most problematic one is buildQuery() because we actually add a new argument there?
Comment #42
hchonovRe #40.1
If we rename it then to
$entity_id
as it is the ID of the entity and$revision_id
usually is used to reference the revision ID of an entity.Re #40.2 & #41:
Our BC policy states the following about protected methods:
The methods might be overridden, but still we've defined in our BC policy that they might change and we're not making promise to provide BC layer when they change. The latest patch contains a BC layer for
SqlContentEntityStorage::doLoadRevisionFieldItems()
even if it isn't required to.Re #41 about
buildQuery
:Actually we don't add a new parameter to
buildQuery
but as of now require that it is an array instead of a single value exactly like the change indoLoadRevisionFieldItems
. We could probably add there the same BC layer by converting a single value into an array and trigger deprecation notice exactly like inSqlContentEntityStorage::doLoadRevisionFieldItems()
.I do understand that it is nice to add the BC layer whenever possible, but I am not sure that we have to do it even at places where we're not required to.
Comment #43
Berdir> I think that's what's been suggested in
Looks like the latest approach there is to have an automatic detection based on the version specified in the message.
Comment #44
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedFixed #40 and #41 and here's the change record: https://www.drupal.org/node/2924915
Comment #45
hchonov@amateescu, what do you think about adding the same BC layer to
buildQuery
and mentioning the change to its parameter in the change record as well?Comment #46
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedSure, let's do that: https://www.drupal.org/node/2924915/revisions/view/10725999/10726441
Comment #47
hchonov@amateescu, thank you for addressing all the remarks since #40.
Comment #48
catchIt'd be nice to do without the continue 2 if we could, although also this is pretty compact so it's not too bad. What about the outer loop as an array_walk() or something? If the nested foreach is the best option that's OK, but didn't see it discussed.
I couldn't find much else to complain about.
Comment #49
hchonovWhat about something like this:
Comment #50
catchThat's much better, CNW for that change.
Also I'm not sure about this change - it's an abstract method on a base class, so we're expecting subclasses to implement it. Should we be adding doLoadRevisionsFieldItems() and deprecating this method instead of changing the signature? This is a bit borderline with the bc policy tbh.
Comment #51
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented@catch, sure thing, I don't mind adding a new method for that, it even allows a cleaner BC implementation :)
Comment #52
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedIf we're happy with the interdiff from #51, we need to update the change record to mention the new method.
Comment #53
catch#51 looks good to me, thanks for adding the new method and it does seem a bit cleaner.
Comment #54
BerdirAgreed, looks good to me as well.
Comment #55
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThis is the part that concerns me the most tbh, so I opened an issue to discuss it better: #2926540: Split revision-handling methods to a separate entity storage interface
I also updated the CR.
Comment #56
hchonovNice. Let's get this in :).
Comment #57
catchCommitted 0b90895 and pushed to 8.5.x. Thanks!
Comment #59
penyaskitoThis commit made the lingotek tests fail. I'm still researching why, but maybe it introduced a BC issue.
Comment #60
penyaskitoOK, this is fixing an undesired bug at the same time, that somehow my tests were relying on:
Before this commit,
entity_revision_load('node', NULL)
would return the first node revision. After this commit, would return NULL which I would expect.Comment #61
hchonov@penyaskito, I guess everything is good then and there is nothing to worry about.