Discussed that with a couple people in Portland. We do have a serious rampant issue with dependency injection vs. serialization, we keep bumping into those issues, and we'll keep bumping into them.
- Any class that receives injected dependencies (whether by constructor injection or setter injection) typically keeps a reference to its dependencies in class members. That's how dependency injection works.
- Drupal being Drupal, it serializes stuff in several places:
- Anything that is in a $form / $form_state : the FormInterface object, but also any object whose methods are used by FAPI #callbacks ('#pre_render' = array($some_object, 'someSubmit')).
- Anything that gets cached, stored in state(), or more generally in the k/v stores...
- When an object is serialized, all the objects it references get serialized as well, and recursively, all the dependencies of those objects...
- This leads to a serialization chain of hell: #1909418: Allow Entity Controllers to have their dependencies injected hit a case where serializing the form controller meant serializing the EntityManager -> the Container -> (among many other things) the DrupalKernel. This got fixed by making sure the kernel was serializable - which is just insane :-).
Those serialization chains cause two problems:
- In the "bad" cases, some of the objects down the chain can't be serialized (like, it contains closures), and you get a fatal error.
- In the "good" cases, nothing will break, but:
- Serialization will just silently result in a needlessly huge string - e.g. serializing a plugin manager should be a no-go, since this means serializing the whole statically cached list of plugin definitions stored in CacheDiscoveryDecorator.
- This leads to subtle bugs, since on unserialize, the dependencies get unserialized fine, but are now *different objects* than the ones the rest of the code gets from the container --> many assumptions are potentially screwed (+ memory duplication issues as well).
In that sense, case 1 is better than case 2, since the latter means committing silent performance drag and subtle bugs down the road.
Bottomline is:
- Services injected from the DIC should never, ever get serialized as part of service consumers. The DIC just plays bad with the concept of serialization.
- Drupal *will* serialize stuff. Form API, notably, needs to persist stuff between form display and form submit (or "multistep" rebuild)
I don't think we can go on with piecemeal fixes of each individual occurrence of those issues each time they appear.
The answer for "DIC-friendly serialization" seems to be:
- On serialize, do not serialize dependencies.
- On unserialize, re-pull them from the container. Granted, this breaks mocking, but serialization/unserialization is not something you typically do in unit tests, so I don't see this as a real problem.
How to do this might not be too simple though. Two approaches come to mind:
- Classes that want to be "DIC-friendly serialized" need to implement an unserialize() method that hardcodes the services ids of its dependencies. That means doing something similar to the create() factory method that is currently used at instantiation time in a couple places in core (controllers, some plugins).
Hardcoding service ids in more and more classes is not a joyful perspective, though. - Crazy idea:
- Modify the DIC so that each time a service gets instantiated, the service id by which it got instantiated gets placed in a public __serviceId (or something) property on the object.
- Classes that want to be "DIC-friendly serialized" do:
- On serialize() : foreach member, if (!empty($this->member->__serviceId)) {just serialize the __serviceId string instead of the whole object});
- On unserialize() : $this->member = \Drupal::service($service_id)
In other words: No hardcoded knowledge of service ids in consuming objects. Avoid "service locator"-style pulling of the dependencies on unserialize, by making sure each injected services knows how to "re-pull" itself.
--> If the object got its dependencies injected when it was created, it will just receive the same services on unserialize.
The idea scared me a bit at first, but I don't really see where this would break, and the more I think about it, the more it sounds like a reasonable approach to bridge "dependency injection" and "serialization".
Thoughts ? :-)
Original issue by tim.plunkett
Has been moved over to #2020385: Implement Serializable for entity controllers after I (yched) outrageously hijacked the issue :-(
Comment | File | Size | Author |
---|---|---|---|
#87 | serialize-2004282-87.patch | 12.6 KB | smiletrl |
#87 | interdiff-84-87.txt | 602 bytes | smiletrl |
#84 | serialize-2004282-84.patch | 12.59 KB | dawehner |
#79 | serialize-2004282-79.patch | 12.56 KB | dawehner |
#79 | interdiff.txt | 5.47 KB | dawehner |
Comments
Comment #1
fagoShortly discussed that with crell, who said that's the best we can do. He suggested going with the Serializable interface though.
Comment #2
yched CreditAttribution: yched commented[edit: copied in the OP]
Comment #3
Crell CreditAttribution: Crell commentedBon Dieu...
Drupal serializes a crapload of stuff, that's true. Is there any way we can reduce what gets serialized? And/or only serialize pure data objects, at least in some cases, so that we side-step this problem? (Pure data objects = no dependencies, like the Request object.)
I'll ping some Symfonians and see if they have insights.
Comment #4
Stof CreditAttribution: Stof commentedthe clean way is to serialize only data-objects. This would solve most serialization issue (except complex references in these data-objects if the same object appears in several places though).
As soon as services appear in the object graph, a proper unserialization needs to get the current instance in the container, which cannot be done in PHP (well, it could be hacked in drupal thanks to the globally accessible container, as suggested by @yched, but it is far from being clean).
Note that Symfony itself faces this issue. See https://github.com/symfony/AsseticBundle/issues/168 for instance, which cannot be solved as long as the serializable resource need to be able to use Twig. This is why https://github.com/symfony/symfony/pull/7230 exists to try finding an issue.
Comment #5
yched CreditAttribution: yched commented"Only serialize pure data objects":
I'm afraid this ship has sailed.
- The whole FormInterface construct rely on serializing the FormInterface object, which is typically a (request) controller,
- Entity forms more specifically serialize the EntityFormController (controller in the 'entity' sense, nor "request"),
(+ the entity itself, and the EntityFormDisplay (config entity) - but those two currently can't receive injected dependencies)
"Limiting what gets serialized":
Not sure that's sustainable - at least it hasn't been so far :-).
The problem is also that supporting '#callback' => array($some_object, 'someMethod') (possibly set by 3rd party code in a form_alter()) arbitrarily broadens the scope of what can end up being serialized.
The existence of a DrupalSerializableInterface, based on the serialize()/unserialize() behaviors described in #2, might help narrowing the set of objects we actually support here...
And that's only for forms. Not sure about serialization / caching of render arrays, or caching of other stuff, or things placed in state / k/v...
Comment #6
dawehnerThanks for writing that down.
Another level of complexity is created if we talk about context of plugins instead of simple dependency injection.
Just an idea: If we have a custom serialization interface form_get_cache could look like:
Comment #7
fagoAlso - our primary data objects (entities and fields) contain logic and should not stay pure, we want to inject dependencies there as well. So I agree that re-pulling services from the global DIC is the best we can do.
Comment #8
yched CreditAttribution: yched commentedSo I should probably have created a separate issue for my rant in #2, but it has now effectively taken the issue over. Sorry about that @tim.
I went ahead and moved #2 to the OP, and adjusted the issue title accordingly.
I created #2020385: Implement Serializable for entity controllers to keep track of the original need.
I also think this is at least major, possibly critical.
Comment #8.0
yched CreditAttribution: yched commentedIssue hijack...
Comment #9
yched CreditAttribution: yched commented@amateescu pointed at another nasty occurrence - PHP 5.4 only : #2025451: Drupal\entity_reference\Tests\EntityReferenceAdminTest fails on PHP 5.4.9 and libxml 2.9.0
Comment #10
amateescu CreditAttribution: amateescu commentedBecause I promised @yched and to get it off my mind...
Note that this does *not* fix #2025451: Drupal\entity_reference\Tests\EntityReferenceAdminTest fails on PHP 5.4.9 and libxml 2.9.0 and that it should also be used by #2059245: Add a FormBase class containing useful methods.
Also, I deviated a bit from the proposed resolution in the OP by creating a base class. I think this is much more useful than writing it again and again in each "DIC serialization friendly" class.
Comment #11
amateescu CreditAttribution: amateescu commentedUmm, no, PluginManagerBase is in Component.
Comment #13
dawehnerThis can't work because the compiled container uses a totally different class.
Here is an alternative approach.
Comment #15
dawehnerAdditional I also think that we should just store all the services in one key, so unserialization is a bit faster. Just iterate over $this->_services and get it from the container.
Comment #16
amateescu CreditAttribution: amateescu commentedWhat exactly will be faster, is there a getMultiple() that I'm not seeing?
Comment #17
dawehnerYou don't have to iterate over each entry of the class, but just over the actual services.
This time with an actual working implementation.
Comment #18
amateescu CreditAttribution: amateescu commentedAnd now the serialized string has a "magic" __services property and the code did not get prettier IMO. Maybe some profiling would help to see if iterating over each member of the class is actually a problem first?
Comment #20
BerdirAs discussed in IRC with @dawehner, I agree with using a separate property to store the serialized IDs, mostly for two additional reasons:
- It just feels wrong to replace properties that are documented as \Drupal\SomeClass with a string, even if it's never visible during runtime. That PHP allows it doesn't mean we should mis-use it IMHO.
- It could lead to very crazy bugs: $this->useBackend = 'database'. And on unserialize, that's going to be replaced by the database connection :)
Either document __services as a property or don't bother with conditionally setting/unsetting it. Not both :)
That would simplify the code quite a bit.
Comment #21
dawehnerGood point, here is a simplification of that point as well as fixing the bugs which appeared.
Comment #22
yched CreditAttribution: yched commentedThanks so much for attacking this folks, I intended to, but I'm swamped with the Field API queue...
When we discussed it with @Crell in NYCCamp, he was very reluctant on having \Drupal::getContainer() in tons of classes (all those who extend DependencySerialization).
The alternative we can up with was (functions / method / interface names are mere proposals):
- Introduce custom drupal_serialize() / drupal_unserialize() functions
Those functions are used in place of PHP's serialize() / unserialize() by code that wants to serialize objects in a "DI safe" manner (e.g form.inc)
They take care of recursion in arrays & objects
- They trigger custom DrupalSerializableInterafce methods in objects rather that PHP's Serializable::serialize() / unserialize()
- drupal_unserialize() does
$container = \Drupal::getContainer()
and injects as a param in DrupalSerializableInterafce::unserialize()The objects themselves are then not container aware, drupal_unserialize() says "repopulate yourself with this container"
Comment #23
dawehnerWow this sound quite a lot of more complexity to just not use Drupal::getContainer() in a single base class.
Let's see whether the maintainers will accept that API change.
Comment #24
yched CreditAttribution: yched commentedWell, it's a base class so the method will actually live in lots of objects...
I personally tend to send see having classes pull the container *in the specific context of unserialization* as not really problematic either, but Crell was strongly against.
Comment #25
dawehnerWe could get this patch in and then iterate on top of it. We need all of the logic in the patch anyway ... Discussions for the drupal_serialize API change will come up etc.
It seems okay to use Drupal::methods for static methods (we use it all over the place in entities), so it should be kind of okay for this special methods as well.
Comment #26
yched CreditAttribution: yched commentedWe should test the existence of $this->__services, it won't be set if serialize encountered no services.[edit: ignore me, I'm stupid]
Nitpick: "dependency injection friendly methods for serialization" ?
I guess this is added only as proof that the patch works, but should not be part of the commit ?
Comment #27
yched CreditAttribution: yched commentedNitpick again: for consistency with the __serviceId property added in services, the __services property used to store service ids in serialized objects could be named __serviceIds ?
Comment #28
dawehnerI like that
Let's add some unit tests to ensure it works as expected.
Comment #30
dawehner#28: drupal-2004282-28.patch queued for re-testing.
Comment #31
dawehnerForgot a bit of the review..
Comment #32
dawehnerForgot the interdiff.
Comment #33
yched CreditAttribution: yched commentedWe should probably see if this lets us remove the serialize() / unserialize() stuff that had to be added to DrupalKernel in #1909418: Allow Entity Controllers to have their dependencies injected ?
Comment #34
dawehnerThis does not work, because the drupal kernel is initialized outside of the container world, see
Comment #35
yched CreditAttribution: yched commentedYes, but i thought maybe the fact that we stopped serializing services means in turn we stop hitting a case where we find ourselves serializing the kernel that happens to be referenced in one of those services ?
Point being, I'd tend to consider that being in a situation where you find yourself serializing the kernel is the sign that you're really doing something wrong, we should let that fail instead of making it silently work.
Comment #36
dawehnerLet's see what the bot things of doing it.
Comment #38
yched CreditAttribution: yched commentedDrupalKernelInterface should stop extending Serializable then ?
Comment #39
dawehnerGood catch. It would have been nice to throw something if someone really tries to.
One open question is: what kind of classes will use that baseclass. We could potentially go with ControllerBase, the potential FormBase as well as PluginBase ...
Comment #40
yched CreditAttribution: yched commentedTrue - I'd also be in favor of throwing an exception if someone tries to serialize a plugin manager, that's an insane thing to do...
Well, ControllerBase holds a ref to the container, so safe-serializing it would even be a little more involved - I'm not sure we really have cases were a controller would end up serialized though ?
Forms most definitely, they are the #1 thing on the top of the serialization chain.
PluginBase would be a good thing too, EntityControllers... - basically anything that can have a create() factory method ?
Comment #41
dawehnerRight, controllers should not life outside of the http kernel. PluginBase is a bit complicated as it is part of the components so you would have to create a new PluginBase which extends From SerializationBase
and copy all the code.
Comment #42
yched CreditAttribution: yched commentedTrue... Still, better to do that in a Core/Plugin/PluginBase (extending Component/Plugin/PluginBase) rather than provide nothing and have each plugin type [edit: each plugin implementation class, actually] having the same dilemma (copy methods from PluginBase or from DependencySerialization) ?
Comment #43
dawehnerWhat about making a new follow up for talking about the case of plugins? This potentially will change all of the existing plugin base classes etc.
Comment #44
yched CreditAttribution: yched commentedYes, sounds more reasonable...
Comment #45
dawehnerOpened a follow up #2070309: Figure out how to solve serialization hell for plugins
Comment #46
tim.plunkettI'd REALLY love to have this wait just a little bit for #2059245: Add a FormBase class containing useful methods (which is almost done).
Then FormBase can extend this and be done with it.
Comment #47
yched CreditAttribution: yched commentedIn other news, we've been facing worrying behavior differences with Serialize interface between PHP 5.3 & 5.4 :-(
[edit: doh, forgot to link to the issue... It's the one @claudiu.cristea links in the next comment]
Comment #48
claudiu.cristeaRelated: #2071969: Serialize interface not reliable.
Comment #49
tim.plunkettRerolled for FormBase
Comment #50
dawehnerJust a normal rerole.
Comment #51
BerdirPer @chx, it is not allowed to use __, that is reserved for PHP. Use _serviceId instead.
Comment #52
chx CreditAttribution: chx commentedIt's more of a convention than anything else but this convention is one we want to keep.
Comment #53
chx CreditAttribution: chx commentedA few more nitpicks:
+ * Provides a dependency injection friendly methods for serialization.
a is not necessary
$object->{$key}
when there is no ambiguity on the meaning of the expression{} is not needed. $foo->$bar['baz'] would mean that $bar is an array and $bar['baz'] is calculated first but $object->$key can't be misinterpreted.
Finally, why does serialize run a clone and two get_object_vars...? Why it doesn't run a single get_object_vars and manipulate the array to its heart content?
Comment #54
dawehnerGood points chx. Here are some fixes.
Comment #55.0
(not verified) CreditAttribution: commentedlink to original issue
Comment #55.1
pfrenssenFormatting.
Comment #56
smiletrl CreditAttribution: smiletrl commentedI didn't see why #54 fail, but here's a code clean try. The first failed test in #54 passes locally with this new patch.
Comment #57
dawehnerThis is indeed quite simpler!
Comment #58
twistor CreditAttribution: twistor commentedNit. Normalize the spelling of _serviceId vs. _serviceIDs.
I think either $var['_serviceIDs'] needs to be initialized to an empty array here, or $this->_serviceIDs needs to be set to an empty array at the end of unserialize. I prefer the former so that people cannot mess with $this->_serviceIDs.
It's possible that an object will have different stored services during two different serialize() calls.
Comment #59
tim.plunkettI guess this nees some more thought after #2071969: Serialize interface not reliable
Comment #60
dawehnerRewrote it using __sleep and __wakeup, though the problem is that __sleep just works with member variables, so you can't just put arbitrary values on objects anymore which I think drupal does here and there.
Comment #61
yched CreditAttribution: yched commented__sleep() looks wrong, its return value should be the array of property names that will be included in the serialization:
array('foo', 'bar', '_serviceIDs');
(yes, __sleep() is a bit weird...)
Comment #62
dawehnerOh.
Comment #63
smiletrl CreditAttribution: smiletrl commentedI think the latest patch should look this:)
Edit:
$keys should unset the services key here.
Anyway, I don't think __sleep / __wakeup is a good idea here imho.a) Patch introduced at #2071969: Serialize interface not reliable is to fix the serialization issue when it comes to object property that references to another obejct implementing method serialize() too. But in our case here, we are trying to avoid serialize this kind of properties( dependency services), so we use unset() to eliminate these properties from serialize process. The only benifit I could see is to avoid other properties, except the services, have potential problems with serializable interface.b) Maybe some properties, except the eliminated services, have been set private in declaration, which will lead to following problem:Considering a) and b), I'd like to preserve serializable interface here and dig it at #2074253: Fatal error when using Serializable interface on PHP 5.4:)Comment #64
dawehnerWell, using private properties is considered as practice anyway, so I don't see a big issue with that.
Comment #65
yched CreditAttribution: yched commented@smiletrl: your reasoning only holds when the object using DependencySerialization is the top-level object being serialized, not when it's being serialized as part of recursive serialization of a larger array ($form, $form_state...).
Also, this patch here only unset()s referenced objects *that are services*, not other objects that might be referenced in the serialized object. Those will still be serialized as usual.
So no, I do think Serialize interface is unsafe here.
Comment #66
smiletrl CreditAttribution: smiletrl commented@yched, yeah, right. I missed a lot here:)
Comment #67
dawehnerSo @yched, do you think this issue is ready to fly?
Comment #68
dawehnerAs this fixes a lot of test failures of #2068471: Normalize Controller/View-listener behavior with a Page object this is also critical.
Comment #69
yched CreditAttribution: yched commentedMostly nitpicks:
Uber nitpick, feel free to disagree & ignore:
FooBase is what we use for base classes, so feels a bit weird for an alias to the symfony Container.
Maybe SymfonyContainer ?
(same in ContainerBuilder.php)
As @twistor mentioned in #58, I think we'd need to reset $this->_serviceIDs to an empty array at the beginning of the method (case of successive serialization of the same object in possibly different states). Emptying it after it has done its job in __wakeup() might be a good idea too (needless clutter)
(Also agreed with his remark that _serviceIDs should be _serviceIds for consistency with $service::_serviceId)
$this->dependencySerialization is created and used all within the testSerialization() test method, so maybe it doesn't need to be a property in $this and we can save documenting it ?
unfinished sentence ?
To better demonstrate the behavior, should we:
- not explicitly set $service->_serviceID
- create the test object with new TestClass($container->get('test_service'), 'example') ?
- test that $service->_serviceId == 'test_service' after that ?
Shouldn't this be assertIdentical() ? (what we want to test is identity, not just equality)
Comment #70
smiletrl CreditAttribution: smiletrl commentedThis is trying to address concerns at #69.
It might be ok to preserve the _serviceIds property for object, after it's been serialized: a) code could use _serviceIds to target all injected services quickly after serialization; b) Currently, _serviceIds only works with serialization, and it would be initiated as an empty array before serialization (in __sleep). So, I guess it's not necessay to empty it again.
Rewrite TestClass to make use of the popular 'create' method to inject dependencies.
Haven't tested it locally, see if bot likes it:)
Comment #72
yched CreditAttribution: yched commentedThanks @smiletrl.
- No need for the create() factory method in the test, this is completely unrelated to the code being tested, and it makes the test harder to read for no reason. Creating an object with new Class($dependency) is perfectly legit, the "create() factory method" is nothing more than a pattern that core uses in a couple places where it makes sense, but it doesn't here.
- I still vote for emptying _serviceIds at the end of wakeup(). This patch is only about ensuring correct unserialization, not about providing "oh, when an object has been unserialized, you can find this _serviceIds property in it and do interesting stuff with it" - that's not a behavior we want to support and maintain.
So -1 to introducing a "state" about "the object has been through unserialization". Let's cleanup after ourselves.
- And still +1 for emptying _serviceIds at the beginning of __sleep() too. An object continues its life throughout the request after serialized has been called on it. A second serialization of the object later in the request should not build on the _serviceIds left by the first serialization, the object might have changed state (i.e $this->some_service that was set the 1st time may now be NULL). Might not cause actual bugs with the current code, but it's much safer to make sure of it. No reason to persist a state of _serviceIds that was only relevant at the specific time the object was serialized.
Comment #73
dawehnerComment #74
smiletrl CreditAttribution: smiletrl commentedThanks for clearing 'create':)
Some other nitpicks:
1).
_serviceIDs VS _serviceIds.
2).
As @yched mentioned at #69, point 5), I agree here should use $container->get('test_service') as paramter injected into TestClass. Otherwise, the mocked $container is useless here.
3). As mentioned above,
+ $service->_serviceId = 'test_service';
should probably be deleted and it should be tested after service gets injected. This property _serviceId will be assigned value at $container->get(), which is exactly what we create a new container for:
I guess code at #70 has done it properly, i.e. deleting it and testing it,?
4).
Perhaps emptying, i.e., $this->_serviceIds = array(); is a better idea over unset? "_serviceIds" is an intrinsic property of
abstract class DependencySerialization
other than some arbitrary property assigned during its life cycle. And the default value has been assigned an empty array. I guess it's better to keep consistency here, which is the same case at the beginning of __sleep().5).
If we use $this->_serviceIds = array(); at __wakeup(), then this unset code could be deleted.
Thhis could be either avoided, or something like assertEmpty?
6).
This property has alreay existed:) Maybe set it public in following code to aviod declaration again?
Comment #75
dawehnerHere is an extra test file, I forgot yesterday evening, note: we do a unit test, so we should just test the dependency serialization not the container.
I can't see, how this should work without the code you mentioned.
I unsetted it because I though that people should just not even try to rely on the existance of it, what about setting it to NULL by default?
Let's not sacrifice actual running code just to make tests a bit easier.
Comment #77
yched CreditAttribution: yched commented- Yeah, sorry, my bad - $dependencySerialization = new TestClass($container->get('test_service', 'example'); cannot work, since the test uses a mock container, so Drupal\Core\DependencyInjection\Container::get() does not actually run and $_serviceId is not automatically assigned.
And this is fine - one test for the behavior of serialize/unserialize($object_with_dependencies) wrt $_serviceIds, one test for the behavior of the container wrt $_serviceId.
- +1 on keeping _serviceIds protected and default to NULL in the property definition, and unsetting it back to the same NULL value at the end of __wakeup().
- Sad panda on last patch switching back to _serviceIDs though :-).
- Nitpick: DependencySerializationTest::getInfo()
'name' => 'Service dependency serialization.',
should have no trailing dot.
(same in ContainerTest)
- Nitpick: ContainerTest & DependencySerializationTest
+ $service->key = 'value';
This doesn't seem to serve any purpose, we now check assertSame() anyway.
Comment #78
yched CreditAttribution: yched commentedoops, crosspost with testbot :-/
Comment #79
dawehnerHa, I really got confused that we switched to ID.
Fixed all your points, let's see whether this broken test is reproducible.
Comment #80
yched CreditAttribution: yched commentedYay, looks good and passes locally :-).
This works for me.
Comment #82
smiletrl CreditAttribution: smiletrl commented#79: serialize-2004282-79.patch queued for re-testing.
Comment #84
dawehnerHa!
Comment #85
smiletrl CreditAttribution: smiletrl commentedThis tells it's really necessary to empty _serviceIds in the beginning of __sleep:)
Comment #86
yched CreditAttribution: yched commentedHah :-)
Comment #87
smiletrl CreditAttribution: smiletrl commentedMinor change.
Comment #88
catchIt's not great that we have to do this, but I can't think of a better solution and what's done here is very readable/predicatlbe. Possibly some other class which can't inherit from this one might need those methods, but then it can copy and paste, or we can add a trait later in core or contrib.
So... committed/pushed to 8.x, thanks!
Comment #89
yched CreditAttribution: yched commentedJust wondering - does this mean we are possibly considering requiring PHP 5.4 before release ?
Comment #90
catchWe could add a trait without a PHP 5.4 requirement, by not using it in core.
Comment #91
dawehnerThis would be a great idea in general though I am wondering whether the amount of people who complain that they there are things in core which does not work with 5.3
Comment #92.0
(not verified) CreditAttribution: commentedMore formatting.
Comment #93
XanoFollow-up: #2142515: Provide a method of DI for entities.