This is either brilliant or moronically stupid. At this hour I'm not sure. I'll lay it out anyway.
Problem/Motivation
1) We were originally planning to store the configuration of what handler is mapped where by default in whatever system the Configuration Management Initiative came up with. However, that has yet to materialize. I do not know when it will.
2) While relying on that model does give us a, theoretically, nearly O(1) setup process, it does introduce a dependency on the CMI API. It also means we have to figure out how one registers stuff for that initial level. How do modules inject new stuff there? We have been punting on that so far...
3) As we can see from the Http handler, some handlers may potentially have non-trivial configuration. That is also true for the just-committed HandlerPathSystem, which needs to tie into the path system (currently hard coded functions) in order to do its job. Ideally we want the path resolution system to be an object that we can inject to it, but that's not possible if our only configuration option is the $params array.
4) As others have noted, the $params array is yet-another-giant-hard-to-document-Drupal-array-dear-god-I-thought-we-were-trying-to-kill-thse.
5) Registering a class name by string and a serialized array to pass to it is so... PHP 4.
6) There's an open feature request to add already-instantiated handlers: #1290360: Ability to set already spawned handlers
Proposed resolution
Go watch http://www.slideshare.net/fabpot/dependency-injection-in-php-5354 before continuing. Pay special attention to slide 47-77 or so. Note what it does with anonymous functions to lazy-create objects. (Ignore the use of __get() for now.) That's kinda hot. (Note from ChX: you can understand what we are doing without watching those 30 slides.)
So suppose we use that as our handler registration mechanism instead? Screw CMI.
To wit, we would do the following:
1) The really base-level handlers (http, path, and other things that core needs to provide) would get hard coded into the bootstrap instead of hard coded into system.module's context_init hook.
2) hook_context_init() actually stays more or less as it exists now rather than being euthanized once CMI is in place.
3) Instead of registering a handler like this:
$context->setHandler($context_key, '\Name\Of\Handler', $array_of_configuration_we_hope_is_enough);
We do it like this:
use Name*Of*Handler as Handler;
$something = "Something very important";
$context->setHandler($context_key, function ($context) use ($something) {
return new Handler($something);
});
That way, the Handler still doesn't get instantiated until it gets used but we have dramatically more flexibility in terms of how we configure it. Instead of a $params array, the constructor can do whatever we want, even accepting other objects, taken from the scope at the time the handler was registered. What gets instantiated immediately is the closure object, which in practice should be fairly small.
4) We start up the context system earlier, even before the config system initializes. That makes it possible to use the context system even before the concept of modules or even configuration exists in the system. (Maybe CMI could access context to determine where to pull context from? Like, say, language???)
5) After the module system initializes, we add another context object to the stack and pass that through hook_context_init(). Those modules that have something to add/override can add stuff then.
In essence, we move the context object from being halfway to a data dependency injection container to being a complete data dependency injection container.
Advantages
1) No CMI dependency. CMI could even depend on context if it really wants to, but that's up to Greg. Really, no dependency at all.
2) Much more flexibility in terms of how we create and configure handlers.
3) The code should be much easier to read and follow, since everything will be in code, not class names in strings (which IDEs can't help you with). Also, fewer arrays of doom. Check http://drupal.org/files/1331504-closure-handlers.patch for how simpler the init code becomes.
4) We get to leverage those hot and sexy new anonymous functions in PHP 5.3.
Disadvantages
1) We still have a bootstrap hook, and it will still get used. That means more module files get loaded. Many of those will get loaded anyway, and there's other work going on to make those module files smaller so it's less of an issue, but they still get loaded. (Arguably this is no worse of a problem than hook_boot() is now.)
2) Initialization is more CPU cycles than just sucking strings out of a config file. However, if the config files are using XML or requiring a database, then maybe that would be a good trade off. I don't think we can benchmark this yet, so we'll have to go with reasoned analysis.
3) I don't know how much memory a small closure object takes up, but I suspect this approach will use slightly more RAM than what we're doing now. Not dramatically, but a small amount more.
Remaining tasks
Assuming the above is a good plan, I can try to crank this out this weekend.
API changes
See above.
So, Hot Or Not?
| Comment | File | Size | Author |
|---|---|---|---|
| #18 | 1331504-closure-handlers.patch | 25.8 KB | Crell |
| #4 | 1331504-closure-handlers.patch | 3.6 KB | Crell |
Comments
Comment #1
Crell commentedIn IRC chx noted that passing $context into the closure is not really necessary. We can't really use it, but we will be passing it to getValue() a few microseconds anyway. So let's leave that part off and just do:
Comment #2
chx commentedMy understanding of closures is at http://drupal4hu.com/node/291 . Pay attention to the simple_printer. I think I like this.
Comment #3
Crell commentedI have not updated the unit tests for this yet, so those won't work, but this is the engine-level change. It's actually quite small.
However, it occurs to me that catch's request earlier tonight in #1323210-16: Discoverability and documentation would suddenly become possible if, instead of creating the request object internal to the Http handler, we create it once and then pass it into the closure (the $something above), and pass it into multiple handlers that handle parts of the Http request. To wit, someething like:
(Mind you, I'm not sure I agree with that request but this would make it possible to do. :-) )
The more I think on this, the more I like it.
Comment #4
Crell commentedI meant to post this, really...
Comment #5
pounardAs I mentionned on IRC, why not keeping polymorphism, i.e. all of these 3 being valid:
For this particular method, which only acts at context init time, I don't see why polymorphism could be bad.
But if I'm not wrong, this looks like use cases of DI components of both Symfony and Zend if we go deeper into this path, it feels like reinventing the wheel.
Remember that Closure is an object itself, some simple handler objects might not worth the shot of doing lazy instanciation for them.
Comment #6
Crell commentedI wouldn't call that polymorphism. That's providing 3 separate API calls. (Doing that all in the same method would be bad for DX, as they all take different parameters and do different things with them.)
As to why not to do all of them, just because it's possible to offer 50 ways of doing something doesn't mean it's a good idea. Too many options is bad for DX, as it's harder to document, harder to read, and harder to debug. (Just look at Perl.)
The 3rd item in #5 is almost identical to the 2nd, and the 2nd can do everything the 3rd can do. So eliminate that code path. The first item in #5 is what we have now in the wscci branch, but as others, including pounard, have mentioned previously that results in a poor, difficult to document DX for the handler objects. We previously decided to stick with it because we couldn't come up with something better. Now we have something better. :-)
While there is an extra overhead here for the closure, I think in practice it's negligible. It's one, maybe 2 ZVals. So 40 bytes or so. Not something I'm worried about.
True, this is much closer to the DI systems used by some of the major frameworks. I consider that a good thing. :-) They don't support the stacking setup we do, though, so we'd still need to build that in anyway. While we could build something like this using Pimple (the micro-DI system that the slideshow above talks about), we'd have to extend it enough that we may as well rewrite it.
But it does mean if we adopt Symfony's DI or Pimple or something for service objects/plugins later (a central service locator is on the table to consider later), it's a consistent API model that people can learn and use in both places. So that's even more of an argument in favor of moving to this API approach. I like. :-)
Comment #7
pounardThe closure approach seems good but does not fit to all use cases. When it comes to pragmatic context handler population using some kind of dynamic driven configuration, they become almost impossible to do except by eval'ing some code. Closure class cannot be instanciated manually therefore it's impossible to dynamically create them.
For all the dynamic/pragmatic handler set we need to have either the ability to set the class description using the array, either the already instanciated object directly. If we implement at least one of those (and it is a mandatory thing to do for the sake of DX) then we fallback on polymorphism: so if we have polymorphism in order to handle 2 use cases, then why not 3.
For all that matters, this polymorphism only interfer with context init time, it's not really dangerous then because really limited in term of scope (and even human scope, only a few will play with that): only code that mess with the context internals will use this, i.e. context values consumers will never know a thing about this (they don't really need).
We can skip polymorphism by setting a more atomic API with functions such as setHandlerByCallback(), setHandlerByInfo() or setHandlerObject() etc... but it feels more like interface pollution for a minimal real gain here.
Still, having more than one method for setting handler is mandatory, and I already wrote the code managing that quite well at least 10 times, code that you still reject and don't want to see because of "cyclomatic complexity" (if that matters, I only added 2 "if" statements in an isolated and less than 10 lines method, which seems quite well encapsulated and really not complex to me).
Comment #8
Crell commentedRegarding dynamic/manually-instantiated closures, untrue:
http://us2.php.net/manual/en/language.oop5.magic.php#language.oop5.magic...
Do you have a use case where a closure would NOT be able to handle setting up a handler? I cannot think of one, but I can think of several where it is better than classname-as-string. The only reason to do classname-as-string is so that we can persist configuration to a file. If we're not doing that, then it becomes unnecessary.
Comment #9
pounardYes, there's the __invoke, I agree, still a lot of magic there.
I was more thinking use case such as:
Thus moving handler instanciation complexity into the ContextInterface instead of letting the developper having to rewrite it in case it needs to proceed to such algorithm.
Comment #10
chx commentedNote that this issue is a really really big decision to add closures into Drupal and needs a hell lot more attention than an obscure issue on an obscure sandbox. Just because we went PHP 5.3 doesnt mean we want to use all features.
Edit: and, I know of course that closures will happen nonetheless because we have no way to reach out to people, no established way to make decisions like this. I am mostly unsure of things like this, of where do we want to steer Drupal and I am scared noone else seems to be unsure. I am mostly leaving Drupal 8 alone because so much of my work seems not to be so good and so I am letting others having maybe better ideas exercise them. Yet, I can't just not raise a voice here.
Comment #11
webchickWhat the heck is http://groups.drupal.org/core for, if not for questions like this..?
Comment #11.0
webchickFix stupid oversight with code samples.
Comment #11.1
chx commentedCopied a very short, to the point code example.
Comment #12
yched commentedI like the approach, and am in principle all for bringing closures and dependency injection containers for our notion of "handlers". That's an abstract construct to ingest, but at least an industry standard rather than a drupalism.
Might be premature, but from the few readings I've been doing on the topic I have a feeling though that when DICs are used massively on a project (I'm thinking of the future "plugins" system as well ?), manually registering all your handlers/"services" through explicit code like the snippets in #3 gets tedious/fragile pretty fast, and also less spottable as a pattern when reading through code, and that large and extensible projects would tend to want a more automated declarative syntax. Which is why in practice all DICs in the "framework" layer in Symfony seem to rely on the ServiceContainerBuilder class (either through declations in YAML/XML files or through direct PHP runtime code), which takes care of "writing" the closures for you - http://components.symfony-project.org/dependency-injection/trunk/book/04... :
Comment #13
Crell commentedWe just had a discussion on this topic in the WSCCI office hours channel. Result:
There's three possible registration mechanisms, as described in #5 above.: Static (what we do now, with a class name string and a config array), Pre-instantiated (pass in an object that's already setup), and closure (what this issue is proposing).
For simple cases, where the class doesn't need any configuration, all three are close enough performance-wise and complexity-wise to be a wash. There's little if any practical difference.
The difference comes in the more complex use cases, where an object has lots of configuration and/or external dependencies. In that case, the closure and pre-instantiated approaches are equally expressive and equally complex, in terms of LOC. However, the closure approach will be better performing because the setup code can be delayed by the closure until the first time the handler is requested. Remember, many context handlers may never be used so any setup time spent on those is wasted.
The static approach is actually harder to read in that case, because rather than executable PHP code we have yet-another-big-array. The only advantage here is that, unlike a closure, we can serialize the registration information for the handler and store it in a configuration system. We hope to have one of those eventually (CMI), but right now we do not.
So... What I'm going to propose for the time being is that we switch to the closure approach as the primary mechanism, but keep the static registration around, documented as "you shouldn't use this; it's only here in case we have a use for it later, it may be removed if we cannot find a use for it." That way we can revisit this question later once core is more developed and determine if it's even an appropriate way to do configuration-driven handlers in the first place.
Cool?
Comment #14
Crell commentedyched: If we introduce a service locator for logic objects/plugins, definitely we should look at a more robust syntax. However, I'm unsure if we want/need something that complex right now for context. We couldn't just subclass off of Symfony's DiC because we need the stacking logic, which would require overriding enough of the base class that we probably wouldn't get any serious benefit out of it.
Comment #15
Crell commentedyched: Actually, looking at it again, #12 is just a more elaborate (and arguably more flexible) version of static registration.
Comment #16
Crell commentedUpdate from today's IRC meeting: We're going to drop the static approach too and go all-closure to keep the initial core patch as slim as possible. Once CMI is in core we can revisit. It's not that much code to add back later if appropriate. :-)
Marking critical as this is a blocker for submitting the next patch to core.
Comment #17
chx commentedNote this is how complexity creeps into Drupal. We have an implementation which is ugly. Then we have a slightly less but still impossible to understand implementation. Everyone is glad -- it's much better than the original implementation let's go with it! But the overall impact by then is completely out of everyone's mind -- which is adding something utterly complex to Drupal.
Comment #18
Crell commentedCurrent patch, also in the repo as a new branch. This converts all handlers to closure, fixes tests, and cleans up the Http handler to take an injected request object. Everything gets a lot cleaner along the way, especially testing of the Http handler.
I think we can now clean up the mock class some, too. I'll try to look into that tomorrow before merging. Meanwhile, review away!
Comment #19
Crell commentedchx, I'm unclear if you're talking about our current pile of inconsistent functions that is the ugly implementation, or the static-handler-registration system in wscci now.
I think the closure approach is, if anything, easier to read (less "utterly complex") than the static version... assuming you understand anonymous functions in PHP. But we learned how to deal with arrays, then learned how to deal with objects, we can learn how to deal with anonymous functions. They're a part of the language, like 'em or lump 'em. The overall impact is very much in mind, and my concerns with this approach are already listed above. We decided it was still a step forward, and of course we can adjust this approach as we convert more things to it and exercise it further.
If you mean Context API vs. what we have now, I'd argue that a single point of reference for request information is less complex, conceptually, than a half-dozen or more functions that few people even know exist. (I didn't even realize until last week that we weren't relying on $_GET['q'] anymore and were just faking it for BC reasons. That blew my mind! It's so obtuse it would never have occurred to me we would do that.) Context API itself is fully self-contained. Once we can convert stuff like the path lookup routines from global functions to something injectable, even those dependencies will be removed. More loosely coupled components reduces complexity, increases flexibility, and improves maintainability.
The logic here is, if anything, less complex than many systems in Drupal already.
Comment #20
catchThat's a shame because I've been trying to point that out for you for literally months, it was a particularly bad example to use for replacing direct with the http handler for this reason.
Comment #21
pounardJust for fun, closures will obfuscate the debugging:
EDIT: I though it worth being mentioned.
Comment #22
neclimdulI'm pretty sure the impact will be fairly small since the closures task is small and tightly defined. The backtrace should still be clear enough to not affect debugging.
Comment #23
pounardAgree, as long as the closures are used sparingly, I guess it's fine. The only thing in that particular use case it hides is you cannot debug it to know what it really does, they must remain really simple.
Comment #24
Crell commentedcatch: In hindsight, I realize that. But since I'd not seen the original issue it blew my mind that we'd make "write into $_GET['q']" something we do unconditionally. It's so weird I couldn't even get my brain to understand what you were saying. :-) That's my point about the Context API being less weird and complex than what is in core now.
Comment #25
chx commentedOne must admint, however grudgingly this piece of code is actually readable.
Edit: i deleted most of my comment. I do get it, the crux of the matter is that when you write this anonymous function then you can actually use the help of an IDE to write the class name. As a small bonus, we can do more complex things in the function than a new if that's necessary. Another small bonus is that the internal init code becomes a tiny bit simpler (not much).
The major disadvantage is that while the init code becomes more readable and flexible, debug becomes harder because a closure is simply not dumpable.
Question: mandating a named function, a handler factory and passing the name of that , wouldn't that be better? Or even more, no need to mandate anything just name it callable instead of closure inside the init the code and document as such and let people pass in whatever they want. This just requires a search-replace on the existing patch. The question then becomes, want to have named factories on the core handlers or not?
Comment #26
chx commentedwebchick pointed out http://www.htmlist.com/development/extending-php-5-3-closures-with-seria... which on one hand is slow and scary but devel module can use the technique described therein to actually dump the function body of a closure (yes I tested this and it works):
(your homework is to write a quine closure based on this :P )
Comment #27
andypostThere'sa sandbox project that already has some implementations in this area http://drupal.org/sandbox/mgiacomi/1131452
Comment #28
neclimdulMy gut is still screaming "this feels wrong" about this approach to building context's but I'm not Colbert and I can't come up with an actual problem with this approach. I'm just throwing this out so the context of what I'm about to say is clear, I don't think debug clarity is actually a valid concern.
I still contend that while the closure itself is not dump-able, the backtrace will be clear. Also, any place you would have access to the closure, you should logically have access to the associated key which should map to documentation that could be generated by devel and be easily grep-able locally. Furthermore I have a feeling we're going to build out a nice section of api.drupal.org devoted to contexts where we could easily provide maps for at least core's contexts.
Thanks for finding this code though chx, I have a feeling it will be useful.
Part deux: I'm not sure I understand your suggested alternatives chx. If the search-replace is easy do you think you could provide it so we can apply it and review its merits in each of our environments?
Comment #29
Crell commentedI believe what chx is suggesting is to make the following legal as well:
Which would be functionally (as in, have the same effect as) the same as the snippet in #25. The pro of that approach would be that debuggers and backtrace dumps have been able to handle that for years. It would also be very simple to implement in addition to closures.
The cons of that approach that I see would be:
1) It's a different syntax that people could run into and therefore have to grok. I want to try and minimize the number of those developers have to deal with (as above).
2) You'd have to keep in mind making sure that the function is loaded before it gets called. It doesn't have to be loaded when you register it, but it would have to be when you call it. That duality is one of the big problems with our bootstrap process right now, because you could call anything at any time, which means we need functions loaded at all times, which makes bootstrap expensive. (That's one of the reasons we've adopted PSR-0 classes and are emphasizing object-oriented approaches; autoloading.) To be fair, the closure approach needs the code loaded as well, but you can 100% guarantee that it is loaded because it's defined when you register it. So that's one less thing to have to think about.
3) With closures, the code is defined right there where you register it. You don't have to go hunting for it. With functions, they could be defined almost anywhere. It's quite common for functions to be defined "elsewhere in the same 2000 line file" right now, which makes it often hard to follow the logic without a really solid IDE with both forward and back operations.
4) It wouldn't actually improve debugging at all for handlers registered with a closure (obviously). So if you wanted the more traditional approach, you'd have to code it yourself. It doesn't give you any advantage in debugging someone else's code. See point #1.
So I wouldn't really be in favor of supporting named functions here. As both chx and neclimdul note, finding and introspecting a closure object is still possible.
Comment #30
chx commented1. It'd be a PHP callback. documented and used through PHP everywhere. For eg if you check the uasort handbook page it expects a callback and links to the callback page. This is a well known PHP pattern. It's tested by is_callable (not is_callback -- that'd be logical and therefore it's not what PHP is doing). Drupal have, rather arbitrary way, decided to use function_exists throughout core IMO that's a bug and needs to be fixed (there are issues already). Now you would pick another arbitrary callback type.
2. If you are worried about being loaded, then pass in a method and autoload it.
3. This is true.
4. Well, core gives an example and people follow ie most handlers would be functions / static methods. We would be liberal in what we accept (all callbacks) and conservative in what we do ( function names / static methods).
Comment #31
pounardHum I guess chx has a point here: instead of supporting anynomous closures we can support callbacks: closures are callbacks anyway, it makes a lot of sense to me.
Thanks chx, sounds like a really good compromise to me.
Comment #32
Crell commentedIf we remove the type hint, that would implicitly allow any type of callback, true. However, I do not at all agree with making bare functions the default approach, and certainly not with that being what core does. As noted in #29, bare functions have a number of problems, and are less capable than closures because they cannot inherit scoped variables. See #3 for an example of where that would be important.
Really, the only arguments against closures I've seen are:
1) We can't persist them, so they can't get saved to CMI or similar. This is certainly true, and we will be revisiting that question after we have a Config API to work with.
2) They have a less useful stack trace. Also true, but as chx noted in #26 that's a solvable problem (and, given that most will be fairly small, shouldn't really be a problem in the first place in practice).
3) This would be the first significant use of closures in core, so we're not used to them. I actually have a great deal of faith in Drupal developers' ability to pick up new language syntax, especially when it makes the code in question easier to read. :-)
So if "people will follow what core does" (which is generally true), what core does should be closures, not static function names that are harder to read, less flexible, and as far as am aware unique across the PHP world. (As in, harder for any new developers to wrap their heads around.)
Comment #33
Crell commentedI've pushed new code in this branch that removes the Closure type hint, and eliminates most of our one-off dummy test classes. By converting ContextMockHandler directly, with closures, I was able to simplify the unit tests quite a bit. We can probably eliminate the rest of the test classes too shortly, but I wanted to get these knocked out and pushed first.
I will probably merge this branch tomorrow. Right now I have to go shopping. :-)
Comment #34
chx commentedNote: http://www.php.net/archive/2011.php#id2011-09-27-1 Added callable typehint. It exists just we cant use it. Pity.
Comment #35
Crell commentedBranch has been merged. We're almost ready for a core patch!
Comment #36
webchickPer catch, and blessed by Larry, moving this and all other WSCCI issues to the Drupal core queue.
Comment #37.0
(not verified) commentedLink instead.