OK, after a lot of leadup, here's a patch to introduce pluggable subsystems to core, aka "handlers". As previously discussed this is single-axis routing only. Multi-axis routing may or may not come later if we can fix the issues in the attachment and mapping logic. (See the Handler module for where that's being worked on.)

Background:

http://www.garfieldtech.com/blog/testable-apis
http://www.garfieldtech.com/blog/drupal-handler-rfc
http://www.garfieldtech.com/blog/handlers-rfc-2
http://www.garfieldtech.com/blog/dependency-injection

What this patch does:
- Adds an "environment object" concept to encapsulate the HTTP request and key Drupal systems.
- Adds a new mechanism for defining pluggable points, called "slots".
- Adds a mechanism for defining things that fulfill those points, called "handlers".
- Adds a mechanism for associating handlers to slots for a given "target" (arbitrary string to define a sub-case of a slot; the exact same logic as database handlers.)
- Adds unit tests for the above.
- Adds extensive docblock-based documentation for the above.
- Adds API documentation for the 2 new hooks introduced.

Do not be scared off by the size of it! I think easily 2/3 of the patch is unit tests and documentation, as it should be.

Patches this patch would greatly help:
#340283: Abstract SimpleTest browser in to its own object
#126197: Option to Disable IP Logging
#335411: Switch to Symfony2-based session handling
#259103: make pluggable password hashing framework more generic and use class auto-loading.
#223075: Adaptive path caching
#331180: fix pluggable smtp/mail framework
And probably others I'm forgetting.

I think thats enough documentation about this patch. If not, the patch itself is very heavily commented. :-) (I think some whitespace fixes snuck in as well. Thanks, Eclipse...)

One other note: I still have no better name than "slot". If Dries and webchick agree on a better name I am willing to switch over to it, but only once. :-) Please please please please Please do not bikeshed this issue on what to name slots unless you have commit access. Thank you.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

RobLoach’s picture

chx’s picture

So we wrote DB:TNG and only now, after having like 1-2 months not working constantly on it, it dawns on me how hard it is to follow the codeflow of that. I have written #363687: DB:TNG codeflow is impossible to follow and I have more to follow up that, probably as soon as tomorrow (just a bunch of inline comments to help read). I am squarely opposed to adding another convoluted OOP subsystem to Drupal until we are very, very sure we are going the right direction and the resulting code is easy to understand.

Crell’s picture

Convoluted? Handlers are about 1/10 as complex as DBTNG, and most of the interesting bits are all routing code. That's all procedural.

The actual OOP parts are about as simple OOP as you can get.

birdmanx35’s picture

I'm pretty sure that #358663: Implement a locking framework and make caching pluggable should be on the list too.

RobLoach’s picture

I think it would be beneficial to store slot_load() and handler_load() in the cache system. Right now it makes a call straight to the database if it's not statically cached....

Crell’s picture

If we store the slot and handler info in the cache system, then we can't use handlers for the cache system. :-) Fortunately the queries we use here are really quite simple. Eaton suggested also having an array in settings.php that can be used as an override for those who want even more speed, which would be a good follow-on patch.

webchick’s picture

Subscribe.

catch’s picture

subscribe.

mfer’s picture

subscribe. I'll be giving this a good look over in the next couple days.

drewish’s picture

subscribing

Dave Reid’s picture

This would also be useful for #64866: Pluggable architecture for drupal_http_request(). Subscribing.

mfer’s picture

Before I nitpick the code I'd like to hear back from Dries or webchick on the overall concept.

Overall, I like the concept. Reading the tests is a good way to get an understanding on handlers and how to use them.

What I don't like is the 2 queries per hander use in most cases. First, you do a slot_load() to get the factory function (1 query required) and then you call handler_factory_default() which looks up the class to call (1 query required). The second query can be removed if you have your own factory function. The proposed solution in #6 would work for me, though.

I was, also, wondering why we have interfaces defined for the slots but we aren't checking that a class implements the interface before we instance it.

Damien Tournoud’s picture

This patch is a kitten-killer in wanting to do at least to radically different things at the same time:

  • Making Drupal stateful and easier to test by implementing a DrupalEnvironment class
  • Implementing a generic framework for pluggeable subsystems

Those have (different) impact on the code base, and could easily be implemented separately. There is no real need to mix them.

Having thought about this, I doubt we really need a generic framework for our pluggeable subsystems. Simply using a proper Strategy pattern consistently across core would be more than enough. I doubt that most factory functions will share the same code anyway.

What I would like to see is simply a way to implement consistent "routing tables" defined in settings.php, as discussed already elsewhere.

Status: Needs review » Needs work

The last submitted patch failed testing.

Crell’s picture

Status: Needs work » Needs review
FileSize
50.65 KB

Another update hook added. Chasing HEAD.

Responses to comments so far:

1) Regarding using two queries, that's true. I don't really know how to reduce that, though, while still offering both a single point of access (handler()) and the option for system-specific factories (which are mandatory for many subsystems). I'm open to suggestions on how to better handle that lookup as long as we still get both of those features.

2) We're not actively checking the interface for a few reasons. One, we don't babysit. Two, if you use type hinting then PHP will check for you and give you a fatal error is if you screw up that would be in a more logical place than where we would call it. Three, that's the sort of error that is going to show up really really really early in development so the odds of an even remotely competent developer getting it wrong are minimal. It's nice when the language does error checking and documentation for us. :-)

3) Why environment and handlers are in one patch: Right now nothing in core could really leverage the environment object sanely except handlers. Generally patches with new features aren't accepted unless they're used in at least one place, so I didn't want to deal with the long thread of "but why do we need this?" on an environment-only patch. True I don't do a real implementation of handlers in this patch, yet. I considered it but decided I wanted to at least get the engine out for review first, in case the API was going to change. I also was able to list a half-dozen places where handlers would be useful in the OP, which shows that yes there is a use for it. It's also possible that we'll need to tweak the environment class for the handlers, so no sense breaking that up. If the committers really want the patch split up I can, if we're going to actually go forward with both parts.

4) Simply using a proper Strategy pattern consistently across core would be more than enough. And if it's consistent enough to be the same structure throughout, why shouldn't it be a standardized system that we can rely on, optimize once, and document so that it's easy to understand different systems? We standardize systems all the time; there's currently an effort to standardize everything around drupal_render() and then optimize the heck out of it. The same logic applies here.

5) I don't actually think that most slots will implement their own factory. The generic one should cover a great many use cases, and in practice I see only a select few cases:

A) Single-handler (no targets), return the same object every time.
B) Single-handler, return a new object every time.
C) Multiple targets, return the same object every time cached by target.
D) Multiple targets, return the same object every time cached by class.
E) Multiple targets, return a new object every time.

The generic handler already provided covers A and C. B and E could also be covered by a single factory function. That leaves just the oddball D, for which I can't actually think of a use case right now. If it happens, that slot can implement its own factory.

Arguably I can see us merging A, B, C, and E by adding another "cache" property (or something) to the slot definition. Then the generic handler can simply skip static caching for that slot. That gives us 4/5 use cases covered by one implementation. The only reason then to have your own factory would be if you had pparticularly odd logic, or if you wanted a cleaner function name to call directly. (E.g., cache('page')->set() instead of handler('cache', 'page')->set()).

6) As for how the routing happens, as I noted in #6 a settings.php-based override is definitely a possible follow-on for those who want better speed. However, that should not be the only way. Why should I have to go to settings.php to, for instance, change which core-provided path alias system I'm using? That could very easily be controlled by a radio button. For other systems, say password hashing, we probably want that set once and not changed. So I left it up to each subsystem to decide what the UI (or lack thereof) is appropriate. At the code level, however, it's a single, consistent API.

Damien Tournoud’s picture

The only reason then to have your own factory would be if you had pparticularly odd logic, or if you wanted a cleaner function name to call directly. (E.g., cache('page')->set() instead of handler('cache', 'page')->set()).

That's a damn good reason.

cache($cache_id, $cache_bin)->get();
email($recipient)->send($mail);
password_generator($user)->hash($password);
http_request($url)->post();

None of them can use a one-size-fits-all, mono dimensional approach.

My conclusion: factoring all this into a single implementation serves no purpose.

Plus:

handler('cache', $cache_id, $cache_bin)->get();

is way uglier than:

cache($cache_id, $cache_bin)->get();
Crell’s picture

It sounds like you're completely misunderstanding what a target is. You're not going to use a different email backend for every single email address you send to, are you? Of course not.

The target is just the sub-slot you're calling. It is NOT cache_id and cache_bin. It's JUST cache_bin.

So the real implementations would be:

cache($cache_bin)->get($cache_id);
email()->setHeaders($headers)->send($recipient, $mail);
password_generator()->setUser($uid)->hash($password);
http()->setUrl($url)->setHeaders($headers)->post();

Note the lack of a target specified for email(), password_generator(), and http(), since I can't see a reason for them to have multiple implementations. They probably wouldn't even use targets. All of those are accounted for by the current generic factory.

Again, the target is unrelated to the details of a specific request. It is a class of request. Cache is probably the core system that would have the most different values for its target, and it maxes out at under 10 right now on most sites. Most core systems would not even use the target.

pwolanin’s picture

I'm generally in agreement with Damien - I think the environment object is a very useful idea on its own.

There doesn't to me appear to be a compelling reason to have a system for handlers so much as a pattern as Damien suggests. A few of these, especially the cache subsystem, need some special handling because they come into play before other systems are initialized -but let's not make all the rest extra complex as a side effect.

@Crell - if you look at my patch for the (very simple) e-mail back-end interface, a major feature is that you can specify via the factory a different class to handle the sending based on the message ID or the module that's sending the e-mail. I can readily imagine cases where one would want 2 or 3 different mail back-ends to be active in parallel.

mfer’s picture

@pwolanin Where and how are you suggesting we store the configuration for these type systems? The settings.php file won't work because you can't have a gui that alters the config. What is your proposed alternative? The variable system?

If we don't go this route we need an alternative way because this is popping up in a lot of active patches. What's the alternative?

RobLoach’s picture

Storing the whole slot registry in the cache system, and storing its configuration in the variable system won't work?

Crell’s picture

If we store configuration in the variable table, then it becomes harder to bypass the database entirely. If we want to be able to swap out very-early subsystems like cache or session without forcing a database initialization, we need to be have access to the handler index before the database loads. Variable is not well suited to that, especially if we add multi-routing later as I had originally hoped to be able to do. The variables you'd be setting would be horrifically complex. Breaking it out to its own system makes it simpler, not more complex.

@pwolanin, OK, I can see how you would want to route different messages to different backends. Then you'd use the message ID as the target. Most would not be configured so would fall back to the default, while the few you care about would be configured to some other backend.

I agree that env is useful on its own, but nothing else in core right now can use it without major refactoring. This introduces env and uses it. We can then use it in other places in follow-up patches. In IRC Damien was suggesting rewriting bootstrap to use it. That's great, but it's also a quagmire that I don't want to deal with myself. :-) Nor do I want wading through bootstrap to hold back offering a new extension mechanism that Drupal is currently lacking.

Status: Needs review » Needs work

The last submitted patch failed testing.

mfer’s picture

@Rob Loach if we use the caching system to store the information we can't use handler system for the caching system.

What if we do something similar to how the variable system works. In the settings.php file we have $slots and $handlers. If caching and the variables aren't initialized we are using data from the settings.php file. When they get initialized we add more data to them for other systems. See variable_init and it's usage in the bootstrap process to see what I mean.

This would give us the performance boost by allowing the config to be in the settings.php file and it would cut down the lookup to one cache lookup for each handler use. But, the storage for the config would be the variable system. Do we want to store them there?

Crell’s picture

@mfer: That's essentially what was suggested in #6 above. There are two problems with using the variable array, however:

1) Once we get to multi-routing, which was the original goal, the resulting lookup arrays are going to be way too complicated to write by hand. We'll want to have an export mechanism similar to schema API that can be copied and pasted, and integrating that into variable will be difficult. It also means storing rather large arrays in the variable table, which then take time to deserialize.

2) I forgot to mention this one above, but to properly bypass the DB entirely we'll also need partial overrides for the registry in settings.php to allow the system to know where a given class or interface is stored without hitting the registry tables. That certainly doesn't make any sense in the variable table.

alex_b’s picture

Subscribe.

neclimdul’s picture

Looks fairly good. Like some of how handlers like this could work.

the db query in handler_factory_default() has a problem where the argument is 'slot' instead of ':slot'

mfer’s picture

FileSize
28.3 KB

I'm throwing a 3rd implementation for handlers/routing into the ring. See #367282: A simple handler solution. It can be used before the database/registry is available (example included). It's rough and needs some work but the concept is there with notes.

neclimdul’s picture

@mfer So from a quick glance thought this I took out that it removes the interfaces, the environment object and stores the information in the variable table. I'm not sure that really is so much a 3rd approach as a toned down version of this patch. chx's patch is quite a bit different approach in that it directly maps functions to individual implementations. No concept of handlers in the way these to patches group them(objects).

I think the interfaces isn't a bad thing so I'm not really sure why its gone. It does have some advantages like forcing the proper implementations of the constructor at the language level. The constructor is also gone which I think might be a bit of a mistake since it means the handler looses its context.

The environment object is something I'm a bit on the fence about. My initial notes on this patch have a big Kitten Killer scribbled down much like your earlier comment. But then I talked to Crell about why he decided to put that in this patch and he did convince me it has some merits. Accessing the subsystem of Drupal through this object does feel a bit like overkill but then having some things setup and available to you like the handler function are really handy. It also demonstrates some of the design reasoning to me for things like the interfaces and some of the algorithms.

I do however feel the env system could actually carry itself as its own patch despite Crell's concerns. If nothing else, for its testing benefits. Also, I think this particular patch could work just as well without it. So like I said, on the fence.

The variable system storage I feel is a bad choice. It does give you that $conf settings.php hackery but it just feels bad. There's the possibility for a lot of information being stored here and this just feels like a really bad idea. What if a system like views used these handlers for example? I'd rather the previous patch get the ability to be setup in setting.php.

So my general feeling is that while leaner, this implementation doesn't net us much gain.

neclimdul’s picture

You can ignore my last comment I guess. It seem that patch is being worked on at #367282: A simple handler solution now.

Crell’s picture

chx and I just had a very productive 2 hour phone conversation, and we've worked out a roadmap going forward. Reader's digest version:

- We move forward starting from the patch in this issue.
- chx has endorsed the use of OOP here. :-)
- For now, we remove the environment object. chx successfully argued that despite my efforts to the contrary it's going to leak out into other parts of Drupal sooner rather than later, which will result in much-needed bikeshedding that we don't want to drag down this issue.
- After this patch goes in, we follow-on with an environment patch. Whether that lands in D7 or D8 will depend on how that issue goes. I suspect there will be much discussion in DC.
- Improve documentation with explicit examples of what is a "slot" and a "handler", using the cache system as an example.
- I am going to see if I can collapse factory functions out for now, since above we narrowed the 90% use case down to only two possible logical paths. Still TBD if we get much from doing so.
- Additional performance tweaks in the lookup engine can come in later patches.
- Probably as a follow-on patch, we add a settings.php-based bypass array for handler configuration that lets us wire-up handlers to slots without touching the database. Most sites won't need that, but high-performance sites can use that for cache routing.
- In this patch, we convert the path lookup system to use handlers in addition to a handler.
- Later patches can convert cache, passwords, and the other possibilities already mentioned.
- In a separate issue, we work out how to access the registry without the database. Could be SQLite based, could be settings.php-based. That should be discussed over here: #367660: Make the registry available always

Expect a revised version of this patch as soon as I have time to write it. :-)

mfer’s picture

@crell what about storing the handlers config in a sqlite database. It could be available much earlier then. It could be in the same database as registry system if we go that way.

chx’s picture

@mfer that's a separate issue. @damz we need the hooks and the central system to make it possible for an UI to be written.

Crell’s picture

We discussed storage extensively as well. For "system tables", which includes the handlers config, the registry, the system table, and various others, there are definite performance benefits to moving them off to SQLite or copying them off to SQLite. There are also benefits to putting selected configuration in settings.php so you don't even have the file stat. Those are all separate issues from Handlers themselves and considerably larger problems than they seem at first blush, so that's also a matter for a separate patch.

Crell’s picture

Status: Needs work » Needs review
FileSize
43.99 KB

Right then!

This does not yet include a conversion of the path lookup system, but I wanted to get it up for review. It should cover everything else mentioned above. I'll post another version later that does paths after a little feedback and a good night's sleep. :-)

Open questions:

- I'm a bit undecided on the signature for handler_attach(). Right now it's handler_attach($slot, $handler, $target), but I could see an argument for making it handler_attach($slot, $target, $handler) so that it is closer to hander(). Thoughts?

- I've replaced the factory with a "reuse" flag, and modified handler() accordingly. While doing so, however, a thought struck me. Why make that part of the slot definition? Why not let the handler itself declare if it's reusable or not via a method, for maximum flexibility? Sprinkle in a little inheritance and most handlers won't even need to define anything, since it would be slot-wide, but those that do want to be different from their slot definition can do so. Thoughts?

Status: Needs review » Needs work

The last submitted patch failed testing.

mfer’s picture

Overall I'm a fan of this patch. It's simpler than the last go around. The environment object has been removed (but can and should be added back in another patch. It would be good for testing.) This is similar to the simple handlers patch I rolled so I've closed that one to go this route instead. It better solves some of the issues run into there.

Crell (or anyone else), can you provide a use case for the reuse flag? This came up in the patch I wrote as a question. What would be a case when we don't reuse a handler? I imagine one exists but can't think of one and was challenged not to have the feature and always reuse it.

If we go with a signature of handler_attach($slot, $target, $handler) $target will no longer be an optional parameter. For a majority of the use cases there will be just one target. I'd like to see target optional everywhere and default to 'default'. Though, this does lead to some inconsistencies in the signatures.

Some finer details in the patch itself:

  • handler_factory_default() in handlers.inc is not used in this patch. Looks like a holdover from the last version of the patch.
  • There's a .htaccess rewritebase in the patch from Crells dev enviornment.
  • The comment on handler() still refers to calling the proper factory. This is no longer the case and should be updated.
  • In the comment for hander_attach() it says, "Asociate a given handler to a give slot for the specified target." Notice the misspelling of associate.
  • Is there a reason require_once is used in handlers_rebuild to load handlers.inc and drupal_function_exists() is used in cases like handler_attach() to lazy load the functions (and handlers.inc)? Is it because handler_build is used in cases where the registry isn't available (like installation?) If not this is inconsistent. If it is handlers_rebuild might need to be inserted into install.php somewhere (and maybe update.php). Also, would handler_attach()/detach() need to be called in an install profile (I think so)? Is the registry available there?
  • There is a double period in the comment about handlers in the defgroup for handlers in handlers.inc.

These comments are just what popped out at me on a quick pass. I'll look it over in more detail later.

After this patch lands I'll roll a patch to add a variable to settings.php to allow for them to be configured there (unless someone else wants to). This will be important for systems like the cache system.

Crell’s picture

For when we DO want to reuse the object: Cache system.
For when we do NOT want to reuse the object: Fetching a new image object to manipulate.
For when we may want to vary it per handler within a slot: Um, I'm not entirely sure yet. :-)

Yep, there's still some cruft from the last version lying around. I'll remove in the next round.

Gah, bot, what is wrong with the unit tests? They all ran fine for me. Maybe it's the htaccess file...

Valid point on the function signature. That's probably why I put it in that order originally, and then forgot. :-)

handlers_rebuild() is called from drupal_flush_all_caches(), so I'm working on the assumption that none of the base subsystems (like registry) are in a stable state. Arguably I could use a require_once() in handler_attach() et al, but as a matter of style I tend to use the registry when possible. There may well be other places that handlers_rebuild() needs to be called. Suggestions welcome. chx, do we have a registry yet in install profiles?

As mentioned before we should *not* use the variable system for storing overrides. We should use a separate array in settings.php that can extend with us as we make the routing system more powerful. Doing that in $conf would get ugly fast.

chx’s picture

What about this? This is similar to what core does for languages:

function handler($slot_id, $target = NULL) {
  static $handlers;
  if (!isset($target)) {
    $target = '';
  }

  if (empty($handlers[$slot_id][$target])) {
    $record = db_query_range("SELECT class, reuse FROM {handler_attachments} WHERE slot = :slot_id AND target IN (:target_1, :target_2) ORDER BY target DESC" array(
      ':slot_id' => $slot_id,
      ':target_1' => $target,
      ':target_2' => '',
    ), 0, 1)->fetchObject();

and yes, the empty string is a fine array index.

Crell’s picture

Why is that better than using the union?

Crell’s picture

Status: Needs work » Needs review
FileSize
41.42 KB

New version with minor cleanups based on mfer's comments. Bot, what do you think now?

chx’s picture

Status: Needs review » Needs work

I dunno why that's better than the UNION, we should ask some MySQL experts and make languages and handlers use the same construct.

chx’s picture

Status: Needs work » Needs review

Crossposted, sry!

Crell’s picture

I mean is it faster than the union? More DB agnostic? Fewer calories? Is the sort order (empty string first, reversed with DESC) guaranteed across all databases?

mfer’s picture

@crell I was thinking the variable in settings.php would be $handlers but that's another issue. I agree, it shouldn't be in $conf.

chx’s picture

http://www.mysqlperformanceblog.com/2007/10/05/union-vs-union-all-perfor... both UNION ALL and UNION distinct use temporary table for result generation. definitely not good!

About the sort order, sure, this is not NULL which causes all sorts of weirdness.

David Strauss’s picture

Subscribing by request of chx. I need to look into the UNION vs. OR issue.

Crell’s picture

David: I'll defer to you on what the most performant way to run that query is, and adjust the rest of the code accordingly. Let us know soon. :-)

mfer: Ah, OK. Great, we're thinking along the same lines then.

Wim Leers’s picture

Subscribing.

Crell’s picture

Priority: Normal » Critical
FileSize
63.46 KB

Right then! Here's a new version. I think I got all the tabs cleared out this time... (Stupid Eclipse...)

Changes since last version:

- Converts path lookup to use handlers.
- Provides 2 handlers for the path_lookup slot: Lazy and Pre-Cache. Lazy is what we use now. Pre-cache is a simple re-implementation of the old 4.6 mechanism. It could likely be improved but that's a matter for a separate patch. Its existence is enough for now. :-) (Yes, that means the path system gets new features in this patch, as a side effect.)
- Notes via @todo places where we could arguably improve things even further in later patches. In particular, it occurs to me that custom_url_rewrite_inbound() and custom_url_rewrite_outbound() should either become handlers or go away entirely in favor of just implementing your own path handler via subclassing.
- Adds a very simple admin page to swap between the available path lookup engines. I deliberately didn't spend much time on the UI here. Let's not kitten this patch to death as we can decide later where this admin should go. It's mostly just to show how easy such an admin is to build. :-)

In order to accomplish the above, I also did the following:

- Added a few handler API calls to support the UI.
- Introduced a HandlerBase abstract class in addition to the interface. The idea here is to allow implementers to either subclass or just implement an interface anywhere down the inheritance tree. That gives maximum flexibility while allowing the common case to skip the boilerplate code. I did the same for the path lookup classes themselves as a demonstration of how.
- Made a slight change to the handlers schema.
- I discovered much to my amusement that the path system may not have been the best place to start after all, since it initializes part way through the bootstrap process. In order to handle the edge case of systems that aren't initialized yet, I added an extra check to the handler() factory to regenerate the lookup tables if the slot itself is brand new.
- That, in turn, led to the discovery that the registry cannot be rebuilt until after BOOTSTRAP_FULL completes and system.module has loaded. After some discussion with chx in IRC, I tweaked the registry rebuild routine to explicitly load the extra files it needs in case it is called early, which it will when the handler system needs to rebuild early, which it needs for the path system to even work. Yay!

I think that covers everything. Bumping to critical given the number of other issues that seem to be depending on this working (see above).

Crell’s picture

Oh yes, a note about code organization: Since the handler info hooks are registry-style hooks that will only be very rarely called, I started a system.registry.inc file for them, where such hooks should live. We can move other stuff there later. The two path lookup classes are both in separate files because we'll never need both in the same page request, so we don't want to load both code blobs. The interface and base class, however, will always be needed so those go in system.module for now. (Arguably they can go elsewhere later, but we're not at the optimize-by-cut-and-paste point quiet yet.)

Status: Needs review » Needs work

The last submitted patch failed testing.

Crell’s picture

Status: Needs work » Needs review
FileSize
63.06 KB

OK, scratch that. Turns out that we need to know about the path handler from within the installer, which means we can't rely on the registry because when in the installer we use the maintenance version of module_implements(), which doesn't do anything but a function_exists(). So I've moved the hooks back to system.module for now. That's part of a much larger issue with the registry and maintenance mode right now. Also, added a no-slots and no-handlers check. Otherwise no changes from the last version.

Crell’s picture

FileSize
62.28 KB

One of these days I'll remember to not include my htaccess file when rolling a patch.

chx’s picture

Priority: Critical » Normal

I have asked MySQL Enterprise Support whether

SELECT class, reuse FROM handler_attachments WHERE slot = $slot AND target = $target
UNION
SELECT class, reuse FROM handler_attachments WHERE slot = $slot AND target = '' LIMIT 1

or SELECT class, reuse FROM handler_attachments WHERE slot = $slot AND target IN ($target, '') ORDER BY target DESC LIMIT 1 is faster. The answer is, the first query is incorrect, you wanted

(SELECT class, reuse, target FROM handler_attachments WHERE slot = $slot AND target = $target) 
UNION
(SELECT class, reuse, target FROM handler_attachments WHERE slot = $slot AND target = '')
ORDER BY target DESC LIMIT 1;

and

If there are a large number of records which match either $target or '', your second query would be faster.

I can ask further clarification what's the case for a few records, I bet the answer is that for a few records the results are approximately the same.

As

(SELECT class, reuse FROM handler_attachments WHERE slot = $slot AND target = $target) 
UNION
(SELECT class, reuse FROM handler_attachments WHERE slot = $slot AND target = 'default') LIMIT 1

relies on the fact that the result from the first query comes first and the UNION'd results come after. Relying on the order of any result set without an explicit ORDER in SQL leads to unpredictable result. Because of this I recommend renaming 'default' to ''. This allows you to easily order the result set and does not make the 'default' word a taboo for targets. And changing the UNION to OR because it's a shorter query.

chx’s picture

Priority: Normal » Critical

Did not want to remove the critical priority.

Crell’s picture

Renaming "default" to '' would be a major WTF, and inconsistent with the way targets work in the database layer. Unless there's a documented case where UNION gives an order other than lexical order (which we rely on here), then I'm inclined to keep it as is.

chx’s picture

There is no need to expose the '' to the end user just make it target optional, done. Once again, relying on the order of any result set without an explicit ORDER in SQL leads to unpredictable result. We are working as an ANSI SQL system not one that holepefully works...

If you are bent on 'default' then add a weight and index on that. Edit: weight in the DB, 0 for default, 1 for everything else, not visible to end users, and order on that.

Edit2: language subsystem already uses empty string for language neutral.

Crell’s picture

I mean a wtf when looking at the DB directly. Also:

If the slot-defined default is '', and has a value "a".
The user defines a default target of "b".
The user then requests handler for target "bob", which doesn't exist...

We still need to request the default target to get "b", so we're right back where we started. Unless you're saying the user needs to pass '' as the target to attach the fallback case, in which case you're exposing the '' WTF to the user/module developer. Ungood.

Anything else besides the query bikeshedding? :-)

chx’s picture

It's not bikeshedding at all. It's better queries and code I am after. All I asked for was renaming 'default' to '' and I adjusted the code already showing that so if you ask for target 'foo' and that's not found it'll look for the empty target which is the default.

Now, review

  1. Store this overdiscussed SQL and args in variables so when you re-run it, it's less code.
  2. You want to check target in the retrieved record before creating a new handler. It very well might that you netted the default and if it's reusable, then win, you do not need to create a new one.
  3. No review is complete without whitespace complaints. handler_ensure_defaults query looks untidy and you added a new empty line to path.test.
  4. In the past, variable_get in url() indicated that sometimes it is worthy to static cache something like handler in drupal_get_path_alias to save one function call and an array lookup by its key, it got removed in #356721: Regression: Test failures with clean URLs. I would love to see the comment here enhanced by adding a few words about performance ("improve performance") instead of "avoid lookup" and a moment of meditation over whether we will get a bug here because of the static or not.
  5. Also care to mention that the two path wrapper static cached the same object?
  6. Add a big fat TODO to the admin screen that this needs to be extended to some sort of generic UI. Which is not the concern of this patch.

    so far I found these. Nice that path tests pass!

Status: Needs review » Needs work

The last submitted patch failed testing.

Crell’s picture

Update: After talking with chx to figure out what exactly he is talking about in #1 and #2 above, I'm working on a revised patch that adds better caching to handler(). Of course, I'm running into test failures that I blame him for. :-) Will post a revised version once I have those worked out.

Crell’s picture

FileSize
66.24 KB

Right, so, I'm going to post what I've got and let chx try to figure out why there's still a test failing. :-) There's some debug code in testUnsettingHandler() that I've commented out. The only problem right now is that the test to attach and detach a handler in the same request is failing and returning the wrong handler object. I cannot figure out why. The best I can think of is the target caching, but I've added a reset parameter (which I consider a horrible hack and would rather not use if there is any possible way not to) and it's still failing. It was not failing before.

chx, over to you. ;-)

chx’s picture

yessir! working on it.

chx’s picture

Status: Needs work » Needs review
FileSize
38.18 KB

Here we come. It not just passes the tests but there is more caching, too.

chx’s picture

FileSize
63.13 KB

With the numerous new files.

chx’s picture

FileSize
63.06 KB

Hopefully this version of handler() is easier to understand. Tests still pass.

drewish’s picture

Status: Needs review » Needs work

[chx said he had a new patch so I've only reviewed includes/handlers.inc from #65]

Since I'm unclear on how all these pieces fit together I'm mostly reviewing it from a documentation standpoint. First off let me say the docs are a great starting point I was able to read them and get a rough idea of how the pieces work together so kudos for that. That said now I'm going to get all anal about this while it's still new to me and the shortcomings still apparent.

+ * Slot
+ * A slot is a system or subsystem which we want to be easily swappable. A slot
+ * is defined by a name and an interface. For example, the Cache system is a
+ * "slot".  All handlers for a given slot must conform to the PHP interface
+ * defined by the slot.

Since it doesn't seem like the Cache system is implemented in this patch could we reference the "Path lookup system"? And specifically list the name (path_lookup) and interface (PathAliasInterface) that way I could go look at them if I was so inclined I'd also make it clear that it's an interface in the PHP OOP sense (I think listing the class name will help).

+ *
+ * Handler
+ * A Handler is a class that implements the interface for a given slot. For example,
+ * a database cache implementation and a memcache cache implementation are both
+ * handlers for the Cache slot.  Handlers may be transparently swapped out for
+ * one another.

Same goes here. I'd stick to concrete examples that are in core rather than plan to be in core. So lets reference PathAliasPrecache and PathAliasLazy as two alternative Path lookup systems that can be swapped out.

+ * Targets
+ * A target is a specific case within a slot. Every slot may have an unlimited
+ * number of arbitrarily defined targets, and every target may have a different
+ * handler attached to it.  For example, "page", "menu", and "filter" are
+ * different targets used by Drupal for the Cache slot.

This could use some work... I've got no idea what "specific case within a slot" means... Who defines the targets? The Handler? Target? Caller? It's not clear. The menu example helps but it would be nice to follow the Path lookup example all the way through.

I'd really emphasize that you pick the handler per Target. So you could associate different algorithms to different Targets for performance reasons.It seems like that's a big part of the win of the whole handlers system and explaining that will justify making someone figure out the complexity.

+ * @code
+ * handler_attach('slot_name', 'handler_name', 'target');
+ * @endcode
+ *
+ * That will overwrite any previously associated handler. A handler may then
+ * be requested using the handler() factory function:
+ *
+ * @code
+ * $handler = handler('slot_name', 'target');
+ * $handler->someMethod();
+ * @endcode
+ *
+ * If no handler has been attached to that slot and target, the handler attached
+ * to the target "default" is used instead.  A slot/target may also be "reset"
+ * to the default handler:
+ *
+ * @code
+ * handler_detach('slot_name', 'target');
+ * @endcode
+ *

I appreciate seeing the examples but it would be better to see them with real code. Paths perhaps?

+ * If no target is specified, "default" is assumed.  That allows for a given
+ * slot to not make use of the system's multi-target capabilities by simply
+ * omitting the target whenever it is called, which results in a target of
+ * "default" being used in all cases.

Where are the defaults defined? Does the Handler come with a default? How is that associated.

+/**
+ * Rebuild the handler slot info registry.
+ *
+ * @return
+ *   The array of slots just declared.
+ */
+function handler_slot_build() {

Does this operate on the last handler added or all handlers? the "just declared" has me wondering.
Could we get some @see tags to the hook(s) this invokes?

+/**
+ * Rebuild the handler info registry.
+ *
+ * @return
+ *   The array of handlers just defined.
+ */
+function handler_handler_build() {

The function name and description seem at odds... build or rebuild?
Again, could we get some @see tags to the hook(s) this invokes?

+     ->updateExcept('class', 'handler')
+    ->execute();
+
+	  // If there's no attachments for this slot other than the default, cache
+	  // that to the variable system for faster lookups.
+	  $attachments = db_query("SELECT target, class, reuse FROM {handler_attachments} WHERE slot = :slot", array(':slot' => $record->slot))->fetchAllAssoc('target', PDO::FETCH_ASSOC);
+	  if (count($attachments) == 1 && key($attachments) == 'default') {
+	    $attachments = current($attachments);
+	    unset($attachments['target']);
+	    variable_set('handler_default_' . $record->slot, $attachments);
+	  }
+  }
+
+
+
+}

Has some tab indenting problems and extra new lines.

Dave Reid’s picture

BTW, handler() is missing a PHPdoc for the $reset parameter.

chx’s picture

Assigned: Crell » chx
Status: Needs work » Needs review
FileSize
63.73 KB

Worked a bit the comments. Assigning to myself until Crell swears to Druplicon that he will fix his IDE and comment style and neither post patches with TABs nor with period-space-spaces in them. I hope I killed all of them.

David Strauss’s picture

This looks very interesting for swappable Field API storage engines.

drewish’s picture

Assigned: chx » Unassigned
Status: Needs review » Needs work

man crell and his dot space space was pervasive... though at least it's clear where all the great documentation came from:

  1. +      // Try to get the associated handler.  If the first query finds an associated
    +      // handler, we use that.  If not, the second query will always find the
    +      // default handler.  By UNIONing them together we get the fallback default
    
  2. + * behavior and a way to type-check a handler object.  Interfaces for specific
    
  3. +  // slot-defined default handler.  That ensures that we always have at least
    
  4. +  // that doesn't cause a problem.  Note that because objects pass by handle
    
  5. +    // Now request the handler.  We should get back the default.
    
  6. +      'description' => 'This slot does nothing.  It\'s just to test lookup behavior of reusable slots.',
    
  7. +         'description' => "This handler doesn't do anything to the string.  It passes through unaltered.",
    
  8. + * appropriate interface.  How they do so is entirely up to them.
    
  9. + * This hook should return a nested array of slot definitions.  Each key
    
  10. + *   The PHP interface that defines this slot.  The interface must extend
    
  11. + *   in hook_handler_info().  All slots must have at least one handler
    + *   available.  If not specified, "default" is used.
    + * reuse (optional)
    + *   By default, if a given target on a slot is requested multiple times in
    + *   the same page request the same object will be returned each time.  The
    + *   object will be statically cached.  To disable that behavior and return
    + *   a newly instantiated object for every request, set this value to FALSE.
    + *
    + * This is a registry-style function.  It should normally be placed in the
    
  12. +      'description' => 'This slot does nothing.  It\'s just to test lookup behavior of reusable slots.',
    
  13. + * This hook should return a doubly-nested array of handler definitions.  The
    + * first key is an existing slot.  Its value is an associative array of handler
    + * machine-readable names and handler definitions.
    
  14. + *   The PHP class that defines this handler.  The class must implement the
    
  15. + * This is a registry-style function.  It should normally be placed in the
    
  16. +         'description' => "This handler doesn't do anything to the string.  It passes through unaltered.",
    
  17. +        'description' => 'The PHP class for the handler attached to this slot/target.  It is stored in this table to make lookups faster.',
    
  18. +        'description' => 'Whether or not this handler may be reused.  It is stored in this table to make lookups faster.',
    
  19. +        'description' => 'The PHP class for the handler attached to this slot/target.  It is stored in this table to make lookups faster.',
    
  20. +        'description' => 'Whether or not this handler may be reused.  It is stored in this table to make lookups faster.',
    
  21. +    'description' => 'Aliases are looked up one at a time as they are requested.  Good for sites with a very large number of path aliases.',
    
  22. +    'description' => 'All aliases are loaded from the database at once and then looked up in memory.  Good for sites with a much smaller number of path aliases than pages.',
    
Crell’s picture

Status: Needs work » Needs review

@DavidStrauss: For field API storage we'll probably need multi-axis routing, which this doesn't do yet. I want to add that, but there's still screwiness there to work out so I wanted to get this version in first at least. Of course, that begs the question of whether we really should go all out in perfecting the routing algorithm now when we want to replace it with a multi-axis one later. I'd actually say no. Just make it work for now, don't squeeze every cycle out of it, and we'll get back to it later with a multi-axis version sometime in March/April after I work out the math for that. (Which is partially done; see the blog posts mentioned in the OP.)

Regarding whitespace: I bloody hate Eclipse. And it's not my fault the Interweb doesn't know how to typeset proper English.

pwolanin’s picture

I don't like the term "slot" since this is really just the name of an interface plus some meta-data. It's also a rather uninformative name. I'd prefer "subsystem" or even just "interface".

chx says that we have a separate "slot ID" so we can have shorter names like "cache" - I'm not convinced that this is better than just using the name of the interface itself. e.g.

function system_interface_info() 
  $interfaces['PathAliasInterface'] = array(
    'title' => 'Path aliases',
    'description' => 'Lookup engine for path aliases',
    'default_handler' => 'lazy',
  );
  return $interfaces;
}

function drupal_clear_path_cache() {
  handler('PathAliasInterface')->clearCache();
}

I guess I can see that one class may implement multiple interfaces, so there may not be a 1-to-1 mapping between a handler ID and class name, as distinct from what it seems the case is for "slots".

chx’s picture

Assigned: Unassigned » chx
FileSize
63.65 KB

More comment fixes in the defgroup section thanks to keithsmith and merlinofchaos.

keith.smith’s picture

Posted a bunch of other comment and string changes to #drupal-dev, for chx to pick up when he has a chance.

chx’s picture

FileSize
65.06 KB

Heaps of comment fixes by keithsmith and pwolanin and removed finally all the badness in spaces and tabs. Grrrr.

chx’s picture

Reviewers: please start with handlers.inc , that's where the @defgroup is so it's likely anyone trying to understand will start there. It's a lot harder otherwise.

pwolanin’s picture

Reviewing some of this in more detail - handler_attach() and handler_slot_rebuild() do not seem to check that the desired handler or default handler implements the required interface. It might also be useful to actually require that all the interfaces extend HandlerInterface.

Without these sorts of checks, it seems to me that we are throwing away one of the more useful feature of doing this OO style.

Nit-pick: we define the _info() hooks to return arrays for each slot or handler, but then load them out of the DB as objects and use them as such. Can we be consistent?

Also, there is an inconsistency in the DB schema vs. the variable naming - the table has 'slot' while in code we call it $slot_id. I think these should match. Same for $handler_id. Or better yet, jsut use the interface name and class name and do away with these ID mappings.

Suggested (not fully tested) code for interface checking below. I think we first need to re-order the calls here:

/**
 * Rebuild the handler and slot registries.
 */
function handlers_rebuild() {
  require_once DRUPAL_ROOT . '/includes/handlers.inc';
  handler_handler_rebuild();
  handler_slot_rebuild();
  handler_ensure_defaults();
}

Then we should not accept a handler class that does not implement the correct interfaces. Note also the change to the foreach ($slot_id).

function handler_slot_rebuild() {
  $slots = module_invoke_all('slot_info');

  // Set default values, to avoid NULL issues if nothing else.
  foreach ($slots as $slot => $info) {
    $slots[$slot] += handler_slot_defaults();
  }

  // Let other modules alter the handler target registry if needed.
  drupal_alter('slot_info', $slots);

  if ($slots && drupal_function_exists('handler_load')) {
    // Build the target data.
    $insert = db_insert('handler_slot_info')->fields(array('slot', 'title', 'interface', 'reuse', 'default_handler', 'description'));

    foreach ($slots as $slot_id => $info) {
      $handler = handler_load($slot_id, $info['default_handler']);
      if (isset($handler->class)) {
        $interfaces = class_implements($handler->class);
      }
      // Make sure the handler implements the expected interfaces.
      if ($handler && $slot && isset($interfaces['HandlerInterface']) && isset($interfaces[$info['interface']])) {
        $insert->values(array(
         'slot' => $slot_id,
         'title' => $info['title'],
         'interface' => $info['interface'],
         'reuse' => (int)$info['reuse'],
         'default_handler' => $info['default_handler'],
         'description' => $info['description'],
        ));
      }
      else {
        watchdog('error', 'Invalid default handler @handler_id for @slot_id in function handler_slot_rebuild()', array('@handler_id'=> $info['default_handler'], '@slot_id' => $slot_id));
      }
    }

    // Don't do the deletion until after we've gotten the new data prepared.
    // That reduces the potential race condition window.
    db_delete('handler_slot_info')->execute();
    $insert->execute();
  }

  return $slots;
}

Note - this function seems to have had some flow-control bugs also:


function handler_attach($slot_id, $handler_id, $target = 'default') {

  // Lazy-load the utility function.
  if (drupal_function_exists('handler_load')) {

    $handler = handler_load($slot_id, $handler_id);
    $slot = slot_load($slot_id);
    if (isset($handler->class)) {
      $interfaces = class_implements($handler->class);
    }
    // Make sure the handler implements the expected interfaces.
    if ($handler && $slot && isset($interfaces['HandlerInterface']) && isset($interfaces[$slot->interface])) {
      // We store the class name in the database in addition to the handler ID
      // so that we can immediately instantiate the class upon lookup.
      db_merge('handler_attachments')
       ->key(array(
         'slot' => $slot_id,
         'target' => $target,
       ))
       ->fields(array(
         'handler' => $handler->handler,
         'class' => $handler->class,
         'reuse' => (int)$slot->reuse,
       ))
       ->execute();
  
      // If we've associated something other than the default target, disable the
      // extra variable system caching of handler attachments.
      if ($target != 'default') {
        variable_del('handler_default_' . $slot_id);
      }
    }
    else {
      watchdog('error', 'Failed to attach handler @handler_id to @slot_id in function handler_attach()', array('@handler_id'=> $handler_id, '@slot_id' => $slot_id));
    }
  
    // Flush the target cache.
    handler(NULL, NULL, TRUE);
  }
}

mfer’s picture

These have come a long way in a short time. Yay. Warning: this review was done at 3am for me. Not my most awake time.

+/**
+ * Get the handler object for the handler associated to this slot/target.
+ *
+ * @param $slot_id
+ *   The slot for which we want to look up a handler.
+ * @param $target
+ *   The target for which we want to look up a handler.
+ * @return
+ *   A loaded handler object.
+ */
+function handler_get_attached_handler($slot_id, $target = 'default') {

The comment on this could be confusing to some. I imagine someone will ask what's the difference for a handler object like $handler = handler('cache', 'filter'); and a handler object listed here? Can we make this comment more clear.

In the test function testDefaultHandler() we have

+    if ($handler instanceof HandlerInterface) {
+      $handler->setString($this->sampleString);
+      $mangled = $handler->getString();
+
+      $this->assertEqual($this->sampleString, $mangled, t('Default handler returned correct string.'));
+    }

If $handler is not and instance of HandlerInterface doesn't that mean our test failed? Shouldn't this be a test itself in here? The same things happens in most of the methods on HandlersNotReusedTestCase. Not sure if we need to check this every time.

I was only able to make it halfway thought the patch. At this point it looks good. I'll look it over more later.

ejort’s picture

subscribe

Crell’s picture

Assigned: chx » Unassigned

I actually do not agree at all with using the path system as the documentation example. It's a non-obvious example. A caching backend is almost the canonical example of where factories are useful. The only reason we're not converting that here is that it's more complex to do than the path alias system.

The interface is not enforced at runtime directly for mostly performance reasons. However, anything that takes a handler as a function/method parameter can enforce it via a type hint if it chooses to. I don't think we need to add more runtime checks than the language gives us already. If we were to enforce the interface, the time to do it would be at handler_rebuild(), and we then make a watchdog record if something fails. Possibly even a screen error.

I disagree also with dropping the slot/handler key name. Typing in long case-sensitive interface names every time you need to request a handler is a DX problem, especially as the interface may not be as self-descriptive as we'd want. It also means if we ever change the interface name (unlikely but hey, could happen), we'd have to change every call to it.

I'd much rather type handler('cache', 'menu') than handler('CacheSystemInterface', 'menu');

I'm OK with using slot_id in the DB.

Arrays vs. objects, eh. The handler_load() and slot_load() routines mirror node_load(). They're returning a single handler thingie. Although arguably it could be a closer mirror to menu_get_item(), since the actual handler object is something else. I guess I can go either way here.

drewish’s picture

crell, i'd be into updating the docs with better examples once they're in core but until then real code that can be examined is better than code that might be added later. we can update the example once something better is in core.

i'm also wondering why the slot's default is optional. it seems like every slot need to have a default handler. if that's not the case how is a missing default handled?

Crell’s picture

The default default handler is "default". Documenting that as not optional is fine by me, but as a general practice I try to provide reasonable default behavior for everything whenever possible, even if it is unlikely to be used. It makes later code much cleaner, too (such as NOTICE/isset checking). So I'd want to leave the code in there as is, even if we document it as required.

drewish’s picture

hook_slot_info() needs to document that the default handler must be defined or else bad things may happen. meaning that the module implementin hook_slot_info() should implement hook_handler_info() and define a default handler.

drewish’s picture

FileSize
41.62 KB

re-rolled with some @see tags in the docs and fixed the conflict with system_update_7019().

drewish’s picture

FileSize
68.06 KB

whoops forgot the new files.

chx’s picture

FileSize
39.45 KB

Rerolled against HEAD. Modified the hook_slot_info comment on default_handler.

chx’s picture

drewish’s picture

I just wanted to weigh in and say that I'm very happy with the current documentation and API exposed to callers. I won't really speak to the internals.

drewish’s picture

Status: Needs review » Needs work

handler_get_attached_handler() needs to call drupal_function_exists('handler_load') the same way that handler_attach() does:


/**
 * Get the handler object for the handler associated to this slot/target.
 *
 * @param $slot_id
 *   The slot for which we want to look up a handler.
 * @param $target
 *   The target for which we want to look up a handler.
 * @return
 *   A loaded handler object.
 */
function handler_get_attached_handler($slot_id, $target = 'default') {
  // Lazy-load the utility function.
  if (drupal_function_exists('handler_load')) {
    $handler_id = db_query_range("SELECT handler FROM {handler_attachments} WHERE slot = :slot_1 AND target = :target
        UNION SELECT handler FROM {handler_attachments} WHERE slot = :slot_2 AND target = 'default'", array(
        ':slot_1' => $slot_id,
        ':slot_2' => $slot_id,
        ':target' => $target,
      ), 0, 1)->fetchField();

    return handler_load($slot_id, $handler_id);
  }
}

Also we need to rename all the @params from $slot_id to $slot as pwolanin suggested in #78.

chx’s picture

Status: Needs work » Needs review

We can fix minor things like that later, I desperately would like to see in depth reviews and opinion from Dries/webchick on slot/subsystem.

Damien Tournoud’s picture

I still do not believe that we need either hook_slot_info(), nor hook_handler_info(). We already have a function, class and interface registry, we don't need two new ones. Simply adding some meta-data to the definitions will easily do the trick. Consider:

/**
 * Fancy string.
 * 
 * Do fancy stuff to strings.
 *
 * @defaultHandler default
 * @reuse FALSE
 */
interface FancystringInterface extends HandlerInterface {

}

and

/**
 * No string mutation.
 *
 * This handler doesn't do anything to the string. It passes through unaltered.
 */
class FancystringDefault extends FancystringBase implements FancystringInterface {
  public function getString() {
    return $this->string;
  }
}
Crell’s picture

Without the metadata hooks, there is no way to build a reasonable UI, or enforce valid handlers. And further bastardizing our already bastardized docblock syntax is not a replacement for a hook that follows well-established Drupal conventions.

The registry is just for lazy loading. It does not track inheritance or docblock parsed data, nor do we need it to.

Damien Tournoud’s picture

@Crell: I don't believe we need an UI *at all*.

Having an UI means relying on the database for handler routing. We don't need that. The bootstrap process is already less robust because of the registry (don't get me wrong: I like the registry, but nonetheless, it makes Drupal rely on data it needs to fetch from the database). We don't need to make it even less robust by relying on even more database data.

The debate is not if we need to move to a cleaner OOP pattern for pluggable subsystems. Of course we need to.

The key of this debate is on how to implement a robust routing system for those pluggable subsystem. I don't believe that the patch, in its current form, implement such a robust routing system. I suggested in #362747: Create a call routing system (fix broken dynamic includes) to define the routing table in settings.php. I still believe that's the way to go.

Crell’s picture

And there is nothing in this patch that precludes an override in settings.php. In fact, we discuss that extensively higher up in this thread. Do you want that included in the initial patch? We can certainly do that. It's been on the table since October. I'm trying to avoid too large of a mega patch.

And with the variable-based caching chx introduced, we skip the database for any non-target-using system and any that have nothing but the default configured. With no extra work in settings.php.

The problems you point out have already been solved.

drewish’s picture

FileSize
68.19 KB

I fixed the bug in handler_get_attached_handler() calling handler_load() before including it.

Corrected the documentation on the @params for $slot_id and $handler_id.

I inlined handler_slot_defaults() and handler_handler_defaults() since they were only called once and didn't seem to add much.

Damien Tournoud, I really don't see the win in forcing the metadata into docblock comments. You end up needing some mechanism for querying it and I can't see it being nearly as straight forward and reusing the hook infrastructure we've already got. I also think that we do need the ability to provide a UI for selecting handlers. If someone wants to switch image toolkits it seems silly to force them to edit the settings.php file to do that.

Crell’s picture

handler_slot_defaults() was a deliberate design decision. I don't want to have to factor that array back out later if we need it, and they're rarely called so there's no noticeable performance hit to having them separate. Separate, though, they can be more easily documented. Please bring those back. :-) Will review the rest of #96 when I can.

I had a long conversation with Dries in #Drupal earlier today so will be making some considerable revamps to the patch in the next roll. Just an FYI.

bjaspan’s picture

subscribe

Status: Needs review » Needs work

The last submitted patch failed testing.

Crell’s picture

I've created a g.d.o group for continuing work on this issue, in light of the BoF at DCDC: http://groups.drupal.org/handlers (It should be approved soon, I hope.)

mfer’s picture

Is this still being worked on? If so, where does it stand?

Crell’s picture

I'm in the process of moving this patch into the handler module 7--1 branch, where we will work on it further (per discussion in the group) and include a series of implementations to show that it does in fact scale. Then we come back with a series of patches for core. More soon, in the group.

Crell’s picture

Title: Handlers: Pluggable subsystems for core » Plugins: Swappable subsystems for core
Assigned: Unassigned » Crell
Status: Needs work » Needs review
FileSize
46.35 KB

And we're back! After much discussion at DC, a lot of offline distraction, much discussion, and a fair bit of work off in contrib for a while, we're back with a new patch. There have been a number of changes.

1) They're now called Plugins instead of Handlers. "Handlers" was the original name because they seemed very close to the "Handlers" concept in Views 2. However, working with Views 2 more in the past few months I've realized that no, they're not Handlers as Views uses the term because they're not tightly coupled to a schema. They are what Views calls Plugins. So I adopted that name for clarity.

Related to that, according to merlinofchaos Views could be adapted to use this version of Plugins rather than its version reasonably easily. It would still need to do its own routing logic, bypassing the core attachment mechanism, but would be able to leverage the discovery and configuration mechanism. So this issue is kinda-sorta views-in-core related. :-) (More views-derived infrastructure in core.)

2) There is now a partial implementation of configuration objects. We realized in DC that yes, we do in fact need to have per-attachment configuration. That's now handled by a separate configuration object, which has its own Views-like form handling methods. So far I'm not leveraging those as I'm still working on the front-end to automate that, but the base infrastructure is there.

3) The introduction of the configuration object has made the "cache simple mappings to the variable table" logic a bit more complicated, as there's more data to store there. I'm not sure yet what the best solution is, so for the moment I've disabled it. Suggestions and feedback greatly appreciated!

4) There's still a ton of unit tests. Yay! :-)

5) Because with the configuration object the core system is doing more, I decided for now to punt on the "routing handlers/plugins" concept for now. The edge cases are really edge, and even if you have to write your own attachment logic like Views would there's still benefit to be gained from the registration and discovery system, and the standardized configuration mechanism. Routing Plugins can be added later if we really need or, since they're really just a type of plugin object, could even be a contrib addition.

6) Since our last episode, a number of other core systems have adopted a system modeled very much on this approach. The cache system and the Job queue use this approach almost directly, while field_sql_storage, and the file transfer system do or could use something similar with a little work. That allows us to standardize some new core APIs now, and open up an easy way to build more extensibility into many many core systems.

As I said this is still a work in progress, but I wanted to get it back out in public for review and feedback. Please review and provide feedback! The sooner we can get this mechanism polished and in core, the sooner we can leverage it in more places pre-code freeze. A list of places we could do so can be found earlier in this thread. :-)

David Strauss’s picture

Not to bikeshed, but I'd prefer a term other than "plugins" because in the general software world, "plugins" and Drupal modules basically occupy the same territory. I'd hate to confuse new users and developers. I personally like the term "handlers"; whether it's what Views considers handlers is largely irrelevant to me.

mfer’s picture

Status: Needs review » Needs work

Here are some initial observations/questions I have.

Why isn't the config object stored in the variables table? When you only have the default handler you can't have config options. I know this is for performance and I don't know that I want config objects in the variables table. But, it's a limitation that isn't noted and really shouldn't be there.

For early running systems like cache and path this runs registry_rebuild() and plugins_rebuild(). This seems terribly inefficient. Not yet sure of another way to approach this.

What about configuring plugins in settings.php?

Some nitpicking...

Why is the ['math']['add'] plugn defined for simpletest? It doesn't look to be in use.

In PluginConfigurationBase why does the options() method return array() and not the variable $options? I'm guessing this is an oops

In plugin() why is there the line

  if (FALSE && $record = variable_get('plugin_default_' . $slot_id, array())) {

A hold over from debugging?

Crell’s picture

Status: Needs work » Needs review
FileSize
46.74 KB

The FALSE is to temporarily disable the variable-table caching, since as mfer notes it doesn't currently handle config objects and sticking those in the variable table doesn't sit well with me. I don't know what the alternative is yet. I've added a code comment to explain that it is temporary and that we need to fix it. :-)

The registry rebuild only happens in extremely rare cases like the installer, or when you've just moved some code around so the registry is stale. I've updated the comments accordingly. In practice I don't think it will ever get called except during the installer. I ran into that problem earlier when trying to convert the path lookup system, which is needed extremely early and in the installer as well so basically nothing works yet. Cleaner solutions are welcome here, too.

Configuring plugins in settings.php: Excellent question! Since the configuration object is mostly a wrapper around simple key/value pairs, we can probably export it to an array and include it in the to-be-written mapping override in settings.php.

math/add was defined mostly because I wanted to have at least 2 plugins defined, so that I knew if the test got back the right one. There has to be a "wrong" answer.

PluginConfigurationBase: Ooos. :-)

Re the name: Plugin, module, extension, they're all used interchangeably by different projects to mean essentially the same thing. English does not have enough nouns to go around. I'm using the name with the closest parallel already in use in Drupal and requires the least changes to existing Drupal knowledge. As I said in the original post, please please please please please don't bikeshed on the names unless you are Dries or webchick, as that is a sure-fire way to derail this issue and keep it from ever getting in. :-(

yched’s picture

Looks really cool. As Crell noted, this would provide an excellent background layer for Field storage engines. I'm even wondering about widgets and formatters, although that might be a little more far-fetched.

How does the patch handle the case when an attached plugin is no longer available ? e.g the module that provides the plugin has been disabled ?
Just asking, because this is one of the nastiest things CCK and Views, for instance, have to handle with their own 'plugins'

Crell’s picture

Fields, storage engines, and widgets should absolutely follow this model, even if they don't use this mechanism specifically. Formatters may be able to, although they're more theme system than anything else.

At the moment, I don't think "class not found" is handled all that gracefully. I'm not sure how we want to handle that. In some cases, falling back to the default-specified implementation could be appropriate. In others, that may result in even harder to debug errors. I'm open to input here.

yched’s picture

Hm, yes, the behavior for "class not found" might depend on use cases. So probably the plugin API just needs to provide convenient DX to let the client code act according to its needs: fall back to default, block functionality in whatever way makes sense, or fail with a bang.

If a module providing a plugin is disabled, the corresponding rows in {plugin_attachments} remain, but those in {plugin_info} disappear on next plugin_rebuild(), right ? So it might be as simple as making sure plugin_load() returns NULL (which seems to be the case already)

Then client code can do:

$plugin = plugin_get_attached_plugin($slot_id, $target);
if (!$plugin) {
  $plugin = plugin_get_attached_plugin($slot_id, 'default');
  // or
  $disable_feature = TRUE;
  // or
  throw new Exception();
}

or maybe the exception should be raised by plugin_get_attached_plugin() and caught (or not) in client code ?

Then there's the case of 'module providing plugin is uninstalled' - should the attachments be cleared then ?

BTW, shouldn't plugin_plugin_rebuild() start by clearing all existing rows, instead of always inserting ?

Crell’s picture

plugins_plugin_rebuild() and plugins_slot_rebuild() both build their multi-insert statement first, then run a delete, then run the insert. That keeps the time between wiping and rebuilding the table minimal, and reduces the race condition during which another lookup from elsewhere could go badly. Although now that I think about it, that's a perfect case to add a transaction, too. :-)

I think, actually, an exception would be the correct way to handle the "where did it go?" issue. That's certainly an "exceptional" case. However, that's something that would ideally be caught in the factory, of which we have only the one because pluggable factories were shot down earlier in the thread for performance reasons. So we'd need to standardize on a behavior and hope that other systems don't need to swap their factory too often.

Although I actually just had a thought there that I really should have had months ago. We dropped the pluggable factory because it required two queries, one to load the slot and one to load the attached handler/plugin. But we already have all kinds of denormalized information in the attachment table. Why not just include one more bit of data there? Then we load the record based on the slot/target, and pass the whole record off to the factory specified in that very record? Then it can handle reusability, exception handling, and whatever else. Then we just provide a few stock factories, and new ones are easy to add. And it's all one query.

That's going to require more coding than I have brainspace for tonight, but I'll see about getting that done and posted shortly. Keep reviewing in the meantime!

Crell’s picture

So while working on what I described in my last comment, I came to two conclusions.

1) We can't push reuse off to a sub-called loader, because then the static cache is spread out all over the place and cannot be reliably reset.

2) Really, all of the interesting parts happen before than anyway so we don't gain much.

So, that's a dead end I think. However, something else then occurred to me. What we really want is to make the entire attachment mechanism, the factory, pluggable per slot, but without an extra query for every lookup. We can denormalize a lot of information to the plugin_attachments table, but not the full router, so that doesn't work here.

However, in recent versions of this patch we used the variable system as a cache for the simple case of attachments. But... why not use that for the slot's factories? Cache THAT to the variable table, and then plugin(), plugin_attach(), and plugin_detach() become simple wrappers for those classes, in the same way that db_query() is a wrapper for the Database class's routing off to the appropriate DB driver. That also allows those three methods to share data via protected object members, which makes resetting static caches much more easy and efficient.

Then, the "target" routing mechanism simply becomes one possible implementation. We pass *anything* after the slot_id in plugin() to the load method of that loader object, and it can do what it wants with it. Then some systems could use a variable-based backend, some could use targets, some could use multiple parameters and handle defaults however they wanted to, etc.

The degenerate case, then, for slots that have only one possible backend active at a time, would have 0 queries. Everything would be stored in the variable table, potentially even a serialized config object for that one plugin. (Not 100% on that yet.) Further optimization becomes each loader's responsibility, and we provide a few common ones in core. One could even write a "base everything on settings.php and don't save configuration changes" loader, _alter that in, and push all configuration slot-by-slot to settings.php.

There's only one downside to this approach I see. It absolutely requires the variable system to have been initialized before any plugins can be loaded. Per drupal_bootstrap(), that means the following phases cannot use plugins:

DRUPAL_BOOTSTRAP_CONFIGURATION,
DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE,
DRUPAL_BOOTSTRAP_DATABASE,
DRUPAL_BOOTSTRAP_ACCESS,
DRUPAL_BOOTSTRAP_SESSION,

Configuration, there's nothing there to plug. Database couldn't use plugins anyway, we know that. Access, I don't see anything there to plug. Session, I believe, could be moved to after DRUPAL_BOOTSTRAP_VARIABLES as the two are not at all dependent on each other and Drupal will not bail out in session anyway, so there's no performance cost there. The only one that would fail is the cache system, specifically eary_page_cache, aka aggressive caching. So this approach should work for everything *except* aggressive page caching, which in turn means the cache system. Which sucks, as the cache system is such an ideal use case.

That leaves us two possible solutions.

1) Ignore the cache system and make that one one-off on its own.

2) Split off aggressive page caching as a special case, and make the rest of the cache system use the normal target system.

I personally favor approach #2, but don't know enough about aggressive caching to say for sure how feasible that is. I frankly never use aggressive caching as nearly all of the sites I build have at least one module incompatible with it anyway. :-)

To clarify what I'm talking about, the plugin_* functions would become something like this (give or take some cleanup and probably a syntax error or two). Note that unless we have not yet cached the slot's loader class, the following code implies no database calls whatsoever.

function slot_loader($slot_id) {
  
  static $loaders = array();

  if (empty($loaders[$slot_id]) {
    $loader = variable_get('slot_loader_'. $slot_id, NULL);
    if (is_null($loader)) {
      $slot = slot_load($slot_id);
      variable_set('slot_loader_' . $slot_id, $slot->loader);
    }

    if (!class_exists($loader)) {
      $loaders[$slot_id] = NULL;
    }

     $loaders[$slot_id] = new loader();
  }

  return $loaders[$slot_id];
}

function plugin($slot, $target) {
  $args = func_get_args();
  array_shift($args);

  $loader = slot_loader($slot);

  return $loader->load($slot, $args);
}

plugin_attach($slot, $plugin, $target, $config) {
  $args = func_get_args();
  array_shift($args);

  $loader = slot_loader($slot);

  return $loader->attach($slot, $plugin, $config, $args);
}

plugin_detach($slot, $target) {
  $args = func_get_args();
  array_shift($args);

  $loader = slot_loader($slot);

  return $loader->detach($slot, $plugin, $args);
}

Thoughts?

David Strauss’s picture

The variables system is half-initialized during DRUPAL_BOOTSTRAP_CONFIGURATION; it has the variables configured in settings.php. I think it's reasonable to expect users who sub out the cache layer to edit settings.php; we do now. (Or am I missing that this is a complex value?)

Another option is to specifically write out a file storing the cache plugin configuration. It would minimize the special case-ishness.

Crell’s picture

Early page cache is also pre-database, so one way or another it would need to be configured entirely from settings.php. For cache maybe that's OK, but I'm still not 100% convinced.

As for writing out to a file as a cache, that would have to either be an ini-esque syntax, which is probably not flexible enough, or PHP code, which means executing PHP code in the files directory, which makes the security kittens upset.

pwolanin’s picture

@Crell - having to set the cache in settings.php (At least to utilize the aggressive cache) seems totally fine.

Crell’s picture

pwolanin: Does that include any and all configuration, such as memcache settings?

Damien Tournoud’s picture

Note that DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE has nothing to do with aggressive caching, but with (as the name implies) early caching. This one is pre-database, so obviously pre-handlers.

*But*, we know that there are links between the database system, the registry and the variable system, that *will* prevent the plugins approach from working for the cache subsystem. For an example of such a cross-dependency, see #344088: cache.inc cannot be fully converted to dbtng.

Crell’s picture

Hm. I'm inclined at this point to move forward with the proposal above and punt on just how much of the cache system we could move to this setup later. Just try it and see, then tweak later as needed.

pwolanin’s picture

@Crell - forward, please!

Owen Barton’s picture

Regarding recent discussions on the caching layer. I agree that it is perfectly acceptable to special case this, especially for the early page cache, but potentially for all of it if that is how it needs to be. Most of the high value cache swaps are early page. I can't think of any cache backend swaps users might want to do that adding a line to settings.php would be any kind of blocker.

+  /**
+   * Creates the configuration form for this configuration object.
+   *
+   * Unlike a normal form function, you should not return $form from
+   * this method.  Simply alter $form as-is.
+   *
+   * @param $form
+   *   The form object to which to add our fields.
+   * @param $form_state
+   *   The state for this form.
+   */
+  public function optionsForm(&$form, &$form_state);

The fact that this is a form alter is a pretty odd pattern, compared to (for example) block forms. What about the case of a plugin being used in to different contexts (targets?), such as imagemagik being used with imagecache but also with a JS imagefield editing/cropping widget? Module X wants the plugin forms nicely wrapped in a collapsible, but module Y doesn't - or one wants the settings above their other form elements, and one below? Either the plugin needs to make some risky assumptions about what it should be doing, or else the plugin consumer module needs to do a bunch of hacking to fix up the form elements afterwards (also seems error prone).

I would suggest that this be a normal "return $form" function. We could still offer a separate altering step, however I think that could still have trouble - if we are serious about plugins being abstract enough for different use cases they should define a proper interface for the kind of effects they would like to have (e.g. method supress() { return array('suppress_image_manual_size'); } - could be a standard interface or perhaps just leave it up to plugins to extend as needed.

+ * @param $target
+ *   The target inside the subsytem the plugin belongs to.

This description is obtuse to say the least :)
The code could really use some examples to help people reading the code frame the terminology a bit before they get into abstract definitions.

+  $slots = module_invoke_all('slot_info');

We have just moved install profiles from hook_*_info to .info files (the last remaining instance). I think it is worth considering if we should use hook_*_info or .info files here. Obviously .info files adds complexity of one sort, but the standardization reduces complexity of another sort (pattern sprawl). Would plugins ever need dependencies (e.g. on a module that provides a "back-back-end" API to both a specific plugin, but also more generally to diverse non-plugin users).
I am undecided here.

Crell’s picture

Thanks, Owen!

The optionForm() signature is borrowed directly from the Views 2 plugin objects, quite deliberately as they are conceptually very very similar and it's a structure people are already familiar with. I am not bound to that, however, if we find a different signature works better. I do want to try and keep it easy for Views to adopt this model, though, as it means fewer moving parts in core and contrib.

I'm not sure I follow all of the variants you describe for the image engine case. Reversing the form like that seems silly to me (from a non-image-guru standpoint), but I will note that by separating the plugin class from its configuration class it is possible to use the same class in two different plugin definitions with two different configuration classes. As long as the interfaces match, it all works. :-)

The targets, well, my latest crazy idea would make that just one possibility among many. I'll hold off commenting there until I've managed to code more. :-)

Slots absolutely are not .info file material, not anymore than hook_menu or hook_theme are. There's a lot more metadata there than we can reasonably capture in info files in their current form (and making them any more complex is a dead end, IMO). There's also the dead-simple alter hook, too. If anything, the info-registry-hook is an increasingly common pattern, and one I am trying to encourage. :-) CCK and Views use that format extensively, core uses it in a half-dozen places, etc. There's an issue that is probably going to have to wait until Drupal 8 at this point to move hook_block to that model as well, which I think is a great idea. The similarity between .info file extensions and the _info suffix is purely coincidental.

neclimdul’s picture

I talked to merlin about the reason for the signature optionsForm is based on and the reasoning isn't actually surprising at all. The return $form style ends up giving you either an nasty mess of slow and tricky array_merge's or a #tree mess. Its somewhat of a limitation of the FAPI that I've also run into and its not one that we're going to solve in FAPI in 7 I don't think.

A side note, the reason Views doesn't have any issues with this setup is it uses smaller more contextual forms and Javascript magic. The ramifications of that may be something we should consider(may be a bikeshed to the overall issue here?), though I think the model is fairly tried and true in Views.

Crell’s picture

I don't know that we'll be doing much JS magic, certainly not as much as Views, but smaller more contextual forms is how I figure these would be used, yes. Thanks for the research!

Crell’s picture

FileSize
53.54 KB

So my time has been scarce the past week or so, unfortunately. I have started on the conversion I mentioned above to loader objects, which is looking very promising but still has a number of challenges in it. neclimdul asked me to post what I've got so he can take a crack at it, so here it is.

What I'm doing now (for both him and others who want to weigh in) is each slot defines a loader class. That loader class has 3 methods: load, attach, detach. The plugin(), plugin_attach(), and plugin_detach() functions then simply determines the correct loader object for the requested slot, instantiates it (with static cache), and hands off the request en masse to that object to deal with. The slot->loader mapping is cached in the variable table so there is typically no database query required after the first time the slot is ever loaded.

This actually greatly simplifies a lot of otherwise ugly looking code. It also means that we *can* support different routing configurations besides targets; targets are just one loader's parameters. Other loaders could easily be defined that have no routing information (useful for cases where there are no "sub" items, such as what mail backend to use or what password hashing algorithm to use) and use the variable table for their storage, or those that are highly custom to a specific slot, or whatever. I have the 2 basic target-based loaders nearly completed but haven't updated the test case module yet.

The part I've not figured out yet is how that affects the interface. Either we always pass in an ordered array from plugin(), in which case we may as well just use func_get_args() and all its yuckiness, or we cannot really enforce the interface as the method signature will vary with the loader. Thoughts here welcome.

neclimdul, have at. :-) Ping me in IRC if you need.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Version: 7.x-dev » 8.x-dev
catch’s picture

Priority: Critical » Major

No such thing as a critical task.

wojtha’s picture

Subscribing.

pounard’s picture

Some notes:

In the first patch attached, you are still copying superglobals (such as $_GET, $_POST and others), you really should use them directly (either using a reference, either just manipulating it).

In the patch provided in #106, you define a plugin configuration interface. I think this is basically an error. you should:

1. Ask the configuration initiative to build a mapping layer, reusable for everything.

2. Creating a basic "options" interface, without UI, using only get() method (and maybe implementing Iterator interface could be a plus), this is the configuration initiative Work.

3. Extend this options interface as "UX providing options" for forms, that would remain optional.

4. Use the first basic options interface everywhere in code, use if ($options instanceof MySuperConfigurableInterface) {} to determine weither or not you should display a form in all UX.

The idea behind using a mapping layer (for which the basic options implementation would be the only thing you have to worry about) would be to make everything exportable as a hierarchical tree of scalar values.

I did this kind of implementation there which is basically a derivative fork of Zend_Config.

EDIT: By the way, I like neclimdul's plugin code in its sandbox.

dvessel’s picture

sub

bojanz’s picture

Status: Needs work » Closed (duplicate)