One of the things we want to be able to do is cache context information that is derived for the duration of the request. That is a simple, obvious performance boost.
However, there's a problem when one piece of context depends on another. For example:
- "current node" is a context item that depends (originally) on the GET string.
- "current OG" is a context item that depends on current node.
- Block A calls current OG, which in turn calls current Node. Both are derived and cached.
- Block A calls sub-block B.
- Block B mocks current node to something else and passes that on to Block C.
- Block C asks for the current OG.
- Block C now gets the wrong OG, as it was derived from the current node up in Block A, which is different.
- Hilarity ensues.
I'm not sure how to resolve this. Someone at DrupalCon Chicago suggested that handlers explicitly declare their dependencies, too, so that we know how to intelligently clear (or at least mask) the cache internally. That's a possibility, but I worry about the complexity that would create.
Someone please help figure this out! :-)
See the patch in #47 for a unit test that fails due to this bug. The goal of this issue: Make that unit test work.
Comments
Comment #1
pounardCurrent OG won't probably not only depend on the node, but could depend from many other things. OG and general "Node handler" should probably totally unlinked because the only thing an OG is really tied to is the entity on which the field has been put.
Maybe the solution is to be more "static". If an OG context is set, it probably should'nt be overridable. Nevertheless, the dependency problem remains, maybe each module that provide a context implementation should be able to set its context the sooner as possible and shouldn't allow it to be modified (maybe except at this particular time where it sets it, but shouldn't be overridable once set).
I'm looking for another dependency that could cause this, I don't see one right now.
If you implement a HMVC derivate, each block would have its own controller, therefore they would each one of them build their own context from the original request data without being annoyed by what the other's did. This would solve the Block B override for Block C.
Comment #2
Crell commentedExcept that a block being able to override the context for its sub-blocks is very specifically a feature we want to have. So that kind of nesting is not something we can eliminate. Block B overriding context for Block C is a feature, not a bug. :-)
Also, in most cases we're not setting a context value. We're setting the handler that will derive the value on request, and not bother if it's never requested.
Comment #3
pounardOk, I have a better sighting now. In the case you described, if Block B changes the current Node context, it just shouldn't interfer with current OG. If current OG does not depends on Node (only for its init at best) then it shouldn't change because of block B.
If contextes are set and can be fetched from the "parent context" then when Block C asks for the current OG, he gets the current OG from A. If he asks for the current Node, he gets the current node from B. Just avoid too much dependency and it will work.
Contextes should be "atomic" in a way, OG is OG, Node is something else, maybe OG was derivated from the Node while spawning, but once spawned the Node override shouldn't alter the initial OG set, except if Block B explicitely override it.
Comment #4
Crell commentedIt sounds like you're proposing that we forbid one context key to depend on another. I don't think that's feasible.
In the default case, the "current node" depends on http:get:q.
In the typical case, "current OG" is "the first OG that the current node is assigned to".
If the current node changes, then current OG would by necessity change.
If we do not clear the cache , we run into this problem:
Suppose we have a block region, Foo, configured with two blocks: One shows the body of the "current" node (or what it thinks is the current node), and the other shows the titles of the most recently posted nodes in the same OG as the main node. That's a rather common use case, actually.
Now, that region is inside another region, Bar, that has mocked "current node". There are many reasons it may have wanted to do that; the most obvious example is displaying 2 nodes side by side. (I've done this before for a client, and it was highly ugly.) We now end up with the following situation:
- The main "current node" for the request, from the URL, is node 5.
- Region Bar has mocked current node to be 12.
- Our "titles of nodes in the same OG" block will request "current OG". If another block before it has requested current OG before, it will get the OG of node 5. If not, it will get the OG of node 12.
Indeterminate context like that is precisely the sort of problem we want to avoid. That becomes a race condition. It is even worse once we start caching blocks based on their context information, because then the cache state of one block will affect the rendering of another block. That's just no good at all. :-)
It's a tricky problem. I'm not sure how to resolve it yet other than explicit declaration and clearing of dependencies, which makes me cringe.
Comment #5
pounardI'm not sure there is a clean solution, what you attempt to do is kind of tricky. Business context is hard to keep as you describe it.
I'd rather go towards a more simple solution where context are basically set up using the input request for most of it, then whatever is the current block they would be mostly the same.
Why on the earth would you want to have dual OG context on the same user input? I mean, OK, your samples are explicit, but this is rather complex and in the end, the problem is obfuscated itself.
Let me rewrite your first sample, from the original post:
There is a flaw in this statement: either OG always derivates from a node, if OG is enabled, each time you have a "context node" if an OG is tied to it, you will have the linked OG as context, which would give you (in caps and between brackets, my changes):
If I assume that OG always derivate from a node, if OG module is enabled, then asking the mock node context would give me the OG context of that node (totally legal here), simple inheritance.
Now, let me assume that OG IS NOT tied to a node, but arbitrary set by the request, you have to solutions:
First, let the API do its job:
Thus, if Block B doesn't care about changing the context.
But now, if the business operation of Block B is all about OG, Block B controller would attempt to find the maching Mock Node OG, which would give:
What I mean here is:
And last, but not the least:
Contextes, dependent or not should be stacked. Each level of the stack would have its own cached override (only if an override happened), but still cached. Once the vertical browsing of the stack ends (let's say here something like the decorator pattern maybe) each time you level one up, you unstack the current cached overrides and throw them.
Each controller in your PAC (this can happen only if you have a real HMVC or derivative) will then have the responsability of stacking context when entering it, and unstacking it when going out.
I could make you a simple sample code if you want, based on your actual module, but not tonight. But as I it, this seems quite a simple problem to solve, it's like variable visibility in most languages, all about local override for a particular subtree.
Your original problem here is maybe about the fact that OG depends from a node, if so, then if a node is the context then I implicitely have a new OG. But if my OG is not derivated from a node, then another stuff would create the OG context, but if each level of the stack (each block or region) has its own controller, each level of the stack would then override the context depending on the business stuff it has to implement, therefore the problem is not your to solve, but let the specific implementations doing it.
EDIT: Some typoe.
Comment #6
pounardAnd another stuff, never talk about cache when trying to implement a complex software design, think pure algorithmics first, implement it, then optimize it later, it will clarify the pattern, let you write a good documentation for others, then once done, you will optimize like a madman and leave the code obsucated, but with a good documentation.
EDIT: I'm sorry my english is not very good (talking about the upper post).
Re-EDIT: Don't take it wrong if I argue a lot, this is because this is really an interesting topic and I think you have a lot in mind that worth the shot to tell and mature with other's mind.
Comment #7
pounardAs I promised, here is a Front controller/dispatcher loop/controller/context implementation with an actual working use case.
It as greatly inspired by Zend framework, but fully hand written in three hours for fun. It's more simpler than stuff that ZF can offer, but it also take into account some problematics that ZF actually doesn't (business context and ESI integration, which are two important stuff for the Butler project). It actually works.
Here are the components it defines:
It also notice the View concept, which seems to be already fully implemented in Drupal, at least for HTML using the rendering API and the theme system.
This a basic implementation, with no optimizations (not much). It performs well, the provided test case defines some contextes, 3 stacked controllers and context inheritance, execution time is not visible on my development box.
This is only a proposal, with a centralized cache handling and potential ESI support directly at the controller level, which allow any specific implementation to benefit from those implicetely.
Hope you'd like it, or hate it, at least it will give some thougth about the full butler project.
Comment #8
eaton commentedI'm coming to the conversation a bit late, but I think this isn't necessarily a safe assumption -- very very few things in Drupal are explicitly set by the request, and unless you're directly visiting the primary landing page for an organic group, "What is the current OG?" can depend on a number of different factors -- and some of those factors are a bit fuzzy even once they're known. One complication is that a given node can belong to multiple organic groups, and a given user can belong to multiple organic groups.
When Crell and I first started talking about the issue of context, we basically broke it down into three pieces:
Coming back to your statement above -- ie, 'The OG is arbitrarily set by the request,' I think we have to distinguish which of the three ways we are arriving at the OG. Is it because someone explicitly visited a path that maps to the primary entity for that OG? (for example, admin/groups/groupname, or something like that)? Is it because someone wrote custom code that explicitly requested "The Current Organic Group" from the context object, and the OG module was able to answer the question based on rules specific to the OG module? Or is it because the currently logged in user belongs to an OG, and custom code retrieved that collection of OGs? All, technically, could be considered context but only the first one is really part of the request.
It sounds like the complication that is causing concern comes from the fact that there could be multiple contextual questions asked that return the same data type. (For example 'Who is the currently logged in user' and 'Who is the administrator of the current section of the site'). If we're really talking about request context, though, that shouldn't cause any problems. The only hitch comes when we want to pass information to the response system, or a rendering layer, and we have to feed "A user" to a block that renders account information, or pass "A node" to a block that renders a teaser. That's what page callback functions are responsible for, and what page manager in the Panels/CTools world is responsible for. The context system itself can't be smart enough to handle all of that logic -- it just has to handle providing answers to the questions it's asked.
Am I missing something?
Comment #9
pounardNo I think you are right. The context system can't guess the business logic itself. Looking at ZF front controller plugin system (I will contact people responsible for this part of research and post on groups, don't worry), it explicitely tell the developers to use the preDispatch() operations for settings their context, then handling this kind of business logic inside the controller callback (in ZF the controller action, in Drupal the page callback).
The framework in any case cannot be responsible for handling the business logic, same problem as the UX one.
I think context is a concept where the inside data is undetermined for the framework, and only the business oriented modules can know what they get or what to do with it. If you want to handle context in the framework, the only thing you can do is a registry of named piece of data that business oriented modules have set by themselves prior to run the final controller action. I think the only matter of business logic the framework could do is to impose the fact that contextes should be named and serializeable, in order for it be able to build reversible and predictable cache keys; Either for spawning single controller action without the page context, either for extended caching abilities.
Comment #10
Crell commentedI would agree. While the intent is for the context object to focus on "the incoming request and things derived from it", there is, realistically, no way that we can enforce that. So we shouldn't try to shoe-horn it in there. Context is "information that meets these criteria that may be needed in multiple places".
Comment #11
catchAlso arriving very late here.
I'm not sure what level of caching you're trying to talk about here - is it just static caching during the page render? If that's the case I agree with pounard that trying to build caching into the context object itself isn't necessarily the way to go. However I'm very interested in us having a context system that allows for much better approaches to caching in core than we currently do. This is the other half of #636454: Cache tag support, and IMO it's something that needs to be in mind from the start of this.
When we're rendering entities, assuming we keep the basic pattern of entity_type, id, view mode and language as being the minimum data required to get a rendered version of an entity as a string (which is to an extent what page caching has now in terms of efficiency, but most other Drupal caching doesn't), then we want to avoid loading that node until it's absolutely necessary - in most cases on cache misses.
This would mean the context object should only be storing a unique pointer to a context (entity type + id for nodes, view mode or similar, language code etc.) - the context plugins or whatever entities and languages end up being in relation to this, would then be responsible for handling their own caching.
Let's say we have the node/123 which has multiple blocks on hit:
4 blocks are based on the current node.
1 block lists nodes that are referenced directly
1 block lists nodes that were recently posted in a taxonomy term referenced by the node.
It's completely fine if the four blocks listing the current node, just get entity type, id and view mode. Calling node_load() is nearly free once it's been done once. If they are rendered by ESI callbacks, that callback is going to need to load the node anyway (whether the context system handles this or not).
If we have a list of nodes based on a relations field, the block responsible for generating this list is going to need the loaded node on a cache miss - since the related nodes are stored in that node object.
However, if we only cache that block as a string of HTML based on the 'current node id', then every time that cache is invalidated (currently in core, this would be on every single entity save), we need to go and load all the nodes and rebuild it from scratch.
Much more efficient is to cache the block (which gets wiped every five minutes), but also the component parts of that block at the highest level possible. So the best case is that on a cache miss for the block, we get a cache hit on the full rendered string for each of the nodes in that block - and can do this without loading the nodes at all (and potentially have ESI callbacks for those).
This way, when the highest level cache (the block) is invalidated, we have the list of nids and view modes without doing any work, then I can fetch the string corresponding to those - no rendering nor loading needs to happen for those individual nodes. This absolutely requires that the context be a unique pointer rather than a full object- since if I have to do entity_render($node, $view_mode, $language) I can never escape loading the node somewhere before I can get to the cached rendered string, even if it's not needed. The page cache allows that kind of efficient look up now, the issue with it is invalidation.
If context plugins are responsible for their own caching, then we would have to be doing some nasty stuff in the context object itself for that to need caching in there, although of course it depends how things pan out.
Also it is cross-posting/off-topic but related I have been thinking a lot about the caching API itself recently, specifically allowing lower level things (like taxonomy terms attached to a node) to invalidate higher level caches (like the rendered output of that node). Most of that discussion is here at the moment #636454: Cache tag support. It is very likely we'd want allow cache tags to be derived from context, and allow those cache tags to be pulled upwards (similar to how #attached css and js are pulled upwards via the render API now - so if lower level items are cached, their css/js still appears on the page - this is going to be an issue with ESI too).
Comment #12
jherencia commentedSubscribing...
Comment #13
jdubbwya commentedsubscribing
Comment #14
damienmckennaAt Bonnier dalejung built a custom caching engine that had two parts - the first part cached objects with a set of identifiers to identify the object and other objects which were loaded by it (noderefs, nodes loaded in a view), e.g. for node/123 it might say "node;node:mycontenttype;node:123" and for a view it might say "view;view:my_view;node;node:123;node:124;node:125"; the second part would then trigger on any changes and invalidate anything that matched that object, e.g. editing node/123 would invalidate the my_view output, or editing the mycontenttype content type would invalidate all nodes of that type. There were also lots of other options, lots of hooks.
Comment #15
dmitrig01 commentedJust thought of something -- this could be too complex, but it should be possible to tag onto offsetGet to keep track of what other context a given piece of context is derived from. This example would be completely dysfunctional, but it's just the basic idea.
Comment #16
dmitrig01 commentedEr sorry, it's late. I meant that this:
would probably happen when changing the node context, where $key = 'node'.
Comment #17
Crell commentedIf we can track that information without too much overhead, that probably would solve the issue. I don't know if we want to try and derive it automatically from usage, or simply require handlers to implement a method that declares what context keys they're going to care about. That may simplify the implementation (and performance is key here).
Dmitri, can you try implementing such tracking on the current codebase? I can give you access to the repository to just make a new working branch for it. Let's see if that can resolve the issue without too much of a performance hit. *crosses fingers*
Comment #18
dmitrig01 commentedSure, I'd happily try it out.
Comment #19
Crell commentedYou've got write access. Make a new issue branch and go to town. :-)
Comment #20
dmitrig01 commentedGreat, I've committed and pushed an approach that I think should work. I'm attaching the patch here just for reference.
It does involve some trickery between parent and child contexts, namely if a child context can't find anything and falls back to the parent, if the parent needs to instantiate a new handler, the new handler's assigned context is the child context. I *think* that this is a good thing, and it's certainly needed for this to work.
Another very minor issue I noticed is everytime a handler object is hit (i.e., http:get:q, http:get:foo), a new handler object is instantiated. Wouldn't it be better to cache handler objects, and just change $handler->context, $handler->params, etc, as needed (which shouldn't be very often)?
Comment #21
dmitrig01 commentedComment #22
Crell commentedHm, I thought I had the code persisting the handler objects already. Aren't they?
Although, if we are caching the returned values, do we need them a second time?
Comment #23
dmitrig01 commentedWe do need them a second time, for example if we need http:get:q and http:get:foo, we need to hit the http:get handler twice. Anyway, I'll open a new issue for that.
Comment #24
Crell commentedUgh. Sorry for taking so long to review this, Dmitri. :-(
These need docblocks to make it clear what they're doing.
Let's not embed a ternary like that. It makes the code way too hard to read.
This is testing something different than the original test method, so it should be a new test method. We're using UnitTestCase here so each test is nearly instantaneous. :-)
Also, the get() method could use some better internal docs. I can't easily follow what it's doing and why, which means future readers won't be able to follow it either. (A better name might be good, too.)
Comment #25
Crell commentedRefiling to the sandbox queue...
Comment #26
Anonymous (not verified) commentedsubscribe.
Comment #27
mr.moses commented.
Comment #28
Anonymous (not verified) commentedComment #29
matason commentedsuscribe
Comment #30
pounardI think this problem is not solvable with the current design.
If the foo:bar value is set by a handler in the local context (say the root context),
Then the foo:baz value is a derivative of ht foo:bar, using its own handler (still root context),
If I create a new context, overriding foo:bar (changing value) but not the other:
Non cached trace:
1. Ask foo:baz to new context: no cache value
2. Handler lookup in new context: no handler
3. Ask parent (root context) derivate foo:baz from local foo:bar
FAIL
Now, with cached trace:
1. Ask for foo:bar and foo:baz in root context: both values are cached
2. Ask for foo:bar in new context: cache lookup says no.
3. Handler lookup in new context: override found, get new value and cache it.
4. Ask for foo:baz in new content: no override found
5. Handler lookup, void, ask parent (root context): root context gets value based on local foo:bar
FAIL
This is not due to caching, this is due to actual overall design.
Caching which value as been derivated or not in each context will create new arrays, bloat a bit more memory, but I don't really see where it comes to help us without complefying a lot the whole algorithm.
I think we have to change the whole high-level design, we have two choices here:
1. Handler inheritance
2. The error shall pass
I won't talk about 2. since it won't make anyone happy to voluntarely let bugs in:)
Let's see more into 1. Handler inheritance:
This solution would assume that handlers set into root context can be used directly by new contextes: we need a getHandlerAt($contextKey) method, see:
1. When new context don't have a value: handler lookup
2. If no handler, direct parent handler lookup: found a handler!
3. Use handler in child context instead of parent context.
This would even be more performant, worst case scenario is:
Current design:
1. New context cache lookup: no value
2. New context handler lookup: no value
3. Root context cache lookup: no value
4. Root context handler lookup: got one!
5. Return value
Handler inheritance:
1. New context cache lookup: no value
2. New context handler lookup: no value
3. Root context handler lookup: got one!
4. Return value
And this, by using caching or not, would remain the same algorithm (non cached version just drops out the 1. operation in both case and the 3. operation in second case).
Plus, I'm wondering if caching itself might be provided by handlers instead of context (except for arbitrary set values): using a handler that calls an entity controller would get a cache node (almost everytime) it only adds 2 function calls, consume less memory, and avoid static cache reset problems.
The thing that worth the most being locally cache remains definitely the handlers on each key position: this the handler lookup algorithm is complex and mess up a lot with arrays it's always good to avoid that as much as we can.
Comment #31
pounardAfter re-reading the patch from dmitrig01 I guess that it's what he has done, except I don't see the use of derivated array.
I would definitely go for a more granular approach: using
getHandlerFor($contextKey)instead of array operation over thehandlers()method (which IMHO is poorly named, it should begetHandlers()for being more explicit).By using the
handlers()method, you recopy and array (which is not recopied until you modify it thanks to the PHP copy on write feature) over which you do array operations: I guess replacing it by agetHandlerFor($contextKey)would be more readable and avoid some array operations there.Because handler discovery algorithm goes from specialized to non specialized, I guess that non specialized, i.e. "foo" handler, would override parent more specialized handler, i.e. parent "foo:bar" handler, this is a question: do we support this?
A replacement of my proposed algorithm would be to use a
getHandlerAt($contextKey)instead, difference between For and At is that At checks the strict index given while For would check the less specialized methods.Using the *At() allows then to ask the parent handler for each level while we attempt handler discovery:
1. "foo:bar:baz" gives nothing
2. Ask parent for "foo:bar:baz": nothing
3. "foo:bar" gives nothing
4. Ask parent for "foo:bar": got a handler!
If we don't end up with "horribly:long:context:with:a:thousand:parts:in:it" the performance impact of the extract function call will definitely be neglictable.
For this more granular approach, we might want to separate handler instanciation from the handler lookup function and create its own function taking the exact context key as parameter and does nothing if nothing is strictly set at this position.
It makes the handler lookup algorithm loose a lot of code and make it a lot more explicit and readable too.
Comment #32
pounardThis proposal opens a new problem: handlers are aware of the context they actually use:
Crell commented the code as this:
In HandlerAbstract class code.
We could change the
getValue(array $args)method signature as this:getValue(array $args, ContextInterface $context = null)so the handlers objects would become stateless objects (and removing circular dependency): it's basically almot the flyweight pattern implementation.This way, we win on three sides:
1. Removes the circular dependency
2. Make context being stateless and shareable: less memory consumption
3. We can implement the handler inheritance algorithm without having to get the handlers info definition array from parent: no more info array in this algorithm, huge win.
4. Additionally, we restore context being part of HandlerInterface instead of HandlerAbstract: this makes the design consistent (it is inconsistent as in actual implementation: this is a huge flaw).
Handlers, of course, would still be instanciated only on a per-needed basis.
Comment #33
pounardAdditionally, this would also allow to set already isntanciated handlers to the context instead of a info array without altering the execution flaw and without having to patch the offsetGet() anymore: it's extensible and iisolated: huge win again.
Comment #34
pounardHere is a first patch:
1. Introduces getHandlerAt() function.
2. Move handler instanciation into a specialized method.
3. Removes the ContextInterface from HandlerAbstract constructor, moves it to getValue() method signature.
4. Fixes unit tests to work with these new signatures.
5. Allow registerHandler() to accept a HandlerInterface instance as second parameter, also proceeds to more consistency checks.
Comment #35
pounardAnd here is the full patch:
1. Fixes bugs left in the previous
2. Added parent call to getHandlerAt() while descending the specialization chain.
Handler inheritance is complete! All tests pass. Now I need to write test for the use case of the issue.
FIY I did push a branch in the WSCCI git, named context-inheritance.
Comment #36
pounardAlso pushed this patch, which add tests and fixes some PHP notices being thrown.
Comment #37
Crell commentedShouldn't the return value be assigned to something?
A string that is the name of a class will never be an instanceof an interface. I think you probably mean
http://php.net/manual/en/function.class-implements.php
Let's not do this. Make $params default to an array, not to NULL. That simplifies logic all over the place, both for the core system and for handler implementations.
This is one of those pieces of code that can be simplified if $params is always an array, not NULL.
Also, I'm not sure why we're throwing the exception here. That would cause a fatal if uncaught, which is no better than a class-not-found fatal. Is this really an improvement? We don't do class_exists() checks anywhere else in core before instantiating (save for the DB layer where it's to decide which class to instantiate, which is another matter).
Also, this method doesn't get a handler. It creates/loads a handler. It should rather be named something like createHandler($handler_info).
And yes, using $handler_info, not $handlerInfo, since it's a local variable.
This concerns me. A method called getHandlerAt() should do nothing but retrieve a handler instance. It should not have extra unexpected side effects.
Lazy has only one Z. Also, "instantiation" is the correct spelling.
This is a separate issue. Please leave it out of this branch so we can focus on what's actually changing here.
No. Please do not try to slip in additional functionality from other issues. Stick to the issue in question. We cannot properly review it otherwise.
I do not understand what "This is a strict method" means.
The name is also not self-explanatory. Why At instead of For? For would make more sense, although I frankly don't think a preposition is needed here at all.
Also, this doesn't seem to address caching at all. It seems like good cleanup, but it doesn't address caching. :-(
Comment #38
pounardI will open new issues for most of the stuff in there that shouldn't be here, but when I re read the issue original post: the question is not "how do we cache?" but "how do we solve the problem that might annoy us when we'll cache?" so I guess the context inheritance is a good place to start for this issue.
I'm not fond of caching everything by the way, especially not context: most of its information will be cached by the other layers, this is higly dynamic stuff that may spawn dozens of objects: I don't want to end up with that much more cache queries.
In my head, the word "cache" associated to "context" raise the same questions as catch asked in his post upper: static caching in context scope, we already are doing, what other cache layers could we have here, I mean, honestly without bloating memory and without caching stuff which already are cached?
Comment #39
pounardNo because the call only ensures the value is set into the object's internals.
No, my code explicitely allow the developer to pass a live instance as registered handler: if you look at the code, you'll see that instead of putting it into the handlerClassses property, it dispatches it into the handlers property directly. Look at my unit tests and you'll see it work flawlessly (I did use direct live instances injections for the additional tests).
This is IMHO a feature we cannot skip: besides, if we go a stateless handlers implementation we have no reason to delay their construction: some of them will probably carry alsmot no property and the info array would then cost more in memory than the instance itself.
$params is not part of the interface, but of the abstract implementation: it shouldn't be used as in design. Custom handlers using directly the interface without having this parameter would receive data they should not to their constructor: so I guess -even if I'm against in most cases- a bit of magic for constructing the handlers here wouldn't hurt, assuming that $params is the constructor parameters array and not the first constructor parameter.
Same as upper, because handler interface and abstract implementation are not sync about this, I prefer to keep the complexity at construct time (it makes sense here).
Raising a exception is be better: last chance exception catching goes to into the error handler and display the error page: this does not cause a WSOD. In a perfect world this error page would turn into a full debugging output for developers. We cannot allow WSOD to happen because a module break everything, but we can safely let an exception goes up to the upper layer and set a 500 error with a nice error page.
Agree.
This is a comment and will be wiped out (I suppose) before any potential merge (if it happens): it highlights the flaws of the current code and that is something we want while we are in heavy development phase: FIXME and details are good, even related to other issues: it ensures that if code forks too much we still have a trace of where are the other issues.
Use preposition in function names: editors autocompletion will let you see the preposition directly but not always the parameters: explicit is better for the developer. If we don't use preposition and actually implement both At and For, how can you differenciate them then? What would mean the getHandler() method, seriously? Does it refers to my gran'ma's handler? My house's handler? The handler of Robert de Niro? I think the preposition is a useful bit of information, let's not argue for 2 letters when it makes things really explicit.
Further, my whole point is getting the handler "for" context key is the handler that handles this key, "at" refers to a specific fixed position, so it assumes that it wont give you get a less specialized instance if none found at the exact given position: sounds pretty explicit to me.
Follow-up, opened:
#1297114: Flyweight pattern implementation for handlers: make them stateless objects
#1297132: Allow registering of live handler instances into context
#1297142: Remove stack logic complexity from the Context object business functions
Comment #40
pounardRe-focusing: beejeebus & matason, any advancements?
Comment #41
owen barton commentedI am not clear if/how either of the current patches resolve the specific issue in the description, but the approach that seems most obvious to me is to flatten and normalize the information. If we do this, it no longer becomes a cache clearing problem. So for example, rather than having just "Current OG" as a top-level context item we would have "OG X". Then we would have a lookup array that notes the OG context that has been previously found for specific nodes (or speaking generally "context parents"). i.e.
- node 1: OG X
- node 2: OG Y
Then, in the hierarchical context store the OG context can be derived using the lookup array without ever needing to reload contexts or cache clears. I am not sure if this is the approach the current patches are taking or if this is compatible with the current architecture - for example this does require that the "context patent" can be reliably encoded in a key/string (contexts may need to determine and return the "parent key" for it to be reliably the same across different calls).
Comment #42
Crell commentedI'm not sure what you're saying, Owen. The problem is that we do want to use the same context key, and have it change when adding a layer (and only when adding a layer). So saying "you over here, you use current_og_x, while you over there, use current_og_y", that defeats the purpose.
A consumer of the context API should not know, care, or even be able to find out if the value it's requesting has been overridden by a new context layer.
Comment #43
pounardTo get back to the original issue, for the original problem as described in the original post from Crell, I think that handlers sharing inside the context hierarchy as I proposed seems an elegant way to achieve this, since now we have stateless handlers.
Comment #44
Crell commentedpounard: That reduces the number of objects in memory, potentially, but doesn't change the problem of cached *values*.
Comment #45
pounardOh yes, it solves a part only of the problem. But if you re-read the #30 it also solves a part of the cached values problem.
Any ideas anyone?
Comment #46
Crell commentedOK, to try and refocus this issue, attached is a patch (also found in a branch for this issue) that adds a unit test that demonstrates what *should* work.
At the moment, it actually fails backwards. That is, the override doesn't happen because the handler is using the lower level context! Exciting. :-)
What that means is we need to at least pass the top-most context object down the line when deriving to pass off to handlers. That should be a fairly easy operation now thanks to #1297114: Flyweight pattern implementation for handlers: make them stateless objects. We probably just need to have an internal protected method that takes a context object, and getValue() just calls that with $this. Problem solved.
Once that's done, however, we should run up against the original caching issue described in the summary.
Comment #47
Crell commentedOK, I went ahead and did the refactoring I mentioned in #46. Now we're getting the error I'd expect.
The goal for this issue: Make this test pass. :-)
Hopefully that clears things up.
Comment #48
pounardThanks for the tests, will try to do this during the weekend.
Comment #49
pounardPush the
1100198-handler-inheritancebranch on sandbox.Patch attached.
Test passes. I used the exact methodology I described above: handler inheritance.
For this to work, I had to add a new public method
ContextInterface:: getHandlerAt($context_key)else the interface would have been desynchronized with the handler inheritance feature.I factorized some code to make it smaller:
Context::hasParent()andContext::getParent()because now this code is used more than once (makes thegetValue()function much more readable though). Don't worry though I kept them protected because I know you don't want your API to be usable directly by developers and you don't want to pollute the interface (this is a huge error for DX IMHO but I respect the actual spec).Context::createHandlerFromInfo($handlerInfo): this may not a real benefit right now, but this a nice API method that can be used outside, we should keep it for DX sake. Plus it documents the handler array description nicely.Context::getHandlerAt()implementation is able to traverse recursively through parents and locally set the found handler reference in the current context: parents traversal can only happen once.It definitely makes the
getValue()code a bit smaller and much more readable, at the cost of only 2 additions functions calls.Comment #50
pounardDefinitely needs review since the attached patch works as expected and test passes.
Comment #51
pounardAh, and by the way, it actually implements a lot of details of in #1290454: Proposal: Remove the parent object identifier with a direct reference and #1297142: Remove stack logic complexity from the Context object business functions believe me, we will get there eventually.
Comment #52
Crell commentedWhy is this static?
params must always be an array. If there are none, then it's an empty array. That mades the code flow simpler.
It looks visually like this changes the traversal algorithm. It searches down the stack for a handler (ignoring values) first, before trimming the string to try it as arguments. That's why the tests pass. That's not the algorithm we want to have.
Comment #53
pounardIt can be used a pure helper, I don't see why it should be kept as an instance method here. I hate global state but I do not hate static: my function here is stateless, it fits well as a static helper.
It doesn't complexify the code path, it adds a simple if() that doesn't goes much deeper. If you don't like the if statement you should have said it from the begining.
Forcing parameters to be a constructor param in interface signature is a an error, IHMO, either make it optional, or proceed to a setter injection maybe, or a more elegant method could be the params key to be a constructor parameters array and send them via call_user_func_array().
I usually don't like call_user_func_array() (I hate magic, most of the time), but in this particular case, it's code which will be run less than 20 times in the full execution flow (I don't really know how much but clearly an insignificant number of times) and it has the benefit of allowing the handler interface to be constructor free: it allows developer to use their object the way they want, as long as it implements the getValue() method, it allows more flexibility for implementors.
Besides, having a constructor free interface and a method to set instances instead of info array would definitely be a huge benefit. In the cases where the handler are small and doesn't carry a lot of data removing the lazzy instanciation would be a nice shortcut and probably would be faster anyway.
You are right for the traversal logic, didn't notice that: at least now we know there is no tests to check this! I will fix that.
Comment #54
Crell commentedpounard: The code currently works on an empty-array model. Please do not change it. That is off topic for this issue, and I am not going to accept code that changes that logic.
It looks like we don't have that traversal checked properly, you're right. I guess we need another test for that.
Comment #55
pounardAdded the missing tests there #1318164: Adding traversal logic unit testing
Comment #56
pounardPushed the traversing logic test on my branch also. Removed the params array since it doesn't belong to this issue. Fixed some whitespace problems. Patch attached here as well.
Comment #57
pdrake commentedI spent some time yesterday and today working on this - I think my patch approaches this a bit differently, but all of the tests pass (including the traversal logic test linked above).
Comment #58
pounardI though of that approach as soon as I saw Crell's patch (I think that he wanted us to do exactly this), but it needs the parent to be a Context instance and not a ContextInterface instance, because the retrieveValue() is not part of the ContextInterface interface.
It's clearly efficient, but it's a technical choice that close some doors arround the API extension.
EDIT:
Why are you cloning the parent and not passing it directly?Because else the value would be cached in the parent context, wrongly, obviously...Comment #59
pounardFollowing @pdrake's patch, did the same, but exposing the retrieveValue() exposing it into the public interface which allows to use the ContextInterface as parent and not the Context class directly.
I renamed retrieveValue() as findValue() which seems more explicit since it's a lookup. "Retrieve" and "get" are almost synonyms in the english language and naming this way is confusing.
In order to avoid the parent clone, I ensured that findValue() does not cache the value internally. This value is cached only in the getValue() method when getting back the value from findValue().
Also embeded the #1318356: Fix the wrong actual traversal logic algorithm and provide unit testing patch which allows to break the lookup algorithm as soon as possible.
Works fine and all tests pass as expected (thanks to pdrake).
I still do not like this method and would prefer to expose more atomic handler handling functions into the interface and build a generic retrieval method based on the interface only instead: this would be a lot cleaner in many ways.
EDIT: Did this patch over my own branch, will provide another one in 5.
Comment #60
pounardFixed patch attached.
Comment #61
pounardAttached a fixed patch that does not changes the traversal logic, but keeps all the previous patch modifications. It removes the traversal logic testing from it since there's an updated patch elsewhere and I don't want to create a merge hell.
EDIT: Pushed on 1100198-handler-inheritance-pdrake-version new branch.
Comment #62
Crell commentedI'm certain this needs to be updated at this point, after /core-mageddon and the stack-object branch went in.
Comment #63
pounardYes indeed, will do in the next few day, maybe tomorrow.
Comment #64
pounardFixed everything to match actual git state. All tests pass.
Comment #65
pounardComment #66
Crell commentedIf the keys are always fixed at 2, why not break it up and pass in the class name and params as separate parameters? That would make this method a bit cleaner.
The language here is b0rked. It also is too long for the short-description. The second sentence should be its own separate paragraph and cleaned up to be more descriptive than just "this does stuff".
Not a parent. A valid handler object.
Why is this public? It's only being used internally. It can easily be just protected.
I'll admit, I don't fully comprehend at the moment how or why this refactoring would fix the caching issue. :-) I'll run actual tests on it shortly.
Comment #67
pounardI made the hasParent() method public because it can safely be used for any other mean. Doesn't hurt since the getter and the setter already are public.
Comment #68
pounardIt does because the values are retrieved by the findValue() method using the context passed as parameter, so the handler that runs will run the given context (not the one that run the method). The basics here is that if no value and no handler stop is found, then it run the exact same findValue() function over the parent, giving itself as context to run with.
The findValue() does not cache anything then, because when called over the parent, it can return a value relative to another context than itself, so we would cache invalid values. Caching is only done in the getValue() accessor.
Another path to follow to solve the issue would have been to be able to get the handler from the parent, and not get the value from the parent, but the implementation I first did was more intensive because it triggered more handler lookups, so I stayed on pdrake's version of the algorithm.
I still would prefer the other way arround, but this one works. I just really don't like dealing with the context as parameter of findValue().
Comment #69
Crell commentedOK, I get it now. The difference is that rather than caching all values that get requested, the only value that is cached at all is the initial value requested. Any intermediary values are not cached.
That does neatly solve the problem if incorrect values being cached, but it solves it by not caching intermediary values at all. :-) I fear the performance impact of that.
However, that is a step up from where we are now, I think. So I'm going to make the changes in #66, and then merge this. Then we can open a new follow-up issue to figure out how to sanely cache intermediary values. Stand by!
Comment #70
Crell commented#69 has been done.
Comment #71
webchickPer catch, and blessed by Larry, moving this and all other WSCCI issues to the Drupal core queue.
Comment #72.0
(not verified) commentedAdd link to unit test that demonstrates the problem.