One of the things we want to be able to do is cache context information that is derived for the duration of the request. That is a simple, obvious performance boost.

However, there's a problem when one piece of context depends on another. For example:

- "current node" is a context item that depends (originally) on the GET string.
- "current OG" is a context item that depends on current node.
- Block A calls current OG, which in turn calls current Node. Both are derived and cached.
- Block A calls sub-block B.
- Block B mocks current node to something else and passes that on to Block C.
- Block C asks for the current OG.
- Block C now gets the wrong OG, as it was derived from the current node up in Block A, which is different.
- Hilarity ensues.

I'm not sure how to resolve this. Someone at DrupalCon Chicago suggested that handlers explicitly declare their dependencies, too, so that we know how to intelligently clear (or at least mask) the cache internally. That's a possibility, but I worry about the complexity that would create.

Someone please help figure this out! :-)

See the patch in #47 for a unit test that fails due to this bug. The goal of this issue: Make that unit test work.

Comments

pounard’s picture

Current OG won't probably not only depend on the node, but could depend from many other things. OG and general "Node handler" should probably totally unlinked because the only thing an OG is really tied to is the entity on which the field has been put.

Maybe the solution is to be more "static". If an OG context is set, it probably should'nt be overridable. Nevertheless, the dependency problem remains, maybe each module that provide a context implementation should be able to set its context the sooner as possible and shouldn't allow it to be modified (maybe except at this particular time where it sets it, but shouldn't be overridable once set).

I'm looking for another dependency that could cause this, I don't see one right now.

If you implement a HMVC derivate, each block would have its own controller, therefore they would each one of them build their own context from the original request data without being annoyed by what the other's did. This would solve the Block B override for Block C.

Crell’s picture

Except that a block being able to override the context for its sub-blocks is very specifically a feature we want to have. So that kind of nesting is not something we can eliminate. Block B overriding context for Block C is a feature, not a bug. :-)

Also, in most cases we're not setting a context value. We're setting the handler that will derive the value on request, and not bother if it's never requested.

pounard’s picture

Ok, I have a better sighting now. In the case you described, if Block B changes the current Node context, it just shouldn't interfer with current OG. If current OG does not depends on Node (only for its init at best) then it shouldn't change because of block B.

If contextes are set and can be fetched from the "parent context" then when Block C asks for the current OG, he gets the current OG from A. If he asks for the current Node, he gets the current node from B. Just avoid too much dependency and it will work.

Contextes should be "atomic" in a way, OG is OG, Node is something else, maybe OG was derivated from the Node while spawning, but once spawned the Node override shouldn't alter the initial OG set, except if Block B explicitely override it.

Crell’s picture

It sounds like you're proposing that we forbid one context key to depend on another. I don't think that's feasible.

In the default case, the "current node" depends on http:get:q.

In the typical case, "current OG" is "the first OG that the current node is assigned to".

If the current node changes, then current OG would by necessity change.

If we do not clear the cache , we run into this problem:

Suppose we have a block region, Foo, configured with two blocks: One shows the body of the "current" node (or what it thinks is the current node), and the other shows the titles of the most recently posted nodes in the same OG as the main node. That's a rather common use case, actually.

Now, that region is inside another region, Bar, that has mocked "current node". There are many reasons it may have wanted to do that; the most obvious example is displaying 2 nodes side by side. (I've done this before for a client, and it was highly ugly.) We now end up with the following situation:

- The main "current node" for the request, from the URL, is node 5.
- Region Bar has mocked current node to be 12.
- Our "titles of nodes in the same OG" block will request "current OG". If another block before it has requested current OG before, it will get the OG of node 5. If not, it will get the OG of node 12.

Indeterminate context like that is precisely the sort of problem we want to avoid. That becomes a race condition. It is even worse once we start caching blocks based on their context information, because then the cache state of one block will affect the rendering of another block. That's just no good at all. :-)

It's a tricky problem. I'm not sure how to resolve it yet other than explicit declaration and clearing of dependencies, which makes me cringe.

pounard’s picture

I'm not sure there is a clean solution, what you attempt to do is kind of tricky. Business context is hard to keep as you describe it.

I'd rather go towards a more simple solution where context are basically set up using the input request for most of it, then whatever is the current block they would be mostly the same.

Why on the earth would you want to have dual OG context on the same user input? I mean, OK, your samples are explicit, but this is rather complex and in the end, the problem is obfuscated itself.

Let me rewrite your first sample, from the original post:

- "current node" is a context item that depends (originally) on the GET string.
- "current OG" is a context item that depends on current node.
- Block A calls current OG, which in turn calls current Node. Both are derived and cached.
- Block A calls sub-block B.
- Block B mocks current node to something else and passes that on to Block C.
- Block C asks for the current OG.
- Block C now gets the wrong OG, as it was derived from the current node up in Block A, which is different.
- Hilarity ensues.

There is a flaw in this statement: either OG always derivates from a node, if OG is enabled, each time you have a "context node" if an OG is tied to it, you will have the linked OG as context, which would give you (in caps and between brackets, my changes):

- "current node" is a context item that depends (originally) on the GET string.
- "current OG" is a context item that depends on current node.
- Block A calls current OG, which in turn calls current Node. Both are derived and cached.
- Block A calls sub-block B [Here context override happens]
- Block B mocks current node to something else and passes that on to Block C.
- Block C asks for the current OG. [Here another context override happens]
- Block C now gets the THE OG OF MOCK NODE.

If I assume that OG always derivate from a node, if OG module is enabled, then asking the mock node context would give me the OG context of that node (totally legal here), simple inheritance.

Now, let me assume that OG IS NOT tied to a node, but arbitrary set by the request, you have to solutions:
First, let the API do its job:

- "current node" is a context item that depends (originally) on the GET string.
- "current OG" is an independant context (still derivated from the GET string).
- Block A calls current OG, BUT DO NOT CARE ABOUT NODE. Both are derived and cached.
- Block A calls sub-block B [Here context override happens]
- Block B mocks current node to something else and passes that on to Block C.
- Block C asks for the current OG. [Here another context override happens]
- Block C now gets the THE ORIGINAL OG FROM REQUEST

Thus, if Block B doesn't care about changing the context.

But now, if the business operation of Block B is all about OG, Block B controller would attempt to find the maching Mock Node OG, which would give:

- "current node" is a context item that depends (originally) on the GET string.
- "current OG" is an independant context (still derivated from the GET string).
- Block A calls current OG, BUT DO NOT CARE ABOUT NODE. Both are derived and cached.
- Block A calls sub-block B [Here context override happens]
- Block B mocks current node to something else AND CHANGES OG CONTEXT and passes that on to Block C.
- Block C asks for the current OG. [Here another context override happens]
- Block C now gets the GOOD OG, as it was derived from the BLOCK B NODE.

What I mean here is:

  • Contextes are never inderterminate, always take the later. If I got 2 OG, one at request input time (say A) and the other set later (say B) and C inherits from B, then C have the B OG context (overriden), it's purely OOP way of thinking.
  • Most of business contextes will not derivate from the user input itself, but will be set by modules that does business stuff, so once a module override works in its scope, it overrides all that happens in there

And last, but not the least:

  • This is not a "dependency" problem, this is a "scope" problem, pure visibility. The way to solve it is by implementing a stack.

Contextes, dependent or not should be stacked. Each level of the stack would have its own cached override (only if an override happened), but still cached. Once the vertical browsing of the stack ends (let's say here something like the decorator pattern maybe) each time you level one up, you unstack the current cached overrides and throw them.

Each controller in your PAC (this can happen only if you have a real HMVC or derivative) will then have the responsability of stacking context when entering it, and unstacking it when going out.

I could make you a simple sample code if you want, based on your actual module, but not tonight. But as I it, this seems quite a simple problem to solve, it's like variable visibility in most languages, all about local override for a particular subtree.

Your original problem here is maybe about the fact that OG depends from a node, if so, then if a node is the context then I implicitely have a new OG. But if my OG is not derivated from a node, then another stuff would create the OG context, but if each level of the stack (each block or region) has its own controller, each level of the stack would then override the context depending on the business stuff it has to implement, therefore the problem is not your to solve, but let the specific implementations doing it.

EDIT: Some typoe.

pounard’s picture

And another stuff, never talk about cache when trying to implement a complex software design, think pure algorithmics first, implement it, then optimize it later, it will clarify the pattern, let you write a good documentation for others, then once done, you will optimize like a madman and leave the code obsucated, but with a good documentation.

EDIT: I'm sorry my english is not very good (talking about the upper post).

Re-EDIT: Don't take it wrong if I argue a lot, this is because this is really an interesting topic and I think you have a lot in mind that worth the shot to tell and mature with other's mind.

pounard’s picture

StatusFileSize
new10.52 KB

As I promised, here is a Front controller/dispatcher loop/controller/context implementation with an actual working use case.

It as greatly inspired by Zend framework, but fully hand written in three hours for fun. It's more simpler than stuff that ZF can offer, but it also take into account some problematics that ZF actually doesn't (business context and ESI integration, which are two important stuff for the Butler project). It actually works.

Here are the components it defines:

  • A response interface definition and its base implementation. Also provide a page implementation that really render a full page (working).
  • A business controller definition, and its base implementation, taking into account caching problematics for later ESI integration and page build using HTML pieces aggregation that can all come from cache.
  • A simple business context definition, that only is here in order to exists. It's been into account directly into controllers, which provide contextes inheritance using a chain or responsability pattern.
  • A dispatcher definition, which as the responsability to be able to give the render order to controllers. It actually provides three implementations:
    • A no caching implementation (purest one)
    • A basic caching implementation that actually works (but caching policy could be better here)
    • And a sample ESI implementation not fully working (not crashing neither, it just create foo URL's instead of working one, because of the caching and context restoration problem).
  • A front controller, that is responsible to aggregate the HTTP query (or other) initial context, spawn the dispatcher, request, and response objects, and run the initial dispatch loop that is the actual born of the full HMVC pattern execution flow

It also notice the View concept, which seems to be already fully implemented in Drupal, at least for HTML using the rendering API and the theme system.

This a basic implementation, with no optimizations (not much). It performs well, the provided test case defines some contextes, 3 stacked controllers and context inheritance, execution time is not visible on my development box.

This is only a proposal, with a centralized cache handling and potential ESI support directly at the controller level, which allow any specific implementation to benefit from those implicetely.

Hope you'd like it, or hate it, at least it will give some thougth about the full butler project.

eaton’s picture

Now, let me assume that OG IS NOT tied to a node, but arbitrary set by the request, you have to solutions:

I'm coming to the conversation a bit late, but I think this isn't necessarily a safe assumption -- very very few things in Drupal are explicitly set by the request, and unless you're directly visiting the primary landing page for an organic group, "What is the current OG?" can depend on a number of different factors -- and some of those factors are a bit fuzzy even once they're known. One complication is that a given node can belong to multiple organic groups, and a given user can belong to multiple organic groups.

When Crell and I first started talking about the issue of context, we basically broke it down into three pieces:

  1. Context that is explicit in the request that instantiated Drupal. For example, raw HTTP headers, content-type information from the request, cookies, and so on. After some discussion, we think it's worth also bringing along pre-loaded entities that are explicitly named in the request. For example, on the path: node/1, that node is being explicitly named as part of the request. On the page user/2, that user is being explicitly named. At least right now, this class of context information maps to a couple of core mechanisms: session.inc, and menu autoloaders in menu.inc.
  2. Context that can be discovered by examining the request, but is not explicitly specified in the request. Core itself provides several examples of this type of contextual information. "Am I in the Blog section?" "Am I in the forums?" "What particular forum section am I in?" "Whose blog am I reading?" "Am I on an administrative page?" and so on.
  3. Information that can be discovered about a particular entity that is part of the current context. For example, "What taxonomy terms has this node been tagged with?" or "What user roles is this particular user in?" These really aren't context at all -- historically, they've just been mixed up with context quite often, and making sure that it's treated as a separate class of information from #2 is important.
  4. Information about the currently built response to the incoming request. For example, the current request may have asked for a particular language via its http headers, but the current response may use a different language entirely based on other rules. Based on Crell's principle of not-allowing-context-to-be-changed-after-it's-defined, we don't want people going in and explicitly altering the request language to trick other code into using a different language. A lot of things, like the currently selected theme, are about the type of response being built rather than the context of the request. Keeping that distinction clear is important.

Coming back to your statement above -- ie, 'The OG is arbitrarily set by the request,' I think we have to distinguish which of the three ways we are arriving at the OG. Is it because someone explicitly visited a path that maps to the primary entity for that OG? (for example, admin/groups/groupname, or something like that)? Is it because someone wrote custom code that explicitly requested "The Current Organic Group" from the context object, and the OG module was able to answer the question based on rules specific to the OG module? Or is it because the currently logged in user belongs to an OG, and custom code retrieved that collection of OGs? All, technically, could be considered context but only the first one is really part of the request.

It sounds like the complication that is causing concern comes from the fact that there could be multiple contextual questions asked that return the same data type. (For example 'Who is the currently logged in user' and 'Who is the administrator of the current section of the site'). If we're really talking about request context, though, that shouldn't cause any problems. The only hitch comes when we want to pass information to the response system, or a rendering layer, and we have to feed "A user" to a block that renders account information, or pass "A node" to a block that renders a teaser. That's what page callback functions are responsible for, and what page manager in the Panels/CTools world is responsible for. The context system itself can't be smart enough to handle all of that logic -- it just has to handle providing answers to the questions it's asked.

Am I missing something?

pounard’s picture

No I think you are right. The context system can't guess the business logic itself. Looking at ZF front controller plugin system (I will contact people responsible for this part of research and post on groups, don't worry), it explicitely tell the developers to use the preDispatch() operations for settings their context, then handling this kind of business logic inside the controller callback (in ZF the controller action, in Drupal the page callback).

The framework in any case cannot be responsible for handling the business logic, same problem as the UX one.

I think context is a concept where the inside data is undetermined for the framework, and only the business oriented modules can know what they get or what to do with it. If you want to handle context in the framework, the only thing you can do is a registry of named piece of data that business oriented modules have set by themselves prior to run the final controller action. I think the only matter of business logic the framework could do is to impose the fact that contextes should be named and serializeable, in order for it be able to build reversible and predictable cache keys; Either for spawning single controller action without the page context, either for extended caching abilities.

Crell’s picture

I think context is a concept where the inside data is undetermined for the framework, and only the business oriented modules can know what they get or what to do with it.

I would agree. While the intent is for the context object to focus on "the incoming request and things derived from it", there is, realistically, no way that we can enforce that. So we shouldn't try to shoe-horn it in there. Context is "information that meets these criteria that may be needed in multiple places".

catch’s picture

Also arriving very late here.

I'm not sure what level of caching you're trying to talk about here - is it just static caching during the page render? If that's the case I agree with pounard that trying to build caching into the context object itself isn't necessarily the way to go. However I'm very interested in us having a context system that allows for much better approaches to caching in core than we currently do. This is the other half of #636454: Cache tag support, and IMO it's something that needs to be in mind from the start of this.

When we're rendering entities, assuming we keep the basic pattern of entity_type, id, view mode and language as being the minimum data required to get a rendered version of an entity as a string (which is to an extent what page caching has now in terms of efficiency, but most other Drupal caching doesn't), then we want to avoid loading that node until it's absolutely necessary - in most cases on cache misses.

This would mean the context object should only be storing a unique pointer to a context (entity type + id for nodes, view mode or similar, language code etc.) - the context plugins or whatever entities and languages end up being in relation to this, would then be responsible for handling their own caching.

Let's say we have the node/123 which has multiple blocks on hit:

4 blocks are based on the current node.

1 block lists nodes that are referenced directly

1 block lists nodes that were recently posted in a taxonomy term referenced by the node.

It's completely fine if the four blocks listing the current node, just get entity type, id and view mode. Calling node_load() is nearly free once it's been done once. If they are rendered by ESI callbacks, that callback is going to need to load the node anyway (whether the context system handles this or not).

If we have a list of nodes based on a relations field, the block responsible for generating this list is going to need the loaded node on a cache miss - since the related nodes are stored in that node object.

However, if we only cache that block as a string of HTML based on the 'current node id', then every time that cache is invalidated (currently in core, this would be on every single entity save), we need to go and load all the nodes and rebuild it from scratch.

Much more efficient is to cache the block (which gets wiped every five minutes), but also the component parts of that block at the highest level possible. So the best case is that on a cache miss for the block, we get a cache hit on the full rendered string for each of the nodes in that block - and can do this without loading the nodes at all (and potentially have ESI callbacks for those).

This way, when the highest level cache (the block) is invalidated, we have the list of nids and view modes without doing any work, then I can fetch the string corresponding to those - no rendering nor loading needs to happen for those individual nodes. This absolutely requires that the context be a unique pointer rather than a full object- since if I have to do entity_render($node, $view_mode, $language) I can never escape loading the node somewhere before I can get to the cached rendered string, even if it's not needed. The page cache allows that kind of efficient look up now, the issue with it is invalidation.

If context plugins are responsible for their own caching, then we would have to be doing some nasty stuff in the context object itself for that to need caching in there, although of course it depends how things pan out.

Also it is cross-posting/off-topic but related I have been thinking a lot about the caching API itself recently, specifically allowing lower level things (like taxonomy terms attached to a node) to invalidate higher level caches (like the rendered output of that node). Most of that discussion is here at the moment #636454: Cache tag support. It is very likely we'd want allow cache tags to be derived from context, and allow those cache tags to be pulled upwards (similar to how #attached css and js are pulled upwards via the render API now - so if lower level items are cached, their css/js still appears on the page - this is going to be an issue with ESI too).

jherencia’s picture

Subscribing...

jdubbwya’s picture

subscribing

damienmckenna’s picture

At Bonnier dalejung built a custom caching engine that had two parts - the first part cached objects with a set of identifiers to identify the object and other objects which were loaded by it (noderefs, nodes loaded in a view), e.g. for node/123 it might say "node;node:mycontenttype;node:123" and for a view it might say "view;view:my_view;node;node:123;node:124;node:125"; the second part would then trigger on any changes and invalidate anything that matched that object, e.g. editing node/123 would invalidate the my_view output, or editing the mycontenttype content type would invalidate all nodes of that type. There were also lots of other options, lots of hooks.

dmitrig01’s picture

Just thought of something -- this could be too complex, but it should be possible to tag onto offsetGet to keep track of what other context a given piece of context is derived from. This example would be completely dysfunctional, but it's just the basic idea.


class Context implements ArrayAccess {
  protected $nested = FALSE;
  protected $derived = array();
  protected $currentDerived = array();
  protected $cache = array();
  function offsetGet($key) {
    if (isset($this->cache[$key])) {
      return $this->cache[$key];
    }
    $nested = $this->nested;
    if ($nested) {
      $this->currentDerived[] = $key;
    }
    else {
      if (!empty($this->derived[$key])) {
        foreach ($this->derived[$key] as $derived_key) {
          unset($this->cache[$derived_key]);
        }
      }
    }
    $this->nested = TRUE;
    $result = $this->get($key);
    $this->cache[$key] = $result;
    $this->nested = FALSE;
    if (!$nested) {
      $this->derived[$key] = $this->currentDerived;
    }
    return $result;
  }
}

dmitrig01’s picture

Er sorry, it's late. I meant that this:

      if (!empty($this->derived[$key])) {
        foreach ($this->derived[$key] as $derived_key) {
          unset($this->cache[$derived_key]);
        }
      }

would probably happen when changing the node context, where $key = 'node'.

Crell’s picture

If we can track that information without too much overhead, that probably would solve the issue. I don't know if we want to try and derive it automatically from usage, or simply require handlers to implement a method that declares what context keys they're going to care about. That may simplify the implementation (and performance is key here).

Dmitri, can you try implementing such tracking on the current codebase? I can give you access to the repository to just make a new working branch for it. Let's see if that can resolve the issue without too much of a performance hit. *crosses fingers*

dmitrig01’s picture

Sure, I'd happily try it out.

Crell’s picture

You've got write access. Make a new issue branch and go to town. :-)

dmitrig01’s picture

Status: Active » Needs review

Great, I've committed and pushed an approach that I think should work. I'm attaching the patch here just for reference.

It does involve some trickery between parent and child contexts, namely if a child context can't find anything and falls back to the parent, if the parent needs to instantiate a new handler, the new handler's assigned context is the child context. I *think* that this is a good thing, and it's certainly needed for this to work.

Another very minor issue I noticed is everytime a handler object is hit (i.e., http:get:q, http:get:foo), a new handler object is instantiated. Wouldn't it be better to cache handler objects, and just change $handler->context, $handler->params, etc, as needed (which shouldn't be very often)?

dmitrig01’s picture

StatusFileSize
new4.81 KB
Crell’s picture

Hm, I thought I had the code persisting the handler objects already. Aren't they?

Although, if we are caching the returned values, do we need them a second time?

dmitrig01’s picture

We do need them a second time, for example if we need http:get:q and http:get:foo, we need to hit the http:get handler twice. Anyway, I'll open a new issue for that.

Crell’s picture

Status: Needs review » Needs work

Ugh. Sorry for taking so long to review this, Dmitri. :-(

+++ b/butler.module
@@ -229,6 +229,10 @@ class DrupalContext implements DrupalContextInterface {
+  protected $nested = FALSE;
+  protected $derived = array();
+  protected $currentDerived = array();

These need docblocks to make it clear what they're doing.

+++ b/butler.module
@@ -277,11 +296,18 @@ class DrupalContext implements DrupalContextInterface {
+        $this->handlers[$context_key] = new $this->handlerClasses[$context_key]['class']($child ? $child : $this, $this->handlerClasses[$context_key]['params']);

Let's not embed a ternary like that. It makes the code way too hard to read.

+++ b/butler.test
@@ -149,12 +149,21 @@ class ButlerTestCase extends DrupalUnitTestCase {
     // Calling the same context variable should always give the same value
     // due to immutable caching.
     $this->assertEqual($butler['foo:bar'], $butler['foo:bar'], t('Identical context keys always return the same value.'));
+    $this->assertEqual($butler['inherited'], $butler['foo:bar'], t('A handler can depend on context from another handler.'));
+
+    $butler2 = $butler->addLayer();
+    $butler2->registerHandler('foo:bar', 'ButlerTestCaseHelperOne');
+    $butler2->lock();
+    $this->assertEqual($butler2['foo:bar'], 'one', t('When reregistering a handler, the context should have its value changed.'));
+    $this->assertEqual($butler2['foo:bar'], $butler2['inherited'], t('A dependent context should have its cache cleared when the parent is changed.'));
   }

This is testing something different than the original test method, so it should be a new test method. We're using UnitTestCase here so each test is nearly instantaneous. :-)

Also, the get() method could use some better internal docs. I can't easily follow what it's doing and why, which means future readers won't be able to follow it either. (A better name might be good, too.)

Crell’s picture

Project: Butler » WSCCI
Version: 7.x-1.x-dev »

Refiling to the sandbox queue...

Anonymous’s picture

subscribe.

mr.moses’s picture

.

Anonymous’s picture

Assigned: Unassigned »
matason’s picture

suscribe

pounard’s picture

I think this problem is not solvable with the current design.

If the foo:bar value is set by a handler in the local context (say the root context),
Then the foo:baz value is a derivative of ht foo:bar, using its own handler (still root context),
If I create a new context, overriding foo:bar (changing value) but not the other:

Non cached trace:
1. Ask foo:baz to new context: no cache value
2. Handler lookup in new context: no handler
3. Ask parent (root context) derivate foo:baz from local foo:bar
FAIL

Now, with cached trace:
1. Ask for foo:bar and foo:baz in root context: both values are cached
2. Ask for foo:bar in new context: cache lookup says no.
3. Handler lookup in new context: override found, get new value and cache it.
4. Ask for foo:baz in new content: no override found
5. Handler lookup, void, ask parent (root context): root context gets value based on local foo:bar
FAIL

This is not due to caching, this is due to actual overall design.

Caching which value as been derivated or not in each context will create new arrays, bloat a bit more memory, but I don't really see where it comes to help us without complefying a lot the whole algorithm.

I think we have to change the whole high-level design, we have two choices here:
1. Handler inheritance
2. The error shall pass
I won't talk about 2. since it won't make anyone happy to voluntarely let bugs in:)

Let's see more into 1. Handler inheritance:
This solution would assume that handlers set into root context can be used directly by new contextes: we need a getHandlerAt($contextKey) method, see:
1. When new context don't have a value: handler lookup
2. If no handler, direct parent handler lookup: found a handler!
3. Use handler in child context instead of parent context.

This would even be more performant, worst case scenario is:
Current design:
1. New context cache lookup: no value
2. New context handler lookup: no value
3. Root context cache lookup: no value
4. Root context handler lookup: got one!
5. Return value

Handler inheritance:
1. New context cache lookup: no value
2. New context handler lookup: no value
3. Root context handler lookup: got one!
4. Return value

And this, by using caching or not, would remain the same algorithm (non cached version just drops out the 1. operation in both case and the 3. operation in second case).

Plus, I'm wondering if caching itself might be provided by handlers instead of context (except for arbitrary set values): using a handler that calls an entity controller would get a cache node (almost everytime) it only adds 2 function calls, consume less memory, and avoid static cache reset problems.

The thing that worth the most being locally cache remains definitely the handlers on each key position: this the handler lookup algorithm is complex and mess up a lot with arrays it's always good to avoid that as much as we can.

pounard’s picture

After re-reading the patch from dmitrig01 I guess that it's what he has done, except I don't see the use of derivated array.

I would definitely go for a more granular approach: using getHandlerFor($contextKey) instead of array operation over the handlers() method (which IMHO is poorly named, it should be getHandlers() for being more explicit).

By using the handlers() method, you recopy and array (which is not recopied until you modify it thanks to the PHP copy on write feature) over which you do array operations: I guess replacing it by a getHandlerFor($contextKey) would be more readable and avoid some array operations there.

Because handler discovery algorithm goes from specialized to non specialized, I guess that non specialized, i.e. "foo" handler, would override parent more specialized handler, i.e. parent "foo:bar" handler, this is a question: do we support this?

A replacement of my proposed algorithm would be to use a getHandlerAt($contextKey) instead, difference between For and At is that At checks the strict index given while For would check the less specialized methods.

Using the *At() allows then to ask the parent handler for each level while we attempt handler discovery:
1. "foo:bar:baz" gives nothing
2. Ask parent for "foo:bar:baz": nothing
3. "foo:bar" gives nothing
4. Ask parent for "foo:bar": got a handler!

If we don't end up with "horribly:long:context:with:a:thousand:parts:in:it" the performance impact of the extract function call will definitely be neglictable.

For this more granular approach, we might want to separate handler instanciation from the handler lookup function and create its own function taking the exact context key as parameter and does nothing if nothing is strictly set at this position.

It makes the handler lookup algorithm loose a lot of code and make it a lot more explicit and readable too.

pounard’s picture

This proposal opens a new problem: handlers are aware of the context they actually use:

Crell commented the code as this:

abstract class HandlerAbstract implements HandlerInterface {

/**
 * Reference to the context object.
 *
 * Note: This creates a circular reference. We should probably get rid of it
 * and pass it every time.
 *
 * @todo Get rid of this property and avoid the circular dependency.
 *
 * @var Drupal\Context\ContextInterface
 */
 protected $context;

In HandlerAbstract class code.

We could change the getValue(array $args) method signature as this: getValue(array $args, ContextInterface $context = null) so the handlers objects would become stateless objects (and removing circular dependency): it's basically almot the flyweight pattern implementation.

This way, we win on three sides:
1. Removes the circular dependency
2. Make context being stateless and shareable: less memory consumption
3. We can implement the handler inheritance algorithm without having to get the handlers info definition array from parent: no more info array in this algorithm, huge win.
4. Additionally, we restore context being part of HandlerInterface instead of HandlerAbstract: this makes the design consistent (it is inconsistent as in actual implementation: this is a huge flaw).

Handlers, of course, would still be instanciated only on a per-needed basis.

pounard’s picture

Additionally, this would also allow to set already isntanciated handlers to the context instead of a info array without altering the execution flaw and without having to patch the offsetGet() anymore: it's extensible and iisolated: huge win again.

pounard’s picture

StatusFileSize
new12.11 KB

Here is a first patch:
1. Introduces getHandlerAt() function.
2. Move handler instanciation into a specialized method.
3. Removes the ContextInterface from HandlerAbstract constructor, moves it to getValue() method signature.
4. Fixes unit tests to work with these new signatures.
5. Allow registerHandler() to accept a HandlerInterface instance as second parameter, also proceeds to more consistency checks.

pounard’s picture

StatusFileSize
new13.38 KB

And here is the full patch:

1. Fixes bugs left in the previous
2. Added parent call to getHandlerAt() while descending the specialization chain.

Handler inheritance is complete! All tests pass. Now I need to write test for the use case of the issue.

FIY I did push a branch in the WSCCI git, named context-inheritance.

pounard’s picture

Also pushed this patch, which add tests and fixes some PHP notices being thrown.

Crell’s picture

+++ b/includes/Drupal/Context/Context.php
@@ -102,36 +104,7 @@ class Context implements ContextInterface {
+      $this->getHandlerAt($offset);

Shouldn't the return value be assigned to something?

+++ b/includes/Drupal/Context/Context.php
@@ -180,18 +153,121 @@ class Context implements ContextInterface {
+    if ($class instanceof HandlerInterface) {

A string that is the name of a class will never be an instanceof an interface. I think you probably mean

http://php.net/manual/en/function.class-implements.php

+++ b/includes/Drupal/Context/Context.php
@@ -180,18 +153,121 @@ class Context implements ContextInterface {
+      if (isset($params)) {
+        $this->handlerClasses[$context_key] = array('class' => $class, 'params' => $params);
+      } else {
+        $this->handlerClasses[$context_key] = array('class' => $class);
+      }

Let's not do this. Make $params default to an array, not to NULL. That simplifies logic all over the place, both for the core system and for handler implementations.

+++ b/includes/Drupal/Context/Context.php
@@ -180,18 +153,121 @@ class Context implements ContextInterface {
+  protected function getHandlerFromInfo($handlerInfo) {
+    if (class_exists($handlerInfo['class'])) {
+      if (isset($handlerInfo['params'])) {
+        return new $handlerInfo['class']($handlerInfo['params']);
+      } else {
+        return new $handlerInfo['class'];
+      }
+    } else {
+      throw new ContextException("Class '" . $handlerInfo['class'] . "' does not exists");
+    }
+  }

This is one of those pieces of code that can be simplified if $params is always an array, not NULL.

Also, I'm not sure why we're throwing the exception here. That would cause a fatal if uncaught, which is no better than a class-not-found fatal. Is this really an improvement? We don't do class_exists() checks anywhere else in core before instantiating (save for the DB layer where it's to decide which class to instantiate, which is another matter).

Also, this method doesn't get a handler. It creates/loads a handler. It should rather be named something like createHandler($handler_info).

And yes, using $handler_info, not $handlerInfo, since it's a local variable.

+++ b/includes/Drupal/Context/Context.php
@@ -180,18 +153,121 @@ class Context implements ContextInterface {
+   * This internal implementation also sets value in order to avoid a new
+   * path traversal later if we need to get the value at this context key.

This concerns me. A method called getHandlerAt() should do nothing but retrieve a handler instance. It should not have extra unexpected side effects.

+++ b/includes/Drupal/Context/Context.php
@@ -180,18 +153,121 @@ class Context implements ContextInterface {
+        } else if (isset($this->handlerClasses[$context_key])) {
+          // Lazzy handler instanciation if it exists as an info description.

Lazy has only one Z. Also, "instantiation" is the correct spelling.

+++ b/includes/Drupal/Context/Context.php
@@ -180,18 +153,121 @@ class Context implements ContextInterface {
+          // FIXME: This is a pure copy/paste from the offsetGet() method, this
+          // why I actively defend the getParent() method returning a null
+          // object implementation if no parent is set, removing all of this
+          // identifier logic from Context business methods.

This is a separate issue. Please leave it out of this branch so we can focus on what's actually changing here.

+++ b/includes/Drupal/Context/ContextInterface.php
@@ -10,15 +12,27 @@ interface ContextInterface extends \ArrayAccess {
-   * @param string $class
+   * @param string|HandlerInterface $class
    *   The name of the class that will handle this context key, unless overridden.

No. Please do not try to slip in additional functionality from other issues. Stick to the issue in question. We cannot properly review it otherwise.

+++ b/includes/Drupal/Context/ContextInterface.php
@@ -10,15 +12,27 @@ interface ContextInterface extends \ArrayAccess {
+  /**
+   * Get handler at the given index, if any. This is strict method and it must
+   * not return a less specialized handler.
+   * ¶
+   * @param string $context_key
+   * ¶
+   * @return HandlerInterface
+   */
+  public function getHandlerAt($context_key);

I do not understand what "This is a strict method" means.

The name is also not self-explanatory. Why At instead of For? For would make more sense, although I frankly don't think a preposition is needed here at all.

Also, this doesn't seem to address caching at all. It seems like good cleanup, but it doesn't address caching. :-(

pounard’s picture

I will open new issues for most of the stuff in there that shouldn't be here, but when I re read the issue original post: the question is not "how do we cache?" but "how do we solve the problem that might annoy us when we'll cache?" so I guess the context inheritance is a good place to start for this issue.

I'm not fond of caching everything by the way, especially not context: most of its information will be cached by the other layers, this is higly dynamic stuff that may spawn dozens of objects: I don't want to end up with that much more cache queries.

In my head, the word "cache" associated to "context" raise the same questions as catch asked in his post upper: static caching in context scope, we already are doing, what other cache layers could we have here, I mean, honestly without bloating memory and without caching stuff which already are cached?

pounard’s picture

Shouldn't the return value be assigned to something?

No because the call only ensures the value is set into the object's internals.

A string that is the name of a class will never be an instanceof an interface. I think you probably mean

No, my code explicitely allow the developer to pass a live instance as registered handler: if you look at the code, you'll see that instead of putting it into the handlerClassses property, it dispatches it into the handlers property directly. Look at my unit tests and you'll see it work flawlessly (I did use direct live instances injections for the additional tests).
This is IMHO a feature we cannot skip: besides, if we go a stateless handlers implementation we have no reason to delay their construction: some of them will probably carry alsmot no property and the info array would then cost more in memory than the instance itself.

Let's not do this. Make $params default to an array, not to NULL. That simplifies logic all over the place, both for the core system and for handler implementations.

$params is not part of the interface, but of the abstract implementation: it shouldn't be used as in design. Custom handlers using directly the interface without having this parameter would receive data they should not to their constructor: so I guess -even if I'm against in most cases- a bit of magic for constructing the handlers here wouldn't hurt, assuming that $params is the constructor parameters array and not the first constructor parameter.

This is one of those pieces of code that can be simplified if $params is always an array, not NULL.

Same as upper, because handler interface and abstract implementation are not sync about this, I prefer to keep the complexity at construct time (it makes sense here).

Also, I'm not sure why we're throwing the exception here. That would cause a fatal if uncaught, which is no better than a class-not-found fatal. Is this really an improvement? We don't do class_exists() checks anywhere else in core before instantiating (save for the DB layer where it's to decide which class to instantiate, which is another matter).

Raising a exception is be better: last chance exception catching goes to into the error handler and display the error page: this does not cause a WSOD. In a perfect world this error page would turn into a full debugging output for developers. We cannot allow WSOD to happen because a module break everything, but we can safely let an exception goes up to the upper layer and set a 500 error with a nice error page.

Also, this method doesn't get a handler. It creates/loads a handler. It should rather be named something like createHandler($handler_info).

Agree.

This is a separate issue. Please leave it out of this branch so we can focus on what's actually changing here.

This is a comment and will be wiped out (I suppose) before any potential merge (if it happens): it highlights the flaws of the current code and that is something we want while we are in heavy development phase: FIXME and details are good, even related to other issues: it ensures that if code forks too much we still have a trace of where are the other issues.

The name is also not self-explanatory. Why At instead of For? For would make more sense, although I frankly don't think a preposition is needed here at all.

Use preposition in function names: editors autocompletion will let you see the preposition directly but not always the parameters: explicit is better for the developer. If we don't use preposition and actually implement both At and For, how can you differenciate them then? What would mean the getHandler() method, seriously? Does it refers to my gran'ma's handler? My house's handler? The handler of Robert de Niro? I think the preposition is a useful bit of information, let's not argue for 2 letters when it makes things really explicit.

Further, my whole point is getting the handler "for" context key is the handler that handles this key, "at" refers to a specific fixed position, so it assumes that it wont give you get a less specialized instance if none found at the exact given position: sounds pretty explicit to me.

Follow-up, opened:
#1297114: Flyweight pattern implementation for handlers: make them stateless objects
#1297132: Allow registering of live handler instances into context
#1297142: Remove stack logic complexity from the Context object business functions

pounard’s picture

Re-focusing: beejeebus & matason, any advancements?

owen barton’s picture

I am not clear if/how either of the current patches resolve the specific issue in the description, but the approach that seems most obvious to me is to flatten and normalize the information. If we do this, it no longer becomes a cache clearing problem. So for example, rather than having just "Current OG" as a top-level context item we would have "OG X". Then we would have a lookup array that notes the OG context that has been previously found for specific nodes (or speaking generally "context parents"). i.e.
- node 1: OG X
- node 2: OG Y
Then, in the hierarchical context store the OG context can be derived using the lookup array without ever needing to reload contexts or cache clears. I am not sure if this is the approach the current patches are taking or if this is compatible with the current architecture - for example this does require that the "context patent" can be reliably encoded in a key/string (contexts may need to determine and return the "parent key" for it to be reliably the same across different calls).

Crell’s picture

I'm not sure what you're saying, Owen. The problem is that we do want to use the same context key, and have it change when adding a layer (and only when adding a layer). So saying "you over here, you use current_og_x, while you over there, use current_og_y", that defeats the purpose.

A consumer of the context API should not know, care, or even be able to find out if the value it's requesting has been overridden by a new context layer.

pounard’s picture

To get back to the original issue, for the original problem as described in the original post from Crell, I think that handlers sharing inside the context hierarchy as I proposed seems an elegant way to achieve this, since now we have stateless handlers.

Crell’s picture

pounard: That reduces the number of objects in memory, potentially, but doesn't change the problem of cached *values*.

pounard’s picture

Oh yes, it solves a part only of the problem. But if you re-read the #30 it also solves a part of the cached values problem.

Any ideas anyone?

Crell’s picture

StatusFileSize
new1.78 KB

OK, to try and refocus this issue, attached is a patch (also found in a branch for this issue) that adds a unit test that demonstrates what *should* work.

At the moment, it actually fails backwards. That is, the override doesn't happen because the handler is using the lower level context! Exciting. :-)

What that means is we need to at least pass the top-most context object down the line when deriving to pass off to handlers. That should be a fairly easy operation now thanks to #1297114: Flyweight pattern implementation for handlers: make them stateless objects. We probably just need to have an internal protected method that takes a context object, and getValue() just calls that with $this. Problem solved.

Once that's done, however, we should run up against the original caching issue described in the summary.

Crell’s picture

StatusFileSize
new4.24 KB

OK, I went ahead and did the refactoring I mentioned in #46. Now we're getting the error I'd expect.

The goal for this issue: Make this test pass. :-)

Hopefully that clears things up.

pounard’s picture

Thanks for the tests, will try to do this during the weekend.

pounard’s picture

Push the 1100198-handler-inheritance branch on sandbox.
Patch attached.

Test passes. I used the exact methodology I described above: handler inheritance.

For this to work, I had to add a new public method ContextInterface:: getHandlerAt($context_key) else the interface would have been desynchronized with the handler inheritance feature.

I factorized some code to make it smaller:

  • Added Context::hasParent() and Context::getParent() because now this code is used more than once (makes the getValue() function much more readable though). Don't worry though I kept them protected because I know you don't want your API to be usable directly by developers and you don't want to pollute the interface (this is a huge error for DX IMHO but I respect the actual spec).
  • Deported the handler instanciation from info into a public static method Context::createHandlerFromInfo($handlerInfo): this may not a real benefit right now, but this a nice API method that can be used outside, we should keep it for DX sake. Plus it documents the handler array description nicely.
  • The Context::getHandlerAt() implementation is able to traverse recursively through parents and locally set the found handler reference in the current context: parents traversal can only happen once.

It definitely makes the getValue() code a bit smaller and much more readable, at the cost of only 2 additions functions calls.

pounard’s picture

Status: Needs work » Needs review

Definitely needs review since the attached patch works as expected and test passes.

pounard’s picture

Ah, and by the way, it actually implements a lot of details of in #1290454: Proposal: Remove the parent object identifier with a direct reference and #1297142: Remove stack logic complexity from the Context object business functions believe me, we will get there eventually.

Crell’s picture

Status: Needs review » Needs work
+++ b/includes/Drupal/Context/Context.php
@@ -80,13 +82,111 @@ class Context implements ContextInterface {
+  public static function createHandlerFromInfo($handlerInfo) {

Why is this static?

+++ b/includes/Drupal/Context/Context.php
@@ -80,13 +82,111 @@ class Context implements ContextInterface {
+    if (isset($handlerInfo['params'])) {
+      return new $handlerInfo['class']($handlerInfo['params']);
+    }
+    else {
+      return new $handlerInfo['class'];
+    }

params must always be an array. If there are none, then it's an empty array. That mades the code flow simpler.

It looks visually like this changes the traversal algorithm. It searches down the stack for a handler (ignoring values) first, before trimming the string to try it as arguments. That's why the tests pass. That's not the algorithm we want to have.

pounard’s picture

It can be used a pure helper, I don't see why it should be kept as an instance method here. I hate global state but I do not hate static: my function here is stateless, it fits well as a static helper.

It doesn't complexify the code path, it adds a simple if() that doesn't goes much deeper. If you don't like the if statement you should have said it from the begining.

Forcing parameters to be a constructor param in interface signature is a an error, IHMO, either make it optional, or proceed to a setter injection maybe, or a more elegant method could be the params key to be a constructor parameters array and send them via call_user_func_array().

I usually don't like call_user_func_array() (I hate magic, most of the time), but in this particular case, it's code which will be run less than 20 times in the full execution flow (I don't really know how much but clearly an insignificant number of times) and it has the benefit of allowing the handler interface to be constructor free: it allows developer to use their object the way they want, as long as it implements the getValue() method, it allows more flexibility for implementors.

Besides, having a constructor free interface and a method to set instances instead of info array would definitely be a huge benefit. In the cases where the handler are small and doesn't carry a lot of data removing the lazzy instanciation would be a nice shortcut and probably would be faster anyway.

You are right for the traversal logic, didn't notice that: at least now we know there is no tests to check this! I will fix that.

Crell’s picture

pounard: The code currently works on an empty-array model. Please do not change it. That is off topic for this issue, and I am not going to accept code that changes that logic.

It looks like we don't have that traversal checked properly, you're right. I guess we need another test for that.

pounard’s picture

Added the missing tests there #1318164: Adding traversal logic unit testing

pounard’s picture

Pushed the traversing logic test on my branch also. Removed the params array since it doesn't belong to this issue. Fixed some whitespace problems. Patch attached here as well.

pdrake’s picture

Status: Needs work » Needs review
StatusFileSize
new4.35 KB

I spent some time yesterday and today working on this - I think my patch approaches this a bit differently, but all of the tests pass (including the traversal logic test linked above).

pounard’s picture

I though of that approach as soon as I saw Crell's patch (I think that he wanted us to do exactly this), but it needs the parent to be a Context instance and not a ContextInterface instance, because the retrieveValue() is not part of the ContextInterface interface.

It's clearly efficient, but it's a technical choice that close some doors arround the API extension.

EDIT: Why are you cloning the parent and not passing it directly? Because else the value would be cached in the parent context, wrongly, obviously...

pounard’s picture

Following @pdrake's patch, did the same, but exposing the retrieveValue() exposing it into the public interface which allows to use the ContextInterface as parent and not the Context class directly.

I renamed retrieveValue() as findValue() which seems more explicit since it's a lookup. "Retrieve" and "get" are almost synonyms in the english language and naming this way is confusing.

In order to avoid the parent clone, I ensured that findValue() does not cache the value internally. This value is cached only in the getValue() method when getting back the value from findValue().

Also embeded the #1318356: Fix the wrong actual traversal logic algorithm and provide unit testing patch which allows to break the lookup algorithm as soon as possible.

Works fine and all tests pass as expected (thanks to pdrake).

I still do not like this method and would prefer to expose more atomic handler handling functions into the interface and build a generic retrieval method based on the interface only instead: this would be a lot cleaner in many ways.

EDIT: Did this patch over my own branch, will provide another one in 5.

pounard’s picture

Fixed patch attached.

pounard’s picture

Attached a fixed patch that does not changes the traversal logic, but keeps all the previous patch modifications. It removes the traversal logic testing from it since there's an updated patch elsewhere and I don't want to create a merge hell.

EDIT: Pushed on 1100198-handler-inheritance-pdrake-version new branch.

Crell’s picture

Status: Needs review » Needs work

I'm certain this needs to be updated at this point, after /core-mageddon and the stack-object branch went in.

pounard’s picture

Yes indeed, will do in the next few day, maybe tomorrow.

pounard’s picture

Fixed everything to match actual git state. All tests pass.

pounard’s picture

Status: Needs work » Needs review
Crell’s picture

Status: Needs review » Needs work
+++ b/core/includes/Drupal/Context/Context.php
@@ -104,6 +113,51 @@ class Context implements ContextInterface {
   /**
+   * Create a new handler instance from given info array.
+   *
+   * @param array $handlerInfo
+   *   Array containing the handler definition. Keys can be:
+   *     - 'class' : (string) The handler class and only mandatory key.
+   *     - 'params' : (mixed) First parameter to pass to the constructor.
+   * ¶
+   * @return Drupal\Context\Handler\HandlerInterface
+   *
+   * @throws ContextException
+   *   If the given handler info is invalid.
+   */

If the keys are always fixed at 2, why not break it up and pass in the class name and params as separate parameters? That would make this method a bit cleaner.

+++ b/core/includes/Drupal/Context/Context.php
@@ -104,6 +113,51 @@ class Context implements ContextInterface {
+   * Get the handler at the given context key. This is method plays a role for
+   * handler inheritance within contextes.

The language here is b0rked. It also is too long for the short-description. The second sentence should be its own separate paragraph and cleaned up to be more descriptive than just "this does stuff".

+++ b/core/includes/Drupal/Context/Context.php
@@ -104,6 +113,51 @@ class Context implements ContextInterface {
+   *   A valid parent, or NULL if none found.

Not a parent. A valid handler object.

+++ b/core/includes/Drupal/Context/ContextInterface.php
@@ -35,6 +35,13 @@ interface ContextInterface {
   /**
+   * Does this object have a parent context.
+   *
+   * @return bool
+   */
+  public function hasParent();

Why is this public? It's only being used internally. It can easily be just protected.

I'll admit, I don't fully comprehend at the moment how or why this refactoring would fix the caching issue. :-) I'll run actual tests on it shortly.

pounard’s picture

I made the hasParent() method public because it can safely be used for any other mean. Doesn't hurt since the getter and the setter already are public.

pounard’s picture

It does because the values are retrieved by the findValue() method using the context passed as parameter, so the handler that runs will run the given context (not the one that run the method). The basics here is that if no value and no handler stop is found, then it run the exact same findValue() function over the parent, giving itself as context to run with.

The findValue() does not cache anything then, because when called over the parent, it can return a value relative to another context than itself, so we would cache invalid values. Caching is only done in the getValue() accessor.

Another path to follow to solve the issue would have been to be able to get the handler from the parent, and not get the value from the parent, but the implementation I first did was more intensive because it triggered more handler lookups, so I stayed on pdrake's version of the algorithm.

I still would prefer the other way arround, but this one works. I just really don't like dealing with the context as parameter of findValue().

Crell’s picture

OK, I get it now. The difference is that rather than caching all values that get requested, the only value that is cached at all is the initial value requested. Any intermediary values are not cached.

That does neatly solve the problem if incorrect values being cached, but it solves it by not caching intermediary values at all. :-) I fear the performance impact of that.

However, that is a step up from where we are now, I think. So I'm going to make the changes in #66, and then merge this. Then we can open a new follow-up issue to figure out how to sanely cache intermediary values. Stand by!

Crell’s picture

Status: Needs work » Fixed

#69 has been done.

webchick’s picture

Project: WSCCI » Drupal core
Version: » 8.x-dev
Component: Code » wscci

Per catch, and blessed by Larry, moving this and all other WSCCI issues to the Drupal core queue.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Add link to unit test that demonstrates the problem.