Splitting off from #1100188-14: Cloning of context objects and following...
Whatever the internals of the context object, it will return something. There are two things it could return: A primitive lookup key or an object. For example, $context['node'] could return:
1) A fully loaded node object.
2) A small array containing nid and language (or similar), and then the caller is responsible for calling node_load() on that.
Pros for option 1:
- It's really simple on the caller. You get back an object (if what you're requesting is an object in the first place) and you can use it immediately, even treating $context['node'] as a variable itself.
- Unit testing and mocking of context is really easy. You simply load the node object you want to take out, assign it to the right context key, and you're done. Piece of cake.
Cons for option 1:
- Because objects pass by handle in PHP, what you get back is a reference to the object in the EntityController class's static cache. Change that object and you're now changing global state outside of your scope, which we want to prevent.
- We would still need to reduce the context information to primitives anyway in order to use it as a cache key, which we want to do for many reasons, including Varnish caching and smart cache invalidation. #636454: Cache tag support
Pros for option 2:
- We don't carry around or cache large objects, only small primitives.
- It's really easy to turn a context object into a cache config key.
Cons for option 2:
- It's more work for the caller. They need to pull back $context['nid'] and $context['language'] and call node_load() themselves.
- It's harder to unit test and mock. We'd have to mock node_load() too, or else move that to an injected service object that gets called separately. (That may be a good idea anyway, granted.) (Note that "fill in the database before running the test" like we do now is not unit testing, it's slow-arse integration testing that we want to minimize.)
I'm not sure which trade-off is good here. Discuss. :-)
| Comment | File | Size | Author |
|---|---|---|---|
| #46 | butler_context_values_1177246_46.patch | 302 bytes | Stalski |
| #43 | butler_context_values_1177246_43.patch | 9.86 KB | Stalski |
| #40 | butler_1177246_40.patch | 11.45 KB | Stalski |
| #38 | butler_1177246_38.patch | 11.11 KB | Stalski |
| #34 | 1177246-context-value.patch | 3.65 KB | xtfer |
Comments
Comment #1
pounardKeys would definitely be better for cache keys and static cache purposes, as you said, I definitely stick on that side.
But I'd propose a 3rd option:
$context['nid']but it returns a "context helper", with two functions:getRawValue()andgetNode()loadEntityNode()that does thenode_load()itself and returns itPros:
Cons:
Example:
Comment #2
catchCouple of things here.
There are likely to be some context objects that you need to know the identifier but don't always need the full object. A classic example would be language - while we currently have global $language, 99% of the time in core, the only thing that gets accessed is the language code - $language->language for use in caching keys.
So there are three parts to this:
1. Determining the current context for a particular key - in some cases (maybe all?) this may itself be done when that's demanded (i.e. find me the current organic group for this node does not need to be front loaded).
2. Finding the current organic group based on a node should give you a node ID - at least entityFieldQuery + entity_load() enforce that kind of pattern and it's a good one. The node ID (or langcode or user ID) may be enough to generate a cache key just by itself - if that's all that's needed, then it'd be great to just do that. However it's possible cache key generation happens via methods on the context object, not by requesting context keys and figuring it out.
3. Lastly, there is requesting the actual object that the context handler is referring to.
If we decide #2 is only valuable for caching (and possibly very rare edge cases), then as long as the context system has helpers to generate cache keys that doesn't need to be available to code that requests the context object, so returning objects sounds good (although I'd still like to store only keys internally.
On this, what happens if a function that requests context needs to trigger a node_save() or any other crud operation? The more things request context, the more likely this is going to become. Custom code, triggers, actions all do things like this all the time. In that case, you very much do not want later functions in the same page to be operating with a stale node object.
Currently changing objects without then _save()ing them is unsupported even though it's not enforced. This is part of #154859: Document that cached entity objects are not necessarily cloned any more, I'd want to see very, very detailed performance numbers on clone before going down that route, especially memory. Either way I'm not sure that very old discussion, which is nothing new, should affect decisions made here.
Comment #3
catchI originally posted that comment on the duplicate issue that Crell opened, hadn't seen pounard's when I wrote it.
Being able to get both the key and the value may be worth the extra method for callers. However I'm not sure why we need a special getNode() method, isn't something like this enough?
The magic __get() is supposed to be expensive, but I doubt we are going to be requesting context keys and values hundreds or thousands of times per request.
Comment #4
catchJust to clarify on clone:
1. the context class does something like $context['node'] = (clone) node_load($nid); and static caches this itself.
2. Some function reacting to a rule or trigger does $node->foo = 'bar'; node_save($node);
3. node_save($node); resets the static cache, but it will not reset anything for $context['node']; so now it's stale.
4. More unlikely, say another function does the same as #2, it's going to save a $node with stale data.
That seems like as likely a scenario as people doing $node = node_load($nid); $node->foo = 'bar'; then not saving it.
If nodes aren't a great example, presumably $_SESSION is going to be available as context at least potentially, and functions need to set or unset stuff in $_SESSION regularly.
But if we don't cache any objects in the context class, then this whole argument devolves to the various APIs that provide contexts, and that seems good to me.
Comment #5
pounard@#3 Your code is pretty much what I explained upper, do not store obects, just set a proxy method to the real object factory.
The __get() method is a bit expensive, but it's not if you don't call it a thousand time. And by the way, PHP 5.3 did serious optimizations with function calling and stuff, so that tends to be not that true anymore (PHP 5.2 is officially unsupported now). What I mean here is that even if Drupal officially supports PHP 5.3, a lot of hosting will probably switch to PHP 5.3 so the performances should be targetted for this environment.
@#2 +1 for language, who needs the full metadata anyway? 99% of time you only need the language code.
That's pretty much I'd propose the middle solution, that pretty much looks like your code. Plus, even if you don't have the helpers for spawning the node, considering that you are actually fetching the 'nid' key means that you KNOW what you are fetching and you know how to load it.
Comment #6
Crell commentedMy concern with having an id and value method/property of every context key is two fold:
1) There's no guarantee that the context value you're accessing is an object with obvious ID. You could be looking for "current node", but could also be looking for "value of the http accepted compression format header", which is always going to be a string. (Obviously you would never want to a cache key based on that, but you could want to access it.) Or, less crazy example, $context['language'] itself is just a string, not an object. I don't know that ->id and ->value makes sense as a universal pattern.
2) Some objects may need multiple IDs to load. Catch likes to use the nid-and-language example on a multi-lingual site. What does ->id return then?
3) How would you setup a mock value when overriding? Currently, the syntax is something like:
You can also set an alternate handler rather than a literal value.
So I'm not sure how that would work if you're tracking both lookup key and value. Which one do you specify when overriding?
Comment #7
pounard#6 for 1) and 2) The getter method seems transparent enough to allow something natural, if I take you code sample and try to adapt it with catch's suggestion:
Still for 1) The syntax, even with the
valueaccessor seems straighforward enough, just need to be documented.For 3) The mockup function would just be a deep clone of current context array object. I don't need to write a sample here because it's quite basic, but in this case
clonekeyword is probably best thanmock()function.For 1), about the context propagation (language at node load time), you have three solutions:
__clonetime, and this same value object getter use it for loading.NodeController::load($conditions, $context = NULL)method.Sample:
I really don't like back ref, that's what made Views 2 almost impossible to debug, but in this case, if the code is clean and the design remains simple and is put on some diagrams for easy comprenhension, I don't see why we shouldn't use them.
This method ensure context propagation in the lower level API's and that's good!
To avoid backref, we could imagine something else:
Then the NodeController could itself have:
Comment #8
Crell commentedIt's going to take me a bit to fully digest #7, but I think there may be something in there. I was hoping to avoid having both a context handler object AND a context value object, though. That is, as noted above, extra complexity (more places for things to go wrong, more bugs, etc.).
One side note, the reason for using ArrayAccess is so that we can do $context['http:post:var']; we determined in Copenhagen that the namespacing potential there is very powerful, especially when you can attach a single handler to http:post. But you obviously can't do $context->http:post.
Given that we would still have a context handler (eg, if you don't specify $c['node'] directly, a handler fires that will examine $context['http:get:q'] for a node/nid pattern, grab that nid, and load that, and then potentially cache it), where would you split the functionality between the handler and the value wrapper? That seems unclear to me.
Comment #9
pounardBasically:
Is exactly the same as:
It's just a convention.
I'd bet that the right choice is the one that allows the more flexibility technically, but otherwise it's exactly the same. You can even implement both on the same object!
Value objects only allows to embed business method inside the context itself and provide a proxy pattern toward the business factories, but nothing mandatory here, it's not the more important part about it. I think the more important part is the fact that the context could propagate to the business stuff such as entity controllers themselves.
Comment #10
catchLoading a node is not dependent on language - this is the case whether using entity-level translation (core translation module) or field translation.
So you would have $context['node']; and $context['language'] - two completely separate keys, not one combined.
Language is something that is determined either by querying a list of nodes restricted to a particular language or languages (Drupal 6 style), or rendering fields based on a langcode, so http://api.drupal.org/api/drupal/modules--node--node.module/function/nod... (Drupal 7 core style, which now co-exists with D6 methods).
In the former case the query will be affected based on language context. In the second example, node_view() requires language context (or optionally requires it depending on whether locale and other modules are enabled).
That doesn't mean there's a no valid use case for $id being an array, but node + language isn't it.
Comment #11
pounardYes I see, my point was more about "passing the context to the factory" in a generic way (not only for language itself, it could be for anything else, maybe some context restrictions or query rewrite).
Comment #12
Crell commented@catch: Ah, OK, I misunderstood you then since language and node are frequently mentioned together.
Comment #13
jherencia commentedSubscribing...
Comment #14
pounardJust saying, but maybe at the early beginning it should remain simple, and should not provide any of business functions except __get() and __set() (or ArrayObject implementation, doesn't really matter). Let the consumer do whatever it has to do with the raw scalar values inside the context.
It would allow having a partial working context system really quickly, and business matter can be added later what it worth. Business will be useful only at the two "context usage endpoints":
Comment #15
dmitrig01 commentedHrm. I'm not really sure I really see either of the cons of option 1 as that significant:
- Because objects pass by handle in PHP, what you get back is a reference to the object in the EntityController class's static cache. Change that object and you're now changing global state outside of your scope, which we want to prevent.Just return a clone of EntityController's static cache. Problem solved.
- We would still need to reduce the context information to primitives anyway in order to use it as a cache key, which we want to do for many reasons, including Varnish caching and smart cache invalidation.The caching layer, I assume, will be invisible to people writing plugins, at least for the most part. Thus, it should be fairly simple to keep track of the id of the object requested internally, and afterwards, when accumulating keys to cache, ask each context handler thingy for the id by which they want to be referenced. Then, they can take this id to retrieve the full object.
I do, however, think that option 2 does have one advantage: when only the node id is needed, it's a waste to load the whole node. However, there are two possible ways I see around that. The first is to have it be possible to get $context['node'] and $context['node:id']. The second is something like this (this would probably be implemented on the entity level, and mitigates the first con mentioned of option 1):
Comment #16
pounardWhatever is the context, a business object such as an entity shouldn't be contextually modified. Clones are bad.
Comment #17
eaton commentedThat applies to almost anything that's "contextual" though. The user object shouldn't be modified directly, but it's still used for contextual decisions. If something appears in context, it's for use in decision-making, not for modification/editing. I don't think that issue should affect the choice of key vs. object...
Comment #18
dmitrig01 commentedIn a perfect world, clones may be bad, but in that perfect world, people wouldn't modify the objects returned by context. Unfortunately, we live in an imperfect world, which means that we need to solve these problems. In our imperfect world, putting a ->lock()/->unlock() method on an Entity object would significantly slow down all of Drupal, since all properties of all entities would need to be accessed with __get.
Comment #19
catchYou're not even attempting to justify why we should do that just stating 'we must' and making assumptions. If the context system stores keys then it's not that system's problem to deal with the clone or not clone issue. And no one has answered what should happen if a function that takes context triggers an entity save.
Comment #20
fagoThat's how we handle it everywhere in Drupal since d7, so why should we handle it different here or change it again?
Comment #21
Crell commentedcatch: I'm unclear which post you're responding to?
At the moment I am leaning toward saying keys (well, primitive values) only, as it simplifies a great deal in our implementation. However, the downside of that approach is that a context consumer (most likely a plugin) would then need to use that key to load the object it needs. That means we are immediately breaking encapsulation, since we cannot simply inject a fake node into a consumer. We have to go create the node first, then pass in the fake node's key for it to node_load(). That eliminates half the benefit of forcing all input through this channel.
The natural solution to that would be a service locator, as a few others have suggested elsewhere. That's a very nice solution with a long history of existing implementations, but is also another moving part I was hoping to avoid for the time being.
Now, on a conceptual level I rather like the service locator approach. It's clean, flexible, powerful, extensible, all that good stuff. I would actually rather like it if we end up there eventually, as that would allow us to move the vast majority of Drupal's useful code over to true unit tests. However, it's also yet another moving part that we have to think about, implement, test, and get working throughout Drupal. It also results in a more verbose usage; whether that's too verbose for people to get comfortable with I'm not sure.
Just cloning the node returned from whatever load mechanism may help with the side effect issue, but it does nothing to solve the caching and Varnish ID issues. So that's not a complete solution either way.
So I think that's where we stand currently. If I'm missing a piece, please speak up. :-)
Comment #22
dmitrig01 commentedWhat would the downside of the approach I outlined in #15 be? When it's mocked, a different node object, filled out with fake properties, can be provided, which greatly simplifies it. It also provides a way for having either keys or actual node ids work.
If we wanted to get really complex, we could introduce some sort of tracker on the offsetGet method that would track whether the node that was gotten ended up filled out, etc., but that may be overcomplicating it.
Comment #23
catch@Crell I was referring to Dmitri's post immediately above mine insisting we should always do something wrong (clone everything) because sometimes people get things wrong. I should try to get out of the habit of responding to issue from my phone but I am getting fed up with that discussion both here and elsewhere.
This is closer to what I'm after, but I agree it's overly verbose, and I'm not sure it needs to be so verbose to get the same effect.
If we have a service locator, then what you've written doesn't look that far of what entity_load() does already - currently entity_get_info() has all kinds of dependencies on the module system but that doesn't necessarily need to be the case eventually (or for other context-providing stuff) - the config initiative and general clean-up should allow the entity info itself to be mocked or injected including which class is used to load it. In that case everything would have a dependency on the configuration system (and possibly plugin system if and when that exists), but those could be used to handle control otherwise.
Another option would be the context system allowing dependency injection directly:
So to get the id:
To get the full node object:
When you need dependency injection, use a setter:
getObject uses whichever service is set.
Internally it looks something like:
If we could figure that out cleanly, then the user of the $context object isn't doing much more than accessing $context->node - the complexity is handled internally, and we provide methods to mess with it when people need to.
Comment #24
Crell commentedIn today's WSCCI meeting in IRC, we kicked this question around for a good 90 minutes. :-) What we basically concluded was:
1) We can't perfectly lock down all context values, especially where objects are concerned, like entities.
2) We can design the system to discourage people from mucking around in context, however.
3) We're going to assume that the Entity system is going to get improved in such a way that all loaded entities become classed objects. (If it doesn't, then the Entity system has bigger problems than we're dealing with here anyway.)
We therefore came up with the following plan, which we should try to implement and see what happens. :-) Basically it is along the same general lines as pounard in #7 and dmitrig01 in #15.
Proposal:
There are two things that can be returned from a context handler, or assigned to a context key explicitly:
That's it. Nothing else.
ContextValueInterface includes a method that will return a load key for that object. It is up to the object to return something meaningful. The load key is the value by which we can load that object later, such as nid, view machine name, etc.
Because it's an interface, it can be implemented by any class or base class.
Example:
This offers the following benefits:
Note that this still allows for the possibility of a caller to modify a context value object, and for a caller to call node_save() (or equivalent) on an object tracked by context. At present, the answer to the second is "don't do that, stupid!". For the former, we want to try and block that but don't know how yet. We will revisit that question later.
We also considered forcing all primitives to be wrapped in an object for consistency. However, that may cause other problems in terms of both API /DX and performance. For the time being we decided to table that. That does mean that a consumer does need to know if the key is it requesting is an object or a primitive.
Thus, when a consumer accesses content:
When turning a context object into a set of keys that can be used to regenerate it (which we need for caching, ESI, etc.), we can do something like:
Now we need someone to volunteer to try and implement this thing. :-) Any takers?
Comment #25
sunComment #26
xtfer commentedIts a bit OT, but in the process of building a registry module for D6 (http://drupal.org/project/registry), I've been paying close attention to developments in this thread. I expect the Alpha 3 release to have some semblance with the proposal above (the current Alpha 2 is a bit different), at least as far as the kinds of properties to be set and how to access them, and I mention it in case anyone is interested.
Comment #27
xtfer commentedIs there any reason why getValue and getKey cant be implemented in the same class? For example...
This keeps the key and value implementation consistent. Otherwise we end up with contextKey implemented on the objects themselves, while the handlers requiring an extra class definition.
Comment #28
Crell commentedxtfer: Two reasons.
1) Performance. That's an extra function layer we have to go through every time we want to access the value.
2) DX. Having some context keys return a useful value and others return an object from which you need to extract a useful value, and that not being cleanly delineated in the context key itself, makes the code uglier and harder to work with.
Comment #29
xtfer commentedOkay, I understand 1:performance, however the performance overhead would potentially be quite small, and you would also be able start addressing concerns about modifying context values, loading objects not properly classed (like the current entity system), or situations where more information was needed about a string context (context is a string, but that string is part of a larger object/array)? Also, you could potentially use
ContextHandlerExample::contextKey()without instantiating ContextHandlerExample, which would be quicker.In the case of 2:DX, aren't you going to run into that anyway? contextKey implemented on Node or ContextHandlerNode still has to return node_load($id) either way (probably), and under the proposal in #24, the key and value accessors will be split across the handler and the object, which seems more complicated to me.
Additionally, it would be useful if Butler ever gets a workable 7.x release, as its unlikely that 7.x core will be reworked to accomodate the class implementation (my comments stem from an attempt to write the contextKey implementation into the Butler 7.x branch and having no reliable way to load objects like nodes for testing).
Comment #30
Crell commentedIn practice I'm not sure we'll be able to do a full backport, frankly. It's looking less viable these days. :-( Either way, the needs of D8 should be the primary focus, not backportability. Make it work first before we make it backport. :-)
I don't know why you'd want to call ContextHandlerExample::contextKey() statically. In fact if it's a non-static method you can't. (At least not without generating an E_STRICT, because really, don't do that.)
If it's Node itself that is returned, then you can do simply $context['node']->title() to get the node object's title (for instance). Rather than having to do $context['node']->getValue()->title(), whereas you wouldn't need the getValue() if it's a string. That makes it inconsistent.
Comment #31
xtfer commentedI think Im behind the times on static methods... I was under the impression they were faster but having done more research this is probably not the case, so ignore that.
Comment #32
xtfer commentedHere's a patch which basically implements the functionality described in #24...
1) interface ContextValueInterface
2) A new method, contextKeys(), on DrupalContextInterface, which returns an array of keys for caching etc.
Additionally, there is an extra sub-module Butler Examples, to hold development related code not part of Butler itself. This contains an object implementing ContextValueInterface, for testing.
This is also available in a sandbox repository at http://drupal.org/sandbox/xtfer/1254810.
Comment #33
Crell commentedWoah, thanks, xtfer!
I think the intent of this method isn't quite right. Rarely would we need to know the value of every context key in a given context object. What we may need to know is the value of all context keys used in a given scope. For that, we'd probably need to track if a given key was accessed, and at what stack level. I think it would make sense that if we access key foo, then it is considered "used" by that context object and all of its ancestors. Then we can retrieve that data.
I wasn't intending to go quite that far in this patch, but if you want to include that logic I'm OK with it.
It's probably not good to cache that in a property since a context object can still be read from, and usedKeys() (which this method should really be named) should be idempotent.
I don't believe we need an examples module. We're hopefully moving this code into a core sandbox soon. Instead, this code should go in the existing test suite to make sure it all works properly.
Uh, this should cause a parse error. :-)
There's a fair bit of stray whitespace in the patch, too.
Comment #34
xtfer commentedWhoops! That's sloppy. I've cleaned up the patch.
Given that keys are normally set when they are accessed via
$context->offsetGet($offset), for any given request only those keys required should be set, which is roughly what the implementation in the patch at #32 assumes.Variants on this...
a) Log the $offset in offsetGet then iterate over them in
usedKeys(), orb) Set the usedKey record in offsetGet directly and merely return the extant values in
usedKeys, orc) Assume that anything set has been accessed, and just iterate over that (i.e. #32).
I assume that (c) would be faster during the construction of the $context object, and (b) quicker when returning usedKeys. (a) is only faster if, for some reason, there are context values which aren't required for the request (but that begs the question what they are doing in context). (a) also has a downside that the key must be unset if the value is removed.
The exception I can see here is for context values set directly (i.e. via
$context->offsetSet($offset)), do we assume that this has now been "accessed", or do we just ignore it till someone uses it – the problem here being instances where the object is used then set. Do we accomodate that (c), or filter them out till referenced (a or b).Anyway, this patch implements (a) for comparison. Its missing the "scope" concept, it merely returns the current contexts keys (not its parents).
Comment #35
xtfer commentedIf thats the approach you were thinking of, let me know and I'll add tests.
Comment #36
Crell commentedYes, that's more what we're looking for, I think, approach A. We cannot assume that anything with a value has been used, since it may be set explicitly.
Although... Now that I think about it, I've had this discussion with Sam and David Strauss before. We cannot rely on "this key has been used" to be the deciding factor on whether or not it's relevant. A given block/plugin/scope/whatever may only use a given key under some circumstances (eg, check the nid only if the user is not in the admin role). So we don't just need "give me what's been used", we need "give me these values that I know are relevant, just trust me." Or maybe both. Hm.
Let's go ahead with this for now and add explicit fetching in a later patch. We still need tests, and there's still some whitespace in there. :-)
Comment #37
xtfer commentedYes, I'd been considering that as well, though that shouldn't be too hard. Array of values? The caller could do its own filtering in any case. Tests forthcoming,,,
Comment #38
Stalski commentedI also gave it a try, I did not see these new posts before.
I think the patches could be merged, since I did not do anything for the cached keys yet.
In this patch are included:
- A example object that is fetched from context through handler
- Overrides of the example object
- Problem with context requests not knowing that the contextKey was updated.
This is done in a page callback on butler-objects/1 and one simpletest function is added.
Comment #39
attiks commentedsub
Comment #40
Stalski commentedThe patch in #38 shows we might have a problem with requesting offset keys within context (keys).
I've added a simpletest function that exposes this problem in level 2 where a "id" property is requested while the id returned from level 1. So currently, if we don't explicitly set the offset key for a context key (E.g. example:foo), than we will get the wrong ID.
The question here is: Do we need to do load the properties of an object that is put into context?
Larry told me to create an issue for it, but this is a better place imo. The patch included is improved for testing purposes.
Thx for feedback.
Comment #41
Crell commentedI think the patch in #34 is the right way forward here. It's more complete, but just needs tests. Hopefully either xtfer or Stalski can write those in the next day or so, as then we can get this committed before Friday's code sprint. :-)
Stalski did find and point out to me a possible issue with the way overrides interact with nesting. That's what he's talking about above. That is irrelevant to the question of the value we return itself, so I do believe that should be a separate issue. Stalski, please open a new ticket for it.
Also, xtfer, you should re-pull your sandbox. I committed a few patches late last week, including fixing the "oh yeah, we don't actually stack properly" bug. :-)
Comment #42
Stalski commentedI'll give it a try tomorrow morning. I do have to say I don't see much difference except that the patch in 40 includes tests.
So I'll take another look and try to understand what you meant. Probably the patch I tried is a too big a step.
Comment #43
Stalski commentedI had the whitespaces as well, those should be fixed. Also added a closing ) in an if clause.
I added a implementation of the contextValueInterface as well as a handler that can take a parameter to mimic the loading of a example instance (E.g. load from url)
Most basic tests are included. I hope you can work with that tomorrow, have fun at the sprint ;)
Comment #44
Stalski commented@crell, cross-linking the promised issue: #1260320: Problem when requesting a property of an object from overridden context
Comment #45
Crell commentedI was able to fix up the tests a bit in #44 and get it committed. The code is now all merged into the 7.x-1.x branch. Thanks guys!
Comment #46
Stalski commentedIncludes a patch that fixes the changed tests.
Comment #47
Crell commentedOopsies. Committed #46. Thanks!
Comment #48
xtfer commentedGood stuff. Thanks for working through the tests Stalski.