Replacing placeholder tokens like %user, [title], %login-url and so on with the appropriate values is an extremely common task for both core and contrib modules. At present, the code to both provide proper replacement tokens and to parse text for them is reimplemented every time. Sending out user email is the most visible place for it in core.

Providing a central function for token replacement, and a standard hook for modules to expose custom replacement tokens, is a good thing. I wrote token.module to do that, and it's already required or recommended by a number of modules including the next revision of ecommerce.

The attached patch adds a token.inc to Drupal's includes directory and makes it part of DRUPAL_BOOTSTRAP_FULL.

To apply token replacement to a chunk of text:

token_replace($original, $type = 'global', $object = NULL, $prefix = '[', $suffix = ']')

$original is the source text to perform substitutions on: it can be either a simple string, or an array of multiple strings.

$type and $object are to be used when performing substitution based on a particular Drupal object. Replacing tokens in an email with values from a particular user account, or replacing tokens in a path alias pattern with values from the node being aliased, are two examples.

$type should contain the general object type (node, comment, user, etc.) while $object should contain the object itself.

$prefix and $suffix can be used to override the default token style. For example, to replace tokens using %this style, pass in '%' and '' for the $prefix and $suffix values. Note that passing in a prefix but NOT a suffix can lead to false positives if two tokens are named in a similar fashion (%node_term and %node_term_id, for example).
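
For example, a quick sketch of both call styles (assuming $body is the source text and $account holds a loaded user object):

// Default [token] style:
$body = token_replace($body, 'user', $account);
// %token style: '%' prefix, empty suffix.
$body = token_replace($body, 'user', $account, '%', '');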

Tokens can be exposed by core and contrib modules using two hooks: hook_token_values and hook_token_list.

hook_token_values($type, $object = NULL)
This function should return a keyed array of placeholders and their replacement values. $type contains the current context -- 'node', 'user', 'global', etc. $object contains the specific node, user, etc. that should be used as the basis for the replacements. *Only* generate and return replacement tokens when $type is something that your module can really deal with. That helps keep things speedy and avoids needlessly searching for jillions of replacement tokens. For example:

function my_user_token_values($type, $object = NULL) {
  if ($type == 'user') {
    $user = $object;
    $tokens['name']      = $user->name;
    $tokens['mail']      = $user->mail;
    return $tokens;
  }
}

hook_token_list($type = 'all')
This function is used to provide help and inline documentation for all of the possible replacement tokens.

As with hook_token_values, $type indicates the context that token help is being generated for. Unlike hook_token_values however, you should show ALL tokens at the same time if $type is 'all'. As such, the help text should be keyed by the $type context your module will use when doing the actual replacements. For example:

function my_user_token_list($type = 'all') {
  if ($type == 'user' || $type == 'all') {
    $tokens['user']['name']      = t("The user's name");
    $tokens['user']['mail']      = t("The user's email address");
    return $tokens;
  }
}

Please note that the currently attached patch does not expose any tokens. That code should be added to user.module, node.module, system.module, book.module, taxonomy.module, and so on.

Feedback on the implementation details is much appreciated. It's working very well as a contrib module at the moment, but in core it will almost certainly deserve some additional polishing.

Comment | File | Size | Author
#259 | token.test.patch | 2.88 KB | webchick
#254 | node-tokens-dsm.patch | 2.01 KB | John Morahan
#251 | node-tokens-dsm.patch | 795 bytes | John Morahan
#243 | token-in-core.patch | 68.75 KB | eaton
#242 | token-in-core.patch | 66.86 KB | eaton
#240 | token-in-core.patch | 76.98 KB | eaton
#232 | token-in-core.patch | 65.28 KB | eaton
#221 | token-in-core.patch | 63.59 KB | eaton
#219 | token-in-core.patch | 63.07 KB | eaton
#215 | token-in-core.patch | 62.48 KB | eaton
#207 | token-in-core.patch | 62.38 KB | eaton
#203 | token_d7.patch | 53.63 KB | fago
#200 | token-in-core.patch | 54 KB | eaton
#198 | token-in-core.patch | 53.95 KB | eaton
#188 | token-in-core.patch | 53.12 KB | eaton
#187 | token-in-core.patch | 53.16 KB | eaton
#183 | token-in-core.patch | 51.51 KB | eaton
#175 | token-in-core.patch | 54 KB | eaton
#173 | token-in-core.patch | 48.95 KB | eaton
#166 | token-actions.patch | 6.07 KB | eaton
#164 | token-in-core.patch | 47.88 KB | eaton
#160 | token-in-core.patch | 42.44 KB | eaton
#159 | token-in-core.patch | 42.44 KB | eaton
#157 | token-in-core.patch | 42.34 KB | eaton
#156 | tokin.zip | 2.74 KB | eaton
#155 | token-in-core.patch | 42.91 KB | eaton
#155 | token-test.jpg | 53.33 KB | eaton
#154 | token-in-core.patch | 43.24 KB | eaton
#123 | token-in-core.patch | 35.23 KB | eaton
#91 | filter-line-break-tokens.patch | 724 bytes | caktux
#74 | token_113614_74.patch | 31.02 KB | paul.lovvik
#57 | token_proposal.patch | 1.86 KB | chx
#56 | token_proposal.patch | 2.7 KB | chx
#52 | token_proposal.patch | 3.01 KB | chx
#51 | token_proposal.patch | 2.72 KB | chx
#50 | token_proposal.patch | 2.7 KB | chx
#36 | drupal_tokens.patch | 41.49 KB | recidive
#24 | token_24.patch | 42.06 KB | paul.lovvik
#7 | token.inc_.txt | 7.1 KB | eaton
(original) | token_0.patch | 6.14 KB | eaton

Comments

webchick’s picture

huge +1 for this. if we could integrate this into the menu system, it would provide all kinds of nifty opportunities, like tracker/[user-uid] and such.

I need to make time to do a more thorough review.

eaton’s picture

If we're looking for good approaches to optimization, figuring out a more efficient way to do the prefix/suffix handling is one opportunity. Another is figuring out a way to make the token domains/types more intuitive. Right now the actual token replacement and the token help generation require different data structures because of the domains.

One final possibility is a way for tokens to be dynamically calculated. In other words, rather than hard-coding the generation of [node-created-YYYY], the ability to do [node-created-nnn], where nnn can be any PHP date formatting string. That would introduce some of the same weirdness that the Drupal 4 and 5 menu system has, though, with both 'cached' and 'uncached' items.
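
To illustrate the difference, a hedged sketch of the hard-coded approach as it works today (the token names here are hypothetical):

function my_node_token_values($type, $object = NULL) {
  if ($type == 'node') {
    // Every supported format is generated up front, used or not.
    $tokens['node-created-yyyy'] = format_date($object->created, 'custom', 'Y');
    $tokens['node-created-mm'] = format_date($object->created, 'custom', 'm');
    $tokens['node-created-dd'] = format_date($object->created, 'custom', 'd');
    // A dynamic [node-created-nnn] token would instead need to see the
    // source text first, so only the requested formats get computed.
    return $tokens;
  }
}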

In any case, as it stands NOW, the system works. Additional improvements are also possible.

Dries’s picture

At first glance, this looks over-engineered. A token system should operate on a string with tokens and should have no knowledge of object types. I'd start by eliminating all the "token types", and by getting rid of "token styles".

Replacing tokens in a string is EASY. All we need is a simple convention of what the format of these tokens should be (i.e. %token).

The only challenge I can think of, is having a registry of globally available tokens (i.e. the site name). Having a registry of tokens is a much more general problem. Template based theme system could take advantage of that too.

Another problem with this, IMO, is that it isn't scalable. There are potentially a bazillion tokens, and we can't use a polling system for that. We'd end up using 1% of the available tokens, but we'd still pay a performance penalty for generating all those tokens.

You know, the more I think about this, the more I'm leaning towards a (secure) 'PHP mode' where people can pull in whatever custom code/tokens they have.

I have to think about this some more, but right now, I honestly think this is taking the wrong approach.

eaton’s picture

Another problem with this, IMO, is that it isn't scalable. There are potentially a bazillion tokens, and we can't use a polling system for that. We'd end up using 1% of the available tokens, but we'd still pay a performance penalty for generating all those tokens.

This attempts to solve it by only polling for the tokens that are necessary in the particular domain/type. A module that exposes a bunch of node-related tokens (number of views, title, etc.) won't generate its tokens if token_replace($string, 'user') is called. The token system doesn't actually have any knowledge of the objects in question; it just uses that as a limiting mechanism.

You're right, though -- the whole problem this solves is the central registry of tokens. That's the only purpose of this include file. The token_replace() function is just a convenience, something we essentially get for free as long as we're maintaining the registry.

As I mentioned in #drupal -- if 100% of the code in the above patch is scrapped and the problem is solved in another way, I'll be happy. It's something that needs to be examined, though, as we're seeing an explosion in modules implementing this functionality with no way to share the important token values.

quicksketch’s picture

I generally like this approach. I can think of dozens of places where this is applicable. The menu system is an obvious candidate. Taxonomy and profiles could offer a lot of useful tokens also. Outside of core, pathauto, webform, ecommerce, and more implement their own token systems.

Yes, string replacement is really easy. But often finding what variables are created by other modules is not an easy task. Having them displayed just by enabling the module makes things easier for both administrators and developers.

Some more positives:
- Code does not affect locations that don't use tokens (processing time-wise, after the file include)
- Retrieving tokens is (usually) a purely PHP procedure, no calls to the database (until Eaton writes custom_tokens of course :D)
- Tokens are generally an administration-only tool. For example Pathauto only needs tokens when it's generating the URLs, not after they've been created.
- A central code location will immediately cut out repetitive code from many modules
- Convenience, naturally this is the biggest advantage

Negatives:
- Standardization of token format would be preferable if possible. Like Dries indicates, I'd like to see ONLY %token or [token] everywhere instead of several different formats. I think that flexibility is a carryover from token.module where Eaton was trying to make it a full replacement for all existing token implementations.

Patch specific, token_node.inc and token_user.inc would need to be removed before committing :D

eaton’s picture

- Standardization of token format would be preferable if possible. Like Dries indicates, I'd like to see ONLY %token or [token] everywhere instead of several different formats. I think that flexibility is a carryover from token.module where Eaton was trying to make it a full replacement for all existing token implementations.

Indeed. I would argue, though, that the %format common in core is not useful in many cases, where tokens like %title and %title-truncated (for example) collide in unhappy ways. Using %surrounding% tokens is more robust.

Agreed, though, that if this were to go into core we could almost certainly eliminate that prefix/suffix provision and just tell everyone to standardize.

eaton’s picture

Status: Needs work » Needs review
FileSize
7.1 KB

I've attached the latest version of token.inc. More and more big-name modules are either supporting it or listing it as a requirement -- the ecommerce package, organic groups, etc. Obviously, that doesn't mean that we should add it to core 'just because'. It just indicates that there is a strong need for a centralized solution to this problem, and much to be gained by allowing modules to share their tokens rather than reimplementing each time.

Concerns have been raised about DB-intensive token operations -- ie, a token that builds a list of all nodes, or something crazy like that.

  • First, tokens are best used during operations like the building of an email message (information about a user's purchase, a notification about a password change), the saving of a node (auto-generating a title from other node data), and so on. The occasional hits from, say, loading a vocabulary record or doing a user_load() to get full token data are negligible in those contexts. Using them in the inner loop of a menu-tree-building operation isn't.
  • Second, expensive token operations should be discouraged. This is nothing new -- it's possible to put a hellaciously expensive query in hook_link_alter() and bog down every node listing page on the site. Educating developers about best practices is the solution.

There are several key places this can be used in core at the moment -- the most important is the user mails that get sent out. While it is theoretically true that we can simply implement our own str_replace code in each location, things like inserting user profile fields into the emails, or intelligently noting the number of user points they currently have, or things like that, can't be done. This token system is about as simple as it gets while still preserving the usefulness of a central token registry.

I think the usefulness of supporting multiple token prefixing schemes is nice to preserve; %this and [this] are the most common, but they're both necessary IMO.

Dries’s picture

Status: Needs review » Needs work

I still think this is over-engineered. That, or I don't fully understand the purpose of this patch. Maybe if there were a patch that showed how it simplifies or improves certain aspects of core (or a contributed module) ...

webchick’s picture

I don't know that it could be said that this patch simplifies any code in core; as you've said, str_replace is pretty darn simple. ;)

What it does do is three things:

1. Eliminates duplicate code in contributed modules. For example, code like:

$variables = array('@username' => $account->name, .... );

Is found in user.module, pathauto.module, og.module, invite.module... Let's say at some point we decide to add a new core user property, like $account->show_name. Now, this variables array has to be updated again in user.module, pathauto.module, og.module, invite.module.....

2. There is currently no central method for sharing these placeholders / tokens between modules (apart from token.module itself). This is the real power of this patch. With it, my Organic Groups invite e-mails can reference a user's overall karma points given to them by the karma.module, and point to their private message inbox URL from the private_message.module... and so on. Currently, without token.module, the only way to do this would be putting in all kinds of hackish "if (module_exists())" calls in OG... bleh.

Centralizing these tokens makes far more sense in core (which is the one thing everyone's guaranteed to have installed. ;)) than in a contrib module. It makes for powerful interactions between modules, which is core's mission statement. :)

3. Minor, but it standardizes the way variables are used in e-mail templates and the like. With the t() changes in 5.x, you can pass variables as @something or !something, and it's confusing.

drewish’s picture

subscribing to this. i think the trick would be to implement the user mail messages using tokens.

moshe weitzman’s picture

subscribe ... IMO, there is nothing overengineered about token.module. it has been a super convenience for me when offering email customization in og and in replies.module. further, autonodetitle uses it to great success. custom breadcrumb and custom pager use it. i think its usefulness is proven.

dww’s picture

note: the really tiny beginnings of this already happened via http://drupal.org/node/144859. see user.module's user_mail_tokens() function that i added as a result of that patch. it's nowhere near as well engineered or useful or flexible as token.module would be, but it shows an example of how centralized token processing is a good move and how it could be made better.

dww’s picture

here's another example of an issue that'd be solved by token API: http://drupal.org/node/121267

moshe weitzman’s picture

I have submitted replies.module (a simple subscription module) for core consideration. It depends on this patch. See http://drupal.org/node/146291

Pancho’s picture

Version: 6.x-dev » 7.x-dev

Moving to D7 queue.

RobLoach’s picture

Subscribing... This is something that will really benefit customized user strings.

catch’s picture

Also subscribing. The functionality would be great, and it'd also make modules like pathauto easier to install for new users by eliminating the dependency.

RobLoach’s picture

One thing we'd have to consider when moving tokens/placeholders into core is all the current placeholders being used, like the ones at admin/user/settings, where it implements placeholders like !username, !site, etc.

dww’s picture

@RobLoach: see my comment #12 above... I'm not sure what you're trying to say -- those are all obvious placeholders that would be replaced (sorry, bad pun) by whatever happens in here. Those placeholders are part of the motivation for making this change in the first place...

eaton’s picture

After creating the issue and pushing hard for the adoption of token module, I'd like to say that I now have very serious reservations about a system like this going into core.

While it certainly serves a very useful purpose -- extracting bits of text for handy replacement tasks, and allowing other modules to expose their own bits to the system in a loosely coupled fashion -- I believe that Token module is slowly but surely becoming a monster. As more and more modules rely on it for more and more kinds of data, its original purpose -- simple string replacement -- is turning into 'generalized metadata extraction'. In other words, rather than capturing useful metadata about the objects in Drupal and the context of a given page/user/etc, module developers are encouraged to simply expose loads of tokens, with lots of wild variations in case people want to use them in various ways. This leads to overloading of the token namespace, and inverts the performance gains that the local cache of token data was originally intended to leverage.

I'm concerned that if a system like this were to go into core, it would only accelerate the process.

I would love to see the intended benefits of Token in core. Is there any way we can brainstorm about ways to prevent the tokenocolypse?

RobLoach’s picture

Concepts like context, introduced by Panels, share a similar ideology with Tokens. The ability to choose a certain context from anywhere on the page, and then select content from that, is similar to what we're trying to accomplish here.

@dww: Ahh, seems you beat me to it. I guess this is what happens when you comment on a patch without reading through the other comments in the issue queue.

Owen Barton’s picture

Subscribing...been thinking the same

eaton’s picture

I've been discussing this with a few more people, and I think a couple of key issues remain unaddressed by the current Token module.

  • Some sort of metadata needs to be captured for each token -- is it 'XSS clean' or is it raw? Is it complex HTML data, or is it pure text? This would allow modules like pathauto and custom breadcrumbs to show only relevant tokens, rather than spamming users with everything in the defined token-space.
  • Language support is essential. Token should be able to generate values that reflect the current user's preferred language, or an explicitly passed-in language.
  • Performance testing is essential. Right now, there's a strong suspicion that the caching support in token.module is a net loss -- it consumes memory and makes for awkward logic more than it speeds things up. We need to figure out how to benchmark token related operations, and prune unnecessary cruft.
paul.lovvik’s picture

FileSize
42.06 KB

I created a new patch in which the token-specific hooks have been moved into their respective modules. Also, there is a unit test that exercises the API.

Reading through these comments I see there is more work that needs to be done. I think turning this problem around might make more sense.

Currently each module identifies the tokens that it exposes through hook_token_values and the number of replacement tokens returned from the set of modules is limited by the token type and id. It may be more sensible to extract the required tokens from the string and request those tokens specifically.

Of course that would be easily done via a database table with a token name/value mapping for static strings. For values that have to be calculated there would need to be associated PHP code. The database could be reloaded or refreshed through a hook so it would be easy for modules to add tokens to the system.

I'll continue working on it, but wanted to share the work I have done so far.

eaton’s picture

Paul, thanks for the awesome work on this. For those following the thread, Greggles and I have been conversing with Paul as he worked on the patch to sync things with what we've learned about Token over the past year or so.

Currently each module identifies the tokens that it exposes through hook_token_values and the number of replacement tokens returned from the set of modules is limited by the token type and id. It may be more sensible to extract the required tokens from the string and request those tokens specifically.

Of course that would be easily done via a database table with a token name/value mapping for static strings. For values that have to be calculated there would need to be associated PHP code. The database could be reloaded or refreshed through a hook so it would be easy for modules to add tokens to the system.

My only concern is the potential overhead with a system like that. Right now, the blindly-generate-them-all approach is at least very simple. Performance problems come from the caching mechanism chewing RAM when lots of objects are tokenized in one page-load, and when an inefficient module does something expensive.

I agree, though, that the current system has its own problems. It's hard to build an effective alternative, I think, without going all out towards regex based arbitrary pattern matching, and giving modules a chance to replace the patterns in realtime (something fago has suggested in previous conversations). As Greggles has noted, I think that any optimizations will probably need some solid benchmarks in place before we can assume that there will be a payoff: there are probably 4-5 different ways to approach it and all have their potential hot spots.

Will be looking closer at the patch and testing, though. Definitely exciting!

Dries’s picture

I recommend that we go with the simplest approach and that we optimize for speed, not for flexibility. Let's try to keep it as lightweight as possible, but not "lightweighter". ;-)

greggles’s picture

My instinct (based on Eaton's and Fago's input) is that caching the results will be less effective than doing lazy generation of tokens i.e. only building the tokens that are needed for a particular token_replace. But, it's worth benchmarking...

One other consideration that I don't think we've discussed is the whole "raw" tokens problem. People get _really_ confused by this. When I first worked on that for some reason I decided that it couldn't be done by modules themselves. Now I think it's worth our time to investigate again and hopefully make a new "$raw = FALSE" argument to token_replace(_multiple) and therefore also to token_get_values which would tell the hook_token_values whether they should generate tokens as raw or not.

eaton’s picture

Personally, I lean towards always generating raw tokens and leaving it up to the consuming module to strip/filter/etc. as needed, but that could definitely be problematic.

quicksketch’s picture

I agree with greggles regarding filtered and "raw" versions of tokens. $raw should be an argument in token_replace() that defaults to FALSE. Shoving this responsibility on the end-user was a horrible mistake in my opinion. Though as Eaton correctly points out, before we started offering normal and raw tokens separately, almost every module that used token was opening up XSS vulnerabilities. Of course, now modules claim no responsibility and the end-users are doing just as much damage to themselves by misconfiguration. I suggest filtering everything by default, and make the developer specifically request the raw version if necessary. Not to mention, removing "-raw" tokens effectively cuts the number of generated tokens in half.

eaton’s picture

Greggles and I spent some time hashing through this in #drupal and came up with what we think are two potential approaches.

First, some terminology: provider modules implement hook_token_values to expose tokens. consumer modules use token_replace() or some variation to replace tokens in a text string with their values.

Right now, provider modules are encouraged to expose either 'fully scrubbed' XSS safe tokens OR scrubbed and 'raw' versions with a -raw suffix. This is problematic, and also leads to token bloat. Many consumer modules, like PathAuto, ask users to use the -raw tokens and then do their own filtering anyway. The conceptual complexity increases the likelihood of screwing up, both for end users and developers of consumer and provider modules. Two possible alternatives include:

The 'safe' flag
Add an additional flag parameter to the token_replace() function: $safe. Default it to TRUE. This flag gets passed on to provider modules, and it's up to them to XSS filter their tokens if it's on, and return raw tokens if it's off. Consumer modules that need more detailed filtering (stripping all tags, transliterating ala pathauto, etc) should ask for unsafe tokens internally, and do their post-filtering. Alternately, they can use the token_get_values() function themselves and act on the returned array of values.

The benefit of this approach is that it's not very different from how things work now: things are safe, raw, or custom-filtered by the consumer module. The downside is that it adds another param to token_get_values(), and forces provider modules to do more internal switching on the flag. It's basically a codification of what we're encouraging people to do by convention now.
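
A hedged provider-side sketch of this option ($safe is the proposed parameter, not an existing API):

function my_user_token_values($type, $object = NULL, $safe = TRUE) {
  if ($type == 'user') {
    // Filter only when the consumer asked for safe output.
    $tokens['name'] = $safe ? check_plain($object->name) : $object->name;
    $tokens['mail'] = $safe ? check_plain($object->mail) : $object->mail;
    return $tokens;
  }
}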

The filter callback
Add a $filter_callback param (with filter_xss as a default) to the token_replace function. Provider modules ALWAYS expose raw tokens, and do no internal switching, but token_replace() will run all token values through the $filter_callback before it does the substitution. By passing in NULL as a filter_callback, a module can get totally raw token replacement. By passing in another function, like check_markup or my_custom_transliteration_function, it could do any other necessary finessing.

The upside (in my view at least) is that it would simplify the code for provider modules and reduce the number of cases in which consumer modules need to implement their own custom token-munging code. It may be a bit off-the-wall, though, and it puts the burden on every consumer module to know what cleansing function things should be run through before use.
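
Consumer-side, this option might look like the following sketch (assuming the proposed signature token_replace($original, $type, $object, $filter_callback = 'filter_xss'); my_transliterate is hypothetical):

// Default: every token value passes through filter_xss().
$output = token_replace($text, 'node', $node);
// Totally raw replacement: skip filtering entirely.
$output = token_replace($text, 'node', $node, NULL);
// Custom finessing, e.g. for path aliases.
$alias = token_replace($pattern, 'node', $node, 'my_transliterate');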

Indecision '08
Greggles and I are split on it -- I can see the merits of both, and nothing would prevent someone who needs the second option from writing a 'get raw values and do stuff with them' helper function if they want to. Anyone have thoughts on it?

catch’s picture

The $safe flag is similar to ALREADY_SANITIZED, which is being used in #242873: make drupal_set_title() use check_plain() by default. It seems like a decent way to go to me.

quicksketch’s picture

I'm pro proposition #1 (the $safe parameter), for the reason Eaton mentioned: it doesn't require users to know what sanitization function to call. The #2 solution gets dirtier if you need to pass in arguments to those callback functions (maybe you wouldn't, but I'm suspicious). It also seems more prone to break as "token-provider" modules change their internal methods of cleansing tokens (which other modules or themes might now be using).

eaton’s picture

The #2 solution gets dirtier if you needed to pass in arguments to those callback functions (maybe you wouldn't, but I'm suspicious). It also seems more prone to break as "token-provider" modules change their internal methods of cleansing tokens (which other modules or themes might now be using).

Well, the ability to call token_get_values() and iterate over the collection manually would still be there, as with 'the safe flag'. I'm not sure how the approach would be more prone to break based on what token-providers do? That was actually the reason I considered 'the callback'. Token providers, under that approach, would always provide raw tokens, period. The token_replace() function would strip XSS by default, and other functions could be passed in by the token consumer. That would mean that changes to the provider *wouldn't* change what gets pulled out, in terms of sanitization.

Again, I'm open to either one, I just want to make sure that 'the callback' is considered. ;)

eaton’s picture

Crap.

quicksketch took me aside and pointed out the trump card: $node->body. We can't pass it through a generic scrubbing function, because knowing HOW to scrub its 'safe' version depends on what value is in $node->format. So the 'safe' logic would have to be handled by the node module.

I cede, greggles' $safe flag is probably the best approach. :-)

greggles’s picture

Now, we'll see if it actually works ;)

#32 said:

doesn't require users to know

It's kind of moot now, but neither of these solutions requires site-admin users to know which function to use. It's only a matter of token-providers or token-consumers having to know what is "safe." I don't trust either group much, but trust token-providers a little bit more since they should really know how their data needs to be filtered to be "safe."

recidive’s picture

FileSize
41.49 KB

Some thoughts:

Decide on a convention for the token wrapping character(s): core uses !token notation in user registration emails and %token in actions emails, while Token module defaults to [token]. What are the drawbacks of using left-character-only notation? I see it conflicting if there are, e.g., !token-foo and !token-foo-bar, so the [token] notation seems more appropriate.

Make consumer modules able to pass additional tokens/values when calling token_replace(). Use case: User registration emails, !login_url token.

Change how modules define tokens so tokens are grouped by module, and also expose hook_token_alter(), so other modules can add tokens to desired group.

Implement it in some core places where this seems appropriate, e.g. user registration emails, and the "Display a message to the user" and "Send e-mail" actions.

Leave it up to the consumer modules to decide whether to use the global $user object; this can prevent unwanted values being used.

Factor out date splitting (day, month, etc) into separate functions so they can be re-used. Or leave that up to contrib? I see this useful only for pathauto.

Is it really necessary to include token.inc on bootstrap? Can't we just let the registry automagically include this when token_replace() is called?

Re-rolling the patch in #24 to remove some fuzz.

webchick’s picture

On this point:

Make consumer modules able to pass additional tokens/values when calling token_replace(). Use case: User registration emails, !login_url token.

I would prefer tokens to always use the same syntax everywhere for consistency, so the upgrade path would scan through user_mail_foo and exchange !login_url for [login_url].

dww’s picture

I'm confused by the "trump card" described in #34. It seems like with the $safe flag, we're just punting to the provider modules to know/decide what "safe" means. That seems equally flawed. If anything, having the consumer specify what function they want to use to make something "safe" is more in line with context-sensitive filtering on output.

Similarly, #35 says:

It's only a matter of token-providers or token-consumers having to know what is "safe." I don't trust either group much, but trust token-providers a little bit more since they should really know how their data needs to be filtered to be "safe."

That seems like deeply flawed logic to me. If I'm consuming your tokens and rendering them to HTML, that's very different from if I consume your tokens and include them in plain text email. The provider doesn't know what "safe" means, since it has no clue what a consumer is going to do with the values. Only the consumer knows how we're using them, and therefore, what kind of escaping/filtering/transformation/etc is required to make them "safe" in the context where they're being used.

Either I'm missing something obvious, or this isn't as clear a "trump" as it's been made out to be, or it hasn't been clearly explained. ;)

Given my (potentially flawed) understanding, I vote that providers always provide raw tokens, and there's a callback for consumers to automatically filter all the values as they want (option #2, we're calling it).

recidive’s picture

@webchick, sorry, I guess I was not very clear on this point. What I'm talking about there is that !login_url (or [login_url]) is a token that is specific to user registration emails, so I think it makes sense to pass its value directly when calling token_replace(). But when I think more on this, I see it can be used elsewhere, e.g. in actions emails, where we could want to send a URL for the user to change their password when a certain event occurs.

Anyway, your point is what I'm talking about on my first point above.

greggles’s picture

Some more thoughts on safe vs. raw.

There are basically three major contexts where tokens are used:

1) in situations where the "raw" version is needed - pathauto being the one I care about most, but I imagine there are others
2) in HTML rendered to a browser or a rich text e-mail
3) in plain text situations (plain text e-mail, sms, maybe some other stuff)

#1 is the easiest case for the API and can be handled by all proposals.
#3 - can be defined as the "safe" version of tokens from #2 and then the consuming module sends it through "drupal_html_to_text" or strip_tags or whatever is most appropriate for them. drupal_mail already does that, anyway.

#2 is the tricky situation, IMO. I think the definition of "safe" here is that there is no XSS in the token. So, how do we get rid of XSS in Drupal? We might use filter_xss_admin, check_plain, or check_markup. The only code that knows which of these filtering functions is appropriate is the providing module. Further, check_markup takes multiple arguments - especially when it is being used not just for filtering but for some sort of pseudo markup (markdown, dme, etc.).

I don't think a token consumer can know which of those functions is appropriate and, further, which secondary arguments to send to the filtering function. The only code that knows it is the providing module.

Of course, we could just advise module developers to send everything through check_plain. That would either make token crippled and only valuable for really basic strings OR it would result in people using an inappropriate filter, which causes unexpected results (markdown tags that aren't interpreted) or a sechole or both.

I don't know if we want to really nail down all the craziness of token in this one issue. There are a few major issues I see that we need to hammer out:

1) safe vs. raw handling
2) hook_token_alter
3) best practices on the context grouping by module name? context of use?
4) standardizing on prefix/suffix of ! vs. [] (or continuing the current flexible concept)
5) performance - current system vs. token registry and lazy token building
6) performance - token caching
7) Deciding on the names of the tokens - i.e. [title] vs. [node-title] - imo, node-title is a better choice since "title" could be interpreted to mean a lot of things
8) convenience widgets for developers to present tokens to user - ajax searchable, ajax replace of tokens in pattern areas, etc.
9) modifiers for tokens (this would help solve the date problem that recidive points to - [node-created format="YYYYmmDD"] )

In this issue I'd love to solve safe vs. raw and the performance issues.

Owen Barton’s picture

I have also been mulling this over, and come down on the side of the providers deciding on how to filter. This is in line with the rest of Drupal (we don't try and filter all content in theme() for example).

In terms of the performance, lazy token building seems way more scalable. By the time you have tokens for a complex multistep CCK form, with some fields in several different formats, it will get hugely expensive to build them all, and I suspect that once we implement tokens in core they will rapidly take over in building block titles, page titles, node links, etc. These will have different contexts (e.g. nodes) passed in, and so will not necessarily be very cacheable.

A couple of other thoughts/goals:
* Namespacing - I would prefer we have a very standard namespacing rule/convention - e.g. [module:token] - otherwise it will rapidly get hard to tell them apart
* UI - we may want some kind of dropdown or whizzy JS navigation of tokens (perhaps with value previews in some case?), with an 'insert token' button to insert it into the text field. This would be a big improvement over the mega long fieldgroup we have now.
* Token formats - we need to decide if we want to fix this, or possibly have the delimiters for '[', ':' and ']' configurable (perhaps just for the input format filter config). I lean toward not allowing this (for now, at least). Another UX issue raised by the square brackets is our dear friend <!-- break --> - we should consider making this the same format, perhaps.

paul.lovvik’s picture

On safe vs raw - it occurs to me that this is almost the same problem that the t() function handles. The problem is that the client has a string prepared for display except that the string has references to dynamic content embedded that have to be replaced. It seems unacceptable for the client to have to do work in addition to calling the token API to replace the tokens with the runtime values. But at the same time there is no way for the token API to know (without some help from the calling code) what filtering should apply to which tokens.

I think we should make a solution similar to the t() function. Also, I like the [token] notation which identifies both the start and end of the token. I think we should put these ideas together:

[!token] - raw token - no filtering
[@token] - check plain
[%token] - HTML escape with emphasis

This will make for a simple client interface with the power that we need for any situation.
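
A rough sketch of how the replacement side could dispatch on those markers, mirroring t() (the regex and the $values array here are illustrative only):

function token_replace_marked($text, $values) {
  preg_match_all('/\[([!@%])([a-z0-9_-]+)\]/', $text, $matches, PREG_SET_ORDER);
  foreach ($matches as $match) {
    $value = isset($values[$match[2]]) ? $values[$match[2]] : '';
    switch ($match[1]) {
      case '@': // check plain
        $value = check_plain($value);
        break;
      case '%': // HTML escape with emphasis
        $value = '<em>'. check_plain($value) .'</em>';
        break;
      // '!' passes the raw value through unfiltered.
    }
    $text = str_replace($match[0], $value, $text);
  }
  return $text;
}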

@recidive - thank you for removing patch fuzz. :-)

greggles’s picture

@paul.lovvik - that puts the burden for determining which kind of token to use on the site admin. Of the three groups that we could push this onto (site admins, consuming modules, providing modules) I think the site admins are least capable of making this decision (I say this having pushed it onto them once already and having learned from that experience).

Also, t() currently does either nothing for ! or check_plain (with/without em tags) for @ and %. That solution won't translate to tokens where we sometimes will need something like check_markup or filter_xss_admin depending on the nature of the token.

Pancho’s picture

Priority: Normal » Critical

Now that:

  • More and more contrib modules depend on Token
  • Quite a bit of core code should adopt Token as soon as it's in core, and we need some time to do those followups
  • Discussion is about 'how' and not about 'if'
  • Even Dries now seems to agree on implementing a _lightweight_ version of Token in core
  • I remember that he also proposed getting parts of pathauto into core, as it is so widely used. Pathauto relies heavily on Token.

I think that qualifies this task as critical. Obviously that doesn't mean that we need to have this in D7; it just means that if we do want to have it in D7, then this is a more time-critical task than others.
That said, I feel very strongly about getting Token into core.

alexanderpas’s picture

+1 for token implementation into core! (we really need a better way of subscribing)

sun’s picture

subscribing *sigh*

chx’s picture

Just because this thrived in contrib does not mean it's a proper solution at all. The problem is, you are brewing a whole cauldron of soup when you need only a piece of steamed carrot. You are calculating so many values that it became tedious even for the patch writer, as testified by this little gem: $tokens['node']['mod-????'] = t('All tokens for node creation dates can also be used with with the "mod-" prefix; doing so will use the modification date rather than the creation date.'); -- I am quite sure the original writer simply got too bored to list all that crap again. In http://drupal.org/node/113614#comment-1063361 Eaton already raised warning flags about a tokenocolypse, and I am raising huge red flags again.

chx’s picture

dww warned me to search for "lazy" in the issue and, to my astonishment, everyone who mentioned it said it'd be speedier. Damn right! If I need a node title and the changed date, there is no way you can justify querying taxonomy tables. Just use $token_values = module_invoke_all('token', 'node', array('title', 'changed-ymd'), $node).
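
In other words, a minimal lazy-lookup sketch (the 'token' hook name and signature here are chx's suggestion, not an existing API):

// Scan the text for the tokens it actually uses...
preg_match_all('/\[([a-z0-9:_-]+)\]/', $text, $matches);
$wanted = array_unique($matches[1]);
// ...then ask providers to compute just those values.
$token_values = module_invoke_all('token', 'node', $wanted, $node);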

chx’s picture

About "safe": has anyone here read http://acko.net/blog/safe-string-theory-for-the-web ? Safe for what context? It's incredible that people still think "safe" is binary.

Edit: dww says he said the same in #38. Sure :) mine is just a bit more succinct.

chx’s picture

FileSize
2.7 KB

Here is a new beginning. We still filter on output and the token provider merely tells where the data comes from and if necessary provides the format.

chx’s picture

FileSize
2.72 KB

So the patch, in very few lines, allows you to run PHP from the token. For example, .date.d-m-y.token:changed will run date('d-m-y', token:changed), but before running, token:changed gets replaced with $node->changed. Magic!

chx’s picture

FileSize
3.01 KB

Added a safety switch.

drewish’s picture

better but still super scary... since i'd probably want to let people pick their date format but probably not want them doing something like:
[|file_put_contents|random.php|<]
[|file_put_contents|random.php|?php echo 'hi there';?|8]
[|file_put_contents|random.php|>|8]

Pancho’s picture

@chx:
You are really awesome... rewriting Token from scratch within minutes... :)
Looks like that's a much cleaner and better approach than trying to "core-ify" the contrib token module!
Just tell us as soon as you can use our help for testing etc...

@drewish:
How could that be possible, as long as no one adds a specific token that does this, or a token that allows for abuse? We might want to add some security checks if we want admins to create and manipulate tokens both easily and freely. Otherwise this looks pretty safe from my POV...

dww’s picture

@chx: No offense, but are you kidding? ;) You expect end-users (the people potentially putting tokens into things like group broadcast emails) and mere mortal site admins (just trying to configure their site's welcome email, or simple pathauto patterns like aliasing all their stories at example.com/YYYY/MM/DD/title) to get .date.d-m-y.token:changed right?

I'm all for lazy expansion, and I'm glad you're on the side that thinks "safe" is hogwash and we need to do context-sensitive filtering on output, but otherwise I think this is a step in the wrong direction. It's very clever for a developer. But as a user of this system, I don't give a crap what module or function is providing my token -- I just want to know what tokens I can reference and have as simple a syntax as possible to refer to them.

chx’s picture

FileSize
2.7 KB

Okay, what about [changed:Y/m/d]? Date handling is now hardwired.

chx’s picture

FileSize
1.86 KB

The automatic token-from-property extractor now presumes numeric values are machine generated -- and even if they're not, it's still safe not to escape a number, whatever the output context is. This helped compress the patch even more... and a few bugfixes were added.

webchick’s picture

Hm. I don't really see how this approach is workable from an end-user standpoint.

This approach now means you can't use tokens without understanding where data comes from. So you teach your editors the basic ones like node:title, but then they see another field on the node form that looks exactly like the title, called "Byline"... how do you explain to an *editor* who has never seen the administration panel before, and has CERTAINLY never seen Drupal's database schema, that byline actually comes from CCK and thus must be referenced by cck:byline?

I'm also a bit stressed that since heading off in this direction, the productive brainstorming that was brewing among the token module maintainers has all but died off. Token has proven itself in contrib. Let's not write it off so quickly. Let the productive brainstorming continue.

dww’s picture

Yay, webchick is on the side of goodness. ;)

Furthermore: given chx's approach, how do we present the end user a list of available tokens? We don't, unless provider modules implement hook_token_info() to register what callbacks they provide. At that point, we're most of the way back to TokenAPI...

I repeat: let's work towards lazy expansion and filter-on-output instead of "safe" vs. "raw". Otherwise, let's keep TokenAPI.

drewish’s picture

well i think the hybrid approach of a hook to define the available tokens and provide the data is a good one but chx's lazy creation is a much saner way to generate them.

greggles’s picture

filter-on-output

@dww - can you reconcile this with the earlier comments about how filter-on-output won't work because the consuming module doesn't know what input format to use? I can see how this would work, but it needs to be explained.

chx’s picture

Status: Needs work » Closed (fixed)

The issue is continued at http://drupal.org/node/343665 .

webchick’s picture

Status: Closed (fixed) » Needs work
chx’s picture

I managed to hurt some people's feelings in this issue and for this I am sorry. And now I am hurt for doing this as well. Sigh. Please accept my apologies for any harsh words in the issue and let's try to focus on the design issues instead.

alexanderpas’s picture

tokens need to be de-normalized...

each module needs to register its tokens at the appropriate place (the token register), in a standard format.

some example thoughts of possible tokens:

  • [node:title] -- The title of the current node. Registered as [node:(#):title] by the Node module (which has registered for [node:#] optional argument handling, and will send [date:*] to the date functions to resolve the date).
  • [node:32:body] -- The body of node 32. Registered as [node:(#):body].
  • [node:date:created] -- The creation date of this node. Registered as [node:(#):date:(*):created].
  • [node:24:date:YYYY'-'MM'-'DD'T'HH':'MM':'SS:lastmodified] -- The last modification date of node 24 in ISO 8601 format. Registered as [node:(#):date:(*):lastmodified].
  • [node:byline] -- The CCK field named field_byline on the current node. Registered as [node:(#):*] by the CCK module.
  • [node:29:image:logo:3] -- The third image in the CCK field named field_logo on node 29. Registered as [node:(#):image:*:(#)] by the ImageField module (which will require CCK to solve [node:(#):*]).
  • [date] -- The current date. Registered as [date:(*)] by the core date functions (which have registered for [date:*] optional argument handling).
  • [date:DDD:tomorrow] -- The Julian day number of the next day. Registered as [date:(*):tomorrow].
drewish’s picture

alexanderpas, i like most of what you're proposing, but cck fields should be field_fieldname because i don't want, say, a field_created that's a boolean to overwrite the node's created timestamp.

sun’s picture

@alexanderpas: That's completely offtopic. Tokens are simple placeholder tags without any parameters. What you are referring to are (configurable) macro tags that expand to raw HTML, which need to be treated very differently - see #80170: Inline API for a concept and current effort.

jrabeemer’s picture

sub

Dries’s picture

It is not clear why we want to do stuff like [node-created format="YYYYmmDD"] or [node-created:YYYYmmDD]. We're starting to reinvent HTML. Can't we have _simple_ tokens instead of a built-in programming language? Please, no dynamic Fu in tokens.

sun’s picture

So, to recap, and maybe get a first generation of tokens in Drupal core for 7.x:

Problems with current string placeholders in Drupal:

  • They are not standardized; most often we use !here-something.
  • They do not use proper placeholder delimiters, i.e. a user does not know where a token in !here-something-here starts and ends.
  • There is no default way of displaying available tokens to the user; most often we are squeezing them into a form element description, where they are hard to see and look up.
  • They are limited to some very specific placeholders, while the user could also use other strings from the same scope (object) without extra burden on the system.

A proposal:

  • Tokens in Drupal are simple placeholders for simple, but dynamic, string replacements. Tokens are used to configure Drupal, not used for site contents. Tokens are simple placeholders, thus no parameters or arguments -- rather an equivalent to translation placeholders in t() for Drupal administrators to configure their site.
  • A token is delimited by [ and ].
  • Tokens are grouped by scope/object; scopes are e.g. user, menu, node, and so on.
  • Tokens are prefixed by the scope name, i.e. [user-name], [node-title], and so on. Variable names may or may not be an exception (TBD).
  • The calling code (consumer) decides which scopes should be available to the user.
  • Modules providing tokens (providers) expose available tokens in the given scope, without having knowledge of the actual object that may be passed in.
  • All available tokens are displayed to the user in a unique style; an approach like the new #input_format FAPI property probably makes sense.
  • Token replacement is handled via token handler functions, i.e. the actual user input triggers the processing and replacement of the (actual) tokens. If the user input contains [user-name] only, then only this token is processed. If there is no matching token name in the given scope, the token is not replaced.
  • Replacement strings are provided raw/unsafe/unsanitized. The caller (consumer) has to sanitize the rest of the user input anyway.
  • Lastly, token.inc is pluggable like many other sub-systems, so Token (or another) module can replace and enhance Drupal core's default token handling during the next release cycle.
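
A hedged sketch of the lazy replacement flow described above (the per-scope 'token_value' handler hook is hypothetical; only tokens present in the input trigger any work):

function token_replace($original, $objects = array()) {
  preg_match_all('/\[([a-z]+)-([a-z0-9_-]+)\]/', $original, $matches, PREG_SET_ORDER);
  foreach ($matches as $match) {
    list($token, $scope, $name) = $match;
    if (isset($objects[$scope])) {
      // E.g. ask providers for the 'title' value in the 'node' scope;
      // unmatched tokens are left in place.
      $values = module_invoke_all('token_value', $scope, $name, $objects[$scope]);
      if ($values) {
        $original = str_replace($token, array_shift($values), $original);
      }
    }
  }
  return $original;
}
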
chx’s picture

Title: Drupal: Add centralized token/placeholder substitution to core. » .usDrupal: Add centralized token/placeholder substitution to core.

Let me rant a bit about that "replacement strings are provided raw/unsafe/unsanitized". Some things are only available in HTML: node bodies and teasers, as stored in the database, must be run through check_markup, as the database fields can contain pretty much anything. I have suggested in http://drupal.org/node/343665 to instead say "replacement strings can be safely displayed in an HTML context". Of course, the two are not that far from each other; we can run html_entity_decode on the node bodies to deliver "replacement strings are ready to be HTML escaped". If we want to send text mails we do not want to see HTML entities anyway, and if we want to, say, provide a message from an action, then the debate boils down to drupal_set_message(check_plain(drupal_replace_tokens($text))) or drupal_set_message(drupal_replace_tokens(check_plain($text))). Still, as I have written the patch, it seems to be a lot easier to deliver "replacement strings are ready to be HTML escaped".

However, the big problem lies behind "tokens are prefixed by the scope name, i.e. [user-name], [node-title]". We discussed this last night with Eaton, Dmitri and others, and arrived at the point where every function needs to store every object it wants to expose via tokens under a sensible name. For example, we want to be able to say "user1 created a comment commenttitle as an answer to user2's parentcommenttitle in the thread started by user3 at nodecreatedtime", "user1 befriended user2", and so on.

chx’s picture

Title: .usDrupal: Add centralized token/placeholder substitution to core. » Drupal: Add centralized token/placeholder substitution to core.
alexanderpas’s picture

@ #70 by sun:
I think that's a nice first implementation.

@ #71 by chx:
I already thought about that... but i think we should not implement it in the first stage.
[comment:user] created a comment [comment:title] as an answer to [comment:parent:user]'s [comment:parent:title] in the thread [node:title] started by [node:user] at [node:date:created] (or something like that)

paul.lovvik’s picture

FileSize
31.02 KB

I have a new patch which I believe fulfills much of what was described in comment #70.

This implementation has the following:

  1. Tokens in Drupal are simple placeholders for simple, but dynamic, string replacements. Tokens are used to configure Drupal, not used for site contents. Tokens are simple placeholders, thus no parameters or arguments -- rather an equivalent to translation placeholders in t() for Drupal administrators to configure their site.
  2. A token is delimited by [ and ].
  3. Tokens are grouped by scope/object; scopes are e.g. user, menu, node, and so on.
  4. Tokens are prefixed by the scope name, i.e. [user-name], [node-title], and so on. Variable names may or may not be an exception (TBD).
  5. The calling code (consumer) decides which scopes should be available to the user.
  6. Modules providing tokens (providers) expose available tokens in the given scope, without having knowledge of the actual object that may be passed in.
  7. Token replacement is handled via token handler functions, i.e. the actual user input triggers the processing and replacement of the (actual) tokens. If the user input contains [user-name] only, then only this token is processed. If there is no matching token name in the given scope, the token is not replaced.
  8. Replacement strings are provided raw/unsafe/unsanitized. The caller (consumer) has to sanitize the rest of the user input anyway.

... and lacks these:

  1. All available tokens are displayed to the user in a unique style; an approach like the new #input_format FAPI property probably makes sense.
  2. Lastly, token.inc is pluggable like many other sub-systems, so Token (or another) module can replace and enhance Drupal core's default token handling during the next release cycle.
  3. Unit tests

The important code is in token.inc, which has an API very similar to the original token module API. Also I have implemented most of the tokens that were available in the original token module.

I do have one regret with this code: I changed the signature of the token_replace function such that it does not include the type. I was hopeful that the type could be discerned from the object passed in, but after implementing a reasonably large set of tokens for various types I'm not 100% confident that it can be. Comments on this are more than welcome. The scheme I implemented is clearly working, but it could probably be better.

greggles’s picture

This all seems like great work, Paul. Thanks for your efforts.

One point I think we need to address:

Replacement strings are provided raw/unsafe/unsanitized. The caller (consumer) has to sanitize the rest of the user input anyway.

This seems like a problem to me. To illustrate why, consider the comment body token:

function comment_get_body($object = NULL) {
  if (_comment_is_comment($object)) {
    return $object->comment;
  }
  return '';
}

On many sites this will cause weird content in the token because the comment can contain data that must be sent through the associated input format (e.g. an inline view from http://drupal.org/project/insert_view ). So, I'd like to reiterate my belief that tokens should be provided in a format that is "safely displayed in an HTML context" (as chx put it). If it's not the default it should at least be optional on the part of the consuming module.

Separately, I'm concerned about

thus no parameters or arguments,

One of the most common complaints about the current implementation is #185446: Provide tokens for each vocabulary. With the current token API it would be possible to write a custom module which dynamically declares tokens like [term-1] and [term-2] that provide the term from a specific vocabulary. It could be that I'm just not creative enough, but I think that the registry/callback system that you've implemented wouldn't permit this kind of dynamic system -- is that right? It would be possible to write a small custom module that provides this, of course, and that is a reasonable solution as well.
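
For illustration, a rough sketch of such a module against the current hook (the module name is invented; D6-era taxonomy functions shown):

function vocabtokens_token_values($type, $object = NULL) {
  $tokens = array();
  if ($type == 'node') {
    foreach (taxonomy_get_vocabularies() as $vid => $vocabulary) {
      // One [term-N] token per vocabulary, holding the first term's name.
      $terms = taxonomy_node_get_terms_by_vocabulary($object, $vid);
      $term = array_shift($terms);
      $tokens['term-'. $vid] = $term ? $term->name : '';
    }
  }
  return $tokens;
}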

I notice that the taxonomy tokens are left out. Is that purposeful or just lack of time?

paul.lovvik’s picture

@greggles

I'll address your concerns in reverse order...

Ah - I totally forgot to mention the lack of taxonomy tokens. I ran out of time and then spaced on it when I did the writeup. Sorry for not calling that out though.

You are right that this implementation does not provide a mechanism for dynamically named tokens. My intent was to keep this simple and grow it out as needed. I am operating under the assumption that we want a relatively simple implementation for core that has behavior that can be overridden in a module.

As for the display safety, I see this as a problem as well. Implementing no safety at all is easy. :-) But if this is really a strategy going forward we need to be able to show that we can reasonably handle the display safety on the client end of the call without making it impossible to use the result. Probably the choice to go with raw on the comment body was wrong. The original code had both the checked body and the raw body and I definitely went with a raw-only approach. I'm curious if there is ever a practical need for the raw body of a comment or a node? If not, it would be trivial to change these specific fields to be checked or we could implement both checked and raw values as the token module does.

The bigger question is whether the raw idea is a good general strategy with a few corner cases, or if we need to simply take this in a new direction.

sun’s picture

Let me first point out that my proposal in #70 was targeted rather at having a very simplified version of Token module in core - with the primary purpose of getting consistent processing and a consistent UI for tokens. Even if that would turn out to just replace:

function system_message_action(&$object, $context = array()) {
...
    $variables = array_merge($variables, array(
      '%uid' => $node->uid,
      '%node_url' => url('node/'. $node->nid, array('absolute' => TRUE)),
      '%node_type' => check_plain(node_get_types('name', $node)),
      '%title' => filter_xss($node->title),
      '%teaser' => filter_xss($node->teaser),
      '%body' => filter_xss($node->body),
      )
    );
...

with (pseudo-code):

      '[user-uid]' => $node->uid,
      '[node-url]' => url('node/'. $node->nid, array('absolute' => TRUE)),
      '[node-type]' => check_plain(node_get_types('name', $node)),
      '[node-title]' => filter_xss($node->title),
      '[node-teaser]' => filter_xss($node->teaser),
      '[node-body]' => filter_xss($node->body),

which, in the special case of actions, does the same (wrong) filtering using filter_xss() instead of check_markup() -- then I think we would still be one step closer, as long as the sub-system is pluggable and Token module is able to replace it.

Anyway, maybe we could address the concern about scopes and prefixes with a function signature like the following?

/**
 * Replace placeholders in a string.
 *
 * Return the value of $original, with all instances of placeholder
 * tokens replaced by their proper values.
 *
 * @param $original
 *   A string to perform token substitutions on.
 * @param $objects
 *   (optional) An array of objects to use for building substitution values,
 *   keyed by token prefix, e.g.
 *   @code
 *     array(
 *       'comment' => $comment,
 *       'parent-comment' => $comment_parent,
 *       'post' => $node,
 *     )
 *   @endcode
 * @return
 *   The modified version of $original with all substitutions made.
 */
function token_replace($original, $objects = array()) {
}

/**
 * Helper function to build an array of available tokens.
 *
 * A helper function that retrieves all currently exposed tokens,
 * and merges them recursively. This is only necessary when building
 * the token listing.
 *
 * @param $scopes
 *   (optional) An associative array having the token prefix as key and the
 *   class of substitution tokens to use as value, e.g.
 *   @code
 *     array(
 *       'comment' => 'comment',
 *       'parent-comment' => 'comment',
 *       'post' => 'node',
 *     )
 *   @endcode
 *   If no scopes are specified, only 'global' (site-wide) substitution tokens
 *   are returned.
 * @return
 *   The array of usable tokens and their descriptions, organized by token type.
 */
function token_get_list($scopes = array()) {
}

This would boil down to: let's have the consumer control available scopes, objects, and token prefixes to use.
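For illustration, a hypothetical call against that signature (the token names and variables here are made up):

$text = 'New reply by [comment-author] on [post-title].';
$text = token_replace($text, array(
  'comment' => $comment,
  'post' => $node,
));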

Dries’s picture

Issue tags: +Favorite-of-Dries

+1 for sun's comment above mine

fago’s picture

A consumer-provided array of $scopes definitely makes sense. It's important not to mix up token prefixes and the actual object behind them, for which we want to get tokens.

Also +1 for lazy token generation, as done in the last patch of #74. This way we ensure good performance and avoid the caching problems that occasionally occurred with the token module's approach.

What might also be a thing to consider is lazy-loading of related objects. Consider the author of a node. Currently we have tokens like [node-author-name], which would be directly provided by the node module. However, when a module exposes further tokens, one would like to use them with the node author too. So we would need a way to expose a related object, which is then lazy-loaded when used. The same goes for referenced nodes or users, the comment's author, the comment's node, and so on.

Currently in 6.x I solve this problem on the "consumer" side with rules, but that's not really convenient as solving it on the provider side would expose the functionality to all consumers.
Opinions?

But anyhow, this is not so simple. Consider the author of something, let's say a node: I'd probably prefer to get tokens for it automatically. Thinking of referenced nodes or users, it would probably suffice to let the consumer modules decide whether to add them or not, as they are only available depending on the node's type. Furthermore, we'd have to ensure we don't run into loops... (the node's author's content profile...)

quicksketch’s picture

Fago mentions a definite problem (referenced content or users within tokens), but I don't think we want to manage this sort of complexity in the first implementation. Especially considering we don't want any kind of logic or wildcards in our tokens (see #65-9), this problem may be out of scope for this issue. This issue should work on solving the legitimate problems of token.module (like the -raw tokens), and then we can work on adding capabilities to the token system later.

PixelClever’s picture

@Dries I agree. I think this could be simpler and more flexible. Instead of limiting it to a particular set of Drupal objects we could make it accept an optional callback that refers to a function and another argument that passes a value to that callback function. The callback function could be declared anywhere.

For example you could write it as follows: token_replace($original, $value = NULL, $callback = NULL) {}

and call it like so: token_replace('%example-token', $node->nid, 'node_token_processor');

In this case the function node_token_processor() would be passed the token and the nid, and would return the new value. If the $value or the $callback argument is missing, it would run according to default settings.
The $value variable could be an object or an array or whatever; it would be up to the callback function to determine how it is managed.

The advantage to this approach is that though the tokens would need to be uniform across the site and may need a registry, different modules could determine how those values are returned by using their own callbacks.
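A hypothetical processor for this scheme might look as follows (the body is purely illustrative):

function node_token_processor($token, $nid) {
  $node = node_load($nid);
  switch ($token) {
    case '%example-token':
      return check_plain($node->title);
  }
  // Unknown tokens are returned untouched.
  return $token;
}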

fago’s picture

As I'm really in favour of this one, I think the *right* way to do it is:

1. Get a mapping to RDF in core: #378144: RDFa: Add semantics from the ground up

2. Once this is in let all modules provide their metadata as RDF

3. Build token support upon that metadata.

More about this idea: http://groups.drupal.org/node/19786

ccoletta’s picture

subscribing

arman’s picture

Category: task » support
Priority: Critical » Normal

Greetings,

I was just looking over this in the code sprint at Drupal con and have a question:

Disclaimer
-- I've just scanned the discussion above.
-- I'm not very familiar with D7 design goals

Issue:
1. Say the site is in production and you refresh the token list -- consider the following test case:
a. call token_refresh() // for example when enabling or disabling a module
b. execute
$table = "token";
db_delete($table)->execute();
c. The site continues to serve pages where token evaluation is required.
d. I suspect all the tokens will evaluate to "" empty strings.
e. What if we are caching the resulting items?
f. token_refresh completes.
2. I suspect this may be in line with D7 design goals.
3. If not "2", should there be some warning given to the admin when they enable or disable modules?

Thx.

Les Lim’s picture

Category: support » task
Priority: Normal » Critical

@arman: please don't alter the issue settings unless you mean to change the status of the issue as a whole. Also, subscribing. :)

fractile81’s picture

subscribing

mitchell’s picture

+1

dmitrig01’s picture

Title: Drupal: Add centralized token/placeholder substitution to core. » Add centralized token/placeholder substitution to core.

Removing unnecessary "Drupal:" in title (and subscribing)

apaderno’s picture

+1 for this feature. It would allow the Drupal core code to use tokens defined by third-party modules.

caktux’s picture

This is not a patch to implement this feature, but I didn't want to make a new issue for it and it seems like it would be related to, or could affect, this issue. I ran into getting non-W3C HTML from geshifilter by putting a token inside filtered text, so its resulting container div ended up in a p tag. This patch prevents tokens from getting "auto-p"ed.

caktux’s picture

FileSize
724 bytes
recidive’s picture

@caktux: your patch has nothing to do with this issue, please search/create a new one and attach your patch. Also, I believe your issue can be fixed by re-ordering filters.

apaderno’s picture

Title: Add centralized token/placeholder substitution to core. » Add centralized token/placeholder substitution to core

Thinking now about implementing this feature, I see it as something not desirable.
The first thought I have is that a system like this will never be as fast as the custom code a module would execute to handle its specific need; if a module just needs to replace three tokens that are well known at the moment the code is written, all it needs is something like strtr($str, array('%title' => $node->title, '%user' => $user->name)). Querying all the enabled modules to get their lists of tokens already takes as long as the custom module would need to replace its own tokens. There would only be a gain in time if the tokens to be replaced were not limited to ten or so but were far more numerous; this doesn't seem to be the case for the tokens used in Drupal core code, and I don't think there is a module that needs to replace something like 100 tokens in a text, which could be an email text, an error message shown to the user, etc.
The second thought I have is that the function that needs the token replacement cannot control which tokens it really wants to be used. Maybe it's a case that happens only rarely, but the function could need to avoid having the site name appear in the text being changed.

matt2000’s picture

subscribe

apaderno’s picture

It would make sense, though, to have support for a limited set of tokens in core, even if the centralized code could not be as fast as the original, non-centralized code. Maybe caching the result could help, but then the cache would have to be emptied at the right moment.

eaton’s picture

This particular issue has wavered back and forth a number of times; Greggles and I have been discussing ideas for improvements on token itself for a while now, and last week I took the first stab at 'Token 2.0' -- it's rough code subject to change, but IMO at least it has the potential to address a number of the problems that have been so troubling here.

http://drupal.org/cvs?commit=232784 is the commit itself, and there are a couple of fundamental changes.

  1. Text is scanned for tokens that match the pattern [type:token] using a regex. The resulting collection of named tokens and their 'types' is then passed on to modules. [node:title], [user:name], [site:slogan] and so on are all valid and reasonable tokens. (A sketch of this scanning step follows the list.)
  2. Tokens are no longer pre-generated by modules in a large batch -- only the tokens that are actually used in the text need be built. This makes it more reasonable to support 'expensive' tokens, because their code only need be run when they are actually used. Fago has supported this move for some time, and after working with the example code I'm inclined to say he's been right all this time. ;-)
  3. Tokens can be chained. In the new Token 2 prototype code, [node:author:email] is recognized as a 'node' type token with the name 'author:email'. The node.tokens.inc token handling code, though, recognizes any node tokens starting with 'author', loads the $node->uid user account, and reprocesses the author: tokens off as 'user' tokens. The same mechanisms can be used in situations that previously stymied Token. User Relationships, for example, could take [sender:name] and [recipient:name], and simply reprocess them as 'user' type tokens, while preserving the difference in meaning between the two different users being referenced in the text.
  4. Certain kinds of dynamic tokens, like dates, currently consume stupid amounts of token-space. Every time a unix timestamp is used, we generate dozens of tokens, then list them. This late generation of tokens makes things like [node:created:mmDDYY] possible, where mmDDYY is any valid PHP date format. The date token generation could also be reused just like user tokens, so that module developers exposing tokens for a date value (Hello, KarenS) would not have to reimplement a dozen tokens over and over again.
  5. Currently, token delegation and conditional generation of the tokens requires slightly squirrelier code than I'm happy with. (It's not BAD, but it could definitely use some convenience helper functions to simplify the process.)
  6. 'Announcing supported tokens' and displaying token help isn't yet implemented. hook_token_info() will be used for that purpose though, and will allow modules to document both their tokens and any relationships (like [node:author:...]).
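As a rough illustration of the scanning step in point 1 -- a hedged sketch, not the actual committed code:

function example_token_scan($text) {
  // Find every [type:token] pair; the exact pattern in the Token 2.0
  // commit may differ.
  preg_match_all('/\[([^\s\[\]:]+):([^\[\]]+)\]/', $text, $matches);
  $results = array();
  foreach ($matches[1] as $i => $type) {
    // Group by type, keyed by token name, keeping the raw '[...]'
    // string so it can be str_replace()d later.
    $results[$type][$matches[2][$i]] = $matches[0][$i];
  }
  return $results;
}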

Clearly, this makes Token a bit more complicated. It's still very slim internally (string replacement isn't exactly rocket science), but complicated scenarios like token delegation require a bit more thought. We'll also need to think through how to display tokens visually in a way that makes sense. (Hmm, d7ux.) I believe that the delegation approach, and live-generation of tokens rather than pre-building, does give us a bunch of really big wins and solves some difficult problems that have plagued token for years.

Thoughts? Feedback? I think Token moving in this direction is a done deal. The developer experience for it can still be smoothed out quite a bit, though, and the folks who have been active here are thinking a lot about it.

catch’s picture

I really like lazy generation of tokens like this and the token naming seems pretty clear to me. It also points towards being able to do partial caching of things and using tokens to fill in the customized bits at a later date - if not using token itself then sharing/building on the same code.

sun’s picture

Awesome! Without having looked at the code, that explanation makes me feel warm & fuzzy. ;)

Though there is one issue: [node:created:mmDDYY] - either we need to use strings for such custom parameters, or we need a safe delimiter. Putting a date with some time info in there - [node:created:dd.mm.YYYY HH:ss] - should clarify the issue.

Overall, however, I'm worried about allowing custom, user-supplied parameters in tokens. Some folks will kill me now for bringing up this topic again, but: We should clearly differentiate between "replacement tokens" and "inline macros".

The former is what Token provides, i.e. static, pre-defined replacement strings provided by modules; to use as-is within other strings.
The latter is what Inline module allows and will soon provide in a much more generic fashion, i.e. very dynamic macro tags taking various arguments/parameters to be processed into HTML by an input filter; configured/inserted using a 'helper wizard'.

Long story short: Tokens should stay static, because custom/dynamic parameters open up a whole other can of worms.

eaton’s picture

sun, I'm sympathetic, but I'm really not sure there's any way that we can prevent people from misusing a general token-parsing system in that way if they really want to.

The concern that you raise is precisely why I resisted lazy-generation of tokens for so long: the moment we move to building the replacements in realtime, we open the door to more flexible parsing of the underlying tokens. And the moment that is there, we get fun stuff like "What if I put if/then conditionals in a token?" That makes my soul hurt, and my kittens look frightened. But I've finally caved, and I don't think we should hold onto 'pre-building' just to ensure people don't abuse lazy generation.

Just as our 'old' policy with token was "Don't do things that are time consuming, because it will slow down ALL tokens EVERYWHERE," our new policy should be, "Just because you CAN do this without slowing down the entire site doesn't mean you SHOULD."

sun’s picture

Sorry, in case my follow-up was ambiguous. Lazy generation of tokens is awesome. And yes, I agree that we should document the Dos and Don'ts. So we are all in line.

"static" was solely referring to the given example of a date format token argument. Perhaps I was even talking about two issues:

a) If custom token arguments are allowed/supported, then use a string to overcome the colon (:) delimiter issue, i.e. [node:created:"dd.mm.YYYY HH:ss"] (HH:ss contains the delimiter). Otherwise, you'd have to escape the delimiter somehow, i.e. [node:created:dd.mm.YYYY HH\:ss].

b) Only allow module-provided, static tokens, and don't support arbitrary, custom arguments (such as a date format). That would mean we would still clutter the list of possible tokens with all variants, but as long as they are only listed and not processed or generated, I don't see any issues in doing that.

eaton’s picture

Another point noted by litwol in IRC -- this also eliminates the problem of infinite looping when building complex tokens. That occurred in a number of situations previously because we pre-generated all possible tokens. Imagine for a moment a [node:child] token and a [node:parent] token. With pre-generation of tokens that could easily turn into an endless loop if child-parent-child-parent-child-etc. We ran into this in a number of places with CCK Field tokens, to the point that Token 1.x includes a kill switch if the token generation code calls itself more than a few times.

With 'live' generation of the tokens, that hypothetical [node:parent:child:title] doesn't cause any issues. It would be equivalent to [node:title], but it wouldn't break anything.

eaton’s picture

a) If custom token arguments are allowed/supported, then use a string to overcome the colon (:) delimiter issue, i.e. [node:created:"dd.mm.YYYY HH:ss"] (HH:ss contains the delimiter). Otherwise, you'd have to escape the delimiter somehow, i.e. [node:created:dd.mm.YYYY HH\:ss].

Aha, I see what you mean. In the proposed implementation, Token doesn't split up a token on delimiter boundaries. Instead, it chomps off the FIRST piece of the delimited string and leaves the rest for the hook_tokens() implementations to sort out. explode(':', $token, 2) is what it uses, IIRC. As such, [node:created:HH:ss] is perfectly valid: the node module's hook_tokens() would look for 'created:' at the beginning of any tokens, and use whatever 'remains'.
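In code terms, a minimal sketch of that chomping behavior:

// Chomp only the first piece; the remainder -- colons included -- is
// left for the next handler to interpret.
list($prefix, $remainder) = explode(':', 'created:HH:ss', 2);
// $prefix == 'created', $remainder == 'HH:ss'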

eaton’s picture

b) Only allow module-provided, static tokens, and don't support arbitrary, custom arguments (such as a date format). That would mean we would still clutter the list of possible tokens with all variants, but as long as they are only listed and not processed or generated, I don't see any issues in doing that.

I considered that for a while, but I'm still worried about it. By blocking dynamic tokens, we actually slow the system down by making it rely more on large quantities of metadata during the actual generation/replacement phase. Is this really the way that we want to go? Some modules like Views actually include similar systems internally already, and merlinofchaos is hoping that Views could simply leverage token for its internal replacement mechanisms. If we forced Views to 'pre-register' a whitelist of possible tokens, it would make that much more difficult.

We'll still need an explicit list of tokens for online help, but there's no reason that they couldn't just say, "date:long, date:short, date:normal, as well as any valid PHP date format..."

eaton’s picture

Actually, after talking to catch, it WOULD be necessary to capture some sort of meaningful flag to tell the token parser that 'what comes after this is a date format'. So, for example...

[date:long]
[date:short]
[date:custom:HH:ss]

would all be easy using this system.

[date:HH:ss] would be thornier, because it would require assuming that any date-related token that isn't explicitly matched is a PHP formatting string. The custom:formatting-string approach would make the matching simpler.
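A hedged sketch of what a date handler could look like under that custom: convention (format_date() is the standard core API; the surrounding variables are illustrative, not the actual prototype code):

foreach ($tokens as $name => $raw) {
  if ($name == 'long') {
    $replacements[$raw] = format_date($timestamp, 'large');
  }
  elseif ($name == 'short') {
    $replacements[$raw] = format_date($timestamp, 'small');
  }
  elseif (strpos($name, 'custom:') === 0) {
    // Everything after 'custom:' is treated as a PHP date() format, so
    // [date:custom:HH:ss] carries its inner colon safely.
    $replacements[$raw] = format_date($timestamp, 'custom', substr($name, 7));
  }
}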

eaton’s picture

A new round of checkins for token.module in HEAD just went in. There's a helper function that dramatically simplifies the process of 'handing off' tokens; it allows things like [node:created:short], [node:changed:short], and [date:short] to all be handled consistently by the same code. The date related stuff probably still needs some closer attention (as revealed by the above discussion), and the code shouldn't be too tricky, it just needs to be discussed.

An example snippet of code from node.tokens.inc:

    foreach($tokens as $token => $raw) {
      if ($token == 'nid') $replacements[$raw] = $node->nid;
      elseif ($token == 'uid') $replacements[$raw] = $node->uid;
      elseif ($token == 'name') $replacements[$raw] = check_plain($node->name);
      elseif ($token == 'title') $replacements[$raw] = check_plain($node->title);
    }
    
    if ($author_tokens = token_match_prefix($tokens, 'author')) {
      $author = user_load($node->uid);
      $replacements += module_invoke_all('tokens', 'user', $author_tokens, array('user' => $author));
    }

That code builds up token replacements for a number of node values, but also 'delegates' any tokens dealing with the node's author to the normal 'user' token handling code. It's a relatively smooth way of allowing tokens to be 'chained' together without risking infinite loops the way the current token code does.

I'll be rolling a smallish patch for D7 shortly - my plan is to put the token.module code itself in common.inc, and move the individual foo.tokens.inc files to their respective module folders.

robertDouglass’s picture

/me joining the I-Love-Eaton's-Tokens fan club (aka subscribe)

fago’s picture

I'm also in favour of this approach, chained tokens are cool!

>Certain kinds of dynamic tokens, like dates, currently consume stupid amounts of token-space. Every time a unix timestamp is used, we generate dozens of tokens, then list them. This late generation of tokens makes things like [node:created:mmDDYY] possible, where mmDDYY is any valid PHP date format. The date token generation could also be reused just like user tokens, so that module developers exposing tokens for a date value (Hello, KarenS) would not have to reimplement a dozen tokens over and over again.

Indeed. Though isn't that issue there for any type of value? We want numbers to be rounded in a special way or formatted as 0.000,00 €, texts maybe shortened or stripped of HTML tags. So why not give each object value a "type" and allow associating various token formats with each type?

>'Announcing supported tokens' and displaying token help isn't yet implemented. hook_token_info() will be used for that purpose though, and will allow modules to document both their tokens and any relationships (like [node:author:...]).

Sounds interesting! Have you thought about doing a kind of "hook_object_schema", which provides metadata about the drupal "objects", their properties and relations? I think the rdf patches could also build upon that - it would just need to add its rdf-properties to the "entity values" (similar to what I already suggested in #82). In fact the proposed hook_rdf_mapping in #493030: RDF #1: core RDF module is already very similar:

+function hook_rdf_mapping() {
+  return array(
+    'blog' => array(
+      'rdftype' => 'sioc:Post',
+      'title'   => array('dc:title'),
+      'created' => array(
+        'properties' => array('dc:date', 'dc:created'),
+        'datatype' => 'xsd:dateTime',
+        'callback' => 'date_iso8601',
+      ),
+      'name'    => array('foaf:name'),
+      'uid'     => array('sioc:has_creator', 'foaf:maker'),
+    )
+  );
+} 
eaton’s picture

Indeed. Though isn't that issue there for any type of value? We want numbers to be rounded in a special way or formatted as 0.000,00 €, texts maybe shortened or stripped of HTML tags. So why not give each object value a "type" and allow associating various token formats with each type?

In theory, that's what happens whenever we do token chaining -- dates, node ids, and so on are passed on as a new token type. I'm not sure how feasible it is to start associating the metadata with the objects themselves, rather than making it part of the token generation code. I'm concerned that adding explicit support for this kind of stuff on top of every token would make things a lot less usable, and right now it doesn't support chaining something onto the "end" of an otherwise meaningful token.

Since anyone can implement tokens, though, it's certainly possible someone could create a 'number formatting' token domain... It would be pretty inefficient using this design, though, as each and every number would go through the whole hook system. Perhaps that wouldn't be a problem, though?

Sounds interesting! Have you thought about doing a kind of "hook_object_schema", which provides metadata about the drupal "objects", their properties and relations?

hook_token_info() might just start heading that direction. Right now the idea is for it to define/explain tokens, and indicate that a given token (like 'author') is actually a pointer to an entire domain of tokens ('user' tokens). That's a bit relationshippy but it's definitely short of mapping out the full object hierarchy. I definitely don't want devs to have to figure out RDF type mappings just to expose a token...

eaton’s picture

To answer the more specific question, though, yes -- any time we have numbers etc, we run into the formatting problem. Dates, though, are particularly thorny because they're often used in pathauto aliases...

fago’s picture

>I definitely don't want devs to have to figure out RDF type mappings just to expose a token...

Agreed. I'm just talking about bringing both of these similar ways of providing metadata about objects together into one unified mechanism - a kind of "hook_object_schema", which both token and rdf could then use as a base for their special needs, e.g. extending the entries with info about the RDF mapping or something token-specific. So a dev wouldn't have to provide an RDF mapping, but he could.

This reminds me of jpetso's fieldtool, which intends to solve the same problem: http://drupal.org/project/fieldtool

>Since anyone can implement tokens, though, it's certainly possible someone could create a 'number formatting' token domain... It would be pretty inefficient using this design, though, as each and every number would go through the whole hook system.

Hm why should it go through the whole hook system? I think a way to let modules provide some callback functions for different ways of formatting the numbers would suffice.

>I'm not sure how feasible it is to start associating the metadata with the objects themselves, rather than making it part of the token generation code.
Good question, but don't you need the metadata for providing the token help nevertheless?

eaton’s picture

Hm why should it go through the whole hook system? I think a way to let modules provide some callback functions for different ways of formatting the numbers would suffice.

Right now, even with the chaining mechanism, the token system is relatively "dumb" -- no metadata is used during the actual generation of the tokens themselves, and there's no contextual model maintained. Token doesn't actually understand the "chain" of tokens at all; it just lets modules that implement hook_tokens() chop off bits and recurse until they hit the "end" of the token chain. So I'm not sure there's any clean way to maintain type information for the resulting replacement values.

One option that's been suggested is to allow modules themselves to supply a callback that would be applied to all the actual token replacement values before str_replace() is actually called. In pathauto's case, this could do spaces-to-dashes conversion and truncate too-long values before they're replaced. I'm not sure if this is flexible enough for what you're describing though. Hmmm...
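A hedged sketch of that callback idea in pathauto's terms (the function name and the 100-character limit are illustrative):

function pathauto_example_clean_value($value) {
  // Spaces to dashes, lowercased, truncated -- applied to each token
  // value before it is substituted into the alias pattern.
  $value = strtolower(str_replace(' ', '-', $value));
  return drupal_substr($value, 0, 100);
}

// Applied to every replacement value just before str_replace().
$replacements = array_map('pathauto_example_clean_value', $replacements);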

Good question, but don't you need the metadata for providing the token help nevertheless?

In some ways yes, but that hierarchy of metadata doesn't have to be maintained and traversed during the actual parsing/replacement process... Also, data types beyond the basic token domain (node, user, etc) aren't managed...

eaton’s picture

The latest round of checkins to Token (http://drupal.org/project/cvs/106016) have fleshed things out better. Almost all the core tokens that come with Token 1.x in Drupal 6 are now supported in HEAD, with some nice additions. [comment:node:author:last-login:short], [term:vocabulary:node-count], [node:poll-votes], and other tricks are now supported.

There's now a hook_token_info() that allows modules to announce information about the tokens they provide, noting the token type, the token itself, and friendly-name/description pairs for each one. It's also possible to document chains in this format by setting the 'references' key in hook_token_info() to the token type in question. For example:

  $data['comment']['node']['name'] = t('Node');
  $data['comment']['node']['description'] = t('The node the comment was posted to.');
  $data['comment']['node']['references'] = 'node';

And, for those who haven't been keeping track of the details, actually using the token system consists of:

token_replace($my_string, array('node' => $node));

In the case of complicated scenarios with lots of nodes or users or what-not, the second parameter could contain other stuff. As mentioned earlier, array('bookmarked' => $node, 'bookmarker' => $account) could be passed in as well, and the 'bookmark' module could use the chaining system to fill out: "Hi, [bookmarked:author:name]! [bookmarker:name] just bookmarked your node, [bookmarked:title]."

Emphasis on "All of this is working." Greggles and I will most likely be turning this into Token 2.x for Drupal 6, as well as Token for D7, if it doesn't go into core. I'd really like to see it hit core, though, as I believe it overcomes a lot of the old problems with a relatively small amount of code and a LOT of flexibility for things like automated email and pathauto in core.

One of the biggest challenges right now -- the part that I haven't had a chance to tackle at all -- is how to display the metadata about what tokens are currently available. Using the data returned by token_info(), we could build a ginormous themed table the way Token 1.x does. We could make a collapsible JS-driven thing that lets people navigate through a hierarchy of chained tokens. We could fire off the tree of token names and chaining info as JSON, and do auto-complete. I don't know what the right approach is, and I think it's a UX challenge a bit beyond my skills. Perhaps, while we iron out the API side, "inputting and browsing tokens" could be turned into a UX microproject?

moshe weitzman’s picture

Bojhan’s picture

eaton: Just create an issue for the UI and we will see how far we can come from within the community.

fago’s picture

>One option that's been suggested is to allow modules themselves to supply a callback that would be applied to all the actual token replacement values before str_replace() is actually called. In pathauto's case, this could do spaces-to-dashes conversion and truncate too-long values before they're replaced. I'm not sure if this is flexible enough for what you're describing though.

I don't see why it shouldn't be possible in that case too, but nevermind.

Another point:

>As mentioned earlier, array('bookmarked' => $node, 'bookmarker' => $account) could be passed in as well, and the 'bookmark' module could use the chaining system to fill out: "Hi, [bookmarked:author:name]! [bookmarker:name] just bookmarked your node, [bookmarked:title]."

So this would require the *using* module to implement hook_token_info() to provide token with the info on how to deal with bookmarker/bookmarked, right? So why put that into the hook, when it belongs to just one token usage? Even worse, what if "bookmarked" is sometimes a node and sometimes a comment?
Imo the above concept of $scopes would fit better here: just pass it to token when it's generating help or doing the actual replacement:

* @param $scopes
*   (optional) An associative array having the token prefix as key and the
*   class of substitution tokens to use as value, e.g.
*   @code
*     array(
*       'comment' => 'comment',
*       'parent-comment' => 'comment',
*       'post' => 'node',
*     )
*   @endcode
*   If no scopes are specified, only 'global' (site-wide) substitution tokens
*   are returned.
eaton’s picture

So this would require the *using* module to implement hook_token_info() to provide token with the info on how to deal with bookmarker/bookmarked, right?

Nope. Token makes absolutely no use of hook_token_info() metadata during actual token processing. I'll try to take another stab at explaining:

A module that wants to use tokens in some of its user-entered text (Let's say, a customized email that gets sent out when someone posts a node) gathers up the objects that are relevant to its token substitution. In this case, the node that was posted. It puts that in a keyed array that corresponds to the 'token' type. For example:

function mymodule_send_mail($node) {
  $data = array('node' => $node);
  $text = variable_get('mymodule_email_text', 'Hi, [node:author:name]! Thanks for submitting your story, [node:title]!');
  $text = token_replace($text, $data);
}

Token regexes the $text string for things that match the pattern [type:token], matching the type and the token as separate 'chunks'. Note that there are no longer 'global' tokens in this model that require no 'type' -- things like the site name are entered using [site:name] and [site:slogan], while the currently logged-in user is entered using [user:name] style tokens.

Once the list of 'found' tokens is generated, each 'type' of tokens is passed along to hook_tokens() in turn, along with the data. Tokens passed to modules at this point are in a more streamlined form: 'author:name' instead of [node:author:name] and 'title' instead of [node:title].

At this point, the modules can check if they 'know' any of the tokens that are ready to be replaced and supply substitution values, either based on global information (like the site name or the currently logged in user), or information in the $data variable (like the node that was passed in initially).

In addition, modules can scan the list of tokens being handled for 'complex' tokens -- ones like 'author:name'.

    if ($author_tokens = token_match_prefix($tokens, 'author')) {
      $author = user_load($node->uid);
      $replacements += module_invoke_all('tokens', 'user', $author_tokens, array('user' => $author));
    }

That snippet of code doesn't rely on any metadata; it's just loading the proper account based on that prefix, and handing off any 'author:' tokens to be processed as if they'd been prefixed with 'user:'.

All of the token replacements are rolled up in the $replacements variable, which gets returned to token module when all the hooks have fired, and it uses str_replace() to manipulate the original $text variable.
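Pulled together, the flow reads roughly like this (a hedged sketch reusing the scanning helper sketched earlier in this thread; only module_invoke_all() and the 'tokens' hook follow the actual prototype):

function example_token_replace($text, array $data = array()) {
  $replacements = array();
  // Scan once, then hand each type's tokens to hook_tokens()
  // implementations along with the caller-supplied objects.
  foreach (example_token_scan($text) as $type => $tokens) {
    $replacements += module_invoke_all('tokens', $type, $tokens, $data);
  }
  // Finally, swap the raw '[...]' strings for their values.
  return str_replace(array_keys($replacements), array_values($replacements), $text);
}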

None of the steps above use or require any of the metadata -- any discussion of hook_token_info() is only about providing help text or inline documentation for users that need help selecting tokens to enter into a form. That's one of the reasons that I'm hesitant to move down the road of "lots of heavy metadata and data typing in hook_token_info()" -- building the chaining relationships from the metadata would require a very different processing model, and one that I've got more performance concerns about.

Maybe I'm missing something?

Frando’s picture

No, I don't think you're missing something. That model sounds very clean, simple yet powerful, and performant.

sun’s picture

I think I have to agree with fago and others. We don't necessarily need to invent a hook for a standardized object description schema though.

But there might be issues for token providers, i.e. contrib modules, enhancing tokens for certain entities if the consumer uses custom entity names without specifying the internal entity type.

Fictional use-case: A module enhances all nodes (maybe even of a certain node type only) with additional data and wants to provide this data as token.

To demonstrate, we have to intermix previous examples:

function mymodule_send_mail(&$message, $node) {
  global $user;

  $context = array(
    'bookmarked' => $node,
    'bookmarker' => $user,
  );
  $text = variable_get('mymodule_email_text', 'Hi, [bookmarked:author:name]! [bookmarker:name] just bookmarked your node, [bookmarked:title].');
  $message['body'] = token_replace($text, $context);
}

function node_tokens() {
  // How does node.module know that 'bookmarked' is a node?
}

function bookmark_tokens() {
  // How can bookmark.module provide tokens both for 'node', 'bookmarked', and any other, arbitrary context name that may be a $node?
}

Sorry if that is already taken care of in some other way. I didn't look at the code yet.

eaton’s picture

Sun, those are excellent questions. Let me see if I can clarify:

// How does node.module know that 'bookmarked' is a node? // How can bookmark.module provide tokens both for 'node', 'bookmarked', and any other, arbitrary context name that may be a $node?

None of the modules that provide 'node' tokens have to do anything special. Node module doesn't have to know about 'bookmarked' at all -- all it knows is that when tokens with the type 'node' come in, it has replacement values for some of them, and it expects there to be an object in the 'node' key of the $data array.

There are some situations, though, where this 'dumb' approach doesn't work. For example, a user_relationships module that wants to send out an email whenever a friend request is posted. It needs to have tokens for two separate user accounts -- the sender and the recipient -- for it to make sense. In that situation, the user_relationships module could simply tell its users to use the [friender:...] and [friendee:...] types, rather than 'user'.

When the time comes to do the token replacement, it can put both accounts into the $data array. On the first pass through, no other modules will bother trying to do replacements -- because they don't know anything about the 'friender' and 'friendee' token types. But user_relationship module would use the following code:

if ($type == 'friendee') {
  $replacements += module_invoke_all('tokens', 'user', $tokens, array('user' => $data['friendee']));
}

if ($type == 'friender') {
  $replacements += module_invoke_all('tokens', 'user', $tokens, array('user' => $data['friender']));
}

That code simply passes along all 'friendee' token types, but tells other modules that they are actually 'user' types, and populates the $data['user'] key with what was originally placed in $data['friendee']. It's a bucket-brigade mechanism that keeps each token-implementing module ignorant of the surrounding context, and frees us from having to encode lots of complex translation rules for different token types. If a module implementing hook_tokens 'knows' that a 'friendee' is really a user, it's also in the best position to hand off those tokens properly so that other modules don't have to worry about it.

To summarize: If a module uses custom entity-names to separate multiple instances of an entity type (like a sender user and recipient user), it is responsible for delegating those tokens and 'handing off' the processing of them under their proper entity names. Modules that provide custom 'user' tokens need not know about custom entity names used elsewhere, because they will be given a chance to replace them as if they were normal 'user' tokens.

And, thankfully, the lazy generation of the tokens themselves frees us from a lot of the nastiness of bulk-generating ALL of this stuff even if it's unneeded.

sun’s picture

Wow, thanks for clarifying, that makes perfect sense! And it seems to cover all aspects and issues that have been raised in this issue thus far.

We're close to completion then, right?

Actually, I'd propose to move token.module as token.module into core. That way, we keep the atomic functionality and also pave the way for getting a suitable UI for this into core, which has to live somewhere and most likely needs to provide a new FAPI #type along with attached JS/CSS. Cluttering all of that into includes/token.inc, includes/common.inc, or system.module, but also misc/token.js, modules/system/token.css, and includes/form.inc wouldn't make much sense to me.

eaton’s picture

Actually, I'd propose to move token.module as token.module into core. That way, we keep the atomic functionality and also pave the way for getting a suitable UI for this into core

I'm so-so on that, in part because token module itself is tiny (about 100 lines, total) and there are a lot of places in core where it would make sense to use it -- sending out user welcome emails, etc -- that would turn it into a 'required module'.

The bulk of the code -- the modulename.tokens.inc files -- would need to be moved to the respective module directories, too. Realistically, all the patch would need to do is move those files and stick the functions from token.module into common.inc. That, and remove the token_registry_files_alter() line, as it would no longer be necessary.

I'm open to any discussion on that aspect though; I just have an aversion to 'required modules'. ;-)

One other thing I'm ironing out right now is that hook_token_info() may need to document 'tokens' and 'token types' alike. Otherwise it will be pretty difficult to provide user-friendly listings. Right now we only have the key-names like 'node' and 'user' and 'site' to go on. Fine for code but so-so for helptext. Tweaking the stuff in token HEAD today.

apaderno’s picture

Moving the token.module code into common.inc would make it possible to have the code available in all the cases where not all modules are loaded, but common.inc is.

eaton’s picture

FileSize
35.23 KB

Attached is a patch that contains the same code as Token HEAD, with token functions moved to common.inc and [modulename].tokens.inc files moved to their respective modules.

Conceptually, I'm calling this code 'done' - it's what we will be using for Token 2.0 in Drupal 6, barring something dramatic. The module-specific token implementations are rough, and have some gaps, but it's certainly orders of magnitude past what we have in core at the moment (aka, nothing).

The purpose of the patch is to see what people think about the approach, and perhaps run some light testing. Tokens have been implemented for comments, nodes, taxonomy terms, users, site settings like name and slogan, dates, and some node addons like statistics and poll results. Chaining, in the form of tokens like [comment:node:author:mail], works. What is left to be done?

  1. Tokens for more core entities, and more node data. Specifically, menu-related information for nodes, taxonomy data for nodes, and field data for any fieldable entity. The mechanisms in place can handle them, I just haven't written any code for those specific cases. So you can't, say, use [node:menu-title] or [node:top-term:tid] yet.
  2. Unit tests. The token parsing and replacement itself is pretty lightweight and tests should be easy; the tokens themselves may need a different approach. Any thoughts?
  3. Use token in core. Existing core modules that do token replacement need to use it. User notification emails and actions that send emails or display text to the user can both be upgraded pretty easily.
  4. Online help. How should we display help and instructions in fields that deal with tokens? Working with Bojhan, I'm opening up another issue for this question. The metadata provided by the token_info() function should provide us with all the raw materials, but presentation is a tricky question.
  5. Upgrade path for existing tokens in core email text. Right now core uses tokens like %user-name; Token uses [user:name]. We need to swap those out (a sketch follows below).
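The swap in item 5 could be as simple as a one-time strtr() over the stored text (a hedged sketch; the mapping below is illustrative, not a complete inventory of core's legacy placeholders):

// Map old-style core placeholders to their new [type:token] equivalents.
$map = array(
  '%user-name' => '[user:name]',
);
$text = strtr($text, $map);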

How can you help test this patch? If you are feeling daring, try porting some core features (like email sending or actions) to use the token_replace() function. Otherwise, Devel module can do simple token substitution from the PHP Evaluation block. dsm(token_replace('Hi, [user:name]! You last logged in on [user:last-login:short].')); gives you a bit of a starting point for experimentation. dsm(token_info()); gives you a full list of the token-types and tokens that are supported by the current code.

The most valuable feedback at this point is whether the approach that's being taken here is considered generally acceptable for core. Dries? Webchick? Thoughts on this? Two to three years ago, Token 1.x was considered not up to snuff. Does this approach fit the bill?

eaton’s picture

Status: Needs work » Needs review

Based on webchick's suggestion, I'm marking this "Needs review." While portions of the code still need to be completed, we are at a key turning point in the design, and that aspect of the patch now needs review before further energy is invested in the 'make changes to core' portion.

eaton’s picture

(Also worth noting: this would make pathauto in core much simpler, since the current Pathauto module relies entirely on Token.)

Crell’s picture

Subscribe!

fago’s picture

>To summarize: If a module uses custom entity-names to separate multiple instances of an entity type (like a sender user and recipient user), it is responsible for delegating those tokens and 'handing off' the processing of them under their proper entity names. Modules that provide custom 'user' tokens need not know about custom entity names used elsewhere, because they will be given a chance to replace them as if they were normal 'user' tokens.

Oh, sorry, I understood that (a powerful concept!), but I mixed things up above. I meant that the *calling* module has to implement hook_tokens() (not hook_token_info()) just to tell the system what 'bookmarked' and 'bookmarker' actually are (node + user). However, I think it shouldn't be necessary for a caller just consuming tokens to do so, but apart from that, this way

  • this "names" are unnecessarily in a global name space, so modules would have to prefix them or there might be clashes
  • supporting different cases, where the bookmarked entity (node, users, ..) changes (think of flags) isn't possible in a clean way. How to tell hook_token_info() to display the right help? How to let the own hook_token() implementation know about the entity type? Yes, one could wrap the entity and pass the info along with it, but that is what I wouldn't call clean.

Thus adding a call time parameter $scopes (as described above) to token_replace() and token_help() would solve this by doing a simple lookup in that array.

recidive’s picture

I did just a visual inspection of the patch and have some suggestions on code style:

+    foreach($tokens as $name => $original) {
+      if ($name == 'uid') $replacements[$original] = $account->uid;
+      elseif ($name == 'name') $replacements[$original] = check_plain($account->name);
+      elseif ($name == 'mail') $replacements[$original] = check_plain($account->mail);
+      elseif ($name == 'url') $replacements[$original] = url("user/$account->uid");
+      elseif ($name == 'edit-url') $replacements[$original] = url("user/$account->uid/edit");
+    }

That code and the like is not Drupal code style compliant. This seems a good place to use switch statements.

+function user_token_info() {
+  // Metadata for token types.
+  $data['types']['user']['name'] = t('Users');
+  $data['types']['user']['description'] = t('Tokens related to individual user accounts.');
+
+  // Basic data for users
+  $data['tokens']['user']['uid']['name'] = t('User ID');
+  $data['tokens']['user']['uid']['description'] = t('The unique ID of the user account.');
...
+}

Shouldn't we split that info into two hooks, hook_token_types() and hook_token_tokens(), to cut one level off from the array?

Also, using array() notation seems more appropriate. This way the above code would be written as:

function user_token_types() {
  $types = array();

  $types['user'] = array(
    'name' => t('Users'),
    'description' => t('Tokens related to individual user accounts.'),
  );

  return $types;
}

function user_token_tokens() {
  $tokens = array();

  $tokens['user'] = array();
  $tokens['user']['uid'] = array(
    'name' => t('User ID'),
    'description' => t('The unique ID of the user account.'),
  );

  return $tokens;
}

That way looks more readable to me. What do you think?

Looking forward to testing the patch, though I don't have the time now.

Great work so far!

recidive’s picture

I think it is also worth considering splitting hooks by type, so:

function user_token_types() {
  $types = array();

  $types['user'] = array(
    'name' => t('Users'),
    'description' => t('Tokens related to individual user accounts.'),
  );

  return $types;
}

function user_token_user_tokens() {
  $tokens = array();

  $tokens['uid'] = array(
    'name' => t('User ID'),
    'description' => t('The unique ID of the user account.'),
  );

  return $tokens;
}

function user_token_user_replace($tokens, $data, $safe_for_html = TRUE) {
  // Do the fun stuff.
}
drewish’s picture

@recidive in #128: I think you'll actually find that replacing the if/elseif bit with a switch statement would eat up more lines and be harder to read than just adding the curly braces.
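For comparison, the braced if/elseif style applied to the snippet quoted in #128 -- same logic, just made coding-standards compliant:

foreach ($tokens as $name => $original) {
  if ($name == 'uid') {
    $replacements[$original] = $account->uid;
  }
  elseif ($name == 'name') {
    $replacements[$original] = check_plain($account->name);
  }
  elseif ($name == 'mail') {
    $replacements[$original] = check_plain($account->mail);
  }
}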

eaton’s picture

Thanks for the awesome feedback!

supporting different cases, where the bookmarked entity (node, user, ...) changes (think of flags), isn't possible in a clean way. How do you tell hook_token_info() to display the right help? How does your own hook_tokens() implementation know about the entity type? Yes, one could wrap the entity and pass the info along with it, but that is not what I would call clean.

My use of 'bookmarked' and 'bookmarkee' was probably a bad one -- there's nothing that would prevent a module from simply using 'node' and 'user' unmodified in that case. The trick is in cases where there are multiple entities of a given type, like 'sender' and 'recipient' or 'friender' and 'friendee'. In this scenario, it's not just a matter of saying, 'recipient is the same as user' -- they are not simply aliases for 'user' but ways of capturing two different users that need to be tokenized, keeping them separate.

In theory, we could set up Token to internally remap those 'aliases' to their canonical entity types, shuffling around the $data array to make everything line up, then hand things off to modules. But then there would be no way for another module to actually add a new kind of 'recipient' or 'sender' token; for example, '[recipient:is-already-a-friend-of-sender]' or something similar that the original module didn't account for. That token would require actual knowledge of both the sender and the recipient, which would be lost if Token automatically mapped it to 'user' behind the scenes.

I'm willing to go with it, I just want to think very hard before adding any complexity to the token processing code itself. Right now, that's about 130 lines of code including PHPDoc, and I'm hoping that it can be kept in that range...

Thus adding a call time parameter $scopes (as described above) to token_replace() and token_help() would solve this by doing a simple lookup in that array.

I can definitely see it being useful for the help/info stuff -- saying, "I only want to list 'Site' and 'User' tokens on the help screen" is a reasonable use case. I'm not sure how explicitly passing in which token types should be replaced in token_replace() helps, though. That was done in token 1.0 only because of the lack of namespacing for tokens...?

Should not we split that info into two hooks: hook_tokens_types() and hook_tokens_tokens() to cut down one level off from the array?

Feels like a tossup to me. I'd want to keep hook_token_info() unified, as most modules building help/info displays would need information on both types and tokens, but if it makes sense to split the underlying information hooks, so be it. ;-)

I think it is also worth considering splitting hooks by type, so:

I briefly looked at this, but was concerned that it would prevent novel one-off token replacements from being implemented cleanly. A module can, for example, use hook_tokens() to inspect all incoming tokens regardless of their type. Are there strong feelings pro or con on this?

Also using array() notation seems more appropiated. This way the above code would be written as...

Yeah, the inline ifs and elseifs, as well as the strings of $foo['bar']['baz'] = 'blah'; assignments, were parts I wasn't particularly thrilled with. The alternatives ended up being almost twice the number of lines and trickier for cutting-and-pasting when I was going through and cranking out dozens of tokens the other afternoon, so I took the shortcut. I have no problem with it being brushed up to match the coding standards; I just wanted to focus on keeping it easy to search/replace on while the data structure was still in flux.

fago’s picture

@scope:
>I'm not sure how explicitly passing in which token types should be replaced in token_replace() helps, though. That was done in token 1.0 only because of the lack of namespacing for tokens...?

To look up the actual type for the passed data - as an alternative way to support passing e.g. multiple users to token by just using the "user" type, without introducing new specialized "types" like "recipient" and "sender".

>In this scenario, it's not just a matter of saying, 'recipient is the same as user' -- they are not simply aliases for 'user' but ways of capturing two different users that need to be tokenized, keeping them separate.

I see!

However, if one has to introduce new "token types" for each token replacement with multiple data items of the same type, I fear this could lead to name clashes, as all modules share the same namespace. Furthermore, adding new "token types" would be the only possibility to customize the used prefixes, right? Consider a module that just wants to assign the prefix 'sender' to a node and another module to a user.

eaton’s picture

However, if one has to introduce new "token types" for each token replacement with multiple data items of the same type, I fear this could lead to name clashes, as all modules share the same namespace. Furthermore, adding new "token types" would be the only possibility to customize the used prefixes, right? Consider a module that just wants to assign the prefix 'sender' to a node and another module to a user.

Yeah, I can see that becoming potentially problematic if there is a huge call for replacement operations with lots of similarly named entities. I'm not terribly worried about it at the moment, in part because token has done reasonably well so far without any support for multiple-entities-of-the-same-type at all. But if there's a simple and relatively clean way of working around this, it's worth considering...

sun’s picture

 * Custom tokens wrapping an existing system object should be exposed and
 * implemented using the module name as prefix, i.e. [mymodule-differentuser].

I don't see why we cannot simply define a development standard like this.

eaton’s picture

sun, I think that's reasonable. I don't really like super long token names, but it's reasonable considering the relatively small number of cases where this seems to be needed.

matt2000’s picture

However, if one has to introduce new "token types" for each token replacement with multiple data items of the same type, I fear this could lead to name clashes, as all modules share the same namespace. Furthermore, adding new "token types" would be the only way to customize the prefixes used, right? Consider a module that just wants to assign the prefix 'sender' to a node, and another module to a user.

I would have assumed the correct way would be e.g., [mymodule:sender:mail].

I.e., the first argument is always the module that defines the token. I think this could be standardized in core by switching [site:x] for [system:x] ? Or are there other exceptions I'm not thinking of?

In fact, couldn't namespace collision be prevented by enforcing this, e.g., check module_exists() on the first argument before anything else?

I haven't looked at the code. Just trying to give an outside perspective. Drupal trains one fairly well to think about namespace issues, and that naturally translates to tokens, so either way, I don't think it's an issue, even if it's left as it is.

apaderno’s picture

Using a prefix for the module that implements the token seems reasonable; it would help with lazy generation, because it would not be required to ask each module which tokens it implements.
Maybe there are some cons to using the module prefix, but I cannot think of any.

eaton’s picture

I would have assumed the correct way would be e.g., [mymodule:sender:mail].

Actually [modulename-sender:mail] would be cleaner; that would treat 'modulename-sender' as a token type and 'mail' as the token itself. There's no need to make 'modulename' one type of token and 'sender' another if we never want people creating new arbitrary types like 'sender' at all.

I'd *personally* prefer simple prefixes like 'sender' and 'recipient', which seem to intuitively suggest user accounts and so on, but I can see how there could be potential for collision there.

I'm not sure about prefixing things with the modulename automatically -- date related tokens, for example, aren't provided by date module at present. They're provided by system module because they're a basic fundamental type. 'site' and 'system' could probably be thought of as synonymous, but [site:name] and [site:slogan] and so on felt more meaningful. Any thoughts?

eaton’s picture

Also, #514990: Add a UI for browsing tokens is the new issue for any brainstorming/discussion about the UI for presenting possible tokens to end users...

apaderno’s picture

'site' seems preferable to me too; it's less "technical", and it's shorter.

matt2000’s picture

I'm not sure about prefixing things with the modulename automatically -- date related tokens, for example, aren't provided by date module at present. They're provided by system module because they're a basic fundamental type. 'site' and 'system' could probably be thought of as synonymous, but [site:name] and [site:slogan] and so on felt more meaningful. Any thoughts?

I think it's pretty valuable to maintain the precedent of module name defining the namespaces. How else can you as a module developer ensure interoperability with other arbitrary modules of which you have no knowledge?

Doesn't the first argument of the token (i.e., what you've been calling the 'token type') really determine what module handles the token most of the time anyway? Even with date type tokens, I don't think you're suggesting [date:node:created], but just [node:created], and then node module is responsible for determining what 'created' means and who handles it.

[site:x] or [system:x] is really a side issue. I agree 'site' is more intuitive, and wouldn't have any objection to hardcoding a special exception for that. Or maybe we allow a hook_token_alter to define 'aliases' for tokens, and get the best of both worlds...

This would:

  • Prevent accidental namespace collision
  • Allow for intentional overrides of tokens
  • Allow for both 'developer friendly' and 'user friendly' tokens (e.g., system_token_alter could redirect [site:x] to [system:x])
  • Possibly improve performance by calling module_invoke() instead of module_invoke_all('tokens')?

Example:

<?php
function token_generate(array $raw_tokens, array $data = array(), $safe_for_html = TRUE) {
  $results = array();
  foreach ($raw_tokens as $type => $tokens) {
    $defined_tokens = module_invoke($type, 'token_info');
    if (!module_exists($type) || !isset($defined_tokens['type'][$type])) {
      // No module claims this type directly, so let other modules alias it
      // to a type they do handle (and then re-dispatch on the aliased type).
      $type = module_invoke_all('token_alter', $type);
    }
    else {
      $results = array_merge_recursive($results, module_invoke($type, 'tokens', $tokens, $data));
    }
  }
  return $results;
}
?>

(An alternate approach, instead of a hook_token_alter, might be to add 'aliases' to token_info, and then: $defined_tokens = module_invoke_all('token_info'); ... $type = $defined_tokens['aliases'][$type]; )

So, we have a convention, we allow exceptions to the convention, but we place the burden of namespace concern on the developer who wants to break convention. I can be sure that [mymodule:mytoken] always works the way I expect. And if I really want to create [othertoken:stuff] in mymodule.module, then I can, but I'm responsible for checking module_exists('othertoken') and dealing with the consequences.

Specifically, if system module wants to integrate [date:today] it can, but it has to defer to date.module if date module implements the token, which is exactly how core should relate to contrib. system.module can implement hook_token_alter to map [site:x] to [system:x], but I can write site.module if I want to override the way [site:x] works, etc.

All that said, I don't want to appear as a dissenter. I'd be thrilled to see the patch as given in core.

eaton’s picture

Doesn't the first argument of the token (i.e., what you've been calling the 'token type') really determine what module handles the token most of the time anyway? Even with date type tokens, I don't think you're suggesting [date:node:created], but just [node:created], and then node module is responsible for determining what 'created' means and who handles it.

No, on the contrary -- I'm trying as hard as possible to avoid that! Three examples are the [node:comment-count], [node:poll-votes], and [node:total-visits] tokens supplied by the comment, poll, and statistics modules. The 'token type' is about the kind of thing that is being dealt with rather than who is dealing with it.
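
For instance, here's a rough sketch of how statistics module might supply that [node:total-visits] token. The hook signature matches the current patch, but all of this is still subject to change:

<?php
// Sketch only: a module adding a token to the 'node' type it doesn't "own".
function statistics_tokens($type, $tokens, $data, $sanitize = TRUE) {
  $replacements = array();
  if ($type == 'node' && !empty($data['node'])) {
    $node = $data['node'];
    if (isset($tokens['total-visits'])) {
      // statistics_get() returns totalcount/daycount/timestamp for a node.
      $stats = statistics_get($node->nid);
      $replacements[$tokens['total-visits']] = $stats['totalcount'];
    }
  }
  return $replacements;
}
?>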

[site:x] or [system:x] is really a side issue. I agree 'site' is more intuitive, and wouldn't have any objection to hardcoding a special exception for that. Or maybe we allow a hook_token_alter to define 'aliases' for tokens, and get the best of both worlds...

Actually, site-wide information is an excellent example of the difference between entity-centric tokens versus module-centric tokens. If the site has a default language provided by the locale module, should that be [locale:language] or [site:language]? I'd say the latter. Similarly, [user:friend-count] or [user:openid-url] make sense as tokens while defining custom token types for openid and 'friend module' would be (IMO at least) confusing. We get a slightly distorted idea when we just look at core modules, because there is a fairly strong emphasis on one module providing one sort of 'entity', but once contrib comes into play, most of what happens is extending existing entities like users, nodes, comments, and so on.

Specifically, if system module wants to integrate [date:today] it can, but it has to defer to date.module if date module implements the token, which is exactly how core should relate to contrib. system.module can implement hook_token_alter to map [site:x] to [system:x], but I can write site.module if I want to override the way [site:x] works, etc.

At least up to this point, that's not how tokens have worked. The kinds of tokens being provided ([site:name] and so on) are not overridden -- modules just add additional tokens as they add more data to the site. Date module, for example, would not 'take over' the handling of date tokens. (Well, it COULD by processing things aggressively and giving itself a low weight, but that's another matter). It could, though, provide additional more precise tokens, or account for timezone offsets, and so on with its own tokens.

All that said, I don't want to appear as a dissenter. I'd be thrilled to see the patch as given in core.

Oh, absolutely keep these conversations up. I have some strong ideas about how it works/can work, but input from the community and other developers is essential. Greggles and I have been watching a lot of token usage over the past several years and I think we have a pretty good handle on how this stuff can work together, but this kind of discussion and wrangling is essential for a system that will serve core and contrib well...

greggles’s picture

Thanks for your work on this, eaton. As you know from our discussions in IRC I really like where this is now and am trying to make time to upgrade Pathauto to work with this version of token.

No, on the contrary -- I'm trying as hard as possible to avoid that!

Yes! Modules that add data onto nodes should be able to respond to a token in the [node:foo] space. It is much more confusing to users, for example, to see an [imagefield:field_image_product_photo] token when in their mind it's really related to the node at the base, like [node:field_image_product_photo].

matt2000’s picture

OK, thanks for the clear response. I know I'm coming late to the game. I only worked through the code enough to write my previous example. I think I overestimated the nature of chaining.

To use your example, if you had asked me what token to use for a user's OpenID URL in a node, I would have guessed [user:openid:url], thinking that the chain used automatic stripping. I.e., the token would go to user module, which does not implement an 'openid' token, so 'user:' would be stripped and [openid:url] would be passed on, which would get picked up by openid module and dealt with appropriately.

It might make more sense with something like location module, where nodes and users might both have locations. Node module gets to decide how to deal with [node:location:city], user module deals with [user:location:city], and node module can pass off responsibility to user module for [node:author:location:city]. But if node or user have nothing to say about it, it becomes [location:city] automatically, and location module is responsible for deciding whether to default to the node, the node's author, or the global user, depending on which location submodules are enabled, or whatever criteria it judges most appropriate for user expectations.

This way, it seems we get the advantages of the module-centric tokens, without losing anything from the entity-centric thinking. But I thought we were already closer to this than we really are.

Either way, I still think avoiding namespace collision is a key concern. Am I over-estimating the problem? Or is there some other way to prevent collision?

yched’s picture

It is much more confusing to users, for example, to see an [imagefield:field_image_product_photo] token when in their mind it's really related to the node at the base, like [node:field_image_product_photo].

Esp. since fields can be attached to non-nodes. The object holding the token *has* to be specified...

matt2000’s picture

re #145,

Wouldn't [node:field:image_product_photo] cover both concerns, if we had the chaining-to-defaults? And it would be more user friendly, because you could do [field:image_product_photo] without having to wonder what the heck a node is. Field module, or whoever gets responsibility for the field token type, knows it's attached to a node, because token_replace passed it a node object.

eaton’s picture

After some careful thinking, I believe I see what fago is getting at with the 'foo type is the same as bar type' token type equivalency issues. While none of that is necessary for token PROCESSING, it's important to be able to generate a list of tokens for users that reflects what will be useful.

One possibility is to say, "Please show me these particular types of tokens." Another is to say, "I will provide these entities -- show me the tokens that are appropriate for them." I think the latter is probably the best match... And it's also where the 'token type X is an alias for token type T' would come in handy. A module could say, "I need a list of available tokens -- and I'll provide a user, a node, and a taxonomy term..."

Hmmm. I'll have to ponder this. I still think we'll do fine without any explicit aliasing on the token-generation side, but when building the help text, we may need that. Thanks!

apaderno’s picture

Just to have an example of a module that uses token aliases, see Rules; the module offers a token for the currently logged-in user, and a token for the user to act on.
Maybe it's a particular need, and it seems that Rules handles it without any help from token.module. For sure, its code would be lighter if token.module handled aliases; it remains to be seen whether entity aliases would be worth implementing.

eaton’s picture

Just to have an example of a module that uses token aliases, see Rules; the module offers a token for the currently logged-in user, and a token for the user to act on.
Maybe it's a particular need, and it seems that Rules handles it without any help from token.module. For sure, its code would be lighter if token.module handled aliases; it remains to be seen whether entity aliases would be worth implementing.

Well, the current system, again, does handle what amounts to entity aliases. A module can look for tokens that match one string and pass them on as if they were another. I'm still very hesitant to attempt any automatic alias delegation or chaining in the token code itself, though -- what we're talking about right now is still purely for display of help text.

Although what you mention -- the distinction between a [current-user:...] and a [user:...] token might be useful. Thoughts? That kind of stuff is very very easy, so the important part is agreeing on whether or not it should be done...

eaton’s picture

Wouldn't [node:field:image_product_photo] cover both concerns, if we had the chaining-to-defaults? And it would be more user friendly, because you could do [field:image_product_photo] without having to wonder what the heck a node is. Field module, or whoever gets responsibility for the field token type, knows it's attached to a node, because token_replace passed it a node object.

It would probably be more straightforward to implement something like [node:field-image-product-photo] or [node:field-image-product-photo:url] etc. The idea of 'chaining' isn't necessarily to take a 'node', and say, 'I want to do something with fields,' and then 'This particular field...' Rather, it's a way to say, 'This particular token (say, field-image-product-photo) is actually a pointer to all the tokens associated with an imagefield.' For example:

[node:author:foo] In this case 'author' is being used as an effective alias for the generic token type, 'user'.
[node:field-name:baz] In this case 'field-name' is being used as an effective alias for the generic field type that field-name uses.
[user:created:short] In this case 'created' is being used as an effective alias for the generic token type, 'date'.

This raises an excellent point of course: what the guidelines should be on how these things work when there is a lot of leeway for developers. Heh.

One thing I'm considering. Should 'chaining points' like [node:author] and [user:created] and [comment:node] have some sort of automatic default? i.e., should each chaining point come with a sensible default?

[node:author] == [node:author:name]
[user:created] == [user:created:small]
[comment:node] == [comment:node:title]

Hmm? Hmm.
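
To make the chaining idea concrete, here's a rough sketch of how node module might implement the [node:author] chaining point with a default. Names and signatures (token_match_prefix(), the shape of $data) are taken from the current patch and may still shift:

<?php
// Sketch only, not the patch itself.
function node_tokens($type, $tokens, $data, $sanitize = TRUE) {
  $replacements = array();
  if ($type == 'node' && !empty($data['node'])) {
    $node = $data['node'];

    // [node:author] on its own falls back to a sensible default: the name.
    if (isset($tokens['author'])) {
      $account = user_load($node->uid);
      $replacements[$tokens['author']] = $sanitize ? check_plain($account->name) : $account->name;
    }

    // [node:author:*] is a chaining point: strip the prefix and hand the
    // remaining tokens off to the generic 'user' type.
    if ($author_tokens = token_match_prefix($tokens, 'author')) {
      $account = user_load($node->uid);
      $replacements += token_generate(array('user' => $author_tokens), array('user' => $account), $sanitize);
    }
  }
  return $replacements;
}
?>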

quicksketch’s picture

There are many places in token replacement that don't allow access to the "un-cleansed" versions even if specifically requested. For example:

elseif ($name == 'name') $replacements[$original] = check_plain($comment->name);

This makes it so that I can't insert the value as $item['#value'] or into the database. This same pattern happens all over the replacement code. Other than that, *awesome*!

quicksketch’s picture

Status: Needs review » Needs work
eaton’s picture

There are many places in token replacement that don't allow access to the "un-cleansed" versions even if specifically requested. For example:

Good catch. I had that on an internal to-do for a while but it was missed. That, along with a move to switch/case statements for a lot of the tokens, is a necessary improvement.

I'm teetering on the edge of rolling a new version with some of the tweaks we've been talking about (encoding the 'alias' information in hook_token_info for better help display, some of these code style cleanups, etc.) Is there any chance of getting a check-in from dries or webchick on the general approach that's being pursued now?

eaton’s picture

FileSize
43.24 KB

Attached is an updated version of the patch. Doesn't include any of the code style fixes that have been discussed, but fixes a couple of bugs (including an error in token_match_prefix that resulted in false positives and needless module_invoke_all() calls.) Also implemented a 'file' token type in system.tokens.inc, and added upload.tokens.inc, which maps the first file attachment on a node to [node:upload...]

So, [node:upload:url] will give you the web addressable URL of the first attachment to a node. Any files from the files table can also be used with [file:...] tokens.

Talking extensively with Crell about ways to reduce the number of hook invocations; I think we're at an acceptable level now, as it only happens a lot when there are deeply, deeply chained tokens, like the silly 'And now this is possible' examples I'm posting. Further updates as events warrant. All feedback appreciated!

eaton’s picture

FileSize
53.33 KB
42.91 KB

...And one more rolled with fixes to some additional tokens.

For your testing convenience, I've also created a smallish module called 'tokin' that gives you a raw textfield and a dropdown from which you can choose nodes and comments. From there, you can test various user, date, node, etc tokens.

Attached is a screenshot, and a zipped copy of the module itself.

eaton’s picture

FileSize
2.74 KB

Whoops. Here's the module.

eaton’s picture

FileSize
42.34 KB

Whoops. Accidentally let a change to .htaccess for my localhost slip into the patch. Fixed!

eaton’s picture

FileSize
42.44 KB

bangpound, thanks!

The attached tweak of the patch incorporates fixes for several issues spotted by tha_sun in a review in #drupal_dev. Specifically, the $safe_for_html boolean was ambiguously named: it's now called $sanitize, and indicates that a hook_tokens() implementation should sanitize replacement values before returning them. We've also removed that flag from the token_replace() function itself -- only modules that call token_generate() themselves, and perform their own sanitization of the individual token values, should ever request 'raw' unsanitized tokens. Pathauto is an example of that, but we don't want people blindly calling token_replace() and setting it to FALSE.

eaton’s picture

FileSize
42.44 KB

And finally, because you can never have enough pointless minor re-rolls of a monster patch, a corrected version of the regex we're using to match tokens. Again, thanks to tha_sun for spotting ways to make it cleaner and more accurate.

apaderno’s picture

+    foreach($tokens as $name => $original) {
+      if ($name == 'cid') $replacements[$original] = $comment->cid;
+      elseif ($name == 'nid') $replacements[$original] = $comment->nid;
+      elseif ($name == 'uid') $replacements[$original] = $comment->uid;
+      elseif ($name == 'pid') $replacements[$original] = $comment->pid;
+      elseif ($name == 'hostname') $replacements[$original] = check_plain($comment->hostname);
+      elseif ($name == 'name') $replacements[$original] = check_plain($comment->name);
+      elseif ($name == 'mail') $replacements[$original] = check_plain($comment->mail);
+      elseif ($name == 'homepage') $replacements[$original] = check_url($comment->homepage);
+      elseif ($name == 'title') $replacements[$original] = check_plain($comment->subject);

Rather than having that big IF-ELSEIF-ELSE statement, wouldn't a SWITCH statement be better? The IF-ELSEIF branches are currently on a single line, when they should be on three lines.

eaton’s picture

Rather than having that big IF-ELSEIF-ELSE statement, wouldn't a SWITCH statement be better? The IF-ELSEIF branches are currently on a single line, when they should be on three lines.

Yes. Converting all the ifelse single-line stuff to case statements is one of the major to-dos at present. We're hoping for feedback from webchick or dries on the general direction that's been taken before diving into all the syntax cleanup stuff.

drewish’s picture

please see my comment on #130 about the case statement. just put the freaking curly braces around the ifelses.

eaton’s picture

Status: Needs work » Needs review
FileSize
47.88 KB

drewish, after a bit of careful examination I think the case statements might actually be cleaner to read. I did a bunch of side-by-side comparisons, and it feels a lot more consistent and less cluttered, in part because there's less variation between the one-liners and the two-line token implementations (format a variable, then set it, or load an object first, etc.)

I've gone through and switched everything to switch/case, and ensured that all the tokens in need of sanitization respect the $sanitize flag.

In addition, chained tokens like [node:author:name] have reasonable fallbacks. [node:created] is equivalent to [node:created:medium], for example, and does NOT trigger a module_invoke_all(). That's probably a good guideline - if your module adds a 'chaining point' for a given token, it should implement a reasonable default token as well.

Setting this back to 'needs review' - there are some important TODOs but the code as it stands is much much cleaner than it was a few days ago.

eaton’s picture

FileSize
6.07 KB

It doesn't include the help pointing to the proper tokens yet, but this quickly-rolled patch against system.module converts the 'display message to user' and 'send email' actions to use tokens rather than custom-building replacement values.

It eliminates a hundred lines of code and dramatically expands the number of replacements available in those actions -- sending out an email to the person who posted a comment, and the person who posted the original node, letting them know that they might want to chat with each other? Easy to do now, using nothing but tokens and trigger.module.

It's an example of why centralizing our placeholder replacement code can result in much simpler and easier to maintain code in other areas.

dww’s picture

Status: Needs review » Needs work

@eaton, greggles: way to go with the re-write -- this all looks amazing! Just briefly catching up (been slammed with flu and various personal crises, pretty out of the loop these days).

I'm still worried by one lingering issue, which is exactly the same thing I raised in #38 above. $sanitize for what context?

For example, here's some relevant lines of the patches from #164 and #166:

#164:

function node_tokens($type, $tokens, $data, $sanitize = TRUE) {
  $replacements = array();
  ...
        case 'title':
          $replacements[$original] = $sanitize ? check_plain($node->title) : $node->title;
          break;
  ...
}

#166:

function system_message_action(&$object, $context = array()) {
  ...
-  if (isset($node) && is_object($node)) {
-    $variables = array_merge($variables, array(
-      '%uid' => $node->uid,
-      '%node_url' => url('node/' . $node->nid, array('absolute' => TRUE)),
-      '%node_type' => check_plain(node_type_get_name($node)),
-      '%title' => filter_xss($node->title),
-      '%teaser' => filter_xss($node->teaser),
-      '%body' => filter_xss($node->body),
-      )
-    );
-  }
-  $context['message'] = strtr($context['message'], $variables);
+  $context['message'] = token_replace($context['message'], $context);
  ...
}

#164 assumes that "sanitized" == check_plain(). If I'm composing a plain text email, I don't want HTML-escaped "plain" text, I want filter_xss(). (Actually, in plain text email, I want raw text, but whatever).

The point, again, is that only the token consumer can possibly know what "safe" means for the context that the token is being used in.

Instead of passing a $sanitize flag to token_replace(), I'd say that the caller either needs to pass a context string (might get clumsy, but something like 'html', 'plain text', 'url', etc), or it should pass a callback to use to sanitize the raw values. Or, something smarter you people figure out. ;) But, this one-sanitize-fits-all approach can't possibly work.

Rock onwards!
-Derek

eaton’s picture

Status: Needs work » Needs review

#164 assumes that "sanitized" == check_plain(). If I'm composing a plain text email, I don't want HTML-escaped "plain" text, I want filter_xss(). (Actually, in plain text email, I want raw text, but whatever).

Then we should go to filter_xss for the standard sanitization.

The point, again, is that only the token consumer can possibly know what "safe" means for the context that the token is being used in.

Instead of passing a $sanitize flag to token_replace(), I'd say that the caller either needs to pass a context string (might get clumsy, but something like 'html', 'plain text', 'url', etc), or it should pass a callback to use to sanitize the raw values. Or, something smarter you people figure out. ;) But, this one-sanitize-fits-all approach can't possibly work.

Our problem is twofold, but the solution (fortunately!) is really simple. First, yes -- we should be using filter_xss rather than check_plain() for the reasons you mention above. We should assume that someone who uses an HTML-filled token and wants plaintext will run it through their own scrubbing functions, as in the case of HTML-to-plaintext conversion for email. I'll fix that and re-roll.

Second, I need to clarify the intent of the $sanitize flag. In newer versions of the patch, there is no way to pass $sanitize into the basic token_replace() function. If you use token_replace(), you will get sanitized tokens back. Period, end of story. The flag itself is only passed into token_generate(), the internal function that wraps module_invoke_all('tokens'...). It's intended specifically for the case you describe: a module wants to get back all the token replacements and do its own sanitization/scrubbing/etc on the array of replacements. That's what Pathauto does currently.

One of the biggest problems that we faced in Token 1.x was the fact that token-providing modules were required to sanitize input, BUT provided [token-name-RAW] variations of each of their tokens so that pathauto could use them without choking. The 'raw' tokens were unsafe in every other location, but necessary when modules did their own custom scrubbing on a per-token basis. The $sanitize flag just hides that distinction from end users entirely, so that there is no confusion between 'A place where raw token values should be used, and a place where scrubbed ones should be used.'

eaton’s picture

Also, in earlier discussions I had argued for a passed-in callback function to be used for sanitizing -- i.e., token_replace($text, $data, $callback_function) -- which would loop over the tokens and alter them after token generation. In #34, though, quicksketch noted that we'd still need the $sanitize flag because some tokens -- like [node:body] -- are dependent on the input filters used on the node. check_markup() has to be done while the token-generating code is actually holding onto the proper node, comment, etc. That can't be reconstructed in a post-token-generation scrubbing function, so we would still need some sort of $safe flag to indicate whether raw values or sanitized/processed values should be returned.

Since both a callback and a $safe flag would be necessary anyway, we dropped the definable callback function. If anyone really needs to do their own scrubbing, they can also do the following:

  // Find all [type:token] style patterns in the text.
  $token_list = token_scan($text);
  // Build the replacement values; pass FALSE instead of TRUE for raw data.
  $replacements = token_generate($token_list, $data, TRUE);

  $tokens = array();
  $values = array();
  foreach ($replacements as $token => $value) {
    $tokens[] = $token;
    $values[] = my_scrubbing_function($value);
  }

  return str_replace($tokens, $values, $text);
dww’s picture

Status: Needs review » Needs work

I'm not sure I buy it. ;) You're saying: "either you always get filter_xss()'ed token values, or you have to scrub your own through various tricky means". Even if I accept that, this still needs work to get all the sanitized tokens using filter_xss().

The example of [node:body] you point out illustrates why #169 won't work -- just iterating over the results of token_generate(), I don't actually have the data I need to call check_markup() properly. Plus, check_markup() is still providing "safe" data only for an HTML context.

Seems like you're proposing that all tokens go raw -> HTML -> something else if you need it. Sure, you *can* grab raw tokens directly, but in many cases, that's not good enough to get what you need, and you have to go through the intermediate state of HTML before you can sanitize it for your desired context.

Maybe I should give up, and let the above be Good Enough(tm) for most cases. But, it still seems like the best approach (if a bit clumsy) is to pass a $string_context flag into token_replace(), and pass that on down the chain, and then token providers can inspect the context and properly convert their values into something that's safe/appropriate for the requested context. Aside from being ugly, bloating each token producer's code, and not being able to think up all the right contexts ahead of time, what's wrong with that? ;)

dww’s picture

Meta: not all "scrubbing" involves safety. The deeper problem is encoding strings properly for the context they're being used. It's more about escaping the data properly for the desired context than a simple question of "safe" vs. "unsafe". If I'm using a string containing an ampersand as the text inside an <h3>, it needs to be escaped differently than the same string as the subject in an email, which is different from how it needs to be escaped as an element in a URL, which is different from how it needs to be escaped when used in a SQL query. Aside from the question of malice, even a "safe" string can be broken if you always encode it one way and try to use it in different contexts.
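
A quick hypothetical snippet to illustrate (none of this is in the patch; it's just the same raw string encoded for different contexts):

<?php
$raw = "Miles O'Brian & Co.";

$in_html = check_plain($raw);         // Inside an <h3>: & becomes &amp;.
$in_url  = rawurlencode($raw);        // URL path element: ' and & get percent-encoded.
$in_mail = mime_header_encode($raw);  // Email subject header encoding.
// In a SQL query you wouldn't escape at all -- you'd use a placeholder:
// db_query("SELECT nid FROM {node} WHERE title = :title", array(':title' => $raw));
?>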

dww’s picture

Last point in favor of a $string_context flag, then I'll slither back into bed for the nap I so desperately need: only the token consumer knows the context a token is going to be used in, but the token producers know the best way to escape/filter/encode their tokens for any given context.

eaton’s picture

Status: Needs work » Needs review
FileSize
48.95 KB

As discussed above, here's a re-rolled version of the token patch itself with filter_xss used where appropriate, rather than check_plain(). In addition I've added proper PHPDoc @file blocks and "Implement hook_foo()" bits to the module.tokens.inc files, and various whitespace/coding standards issues.

Thanks, dww!

eaton’s picture

Last point in favor of a $string_context flag, then I'll slither back into bed for the nap I so desperately need: only the token consumer knows the context a token is going to be used in, but the token producers know the best way to escape/filter/encode their tokens for any given context.

I'm still a bit on the fence with this one, but I'm not sure what other contexts might arise that could be reasonably accounted for in this way.

  • 'totally unsanitized' is handled by modules calling token_scan() and token_generate() themselves.
  • 'safe for display in HTML' is handled by using filter_xss as is the case in the current patch.
  • 'plain text' is one that we could hypothetically provide, but it could also be handled by generating HTML-sanitized replacements and scrubbing the final $text with drupal_html_to_text(). Am I missing something?
  • 'url safe' is another potential case, but pathauto currently handles that by going raw and processing everything itself. What constitutes a 'safe' string for the URL is something that pathauto has its own elaborate settings for; in the past, attempts by other developers to make URL-safe tokens have collided with Pathauto's own internally consistent mechanisms.

Are there others I'm missing? My fear is that we could easily turn this into the equivalent of the theming system... I definitely do see your point about 'contextually appropriate output' not just being a matter of xss filtering. That's one of the reasons I'm hoping that we can keep token's focus on a couple of key use cases: generating small snippets of text for URL aliasing, "Mail merge" style canned email messages, and so on.

The single biggest hurdle we've run into with the kind of approach you're talking about is still the 'formatted output' issue -- i.e., $node->body rendered with $node->format. When it comes to a better solution than the current one, I'm stumped. If there are better ideas, I'm all up for them, but the creation of global tokenizing contexts smells dangerous -- will we offer extensible contexts, like we do for node build modes? Hmmmm. This is precisely the point that this issue circled around for a year or so, and I'm very skittish about trying to make it any 'smarter' than it has to be...

Plus, check_markup() is still providing "safe" data only for an HTML context.

Are there specific use cases we can think of that would run into problems with this approach? If so, are those the use cases that we want to attempt to solve at this point? Automated mail, messages to display to the user, auto_nodetitle, pathauto, Rules... I know that Rules is a definite edge case (and I'm not comfortable with its use of token as a substitute for object introspection, to be honest). But would it need anything beyond what the current patch provides?

eaton’s picture

FileSize
54 KB

A conversation with sun and a careful review of the existing tokens used in the user module revealed that we neglected localization. (Whoops!) This version of the patch includes an optional $language param for token_replace(). It's passed along to hook_tokens() implementations, and used when formatting dates and generating URLs. It's optional, and should only be used when the generated text is intended for a language other than the one used by the current user. (For example, User A does something that triggers a mail to User B -- and User B has chosen the 'french' locale. User B's language should be passed into token_replace() so that their dates and other strings are formatted natively.)
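
For example, something like this (a sketch; $uid_b, $node, and $mail_template stand in for whatever the triggering code has on hand):

<?php
// Mail User B in their own locale, not the current user's.
$recipient = user_load($uid_b);
$language = user_preferred_language($recipient);
$body = token_replace($mail_template, array('node' => $node, 'user' => $recipient), $language);
?>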

Also snuck in support for [node:body] and [node:summary], which are useful for action-driven mails on node posts, as well as [user:confirm-cancel-url] and [user:password-reset-url].

apaderno’s picture

Rather than using a long list of parameters (which can change a lot in this phase of development), it could be better to use an array of options, as url() accepts. In this way, the function's formal parameters would not change, modules would not need to change each of their calls to the function, and the code of the function could assign a default value to all its options with a code line like:

  $options += $default_options;
eaton’s picture

Kiam, which case are you talking about?

apaderno’s picture

My "rather than using a long list of parameters" should be read as "rather than using list of parameters of variable length" as the list of parameters is not absolutely long.
What I was referring is the parameters list of token_replace($text, array $data = array(), $language = NULL), token_generate(array $raw_tokens, array $data = array(), $language = NULL, $sanitize = TRUE), and token_match_prefix(array $tokens, $prefix, $delimiter = ':') as reported in the last attached patch, which could be rewritten as token_generate(array $raw_tokens, array $options = array()). In this way, if there would be the need of another parameter, the function declaration would still be token_generate(array $raw_tokens, array $options = array()).

It's just an idea; as we are still in the prototyping phase, the function parameters could change, but the interface would stay constant from the beginning.
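
To illustrate, the suggestion amounts to something like this (a hypothetical signature, not the patch):

<?php
function token_generate(array $raw_tokens, array $options = array()) {
  // Unspecified options fall back to their defaults.
  $options += array(
    'data' => array(),
    'language' => NULL,
    'sanitize' => TRUE,
  );
  // ... generation then reads $options['data'], $options['language'], etc.
}
?>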

eaton’s picture

It's just an idea; as we are still in the prototyping phase, the function parameters could change, but its interface would be constant from the beginning.

Actually, we're getting past the 'prototyping' stage. We're mostly just polishing things and waiting for feedback from Dries or webchick. The current patch, with some minor corrections, will be used as Token Module for D7 if the patch isn't committed to core.

Some use of $options may be useful, but some of the parameters -- especially $data -- are technically optional but frequently needed. If there are uses for additional options, I think it might be worth making the switch but I think we're on the edge right now.

Thoughts from any others?

sun’s picture

Status: Needs review » Needs work
+ * Find all tokens in a given string and replace them with appropriate values.

Replace all tokens in a given string with appropriate values.

+ *   The text containing replacable tokens.

A string potentially containing replaceable tokens.

+ *   An array of keyed objects. For simple replacement scenarios 'node', 'user',

Comma after scenarios? Not sure, but it reads strangely to me as a non-native speaker.

+ *   the value. Some tokens types, like 'site', do not require any explicit

"tokens" minus "s".

+ * @returns
+ *   An associative array of discovered tokens, grouped by type.
+ */
+function token_scan($text) {

"@returns" minus "s".

+  preg_match_all('/\[([^\]:\ ]*):([^\]\ ]*)\]/', $text, $matches);
+
+  $tokens = array();
+
+  $raw = $matches[0];
+  $type = $matches[1];
+  $token = $matches[2];
+
+  for ($i = 0; $i < count($raw); $i++) {
+    $tokens[$type[$i]][$token[$i]] = $raw[$i];

That's quite complicated code; I thought I had found some issues with it, but reconsidered after thinking about preg_match_all() internals and such. Some comments wouldn't hurt in here. Also, the assignment of $raw seems to be for cosmetic purposes only.

Inline API additionally ensures (in some cases) that we're not matching a newline -- that might be another useful addition to both (negated) PCRE patterns, since, I believe, we expect tokens to be on the same line.
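
For example, with the cosmetic assignments dropped and comments added, the loop could read like this (behavior unchanged):

<?php
function token_scan($text) {
  // Match tokens of the form [type:name]. Group 1 is the type (no ']', ':',
  // or space allowed); group 2 is everything up to the closing ']'.
  preg_match_all('/\[([^\]:\ ]*):([^\]\ ]*)\]/', $text, $matches);

  $tokens = array();
  // $matches[0] holds the raw tokens ('[node:title]'); $matches[1] and
  // $matches[2] hold the captured type and name, index-aligned with it.
  for ($i = 0; $i < count($matches[0]); $i++) {
    $tokens[$matches[1][$i]][$matches[2][$i]] = $matches[0][$i];
  }
  return $tokens;
}
?>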

+ *   the value. Some tokens types, like 'site', do not require any explicit
...
+function token_generate(array $raw_tokens, array $data = array(), $language = NULL, $sanitize = TRUE) {

"tokens" minus "s" once more.

+ * @param $sanitize
+ *   A boolean flag indicating whether the replacement values should be sanitized
+ *   for display on an HTML page. Defaults to TRUE.

s/on an HTML page/for HTML/.

+ * @returns

Like above. ;) Oh, some more times -- I'll stop repeating this from here.

+/**
+ * Given a list of tokens, return those that begin with a specific prefix.
...
+function token_match_prefix(array $tokens, $prefix, $delimiter = ':') {

This function could use an additional description about its purpose - i.e., when and where it's supposed to be used. (I know, so please don't explain in the issue ;).

+ * @param $reset
+ *   Rebuild the internal cache of Token metadata. Defaults to FALSE.
+ * @returns
+ *   An associative array of token information, grouped by token type.
+ */
+function token_info($reset = FALSE) {
+  $data = &drupal_static(__FUNCTION__, array(), $reset);
+  if (empty($data)) {

uh oh - $reset arguments are completely gone in D7. Also, we want to use drupal_static() without the second argument here, and just test for !isset($data)... while $data could also use a better variable name; perhaps $info or $tokens. :)

+files[] = comment.tokens.inc

I raised this in IRC already, and please don't do anything about this until there is a core committer opinion about this particular issue: I'd prefer to have those files named $module.token.inc (singular). Reasoning? All corresponding functions are singular, and "token", in singular, builds a direct relationship to Token API.

+      switch ($name) {
+
+        // Simple key values on the comment.
+        case 'cid':
+          $replacements[$original] = $comment->cid;
+          break;
+        case 'nid':

1) No blank line at the beginning of switches.

2) A blank line between every 'case' in switch control structures. (Only omit in case of a fall-through)

+function comment_tokens($type, $tokens, $data, $language = NULL, $sanitize = TRUE) {
...
+        case 'mail':
+          $replacements[$original] = $sanitize ? check_plain($comment->email) : $comment->email;
+          break;

According to my latest DB and my latest comment.module, that's $comment->mail now.

+function comment_tokens($type, $tokens, $data, $language = NULL, $sanitize = TRUE) {
...
+        case 'url':
+          $replacements[$original] = url('comment/' . $comment->cid, array('absolute' => TRUE, 'fragment' => 'comment-' . $comment->cid));
+          break;
+        case 'edit-url':
+          $replacements[$original] = url('comment/edit/' . $comment->cid, array('absolute' => TRUE));
+          break;

Not sure whether we shouldn't set up kind of a "token prefix standard" here... i.e. 'url' + 'url-edit' + ... Most probably, that's bikeshedding, so you can just ignore. ;)

+function comment_tokens($type, $tokens, $data, $language = NULL, $sanitize = TRUE) {
...
+        case 'created':
+          if (isset($language)) {
+            $replacements[$original] = date_format($comment->timestamp, '', NULL, $language->language);
+          }
+          else {
+            $replacements[$original] = date_format($comment->timestamp);
+          }
+          break;

Hmm... can't we pass $language inline in either case? I.e. (isset($language) ? $language->language : NULL)

+  // Basic ids, foreign keys, and metadata for nodes
+  $data['tokens']['node']['nid']['name'] = t('Node ID');
+  $data['tokens']['node']['nid']['description'] = t('The unique ID of the node.');
+
+  $data['tokens']['node']['vid']['name'] = t('Revision ID');
+  $data['tokens']['node']['vid']['description'] = t("The unique ID of the node's latest revision.");
+
+  $data['tokens']['node']['tnid']['name'] = t('Translation set ID');
+  $data['tokens']['node']['tnid']['description'] = t('The unique ID of the original-language version of this node, if one exists.');

1) I think the leading comment is superfluous (in all instances).

2) I don't understand why we use this lengthy/duplicating way to define tokens? I.e.:

+  $data['tokens']['node']['nid'] = array(
+    'name' => t('Node ID'),
+    'description' => t('The unique ID of the node.'),
+  );
+  $data['tokens']['node']['vid'] = array(
+    'name' => t('Revision ID'),
+    'description' => t("The unique ID of the node's latest revision."),
+  );
+  $data['tokens']['node']['tnid'] = array(
+    'name' => t('Translation set ID'),
+    'description' => t('The unique ID of the original-language version of this node, if one exists.'),
+  );

And since that still looks like a lot of duplicated code, maybe even:

+  $tokens['nid'] = array(
+    'name' => t('Node ID'),
+    'description' => t('The unique ID of the node.'),
+  );
+  $tokens['vid'] = array(
+    'name' => t('Revision ID'),
+    'description' => t("The unique ID of the node's latest revision."),
+  );
+  $tokens['tnid'] = array(
+    'name' => t('Translation set ID'),
+    'description' => t('The unique ID of the original-language version of this node, if one exists.'),
+  );
+  $data['tokens']['node'] += $tokens;

Why all of that nitpickery? Because I think we are _really_ close to being RTBC.

yched’s picture

About node:body, node:summary:

Note that if the input format is not-cacheable (not many actual examples IIRC, but filter.module has this concept), $node->body[0]['safe'] and $node->body[0]['safe_summary'] are only available after field_attach_view() has run on the $node - see text_field_load() / text_field_sanitize().
(the need for hook_field_sanitize() might actually be debatable after #493314: Add multiple hook for formatters gets in. Not done yet though)

I guess a followup task will be 'expose tokens for all fields on all fieldable entities', right? If so,
- node:body will be redefined (no big deal I guess, or we can remove the hardcoded current version),
- we'll get something like node:body:summary as a duplicate for node:summary (which itself can possibly clash with a field named 'summary').
So if we want tokens for full body and summary right now, I'd suggest we use node:body:summary from the start?

yched’s picture

Well, non-cacheable input formats being pretty rare, I guess this can be left out until the currently hardcoded node:body and node:summary tokens are replaced by the actual 'field API provided' tokens...

eaton’s picture

Status: Needs work » Needs review
FileSize
51.51 KB

Attached is a re-roll that includes the changes sun outlined -- some great catches, thanks! In addition, I've added a term:parent relationship so that taxonomy terms can reference their parents. token_match_prefix() has become token_find_with_prefix() -- I'm not sure it's much of an improvement, but it can be renamed if needed.

yched, thanks for the heads up on the format caching. I'll take another look at those, but I tend to agree that it might be better delayed for patch #2 - general handling of FieldAPI field tokens. I'll be working on profile module integration for D6 as well; if the Profile/FieldAPI conversion doesn't happen in D7 we can roll that inc file into a later patch.

I'll be working on a few unit tests next. At this point, I think we're mostly in need of an architectural review by Dries or webchick. Ping! Ping!

matt2000’s picture

Sorry if this question is answered by a more detailed look at the code but:

How would I go about writing an access control module for tokens? E.g., assuming I have something that allows users to use node tokens in various places -- so, for example, they can do [node:author] -- but I want something that would allow me to prevent them from using [node:author:mail] or certain fields or whatever.

Would I just implement hook_tokens in a lightly weighted module to strip out disallowed tokens? (I always feel like something is wrong when I have to mess with module weights to accomplish a task....) Is there a better way I'm not thinking of?

eaton’s picture

How would I go about writing an access control module for tokens? E.g., assuming I have something that allows users to use node tokens in various places -- so, for example, they can do [node:author] -- but I want something that would allow me to prevent them from using [node:author:mail] or certain fields or whatever.

Would I just implement hook_tokens in a lightly weighted module to strip out disallowed tokens? (I always feel like something is wrong when I have to mess with module weights to accomplish a task....) Is there a better way I'm not thinking of?

An excellent question. At present, there is no provision for access control of tokens themselves. In theory, a token based module could check things like the currently logged in user's access permissions or the access permissions of the user who created a node being used for text replacement.

Any thoughts on how this would best be handled? I'm inclined to think that it isn't really in the scope of the patch, but there might be some easy ways to work around it.

eaton’s picture

FileSize
53.16 KB

Attached is the latest revision of the patch. It includes a basic unit test for token replacement functionality that creates a user and a node, and generates a variety of replacements based on the entity values. It's relatively simple, but we can graft other tests onto it if we'd like. For what it's worth, even the simple test that's included sniffed out one ugly issue -- $language wasn't being passed along properly in one of the module_invoke_all() calls, so date formatting was still ignoring locale overrides.

I'm not sure about the best approach to actually testing the localization issues -- is there a good way for a unit test to generate a new locale with specific time/date formatting settings, then compare against it?

Whitespace and newlines have been cleaned up, the unit test is added... this is definitely ready for review, IMO.

eaton’s picture

FileSize
53.12 KB

Just because we're obsessive, a version with four typos and a missing newline fixed.

Crell’s picture

Eaton, this is seriously cool stuff. :-) I love the recursive sub-token processing.

I do, however, want to comment on the subject of sanitization and the increasingly-long function signature.

You can add auto_nodetitle and auto_username to the list of modules that would need to bypass the default sanitization mechanism. In those cases, tokens may get used in a PHP context; that is, the values are substituted into a string of PHP code to get eval()ed. That's not quite as edge-casey as it seems.

Think of the simple case where you want a username to be set to the first character of the "first name" profile field followed by the first 5 characters of the "last name" profile field. Now imagine you have someone named "Miles O'Brian" sign up, and you want to auto-generate a username. The natural approach would be:

$username = substr('[user:profile:firstname]', 0, 1) . substr('[user:profile:lastname]', 0, 5);

And both of the auto_* modules provide a textarea to do precisely that. However, when your user has a last name of O'Brian the apostrophe in the name gets read as a string delimiter, and you get a parse error. Or if you're really really sneaky, you can use it as a code injection security hole. That's doubleplusungood.

For auto_username, I ran into that precise issue. The solution I ended up with was copying and pasting several hundred lines from pathauto to configure escaping and string stripping so that I could trim out apostrophes (and other characters, since it was easier to copy and paste the whole thing than just a few pieces) and then bypassing the Token API entirely the same way pathauto does. When you're copying and pasting that much code multiple times, it's a sure indication that you probably should be looking for a more generalized solution.

To me, that would mean being able to provide an arbitrary token processing callback; as the calling code I know what sort of processing/filtering/sanitizing/whatever I need to do. The default callback would be filter_xss, but you could provide something else if you wanted. (As a random side note, such callbacks just became substantially more powerful in PHP 5.3 through the use of anonymous functions and closures, which if we do this right should just magically work for users on PHP 5.3. Woo!) Or you could provide a way to configure processing, so that pathauto, auto_nodetitle, and auto_username could all use the same UI code and configuration for character stripping. Either way, there's 3 use cases right there that need more flexibility than "filter_xss or nothing", and arguably the same additional flexibility.

Of course, that line of thinking led me to yet another argument about the token_replace() function, which as of when I was working on auto_username was already quite long. That's when I started harping on eaton to make Token use a processing class rather than a function. :-) The API becomes a lot cleaner when, instead of passing a half-dozen arguments in order into a method, you can create a processing object, configure it in an arbitrary order (with per-configuration sub-configuration via parameters on a method), and then pass it a string to process; arguably, you could even reuse the token processing object if you wanted to.

// Factory function
$processor = token();
$processor->setContext('user', $user);
$processor->setContext('node', $node);
// Default is currently active language
$processor->setLanguage('en');
// Default is filter_xss, pass NULL to get "Raw" tokens
$processor->setFilter('my_custom_callback'); 
// But use a different filter for just the node-based tokens, who knows why?
$processor->setFilter('my_special_callback', 'node'); 

$processed = $processor->process($mystring);
$processed_2 = $processor->process($other_string);

And although I don't do it here, the methods could be made chainable with virtually no extra work. Adding more configuration methods to those above later would then be dead simple, and even possible via subclassing for those edge cases in contrib that need something more than what core provides but don't want to bypass the workflow entirely.

OK, I'm rambling a bit but I hope that finally manages to explain to eaton what I've been attempting to say for the past year-plus. :-)

apaderno’s picture

I think that the use of tokens inside code that must be eval-ed is not such an edge case; I personally use such a thing in a module I maintain, and I can count at least 3 more modules that do the same thing.

chx’s picture

tokens in eval? Good idea. The security team is so bored. Please bring it on.

dmitrig01’s picture

However non-edge-case-y it may be, if it's PHP code it should be in a file, as a rule of thumb. If someone breaks that rule, it shouldn't be our fault and we shouldn't have to deal with them.

$username = substr('[user:profile:firstname]', 0, 1) . substr('[user:profile:lastname]', 0, 5);

If you want to do this, file an issue for auto_username: Accept a function callback for username replacement.

Crell’s picture

Dmitri, I am the author/maintainer of auto_username. :-) The point is that there needs to be some way to allow user-configurability of node titles and usernames and other such token-obvious things that doesn't involve "write a new function somewhere and specify its name in this textfield". These use cases already exist in the wild. They won't go away, so we should try to at least make sure that they won't flame out in security holes.

apaderno’s picture

As far as I know, Rules replaces the tokens present in a text that is then passed to eval(); it does that in two different steps.
In that case, what is the security problem? And what is the difference between using token.module for such thing, and using custom code to replace custom tokens in a text that is passed to eval()?
The modules that replace tokens in a text that is then passed to eval() allow that to users with a restrictive permission; they don't allow that to the anonymous user (if the anonymous user is allowed to set rules, then somebody has given to the anonymous user more permissions than needed, and that is another story).

eaton’s picture

Just to be clear: there is absolutely, positively no way that we're going to make provisions for running token_replace() on PHP code that is subsequently evaluated. If you have questions on why, mail me via the contact form here, or pm me as eaton on freenode.net's IRC server.

Token is intended to allow users who should not otherwise be entering PHP code to generate reusable dynamic text. If they have enough privileges to enter PHP code, they should write the PHP code that generates the text.

If a module needs access to individual tokens in its code, token_scan() and token_generate() can be used -- as noted in http://drupal.org/node/113614#comment-1800900 -- to extract the raw array of tokens and their replacements. Pathauto does the equivalent of this in Drupal 6, as it needs to post-process tokens by truncating them to a specific maxlength.

apaderno’s picture

Effectively, there isn't a need for token.module to provide tokens for a "PHP context"; if I really needed such a thing, I could get the raw tokens and then replace them in the PHP code myself.
token.module can provide the tokens, but it cannot handle all the token use cases.

It is also true that providing tokens and PHP code at the same time is too much; token replacement should be allowed for users without permission to use PHP code, and users with permission to use PHP code would not need tokens (from PHP it is also possible to get the value of a CCK field).
Thanks for pointing that out. I will correct the code I developed.

eaton’s picture

KiamLaLuno, there is a bit of fuzziness there -- an administrator might well want to use a token somewhere, like auto nodetitles, even though they theoretically have permission to use PHP. But the two should never be used *simultaneously* on the same text, as they are at odds with each other. But, yes. What you said!

I'm of the opinion that adding in some kind of 'php context' for tokens would take more code than simply getting/handling the tokens manually in the modules that need them.

eaton’s picture

FileSize
53.95 KB

After talking things over, here's one last stab at it. One major change and one minor. First off, a couple of locations were using filter_xss() on things like node titles, when core normally uses check_plain() on those items. The principle of least surprise should apply with tokens -- things should be filtered to the format users are accustomed to seeing them in. So, filter_xss() for freely entered user text like taxonomy term descriptions, and check_plain() for things like node titles and term names. We follow the practices set elsewhere in core.

Second, the $language and $sanitize parameters have been replaced by a single $options parameter that's passed through the entire operation. It gives a bit more breathing room and, as some have suggested, makes the function signature a bit less daunting. Only a tiny fraction of tokens needed the language support. PHPDoc has been updated to reflect it, and one extra goodie snuck in with the $options. It's now possible to pass in $options = array('callback' => $function_name), and that function will get a chance to alter the array of built tokens after they're generated. This isn't commonly needed, but for some modules like Pathauto, it gives them a chance to truncate all tokens to a specific length before the final path is generated. It's something that has come up a couple of times, and this is a relatively easy way to provide it without making the architecture clunky.
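
A minimal sketch of that callback option (the function name and truncation length are illustrative, not from the patch; it assumes the callback receives the replacements array by reference):

// Hypothetical callback: alter the generated replacements before they
// are substituted into the text.
function mymodule_trim_tokens(&$replacements, $data, $options) {
  foreach ($replacements as $token => $value) {
    $replacements[$token] = drupal_substr($value, 0, 100);
  }
}

$path = token_replace($pattern, array('node' => $node), array('callback' => 'mymodule_trim_tokens'));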

That final function also allows one specific extra -- the ability of a module to generate a bunch of 'standard' tokens, then insert several special use-case-specific tokens into the mix as well, even though they should never appear in other token-enabled modules. One example is the user mail forms that admins use to set up 'Click here to reset your password' emails. Those tokens should only appear in one place, and this lets us take them out of the global bin of tokens while still allowing the admin mails to use the power of the token system.

I feel safer now that the [account-cancellation-confirm-url] is stripped out of user.tokens.inc.

As always, the biggest need here is an architectural review and feedback. I would say that it's essentially RTBC at this point, but we haven't gotten committer feedback on the direction we've taken in the last couple of weeks.

webchick’s picture

Just a note that this is one of my top 3 patches that I am watching, but this week is pretty rough for deep architectural reviews because I'm on the road (and Dries is as well, actually). My hotel is unfortunately located right next to a thriving jackhammer farm (apparently) that likes to get nice and busy around 10pm, so my head is splitting and I can't possibly do this tonight. :(

I will say that I've been monitoring the feedback and comments from other reviewers since taking the latest direction (well, maybe not the latest latest ;)) and am really encouraged. Really great questions have been raised, and really great answers have been returned, and the APIs tweaked accordingly. So I'm pretty confident that when I do finally get some uninterrupted, un-jackhammered time to look at this the results will be positive. :)

eaton’s picture

FileSize
54 KB

Whoops. One file's changes didn't make it into that version of the patch. System.tokens.inc had a missing variable declaration, and the tests were still using filter_xss() for one of the comparisons. It didn't show up as a failure because filter_xss() and check_plain() produced identical results on the test data, but a bug it was.

fago’s picture

>As far as I know, Rules replaces the tokens present in a text that is then passed to eval(); it does that in two different steps.

If you activate PHP and token evaluation then you can really use both, yes. But you don't need tokens to write dangerous PHP code...

>I know that Rules is a definite edge case (and I'm not comfortable with its use of token as a substitute for object introspection, to be honest). But would it need anything beyond what the current patch provides?

I know, I don't want to "misuse" tokens for that either. I'm currently working on the rules2 API and I'm thinking about introducing something like "DataSelectors" -- which use metadata of objects to select suitable properties. Do you know Yahoo Pipes? There you can select e.g. the item.date to fill it in somewhere; the same could work for rules.
Thus I could rely on a kind of "hook_data_info" providing generic metadata, as I proposed above too. Actually the token chaining mechanism comes pretty close to what I had in mind, though token deals only with text. I'd like to also deal with other data types, so you could use the mechanism to fetch any property.

So I really don't want to take over this issue with my use case and hold this great stuff up. However, it would be sad if we ended up with two similar hooks providing similar metadata (or even 3, taking the rdf case into account). As noted, the rdf patch mentioned above suggests a similar hook (see #107) which could instead build upon the same metadata and use the system to get the actual values.

So how could this work?
* hook_tokens(): Add the property 'type', describing the actual type of the token as distinct from the referenced entity. Example: $comment->nid could be of type 'number' or 'integer', but references 'node'.
* Thus tokens would have a "type", but the returned value would have to be textual, so this fits primitive types only.

Consequences:
* Something like 'date' should be a 'type', as conceptually it's not a reference -- so the metadata is right. But there is no reason not to use the chaining mechanism for those types too! Thus the same concept could also be used to provide tokens for other types, such as formatting numbers, e.g. as "0.000,00 €".
* So values of the same type should always be returned the same way, e.g. dates always as DATETIME or TIMESTAMP. I think that would be good to ensure anyway, so that further date tokens can rely on it.

Thus hook_tokens() would provide sufficient metadata, and token_generate() could be used to get the values. It would be similar to my proposed hook_data_info, but would only really care about token values. Perhaps it would even make sense to rename hook_tokens() to a more generic name like hook_data_info() (or whatever) -- thus starting to build a hook that collects metadata for objects/entities. Use case: add a mapping to RDF properties. Need another use case? Read http://drupal.org/project/fieldtool.

catch’s picture

Entity metadata is a good thing to have, but we already have that in hook_fieldable_info() and a patch to extend that to hook_entity_info() here #460320: Standarized, pluggable entity loading (nodes, users, taxonomy, files, comments) - so no need to do it here.

fago’s picture

FileSize
53.63 KB

Interesting patch, thanks for that pointer. However, hook_entity_info() deals with info about Drupal db-entities, but this metadata should also be available for data not stemming from the db, possibly remote data or even some temporary data structure. Thus keeping this in a separate hook makes sense imo.

Attached is an updated patch which implements the basic change: distinguish between 'type' and 'references' for dates (no big change). Also, I fixed the simpletest getInfo() to not use t() any more.

If we proceed with that approach we could start adding metadata like 'type' => 'number'. Though supporting further 'number' tokens wouldn't be feasible with the current concept, as it would require adding the usual token_find_with_prefix() call for each id, thus requiring quite a bit of duplicated code. If we want to support that, we could improve things by expanding the tokens for types automatically. We can do so because we can get the actual data easily ourselves -- in contrast to types being referenced using 'references'.

* I noted that token_find_with_prefix() has a nice parameter, $delimiter, but it doesn't make much sense when hook_tokens() implementations don't pass it. We could add it to $options, change token_find_with_prefix() to work with $options, and pass it through hook_tokens().

* It'd be nice to support modules getting related entities using that data. We could achieve that by just getting the id for [node:author]. But of course this would interfere with a "default token" system.

yched’s picture

re fago #203 -- "However hook_entity_info() deals with info about drupal db-entities"
I don't think so. Field API was built with the idea that non-local-db 'entities' could be fieldable too, so hook_fieldable_info() is not restricted to local entities, and I don't think #460320: Standarized, pluggable entity loading (nodes, users, taxonomy, files, comments) changes that -- it just adds an *optional* loader feature for local entities.

Status: Needs review » Needs work

The last submitted patch failed testing.

eaton’s picture

Status: Needs work » Needs review

Attached is an updated patch which implements the basic change: distinguish between 'type' and 'references' for dates (no big change). Also, I fixed the simpletest getInfo() to not use t() any more.

I like the 'type' change quite a bit, as it maps it more clearly to the 'token types' that are also managed. That's what the idea behind that array key is, really -- to capture the idea that the token being listed in the info hook is actually a more complex 'entity' rather than a simple textual token. In the current implementation, hook_tokens() implementations do the heavy lifting of chaining those complex types to provide sub-tokens, but perhaps in the future it could be implemented using a purely metadata driven approach like the one you describe.

For right now, performance is one of my biggest concerns. Already we're doing a lot of work on replacement -- the win of only generating the tokens we care about offsets a lot of that right now, but I'm hesitant to also initialize a large metadata system when the simpler "dumb" approach satisfies the majority of the use cases we've discussed.

I think that if the 'type' key on the metadata is workable, it could make it possible for modules to develop cleaner UIs: letting users choose from a dropdown list of tokens rather than typing them by hand, for example, then using token_generate() on a collection of code-built tokens rather than asking for entered text. But I'm getting ahead of myself...

If we proceed with that approach we could start adding metadata like 'type' => 'number'. Though supporting further 'number' tokens wouldn't be feasible with the current concept, as it would require adding the usual token_find_with_prefix() call for each id, thus requiring quite a bit of duplicated code. If we want to support that, we could improve things by expanding the tokens for types automatically. We can do so because we can get the actual data easily ourselves -- in contrast to types being referenced using 'references'.

I'm not sure about the number/date stuff, at least at present: the moment we start doing typing for all tokens (not just the ones we need it for wrt chaining purposes), we're treading into the dangerous territory of token being an object metadata storage system. And that, honestly, is a horrifying thought. It wasn't at all designed or optimized for that and it will suck at it, no matter how crafty we think we are. If we need that information, we need to put our shoulders into the work of making Drupal entities real PHP objects with typed properties and toString() methods on everything -- then use tokens as a light user-friendly wrapper for inserting the 'known' data into textfields. I'd definitely get behind that and would love to help, but Token (IMO) should tie into a system like that, not be that system.

* I noted that token_find_with_prefix() has a nice parameter, $delimiter, but it doesn't make much sense when hook_tokens() implementations don't pass it. We could add it to $options, change token_find_with_prefix() to work with $options, and pass it through hook_tokens().

Yeah, it's not quite like the 'prefix' and 'suffix' stuff from Token 1.x in contrib. The 'delimiter' is just there so that if some token providers want to do something like [node:field-bar] they can split based on the dash rather than a colon, while still taking advantage of the _find_with_prefix() helper function. I'm okay with removing it, but I'm not sure about making it a configurable option at the caller level...
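
For example, a sketch of the behavior described, assuming $delimiter is the helper's third parameter:

// Split [node:field-bar] style tokens on the dash instead of the colon.
$tokens = array('field-bar' => '[node:field-bar]');
$found = token_find_with_prefix($tokens, 'field', '-');
// $found is now array('bar' => '[node:field-bar]').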

* It'd be nice to support modules getting related entities using that data. We could achieve that by just getting the id for [node:author]. But of course this would interfere with a "default token" system.

Yeah, that's essentially what hook_tokens() implementations do right now when they find tokens that start with 'author:' and load the node. Moving it to an 'automatic' operation would require more centralization of that code and make the token system itself a lot heavier, while veering into the metadata system caveats above. I'm very hesitant to build 'cool bits' into this that would drag us into "Tokens as the de facto metadata discovery system." We need to solve that problem the right way -- token is currently about as close to the edge as I'm comfortable with. Any farther and I'd say, 'Let's scrap it and convert Drupal to objects.'

The patch that you posted earlier didn't apply properly so I've re-rolled it with the changes you described. I'll also be merging in the actions and user_mail() conversions, so that some parts of core actually use it. Patch to come later this evening.

eaton’s picture

FileSize
62.38 KB

Attached is the patch discussed above. In addition to the changes that have been discussed, it converts the system email, display-message-to-user, and redirect-to-url actions to use token_replace() rather than a hundred lines of custom token-building code.

The user notification mails (password reset, welcome, etc.) are not yet converted to tokens, but they have additional wrinkles due to translation and security issues that I'm looking through.

fago’s picture

>I like the 'type' change quite a bit, as it maps it more clearly to the 'token types' that are also managed. That's what the idea behind that array key is, really -- to capture the idea that the token being listed in the info hook is actually a more complex 'entity' rather than a simple textual token.

But that's not the case for a 'date' -- a date is simple. Thus I suggest using 'type' for simple values and 'references' to point to complex values. The current chaining mechanism does fine with that, as it doesn't really rely on this metadata at all, so you can use it for both 'type' and 'references'. I see that this distinction is unnecessary for token, but I feel it's cleaner that way.

>The 'delimiter' is just there so that if some token providers want to do something like [node:field-bar] they can split based on the dash rather than a colon, while still taking advantage of the _find_with_prefix() helper function.

I see, that makes sense!

>I'm not sure about the number/date stuff, at least at present: the moment we start doing typing for all tokens (not just the ones we need it for wrt chaining purposes), we're treading into the dangerous territory of token being an object metadata storage system.

Agreed, the current way of chaining suffices for dates and is simple. If one really wants one could use the same mechanism for numbers too.
But not related to that -- adding 'type' => 'number' for numbers and assuming 'text' as the default would allow rules2 to make use of that information, so that when you need a number, you can use only number tokens. Analogously it could work for dates.

>If we need that information, we need to put our shoulders into the work of making Drupal entities real PHP objects with typed properties and toString() methods on everything -- then use tokens as a light user-friendly wrapper for inserting the 'known' data into textfields.

You can't type properties with primitive data types, and you can't just provide a class for modularly built objects. Furthermore, the type information alone isn't enough: no label, no description, and no way for modules to add further meta-attributes, e.g. a mapping to rdf. So objects alone would not serve the need for an entity-property-metadata system.

@yched: I had a look at the code and it seems to be quite db-centric, e.g. containing 'base table' attributes, but that doesn't mean we couldn't expand it and add metadata about the properties to it. But if we do so, I fear we start overlapping with hook_token_info(), which already describes entity properties and even references.

Anyway, I don't want to hold up this great patch, so perhaps we should go for the metadata thing in a separate issue.

eaton’s picture

But that's not the case for a 'date' -- a date is simple. Thus I suggest using 'type' for simple values and 'references' to point to complex values. The current chaining mechanism does fine with that, as it doesn't really rely on this metadata at all, so you can use it for both 'type' and 'references'. I see that this distinction is unnecessary for token, but I feel it's cleaner that way.
...
But not related to that -- adding 'type' => 'number' for numbers and assuming 'text' as the default would allow rules2 to make use of that information, so that when you need a number, you can use only number tokens. Analogously it could work for dates.

Hmmm. I see. The metadata system was intended almost exclusively for generating online help, and the 'references' link wasn't intended to mark complex objects so much as to mark where chaining could occur, so that the UI could present users with reasonable help about what 'chained tokens' follow something like [node:created:...]. Changing it to 'type' turned it into a reference to other *token* types as opposed to other *data* types.

I can see the value of some way of capturing another data type parameter above that for certain things, but I would want us to be sure that it captures enough useful information to be used in other use cases. (Allowing Pathauto to filter out large formatted text blocks, for example, since those don't make sense in a path.) I have to confess that I'm still not comfortable with Rules' reliance on Token, given the performance and accuracy implications that come with it: it really is being used as an ad-hoc object metadata system in that sense. But that's a side issue.

If we were to add data typing information, making it useful would require identifying quite a few internal data types, like number, currency, formatted text, short text, email, url, path, date, and so on. I'd be cool with addressing that in a follow-up patch connected to the UI/presentation of tokens, where we can all wrestle over the details of what has to be captured in the metadata. Does that sound good?

You can't type properties with primitive data types, and you can't just provide a class for modularly built objects. Furthermore, the type information alone isn't enough: no label, no description, and no way for modules to add further meta-attributes, e.g. a mapping to rdf. So objects alone would not serve the need for an entity-property-metadata system.

This is why I wanted Drupal to be written in C# four years ago. ;-) Seriously, though, I'm not sure why classes can't cover modular objects; node, user, field, currency, formattedString (a string with a format value) and so on are all potential matches for that. You can't type properties that are simple data types, but that's why languages like C# make everything an object... The downside of using PHP is that we need to build up a massive, parallel metadata system to 'fake' that fundamental language capability. I'm not sure that a PHP-based system as expandable and alterable as Drupal benefits enough from pervasive entity metadata to merit the overhead. I've worked on systems that required keeping huge metadata stores and parallel code in sync, and it was hell. But that, I think, is probably a side discussion.

sun’s picture

Status: Needs review » Needs work

Upfront: I would highly suggest deferring the discussion about meta types and the corresponding token handling to another issue. I mostly agree with eaton that we have to solve this at a lower level. Making two or three sub-systems (fields, RDF, tokens, ...) each introduce some meta information, independently from the rest of the system, doesn't really make sense to me. Of course, the discussion and idea make absolute sense, but for now and regarding this patch, it is 100% scope creep IMHO. (Please have a look at the # of follow-ups we're at.)

+ *   - language:
+ *       A language object to be used when generating locale-sensitive tokens.

Applies to all PHPDoc lists in this patch: Please continue the description directly after the colon. Sometimes, we additionally use single-quotes for the key to clarify its string nature, but that doesn't happen consistently.

+  // token found in the source text.
+  // For example, $results['node']['title'] = '[node:title]';

Can we move "For example:" up and have the example snippet on its own line?

+ *   - unsafe:
...
+function token_generate(array $raw_tokens, array $data = array(), array $options = array()) {

I think that it should also be 'sanitize' here?

+ * 
+ *   $data = array(
+ *     'author:name' => '[node:author:name]',
+ *     'title'       => '[node:title]',
+ *     'created'     => '[node:author:name]',
+ *   );
+ *   token_find_with_prefix($data, 'author');
+ *

All code in PHPDoc should be wrapped in:

 * @code
 *   $item = item_load();
 *   $output .= check_plain(t($item['title']));
 * @endcode

See PHPDoc of t() for examples. Indentation of that code is correct.

+  $sanitize = empty($options['sanitize']);

Maybe I need another cup of coffee, but I think that $sanitize gets the opposite of the intended value here.

+  $comment['cid'] = array(
+    'name' => t("Comment ID"),
+    'description' => t("The unique ID of the comment."),
+  );
+
+  $comment['pid'] = array(
+    'name' => t("Parent ID"),
+    'description' => t("The unique ID of the comment's parent, if comment threading is active."),
+  );

Applies to many other occasions in this patch: only use double-quotes where technically required.

Additionally, removing the blank lines between tokens would make the definition code a bit smaller.

+function poll_token_info() {
+  $results = array();
+

Technically, we can skip the array initialization in all hook_token_info() implementations.

Speaking of hook implementations, it would make sense to always have hook_token_info() come before hook_tokens().

Lastly, those first conversions of existing token-alike functionality look plain awesome! Way to go!

apaderno’s picture

+  $sanitize = empty($options['sanitize']);

Maybe I need another cup of coffee, but I think that $sanitize gets the opposite of the intended value here.

When $options['sanitize'] is TRUE, empty($options['sanitize']) returns FALSE; I guess that is the opposite of what it should be (either that, or I need a cup of coffee too :-)).

eaton’s picture

You're totally correct. That's pretty massively wrong. I'm going to fix the $unsafe/$sanitize bit and add a simpletest to ensure that a simple safe/unsafe token gets handled properly. Thanks!
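
For the record, a sketch of the corrected logic, assuming sanitization is meant to default to TRUE when the option is absent:

// Sanitize by default; only skip when the caller explicitly opts out.
$options += array('sanitize' => TRUE);
$sanitize = !empty($options['sanitize']);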

eaton’s picture

Hear that, webchick? I'm making a unit test for that.

And they said I couldn't be trained...

JohnAlbin’s picture

/me puts down some newspaper while waiting for the unit test. ;-)

eaton’s picture

FileSize
62.48 KB

Sun, here's a patch that includes the $sanitize fix, the PHPDoc changes, and the new SimpleTest addition that tests [node:title] in both sanitized and unsanitized mode to ensure we aren't losing the option flag somewhere along the line.

The only part I'm going to poke back at, at least before going through and refreshing the patch, is the quotation marks in t() strings issue. From the coding standards doc:

Drupal does not have a hard standard for the use of single quotes vs. double quotes. Where possible, keep consistency within each module, and respect the personal style of other developers.

The consistency issue is the one that concerns me: there's roughly a 1 to 3 ratio of places where single-quotes in the text require the double-quotes to places where the double-quotes are unnecessary. Wouldn't it be better to use double-quotes as long as we know the text could be edited or changed, so that it's consistent across the *.tokens.inc files?

I'm open to a coin-toss or a ruling either way from a committer. I just know that I was using the only-when-needed approach when I first started the patch, and it got annoying very quickly, switching between singles and doubles whenever I happened to use a possessive when reworking a token description...

fago’s picture

Status: Needs work » Needs review

If we were to add data typing information, making it useful would require identifying quite a few internal data types, like number, currency, formatted text, short text, email, url, path, date, and so on. I'd be cool with addressing that in a follow-up patch connected to the UI/presentation of tokens, where we can all wrestle over the details of what has to be captured in the metadata. Does that sound good?

Yep, that makes sense!

sun’s picture

Status: Needs review » Reviewed & tested by the community

This new implementation is, roughly, what has been outlined in #78.

I could go on with some more nitpicks, but I think it is time to get a "thumbs-up" from core maintainers now.

NancyDru’s picture

subscribing

eaton’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
63.07 KB

After discussion with greggles, drewish, and a number of other developers, we've tweaked the 'user' tokens slightly. [user:...] only works with an actual user object passed in by a module. [current-user:...] generates tokens for the currently logged-in user account. This shift removes the ambiguity about the meaning of [user:...] tokens and makes explaining them a fair bit easier.

Tests have been updated to reflect the change, and everything continues to pass with flying colors.

markus_petrux’s picture

subscribe

eaton’s picture

FileSize
63.59 KB

Discussion with webchick in irc revealed a subtle snafu -- as fago noted earlier, "aliased" token types like [current-user:...] don't really provide sufficient information to display online docs properly. I've added a couple of extra keys to some of the hook_token_info() implementations -- now, token types that are simply aliased versions of other types can have a 'type' => $whatever key. That indicates that tokens from the other type should be shown under its heading, and that another module will handle the 'aliasing'. In addition, token types can have a 'needs-data' => 'foo' key in their hook_token_info() block to indicate that they require an entity of some type in the $data array to function properly.

The 'user' token type requires 'user', for example. The 'current-user' type does not, however, as it uses the current global $user object.
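
A sketch of what such an info block might look like, based on the keys described above (the exact array structure is my reading of this comment, not code quoted from the patch):

function mymodule_token_info() {
  // A normal token type: requires a 'user' entry in the $data array.
  $types['user'] = array(
    'name' => t('Users'),
    'description' => t('Tokens related to user accounts.'),
    'needs-data' => 'user',
  );
  // An aliased type: its tokens are documented under 'user', and the
  // implementing module maps it to the global $user at runtime.
  $types['current-user'] = array(
    'name' => t('Current user'),
    'description' => t('Tokens for the currently logged-in user.'),
    'type' => 'user',
  );
  return $types;
}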

This is just polishing in place to make sure that UX work to present tokens to end users doesn't require another set of patches over here to add extra metadata.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
moshe weitzman’s picture

This has been in re-test for 9 days. Wake up, bot!

Dries’s picture

I missed this patch last week because it was incorrectly marked as 'code needs work', it seems. Drats.

I took a first pass over this. I deliberately chose not to read up on all the comments in this issue because that would give away insight and answers to questions that should be addressed in PHPdoc. I will obviously read all the comments later. Consider this an initial maiden review. While the system is relatively simple and clean, there are a couple of things that confuse me. I've pointed these out below, along with some suggestions for improvement.

- It is unclear why this patch is 'critical'. I suspect we created a false dependency between this patch and other patches.

- Shouldn't we document the token format (e.g. [$type:$token]) somewhere for developers? We seem to assume that people know what tokens are, and what format they take.

- In general, we could do with some high-level documentation. Maybe we should put the API in includes/tokens.inc and add a brief introduction. What is a token? What does it look like? What is the token system? When to use it? How does it work from a high-level view? Any performance or security gotchas? That kind of stuff. I certainly prefer token.inc over token.module for the time being.

- Glad to see we wrote tests. That's great.

- token_replace: typo in documentation: replaements should be replacements. Same typo exists in token_generate().

- token_replace: because $options is optional, it is not clear whether my token will be sanitized by default. Can we update the documentation to be more explicit about the default behavior when it comes to security? The current documentation leaves me wondering.

- token_replace: it is not clear, from the PHPdoc, how useful the callback is. I initially expected the callback to be with the token producer, if you will. The two examples in the PHPdoc gave me a hint at how it could be used, but weren't concrete enough to make me feel confident about the callback's purpose. The callback documentation is missing some depth.

- Shouldn't we document the prefix stuff for developers? When I looked at token_find_with_prefix() I thought 'node' would be the prefix, but it looks like 'author' is? Is 'prefix' the right term? Maybe I had my expectations wrong because of the word 'prefix'. *confused* The API documentation didn't help me.

- token_find_with_prefix() does not explain when and why I'd want to use that function.

- token_info() mentions an optional pointer indicating that the token chains to another set of tokens, but I don't see that in the example. What would that look like? Second, why and when would I use that?

- token_info(): only one token specifies a type. Why is that and what impact does that have? What happens when I specify type 'date'? Not clear by reading the PHPdoc. *confused*

- I suspect a lot of knowledge in this issue is not yet embedded in the PHPdoc. That might get confirmed after reading up on the comments.

- For some modules it makes sense to create a xxx.tokens.inc but for most modules it doesn't, IMO. I won't make a big deal out of that, but I'm not a fan of many small files.

- Sometimes we set $url_options but we don't actually use it. Looks like we might have some dead code. It also seems to be a bit of a pattern in every token function. Not sure if we can do anything about it. None of these patterns have documentation.

- Indentation in modules/upload/upload.tokens.inc seems wrong.

- Would this have a performance impact on the front page and other regular pages? I haven't tested it, and reading the code it wasn't immediately clear yet. I'll have to take a second look at it so I can drill a bit deeper.

Overall, I like this patch, and webchick can (help) drive it home while I'm on vacation. While the system is relatively simple and clean, there are a couple of things that confuse me, and I'd like to get those things fleshed out in PHPdoc.

NancyDru’s picture

Something also came up in the forums that maybe should be addressed. I know the "baggage" philosophy and you may totally disregard this if you choose. There are some tokens currently in use in 5.x and 6.x that are rather popular but seem to be missing in 7.x; amongst these are [author-name] and [author-name-raw]. Should we at least make some attempt to keep the more popular tokens like these? Also, even in 6.x there is a small flaw, which seems to also be there in this patch: [author-name] does not pick up "Anonymous" (or its specification) for user/0.

eaton’s picture

First, a huge thanks to Dries for the review and the detailed attention. I'm skipping over responses to a couple of the syntax/whitespace-related bits and jumping to the detailed feedback. I imagine most of this will be for the sake of webchick and others participating in the thread:

- It is unclear why this patch is 'critical'. I suspect we created a false dependency between this patch and other patches.

IIRC, someone originally set it to critical because it's a dependency for Pathauto in core; something needs to be doing what Token is doing, whether we call it Token or not, for Pathauto to do its work. I agree that this isn't a dependency for any patches other than that. The good news is that the mix of .inc files and so on means that it has almost zero collision with other patches.

- Shouldn't we document the token format (e.g. [$type:$token]) somewhere for developers? We seem to assume that people know what tokens are, and what format they take.
- In general, we could do with some high-level documentation. Maybe we should put the API in includes/tokens.inc and add a brief introduction. What is a token? What does it look like? What is the token system? When to use it? How does it work from a high-level view? Any performance or security gotchas? That kind of stuff. I certainly prefer token.inc over token.module for the time being.

Agreed. To some extent the high-level docs have been held off until the final nature of the system was hammered out; now that we know the basic approach isn't objectionable, I think we can get some brains on the high-level docs that have otherwise been captured in the comment thread of this issue. Also agreed on the .inc vs .module -- as a module it would be an 'invisible' dependency that most users would never see, and only turn on when someone told them to. Right now the token functions themselves are stuck in common.inc. I can move the token-related code to token.inc relatively easily -- the only question is what bootstrap phase it would be loaded in.

- token_replace: because $options is optional, it is not clear whether my token will be sanitized by default. Can we update the documentation to be more explicit about the default behavior when it comes to security? The current documentation leaves me wondering.

Yes, tokens are definitely sanitized by default. I think we got carried away with warnings about what happens when one DOESN'T sanitize, rather than explaining the default.

- token_replace: it is not clear, from the PHPdoc, how useful the callback is. I initially expected the callback to be with the token producer, if you will. The two examples in the PHPdoc gave me a hint at how it could be used, but weren't concrete enough to make me feel confident about the callback's purpose. The callback documentation is missing some depth.

The callback is an edge case, and there was a bit of discussion about its inclusion. The best example of its use is Pathauto -- after tokens are generated, it wants to do additional munging on them before they're used as replacements in the text. Specifically, it needs to transliterate URL-unsafe characters, optionally lowercase everything, chop each token to a specific max length, and so on. One option is for Pathauto to ignore the token_replace() function and use the underlying scan and generate functions itself, then process the resulting array and do its own str_replace() call. Using the callback function -- once it's properly documented -- is a way that could (we thought) make it more straightforward for developers to do that kind of 'additional tweaking' to tokens before they're used.

In the case of our very few actual 'dangerous tokens' (the User module's 'instant password reset url', 'instant account cancellation url', and 'current account password' tokens), this function also allows us to sneak those three tokens into the list of things that will be used for replacement without generating full-fledged tokens for them, ensuring they will never be accidentally used in other emails, messages, etc.

- Shouldn't we document the prefix stuff for developers? When I looked at token_find_with_prefix() I thought 'node' would be the prefix, but it looks like 'author' is? Is 'prefix' the right term? Maybe I had my expectations wrong because of the word 'prefix'. *confused* The API documentation didn't help me.

- token_find_with_prefix() does not explain when and why I'd want to use that function.

token_find_with_prefix() needs a better name. We've done some bikeshedding on it, but nothing (clearly) better has yet emerged. It takes an array of tokens in the format foo:bar:baz, as well as a specified prefix string. It finds any tokens that start with that string, strips it off, and returns the resulting 'remaining' portion of each token. It's the heart of the 'chaining' mechanism that allows tokens like [node:created:short] to be handled by the 'date'-related token code.

It desperately needs an explanation, yes.
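
Until then, here's a sketch of the chaining pattern it enables (assuming the token_generate() signature quoted earlier in this thread):

// Inside a hook_tokens() implementation handling 'node' tokens: pull
// out any [node:author:...] tokens, strip the 'author:' prefix, and
// hand the remainder to the user token code for the node's author.
if ($author_tokens = token_find_with_prefix($tokens, 'author')) {
  $author = user_load($node->uid);
  $replacements += token_generate($author_tokens, array('user' => $author), $options);
}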

- token_info() mentions an optional pointer indicating that the token chains to another set of tokens, but I don't see that in the example. What would that look like? Second, why and when would I use that?

- token_info(): only one token specifies a type. Why is that and what impact does that have? What happens when I specify type 'date'? Not clear by reading the PHPdoc. *confused*

- I suspect a lot of knowledge in this issue is not yet embedded in the PHPdoc. That might get confirmed after reading up on the comments.

First: hook_token_info() doesn't DO anything. It can be eliminated entirely and everything still works; it's never called during the token generation or replacement process, so there is no real 'consequence' if you don't specify certain metadata. That said, we need some hook to capture meta-information about tokens so we can generate online help/lists of available tokens for users, etc. Thus, hook_token_info().

A lot of this stuff in particular has been in flux -- because hook_token_info() is PURELY about generating metadata that we can use for online help/hints, a lot of it has been shaped by discussions about how we should present the online help and (more critically) what data we need to 'encode' to be ABLE to capture the nature of tokens in generated online help.

Things like 'type' and so on are rather imprecise and I think we need to have a good, solid bikeshed about them now that some of the code related stuff has been nailed down.

- For some modules it makes sense to create a xxx.tokens.inc but for most modules it doesn't, IMO. I won't make a big deal out of that, but I'm not a fan of many small files.

Indeed. For the moment, it has made the patch much easier to manage, but there's nothing preventing us from moving those custom tokens into the module files themselves.

- Sometimes we set $url_options but we don't actually use it. Looks like we might have some dead code. It also seems to be a bit of a pattern in every token function. Not sure if we can do anything about it. None of these patterns have documentation.

I'll double-check that, thanks. I looked for ways we could eliminate that pattern, but I'm not sure if there are any good options. Many of Drupal's own functions flip back and forth between taking a language *object* and a language *code*, for example, and that difference resulted in a bit of tangly special-casing anywhere localization was needed.

- Would this have a performance impact on the front page and other regular pages? I haven't tested it, and reading the code it wasn't immediately clear yet. I'll have to take a second look at it so I can drill a bit deeper.

Unless a module uses the token_replace() function, there is zero impact. In core (and in the vast majority of contrib, currently) tokens are only really used during specific actions like building the text of an email to send out, or generating a path alias. So, other than the overhead of parsing an additional 100 lines of comments and code, there's no impact.

When token_replace() IS used, its cost depends heavily on how 'expensive' the tokens the user has chosen are to generate. That, though, is the major improvement over the old token implementation in contrib: an expensive token is only generated if the user explicitly specifies it. The cost of running token_replace() on a string that has NO tokens in it is the cost of a regex. Period, end of story.
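
Roughly speaking (the actual pattern in the patch may differ), that scan amounts to a single preg_match_all() pass:

// If nothing matches, no hooks are invoked and no tokens are built.
preg_match_all('/\[([^\s\[\]:]+):([^\[\]]+)\]/', $text, $matches);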

Dries’s picture

Thanks for the clarifications. I think we're really close, and potentially close enough.

I'm happy with us committing something that is not 100% there yet -- especially if we need to unblock an important issue -- as long as we follow up on the open issues. It feels like this patch is 95% there, so it wouldn't be a shame to get this committed as is.

However, if we don't _really_ unblock anything pressing or anyone pressed, I'd prefer to see us take one or two more passes at this so we can get the documentation right as well as some of the remaining 'in flux' things.

I'd still like to give this a deeper review and read up on all the comments and feedback, but given that I'll be on vacation for 5 days in less than 24 hours, I'm more than happy with webchick driving this home during my absence.

eaton’s picture

Something also came up in the forums that maybe should be addressed. I know the "baggage" philosophy and you may totally disregard this if you choose. There are some tokens currently in use in 5.x and 6.x that are rather popular but seem to be missing in 7.x; amongst these are [author-name] and [author-name-raw]. Should we at least make some attempt to keep the more popular tokens like these? Also, even in 6.x there is a small flaw, which seems to also be there in this patch: [author-name] does not pick up "Anonymous" (or its specification) for user/0.

[node:author:name] is the equivalent of [author-name]. The changing format of the tokens (using colons, for example) is a consequence of the changed replacement mechanism -- optional generation of tokens rather than batch generation, etc. Greggles and I have been discussing ways to iron out a migration path for existing modules that have used tokens in the past -- specifically, a canned str_replace() that turns Token 1.0 tokens into token-in-core tokens. A module that previously used tokens could run that replacement on its token-containing fields for a quick 90% solution.
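
Such a canned replacement might look something like this (a hypothetical helper; a real map would cover far more tokens):

// One-shot upgrade helper: rewrite old-style tokens in stored text to
// the new colon-delimited format.
function mymodule_upgrade_tokens($text) {
  $map = array(
    '[author-name]' => '[node:author:name]',
    '[title]' => '[node:title]',
  );
  return str_replace(array_keys($map), array_values($map), $text);
}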

The issue of recognizing 'anonymous' as the author name is a good one. I'll do a quick check to see about an easy way of making all 'user' type tokens return the current anon-user-name if they're specified.

NancyDru’s picture

Thanks for the clarification, Eaton. A migration path would make a lot of people happy, as would a document outlining for developers how to change their modules (I have several modules that supply tokens).

As for anonymous, I was going to write a patch for the contrib - it should be only one or two lines. The same code should transfer over pretty much.

GuillaumeDuveau’s picture

sub

eaton’s picture

FileSize
65.28 KB

Attached is a fresh version of the patch against current versions of core.

It fixes a number of the simple issues Dries mentioned -- token is now its own include file, loaded during _drupal_bootstrap_full(). Typos in the PHPDoc were fixed, PHPDoc for bits like the 'callback' and 'sanitize' options was clarified, and the handling of anonymous users when using tokens like [node:author] and [comment:author:name] now works properly.

I've taken a first rough stab at high-level documentation in the head of token.inc:

/**
 * @file
 * Functions related to Drupal's token replacement system.
 *
 * These functions allow developers to perform dynamic placeholder substitution
 * on text, replacing tokens like [node:title] with the title of an actual node
 * at runtime.
 *
 * This token replacement system is useful when administrators must enter
 * reusable text, like automated email messages or user-facing notifications,
 * and that text must also contain small snippets of dynamic information that
 * change each time the text is used.
 */

It's not perfect and it's not complete, but it gives an idea of what I'm trying to explain. The token chaining and better clarification of the hook_token_info() format, as well as the proper use of token_find_with_prefix(), are on the list.
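
(As a quick illustration of the API those docs describe, a minimal call looks roughly like this -- the message text is made up:)

// Replace any known tokens in the message, using $node as context.
// Output is sanitized by default.
$message = token_replace('New post: [node:title] by [node:author:name]', array('node' => $node));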

Anyone who's willing to offer a hand with some of that stuff is more than welcome! Thanks.

nicholasThompson’s picture

I just applied this patch and often see errors such as:

Notice: Undefined variable: type in user_token_info() (line 56 of /var/www/lighttpd/vmtest5/drupal/modules/user/user.tokens.inc).

I'm guessing this is because the variable is $types but it is referred to as $type.

Status: Needs review » Needs work

The last submitted patch failed testing.

webchick’s picture

Docs please! :) Eaton, Token in core would make a very good birthday present for you, don'tchathink? :)

fago’s picture

I just had a closer look at some token integration code and noticed some issues with the provided tokens:
* $language in node_tokens()/user_tokens() and maybe others is not initialized
* $options['language'] is not used for generating node urls
* node_get_types() with op 'name' is node_type_get_name() now.

fago’s picture

Also, I'm wondering about supporting node-type-specific tokens. Sorry if this has been discussed before; I couldn't find it in here.

Often one knows the node type in question, so it would be neat to display only the tokens e.g. of fields that are really in there. I think such a possibility would help improve usability a lot.

fago’s picture

I thought further about this whole metadata-about-entities thing and realized that I really need it for rules2, which I'm working on currently. So I decided it's best to start implementing it *asap* and do it as a generally usable API so other stuff can build upon it too. So I've gone ahead and converted this token patch into an "entity property metadata system"; see #551406: Centralize entity properties meta data.

This patch makes use of the ideas from this issue and converts the provided tokens to property metadata. In turn, the token patch can easily rely on it, but would still have to handle formatting dates and applying default tokens.

Anyway, I know this patch is close to being committed and I don't think this work should hold it up. I'd be fine if it goes in as is, as it's great work and d7 really has to ship with tokens in core. Anyway, I'm working on a follow-up patch that could convert the heart of it to the entity-property metadata system and make use of it for the token replacements, so the metadata can be used by other stuff too and modules don't have to provide duplicated metadata.

greggles’s picture

so it would be neat to display only the tokens e.g. of fields that are really in there

I think we should postpone that discussion. The UI can be solved after the functionality lands.

eaton’s picture

Status: Needs work » Needs review
FileSize
76.98 KB

Re-rolled version of the patch. This one contains the proper split for token.inc, and more detailed PHPDoc at the head of token.inc including sample code for token replacement.

It also tweaks the token_generate() function to handle its own hook invocations rather than relying on module_invoke_all() -- it has to do this because module_invoke_all() relies on array_merge_recursive(), which explodes if two modules provide replacements for the same token. This also means that hook_tokens() implementations can just call token_generate() when chaining, rather than module_invoke_all().
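
The approach is roughly this (a sketch of the pattern, not the committed code; it assumes hook_tokens() implementations take the same arguments as token_generate()):

// Call each hook_tokens() implementation directly and merge with +=,
// so duplicate token keys keep the first value instead of being
// mangled into sub-arrays by array_merge_recursive().
$replacements = array();
foreach (module_implements('tokens') as $module) {
  $function = $module . '_tokens';
  $result = $function($raw_tokens, $data, $options);
  if (is_array($result)) {
    $replacements += $result;
  }
}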

webchick, and others -- I'd love to get feedback on the documentation for using tokens as it exists in the @file section of token.inc. Specific docs for hook_tokens() and hook_token_info() are needed, but I would like to see if we can get this "using tokens" documentation, and the underlying functionality, committed and then push the hook docs in afterwards. That would allow us to quickly start work on a couple of pieces like the UI and Pathauto for core.

Status: Needs review » Needs work

The last submitted patch failed testing.

eaton’s picture

Status: Needs work » Needs review
FileSize
66.86 KB

Apologies, that last patch caught snippets of a module in the /sites/all/modules directory and wouldn't apply on a clean HEAD. Re-rolled.

eaton’s picture

FileSize
68.75 KB

And here's one more roll of the patch, with a stubbed-out set of hooks in system.api.php -- it doesn't yet cover the trickier edge cases like token chaining or defining new top-level token types, but that's being ironed out, along with the UI-intertwined stuff in the info hook. This at least puts a stake in the ground and gives us a starting point.

Dries’s picture

Status: Needs review » Fixed

Committed to CVS HEAD. Woop, woop!

eaton’s picture

Holy *SMOKES*. Thanks, Dries, and webchick for the feedback on the documentation in IRC. I'll be focusing on expansion of the hook documentation, approaches for presentation of available tokens in the UI, and conversion of the system notification emails to use tokens.

dww’s picture

Status: Fixed » Needs work
Issue tags: +Needs documentation

WOO HOO! This is great news. Yay!

In addition to better PHPdoc, this needs an entry in the module upgrade guide.

Also, there are probably a number of spots in core that could be converted to this. user_mail_tokens() immediately springs to mind (since I wrote it), but I'm sure there are others. Is the plan to open separate issues for each of these and tag them all or something?

eaton’s picture

Also, there are probably a number of spots in core that could be converted to this. user_mail_tokens() immediately springs to mind (since I wrote it), but I'm sure there are others. Is the plan to open separate issues for each of these and tag them all or something?

Yep, that one in particular I wanted to put off until after the major patch landed. The email-sending and message-displaying actions in core have already been converted to use tokens, though, so that part is already done.

eaton’s picture

fago’s picture

YEAH, d7 is going to be awesome! =)

@#238:
Now that it's in, are there any opinions about refactoring hook_tokens() to avoid duplicated metadata? See #551406: Centralize entity properties meta data.

Dries’s picture

Status: Needs work » Fixed

A changelog.txt entry might be in order too (if not already).

John Morahan’s picture

Status: Fixed » Needs review
FileSize
795 bytes

err, how tf did that dsm() get past the testbot?

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

and why would anyone call dsm() on a string?

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

John Morahan’s picture

FileSize
2.01 KB

I guess the fatal error must have been masking other failures, as there is no way in hell that this patch could possibly have caused new ones.

webchick’s picture

Status: Needs work » Needs review

Letting the bot have a crack at it. I'm disturbed that the testing bot didn't puke all over the place, though. hm...

Dries’s picture

Status: Needs review » Fixed

Committed. Thanks for the follow-up.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

NancyDru’s picture

Priority: Critical » Normal
Status: Closed (fixed) » Needs work

Sorry about re-opening this, but I don't see anything in Converting 6.x modules to 7.x. I am blonde and senior, so if it's just hiding from me, please feel free to close this issue again.

webchick’s picture

Status: Needs work » Needs review
FileSize
2.88 KB

heyrocker pointed out that token hooks were missing from api.drupal.org, and so we went back to look at this issue. It turns out that these were written, but for some reason the changes in system.api.php were never committed. I just committed that hunk.

I also found a test hunk in system.test that was similarly never committed. I'm attaching it here to see what testbot thinks.

Status: Needs review » Needs work
Issue tags: -Favorite-of-Dries, -Needs documentation

The last submitted patch, token.test.patch, failed testing.

Status: Needs work » Needs review

Re-test of token.test.patch from comment #259 was requested by aspilicious.

Status: Needs review » Needs work
Issue tags: +Favorite-of-Dries, +Needs documentation

The last submitted patch, token.test.patch, failed testing.

mcarbone’s picture

Status: Needs work » Fixed

That token replacement test was committed back in August: http://drupalcode.org/viewvc/drupal/drupal/modules/system/system.test?r1... -- so I think this issue can be closed.

Also note that I'm working on test coverage for every token (#701818: Several tokens are broken, especially file tokens; test coverage of every core token), but I'll leave this generic test of token replacement in system.test in place anyway, since it's probably wise to explicitly test token_replace and token_generate.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

drupal_was_my_past’s picture