Problem/Motivation

We need field tokens for Drupal 7. This capability used to be provided by CCK, but CCK was moved into core without this functionality, and Drupal 7 shipped without it in order to meet its release date.

Token integration for D6 was fairly simple. For the most part, fields only defined small bits of information. In Drupal 6, a node reference field token might be:

[field_node_reference-nid]

This would only provide the first value of the field; multivalue fields were not supported.

In Drupal 7, however, tokens can be chained in the following way:

[node:field_node_reference:author:mail]

This allows you to get to the email address of the author of the node that is referenced by the nodereference field. Whew!

However, there are complications:

  1. Fields can contain multiple values, and this is especially relevant for taxonomy fields. So you need to reference the delta:

    [node:field_some_node_reference:0:author:mail]
  2. In addition to field values, fields also contain metadata about the field itself (such as help text or description) that users might want to reference as a token. So you want to separate those, like:

    [node:field_some_node_reference:values:0:author:mail]
    [node:field_some_node_reference:field:machine_name]
  3. Users are going to also want "simple" versions of fields, similar to what they had in Drupal 6, which output in a standard way without dealing with deltas.

    For example:

    [node:field_some_node_reference] == the output of the field run through field_view_field()

  4. Dynamic tokens need good enough context in help. For multivalue fields, this is challenging because the Token help can only list things it knows about ahead of time, and everything else must be manually specified by the user.

    For example, looking at the UI in admin/help/token, you'll see the following for "Nodes > Date created > Custom format:"

    [node:created:custom:?]

    We can't go any further than this, because the elements after :custom: might be anything (some custom combination of date parts).

    We run into this with multivalue fields as well; the best we could do is:

    [node:field_name:values:?]

    But the actual token value might be:

    [node:field_name:values:0:node:author:created]

    We need to preserve a "drill down" ability for chained tokens even when the delta is unknown.
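To make the shapes discussed above concrete, here is a rough sketch of how such a chained token string could be split into its parts. It is written in Python purely for illustration and is not Token module code; the names FieldToken and parse_field_token are hypothetical, and the "values"/"field" branch names are simply the ones proposed in this summary.

```python
from typing import NamedTuple, Optional, Tuple


class FieldToken(NamedTuple):
    entity_type: str         # e.g. "node"
    field_name: str          # e.g. "field_some_node_reference"
    branch: Optional[str]    # "values", "field", or None for the simple form
    delta: Optional[int]     # item index under "values", if one was given
    chain: Tuple[str, ...]   # remainder, to be handed to the next token type


def parse_field_token(token: str) -> FieldToken:
    """Split e.g. '[node:field_x:values:0:author:mail]' into its parts."""
    parts = token.strip("[]").split(":")
    if len(parts) < 2:
        raise ValueError("expected at least 'type:field_name'")
    entity_type, field_name, rest = parts[0], parts[1], parts[2:]
    if not rest:
        # Simple form: [node:field_x], i.e. the rendered field output.
        return FieldToken(entity_type, field_name, None, None, ())
    branch, rest = rest[0], rest[1:]
    if branch == "field":
        # Metadata about the field itself.
        return FieldToken(entity_type, field_name, "field", None, tuple(rest))
    if branch != "values":
        raise ValueError(f"unknown branch {branch!r}")
    if rest and rest[0].isdigit():
        # An explicit delta, followed by the chained part.
        return FieldToken(entity_type, field_name, "values",
                          int(rest[0]), tuple(rest[1:]))
    # Delta unknown or omitted.
    return FieldToken(entity_type, field_name, "values", None, tuple(rest))
```

The point of keeping "chain" separate is that, whatever the delta turns out to be, the remainder can be handed to the token type of the referenced item.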

Proposed resolution

One of the first blocking issues is what to output from the token at the "top level" of the field. We suggest the following approach:

> node
  > field_attendees (depends on field name) <-- Field output from field_view_field()
     > values <-- everything under here is raw values from the database.
       > 0
         > uid = 345
         > name = User 345's name
       > 1
         > uid = 453
         > name = User 453's name
     > field <-- everything under here is metadata about the field itself.
        > field-name
        > help-text

So in Drupal 6, a Pathauto definition for this field might've been:

user/[field_attendees-uid]

In Drupal 7, it will be this instead:

user/[node:field_attendees:values:0:uid]

While this requires more typing, it is also *way* more precise, and allows you to get to *exactly* the data you want, without undue assumptions.
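As a minimal sketch of how the Drupal 7 pattern above would walk the proposed tree, the following models the field_attendees structure as plain nested data and replaces tokens in a pattern. It is Python for illustration only, not how Token or Pathauto is implemented; resolve() and replace_tokens() are hypothetical helpers, and the values under "field" are made-up placeholders.

```python
import re

# The tree from the proposal, modelled as plain nested data.
node = {
    "field_attendees": {
        "values": [
            {"uid": 345, "name": "User 345's name"},
            {"uid": 453, "name": "User 453's name"},
        ],
        # Placeholder metadata, assumed for the example.
        "field": {"field-name": "field_attendees", "help-text": ""},
    }
}


def resolve(data, token):
    """Walk '[node:a:b:0:c]' through nested dicts and lists."""
    parts = token.strip("[]").split(":")[1:]  # drop the leading "node"
    current = data
    for part in parts:
        if isinstance(current, list):
            current = current[int(part)]  # a delta
        else:
            current = current[part]
    return current


def replace_tokens(pattern, data):
    """Replace every [node:...] token in a pattern with its value."""
    return re.sub(r"\[node:[^\]]+\]",
                  lambda match: str(resolve(data, match.group(0))),
                  pattern)


path = replace_tokens("user/[node:field_attendees:values:0:uid]", node)
```

With the sample data, the pattern resolves against the first attendee (delta 0); changing the delta to 1 picks the second one.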

To handle the dynamic help bits, Dave has proposed using a base token type of "Array" (see #1195874: Need to figure out how to create nested tokens from the array token type) which could then be "extended" by field tokens. This would allow us to give a token type definition to a 'first' and/or 'last' keyword value, from which we could also provide chained token help for a field's value.
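A rough sketch of why 'first' and 'last' keywords help: unlike an arbitrary delta, they are known ahead of time, so token help can list chained tokens underneath them. This is Python for illustration only and does not reflect the actual array token type in #1195874; select_item() and help_tokens() are hypothetical names.

```python
from typing import Any, List, Optional

KEYWORDS = ("first", "last")


def select_item(items: List[Any], key: str) -> Optional[Any]:
    """Pick a field item by keyword or by numeric delta."""
    if not items:
        return None
    if key == "first":
        return items[0]
    if key == "last":
        return items[-1]
    if key.isdigit():
        index = int(key)
        return items[index] if index < len(items) else None
    return None


def help_tokens(field_name: str, item_tokens: List[str]) -> List[str]:
    """List the tokens that help could show without knowing any delta."""
    listed = [f"[node:{field_name}:values:?]"]
    for keyword in KEYWORDS:
        for sub in item_tokens:
            listed.append(f"[node:{field_name}:values:{keyword}:{sub}]")
    return listed
```

The dynamic [node:field_name:values:?] entry stays, but the keyword entries give users something to drill down into.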

Remaining tasks

The proposed architecture in the issue summary needs to be vetted by folks with field API experience. Constructive comments welcome! :)

Solving #1195874: Need to figure out how to create nested tokens from the array token type is the biggest blocker, as we can then re-use this same mechanism for fields. This currently works for simple key => value arrays, but needs to handle a situation with chained objects. See the issue for more details.

Once that's finished, then we can adjust the current field token code to conform to the spec in this issue, utilizing the array pattern.

Also note that #1058912: Prevent recursive tokens is something that's getting encountered as well while building up token lists.

User interface changes

TODO: Fill in later.

API changes

TODO: Fill in later.

Files:
Comment | File | Size | Author
#4 | 1222592-field-tokens-round2.patch | 5.88 KB | Dave Reid

Comments

I know this issue should be in the token module issue queue, but it is important enough, touches the core system, and should eventually produce code that moves into core, so it needs a core-level discussion involving core and field API maintainers.

[node:field_some_node_reference:field:machine_name]
Why not just [field:field_name:something], field metadata doesn't really feel like part of the node to me.

Also is this going to be the field info or the field instance info? Or both?

values <-- everything under here is raw values from the database.

This means raw values from the node object? I'd assume it'd be taken from the node object after hook_field_load() has run.

[node:field_some_node_reference:values:0:author:mail]

This implies loading the referenced node, then loading the author of the referenced node. I guess the chaining was designed to allow that kind of thing so it's probably OK..

For individual field values, assuming you'd want to use http://api.drupal.org/api/drupal/modules--field--field.module/function/f... ?

We can ignore the stuff under [node:field_some_node_reference:field:*] for now - that's only the 'field-instance' token type that would provide information about the field itself and not its data - plus it's not really important to the RFC.

The [node:field_some_node_reference:values:0] would be a chained 'node' token, so it would have all the sub-tokens of the node token type. Yeah for the [node:field_some_node_reference:values:0] token itself it would make sense to use field_view_value() + drupal_render() - I think maybe the initial token work used this approach too.

Status: new, file size: 5.88 KB

Here's my initial patch for adding support for just the base field tokens (e.g. [entity:field_name]) and nothing yet chained beyond that. This can be a relatively easy win as it will easily fill the use-case for single-value fields. Even though we're rendering the tokens as full field output (which are markup-heavy) we can still use them in modules like Pathauto and Real name since we run strip_tags() on the token output.

Summary of patch:

  1. Uses field_view_field() with a custom display settings array('label' => FALSE) rather than using a 'token' view mode because using a view mode would require users to manually configure every field's view settings to hide field labels (which most people will want to do). We can add support for a rendered format using [entity:field_name:rendered] and possibly [entity:field_name:rendered:[view_mode]], but that's for the more future-looking version.
  2. Adds a pre-render to the renderable array from field_view_field() that adds a space as a suffix for each field item except for the last one as the field may appear 'smooshed' when rendered. This also makes multi-value fields show up as term1-term2 when used in Pathauto rather than term1term2.
  3. Doesn't bother converting each field name into a 'token field name' (converting underscores to dashes) as used in the original approach, as it's a waste of time to convert between the two. Not sure why I even did it with profile fields.
  4. We need to figure out what we can put for the 'name' and 'description' properties in field_token_info_alter(). The problem is that fields have different names and optional descriptions depending on the entity type and specific field instance/bundle. There's no one canonical field name/description.
  5. Note we use field_token_info_alter() because now we have to be considerate of the dang entity API tokens that 'beat' us to field tokens so that both modules don't try to return a value for the same token - causing an array_merge explosion.
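To illustrate point 2 of the summary, here is a small sketch of the idea behind the pre-render step: append a space to every rendered item except the last, so that stripping tags afterwards leaves the values separated. It is Python for illustration only, not the patch itself; add_item_separators() is a hypothetical name, strip_tags() here is a crude regex stand-in for PHP's strip_tags(), and the sample markup is made up.

```python
import re


def add_item_separators(rendered_items):
    """Append a space to every rendered item except the last one."""
    last = len(rendered_items) - 1
    return [item + " " if index < last else item
            for index, item in enumerate(rendered_items)]


def strip_tags(html):
    """Crude stand-in for PHP's strip_tags(): drop anything in <...>."""
    return re.sub(r"<[^>]+>", "", html)


items = ['<a href="/term/1">term1</a>', '<a href="/term/2">term2</a>']

smooshed = strip_tags("".join(items))
separated = strip_tags("".join(add_item_separators(items)))
```

Without the suffix the two terms run together; with it they stay separated by a space, which a module like Pathauto can then turn into its own separator.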

Discussed a bit with Dave Reid in IRC: we could skip some of the field theming and just get the individual items themed. I was suggesting looping + field_view_value(), but that's a bad idea. Unsetting #theme at the top level of field_view_field() as suggested by Dave ought to work though.

Main thing is to avoid rendering labels and markup then stripping it again wherever possible.

Related question: Should tokens contain fully-themed HTML chunks?

By incorporating full field formatters, we end up with full-blown HTML intended for site output inside tokens; consider, for example, formatters that add some JavaScript magic.
That pushes text containing token replacements towards becoming a simple theming language. Do we want to go in that direction?
I'm not sure about it either, thoughts?

To be clear, we are only talking about using rendering/formatter output at the base token level for now (e.g. [entity:field_name]) - because those tokens have to be output in a consistent way and we really only have field rendering in core. By unsetting $elements['#theme'] as well we avoid all the div-heavy field markup. I have tested that this works absolutely great with Pathauto and Real name, some of the big use-cases for tokens. Phase 2 of implementing field tokens (which is part of the original RFC) would be specifically handling multiple values, drill-downs, and chained tokens beyond [entity:field_name] that gets the raw data if the user needs it.

subscribing.

I'd recommend reaching out to eaton and yched personally about this.

I have reached out to eaton, yched, and karens via e-mail begging for their help and reviews.

no time to read up tonight, but subscribe

I've got an hour or so train ride ahead of me, and I'll be reading the proposal and the current feedback. Thanks, Dave!

I so rarely subscribe with comments, but I don't have enough braincells to absorb this now and want to make sure I can find it later. :)

Read through the OP and the followups, here are my comments so far. I'm afraid they possibly carry more questions than answers :-(.

About using field_view_field() :
Removing '#theme' = 'field' from the render array solves the "label + extra markup + dependency on theme layer" issue.

Yet this still goes through a formatter. The patch in #4 uses the default formatter for the field type, with its default settings, but this is in no way guaranteed to be the one that makes most sense for tokens - and "good for tokens" cannot be the primary criterion for field type authors when deciding which is the default formatter for their node type (and its default settings).
More generally, I tend to advocate for keeping formatters outside tokens. Formatters produce HTML output for a web page, that's the contract they signed. Also, some formatters do go through the theme layer internally, so independence from the theme layer is not guaranteed. Plus, as fago pointed out, some add CSS, JS in #attached...

Can't we rather require each field type to explicitly implement a token hook (hem, magic callback...) that generates the output of their base token ? No hook = no base token (empty string).
[node:field_foo:values] = output of the hook for each value, space separated
[node:field_foo:values:0] = output of the hook for the specified value
Maybe that hook is not even strictly token-related. We face a similar issue when we want to generate textual values for an XML or JSON output of a node.
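A minimal sketch of what I mean, in Python just to show the shape (this is not a real Field API hook): one callback per field type, no callback means an empty string, [..:values] joins all items with spaces and [..:values:N] returns a single item. The registry, the base_token() name and the item keys ('value', 'name') are all assumptions for the example.

```python
# Hypothetical per-field-type callbacks that turn one item into plain text.
token_callbacks = {
    "text": lambda item: item["value"],
    "taxonomy_term_reference": lambda item: item["name"],
}


def base_token(field_type, items, delta=None):
    """Plain-text output for [..:values] or [..:values:N]."""
    callback = token_callbacks.get(field_type)
    if callback is None:
        return ""  # no hook = no base token
    if delta is not None:
        if 0 <= delta < len(items):
            return callback(items[delta])
        return ""
    return " ".join(callback(item) for item in items)
```

The same callbacks could serve any other consumer that needs a textual value per item, which is why the hook may not need to be token-specific.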

Aside from that top-level token, I guess each field type also wants to provide more specific tokens for their data type. Filefield D6 provides a series of specific tokens, Date does too, noderef has nid, title, url of the node page... (hm, are all of those in fact handled by chaining ? Well, at least text fields have 'raw' and 'filtered')
--> subtokens like [node:field_foo:values:0:SOME_VARIANT] ?
But then I'm not sure how to prevent clashes with chained tokens : [node:field_noderef:values:0:ANY_TOKEN_VALID_FOR_A_NODE] ?

Other, less central remarks :

[node:field_some_node_reference:values:0:author:mail]
[node:field_some_node_reference:field:machine_name]

is the 'values' part really needed in the 1st one ? The part after 'field_name' is either 'field' or a numeric value, so that's not ambiguous ?

We need to figure out what we can put for the 'name' and 'description' properties in field_token_info_alter(). The problem is that fields have different names and optional descriptions depending on the entity type and specific field instance/bundle. There's no one canonical field name/description.

Yup, instance properties vary per bundle, by definition. But any token is evaluated against a given node (or some other entity type), right ? If you have a specific node, then you have a specific instance, and can get specific values for the label, help text, etc... Or am I missing something ?

the best we could do is [node:field_name:values:?], but the actual token value might be [node:field_name:values:0:node:author:created]. We need to preserve a "drill down" ability for chained tokens even when the delta is unknown.

Not sure it helps, but even if the delta is unknown, the stuff that can be chained after that will be the same for all deltas ?

Yet this still goes through a formatter. The patch in #4 uses the default formatter for the field type, with its default settings, but this is in no way guaranteed to be the one that makes most sense for tokens - and "good for tokens" cannot be the primary criterion for field type authors when deciding which is the default formatter for their node type (and its default settings).

Thinking what if we add a 'default_token_formatter' to hook_field_info_alter()? That way we could default the formatter for term reference fields as plain text rather than the link.

I think the idea is that we almost have a reliable way to output those base tokens and I don't feel that it's good DX to require the field-providing modules to require more hooks just for tokens. We'd basically be making a 'light' formatter rather than using the existing system.

Aside from that top-level token, I guess each field type also wants to provide more specific tokens for their data type. Filefield D6 provides a series of specific tokens, Date does too, noderef has nid, title, url of the node page... (hm, are all of those in fact handled by chaining ? Well, at least text fields have 'raw' and 'filtered')

Yep, the plan is to handle this with token chaining. For example, node reference would use [node:field_node_ref:values:0:nid] or [node:field_node_ref:values:0:title], since everything from [node:field_node_ref:values:0] onwards would be 'node' tokens.

is the 'values' part really needed in the 1st one ? The part after 'field_name' is either 'field' or a numeric value, so that's not ambiguous ?

Yes, since the delta could be any value, we need to define a 'dynamic' token which cannot provide token help 'beyond' it. If the root-level field token was the dynamic token, we wouldn't be able to provide any type of token help for the field.

Yup, instance properties vary per bundle, by definition. But any token is evaluated against a given node (or some other entity type), right ? If you have a specific node, then you have a specific instance, and can get specific values for the label, help text, etc... Or am I missing something ?

Token *help* is generated ahead of time regardless of context, so there could be any number of labels and descriptions for a field if it's attached to multiple content types.

Not sure it helps, but even if the delta is unknown, the stuff that can be chained after that will be the same for all deltas ?

Yep, that's the plan.

I spent some time reading over the proposal and the first-patch pass, and I think this is going to be our best path forward; moving forward it might be useful to allow field-providing-modules to specify what formatter should be used to build the token by default, but this approach works and requires no special changes to field modules for the time being.

I've spent some quality time in IRC picking Dave's brain about it as well, and it's got my unreserved +1.

I'm probably going to proceed with the plan to add 'default_token_formatter' info for core fields in token_field_info_alter() and use formatters for the base-level tokens as we can improve them down the road but since the token names will not change, they'll be safe to implement.

Adding a 'default_token_formatter' info property sounds like the best way forward to me as well.

The 'default_token_formatter' will work for the raw/plain-text case, in which the result must always meet plain-text criteria; i.e., it must not contain any HTML.

However, I think that we have to have a way to specify the formatter to use for a token when producing markup. The default field formatter will work in some scenarios, but I imagine there will be many situations in which the default formatter output is not suitable for the context in which the token is used.

Lastly, while specifying the formatter is probably needed, tokens should not care about or carry information about the markup language being produced. The markup language (text/HTML/JSON/XML/etc) is contextual information that needs to be passed on and negotiated between the calling code and hook_tokens() implementations (and subsequently, field formatters). This basically means replacing the 'sanitize' option/flag with a 'format' option.

Subscribing

Subscribing

Subscribing

Basic level field tokens (e.g. [node:field_tags], [node:field_image], etc.) have been committed to Token and included in 7.x-1.0-beta4: http://drupalcode.org/project/token.git/commit/7f722d2

Next up working on tests for the basic-level field tokens, and then the array issues for nested/detailed field tokens.

subscribe

subscribe

sub

I'm late to the party but trying to understand what is going on here, especially with respect to finding a way to do complex field tokens like date fields. I think specifying a default token formatter is a great idea and I'll add that to date -- that can be a plain text version of the date field, with no markup. So I guess this next step will deal with how to get other information to the token implementation, like timezone and format and *which* instance of a multiple value date to use (delta isn't so helpful in that case, but I imagine we can't get any more complex than that). I'd be happy if we could at least get something working for the first value of the field.

Ad #3. Please don't neglect the need of getting values of custom fields in referenced entities.

I don't know if it is important for RFC, but lack of deep enough drilling for field values in Drupal 7 is a real and painful obstacle for making better use of Rules etc. For example there is no way to get those values in the fetched entities.

A token like [entity-ticekt-fetched:0:field_custom:value] is badly and urgently needed.

It is understood and not being ignored, but it has to be done right to move into core. If you don't have something to constructively comment on this RFC, please don't comment.

I have an idea, maybe not very good, here is my architecture for field token:
[node:uid]|[user:mail]
[node:field_nodereference:0:nid]|[node:title]

Instead of using a long chain, we could split it into several sub chains and allow the result of each one to be passed to the next:
The result of the first sub chain is passed to the second sub chain,
The result of the second sub chain is passed to the third sub chain,
...
For example,
[node:uid]|[user:mail]
This is a compound chain; it contains two sub chains, [node:uid] and [user:mail].
We get the uid from the first chain and pass this uid to the second chain.

[node:field_nodereference:0:nid]|[node:title]
This is also a compound chain; it contains two sub chains, [node:field_nodereference:0:nid] and [node:title].
We get the reference nid from the first chain, and pass this nid to the second chain.

Maybe my method is a little stupid, but there are a lot of custom field types, and I think this method could simplify this complex issue.
The structure of fields in Drupal 7 and 8 is very complex. If we could split a long chain into small ones and allow the result of the previous chain to be passed to the following chain, this complex issue could be broken down.

Maybe this will be a little better:
{[node:uid]|[user:mail]}
{[node:field_nodereference:0:nid]|[node:title]}

{} means this is a compound chain(token),
| represent the separator between sub chain.
The result of previous chain is the ID of following entity.

This will be allowed:
{[node:field_my_custom_field_which_is_not_noderefrence:0:value]|[node:title]}
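Here is a small sketch of the idea, in Python only to show the logic (it is not the code of my module): each sub chain is resolved against an entity, and its result is used as the ID of the entity for the next sub chain. The ENTITIES table and resolve_compound() are hypothetical, the sample data is made up, and only simple [type:property] sub chains are handled.

```python
from typing import Any, Dict

# Made-up stand-in for entity loading: type -> id -> properties.
ENTITIES: Dict[str, Dict[int, Dict[str, Any]]] = {
    "node": {1: {"uid": 7, "title": "Hello"}},
    "user": {7: {"mail": "author@example.com"}},
}


def resolve_compound(token: str, start_id: int) -> Any:
    """Resolve '{[node:uid]|[user:mail]}' starting from entity start_id."""
    sub_chains = token.strip("{}").split("|")
    current: Any = start_id
    for sub in sub_chains:
        entity_type, prop = sub.strip("[]").split(":", 1)
        entity = ENTITIES[entity_type][current]  # previous result is the ID
        current = entity[prop]
    return current


mail = resolve_compound("{[node:uid]|[user:mail]}", 1)
```

Starting from node 1, the first sub chain gives uid 7, and the second sub chain loads user 7 and returns its mail.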

I have built a module, compound_token, for Drupal 7 using my ideas. This is an example in Drupal 7:
[node:uid|user:mail]
[node:body:default:0:value]
...
These tokens cannot be found through the token tree.

If we provide this feature in core, then it could save a lot of code for custom field modules.

I also found what may be a bug in Drupal 7's token.inc when I built the compound_token module, though I'm not very sure:
hook_tokens_alter() was called before hook_tokens(), http://drupal.org/node/1541642 ,

Could any core maintainer review my module for Drupal 7, and review my patch for issue [1541642] ?

@Dave Reid: Any updates on this? I think that most of the proposal makes sense, and we should really try to get something in core for D8.

So we should forget about a solution for D7 already ???

So we should forget about a solution for D7 already ???

Until it is committed to D8 then yes. Only then will it be backported.
You can read about the backport policy here.

No, no. It's not that. It's just that Sun's post @35 sounds like "I have lost all hope for D7, but at least let's do something about D8". I don't know if he really meant that, but that's how it sounded to me. And I have just a little hope that this issue will be fixed for D7.

I was skimming through the token-related issues, and I did not see any of them discussing how the new data type API and Entity Field/Property API affect tokens. It seems to me that these APIs should make a lot of the field token issues much more straightforward to deal with.

Any related news about this issue?

#2164635: Automatically expose typed data to token API seems to aim to tackle this problem with a more generic approach.