Problem

  • Configuration settings may need to be changed depending on language settings.

Goal

  • It should be possible to save or retrieve settings marked as translatable for any of a site's enabled languages.

Status

This issue is now a [meta] issue and has produced 3 options.

Discuss away in the relevant issues.

Details

  • We will need to handle both translated text and internationalized data (different depending on the specified language).
  • We will want to specify which settings are internationalized. The default will be that settings are not internationalized.

Proposed resolution

  1. Configuration files will have a language extension added to them. For instance, English configuration of the performance settings would be stored in system.performance.en.xml whereas Swedish performance settings would be stored in system.performance.se.xml.
  2. The language chosen at install time will have its configuration installed, and any subsequent language configs will only contain diffs from the site default language.
  3. A new field for language will be added to the active store.
  4. For each file and language combination, configuration is loaded and saved into the active store. For the site default language, the complete configuration is loaded and saved. For other enabled languages, the diffs are merged into the default configuration and saved.
  5. Pieces of configuration that can be internationalized will be marked as "translatable", either in the XML attributes or through a similar mechanism if XML disappears.
  6. Items that are translatable will always be saved into the configuration settings of the language the user is currently browsing in. Items that are not translatable will always be saved into the site default language configuration.
  7. All public APIs for config CRUD operations will have a language parameter added which will default to the site's default language.
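
Steps 2 to 4 above can be sketched roughly as follows. This is only an illustration in Python, not the proposed implementation: the function names `merge_diff` and `build_active_store`, the dictionary layout, and the German sample value are assumptions made for the example.

```python
import copy


def merge_diff(base, diff):
    """Return a copy of base with the (possibly nested) overrides in diff applied."""
    merged = copy.deepcopy(base)
    for key, value in diff.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_diff(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def build_active_store(files, default_langcode, enabled_langcodes):
    """Return {(name, langcode): full configuration} for every enabled language.

    files maps (name, langcode) to the parsed contents of one file; files for
    non-default languages hold only the values that differ from the default.
    """
    active = {}
    for (name, langcode), data in files.items():
        if langcode != default_langcode:
            continue
        for target in enabled_langcodes:
            diff = {} if target == default_langcode else files.get((name, target), {})
            active[(name, target)] = merge_diff(data, diff)
    return active


# Hypothetical sample data: a complete English file and a German diff.
files = {
    ("system.site", "en"): {"site_name": "I am awesome!", "cache": {"lifetime": 300}},
    ("system.site", "de"): {"site_name": "Ich bin toll!"},
}
active = build_active_store(files, "en", ["en", "de"])
```

In this sketch the German entry in the active store is complete (it inherits `cache` from the default language), while the German file on disk only carries the one value that differs.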

Notes

  • Originally it was thought that we could save all languages in an individual file, with the languages set as attributes like this:
          <site_information>
            <site_name lang="en">I am awesome!</site_name>
            <site_name lang="se">Jag är grymt!</site_name>
          </site_information>

    However this means that when languages are installed, all their configuration will have to be merged into existing files and saved, which is much more complicated than the current solution which just has them copied at install-time. The downside of the current solution is that you will at times have to push multiple files to encompass all changes to a piece of configuration. I think this is a good compromise to keep the architecture simpler.

  • Another downside of this system is that it has a second place where we store translated text aside from the .pot files.
  • This system is similar to (albeit simplified from) the way that Java Resource bundles are internationalized (see http://java.sun.com/developer/technicalArticles/Intl/ResourceBundles/).
  • At one point yched brought up the option of replacing the strings in the configuration files with placeholders, storing the internationalized versions elsewhere (such as in the .pot files). The ramifications of such a change have not been fully thought out.
  • Past discussion can be found at http://groups.drupal.org/node/185609

Comments

yched’s picture

So my proposal about having string placeholders in config files - something like :

<field_instance>
  <field_name>field_foo</field_name>
  <entity_type>node</entity_type>
  <bundle>article</bundle>
  <label>%placeholder_1</label>
  <description>%placeholder_2</description>
  ...
</field_instance>

I was initially considering that the translated values would be appended in a separate <strings> section at the end of the XML, but somewhere else (pot files) could work too, I guess.
The idea is that the actual strings would be replaced before getting to the active store (which stores a separate copy by language).

This ensures that the "object" (a field instance, a view, an image style...) has exactly the same structure and values for non-translatable properties between languages, and that only the translatable values differ.
Really nasty things could happen if this is not the case. A field instance has to be essentially the same regardless of the language. A view should not have different contextual filters depending on the language.
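
A minimal sketch of that idea, in Python for brevity. The function name `resolve_placeholders`, the per-language string tables, and the sample labels are assumptions for illustration only; the point is that the structure comes from a single file and only the placeholder values vary.

```python
def resolve_placeholders(structure, strings):
    """Return a copy of structure with every %placeholder value replaced."""
    if isinstance(structure, dict):
        return {key: resolve_placeholders(value, strings) for key, value in structure.items()}
    if isinstance(structure, str) and structure.startswith("%"):
        # A missing translation raises KeyError here; a real system would
        # need to decide on a fallback instead.
        return strings[structure]
    return structure


# The one shared structure, as in the XML above.
field_instance = {
    "field_name": "field_foo",
    "entity_type": "node",
    "bundle": "article",
    "label": "%placeholder_1",
    "description": "%placeholder_2",
}

# Hypothetical string tables, one per language.
strings_by_language = {
    "en": {"%placeholder_1": "Subtitle", "%placeholder_2": "Shown below the title."},
    "fr": {"%placeholder_1": "Sous-titre", "%placeholder_2": "Affiché sous le titre."},
}

# One resolved copy per language goes into the active store.
active_store = {
    langcode: resolve_placeholders(field_instance, strings)
    for langcode, strings in strings_by_language.items()
}
```

Because every language copy is generated from the same source structure, the non-translatable properties cannot drift apart between languages.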

I have worries on that aspect if the config system does not ensure consistency at its core. Separate config files by language always leave room for desynchronization. Especially : "6. Items that are translateable will always be saved into the configuration settings of the language the user is currently browsing in"

Gábor Hojtsy’s picture

@yched: I agree consistency is key, and translations "of the same thing" should be in the same structure. The "thing" should be defined pretty well though. E.g. menus can have separate items per language (menu to menu item relations would be separate for example), so if the thing is the whole menu, the consistency is not realistic to assume. If the thing is a menu item, the consistency is good to assume IMHO. (However, it sounds unlikely we would store one "thing" per config file, would we?) So how do we ensure that consistency? I think it's a bigger question than removing strings from the source.

(Side note: I think the biggest open question remains how will the UI ensure this consistency. If we use the original editing UI to edit views translations, it would be pretty hard to ensure no new filters can be added, and only certain pieces of filters can be edited, right? If we separate it entirely, we need an enumeration of all translatable fields and need to reproduce some structure for the translation UI (in effect design a specific translation UI for each object type). Anyway, that is a far reaching question, but I think that is going to be the hardest part (proper storage and retrieval of translations notwithstanding), and a great reason I'd have loved to get to this point *much* sooner. It looks unlikely we could solve that problem in Drupal 8, so we should at least concentrate on solving the backend proper.)

Gábor Hojtsy’s picture

Priority: Normal » Major
Issue tags: +language-config

One more note: the primary use case for this in D7 to consider as a first implementation piece is date formats. We might want to have an example implementation at first with this patch, so the separate issue at #1431292: Migrate date format configuration to CMI might or might not need to live on. Carrying on the config tag though and marking major.

andypost’s picture

First of all, each config file should have a language assigned. Once we know the language of the strings in a file, we can use t() or tt() with "context" = "config".
For that purpose it would be better to have a special function ct(), or to extend t() so that it accepts source strings in languages other than English.

Gábor Hojtsy’s picture

@andypost: the plan is not to mix the config translation system with t(). A context of "config" would not really be enough. The same string or number might not be translated the same way in a menu or taxonomy term or as a views exposed block title. Also, the translations of configuration need to support staging the way configuration is staged otherwise. I agree the original version of the configuration should specify its language just like we do on nodes, etc. We cannot just assume all configuration is created in the site default language initially.

agentrickard’s picture

Can we redefine this to say "Configuration settings may need to be changed depending on context.", where the default context supported by core is "language setting."

There are other modules and paradigms -- Context, Spaces and Domain Access spring to mind -- that need to alter configuration and are hindered at times because language is treated as a special case that trumps other cases.

Granted, I have no idea what such a context system would look like, and we have the additional problem of multiple contexts, e.g. vary settings by domain _and_ language.

bojanz’s picture

+1 to #6, I'd love to see that in action (though it's a tough problem to crack and needs a lot of discussion, so apologies if you consider this issue hijacking).
Since we're already mentioning Views, it needs some sort of #6 not just for translation, if it's to use CMI. Same for Rules.

yched’s picture

@gabor :

menus can have separate items per language (menu to menu item relations would be separate for example), so if the thing is the whole menu, the consistency is not realistic to assume. So how do we ensure that consistency? I think it's a bigger question than removing strings from the source.
[...]
the primary use case for this in D7 to consider as a first implementation piece is date formats

I'm wondering whether those are actually about "internationalization". They are about "stuff that are attached to a language", just like we have "stuff that are attached to a node type and view mode...". 'Language', or 'node type, view mode' are part of the "primary key" to the config object.

"Primary keys in the db" => "parts of the filename in CMI" :
One individual config item for a field instance is tied to a given entity type and bundle name, and those will most probably be part of the config item's name (entity.[entity_type].bundles.[bundle].instances.[field_name].xml or whatever).
Similarly, the config item for a menu structure in a given language is tied to that specific language, which will be part of the filename.
The xml for the menu in fr can have a completely different structure than the one for de, but that's because they are two separate objects, that are not more closely related than, say, two instances of the same field in different entity types. They are related in the sense that you want to act on them when the menu gets renamed or deleted, just like you need to act on instances when a field gets changed.

We have two cases to consider, which IMO are fairly different and probably shouldn't be addressed by one single mechanism.

- stuff that is inherently tied to a language, i.e. configuration items whose current db schema has a 'langcode' column. By definition, they can be very different from one language to the other. Language is just a part of the primary key / filename, but in the end they are two different and mostly unrelated config objects. No i18n considerations in there, and CMI is already fully equipped for that, AFAICT.

- the rest (most common case): stuff that is "the same" regardless of the language, but has some i18n "parts" in it: a view, a field instance, an image style. That's what we need to address, preferably in a way that ensures that the object stays globally the same whatever the language, except for a couple precisely identified snippets. That's where the placeholder approach can come handy IMO.

Gábor Hojtsy’s picture

@yched: yes, that did not yet address the granularity question though. If we have an XML file for a whole menu tree let's say, I cannot have menu items that are tied to language in this logic, right? If I have an XML file per menu item, then I can express that language "primary key" in the filename as you suggested. So my point was that the granularity also defines what detail can be tied to language, and if tying to language is only possible on the whole XML file level, then we'll need to break into as many XML files as deep we want to support tying to language (such as individual menu items).

yched’s picture

[crosspost :-) ]
On a side note :

if the thing is the whole menu, the consistency is not realistic to assume. If the thing is a menu item, the consistency is good to assume IMHO. (However, it sounds unlikely we would store one "thing" per config file, would we?)

Yeah, the files layout and exact level of granularity that is fit for a given config "thingy" is something that needs to be figured out by each subsystem on its own, and for which we sorely miss heuristics for now. I guess those will emerge after a couple subsystems have been migrated. How to map a config structure to SQL tables is pretty much intuitive for anyone, but we still need to learn about CMI files.

Gábor Hojtsy’s picture

Ha, right. So if we can ensure that the level of granularity will be such that language assignment can happen without structural changes per language, THEN we can assume we only need value replacement for translation, retaining the original structure. In that case, whether we use your "outsourcing" solution to store values elsewhere or repeat the structure with separate values is a matter of preference; we get the same results. The higher layer for CMI, or at worst each module implementation, would then need to take care of the merging of "thingy" configs based on language (and/or other conditions), to produce for example a menu structure proper to the page language based on the objects stored in various files. I do think that is a common problem for modules using the CMI, so I am not sure that level should all be pushed on module authors, hence my notes that that is just as important for language support on the config layer.

We can keep this issue definitely with the assumption that config objects are stored and assembled into bigger structures with language as part of their primary key where relevant, but that actually needs to be taken care of elsewhere :)

sun’s picture

Feedback in arbitrary order:

  1. A langcode within config object names doesn't make sense, because the config system needs to be able to look up config objects, and the default language, as well as available languages, is configuration already. Chicken and egg.
  2. D8 de-special-cases English in various areas. Enforcing English as default language contradicts those efforts.
  3. Language-specific sub-directories were proposed at some point, which would contain the identical filenames, and those files would only contain the overridden config object keys/properties (to be merged into the main file); i.e.:
    ./config/entity.node.bundle.article.xml
    ./config/de/entity.node.bundle.article.xml
    

    However, that

    • contradicts the default language issue again
    • is incompatible with multiple contexts
    • is entirely incompatible and cannot be represented in the schema of the active store
  4. If value translations will happen to live within a configuration object, then they must not rely on a syntax or semantics that is only available in XML (such as element attributes).
  5. Putting placeholder tokens into configuration objects makes no sense. It implies that all values in configuration objects would be abstracted data objects, which cannot be handled as-is. Before we do that, we'd put no translatable strings into configuration objects at all in the first place. (Also, placeholder tokens would tie into t(), but t() cannot be used for user input, since translations vanish as soon as the original/source string changes.)
  6. Overall, I'm on the fence with all of the current proposals, and would like us to take a big step back and rather reconsider an approach along the lines of i18n module - i.e., a dedicated data store for context-specific translated strings pertaining to unique/dedicated objects (of any kind); e.g.:
    context                      key         language      value
    entity.node.bundle.article   label       de            Artikel
    entity.node.bundle.article   label       xx-lolspeak   ATICL!
    system.site                  error.404   de            Nicht auffindbar.
    
    • No impact at all on monolingual sites (only English, or only German)
    • Only invoke translation when actually needed -- e.g., prior to rendering
    • Not limited to translating configuration objects; context can/could be something else
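
The table in point 6 could be read as a lookup keyed by (context, key, language). The following Python sketch is only one possible reading; the function name `translate` and the fall-through to the source value are assumptions, not part of the proposal.

```python
# Rows of the example table above, keyed by (context, key, language).
translations = {
    ("entity.node.bundle.article", "label", "de"): "Artikel",
    ("entity.node.bundle.article", "label", "xx-lolspeak"): "ATICL!",
    ("system.site", "error.404", "de"): "Nicht auffindbar.",
}


def translate(context, key, langcode, source_value):
    """Return the stored translation, or the untouched source value if none exists."""
    return translations.get((context, key, langcode), source_value)


# Called only when a translated value is actually needed, e.g. prior to rendering.
label_de = translate("entity.node.bundle.article", "label", "de", "Article")
label_fr = translate("entity.node.bundle.article", "label", "fr", "Article")
```

A monolingual site never populates the table, so every call simply returns the source value.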

Gábor Hojtsy’s picture

A langcode within config object names doesn't make sense, because the config system needs to be able to look up config objects, and the default language, as well as available languages, is configuration already. Chicken and egg.

Well, then your (3) uses langcode in the path which is just a slight variant of that :)

D8 de-special-cases English in various areas. Enforcing English as default language contradicts those efforts.

Looks like everybody is in agreement on that :)

Putting placeholder tokens into configuration objects makes no sense. It implies that all values in configuration objects would be abstracted data objects, which cannot be handled as-is. Before we do that, we'd put no translatable strings into configuration objects at all in the first place. (Also, placeholder tokens would tie into t(), but t() cannot be used for user input, since translations vanish as soon as the original/source string changes.)

What would the placeholder tokens suggested by @yched have to do with t()? I don't see that connection!

Overall, I'm on the fence with all of the current proposals, and would like us to take a big step back and rather reconsider an approach along the lines of i18n module - i.e., a dedicated data store for context-specific translated strings pertaining to unique/dedicated objects (of any kind)

Ok, well, we did not talk much about this, but any external data source (placeholders, separate XML to be merged, separate key-value store) would need meta-information on what is translatable in config structures. Most of the i18n complication is that by pulling these out of config entirely, i18n is responsible for knowing how to store these, what kind of widgets to show for them when being edited, etc. I don't see how any approach to config translation can escape those problems, which are the real meaty problems. Your key-value store is just a variation on @yched's externalized strings: you have IDs for the strings, you just don't store the default language version in this store, while @yched would (which BTW makes it easier to swap language on a config object - in your concept, we'd need to copy the text from the config object to this store and the language values from here to the config store in that case, another issue i18n suffers from).

Also, how would we store these values? In CMI? These are part of configuration and would need to satisfy the same goals of stageability, deployability and versionability.

yched’s picture

A langcode within config object names doesn't make sense, because the config system needs to be able to look up config objects, and the default language, as well as available languages, is configuration already. Chicken and egg

I don't see that as a chicken and egg problem.

Accessing the list of enabled languages would be done by reading either :
- locale.enabled_languages.xml (if "every language in one file" scenario)
- locale.enabled_languages.*.xml by prefix (if "every language in its own file at locale.enabled_languages.[langcode].xml" scenario)
But you don't need to know any information about languages to access that info, and that's what matters.

Then you can access some.path.with.a.[langcode].in.it.xml for a specific [langcode] you know you're interested in, or query for the list of files with a some.path.with.a.* prefix to get the list of available files.

I don't think we intend to state that "we cannot put [node_type_name] in xml file names because the list of node types resides in config", do we ? Then how is that different ?
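
To illustrate the lookup by prefix, here is a small Python sketch. The helper name `langcodes_from_prefix` and the in-memory list standing in for the config directory are assumptions; nothing here reflects an actual CMI API.

```python
def langcodes_from_prefix(filenames, prefix, suffix=".xml"):
    """Extract [langcode] from names shaped like prefix.[langcode].xml."""
    found = []
    for name in filenames:
        if name.startswith(prefix + ".") and name.endswith(suffix):
            middle = name[len(prefix) + 1 : len(name) - len(suffix)]
            # Skip names with nothing, or more than one part, after the prefix.
            if middle and "." not in middle:
                found.append(middle)
    return sorted(found)


# Stand-in for a directory listing of the config directory.
config_dir = [
    "locale.enabled_languages.de.xml",
    "locale.enabled_languages.fr.xml",
    "system.performance.xml",
]

# No language information is needed to discover the enabled languages.
enabled = langcodes_from_prefix(config_dir, "locale.enabled_languages")
```

Once `enabled` is known, a name with a langcode in it can be built and read like any other config name.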

gdd’s picture

Priority: Major » Critical

Now that the patch has landed, this is a critical followup task

sun’s picture

Issue tags: +Configuration system

Dave Reid’s picture

I'd like to reiterate what #6 said - we can't support just translation. There are other contexts necessary for contrib to be able to modify or use alternate config.

gdd’s picture

@davereid If we manage to keep the existing system of being able to override variables very early (essentially the same thing we use $conf for now) wouldn't that keep the same functionality we have? I am adamantly against increasing the scope of this issue. If someone wants to make another one that is 'implement contextual configuration system' that is fine, but it won't be critical and internationalization alone is.

Gábor Hojtsy’s picture

Issue tags: +sprint

Copying our Denver discussion feedback from #1542578: Multilingual and translatable configuration to avoid issue duplication. We should discuss and merge this into the issue summary :)

Problem

  • Drupal configuration cannot be translated or localized.
  • All contributed solutions and efforts happen after the fact and thus introduce idempotency problems; i.e., things cannot be safely loaded and saved.

Goal

  • Make configuration translatable.
  • Make configuration idempotent; safe to be written regardless of current language.

Terminology

Language-specific configuration
Configuration that is limited to a specific language condition and only exists in that context/condition.
Multilingual configuration
Configuration values (not necessarily human-readable strings) can be different per language.
Translatable configuration
Human-readable string values in configuration can be translated into different languages.

Given this terminology, read the Goal again.

Architecture

[Photo: Translatable configuration architecture]

  • Architectural design considerations:
    • [withdrawn] Translators should not change configuration. Configuration should not change just because it was translated.
    • Modules should be able to ship with pre-translated configuration.
    • Monolingual (single-language) non-English sites do not necessarily need English configuration values. Only default configuration shipped with modules may be in English.
    • Retain easy Developer Experience (DX) and performance for monolingual English-only sites.
    • Keep the configuration system clean and lean. Language support must not turn it into a complex beast.
    • A single request may require configuration values in any number of languages/translations (e.g., for sending e-mails).
    • The configuration file format might lead to escaped Unicode string garbage for translated configuration values (e.g., JSON).
    • Not all values in a configuration object are translatable.
    • Each translation of a configuration value has to go through the same validation as the original/source value.
    • Localized and multilingual (language-specific) values will be required in addition to translations of human-readable strings (e.g., date formats, or language-specific recipient e-mail addresses for a contact form). While business logic for localized/multilingual values differ (configured by site administrators, not translators), the storage handling/requirements are the same.
    • Loading » saving » loading a configuration object must be idempotent, regardless of current language. Configuration values must not be altered or overloaded after loading.
  • Especially the last two points mean that language-specific values are best stored and maintained within a configuration object (like field translations).

Design / Implementation

[Photo: Translatable configuration design implementation]

  • Language is part of the configuration key names.
  • The configuration system internally handles and negotiates the language keys. Not specified manually.
  • Each configuration value always has a fallback value for any language. Used if there is no translation/language-specific value.
  • Configuration values can be untranslatable. But that does not matter for the configuration system/storage. The configuration system will not care for that business logic.
  • The language key is always the first key for each key/value in a configuration object. This makes it easier to generate the translated/language-specific values (overloading all fallback values with all language-specific values).

    (Disregard the second language key in the photo; result of brainstorming.)

  • An additional/separate system to identify translatable/multilingual configuration values is required.
    • An (XML-alike) schema approach to identify translatable values in a configuration object will not work, because configuration objects may be extended with sub-values by other modules.
    • Configuring which configuration values are translatable/multilingual will not be supported. The information about translatable/multilingual configuration values will be hard-coded. (for D8)
    • Identification of translatable/multilingual configuration values does not matter for the implementation, will be figured out later.
  • An additional/separate system for the User Interface for translatable/multilingual configuration values is required.
  • Conclusion: Decouple the property API from the entity system, to make it not only work for fields and entity properties, but also for configuration object properties. This gives us:
    • Type: The data type of a configuration value is clearly specified.
    • Widget: A dedicated widget for each configuration value. (unblocks a separate translation interface)
    • Validation: Each configuration value can be validated independently. (unblocks a separate translation interface)
    • Meta information: Required values, single/multiple values (cardinality), identification of translatable values.
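
The "language key first" layout and the overloading of fallback values might look roughly like this. It is a Python sketch under stated assumptions: the key name `fallback`, the function `values_for`, and the sample values are invented for the example, and the real key naming was still open at this point.

```python
FALLBACK = "fallback"  # hypothetical name for the fallback language key

# Language is the first key; each language holds only the values it overrides.
site = {
    FALLBACK: {"name": "Mon site", "mail": "contact@example.com"},
    "de": {"name": "Meine Seite"},
}


def values_for(config, langcode):
    """Overload all fallback values with the language-specific values."""
    values = dict(config[FALLBACK])
    values.update(config.get(langcode, {}))
    return values


german = values_for(site, "de")
italian = values_for(site, "it")  # no Italian values stored, so fallback only
```

Loading never rewrites the stored object, so saving it back is safe regardless of the current language.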

Gábor Hojtsy’s picture

So the main differences between (#19) and the original proposal here are that (a) we'd include the language specific versions of config on the same level for each language, like fields and (b) we'd skip supporting configurability of what part of config is translatable, instead we'd bake it in. Any opinions? Let's not waste our time with going silent on the hard questions now and then freaking out later when it is not implemented like we wanted :)

jcisio’s picture

Can we simply add a handler for each config default value? For example for user.account_settings we have:

<config>
  <anonymous translatable="1">Anonymous</anonymous>
</config>

where translatable="1" is a shortcut for handler="t". When the function t is called, the context "user.account_settings" will be used. With this context, config is easily translated in the current l.d.o like any string.

That's for the default value. Also, for custom value, we'll need a new field in the active store for language.
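
A rough Python sketch of the handler idea, using the standard library XML parser. The `register` element, the `catalog` contents, and the function names are assumptions for illustration; `lookup` merely stands in for t() called with a context.

```python
import xml.etree.ElementTree as ET

XML = """<config>
  <anonymous translatable="1">Anonymous</anonymous>
  <register>visitors</register>
</config>"""


def load_config(xml_text, context, translate):
    """Read default values, passing the ones marked translatable through a handler."""
    values = {}
    for element in ET.fromstring(xml_text):
        text = element.text or ""
        if element.get("translatable") == "1":
            text = translate(text, context)
        values[element.tag] = text
    return values


# Hypothetical translation catalog keyed by (context, source string).
catalog = {("user.account_settings", "Anonymous"): "Anonyme"}


def lookup(text, context):
    """Stand-in for t() with a context: fall back to the source string."""
    return catalog.get((context, text), text)


settings = load_config(XML, "user.account_settings", lookup)
```

Note that this relies on an XML attribute, which ties the marker to the file format.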

gdd’s picture

Some initial reactions:

- I really really hate having the language be part of the key. It is hackish and is going to make the files harder to read. Why not use attributes for this instead?

- How are fallback values for configuration specified? Does everything have to have an 'und' key specified?

An (XML-alike) schema approach to identify translatable values in a configuration object will not work, because configuration objects may be extended with sub-values by other modules.

Can you elaborate on this? I'm not really sure what you're trying to say here.

- The UI/form issues seem separate to me and should be broken out into a separate task issue. How this stuff is handled on the front end isn't really the config system's concern as long as we can store and retrieve the information properly.

Overall honestly I'm just having a really hard time picturing how this is going to look and/or work. I find the internationalization stuff in general to be a struggle to understand, and since I don't know anything at all about the property api, the whole picture has just turned into a big mush. Some sample code or a longer description of how the pieces work together would really help me a lot, but my initial reaction is that we're adding an enormous amount of complexity to something that might not really need it.

A description of why other proposed solutions wouldn't work would help me out a lot as well.

David Strauss’s picture

This combination is a non-starter:

  • Language is part of the configuration key names.
  • The configuration system internally handles and negotiates the language keys. Not specified manually.

If the configuration management system internally understands which value to set/get based on i18n, we would store it in a semantic way, not munged into the key.

The primary issue we're facing is where we want to be on the following spectrum.

No built-in i18n

Extremely clear path for getting and setting any configuration value from the GUI. Some possibility to wrap the Config API where necessary with an i18n-aware interface. GUI editing would be possible but largely the burden of modules using the Config API.

Basic i18n

Single i18n context for each user. Concept of default and localized value. Ability to mark values as translatable. But, clearly, if we opt for compromise on capability (versus the "full i18n" below), we would do so to allow GUI editing.

But, GUI editing even with this simple model raises some questions:

  • Do we allow editing only of the user's own language value, saving that back only for, say, language "es"?
  • What happens if the user needs to set the default value or other languages for a translatable item like the name of a View?
  • Do we provide a fancy widget for translatable configuration that defaults to setting the value for the current language but allows opening a pop-up/overlay with all the languages for the site?

As suggested by Gábor, decoupling the Property API from the entity system and using it for the Config API might be a start. But, we'll have to keep in mind that Config API widgets will appear in more places and in more advanced interfaces (like Views) than the entity/property configuration interface, which is mostly under control of core. I also have to admit that I'm unfamiliar with the Property API work done so far.

Full i18n/Context

I say "full" as in the gold standard in the Android/Java/iOS world. This involves a tree of priorities, like en_GB > en > default. Some things, like displaying a date, may depend on language for the month names but country for the ordering/punctuation. We could do this, but it would be at the expense of any reasonable GUI editing interface.
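
The priority tree can be sketched as a fallback chain. This Python example is a simplified illustration: the function names and the sample date formats are assumptions, and a real implementation would have to handle separate language and country dimensions rather than one linear chain.

```python
def fallback_chain(locale):
    """Return lookup candidates from most to least specific, ending in 'default'."""
    chain = []
    parts = locale.split("_")
    while parts:
        chain.append("_".join(parts))
        parts.pop()
    chain.append("default")
    return chain


def lookup(values_by_locale, locale):
    """Return the first value found along the fallback chain."""
    for candidate in fallback_chain(locale):
        if candidate in values_by_locale:
            return values_by_locale[candidate]
    raise KeyError(locale)


# Hypothetical date formats stored at different levels of the tree.
date_formats = {"default": "Y-m-d", "en": "m/d/Y", "en_GB": "d/m/Y"}

british = lookup(date_formats, "en_GB")
american = lookup(date_formats, "en_US")
french = lookup(date_formats, "fr_FR")
```

Even this linear version shows the GUI problem: an editor would need to expose which level of the chain a saved value belongs to.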

David Strauss’s picture

The ultimate convergence would be a general concept of forms with data-bound controls (including i18n-aware controls) that is usable by entities, fields, configuration, and general forms.

This is not the current Form API; it would have to be a successor. The Form API lacks two critical features:

  • Ability to (cleanly) support arbitrary widgets, including i18n variants. Gábor alludes to this with the #multilingual key, but we already have a DX disaster in the node form/Field API/Entity API/Form API divergence.
  • Proper data-bound controls. The current scheme depends on each module adding its elements with naming and placement that allow it to find the value when the form comes back. This is bad because it breaks when other modules alter the form too much. Each form element ought to be data-bound in the sense that it knows what code is responsible for providing its current value and, when saved, persisting its possibly changed value.

This sort of move is not unprecedented: one reason we built the Field API is because we tired of people hammering their use cases into nodes just to get nice widget and persistence features. I feel like we're trying to do the same here by hammering the configuration use case into the entity/property system in order to get functionality that ought to be part of Form API.

I will now head off and check out what Symfony is doing in this territory to see if this is good cause for dropping a Drupalism in favor of something implemented there.

David Strauss’s picture

Symfony 2 supports a rich form system that is, essentially, what Form API should be in 2012. It supports custom widgets in a way that makes them usable by any form. Widgets can run as a "service" in a data-bound capacity.

Symfony forms support everything we need already for configuration other than translation, and that's easily addable for basic text fields by having Drupal core implement a translatable text custom widget type ("custom field type" in Symfony terminology).

I do not want to add more complexity on the Drupal side. I feel that it requires strong justification to newly implement anything on the Drupal side that is possible within Symfony.

It would be great to have a sprint just focused on converting Config API, Field API, Entity API, and other core forms to the Symfony form system.

David Strauss’s picture

Also, in the interest of not propagating further Drupalisms for CMI, I'm talking with Symfony core folks about how CMI can leverage Symfony's configuration model. On disk, there's little difference. The biggest gap is in GUI capability; Symfony's configuration files are exclusively developer-edited. Symfony configuration also lacks first-class i18n, but it should be possible to layer on that, too.

Gábor Hojtsy’s picture

@heyrocker:

- I really really hate having the language be part of the key. It is hackish and is going to make the files harder to read. Why not use attributes for this instead?

If you use an attribute to tell what language the value is in for the key, you'll also have another place where you specify the same key with other language attributes, so it becomes what I'd call a compound key. As in your above example, the "key" is the same for two languages, so to decide which one to load, you need the language anyway:

<site_information>
  <site_name lang="en">I am awesome!</site_name>
  <site_name lang="se">Jag är grymt!</site_name>
</site_information>

So it's a compound key in theory, whether stored in the tag name or attribute or whatever. To keep the discussion format agnostic, it seemed best to refer to this as part of the key. The example photo has this as an array dimension, your example has it as an attribute dimension.

- How are fallback values for configuration specified? Does everything have to have an 'und' key specified?

That is not true for entity fields either. How would you even conceptually define the "language neutral site name"? It's text with words that are in some language. We need to know at least one language version for the configuration that we have, which was the original submission language. If I create a French site, I'll never enter my site name in Danish or Chinese or English. I'll only enter it in French. Then if I want to translate the site to Turkish and Italian, I'd translate from French to Turkish and Italian. For things I did not translate to Turkish and Italian (I did not diligently visit all config pages that had my French stuff), we'd need to fall back to the French stuff, since that is what the user initially input, right? We don't have any other data to fall back on. What else would we fall back to?

- The UI/form issues seem separate to me and should be broken out into a separate task issue. How this stuff is handled on the front end isn't really the config system's concern as long as we can store and retrieve the information properly.

I can wholeheartedly agree. If we scope our problem to trying to solve the componentization of configuration with the property API that will solve our language problems, then it seems like it's such a huge problem space and, more importantly, seems impossible to solve in the D8 timeframe given the speed of changes we can make and the lack of time available (the lack of consensus notwithstanding). So that is why my email summary to you tried to keep that topic short originally. @sun's summary covers it much more elaborately (which I copy-pasted from the other issue, so although it shows up under my name above, it is @sun's extended writing). I agree we should focus on the config storage and retrieval and language logic API and not on the UI for now. (I honestly **fully doubt** we could even solve the config translation UI problem in core for D8, but having the API there and consistently used is a prime goal).

@David Strauss: you touched a lot on the GUI parts, which @heyrocker and I agree should not be part of the scope here. Do you agree it would make sense to find a solution for config storage/retrieval and language logic without taking the UI into consideration for now? I think the backend needs to support the same set of tools either way. How it is going to be editable could be a great debate, but having it here would muddy the waters even more, which seems pointless given we cannot even agree on the basics of storage/retrieval and language logic.

Plus almost nobody seems to care about this question, they just want to see it solved in a great way. Heh. A typical conversation (although this one about entity stuff): https://twitter.com/#!/marcvangend/status/198330019485978624

gdd’s picture

If you use an attribute to tell what language the value is in for the key, you'll also have another place where you specify the same key with other language attributes, so it becomes what I'd call a compound key. As in your above example, the "key" is the same for two languages, so to decide which one to load, you need the language anyway:

Note that my most recent thoughts on this centered around keeping language-specific information in separate files, not in attributes. While this is still a composite key of a type, it's much more understandable and manageable.

Each configuration value always has a fallback value for any language. Used if there is no translation/language-specific value.

We don't have any other data to fall back on. What else would we fall back to?

These two statements are in conflict.

Plus almost nobody seems to care about this question, they just want to see it solved in a great way.

I'd like to point out that it's not that I don't care, it's that fundamentally this is a very complex system that I have very little understanding of, and every time I think I have a handle on it, I feel like something else comes up that invalidates my initial understanding and I am back to square one again. Adding the property system on top of it just makes that feeling even more confounding. What I would really love is for sun to put together some code that I can look at that shows 1) How this would look from a DX perspective 2) How the files would change either in naming convention or format. This would go a long way towards getting me and others involved I think, because every time I open this issue up I just get really confused and my head starts to hurt.

Gábor Hojtsy’s picture

Title: Implement internationalization of configuration » Implement internationalization of configuration [no patch]
Status: Active » Needs review

- Each configuration value always has a fallback value for any language. Used if there is no translation/language-specific value.

- We don't have any other data to fall back on. What else would we fall back to?

These two statements are in conflict.

Uhm, why? The first one says we always have some fallback value. The second is a detail from where I explained how we source that fallback value :) Basically the config is always entered in at least one language and then it is translated from there. The original value is the only one that we can reliably fall back on, since that is the one we are sure to have, right? Here it is again in graphical form:

DrupalConfigTranslation.png
(Figure shared at https://docs.google.com/drawings/d/1xem6BixByUZjuBg7kXjvJLFq4R4uAOmMTk3c...)

1. The config enters the system somehow. It either comes from an "export" (AKA shipped with a module/distro) or it comes from a UI (either directly input or via an import UI).

2. Either way, except some very odd cases, each config has some language specific values in it. A site name, a view title, a contact form label, a field description text, etc. So the config comes with its language known.

3. The configs coming from d.o modules and distros will all come in English. If you build a Finnish site though that you do not intend to translate to English ever, you'd not build English views and rules, just as you'd not write English nodes either. So in that case the exported config or the UI based config would come in Finnish as its source. There is no English to fall back on, there is Finnish to fall back on.

4. Then there can be multilingual websites that have configs in different languages. I add a view just for the French portion of the site, which would be in French and then a view for all sections that I add in Finnish and then translate to French. So again, we know there is an original version of the config that came in a language. This is really the only thing to know.

So when we need to display any of the above configs in German (into which it was not translated), what config would we use? Of course, for each case, the original version of the config, which in these cases would be English, Finnish or French. That is the only version of the config reliably available for each. Of course if we do have a German version of anything, then we'd use that. But otherwise we fall back on the original version of the config entered.

I feel like we are going in circles here because I believe we discussed this multiple times really. :)

Finally, there is the case of new config properties coming in that were previously not available. E.g. Views introduces a new display config property or Rules introduces a new thing. That would not be available in ANY of the versions of config, either original or translated, or whatever. In this case, obviously, the module needs to have its own default fallbacks (such as Views has now: if you go kamikaze and delete stuff from your exported views, it will fill in with defaults when loading). You've also advocated before that in this case Views would need to write an update function to fill in the new things in the config exports. That could also happen, but the module can just as well have its defaults.
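
To make the fallback order concrete, here is a rough sketch of how a lookup could walk through it: translated value first, then the value in the original submission language, then the module's own defaults. The function name `example_config_resolve()` and the array shapes are made up for illustration; nothing like this exists in the config system yet.

```php
<?php
/**
 * Hypothetical helper, not an existing Drupal API.
 *
 * @param string $key          Property name to look up.
 * @param string $langcode     Language the value is requested in.
 * @param array  $original     Values in the language the config was first saved in.
 * @param array  $translations Per-language overrides, keyed by langcode.
 * @param array  $defaults     Defaults supplied by the owning module.
 */
function example_config_resolve($key, $langcode, array $original, array $translations, array $defaults) {
  // 1. A value translated into the requested language wins.
  if (isset($translations[$langcode][$key])) {
    return $translations[$langcode][$key];
  }
  // 2. Otherwise use the value in the original submission language.
  if (array_key_exists($key, $original)) {
    return $original[$key];
  }
  // 3. Properties introduced after the config was written come from the
  //    module's own defaults.
  return isset($defaults[$key]) ? $defaults[$key] : NULL;
}

// A view created in Finnish, translated to French, then requested in German.
$original = array('title' => 'Uutiset');
$translations = array('fr' => array('title' => 'Actualités'));
$defaults = array('pager_limit' => 10);

$title_fr = example_config_resolve('title', 'fr', $original, $translations, $defaults);
$title_de = example_config_resolve('title', 'de', $original, $translations, $defaults);
$limit_de = example_config_resolve('pager_limit', 'de', $original, $translations, $defaults);
```

Here the German request gets the Finnish title, because Finnish is the only version guaranteed to exist, and the pager limit comes from the module default because no version of the config has it.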

Did I not cover something here? Looks clearer now or even murkier?

(Added no-patch and moved to needs review to encourage discussion :)

Gábor Hojtsy’s picture

Ok, so looks like my summary did not help ignite the conversation but rather it halted it. Meh. Is our only hope to try and schedule a phone call with people interested to move this forward?

plach’s picture

No time to dive into the discussion right now, but if there is a call I'll be available.

gdd’s picture

Gabor: Well, it puts things back where I always thought they were. It was sun's statement that had confused me in the first place.

I still don't have a good picture of how this will look from a code standpoint. How will this change the config system's DX? How will it change the files? How will we represent this data in YAML since we won't have attributes (assuming we switch which is pretty likely at the moment.) These are the kind of things I'd like to see to get a better picture of things.

Is there currently an issue around breaking the property system into its own subsystem?

Gábor Hojtsy’s picture

Gabor: Well, it puts things back where I always thought they were. It was sun's statement that had confused me in the first place.

Yeah, I believe I did highlight the minimal number of new things we discussed specifically so you get the explanation in perspective.

I still don't have a good picture of how this will look from a code standpoint. How will this change the config system's DX?

Once again, as per our Denver discussion explained above, the values would be in the main config object, much like an entity with its fields. You'd have easy methods to get original/translated values of properties, I'd assume much like we have getters and setters for properties/fields already in Drupal 8.

How will it change the files? How will we represent this data in YAML since we won't have attributes (assuming we switch which is pretty likely at the moment.)

As per the photos from our discussion, we represented the language values as part of the key. You said they should be attributes! David was not happy about them being part of the key or a compound key either, as far as I understood. I think it's now your turn to suggest an alternative.

Entity property API issue here: #1346214: [meta] Unified Entity Field API.

gdd’s picture

I talked with sun about this in IRC a bit this weekend and he was able to clarify some things for me. The main thing he pointed out is that since we are hard coding what properties are and aren't translateable, this doesn't have to be added into the files at all. I had missed this point somehow, and it cleared up a lot of ambiguity in my head.

In terms of the language, my first choice had always been to have the language-specific data in separate files. This keeps the data isolated and more easily deployable on a per-language basis, as well as removing a lot of complex merging issues when installing new default config or writing data into the files. I think we could still do that, and I would love to try as it would fit more naturally in with how the config system works right now.

I note that the issue linked above is for the Entity property API, but is there an issue for separating the property api from entities entirely?

Gábor Hojtsy’s picture

I talked with sun about this in IRC a bit this weekend and he was able to clarify some things for me. The main thing he pointed out is that since we are hard coding what properties are and aren't translateable, this doesn't have to be added into the files at all. I had missed this point somehow, and it cleared up a lot of ambiguity in my head.

I tried to clear this up for you as part of the two main differences between the original proposal and the newly discussed one above in #20: http://drupal.org/node/1448330#comment-5914980

we'd skip supporting configurability of what part of config is translatable, instead we'd bake it in.

As for language data vs. files you wrote this:

In terms of the language, my first choice had always been to have the language-specific data in separate files.

Well, the original config file you maintain also has language data, no? I think you are advocating saving the original version of the config (such as a view created on a UI) in a config file and then saving OTHER language versions in other files. But the original version has language-specific data just as well as any other language version. The same kind of privilegization of initial language does not happen for fields on entities (or any foreign language for UI translation). It does happen for properties on entities at this stage. Just for comparison of how developers will need to deal with language at other places in Drupal.

I think it's more important to get something done than to debate this forever, so if we want to go in the opposite direction for config than we go for fields (both in storage separation and privilegization of initial values), then let's pick that one and go.

Gábor Hojtsy’s picture

Ok, so I wanted to get down to work finally and demonstrate what we'd need to do to even support assigning language to configuration. That is not at all translating configuration, it is merely saying 'this view is German' or 'this contact form config is French' or 'my site info config is Finnish'. I looked through all the D8 development tip and did not find anything that is in a config file but would contain any language specific value. So there is not really a possibility to demonstrate this on existing code.

1. So you do not want to have language data separate from language independent data AND you don't want to have multilingual data in the same storage (file) as the original language data. So the natural workflow for assigning language would be assembling the data and then telling the save routine the language we are saving with.

Pre-modifications:

function image_style_save($style) {
  $config = config('image.style.' . $style['name']);
  $config->set('name', $style['name']);
  if (isset($style['effects'])) {
    $config->set('effects', $style['effects']);
  }
  else {
    $config->set('effects', array());
  }
  $config->save();
}

Post modifications:

function image_style_save($style, $langcode = NULL) {
  $config = config('image.style.' . $style['name']);
  $config->set('name', $style['name']);
  if (isset($style['effects'])) {
    $config->set('effects', $style['effects']);
  }
  else {
    $config->set('effects', array());
  }

  // Assume here that there is some language specific data saved, such as a
  // display label for the image style (which is currently not the case).

  // Figure out if we already know the language of this config. If we do, do
  // not change it. If we don't, save it with the information provided.
  $current_langcode = $config->getLanguage()->langcode;
  if (empty($current_langcode)) {
    // If langcode is not provided, we save with the current UI language.
    if (empty($langcode)) {
      $langcode = drupal_container()->get(LANGUAGE_TYPE_INTERFACE)->langcode;
    }
  }
  else {
    // NULL out langcode so the existing langcode is kept on the config.
    $langcode = NULL;
  }

  $config->save($langcode);
}

1. The save API function needs a langcode so people can tell the language to assign.

2. The language should be used for new config objects. For existing config objects, we either need to enforce the API to always require that we are told the language or we should protect existing language information (so if you edit the same config on different UI languages for example, it would not change the language of the config). If we make the config language on the UI explicit and don't assume it being identical to UI language or something else, then we don't need this "keep existing language" mambo.

3. We save the whole config object with this langcode assigned.

The result would be something like (note new langcode attribute not present before):

<?xml version="1.0"?>
<config langcode="fr">
  <name>thumbnail</name>
  <effects>
    ...
  </effects>
</config>

It seems pointless to try and add the langcode (or lang if you want to be similar to HTML) attribute on the concrete values that are language specific because (a) we know the whole file will only ever contain information for one language only, (b) we'd hardcode the translatable pieces into the data storage, while we can just forgo that and merge whatever we find in other language specific versions with this.

As I've said countless times, the absolute #1 priority is to be able to tag config with language if that portion of config would ever contain language specific pieces. Since you want to only ever store single-language values in a file (and you rejected including the langcode in the filename), tagging at the topmost wrapper of the file sounds like it would make most sense. It belongs to the whole file, so whether it is in the filename or in the topmost wrapper it has the same logical effect.

So this is storage and original creation. What happens if you want to change the language of config? Well, if you provide a different langcode on save, sounds like that would change it.

Since you want to keep the original version and the translations separate, sounds like we'd need to introduce language on the set/get methods as well optionally. So you could save translations in other languages but would otherwise overwrite the original language. Something like:

  $config = config('image.style.' . $style['name']);
  // This would set the French label.
  $config->set('label', 'Visible label in French');
  $config->save('fr');

  $config = config('image.style.' . $style['name']);
  // Set English label even though the main config is French.
  $config->set('label', 'Visible label in English', 'en');
  $config->save('fr');

  $config = config('image.style.' . $style['name']);
  // This would get the French label.
  $config->get('label');
  // This would get the English label.
  $config->get('label', 'en');

Not sure how you'd get a whole config object in a language, since that makes it impossible to save idempotently.

  // Load the English config even though the original was French(?)
  $config = config('image.style.' . $style['name'], 'en');
  // This would get the English label.
  $config->get('label');
  // This would save English stuff but then would it overwrite language independent?
  // Or would it even set the whole object to English instead of French? Meh.
  $config->save();

Overall this looks like it is shaping up API wise exactly like the getter/setter API for properties and fields on entities, since those get you the original submission value if no language is provided and give you the language specific one if you provide langcode. Except you don't set language with save() there. We could have a setLanguage() or setLangcode() on $config as well if that feels better.
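
For anyone who wants to poke at the getter/setter semantics outside of Drupal, here is a self-contained sketch. `ExampleConfig` is an in-memory stand-in that I made up for this comment, not the real config class; it only shows "no langcode means original language, a langcode means that translation with fallback to the original".

```php
<?php
// Hypothetical in-memory stand-in for the config object; not the real CMI class.
class ExampleConfig {
  // Language the config was originally saved in.
  protected $langcode;
  // Values in the original language (and language independent values).
  protected $data = array();
  // Translated values, keyed by langcode, then by property name.
  protected $translations = array();

  public function __construct($langcode) {
    $this->langcode = $langcode;
  }

  public function set($key, $value, $langcode = NULL) {
    if ($langcode === NULL || $langcode === $this->langcode) {
      // No langcode, or the original one: change the original value.
      $this->data[$key] = $value;
    }
    else {
      // Any other langcode: store a translation next to the original.
      $this->translations[$langcode][$key] = $value;
    }
    return $this;
  }

  public function get($key, $langcode = NULL) {
    if ($langcode !== NULL && isset($this->translations[$langcode][$key])) {
      return $this->translations[$langcode][$key];
    }
    // Fall back to the original submission value.
    return isset($this->data[$key]) ? $this->data[$key] : NULL;
  }
}

$config = new ExampleConfig('fr');
$config->set('name', 'thumbnail');
$config->set('label', 'Vignette');
$config->set('label', 'Thumbnail', 'en');

$label_original = $config->get('label');
$label_en = $config->get('label', 'en');
$label_de = $config->get('label', 'de');
$name_en = $config->get('name', 'en');
```

Asking for German gives back the French original since nothing was translated to German, and the machine name comes back unchanged in every language because it never gets a translation.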

Enough of a coder perspective for discussion? :) :D

Gábor Hojtsy’s picture

So in other words, you'd have the original config file without langcode in its name (because you don't want to) but a langcode in its root tag. Then you'd have all translations with langcode in their filename so you can identify them for loading (and langcode in its root tag or not, indifferent). If the cleanliness of the original filename is worth the inconsistency that is.

Then loading, saving and all operations would assume a language unless told otherwise (saving would assume current UI language unless told otherwise I guess and keep existing language if already present by default). Then if you set the main config language, it would change the original file language. If you set a specific language for a key, if it matches the original file language, it would change that, otherwise it would change the translation. The getter would also know the original config language of course, so it would know where to look if a langcode based value is requested.
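
A rough sketch of what the file side of that could look like, assuming the original file keeps a clean name and each translation file carries the langcode and only the values that differ. Both helper functions and the `.xml` naming are assumptions for illustration, not something the config system does today.

```php
<?php
// Hypothetical naming scheme: the original file has no langcode in its name,
// translation override files do.
function example_config_filename($name, $langcode = NULL) {
  return $langcode === NULL ? $name . '.xml' : $name . '.' . $langcode . '.xml';
}

// Lay the (partial) translation over the complete original config.
function example_config_load_merged(array $original, array $override) {
  return array_replace_recursive($original, $override);
}

$main_file = example_config_filename('image.style.thumbnail');
$de_file = example_config_filename('image.style.thumbnail', 'de');

// Parsed contents of the two files; the override only has what differs.
$original = array('name' => 'thumbnail', 'label' => 'Vignette', 'effects' => array());
$override_de = array('label' => 'Miniaturbild');

$merged_de = example_config_load_merged($original, $override_de);
```

The merged German view of the config has the German label and everything else straight from the original, which is the "translations only store the differences" idea.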

I think the trick question after all is what happens if you saved a French view, then translated it to German and now you want to set the view to be German. Would it take all the French values and now stamp them as German (but then you have a German translation, which is not OK), or would it swap values between the original file and the translation files (because you wanted to store them on different levels, not in the same file), in which case what happens to the changes you made in between loading the config and saving it? Which language do they go to? In sequence:

1. You compose a French config object.
2. Save it as French (now you have one main config file).
3. Translate some parts to German (now you have a main config file and a de override file).
4. Load the original config object (system will know it is French).
5. Make some changes AND save as German (or just save as German).
6. ?!%?

We might not want to support changing language on the main file once it is set, but that poses the question of shipped config vs. single language sites at least. If I install VBO that comes with a set of views but I only have a French site, I don't want the English originals to be staying around and being the master versions, I want my French stuff to be THE config, so I'd want to have that as the master version. We can possibly come up with some solution for that in the initial copying of shipped config. Then the question still applies to any later changes desired unless we want to say flat out that we don't support that, in which case setLanguage() or save() with a language different from the original for an existing config would need to throw an exception and bail out of doing anything.
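
If we did go with "no language change once set", the guard could be as small as the sketch below. The class and the `setLangcode()` name are hypothetical (the method was only floated above as a possibility), and `LogicException` is just the stock SPL exception picked for the example.

```php
<?php
// Hypothetical config stand-in that refuses to change its language once set.
class ExampleLanguageLockedConfig {
  protected $langcode = NULL;

  public function getLangcode() {
    return $this->langcode;
  }

  public function setLangcode($langcode) {
    if ($this->langcode !== NULL && $this->langcode !== $langcode) {
      // Bail out instead of guessing what to do with existing translations.
      throw new LogicException("Config is already '{$this->langcode}'; cannot change it to '$langcode'.");
    }
    $this->langcode = $langcode;
    return $this;
  }
}

$config = new ExampleLanguageLockedConfig();
$config->setLangcode('fr');
// Setting the same language again is harmless.
$config->setLangcode('fr');

$rejected = FALSE;
try {
  $config->setLangcode('de');
}
catch (LogicException $e) {
  $rejected = TRUE;
}
```

This sidesteps the French-to-German swap question entirely, at the price of not helping single language sites that install English shipped config.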

andypost’s picture

Oh, no! Gabor, your example points to another edge case of config localization.

There's an old issue #606598: Human readable image-style names
In the context of image styles we should translate the image label but not the machine-readable name (or uuid) of a piece of content.

So who would define the translatability of config options, and how?

Gábor Hojtsy’s picture

@andypost: please read the above discussion :) In short: code would define that since that is what i18n module does now and people did not complain much (we have enough complexity anyway). This is not really an edge case it is a pretty core requirement to have language independent and dependent values mixed up and all our discussion above considered that.

Gábor Hojtsy’s picture

Had a discussion about jcisio's proposal in #21 and the possibilities with the more recent discussions; plus the plans for UIs or lack thereof on IRC that might also enlighten some others:

[2:35pm] jcisio: GaborHojtsy: Why it is not possible to have t() translate your translatable objects in a config? http://drupal.org/node/1448330#comment-6002684 1 conf per lang is too complex, it's something like a multisite installation.
[2:35pm] Druplicon: http://drupal.org/node/1448330 => Implement internationalization of configuration [no patch] => Drupal core, configuration system, critical, needs review, 36 comments, 13 IRC mentions
[2:36pm] GaborHojtsy: jcisio: no, its not 1 conf per lang
[2:37pm] GaborHojtsy: jcisio: the translations would only store values that are different from the base language, such as a views title, but not the whole views object
[2:38pm] GaborHojtsy: jcisio: you need to be able to tell the base language of the config, the original language with which it was saved
[2:39pm] GaborHojtsy: jcisio: which is where $config->getLanguage() comes to use
[2:39pm] GaborHojtsy: jcisio: and if you noticed in the code samples, individual values would be saved in other languages, like a human readable label
[2:45pm] GaborHojtsy: jcisio: what do you think of that?
[2:55pm] jcisio: GaborHojtsy: sorry I was on phone. That will be tricky to save different translated "parts".
[2:55pm] jcisio: Is there any use-case that the translatable "parts" are not simple strings or arrays ?
[2:57pm] GaborHojtsy: jcisio: not as far as Drupal 7's i18n module supports now
[2:59pm] jcisio: GaborHojtsy: so that's easy: just mark properties as "translatable". Those properties are string or imploded string (we convert array into imploded string to make it easy for translators).
[2:59pm] GaborHojtsy: jcisio: yeah, well, so far there is no UI layer in CMI
[2:59pm] jcisio: That's what I proposed in http://drupal.org/node/1448330#comment-5933806
[2:59pm] Druplicon: http://drupal.org/node/1448330 => Implement internationalization of configuration [no patch] => Drupal core, configuration system, critical, needs review, 37 comments, 14 IRC mentions
[2:59pm] GaborHojtsy: jcisio: so marking properties as translatable would not inform anything
[3:00pm] GaborHojtsy: jcisio: you still need to write all the code manually to save / load the translated values
[3:00pm] jcisio: Yes, we'll need to add something in CMI
[3:00pm] GaborHojtsy: jcisio: well, a UI layer is out of question as per heyrocker
[3:00pm] GaborHojtsy: jcisio: it would basically need to redo all forms in Drupal for config, right?
[3:01pm] GaborHojtsy: jcisio: based on mapping for element <=> storage
[3:01pm] jcisio: I think we just need to add a language data in config object
[3:01pm] GaborHojtsy: jcisio: the t() handler that you proposed would not solve any UI problems, since translating a view would be at a very obscure place within UI translation
[3:02pm] GaborHojtsy: jcisio: also context is not the unique key for the source, so if the source string changes
[3:02pm] GaborHojtsy: jcisio: you lose the translation altogether
[3:02pm] GaborHojtsy: jcisio: and need to start over
[3:02pm] GaborHojtsy: jcisio: so if the context is "user.account_settings" then it would be translatable, but the UI would look even more basic
[3:03pm] GaborHojtsy: jcisio: not just going to a different place but translate obscure IDs as if they were source strings
[3:03pm] jcisio: GaborHojtsy: ok, it begins to sound something to me 
[3:03pm] GaborHojtsy: jcisio: also, configs can be in any source language
[3:03pm] GaborHojtsy: jcisio: so some configs you translate from French to Spanish, others from German to Spanish
[3:03pm] GaborHojtsy: or Spanish to French 
[3:03pm] GaborHojtsy: jcisio: and t() is really not a multi-directional translation thing 
[3:04pm] GaborHojtsy: jcisio: for certain config keys, it would need to deny you to translate it to certain languages, and they'd be different per key based on the source language the config was saved with
[3:04pm] GaborHojtsy: jcisio: so if user.account_settings was saved in French, you should be able to translate to German and Spanish but no French
[3:04pm] GaborHojtsy: jcisio: then different rules for other keys
[3:05pm] jcisio: Ok, I think (when $form_state['config'] gets in), in settings form, we save the translated string, not the string itself in config()
[3:06pm] jcisio: <label>My account</label> => it displays "Mon compte" in FR, then if you modify, it modifies the translated string
[3:06pm] GaborHojtsy: jcisio: I'll post this discussion on the issue so we don't need to repeat it with others 
[3:07pm] jcisio: jcisio: ok np
[3:07pm] GaborHojtsy: jcisio: yeah, how will the config translation UI look we don't really know yet
[3:07pm] GaborHojtsy: jcisio: but it does not seem like it would make the backend API any different either way
[3:07pm] jcisio: (I wanted to make sure that I understand something before posting to that issue 
[3:08pm] GaborHojtsy: jcisio: yeah, so I think we are concentrating on the backend now, and given how late we are with that it does not seem very likely the UI will even be solved in D8 core
[3:09pm] jcisio: In my opinion: config is not translatable now. We'll have it translatable, but with very limited functionality: we mark some strings as translatable.
[3:09pm] jcisio: With that idea in mind, we can think about an UI plan
[3:13pm] jcisio: It depends on how http://drupal.org/node/1324618 gets into, too.
[3:13pm] Druplicon: http://drupal.org/node/1324618 => Implement system_settings_form() in CMI => Drupal core, configuration system, normal, active, 31 comments, 17 IRC mentions
[3:18pm] GaborHojtsy: jcisio: yeah, well, the settings form would not be applicable to forms of any complexity
[3:18pm] GaborHojtsy: jcisio: think even contact module or block module
[3:18pm] GaborHojtsy: jcisio: feel free to think of a UI plan, but we don't have any agreement yet as to how to make anything in config translatable or even tagged with language, so I'm not rushing to UI plans yet
[3:19pm] GaborHojtsy: jcisio: we can have the nicest UI plan and then no backend to use it with because we skipped that discussion and time flew by
[3:19pm] GaborHojtsy: jcisio: we not only need to discuss these but also implement them in all core modules

Jose Reyero’s picture

One of the issues with the current config is that we have no metadata at all about what every element of configuration is. This would be needed to know which can be translatable properties and what not. That could be fixed either by adding some hook_config_info() with metadata or by adding that metadata into the configuration itself.

<config langcode="fr">
  <name type="machine-name">thumbnail</name>
  <label type="string">Thumbnail</label>
  <effects>
    ...
  </effects>
</config>

Then provided we have some default English configuration (like the one that comes with Drupal core) and a translation system in place (locale) we could attempt to translate many of these properties on the fly or maybe generate a new configuration file for the enabled languages.
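
As for the hook variant of the metadata, a minimal sketch could look like this. hook_config_info() does not exist; the function names, the 'image.style.*' key and the 'translatable' flag are all made up here just to show the shape of the idea.

```php
<?php
// Hypothetical hook implementation describing each property of a config
// object. hook_config_info() is only a proposal, not an existing hook.
function example_config_info() {
  return array(
    'image.style.*' => array(
      'name' => array('type' => 'machine-name', 'translatable' => FALSE),
      'label' => array('type' => 'string', 'translatable' => TRUE),
    ),
  );
}

// Collect the property names that a translation system would need to handle.
function example_translatable_keys(array $properties) {
  $keys = array();
  foreach ($properties as $key => $meta) {
    if (!empty($meta['translatable'])) {
      $keys[] = $key;
    }
  }
  return $keys;
}

$info = example_config_info();
$translatable = example_translatable_keys($info['image.style.*']);
```

With metadata like this, the locale system (or anything else) would know that only the label needs translating and that the machine name must be left alone.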

Database storage should be improved too to provide either a simple language field or full support for configuration realms (Like 'language=es,domain=example.com')

Anyway, maybe we had better first think about supporting configuration in different languages (or generic realms, one of which can be language) and then we can think about how default configuration can be translated (from English, as the locale system only supports translations from English).

Related issues:
#1498270: Meta data for configuration
#1320012: Internationalization of configuration

Gábor Hojtsy’s picture

@Jose: right we will need some method to do initial translation of configuration based on community work (localize.drupal.org). That is somewhat different from the human translation and language assignment capability we need though. I tried to graph this in https://drupal.org/node/1448330#comment-5964766 above.

On generic realms or language, this was discussed many many times in previous threads on CMI and the decision so far was to support language as in CMI and have alter hooks for the rest in Drupal 8, then we can generalize from there in Drupal 9 if we want to. We cannot even agree on how to do language, so it did not seem feasible to do an even bigger scope (and try to build a UI for that).

Finally, #1320012: Internationalization of configuration looks like a duplicate of this one that @heyrocker stopped using and opened this one instead?

jcisio’s picture

Here is how i18n is implemented in GNOME configuration (example):

[Desktop Entry]
Type=Application
Exec=/home/jcisio/bin/gvfs-mount.sh
Hidden=false
NoDisplay=false
X-GNOME-Autostart-enabled=true
Name[vi_VN]=gvfs-mount
Name=gvfs-mount
Comment[vi_VN]=
Comment=

- It looks like once a string has been modified in the conf, if the string is modified again later and has a translated version, it won't be translated in your conf.

- We might not want something like "this view is in French" (#29, #36). Nodes are not configuration, thus they behave differently. Configuration is not the same. We have a view, and that view is language-independent; only some texts in it are localized. Like all other configuration in a website.
If we really want a view to appear only in the French version, some tools (e.g. Context.module) allow us to do that.
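
For reference, the lookup implied by the desktop entry above can be sketched in a few lines of Python. This is a simplified assumption of the behaviour: it only tries the exact `key[locale]` form and then the plain key, and the helper name `localized` is made up for this example.

```python
from typing import Dict, Optional


def localized(entry: Dict[str, str], key: str, locale: Optional[str] = None) -> str:
    """Return entry['key[locale]'] when present, otherwise the unlocalized key."""
    if locale is not None:
        localized_key = f"{key}[{locale}]"
        if localized_key in entry:
            return entry[localized_key]
    return entry[key]


# The relevant keys from the example entry above.
entry = {
    "Name": "gvfs-mount",
    "Name[vi_VN]": "gvfs-mount",
    "Comment": "",
    "Comment[vi_VN]": "",
}
name_vi = localized(entry, "Name", "vi_VN")
```

The point for this discussion is that the language lives in the key and all languages sit in one file, with the plain key acting as the fallback.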

Gábor Hojtsy’s picture

@jcisio: we proposed above including the language in the key like in your example AND including all language versions in the same file, but BOTH were deemed inappropriate by @heyrocker et al. Looks like you are reinforcing support for that idea.

As for "this view is in French", as long as we are not storing / not supporting storing multilingual data in the same file, marking the individual elements with language vs. marking the whole thing with a language has the same effect. Practically. You could only apply the same langcode to language dependent items either way, so logically you can only do the same thing at the end, no?

andypost’s picture

For now we have no universal solution and proposal.

Working on #1588422: Convert contact categories to configuration system I see that we can't use a language for whole config object because not all values are translatable (machine name is a key of config and part of config object) so #43 looks like a way to go.

OTOH we have slightly different approach in #1552396: Convert vocabularies into configuration where config is just a storage backend for entity and in this case we should translate entity properties.

In both cases we need a way to mark properties as translatable. And both point to language being passed via DI or just as a parameter to config() itself.

Gábor Hojtsy’s picture

Working on #1588422: Convert contact form settings to configuration system I see that we can't use a language for whole config object because not all values are translatable (machine name is a key of config and part of config object) so #43 looks like a way to go.

[...]

In both cases we need a way to mark properties as translatable. And both point to language being passed via DI or just as a parameter to config() itself.

Right, so since we need to know the list of translatable properties *either way*, putting the language association on the whole thing or the individual pieces results in the same thing, no? We'll know both cases to only apply the language to the translatable pieces. Just for brainstorming:

<some_value lang="de">
  <time>3600</time>
  <label translatable="yes">Every hour</label>
</some_value>
<some_value>
  <time>3600</time>
  <label translatable="yes" lang="de">Every hour</label>
</some_value>

Same thing, no? The latter makes people believe they could also do a lang="fr" in the same file, which people above (@heyrocker et al) did not want to support though, so it kind of looks counter-productive. You could also say, if we have a lang property on a piece, that assumes it is translatable, so:

<some_value>
  <time>3600</time>
  <label lang="de">Every hour</label>
</some_value>

(This would still make my fingers itch though to put a fr version right alongside there, which is not about to be supported as per the above).

OTOH we have slightly different approach in #1552396: Convert vocabularies into configuration where config is just a storage backend for entity and in this case we should translate entity properties.

OMG :/ This is exactly the type of problem that we wanted to avoid for multilingual support to be unified. Config manipulated with the entity API has the possibility to be accessed both with the config API (for deployment, load, etc) and the entity API then. Sounds pretty scary.

Gábor Hojtsy’s picture

Also, for the "where to put the language info" discussion, we don't need to store the translatability flags inside the data itself, since that is part of our knowledge of the "config schema" not the data. Whether a label is translatable or not is not part of the data it is part of the "schema" as per above. So you can also consider the "translatable='true'" information would not even be in the file as it would be known already.

andypost’s picture

As for me, both translatable="true" and lang="de" are attributes that say the same thing: this needs localization. If we are going to teach CMI to translate, we need to pass the current language to $config->get() or config() itself somehow.
Then the config "reader" method should take care of the language stuff.

In the case of config($name, $class, $langcode) we are free to implement langcode globally on the config object, but in the case of $config->get($key, $langcode) the only possible way is to use language for attributes.

Gábor Hojtsy’s picture

In the case of config($name, $class, $langcode) we are free to implement langcode globally on the config object, but in the case of $config->get($key, $langcode) the only possible way is to use language for attributes.

Why? @heyrocker's proposal is to store the translations in separate files and have merged configuration structures per language for accessing their values fast. You'd use the same data structure either way to back the API up for both calls.

andypost’s picture

In terms of the language, my first choice had always been to have the language-specific data in separate files. This keeps the data isolated and more easily deployable on a per-language basis, as well as removing a lot of complex merging issues when installing new default config or writing data into the files. I think we could still do that, and I would love to try as it would fit more naturally in with how the config system works right now.

@heyrocker's proposal is config($name, $class, $langcode)

You'd use the same data structure either way to back the API up for both calls.

Structures are different: once you init $config = config(..., $langcode) you never get other language versions of properties except for the passed language, or the config object would lose its identity with the exact config file/storage.

Gábor Hojtsy’s picture

@heyrocker's proposal is config($name, $class, $langcode) [...] Structures are different: once you init $config = config(..., $langcode) you never get other language versions of properties except for the passed language, or the config object would lose its identity with the exact config file/storage.

I don't believe that API comes straight from that storage. You could do either API on either storage AFAIS, you'd just use different internal logic for them, no?
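
To make that claim concrete, here is a minimal Python sketch in which one merged per-language structure backs both API shapes. The class names, the `MERGED` store and its contents are assumptions made for illustration only; they do not correspond to any actual CMI code.

```python
from typing import Any, Dict

# Hypothetical merged store: language code -> fully merged configuration.
MERGED: Dict[str, Dict[str, Any]] = {
    "en": {"label": "Every hour", "time": 3600},
    "de": {"label": "Jede Stunde", "time": 3600},
}


class Config:
    """Language fixed at construction, like config($name, $class, $langcode)."""

    def __init__(self, langcode: str = "en") -> None:
        self.langcode = langcode

    def get(self, key: str) -> Any:
        return MERGED[self.langcode][key]


class MultiConfig:
    """Language chosen per call, like $config->get($key, $langcode)."""

    def get(self, key: str, langcode: str = "en") -> Any:
        return MERGED[langcode][key]


same = Config("de").get("label") == MultiConfig().get("label", "de")
```

Only the internal lookup logic differs between the two classes; the storage underneath is identical, which is the point being argued.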

clemens.tolboom’s picture

What I do miss in this discussion is the concept of a resource_id (mentioned in issue summary and #23) but this would mean we could have an untranslated string (und).

For me that would mean installing a site in French without translation file downloaded we should see resource_id's all over the site that is _NO_ English POT source strings anymore.

What I further miss (as I'm no i18n guru) are examples, especially of the edge cases, which could probably teach me a lot. Sun's example #12 #6 did help a little.

(I'm not sure whether I'm contributing now or adding more noise to the plate)

Gábor Hojtsy’s picture

For me that would mean installing a site in French without translation file downloaded we should see resource_id's all over the site that is _NO_ English POT source strings anymore.

Externalizing the strings like that was proposed by @yched in #1 and looks like nobody but you picked up on liking that so far.

sun’s picture

Meh. It really did not help to merge #19 into this issue.

  1. The entire talk about "possibly leveraging element key attributes" within config files is obsolete. We don't have attributes in the common denominator of configuration encoding in the first place. Serialized PHP, JSON, and YAML have no attributes either.
  2. Putting string identifier keys into configuration instead of the actual string values was ruled out in #19 already. The idea is incompatible with the fundamental goal of the configuration/staging effort, requires a separate subsystem next to the configuration system (but very much like it) in order to load any configuration, and thus is a performance burden for monolingual sites, and lastly prevents seamlessly switching from monolingual to multilingual.
  3. The Drupal\Core\Config component will most likely not have baked-in multilingual support. All it does is to save and load configuration data. It's the wrong business layer for applying language support to begin with. All this layer is able to do is to write data values into sub-keys in config objects, which might refer to language codes, but are dumb strings like any other to the system itself. Whether it's going to be sub-keys within config objects or separate config objects per languages doesn't matter.
  4. Multilingual support requires a DI, fully working language subsystem, and generally some higher bootstrap level than the bare Config component itself does (or will in the future). Language support and negotiation, and possibly required loading of multiple config objects (per language) may happen at the layer/level of config() in core/includes/config.inc.
  5. This might possibly mean a LanguageEnabledConfig extends DrupalConfig, which is instantiated instead of DrupalConfig in config() to provide the language negotiation and additional class methods for enforcing a certain language to use.
  6. Before thinking about the technical implementation, we need to agree on a storage design first, make sure that it makes sense, and does not have unintended/unexpected consequences. See #19. The code/implementation can do whatever we want.
  7. Whatever we're going to do: Loading » saving » loading a configuration object must be idempotent, regardless of current language. Configuration values must not be altered or overloaded after loading. (Again, see #19)
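
The idempotency requirement in point 7 can be illustrated with a small Python sketch. It assumes language-specific values are applied only in a separate read-only view and never written back into the loaded object; the store layout, the override table and all function names are hypothetical.

```python
import copy
from typing import Any, Dict, Tuple

# Hypothetical raw storage and per-language overrides (assumed data).
STORE: Dict[str, Dict[str, Any]] = {"system.site": {"name": "My site", "cache": 3600}}
OVERRIDES: Dict[Tuple[str, str], Dict[str, Any]] = {
    ("system.site", "de"): {"name": "Meine Seite"},
}


def load(name: str) -> Dict[str, Any]:
    """Load raw configuration; the current language plays no role here."""
    return copy.deepcopy(STORE[name])


def save(name: str, data: Dict[str, Any]) -> None:
    """Persist configuration exactly as given."""
    STORE[name] = copy.deepcopy(data)


def view(name: str, langcode: str) -> Dict[str, Any]:
    """Read-only view with language overrides applied on a copy."""
    data = load(name)
    data.update(OVERRIDES.get((name, langcode), {}))
    return data


def round_trip_is_stable(name: str) -> bool:
    """Loading, saving and loading again must yield the same data."""
    first = load(name)
    save(name, first)
    return load(name) == first
```

If `view()` results were ever passed to `save()`, the German name would overwrite the stored value, which is exactly the kind of alteration the requirement rules out.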

Thanks.

Gábor Hojtsy’s picture

Multilingual support requires a DI, fully working language subsystem, and generally some higher bootstrap level than the bare Config component itself does (or will in the future). Language support and negotiation, and possibly required loading of multiple config objects (per language) may happen at the layer/level of config() in core/includes/config.inc.
This might possibly mean a LanguageEnabledConfig extends DrupalConfig, which is instantiated instead of DrupalConfig in config() to provide the language negotiation and additional class methods for enforcing a certain language to use.

The question is who would not use the language supporting layer? I can see some lower level settings like the image effects or error level config that is already implemented would not need this level of support, which only means if they initialize a different config object than most others (where translatable information is stored), it would be a code baked decision. Eg. there would be no way to extend image effects with a human label without modifying the original code. That might be totally acceptable, just recording the observation.

Do you think that the separation to two levels would help with using the config system for "lower level bootstrap stuff" vs. "higher level bootstrap stuff" in different cases?

clemens.tolboom’s picture

In #1 @yched wanted to use placeholders. I want to get rid of looong English message source IDs.

For me .po files are CIs too. I understand entity.node.bundle.article.label is more dynamic than a pot extract but conceptually they are the same: Resource IDs

The discussion so far is sometimes troubled by the storage type (xml, one file or multiple files). But that did not lead to an abstract scheme, let alone some examples in the issue summary ;)

Following along the GNOME scheme, an example path comes into existence because a hook declares it translatable, giving the module developer the ability to instantly translate it into his language of choice. (He would be stupid not to provide English.)

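// $config_info is an illustrative name for whatever such a hook would return.
$config_info = array(
  'entity.node.bundle.article.label' => array(
    '#translatable' => TRUE,
    '#translations' => array('nl_NL' => 'Artikel'),
  ),
);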

Someone managed to translate into en_US resulting in our abstract storage

entity.node.bundle.article.label[en_US] = Article
entity.node.bundle.article.label[nl_NL] = Artikel

Using any of the Symfony dumpers this can be any file format we like.

(did this help? I kinda promised to read this issue @ #drupal-i18n so shut up for now until next office hours :p)

Gábor Hojtsy’s picture

@clemens.tolboom: I don't understand your code examples and cannot put them in perspective of the above discussion. I'm totally puzzled by providing translations in PHP formats like that, for example :/

andypost’s picture

Language is part of the configuration key names.

Currently Views has its own implementation for storing translatable configuration; it uses a kind of

key_name:{default:value; translatable:FALSE;
// we could store translations in keys by their langcode.
en:value_en;
ru:value_ru}

The subkey default will be used as a fallback if no language key is provided.

clemens.tolboom’s picture

@sun tnx for #54 :)

I'm not sure what https://en.wiktionary.org/wiki/idempotent means regarding

Whatever we're going to do: Loading » saving » loading a configuration object must be idempotent, regardless of current language. Configuration values must not be altered or overloaded after loading. (Again, see #19)

Is the code example from #36 impossible then?

Gábor Hojtsy’s picture

Given that we have no hope to resolve this question as part of a drupal.org issue, we are holding a conference call to discuss the problem in the coming 10 days(!). If anybody wants to be on that call, let me know ASAP. Otherwise, unless you want this postponed to Drupal 9, we ask you to accept the decisions and let us move forward with implementation.

andypost’s picture

I'm going to join this call, this is a very painful bit of D8 for me.

Gábor Hojtsy’s picture

Title: Implement internationalization of configuration [no patch] » [META] Discuss internationalization of configuration

Sun opened an implementation issue at #1613350: Multlingual/translatable configuration [OPTION A] and I agree it does not seem like this should become an implementation issue on its own. So aligning title slightly to reflect that (meta-ifying, and changing wording to say discuss).

Gábor Hojtsy’s picture

We have an option C from @chx which implements language fallback outside of CMI, so in total:

Discuss away in the relevant issues.

Gábor Hojtsy’s picture

Issue summary: View changes

fix typo

clemens.tolboom’s picture

Inserted a status update referring to the work issues.

webflo’s picture

I think we need a richer implementation (context/realm/whatever) for more sophisticated modules like Space or OG, Domain ("Multi-headed Drupal") but not for a 'simple' multilingual site.

If everything is tightly coupled we need to re-implement every widget, formatter or handler for a multilingual site. This is a painful process. We have implemented multilingual capabilities for many core modules in i18n. This is a lot of duplicate code to make these modules language aware. Since core is getting bigger or at least more complex, i18n grows a lot to support all the different modules.

Multilingual configuration is not just about variables. If we store anything that's not an entity in CMI, it's about field labels, bundle labels, widgets and additional field properties too. I think it's not possible to handle multilingual configuration completely transparently for modules. The module developer has to decide: I need this piece of data in 'UI language' or 'Content language' ... The use cases for multilingual configuration are different.

I prefer a simple API for reading the multilingual configuration.


// Retrieve feedback category configuration.
$config = config('contact.category.feedback'); // default is interface language?

// Retrieve feedback category configuration for English
$config = config('contact.category.feedback', language_load('en'));

// Retrieve feedback category configuration for content language
$config = config('contact.category.feedback', LANGUAGE_TYPE_CONTENT);

// Retrieve feedback category configuration keyed by langcode
$config = config('contact.category.feedback', TRUE);

Gábor Hojtsy’s picture


Here is a visual summary of options A, B, C (and AB, which is my name for plach's diversion of A towards B). First an explanation of the call chain differences that I posted on option A:

I think this comes down to the following: code that handles configuration needs to be aware of language eventually. We either need to make it aware right when retrieving it or later when "rendering" it. To make reusable functions aware of different language scopes that they can be invoked with, you can do the following things:

  • Retrieve the right language version directly
    • Add language argument through the whole invocation chain, always pass it on (option A originally does this with some fallback).
    • Pass language as implicit argument, such as with dependency injection, the context in options B and AB.
  • Instead of getting the actual data, put in an indirection
    • Retrieve all the data at the start and provide helper functions to pick the right value later. This is what entity field rendering does currently, and is also what option C does.
    • Retrieve an accessor (a class for example, or a representation of the config key) and use the accessor later to retrieve the right language version. This is not in any of the current options.

There is really no escaping specifying which language you want. You either do it directly when retrieving and somehow pass the information there, or you do it later and somehow pass on a meta-level instead of just the value.
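
A compact Python sketch of three of the strategies listed above: explicit argument, implicit context, and an accessor resolved later. All names and the label data are hypothetical, and the context variable is only a stand-in for whatever dependency injection mechanism would actually carry the language.

```python
from contextvars import ContextVar
from typing import Callable, Dict

# Assumed per-language values for one configuration key.
LABELS: Dict[str, str] = {"en": "Feedback", "de": "Rückmeldung"}


# 1. Explicit argument: every caller in the chain passes the language on.
def label_explicit(langcode: str) -> str:
    return LABELS[langcode]


# 2. Implicit argument: the language travels in a context, not in signatures.
current_language = ContextVar("current_language", default="en")


def label_from_context() -> str:
    return LABELS[current_language.get()]


# 3. Indirection: return an accessor now, pick the language when rendering.
def label_accessor() -> Callable[[str], str]:
    return lambda langcode: LABELS[langcode]


token = current_language.set("de")
german_label = label_from_context()
current_language.reset(token)
```

In all three cases the language is still specified somewhere; the variants differ only in where and when that happens.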

For storage, A, AB and C store in the same config file, B stores in separate "override" config files.
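
For the separate "override" files, one possible reading is that an override holds only the differing values and is overlaid on the base configuration at read time. The Python sketch below assumes a recursive overlay; the function name and the sample data are made up, and the real merge rules may well differ.

```python
from typing import Any, Dict


def merge_override(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay override on base without mutating either input."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_override(merged[key], value)
        else:
            merged[key] = value
    return merged


# Base configuration and a German override containing only the differences.
base = {"label": "Every hour", "time": 3600, "effects": {"scale": {"width": 100, "title": "Scale"}}}
override_de = {"label": "Jede Stunde", "effects": {"scale": {"title": "Skalieren"}}}
merged_de = merge_override(base, override_de)
```

Values missing from the override, such as `time` and `width`, come from the base, so an override file only needs to carry the translated pieces.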

Does this help clear up the picture, or is it not helpful? Are any details in the overview unclear?

(Update: retracted outdated image from post to make page load faster; new image posted below).

Gábor Hojtsy’s picture


Discussed the recent changes on A with Jose, it evolved a bit more towards AB to have more global contexts, but it is also now context-based so you get your config *from* the context. Have fun reviewing! :)

Schnitzel’s picture

Gabor asked me for some feedback on ABC:

Overall it should be as easy as possible for DX. The translation of Drupal today works so well, because t() is very easy and stupid to use. So without thinking about the problems specific versions could have, I would prefer Option A, where the Developer only needs to Get and Save and under the hood everything is done for him.

But we found out that with Option A this will be pretty hard and could lead to overwritten values with ->get()->save()->get() chains. It looks to me like we will not be able to provide this functionality with Option A without some hardcore checking of loaded values, etc.

So I lean more towards Option C, which I don't like so much because it requires some ML knowledge from developers. But because Entity Field needs the same knowledge, and if we have a t() function which the developer has to use all the time, I think we make it as easy as possible for the developers.

Option B tries to fix other problems at the same time, and it is a bit hard for me to estimate whether it makes sense to fix them. Overall it seems to me a reasonable question how we handle other overwrites: we use the Domain module in our projects quite often, so it would be interesting how the Domain module can overwrite configuration for specific domains. The Multivariate module (which does A/B testing) will also be interested in overwriting some of the configuration.
So the question would be how to handle this in CMI. If this is not yet solved but should be: is the proposal of Option B a good idea? If yes, why not use the same functionality for ML and other overwrites?
If configuration overwrites are handled in other ways: let's keep Drupal core as stringent as possible and use the same workflow/functionality over and over.

Gábor Hojtsy’s picture

As per our discussion in Barcelona with heyrocker (CMI), webchick (core), merlinofchaos (views), reyero (i18n), webflo (i18nviews), Gábor Hojtsy (D8MI), xjm (views) and others, we are going with a base implementation of option B for now. In short we figured out we have a need for almost anything to be translatable/multilingual, we need some context information passed around and we need to make the system extensible due to the definite lack of time to solve all problems until code freeze, and all of those criteria lead us to work with option B. Let's focus our efforts on getting it done best!

Should we mark #1616594: META: Implement multilingual CMI critical to focus implementation work and close this issue down as duplicate?

webchick’s picture

That sounds like a reasonable approach to me, though we should probably leave this open (as a normal) for a few days to give people who weren't at the sprint time to comment.

sun’s picture

Status: Needs review » Fixed

Yup, let's do so. Marking as fixed, so it will automatically disappear after two weeks.

Leaving priority at critical, since this actually is critical. If anyone strongly objects to the agreed-on direction, then we need to know early.

Gábor Hojtsy’s picture

Issue tags: -sprint

Moving off sprint.

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Inserted a status update referring to work issues.