Updated: Comment #26

Problem/Motivation

In the current architecture we have an inconsistency between the storage of the entity language and the language of configurable field original values: in fact the latter are supposed (see below) to be always stored under the 'und' language code, no matter which language the entity has. The reason for it is not having to mess with field language codes when changing entity language. However this introduces another inconsistency that is potentially problematic: base fields would be stored under the original language, as they all share the same record in the data table, and this may introduce additional complexity when joining field tables or needing to filter by field language.

Additionally configurable field storage has not been ported to this architecture yet, hence we also have an inconsistency between entity data structures and the underlying storage.

Proposed resolution

During the Multilingual fields hard problems discussion we agreed that to solve this issue we should aim for the following design principles:

  • Base fields and configurable fields should be handled the same way.
  • Language codes in the storage should match those in runtime entities / field items.

The conclusion was that we should only use und if the original entity is und. If the original entity is not und then original field values (both base and configurable) should inherit from the entity language code: a de entity would get de values even for untranslatable fields. The entity data structure should be able to treat untranslatable fields as shared no matter which language code they have.

The solution above should let us cleanly address the following use-cases:

  • Change base/original entity language (special case: und => xx language)
  • Switching translatability for fields without requiring a data migration
  • Querying on configurable fields
  • Disabling translatability of field/entity type

The discussion was attended by: attrib, berdir, das_peter, fago, Gábor Hojtsy, plach, swentel, tstoeckler, webflo, yched.

Remaining tasks

  • Agree on a solution
  • Write a patch
  • Fix failing tests
  • Add test coverage
  • Reviews

User interface changes

None foreseen

API changes

Language codes for configurable untranslatable fields will change from und to the language of the entity they are attached to.

Original report by @yched

On a default D8 install (english is the only language, no translation module is enabled, no fields are translatable)

For nodes created programmatically (entity_create('node')):
- node language is 'und',
- values for base fields (title...) are stored in {node_field_data} with langcode 'und'
- values for configurable fields are considered to have langcode 'und' (langcode stored in field storage and used in node forms $form['body']['und'] / $form_state['values']['body']['und'])

For nodes created through node/add UI
- node language is 'en'
because node_add() assigns language_default()->id, which is 'en'
- base fields (title...) are stored in {node_field_data} with langcode 'en'
- values for configurable fields are considered to have langcode 'und' in SQL and in forms
because field.multilingual.inc logic considers the fields are not translatable --> langcode is 'und'

I don't know what we're aiming at in terms of "what should be the field language with respect to the entity language", but this looks severely wrong.
#2075095: Widget / Formatter methods signatures needlessly complex is stumbling on that for forms.

Files: 
CommentFileSizeAuthor
#108 et-field_langcode_wtf-2076445-108.patch74.24 KBplach
PASSED: [[SimpleTest]]: [MySQL] 59,032 pass(es).
[ View ]

Comments

Mmh, lots of stuff going wrong here :(

No matter whether a site is monolingual or multilingual, fields holding the original values should always get 'und'. The fact that they are translatable or not should be irreleveant here, just regular field translation should get an actual language. This is how the whole stuff is supposed to work.

I think node getting the default language is by design, since we prefer having language information stored, in case other languages are added subsequently. But Gabor should confirm this as he's the mind behind the original plan.

Component:language.module» field system
Issue tags:+translatable fields, +Field API, +language-content, +Entity Field API

Totally wrong component

See also #2075095-7: Widget / Formatter methods signatures needlessly complex - which possibly boils down to: we need Field/FieldItem objects to know their own language, whether it is the language of their parent entity or some other, more complex fallback logic.

related: #2061331: Add getEntity() helper in FieldInterface / FieldItemInterface

we need Field/FieldItem objects to know their own language, whether it is the language of their parent entity or some other, more complex fallback logic.

From the top of my head, I'd say we don't want language fallback in such contexts, the parent entity (translation) language should be just fine. Fallback should be applied explicitly just where it's needed. However some concrete use case would help me :)

So what is a language should be stored in comment-field (commenting state) if I'd like to allow commenting only for some translations of the entity? By default I mark the field none-translatable but need ability to make it translatable

@andypost: I don't think that's a question you need to wonder, a field type doesn't get to decide in which language its values get stored, this is decided by the field translation logic (is the field translatable, in which language is the entity being edited...)
For your use case, I'd say the "commenting state" field should be made translatable and there you go.

@plach:

No matter whether a site is monolingual or multilingual, fields holding the original values should always get 'und'

Even if the entity itself has a language ? Then the current behavior of "base fields" being stored with langcode 'en' is wrong ?
At the very least, the fact that base fields and configurable fields get stored with different langcodes looks very wrong.

I think node getting the default language is by design, since we prefer having language information stored, in case other languages are added subsequently

Maybe, but then it seems:
- this should also be the case for nodes created programmatically (including devel generate) if no explicit langcode is specified. No reason to have a different behavior for UI-created and programmatically created nodes
- this should also be the case for other entities ? Right now this "get the default language by default" is strictly a matter of the node_add() page callback.

"in case other languages are added subsequently" : ok, doesn't this apply to field translations as well ?
An entity is created in a monolingual site, no translation is enabled, its fields are stored with 'und'.
Then you add German language and translate the entity. The original english values stay under 'und', the german values are under 'de'.
- Why the difference between the two languages ?
- And why the entity languages are 'en' / 'de' while the field languages are 'und' / 'de' ?

So, yup, several things look weird here :-)

I think the answer to the OP is #1966436: Default *content* entity languages are not set for entities created with the API. There is configuration for the default entity language for each content entity. The API created entities should obey this just as much. If the language module is not enabled (no configuration yet), then language_default() should be assumed for all entity creation on the UI and API.

@Gabor:
OK, subscribing to that issue, thanks!

Any opinion about storing base / configurable field values under 'und' or the entity language ? (and base fields currently being stored in a different language than configurable fields ?)

In D7 at least, the entity was storing the main langcode and then fields were und unless translatable. When translatable, they got langcodes that were not und. I *think* the D8 plan was that all fields get the same langcode that the entity has by default, so when you switch translatability on a field, you don't need to update all the instances from und to the base langcode. I'm not sure if that is still the plan or not. I think it is not yet how it is implemented, but that is how far my understanding reaches.

There is a fancy docs page with several fancy figures explaining at https://drupal.org/node/1500308

I *think* the D8 plan was that all fields get the same langcode that the entity has by default, so when you switch translatability on a field, you don't need to update all the instances from und to the base langcode

If that's the plan, I'm +1 on it :-).
- Not having to mass-update language in stored field values on some config change somewhere (entity translation enabled, language added, some individual field switched to 'translatable') is definitely a good thing.
- If entities have an actual language, why would field values be 'und' ?

But:
- Contradicts @plach's #1, so it looks like there is no clear agreement yet :-)
- D7 upgrade needs some thinking then. D7 field storage stores values at langcode 'und', we would need to update that to "some (which ?) language code".

@yched: I think this is one of many aspects where the field API is inbetween the ideal and a temporary and the prior state. The reason for the 'und' on fields I guess was if you change the language of the entity, you don't need to update all the untranslated fields too. If we use a real language code, then you only need to change one entity when one entity changes not all entities when a config changes :)

In the update, all entity fields would get to copy-inherit their parent entity langcode.

If we use a real language code, then you only need to change one entity when one entity changes not all entities when a config changes :)

+1

The UI currently lets you change the language of an entity on the fly ? Hm, not sure what would be the expected outcome on field values. Field values on the previous language switch over to the new language ?

Yeah you can change the entity language on the UI when editing, yeah I believe so. The default language of the entity changes, so I guess if there were no translated values, all fields would need to change language, if there are translated languages, maybe not :D That may be a grey area.

The default language of the entity changes, so I guess if there were no translated values, all fields would need to change language, if there are translated languages, maybe not

That doesn't seem right to me. Why should the meaning of "change the entity's default language" vary based on whether translations already exist? I think the meaning should either be "change the entity's default language only, but all existing field values on the entity are unaffected (i.e., retain the language they're already marked as having)", or else "change the entity's default language and update the langcode of all field values from the prior langcode to the new one (overriding prior translated values)", or else "present the user with a choice of those 2".

Sorry for being vacant but I've been busy with reviewing a very big patch these last days :)

I will provide a complete answer to all the raised questions ASAP.

Priority:Normal» Major

Many of the concerns raised here already got an answer, however here is a brief recap:

  • Nodes being created from the UI with 'en' language on monolingual sites: by design.
  • Node language inconsistency between API and UI creation: to be fixed in #1966436: Default *content* entity languages are not set for entities created with the API.
  • Node language change logic: changing language means that the values originally entered should be interpreted as being in a different language from the one initially set. This may happen if the wrong language has been selected when creating the entity. If for any reason the primary entity language needs to change, the originally entered values need to be (manually) migrated to a new translation or just overwritten. The UI does not allow to choose a language for which a translation is already existing.
  • Field language inconsistency between entity data structures and storage: as soon as we get rid of the entity BC layer, we can update the storage logic to match the data structure one. This is probably the right place to make that happen as well as the related upgrade stuff.
  • All original values being stored under the same language code, no matter if they are translatable or not: this is to avoid having to mess with field language codes when switching field translatability.

What is left is the inconsistency between entity language and original field language: the reason for it is not having to mess with field language codes when changing entity language, however this is the trickiest part of the new architecture. In D7 changing entity language is not particularly hard at storage level: old values are deleted new ones are inserted, language codes just need to be updated before storing fields. However doing that is not so straightforward: apparently updating the data structure is easy, but this is an example of the kind of unpleasant consequences it may have: #1141912: Changing node language by moving down language list deletes files in file field. That issue is caused by the file system being too efficient in deleting files, but the root cause is that messing with field languages is tricky. Additionally, given how Field form processing happens, submitted field values needs to be updated before thay are passed to submit handlers because otherwise they would have the wrong language.

Having original field values under the 'und' language would suddenly solve all these issues, but would introduce an inconsistency that is potentially problematic. AAMOF, as noted in the OP, base fields are stored under the original language and this may introduce additional complexity when joining field tables or needing to filter by field language.

Another issue with this approach is it conflicts with being able to support language of parts. Basically we are using field language as a reference to the entity translation field values are attached to, rather than denoting their actual language. This prevents us from supporting lots of use cases (see the first bullet). When initially proposing this approach to me @fago suggested to reintroduce support for those through a language property attached to fields (more or less #2078625: Field/FieldItem value objects should carry their language), although that would require two distinct language properties at storage level, one referring to the parent entity translation and another carrying the actual field language, which would behave just as an additional column.

After mulling on this for a while, I'd be tempted to propose to keep the (apparent) inconsistency between data structure and storage: we could leave original values under the 'und' key in the entity object, so we don't have to mess with it while processing the entity, but store everything under the actual entity language (e.g. 'en'), even untranslatable field values. This might allow us to get the best of both worlds, I think, as querying should be easier, while untranslatable fields would be still shared among entity objects and appear as language neutral. Madness?

Not sure what to think about the language of parts stuff, honestly. @Gabor, any idea?

Feedback more than welcome :)

I'm not sure making the data storage use separate language coding from the runtime entity is a good idea honestly. That sounds like a possible recipe for disaster :/

I noticed this as well while working on in-place editing, but in a completely different way.

  • In hook_preprocess_field(), $variables['element']['#language'] == 'und'.
  • In hook_preprocess_field(), $variables['entity']->get('langcode')->value == 'en'.

-> WTF.

Many thanks @plach for the detailed reply.

Field language inconsistency between entity data structures and storage: as soon as we get rid of the entity BC layer, we can update the storage logic to match the data structure one

Yes, we definitely should. Just pointing that we need #2083811: Remove langcode nesting for fields in entity $form structures for this, to let us change / adjust language logic without breaking a ton of tests :-).

the inconsistency between entity language and original field language: the reason for it is not having to mess with field language codes when changing entity language

Yes, I'm not sure I see why that would be a problem. Mass changes are hard, but changing the stored language of the values of *1* entity, while we are updating this entity seems a no brainer ?
I also don't get why the "original" language would get special treatment here, as compared to translations. All of this is to be able to change the original language of the entity in case you made a mistake, but changing language is is not supported for translations ? Why ?

However doing that is not so straightforward: apparently updating the data structure is easy, but this is an example of the kind of unpleasant consequences it may have: #1141912: Changing node language by moving down language list deletes files in file field. That issue is caused by the file system being too efficient in deleting files, but the root cause is that messing with field languages is tricky

That sounds very much like something we've noticed in #2015697: Convert field type to typed data plugin for file and image modules, caused in current HEAD by LegacyConfigField wrongly assuming field langcode = entity langcode, which should be fixed by either #2015697: Convert field type to typed data plugin for file and image modules or #2061331: Add getEntity() helper in FieldInterface / FieldItemInterface, whichever gets in first. It doesn't seem to me that it's about which specific langcode we assign to which value ?

At any rate, I really wish we can a have a conversation about this in Prague. I join Gabor & Wim above, special casing the values of the original langcode by storing them under an 'und' flag that then gets interpreted to some actual langcode at runtime really seems like a major WTF to me too :-/.

changing language is is not supported for translations ? Why ?

Well, if you have an Italian only node, changing it to German is easy. If you have an Italian node with fields and also translated to German, than changing the Italian to German would do.... what? You would need a whole set of options presented like the user cancellation process as to what should happen. Should the prior German stuff just be deleted, or should it become Italian or should this stop the action from happening? Right now AFAIK you can change an entity to any language that it does not have translations in yet, so such questions don't need to be asked / resolved.

Right now AFAIK you can change an entity to any language that it does not have translations in yet

Ok, makes sense, I'd be fine if we keep it that way.
But from the above I understand that we allow language change for the "original language" only, and but for other translations ? I'm not sure I see why supporting this makes sense in one case and not in the others.

The reasons for that may be historic.

The inconsistency in storage is fine now except case of changing translatability of the field:
1) english node with translatable field_tags (und)
2) translation to german with a different set of tags

once we change translatability of the field to untranslatable:
- we should remove all other field values? so only 'und' will be shown in both translation
- we should update all "none-und" values with 'und' values? so each translation still hold down the same data

latter we enable translation back:
- no changes, so absence of translated values will fallback to 'und' values?
- we should copy 'und' values to other translations? so each translation with on data

Suppose each of implementation have own pros&cons glad to get other's visions!

Currently we have content translation module's logic to change field language totally broken #1946462-54: Convert content_translation_translatable_form() to the new form interface @plach please check my patch!
Also deprecated field_get_items() is trying to access field values by the wrong way and change notices needs a sprint to be updated to current state.

Title:Entity / field language WTF ??Make sure language codes for original field values always match entity language regardless of field translatability

Added an issue summary

Issue summary:View changes

Added issue summary

Issue summary:View changes

Updated issue summary.

Assigned:Unassigned» plach
Issue tags:+D8MI, +sprint

Working on this right now

Issue summary:View changes

Updated issue summary.

Issue summary:View changes

Updated issue summary.

Issue summary:View changes

Updated issue summary.

Status:Active» Needs review
StatusFileSize
new1.36 KB
FAILED: [[SimpleTest]]: [MySQL] 59,176 pass(es), 10 fail(s), and 0 exception(s).
[ View ]

Let's just try this tiny change to see what impact it has.

Status:Needs review» Needs work

The last submitted patch, et-field_langcode_wtf-2076445-28.patch, failed testing.

Status:Needs work» Needs review
StatusFileSize
new6.8 KB
FAILED: [[SimpleTest]]: [MySQL] 58,488 pass(es), 1 fail(s), and 0 exception(s).
[ View ]
new5.44 KB

Fix broken tests

Status:Needs review» Needs work
Issue tags:-translatable fields, -D8MI, -Field API, -sprint, -language-content, -Entity Field API

The last submitted patch, drupal8.field-system.2076445-30.patch, failed testing.

Status:Needs work» Needs review

Status:Needs review» Needs work
Issue tags:+translatable fields, +D8MI, +Field API, +sprint, +language-content, +Entity Field API

The last submitted patch, drupal8.field-system.2076445-30.patch, failed testing.

Issue summary:View changes

Updated issue summary.

Issue tags:+Prague Hard Problems

tagging

Category:bug» task

What is this patch dependent on to get started more seriously? :)

Nothing, I just need some free time :)

Status:Needs work» Needs review
StatusFileSize
new15.09 KB
PASSED: [[SimpleTest]]: [MySQL] 59,053 pass(es).
[ View ]
new10.04 KB

This should be green, hopefully.

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -411,7 +412,14 @@ protected function getTranslatedField($property_name, $langcode) {
    +        if ($default) {
    +          // If we are initializing the default language cache the variable is
    +          // not populated, thus have no valid value to set.
    +          $field->setLangcode(isset($this->language) ? $this->language->id : NULL);
    +        }
    +        else {
    +          $field->setLangcode($langcode);
    +        }

    Minor, understandability : what this conditional does is call setLangcode() for some langcode. Put that langcode in a $field_langcode variable set conditionally, then call setLanguage($field_langcode) unconditionally ?

  2. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -411,7 +412,14 @@ protected function getTranslatedField($property_name, $langcode) {
    +          // If we are initializing the default language cache the variable is
    +          // not populated, thus have no valid value to set.

    Minor : the sentence is difficult to parse. Add a comma after "cache" ?
    [edit: also, grammar looks weird - "thus *has* no valid value" ?
    + not sure what "the variable" is]

  3. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -565,6 +573,7 @@ public function onChange($property_name) {
           // Avoid using unset as this unnecessarily triggers magic methods later
           // on.
           $this->language = NULL;
    +      $this->initializeDefaultLanguage();

    The comment looks a bit weird now ?

  4. +++ b/core/modules/system/lib/Drupal/system/Tests/Entity/EntityTranslationTest.php
    @@ -217,20 +217,21 @@ protected function _testMultilingualProperties($entity_type) {
         // As fields, translatable properties should ignore the given langcode and
         // use neutral language if the entity is not translatable.
         $field = $entity->getTranslation($langcode)->get('name');
         $this->assertEqual($name, $field->value, format_string('%entity_type: The entity name defaults to neutral language.', array('%entity_type' => $entity_type)));
    -    $this->assertEqual(Language::LANGCODE_DEFAULT, $field->getLangcode(), format_string('%entity_type: The field object has the expect langcode.', array('%entity_type' => $entity_type)));
    +    $this->assertEqual($default_langcode, $field->getLangcode(), format_string('%entity_type: The field object has the expect langcode.', array('%entity_type' => $entity_type)));

    Is the comment still relevant ?

  5. +++ b/core/modules/system/lib/Drupal/system/Tests/Entity/EntityTranslationTest.php
    @@ -284,7 +286,7 @@ protected function _testMultilingualProperties($entity_type) {
           // Fields for the default entity langcode are seen as language neutral.
    -      $field_langcode = ($langcode == $entity->language()->id) ? Language::LANGCODE_NOT_SPECIFIED : $langcode;
    +      $field_langcode = ($langcode == $entity->language()->id) ? $default_langcode : $langcode;

    Is the comment still relevant ?

StatusFileSize
new21.93 KB
PASSED: [[SimpleTest]]: [MySQL] 59,036 pass(es).
[ View ]

Thanks, fixed a merge issue and #39 except #3, it looks still ok to me: what problem do you see?

I messed up the interdiff so not posting it, sorry.

+++ b/core/lib/Drupal/Core/Language/Language.php
+++ b/core/lib/Drupal/Core/Language/Language.php
@@ -63,7 +63,7 @@ class Language {
@@ -63,7 +63,7 @@ class Language {
    * @todo: Change value to differ from Language::LANGCODE_NOT_SPECIFIED once
    * field API leverages the property API.
    */
-  const LANGCODE_DEFAULT = 'und';
+  const LANGCODE_DEFAULT = 'xx-default';

I guess the todo can be removed then?

StatusFileSize
new22.03 KB
PASSED: [[SimpleTest]]: [MySQL] 59,708 pass(es).
[ View ]
new535 bytes

Here's a reroll fixing #41. I am wondering whether it would make sense to commit this patch and the start again from a clean situation. It would probably make the following reviews easier.

StatusFileSize
new28.26 KB
FAILED: [[SimpleTest]]: [MySQL] 58,316 pass(es), 579 fail(s), and 60 exception(s).
[ View ]
new9.75 KB

This should be more or less it. Let's see how many failures we have.

Status:Needs review» Needs work

The last submitted patch, et-field_langcode_wtf-2076445-43.patch, failed testing.

Status:Needs work» Needs review
StatusFileSize
new45.58 KB
FAILED: [[SimpleTest]]: [MySQL] 58,389 pass(es), 595 fail(s), and 61 exception(s).
[ View ]
new17.33 KB

Removed the field language migration code. Tests still to be fixed.

Status:Needs review» Needs work

The last submitted patch, et-field_langcode_wtf-2076445-45.patch, failed testing.

Status:Needs work» Needs review
StatusFileSize
new46.39 KB
FAILED: [[SimpleTest]]: [MySQL] 59,661 pass(es), 1 fail(s), and 0 exception(s).
[ View ]
new1.75 KB

Some test fixes.

Status:Needs review» Needs work
Issue tags:-translatable fields, -D8MI, -Field API, -sprint, -language-content, -Entity Field API, -Prague Hard Problems

The last submitted patch, et-field_langcode_wtf-2076445-47.patch, failed testing.

Issue summary:View changes

Updated issue summary.

Issue summary:View changes
Issue tags:+translatable fields, +D8MI, +Field API, +sprint, +language-content, +Entity Field API, +Prague Hard Problems

Restored tags

Issue summary:View changes
Related issues:
StatusFileSize
new1.07 KB
new46.57 KB
PASSED: [[SimpleTest]]: [MySQL] 59,850 pass(es).
[ View ]

This adds a small BC layer to support D7-style untranslatable fields when loading field values. AAMOF I don't think we can write a proper update function to migrate field language codes as we have no way to know the enity language without using the Entity API. OTOH writing a migration doing that should be trivial and will allow us to remove the BC layer.

Working on test coverage now.

Status:Needs work» Needs review

Issue summary:View changes
StatusFileSize
new7.15 KB
new52.29 KB
PASSED: [[SimpleTest]]: [MySQL] 59,394 pass(es).
[ View ]

And this provides additional test coverage. Now we are ready for review.

+++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
@@ -553,11 +561,32 @@ protected function getDefaultLanguage() {
+  protected function updateFieldLangcodes($langcode) {
+    if (!empty($this->fields)) {
+      foreach ($this->fields as $name => $items) {
+        if (!empty($items[Language::LANGCODE_DEFAULT])) {
+          $items[Language::LANGCODE_DEFAULT]->setLangcode($langcode);
+        }
+      }
+    }
+  }

This could use some test-coverage too.

StatusFileSize
new4.22 KB
new54.87 KB
PASSED: [[SimpleTest]]: [MySQL] 59,422 pass(es).
[ View ]

Fixed #54. Reviews welcome.

I just skimmed through the patch but the changes generally look amazing! The diffstats speaks for itself: "10 files changed, 296 insertions(+), 487 deletions(-)".

This is not just a tremendous DX improvement (no more fiddling with cryptic default language codes), it is also a massive UX improvement (no batch when changing translatability, no confirm form step since the action is reversible). So, so lovely!

The diffstats speaks for itself: "10 files changed, 296 insertions(+), 487 deletions(-)".

And we have lots of additional test coverage in those insertions :)

StatusFileSize
new919 bytes
new55.56 KB
FAILED: [[SimpleTest]]: [MySQL] 55,159 pass(es), 790 fail(s), and 192 exception(s).
[ View ]

Rerolled and removed an obsolete comment.

Reviewing.

Status:Needs review» Needs work

The last submitted patch, 58: et-field_langcode_wtf-2076445-58.patch, failed testing.

Status:Needs work» Needs review
StatusFileSize
new1.15 KB
new56.71 KB
FAILED: [[SimpleTest]]: [MySQL] 59,529 pass(es), 4 fail(s), and 15 exception(s).
[ View ]

This should fix tests.

+++ b/core/modules/field/field.multilingual.inc
@@ -92,7 +92,8 @@ function field_available_languages($entity_type, FieldInterface $field) {
     // If the field has language support enabled we retrieve an (alterable) list
-    // of enabled languages, otherwise we return just Language::LANGCODE_NOT_SPECIFIED.
+    // of enabled languages, otherwise we return just
+    // Language::LANGCODE_DEFAULT.

Silly nitpick on last interdiff: we could remove the "just" and avoid the linebreak :-p

As I said, I'm reviewing this, but not sure I'll have the time to finish and shape my feedback before friday, sorry :-/

Status:Needs review» Needs work

The last submitted patch, 61: et-field_langcode_wtf-2076445-61.patch, failed testing.

Status:Needs work» Needs review
StatusFileSize
new3.99 KB
new59.66 KB
FAILED: [[SimpleTest]]: [MySQL] 59,521 pass(es), 1 fail(s), and 0 exception(s).
[ View ]

This should fix the last failures and address @yched's pre-review :)

+++ b/core/modules/field/lib/Drupal/field/Tests/TranslationTest.php
@@ -142,7 +142,7 @@ function testFieldAvailableLanguages() {
+    $this->assertTrue(count($available_langcodes) == 1 && $available_langcodes[0] === Language::LANGCODE_DEFAULT, 'For untranslatable fields only Language::LANGCODE_NOT_SPECIFIED is available.');

Minor: the assertion message needs to be fixed too.

Status:Needs review» Needs work

The last submitted patch, 64: et-field_langcode_wtf-2076445-64.patch, failed testing.

Status:Needs review» Needs work

The last submitted patch, 64: et-field_langcode_wtf-2076445-64.patch, failed testing.

Status:Needs work» Needs review
StatusFileSize
new847 bytes
new60.56 KB
PASSED: [[SimpleTest]]: [MySQL] 60,426 pass(es).
[ View ]

Let's try again

Yay, great to see language storing unified. And triple yay on the code that gets removed in content_translation !

Looking at the translation logic in ContentEntityBase, I have the feeling it still looks relatively convoluted and has room for further clean up / streamlining, but I have a hard time nailing specific places. I'll try below. Field / entity language has always been a highly puzzling area, I'd really wish we could come to something much more understandable and maintainable in D8 (well, we should be much better already)

  1. I'm not sure I see the value in keeping the internal runtime storage keyed by LANGCODE_DEFAULT (in $this->translations, $this->fields[$field_name], $this->values[$field_name]), rather than the actual langcode. It complicates the code and logic in a lot of places in ContentEntityBase ?
  2. The code would be much easier to follow if $this->language was named defaultLanguage. That's what it is, and is more inline with how the rest of the code (methods, var names, comments) uses it (+ it matches the name of initializeDefaultLanguage()).
    Would make code like this (taken from onChange()) much clearer:
          $this->language = NULL;
          $this->initializeDefaultLanguage(); // Not clear that it resets $this->language
          $this->updateFieldLangcodes($this->language->id); // But $this->language was set to NULL two lines above ??

    (Side note, I don't get why we store a language object in here rather than a langcode --> we'd have $this->defaultLangcode / $this->activeLangcode, would be more consistent and intuitive ?)
  3. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -411,7 +412,14 @@ protected function getTranslatedField($property_name, $langcode) {
    +        if (!$default) {
    +          $field->setLangcode($langcode);
    +        }
    +        // If we are initializing the default language cache, the variable is
    +        // not populated, thus we have no valid value to set.
    +        elseif (isset($this->language)) {
    +          $field->setLangcode($this->language->id);
    +        }

    Minor: the comment is not immediately clear in relation to the code. "$this->language might not be set if we are initializing the default language cache, in which case there is no valid langcode to assign"

    More importantly, this case puzzles me a bit.
    - Can it really happen that $this->language is not set ? initializeDefaultLanguage() sets it and is called in __construct()... Do we always wrap $this->language before using it in the rest of the code ? (seems not, in other places the patch *removes* such a check).
    - if !isset($this->language), it means the $field keeps its default langcode value, NOT_SPECIFIED. It would IMO be clearer to explicitly set that value here.
    - The logic then would be easier to follow that way:

    if ($langcode == Language::LANGCODE_DEFAULT) {
      $field_langcode = isset($this->language) ? $this->language->id : LANGCODE_NOT_SPECIFIED;
    }
    else {
    $field_langcode = $langcode;
    }
    $field->setLangcode($field_langcode);

    - Taking another step back, this code is the reversed logic of what's done repeatedly in getTranslation(), hasTranslation(), removeTranslation(), initTranslation() ... (switch in DEFAULT if needed).
    There is an "internal" langcode (DEFAULT or an actual langcode) and an "external" langcode (always an actual langcode), and some places that need to convert from one "shape" to the other. If we keep that special dance around DEFAULT (see my first remark), then it would be clearer to move that conversion logic to protected helpers ?
  4. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -425,9 +433,7 @@ public function set($property_name, $value, $notify = TRUE) {
         if ($property_name == 'langcode') {
    -      // Avoid using unset as this unnecessarily triggers magic methods later
    -      // on.
    -      $this->language = NULL;
    +      $this->onChange($property_name);
         }

    Not clear why 'langcode' needs special treatment here to trigger onChange() ? This feels like an opportunistic fix and reinforces the feeling of convoluted black magic :-/ Would deserve at least a comment ?

  5. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -537,17 +543,17 @@ public function language() {
    -  protected function getDefaultLanguage() {
    +  protected function initializeDefaultLanguage() {

    initializeDefaultLanguage() has remnants of being the former getDefaultLanguage(), that could be cleaned up IMO:
    - The return value is never used, should be removed. The language is available in $this->language, that's the job of the method.
    - Also, the method is only called in __construct() (where $this->language is not set yet) and onChange() (where we want to overwrite it). So the if (!isset($this->language)) could be removed (would avoid the need for onChange() to set it to NULL before calling the method)
    - Then the method would make more sense named setDefaultLanguage() ?

  6. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -559,11 +565,32 @@ protected function getDefaultLanguage() {
    +      // This needs to be initialized manually as it is skipped when
    +      // instantiating the language field object to avoid infinite recursion.
    +      if (!empty($this->fields['langcode'])) {
    +        $this->fields['langcode'][Language::LANGCODE_DEFAULT]->setLangcode($this->language->id);
    +      }

    Minor: the comment could explicitly mentions that this is about special treatment for the 'langcode' field, this is easy to miss.
    "it is skipped when instantiating the language field object" : A bit vague, not clear exactly what this refers to ? which method ? (there should probably be a mirror comment referring to initializeDefaultLanguage() at the place where this special "skipping" happens ?)

    More generally, this code snippet has the smell of something wicked in the overall code flow :-/. A bit like the elseif (isset($this->language)) { part in getTranslatedField() (see above). Are the two related ?

  7. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -559,11 +565,32 @@ protected function getDefaultLanguage() {
    +  protected function updateFieldLangcodes($langcode) {
    +    if (!empty($this->fields)) {
    +      foreach ($this->fields as $name => $items) {
    +        if (!empty($items[Language::LANGCODE_DEFAULT])) {
    +          $items[Language::LANGCODE_DEFAULT]->setLangcode($langcode);
    +        }
    +      }
    +    }
    +  }

    $this->fields should be at least always an empty array, shouldn't it ? No need for if (!empty()) ?

    updateFieldLangcodes() might be overkill as a separate method. Could be inlined into onChange() ? Or even moved to initializeDefaultLanguage(), that could totally be part of that function's job ? What the function does seems very similar to the stuff already done in initializeDefaultLanguage() for 'language' field ?

    Plus I'm actually a bit surprised at what the method does. If the "default language" of the entity changes from langode_1 to langcode_2, we keep the current field values for langcode_1, and just do setLangcode(langcode_2) on them ? And what if the entity already had an existing translation for langcode_2 ? Looks like the current code in updateFieldLangcodes() would leave two translations for langcode_2, one under DEFAULT, one under langcode_2 ?

    At any rate, this code could use a comment IMO.

  8. +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -915,10 +939,7 @@ public function __clone() {
    -      unset($this->translations);
    -      // This will trigger the magic setter as the translations array is
    -      // undefined now.
    -      $this->translations = $translations;
    +      $this->translations = &$translations;

    This code in current HEAD is already way above my head, but is the comment above this hunk still valid ? (couldn't it explain why the added "by reference" thingie is needed ?)

  9. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -325,7 +325,7 @@ protected function attachPropertyData(array &$entities, $revision_id = FALSE) {
    -            $revision_ids[] = $values[$this->revisionKey];
    +            $revision_ids[] = $values[$this->revisionKey][Language::LANGCODE_DEFAULT];

    Oh ? How did it work so far ??

  10. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -849,41 +849,47 @@ protected function doLoadFieldItems($entities, $age) {
    -    $all_langcodes = array_keys(language_list());
    +    $all_langcodes = array_keys(language_list(Language::STATE_ALL));

    Minor: since the patch removes the other $langcode variable, then I'd suggest we just name this one $langcode.

  11. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -849,41 +849,47 @@ protected function doLoadFieldItems($entities, $age) {
    +      $translatable = $field->isFieldTranslatable();

    Used only once, no need for a variable ? Or if this is to optimize across the foreach ($results) loop, there are other stuff that would need to be optimized that way (getColumns(), getFieldCardinality()...). This feels like micro optim that only affects "field cache" misses, so I'd say don't bother ?

  12. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -849,41 +849,47 @@ protected function doLoadFieldItems($entities, $age) {
    +        // @todo Remove BC support for 'und' untranslatable fields as soon as
    +        //   can write a migration for them.

    Right, in D8, we never ever store 'und' as langcode in field storage. But we did in D7, so there is some upgrade impact.
    - The BC code + @todo means we leave that to a followup ? can we create it and link from the @todo ?
    - Were 'untranslatable fields' is only case where we stored 'und' in D7 ?

  13. +++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
    @@ -849,41 +849,47 @@ protected function doLoadFieldItems($entities, $age) {
    +        if ($translatable || $row->langcode == Language::LANGCODE_NOT_SPECIFIED || $row->langcode == $entities[$row->entity_id]->getUntranslated()->language()->id) {

    "$entity->getUntranslated()->language()->id" (and similar in other places)
    Ouch :-/ Looks like we really need a DX-friendlier way to get the "defaultLangcode" for an entity ?

  14. +++ b/core/lib/Drupal/Core/Language/Language.php
    @@ -59,11 +59,8 @@ class Language {
    -  const LANGCODE_DEFAULT = 'und';
    +  const LANGCODE_DEFAULT = 'xx-default';

    Yay for differenciating with NOT_SPECIFIED at last ;-)
    - Where does the chosen string value come from ? some standard somewhere ? Is it worth documenting ?
    - I haven't checked all the current uses of LANGCODE_DEFAULT, but I'm wondering if for example aggregator_update_8001() is still correct, then. 'xx-default' looks like it's intended to be a runtime-only placeholder, not something that we'd want to have in actual db records ?

  15. [edited out, I was drunk or something]
  16. +++ b/core/modules/hal/lib/Drupal/hal/Tests/DenormalizeTest.php
    @@ -206,7 +206,10 @@ public function testPatchDenormailzation() {
    +      // The 'langcode' field has always a value.

    s/has always/always has/ ;-)

Also, while we're doing stuff in getTranslatedField(), could we rename its $property_name argument to $field_name ? There was a moment earlier in the cycle where we talked about "entity properties", but we're now settled on "fields".

StatusFileSize
new17.11 KB
new66.05 KB
PASSED: [[SimpleTest]]: [MySQL] 57,995 pass(es).
[ View ]

@yched:

Great review as usual :)

I am uploading an interim patch to see if I broke something. I still owe you some fixes and some answers, but it's too late now ;)

I had a first short look, will review more once yched's review in #69 got addressed.

ad, #69, 1)
I think we should start arranging data according to our translation model, i.e. entity - language - fields. I agree the keying by default language stuff seems deprecated since the object now deals with all translations, instead we could start doing shortcuts as $this->current_fields =& $this->fields[$this->activeLangcode];. But that all does not have to happen in this issue as it's internal restructuring, so let's do as it's easier?

  1. ad #69, 6), 7)
    +++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
    @@ -559,11 +565,32 @@ protected function getDefaultLanguage() {
    +      if (!empty($this->fields['langcode'])) {
    +        $this->fields['langcode'][Language::LANGCODE_DEFAULT]->setLangcode($this->language->id);

    This overall complex logic - but do we still need that given fields have a respective entity translation as parent now and language fallback is on entity level? I.e. revisit #2078625: Field/FieldItem value objects should carry their language
    $field->getLangcode() could just fetch the language code of the parent entity translation, what should have positive impact on the memory usage and simplify the logic?
    Speaking of that, the FieldItemList interface currently does not clarify $field->getEntity() always gets the right translation - but I think it does and imo it should have to.

  2. +++ b/core/modules/system/lib/Drupal/system/Tests/Entity/EntityTranslationTest.php
    @@ -596,4 +632,152 @@ function testFieldDefinitions() {
    +  protected function assertFieldStorageLangcode(ContentEntityInterface $entity, $message = '') {
    ...
    +    $entity_type = $entity->entityType();

    This test is tightly coupled to the field storage mechanism in use, so maybe better move it to a dedicated DB-field storage test?

re @fago - "do we still need $field->setLangcode()"

I'll let plach confirm, but I think the current behavior is:
$field->getLangocode() is
- the entity current language for translatable fields
- the entity *default* language for non-translatable fields.

If so, then yes, maybe that logic could be hardcoded in field::getLangcode(), and no need for a setLangcode() ?
(would need a way to read the entity's default langcode that is more direct/efficient than the current $entity->getUntranslated()->language()->id :-))

StatusFileSize
new20.96 KB
new73.84 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch et-field_langcode_wtf-2076445-74.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]

And here we go, answers following.

@yched:

I'm not sure I see the value in keeping the internal runtime storage keyed by LANGCODE_DEFAULT

I'd tend to agree, but as @fago is pointing out we might want to perform a deeper refactoring in a dedicated issue.

Can it really happen that $this->language is not set ?

Yes, when we set the ::defaultLangcode cache, we instantiate the 'langcode' field object without being able to set its langcode, since it's still NULL.

More generally, this code snippet has the smell of something wicked in the overall code flow :-/ [...] Are the two related ?

Yes, once we have instantiate the 'langcode' field object, we can retrieve its value and initialize its own language. I couldn't come up with anything cleaner, suggestions welcome.

updateFieldLangcodes() might be overkill as a separate method. Could be inlined into onChange() ? Or even moved to initializeDefaultLanguage(), that could totally be part of that function's job ? What the function does seems very similar to the stuff already done in initializeDefaultLanguage() for 'language' field ?

If we revert field langcodes as standalone properties, we can get rid of it altogether. Meanwhile I don't see the problem of having a standalone method encapsulating a well-defined and reusable logic. I don't think it necessarily belongs to the other two cited above.

Plus I'm actually a bit surprised at what the method does. If the "default language" of the entity changes from langode_1 to langcode_2, we keep the current field values for langcode_1, and just do setLangcode(langcode_2) on them ? And what if the entity already had an existing translation for langcode_2 ?

Changing the entity language means that its original values should be interpreted as having the new language. The latter matching an existing translation language is not supposed to happen: we explicitly forbid it in the UI. I added a couple of lines + test coverage to ensure this cannot happen even programmatically.

Oh ? How did it work so far ??

No idea :(
However we should probably remove the langcode level also from the values passed to the entity constructor, it's completely usless now.

Right, in D8, we never ever store 'und' as langcode in field storage. But we did in D7, so there is some upgrade impact. [...] Were 'untranslatable fields' is only case where we stored 'und' in D7 ?

Both in D7 and in D8 we store 'und' fields if the entity has 'und' language itself. In D7 we stored 'und' also for untranslatable fields.

the current behavior is:
$field->getLangocode() is
- the entity current language for translatable fields
- the entity *default* language for non-translatable fields.

Yup :)

Ouch :-/ Looks like we really need a DX-friendlier way to get the "defaultLangcode" for an entity ?
[...]
would need a way to read the entity's default langcode that is more direct/efficient than the current $entity->getUntranslated()->language()->id :-)

Well, I thought a lot about the DX of this particular use case in the original ET API issue: the rationale behind not providing a dedicated method to retrieve the default language is that people should be able to write code not taking into account multilingual entities whenever possible. If we have only a ::language() method developers do not need to choose: if someone bothered to pass along a translation object they will get the translation language, otherwise they will get the default language. If the code explicitly needs the default language then its logic is likely multilingual-aware and so the developer are supposed to be familiar with the ET API, or at least they should ;)
For instance:

<?php
function entity_do_stuff(EntityInterface $entity)
 
// The developer has no choice here.
 
$langcode = $entity->language()->id;
 
$output = '<div lang="' . $langcode . '"> ... </div>';
 
// ...
}
?>

vs

<?php
function entity_do_stuff(EntityInterface $entity)
 
// "Mmh, should I use ::language() or ::defaultLanguage()? Default sounds as a
  // good choice when in doubt..."
  // Both will work in a monolingual context, so people would not notice the
  // difference, but only ::language() would be correct in a multilingual
  // environment.
 
$langcode = $entity->defaultLanguage()->id;
 
$output = '<div lang="' . $langcode . '"> ... </div>';
 
// ...
}
?>

And this updates to @todo. #69 should be addressed now :)

StatusFileSize
new1 KB

Wrong interdiff...

Status:Needs review» Reviewed & tested by the community

Great that we have an agreement on simplifying $entity->fields internal langcode keying and $field->getLangcode(), and agreed about doing that in a separate issue - looking forward to it, looks like it will make the code dead simple... (dead simple Entity/field translation code, can you believe that ? What's not to like about D8... ;-)

Regarding "$entity->getUntranslated()->language()->id" vs "a DX-friendlier way to get the defaultLangcode for an entity"
I understand the reasoning for "hiding the API" here - yet switching the front-facing language of an entity object has a non negligible overhead, doesn't it ? Having to go through getUntranslated() as the only way to access a basic metainfo about the entity does seem a shame. However, not introduced here, so, separate discussion - opened #2142603: DX / efficiency for accessing the "default langcode" of an entity

With further cleanups on the radar, the current patch works for me. RTBC as far as I'm concerned, I'll let @fago +1 or not ;-)

The last submitted patch, 74: et-field_langcode_wtf-2076445-74.patch, failed testing.

StatusFileSize
new26.35 KB
PASSED: [[SimpleTest]]: [MySQL] 59,051 pass(es).
[ View ]

Thanks!

Just a reroll

#82 confuses me a bit, which one is the latest patch? The not hidden-one seems to be the wrong one?

Then, I guess we need to create further follow-ups for removing the field item languages and/or re-organzing the internal entity structure.

StatusFileSize
new73.87 KB
PASSED: [[SimpleTest]]: [MySQL] 59,017 pass(es).
[ View ]

Oops, too many local branches...

I created #2142885: Refactor ContentEntityBase internal field storage, let's discuss there what we want and figure out whether that's enough or we need another issue.

+++ b/core/lib/Drupal/Core/Entity/FieldableDatabaseStorageController.php
@@ -867,23 +866,29 @@ protected function doLoadFieldItems($entities, $age) {
+        // @todo Remove BC support for 'und' untranslatable fields as soon as
+        //   can write a migration. See https://drupal.org/node/2137917.

Not sure about this part - we do not support head head updates yet, so why bother writing a migration. Of course there will be a migration from 7.x. but not from a previous 8.x state? So shouldn't we just remove bc support here?

Oh, I forgot to mention - the patch looks good else. It's great to see all that batch processing being removed.

Status:Reviewed & tested by the community» Needs work

Let's remove that, yup!

Status:Needs work» Needs review
StatusFileSize
new1.13 KB
new73.67 KB
FAILED: [[SimpleTest]]: [MySQL] 59,096 pass(es), 1 fail(s), and 0 exception(s).
[ View ]

Makes sense

Status:Needs review» Needs work

The last submitted patch, 89: et-field_langcode_wtf-2076445-88.patch, failed testing.

Status:Needs work» Needs review

Status:Needs review» Needs work

The last submitted patch, 89: et-field_langcode_wtf-2076445-88.patch, failed testing.

Status:Needs work» Needs review

89: et-field_langcode_wtf-2076445-88.patch queued for re-testing.

No failures here.

Status:Needs review» Needs work

The last submitted patch, 89: et-field_langcode_wtf-2076445-88.patch, failed testing.

Status:Needs work» Needs review

Status:Needs review» Needs work

The last submitted patch, 89: et-field_langcode_wtf-2076445-88.patch, failed testing.

Status:Needs work» Needs review

Status:Needs review» Needs work

The last submitted patch, 89: et-field_langcode_wtf-2076445-88.patch, failed testing.

Status:Needs work» Needs review

StatusFileSize
new1.52 KB
new74.26 KB
PASSED: [[SimpleTest]]: [MySQL] 59,095 pass(es).
[ View ]

@yched suggested another improvement.

Status:Needs review» Reviewed & tested by the community

Works for me. Thanks !

The last submitted patch, 89: et-field_langcode_wtf-2076445-88.patch, failed testing.

Status:Reviewed & tested by the community» Needs work

The last submitted patch, 100: et-field_langcode_wtf-2076445-98.patch, failed testing.

Status:Needs work» Reviewed & tested by the community

That test passes locally and is failing in other unrelated issues too.

Issue tags:+blocker

Marking as a blocker as well because this changes major assumptions in the entity system and several of our other work touches the same code and therefore it would save considerable time to commit this sooner than later.

Latest changes + patch looks good, +1 to move on with this.

StatusFileSize
new74.24 KB
PASSED: [[SimpleTest]]: [MySQL] 59,032 pass(es).
[ View ]

Straight reroll after the field definitions patch.

Title:Make sure language codes for original field values always match entity language regardless of field translatabilityChange notice: Make sure language codes for original field values always match entity language regardless of field translatability
Status:Reviewed & tested by the community» Active
Issue tags:+Needs change record

Well, I'm just going to be totally honest here and say this patch for the most part goes zooming over my head. :P~ I did read the issue summary about 3 times and now kind of get the general idea, though. It's surprising to me that we use language codes like 'de' even on untranslatable fields, but I do understand the architectural simplicity that making fields and entities match gives us, and it sounds like all the various D8MI people who have experience translating sites and would know if this was going to cause a huge headache in advance were involved in this discussion.

I caught a couple of nits with Sentence capitalization in comments but nothing that would hold this patch up. So...

Committed and pushed to 8.x. Thanks!

This will need a change notice.

Status:Active» Needs review

Added a user facing change notice at https://drupal.org/node/2147921 and a developer facing change notice at https://drupal.org/node/2147925. I think the later would need to be reviewed more, but review on the first helps too. AFAIS the consumer API did not change with this patch, language handling was already well encapsulated inside the entity system. This change affects more people who want to extend and build their own entity types. Not sure what to explain for them if anything :)

Title:Change notice: Make sure language codes for original field values always match entity language regardless of field translatabilityMake sure language codes for original field values always match entity language regardless of field translatability
Status:Needs review» Fixed

They look great, thanks!
(and sorry for forgetting to write them myself :( )

Assigned:plach» Unassigned

Issue tags:-sprint

Moving off of sprint then! Thanks all!

Status:Fixed» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Issue tags:-Needs change record