Code that works with field values sometimes need to know their language.

Right now, Field & FieldItem objects do not know their own language, only the containing entity knows it based on where the field is placed in its internal storage ("I know this $field is 'en' because it is stored in $this->fields[en]")
So that info has to be passed along as a separate $langcode param next to the $item / $items objects for every piece of code that might need it.

- This complicates APIs and functions / methods signatures (see #2075095: Widget / Formatter methods signatures needlessly complex)
- This is also very brittle. The multilingual field logic (and notably the language fallback part) is far from simple, and the correct language is not easy to figure out if the connection between the $langcode and the values object is lost somehow (e.g you get an $item but no $langcode)
- Notably, the code in the Field / FieldItem classes themselves does not know which language are the values it's working with :-p
- There is code in current HEAD that assumes that the language of the field is the language of the parent entity, but that assumption is wrong when a "language fallback" logic is applied.

This is sick: field values have *one* langcode in storage, and they are turned into objects at runtime, so those objects should just carry their own language, not expect the external code to somehow know it for them.
Language fallback API & logic is still in flux for D8 (#2019055: Switch from field-level language fallback to entity-level language fallback), all the more reason to separate it from the rest of the code that works with field values. Language fallback assembles a set of fields according to whatever logic it sees fit. The rest of the code works with those values, and if needed reads their language directly from the value objects without needing an external fallback map to be passed along or rebuilt if it's been lost.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

yched’s picture

Issue summary: View changes

some precisions

yched’s picture

Title: Field/FieldItem value objects need should carry their language » Field/FieldItem value objects should carry their language
Status: Active » Needs review
FileSize
2.29 KB

Patch attached.

- This adds setter+getter methods for the language in Field objects.
Field objects are created through the generic "typed data factory", which enforces the signature of Field::__construct(), and we have nowhere in there to inject the language.
Besides, TypedDataManager::getPropertyInstance() has this notion of "prototypes", where only a small small number of objects are actually instantiated, and new objects are cloned from those prototypes.
So the language cannot be set in the constructor anyway, this requires a setter that can be calle don existing objects.

- The easy info, at the time EntityNG calls TypedDataManager::getPropertyInstance() and the language setter, is the langcode, not the Language object. It's also the info 99.9% of the consuming code needs - a langcode rather than a language object.
So right now Field objects store a $langcode, and the methods are setLangcode() / getLangcode().

- Those method names are too consistent with $entity->language() (for which most of the time you end up doing $entity->language()->id...).
But storing a reference to a Language object would require some more care to it performant, and as mentioned above, I'm totally not convinced it would be better DX. So I kept it this way for now, feedback welcome :-)

Still @todo:
- Add the methods to FieldInterface and properly document them.
- Add some tests

yched’s picture

Issue summary: View changes

more rationale...

yched’s picture

Issue summary: View changes

Rephrase

andypost’s picture

Suppose better to add a getLanguage() to interface as well for the 0.1% and better DX

yched’s picture

Well, not sure about that, really :-)
#2004244-64: Move entity revision, content translation, validation and field methods to ContentEntityInterface and following are discussing about adding $entity->langcode(), and about $entity->language() being mostly useless and having perf issues.
The use cases for getting a Language object from Field/FieldItem are marginal at best, and cause similar performance concerns.

yched’s picture

Priority: Major » Critical

And this is actually critical, text field and file/image fields need to have a langcode to do their job.

yched’s picture

Added the methods to the interface, and tests for the returned values in EntityTranslationTest.

plach’s picture

I had a quick look to the patch and I am not convinced by test assertions: I'd expect field language to behave in a different way. The fact we are green here probably indicates there's something wrong and not test covered atm.

As stated in #2076445: Make sure language codes for original field values always match entity language regardless of field translatability, field language should behave as follows:

  • If a field is translatable, its original values should have the same langcode of the parent entity object. Even if they are internally stored with langcode Language::LANGCODE_DEFAULT, their user facing language should be the same of the entity:
    <?php
    $entity->language->value = 'en';
    $langcode = $entity->field_foo->getLangcode(); // $langcode == 'en' even if the storage is 'und'
    $entity->language->value = 'de';
    $langcode = $entity->field_foo->getLangcode(); // $langcode == 'de' even if the storage is 'und'
    ?>
    
  • If a field is translatable, its translated values should have the same langcode of the parent translation object:
    <?php
    $translation = $entity->getTranslation('it');
    $langcode = $translation->field_foo->getLangcode(); // $langcode == 'it' as well as the storage
    translation2 = $entity->getTranslation('fr');
    $langcode = $translation2->field_foo->getLangcode(); // $langcode == 'fr' as well as the storage
    ?>
    
  • if a field is not translatable, it always has the Language::LANGCODE_NOT_SPECIFIED langcode:
    <?php
    $entity->language->value = 'en';
    $langcode = $entity->field_bar->getLangcode(); // $langcode == 'und' 
    $langcode = $entity->getTranslation('it')->field_bar->getLangcode(); // $langcode == 'und'
    ?>
    

Please note that Language::LANGCODE_NOT_SPECIFIED is an actual language code, whereas Language::LANGCODE_DEFAULT is just used to indicate the entity original language whatever it is. They have the same value only because this eases the implementation.

yched’s picture

@plach:

Which specific tests do you think are off ?

This patch doesn't change any existing language logic: $field->getLangcode() just returns the langcode by which Entity API considers the values to be at runtime (i.e in which "langcode key" the values are placed in the internal $entity->fields array). If there is something wrong, then it's already in HEAD, not introduced by the patch ?

Now, I indeed found some IMO very surprising/unintuitive behavior in there :-/ :
- the runtime langcode might be different from the langcode saved for the values in storage (values in the entity's "default language" are stored with this language, but seen as 'und' at runtime)
- the runtime langcode might be different from the one determined by the "old" field.multilingual.inc logic.
- storage different between base fields & configurable fields.

But this is related to the discussion happening in #2076445: Make sure language codes for original field values always match entity language regardless of field translatability. I think we need to dedicate some time on this in the Prague sprints.

Meanwhile, the code in current HEAD assuming the langcode of an $item is the one of the parent entity is plain wrong, and has no way to be made right - you *cannot* know the langcode of an item you received (or your own langcode if you are the FieldItem class). We need to fix this and introduce the correct practice: use $item->getLangcode() if you need the langcode of the $item.
Then, if something is wrong with how langcodes are assigned, it needs to be fixed/changed upstream, but this doesn't affect the consuming code.

plach’s picture

If there is something wrong, then it's already in HEAD, not introduced by the patch ?

Yup, that's what I meant with "The fact we are green here probably indicates there's something wrong and not test covered atm."

Then, if something is wrong with how langcodes are assigned, it needs to be fixed/changed upstream, but this doesn't affect the consuming code.

Ok, but we can't provide here tests that "legalize" a wrong behavior, better leaving that uncovered for now.

yched’s picture

Well, those tests have the merit of making clear what HEAD does atm, which IMO is a good thing since we seem to be very unclear on that. Then, if we change some multilingual logic later, it's at least clear what we change from and what we change to ?

But I can go both ways and remove the tests if you prefer.

Berdir’s picture

Makes sense to me as well to add tests for the current behavior, possibly with a "@todo: This tests the current behavior, fix in ..." ?

Could the storage stuff be related to the fact that the storage code still uses the BC decorator? (@plach, we need your help in #1983554: Remove BC-mode from EntityNG to implement that correctly, might need the language fallback issue, not sure) The very point of that seems to simulate the old behavior...

plach’s picture

Status: Needs review » Reviewed & tested by the community
catch’s picture

Title: Field/FieldItem value objects should carry their language » Change notice: Field/FieldItem value objects should carry their language
Status: Reviewed & tested by the community » Active

I bumped the other issue to major.

Not sure about the tests here either - I guess it's OK as long as the people who refactor this are aware the tests aren't happy, this is where we could use expectedFail().

Committed/pushed to 8.x, thanks!

Could use a change notice for the new methods etc.

yched’s picture

Title: Change notice: Field/FieldItem value objects should carry their language » Field/FieldItem value objects should carry their language
Status: Active » Fixed

Change notice added at https://drupal.org/node/2082415

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

andypost’s picture

Status: Closed (fixed) » Needs work

The change notice https://drupal.org/node/2040323 should be updated too

plach’s picture

Status: Needs work » Fixed

Done

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

typo