The file entity schema is missing the langcode property (present in nodes, users, comments, etc.). To add language support later, we need file entities to support the langcode property just like other entities do. We do not need or want to add a UI here, since the node module already has a UI for this, which we eventually want to move up to the entity module to be shared among entities. So this issue is about providing the low-level data storage for the langcode field.

Comments

hairqles

Attachment: new, 1.19 KB

This patch adds a language field to the schema, and adds the field to the db in the update hook.

hairqles

Status: Active » Needs review

hairqles

Attachment: new, 1.64 KB

The file object's langcode attribute is initialized in the file_save() function, not in hook_schema().

fubhy

Status: Needs review » Needs work

+++ b/core/includes/file.inc
@@ -571,6 +571,10 @@ function file_load($fid) {
   $file->filesize = filesize($file->uri);
+  ¶

There is some unnecessary whitespace in that line (below $file->filesize).

+++ b/core/modules/system/system.install
@@ -1669,3 +1676,19 @@ function system_update_8002() {
+  $langcode_field = array(
+    'description' => 'The {language}.langcode of this file.',
+    'type' => 'varchar',
+    'length' => 12,
+    'not null' => TRUE,
+    'default' => '',

Maybe also set the 'initial' property to LANGUAGE_NONE here?

hairqles

Status: Needs work » Needs review

Attachment: new, 1.72 KB

Thanks for the review!
I attached the patch with the changes.

fubhy

Status: Needs review » Needs work

+++ b/core/modules/system/system.install
@@ -814,6 +814,14 @@ function system_schema() {
+      'langcode' => array(
+        'description' => 'The {language}.langcode of this file.',
+        'type' => 'varchar',
+        'length' => 12,
+        'not null' => TRUE,
+        'default' => '',
+        'initial' => LANGUAGE_NONE,

The 'initial' property is only required in the update hook, not in the schema.
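
The distinction drawn here can be sketched outside Drupal: 'default' is what the database assigns to later inserts that omit the column, while 'initial' is a one-time value used to fill rows that already exist when the column is added. The following is only a rough model in Python, with sqlite3 standing in for Drupal's schema API. The function name, the sample URIs, and the reduced table layout are assumptions made for illustration; 'und' is the LANGUAGE_NONE value mentioned later in this thread.

```python
import sqlite3

LANGUAGE_NONE = "und"  # value of LANGUAGE_NONE as quoted in this thread


def add_langcode_column(conn, initial=None):
    """Hypothetical helper: add langcode with default '', optionally backfill."""
    # Models the 'default' key: used for future inserts that omit langcode.
    conn.execute(
        "ALTER TABLE file_managed "
        "ADD COLUMN langcode VARCHAR(12) NOT NULL DEFAULT ''"
    )
    if initial is not None:
        # Models the 'initial' key: a one-time fill of rows that already exist.
        conn.execute("UPDATE file_managed SET langcode = ?", (initial,))


conn = sqlite3.connect(":memory:")
# Reduced stand-in for {file_managed}; the real table has more columns.
conn.execute("CREATE TABLE file_managed (fid INTEGER PRIMARY KEY, uri TEXT UNIQUE)")
conn.execute("INSERT INTO file_managed (uri) VALUES ('public://old.pdf')")

# The "update hook": the existing row is backfilled with LANGUAGE_NONE.
add_langcode_column(conn, initial=LANGUAGE_NONE)

# A row inserted afterwards without a langcode only gets the column default.
conn.execute("INSERT INTO file_managed (uri) VALUES ('public://new.pdf')")

rows = dict(conn.execute("SELECT uri, langcode FROM file_managed"))
```

In this model a freshly created table has no existing rows to backfill, which is why the 'initial' value only matters on the update path.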

hairqles

Attachment: new, 1.68 KB

Removed the 'initial' property from hook_schema().

hairqles

Status: Needs work » Needs review

kalman.hosszu

Status: Needs review » Reviewed & tested by the community

I checked the code and it seems OK. The tests ran successfully, so I am changing the status to RTBC.

dave reid

Issue tags: +Media Initiative

Tagging with Media initiative since this touches file entities.

dave reid

Status: Reviewed & tested by the community » Needs review

If a module wanted to add a langcode column on the {file_managed} table via contrib in D7, it would be good if this update function could run a simple db_field_exists() check before adding the field.
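
The guard suggested here amounts to making the update idempotent: check for the column first and skip the schema change if a contrib module has already added it. A minimal sketch of that idea follows, again in Python with sqlite3 as a stand-in rather than Drupal code. The helper column_exists() plays the role that db_field_exists() would play in the update hook; both function names and the reduced table layout are assumptions for illustration.

```python
import sqlite3


def column_exists(conn, table, column):
    """Hypothetical analogue of a field-exists check, using SQLite metadata."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def update_add_langcode(conn):
    """Add file_managed.langcode unless it already exists. Returns True if added."""
    if column_exists(conn, "file_managed", "langcode"):
        # Another module already added the column: leave the table untouched.
        return False
    conn.execute(
        "ALTER TABLE file_managed "
        "ADD COLUMN langcode VARCHAR(12) NOT NULL DEFAULT ''"
    )
    # Backfill existing rows with 'und' (LANGUAGE_NONE), as in the 'initial' key.
    conn.execute("UPDATE file_managed SET langcode = 'und'")
    return True


conn = sqlite3.connect(":memory:")
# Reduced stand-in for {file_managed}; the real table has more columns.
conn.execute("CREATE TABLE file_managed (fid INTEGER PRIMARY KEY, uri TEXT)")

first_run = update_add_langcode(conn)   # column is missing, so it is added
second_run = update_add_langcode(conn)  # column now exists, so this is a no-op
```

Without the guard, the second call would fail with a duplicate-column error, which is the situation a site with a contrib-added langcode column would hit.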

damien tournoud

“The file entity schema is missing the langcode property (present in nodes, users, comments, etc). To add support to language later, we need the file entities to support the langcode property just like other entities do.”

I cannot really make sense of this. {node}.language and {users}.language are two *completely* different animals. The language attached to the user is the *default language of the user*, not the language of the user entity.

Also, I don't really see how this is necessary. More details on the use case would be very welcome.

dave reid

This would probably make more sense in the context of File entity, which has a full UI for creating, editing, and deleting file entities and fields on those files. We've had several requests to be able to translate files, and we can't do that for two reasons:

  1. No langcode field in {file_managed}
  2. The {file_managed}.uri column is a unique index so you can't have two file records with different languages pointing to the same URI.

damien tournoud

“1. No langcode field in {file_managed}
2. The {file_managed}.uri column is a unique index so you can't have two file records with different languages pointing to the same URI.”

Well, my point exactly: those are really mutually exclusive: in one case you want to translate the whole entity (likely referencing two different files), in the second case you want to translate *fields* attached to the entity, while still referencing the same file.

If you translate the whole entity, the references to it need to be changed too (because you have a different file ID), and you end up with the same types of issues caused by legacy content translation.

dave reid

Yep, that's true. I'm still not sure why we want to add a langcode field to files, considering that in core they're not fieldable. Also, this is missing an update hook to set a default of 'und' (i.e. LANGUAGE_NONE) on any existing files; otherwise they will all have a langcode of an empty string.

gábor hojtsy

Status: Needs review » Needs work

@Dave/@Damien: one of the reasons to add langcode is exactly to support the use case of adding fields when file_managed is used to support an entity (which I expected to be happening in D8?), as well as to be able to just record language information about the files. D8MI's primary goal is to add the possibility to assign language to as many types of data in Drupal as possible (the file referenced could very well be attributed with a language, unless it's a landscape photo :).

@Dave: on the update function, the schema 'initial' key already covers what you are looking for IMHO.

TODO:

1. I agree contrib might modify the schema with an existing field and we should skip adding the field in that case.
2. Also, @xjm pointed out very well in the taxonomy langcode field issue at #1444966: Add langcode property in taxonomy schema that we should make sure to add tests, i.e. add a file through the API with langcode LANGUAGE_NONE and then one with some other language.

Unless of course we don't agree even that we need this field, in which case the work is futile. I think the reasons to have it are clear.

effulgentsia

I just read the excellent write-ups on http://groups.drupal.org/node/197848 and http://groups.drupal.org/node/165194, but similar to #12, I'm still left wondering what a langcode in the entity table really means. Is it just a stopgap until we have translatable properties? I mean, if we intend to keep translatable fields, and if the intent is to implement translatable properties, and if we unify all our relation modeling with a generic entity relation field and this field can be language assigned, then after all these things, does a langcode on the entity record itself mean anything?

But, these are all big ifs. In the meantime, I think consistency is good, so if we have langcode on most of our entity types, then we should have it on files too, so I'm +1 for this issue, even though I'm looking forward to understanding better how langcode is intended to be used across all our entities in general.

“The language attached to the user is the *default language of the user*, not the language of the user entity.”

Let's discuss that in #1439680-26: Rename $user language property to langcode.

effulgentsia

Also, let's consider that {file_managed} already has columns for information about the URI: the filemime and filesize columns. Potentially, the meaning of {file_managed}.langcode needs to also be coupled to the URI (i.e., the contents of the file itself are what determine its langcode). This might be a different situation than what we have with other entity types, though, where in theory properties could be localized independently of the underlying entity. Or, do we want to divorce the 1:1 relationship between fid and URI, so that the same fid could refer to one URI in one language, and a different URI in another language?

tstoeckler

Even without fields on files I think such a 'langcode' property makes sense. The following use case isn't possible with just core, but still:
Say you manage a whole bunch of files, e.g. PDFs which are tutorials of some sort. They have an intrinsic language, which is the language the tutorial is written in. This would be managed in the 'langcode' property. You would need it if, for a certain user, you only want to display tutorials in the user's language. In the same way, other files (including video and audio) can have an intrinsic language.

Attaching fields to that, which in turn can be translated, is also a valid use-case, but a totally different matter, IMO.

Whether the different PDFs have the same file ID or not is just the same translation sets vs. in-entity translation debate, which is irrelevant for this issue and just depends on what comes out of entity_translation.module.

EDIT: Crosspost, but still relevant I think.

effulgentsia

Re #19, I might be off-base here, but at least to me, that use-case is wrapped up in how we approach translatable properties. If we decide to make all properties translatable, then that PDF query would be about finding all file entities with a URI property that exists in the desired language, not about querying the langcode of the entities themselves. However, I think that logic would extend to all entity types, and would mean langcode has no meaning whatsoever on entities, only on fields, properties, and relationships.

However, since we do not yet have translatable properties, adding langcode to entity tables makes sense (to me). Then, depending on how we implement translatable properties, maybe we'll need to remove langcode from entity tables, but we can do so at that time.

gábor hojtsy

@effulgentsia: yes, we don't yet have any consensus as to how to implement translatable properties (in a performant manner). Also, storing the "initial language" of the entity is useful even if all properties/fields are translatable and have language information assigned. It can serve as the "fallback language" for when a property or field does not have a translation, as a workflow support feature where we know the original submission language and can support translating the entity to other languages, and as a permissions helper value where we tie the original edit/delete permissions to the original language of the entity while tying translation permissions to the actual translations of the entity. In short, knowing the original language of the entity is useful for display fallback, workflow, permissions, etc., even if we otherwise can and will store all properties and fields for different languages on the same level.

effulgentsia

Status: Needs work » Needs review

Attachment: new, 3.72 KB

This addresses the todos in #16.

gábor hojtsy

I think this looks pretty complete. Any other concerns?

gábor hojtsy

Issue tags: +Needs tests

Oh, uhm, still needs upgrade tests I think (although it is pretty darn simple).

effulgentsia

Status: Needs review » Closed (duplicate)

gábor hojtsy

Issue tags: -Needs tests, -sprint

Removing sprint tag.

gábor hojtsy

Issue summary: View changes
Issue tags: -Media Initiative +D8Media

Fixing to the right media tag.