Problem/Motivation
In Drupal a content entity is allowed to have a string field as ID. The string is not necessarily expected to only contain ASCII characters, it is allowed to contain UTF-8. Edit, see comment #7: Actually, the default setting is non-ascii, the developer needs to explicitly set the setting in order to limit it to ascii characters.
However, in case a content entity has string ID, the Field API enforces the entity_id
column to be of varchar_ascii
type, which is realized in mysql as a VARCHAR with ascii encoding.
Therefore, any field added on a such entity type, will be throwing a DatabaseException each time an entity with non-ascii characters in its ID is saved, if any field with dedicated table receives a value. If, on the other hand, the entity type does not have any fields with dedicated tables (or, if the specific entity that is saved does not have values to any of these fields), the entity will be saved without problems.
Similar problem can be found on the ..._target_id
column for entity reference fields.
According to the Database API:
"A special 'varchar_ascii' type is also available for limiting machine name field to US ASCII characters."
According to this statement, the usage of varchar_ascii for the field columns is wrong. I cannot think of a reason why an entity ID must follow this same rule with machine names.
Steps to reproduce
- Create a custom module, and define a custom content entity type by following any relevant tutorial, such as https://www.drupal.org/docs/drupal-apis/entity-api/creating-a-content-en...
- In your entity type's class that extends
ContentEntityBase
, when implementing the
public static function baseFieldDefinitions
which declares the base fields, override the definition of your id-field as
$fields["id"] = BaseFieldDefinition::create('string')->setSetting('is_ascii', FALSE) ...
- Install the module and check your entity type's table in the database: You will see that the column for the ID is a VARCHAR with encoding UTF-8 (utf8mb4)
- Add a field of any type to your entity type, for example a string one
- Check your field's table in the database: You will see that the
entity_id
column< is a VARCHAR(128) with ascii encoding. - Try to create an entity with a non-ascii character in its ID, without setting any value on the field you created above: The entity is saved without problems.
- Try to either create another entity with a non-ascii character in its ID, or just update the previous one, but this time set any value to the other field: You will get a DatabaseException
Problem: The entity type's table is allowed to have a row with non-ascii characters in the ID column, but the field's table is not allowed to have the same ID in its entity_id
column.
The problem seems to by caused in \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::getDedicatedTableSchema
, in this code snippet:
$id_definition = $this->fieldStorageDefinitions[$entity_type->getKey('id')];
if ($id_definition->getType() == 'integer') {
$id_schema = [
'type' => 'int',
'unsigned' => TRUE,
'not null' => TRUE,
'description' => 'The entity id this data is attached to',
];
}
else {
$id_schema = [
'type' => 'varchar_ascii',
'length' => 128,
'not null' => TRUE,
'description' => 'The entity id this data is attached to',
];
}
Here the condition is only checking about the ID field being an integer or not. In the second case, it assumes by default that the length of the ID is 128 and that it will only contain ascii characters.
Proposed resolution
Use varchar instead of varchar_ascii.
Remaining tasks
Write a first patch- Write tests
User interface changes
None, as far as I can tell.
API changes
TBD
Data model changes
TBD
Release notes snippet
TBD
Comment | File | Size | Author |
---|---|---|---|
#20 | 3263695-nr-bot.txt | 192 bytes | needs-review-queue-bot |
#16 | field-table-schema-respect-entity-table-schema--3263695-16.patch | 5.93 KB | efpapado |
#16 | field-table-schema-respect-entity-table-schema--3263695-16--failing-test.patch | 856 bytes | efpapado |
Comments
Comment #2
efpapado CreditAttribution: efpapado at utdanning.no commentedUploading a first patch, without tests, to begin the discussion.
Comment #3
efpapado CreditAttribution: efpapado at utdanning.no commentedComment #4
efpapado CreditAttribution: efpapado at utdanning.no commentedComment #6
efpapado CreditAttribution: efpapado at utdanning.no commentedA new try, this time adding the missing
public function getSettings(): array
method in theTestBaseFieldDefinitionInterface
, to ensure a more accurate mocking.Comment #7
efpapado CreditAttribution: efpapado at utdanning.no commentedThe same problem seems to also happen with the
..._target_id
column on the tables of entity reference fields: By default they are created as varchar_ascii, so they cannot accept non-ascii characters in their targets.I'm extending the patch to cover this case too, and I'm also changing my approach. I believe that, in both cases, the column should explicitly be set as varchar and not varchar_ascii. According to the Database API documentation:
According to the Database API:
But an entity's string ID does not need to follow this same rule as machine names. And, btw, since the default value for the is_ascii configuration is
FALSE
(see\Drupal\Core\Field\Plugin\Field\FieldType\StringItem::defaultStorageSettings
), so in order to have ascii-string ID you need to explicitly set it as such, then I believe it makes sense that both theentity_id
, and the..._target_id
columns should NOT be ascii by default.Comment #8
efpapado CreditAttribution: efpapado at utdanning.no commentedNew patch without the extra empty line.
Comment #9
efpapado CreditAttribution: efpapado at utdanning.no commentedChanging the expectation of the
SqlContentEntityStorageSchemaTest::testDedicatedTableSchemaForEntityWithStringIdentifier()
to match the code changes.Comment #10
efpapado CreditAttribution: efpapado at utdanning.no commentedComment #11
efpapado CreditAttribution: efpapado at utdanning.no commentedA first attempt to writing a test.
Uploading a patch with the failing test, and the overall patch with the passing test.
Comment #12
efpapado CreditAttribution: efpapado at utdanning.no commentedNew patch, that will take into account whether the target entity of an entity reference is fieldable or not.
Comment #16
efpapado CreditAttribution: efpapado at utdanning.no commentedA more safe approach: On entity reference target_id column decide between varchar_ascii and varchar based on whether the target entity type is a config one or not.
Comment #20
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commentedThe Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.