Problem/Motivation

In Drupal a content entity is allowed to have a string field as ID. The string is not necessarily expected to only contain ASCII characters, it is allowed to contain UTF-8. Edit, see comment #7: Actually, the default setting is non-ascii, the developer needs to explicitly set the setting in order to limit it to ascii characters.

However, in case a content entity has string ID, the Field API enforces the entity_id column to be of varchar_ascii type, which is realized in mysql as a VARCHAR with ascii encoding.

Therefore, any field added on a such entity type, will be throwing a DatabaseException each time an entity with non-ascii characters in its ID is saved, if any field with dedicated table receives a value. If, on the other hand, the entity type does not have any fields with dedicated tables (or, if the specific entity that is saved does not have values to any of these fields), the entity will be saved without problems.

Similar problem can be found on the ..._target_id column for entity reference fields.

According to the Database API:

"A special 'varchar_ascii' type is also available for limiting machine name field to US ASCII characters."

According to this statement, the usage of varchar_ascii for the field columns is wrong. I cannot think of a reason why an entity ID must follow this same rule with machine names.

Steps to reproduce

  • Create a custom module, and define a custom content entity type by following any relevant tutorial, such as https://www.drupal.org/docs/drupal-apis/entity-api/creating-a-content-en...
  • In your entity type's class that extends ContentEntityBase, when implementing the
    public static function baseFieldDefinitions which declares the base fields, override the definition of your id-field as
    $fields["id"] = BaseFieldDefinition::create('string')->setSetting('is_ascii', FALSE) ...
  • Install the module and check your entity type's table in the database: You will see that the column for the ID is a VARCHAR with encoding UTF-8 (utf8mb4)
  • Add a field of any type to your entity type, for example a string one
  • Check your field's table in the database: You will see that the entity_id column< is a VARCHAR(128) with ascii encoding.
  • Try to create an entity with a non-ascii character in its ID, without setting any value on the field you created above: The entity is saved without problems.
  • Try to either create another entity with a non-ascii character in its ID, or just update the previous one, but this time set any value to the other field: You will get a DatabaseException

Problem: The entity type's table is allowed to have a row with non-ascii characters in the ID column, but the field's table is not allowed to have the same ID in its entity_id column.

The problem seems to by caused in \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::getDedicatedTableSchema, in this code snippet:

    $id_definition = $this->fieldStorageDefinitions[$entity_type->getKey('id')];
    if ($id_definition->getType() == 'integer') {
      $id_schema = [
        'type' => 'int',
        'unsigned' => TRUE,
        'not null' => TRUE,
        'description' => 'The entity id this data is attached to',
      ];
    }
    else {
      $id_schema = [
        'type' => 'varchar_ascii',
        'length' => 128,
        'not null' => TRUE,
        'description' => 'The entity id this data is attached to',
      ];
    }

Here the condition is only checking about the ID field being an integer or not. In the second case, it assumes by default that the length of the ID is 128 and that it will only contain ascii characters.

Proposed resolution

Use varchar instead of varchar_ascii.

Remaining tasks

  • Write a first patch
  • Write tests

User interface changes

None, as far as I can tell.

API changes

TBD

Data model changes

TBD

Release notes snippet

TBD

CommentFileSizeAuthor
#20 3263695-nr-bot.txt192 bytesneeds-review-queue-bot
#16 field-table-schema-respect-entity-table-schema--3263695-16.patch5.93 KBefpapado
#16 field-table-schema-respect-entity-table-schema--3263695-16--failing-test.patch856 bytesefpapado
#12 field-table-schema-respect-entity-table-schema--3263695-12.patch5.35 KBefpapado
#12 field-table-schema-respect-entity-table-schema--3263695-12--failing-test.patch856 bytesefpapado
#11 field-table-schema-respect-entity-table-schema--3263695-11.patch5.22 KBefpapado
#11 field-table-schema-respect-entity-table-schema--3263695-11--failing-test.patch856 bytesefpapado
#9 field-table-schema-respect-entity-table-schema--3263695-9.patch4.38 KBefpapado
#8 field-table-schema-respect-entity-table-schema--3263695-8.patch3.19 KBefpapado
#7 interdiff_6-7.txt2.85 KBefpapado
#7 field-table-schema-respect-entity-table-schema--3263695-7.patch3.26 KBefpapado
#6 interdiff_2-6.txt600 bytesefpapado
#6 field-table-schema-respect-entity-table-schema--3263695-6.patch1.56 KBefpapado
#2 field-table-schema-respect-entity-table-schema--3263695-2.patch997 bytesefpapado
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

efpapado created an issue. See original summary.

efpapado’s picture

Uploading a first patch, without tests, to begin the discussion.

efpapado’s picture

Issue summary: View changes
efpapado’s picture

Issue summary: View changes

Status: Needs review » Needs work
efpapado’s picture

A new try, this time adding the missing public function getSettings(): array method in the TestBaseFieldDefinitionInterface, to ensure a more accurate mocking.

efpapado’s picture

The same problem seems to also happen with the ..._target_id column on the tables of entity reference fields: By default they are created as varchar_ascii, so they cannot accept non-ascii characters in their targets.

I'm extending the patch to cover this case too, and I'm also changing my approach. I believe that, in both cases, the column should explicitly be set as varchar and not varchar_ascii. According to the Database API documentation:

According to the Database API:

A special 'varchar_ascii' type is also available for limiting machine name field to US ASCII characters.

But an entity's string ID does not need to follow this same rule as machine names. And, btw, since the default value for the is_ascii configuration is FALSE (see \Drupal\Core\Field\Plugin\Field\FieldType\StringItem::defaultStorageSettings), so in order to have ascii-string ID you need to explicitly set it as such, then I believe it makes sense that both the entity_id, and the ..._target_id columns should NOT be ascii by default.

efpapado’s picture

efpapado’s picture

Changing the expectation of the SqlContentEntityStorageSchemaTest::testDedicatedTableSchemaForEntityWithStringIdentifier() to match the code changes.

efpapado’s picture

Issue summary: View changes
efpapado’s picture

A first attempt to writing a test.
Uploading a patch with the failing test, and the overall patch with the passing test.

Status: Needs review » Needs work
efpapado’s picture

A more safe approach: On entity reference target_id column decide between varchar_ascii and varchar based on whether the target entity type is a config one or not.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
FileSize
192 bytes

The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.