Updated: Comment #41

Problem/Motivation

- Mapping field item properties to DB storage record is hardcoded in \Drupal\Core\Entity\Sql\SqlContentEntityStorage.
- It contains special cases for the serialize key of FieldItemInterface::schema, and a special case of this special case for MapItem field (where not a single property but all properties are serialized).
- Issues like #2563843: MapItem is broken for configurable fields demand to add more special cases
- In the context of the JSON storage initiatives (like #3276818: [META] Add support for JSON field queries in entity queries), a proper JSON map field adds more requirements to that.

Time to add an API so that fields can specify how their properties are mapped to their schema columns.

Proposed resolution

- Allow field type classes to define how their values should be (un)serialized.
- Remove the serialize key from MapItem's schema and use this mechanism instead.
- Deprecate the serialize key in a followup.

Remaining tasks

- Decide on how this API should work, code, review, commit.

User interface changes

None.

API changes

Addition: Field items can specify how they are mapped to storage and possibly serialized or JSON encoded.

CommentFileSizeAuthor
#35 2232427-nr-bot.txt3.99 KBneeds-review-queue-bot

Issue fork drupal-2232427

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

plach’s picture

Actually @Berdir pointed out that we should be doing the opposite, see #2229181: MapItem::propertyDefinitions() claims it has a static value property, which is wrong. The rationale is that serialization is a need specifc to the database storage, so ContentEntityDatabaseStorage should be responsible for automatically (un)serializing field values marked as serializable in the definition.

tstoeckler’s picture

Yes, that makes sense.

I'd like to wait for a patch over there, however, to see whether we can close this or repurpose it.

Berdir’s picture

Note that LinkItem also has two properties that need to be serialized, it is currently broken for base fields and #2235457: Use link field for shortcut entity adds the same hack as MapItem has, so whatever gets in first should be updated...

And yeah, what @plach said :)

Berdir’s picture

Related: I pushed a fix for the ugly MapItem hacks to http://drupalcode.org/sandbox/dereine/2031809.git/commitdiff/refs/heads/... that supports different and no main properties for fields.

Berdir’s picture

Note that there are two different scenarios:

MapItem: The whole field item needs to be stored in a single column
LinkItem: Has properties that need to be stored in a serialized column

And as mentioned before, this is storage dependent as well, because MongoDb doesn't care, it just throws all the values into a document and is done.

fago’s picture

Title: Allow field types to handle their own value (un)serialization » Field types cannot control what gets stored
Category: Task » Bug report
Priority: Normal » Major
Issue tags: +Entity Field API

This is quite a major problem of our storage API, thus re-titling and re-priorising.

plach’s picture

Issue tags: +entity storage
Xano’s picture

I would like to see functionality like this, because I am working on a field item that contains objects, which can be decomposed into scalars and arrays that need to be stored. From my point of view, decoupling field value storage from field properties and introducing methods to map to and from would be a good step towards providing the required functionality.

dawehner’s picture

So #2414835: SqlContentEntityStorage does not unserialize base field data was marked as a duplicate of this one, so should this one maybe be a critical?

jibran’s picture

Created #2502913: Link field default options values should be array for link field option values. Please review.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

xjm’s picture

@catch and @plach both marked #2414835: SqlContentEntityStorage does not unserialize base field data as a duplicate of this issue.

There was an indication on that issue that this could cause difficulty for contributed modules. It would be helpful to clarify more how it is a blocker and give an example scenario. Thanks!

Also adding the code snippet from the other issue's summary since it helps show why this is weird.

xjm’s picture

Title: Field types cannot control what gets stored » Field types cannot control what gets serialized/stored

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Berdir’s picture

We have more duplicates of this, which IMHO shows that this is major as a lot of people are reporting this.

I'd actually suggest to close this now as duplicate in favor of #2788637: Values in shared table for SQL content entity storage do not get unserialized. as that is much further along, just needs a bit more test coverage.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

EclipseGc’s picture

Would it be possible to introduce a serialization callback (and unserialization) so that core/contrib can determine which serialization they'd like to use instead of forcibly dealing with php's serialization. We could do this in the schema definition right under "serialize" => TRUE, or similar?

Just a thought, perhaps a different issue is better for this conversation?

Eclipse

EclipseGc’s picture

Or pick which serializer to use... just something that let's us have some control over this.

andypost’s picture

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Graber’s picture

@xjm: It's a contrib blocker indeed. Scenario: when creating a custom entity type that uses another contrib module's custom field type that doesn't have the unserialize fix as a base entity field. The custom entity operations cause errors that are hard to debug. In my case it was Table Field.

I see that core field type definitions like LinkItem have the unserialize fix if needed, so those can be used as base fields.

What's the point to include 'serialize' => TRUE in field schema definition if the type logic has to handle (un)serialization anyway?

@Berdir, +1 for the #2788637 as the main problem issue.

tim.plunkett’s picture

I have a case where at runtime, I want all get/set calls to return an object.
That object has no idea that it is backed by a field, it doesn't care. It has various getters and setters.

$entity->{$field_name}->my_cool_thing->setSomething('foo');
$entity->save();

This only works right now because I am *serializing the entire object*.

Ideally though, there would be some method similar to EntityStorageBase's mapToStorageRecord/mapFromStorageRecords that existed on field types, allowing me to split up the object into its parts across individual columns on save, and have it reconstructed on load.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Wim Leers’s picture

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.9 was released on December 7, 2022 and is the final full bugfix release for the Drupal 9.4.x series. Drupal 9.4.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.5.x-dev branch from now on, and new development or disruptive changes should be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

geek-merlin’s picture

Version: 9.5.x-dev » 10.1.x-dev

geek-merlin’s picture

Status: Active » Needs review

Took a first stab at this. Feedback appreciated.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
FileSize
3.99 KB

The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

geek-merlin’s picture

FTR: In the first commit i explored delegating load and save to FieldStorageDefinition, which in turn delegates to static FieldItem class methods.
As of the latter (delegate to FieldItem), there are ups and dows with this, and as FieldItem already is tightly coupled with storage, e.g. via ::schema, the better DX tipped the balance.
In later commits, i delegated save to a non-static FieldItem method. Why? Because we can and i wanted to explore it. All in all, the first approach with static methods on load and save looks cleaner to me. Also it gives FieldStorageDefinition full control, to maybe do sth different like annotation based mapping definition or whatever. So will work the code in that direction again.

geek-merlin’s picture

This is an interesting failure: How can (supposed) PathAliases return an array for $entity->$fieldName for a valid field name?? Die, __get, die!

1) Drupal\Tests\workspaces\Functional\PathWorkspacesTest::testPathAliases
Error: Call to a member function first() on array

/var/www/html/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php:1060
/var/www/html/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php:1149
/var/www/html/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php:1024
/var/www/html/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php:944
/var/www/html/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php:721
/var/www/html/core/lib/Drupal/Core/Entity/EntityStorageBase.php:486
/var/www/html/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php:815
/var/www/html/core/lib/Drupal/Core/Entity/EntityBase.php:339
/var/www/html/core/modules/workspaces/src/WorkspacePublisher.php:136
/var/www/html/core/modules/workspaces/src/WorkspaceManager.php:290
/var/www/html/core/modules/workspaces/src/WorkspacePublisher.php:139
/var/www/html/core/modules/workspaces/src/Entity/Workspace.php:131
/var/www/html/core/modules/workspaces/tests/src/Functional/PathWorkspacesTest.php:107
/var/www/html/vendor/phpunit/phpunit/src/Framework/TestResult.php:728
geek-merlin’s picture

OK, this triggers other edgy cases:

1) Drupal\Tests\content_moderation\Kernel\WorkspacesContentModerationStateTest::testNonLangcodeEntityTypeModeration
  Drupal\Core\Entity\EntityStorageException: Field langcode  is unknown.
1) Drupal\Tests\workspaces\Functional\PathWorkspacesTest::testPathAliases
  Drupal\Core\Entity\EntityStorageException: content_translation_source is unknown.
geek-merlin’s picture

Status: Needs work » Needs review

Yup, here's the api. Now we need an example implementation and tests.

smustgrave’s picture

Status: Needs review » Needs work

In addition could you update the issue summary please. Or state why the tag is no longer needed/current IS is correct.

geek-merlin’s picture

Category: Bug report » Task
Issue summary: View changes
Issue tags: -Needs issue summary update

Yup, i agree on needs-IS-update. Did that. Also removed the citation from #2414835: SqlContentEntityStorage does not unserialize base field data added by @xjm in #12 which basically stated that unserialization does not happen on load. Judging from the code, i guess this was changed long ago. Which makes this a task rather than a bug. Tentatively keeping this major as it's a key blocker for the whole JSON in DB columns work like JSON Map Field.

geek-merlin’s picture

geek-merlin’s picture

Status: Needs work » Needs review
Issue tags: +Contributed project blocker

Banzai!

The MR
- adds an API like requested in the IS
- implements the API on behalf of MapField
- Extends MapField tests to cover this

Currently working on a JsonMapField that depends on this API.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

smustgrave’s picture

Status: Needs review » Needs work
Issue tags: +Needs change record, +Needs Review Queue Initiative

Was hoping someone more knowledgable would review this one but don't want to leave it hanging.

Test coverage does seem to be good.

Only glaring thing is if there could be a change record for the interface?

Will keep an eye out for that too.

bradjones1 made their first commit to this issue’s fork.

bradjones1’s picture

Title: Field types cannot control what gets serialized/stored » Allow field types to control how properties are mapped to and from storage
Status: Needs work » Needs review
Issue tags: -Needs change record

Took a first stab at a change record. Also updating the title to reflect the fact this change is broader than just serialization.

I believe this should again be ready for maintainer/committer review.

bradjones1’s picture

I think #2563843: MapItem is broken for configurable fields is heavily affected by this change, and might need an update/rethink after this lands.

Installed this patch along with #3343634: [PP-2] Add "json" as core data type to Schema and Database API and took a stab at a JSON-backed field - see https://git.drupalcode.org/-/snippets/124 for a working example.

This is mostly a reimplementation of the MapItem code but using json_encode() and json_decode() instead. It works, however I'm curious if there's undesirable overhead in always doing a json_decode() on stringified JSON input to denormalize the values into the Map::$values array.

I don't think this is the right issue to add something like JsonMapItem, however the linked snippet at least proves out we're heading in a good direction on this change enabling things like JSON data storage.

If anyone has thoughts on making setValue() less janky in either or both of these classes, I'm all ears.

Before I go adding an issue to implement something like JsonMapItem in core... is that something we want in core? Or should this be implemented by more opinionated modules/services/etc. that bring along more business logic with them?

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

IS seems good and all threads are resolved.

Going to put in the committers bucket to see their thoughts.

quietone’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs reroll

I'm triaging RTBC issues. I read the IS and the comments and skimmed the MR.

The diff no longer applies, tagging for a reroll. And the MR should be against 11.x. Thanks.

I read the Change record and wonder if code examples are needed for how to implement this.

rpayanm made their first commit to this issue’s fork.

rpayanm’s picture

Status: Needs work » Needs review
Issue tags: -Needs reroll

Please review.

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

Reroll seems good so restoring previous status.

geek-merlin’s picture

Bump...

tim.plunkett’s picture

"Bumping" is unhelpful, now the issue is further down the RTBC queue

larowlan’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs subsystem maintainer review

Left some comments on the MR - I'll ping @Berdir and @catch in slack for a subsystem maintainer review

Berdir’s picture

This is a very old issue, but comment #1 still summarizes my concerns very well.

I'm not convinced that moving the serialize() calls out of the storage makes the situation better. In my opinion, that doesn't make moving to json easier:

* We have no API to change a field type, so i don't like introducing a new JsonMapItem field type as a solution.
* with json storage, there is also the idea to have all (configurable) fields in a single json data structure. With that, this suddenly becomes counter-productive and results in having a double json encoding.
* It also seems counter-productive for completely different possible backends, like the test key value implementation in core, or something like mongodb.

In regards to the contrib blocker tags here, I don't really see that. Both TMGMT and Paragraphs use serialized data in a base field, and Metatags in a configurable field. They do the mapping and serialization outside of storage and it works. both TMGMT and metatags also migrated from a serialized string to json. It's not always pretty, but it's not blocking. MapItem is a special case, it's architecture is both too heavy and too rigid to be a sensible solution for any of these modules (partially because it's not fully working to be fair), I'm a bit unsure what we should do with that.

To be clear, my concerns are about the *serialize* part, not the part that's actually in the issue title, which is mapping. We could just leave out the serialize() call and then treat the properties like we already do and support, for example with linkItem attributes?

The second part that I don't like is that we essentially introduce the same API twice, once through the field item class and once through the definition class, and we use both each, but the definition classes just use the trait as a passthrough to the field item class again. Going through the definition classes means we risk breaking this feature for field types when they are used for definitions that are neither config nor base fields (which have limited support but are technically possible). Why do we even need this? The implementation of FieldStorageConfig::getFieldItemClass() is imho not internal/secret, I'm certain we have other cases that directly get the field item class from the field type, that's basic plugin definition stuff.

bradjones1 changed the visibility of the branch 2232427-field-storage-mapping to hidden.

bradjones1’s picture

Trying to move this along in the context of #3343634: [PP-2] Add "json" as core data type to Schema and Database API.

I rebased the 11.x MR and it is passing tests, which is a nice fresh baseline.

Having read #58 a few times and I think I understand most of the main points, though I'm not as familiar with some of the contrib examples provided.

At the risk of reading too far into others' arguments without the benefit of a synchronous exchange, I think that we might be prematurely blocking ourselves from a big leap forward in some of the assumptions around JSON storage vis-a-vis the field API. While it's true there are issues like #3277081: [Plan] Convert field storage to use JSON fields proposing a more flexible storage solution for fields, that's solidly in D12 territory at this point.

Re: these specific concerns:

We have no API to change a field type, so i don't like introducing a new JsonMapItem field type as a solution.

I don't think this need be an either-or type situation. In theory, we could provide a field-type-change API to convert backing storage from serialized PHP arrays to JSON data columns. Also, the use cases for a JsonMapItem type field are different enough from the current MapItem that there is room for both in the ecosystem. While the universe of use cases for the JSON map field probably entirely encapsulates the existing map field, many map fields likely don't need the other benefits of being in JSON storage. (Mainly, being able to query for document members in SQL.)

with json storage, there is also the idea to have all (configurable) fields in a single json data structure. With that, this suddenly becomes counter-productive and results in having a double json encoding.

I'll be honest, I don't entirely follow this point, although I'm sure that's just my lack of context. I think this gets to the point I made above about prematurely limiting ourselves re: a major change in field storage that is still only in an idea stage.

It also seems counter-productive for completely different possible backends, like the test key value implementation in core, or something like mongodb.

This is probably the strongest concern of the three, however I still think it's a bit theoretical and I'm struggling to find a case where this is a blocker. The test K/V entity storage is rather feature-incomplete and is a bit of a theoretical example. MongoDB is a bit more of an interesting case. Drupal certainly is written with an SQL backend in mind, and it will be a significant lift (but not impossible) to allow entities be stored in MongoDB. What that would look like, I'm not sure, however I don't think it's a reason to block anything here.

To the other concern:

To be clear, my concerns are about the *serialize* part, not the part that's actually in the issue title, which is mapping. We could just leave out the serialize() call and then treat the properties like we already do and support, for example with linkItem attributes?

I think I understand this in theory but am struggling to see it in practice. By my read the issue title "Allow field types to control how properties are mapped to and from storage" is inextricably linked to getting the storage out of the serialization business entirely, and allowing fields to determine how and what gets sent to the storage backend's columns. So if LinkItem were to be refactored to use this new interface, its ::mapColumnsOnSave() would look something like:

return [
  'uri' => $properties['uri'],
  'title' => $properties['title'],
  'options' => serialize($properties['options']),
];

That's a rather trivial example, however one can imagine more interesting cases where field properties are retrieved and set in a much less storage-bound manner and the field then does load and save logic to form them into a sane mix of columns, which don't have to track with the field's user-facing properties at all. If some of them are serialized, great, if not, great.

I'm taking a fresh look at the MR and making a few stylistic edits, however I'm skeptical that this can be any less than it is now. Looking at it more by way of commenting, I'd actually love to see us entirely deprecate and remove all default serialization and require fields to handle it themselves.

Berdir’s picture

I think I understand this in theory but am struggling to see it in practice. By my read the issue title "Allow field types to control how properties are mapped to and from storage" is inextricably linked to getting the storage out of the serialization business entirely, and allowing fields to determine how and what gets sent to the storage backend's columns. So if LinkItem were to be refactored to use this new interface, its ::mapColumnsOnSave() would look something like:

I disagree with this. I see the use case that properties and columns don't map 1:1, I don't see the need to remove serialization from the storage and put it into the field item classes. IMHO, serialization belongs in the storage.

So if LinkItem were to be refactored to use this new interface, its ::mapColumnsOnSave() would look something like:

And what would we gain from that? I don't see this as an improvement. Alternative storage implementations are rarely used, but they do exist and we explicitly support that as a feature.

My proposal is that we define that each column must either be a scalar value or an array (with only scalars and arrays as content, which is where it gets interesting*), and in case of an array, the storage is responsible for serialization which is then an implementation detail and must not be relied on, which means that such properties/columns can't be queried, as it might be a php serialize string or json.

* one reason why switching to json in the database is hard is that there are cases where people currently store objects in there, for example language for link attributes.

with json storage, there is also the idea to have all (configurable) fields in a single json data structure. With that, this suddenly becomes counter-productive and results in having a double json encoding.

I'll be honest, I don't entirely follow this point, although I'm sure that's just my lack of context. I think this gets to the point I made above about prematurely limiting ourselves re: a major change in field storage that is still only in an idea stage.

The idea is that we have a storage implementation that stores all configurable fields in a single "fields" column in the database as json (which now all database backends allow to have queries on, with indexes), no more node__field_... tables. And then such properties wouldn't need to be serialized again as the whole thing is already a json structure.

bradjones1’s picture

Thanks for the feedback and clarification.

in case of an array, the storage is responsible for serialization which is then an implementation detail and must not be relied on, which means that such properties/columns can't be queried, as it might be a php serialize string or json.

I think I see where the disconnect in approaches and philosophy is coming from, now. From where I sit, the whole point of adding support for JSON data structures is to empower queries that do depend on the data being stored as JSON per se - not just as an implementation detail. It would be unfortunate if we ship first-class support for JSON data types in the database and then say, if you use it with field API, we make no representations that you can actually depend on this being stored as JSON in the db.

To help address your concerns over the serialization piece, and also help with the JSON data storage piece, I've updated the MR to focus on mapping. The serialization bits in the SQL content entity storage have been abstracted into a pair of new methods which will also help to auto-magically encode and decode JSON.

In the course of updating the MR I think I discovered a bit of a hole in the test coverage in the original MR, though it probably needs more to ensure all this works in all three cases of 1) base tables, 2) shared tables, 3) dedicated tables.

bradjones1’s picture

Status: Needs work » Needs review
bradjones1’s picture

The new ::fromStoredValue() method where we are consolidating unserialization and json_decode() could unlock us using allowed_classes for PHP class unserialization, too, I think. See #3046696-26: Move from serialized columns to JSON encoded data wherever possible, or use allowed_classes.

Berdir’s picture

Status: Needs review » Needs work

Need to have a closer look at the changes to see if that's closer to what I had imagined.

The last paragraph of #58 hasn't been adressed yet, we still have two interfaces and StorageMapperDelegatorTrait to channel the calls in some cases through the field definition objects, I'm pretty certain we don't need that.

bradjones1’s picture

Yeah, I noticed that as well while refactoring out the serialization bits but reached my cognitive-load limit on refactoring. I think we can simplify this, or at least make it clearer why it ended up the way it is.

bradjones1’s picture

...we still have two interfaces and StorageMapperDelegatorTrait to channel the calls in some cases through the field definition objects, I'm pretty certain we don't need that.

Taking a fresh look at this after some sleep and a re-read of the code and I think the answer here is that these are similar, but distinct, interfaces. In this case, the one always delegates to the other, however that could potentially not be the case.

StorageMapperInterface is for field storage definitions to determine how properties are mapped to db columns. In theory, at least, this would allow for custom storage implementation that is completely transparent to the field item. This was introduced at #36 with some reasoning explained there.

In turn, FieldItemStorageMapperInterface signals to the storage that it is capable of handling this decision-making itself, and that the storage definition can delegate these method calls.

I agree we could be more explicit about explaining this in the docblocks. I kinda like this solution since it does help with potential future flexibility around storage, and might even address your concerns around how database drivers might handle these situations e.g. in the case of MongoDB as first-class database entity storage.

if this is a no-go for you, then we rip it out and do away with the delegation piece easy enough. This is just a judgment call. I kinda like how it is now and can see the vision of why it was done, but I'm also motivated to get it passing muster with the maintainer (you) and the core committers to unblock other work. So IOW, I don't care enough to go back and forth on it much, and see both sides of this decision.

geek-merlin’s picture

@bradjones1, @berdir: Thanks for moving this further forward! I see the concerns and decision-points, esp. that of #62/63. +1 for the thorough discussion of that.

bradjones1’s picture

@geek-merlin, appreciate your kind words. Do you have anything to add here, since you were the one to first introduce these interfaces? Did I summarize your motivations correctly?

geek-merlin’s picture

@bradjones:
Yes, what i want to say is that i feel that all the energy i spent to make the first iteration, i feel is in good hands with both of you.

I don't quite remember the details, but the two-interface thing was to have the first POC in the least invasive way that seemed possible. I did not feel too comfortable with it from the beginning, so feel free to replace with sth better.
I can follow @berdirs pints well, and having the storage say "hey, you have to decide between a scalar and a json value" feels like a even better solution. (Yes, it should be tied to JSON to allow JSON queries later.)
That said, i wouldn*t feel too comfortable with another version of the NoSQL wars a decade ago, where new features had to deny that our primary target is SQL.

Happy you are taking further the energy that i put in, even if for the foreseeable future i won't be able to spend a lot more. That said, feel free to PM me if a can add sth.

bradjones1’s picture

Thanks. That's helpful. I kinda like it how it sits now, though as mentioned, not wed to it and I'd rather get something in than go back and forth for much longer. I think the ball is in @Berdir's court and we can try to get this RTBC "soon."