Updated: Comment #41
Problem/Motivation
- Mapping field item properties to the DB storage record is hardcoded in \Drupal\Core\Entity\Sql\SqlContentEntityStorage.
- It contains special cases for the 'serialize' key of FieldItemInterface::schema(), and a special case of this special case for the MapItem field (where not a single property but all properties are serialized).
- Issues like #2563843: MapItem is broken for configurable fields demand adding more special cases.
- In the context of the JSON storage initiatives (like #3276818: [META] Add support for JSON field queries in entity queries), a proper JSON map field adds further requirements here.
Time to add an API so that fields can specify how their properties are mapped to their schema columns.
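For reference, the hardcoded special case looks like this in a field schema (a sketch based on core's MapItem; details may differ between core versions):

```php
// From a FieldItemBase subclass. The 'serialize' key below is the
// special case that SqlContentEntityStorage currently hardcodes.
public static function schema(FieldStorageDefinitionInterface $field_definition) {
  return [
    'columns' => [
      'value' => [
        'type' => 'blob',
        'size' => 'big',
        // Tells the SQL storage to serialize()/unserialize() this column.
        // For MapItem, *all* properties end up in this single column.
        'serialize' => TRUE,
      ],
    ],
  ];
}
```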
Proposed resolution
- Allow field type classes to define how their values should be (un)serialized.
- Remove the serialize
key from MapItem
's schema and use this mechanism instead.
- Deprecate the serialize
key in a followup.
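The shape of such an API might look like the following (a hypothetical sketch; the method names are illustrative, not the committed API):

```php
// Hypothetical: MapItem declaring its own (de)serialization instead of
// relying on the 'serialize' schema key.
class MapItem extends FieldItemBase {

  public function mapToStorageRecord(): array {
    // All properties collapse into the single 'value' column.
    return ['value' => serialize($this->getValue())];
  }

  public static function mapFromStorageRecord(array $record): array {
    return unserialize($record['value'], ['allowed_classes' => FALSE]);
  }

}
```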
Remaining tasks
- Decide on how this API should work, code, review, commit.
User interface changes
None.
API changes
Addition: Field items can specify how they are mapped to storage and possibly serialized or JSON encoded.
Issue fork drupal-2232427
- 2232427-11.x changes, plain diff MR !4814
- 2232427-field-storage-mapping changes, plain diff MR !3799
Comments
Comment #1
plach
Actually @Berdir pointed out that we should be doing the opposite, see #2229181: MapItem::propertyDefinitions() claims it has a static value property, which is wrong. The rationale is that serialization is a need specific to the database storage, so ContentEntityDatabaseStorage should be responsible for automatically (un)serializing field values marked as serializable in the definition.
Comment #2
tstoeckler
Yes, that makes sense.
I'd like to wait for a patch over there, however, to see whether we can close this or repurpose it.
Comment #3
Berdir
Note that LinkItem also has two properties that need to be serialized, it is currently broken for base fields and #2235457: Use link field for shortcut entity adds the same hack as MapItem has, so whatever gets in first should be updated...
And yeah, what @plach said :)
Comment #4
Berdir
Related: I pushed a fix for the ugly MapItem hacks to http://drupalcode.org/sandbox/dereine/2031809.git/commitdiff/refs/heads/... that supports different and no main properties for fields.
Comment #5
Berdir
Note that there are two different scenarios:
- MapItem: the whole field item needs to be stored in a single column.
- LinkItem: has properties that need to be stored in a serialized column.
And as mentioned before, this is storage dependent as well, because MongoDB doesn't care: it just throws all the values into a document and is done.
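The LinkItem scenario can be seen in its schema, where one column is serialized while the others map 1:1 to properties (a sketch of the relevant 'columns' fragment; lengths and details may differ per core version):

```php
// LinkItem-style schema fragment: only the 'options' column is
// serialized; 'uri' and 'title' map directly to their properties.
'columns' => [
  'uri' => ['type' => 'varchar', 'length' => 2048],
  'title' => ['type' => 'varchar', 'length' => 255],
  'options' => [
    'type' => 'blob',
    'size' => 'big',
    'serialize' => TRUE,
  ],
],
```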
Comment #6
fago
This is quite a major problem of our storage API, thus re-titling and re-prioritizing.
Comment #7
plach
Comment #8
Xano
I would like to see functionality like this, because I am working on a field item that contains objects, which can be decomposed into scalars and arrays that need to be stored. From my point of view, decoupling field value storage from field properties and introducing methods to map to and from it would be a good step toward providing the required functionality.
Comment #9
dawehner
#2414835: SqlContentEntityStorage does not unserialize base field data was marked as a duplicate of this one, so should this maybe be critical?
Comment #10
jibran
Created #2502913: Link field default options values should be array for link field option values. Please review.
Comment #12
xjm
@catch and @plach both marked #2414835: SqlContentEntityStorage does not unserialize base field data as a duplicate of this issue.
There was an indication on that issue that this could cause difficulty for contributed modules. It would be helpful to clarify more how it is a blocker and give an example scenario. Thanks!
Also adding the code snippet from the other issue's summary since it helps show why this is weird.
Comment #13
xjm
Comment #15
Berdir
We have more duplicates of this, which IMHO shows that this is major, as a lot of people are reporting it.
I'd actually suggest closing this now as a duplicate in favor of #2788637: Values in shared table for SQL content entity storage do not get unserialized, as that is much further along and just needs a bit more test coverage.
Comment #17
EclipseGc CreditAttribution: EclipseGc commented
Would it be possible to introduce a serialization callback (and an unserialization one) so that core/contrib can determine which serialization they'd like to use instead of being forced into PHP's serialization? We could do this in the schema definition, right under "serialize" => TRUE, or similar.
Just a thought, perhaps a different issue is better for this conversation?
Eclipse
Comment #18
EclipseGc CreditAttribution: EclipseGc commented
Or pick which serializer to use... just something that lets us have some control over this.
Comment #19
andypost
Comment #21
Graber CreditAttribution: Graber as a volunteer commented
@xjm: It's a contrib blocker indeed. Scenario: creating a custom entity type that uses another contrib module's custom field type (one that doesn't have the unserialize fix) as a base entity field. The custom entity operations then cause errors that are hard to debug. In my case it was Table Field.
I see that core field type definitions like LinkItem have the unserialize fix where needed, so those can be used as base fields.
What's the point of including 'serialize' => TRUE in the field schema definition if the type logic has to handle (un)serialization anyway?
@Berdir, +1 for #2788637 as the main problem issue.
Comment #22
tim.plunkett
I have a case where at runtime, I want all get/set calls to return an object.
That object has no idea that it is backed by a field, it doesn't care. It has various getters and setters.
This only works right now because I am *serializing the entire object*.
Ideally though, there would be some method similar to EntityStorageBase's mapToStorageRecord/mapFromStorageRecords that existed on field types, allowing me to split up the object into its parts across individual columns on save, and have it reconstructed on load.
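This idea could be sketched roughly as follows (hypothetical: the PriceItem/Price classes and the mapToStorageColumns()/mapFromStorageColumns() method names are illustrative, modeled on EntityStorageBase's naming, and do not exist on field types in core today):

```php
// Hypothetical: a field item backed by a value object, split across
// individual columns on save and reconstructed on load.
class PriceItem extends FieldItemBase {

  public function mapToStorageColumns(): array {
    /** @var \Drupal\my_module\Price $price */
    $price = $this->get('value')->getValue();
    // Decompose the object into scalar columns.
    return [
      'amount' => $price->getAmount(),
      'currency' => $price->getCurrencyCode(),
    ];
  }

  public static function mapFromStorageColumns(array $columns): array {
    // Rebuild the object from the loaded column values.
    return ['value' => new Price($columns['amount'], $columns['currency'])];
  }

}
```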
Comment #24
Wim Leers
See #2563843-52: MapItem is broken for configurable fields.
Comment #32
geek-merlin
Comment #34
geek-merlin
Took a first stab at this. Feedback appreciated.
Comment #35
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commented
The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #36
geek-merlin
FTR: In the first commit I explored delegating load and save to FieldStorageDefinition, which in turn delegates to static FieldItem class methods.
As for the latter (delegating to FieldItem), there are ups and downs with this, but as FieldItem is already tightly coupled with storage, e.g. via ::schema(), the better DX tipped the balance.
In later commits, I delegated save to a non-static FieldItem method. Why? Because we can, and I wanted to explore it. All in all, the first approach with static methods on load and save looks cleaner to me. It also gives FieldStorageDefinition full control, to maybe do something different like annotation-based mapping definitions or whatever. So I will work the code in that direction again.
Comment #37
geek-merlin
This is an interesting failure: how can (supposed) PathAliases return an array for $entity->$fieldName for a valid field name?? Die, __get, die!
Comment #38
geek-merlin
OK, this triggers other edge cases:
Comment #39
geek-merlin
Yup, here's the API. Now we need an example implementation and tests.
Comment #40
smustgrave CreditAttribution: smustgrave at Mobomo commented
In addition, could you update the issue summary please? Or state why the tag is no longer needed / the current IS is correct.
Comment #41
geek-merlin
Yup, I agree on needs-IS-update. Did that. Also removed the citation from #2414835: SqlContentEntityStorage does not unserialize base field data, added by @xjm in #12, which basically stated that unserialization does not happen on load. Judging from the code, I guess this was changed long ago, which makes this a task rather than a bug. Tentatively keeping this major as it's a key blocker for the whole JSON-in-DB-columns work like JSON Map Field.
Comment #42
geek-merlin
Comment #43
geek-merlin
Banzai!
The MR
- adds an API like requested in the IS
- implements the API on behalf of MapField
- extends the MapField tests to cover this
Currently working on a JsonMapField that depends on this API.
Comment #45
smustgrave CreditAttribution: smustgrave at Mobomo commented
Was hoping someone more knowledgeable would review this one, but I don't want to leave it hanging.
Test coverage does seem to be good.
The only glaring thing: could there be a change record for the interface?
Will keep an eye out for that too.
Comment #47
bradjones1
Took a first stab at a change record. Also updating the title to reflect the fact this change is broader than just serialization.
I believe this should again be ready for maintainer/committer review.
Comment #48
bradjones1
I think #2563843: MapItem is broken for configurable fields is heavily affected by this change, and might need an update/rethink after this lands.
Installed this patch along with #3343634: [PP-2] Add "json" as core data type to Schema and Database API and took a stab at a JSON-backed field - see https://git.drupalcode.org/-/snippets/124 for a working example.
This is mostly a reimplementation of the MapItem code, but using json_encode() and json_decode() instead. It works; however, I'm curious whether there's undesirable overhead in always doing a json_decode() on stringified JSON input to denormalize the values into the Map::$values array.
I don't think this is the right issue to add something like JsonMapItem; however, the linked snippet at least proves out that we're heading in a good direction, with this change enabling things like JSON data storage.
If anyone has thoughts on making setValue() less janky in either or both of these classes, I'm all ears.
Before I go adding an issue to implement something like JsonMapItem in core... is that something we want in core? Or should this be implemented by more opinionated modules/services/etc. that bring along more business logic with them?
Comment #49
smustgrave CreditAttribution: smustgrave at Mobomo commented
The IS seems good and all threads are resolved.
Going to put in the committers bucket to see their thoughts.
Comment #50
quietone CreditAttribution: quietone at PreviousNext commented
I'm triaging RTBC issues. I read the IS and the comments and skimmed the MR.
The diff no longer applies, tagging for a reroll. And the MR should be against 11.x. Thanks.
I read the Change record and wonder if code examples are needed for how to implement this.
Comment #53
rpayanm
Please review.
Comment #54
smustgrave CreditAttribution: smustgrave at Mobomo commented
Reroll seems good so restoring previous status.
Comment #55
geek-merlin
Bump...
Comment #56
tim.plunkett
"Bumping" is unhelpful; now the issue is further down the RTBC queue.
Comment #57
larowlan
Left some comments on the MR - I'll ping @Berdir and @catch in Slack for a subsystem maintainer review.
Comment #58
Berdir
This is a very old issue, but comment #1 still summarizes my concerns very well.
I'm not convinced that moving the serialize() calls out of the storage makes the situation better. In my opinion, that doesn't make moving to JSON easier:
* We have no API to change a field type, so I don't like introducing a new JsonMapItem field type as a solution.
* With JSON storage, there is also the idea of having all (configurable) fields in a single JSON data structure. With that, this suddenly becomes counter-productive and results in double JSON encoding.
* It also seems counter-productive for completely different possible backends, like the test key/value implementation in core, or something like MongoDB.
In regards to the contrib blocker tags here, I don't really see that. Both TMGMT and Paragraphs use serialized data in a base field, and Metatag in a configurable field. They do the mapping and serialization outside of storage and it works. Both TMGMT and Metatag also migrated from a serialized string to JSON. It's not always pretty, but it's not blocking. MapItem is a special case: its architecture is both too heavy and too rigid to be a sensible solution for any of these modules (partially because it's not fully working, to be fair), so I'm a bit unsure what we should do with that.
To be clear, my concerns are about the *serialize* part, not the part that's actually in the issue title, which is mapping. We could just leave out the serialize() call and then treat the properties like we already do and support, for example with LinkItem attributes?
The second part that I don't like is that we essentially introduce the same API twice, once through the field item class and once through the definition class, and we use each of them, but the definition classes just use the trait as a passthrough to the field item class again. Going through the definition classes means we risk breaking this feature for field types when they are used for definitions that are neither config nor base fields (which have limited support but are technically possible). Why do we even need this? The implementation of FieldStorageConfig::getFieldItemClass() is IMHO not internal/secret; I'm certain we have other cases that directly get the field item class from the field type, that's basic plugin definition stuff.
Comment #60
bradjones1
Trying to move this along in the context of #3343634: [PP-2] Add "json" as core data type to Schema and Database API.
I rebased the 11.x MR and it is passing tests, which is a nice fresh baseline.
Having read #58 a few times, I think I understand most of the main points, though I'm not as familiar with some of the contrib examples provided.
At the risk of reading too far into others' arguments without the benefit of a synchronous exchange, I think that we might be prematurely blocking ourselves from a big leap forward in some of the assumptions around JSON storage vis-a-vis the field API. While it's true there are issues like #3277081: [Plan] Convert field storage to use JSON fields proposing a more flexible storage solution for fields, that's solidly in D12 territory at this point.
Re: these specific concerns:
I don't think this needs to be an either-or type situation. In theory, we could provide a field-type-change API to convert backing storage from serialized PHP arrays to JSON data columns. Also, the use cases for a JsonMapItem type field are different enough from the current MapItem that there is room for both in the ecosystem. While the universe of use cases for the JSON map field probably entirely encapsulates the existing map field, many map fields likely don't need the other benefits of being in JSON storage. (Mainly, being able to query for document members in SQL.)
I'll be honest, I don't entirely follow this point, although I'm sure that's just my lack of context. I think this gets to the point I made above about prematurely limiting ourselves re: a major change in field storage that is still only in an idea stage.
This is probably the strongest concern of the three; however, I still think it's a bit theoretical, and I'm struggling to find a case where this is a blocker. The test K/V entity storage is rather feature-incomplete and a bit of a theoretical example. MongoDB is a more interesting case. Drupal certainly is written with an SQL backend in mind, and it would be a significant lift (but not impossible) to allow entities to be stored in MongoDB. What that would look like, I'm not sure; however, I don't think it's a reason to block anything here.
To the other concern:
I think I understand this in theory but am struggling to see it in practice. By my read the issue title "Allow field types to control how properties are mapped to and from storage" is inextricably linked to getting the storage out of the serialization business entirely, and allowing fields to determine how and what gets sent to the storage backend's columns. So if LinkItem were to be refactored to use this new interface, its ::mapColumnsOnSave() would look something like:
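(The code example referenced here did not survive extraction; the following is a plausible sketch, not the original snippet. It assumes a hypothetical ::mapColumnsOnSave() on LinkItem under the proposed API, with the field, not the storage, deciding that 'options' is serialized.)

```php
// Hypothetical LinkItem::mapColumnsOnSave() under the proposed API.
public function mapColumnsOnSave(): array {
  return [
    'uri' => $this->get('uri')->getValue(),
    'title' => $this->get('title')->getValue(),
    // The field itself chooses the serialized representation.
    'options' => serialize($this->get('options')->getValue()),
  ];
}
```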
That's a rather trivial example, however one can imagine more interesting cases where field properties are retrieved and set in a much less storage-bound manner and the field then does load and save logic to form them into a sane mix of columns, which don't have to track with the field's user-facing properties at all. If some of them are serialized, great, if not, great.
I'm taking a fresh look at the MR and making a few stylistic edits; however, I'm skeptical that this can be any leaner than it is now. Looking at it more by way of commenting, I'd actually love to see us entirely deprecate and remove all default serialization and require fields to handle it themselves.
Comment #61
Berdir
I disagree with this. I see the use case where properties and columns don't map 1:1; I don't see the need to remove serialization from the storage and put it into the field item classes. IMHO, serialization belongs in the storage.
And what would we gain from that? I don't see this as an improvement. Alternative storage implementations are rarely used, but they do exist and we explicitly support that as a feature.
My proposal is that we define that each column must be either a scalar value or an array (with only scalars and arrays as content, which is where it gets interesting*), and in the case of an array, the storage is responsible for serialization. That serialization is then an implementation detail and must not be relied on, which means such properties/columns can't be queried, as the stored value might be a PHP serialize string or JSON.
* One reason why switching to JSON in the database is hard is that there are cases where people currently store objects in there, for example language for link attributes.
The idea is that we have a storage implementation that stores all configurable fields in a single "fields" column in the database as JSON (which all database backends now allow queries on, with indexes): no more node__field_... tables. And then such properties wouldn't need to be serialized again, as the whole thing is already a JSON structure.
Comment #62
bradjones1
Thanks for the feedback and clarification.
I think I see where the disconnect in approaches and philosophy is coming from, now. From where I sit, the whole point of adding support for JSON data structures is to empower queries that do depend on the data being stored as JSON per se - not just as an implementation detail. It would be unfortunate if we ship first-class support for JSON data types in the database and then say, if you use it with field API, we make no representations that you can actually depend on this being stored as JSON in the db.
To help address your concerns over the serialization piece, and also help with the JSON data storage piece, I've updated the MR to focus on mapping. The serialization bits in the SQL content entity storage have been abstracted into a pair of new methods which will also help to auto-magically encode and decode JSON.
In the course of updating the MR I think I discovered a bit of a hole in the test coverage in the original MR, though it probably needs more to ensure all this works in all three cases of 1) base tables, 2) shared tables, 3) dedicated tables.
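The pair of methods mentioned above could be sketched like this (illustrative only: the names toStoredValue()/fromStoredValue() come from this discussion, but the signatures and bodies here are assumptions, not the MR's code):

```php
// Sketch: the storage funnels all encoding/decoding through two hooks,
// so swapping serialize() for json_encode() becomes a single change.
protected function toStoredValue($value) {
  // Arrays get an opaque string representation; scalars pass through.
  return is_array($value) ? serialize($value) : $value;
}

protected function fromStoredValue($raw, array $column_schema) {
  if (!empty($column_schema['serialize']) && is_string($raw)) {
    // allowed_classes => FALSE hardens against PHP object injection;
    // see #3046696 for moving to json_decode() here instead.
    return unserialize($raw, ['allowed_classes' => FALSE]);
  }
  return $raw;
}
```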
Comment #63
bradjones1
Comment #64
bradjones1
The new ::fromStoredValue() method, where we are consolidating unserialization and json_decode(), could unlock using allowed_classes for PHP class unserialization too, I think. See #3046696-26: Move from serialized columns to JSON encoded data wherever possible, or use allowed_classes.
Comment #65
Berdir
I need to have a closer look at the changes to see if that's closer to what I had imagined.
The last paragraph of #58 hasn't been addressed yet; we still have two interfaces and StorageMapperDelegatorTrait to channel the calls in some cases through the field definition objects. I'm pretty certain we don't need that.
Comment #66
bradjones1
Yeah, I noticed that as well while refactoring out the serialization bits, but reached my cognitive-load limit on refactoring. I think we can simplify this, or at least make it clearer why it ended up the way it is.
Comment #67
bradjones1
Taking a fresh look at this after some sleep and a re-read of the code, I think the answer here is that these are similar, but distinct, interfaces. In this case, one always delegates to the other; however, that could potentially not be the case.
StorageMapperInterface is for field storage definitions to determine how properties are mapped to DB columns. In theory, at least, this would allow for a custom storage implementation that is completely transparent to the field item. This was introduced at #36 with some reasoning explained there.
In turn, FieldItemStorageMapperInterface signals to the storage that the field item is capable of handling this decision-making itself, and that the storage definition can delegate these method calls.
I agree we could be more explicit about explaining this in the docblocks. I kinda like this solution since it does help with potential future flexibility around storage, and might even address your concerns around how database drivers might handle these situations, e.g. in the case of MongoDB as first-class database entity storage.
If this is a no-go for you, then we rip it out and do away with the delegation piece easily enough. This is just a judgment call. I kinda like how it is now and can see the vision of why it was done, but I'm also motivated to get it passing muster with the maintainer (you) and the core committers to unblock other work. So IOW, I don't care enough to go back and forth on it much, and I see both sides of this decision.
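The two-interface split described above can be sketched roughly as follows (simplified; the interface names appear in this thread, but the exact method lists are assumptions, not the MR's code):

```php
// Implemented by field *storage definitions*: decides how properties
// map to DB columns, possibly without involving the field item at all.
interface StorageMapperInterface {
  public function mapToStorageRecord(FieldItemInterface $item): array;
  public function mapFromStorageRecord(array $record): array;
}

// Implemented by *field item classes*: signals that the item can make
// this decision itself, so the storage definition may simply delegate.
interface FieldItemStorageMapperInterface {
  public function mapToStorageRecord(): array;
  public static function mapFromStorageRecord(array $record): array;
}
```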
Comment #68
geek-merlin
@bradjones1, @Berdir: Thanks for moving this further forward! I see the concerns and decision points, esp. those of #62/#63. +1 for the thorough discussion of that.
Comment #69
bradjones1
@geek-merlin, appreciate your kind words. Do you have anything to add here, since you were the one to first introduce these interfaces? Did I summarize your motivations correctly?
Comment #70
geek-merlin
@bradjones1:
What I want to say is that I feel all the energy I spent on the first iteration is in good hands with both of you.
I don't quite remember the details, but the two-interface thing was there to get the first POC in the least invasive way that seemed possible. I did not feel too comfortable with it from the beginning, so feel free to replace it with something better.
I can follow @Berdir's points well, and having the storage say "hey, you have to decide between a scalar and a JSON value" feels like an even better solution. (Yes, it should be tied to JSON to allow JSON queries later.)
That said, I wouldn't feel too comfortable with another version of the NoSQL wars of a decade ago, where new features had to deny that our primary target is SQL.
Happy you are taking further the energy that I put in, even if for the foreseeable future I won't be able to spend a lot more. That said, feel free to PM me if I can add something.
Comment #71
bradjones1
Thanks. That's helpful. I kinda like how it sits now, though as mentioned, I'm not wedded to it and I'd rather get something in than go back and forth for much longer. I think the ball is in @Berdir's court and we can try to get this RTBC "soon."