Problem/Motivation
The content element describes how and the order in which fields are displayed. This array is sorted by key and a weight is stored. This is a problem because moving fields in field UI will change weights and if the export of two such changes are merged then fields otherwise unchanged will conflict.
Proposed resolution
Do not store the weights. Rather, sort before saving, remove them and add them back on runtime. This makes for a very simple update, too.
This reverts some of #2388867: Notifying user of config changes when config has never been synched makes no sense / e6cd1fa5b7
however this approach also results in a stable config file so the problems fixed there won't reoccur.
Remaining tasks
Write tests, review, commit, party :)
User interface changes
none
API changes
none
Comment | File | Size | Author |
---|---|---|---|
#50 | 2350537-50.patch | 6.75 KB | karishmaamin |
#46 | 2350537-46.patch | 6.75 KB | nikitagupta |
#40 | 2350537_40.patch | 6.72 KB | Ghost of Drupal Past |
#34 | 2350537-34.patch | 2.95 KB | daffie |
Comments
Comment #1
webflo CreditAttribution: webflo commentedComment #2
dawehnerI wonder whether we can have some test coverage for that.
In general I totally agree with the proposed solution. The common change is to change the priority not the machine name of a thing.
Comment #3
webflo CreditAttribution: webflo commentedSure thats not a hard thing to do.
Comment #4
swentel CreditAttribution: swentel commentedYeah makes sense - I'm not sure if this has implications on Field UI - make sure we check this.
Comment #5
swentel CreditAttribution: swentel commentedAlso, I'm not sure it's useful to sort the hidden keys
Comment #6
dawehnerWell they could also suffer from the reordering problem as well.
Comment #7
yched CreditAttribution: yched commentedLooking at the actual content of the yaml file, seeing components ordered by actual display order is much more readable.
Even when it comes to diffs, I'd say - if you move a component, then you see it removed somewhere (-) and added somewhere else (+), looks clear and reasonable ?
In fact, since Field UI's drag-n-drop renumbers weights anyway, the actual weights themselves are not relevant, and *that* leads to irrelevant diff hunks (moving one element around produces diff hunks for 'weight' changing in all the components that are after either its initial or final location).
If we stored the components in the right order but without a numeric weight, then the diffs would make more sense IMO.
Comment #8
Gábor Hojtsy@yched: but what happens when you reorder AND change things? How can you tell that from the diff without looking very closely?
Comment #9
Gábor HojtsyBased on discussion on last week's CMI meeting, I opened #2361539: Config export key order is not predictable for sequences, add orderby property to config schema .
Comment #10
yched CreditAttribution: yched commented@Gabor:
Like for code, yes.
Also, it's mentally closer to what you would see if you had a visual diff of a rendered node before/after the EntityDisplay change. Some parts have moved, some have changed, some did both. The parts that did neither do not appear in the diff.
I still think I'd prefer that, be for diffs or when looking at the content of one file, but well, i won't die if others feels strongly against it :-)
Comment #11
yched CreditAttribution: yched commentedWell, looks like @alexpott made "do not order by weight in exported YAML files" the official recommendation in http://chapterthree.com/blog/principles-configuration-management-part-one.
Still not really convinced personally, but I'll shut up now :-)
Comment #12
BerdirI need this in #2248795: Entity displays fetch the bundle field definitions for deleted bundles, added it there as I also need to add it in a second place.
Comment #13
katzillaAdded the test for sorting
Comment #14
alexpottAnd this is needed in #2388867: Notifying user of config changes when config has never been synched makes no sense.
I've tested this several times and it works great.
Comment #17
alexpottNo commit credit please - reuploading patch to preserve test result because it is unrelated and odd.
Comment #18
BerdirNote that we have a slightly different patch in https://www.drupal.org/node/2248795
Comment #19
alexpott@Berdir do you have a preference for doing the sort in the toArray() or preSave()?
Comment #20
yched CreditAttribution: yched commentedpreSave() and toArray() are slightly different. preSave() changes the runtime object before it gets serialized to storage. toArray() is where the serialization happens. If we want to do stuff purely on the storage format, I'd say toArray() is the correct place.
Plus, doing the ksort in preSave() caused weird fails in #2248795: Entity displays fetch the bundle field definitions for deleted bundles that forced us to to the sort also in init(), where it doesn't belong. toArray() worked fine over there. See #37 / #38 / #43 over there.
Comment #21
alexpottAs per #18 and #20 then.
Comment #22
yched CreditAttribution: yched commentedMaybe we could just move the tests over to #2248795: Entity displays fetch the bundle field definitions for deleted bundles ?
Comment #23
BerdirSounds good to me. On mobile, so keeping it short.
Comment #24
alexpottThe thing is the tests break if you move the sort to toArray() because the when you save only what is written out is sorted so you have to reload the configuration entity from disk to get the new order.
Comment #32
Ghost of Drupal PastComment #33
Ghost of Drupal PastHere's a patch, ground up approach, I will attempt reusing the test tomorrow.
I noted in the IS I am aware I revert the relevant changes from #2388867: Notifying user of config changes when config has never been synched makes no sense -- but that's not a problem, this patch achieves the same goal but without storing weights.
Comment #34
daffie CreditAttribution: daffie commentedStraight reroll.
Comment #36
Ghost of Drupal PastComment #37
alexpottSo at the moment all the weights are changing but at least everything stays in the same place because everything is sorted by the key and that's the fieldname. We did that to fix hard to solve conflicts and incorrect merges when a field is intentionally moved. I'm not convinced that changing this to not store weight and not sort by display is a good idea. I think a better solution is to make the javascript weight tool only change weights when it has too. This would benefit all config that have weights that can be set by the javascript table row drag thingy.
Notes on the patch - if we do remove weight then we need to remove it from the schema too.
Comment #38
Ghost of Drupal PastWe are at an impasse. It seems no matter what we do, it confuses git greatly. Even if JS keeps the weights, if you have a field with weight 4 in one branch and a new field in the other branch has weight 4 , git knows nothing about our schema and will anchor on the weight: 4 line and make the mother of all messes.
But what about blasting the file apart ?
If the display settings of each field were a separate file, well, life is easy. Even if there is a weight conflict arising from reordering on both branches, there's no mess of different field display configs melting into each other, it's just one line which is trivial to resolve, easily scriptable, even. One could use the ours/theirs merge strategies per field as adequate.
Even better, we could do this only on export/import and keep the active storage the same.
config_export
sayconfig_shard
and have it contain a list of config key paths. This would not only handle entity displays but filter formatfilters
, paragraphs could shardsettings.handler_settings.target_bundles_drag_drop
, field_group could shardthird_party_settings.field_group
etc.core.entity_view_display.node.book.tile.yml.shard.content.body
.config.storage.sync
service to return a sharding capable storage. Much like the newconfig.storage.export
it could be "smarter" storage, one that talks to the entity type manager and collects a list of config entity prefixes and shard lists and then do the sharding inFileStorage::write
. The latter just needs to match the name with a prefix, iterate the paths to shard, then iterate the results of aNestedArray::getValue
and write. It sounds more complex than it is. :)Comment #39
alexpott@Charlie ChX Negyesi that feels super super complex. Is this really worth it? Field changes are not that common and re-ordering the fields is even less common. And when they are conflicts between prod and dev you have a mess to sort out but it is a mess you've got into by allowing production and dev change at the same time. When this type of thing occurs on projects I've worked on I recommend taking the production changes back to dev and then re-building the dev changes on top.
Comment #40
Ghost of Drupal PastNow that I realized (and edited in) that reads do not require any communication to the entity manager, while certainly there's some complexity, it's not that bad. Look at this. Also? I can just take this to contrib if core doesn't want to deal with it. As always, I am probably working on more complex problems than most. Such is life. We just had a team of ten people doing config changes on production and various branches for weeks. I was at my wits end and this week I broke config merge so craftily it took us several hours of intense debugging to figure out what happened.
This patch is proof-of-concept level.
Comment #41
alexpottWell on some projects the sharding strategy is going to result in hundreds of files which comes with a cost too. I still think that pursing a table sort that doesn't always change all the weights would help minimise this.
Comment #42
alexpott@Charlie ChX Negyesi. also - yeah different projects with different dev / prod workflows have different pain points. And I do recognise the pain and am up for trying to find a core solution if we can do that without breaking the 1 config object to 1 config file relationship. Breaking that feels very disruptive.
Comment #43
Ghost of Drupal PastI will continue in contrib. I have a very good idea now -- shard into directories like
configdir/.core.entity_view_display.node.test.teaser
(collections starting with dots do not work anyways so that's not a problem) and have files likecontent.body.yml
in there. But any sharding will break potx so it's not tenable for core. And fixing JavaScript is not something I can do.Comment #44
Ghost of Drupal Pasthttps://www.drupal.org/project/pax has been released, I am unfollowing this issue.
Comment #45
andypostComment #46
nikitagupta CreditAttribution: nikitagupta at Srijan | A Material+ Company for Drupal India Association commentedRerolled patch #40.
Comment #47
andypostPatch is not working
Comment #49
andypostComment #50
karishmaamin CreditAttribution: karishmaamin at Specbee commentedRe-rolled patch for 9.3.x
Comment #51
andypostNew duplicate filed #3251174: Don't use weights to store fields order in Field UI.
Comment #52
alexpottThinking about this issue some more - maybe there's another way. We could change how the javascript that changes the weights for a drag and drop table works and we could prioritise not changing existing weights. That way we'd benefit all configuration that using a drag and drop table.
Comment #53
andypostComment #54
donquixote CreditAttribution: donquixote at European Commission and European Union Institutions, Agencies and Bodies commentedComing from #3251174: Don't use weights to store fields order in Field UI.
I don't see test cases in the recent patches, so not sure how this is meant to work.
Anyway, I think I have a proposal that will solve the problems without creating additional files.
Problems with weights..
@alexpott:
This is good, but..
Even if we store a "canonical" weight always, this will still cause considerable noise when inserting or removing.
From the linked issue:
Proposal: List of keys
Instead of storing weights, store the order as a list of machine names.
The actual field config entries remain in alphabetic order.
This works both for view displays and form displays.
What about field groups?
Fields within field groups don't need to be in this list, because field groups have their own order.
However, field groups now also need to be in the order list.
We might need to prevent name clashes if field group and field have the same machine name (should not happen, but still..).
I noticed field_group is still in contrib, I did not check if core has any group support. Seems this would be a clear BC break, if a contrib module needs to change its implementation.
Comment #55
donquixote CreditAttribution: donquixote at European Commission and European Union Institutions, Agencies and Bodies commentedBtw
Url has changed:
https://www.chapterthree.com/blog/principles-of-configuration-management...
(please chapterthree, enable redirect module! or was the link in #11 always wrong?)
I no longer see weights mentioned there. Instead I now see this:
Luckily, with the solution I am proposing, we can have our cake and eat it too :)
Comment #56
alexpottThis is not necessarily true. The biggest annoyance here is when you make a small change and you end up with lots of change. In my opinion we could design an algorithm that minimised the amount of weight change by the JS that meant that inserting a single field rarely means adjusting any other weight. Same with moving a field. I do think it would be a big win if the change we make here was not specific to field configuration but specific to tabledrag as doing this will improve more things.
For example the default behaviour could be always to have a gap of ten when you add a field. And when you move the JS it would maintain this gap and it should be aware of which weights are already saved and therefore not adjusted. And if it has to choose between maintaining the gap and changing a weight that is saved it should always adjust the gap.
Comment #57
BerdirThat is interesting, one problem is that the default weight form element goes from -50 to 50 I think, so we'd need to adjust that to considerably higher numbers, which also means a lot more markup.
Comment #58
bircherRE #54: This is a neat idea but unfortunately only solves some problems while also creating other ones.
In particular the new
order
sequence is inherently unsortable itself because the information it contains is the order.So if you diff and merge it is not easy to get back the original. (I know git can do that by saving the lines above and below, so in non trivial changes you will just get a merge conflict, but in for example Config Split 2.x we just work with arrays and the array merge just adds it to the end. For that a way to sort the result is important in order to avoid unnecessary diffs.)
#56 is an interesting idea. instead of bigger numbers we could also switch from int to float and make the algorithm prefer whole numbers.
Comment #59
alexpott@bircher - oohh using floats... that is an awesome idea.
Comment #60
Berdirfloat would require config schema changes though and I guess we would need to replace the select with a number field? might be ok, but again, sounds very much non-trivial to change the generic tabledrag API like this.
Comment #61
PasqualleIn Drupal 7 it was possible to use float values as weight. I was using it a lot.
In D8+ it works also, just need to change the schema, but the config export is quite ugly with lots of !!float
Float sort also discussed in #2757207: Introduce "stable sort by weight" functions to sort arrays (and objects)
Comment #62
donquixote CreditAttribution: donquixote at European Commission and European Union Institutions, Agencies and Bodies commented@alexpott #56
We can do this. But then it is not "canonical" anymore.
To me, "canonical" means that the same order always results in the same file.
In #54 I mention two "problems":
The algorithm you (and others below) propose would solve the second problem (up to a limit, I guess), but not the first problem. Over time it can lead to a creeping infection (spicy language chosen to add flavor), depending how it is implemented.
E.g. if we use floats, "infected" config could look like this:
- 0
- 0.25
- 0.375
- 0.4375
- 0.46875
- 0.5
- 0.7109375
- 1
- 2
This can lead to ever more digits, and the non-canonicalness means that two developers that reorder fields in the same way, or almost the same way, will see a bigger diff than they should. And who wants to resolve a conflict with such fractal weights?
Is this not something we should try to avoid?
The solution I propose, with the extra 'order' entry solves both problems.
Or that's what I think - discussion below.
-------
@bircher (#58)
I don't quite get it.
What is "get back the original"?
And what happens in config split that cannot be implemented in a different way?
The solution I propose has a small version of the same problem as the very original idea of sorting all of the field config entries.
But on a much smaller scale, and much less confusing to git.
The problem with the original proposal is that all the field config is being reordered, whereas in my proposal only the machine name is being reordered. This way, git won't rip apart the config entries and build Frankensteins out of them. Or at least much less so.
Btw the same conflicts that would occur with the proposed solution already do occur with field group, where the sorted list of entries within a field group is part of config (somewhat redundant, because we already store weights for those fields).
-----
This said: A solution with non-canonical weights and creeping infection would be less of a BC break than the new "order" key.
Comment #63
bircherOk I get that switching to floats would be a schema change and would probably be a lot more work than it is worth.
But if we were to just add a factor of 10 and keep things integers there would still be plenty of gaps for a smart algorithm to find numbers that would change less frequently. This way there wouldn't be anything creeping and in most cases one would still look at integers in a reasonable range. Of course then when there is no space, the algorithm has to decide to reassign weights of elements you didn't touch, but that is still better than the current situation. (I could get nerd-sniped finding such an algorithm, that sounds very interesting)
But yes this would result in different weights for BA => AB depending on whether you sort B after A or A before B.
And a bit a longer explanation of why I think #54 is not something we should do.
The new key is inherently unsortable because the order of the elements is the information kept in the list. As such the information is stored in the array list keys. That is just not a very practical thing to deal with. It is equivalent to what you get if you array_flip it now you have the names as keys and the weight as values, except with the constraint that the weights must go from 0 to len()-1. Of course for a human to review the yaml not seeing the weights and reordering the fieldnames the way they should be when resolving a merge conflict is easier. But for a algorithm it is not so easy either way.
A practical example with Config Split (But this affects all the modules using the config transformation API which do more than adding or removing whole config files.) Config Split 2.x has a feature to "split modules as if they had been uninstalled" (Something core currently doesn't do for the excluded modules #3230825) It does so by taking the active config and asking core what would change if a module would be uninstalled and then taking that second config and save that to the sync storage and in order to do the reverse on import it saves a "config patch" in the split storage. The patch is created essentially with the recursive array_diff utility, and it is merged back with the array_merge utility. It is a bit more complicated than that because it doesn't work great with things that are
array_is_list
(from php 8.1) But essentially when you do array_diff and array_merge you lose the information about the position in the array of the thing you removed and then added back.If there is a way to sort the array then you get the order back and there is no diff when importing the config. If there is no way to sort it then we would have to find a way to record that position the element is in based on the other elements, but they could change independently too and in git this would be a merge conflict, but in a config transformation you can not stop and ask a user to solve a merge conflict, so we would have to just chose one way to solve it and hope that it was what it should be. We have yet to see how often that is the case (we currently cheat with the sorting so it is not ideal).
I am considering looking into how difficult that would be since config sequences are only rarely sorted even in core unfortunately. But being able to just sort the config would definitely be much easier. So I am trying to make all config sortable and thus solving the problem for everybody that tries to do something similar as config split. That is why I try to avoid inherently unsortable config sequences.
Comment #64
donquixote CreditAttribution: donquixote as a volunteer and commented@bircher (#63)
Wow, somehow I missed this recent reply. Sorry for that.
Btw, more paragraph breaks would be nice :)
Problems when splitting a list
I think it boils down to the problem of merging two pieces of such a list, while restoring the original order of the full list.
This would indeed be difficult..
From the rest of the post, I understand you want to "merge" two separate parts of config, and then sort everything into a canonical order.
This way you don't have to think about the position of where to insert the entries from the other part of config.
Unfortunately I listened to too many people who say "sort" when they mean "manually order", which is the opposite of what you and me mean by "sort".
I think what we mean is we treat the data not as a list, but as a collection/set, meaning there is no explicit order. To guarantee a deterministic storage process, we can either store it in a "canonical" order (sorted by machine name or some other criterion), or in separate files.
Conceptually, anything we manually order with tabledrag sort is a list (of distinct items, typically), not a collection/set. The non-canonical weights are merely a technical artifact. The storage model proposed in #54 attempts to reflect this 1:1.
A fragment of a list, which has to be combined with other fragments later to restore the original order, is no longer a "pure" list, but a "list with gaps". For the original use case, these gaps should be considered redundant information, but they are no longer redundant if we plan to recombine it with other pieces.
Our current storage model for these cases is a sorted map + weights. The sorted map is how we store a collection/set, and with the integer weights this makes a list + gaps. With float weights it would be like custom-placed points on a number scale.
But is sorted list + weights the best way to store a list + gaps? See below..
First, I want to be clear that we would not use the storage method from #54 for the module list itself (core.extension.yml). Module weight is not comparable to the usual tabledrag weight lists. Not something to argue about, just a point for completeness.
This already sounds limited, e.g. the patch can only add things, never remove. But probably this is ok 99% of the time, because most modules don't remove stuff when they are installed.
array_merge() behaves in undesirable ways for any array with any integer keys, not just for pure sequential arrays: https://3v4l.org/Q9NqI
If we want to treat all keys as associative, we can use "+" operator or array_replace().
True.
True.
All of this because we are really merging two sets, rather than two lists, if the weights are treated as properties of each item.
Weights getting out of sync?
I see one limitation: If somebody operates the tabledrag while the split-out modules are uninstalled, then merging the split-out entries back in will insert these entries in unexpected positions, because the weights are no longer in sync.
I assume the idea is that site builders should always work with all modules installed.
How relevant is this in practice?
Before we look into alternatives, I wonder:
Are there real-world scenarios where a config split would remove items from a list with custom (tabledrag) order?
E.g. would the uninstalling of modules in a config split lead to field instances being removed? Image effects? Text filters? Language selection methods?
It seems technically the answer is yes.
In practice, my wild guess is that for most sites where this happens, only a small number of lists would be split up in this way - but still more than zero, so it is a problem to be dealt with.
V2 of #54: List + gaps
As mentioned above, currently we are using a sorted list + weights to store a list + gaps.
The gaps are needed if we later want to combine this with another fragment to restore a complete list with the original order.
An alternative way to store such a list + gaps would be to store spaces or blank lines between machine names.
E.g. like so.
Note that this is equivalent to the following weights-based storage with sorted map:
(for this example, weights start at 1, not 0).
We can modify this model so that the main reduced list won't have any gaps, and only the "patches" have gaps.
To merge this, we can start with the patch version, and fill up the gaps with entries from the main version.
There are some questions what to do if we have multiple patches, and how to deal with duplicates, but all of this can be solved.
How would the recursive merge algorithm determine which nested array to treat in this special list-with-gap merge, instead of the usual array_merge() / "+" / array_replace()?
-> This can be determined either by a config schema, or by a heuristic.
(I remember doing this in custom code in the past, with a schema-like structure to determine the array type)