Right now the config_encode() and config_decode() functions are really hacky. The encode() functionality is just something I found on the internet, and decode() does a conversion through JSON because it is fast. There are probably better options for both. The plan right now is to keep the actual XML as simple as humanly possible, basically not much more than a key/value pair store, so keep that in mind. Performance and functionality need to be weighed pretty evenly.
The current status is described here: http://www.heyrocker.com/node/238 and here: http://www.heyrocker.com/how-use-drupal-8-configuration-system.
Comments
Comment #1
pounard: Further, those might need to be able to parse attributes (at least for the language, as I heard).
Comment #2
gdd: Some other issues to consider:
- Right now the JSON conversion forces everything into objects OR arrays; we have no way to mix and match. A different decode mechanism would allow us to use an object/array specification in the schema.
- Ideally the serialize/deserialize process will retain the XML comments, otherwise they will get lost when written back out to files.
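To make the comment-retention point concrete, here is a minimal sketch, written in Python rather than PHP purely for illustration. The `site` document and its element names are invented. With the standard library's ElementTree, a default parse/serialize round trip drops comments, while a tree builder created with `insert_comments=True` (Python 3.8+) keeps them. Whatever PHP parser is chosen would need an equivalent capability.

```python
import xml.etree.ElementTree as ET

# Hypothetical config file; the element names are invented for illustration.
SOURCE = "<site><!-- shown in the page title --><name>Example</name></site>"


def round_trip(xml_text, keep_comments):
    """Parse and re-serialize XML, optionally preserving comments."""
    builder = ET.TreeBuilder(insert_comments=keep_comments)
    parser = ET.XMLParser(target=builder)
    root = ET.fromstring(xml_text, parser=parser)
    return ET.tostring(root, encoding="unicode")


lossy = round_trip(SOURCE, keep_comments=False)    # comment is dropped
faithful = round_trip(SOURCE, keep_comments=True)  # comment survives
```

The data survives either way; only the second variant would let a hand-written comment come back out when the file is rewritten.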
Comment #3
rwohleb: I'm trying to get caught up on this initiative, so I'm sorry if this is addressed elsewhere. What is the reasoning behind using XML for storage rather than just JSON? Drupal already has decent JSON encode/decode support.
Comment #4
mitchell: I found this array-to-domdocument library which might be of value. Here are a few notes:
@rwohleb: Here you go -> Configuration management sprint - file formats && File format discussion continued
Comment #5
gdd: I have spent some time in the past week with this and other XML parsers, and here are my findings and associated thoughts.
So that's where I'm at right now. For the moment, the code in the repo has been updated to have a better parser (courtesy of EclipseGC and rszrama) and the XML files are very simple. More discussion welcomed!
Comment #6
Damien Tournoud: There is nothing to gain from XML given those constraints. If you want XML anyway, you can use some standard key/value DTD, like the Apple Property List format, which should be supported by most IDEs out there.
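As an illustration of what a standard key/value XML format looks like in practice, here is a small sketch using Python's standard `plistlib` module. The configuration keys are invented, and this says nothing about what PHP tooling for property lists would look like. It only shows a nested key/value structure round-tripping through the XML property list format.

```python
import plistlib

# Hypothetical configuration; the keys are invented for illustration.
config = {"site_name": "Example", "cache": {"enabled": True, "lifetime": 300}}

# Serialize to the XML flavour of the property list format, then read it back.
xml_bytes = plistlib.dumps(config, fmt=plistlib.FMT_XML)
restored = plistlib.loads(xml_bytes)
```

The format keeps the distinction between dictionaries, arrays, strings, integers and booleans, so no separate schema is needed to recover the types.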
One additional question that has not been a big focus yet is mergeability. None of the serialization formats (JSON, XML, PHP) that have been suggested have good mergeability. If you have already worked collaboratively using the Features module (to export Views, Panels, Fields, etc.) you have probably already bumped into this issue: none of the common VCSs and IDEs out there know how to reasonably merge this type of file, because you need more than the standard line-based merge technique (see A State-of-the-Art Survey on Software Merging by T Mens for an overview of merge techniques).
This is going to be even more of a problem if we package all the configuration using the same format. Not completely sure what the solution is at this point, but it is a discussion really worth having.
Comment #7
pounard: My guess is that any machine/serialization oriented configuration format won't be easily mergeable if it has not been created with human readability in mind.
The best format for this is the plain old INI file, or possibly YAML.
Comment #8
Damien Tournoud: Human readability and machine mergeability are two independent concepts.
The main mergeability issue we currently have with structured text formats (pretty JSON, pretty XML, YAML) is that the tree structure is not represented in each line, so the merge tool is going to try to merge independent parts of the tree.
Typical example: two developers are adding a different field to a View; instead of adding those below each other, a line-based merge tool is going to try to merge them, because some of the lines are common in those blocks.
One way of fixing that would be to materialize the whole tree path at every line.
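A minimal sketch of "materializing the whole tree path at every line", in Python for illustration only. The `flatten` helper, the dotted path syntax and the sample View-like data are all assumptions, not something proposed in this thread. Each leaf becomes one self-contained line, so two developers adding different fields produce lines with nothing in common. The set union at the end is a deliberately naive stand-in for a real merge and only handles pure additions.

```python
def flatten(tree, prefix=""):
    """Yield one 'full.path = value' line per leaf of a nested dict."""
    for key in sorted(tree):
        path = f"{prefix}.{key}" if prefix else key
        value = tree[key]
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield f"{path} = {value!r}"


# Two developers each add a different field to the same (hypothetical) view.
dev_a = {"fields": {"title": {"label": "Title"}, "author": {"label": "Author"}}}
dev_b = {"fields": {"title": {"label": "Title"}, "created": {"label": "Created"}}}

lines_a = list(flatten(dev_a))
lines_b = list(flatten(dev_b))

# Naive merge for additions only: every line stands alone, so a union works.
merged = sorted(set(lines_a) | set(lines_b))
```

The trade-off is verbosity: every line repeats its ancestors, which helps a line-based tool but makes the file longer to read and edit by hand.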
Comment #9
pounard: While they are indeed two different concepts, in real life pretty much all merge algorithms (at least those we use every day: git, svn, diff, etc.) merge on a per-line basis, pretty much the same way you format your own code to make it human readable.
Comment #10
Damien Tournoud: Not exactly: human readability is often necessary for line-merging algorithms, but it is *far* from sufficient.
Comment #11
pounard: Yes of course, but this plays its role. The most common diff algorithm is LCS (longest common subsequence) and it definitely plays very well with human readable text, probably a lot better than with any compiled (binary or not) data. We can consider XML to be almost binary when it is written out with no pretty formatting, and considering that the order doesn't matter.
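To illustrate why formatting matters to a line-based LCS, here is a small sketch in Python with invented sample data. It computes the length of the longest common subsequence of lines for a pretty-printed file and for the same content squeezed onto one line. It is a textbook dynamic-programming LCS, not the exact algorithm any particular VCS uses.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    previous = [0] * (len(b) + 1)
    for item_a in a:
        current = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


# Hypothetical config before and after flipping a single value.
pretty_old = ["<config>", "  <name>Example</name>", "  <cache>1</cache>", "</config>"]
pretty_new = ["<config>", "  <name>Example</name>", "  <cache>0</cache>", "</config>"]

# The same documents with all formatting removed: one line each.
compact_old = ["".join(line.strip() for line in pretty_old)]
compact_new = ["".join(line.strip() for line in pretty_new)]

shared_pretty = lcs_length(pretty_old, pretty_new)     # 3 of 4 lines unchanged
shared_compact = lcs_length(compact_old, compact_new)  # 0: the whole file "changed"
```

With pretty formatting the tool can isolate the one changed line; without it, a one-character change looks like a rewrite of the entire file.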
Comment #12
Ralt: What about ASN.1? There is a PHP library for it, the format is standardized, and the language is made to define rules and structure, which is what configuration is.
Just a wild idea, though.
P.S.: Sorry for being somewhat off-topic; I couldn't find anywhere else to post this idea.
Comment #13
bojanz: #4 looks interesting.
That said, I've always preferred JSON over XML (and didn't think the "_comment" convention was a bad idea back in the original discussions).
Still, it's partially-irrational (as with everyone when XML is discussed), so I haven't felt the need to jump into the holy wars until now.
Comment #14
philippejadin: I don't really understand what we get with xml or json that we don't already have with php.
- any drupal user knows php
- php files are protected by the webserver
- php files are quick to parse
- php files are even quicker to parse when there is an opcode cache like apc
- php files (arrays) are quick to merge in php
- php files can be made easy to merge by a machine, look at this :
PHP is so anchored in Drupal that my config is even highlighted (in color) in this comment :-)
I really don't see the point of using a file format that at the end, you will need to convert to a php array, instead of directly using a php array.
Using xml <-> array = impedance mismatch = developer nightmare
Comment #15
Crell:
1. PHP cannot be taken out of memory, ever, so it leaks memory.
2. PHP is a potential security attack vector.
3. PHP is not as human editable as you might think.
Point 1 is the deal killer. The other two are just icing. PHP was rejected months ago for good reason. Let's please not reopen that debate.
Additional data points: Composer uses JSON, and there's discussion of using Composer in core. However, Fabien from Symfony noted this weekend that he hates JSON as a config format (mostly due to the stupid trailing comma issue), and wondered why we were using XML without a schema. Of course his preference is YAML, which we also already rejected. :-)
Take those data points as you will.
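For anyone who has not run into the trailing comma issue: strict JSON does not allow a comma after the last member of an object or array, which is easy to get wrong when hand-editing a config file. A minimal sketch using Python's standard `json` module, with invented sample keys; other strict parsers are expected to reject the same input.

```python
import json

# Hypothetical config snippets; the keys are invented for illustration.
strict = '{"name": "Example", "cache": true}'
with_trailing_comma = '{"name": "Example", "cache": true,}'


def parses(text):
    """Return True if the text is accepted as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


strict_ok = parses(strict)
trailing_ok = parses(with_trailing_comma)
```

Here `strict_ok` is true and `trailing_ok` is false: reordering or deleting the last entry of a hand-edited file can break the whole document.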
Comment #16
gdd: The other problem with serialized PHP is that it is not interchangeable with any other systems unless they are also PHP. We have a stated desire of wanting to be able to easily integrate with deployment systems like Chef or Puppet, as well as people who roll their own. Serialized PHP is really terrible for this.
However, it has to be said, all the formats suck in their own special way. We may yet switch away from XML but whatever we switch to will have its own irritations. It is all about what we decide to prioritize.
Comment #17
pounard: @heyrocker I think philippejadin was not talking about *serialized* PHP but about plain good old PHP files.
Comment #18
philippejadin: I've seen the discussions about config formats. Sorry for reopening this, coming late to the discussion.
I have to say it, just because I have gone this way a long time ago (xml vs json vs php vs xyz), and it has been a very painful ride.
That XML converted to PHP arrays creates ugly structures should be taken as a fact.
Here is what I came to in 2006 for my home made cms (later I switched to Drupal ;-) ) :
- http://svn.berlios.de/wsvn/thinkedit/trunk/config/tables-dist.php
- http://svn.berlios.de/wsvn/thinkedit/trunk/config/sample_config/yapaka.php
I can tell you that it was very easy to use, parse, even for a non developer. Please take it into account too.
I'll stop there, because I guess there is a bigger picture I probably don't understand.
This config initiative is in all cases a great thing for drupal. I hope module developers (views for example) will use it!
Comment #19
philippejadin: I feel guilty for re-questioning the choice of XML.
Taking XML for granted, I came to the same conclusions as #5 about attributes.
My (2006) implementation is still here : http://svn.berlios.de/wsvn/thinkedit/trunk/class/xml_parser.class.php
The battle plan was to limit features :
- no attributes
- only one element of the same name at the same level
- if more than one element, use "id" attribute to differentiate them
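The three rules above can be sketched as a small decoder. This is an illustration in Python, not the 2006 parser linked above and not the Drupal implementation. The `decode` function, its error messages and the sample document are assumptions. It treats `id` as the only permitted attribute and rejects repeated element names that lack one.

```python
import xml.etree.ElementTree as ET


def decode(element):
    """Convert a restricted XML element into nested dicts and strings."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    keyed = set()  # tags whose children are distinguished by an "id" attribute
    for child in children:
        extra = set(child.attrib) - {"id"}
        if extra:
            raise ValueError(f"unsupported attributes on <{child.tag}>: {sorted(extra)}")
        if "id" in child.attrib:
            if child.tag in result and child.tag not in keyed:
                raise ValueError(f"<{child.tag}> mixes id and non-id elements")
            keyed.add(child.tag)
            group = result.setdefault(child.tag, {})
            if child.attrib["id"] in group:
                raise ValueError(f"duplicate id on <{child.tag}>")
            group[child.attrib["id"]] = decode(child)
        elif child.tag in result:
            raise ValueError(f"duplicate <{child.tag}> without an id attribute")
        else:
            result[child.tag] = decode(child)
    return result


# Hypothetical config document; element names are invented for illustration.
SOURCE = """
<config>
  <name>Example</name>
  <field id="title"><label>Title</label></field>
  <field id="body"><label>Body</label></field>
</config>
"""

settings = decode(ET.fromstring(SOURCE))
```

Because every key is unique at its level, the result maps directly onto a nested associative array, which avoids the awkward structures a general XML-to-array conversion tends to produce.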
Comment #20
oadaeh: I believe, due to http://www.heyrocker.com/node/238 and http://www.heyrocker.com/how-use-drupal-8-configuration-system, this can be closed.
Comment #20.0
oadaeh: Updating description to link to heyrocker's status update in mid-March 2012.
Comment #21
Crell: I don't believe so. The need for a better encode/decode system is still present. heyrocker, feel free to correct me and re-close. :-)
Comment #22
sun: #1470824: XML encoder can only handle a small subset of PHP arrays, so switch to YAML
Comment #22.0
sun: Adding another link.