Right now, Schema API has a few problems.
1) The API for table creation and modification is completely different from the rest of the DB layer, because it's still wacky functions with big arrays rather than fluent method chaining.
2) The API for table alteration doesn't allow multiple alter commands in one query, which in some cases is totally nuts for performance, especially when creating or manipulating indexes. It also leads to some crazy footwork to get around not having an index in places, such as: http://drupal.org/node/159329
So the solution is to make Schema API suck less. I see two possible approaches. (Neither of these affect hook_schema(). That's a layer above all of this.)
1) Introduce DDL query builders, along these lines:
db_alter('tablename')
->addIndex('foo')
->addField('bar')
->addField('blob')
->dropField('baz')
->execute();
db_table('tablename')
->addField('bar')
->addIndex('baz')
->addField('foo')
->execute();
Pros:
- API is similar to the other query builders we have.
- Probably easier to implement than option 2.
Cons:
- Still a query-based approach rather than a DB-state-based approach.
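To make option 1 concrete, a builder like this could collect alter clauses and compile them into a single ALTER TABLE statement on execute(). A minimal sketch, in Python for illustration; every class and method name here is hypothetical, not a Drupal API:

```python
# Hypothetical fluent DDL builder in the spirit of option 1.
class AlterQuery:
    def __init__(self, table):
        self.table = table
        self.clauses = []

    def add_field(self, name, spec="TEXT"):
        self.clauses.append(f"ADD COLUMN {name} {spec}")
        return self  # returning self is what makes the API chainable

    def drop_field(self, name):
        self.clauses.append(f"DROP COLUMN {name}")
        return self

    def add_index(self, name, columns):
        # Folded into the same ALTER for engines (e.g. MySQL) that allow it.
        self.clauses.append(f"ADD INDEX {name} ({', '.join(columns)})")
        return self

    def compile(self):
        # One statement, so the engine can apply all changes in one pass.
        return f"ALTER TABLE {self.table} " + ", ".join(self.clauses)

sql = (AlterQuery("tablename")
       .add_field("bar", "INT")
       .drop_field("baz")
       .add_index("foo_idx", ["bar"])
       .compile())
print(sql)
# → ALTER TABLE tablename ADD COLUMN bar INT, DROP COLUMN baz, ADD INDEX foo_idx (bar)
```

The key point is that all queued clauses end up in one statement, which is what makes the batching performance win possible.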
2) Fully objectify Schema API with classes/objects to represent tables and fields.
$schema = $connection->schema();
$table = $schema->getTable('tablename');
$field = new DBField('newfieldname');
$field->setType('blah');
$table->addField($field);
$table->dropIndex('bar');
$table->save();
$table = $schema->createTable('tablename');
$table->addField($field);
$array = $table->getArrayVersionLikeHookSchemaUses();
$sql = $table->getSqlCreateStatement();
$table->save();
Pros:
- Richer, more complete API.
- More like other similar systems (I think).
Cons:
- Different from other parts of the DB API, but less different than Schema API is now.
- More effort required to implement.
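Under the hood, option 2's objects could simply queue up changes and flush them as one query on save(). A rough sketch, again in Python with hypothetical names, not Drupal code:

```python
# Hypothetical object model in the spirit of option 2.
class Field:
    def __init__(self, name):
        self.name = name
        self.type = "TEXT"

    def set_type(self, type_):
        self.type = type_
        return self

class Table:
    def __init__(self, name):
        self.name = name
        self.pending = []  # queued DDL clauses, flushed by save()

    def add_field(self, field):
        self.pending.append(f"ADD COLUMN {field.name} {field.type}")
        return self

    def drop_index(self, name):
        self.pending.append(f"DROP INDEX {name}")
        return self

    def save(self):
        # Issue everything queued so far as a single ALTER statement.
        sql = f"ALTER TABLE {self.name} " + ", ".join(self.pending)
        self.pending = []
        return sql

field = Field("newfieldname").set_type("INT")
table = Table("tablename")
table.add_field(field)
table.drop_index("bar")
sql = table.save()
print(sql)
# → ALTER TABLE tablename ADD COLUMN newfieldname INT, DROP INDEX bar
```

So the surface API looks state-based, but it would generate much the same single query as option 1.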
Thoughts?
Comments
Comment #1
Crell commented:
Tagging.
Comment #2
moshe weitzman commented:
Just by reading those two examples, I think most will agree that #1 has better DX. #2 requires more brainpower and referring to docs. In either case, this could be a very big performance win. InnoDB and SQLite are very, very slow with ALTER TABLE operations, and batching them like this is essential.
Comment #3
Crell commented:
Note that we certainly could make #2 mostly fluent, and I would rather insist on it. :-) I didn't chain anything for example purposes, but it could still be largely chainable.
Comment #4
bjaspan commented:
I think both options represent more engineering than is needed to solve Moshe's actual problem: that you cannot perform multiple table alterations in a single query. Larry's suggestion #1 for DDL query builders really boils down to an OO interface to modify a schema structure and invoke the "do what is necessary to make the database table look like this schema structure" function. We always knew we'd want such a function, we just left it out of the initial D6 patch.
So, we could just write that function (per db driver), which the DDL query builder would need anyway, instead of completely redoing Schema API. Or, at least, we could just write the function first and let Larry spend his life on another big API rewrite if he really wants to. :-)
Comment #5
Crell commented:
No, the DDL query builder (option 1) would not involve a "make the database table look like this somehow" method. (There are no functions in DBTNG that are not just convenience wrappers for methods.) It would still be up to the developer to write the DDL commands for the query builder that did what he wanted to do, just as it's up to the developer to not send nonsensical configuration to SelectQuery.
Method 2 *looks* like that's what it's doing, but it would actually just queue up the changes you direct and issue a similar query to what #1 generates.
In neither case am I looking at the mythical "make the schema look like what hook_schema is now, kthxbye" operation. Either of these methods would be a prerequisite for that.
Comment #6
dgv commented:
Comment #2 from moshe talks about a performance win, but how would the generated SQL be different between this:
and that:
And in what way would it make any difference in performance?
Comment #7
Crell commented:
@dgv: SQL has the ability to say "ALTER TABLE foo CHANGE X, CHANGE Y, CHANGE Z", and then figure out how to make all of those changes in parallel. If you make it three separate statements, it has to lock the table, alter it, and update all indexes each time separately. There are also some changes, like changing primary keys, that break the table if not done in parallel. Right now we kinda hack around that, but that's not a good solution.
Think of it as the difference between "UPDATE foo SET a=1, b=2 WHERE c=3" and "UPDATE foo SET a=1 WHERE c=3", followed by "UPDATE foo SET b=2 WHERE c=3".
Comment #8
webchick commented:
Subscribe. No time to look at this now.
Comment #9
moshe weitzman commented:
I read that indexes are not rebuilt until a transaction is finished. Would it make sense to do all of update.php inside a transaction?
Comment #10
bjaspan commented:
Re #9: That would allow update.php to rollback all updates if any fail...
Comment #11
Crell commented:
MySQL doesn't support transactional DDL, so it commits immediately when you muck about with the schema.
I still don't see us reaching any consensus on the approach to take. We kinda need to decide on that, and if we don't then I'll just decide when I get around to writing it and y'all can suck it up. :-)
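For what it's worth, SQLite (unlike MySQL) does support transactional DDL, so the rollback idea from #9 would at least work on that backend. A quick illustration with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode; manage transactions explicitly

conn.execute("BEGIN")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY)")
conn.execute("ROLLBACK")  # in SQLite, the CREATE TABLE is undone

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # → []
```

On MySQL the CREATE TABLE would have been committed implicitly before the ROLLBACK ran, which is exactly the limitation described above.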
Comment #12
dave reid commented:
Interesting, Crell. Adding another tag.
Comment #13
neilnz commented:
I would love to help and have some ideas around making Schema API more robust cross-database (e.g. column quoting).
Subscribing for now, but I'll be back.
Comment #14
josh waihi commented:
Subscribing.
Comment #15
Crell commented:
Adding a new tag as discussed with webchick.
Comment #16
Nick Lewis commented:
I like #2. Actually, the first example makes no sense to me when I start thinking about how I'd go about setting a field's attributes.
Is this how you'd imagine actually doing something useful in the first example? (not a rhetorical question)
The first example seems fraught with peril (assuming it's possible). Or would we be doing what we always do:
+1 for #2
double -1 for #1
Comment #17
Crell commented:
I intended both as viable options. :-) In practice it could be db_table_alter() for the factory. Either way, the idea there would be to model closely to SQL's ALTER TABLE syntax, much the way the other query builders closely model INSERT INTO table_name, SELECT, and so forth.
So in model 1, to add a column to a table you'd do something like:
And of course could chain multiple addColumn, removeColumn, renameColumn, addIndex(), and so on commands together just like you can in SQL.
In method 2, you would instead do:
So rather than building a command to run, you load an entity, modify it, and save it again.
Both are, I think, viable approaches. I suspect that #1 will actually be easier to write, but I'm not certain of that.
Comment #18
Nick Lewis commented:
I edited my comment to make me sound like less of a jackass. :-D
So if in both examples we rely on using the same old Drupal arrays, e.g.
If we describe a column's attributes as classic Schema API arrays in both examples, then I take back my -1 on #1.
Something that attracted me to the second was that I wasn't passing these arrays. Usually, my schema updates are pretty damn specific, and I liked the idea of calling a function (e.g. setType, setSize), which my IDE brings up suggestions for, instead of having to remember array keys.
The second strikes me as being better, but I confess it's probably because it reminds me of a lot of non-Drupal classes I've already worked with. I haven't had a good experience with PHP objects that chain yet...
I also confess I've altered more tables through the existing Schema API than through straight SQL.
Edit
Actually, I'm in the #1 camp now. My dream of variable hints in my IDE seems dead, so #1 is at least better than the existing way -- and #2 seems cumbersome in comparison.
Comment #19
Crell commented:
Well, arguably we could do a full set of modifier methods on column objects and table objects. I really haven't thought that far ahead on that approach. #1 would definitely use arrays somewhere at the very least.
Variable hints in the IDE require typed @param and @return statements. I'd like those too, but that's a separate issue. :-)
Dries, webchick, any input? Bueller?
Comment #20
catch commented:
No input, so this will have to wait until D8.
Comment #21
catch
Comment #22
donquixote commented:
Your remark just prevented another sidetracking :)
#1154334: "Fluid" syntax for hook_schema() (method chaining).
Comment #23
larowlan commented:
Subscribe.
Comment #24
xjm commented:
Discussed this with webchick in IRC. We agree this is more of a feature request.
Comment #25
webchick commented:
For 9.x, we might want to define tables in YAML. hook_schema() is a real PITA to write by hand.
Comment #26
catch commented:
Entity/Field API and 'backends responsible for own storage' make hook_schema() much less necessary, to the point where we could look at removing it entirely in 9.x. Downgrading.
Comment #27
danithaca commented:
+1 for defining tables in YAML! As a contrib module developer, my first instinct was to go to "mymodule/config/install/" to find the db schema yml, but it wasn't there. A temporary work-around solution for D8 is to define an API, such as drupal_schema_yml_to_array(), and then do:
I would say use YAML to define data in ContentEntityBase::baseFieldDefinitions() as well. PHP is not Java, and I don't see the point of using getters/setters to access a data structure. In the "contact_entity_example" module, baseFieldDefinitions() took about 120 lines of code. It feels insane to me to write 120 lines of code just to define such a simple data structure. Even the old way of using hook_entity_info() looks cleaner.
A side note: we as a community should really try to win over developers through better DX. Recently I've been looking at Python-Django per request from a client. I was amazed how simple it was to define tables in Django.
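To make the YAML idea concrete, a hook_schema()-style table definition might map to something like the following. This is a purely hypothetical format and filename, not anything Drupal actually defines; the keys simply mirror the existing hook_schema() array structure:

```yaml
# Hypothetical mymodule.schema.yml -- keys mirror the hook_schema() array.
mytable:
  description: 'Example table.'
  fields:
    id:
      type: serial
      not null: true
    title:
      type: varchar
      length: 255
      not null: true
      default: ''
  primary key: [id]
  indexes:
    title_idx: [title]
```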
Comment #28
webchick commented:
Actually, unless I'm mistaken, I think this is the kind of thing we could do in 8.1.x, as long as we retain backwards compatibility for the old way of doing things.
Comment #29
webchick
Comment #32
xjm commented:
No longer needs to be postponed (nor currently in need of committer feedback).
Also more of a task than a feature request since it's about DX and API improvement.
Comment #45
donquixote commented:
Are there still any plans to define database table schemas in YAML instead of a hook?
It seems this would work fine for all schemas that are not dynamic, that is, that don't depend on site configuration.
Of course contrib modules can just call Yaml::parseFile() within their hook_schema(), but then every module would have to invent its own convention for how to name and where to store this .yml file.
Comment #46
catch commented:
@donquixote: there's nothing active, but that would be a good issue to open.