Download & Extend

Update Schema API to make it more usable

Project:Drupal core
Version:8.x-dev
Component:database system
Category:task
Priority:major
Assigned:Unassigned
Status:active
Issue tags:DX (Developer Experience), Needs committer feedback, Schema API

Issue Summary

Right now, Schema API has a few problems.

1) The API for table creation and modification is completely different than the rest of the DB layer, because it's still wacky functions with big arrays rather than fluent method chaining.

2) The API for table alteration doesn't allow multiple alter commands in one query, which in some cases is totally nuts for performance, especially when creating or manipulating indexes. It also leads to some crazy footwork to get around not having an index in places, such as: http://drupal.org/node/159329

So the solution is to make Schema API suck less. I see two possible approaches. (Neither of these affect hook_schema(). That's a layer above all of this.)

1) Introduce DDL query builders, along these lines:

<?php
db_alter
('tablename')
  ->
addIndex('foo')
  ->
addField('bar')
  ->
addField('blob')
  ->
dropField('baz')
  ->
execute();

db_table('tablename')
  ->
addField('bar')
  ->
addIndex('baz')
  ->
addField('foo')
  ->
execute();
?>

Pros:
- API is similar to the other query builders we have.
- Probably easier to implement than option 2.

Cons:
- Still a query-based approach rather than a DB-state-based approach.

2) Fully objectify Schema API with classes/objects to represent tables and fields.

<?php
$schema
= $connection->schema();
$table = $schema->getTable('tablename');
$field = new DBField('newfieldname');
$field->setType('blah');
$table->addField($field)l;
$table->dropIndex('bar');
$table->save();

$table = $schema->createTable('tablename');
$table->addField($field);
$array = $table->getArrayVersionLikeHookSchemaUses();
$sql = $table->getSqlCreateStatement();
$table->save();
?>

Pros:
- Richer, more complete API.
- More like other similar systems (I think).

Cons:
- Different than other parts of the DB API at the moment, but less different than it is now.
- More effort required to implement.

Thoughts?

Comments

#1

Tagging.

#2

Just by reading those two examples, I think most will agree that #1 has better DX. #2 requires more brainpower and referring to docs. In either case, this could be a very big performance win. InnoDB and Sqlite are very very slow with ALTER TABLE operations and batching them like this is a essential.

#3

Note that we certainly could make #2 mostly fluent, and I would rather insist on it. :-) I didn't chain anything for example purposes, but it could still be largely chainable.

#4

I think both options represent more engineering than is needed to solve Moshe's actual problem: that you cannot perform multiple table alterations in a single query. Larry's suggestion #1 for DDL query builders really boils down to an OO interface to modify a schema structure and invoke the "do what is necessary to make the database table look like this schema structure" function. We always knew we'd want such a function, we just left if out of the initial D6 patch.

So, we could just write that function (per db driver), which the DDL query builder would need anyway, instead of completly redoing Schema API. Or, at least, we could just write the function first and let Larry spend his life on another big API rewrite if he really wants to. :-)

#5

No, the DDL query builder (option 1) would not involve a "make the database table look like this somehow" method. (There are no functions in DBTNG that are not just convenience wrappers for methods.) It would still be up to the developer to write the DDL commands for the query builder that did what he wanted to do, just as it's up to the developer to not send nonsensical configuration to SelectQuery.

Method 2 *looks* like that's what it's doing, but it would actually just queue up the changes you direct and issue a similar query to what #1 generates.

In neither case am I looking at the mythical "make the schema look like what hook_schema is now, kthxbye" operation. Either of these methods would be a prerequisite for that.

#6

Comment #2 from moshe talks about a performance win, but how would the generated SQL be different between this:

db_alter('tablename')
  ->addIndex('foo')
  ->addField('bar')
  ->execute();

and that:
db_alter('tablename')->addIndex('foo')->execute();
db_alter('tablename')->addField('bar')->execute();

And in what way would it make any difference in performance?

#7

@dgv: SQL has the ability to say "ALTER TABLE foo CHANGE X, CHANGE Y, CHANGE Z", and then figure out how to make all of those changes in parallel. If you make it three separate statements, it has to lock the table, alter it, and update all indexes each time separately. There are also some changes, like changing primary keys, that break the table if not done in parallel. Right now we kinda hack around that, but that's not a good solution.

Think of it as the difference between "UPDATE foo SET a=1, b=2 WHERE c=3" and "UPDATE foo SET a=1 WHERE c=3", followed by "UPDATE foo SET b=2 WHERE c=3".

#8

subscribe. No time to look at this now.

#9

I read that indexes are not rebuilt until a transaction is finished. Would it make sense to do all of update.php inside a transaction?

#10

re #9: That would allow update.php to rollback all updates if any fail...

#11

MySQL doesn't support transactional DDL, so it commits immediately when you muck about with the schema.

I still don't see us reaching any consensus on the approach to take. We kinda need to decide on that, and if we don't then I'll just decide when I get around to writing it and y'all can suck it up. :-)

#12

Issue tags:+Schema API

Interesting Crell. Adding another tag.

#13

I would love to help and have some ideas around making schema API more robust cross-database (eg. column quoting).

Subscribing for now, but I'll be back.

#14

subscribing

#15

Adding a new tag as discussed with webchick.

#16

I like #2. Actually, the first example makes no sense to me when I start thinking about how I'd go about setting a fields attributes.

Is this how you'd image actually doing something useful in the first example? (not a rhetorical question)

<?php
db_alter
('tablename')
  ->
addField('bar')
 
// i'm guessing that $this does not refer to the field "bar", but to the table named "tablename"?
 
->getField('bar')
 
// so I must get the field before I set type ?
 
->setType('int')
 
// when I want to get back to the table, will I do a jquery end() like thingy?
 
->end()
 
// now we are dealing with a table again?
 
->addIndex('foo')
  ->
execute();
?>

The first example seems fraught with peril (assuming its possible). Or would we be doing what we always do:
<?php
db_alter
('tablename')
  
// false refers to being unsigned of course
 
->addField('bar', 'int', 'small', false)
// or even better
db_alter('tablename')
// how drupal is that?
 
->addField('bar', 'int', array('size' => 'small', 'unsigned' => false, 'default' => 0))
?>

+1 for #2

double -1 for #1

#17

I intended both as viable options. :-) In practice it could be db_table_alter() for the factory. Either way, the idea there would be to model closely to SQL's ALTER TABLE syntax, much the way the other query builders closely model INSERT INTO table_name, SELECT, and so forth.

So in model 1, to add a column to a table you'd do something like:

<?php
db_table_alter
('mytable')
  ->
addColumn('columnname', $info_array_about_column)
  ->
execute();
?>

And of course could chain multiple addColumn, removeColumn, renameColumn, addIndex(), and so on commands together just like you can in SQL.

In method 2, you would instead do:

<?php
$schema
= $connection->schema();
$table = $schema->getTable('tablename');
$column = new DBColumn('newcolumnname');
$column->setType($some_information_about_the_column_here);
$table->addColumn($column)l;
$table->save();
?>

So rather than building a command to run, you load an entity, modify it, and save it again.

Both are, I think, viable approaches. I suspect that #1 will actually be easier to write, but I'm not certain of that.

#18

I edited my comment to make me sound like less of a jackass. :-D

So if in both examples we rely on using the same old drupal arrays, e.g.

<?php
db_table_alter
('mytable')
  ->
addColumn('columnname',array(
   
'type' => 'varchar',
   
'length' => 88,
   
'not null' => true,
   
'default' => 'Douglas',
  )),
  ->
addColumn('columnname2',array(
   
'type' => 'varchar',
   
'length' => 88,
   
'not null' => true,
   
'default' => 'Wombat',
  )),
  ->
alterColumn('whatever', array(
  
'type' => 'int'
 
))
  ->
execute();
?>

If we describe a columns attributes as classic schema api arrays in both examples. then I take back my minus -1 on #1

Something that attracted me to the second was that I wasn't passing these arrays. Usually, my schema updates are pretty damn specific, and I liked the idea of calling a function "e.g. setType, setSize" (which my IDE brings up suggestions for) instead of remembering what array keys.

The second strikes me as being better, but i confess its probably because it reminds me of a lot non-drupal classes I've already worked with. I haven't had a good experience with PHP objects that chain yet...

I also confess I've altered more tables through the existing schema api then straight sql.

Edit
Actually, i'm in #1 camp now. My dream of variable hints in my IDE seems dead, so #1 is at least better than the existing way -- and #2 seems cumbersome in comparison.

#19

Well arguably we could do a full set of modifier methods on column objects and table objects. I really haven't thought that far ahead on that approach. #1 would definitely use arrays somewhere at the very least.

Variable hints in the IDE require typed @param and @return statements. I'd like those too, but that's a separate issue. :-)

Drieschick, any input? Beuler?

#20

Version:7.x-dev» 8.x-dev

No input, so this will have to wait until D8.

#21

Priority:critical» major

#22

Neither of these affect hook_schema()

Your remark just prevented another sidetracking :)
#1154334: Shortcut syntax for hook_schema().

#23

subscribe

nobody click here