Add support for INSERT INTO ... SELECT FROM ...

Berdir - June 3, 2009 - 20:23
Project:Drupal
Version:7.x-dev
Component:database system
Category:feature request
Priority:normal
Assigned:Berdir
Status:closed
Issue tags:DBTNG Conversion, Performance
Description

The following INSERT syntax is standard SQL and supported by all databases, so we should add support for it to InsertQuery.

INSERT INTO table (field, field2, field3) SELECT otherfield, otherfield2, otherfield3 FROM othertable WHERE condition = 'something'

This is how it currently works:

<?php
$query
= db_select('test_people', 'tp')->fields('tp', array('name', 'age', 'job'));

db_insert('test')
  ->
fields(array('name', 'age', 'job'))
  ->
from($query)
  ->
execute();
?>

Use cases:
- #105639: Tracker performance improvements
- Materialized views
- Table cloning
- ...

The attached patch does introduce a new method, from(SelectQueryInterface $query). If set, it will use that and ignore anything set by values(). The patch contains a tests and has been tested on MySQL, PostgreSQL and SQlite.

AttachmentSizeStatusTest resultOperations
dbtng_insert_select.patch7.39 KBIdleFailed: Failed to apply patch.View details | Re-test

#1

catch - June 3, 2009 - 20:27

I'll let Crell, chx or Damien rtbc, but this looks really nice.

#2

killes@www.drop.org - June 4, 2009 - 08:57

Great! I'd consider DBTNG incomlete without this. Unfortunately, I can't comment much on the implementation.

#3

andypost - June 4, 2009 - 09:09

+1 For migration cases

#4

Berdir - June 4, 2009 - 11:24

Another reason why we need this:
http://api.drupal.org/api/function/comment_enable/7

#5

Crell - June 4, 2009 - 14:54
Status:needs review» needs work

$from_query is an object property and so should be $fromQuery for consistency.

I am not certain if from() is the best method name here. It could be, I'm just a little bit wary. If no one has a better idea, though, I'm game to go with it.

Aside from those minor points, this looks awesome!

#6

Berdir - June 5, 2009 - 15:18
Status:needs work» needs review

Re-roll with $from_query renamed.

I don't really like from() either, it was just the first thing that came to my mind. If someone has a better suggestion....

AttachmentSizeStatusTest resultOperations
dbtng_insert_select2.patch7.37 KBIdleFailed: Failed to apply patch.View details | Re-test

#7

Crell - June 5, 2009 - 16:52
Status:needs review» reviewed & tested by the community

Well I can't think of a better name, and neither could anyone in #Drupal, so let's go with this. :-) Thanks, Berdir!

#8

Dries - June 5, 2009 - 16:56
Status:reviewed & tested by the community» fixed

Committed to CVS HEAD. Thanks!

#9

Berdir - June 6, 2009 - 13:12
Status:fixed» needs review

Currently, it does not work if the used SelectQuery has one or multiple arguments, attached is a fix.

AttachmentSizeStatusTest resultOperations
select_insert_args.patch2.31 KBIdleFailed: Invalid PHP syntax in includes/database/pgsql/query.inc.View details | Re-test

#10

Berdir - June 6, 2009 - 13:21
Status:needs review» needs work

Above patch has a bug in the PostgreSQL part..

#11

Berdir - June 6, 2009 - 13:35
Status:needs work» needs review

Ok, the new patch is tested on all DBMS.

Another thing that we might need to consider is the order of the fields. The order needs to match and it can be quite hard to guess the correct order, especially if expressions are used. We should probably detect the order if the same names/aliases are used and try to correct the fields in the INSERT statement, as we do for a common insert.

Edit: I've also updated the tests to use a condition.

AttachmentSizeStatusTest resultOperations
select_insert_args2.patch3.01 KBIdleFailed: Failed to install HEAD.View details | Re-test

#12

Crell - June 6, 2009 - 15:25

Auto-detecting the fields from the select shouldn't be difficult. You can just call getFields() on the select object and then, I believe, array_keys() on it to get the field aliases in the order in which they will appear.

#13

Berdir - June 6, 2009 - 23:16

Hm, I see, I'll try it out. It would need getFields() and getExpressions() then and we could call fields() automatically, no need to do that explicit.
-

#14

Berdir - June 7, 2009 - 23:08

Ok, works pretty well. The attached patch does now 4 things:

- Apply the params from the used fromQuery, if available

- The fields are now set automatically, it is not necessary anymore to call fields(). It would still be possible for special cases, because only the first fields() call does have any effect, all other are simply ignored. So you could overwrite the default by calling fields() manually.

- Updated the tests to use an dummy expression and a condition, to test above things.

- Because this is the third time I have to extend the insert query preparation/validation exactly the same way in all 3 drivers, I decided to take up my previous idea about creating a generic prepare/validate function and introduced prepare(). As usual, I don't like the name, if someone has a better idea...

AttachmentSizeStatusTest resultOperations
select_insert_args3.patch7.6 KBIdleUnable to apply patch select_insert_args3.patchView details | Re-test

#15

Crell - June 7, 2009 - 23:21

Hm. I can't believe I'm about to say this... but wouldn't validate() be the better name? Half of what it's doing is validating the stability of the object itself. Although that would then push the data modification bits (like the SelectQuery -> fields() logic) back to execute(), or to a preExecute(). method. I'm not sure about adding the second method, but I'd be OK with it.

Then if we do add partial SQL-syntax checking later as discussed in #473876: PostgreSQL requires GROUP BY when using HAVING, there's a natural place for them.

For debugging purposes, it may be better to throw exceptions for each failure than to just return FALSE, but we can change that later.

#16

Berdir - June 8, 2009 - 08:40

Hm. I can't believe I'm about to say this... but wouldn't validate() be the better name? Half of what it's doing is validating the stability of the object itself. Although that would then push the data modification bits (like the SelectQuery -> fields() logic) back to execute(), or to a preExecute(). method. I'm not sure about adding the second method, but I'd be OK with it.

Yes, those parts are the reason I used prepare and not validate... though preExecute() could work for both things, or not?

For debugging purposes, it may be better to throw exceptions for each failure than to just return FALSE, but we can change that later.

We can't throw exceptions for everything. For some of these checks yes, but not for the insertValues for example. That needs to work like it currently does, example:

<?php
$query
= db_insert('something')->fields(array('some_field'));
foreach (
$array as $field) {
  if (
$field > 5) {
   
$query->values(array('some_field' => $field));
  }
}
$query->execute();
?>

It is possible that no values are added to the query, we simply don't know it. And we have quite a few of similiar conditional insert loops. If nothing is going to be inserted, that is fine and should not throw an exception, it should just go on.

#17

Crell - June 8, 2009 - 15:22

True, so for that it's not a real error and therefore not a real exception (since empty inserts are support as a no-op). If we're going to have all of this pre-processing in the same method I'd call it preExecute(), as it's more descriptive of what it does and also familiar to people who have used Views 2.

#18

Berdir - June 8, 2009 - 18:18

Renamed to preExecute()

AttachmentSizeStatusTest resultOperations
select_insert_args4.patch7.62 KBIdleFailed: Failed to install HEAD.View details | Re-test

#19

Berdir - June 10, 2009 - 21:26

Adding the conversion tag, because we need this to properly convert comment_enable().

#21

killes@www.drop.org - June 15, 2009 - 18:08
Issue tags:+Performance

tagging

#22

System Message - June 24, 2009 - 01:25
Status:needs review» needs work

The last submitted patch failed testing.

#23

Berdir - June 24, 2009 - 09:08
Status:needs work» needs review

Testbot, I don't believe you :)

#24

Crell - June 27, 2009 - 01:06

Some minor documentation cleanup. Also moved a little code around in InsertQuery_mysql::execute() to ensure a consistent behavior of "if a subselect query is defined, ignore any other values". That's now documented as well.

As soon as the bot approves this is ready to go. Of course, the bot is misbehaving right now. It passes all of my tests, anyway. :-)

AttachmentSizeStatusTest resultOperations
select_insert_args.patch9.86 KBIdlePassed: 11695 passes, 0 fails, 0 exceptionsView details | Re-test

#25

System Message - June 27, 2009 - 04:05
Status:needs review» needs work

The last submitted patch failed testing.

#26

andypost - June 28, 2009 - 04:26
Status:needs work» needs review

Bot was failed

#27

Crell - June 29, 2009 - 15:02
Status:needs review» needs work

Per #505086: Exception handling in InsertQuery_mysql::execute, preExecute() should throw exceptions rather than returning NULL. I would actually go a step further and declare a new child exception for each error to make them easier to catch. Each exception class can easily be just a single line in database.inc. We have a few there already.

#28

Berdir - June 29, 2009 - 15:22

For things that are exceptions, yes, but again, the empty values check is not, for example.

Now that more and more of Drupal core is going to be OOP, we should probably define some standards for those exceptions. How to name them, how to organize (NoFieldsException extends DatabaseException extends DrupalException, whatever) them and so on. And also, when to use them and when not...

#29

Crell - June 29, 2009 - 19:20

Yeah, probably. For now, though, let's just follow the pattern that the DB layer is using. Agreed on not throwing an exception on an "empty" query as that's a silent-do-nothing by design as it simplifies calling code.

#30

Berdir - July 1, 2009 - 21:38

Re-rolled, added two exceptions and an explanation why the no values stuff does not use an exception.

AttachmentSizeStatusTest resultOperations
select_insert_args5.patch10.03 KBIdleFailed: 11565 passes, 0 fails, 1 exceptionView details | Re-test

#31

Berdir - July 1, 2009 - 21:43
Status:needs work» needs review

#32

System Message - July 1, 2009 - 22:00
Status:needs review» needs work

The last submitted patch failed testing.

#33

Berdir - July 1, 2009 - 23:03
Status:needs work» needs review

I changed the behavior and the tests failed ? Crazy stuff :p

Updated the test to expect an exception instead of return value NULL.

AttachmentSizeStatusTest resultOperations
select_insert_args6.patch11.88 KBIdlePassed: 11695 passes, 0 fails, 0 exceptionsView details | Re-test

#34

Crell - July 2, 2009 - 04:21
Status:needs review» reviewed & tested by the community

I've run out of things to object to.

#35

joshmiller - July 3, 2009 - 12:11

I've written a total of two tests, so I may not be the most qualified person, but this seems backwards:

+    try {
+      $result = db_insert('test')->execute();
+      // This is only executed if no exception has been thrown.
+      $this->fail(t('Expected exception NoFieldsException has not been thrown.'));
+    } catch (NoFieldsException $e) {
+      $this->pass(t('Expected exception NoFieldsException has been thrown.'));
+    }

Since we want SQL queries to silently do nothing when inserting nothing, should the fact that no exception has been thrown be a "$this->pass()" and if there is an exception be a "$this->fail()" ?

I'm not changing the status because I could just be mis-reading the issue and the test all together.

Josh

#36

Berdir - July 3, 2009 - 12:35

There are two things that can be missing:

- Fields : You are *required* to specify into which fields you want to insert data. If there are no fields, this is an exception and you need to fix your code.
- Values: Values can be added conditionally, if there aren't any, it should just silently do nothing.

Compare with my example in #16. The fields are always added and the values only if a certain condition matches.

#37

joshmiller - July 3, 2009 - 17:32

thus, your test is checking to make sure that fields are required.

#38

Liam McDermott - July 9, 2009 - 21:14

One minor PHPDoc/comment gripe:

+ * Exception thrown if an insert query does specify a field twice.

Shouldn't this be: 'Exception thrown if an insert query specifies a field twice'? Other than that tiny thing, this is ready to rock and roll. I've been testing it on MySQL and Postgres* for #515034: Mark all forum topics read.

* Although there is a Postgres problem with 515034, that is not related to this patch.

#39

webchick - July 10, 2009 - 06:10

I wanted to commit this tonight but either it needs more comments or I need more sleep (or possibly both!) I could really use a walkthrough on IRC. If either of you are around tomorrow night, ping me. Or if Dries wants to commit this, go for it. :)

#40

webchick - July 10, 2009 - 18:06

Ok, here's the result of our talk today. Basically everything is comment- and wtf- related.

-    $query = db_select('test_people', 'tp')->fields('tp', array('name', 'age', 'job'));
+    $query = db_select('test_people', 'tp');
+    // Use an expression to test the correct order of the insert query fields.
+    $query->addExpression('tp.age', 'age');
+    $query
+      ->fields('tp', array('name','job'))
+      ->condition('tp.name', 'Meredith');

I stared at this for 10 minutes and could not for the life of me figure it out. I had to ask Crell and Berdir on IRC.

They stated that what this is doing is basically changing SELECT name, age, job FROM test_people tp to SELECT age AS age, name, job FROM test_people tp so that we can ensure the fields are properly being re-ordered. This needs clarification, as the new code is not anywhere near remotely as clear as the old code.

+ * Exception thrown if an insert query does specify a field twice.

specifies.

+ * exception is thrown if that is the case.
+ *
+ */
+class FieldsOverlapException extends Exception {}

Remove blank line there.

+    // If a SelectQuery is specified to select from, that takes precedence over
+    // any other configured values.

This comment is repeated in several places in the code, but does not remotely help someone like me without deep DBTNG knowledge to understand what's going on.

After about a half hour of back and forth, I finally figured out that this comment wasn't intended to describe the code below it at all, as normally our comments do, but to explain why we're returning something from the function earlier than normal.

I'd suggest changing this to something like:

+    // If we're selecting from a SelectQuery, finish building the query and pass it back, as any remaining options are irrelevant.

However, that still doesn't help with some of the trickier parts of the code. I'd love a one-liner summary at:

+      $sql = (string)$this;
+      return $this->connection->query($sql, $this->fromQuery->getArguments(), $this->queryOptions);

This is the entire crux of the patch and we don't document what we're doing here for posterity so it does not get "fixed" again later.

Additionally, the comments here continue to leave me clueless as to what the complicated logic below is doing.

+      // Use the defined aliases for the INSERT query fields.
+      // Fields are added before expressions in __toString().
+      $this->fields(array_merge(array_keys($this->fromQuery->getFields()), array_keys($this->fromQuery->getExpressions())));
+

From talking to Crell, I think this would something like "Add regular fields before aliased fields so they can be treated consistently in SELECT and VALUES." Basically, I'm looking for some "why" here, rather than "what".

+      throw new NoFieldsException("There are no fields available to insert with.");

Should use single quotes.

I'm not sure about preExecute() as a name. validate() would be more clear. But Crell pointed out that it's not only validation we're doing here, and I can't think of anything better atm, so I think it's fine to leave it as-is.

#41

webchick - July 10, 2009 - 18:06
Status:reviewed & tested by the community» needs work

#42

Crell - July 10, 2009 - 18:14
Status:needs work» reviewed & tested by the community

As a note, preExecute() also matches the pre_execute() in Views, #pre_render in FAPI, etc. It's a general "do stuff right before we do this other big thing" pattern.

#43

Crell - July 10, 2009 - 18:15
Status:reviewed & tested by the community» needs work

Crossposted.

#44

Berdir - July 11, 2009 - 09:43
Status:needs work» needs review

Ok, I tried to work on the comments. These things are quit hard to explain, so please report of the new ones are better.

An extract of the bigger changes is also available at http://drupalbin.com/10393

AttachmentSizeStatusTest resultOperations
select_insert_args7.patch12.57 KBIdlePassed: 11695 passes, 0 fails, 0 exceptionsView details | Re-test

#45

Crell - July 13, 2009 - 03:03
Status:needs review» reviewed & tested by the community

I am OK with these comments, so back to webchick for her second pass. :-)

#46

webchick - July 21, 2009 - 01:56
Status:reviewed & tested by the community» fixed

Thanks, folks! Committed to HEAD. :)

#47

System Message - August 4, 2009 - 02:00
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

#48

Gábor Hojtsy - October 9, 2009 - 13:21

Where was this documented? I was searching for subselect support for db_insert() in the update docs, but could not find it, so falled back on a good old db_query() in my latest patch for shortcuts at http://drupal.org/node/511286#comment-2131522

 
 

Drupal is a registered trademark of Dries Buytaert.