I need to construct some complex SQL statements having UNION or UNION ALL inside. It looks missing in the documentation... if it has been implemented.

For e.g.

"SELECT ll.*
      FROM {linkchecker_links} ll
      INNER JOIN (
        SELECT lid FROM (
          SELECT DISTINCT ll.lid
          FROM {node} n
          INNER JOIN {node_revisions} r ON r.vid = n.vid
          INNER JOIN {linkchecker_nodes} ln ON ln.nid = n.nid
          INNER JOIN {linkchecker_links} ll ON ll.lid = ln.lid AND ll.last_checked <> %d AND ll.status = %d AND ll.code NOT IN (" . db_placeholders($ignore_response_codes, 'int') . ")
          WHERE n.uid = %d OR r.uid = %d
          UNION
          SELECT DISTINCT ll.lid
          FROM {comments} c
          INNER JOIN {linkchecker_comments} lc ON lc.cid = c.nid
          INNER JOIN {linkchecker_links} ll ON ll.lid = lc.lid AND ll.last_checked <> %d AND ll.status = %d AND ll.code NOT IN (" . db_placeholders($ignore_response_codes, 'int') . ")
          WHERE c.uid = %d
        ) q1
      ) q2 ON q2.lid = ll.lid"

I have completly no idea how I should add UNION inside...

Aside - I'm new to D7 DB api, but the rest of this SQL code looks not that easy to convert. It would be great if someone could provide some help with the rest...

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Crell’s picture

For static queries (using db_query()), there is no change. You just write a static string with UNION or whatever else you want. Placeholders are now handled automatically in static queries if you pass in an array rather than a scalar as the parameter.

For dynamic queries using db_select(), UNION is not yet supported. I never figured out how to do that nicely. If you want to send in a patch I'm game to try and sneak that in before code freeze. :-)

hass’s picture

Thank you for quick feedback.

My main issue with the above query is the db_placeholders($ignore_response_codes, 'int'). I have no idea how could looks like in a static fashion. db_placeholders() is gone and I do not know how to convert to D7... If you could give me an example it would be very helpful.

I'm still not sure what the difference is between a static and dynamic query and when I should use on or the other... I was told on IRC complicated querys need to use db_select()... so I expect I have nearly only complicated querys next to me for the reason of db_placeholders() :-(

hass’s picture

I'm sorry about the UNION implementation... as I'm still trying to upgrade a module... I do not have the background to develop the core database API stuff :-(. But having no UNION support would be critical for linkchecker module. I may never get a D7 version out if this is not possible. :-(((

Crell’s picture

For placeholders:

$result = db_query("SELECT ... WHERE nid IN (:nids)", array(':nids' => array(1, 2, 3));

The rest is magic.

You should use the select builder for queries *whose structure will change at runtime*, either because they need to vary per database, or conditionally based on user input, or because they need node access, or they're using a pager or tablesort or something, etc. If it's one hideously long and complicated query whose structure does not change, then you can just use db_query().

hass’s picture

Oh my goodness... Thank you *very* much. I overcomplicated the most query's :-(. This magic was my problem and this is not documented or I missed it... we should add such an "IN" example to the DB API doc pages (http://drupal.org/node/310086)... not only the "easy" statements...

hass’s picture

Crell: Is it possible that extend('PagerDefault')->extend('TableSort') is required to build a pager? I've found this in core very often and there is *no* pager in core that is not build this way. Looks like I need to move all Pager with/o TableSort to db_select() !?

Crell’s picture

You should be able to do a pager with just PagerDefault, not TableSort. That was tested when extenders were added. In practice nearly all pagers also use tables, and tablesort, but that's not a requirement of pagers.

That said, yes, pagers should be done using PagerDefault. The old pager_query() is deprecated and should be removed as soon as we're sure core is not using it anywhere anymore.

hass’s picture

But how can I use PagerDefault without db_select()? With db_select I cannot use UNION... :-(

hass’s picture

@Crell: The only thing I can come out is the below... but it's not working at all and misses $subquery3 and $subquery4. Are you able to work on the UNION stuff, please? I can test, but cannot develop the driver. I really hope that such a critical thing can get in after code freeze if it breaks modules working in past!? "nicely" is something that we can nail out later... but "working" is required.

  $ignore_response_codes = preg_split('/(\r\n?|\n)/', variable_get('linkchecker_ignore_response_codes', "200\n302\n304\n401\n403"));

  $subquery4 = db_select('linkchecker_node', 'ln')
    ->distinct()
    ->fields('ln', array('lid'));

  $subquery3 = db_select('linkchecker_comment', 'lc')
    ->distinct()
    ->fields('lc', array('lid'));

  $subquery2 = db_select('linkchecker_box', 'lb')
    ->distinct()
    ->fields('lb', array('lid'));

  $subquery1 = db_select($subquery2, 'q1')->fields('q1', array('lid'));

  $query = db_select('linkchecker_link', 'll')->extend('PagerDefault')->extend('TableSort');
  $query->innerJoin($subquery1, 'q2', 'q2.lid = ll.lid');
  $query->fields('ll');
  $query->condition('ll.last_checked', 0, '<>');
  $query->condition('ll.status', 1);
  $query->condition('ll.code', $ignore_response_codes, 'NOT IN');
  $test = $query->execute();

  krumo($test);

This gives me:

SELECT ll.* FROM linkchecker_link ll
INNER JOIN (
  SELECT q1.lid AS lid FROM (
    SELECT DISTINCT lb.lid AS lid FROM linkchecker_box lb
  ) q1
) q2 ON q2.lid = ll.lid
WHERE (ll.last_checked <> :db_condition_placeholder_1) AND (ll.status = :db_condition_placeholder_2) AND (ll.code NOT IN (:db_condition_placeholder_3, :db_condition_placeholder_4, :db_condition_placeholder_5, :db_condition_placeholder_6, :db_condition_placeholder_7)) LIMIT 0, 10

If you have an idea how to build a pager query without db_select() - PLEASE shed some light on me. I'm fully lost here and I do not like to discontinue the linkchecker module only for a core DB API that is missing a small feature. :-(

Crell’s picture

Assigned: Unassigned » Crell
Status: Active » Needs review
FileSize
6.29 KB

Let's try this patch. Seems to work and has a unit test. :-)

Status: Needs review » Needs work

The last submitted patch failed testing.

hass’s picture

Status: Needs work » Needs review

*GREAT* - I will give this a try in a few hours. What is about "UNION ALL"?

Crell’s picture

I don't understand how Union All vs Union distinct works, to be honest. :-) I've only ever just used UNION.

Crell’s picture

Issue tags: +API change

Tagging.

hass’s picture

Issue tags: -API change

I cannot test. Latest core code has a broken install process and the next oldest version I have is ~8 days old and the patch does not apply to this old code :-(.

Crell’s picture

Issue tags: +API change

Cross-posted. Retagging.

zzolo’s picture

Patch looks good, but I definitely am in the opion that UNION should be avoided, and if needed can be done with the db_query function anyway.

As far as UNION ALL
* http://www.techonthenet.com/sql/union_all.php

I'm not sure UNION DISTINCT is always supported. I know in MSSQL, its simply a matter of using a distinct in one of the queries. I think most SQL implementations are selecting distinct rows by default.

But again, I feel these can be handled in the db_query range, since UNIONS are pretty expensive.

hass’s picture

Status: Needs review » Reviewed & tested by the community

I used my 1 week old core version with the latest database folder to install and apply the patch. By this I was able to get this working as expected and changed in the #9 example the line

$subquery1 = db_select($subquery2, 'q1')->fields('q1', array('lid'));

to

$subquery1 = db_select($subquery2->union($subquery3)->union($subquery4), 'q1')->fields('q1', array('lid'));

what resulted in the expected SQL statement:

SELECT ll.* FROM linkchecker_link ll 
INNER JOIN (
  SELECT q1.lid AS lid FROM (
    SELECT DISTINCT lb.lid AS lid FROM linkchecker_box lb 
    UNION
    SELECT DISTINCT lc.lid AS lid FROM linkchecker_comment lc
    UNION
    SELECT DISTINCT ln.lid AS lid FROM linkchecker_node ln
  ) q1
) q2 ON q2.lid = ll.lid
WHERE (ll.last_checked <> :db_condition_placeholder_0) AND (ll.status = :db_condition_placeholder_1) AND (ll.code NOT IN (:db_condition_placeholder_2, :db_condition_placeholder_3, :db_condition_placeholder_4, :db_condition_placeholder_5, :db_condition_placeholder_6)) LIMIT 0, 10

This patch solves my UNION only issue, but I believe we should have UNION ALL, too.

moshe weitzman’s picture

Passes its own Beatles test and the bot is green. Code looks good. RTBC

cha0s’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
9.97 KB

I got a patch for not only UNION but UNION ALL too. Hopefully it can get pushed in.

Yes, the code is a bit dupe'y, but really I don't think it can be all that much better without making it less clear.

Thanks!

hass’s picture

Codewise it looks like you have missed something the the __clone(). I'm also not sure if it makes more sense to have something like ->union($queryObject, 'ALL') as it may not duplicate so much code and looks like the condition logic.

Crell’s picture

Absolutely it should be a unified list, if for no other reason than the patch in #20 means you can never have a UNION ALL before a UNION in the same query. The order does matter. Rather, the $union array should store not just a query object but also the type of union to do for each.

That said, I still don't actually understand what the difference between UNION, UNION ALL, and UNION DISTINCT is. Until someone explains that to me, the patch in #10 is still the approach we want to use.

cha0s’s picture

FileSize
9.96 KB

There were some problems with indentation.

hass’s picture

Status: Needs work » Needs review

@Crell: There is a link in #17 that explain the difference between UNION and UNION ALL. UNION ALL does not merge the same entries into one. If UNION only is used the *same* rows from different tables are merged into one row. This is not the case with UNION ALL. You see in the latest test an update for this...

or with other words:

The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL.

I'm not sure what UNION DISTINCT does... maybe not DB independed - never heard about it. Sounds like normal UNION.

hass’s picture

Status: Needs review » Needs work

t('UNION ALL correctly preserved duplicates.') sounds wrong... it need to be UNION and does NOT preserve duplicates... I believe you have flipped t('UNION correctly discarded duplicates.') with t('UNION ALL correctly preserved duplicates.')

hass’s picture

Status: Needs review » Needs work

UNION and UNION DISTINCT seems to be the same, see http://dev.mysql.com/doc/refman/5.1/en/union.html.

The default behavior for UNION is that duplicate rows are removed from the result. The optional DISTINCT keyword has no effect other than the default because it also specifies duplicate-row removal. With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching rows from all the SELECT statements.

You can mix UNION ALL and UNION DISTINCT in the same query. Mixed UNION types are treated such that a DISTINCT union overrides any ALL union to its left. A DISTINCT union can be produced explicitly by using UNION DISTINCT or implicitly by using UNION with no following DISTINCT or ALL keyword.

hass’s picture

Sounds like UNION DISTINCT is defined in the Standards and UNION not... maybe we should better go with the Standard :-)

cha0s’s picture

Status: Needs work » Needs review
FileSize
10.09 KB

Good points, hass. I fixed the backwards strings for the unit tests, and also regular union() now uses the more specific UNION DISTINCT.

hass’s picture

Why are we not merge UNION with UNION ALL and UNION DISTINCT by using ->union($selectQuery, 'ALL'), ->union($selectQuery, 'DISTINCT') and so on. This would also fix the issue Crell pointed out.

cha0s’s picture

FileSize
8.38 KB

Okay, I got the different union types unified under the union() call, also aggregating the relevant info in the $union variable in the query to get rid of the duplication.

The $union variable is now an array of arrays, the child arrays containing 2 keys, 'type', which is either '', 'ALL', or 'DISTINCT' (mapping to UNION, UNION ALL, and UNION DISTINCT, respectively) defaulting to '' (UNION), and 'query' which is the query object.

Updated the tests for UNION and UNION ALL, noting that the UNION DISTINCT test is not done, since UNION and UNION DISTINCT are semantically equal.

hass’s picture

Status: Needs review » Needs work

I've found a few code style issues...

1. We may need to remove this from the inline documentation as it's only correct for UNION and UNION DISTINCT.

Duplicate columns will
+   * be discarded.

2. "An array of additional Select queries." documentation also have some good space we can use until the 80 letters limit per line is reached :-). Currently max 50 is used.

3. A few trailing space issues need to be fixed. One is above "// Ensure we only get 2 records." others in the tests and so on.

4. "explicity" -> explicitly

Code wise it looks good... I will try to test.

hass’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
7.84 KB

Patch attached does only change the documentation as written in #31.

I've tested UNION, UNION ALL, UNION DISTINCT with linkchecker and all works as expected (statements have been verified with latest core / devel query output).

cha0s’s picture

Yeah, I noticed the comment margin was a bit conservative, but I was just trying to follow the rest of the docs in the general area.

P.S. Nice work. :)

Crell’s picture

Status: Reviewed & tested by the community » Needs work

Sweet work, folks! A few small notes, though.

1) The docblock for the $union property seems confusing. It should be a better description of the data structure, vis, "an array with each item representing a query to union. Each entry itself is a 2-key associative array. The 'type' key is..." etc.

2) The default handling should not happen in the toString() method. Instead, set the default value of $type to "DISTINCT". That way, all union statements will always specify what type of union to use. That's a bit more self documenting, and let us remove the ugly ternary from the toString() block.

3) The __clone() implementation was not updated for the changed $union structure.

Let's get those fixed and we can get this puppy in.

hass’s picture

After a quick search around... I'm currently not for adding distinct by default. It looks not very well supportet. Sqlite, mssql may not understand it. No idea what about oracle. Does pdo themself help us here?

cha0s’s picture

FileSize
8.76 KB

Yeah, unfortunately I don't think UNION DISTINCT is implemented across the board, so we should probably leave UNION as the default... I know, the ternary is gross. I've fixed it in this patch.

In addition, I've seen two other query types in a couple database, INTERSECT and EXCEPT, which function like UNIONS, with the operator working like PHP's array_intersect() and array_diff() respectively. I don't mind if this is too much to push, and I know for a fact it isn't universally supported, but should we give users at least the option? If we decide to add it, I'll write unit tests (unless some other kind soul does it first! ;))

I have also cleared up the docs as specified.

Is there any value in actually validating the type argument? My latest patch does that, but that also I don't mind if it's removed... Here's the patch:

hass’s picture

I like the validation idea, too. We may also think about automatically changing UNION DISTINCT into UNION only if a developer use it. If we know for sure it's the same like UNION and someone uses it, but it's not working in all DB servers we should care about this... I'm not sure what evangelism rule apply here for Drupal, but it shouldn't be a developers decision to make code incompatible.

The sqlite docs are only talking about UNION and UNION ALL. I have no sqlite with me and cannot verify if DISTINCT nevertheless works... MsSQL2005 documentation also does not have anything about UNION DISTINCT... standard or not... we better go with UNION only for UNION DISTINCT - only to be save and for compatibility reasons. I cannot say anything about INTERSECT, MINUS and EXCEPT, but if this is only available on Oracle I'm not sure if we should support it!???

I like APIs that work flawless and 100% DB independent...

cha0s’s picture

FileSize
8.73 KB

Yeah, INTERSECT and EXCEPT are oddballs, but I saw them in SQLite documentation as well. I suppose we could add them pretty trivially later if deadly necessary..? I don't imagine they're used much anyways, honestly.

I also took your (good) suggestion and made 'DISTINCT' an alias for 'UNION'. I think that is a better way to keep things cross-db... only thing is if we find out there's some crazy db that doesn't treat UNION and UNION DISTINCT as the same.

hass’s picture

UNION MINUS (Oracle) is also such a candidate...

I hope Crell can comment on the evangelism?

lambic’s picture

Yeah, INTERSECT and EXCEPT are oddballs, but I saw them in SQLite documentation as well. I suppose we could add them pretty trivially later if deadly necessary..? I don't imagine they're used much anyways, honestly.

PostgreSQL supports INTERSECT and EXCEPT. We use EXCEPT quite a lot for data import but it's probably less likely to be used in a non-batch environment.

cha0s’s picture

I think we should stick to the bare minimum of UNION implementations, at least for now....

The implementations in my last patch cover those that seem to be the most common, I'd say personally that if the more exotic UNION ops are needed, we should just let people invoke them statically... I guess I'm just stretching to think of some case where Module X is thinking "I really need to modify this UNION MINUS call that Module Y is making"...

I think I agree with hass, in that the DB API should be as universal as possible, and UNION, UNION ALL, and UNION DISTINCT (with our new alias in my last patch) are the most universal of all.

cha0s’s picture

Status: Needs work » Needs review

Setting to needs review.

Crell’s picture

Status: Needs review » Needs work

I think we can easily kill 2 birds with one stone.

Don't validate the incoming union type. Check for just "DISTINCT" and fold that to empty string. Anything else, allow through as is. Don't put the "UNION" keyword into the array.

Then on compile, just put whatever the user entered in as a string after the DISTINCT keyword in the toString method.

That means if someone is working on a DB they know supports a more esoteric UNION variant, they can use it without having to muck about with the core code. The worst side effect is that we may get two spaces instead of one in the query string itself, which is really not an issue. SQL is whitespace insensitive.

hass’s picture

Status: Needs work » Needs review
FileSize
8.09 KB

Updated patch. I hope we are done now.

Status: Needs review » Needs work

The last submitted patch failed testing.

hass’s picture

Status: Needs work » Needs review
FileSize
8.08 KB

Updated patch.

Crell’s picture

Status: Needs review » Needs work

We don't need the ternary. It just makes the code uglier. This should work, and be more readable:

$query .= ' UNION ' . $union['type'] . ' ' . (string) $union['query'];

Also, I just noticed the change in the comment above that. "UNION and co." is a rather colloquial expression. Even I had to think twice to figure out what it meant. Let's just say "UNION is a little odd" (or whatever). We're referring to UNION in all its various forms as being odd.

hass’s picture

Status: Needs work » Needs review
FileSize
8.04 KB

Updated patch.

hass’s picture

This code is RTBC. Do we need to get this in *before* code freeze? It cannot wait for D8...

cha0s’s picture

FileSize
8.77 KB

I think I disagree on one part of the last patch, that is, why are we forcing 'UNION', when this type of thing can use just INTERSECT and EXCEPT (these don't use the UNION keyword). I think if we check for our favorites, and then pass everything completely through, that should all work. If I have 'UNION WOW' in my db, 'UNION WOW' would be the arg to union().

Crell’s picture

Status: Needs review » Reviewed & tested by the community

According to webchick this is a code slush-friendly patch, but I'd still like to get it in. :-)

I guess with the patch in #50, alternate query combiners besides UNION work, but aren't technically supported. The three possible variants of UNION are all supported, however. I hate that there's a switch statement in the otherwise pretty DB layer, but, yeah. :-P

So #50 is ready, IMO. Over to you, webchick.

hass’s picture

THX

I wonder how it's possible to make

  • MySQL: SUBSTR(), SUBSTRING()
  • Oracle: SUBSTR()
  • SQL Server: SUBSTRING()

and

  • MySQL: TRIM(), RTRIM(), LTRIM()
  • Oracle: RTRIM(), LTRIM()
  • SQL Server: RTRIM(), LTRIM()

database independent without switch statements. We may need to fold TRIM() to LTRIM(RTRIM(' string ')) and I'm not sure what we need to do with SUBSTRING. But this is OT here... Give me a hint if we should fix these issues in a two follow up issues :-)

lambic’s picture

Postgres:
BTRIM(), RTRIM(), LTRIM(), TRIM([leading | trailing | both] [characters] from string)
SUBSTR(), SUBSTRING()

There are lots of SQL functions that differ among RDBMSs (don't get me started on date handling), it's ambitious to try to abstract everything.

Crell’s picture

Please stay on topic. There are already other threads that deal with the trainwreck that is cross-DB SQL function support. Just get this committed and move on.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

hass’s picture

@Dries: THX!!!

@Crell: Can you point us to the thread, please? I cannot find it...

Crell’s picture

See #167335: Handle database-specific SQL functions/operators by simple abstraction for a long and melodramatic discussion of SQL function handling. :-) Please read carefully before diving in, as it's a substantially harder problem than you think.

Oh yes, and w00t!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

drewish’s picture

I just realized the one thing this patch didn't handle was ORDER BY clauses. They end up placed ahead of the UNION rather than afterwards.

#1145076: UNION queries don't support ORDER BY clauses

drewish’s picture