Top Visitor Page problem with PostgreSQL 8.3

jaydub - March 2, 2008 - 14:20
Project:Drupal
Version:7.x-dev
Component:statistics.module
Category:bug report
Priority:critical
Assigned:Unassigned
Status:active
Issue tags:PostgreSQL Code Sprint, PostgreSQL Surge
Description

As reported in http://drupal.org/node/220064 and in the PostgreSQL 8.3 release notes http://www.postgresql.org/docs/8.3/static/release-8-3.html there is a problem with certain queries in PostgreSQL 8.3 due to the fact that PostgreSQL 8.3 no longer automatically casts columns to match the data type in JOIN columns or in this case the column in a CONCAT operation.

The CONCAT function created for PostgreSQL in the Drupal install

      if (!db_result(db_query("SELECT COUNT(*) FROM pg_proc WHERE proname = 'concat'"))) {
        db_query('CREATE OR REPLACE FUNCTION "concat"(text, text) RETURNS text AS
          \'SELECT $1 || $2;\'
          LANGUAGE \'sql\''
        );
      }

expects TEXT data types. In previous PostgreSQL versions a non-text column would have been automatically cast to text. This will no longer work in 8.3.

The attached patches fix one query from the statistics module where CONCAT is used. I have included a patch for 6.1 and for 5.7.

AttachmentSizeStatusTest resultOperations
statistics-5.7.module.patch756 bytesIdleFailed: Failed to apply patch.View details | Re-test
statistics-6.1.admin_.inc_.patch731 bytesIdleFailed: Failed to apply patch.View details | Re-test

#1

ainigma32 - December 3, 2008 - 21:07
Version:6.1» 7.x-dev
Category:support request» bug report

This issue still exists in HEAD so I rerolled the patch.

Works fine on MySQL 5.0 but this needs to be tested on PostgreSQL 8.3 and I don't have that setup at the moment.

So: Calling all Postgres fans...

- Arie

AttachmentSizeStatusTest resultOperations
statistics_postgresql_0.patch936 bytesIdleUnable to apply patch statistics_postgresql_0.patchView details | Re-test

#2

Damien Tournoud - December 3, 2008 - 21:13
Title:Top Visitor Page problem with PostgreSQL 8.3» PostgreSQL surge #8: Top Visitor Page problem with PostgreSQL 8.3

Promoted to the PostgreSQL surge.

#3

Josh Waihi - December 4, 2008 - 05:11

the stats module tests are currently broken in HEAD, applying this patch didn't fix it :(

#4

Josh Waihi - December 4, 2008 - 05:39

ok so tests are failing because PDO doesn't like floats on int_unsigned columns. this maybe linked with this http://drupal.org/node/339588

#5

Josh Waihi - December 4, 2008 - 06:53

this patch fixes the what I was commenting about above.
Statistics Module: 36 passes, 0 fails, and 0 exceptions

AttachmentSizeStatusTest resultOperations
statistics.module.patch666 bytesIdleFailed: 7117 passes, 8 fails, 12 exceptionsView details | Re-test

#6

System Message - December 6, 2008 - 08:45
Status:needs review» needs work

The last submitted patch failed testing.

#7

Josh Waihi - December 7, 2008 - 20:55
Status:needs work» needs review

I find it hard to believe that this patch failed as all it does is type cast a float to a integer. by the looks of the testbed result, it looks rather like something went wrong

#9

Josh Waihi - December 9, 2008 - 23:13

re-submitting previous patch that failed because there is no reason that it should have failed

AttachmentSizeStatusTest resultOperations
statistics.module_7.patch666 bytesIdleFailed: Failed to apply patch.View details | Re-test

#10

Josh Waihi - December 9, 2008 - 23:15

edit: posted on wrong node

updating status

#11

Crell - December 12, 2008 - 22:30
Status:needs review» reviewed & tested by the community

Expecting the DB layer to handle float to int type coercion is not reasonable. We can expect module authors to only pass in sane data. Trivial fix, the bot likes, it, yay.

#12

Dries - December 13, 2008 - 13:53
Version:7.x-dev» 6.x-dev
Status:reviewed & tested by the community» needs work

I've committed this to CVS HEAD but it probably needs work for Drupal 6. Updating version and status. Thanks!

#13

Dave Reid - December 13, 2008 - 20:31
Version:6.x-dev» 7.x-dev
Priority:normal» critical
Status:needs work» needs review

Apparently, the patch in #1 was supposed to also go in but wasn't merged with the patch in #9 (sorry Dries). Right now this is an incomplete commit to HEAD, so I'm bumping to critical. Reposting patch from #1 to bring it down to the bottom for review and retesting.

AttachmentSizeStatusTest resultOperations
statistics_postgresql_0.patch936 bytesIdleUnable to apply patch statistics_postgresql_0_0.patchView details | Re-test

#14

Josh Waihi - December 13, 2008 - 23:24
Status:needs review» reviewed & tested by the community

I have tested this and it works as stated in comment #3-#5

#15

webchick - December 30, 2008 - 03:54
Version:7.x-dev» 6.x-dev
Status:reviewed & tested by the community» patch (to be ported)

Committed #13 to HEAD. Marking back to needs porting for 6.x.

#16

Damien Tournoud - January 5, 2009 - 11:26
Title:PostgreSQL surge #8: Top Visitor Page problem with PostgreSQL 8.3» Top Visitor Page problem with PostgreSQL 8.3

#17

Josh Waihi - January 6, 2009 - 22:33
Status:patch (to be ported)» needs review
AttachmentSizeStatusTest resultOperations
statistics_postgresql_D6.patch1001 bytesIgnoredNoneNone

#18

druido - March 28, 2009 - 11:03

I think this patch does not fix the problem correctly, because in PostgreSQL 'char' is equivalent to 'char(1)', e.g., compare

mysql> SELECT CAST(57 AS CHAR);

+------------------+
| CAST(57 AS CHAR) |
+------------------+
| 57               |
+------------------+

with

psql> SELECT CAST(57 AS CHAR);

bpchar
--------
5
(1 row)

I have posted a patch that works for me here: http://drupal.org/node/412402. For convenience, I attach it to this node, too. Tested with Drupal 6.x.

AttachmentSizeStatusTest resultOperations
statistics.admin_.inc_.patch773 bytesIgnoredNoneNone

#19

Josh Waihi - March 29, 2009 - 20:36
Status:needs review» needs work

doing a sub select will slow the query down, can we cast as text instead?

josh=# SELECT CAST(57 AS TEXT);
text
------
57
(1 row)

#20

druido - March 30, 2009 - 14:01

TEXT is not supported by MySQL. In fact, I have found no way to cast an integer to a string that is compatible with both MySQL and PostgreSQL.

#21

Josh Waihi - March 30, 2009 - 23:58

we could write a special query function for the search module in database.pgsql.inc and database.mysql-common.inc to cast specifically for each DB??

#22

druido - March 31, 2009 - 14:48

I did a quick test by running the two queries of the last patch above against a table with ~10^6 rows. Curiously enough, the query with the subselect in the from clause is more or less twice as fast as the other (with uid cast to TEXT) in PostgreSQL 8.3. In MySQL 5.0, instead, the query using concat() is around 10 times faster than the other.

My test is certainly very limited in scope but it suggests that, if performance is an important factor, the way to go is to optimize the query according to the DBMS, i.e., something like:

if ($db_type == 'mysql' or $db_type == 'mysqli') {
  $sql_cnt = "SELECT COUNT(DISTINCT(CONCAT(uid, hostname))) FROM {accesslog}";
elseif ($db_type == 'pgsql') {
  $sql_cnt = "SELECT COUNT(*) FROM (SELECT DISTINCT UID, HOSTNAME FROM {accesslog}) AS FOO";
}
else {
  # default query
}

#23

Josh Waihi - March 31, 2009 - 19:51
Version:6.x-dev» 7.x-dev
Status:needs work» postponed

OK, well regardless, it seems D7 has a bug in it in regards to PostgreSQL not type-casting like MySQL does so we'll need to go back there and fix it. I've noticed there are a few Drupal casting issues about so I've opened #419858: Database Type Casting Per Database Connection to maybe remedy this issue for D7. Lets get HEAD right then port back to D6.

Postponed till #419858: Database Type Casting Per Database Connection is resolved.

#24

poka_dan - April 6, 2009 - 14:52

Had this issue as well. Hoping it will get fixed with the next release..

#25

ivanSB - May 13, 2009 - 19:56

trim(cast(10 AS char(10)))

works for postgresql and mysql

Since nothing seems changed across version it could be ported.

see #449736

AttachmentSizeStatusTest resultOperations
patch_trim_cast.patch1009 bytesIgnoredNoneNone

#26

Damien Tournoud - May 13, 2009 - 20:30
Status:postponed» active

I suggest changing the signature of the CONCAT() function in PostgreSQL. It is currently:

CREATE OR REPLACE FUNCTION "concat"(text, text) RETURNS text AS
  \'SELECT $1 || $2;\'
  LANGUAGE \'sql\''

Let's change that to:

CREATE OR REPLACE FUNCTION "concat"(anyelement, anyelement) RETURNS text AS
  \'SELECT CAST($1 AS TEXT) || CAST($2 AS TEXT);\'
  LANGUAGE \'sql\''

I have not tested this, but I believe it can solve this issue quite elegantly: no developer working on MySQL will figure out that we need to explicitely TRIM(CAST()) every whenever calling CONCAT() on non character columns.

#27

Josh Waihi - May 14, 2009 - 00:18

As discussed in IRC, @DamZ's function alteration won't work as anyelement parameters must be of the same datatype, however we can define a family of functions with the same name 'concat' that handle various datatypes as arguments:

CREATE OR REPLACE FUNCTION "concat"(text, int) RETURNS text AS 'SELECT $1 || $2::text;' LANGUAGE 'sql';
CREATE FUNCTION "concat"(int, text) RETURNS text AS 'SELECT $1::text || $2;' LANGUAGE 'sql';

We'd need to add a function definition for every datatype possibility - well, the datatypes we use in the schema api.

#28

ivanSB - May 14, 2009 - 12:25

Solution start to look more complicated than the problem and pretty hakish.
After all... CONCATenating int and text (and CASTing) shouldn't be something doesn't look something that shine for good design.

I see adding tons of CONCAT just to overcome this as polluting the code.

This is not just a performance problem. This is going to break a postgresql feature: stricter data type checking.
Better getting rid of the casts. Or when there isn't any viable solution use a "local" trick as

trim(cast(10 AS char(10)))

Do we really need to write that query with that CAST?
I know it is an historical query... but well that stat doesn't shine. It is mixing and splitting anonymous users and IP.
If I want to know what are the top visitors I can't. The top visitor may have several entries depending on the IP. If I want to know what is the top IP... I can't. If I want to know how many "distinct users I can't.

BTW this query looks the only one using CAST in core.

#29

Josh Waihi - July 7, 2009 - 02:18

Time to get this in (if not already). Marking for PostgreSQL Code Sprint

#30

Shiny - July 10, 2009 - 05:11

subscribing

#31

stormsweeper - July 31, 2009 - 16:05

I think you can define the implicit cast in 8.3, although I haven't tried it yet (rebuilding my test system):

http://www.postgresql.org/docs/8.3/static/sql-createcast.html

#32

stormsweeper - August 1, 2009 - 18:46

Yes, you can redefine the casts, but to do so you also need to redefine a bunch of old functions and you don't get a lot of benefit. If SQL rewriting was still a concern for D7, we could rewrite all concat() functions to use the || operator, which works as long as one input is text. That said, it's probably best to either abstract that in the SelectQuery interface somehow, or just nix any old queries using the concat() function.

#33

Dave Reid - October 12, 2009 - 03:41

Marked #544848: admin/reports/visitors as a duplicate of this issue.

#34

Damien Tournoud - October 12, 2009 - 03:51

I think it make sense to add implict numeric-to-string casts to PostgreSQL.

 
 

Drupal is a registered trademark of Dries Buytaert.