As reported in http://drupal.org/node/220064 and in the PostgreSQL 8.3 release notes http://www.postgresql.org/docs/8.3/static/release-8-3.html there is a problem with certain queries in PostgreSQL 8.3 due to the fact that PostgreSQL 8.3 no longer automatically casts columns to match the data type in JOIN columns or in this case the column in a CONCAT operation.

The CONCAT function created for PostgreSQL in the Drupal install

      if (!db_result(db_query("SELECT COUNT(*) FROM pg_proc WHERE proname = 'concat'"))) {
        db_query('CREATE OR REPLACE FUNCTION "concat"(text, text) RETURNS text AS
          \'SELECT $1 || $2;\'
          LANGUAGE \'sql\''
        );
      }

expects TEXT data types. In previous PostgreSQL versions a non-text column would have been automatically cast to text. This will no longer work in 8.3.

The attached patches fix one query from the statistics module where CONCAT is used. I have included a patch for 6.1 and for 5.7.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

ainigma32’s picture

Version: 6.1 » 7.x-dev
Category: support » bug
FileSize
936 bytes

This issue still exists in HEAD so I rerolled the patch.

Works fine on MySQL 5.0 but this needs to be tested on PostgreSQL 8.3 and I don't have that setup at the moment.

So: Calling all Postgres fans...

- Arie

Damien Tournoud’s picture

Title: Top Visitor Page problem with PostgreSQL 8.3 » PostgreSQL surge #8: Top Visitor Page problem with PostgreSQL 8.3

Promoted to the PostgreSQL surge.

Josh Waihi’s picture

the stats module tests are currently broken in HEAD, applying this patch didn't fix it :(

Josh Waihi’s picture

ok so tests are failing because PDO doesn't like floats on int_unsigned columns. this maybe linked with this http://drupal.org/node/339588

Josh Waihi’s picture

FileSize
666 bytes

this patch fixes the what I was commenting about above.
Statistics Module: 36 passes, 0 fails, and 0 exceptions

Status: Needs review » Needs work

The last submitted patch failed testing.

Josh Waihi’s picture

Status: Needs work » Needs review

I find it hard to believe that this patch failed as all it does is type cast a float to a integer. by the looks of the testbed result, it looks rather like something went wrong

Dave Reid’s picture

Josh Waihi’s picture

FileSize
666 bytes

re-submitting previous patch that failed because there is no reason that it should have failed

Josh Waihi’s picture

edit: posted on wrong node

updating status

Crell’s picture

Status: Needs review » Reviewed & tested by the community

Expecting the DB layer to handle float to int type coercion is not reasonable. We can expect module authors to only pass in sane data. Trivial fix, the bot likes, it, yay.

Dries’s picture

Version: 7.x-dev » 6.x-dev
Status: Reviewed & tested by the community » Needs work

I've committed this to CVS HEAD but it probably needs work for Drupal 6. Updating version and status. Thanks!

Dave Reid’s picture

Version: 6.x-dev » 7.x-dev
Priority: Normal » Critical
Status: Needs work » Needs review
FileSize
936 bytes

Apparently, the patch in #1 was supposed to also go in but wasn't merged with the patch in #9 (sorry Dries). Right now this is an incomplete commit to HEAD, so I'm bumping to critical. Reposting patch from #1 to bring it down to the bottom for review and retesting.

Josh Waihi’s picture

Status: Needs review » Reviewed & tested by the community

I have tested this and it works as stated in comment #3-#5

webchick’s picture

Version: 7.x-dev » 6.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Committed #13 to HEAD. Marking back to needs porting for 6.x.

Damien Tournoud’s picture

Title: PostgreSQL surge #8: Top Visitor Page problem with PostgreSQL 8.3 » Top Visitor Page problem with PostgreSQL 8.3
Issue tags: +PostgreSQL Surge
Josh Waihi’s picture

Status: Patch (to be ported) » Needs review
FileSize
1001 bytes
lifepillar’s picture

FileSize
773 bytes

I think this patch does not fix the problem correctly, because in PostgreSQL 'char' is equivalent to 'char(1)', e.g., compare

mysql> SELECT CAST(57 AS CHAR);

+------------------+
| CAST(57 AS CHAR) |
+------------------+
| 57               | 
+------------------+

with

psql> SELECT CAST(57 AS CHAR);

bpchar 
--------
 5
(1 row)

I have posted a patch that works for me here: http://drupal.org/node/412402. For convenience, I attach it to this node, too. Tested with Drupal 6.x.

Josh Waihi’s picture

Status: Needs review » Needs work

doing a sub select will slow the query down, can we cast as text instead?

josh=# SELECT CAST(57 AS TEXT);
 text 
------
 57
(1 row)
lifepillar’s picture

TEXT is not supported by MySQL. In fact, I have found no way to cast an integer to a string that is compatible with both MySQL and PostgreSQL.

Josh Waihi’s picture

we could write a special query function for the search module in database.pgsql.inc and database.mysql-common.inc to cast specifically for each DB??

lifepillar’s picture

I did a quick test by running the two queries of the last patch above against a table with ~10^6 rows. Curiously enough, the query with the subselect in the from clause is more or less twice as fast as the other (with uid cast to TEXT) in PostgreSQL 8.3. In MySQL 5.0, instead, the query using concat() is around 10 times faster than the other.

My test is certainly very limited in scope but it suggests that, if performance is an important factor, the way to go is to optimize the query according to the DBMS, i.e., something like:

if ($db_type == 'mysql' or $db_type == 'mysqli') {
  $sql_cnt = "SELECT COUNT(DISTINCT(CONCAT(uid, hostname))) FROM {accesslog}";
elseif ($db_type == 'pgsql') {
  $sql_cnt = "SELECT COUNT(*) FROM (SELECT DISTINCT UID, HOSTNAME FROM {accesslog}) AS FOO";
}
else {
  # default query
} 
Josh Waihi’s picture

Version: 6.x-dev » 7.x-dev
Status: Needs work » Postponed

OK, well regardless, it seems D7 has a bug in it in regards to PostgreSQL not type-casting like MySQL does so we'll need to go back there and fix it. I've noticed there are a few Drupal casting issues about so I've opened #419858: Database Type Casting Per Database Connection to maybe remedy this issue for D7. Lets get HEAD right then port back to D6.

Postponed till #419858: Database Type Casting Per Database Connection is resolved.

pokadan’s picture

Had this issue as well. Hoping it will get fixed with the next release..

ivanSB@drupal.org’s picture

FileSize
1009 bytes

trim(cast(10 AS char(10)))

works for postgresql and mysql

Since nothing seems changed across version it could be ported.

see #449736

Damien Tournoud’s picture

Status: Postponed » Active

I suggest changing the signature of the CONCAT() function in PostgreSQL. It is currently:

CREATE OR REPLACE FUNCTION "concat"(text, text) RETURNS text AS
  \'SELECT $1 || $2;\'
  LANGUAGE \'sql\''

Let's change that to:

CREATE OR REPLACE FUNCTION "concat"(anyelement, anyelement) RETURNS text AS
  \'SELECT CAST($1 AS TEXT) || CAST($2 AS TEXT);\'
  LANGUAGE \'sql\''

I have not tested this, but I believe it can solve this issue quite elegantly: no developer working on MySQL will figure out that we need to explicitely TRIM(CAST()) every whenever calling CONCAT() on non character columns.

Josh Waihi’s picture

As discussed in IRC, @DamZ's function alteration won't work as anyelement parameters must be of the same datatype, however we can define a family of functions with the same name 'concat' that handle various datatypes as arguments:

CREATE OR REPLACE FUNCTION "concat"(text, int) RETURNS text AS 'SELECT $1 || $2::text;' LANGUAGE 'sql';
CREATE FUNCTION "concat"(int, text) RETURNS text AS 'SELECT $1::text || $2;' LANGUAGE 'sql';

We'd need to add a function definition for every datatype possibility - well, the datatypes we use in the schema api.

ivanSB@drupal.org’s picture

Solution start to look more complicated than the problem and pretty hakish.
After all... CONCATenating int and text (and CASTing) shouldn't be something doesn't look something that shine for good design.

I see adding tons of CONCAT just to overcome this as polluting the code.

This is not just a performance problem. This is going to break a postgresql feature: stricter data type checking.
Better getting rid of the casts. Or when there isn't any viable solution use a "local" trick as

trim(cast(10 AS char(10)))

Do we really need to write that query with that CAST?
I know it is an historical query... but well that stat doesn't shine. It is mixing and splitting anonymous users and IP.
If I want to know what are the top visitors I can't. The top visitor may have several entries depending on the IP. If I want to know what is the top IP... I can't. If I want to know how many "distinct users I can't.

BTW this query looks the only one using CAST in core.

Josh Waihi’s picture

Issue tags: +PostgreSQL Code Sprint

Time to get this in (if not already). Marking for PostgreSQL Code Sprint

Shiny’s picture

subscribing

stormsweeper’s picture

I think you can define the implicit cast in 8.3, although I haven't tried it yet (rebuilding my test system):

http://www.postgresql.org/docs/8.3/static/sql-createcast.html

stormsweeper’s picture

Yes, you can redefine the casts, but to do so you also need to redefine a bunch of old functions and you don't get a lot of benefit. If SQL rewriting was still a concern for D7, we could rewrite all concat() functions to use the || operator, which works as long as one input is text. That said, it's probably best to either abstract that in the SelectQuery interface somehow, or just nix any old queries using the concat() function.

Dave Reid’s picture

Marked #544848: admin/reports/visitors as a duplicate of this issue.

Damien Tournoud’s picture

I think it make sense to add implict numeric-to-string casts to PostgreSQL.

Dave Reid’s picture

Issue tags: +Release blocker

Adding release blocker tag for D7 release.

jfranklin’s picture

subscribe

catch’s picture

Issue tags: -Release blocker

Critical bugs are already release blockers, removing tag.

Josh Waihi’s picture

I'd like to investigate #28 a little more. Maybe we can re-write the query? CASTing is dirty anyway and a performance issue. It's more important for PostgreSQL not to CAST that it is for MySQL because, IMO, Drupal/MySQL sites are generally smaller sites compared to Drupal/PostgreSQL sites. When the database tables get bigger, CASTing becomes a bigger issue.

Alternately, while it maybe a bitch to maintain, specifying all the possible data types to concat with in function definitions maybe a better solution.

Can I get some thoughts on the topic?

bellHead’s picture

Combining the ideas of #26 and #27 may give an easier to implement and manage answer.

The problem with anyelement parameter types is that they have to match, but I'd expect at least of the parameters for CONCATing will be text. So an answer (admittedly a pragmatic rather than perfect one) would be

CREATE OR REPLACE FUNCTION concat(anyelement, text) RETURNS text AS 'SELECT $1::text || $2;' LANGUAGE 'sql';
CREATE OR REPLACE FUNCTION concat(text, anyelement) RETURNS text AS 'SELECT $1 || $2::text;' LANGUAGE 'sql';

Which will allow anything to be CONCATed with text or one of it's implicit casts like varchar and char.

On the question of CAST and the like being dirty, I agree. It smells of presentation logic in the database. I'm not familiar enough with the code base or developer community to know if it's realistic to expect the problem to be eliminated that way though.

ivanSB@drupal.org’s picture

The CAST is really an alarm bell for something wrong at a higher abstraction level.

The queries in D6 parlage just to make the SQL code cleaner to read are:

SELECT COUNT(a.uid) AS hits, a.uid, u.name, a.hostname, SUM(a.timer) AS total, ac.aid FROM {accesslog} a LEFT JOIN {access} ac ON ac.type = 'host' AND LOWER(a.hostname) LIKE (ac.mask) LEFT JOIN {users} u ON a.uid = u.uid GROUP BY a.hostname, a.uid, u.name, ac.aid

and

SELECT COUNT(DISTINCT(CONCAT(uid, hostname))) FROM {accesslog}

The first is grouping accesses made by anonymous users from the same IP but separating same users accessing the site from different IP.
This statistic is named "Top visitors"... but well it is mixing and matching potatoes with apples.
Furthermore we've the "ban" link... but we aren't banning the user... rather its IP...

Produce meaningful an unambiguous stats and improve usability will get rid of the need of a CAST.

I sincerely won't miss the "Top visitors" page as it is.
Count "Top users" and/or "Top IP" or get rid off that page and the CAST/CONCAT problem as well.

You can't know what are the most active users. If you order by user... you don't have aggregated stats since they are separated by IP AND they won't be ordered by "activity".
Same if you order by IP...
Furthermore... banning IP from that page is not really practical, since an annoying IP may not be among the 30 used and there is no "search IP" feature... etc... etc... etc...

This part of the code is just an hindrance for something better that may have been already solved in contrib and it is dirt at many levels.

Josh Waihi’s picture

Drupal 7 is feature freeze. This means we cannot add OR remove features such as Top Visitors for Drupal 7. Let create a workaround patch. If we can get away with not using CAST (even if it means writing some more PHP) thats would be great. This isn't a page that gets high load or requested frequently so resource conservation isn't a huge deal here.

Once we have a fix for Drupal 7, I think Ivan has a vaild reason to request removal of this in Drupal 8.

ivanSB@drupal.org’s picture

In core "Top visitors" is the only place where that mess is made. No other place in core relies on that mess.
No contrib relies on accesslog (checked downloading all contrib) and statistics doesn't provide an API.

Wouldn't it be better to fix that mess and avoid delaying for more than one year getting rid of stuff that make working with pg a pain and promote bad habits?

Can anyone come up with a real use case for the current "Top visitors"?
Keeping this code means it has to be maintained.

No one is actually going to ban "Top IP" at PHP level.
etc... etc... etc...

That stat is a shame not a feature... it is so '90.

A lot of stats are better made outside drupal. The ones we should take care of are the one related to users and sessions where a log analyzer won't work.
There is just one place where accesslog.sid is used in statistict (hook_ranking) (BTW there is another CAST there).
So... maybe making that page a "Top users" now and taking care of pages visited before login in D8?

bellHead’s picture

Status: Active » Needs review
FileSize
975 bytes

Now with 100% less CAST.

The patch reformats the query to do a COUNT on a DISTINCT subquery, so no CONCAT, so no CAST. Works for mysql 5.1, postgres 8.3. Performance looks marginally better on my small local dataset.

Crell’s picture

The two addField() calls can be collapsed into a single ->fields('accesslog', array('uid', 'hostname')) call.

bellHead’s picture

FileSize
919 bytes

Cool, thanks for the tip. The OO approach to DB is something I'm still having some trouble adapting to.

Updated patch attached.

Josh Waihi’s picture

Status: Needs review » Reviewed & tested by the community

bot looks happy :http://qa.drupal.org/pifr/test/26648

I'm happy to.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

bellHead’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Needs review
FileSize
1012 bytes

Patch in the same vein for 6.x

Dave Reid’s picture

Seems to work on both PosgreSQL and MySQL. Someone needs to confirm that the paging works correctly.

suzanne.aldrich’s picture

Status: Needs review » Reviewed & tested by the community

At first when I started reading this issue, I got my usual annoyance at a pgsql bug that's over 2 years old. But after reading all the commentary, I actually learned something new about SQL, patch review, and drupal community development standards. Kudos to all of you for keeping up with these bugs and smashing them the right way. I'm really looking forward to DrupalCon SF 2010 and meeting some of you excellent people.

I just applied bellHead's #48 no cast patch for 6, and on a site with 13 pages worth of visitors. Everything seemed to come up fine, with separate rows for users with different ips, and the columns sorted their data fine, although I'm not quite sure what the point of the "Visitor" column is. I paged around and all the data seemed to be in the right place.

I think ivanSB's comments in #40 and #42 are right on - this report has a very limited use and would be much better if it were divided into Users and Sessions.

Gábor Hojtsy’s picture

Priority: Critical » Normal
Status: Reviewed & tested by the community » Fixed

Uh, the top visitors list not paginating correctly is really not a critical issue in my eyes. Anyway, since it was already marked as such, I've committed the fix to Drupal 6, thanks!

Status: Fixed » Closed (fixed)
Issue tags: -PostgreSQL Surge, -PostgreSQL Code Sprint

Automatically closed -- issue fixed for 2 weeks with no activity.