As reported in http://drupal.org/node/220064 and in the PostgreSQL 8.3 release notes http://www.postgresql.org/docs/8.3/static/release-8-3.html there is a problem with certain queries in PostgreSQL 8.3 due to the fact that PostgreSQL 8.3 no longer automatically casts columns to match the data type in JOIN columns or in this case the column in a CONCAT operation.

The CONCAT function created for PostgreSQL in the Drupal install

      if (!db_result(db_query("SELECT COUNT(*) FROM pg_proc WHERE proname = 'concat'"))) {
        db_query('CREATE OR REPLACE FUNCTION "concat"(text, text) RETURNS text AS
          \'SELECT $1 || $2;\'
          LANGUAGE \'sql\''
        );
      }

expects TEXT data types. In previous PostgreSQL versions a non-text column would have been automatically cast to text. This will no longer work in 8.3.

The attached patches fix one query from the statistics module where CONCAT is used. I have included a patch for 6.1 and for 5.7.

Files: 
CommentFileSizeAuthor
#48 topvisitors_nocast_d6.patch1012 bytesbellHead
#45 topvisitors_nocast_1.patch919 bytesbellHead
PASSED: [[SimpleTest]]: [MySQL] 17,388 pass(es).
[ View ]
#43 topvisitors_nocast.patch975 bytesbellHead
PASSED: [[SimpleTest]]: [MySQL] 17,377 pass(es).
[ View ]
#25 patch_trim_cast.patch1009 bytesivanSB@drupal.org
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch patch_trim_cast_0.patch.
[ View ]
#18 statistics.admin_.inc_.patch773 byteslifepillar
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch statistics.admin_.inc__0.patch.
[ View ]
#17 statistics_postgresql_D6.patch1001 bytesJosh Waihi
#13 statistics_postgresql_0.patch936 bytesDave Reid
Unable to apply patch statistics_postgresql_0_0.patch
[ View ]
#9 statistics.module_7.patch666 bytesJosh Waihi
Failed: Failed to apply patch.
[ View ]
#5 statistics.module.patch666 bytesJosh Waihi
Failed: 7117 passes, 8 fails, 12 exceptions
[ View ]
#1 statistics_postgresql_0.patch936 bytesainigma32
Unable to apply patch statistics_postgresql_0.patch
[ View ]
statistics-6.1.admin_.inc_.patch731 bytesjaydub
Failed: Failed to apply patch.
[ View ]
statistics-5.7.module.patch756 bytesjaydub
Failed: Failed to apply patch.
[ View ]

Comments

Version:6.1» 7.x-dev
Category:support» bug
StatusFileSize
new936 bytes
Unable to apply patch statistics_postgresql_0.patch
[ View ]

This issue still exists in HEAD so I rerolled the patch.

Works fine on MySQL 5.0 but this needs to be tested on PostgreSQL 8.3 and I don't have that setup at the moment.

So: Calling all Postgres fans...

- Arie

Title:Top Visitor Page problem with PostgreSQL 8.3PostgreSQL surge #8: Top Visitor Page problem with PostgreSQL 8.3

Promoted to the PostgreSQL surge.

the stats module tests are currently broken in HEAD, applying this patch didn't fix it :(

ok so tests are failing because PDO doesn't like floats on int_unsigned columns. this maybe linked with this http://drupal.org/node/339588

StatusFileSize
new666 bytes
Failed: 7117 passes, 8 fails, 12 exceptions
[ View ]

this patch fixes the what I was commenting about above.
Statistics Module: 36 passes, 0 fails, and 0 exceptions

Status:Needs review» Needs work

The last submitted patch failed testing.

Status:Needs work» Needs review

I find it hard to believe that this patch failed as all it does is type cast a float to a integer. by the looks of the testbed result, it looks rather like something went wrong

StatusFileSize
new666 bytes
Failed: Failed to apply patch.
[ View ]

re-submitting previous patch that failed because there is no reason that it should have failed

edit: posted on wrong node

updating status

Status:Needs review» Reviewed & tested by the community

Expecting the DB layer to handle float to int type coercion is not reasonable. We can expect module authors to only pass in sane data. Trivial fix, the bot likes, it, yay.

Version:7.x-dev» 6.x-dev
Status:Reviewed & tested by the community» Needs work

I've committed this to CVS HEAD but it probably needs work for Drupal 6. Updating version and status. Thanks!

Version:6.x-dev» 7.x-dev
Priority:Normal» Critical
Status:Needs work» Needs review
StatusFileSize
new936 bytes
Unable to apply patch statistics_postgresql_0_0.patch
[ View ]

Apparently, the patch in #1 was supposed to also go in but wasn't merged with the patch in #9 (sorry Dries). Right now this is an incomplete commit to HEAD, so I'm bumping to critical. Reposting patch from #1 to bring it down to the bottom for review and retesting.

Status:Needs review» Reviewed & tested by the community

I have tested this and it works as stated in comment #3-#5

Version:7.x-dev» 6.x-dev
Status:Reviewed & tested by the community» Patch (to be ported)

Committed #13 to HEAD. Marking back to needs porting for 6.x.

Title:PostgreSQL surge #8: Top Visitor Page problem with PostgreSQL 8.3Top Visitor Page problem with PostgreSQL 8.3
Issue tags:+PostgreSQL Surge

Status:Patch (to be ported)» Needs review
StatusFileSize
new1001 bytes

StatusFileSize
new773 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch statistics.admin_.inc__0.patch.
[ View ]

I think this patch does not fix the problem correctly, because in PostgreSQL 'char' is equivalent to 'char(1)', e.g., compare

mysql> SELECT CAST(57 AS CHAR);
+------------------+
| CAST(57 AS CHAR) |
+------------------+
| 57               |
+------------------+

with

psql> SELECT CAST(57 AS CHAR);
bpchar
--------
5
(1 row)

I have posted a patch that works for me here: http://drupal.org/node/412402. For convenience, I attach it to this node, too. Tested with Drupal 6.x.

Status:Needs review» Needs work

doing a sub select will slow the query down, can we cast as text instead?

josh=# SELECT CAST(57 AS TEXT);
text
------
57
(1 row)

TEXT is not supported by MySQL. In fact, I have found no way to cast an integer to a string that is compatible with both MySQL and PostgreSQL.

we could write a special query function for the search module in database.pgsql.inc and database.mysql-common.inc to cast specifically for each DB??

I did a quick test by running the two queries of the last patch above against a table with ~10^6 rows. Curiously enough, the query with the subselect in the from clause is more or less twice as fast as the other (with uid cast to TEXT) in PostgreSQL 8.3. In MySQL 5.0, instead, the query using concat() is around 10 times faster than the other.

My test is certainly very limited in scope but it suggests that, if performance is an important factor, the way to go is to optimize the query according to the DBMS, i.e., something like:

if ($db_type == 'mysql' or $db_type == 'mysqli') {
  $sql_cnt = "SELECT COUNT(DISTINCT(CONCAT(uid, hostname))) FROM {accesslog}";
elseif ($db_type == 'pgsql') {
  $sql_cnt = "SELECT COUNT(*) FROM (SELECT DISTINCT UID, HOSTNAME FROM {accesslog}) AS FOO";
}
else {
  # default query
}

Version:6.x-dev» 7.x-dev
Status:Needs work» Postponed

OK, well regardless, it seems D7 has a bug in it in regards to PostgreSQL not type-casting like MySQL does so we'll need to go back there and fix it. I've noticed there are a few Drupal casting issues about so I've opened #419858: Database Type Casting Per Database Connection to maybe remedy this issue for D7. Lets get HEAD right then port back to D6.

Postponed till #419858: Database Type Casting Per Database Connection is resolved.

Had this issue as well. Hoping it will get fixed with the next release..

StatusFileSize
new1009 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch patch_trim_cast_0.patch.
[ View ]

trim(cast(10 AS char(10)))

works for postgresql and mysql

Since nothing seems changed across version it could be ported.

see #449736

Status:Postponed» Active

I suggest changing the signature of the CONCAT() function in PostgreSQL. It is currently:

CREATE OR REPLACE FUNCTION "concat"(text, text) RETURNS text AS
  \'SELECT $1 || $2;\'
  LANGUAGE \'sql\''

Let's change that to:

CREATE OR REPLACE FUNCTION "concat"(anyelement, anyelement) RETURNS text AS
  \'SELECT CAST($1 AS TEXT) || CAST($2 AS TEXT);\'
  LANGUAGE \'sql\''

I have not tested this, but I believe it can solve this issue quite elegantly: no developer working on MySQL will figure out that we need to explicitely TRIM(CAST()) every whenever calling CONCAT() on non character columns.

As discussed in IRC, @DamZ's function alteration won't work as anyelement parameters must be of the same datatype, however we can define a family of functions with the same name 'concat' that handle various datatypes as arguments:

CREATE OR REPLACE FUNCTION "concat"(text, int) RETURNS text AS 'SELECT $1 || $2::text;' LANGUAGE 'sql';
CREATE FUNCTION "concat"(int, text) RETURNS text AS 'SELECT $1::text || $2;' LANGUAGE 'sql';

We'd need to add a function definition for every datatype possibility - well, the datatypes we use in the schema api.

Solution start to look more complicated than the problem and pretty hakish.
After all... CONCATenating int and text (and CASTing) shouldn't be something doesn't look something that shine for good design.

I see adding tons of CONCAT just to overcome this as polluting the code.

This is not just a performance problem. This is going to break a postgresql feature: stricter data type checking.
Better getting rid of the casts. Or when there isn't any viable solution use a "local" trick as

trim(cast(10 AS char(10)))

Do we really need to write that query with that CAST?
I know it is an historical query... but well that stat doesn't shine. It is mixing and splitting anonymous users and IP.
If I want to know what are the top visitors I can't. The top visitor may have several entries depending on the IP. If I want to know what is the top IP... I can't. If I want to know how many "distinct users I can't.

BTW this query looks the only one using CAST in core.

Time to get this in (if not already). Marking for PostgreSQL Code Sprint

subscribing

I think you can define the implicit cast in 8.3, although I haven't tried it yet (rebuilding my test system):

http://www.postgresql.org/docs/8.3/static/sql-createcast.html

Yes, you can redefine the casts, but to do so you also need to redefine a bunch of old functions and you don't get a lot of benefit. If SQL rewriting was still a concern for D7, we could rewrite all concat() functions to use the || operator, which works as long as one input is text. That said, it's probably best to either abstract that in the SelectQuery interface somehow, or just nix any old queries using the concat() function.

Marked #544848: admin/reports/visitors as a duplicate of this issue.

I think it make sense to add implict numeric-to-string casts to PostgreSQL.

Issue tags:+Release blocker

Adding release blocker tag for D7 release.

subscribe

Issue tags:-Release blocker

Critical bugs are already release blockers, removing tag.

I'd like to investigate #28 a little more. Maybe we can re-write the query? CASTing is dirty anyway and a performance issue. It's more important for PostgreSQL not to CAST that it is for MySQL because, IMO, Drupal/MySQL sites are generally smaller sites compared to Drupal/PostgreSQL sites. When the database tables get bigger, CASTing becomes a bigger issue.

Alternately, while it maybe a bitch to maintain, specifying all the possible data types to concat with in function definitions maybe a better solution.

Can I get some thoughts on the topic?

Combining the ideas of #26 and #27 may give an easier to implement and manage answer.

The problem with anyelement parameter types is that they have to match, but I'd expect at least of the parameters for CONCATing will be text. So an answer (admittedly a pragmatic rather than perfect one) would be

CREATE OR REPLACE FUNCTION concat(anyelement, text) RETURNS text AS 'SELECT $1::text || $2;' LANGUAGE 'sql';
CREATE OR REPLACE FUNCTION concat(text, anyelement) RETURNS text AS 'SELECT $1 || $2::text;' LANGUAGE 'sql';

Which will allow anything to be CONCATed with text or one of it's implicit casts like varchar and char.

On the question of CAST and the like being dirty, I agree. It smells of presentation logic in the database. I'm not familiar enough with the code base or developer community to know if it's realistic to expect the problem to be eliminated that way though.

The CAST is really an alarm bell for something wrong at a higher abstraction level.

The queries in D6 parlage just to make the SQL code cleaner to read are:

SELECT COUNT(a.uid) AS hits, a.uid, u.name, a.hostname, SUM(a.timer) AS total, ac.aid FROM {accesslog} a LEFT JOIN {access} ac ON ac.type = 'host' AND LOWER(a.hostname) LIKE (ac.mask) LEFT JOIN {users} u ON a.uid = u.uid GROUP BY a.hostname, a.uid, u.name, ac.aid

and

SELECT COUNT(DISTINCT(CONCAT(uid, hostname))) FROM {accesslog}

The first is grouping accesses made by anonymous users from the same IP but separating same users accessing the site from different IP.
This statistic is named "Top visitors"... but well it is mixing and matching potatoes with apples.
Furthermore we've the "ban" link... but we aren't banning the user... rather its IP...

Produce meaningful an unambiguous stats and improve usability will get rid of the need of a CAST.

I sincerely won't miss the "Top visitors" page as it is.
Count "Top users" and/or "Top IP" or get rid off that page and the CAST/CONCAT problem as well.

You can't know what are the most active users. If you order by user... you don't have aggregated stats since they are separated by IP AND they won't be ordered by "activity".
Same if you order by IP...
Furthermore... banning IP from that page is not really practical, since an annoying IP may not be among the 30 used and there is no "search IP" feature... etc... etc... etc...

This part of the code is just an hindrance for something better that may have been already solved in contrib and it is dirt at many levels.

Drupal 7 is feature freeze. This means we cannot add OR remove features such as Top Visitors for Drupal 7. Let create a workaround patch. If we can get away with not using CAST (even if it means writing some more PHP) thats would be great. This isn't a page that gets high load or requested frequently so resource conservation isn't a huge deal here.

Once we have a fix for Drupal 7, I think Ivan has a vaild reason to request removal of this in Drupal 8.

In core "Top visitors" is the only place where that mess is made. No other place in core relies on that mess.
No contrib relies on accesslog (checked downloading all contrib) and statistics doesn't provide an API.

Wouldn't it be better to fix that mess and avoid delaying for more than one year getting rid of stuff that make working with pg a pain and promote bad habits?

Can anyone come up with a real use case for the current "Top visitors"?
Keeping this code means it has to be maintained.

No one is actually going to ban "Top IP" at PHP level.
etc... etc... etc...

That stat is a shame not a feature... it is so '90.

A lot of stats are better made outside drupal. The ones we should take care of are the one related to users and sessions where a log analyzer won't work.
There is just one place where accesslog.sid is used in statistict (hook_ranking) (BTW there is another CAST there).
So... maybe making that page a "Top users" now and taking care of pages visited before login in D8?

Status:Active» Needs review
StatusFileSize
new975 bytes
PASSED: [[SimpleTest]]: [MySQL] 17,377 pass(es).
[ View ]

Now with 100% less CAST.

The patch reformats the query to do a COUNT on a DISTINCT subquery, so no CONCAT, so no CAST. Works for mysql 5.1, postgres 8.3. Performance looks marginally better on my small local dataset.

The two addField() calls can be collapsed into a single ->fields('accesslog', array('uid', 'hostname')) call.

StatusFileSize
new919 bytes
PASSED: [[SimpleTest]]: [MySQL] 17,388 pass(es).
[ View ]

Cool, thanks for the tip. The OO approach to DB is something I'm still having some trouble adapting to.

Updated patch attached.

Status:Needs review» Reviewed & tested by the community

bot looks happy :http://qa.drupal.org/pifr/test/26648

I'm happy to.

Status:Reviewed & tested by the community» Fixed

Committed to CVS HEAD. Thanks.

Version:7.x-dev» 6.x-dev
Status:Fixed» Needs review
StatusFileSize
new1012 bytes

Patch in the same vein for 6.x

Seems to work on both PosgreSQL and MySQL. Someone needs to confirm that the paging works correctly.

Status:Needs review» Reviewed & tested by the community

At first when I started reading this issue, I got my usual annoyance at a pgsql bug that's over 2 years old. But after reading all the commentary, I actually learned something new about SQL, patch review, and drupal community development standards. Kudos to all of you for keeping up with these bugs and smashing them the right way. I'm really looking forward to DrupalCon SF 2010 and meeting some of you excellent people.

I just applied bellHead's #48 no cast patch for 6, and on a site with 13 pages worth of visitors. Everything seemed to come up fine, with separate rows for users with different ips, and the columns sorted their data fine, although I'm not quite sure what the point of the "Visitor" column is. I paged around and all the data seemed to be in the right place.

I think ivanSB's comments in #40 and #42 are right on - this report has a very limited use and would be much better if it were divided into Users and Sessions.

Priority:Critical» Normal
Status:Reviewed & tested by the community» Fixed

Uh, the top visitors list not paginating correctly is really not a critical issue in my eyes. Anyway, since it was already marked as such, I've committed the fix to Drupal 6, thanks!

Status:Fixed» Closed (fixed)
Issue tags:-PostgreSQL Surge, -PostgreSQL Code Sprint

Automatically closed -- issue fixed for 2 weeks with no activity.