Posted by plj on September 23, 2009 at 3:04pm
| Project: | Spam |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed (duplicate) |
Issue Summary
There is a PostgreSQL problem in duplicate filter's mega-SQL that searches for blacklisted IPs; it complains about missing GROUP BY fields, and won't list anything.
I've attached a patch that fixes that, and also includes another fix, where Spam attempts to update spam_tracker as if content_id were a string ('%s' modifier), even though it is an integer. This causes an “invalid input syntax for integer” SQL error on Postgres, if the variable is empty.
| Attachment | Size |
|---|---|
| spam_psql.patch | 2.47 KB |
Comments
#1
(note: edited from my original for clarity/readability)
Thank you for your patch.
----
On the second fix, shouldn't an error be coming back if content_id is empty? Or, to ask another way, when is it the case that it is valid for content_id to be empty? Maybe my memory is rusty on the code.
----
On the first fix, you point out a SQL statement that may have seriously needed some looking at. Thank you.
I think, however, your fix doesn't work -- it gets rid of the error, but it breaks the purpose of the line:
sql = "SELECT * FROM (SELECT DISTINCT x.hostname, x.timestamp, COUNT(x.hostname) AS count FROM (SELECT timestamp, hostname FROM {spam_tracker} WHERE score > %d ORDER BY timestamp DESC) AS x GROUP BY x.hostname, x.timestamp) AS y WHERE y.count >= %d";will, since the chance is nearly 0% that anything will have an identical timestamp to something else, make everything have count = 1 and thereby effectively turn off the blacklist (while slowing everything down with an extra GROUP BY that matches everything individually).
The original:
sql = "SELECT * FROM (SELECT DISTINCT x.hostname, x.timestamp, COUNT(x.hostname) AS count FROM (SELECT timestamp, hostname FROM {spam_tracker} WHERE score > %d ORDER BY timestamp DESC) AS x GROUP BY x.hostname) AS y WHERE y.count >= %d";is better fixed by adding a MAX aggregate function to the timestamp (and removing the now-redundant ORDER BY and probably never actually necessary DISTINCT clauses), fixing your problem while returning the same thing as the MySQL would (which I believe your code would not):
sql = "SELECT * FROM (SELECT x.hostname, MAX(x.timestamp), COUNT(x.hostname) AS count FROM (SELECT timestamp, hostname FROM {spam_tracker} WHERE score > %d) AS x GROUP BY x.hostname) AS y WHERE y.count >= %d";Then the inner SQL simplifies to one SELECT clause, so the final statement would be:
sql = "SELECT * FROM (SELECT hostname, MAX(timestamp), COUNT(hostname) AS count FROM {spam_tracker} WHERE score > %d GROUP BY hostname) AS y WHERE y.count >= %d";That final SQL string should work in both MySQL and Postgres, and additionally execute a good deal faster than the first version.
#2
Rerolled the patch against 6.x-1.x-dev
Query 1 should run fine. I looked the replace function up
http://api.drupal.org/api/function/_db_query_callback/6
There is an (int) cast on the variable so it should be 0 or some int. Otherwise you are correct that the query breaks. I tested that.
For Query 2 I added your SQL which looks cool. :)
Still it needs review!
#3
Wooooah. Bigger problem with the content_id string vs. integer issue.
Recall that my first question was, "why would the content_id ever be blank?" What I was getting at was -- the problem isn't that it's typed as a string. SQL will know to clobber it to an integer anyway. The problem is that there are cases where it would be blank, and it should *never* be blank -- and that's the problem that needs fixing.
Well, look at the line itself:
- db_query("UPDATE {spam_tracker} SET score = %d, hostname = %d, timestamp = %d WHERE content_type = '%s' AND content_id = '%s'", $score, time(), $type, $id);
+ db_query("UPDATE {spam_tracker} SET score = %d, hostname = %d, timestamp = %d WHERE content_type = '%s' AND content_id = %d", $score, time(), $type, $id);
I pasted both, because both the original version and your version have the same problem.
score = %d
hostname = %d
timestamp = %d
content_type = %s
content_id = %s or %d
That's five replacement variables.
And yet the replacements are: $score, time(), $type, $id.
content_id is blank because it's *not getting a value*. Ever.
Looking at it that way, it's pretty obvious that between $score and time(), you need the hostname passed. (That, and the hostname should be a string replacement, not an integer.) That'll fix your original problem.
#4
This appears to be a duplicate of two other issues:
#282062: PostgreSQL Support - table creation in .install files
#937384: spam_content_update() is broken - inconsistent hostname derivation