Column type of int_unsigned causing pdoexception
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | database system |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Josh Waihi |
| Status: | closed |
| Issue tags: | PostgreSQL Surge |
I discovered this issue while trying to figure out why #330633: Temporary file cleanup needs some love (UnitTest included!) was breaking the pgsql intall. I traced the error down to the fact that the files timestamp column is of type int_unsigned, and as such, this query:
SELECT fid FROM {files} WHERE status & :permanent != :permanent AND timestamp < :timestamp
causes this error. While, after changing the column type to integer (similar to the created column in the nodes table), the problem goes away.
There's a few issues here. Does the timestamp column in the files table needs to be specified as unsigned? If not, changing that will fix the immediate problem in #330633: Temporary file cleanup needs some love (UnitTest included!). The larger issue though, is that setting an unsigned flag for a column is causing PDOExceptions...

#1
Interesting. Promoted to the PostgreSQL surge.
#2
as talked about in IRC with Crell, webchick and Damz, since it is a PostgreSQL API issue related with PDO, the best solution is to not use unsigned. this means there will be a loss of 1bit used to determine a positive or negative value - its kind of a bug anyway to use an unsigned column as a timestamp.
to make up for this bit, we'll make int -> bigint instead.
patch on its way, will put it through simpletests first
#3
holding off making a patch until unit tests pass first.
#4
Guys make this easy on yourself. Change the unsigned to a bigint and use a check constraint. Alternatively use a domain but check constraint should be faster.
#5
a timestamp should not be any integer type, but type timestamp, which is available in all standard conforming databases and allows calculating, comparison and validity checking.
#6
@#5, sadly it does not exist in a standard way across all databases we wish to support. The date calculation functions are inconsistent, and timezone handling is horrid and not commonly supported.
We'd love to use real timestamps, if they didn't suck so much. :-)
@linuxpoet: Do you mean across all of Drupal, or just convert in the postgres driver (the current plan)? A patch for the latter would be awesome. Thanks. :-)
#7
Attached is a patch that converts 'unsigned' types to a simple check constraint. The logic was already there, but only being applied for columns of type serial.
#8
@jhedstrom: what does that means in term of data size limit?
#9
Good call Damien...I forgot about that extra bit of storage saved on unsigned types. We can stick with the suggestion in #4, and change them all to bigint. If that makes sense, I can try and roll a patch for that.
#10
@jhedstrom: I'm not even sure about that. In SQL, int(10) means "a ten-figure-long integer". So maybe it doesn't even matter. I would just like to have a confirmation.
#11
What if we kept the type the same, and simply bumped the storage limit up by one in the case of unsigned? For example, unsigned int(10) would become int(11) with the check constraint.
#12
Wait, nvm, I think that missed the point...essentially I don't know enough about the internals to give the confirmation you're looking for.
#13
there is another small issue here, as a signed int, the max number that can be stored is 2147483647, see @Shiny's post http://coffee.geek.nz/node/19437 and PHP will round down any number over that value! so even if we made the column a big int, which maybe we should, PHP would still round down unless its a string.
#14
php will *silent* change the int to 2147483647 if it's greater than that, and used in sprintf('%d')
#15
ok did anyone see this?: http://drupal.pastebin.com/f7e89895
that comes from system.install function system_install. no wonder PDO throughs the exception, it doesn't know about the datatype because natively it doesn't exist. and the term int_unsigned is false anyway because the column still is signed. postgres shouldn't be pushed around like that just to fit in with mysql.
since postgres doesn't natively support unsigned integers, I propose that we remove the 'CREATE DOMAIN ....' parts at least and let postgres just use int.
int means the highest number you can have is 2147483647 which is half of mysql's unsigned int so to combat that we can default unsigned integers to bigint which uses 8 bytes instead of 4 which means the highest number becomes 17179869176 ( I think) which is better.
#16
Ok, agreed. So let's default:
int, CHECK > 0bigint, CHECK > 0bigint, CHECK > 0About the latter: MySQL does all bigint arithmetics in a signed manner. And adds: "you should not use unsigned big integers larger than 9223372036854775807 (63 bits) except with bit functions!". So nobody will notice.
About the patch:
<?phpif ($spec['type'] == 'serial') {
$sql .= " CHECK ($name >= 0)";
}
- else {
- $sql .= '_unsigned';
- }
?>
I think you want to remove the if statement.
<?php- // Create unsigned types.
- if (!db_result(db_query("SELECT COUNT(*) FROM pg_constraint WHERE conname = 'int_unsigned_check'"))) {
- db_query("CREATE DOMAIN int_unsigned integer CHECK (VALUE >= 0)");
- }
- if (!db_result(db_query("SELECT COUNT(*) FROM pg_constraint WHERE conname = 'smallint_unsigned_check'"))) {
- db_query("CREATE DOMAIN smallint_unsigned smallint CHECK (VALUE >= 0)");
- }
- if (!db_result(db_query("SELECT COUNT(*) FROM pg_constraint WHERE conname = 'bigint_unsigned_check'"))) {
- db_query("CREATE DOMAIN bigint_unsigned bigint CHECK (VALUE >= 0)");
- }
-
?>
Good... but we will also need an upgrade path. We will need to migrate those *_unsigned columns to their proper native types. And drop those domains.
#17
Below is a query that will find all the fields using the domains to be dropped and altered to standard postgres types.
This query works on PG 8.2 and up, but it's been a long time since I've used any version less than that, so further testing/verification should be done on supported PG versions before it makes it into an update script.
SELECT c.relname AS table, a.attname AS field, pg_catalog.format_type(a.atttypid, a.atttypmod) AS type FROM pg_catalog.pg_attribute a LEFT JOIN pg_class c ON (c.oid = a.attrelid) WHERE pg_catalog.pg_table_is_visible(c.oid) AND c.relkind = 'r' AND pg_catalog.format_type(a.atttypid, a.atttypmod) LIKE '%unsigned%';I won't have time to add this to the patch until sometime next week, but figured I would post it here if anybody else wanted to jump on this.
#18
We don't support postgres below 8.3, so no need to test below that.
#19
@jhedstrom: done.
#20
#21
It occurred to me after I posted that query above, that the database schema will need to be updated for the changed column types.
http://api.drupal.org/api/group/schemaapi/7
Also, the schema could be used to find the same results as the query posted above, but, in the event of a corrupted schema (if such a thing is possible), the above query would be more inclusive.
#22
@jhedstrom: I don't see any need to change the schema.
On the patch in #19:
<?php/**
+ * remove custom datatype *_unsigned in postgres
+ */
?>
- "remove" should start with an uppercase, and the whole comment must end with a dot (please see http://drupal.org/node/1354 for more details).
- We need to wrap that function code in something like
if (db_engine() == 'pgsql').- Shouldn't we remove the old datatypes we don't use anymore?
#23
@Damien
You're right. I forgot that, internal to the schema, this won't represent a datatype change.
#24
@Damien
Done. :)
#25
@Damien
Done. :)
#26
It looks like there is some whitespace problems in system_update_7016. T
<?php+ }
+ $ret[] = update_sql('ALTER TABLE {' . $row->table . '} ALTER COLUMN ' . $row->field . ' TYPE ' . $datatype);
+ $ret[] = update_sql('ALTER TABLE {' . $row->table . '} ADD CHECK (' . $row->field . ' >= 0)');
+ }
?>
The two $ret[]s are one space too short of being align in the while loop.
#27
better formatting per CalebD's request
#28
Elsewhere in Drupal, we simply define timestamp as:
'timestamp' => array('description' => 'A Unix timestamp indicating when this batch was submitted for processing. Stale batches are purged at cron time.',
'type' => 'int',
'not null' => TRUE,
),
so can't we simply get rid of the 'unsigned'?
Of course, this patch would still be useful otherwise.
Also, there are some code style issues here. The following line contains various issues, for example:
//only run querys if the driver used is postgres#29
<?php- // We create some custom types and functions using global names instead of
+ // We create some functions using global names instead of
// prefixing them like we do with table names. If this function is ever
// called again (for example, by the test framework when creating prefixed
// test databases), the global names will already exist. We therefore avoid
// trying to create them again in that case.
?>
Please properly reformat this text block.
<?php+ //only run querys if the driver used is postgres
+ if (db_driver() != 'pgsql') return array();
?>
Two small issues in the above lines: (1) comments are in the form "// My sentence is properly capitalized and ends with a dot.", and (2) it would be cleaner as
if (db_driver() == 'pgsql') {.<?php+ $ret[] = update_sql('ALTER TABLE {' . $row->table . '} ALTER COLUMN ' . $row->field . ' TYPE ' . $datatype);
+ $ret[] = update_sql('ALTER TABLE {' . $row->table . '} ADD CHECK (' . $row->field . ' >= 0)');
?>
I think the '{}' should be removed.
#30
#31
There are still some code comments that are not conform. Search for "bump the".
Otherwise this looks good but I can't really test it -- no PostgreSQL installed on this machine and I'm _still_ at the airport.
#32
fair call, I've added a bit of a comment block to explain why we do what we're doing here
#33
Hmm, I've looked over the patch, and I have made a few modifications to the phrasing of the comments and formatting of the SQL code, but otherwise, this looks fine. I'm currently waiting on the unit tests to run…
#34
Please use proper case for things like "postgres" and "drupal" in code comments.
#35
I'm not entirely sure that "postgres" should be uppercase, since it's not the proper name, but I've changed it to PostgreSQL where applicable and corrected capitalisation of Drupal.
That aside, the tests still don't pass, but I don't think that it is a function of this patch, but rather that #337796: Make all tests pass on PostgreSQL hasn't been committed yet.
#36
I have run postgres unit tests with 100% pass rate - as @mikl said, other patches are required to do this. this is one of
#37
I do not see any other problems with this patch via visual inspection, but I'm not at home right now so I can't test on an actual PostgreSQL install.
Setting to code needs review. Will try to test it later tonight.
#38
#39
I was going to try testing the upgrade path with this patch but got hamstrung by #351002: Cannot update D6 to D7 on pgsql
#40
whoops, restoring the title.
#41
I think this is fundamentally sound but there's a few nitpicky things. The indenting of the
break;lines in system_update_7016() is incorrect. They should be indented beyond thecasestatements.Did some further testing:
* Upgraded a D6 database after using #349671: Make the postgresql driver independant of the schema and it seemed to work fine.
* Upgraded an existing D7 database.
* Tested a clean installation and it worked fine -- It also allowed me to apply the #330633: Temporary file cleanup needs some love (UnitTest included!) and get the cron tests to pass.
#42
Both #330633: Temporary file cleanup needs some love (UnitTest included!) and #341910: file_space_used() not checking properly checking bitmapped status value (adds unit tests) are marked as postponed waiting on this.
#43
Fixed the indendation as per #41.
#44
sorry, i just noticed that the breaks in includes/database/pgsql/schema.inc have the same indenting problem.
#45
I won't claim to be a Postgres expert, but on visual inspection #43 seems sane to me (modulo code formatting issues that drewish already seems to have under control).
#46
#44: Oh, yeah. Missed that too. Re-rolling.
#47
Committed to CVS HEAD. Thanks.
#48
thanks to mikl, drewish and Crell for keeping up the good work over chrsitmas!
#49
#50
closing
#51
I just had to point out that int(10) does NOT mean a 10 wide integer. It has no meaning outside of MySQL, such as on postgresql where int(10) generates an error, as it should. If you want a 10 wide int, use numeric(10). In MySQL, int(10), which certainly looks to mean a 10 wide int, actually means an int that's padded on retrieval with spaces to take up 10 characters width. I.e. it's a output formatting option, not an input / storage option.
I see this mistake all the time. And it's quite understandable. It's one of the most counterintuitive type definitions in existence.