Column type of int_unsigned causing pdoexception

jhedstrom - November 26, 2008 - 22:13
Project:Drupal
Version:7.x-dev
Component:database system
Category:bug report
Priority:normal
Assigned:Josh Waihi
Status:closed
Issue tags:PostgreSQL Surge
Description

I discovered this issue while trying to figure out why #330633: Temporary file cleanup needs some love (UnitTest included!) was breaking the pgsql intall. I traced the error down to the fact that the files timestamp column is of type int_unsigned, and as such, this query:

SELECT fid FROM {files} WHERE status & :permanent != :permanent AND timestamp < :timestamp

causes this error. While, after changing the column type to integer (similar to the created column in the nodes table), the problem goes away.

There's a few issues here. Does the timestamp column in the files table needs to be specified as unsigned? If not, changing that will fix the immediate problem in #330633: Temporary file cleanup needs some love (UnitTest included!). The larger issue though, is that setting an unsigned flag for a column is causing PDOExceptions...

#1

Damien Tournoud - November 26, 2008 - 22:33
Title:pgsql: column type of int_unsigned causing pdoexception» PostgreSQL surge #5: column type of int_unsigned causing pdoexception

Interesting. Promoted to the PostgreSQL surge.

#2

Josh Waihi - December 2, 2008 - 01:28
Assigned to:Anonymous» Josh Waihi

as talked about in IRC with Crell, webchick and Damz, since it is a PostgreSQL API issue related with PDO, the best solution is to not use unsigned. this means there will be a loss of 1bit used to determine a positive or negative value - its kind of a bug anyway to use an unsigned column as a timestamp.
to make up for this bit, we'll make int -> bigint instead.

patch on its way, will put it through simpletests first

#3

Josh Waihi - December 4, 2008 - 03:46

holding off making a patch until unit tests pass first.

#4

linuxpoet - December 5, 2008 - 02:29

Guys make this easy on yourself. Change the unsigned to a bigint and use a check constraint. Alternatively use a domain but check constraint should be faster.

#5

holgerjakobs - December 5, 2008 - 14:47

a timestamp should not be any integer type, but type timestamp, which is available in all standard conforming databases and allows calculating, comparison and validity checking.

#6

Crell - December 5, 2008 - 18:31

@#5, sadly it does not exist in a standard way across all databases we wish to support. The date calculation functions are inconsistent, and timezone handling is horrid and not commonly supported.

We'd love to use real timestamps, if they didn't suck so much. :-)

@linuxpoet: Do you mean across all of Drupal, or just convert in the postgres driver (the current plan)? A patch for the latter would be awesome. Thanks. :-)

#7

jhedstrom - December 12, 2008 - 01:07
Status:active» needs review

Attached is a patch that converts 'unsigned' types to a simple check constraint. The logic was already there, but only being applied for columns of type serial.

AttachmentSizeStatusTest resultOperations
339588.patch727 bytesIdlePassed: 7991 passes, 0 fails, 0 exceptionsView details | Re-test

#8

Damien Tournoud - December 12, 2008 - 01:45

@jhedstrom: what does that means in term of data size limit?

#9

jhedstrom - December 12, 2008 - 01:51
Status:needs review» needs work

Good call Damien...I forgot about that extra bit of storage saved on unsigned types. We can stick with the suggestion in #4, and change them all to bigint. If that makes sense, I can try and roll a patch for that.

#10

Damien Tournoud - December 12, 2008 - 01:53

@jhedstrom: I'm not even sure about that. In SQL, int(10) means "a ten-figure-long integer". So maybe it doesn't even matter. I would just like to have a confirmation.

#11

jhedstrom - December 12, 2008 - 01:56

What if we kept the type the same, and simply bumped the storage limit up by one in the case of unsigned? For example, unsigned int(10) would become int(11) with the check constraint.

#12

jhedstrom - December 12, 2008 - 01:57

Wait, nvm, I think that missed the point...essentially I don't know enough about the internals to give the confirmation you're looking for.

#13

Josh Waihi - December 12, 2008 - 03:11

there is another small issue here, as a signed int, the max number that can be stored is 2147483647, see @Shiny's post http://coffee.geek.nz/node/19437 and PHP will round down any number over that value! so even if we made the column a big int, which maybe we should, PHP would still round down unless its a string.

#14

Josh Waihi - December 12, 2008 - 03:12

php will *silent* change the int to 2147483647 if it's greater than that, and used in sprintf('%d')

#15

Josh Waihi - December 12, 2008 - 06:13

ok did anyone see this?: http://drupal.pastebin.com/f7e89895
that comes from system.install function system_install. no wonder PDO throughs the exception, it doesn't know about the datatype because natively it doesn't exist. and the term int_unsigned is false anyway because the column still is signed. postgres shouldn't be pushed around like that just to fit in with mysql.

since postgres doesn't natively support unsigned integers, I propose that we remove the 'CREATE DOMAIN ....' parts at least and let postgres just use int.
int means the highest number you can have is 2147483647 which is half of mysql's unsigned int so to combat that we can default unsigned integers to bigint which uses 8 bytes instead of 4 which means the highest number becomes 17179869176 ( I think) which is better.

AttachmentSizeStatusTest resultOperations
remove-pgsql-unsigned_int.patch2.92 KBIdlePassed: 8010 passes, 0 fails, 0 exceptionsView details | Re-test

#16

Damien Tournoud - December 12, 2008 - 08:08

Ok, agreed. So let's default:

  • unsigned smallint to int, CHECK > 0
  • unsigned int to bigint, CHECK > 0
  • and unsigned bigint to bigint, CHECK > 0

About the latter: MySQL does all bigint arithmetics in a signed manner. And adds: "you should not use unsigned big integers larger than 9223372036854775807 (63 bits) except with bit functions!". So nobody will notice.

About the patch:

<?php
      
if ($spec['type'] == 'serial') {
        
$sql .= " CHECK ($name >= 0)";
       }
-      else {
-       
$sql .= '_unsigned';
-      }
?>

I think you want to remove the if statement.

<?php
-    // Create unsigned types.
-    if (!db_result(db_query("SELECT COUNT(*) FROM pg_constraint WHERE conname = 'int_unsigned_check'"))) {
-     
db_query("CREATE DOMAIN int_unsigned integer CHECK (VALUE >= 0)");
-    }
-    if (!
db_result(db_query("SELECT COUNT(*) FROM pg_constraint WHERE conname = 'smallint_unsigned_check'"))) {
-     
db_query("CREATE DOMAIN smallint_unsigned smallint CHECK (VALUE >= 0)");
-    }
-    if (!
db_result(db_query("SELECT COUNT(*) FROM pg_constraint WHERE conname = 'bigint_unsigned_check'"))) {
-     
db_query("CREATE DOMAIN bigint_unsigned bigint CHECK (VALUE >= 0)");
-    }
-
?>

Good... but we will also need an upgrade path. We will need to migrate those *_unsigned columns to their proper native types. And drop those domains.

#17

jhedstrom - December 12, 2008 - 20:01

Below is a query that will find all the fields using the domains to be dropped and altered to standard postgres types.

This query works on PG 8.2 and up, but it's been a long time since I've used any version less than that, so further testing/verification should be done on supported PG versions before it makes it into an update script.

SELECT c.relname AS table, a.attname AS field, pg_catalog.format_type(a.atttypid, a.atttypmod) AS type FROM pg_catalog.pg_attribute a LEFT JOIN pg_class c ON (c.oid =  a.attrelid) WHERE pg_catalog.pg_table_is_visible(c.oid) AND c.relkind = 'r' AND pg_catalog.format_type(a.atttypid, a.atttypmod) LIKE '%unsigned%';

I won't have time to add this to the patch until sometime next week, but figured I would post it here if anybody else wanted to jump on this.

#18

catch - December 12, 2008 - 20:38

We don't support postgres below 8.3, so no need to test below that.

#19

Josh Waihi - December 12, 2008 - 23:00

@jhedstrom: done.

AttachmentSizeStatusTest resultOperations
pgsql-schema-2.patch4.14 KBIdleFailed: 7863 passes, 6 fails, 117 exceptionsView details | Re-test

#20

Josh Waihi - December 13, 2008 - 05:50
Status:needs work» needs review

#21

jhedstrom - December 15, 2008 - 17:21
Status:needs review» needs work

It occurred to me after I posted that query above, that the database schema will need to be updated for the changed column types.

http://api.drupal.org/api/group/schemaapi/7

Also, the schema could be used to find the same results as the query posted above, but, in the event of a corrupted schema (if such a thing is possible), the above query would be more inclusive.

#22

Damien Tournoud - December 15, 2008 - 17:29

@jhedstrom: I don't see any need to change the schema.

On the patch in #19:

<?php
/**
+ * remove custom datatype *_unsigned in postgres
+ */
?>

- "remove" should start with an uppercase, and the whole comment must end with a dot (please see http://drupal.org/node/1354 for more details).

- We need to wrap that function code in something like if (db_engine() == 'pgsql').

- Shouldn't we remove the old datatypes we don't use anymore?

#23

jhedstrom - December 15, 2008 - 17:46

@Damien

You're right. I forgot that, internal to the schema, this won't represent a datatype change.

#24

Josh Waihi - December 16, 2008 - 20:40

@Damien

Done. :)

AttachmentSizeStatusTest resultOperations
remove-pgsql-unsigned_int.patch4.45 KBIdlePassed: 8010 passes, 0 fails, 0 exceptionsView details | Re-test

#25

Josh Waihi - December 16, 2008 - 20:40
Status:needs work» needs review

@Damien

Done. :)

#26

CalebD - December 18, 2008 - 20:41

It looks like there is some whitespace problems in system_update_7016. T

<?php
+    }
+  
$ret[] = update_sql('ALTER TABLE {' . $row->table . '} ALTER COLUMN ' . $row->field . ' TYPE ' . $datatype);
+  
$ret[] = update_sql('ALTER TABLE {' . $row->table . '} ADD CHECK (' . $row->field . ' >= 0)');
+  }
?>

The two $ret[]s are one space too short of being align in the while loop.

#27

Josh Waihi - December 19, 2008 - 05:01

better formatting per CalebD's request

AttachmentSizeStatusTest resultOperations
remove-pgsql-unsigned_int.patch4.46 KBIdlePassed: 7991 passes, 0 fails, 0 exceptionsView details | Re-test

#28

Dries - December 19, 2008 - 15:46

Elsewhere in Drupal, we simply define timestamp as:

      'timestamp' => array(
        'description' => 'A Unix timestamp indicating when this batch was submitted for processing. Stale batches are purged at cron time.',
        'type' => 'int',
        'not null' => TRUE,
      ),

so can't we simply get rid of the 'unsigned'?

Of course, this patch would still be useful otherwise.

Also, there are some code style issues here. The following line contains various issues, for example:
//only run querys if the driver used is postgres

#29

Damien Tournoud - December 20, 2008 - 04:18
Status:needs review» needs work

<?php
-    // We create some custom types and functions using global names instead of
+    // We create some functions using global names instead of
     // prefixing them like we do with table names. If this function is ever
     // called again (for example, by the test framework when creating prefixed
     // test databases), the global names will already exist. We therefore avoid
     // trying to create them again in that case.
?>

Please properly reformat this text block.

<?php
//only run querys if the driver used is postgres
+  if (db_driver() != 'pgsql') return array();
?>

Two small issues in the above lines: (1) comments are in the form "// My sentence is properly capitalized and ends with a dot.", and (2) it would be cleaner as if (db_driver() == 'pgsql') {.

<?php
+    $ret[] = update_sql('ALTER TABLE {' . $row->table . '} ALTER COLUMN ' . $row->field . ' TYPE ' . $datatype);
+   
$ret[] = update_sql('ALTER TABLE {' . $row->table . '} ADD CHECK (' . $row->field . ' >= 0)');
?>

I think the '{}' should be removed.

#30

Josh Waihi - December 20, 2008 - 04:45
Status:needs work» needs review
AttachmentSizeStatusTest resultOperations
remove-pgsql-unsigned_int_1.patch4.69 KBIdlePassed: 8010 passes, 0 fails, 0 exceptionsView details | Re-test

#31

Dries - December 20, 2008 - 18:07
Status:needs review» needs work

There are still some code comments that are not conform. Search for "bump the".

Otherwise this looks good but I can't really test it -- no PostgreSQL installed on this machine and I'm _still_ at the airport.

#32

Josh Waihi - December 20, 2008 - 20:25
Status:needs work» needs review

fair call, I've added a bit of a comment block to explain why we do what we're doing here

AttachmentSizeStatusTest resultOperations
remove-pgsql-unsigned_int_1.patch5.03 KBIdlePassed: 7991 passes, 0 fails, 0 exceptionsView details | Re-test

#33

mikl - December 21, 2008 - 23:53

Hmm, I've looked over the patch, and I have made a few modifications to the phrasing of the comments and formatting of the SQL code, but otherwise, this looks fine. I'm currently waiting on the unit tests to run…

AttachmentSizeStatusTest resultOperations
remove-pgsql-unsigned_int_1_2.patch5.16 KBIdlePassed: 7991 passes, 0 fails, 0 exceptionsView details | Re-test

#34

keith.smith - December 22, 2008 - 00:40
Status:needs review» needs work

Please use proper case for things like "postgres" and "drupal" in code comments.

#35

mikl - December 22, 2008 - 07:38

Please use proper case for things like "postgres" and "drupal" in code comments.

I'm not entirely sure that "postgres" should be uppercase, since it's not the proper name, but I've changed it to PostgreSQL where applicable and corrected capitalisation of Drupal.

That aside, the tests still don't pass, but I don't think that it is a function of this patch, but rather that #337796: Make all tests pass on PostgreSQL hasn't been committed yet.

AttachmentSizeStatusTest resultOperations
remove-pgsql-unsigned_int_1_3.patch5.67 KBIdlePassed: 7991 passes, 0 fails, 0 exceptionsView details | Re-test

#36

Josh Waihi - December 23, 2008 - 13:32

I have run postgres unit tests with 100% pass rate - as @mikl said, other patches are required to do this. this is one of

#37

CalebD - December 23, 2008 - 21:08
Status:needs work» needs review

I do not see any other problems with this patch via visual inspection, but I'm not at home right now so I can't test on an actual PostgreSQL install.

Setting to code needs review. Will try to test it later tonight.

#38

Damien Tournoud - December 24, 2008 - 19:18
Title:PostgreSQL surge #5: column type of int_unsigned causing pdoexception» PostgreSQL surge #5: remove *_unsigned types and rework their sizes

#39

drewish - December 25, 2008 - 00:49
Title:PostgreSQL surge #5: remove *_unsigned types and rework their sizes» PostgreSQL surge #5: column type of int_unsigned causing pdoexception

I was going to try testing the upgrade path with this patch but got hamstrung by #351002: Cannot update D6 to D7 on pgsql

#40

drewish - December 25, 2008 - 00:51

whoops, restoring the title.

#41

drewish - December 25, 2008 - 03:40
Status:needs review» needs work

I think this is fundamentally sound but there's a few nitpicky things. The indenting of the break; lines in system_update_7016() is incorrect. They should be indented beyond the case statements.

Did some further testing:
* Upgraded a D6 database after using #349671: Make the postgresql driver independant of the schema and it seemed to work fine.
* Upgraded an existing D7 database.
* Tested a clean installation and it worked fine -- It also allowed me to apply the #330633: Temporary file cleanup needs some love (UnitTest included!) and get the cron tests to pass.

#43

mikl - December 25, 2008 - 08:31
Status:needs work» reviewed & tested by the community

Fixed the indendation as per #41.

AttachmentSizeStatusTest resultOperations
remove-pgsql-unsigned_int_1_4.patch5.67 KBIdleUnable to apply patch remove-pgsql-unsigned_int_1_4.patchView details | Re-test

#44

drewish - December 25, 2008 - 08:47
Status:reviewed & tested by the community» needs work

sorry, i just noticed that the breaks in includes/database/pgsql/schema.inc have the same indenting problem.

#45

Crell - December 25, 2008 - 09:12

I won't claim to be a Postgres expert, but on visual inspection #43 seems sane to me (modulo code formatting issues that drewish already seems to have under control).

#46

mikl - December 25, 2008 - 11:12
Status:needs work» reviewed & tested by the community

#44: Oh, yeah. Missed that too. Re-rolling.

AttachmentSizeStatusTest resultOperations
remove-pgsql-unsigned_int_1_5.patch5.68 KBIdlePassed: 8010 passes, 0 fails, 0 exceptionsView details | Re-test

#47

Dries - December 26, 2008 - 11:04
Status:reviewed & tested by the community» fixed

Committed to CVS HEAD. Thanks.

#48

Josh Waihi - December 26, 2008 - 22:48

thanks to mikl, drewish and Crell for keeping up the good work over chrsitmas!

#49

Damien Tournoud - January 5, 2009 - 11:26
Title:PostgreSQL surge #5: column type of int_unsigned causing pdoexception» Column type of int_unsigned causing pdoexception

#50

Josh Waihi - January 6, 2009 - 05:14
Status:fixed» closed

closing

#51

Scott_Marlowe - January 12, 2009 - 21:22

I just had to point out that int(10) does NOT mean a 10 wide integer. It has no meaning outside of MySQL, such as on postgresql where int(10) generates an error, as it should. If you want a 10 wide int, use numeric(10). In MySQL, int(10), which certainly looks to mean a 10 wide int, actually means an int that's padded on retrieval with spaces to take up 10 characters width. I.e. it's a output formatting option, not an input / storage option.

I see this mistake all the time. And it's quite understandable. It's one of the most counterintuitive type definitions in existence.

 
 

Drupal is a registered trademark of Dries Buytaert.