PostgreSQL doesn't support the UNSIGNED attribute, but it an be enforced by using the CHECK constraint.

As in example, the user_roles table in MySQL:

CREATE TABLE users_roles (
  uid int(10) unsigned NOT NULL default '0',
  rid int(10) unsigned NOT NULL default '0',
  PRIMARY KEY (uid, rid)
)

Is defined like this in PostgreSQL:

CREATE TABLE users_roles (
  uid integer NOT NULL default '0',
  rid integer NOT NULL default '0',
  PRIMARY KEY (uid, rid)
)

However, the correct way would be like this:

CREATE TABLE users_roles (
  uid integer NOT NULL default 0,
  rid integer NOT NULL default 0,
  PRIMARY KEY (uid, rid),
  CHECK (uid>=0),
  CHECK (rid>=0)
)

Otherwise, negative values could be inserted in pgsql databases. I think this method could also be of interest for contrib modules.

Also, I'm using numeric constants to define the defaults here. Not sure why the .pgsql schema delimits such constants between single quotes.

Comments

sammys’s picture

I believe in this case it would be better to use a column constraint rather than two table constraints. So the suggestion below becomes:

CREATE TABLE users_roles (
  uid integer NOT NULL default 0 CHECK (uid >= 0),
  rid integer NOT NULL default 0 CHECK (rid >= 0),
  PRIMARY KEY (uid, rid)
)

Alternatively, we can put one single table constraint:

CHECK (uid >= 0 AND rid >= 0)

I believe the second option won't give us a specific error message. IMHO, the first option and the options given in earlier posts are both acceptable.

I post this out to everyone so anyone with a comment can pipe in. Are there any other reasons to prefer one over the other?

--
Sammy Spets
Synerger Pty Ltd
http://www.synerger.com

sammys’s picture

One other advantage to using column constraints instead of table constraints in pgsql is the constraints become named according to

_ rather than being $1, $2, ... , $n
markus_petrux’s picture

Yeah, column contraints seem nicer and maybe easier to maintain (closer to the column itself). Though, I believe PostgreSQL treats them the same (when a constraint affects just one column). If would be nice someone with experience here...

For what it worths, here's the reference (2 versions):
http://www.postgresql.org/docs/8.1/static/sql-createtable.html
http://www.postgresql.org/docs/7.3/static/sql-createtable.html

As per contraint names, it seems that, if not explicitly set, PostgreSQL will generate one. How? seems to be undocumented. So, if that's needed I would simply explicitly define the constraint name. :-)

Cvbge’s picture

Hi,

IIRC it'd be the best if we named the constraint explicitely, and in the update scripts didn't use prefixes on the name, i.e. e.g. (update script)

CREATE TABLE {users_roles} (
  uid integer NOT NULL default 0 CONSTRAINT users_roles_uid_chk CHECK (uid >= 0),
...

That is because we want to know the name and we want it to work both with "normal" and "schema" based db_prefix...

sammys’s picture

We have two options for PostgreSQL. One is to create an unsigned integer type for the entire database, though that breaks SQL:2003. The other is to simply add a constraint to the field. Either/or. Adding the type will need to be system wide.

sammys’s picture

Ok... progress has been made on this front for 4.8+ and I thought I better let the world know. I've created some unsigned types that will be included in the system as of 4.8. I've considered backporting the changes, but it'll make a very big mess if 4.7 modules supporting the new types are used on an older version of core. So, from 4.8 onwards it is.

I've already patched the types into the install system and all core modules. However, i'm going to have to do it again because after some more thought it would be better to name the unsigned variants with a 'u' prefix or suffix rather than stuffing the u in the damn middle of the type name. This will make coding things like CCK a lot easier.

The unsigned types for 4.8 are going to be: uint, usmallint and ubigint.

Cheers,

Sammy Spets
Synerger
http://www.synerger.com

sammys’s picture

Ok... change that to the following types: int_unsigned, smallint_unsigned, bigint_unsigned so the syntax closer matches the MySQL type declarations.

Steve Simms’s picture

I stumbled across the database changes when fixing another bug, and sammys directed me here.

Regarding check naming constraints, they aren't defined, but here's the pattern:

- Pre 8.0: column constraints get named "[table]_[field]", and table constraints get named "$1", "$2", etc. (if there's already a column constraint with a given name, you get an error).

- 8.0 and later: both column and table check constraints get named "[table]_[field]_check", and a number gets added to the end if the name is in use. Multiple-column table check constraints get named "[table]_check".

Which is all to say that we can't rely on the name unless we set it, so if we plan on ever updating the constraints, we should name them explicitly.

Regarding table constraint vs. column constraint, my preference would be to use column constraints, for the following reasons:

- they're easy to keep track of, since they're attached to the column, and get named nicely in older PG versions as well as newer ones.

- you don't get the problem that you would have if you have several fields in a single table constraint, and want to change one of them. (e.g. dropping a column removes the entire table constraint, not just the part of it specific to that column)

Compared to creating new types, I still think we should stick with column constraints:

- new types mean a higher learning curve for anyone wanting to add PG support, since they would also need to know the Drupal-specific types in addition to the SQL standard types

- as it stands, the new unsigned types are slightly misnamed, since they have the check >= 0, but not a new upper limit (e.g. int_unsigned has only half the range of MySQL's int unsigned, or 31 bits instead of 32)

- if code and/or schemas are going to need to be updated either way to be PG-specific, it shouldn't matter a whole lot if they're adding a CHECK constraint vs. using a different type name

I'm not convinced that the benefit of having an automatic CHECK >=0 constraint outweighs the complexity cost of adding the new types. If PG actually *had* unsigned types, or some easy way to create them, that would be a different story (and I don't suppose there's any convincing of core committers to use signed datatypes in MySQL in order to be more cross-database compatible...). :-)

What do others think?

(sammys: if the consensus is to use column constraints instead of types, I'll redo the system.install file as my apology for coming in late on this.)

sammys’s picture

Thanks for posting this to the issue. As we both agreed, this needs to be out in the public domain.

Naming of constraints is a non-issue if column constraints are used and that is what we have pushed for 4.7. No need for me to go further on that topic.

Types vs column constraints is the main issue. There are three areas of contention that you've brought up.

  1. Learning curve is steeper
  2. Numerical range is smaller than unsigned types in MySQL
  3. Difference between adding a CHECK constraint and a _unsigned is moot so either can be used

Answer to 1: I totally disagree with your suggestion the learning curve will be more difficult. A person looking at the schema will see immediately from the column type that the type is unsigned. If they are a noob they'll take it that pgsql has that type built in. If they are an experienced DB person, they'll know that the type is not built in and it has been added somehow. Should the person fall somewhere in between those two, heaven help them for they require some sleep/vitamins/something. From an admin (and module developer) perspective it makes no difference if code they write uses the system-wide domains or if they use column check constraints.

Answer to 2: You're absolutely right that the range of the unsigned types in pgsql are half that of mysql. However, let's think about this realistically. What percentage of sites actually reach 2 billion nodes? Drupal.org is currently sitting on 70000. If they reach 2 billion (or foresee reaching that amount at installation time) then they'll need to adjust the column to a bigint_unsigned. A process that won't be required on every table either. Perhaps only the node tables, watchdog or accesslog. Given that heavy use sites see 50000 hits a day. 2 billion entries into access log will take 109 years to complete. I think concern about the numerical range of 31 bits compared with 32 bits is a little misplaced.

Answer to 3: You're also right that the difference between adding _unsigned and CHECK (fname >= 0) is negligible. However, i've seen some people write in CHECK (fname > 0) in their constraints. This is unwanted and if we can reduce the possibility of error then it's time saved for code maintainers. As I mentioned before, those wanting to use CHECK constraints are still capable of doing so without any troubles.

My motivations for the type definitions were to minimize the changes required between mysql and pgsql schema definitions and to prevent errors caused by mistyped constraints.

If anyone else cares to voice their concerns with the changes made to PostgreSQL in core. Now is the time to do it! Once the code freeze is established it will all be gospel.

Steve Simms’s picture

Ok, I guess we can chalk this up to a difference of opinion. I'm curious what others think.

On a related note, I didn't include this in my earlier post because it was already long enough, but what's the motivation for including the CHECK >= 0 in the first place? Is there any reason why negative values would be a problem? If not, just leaving it as integer instead of either "int_unsigned" or "integer check >= 0" would be a whole lot simpler.

sammys’s picture

The importance is data integrity. It's about time we started using PostgreSQL properly to enforce values to be what we want them to be. Referential integrity (then hopefully transactions) will probably be our next target system-wide (if we ever get around to it). It'll prevent broken records from being added.

sammys’s picture

Version: x.y.z » 5.1
Assigned: Unassigned » sammys
Status: Active » Closed (fixed)