currently (from nightly development snapshot from HEAD on 2007-05-30), the following fields are defined as text:big:

dc:/tmp/drupal-6.x-dev# find . -type f -name '*.schema' | xargs fgrep -nH 'text' | grep 'big'
./modules/search/search.schema:9:      'data' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big')
./modules/user/user.schema:30:      'perm' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/user/user.schema:65:      'data'      => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/node/node.schema:72:      'body'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:73:      'teaser'    => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74:      'log'       => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/drupal/drupal.schema:11:      'slogan'  => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/drupal/drupal.schema:12:      'mission' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/system/system.schema:10:      'batch'     => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/system/system.schema:153:      'session'   => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/system/system.schema:194:      'value'    => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/dblog/dblog.schema:10:      'message'   => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/dblog/dblog.schema:11:      'variables' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/block/block.schema:39:      'body'   => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/comment/comment.schema:12:      'comment'   => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/comment/comment.schema:19:      'users'     => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:41:      'image'       => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:9:      'recipients' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:10:      'reply'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),

as mysql driver map text:big as longtext, which support for maximum 4GB size of data, it seems not reasonable to preform any comparison with such fields.

on the other hand, since some fields are really required to preform some testing within SQL directly, i will suggest change them into text:medium, or even smaller. so developer can simply guess that text:big should ONLY use as storage, but not for any SQL comparison.

finally, as oracle only provide VARCHAR2: maximum 4000 characters, can compare, and CLOB: maximum 4GB, can't compare, oracle driver will only map text:big as CLOB for maximum flexibly of sizing, and keep others as VARCHAR2(4000) for maximum compatibility. only keep required fields as text:big will greatly increase cross DB compatibility.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

hswong3i’s picture

from my point of view, at least the following fields should keep as text:big, as we are not expected to preform any direct action to it, in whatever SQL or PHP level:

./modules/node/node.schema:72:      'body'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:73:      'teaser'    => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74:      'log'       => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/block/block.schema:39:      'body'   => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
chx’s picture

Havng big is utterly pointless, medium is already 16M, can you imagine what would happen with poor PHP when you load a node which has a body with more (lot more) than 16M byte?

hswong3i’s picture

16M or not it not the only consideration: i am trying to mark text:big as a "storage only" type, so whenever developer trying to use it, keep this in mind: don't do comparison to it.

this clear instruction can increase the usability of code, and also decrease the difficulty of debug. as mentioned in #2, text:medium is totally enough for daily usage.

just feel free to let text:big act as symbolic label, set up a suitable "rule of game", and make use of it ;)

hswong3i’s picture

Status: Needs work » Needs review
FileSize
7.06 KB

this patch change some fields from text:big into text:medium in a simple reason: they are not target as "storage only". after apply this patch, will only left the following fields as text:big, which also means not expected to take any comparison toward it:

dc:/tmp/drupal-6.x-dev# find . -type f -name '*.schema' | xargs fgrep -nH 'text' | grep 'big'
./modules/node/node.schema:72:      'body'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:73:      'teaser'    => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74:      'log'       => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/drupal/drupal.schema:11:      'slogan'  => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/drupal/drupal.schema:12:      'mission' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/block/block.schema:39:      'body'   => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/comment/comment.schema:12:      'comment'   => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:10:      'reply'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
Dries’s picture

What is the problem with 'big' and 'Oracle'? I'd suspect they go hand in hand. :)

Care to shed some light on why this is a problem? I'd like to learn a bit more about this/Oracle.

hswong3i’s picture

finally, as oracle only provide VARCHAR2: maximum 4000 characters, can compare, and CLOB: maximum 4GB, can't compare, oracle driver will only map text:big as CLOB for maximum flexibly of sizing, and keep others as VARCHAR2(4000) for maximum compatibility. only keep required fields as text:big will greatly increase cross DB compatibility.

on the other hand, a lot of additional handling are required for CLOB (http://php.net/oci8). you can seems oracle CLOB as a completely different data type from daily used text type under mysql: mysql seems longtext and text as the same, and so you will even able to run GROUP BY and ORDER BY with it, even it is in 4GB size!

BTW it is also seems reasonable: if you are going to do some SQL comparison, fine, use VARCHAR2 and take action within reasonable size for performance consideration; if you need space, fine, i give CLOB to you, but don't ask for any others ;(

and the truth is: i hate oracle's string handling!

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

seems reasonable to me

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Approved by Moshe, committed by Dries. :)

Anonymous’s picture

Status: Fixed » Closed (fixed)
bjaspan’s picture

Priority: Normal » Critical
Status: Closed (fixed) » Needs work

This patch changes the schema without updating existing databases to match the changes. When upgrading from Drupal 5 to 6, it appears that 13 tables end up not matching the schema. So, either this patch needs to be reverted or existing tables need to be updated.

I did not know about this patch until today. I haven't yet thought about the basic point of this patch thoroughly. Offhand, I don't like the idea that text:big is semantically different than text:medium on some database engines. The type 'text' should be semantically the same (i.e. comparable or not comparable) on all platforms regardless of the size which, as per the docs, is only a hint anyway. So if Oracle can only handle 4000 bytes for a comparable text:big, then that's all text:big can be on Oracle. If there is a database column that can't accept being limited to 4000 bytes on Oracle, it should use type blob or some other type that we define as "big enough but not comparable." If there is a database column that must be longer than 4000 bytes *and* must be comparable, then perhaps that functionality is just not implementable with Oracle (which I find hard to believe). However, as I said, I have not thought this through thoroughly yet.

I would also like to propose two things:

1. All patches to .install or .schema files must be tested with schema.module and produce a zero-mismatch comparison report before they can be committed.

2. All bugs which result in a schema comparison mismatch should be considered critical. We should not accept a bug which results in a database not matching the results of hook_schema. To this end, I have marked this issue as critical. If core committers disagree, they can downgrade it.

hswong3i’s picture

i just find some useful information, and hope to share according to this topic (from http://phplens.com/lens/adodb/docs-oracle.htm):

4. Working With LOBs

Oracle treats data which is more than 4000 bytes in length specially. These are called Large Objects, or LOBs for short. Binary LOBs are BLOBs, and character LOBs are CLOBs. In most Oracle libraries, you need to do a lot of work to process LOBs, probably because Oracle designed it to work in systems with little memory. ADOdb tries to make things easy by assuming the LOB can fit into main memory.

ADOdb will transparently handle LOBs in select statements. The LOBs are automatically converted to PHP variables without any special coding.

For updating records with LOBs, the functions UpdateBlob( ) and UpdateClob( ) are provided. Here's a BLOB example. The parameters should be self-explanatory:

$ok = $db->Execute("insert into aTable (id, name, ablob) 
                                values (aSequence.nextVal, 'Name', null)");
if (!$ok) return LogError($db->ErrorMsg());
# params: $tableName, $blobFieldName, $blobValue, $whereClause
$db->UpdateBlob('aTable', 'ablob', $blobValue, 'id=aSequence.currVal');

and the analogous CLOB example:

$ok = $db->Execute("insert into aTable (id, name, aclob) 
                                values (aSequence.nextVal, 'Name', null)");
if (!$ok) return LogError($db->ErrorMsg());
$db->UpdateClob('aTable', 'aclob', $clobValue, 'id=aSequence.currVal');

Note that LogError( ) is a user-defined function, and not part of ADOdb.

Inserting LOBs is more complicated. Since ADOdb 4.55, we allow you to do this (assuming that the photo field is a BLOB, and we want to store $blob_data into this field, and the primary key is the id field):

	$sql = "INSERT INTO photos ( ID, photo) ".
			"VALUES ( :id, empty_blob() )".
			" RETURNING photo INTO :xx";

 		$stmt = $db->PrepareSP($sql);
		$db->InParameter($stmt, $id, 'id');
		$blob = $db->InParameter($stmt, $blob_data, 'xx',-1, OCI_B_BLOB);
		$db->StartTrans();
		$ok = $db->Execute($stmt);
		$db->CompleteTrans();

i also found that such special handling of LOBs is not only exists within oracle driver of ADOdb, but also for that of postgres, odbtp, sqlanywhere, informix72, ibase, odbc, mssql and db2:

dc:/home/hswong3i/project/adodb# find drivers/ -type f | xargs fgrep -nH 'function UpdateBlob'
drivers/adodb-postgres64.inc.php:347:   function UpdateBlobFile($table,$column,$path,$where,$blobtype='BLOB')
drivers/adodb-postgres64.inc.php:442:   function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-odbtp.inc.php:502:        function UpdateBlob($table,$column,$val,$where,$blobtype='image')
drivers/adodb-sqlanywhere.inc.php:144:  function UpdateBlob($table,$column,&$val,$where,$blobtype='BLOB')
drivers/adodb-informix72.inc.php:239:   function UpdateBlob($table, $column, $val, $where, $blobtype = 'BLOB')
drivers/adodb-ibase.inc.php:597:        function UpdateBlobFile($table,$column,$path,$where,$blobtype='BLOB')
drivers/adodb-ibase.inc.php:625:        function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-odbc.inc.php:567: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-oci8.inc.php:675: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-oci8.inc.php:709: function UpdateBlobFile($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-mssql.inc.php:645:        function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-db2.inc.php:678:  function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')

ok, let's back to the topic:

  1. sorry that this patch didn't consider about the update of existing DB schema, as i think someone may help about this some days later. if this patch really break the upgrade process, i think it should be roll back
  2. i agree about the use of "text" type should tread as the same across different DB backend. if oracle just can handle it by using VARCHAR2 with 4000 characters limitation, then this should be the problem of oracle. we shouldn't spend too much effort to do something that seems to be "mission impossible". e.g. i am now trying to work out a complete SQL query parser, to parse all SQL element into atomic, apply necessary changes (e.g. for CLOB, reserved word, max characters limitation, etc), and rebuild it. it is just wasting time, even though all query syntax are well documented within oracle official document
  3. BTW, this also means that we need SOMETHING other than "text" type, as a storage type that SHOULD NEVER use for comparison during SQL level. may be it should call as CLOB (just similar as BLOB type) and label it as %c, or something else. therefore we can let node->body (and some other else) to use this new datatype. this seems much more reasonable and logical
  4. due to the above suggestion, we should also take some improvement to DB API, so we can handle such new datetype as like as ADOdb does. as you can see that, a direct insert of CLOB is too complicated, we should have some helper function (as like as UpdateBlob) to handle such update independently, after a complete insertion. you can compare this idea with what we did with db_last_insert_id(), i think the case is just similar.

this is not the consideration of oracle driver implementation only, but also a cross DB compatibility concern. ADOdb try it, and come out with current solution. i don't really think we can do much better than that of ADOdb for Drupal: ADOdb is a project that starting from Sept 9 2000. maybe we should reuse their progress, and skip something that we are not really needed for, according to performance consideration.

hswong3i’s picture

Title: some filed shouldn't defined as text:big » fix text:big, add CLOB, update BLOB handling
Category: bug » feature
Status: Needs work » Needs review
FileSize
36.99 KB

this patch add a new data type "CLOB" into drupal-6.x-dev schema API, and 2 more APIs: db_update_blob() and db_upadte_clob(). according to this change, some core module update its schema from "text:big" to "CLOB" type, with using new APIs for updating its values, e.g. node->body. NULL should input as placeholder for %b and %c during traditional db_query() insert/update query. On the other hand, this patch also rollback some field from "text:medium" into "text:big" for backward compatibility.

as bjaspan mention, all "text" types should remain same behavior among different DB backend: it should be able for compare. "text:big" should not be the special among the others, within oracle driver handling. the new data type "CLOB" can clearly explain its property as: character storage, large enough, but never directly using it for SQL-level compare. therefore developer can have a more strict guild line for their core/contribute development.

BTW, after studying about ADOdb cross DB handling, i found that the handling problem of character large object (CLOB) is not only appear within Oracle, but also among some other DBs. ADOdb propose another method, rather than insert/update CLOB directly: it provide additional API call UpdateBlob() for such LOBs update, after normal insert/update query. it seems to be a good idea, as we don't need to do additional query rewrite (which also means a complete SQL parser and rebuilder, which seems to be mission impossible) for CLOB handling.

additional benefit: as mysql version map "clob" into "longtext" and "text", where pgsql map this to "text", the add of additional API will not affect contribute module development, unless they change their module's schema, and choose to use CLOB for data storage.

P.S.: little bit change about %b handling: will insert/update NULL as placeholder within db_query(), for further more value update by using db_update_blob() and db_update_clob() manually. therefore cache.module require a small change about current blob handling. it may also affect other contribute modules which are currently using BLOB. BTW, as BLOB is not always use, this may not be a great problem.

hswong3i’s picture

Category: feature » bug
hswong3i’s picture

this patch standardize drupal BLOB/CLOB handling for cross DB compatibility, e.g. db_update_* for LOBs value update, db_encode_*/db_decode_* for LOBs value encoding/decoding. change incude:

  1. new API: db_update_blob(), db_update_clob(), db_encode_clob(), db_decode_clob()
  2. updated API: _db_query_callback(), db_encode_blob()
  3. module with BLOB change: cache
  4. modules with CLOB change: aggregator, block, comment, contact, node

for current stage, it is only a dummy mapping for "clob" type among mysql/pgsql, where mapping it as like as that of "text" type. BTW, it will greatly decrease the difficulty for other DB driver implementation, e.g. oralce, mssql and db2.

this patch only change the handling for LOBs type. therefore, unless developer choose to use BLOB/CLOB type, nothing will affect them. on the other hand, if developer choose to use LOBs type, which means that they will have (nearly) unlimited storage size, BTW trade off some query level features, e.g. using GROUP BY, ORDER BY, WHERE, etc.

hswong3i’s picture

FileSize
46.19 KB

minor update, patch with latest CVS, fix some pgsql bugs.

complete tested with mysql + mysqli, pass though all core + optional modules installation. patched modules are all functioning :)

bjaspan’s picture

Status: Needs review » Reviewed & tested by the community

@Dries: I marked this RTBC because the patch from comment #4 which was previously committed needs to be rolled back (the author agrees with this as stated in #11). As of today, it still reverse-applies to HEAD. Once the patch is rolled back, the issue should be set to CNW so further development can occur.

@hswong3i: I briefly reviewed (but did not test) your patch and in general it looks good. However, this is clearly an API-changing feature enhancement and, as such, cannot be committed now that D6 has reached code-freeze. We should continue working on this for D7. I realize this means that Oracle support may be (once again) limited for D6 without core patches, but in my opinion we cannot risk slipping in a change like this at this point in the cycle. If Dries/others disagrees, I'm certainly open to reviewing the patch more carefully.

I noticed one problem: db_update_clob/blob() is written to expect '{tablename}' instead of 'tablename' like the other schema api functions. The {tablename} notation is only for when table names are embedded in SQL. Otherwise, confusion will result.

hass’s picture

MsSQL introduced in Ms SQL 2005 a varchar(max) data type to solve this CLOB comparison problem... so only older MsSQL Versions have this problem... maybe Oracle works on a new datatype, too?

hswong3i’s picture

@bjaspan: i agree that it should be a bit difficult to commit this patch together with drupal-6.x-dev after code freeze, even though it is mainly a API enhancement. BTW, as oracle driver is already closed to drupal-6.x-dev, i will suggest to review the chance of accepting this enhancement. on the other hand, unless developer choose to use the new LOBs handling, none of their code need to be review.

the API of db_update_clob/blob() is still a prototype, we still can change it :)

@hass: hopelessly, oracle don't have such new data type, please refer to http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_eleme... for more information :)

Dries’s picture

Priority: Critical » Normal
Status: Reviewed & tested by the community » Needs work

I've rolled back the old patch so we can work more on the new patch.

hass’s picture

Aside if we get this solved for Oracle, we will not have any troubles with MsSQL 2000, too :-).

hswong3i’s picture

FileSize
39.05 KB

patch for latest CVS, update db_update_clob/blob(), update some document.

@bjaspan: now db_update_clob/blob() will accept real table name directly.

@hass: yes, as i know that is the problem for MSSQL 2000, if oracle driver solve it, MSSQL (and some other else) will also benefit from it :)

hswong3i’s picture

FileSize
37.44 KB

patch according latest CVS, update handling for block.module and contact.module, fix minor bug for comment.module.
completely tested modules: cache, aggregator, block, comment, contact and node (all patched modules)

hswong3i’s picture

FileSize
37.63 KB

minor update according to v0.7, update code which more make senses with argument input.

hswong3i’s picture

Priority: Normal » Critical
Status: Needs work » Needs review
FileSize
37.63 KB

updated patch for latest CVS.

hswong3i’s picture

FileSize
39.76 KB

version 1.1 update according to latest CVS, with following enhancements:

  1. fix bug for contact.*. since contact.module is updated and split into contact.admin.inc, this patch will re-patch the LOBs handling for it
  2. update db_update_*() API. now db_update_*() API will follow the style of db_query_*(), which become db_update_*($query, ..., $table, $column, $value). according to this change, the function will now only handle one LOBs field update each time, which simplify the use of it
  3. update all modules LOBs patch according to above API update

since this patch is required by latest oracle driver implementation (http://drupal.org/node/39260), please kindly review the chance of accepting this patch after drupal-6.x-dev code freeze, many thanks :)

Dries’s picture

Status: Needs review » Needs work

I had a look at the patch and here are some comments:

1. In general, I think it makes sense to differentiate between 'binary data' and 'character data'.

2.

+ * NOTE: %b and %c should input NULL as placeholder, and further more 
+ * update its value manually by using db_update_blob() or db_update_clob().

I don't understand that explanation. Please consider rewording it for clarity.

3. I don't understand why we need db_update_clob() and db_update_blob() functions? If there is no critical need for these functions, I'd prefer to remove them, and to let people write regular db_query("UPDATE ...") statements. If these functions are critical (rather than just handy), document in the code why they are critical.

4.

-  return "'". mysql_real_escape_string($data, $active_db) ."'";
+  return !is_null($data) ? "'". mysql_real_escape_string($data, $active_db) ."'" : "''";

Why is that extra check required?

That's all for now. Looks promising but needs some more work around the edges. Let's give this patch some prime time so we might be able to include it in D6.

hswong3i’s picture

most of the change are function hook for oracle driver, or other driver backend that require special handling to LOBs data.

2 & 3 & 4. for the case of oracle, we can't simply insert a "NULL" value for INSERT as placeholder, but EMPTY_BLOB() or EMPTY_CLOB(). this is because we can't do any update to a LOBs field with NULL value, but EMPTY_BLOB()/EMPTY_CLOB().

on the other hand, we also don't need to do any input value escape to input LOBs value as it will be handled by LOBs->save().

for example, in oracle driver:

<?php
function db_encode_blob($data) {
  return !is_null($data) ? $data : "EMPTY_BLOB()";
}
function db_encode_clob($data) {
  return !is_null($data) ? $data : "EMPTY_CLOB()";
}
?>

therefore, take block.module as example, the insert query will become:

INSERT INTO {boxes} (body, info, format) VALUES  (EMPTY_CLOB(), '%s', %d)

the update query will become:

SELECT body FROM boxes WHERE bid = %d FOR UPDATE

after we fetch the pointer for "body", we can then run LOBs->save() and update its value:

<?php
  // Fetch the SELECTed rows
  $rows = array();
  while ($row = oci_fetch_assoc($last_result)) {
    // Note:  Oracle returns all field names in uppercase and associative
    // indices in the result array will be uppercased too. We reset it back
    // to lower case in here.
    $row = array_change_key_case($row, CASE_LOWER);

    // Check if LOB object, discard the existing contents, and save a value to it.
    if (!(is_object($row[$column]) && $row[$column]->truncate() && $row[$column]->save($value))) {
      oci_rollback($active_db);
      return FALSE;
    }

    $rows[] = $row;
  }
?>

for more detail reason of this API changes, please compare the differences between mysql/pgsql handling with that of oracle. i feel so sorry to insert some dummy handling to mysql/pgsql version, but this give a great chance for other DB driver implementation :)

hass’s picture

Don't forget to tell why we need db_update_clob() and db_update_blob() functions. There are differences in the update command for different DB's. In MsSQL there is a special UPDATETEXT, so we are better to know what we are updating. Additional you sometimes need to use pointers syntax for updating TEXT. Maybe other DBs have different update functions for updated text and binary data. So it's better to split, isn't it?

hswong3i’s picture

@hass: great! you give us the answer that we are seeding for: different DB have different LOBs update method, and so we better split it out, and handle it with independent function API. seems db_update_*() can really help in MsSQL too :)

hswong3i’s picture

FileSize
40.98 KB

patch according to latest CVS, update some document.

@Dries: i detail the document according to your questions:

2. since oracle will save pointer of LObs into DB table, and further more fetch value by calling LOB->save/load(), NULL (none of pointer) != EMPTY_CLOB() (pointer with empty value, able to use LOB->save/load()). the document update as:

<?php
+ * NOTE: %b and %c should input NULL as placeholder, and further more 
+ * update its value manually by using db_update_blob() or db_update_clob().
+ *
+ * This is because different database have different implementation about
+ * empty LOBs field. e.g. as MySQL/PgSQL map CLOB to general text type, NULL
+ * input will simply replace as '' (empty string). Update action will also
+ * be a simple UPDATE query.
+ * 
+ * On the other hand, as Oracle will save a pointer of CLOB into database,
+ * and update its value by LOB->save/load(), a NULL value means none of pointer
+ * (which not able to update its value). Therefore NULL input will replace as
+ * 'EMPTY_CLOB()' (a placeholder for "CLOB pointer with empty value") for
+ * further more update.
?>

3. similar reason as 2.: since oracle save a point for LOBs value into table, update of its value require additional handling, rather than a simple UPDATE query, an isolated APIs for LOBs value update are requred.

e.g. case of oracle driver: first of all, we need to fetch out LOB's pointer, by SELECT ... FOR UPDATE (current oracle implementation); secondly, start the transaction by oci_execute($last_result, OCI_DEFAULT); try to truncate old value, and then save its new value; rollback whenever face error, or commit transaction if success. it is very complicated, and should be handle individually :)

document of db_update_clob(), for mysql/mysqli, update as:

<?php
 * NOTE: Since MySQL support direct read/write to LOBs value by INSERT
 * or UPDATE query, this function will become a simple wrapper of UPDATE
 * query.
?>

4. the different will be shown by comparing MySQL handling with Oracle's. for MySQL, CLOB is a dummy mapping to longtext, so handle as:

<?php
function db_encode_clob($data) {
  global $active_db;
  return !is_null($data) ? "'". mysql_real_escape_string($data, $active_db) ."'" : "''";
}
?>

BTW for oracle, no escape is required for input value (will handle by LOB->save()), but require EMPTY_CLOB() as placeholder during INSERT/UPDATE query (as stated in 2.), code will become:

<?php
function db_encode_clob($data) {
  return !is_null($data) ? $data : "EMPTY_CLOB()";
}
?>
hswong3i’s picture

Title: fix text:big, add CLOB, update BLOB handling » standardize LOBs handling for cross DB compatibility
Category: bug » feature
Status: Needs work » Needs review
Dries’s picture

I'll try to investigate this issue some more as soon time permits. In the mean time, it would be helpful if someone could help you massage the documentation in proper English. It's a little rough around the edges, but nothing we shouldn't be able to help you with. Thanks for the hard work, I'll get to it soon..

hswong3i’s picture

FileSize
41.58 KB

@Dries: document updated. detail of changes: http://rafb.net/p/WGZMcE39.html

hswong3i’s picture

according to my previous benchmarking result, there is no performance different when comparing LOBs patched with CVS HEAD (at least in case of MySQL). impact for API update should be the only concern of this patch.

on the other hand, with LOBs patch helps, oracle driver (and some other databases with similar physical limitation in VARCHAR storage sizing) will able to implement in a much flexible style, e.g. we are now able to store node body with more than 4000 characters :)

i have no further more suggestion or idea about this patch. LOBs patch is the major dependence of oracle driver for drupal-6.x-dev, based on functionality concern. hope someone may help about its coding or documents improvement :)

hswong3i’s picture

patch based on latest CVS HEAD, update comment.module handling.

hswong3i’s picture

when studying DB2, IBM's DB2 + Drupal document, and reference to ADOdb for existing implement, i found the following message about BLOB handling from ADOdb:

<?php
	/*
		Insert a null into the blob field of the table first.
		Then use UpdateBlob to store the blob.
		
		Usage:
		 
		$conn->Execute('INSERT INTO blobtable (id, blobcol) VALUES (1, null)');
		$conn->UpdateBlob('blobtable','blobcol',$blob,'id=1');
	*/
	function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
	{
		return $this->Execute("UPDATE $table SET $column=? WHERE $where",array($val)) != false;
	}
?>

i think this prove the necessary of LOBs patch: different database comes with different LOBs handling, sometime we need to insert special placeholder during INSERT/UPDATE, and further more update its value by special API :)

hswong3i’s picture

and here is the case of MSSQL, within ADOdb implementation:

<?php
	/* 
		Unfortunately, it appears that mssql cannot handle varbinary > 255 chars
		So all your blobs must be of type "image".
		
		Remember to set in php.ini the following...
		
		; Valid range 0 - 2147483647. Default = 4096. 
		mssql.textlimit = 0 ; zero to pass through 

		; Valid range 0 - 2147483647. Default = 4096. 
		mssql.textsize = 0 ; zero to pass through 
	*/
	function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
	{
	
		if (strtoupper($blobtype) == 'CLOB') {
			$sql = "UPDATE $table SET $column='" . $val . "' WHERE $where";
			return $this->Execute($sql) != false;
		}
		$sql = "UPDATE $table SET $column=0x".bin2hex($val)." WHERE $where";
		return $this->Execute($sql) != false;
	}
?>
hswong3i’s picture

FileSize
41.71 KB

minor update: patch for latest CVS HEAD, a bit document update, and update node_save() handling based on latest changes :)

P.S. i found this is very helpful when implementing both oracle and DB2 driver. as ADOdb and PDO both come with similar handling, and this patch will change programming logic only if developer choose to use CLOB for (almost) unlimited storage size, i guess it should ready to commit with no question. hope someone may help about its review :)

P.P.S. world is always not such perfect: oracle comes with VARCHAR2 (max. 4000 characters) and CLOB (max. 128TB), where DB comes with VARCHAR (max. 32672 characters) and BLOB/CLOB (max. 2GB). it is not easy to implement MySQL's text:big => longtext mapping for similar functionality: able to use for compare (VARCHAR2/VARCHAR), and also comes with unlimited size (CLOB). we should split size concern (clob) from functionality concern (text), and preform isolated handling :)

hass’s picture

2GB or more shouldn't be an issue... i think - no - site will have inline images or website pages of such a size. And people saving images in blobs should be beaten... :-)

Will this patch land for D6? I don't like to spend time on a "database.mssql.inc" if this patch never goes in...

hswong3i’s picture

@hass: i point out DB2's CLOB size just because i hope to compare it with MySQL. in case of DB2, we need to choose VARCHAR for functionality, or CLOB for size. neither one can clone MySQL's text:big => longtext handling, we need some trade off :(

P.S. MSSQL is also within my schedule, after Oracle and DB2. this is because we have ZendCore for both Oracle and DB2, so i don't need to spend too much time for system setup :)

hswong3i’s picture

FileSize
41.29 KB

patch update based on latest CVS HEAD, with minor code change in node_save(). completely tested with MySQL.

hswong3i’s picture

Title: standardize LOBs handling for cross DB compatibility » standardize LOBs handling
FileSize
41.29 KB

patch for latest CVS

hswong3i’s picture

after an indeed studying about ADOdb and PDO database abstract layer, i found that a special LOBs value handling API is almost a must, based on cross database compatibility concern. here is a simple definition of LOBs value, quote from PDO (http://hk.php.net/pdo):

At some point in your application, you might find that you need to store "large" data in your database. Large typically means "around 4kb or more", although some databases can happily handle up to 32kb before data becomes "large". Large objects can be either textual or binary in nature.

in case of ADOdb, LOBs value will handle by: 1. normal INSERT query, with LOBs value placeholder (NULL for most database, EMPTY_BLOB/EMPTY_CLOB for Oracle); 2. manually update LOBs value by calling UpdateBlob. here is a simple code snippet for CLOB handling (reference to http://phplens.com/lens/adodb/docs-oracle.htm):

<?php
$ok = $db->Execute("insert into aTable (id, name, aclob) 
                                values (aSequence.nextVal, 'Name', null)");
if (!$ok) return LogError($db->ErrorMsg());
$db->UpdateClob('aTable', 'aclob', $clobValue, 'id=aSequence.currVal');
?>

in case of PDO, LOBs value will handle by: 1. normal INSERT query, with placeholder as "?"; 2. prepare and bind each value with placeholder, by calling bindParam() (NOTE: you will need to define data_type as PARAM_LOB explicitly for LOBs value binding); 3. execute transaction. here is a simple code snippet for inserting an image into a database by using PDO (reference to http://hk2.php.net/pdo):

<?php
$db = new PDO('odbc:SAMPLE', 'db2inst1', 'ibmdb2');
$stmt = $db->prepare("insert into images (id, contenttype, imagedata) values (?, ?, ?)");
$id = get_new_id(); // some function to allocate a new ID

// assume that we are running as part of a file upload form
// You can find more information in the PHP documentation

$fp = fopen($_FILES['file']['tmp_name'], 'rb');

$stmt->bindParam(1, $id);
$stmt->bindParam(2, $_FILES['file']['type']);
$stmt->bindParam(3, $fp, PDO::PARAM_LOB);

$stmt->beginTransaction();
$stmt->execute();
$stmt->commit();
?> 

so what's the case of this patch? case is just similar as both ADOdb and PDO: 1. normal INSERT/UPDATE query, with NULL as LOBs value placeholder (it is because db_encode_blob() and db_encode_clob() will translate this NULL as suitable under layer placeholder); 2. manually update LOBs value by calling db_update_blob() or db_update_clob(). here is a simple code snippet for block content insertion:

  db_query("UPDATE {boxes} SET body = %c, info = '%s', format = %d WHERE bid = %d", NULL, $edit['info'], $edit['format'], $delta);
  db_update_clob('bid = %d', $delta, db_prefix_tables('{boxes}'), 'body', $edit['body']);

my research result is very simple: a special API is the best, common, and strict forward solution for LOBs value handling. even we may working with PDO in D7 or later (with array-style variable binding), it is also a must, for a hook which telling under layer what type of data we are now inserting, or else we can NEVER handle LOBs value insertion correctly.

moreover, that is no point to revamp our database API, even we are now changing our backend as PDO: our database API is powerful and suitable for most cases. PDO should only act as another under layer database driver supporting (maybe call as database.pdo.inc, or some other else). PDO SHOULDN'T be a replacement of existing implementation.

anyway, if we are just focusing on how PDO can benefit on performance when using with MySQL/PgSQL (which own more than 98% of our clients), but simply forget its powerful cross database ability, this will seems a bit silly... as we take a lot of effort on schema API, database abstract layer, etc, based on cross database concern, this idea is just a step backward ;p

Crell’s picture

Since I suspect the latest comment is aimed at me based on a conversation in IRC earlier today, I'll jump in here. :-)

First off, no, the PHP 4/MySQL 3-era mysql driver is not at all robust or secure. That we layer a regex on top of it that is as slow as the much more robust and complex code in PDO is ample evidence of that. As long as our lowest common denominator is the mysql extension, we can't do anything interesting in the database system. mysqli is also not part of the standard install of PHP, and many shared hosts don't offer it. (mysql isn't either as of PHP 5.0, but most hosts install it for legacy support.) We can't rely on mysqli for MySQL support, which is what the vast majority of Drupal sites run on. That leaves PDO, which is reasonably cross-database (at the API level at least), reasonably fast (although not as fast as native APIs, it's still not a dog because it's all in C), and, most importantly, we can rely on it being present in any PHP 5 install as it's been part of the default configuration since PHP 5.1. That alone makes it a worthwhile move for D7, even if it weren't for easier Oracle/DB2/MS SQL support.

For data-modification queries, actually, in the D7 API as I'm currently building it you should never, ever use db_query() for an insert/update/delete statement. You should instead do

db_insert('node', array('fieldname'=>$value));

Aside from being easier to work with and more robust, it means you're passing not a serialized string to the database system but a data structure. Within the database driver, if necessary, you can do additional escaping, LOB handling, etc. without worrying about regexes and without putting extra work on the module developer.

Which brings up the next point. Right now, virtually all Drupal sites run MySQL or Postgres, with MySQL far and away the leader. Most Drupal developers know from MySQL primarily, I wager. The concept of LOBs and BLOBs is foreign to them, and many will likely see the necessity of it as a bug in those databases. (There are times when I do, too.) We should not build an API around a database that on a good day may be 2-3% of all Drupal installs for the foreseeable future. Most developers won't even have any way to test it to see if they're doing it "right". (How many people here have a copy of Oracle just lying around? :-) )

That sort of logic belongs inside the Oracle/DB2 driver, not exposed to every module developer to be able to forget or get wrong. If we can make that driver's job easier without making the API substantially more complex, we should do so. But not if it means adding a lot of needless, untestable, easy to break work for the vast majority of Drupal developers who are MySQL-based.

hass’s picture

Most databases are less then 2 or 4 GB... there are free versions of Oracle and MSSQL. Aside if i have the chance to get it running on our MSSQL2K5 active/active cluster i would move there... why should i spend my time on administering two different DBs if i can do it with one. Today we have simply no choice - eat or die with MySQL or PgSQL.

hswong3i’s picture

@Crell: i would like to have more details about how can your abstract layer handle LOBs value correctly? i have no question about inserting data with your array-style input, but where are you obtaining the information of these data type? it will be a bit not acceptable if: "as most data type will bind automatically and correctly by using PDO (unless LOBs, which need for explicitly defined), so we will simply forget LOBs handling in D7"...

@hass: good point. e.g. my client require for Oracle, due to political concern. therefore drupal will NEVER become their choice, as we are not supporting Oracle natively... (even though, we are suitable based on functional concern...)

hswong3i’s picture

@Crell: moreover, i am not opposing PDO supporting in D7/8/..., but try to figure out those common problems that we both need to due with. as we are both working for database abstract layer, we should facing with similar problem, no matter we are working with ADOdb or PDO ;)

e.g. db_num_rows() is a common problem among different database. this is not a problem of database abstract layer implementation, but dependent on database natively: some database won't pass the complete result set to client side after SELECT query; on the other hand, pass each row from result set when calling for fetch action (Oracle/DB2). since PDO working as a cross database abstract layer (as what i am now doing, but user level), which can't solve this problem natively, they try to REMOVE this support.

this is just similar as case of LOBs handling: since special API is a must, so we ADD this support. both ADOdb and PDO (moreover, this patch, too) propose for similar API, in order to due with similar difficulty.

LOBs handling is a common problem that we both need to solve; on the other hand, the successful of this patch should also equal to the successful of PDO, in coming future ;)

hswong3i’s picture

After an indeed studying about differences between ADOdb, PDO and Drupal 6.x database abstract layer, it show that we are still missing a complete and standardized LOBs handling API, based on cross database compatibility concern (besides MySQL/PgSQL).

Drupal 6.x existing database abstract layer provide db_encode_blob() and db_encode_blob() for BLOB handling only, as PgSQL require special API for BLOB encode/decode (In case of MySQL, it is a dummy abstraction, as MySQL seems BLOB data as like as normal string). BTW, this is not the complete picture of LOBs handling and its difficulties: besides BLOB, it is also a critical concern about limitation on CHAR/VARCHAR/CLOB (Character Large OBject) among most database; moreover, method of LOBs INSERT/UPDATE is also database specific.

LOBs patch for Drupal 6.x (http://drupal.org/node/147947) will able to fill in this missing section, extent and complete our LOBs API functionality, with handy procedures. It should be the most common and strict forward solution when dueling with LOBs related difficulties. On the other hand, it shouldn't be a confusion for normal MySQL users, as it is very closed to the case of existing db_encode_blob() and db_decode_blob() implementation.

We shouldn't miss the chances, but our chances are waiting for this patch :)

(To view the thread, please navigate to http://edin.no-ip.com/html/?q=node/302)

hswong3i’s picture

Status: Needs review » Reviewed & tested by the community

as i am now indeed studying about PDO + MySQL/Oracle/DB2, i find that this patch would also be a very important functional hook for our next generation database abstract layer development, based on cross database concern (even we may move to PDO, as this is a database-level but not database-abstraction-level problem). on the other hand, this patch is stable enough since July 24, 2007, and it functionality is already fully proved by the successful of Oracle/DB2 driver development. it should be RTBC.

hswong3i’s picture

patch via latest CVS HEAD

hswong3i’s picture

As it is now one day before D6 beta, it seems that not easy to commit this patch on time, together with Oracle and DM2 supporting. Even though it is a bit pity (as all are ready for production), I am now merging this patch into another technical preview of D7 database abstract layer, too (http://drupal.org/node/172541). I will keep on promoting this LOBs handling in our next generation of database abstraction layer.

BTW, if you are looking for unofficial Oracle (http://drupal.org/node/39260) and DB2 (http://drupal.org/node/165788) supporting within D6, please refer to their own issue. I will merge this patch (maybe some other additional patches) with those issues. On the other hand, I will provide patches based on latest CVS HEAD of D6, within D6 life cycle. So you will able to have those unofficial database supporting with a single patch apply :)

hass’s picture

That's unbelievable bad information... i hoped to see this committed :-(.

hswong3i’s picture

Version: 6.x-dev » 7.x-dev
Priority: Critical » Normal
Status: Reviewed & tested by the community » Active

@hass: yes, how pity it is... so let's postpone it into D7?

i will provide unofficial patch for both oracle and db2 within D6 life cycle, so people who are interested in this topic can get such supporting ON TIME, but not sit and waiting for D7 comes. i guess most of the projects needed for such unofficial patch are enterprise-related project, such little tailor-made handling may be acceptable for their professional and skillful developer :)

P.S. the main idea is: i will also benefit on this combined unofficial patch handling. i am now scheduled to move my personal Drupal site into Oracle with D6, and so i will able to keep on develop and test it daily. a single patch can save a lot of our time, during update ;p

hass’s picture

I'm not going to use unofficial patches. I have too many dependencies with modules and so on and i don't like to patch them all myself if they are buggy. This makes me going the mysql way another year or more, what suxxx much.

hswong3i’s picture

@hass: let's take it easy :)

i will not force anyone to use this unofficial patch: as it is unofficial, no one can generate about it correctness, and it is always not targeting for our end users. e.g. you will never use Debian's unstable release for production purpose, but only stable, isn't it?

BTW, we will always need time to prove its correctness. we need helpers (or better says as HACKERS ;p) to test the implementation, and provide feedback, before it can finally goes into core. as proved by the fail of this patch within D6, it will for sure to be another pity issue, if we start the development TOGETHER with the open of D7: it will be too late once again :(

a simple and handy unofficial patch may speed up the development, on the other hand give a big hand to those require for such features ON TIME. so please just let it be :)

hswong3i’s picture

Title: standardize LOBs handling » Siren #2: Add %c for CLOB, and standardize LOG handling
chx’s picture

Title: Siren #2: Add %c for CLOB, and standardize LOG handling » Add %c for CLOB, and standardize LOG handling

As Siren is not a project on Drupal.org but an unofficial fork, I am removing it from issue titles to avoid confusion.

hswong3i’s picture

Even though DB2 don't explicitly indicate the limitation of VARCHAR as like as that for Oracle (Oracle will only allow for VARCHAR2 with max 4000 characters; above this limitation, please consider to use CLOB with TB-scale storage), it is still logically limited as 4000 characters.

A minor footnote from IBM DB2:

The functions in the SYSFUN schema taking a VARCHAR as an argument will not accept VARCHARs greater than 4 000 bytes long as an argument. However, many of these functions also have an alternative signature accepting a CLOB(1M). For these functions, the user may explicitly cast the greater than 4 000 VARCHAR strings into CLOBs and then recast the result back into VARCHARs of desired length.

This should be a good example why CLOB is something necessary for most database implementation. We are not always as lucky as case of MySQL and PostgreSQL, which we are able to work with TEXT for (nearly) unlimited size. On the other hand, we have CLOB, which usually required for special handling, e.g. database dependent INSERT/UPDATE/DELETE abstraction, decode CLOB as like as BLOB when fetching data, etc.

Since we are now having drupal_write_record() for high level programming abstraction, and (may) have INSERT/UPDATE/DELETE API for driver level abstraction, CLOB should be something much simpler than case of before. Moreover, this logic is now proved as function with Siren: Siren is able to work with Oracle CLOB, on the other hand don't introduce complicated abstraction to MySQL and PostgreSQL.

hswong3i’s picture

Title: Add %c for CLOB, and standardize LOG handling » Replace some TEXT:BIG with BLOB, based on cross database compatiblilty concern
Category: feature » task
Priority: Normal » Critical
Status: Active » Needs review
FileSize
11.3 KB

After an indeed study of pdo_oci, CLOB is not the optimal solution for this issue (pdo_oci currently not support CLOB). On the other hand, I would like to summarize the reason of this issues once again:

  • Most DB don't support text:big as large storage space as MySQL and PostgreSQL. E.g. DB2 don't support VARCHAR > 4000, Oracle ONLY support VARCHAR2 with max 4000, legacy PHP MSSQL driver only support max 255 characters when pass though. Assume text:big as "unlimited storage size, fully functional as VARCHAR" is ALWAYS NOT TRUTH for cross database compatible concern.
  • Some DB support CLOB, e.g. Oracle and DB2. BTW, CLOB is not something standardized, and so its support stage is most likely implementation dependent. Therefore adding CLOB support, where map MySQL and PostgreSQL as a dummy mapping to TEXT won't be an optimal solution.
  • BLOB is something MOST database support, and its definition is ALWAYS clear: (assume) unlimited storage space, raw read-write, no SQL functionality, (assume) used as storage ONLY". As we are requesting something for huge storage, and we all understand that will be a performance impact if we preform SQL functionality with TEXT (which may store up to 4GB in case of MySQL), BLOB seems to be a suitable storage type when compare with CLOB or TEXT.

Moreover, using blob (but not clob) as text:big replace also introduce some other benefits:

  • Drupal DB API already support BLOB, coming with both db_encode_blob() and db_decode_blob(). Even though Oracle require special BLOB INSERT/UPDATE handling, this can be easily solved by drupal_write_record(). So all stuff needed for a now ready.
  • Store huge data with blob can give a clear hints for normal contribute developer that "don't preform any SQL function with it, as this is not supported", or else may introduce potential performance impact.
  • According to my Siren research progress, replace text:big with blob can greatly improve cross database compatibility: oci8, pdo_oci, ibm_db2 and pdo_sqlite are both function correctly without any special problem.

P.S. 'default' => '', is important for Oracle and DB2 implementation: they will need to catch it and replace as EMPTY_BLOB() - the default empty blob placeholder. This is completely different from a simple NULL handling. This is also what http://drupal.org/node/195169 mentioned for.

Since Drupal 7.x is now open for public development, this patch de-fork from latest Siren research progress, and via latest Drupal 7.x CVS HEAD. Fully tested with MySQL and PostgreSQL.

hswong3i’s picture

Title: Replace some TEXT:BIG with BLOB, based on cross database compatiblilty concern » Replace some TEXT:BIG with BLOB, based on cross database compatibility concern
FileSize
15.68 KB

Append corresponding drupal_write_record() handling.

catch’s picture

Status: Needs review » Needs work

Not critical, is this still valid now #225450: Database Layer: The Next Generation is in?

catch’s picture

Priority: Critical » Normal
hswong3i’s picture

@catch: Thanks for your consideration. This issue won't able to be fix with our new D7 DB abstraction layer, as it is the physical limitation of DB itself.

In case of Oracle, only maximum 4000 characters is allowed for VARCHAR2, which come with GROUP BY and other SQL functionality. CLOB have unlimited size but can't function as like as MySQL's TEXT type; on the other hand, its support stage is not such stable as BLOB. Therefore replace some text:big field with BLOB can simply solve the problem. Case is also similar in DB2 and MSSQL.

E.g. {batch} will store a serialized command into DB, which always result as a single text content with more than 4000 characters, and so will buggy in case of Oracle. Node body and block body come with similar situation, too.

I am going to summarize my research progress into individual small patches for CVS HEAD. Hope to have your kindly support :-)

Damien Tournoud’s picture

@hswong3i: Please limit the scope of that issue to changing all TEXT:BIG that are used as BLOB (ie we only store chunk of data in them, we don't filter or order by them). Please also provide an appropriate upgrade path (ie. _update functions).

That mean your patch in #60 could get committed if you remove db_decode_blob() and drupal_write_record changes, and add upgrade paths.

Crell’s picture

That sounds like a limitation of Oracle if it has no equivalent to TEXT fields.

With Schema API, though, the exact underlying field type needn't be identical on each database. So we can use TEXT on MySQL and CLOB/BLOB on Oracle, as long as the driver itself handles marshalling data back and forth. That was a deliberate and key design of the new database API. So unless I'm misunderstanding you, we need only use CLOB/BLOB in the mappings for an Oracle driver; we don't need to change everyone else to use BLOB.

That said, don't we need non-opaque data for some of our larger text fields? If we do any sort of processing on them (LIKE for instance) then a BLOB field will break that. It's something to look into.

hswong3i’s picture

@Crell: It is truth that we can map different data type with database specific container, dependent on different requirement; but some fundamental functionalities are required for different mapping, too. Therefore replace ALL text:big filed as BLOB in core, or only map text:big as BLOB in case of Oracle within schema, will not help this case: we need to change the mapping case by case within different field definition, but not in schema absolutely.

E.g. We can map text:big (or every text type) as VARCHAR2(4000) in Oracle/DB2/MSSQL, so we can provide GROUP BY or LIKE functionality for all DB backend. Therefore we can utilize text:big for all description field, e.g. {boxex}.info and {term_data}.description.

On the other hand, we need to map some storage only fields as BLOB, e.g. {boxes}.body and {node_revisions}.body, so we can provide enough storage space without SQL functionality. This can also prevent crazily GROUP BY or LIKE SQL against hung storage which result as performance degrade.

My conclusion is: if we really need storage space rather than SQL functionality, remap field as BLOB; else just left it as text:big and let the 4000 characters limitation belongs to Oracle/DB2/MSSQL only. This should be the most fair solution with maximum cross database compatibility.

Crell’s picture

I'm still not sure I follow you. It sounds like you're saying we need a "text big, as parsable text" and a "text as opaque value" as separate field types that on MySQL/PostgreSQL/SQLite would both map to TEXT while on Oracle they'd map to varchar and BLOB, respectively? (Why BLOB and not CLOB I still don't understand, but then I don't understand anything about Oracle's design.)

hswong3i’s picture

This oracle schema implementation is my latest research progress before our next generation of database abstraction layer commit, which function with both installation and most core operation:

    'text:tiny'       => 'VARCHAR2(4000)',
    'text:small'      => 'VARCHAR2(4000)',
    'text:medium'     => 'VARCHAR2(4000)',
    'text:big'        => 'VARCHAR2(4000)',
    'text:normal'     => 'VARCHAR2(4000)',

    'blob:big'        => 'BLOB',
    'blob:normal'     => 'BLOB',

CLOB is only available in Oracle but not DB2/MSSQL, PHP + PDO + Oracle CLOB come with a lot of bugs , and CLOB type is much different from MySQL/PostgreSQL TEXT type (P.S. DB2 also come with similar limitation):

  • You cannot specify LOB columns in the ORDER BY clause of a query, or in the GROUP BY clause of a query or in an aggregate function.
  • You cannot specify a LOB column in a SELECT... DISTINCT or SELECT... UNIQUE statement or in a join. However, you can specify a LOB attribute of an object type column in a SELECT... DISTINCT statement or in a query that uses the UNION or MINUS set operator if the column's object type has a MAP or ORDER function defined on it.

So my suggestion is: as Drupal text type always assume with ORDER BY/GROUP BY/SELECT ... DISTINCT/SELECT... UNIQUE functionality, just let Oracle/DB2/MSSQL map it with VARCHAR(4000) with size limitation; on the other hand, replace some necessary fields as BLOB, trade its (not required) SQL functionality with nearly unlimited storage size.

hass’s picture

As I know - MsSQL 2005/2008 can use "varchar(max)" that works like MySQL "LONGTEXT"

hswong3i’s picture

@hass: But the main point is legacy PHP MSSQL driver (FreeTDS?) can't handle this correctly (only support up to VARCHAR 255; CLOB INSERT/UPDATE seems buggy). I still not yet have time to test with Microsoft's new PHP driver... Hopefully it should really work fine...

So at this moment, I will still suggest mapping text as VARCHAR(4000) in case of MSSQL for better compatibility :)

UPDATE: Sorry for wrong information: FreeTDS would be crazy with CLOB INSERT/UPDATE and only handle the first 255 characters; both VARCHAR and BLOB are safe. So mapping Drupal text type as VARCHAR(4000) should also be safe.

hass’s picture

Sounds like the driver is MsSQL 2000 based... maybe only not yet updated to the latest features... :-( as PDO MsSQL is experimental I expect not everything is already implemented :-). I wouldn't waste my time with SQL2K any more.

hswong3i’s picture

A split version for feedback is available: Remap field as BLOB: {aggregator_item}.description.

Crell’s picture

OK, I still don't see why we need to change anything on MySQL. Map text:big differently on Oracle and be done with it. No muss, no fuss. The only question is whether or not there are any text:big fields currently that we do need to parse in the database as text, not as an opaque blob. Before we make any schema changes, can you check through core and see if there are any? That will affect what if anything we need to do.

hswong3i’s picture

@Crell: Yes, I think we don't need any code base change for MySQL/PostgreSQL schema/abstraction implementation, but may need some footnote for general developers about how the behavior of text:big and blob should be, e.g. footnote for text:

Based on maximum cross database compatibility concern, some rules for using Drupal text type:
- Always assume text:big storage capacity as maximum 4000 characters only.
- Text type should function with the following SQL command: "ORDER BY", "GROUP BY", "SELECT ... DISTINCT" and "SELECT... UNIQUE".
- For storage capacity greater than 4000, use blob instead.

In case of blob:

Based on maximum cross database compatibility concern, some rules for using Drupal blob type:
- Usually support GB-scale storage capacity.
- Blob type should NOT function with the following SQL command: "ORDER BY", "GROUP BY", "SELECT ... DISTINCT" and "SELECT... UNIQUE".
- Always utilize drupal_write_record(), db_insert() and db_update() for blob operation.

Adding above information into DB driver schema definition should give a great hand to general developers, e.g. within MySQL/PostgreSQL's getFieldTypeMap(). Any idea?

hass’s picture

I don't like your "recommendation" very much. It is a mayor improvement in MsSQL 2005+ to have varchar(max). I'm really happy that we have this now as it adds compatibility to MySQL and PostgreSQL. Oracle and DB2 don't have it... well they need to fix their software :-))). In MsSQL you cannot compare TEXT fields with ... WHERE column1 = 'Your very long text that is longer than 8000 chars...'. Additional MS have dropped TEXT field support in MsSQL2008. I heard "rumours" from my colleagues that we must change all our TEXT fields before upgrading to 2008 (unverified myself).

hswong3i’s picture

Status: Needs work » Needs review
FileSize
5.74 KB

@hass: I don't like this "recommendation" too, but it do reflect to the real case: Oracle only support maximum 4000, DB2 only function as expected if smaller than 4000, where MSSQL not support larger than 8000 friendly; for a GCD, document Drupal's text support as maximum 4000 should be a reasonable choice (P.S. Sorry that this issue is not target for MSSQL only, but try to figure out a most fair handling among different database engine, together with maximum functionality).

Patch via CVS HEAD, documentation plus some code sync when compare MySQL and PostgreSQL implementation. May someone give a hand for my poor English?

hswong3i’s picture

FileSize
5.79 KB

Patch update with better documentation.

hass’s picture

Shouldn't TEXT not become a CLOB!? BLOB is binary large object... CLOB is character large object

hswong3i’s picture

@hass: I have comment for the reason in http://drupal.org/node/147947#comment-720555 :-)

Long story short: BLOB is stable, large in storage size, with less limitation among different database engines, when compare with that of CLOB.

Anonymous’s picture

Status: Needs review » Needs work

The last submitted patch failed testing.

hswong3i’s picture

Status: Needs work » Needs review
FileSize
5.77 KB

Patch reroll via CVS HEAD for documentation only.

We don't need to change MySQL implementation, this patch just give developers more idea about limitation between different database engine, and so choose suitable data type based on cross database concern.

hswong3i’s picture

Title: Replace some TEXT:BIG with BLOB, based on cross database compatibility concern » [DBTNG + XDB] Replace some TEXT:BIG with BLOB

Better title.

Status: Needs review » Needs work

The last submitted patch failed testing.

lilou’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch failed testing.

hswong3i’s picture

All BLOB related patches. Reroll via CVS HEAD.

Status: Needs review » Needs work

The last submitted patch failed testing.

Crell’s picture

Version: 7.x-dev » 8.x-dev

I am not convinced about moving so many fields to BLOB, but since so much has changed in core since the last post I don't think much of this is even relevant anymore. It's also not changing in Drupal 7.

jhedstrom’s picture

Version: 8.0.x-dev » 8.1.x-dev
Issue tags: +Needs issue summary update

Probably too late for 8.0.x now too.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

mgifford’s picture

Assigned: hswong3i » Unassigned

Unassigning stale issue. Hopefully someone else will pursue this.

  • Dries committed ae0392b on 8.3.x
    - Patch #147947 by hswong3i: some files shouldn't defined as text:big...
  • Dries committed 08c059e on 8.3.x
    - Rollback of patch #147947: lacks proper upgrade path.
    
    

  • Dries committed ae0392b on 8.3.x
    - Patch #147947 by hswong3i: some files shouldn't defined as text:big...
  • Dries committed 08c059e on 8.3.x
    - Rollback of patch #147947: lacks proper upgrade path.
    
    

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

  • Dries committed ae0392b on 8.4.x
    - Patch #147947 by hswong3i: some files shouldn't defined as text:big...
  • Dries committed 08c059e on 8.4.x
    - Rollback of patch #147947: lacks proper upgrade path.
    
    

  • Dries committed ae0392b on 8.4.x
    - Patch #147947 by hswong3i: some files shouldn't defined as text:big...
  • Dries committed 08c059e on 8.4.x
    - Rollback of patch #147947: lacks proper upgrade path.
    
    

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

  • Dries committed ae0392b on 9.1.x
    - Patch #147947 by hswong3i: some files shouldn't defined as text:big...
  • Dries committed 08c059e on 9.1.x
    - Rollback of patch #147947: lacks proper upgrade path.
    
    
daffie’s picture

Status: Needs work » Fixed
daffie’s picture

Status: Fixed » Closed (fixed)