[DBTNG + XDB] Replace some TEXT:BIG with BLOB [#147947]

currently (from nightly development snapshot from HEAD on 2007-05-30), the following fields are defined as text:big:

dc:/tmp/drupal-6.x-dev# find . -type f -name '*.schema' | xargs fgrep -nH 'text' | grep 'big'
./modules/search/search.schema:9:      'data' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big')
./modules/user/user.schema:30:      'perm' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/user/user.schema:65:      'data'      => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/node/node.schema:72:      'body'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:73:      'teaser'    => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74:      'log'       => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/drupal/drupal.schema:11:      'slogan'  => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/drupal/drupal.schema:12:      'mission' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/system/system.schema:10:      'batch'     => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/system/system.schema:153:      'session'   => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/system/system.schema:194:      'value'    => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/dblog/dblog.schema:10:      'message'   => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/dblog/dblog.schema:11:      'variables' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/block/block.schema:39:      'body'   => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/comment/comment.schema:12:      'comment'   => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/comment/comment.schema:19:      'users'     => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:41:      'image'       => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:9:      'recipients' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:10:      'reply'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),

as mysql driver map text:big as longtext, which support for maximum 4GB size of data, it seems not reasonable to preform any comparison with such fields.

on the other hand, since some fields are really required to preform some testing within SQL directly, i will suggest change them into text:medium, or even smaller. so developer can simply guess that text:big should ONLY use as storage, but not for any SQL comparison.

finally, as oracle only provide VARCHAR2: maximum 4000 characters, can compare, and CLOB: maximum 4GB, can't compare, oracle driver will only map text:big as CLOB for maximum flexibly of sizing, and keep others as VARCHAR2(4000) for maximum compatibility. only keep required fields as text:big will greatly increase cross DB compatibility.

Comment	File	Size	Author
#89	blob-text2blob-1227760222.patch	13.46 KB	hswong3i
#89	blob-boxes-body-1227763338.patch	15.59 KB	hswong3i
#89	blob-menu_router-arguments-1227761227.patch	35.6 KB	hswong3i
#89	blob-locales_source_hash-1227760668.patch	10.54 KB	hswong3i
#87	dbtng-text2blob-1227552445.patch	13.44 KB	hswong3i
#83	dbtng-getFieldTypeMap-1226499767.patch	5.77 KB	hswong3i
#78	getFieldTypeMap.patch	5.79 KB	hswong3i
#76	getFieldTypeMap.patch	5.74 KB	hswong3i
#60	std_lob-2.1.patch	15.68 KB	hswong3i
#59	std_lob-2.0.patch	11.3 KB	hswong3i
#50	drupal-6.x-dev-clob-1.8.patch	42.47 KB	hswong3i
#42	drupal-6.x-dev-clob-1.7.patch	41.29 KB	hswong3i
#41	drupal-6.x-dev-clob-1.6.diff	41.29 KB	hswong3i
#38	drupal-6.x-dev-clob-1.5.diff	41.71 KB	hswong3i
#35	drupal-6.x-dev-clob-1.4.diff	41.6 KB	hswong3i
#33	drupal-6.x-dev-clob-1.3.diff	41.58 KB	hswong3i
#30	drupal-6.x-dev-clob-1.2.diff	40.98 KB	hswong3i
#25	drupal-6.x-dev-clob-1.1.diff	39.76 KB	hswong3i
#24	drupal-6.x-dev-clob-1.0.diff	37.63 KB	hswong3i
#23	drupal-6.x-dev-clob-0.8.diff	37.63 KB	hswong3i
#22	drupal-6.x-dev-clob-0.7.diff	37.44 KB	hswong3i
#21	drupal-6.x-dev-clob-0.6.diff	39.05 KB	hswong3i
#15	drupal-6.x-dev-clob-0.5.diff	46.19 KB	hswong3i
#14	drupal-6.x-dev-clob-0.4.diff	46.2 KB	hswong3i
#12	drupal-6.x-dev-clob-0.3.diff	36.99 KB	hswong3i
#4	drupal-6.x-dev-text_medium-1.0.patch	7.06 KB	hswong3i

Comments

Comment #1

hswong3i commented 30 May 2007 at 10:08

from my point of view, at least the following fields should keep as text:big, as we are not expected to preform any direct action to it, in whatever SQL or PHP level:

./modules/node/node.schema:72:      'body'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:73:      'teaser'    => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74:      'log'       => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/block/block.schema:39:      'body'   => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),

Comment #2

chx commented 30 May 2007 at 13:24

Havng big is utterly pointless, medium is already 16M, can you imagine what would happen with poor PHP when you load a node which has a body with more (lot more) than 16M byte?

Comment #3

hswong3i commented 30 May 2007 at 14:39

16M or not it not the only consideration: i am trying to mark text:big as a "storage only" type, so whenever developer trying to use it, keep this in mind: don't do comparison to it.

this clear instruction can increase the usability of code, and also decrease the difficulty of debug. as mentioned in #2, text:medium is totally enough for daily usage.

just feel free to let text:big act as symbolic label, set up a suitable "rule of game", and make use of it ;)

Comment #4

hswong3i commented 30 May 2007 at 18:15

Status:

Needs work

» Needs review

Status	File	Size
new	drupal-6.x-dev-text_medium-1.0.patch	7.06 KB

this patch change some fields from text:big into text:medium in a simple reason: they are not target as "storage only". after apply this patch, will only left the following fields as text:big, which also means not expected to take any comparison toward it:

dc:/tmp/drupal-6.x-dev# find . -type f -name '*.schema' | xargs fgrep -nH 'text' | grep 'big'
./modules/node/node.schema:72:      'body'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:73:      'teaser'    => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74:      'log'       => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77:      'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/drupal/drupal.schema:11:      'slogan'  => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/drupal/drupal.schema:12:      'mission' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/block/block.schema:39:      'body'   => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/comment/comment.schema:12:      'comment'   => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60:      'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:10:      'reply'      => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),

Comment #5

dries commented 30 May 2007 at 18:31

What is the problem with 'big' and 'Oracle'? I'd suspect they go hand in hand. :)

Care to shed some light on why this is a problem? I'd like to learn a bit more about this/Oracle.

Comment #6

hswong3i commented 30 May 2007 at 18:48

finally, as oracle only provide VARCHAR2: maximum 4000 characters, can compare, and CLOB: maximum 4GB, can't compare, oracle driver will only map text:big as CLOB for maximum flexibly of sizing, and keep others as VARCHAR2(4000) for maximum compatibility. only keep required fields as text:big will greatly increase cross DB compatibility.

on the other hand, a lot of additional handling are required for CLOB (http://php.net/oci8). you can seems oracle CLOB as a completely different data type from daily used text type under mysql: mysql seems longtext and text as the same, and so you will even able to run GROUP BY and ORDER BY with it, even it is in 4GB size!

BTW it is also seems reasonable: if you are going to do some SQL comparison, fine, use VARCHAR2 and take action within reasonable size for performance consideration; if you need space, fine, i give CLOB to you, but don't ask for any others ;(

and the truth is: i hate oracle's string handling!

Comment #7

moshe weitzman commented 31 May 2007 at 00:30

Status:

Needs review

» Reviewed & tested by the community

seems reasonable to me

Comment #8

dries commented 15 June 2007 at 07:14

Status:

Reviewed & tested by the community

» Fixed

Approved by Moshe, committed by Dries. :)

Comment #9

(not verified) commented 29 June 2007 at 07:18

Status:

Fixed

» Closed (fixed)

Comment #10

bjaspan commented 5 July 2007 at 04:15

Priority:	Normal	» Critical
Status:	Closed (fixed)	» Needs work

This patch changes the schema without updating existing databases to match the changes. When upgrading from Drupal 5 to 6, it appears that 13 tables end up not matching the schema. So, either this patch needs to be reverted or existing tables need to be updated.

I did not know about this patch until today. I haven't yet thought about the basic point of this patch thoroughly. Offhand, I don't like the idea that text:big is semantically different than text:medium on some database engines. The type 'text' should be semantically the same (i.e. comparable or not comparable) on all platforms regardless of the size which, as per the docs, is only a hint anyway. So if Oracle can only handle 4000 bytes for a comparable text:big, then that's all text:big can be on Oracle. If there is a database column that can't accept being limited to 4000 bytes on Oracle, it should use type blob or some other type that we define as "big enough but not comparable." If there is a database column that must be longer than 4000 bytes *and* must be comparable, then perhaps that functionality is just not implementable with Oracle (which I find hard to believe). However, as I said, I have not thought this through thoroughly yet.

I would also like to propose two things:

1. All patches to .install or .schema files must be tested with schema.module and produce a zero-mismatch comparison report before they can be committed.

2. All bugs which result in a schema comparison mismatch should be considered critical. We should not accept a bug which results in a database not matching the results of hook_schema. To this end, I have marked this issue as critical. If core committers disagree, they can downgrade it.

Comment #11

hswong3i commented 5 July 2007 at 10:01

i just find some useful information, and hope to share according to this topic (from http://phplens.com/lens/adodb/docs-oracle.htm):

4. Working With LOBs

Oracle treats data which is more than 4000 bytes in length specially. These are called Large Objects, or LOBs for short. Binary LOBs are BLOBs, and character LOBs are CLOBs. In most Oracle libraries, you need to do a lot of work to process LOBs, probably because Oracle designed it to work in systems with little memory. ADOdb tries to make things easy by assuming the LOB can fit into main memory.

ADOdb will transparently handle LOBs in select statements. The LOBs are automatically converted to PHP variables without any special coding.

For updating records with LOBs, the functions UpdateBlob( ) and UpdateClob( ) are provided. Here's a BLOB example. The parameters should be self-explanatory:
$ok = $db->Execute("insert into aTable (id, name, ablob) 
                                values (aSequence.nextVal, 'Name', null)");
if (!$ok) return LogError($db->ErrorMsg());
# params: $tableName, $blobFieldName, $blobValue, $whereClause
$db->UpdateBlob('aTable', 'ablob', $blobValue, 'id=aSequence.currVal');
and the analogous CLOB example:
$ok = $db->Execute("insert into aTable (id, name, aclob) 
                                values (aSequence.nextVal, 'Name', null)");
if (!$ok) return LogError($db->ErrorMsg());
$db->UpdateClob('aTable', 'aclob', $clobValue, 'id=aSequence.currVal');
Note that LogError( ) is a user-defined function, and not part of ADOdb.

Inserting LOBs is more complicated. Since ADOdb 4.55, we allow you to do this (assuming that the photo field is a BLOB, and we want to store $blob_data into this field, and the primary key is the id field):
	$sql = "INSERT INTO photos ( ID, photo) ".
			"VALUES ( :id, empty_blob() )".
			" RETURNING photo INTO :xx";

 		$stmt = $db->PrepareSP($sql);
		$db->InParameter($stmt, $id, 'id');
		$blob = $db->InParameter($stmt, $blob_data, 'xx',-1, OCI_B_BLOB);
		$db->StartTrans();
		$ok = $db->Execute($stmt);
		$db->CompleteTrans();

i also found that such special handling of LOBs is not only exists within oracle driver of ADOdb, but also for that of postgres, odbtp, sqlanywhere, informix72, ibase, odbc, mssql and db2:

dc:/home/hswong3i/project/adodb# find drivers/ -type f | xargs fgrep -nH 'function UpdateBlob'
drivers/adodb-postgres64.inc.php:347:   function UpdateBlobFile($table,$column,$path,$where,$blobtype='BLOB')
drivers/adodb-postgres64.inc.php:442:   function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-odbtp.inc.php:502:        function UpdateBlob($table,$column,$val,$where,$blobtype='image')
drivers/adodb-sqlanywhere.inc.php:144:  function UpdateBlob($table,$column,&$val,$where,$blobtype='BLOB')
drivers/adodb-informix72.inc.php:239:   function UpdateBlob($table, $column, $val, $where, $blobtype = 'BLOB')
drivers/adodb-ibase.inc.php:597:        function UpdateBlobFile($table,$column,$path,$where,$blobtype='BLOB')
drivers/adodb-ibase.inc.php:625:        function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-odbc.inc.php:567: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-oci8.inc.php:675: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-oci8.inc.php:709: function UpdateBlobFile($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-mssql.inc.php:645:        function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-db2.inc.php:678:  function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')

ok, let's back to the topic:

sorry that this patch didn't consider about the update of existing DB schema, as i think someone may help about this some days later. if this patch really break the upgrade process, i think it should be roll back
i agree about the use of "text" type should tread as the same across different DB backend. if oracle just can handle it by using VARCHAR2 with 4000 characters limitation, then this should be the problem of oracle. we shouldn't spend too much effort to do something that seems to be "mission impossible". e.g. i am now trying to work out a complete SQL query parser, to parse all SQL element into atomic, apply necessary changes (e.g. for CLOB, reserved word, max characters limitation, etc), and rebuild it. it is just wasting time, even though all query syntax are well documented within oracle official document
BTW, this also means that we need SOMETHING other than "text" type, as a storage type that SHOULD NEVER use for comparison during SQL level. may be it should call as CLOB (just similar as BLOB type) and label it as %c, or something else. therefore we can let node->body (and some other else) to use this new datatype. this seems much more reasonable and logical
due to the above suggestion, we should also take some improvement to DB API, so we can handle such new datetype as like as ADOdb does. as you can see that, a direct insert of CLOB is too complicated, we should have some helper function (as like as UpdateBlob) to handle such update independently, after a complete insertion. you can compare this idea with what we did with db_last_insert_id(), i think the case is just similar.

this is not the consideration of oracle driver implementation only, but also a cross DB compatibility concern. ADOdb try it, and come out with current solution. i don't really think we can do much better than that of ADOdb for Drupal: ADOdb is a project that starting from Sept 9 2000. maybe we should reuse their progress, and skip something that we are not really needed for, according to performance consideration.

Comment #12

hswong3i commented 9 July 2007 at 07:16

Title:	some filed shouldn't defined as text:big	» fix text:big, add CLOB, update BLOB handling
Category:	bug	» feature
Status:	Needs work	» Needs review

Status	File	Size
new	drupal-6.x-dev-clob-0.3.diff	36.99 KB

this patch add a new data type "CLOB" into drupal-6.x-dev schema API, and 2 more APIs: db_update_blob() and db_upadte_clob(). according to this change, some core module update its schema from "text:big" to "CLOB" type, with using new APIs for updating its values, e.g. node->body. NULL should input as placeholder for %b and %c during traditional db_query() insert/update query. On the other hand, this patch also rollback some field from "text:medium" into "text:big" for backward compatibility.

as bjaspan mention, all "text" types should remain same behavior among different DB backend: it should be able for compare. "text:big" should not be the special among the others, within oracle driver handling. the new data type "CLOB" can clearly explain its property as: character storage, large enough, but never directly using it for SQL-level compare. therefore developer can have a more strict guild line for their core/contribute development.

BTW, after studying about ADOdb cross DB handling, i found that the handling problem of character large object (CLOB) is not only appear within Oracle, but also among some other DBs. ADOdb propose another method, rather than insert/update CLOB directly: it provide additional API call UpdateBlob() for such LOBs update, after normal insert/update query. it seems to be a good idea, as we don't need to do additional query rewrite (which also means a complete SQL parser and rebuilder, which seems to be mission impossible) for CLOB handling.

additional benefit: as mysql version map "clob" into "longtext" and "text", where pgsql map this to "text", the add of additional API will not affect contribute module development, unless they change their module's schema, and choose to use CLOB for data storage.

P.S.: little bit change about %b handling: will insert/update NULL as placeholder within db_query(), for further more value update by using db_update_blob() and db_update_clob() manually. therefore cache.module require a small change about current blob handling. it may also affect other contribute modules which are currently using BLOB. BTW, as BLOB is not always use, this may not be a great problem.

Comment #13

hswong3i commented 9 July 2007 at 07:25

Category:

feature

» bug

Comment #14

hswong3i commented 11 July 2007 at 17:40

Status	File	Size
new	drupal-6.x-dev-clob-0.4.diff	46.2 KB

this patch standardize drupal BLOB/CLOB handling for cross DB compatibility, e.g. db_update_* for LOBs value update, db_encode_*/db_decode_* for LOBs value encoding/decoding. change incude:

new API: db_update_blob(), db_update_clob(), db_encode_clob(), db_decode_clob()
updated API: _db_query_callback(), db_encode_blob()
module with BLOB change: cache
modules with CLOB change: aggregator, block, comment, contact, node

for current stage, it is only a dummy mapping for "clob" type among mysql/pgsql, where mapping it as like as that of "text" type. BTW, it will greatly decrease the difficulty for other DB driver implementation, e.g. oralce, mssql and db2.

this patch only change the handling for LOBs type. therefore, unless developer choose to use BLOB/CLOB type, nothing will affect them. on the other hand, if developer choose to use LOBs type, which means that they will have (nearly) unlimited storage size, BTW trade off some query level features, e.g. using GROUP BY, ORDER BY, WHERE, etc.

Comment #15

hswong3i commented 12 July 2007 at 00:28

Status	File	Size
new	drupal-6.x-dev-clob-0.5.diff	46.19 KB

minor update, patch with latest CVS, fix some pgsql bugs.

complete tested with mysql + mysqli, pass though all core + optional modules installation. patched modules are all functioning :)

Comment #16

bjaspan commented 12 July 2007 at 12:37

Status:

Needs review

» Reviewed & tested by the community

@Dries: I marked this RTBC because the patch from comment #4 which was previously committed needs to be rolled back (the author agrees with this as stated in #11). As of today, it still reverse-applies to HEAD. Once the patch is rolled back, the issue should be set to CNW so further development can occur.

@hswong3i: I briefly reviewed (but did not test) your patch and in general it looks good. However, this is clearly an API-changing feature enhancement and, as such, cannot be committed now that D6 has reached code-freeze. We should continue working on this for D7. I realize this means that Oracle support may be (once again) limited for D6 without core patches, but in my opinion we cannot risk slipping in a change like this at this point in the cycle. If Dries/others disagrees, I'm certainly open to reviewing the patch more carefully.

I noticed one problem: db_update_clob/blob() is written to expect '{tablename}' instead of 'tablename' like the other schema api functions. The {tablename} notation is only for when table names are embedded in SQL. Otherwise, confusion will result.

Comment #17

hass commented 12 July 2007 at 16:07

MsSQL introduced in Ms SQL 2005 a varchar(max) data type to solve this CLOB comparison problem... so only older MsSQL Versions have this problem... maybe Oracle works on a new datatype, too?

Comment #18

hswong3i commented 15 July 2007 at 01:01

@bjaspan: i agree that it should be a bit difficult to commit this patch together with drupal-6.x-dev after code freeze, even though it is mainly a API enhancement. BTW, as oracle driver is already closed to drupal-6.x-dev, i will suggest to review the chance of accepting this enhancement. on the other hand, unless developer choose to use the new LOBs handling, none of their code need to be review.

the API of db_update_clob/blob() is still a prototype, we still can change it :)

@hass: hopelessly, oracle don't have such new data type, please refer to http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_eleme... for more information :)

Comment #19

dries commented 15 July 2007 at 10:08

Priority:	Critical	» Normal
Status:	Reviewed & tested by the community	» Needs work

I've rolled back the old patch so we can work more on the new patch.

Comment #20

hass commented 15 July 2007 at 10:52

Aside if we get this solved for Oracle, we will not have any troubles with MsSQL 2000, too :-).

Comment #21

hswong3i commented 15 July 2007 at 16:24

Status	File	Size
new	drupal-6.x-dev-clob-0.6.diff	39.05 KB

patch for latest CVS, update db_update_clob/blob(), update some document.

@bjaspan: now db_update_clob/blob() will accept real table name directly.

@hass: yes, as i know that is the problem for MSSQL 2000, if oracle driver solve it, MSSQL (and some other else) will also benefit from it :)

Comment #22

hswong3i commented 18 July 2007 at 13:30

Status	File	Size
new	drupal-6.x-dev-clob-0.7.diff	37.44 KB

patch according latest CVS, update handling for block.module and contact.module, fix minor bug for comment.module.
completely tested modules: cache, aggregator, block, comment, contact and node (all patched modules)

Comment #23

hswong3i commented 18 July 2007 at 14:55

Status	File	Size
new	drupal-6.x-dev-clob-0.8.diff	37.63 KB

minor update according to v0.7, update code which more make senses with argument input.

Comment #24

hswong3i commented 19 July 2007 at 10:06

Priority:	Normal	» Critical
Status:	Needs work	» Needs review

Status	File	Size
new	drupal-6.x-dev-clob-1.0.diff	37.63 KB

updated patch for latest CVS.

Comment #25

hswong3i commented 21 July 2007 at 04:17

Status	File	Size
new	drupal-6.x-dev-clob-1.1.diff	39.76 KB

version 1.1 update according to latest CVS, with following enhancements:

fix bug for contact.*. since contact.module is updated and split into contact.admin.inc, this patch will re-patch the LOBs handling for it
update db_update_*() API. now db_update_*() API will follow the style of db_query_*(), which become db_update_*($query, ..., $table, $column, $value). according to this change, the function will now only handle one LOBs field update each time, which simplify the use of it
update all modules LOBs patch according to above API update

since this patch is required by latest oracle driver implementation (http://drupal.org/node/39260), please kindly review the chance of accepting this patch after drupal-6.x-dev code freeze, many thanks :)

Comment #26

dries commented 21 July 2007 at 09:22

Status:

Needs review

» Needs work

I had a look at the patch and here are some comments:

1. In general, I think it makes sense to differentiate between 'binary data' and 'character data'.

+ * NOTE: %b and %c should input NULL as placeholder, and further more 
+ * update its value manually by using db_update_blob() or db_update_clob().

I don't understand that explanation. Please consider rewording it for clarity.

3. I don't understand why we need db_update_clob() and db_update_blob() functions? If there is no critical need for these functions, I'd prefer to remove them, and to let people write regular db_query("UPDATE ...") statements. If these functions are critical (rather than just handy), document in the code why they are critical.

-  return "'". mysql_real_escape_string($data, $active_db) ."'";
+  return !is_null($data) ? "'". mysql_real_escape_string($data, $active_db) ."'" : "''";

Why is that extra check required?

That's all for now. Looks promising but needs some more work around the edges. Let's give this patch some prime time so we might be able to include it in D6.

Comment #27

hswong3i commented 21 July 2007 at 09:59

most of the change are function hook for oracle driver, or other driver backend that require special handling to LOBs data.

2 & 3 & 4. for the case of oracle, we can't simply insert a "NULL" value for INSERT as placeholder, but EMPTY_BLOB() or EMPTY_CLOB(). this is because we can't do any update to a LOBs field with NULL value, but EMPTY_BLOB()/EMPTY_CLOB().

on the other hand, we also don't need to do any input value escape to input LOBs value as it will be handled by LOBs->save().

for example, in oracle driver:

<?php
function db_encode_blob($data) {
  return !is_null($data) ? $data : "EMPTY_BLOB()";
}
function db_encode_clob($data) {
  return !is_null($data) ? $data : "EMPTY_CLOB()";
}
?>

therefore, take block.module as example, the insert query will become:

INSERT INTO {boxes} (body, info, format) VALUES  (EMPTY_CLOB(), '%s', %d)

the update query will become:

SELECT body FROM boxes WHERE bid = %d FOR UPDATE

after we fetch the pointer for "body", we can then run LOBs->save() and update its value:

<?php
  // Fetch the SELECTed rows
  $rows = array();
  while ($row = oci_fetch_assoc($last_result)) {
    // Note:  Oracle returns all field names in uppercase and associative
    // indices in the result array will be uppercased too. We reset it back
    // to lower case in here.
    $row = array_change_key_case($row, CASE_LOWER);

    // Check if LOB object, discard the existing contents, and save a value to it.
    if (!(is_object($row[$column]) && $row[$column]->truncate() && $row[$column]->save($value))) {
      oci_rollback($active_db);
      return FALSE;
    }

    $rows[] = $row;
  }
?>

for more detail reason of this API changes, please compare the differences between mysql/pgsql handling with that of oracle. i feel so sorry to insert some dummy handling to mysql/pgsql version, but this give a great chance for other DB driver implementation :)

Comment #28

hass commented 21 July 2007 at 12:11

Don't forget to tell why we need db_update_clob() and db_update_blob() functions. There are differences in the update command for different DB's. In MsSQL there is a special UPDATETEXT, so we are better to know what we are updating. Additional you sometimes need to use pointers syntax for updating TEXT. Maybe other DBs have different update functions for updated text and binary data. So it's better to split, isn't it?

Comment #29

hswong3i commented 21 July 2007 at 15:32

@hass: great! you give us the answer that we are seeding for: different DB have different LOBs update method, and so we better split it out, and handle it with independent function API. seems db_update_*() can really help in MsSQL too :)

Comment #30

hswong3i commented 24 July 2007 at 04:52

Status	File	Size
new	drupal-6.x-dev-clob-1.2.diff	40.98 KB

patch according to latest CVS, update some document.

@Dries: i detail the document according to your questions:

2. since oracle will save pointer of LObs into DB table, and further more fetch value by calling LOB->save/load(), NULL (none of pointer) != EMPTY_CLOB() (pointer with empty value, able to use LOB->save/load()). the document update as:

<?php
+ * NOTE: %b and %c should input NULL as placeholder, and further more 
+ * update its value manually by using db_update_blob() or db_update_clob().
+ *
+ * This is because different database have different implementation about
+ * empty LOBs field. e.g. as MySQL/PgSQL map CLOB to general text type, NULL
+ * input will simply replace as '' (empty string). Update action will also
+ * be a simple UPDATE query.
+ * 
+ * On the other hand, as Oracle will save a pointer of CLOB into database,
+ * and update its value by LOB->save/load(), a NULL value means none of pointer
+ * (which not able to update its value). Therefore NULL input will replace as
+ * 'EMPTY_CLOB()' (a placeholder for "CLOB pointer with empty value") for
+ * further more update.
?>

3. similar reason as 2.: since oracle save a point for LOBs value into table, update of its value require additional handling, rather than a simple UPDATE query, an isolated APIs for LOBs value update are requred.

e.g. case of oracle driver: first of all, we need to fetch out LOB's pointer, by SELECT ... FOR UPDATE (current oracle implementation); secondly, start the transaction by oci_execute($last_result, OCI_DEFAULT); try to truncate old value, and then save its new value; rollback whenever face error, or commit transaction if success. it is very complicated, and should be handle individually :)

document of db_update_clob(), for mysql/mysqli, update as:

<?php
 * NOTE: Since MySQL support direct read/write to LOBs value by INSERT
 * or UPDATE query, this function will become a simple wrapper of UPDATE
 * query.
?>

4. the different will be shown by comparing MySQL handling with Oracle's. for MySQL, CLOB is a dummy mapping to longtext, so handle as:

<?php
function db_encode_clob($data) {
  global $active_db;
  return !is_null($data) ? "'". mysql_real_escape_string($data, $active_db) ."'" : "''";
}
?>

BTW for oracle, no escape is required for input value (will handle by LOB->save()), but require EMPTY_CLOB() as placeholder during INSERT/UPDATE query (as stated in 2.), code will become:

<?php
function db_encode_clob($data) {
  return !is_null($data) ? $data : "EMPTY_CLOB()";
}
?>

Comment #31

hswong3i commented 24 July 2007 at 04:54

Title:	fix text:big, add CLOB, update BLOB handling	» standardize LOBs handling for cross DB compatibility
Category:	bug	» feature
Status:	Needs work	» Needs review

Comment #32

dries commented 24 July 2007 at 09:51

I'll try to investigate this issue some more as soon time permits. In the mean time, it would be helpful if someone could help you massage the documentation in proper English. It's a little rough around the edges, but nothing we shouldn't be able to help you with. Thanks for the hard work, I'll get to it soon..

Comment #33

hswong3i commented 24 July 2007 at 15:46

Status	File	Size
new	drupal-6.x-dev-clob-1.3.diff	41.58 KB

@Dries: document updated. detail of changes: http://rafb.net/p/WGZMcE39.html

Comment #34

hswong3i commented 1 August 2007 at 04:21

according to my previous benchmarking result, there is no performance different when comparing LOBs patched with CVS HEAD (at least in case of MySQL). impact for API update should be the only concern of this patch.

on the other hand, with LOBs patch helps, oracle driver (and some other databases with similar physical limitation in VARCHAR storage sizing) will able to implement in a much flexible style, e.g. we are now able to store node body with more than 4000 characters :)

i have no further more suggestion or idea about this patch. LOBs patch is the major dependence of oracle driver for drupal-6.x-dev, based on functionality concern. hope someone may help about its coding or documents improvement :)

Comment #35

hswong3i commented 1 August 2007 at 09:12

Status	File	Size
new	drupal-6.x-dev-clob-1.4.diff	41.6 KB

patch based on latest CVS HEAD, update comment.module handling.

Comment #36

hswong3i commented 3 August 2007 at 16:21

when studying DB2, IBM's DB2 + Drupal document, and reference to ADOdb for existing implement, i found the following message about BLOB handling from ADOdb:

<?php
	/*
		Insert a null into the blob field of the table first.
		Then use UpdateBlob to store the blob.
		
		Usage:
		 
		$conn->Execute('INSERT INTO blobtable (id, blobcol) VALUES (1, null)');
		$conn->UpdateBlob('blobtable','blobcol',$blob,'id=1');
	*/
	function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
	{
		return $this->Execute("UPDATE $table SET $column=? WHERE $where",array($val)) != false;
	}
?>

i think this prove the necessary of LOBs patch: different database comes with different LOBs handling, sometime we need to insert special placeholder during INSERT/UPDATE, and further more update its value by special API :)

Comment #37

hswong3i commented 3 August 2007 at 16:32

and here is the case of MSSQL, within ADOdb implementation:

<?php
	/* 
		Unfortunately, it appears that mssql cannot handle varbinary > 255 chars
		So all your blobs must be of type "image".
		
		Remember to set in php.ini the following...
		
		; Valid range 0 - 2147483647. Default = 4096. 
		mssql.textlimit = 0 ; zero to pass through 

		; Valid range 0 - 2147483647. Default = 4096. 
		mssql.textsize = 0 ; zero to pass through 
	*/
	function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
	{
	
		if (strtoupper($blobtype) == 'CLOB') {
			$sql = "UPDATE $table SET $column='" . $val . "' WHERE $where";
			return $this->Execute($sql) != false;
		}
		$sql = "UPDATE $table SET $column=0x".bin2hex($val)." WHERE $where";
		return $this->Execute($sql) != false;
	}
?>

Comment #38

hswong3i commented 7 August 2007 at 17:35

Status	File	Size
new	drupal-6.x-dev-clob-1.5.diff	41.71 KB

minor update: patch for latest CVS HEAD, a bit document update, and update node_save() handling based on latest changes :)

P.S. i found this is very helpful when implementing both oracle and DB2 driver. as ADOdb and PDO both come with similar handling, and this patch will change programming logic only if developer choose to use CLOB for (almost) unlimited storage size, i guess it should ready to commit with no question. hope someone may help about its review :)

P.P.S. world is always not such perfect: oracle comes with VARCHAR2 (max. 4000 characters) and CLOB (max. 128TB), where DB comes with VARCHAR (max. 32672 characters) and BLOB/CLOB (max. 2GB). it is not easy to implement MySQL's text:big => longtext mapping for similar functionality: able to use for compare (VARCHAR2/VARCHAR), and also comes with unlimited size (CLOB). we should split size concern (clob) from functionality concern (text), and preform isolated handling :)

Comment #39

hass commented 7 August 2007 at 17:47

2GB or more shouldn't be an issue... i think - no - site will have inline images or website pages of such a size. And people saving images in blobs should be beaten... :-)

Will this patch land for D6? I don't like to spend time on a "database.mssql.inc" if this patch never goes in...

Comment #40

hswong3i commented 7 August 2007 at 18:08

@hass: i point out DB2's CLOB size just because i hope to compare it with MySQL. in case of DB2, we need to choose VARCHAR for functionality, or CLOB for size. neither one can clone MySQL's text:big => longtext handling, we need some trade off :(

P.S. MSSQL is also within my schedule, after Oracle and DB2. this is because we have ZendCore for both Oracle and DB2, so i don't need to spend too much time for system setup :)

Comment #41

hswong3i commented 14 August 2007 at 09:52

Status	File	Size
new	drupal-6.x-dev-clob-1.6.diff	41.29 KB

patch update based on latest CVS HEAD, with minor code change in node_save(). completely tested with MySQL.

Comment #42

hswong3i commented 20 August 2007 at 16:55

Title:

standardize LOBs handling for cross DB compatibility

» standardize LOBs handling

Status	File	Size
new	drupal-6.x-dev-clob-1.7.patch	41.29 KB

patch for latest CVS

Comment #43

hswong3i commented 24 August 2007 at 05:01

after an indeed studying about ADOdb and PDO database abstract layer, i found that a special LOBs value handling API is almost a must, based on cross database compatibility concern. here is a simple definition of LOBs value, quote from PDO (http://hk.php.net/pdo):

At some point in your application, you might find that you need to store "large" data in your database. Large typically means "around 4kb or more", although some databases can happily handle up to 32kb before data becomes "large". Large objects can be either textual or binary in nature.

in case of ADOdb, LOBs value will handle by: 1. normal INSERT query, with LOBs value placeholder (NULL for most database, EMPTY_BLOB/EMPTY_CLOB for Oracle); 2. manually update LOBs value by calling UpdateBlob. here is a simple code snippet for CLOB handling (reference to http://phplens.com/lens/adodb/docs-oracle.htm):

<?php
$ok = $db->Execute("insert into aTable (id, name, aclob) 
                                values (aSequence.nextVal, 'Name', null)");
if (!$ok) return LogError($db->ErrorMsg());
$db->UpdateClob('aTable', 'aclob', $clobValue, 'id=aSequence.currVal');
?>

in case of PDO, LOBs value will handle by: 1. normal INSERT query, with placeholder as "?"; 2. prepare and bind each value with placeholder, by calling bindParam() (NOTE: you will need to define data_type as PARAM_LOB explicitly for LOBs value binding); 3. execute transaction. here is a simple code snippet for inserting an image into a database by using PDO (reference to http://hk2.php.net/pdo):

<?php
$db = new PDO('odbc:SAMPLE', 'db2inst1', 'ibmdb2');
$stmt = $db->prepare("insert into images (id, contenttype, imagedata) values (?, ?, ?)");
$id = get_new_id(); // some function to allocate a new ID

// assume that we are running as part of a file upload form
// You can find more information in the PHP documentation

$fp = fopen($_FILES['file']['tmp_name'], 'rb');

$stmt->bindParam(1, $id);
$stmt->bindParam(2, $_FILES['file']['type']);
$stmt->bindParam(3, $fp, PDO::PARAM_LOB);

$stmt->beginTransaction();
$stmt->execute();
$stmt->commit();
?>

so what's the case of this patch? case is just similar as both ADOdb and PDO: 1. normal INSERT/UPDATE query, with NULL as LOBs value placeholder (it is because db_encode_blob() and db_encode_clob() will translate this NULL as suitable under layer placeholder); 2. manually update LOBs value by calling db_update_blob() or db_update_clob(). here is a simple code snippet for block content insertion:

  db_query("UPDATE {boxes} SET body = %c, info = '%s', format = %d WHERE bid = %d", NULL, $edit['info'], $edit['format'], $delta);
  db_update_clob('bid = %d', $delta, db_prefix_tables('{boxes}'), 'body', $edit['body']);

my research result is very simple: a special API is the best, common, and strict forward solution for LOBs value handling. even we may working with PDO in D7 or later (with array-style variable binding), it is also a must, for a hook which telling under layer what type of data we are now inserting, or else we can NEVER handle LOBs value insertion correctly.

moreover, that is no point to revamp our database API, even we are now changing our backend as PDO: our database API is powerful and suitable for most cases. PDO should only act as another under layer database driver supporting (maybe call as database.pdo.inc, or some other else). PDO SHOULDN'T be a replacement of existing implementation.

anyway, if we are just focusing on how PDO can benefit on performance when using with MySQL/PgSQL (which own more than 98% of our clients), but simply forget its powerful cross database ability, this will seems a bit silly... as we take a lot of effort on schema API, database abstract layer, etc, based on cross database concern, this idea is just a step backward ;p

Comment #44

Crell commented 24 August 2007 at 05:49

Since I suspect the latest comment is aimed at me based on a conversation in IRC earlier today, I'll jump in here. :-)

First off, no, the PHP 4/MySQL 3-era mysql driver is not at all robust or secure. That we layer a regex on top of it that is as slow as the much more robust and complex code in PDO is ample evidence of that. As long as our lowest common denominator is the mysql extension, we can't do anything interesting in the database system. mysqli is also not part of the standard install of PHP, and many shared hosts don't offer it. (mysql isn't either as of PHP 5.0, but most hosts install it for legacy support.) We can't rely on mysqli for MySQL support, which is what the vast majority of Drupal sites run on. That leaves PDO, which is reasonably cross-database (at the API level at least), reasonably fast (although not as fast as native APIs, it's still not a dog because it's all in C), and, most importantly, we can rely on it being present in any PHP 5 install as it's been part of the default configuration since PHP 5.1. That alone makes it a worthwhile move for D7, even if it weren't for easier Oracle/DB2/MS SQL support.

For data-modification queries, actually, in the D7 API as I'm currently building it you should never, ever use db_query() for an insert/update/delete statement. You should instead do

db_insert('node', array('fieldname'=>$value));

Aside from being easier to work with and more robust, it means you're passing not a serialized string to the database system but a data structure. Within the database driver, if necessary, you can do additional escaping, LOB handling, etc. without worrying about regexes and without putting extra work on the module developer.

Which brings up the next point. Right now, virtually all Drupal sites run MySQL or Postgres, with MySQL far and away the leader. Most Drupal developers know from MySQL primarily, I wager. The concept of LOBs and BLOBs is foreign to them, and many will likely see the necessity of it as a bug in those databases. (There are times when I do, too.) We should not build an API around a database that on a good day may be 2-3% of all Drupal installs for the foreseeable future. Most developers won't even have any way to test it to see if they're doing it "right". (How many people here have a copy of Oracle just lying around? :-) )

That sort of logic belongs inside the Oracle/DB2 driver, not exposed to every module developer to be able to forget or get wrong. If we can make that driver's job easier without making the API substantially more complex, we should do so. But not if it means adding a lot of needless, untestable, easy to break work for the vast majority of Drupal developers who are MySQL-based.

Comment #45

hass commented 24 August 2007 at 06:10

Most databases are less then 2 or 4 GB... there are free versions of Oracle and MSSQL. Aside if i have the chance to get it running on our MSSQL2K5 active/active cluster i would move there... why should i spend my time on administering two different DBs if i can do it with one. Today we have simply no choice - eat or die with MySQL or PgSQL.

Comment #46

hswong3i commented 24 August 2007 at 06:58

@Crell: i would like to have more details about how can your abstract layer handle LOBs value correctly? i have no question about inserting data with your array-style input, but where are you obtaining the information of these data type? it will be a bit not acceptable if: "as most data type will bind automatically and correctly by using PDO (unless LOBs, which need for explicitly defined), so we will simply forget LOBs handling in D7"...

@hass: good point. e.g. my client require for Oracle, due to political concern. therefore drupal will NEVER become their choice, as we are not supporting Oracle natively... (even though, we are suitable based on functional concern...)

Comment #47

hswong3i commented 24 August 2007 at 07:39

@Crell: moreover, i am not opposing PDO supporting in D7/8/..., but try to figure out those common problems that we both need to due with. as we are both working for database abstract layer, we should facing with similar problem, no matter we are working with ADOdb or PDO ;)

e.g. db_num_rows() is a common problem among different database. this is not a problem of database abstract layer implementation, but dependent on database natively: some database won't pass the complete result set to client side after SELECT query; on the other hand, pass each row from result set when calling for fetch action (Oracle/DB2). since PDO working as a cross database abstract layer (as what i am now doing, but user level), which can't solve this problem natively, they try to REMOVE this support.

this is just similar as case of LOBs handling: since special API is a must, so we ADD this support. both ADOdb and PDO (moreover, this patch, too) propose for similar API, in order to due with similar difficulty.

LOBs handling is a common problem that we both need to solve; on the other hand, the successful of this patch should also equal to the successful of PDO, in coming future ;)

Comment #48

hswong3i commented 30 August 2007 at 10:12

After an indeed studying about differences between ADOdb, PDO and Drupal 6.x database abstract layer, it show that we are still missing a complete and standardized LOBs handling API, based on cross database compatibility concern (besides MySQL/PgSQL).

Drupal 6.x existing database abstract layer provide db_encode_blob() and db_encode_blob() for BLOB handling only, as PgSQL require special API for BLOB encode/decode (In case of MySQL, it is a dummy abstraction, as MySQL seems BLOB data as like as normal string). BTW, this is not the complete picture of LOBs handling and its difficulties: besides BLOB, it is also a critical concern about limitation on CHAR/VARCHAR/CLOB (Character Large OBject) among most database; moreover, method of LOBs INSERT/UPDATE is also database specific.

LOBs patch for Drupal 6.x (http://drupal.org/node/147947) will able to fill in this missing section, extent and complete our LOBs API functionality, with handy procedures. It should be the most common and strict forward solution when dueling with LOBs related difficulties. On the other hand, it shouldn't be a confusion for normal MySQL users, as it is very closed to the case of existing db_encode_blob() and db_decode_blob() implementation.

We shouldn't miss the chances, but our chances are waiting for this patch :)

(To view the thread, please navigate to http://edin.no-ip.com/html/?q=node/302)

Comment #49

hswong3i commented 31 August 2007 at 18:18

Status:

Needs review

» Reviewed & tested by the community

as i am now indeed studying about PDO + MySQL/Oracle/DB2, i find that this patch would also be a very important functional hook for our next generation database abstract layer development, based on cross database concern (even we may move to PDO, as this is a database-level but not database-abstraction-level problem). on the other hand, this patch is stable enough since July 24, 2007, and it functionality is already fully proved by the successful of Oracle/DB2 driver development. it should be RTBC.

Comment #50

hswong3i commented 31 August 2007 at 19:36

Status	File	Size
new	drupal-6.x-dev-clob-1.8.patch	42.47 KB

patch via latest CVS HEAD

Comment #51

hswong3i commented 2 September 2007 at 18:52

As it is now one day before D6 beta, it seems that not easy to commit this patch on time, together with Oracle and DM2 supporting. Even though it is a bit pity (as all are ready for production), I am now merging this patch into another technical preview of D7 database abstract layer, too (http://drupal.org/node/172541). I will keep on promoting this LOBs handling in our next generation of database abstraction layer.

BTW, if you are looking for unofficial Oracle (http://drupal.org/node/39260) and DB2 (http://drupal.org/node/165788) supporting within D6, please refer to their own issue. I will merge this patch (maybe some other additional patches) with those issues. On the other hand, I will provide patches based on latest CVS HEAD of D6, within D6 life cycle. So you will able to have those unofficial database supporting with a single patch apply :)

Comment #52

hass commented 2 September 2007 at 20:50

That's unbelievable bad information... i hoped to see this committed :-(.

Comment #53

hswong3i commented 3 September 2007 at 06:55

Version:	6.x-dev	» 7.x-dev
Priority:	Critical	» Normal
Status:	Reviewed & tested by the community	» Active

@hass: yes, how pity it is... so let's postpone it into D7?

i will provide unofficial patch for both oracle and db2 within D6 life cycle, so people who are interested in this topic can get such supporting ON TIME, but not sit and waiting for D7 comes. i guess most of the projects needed for such unofficial patch are enterprise-related project, such little tailor-made handling may be acceptable for their professional and skillful developer :)

P.S. the main idea is: i will also benefit on this combined unofficial patch handling. i am now scheduled to move my personal Drupal site into Oracle with D6, and so i will able to keep on develop and test it daily. a single patch can save a lot of our time, during update ;p

Comment #54

hass commented 3 September 2007 at 08:44

I'm not going to use unofficial patches. I have too many dependencies with modules and so on and i don't like to patch them all myself if they are buggy. This makes me going the mysql way another year or more, what suxxx much.

Comment #55

hswong3i commented 3 September 2007 at 10:09

@hass: let's take it easy :)

i will not force anyone to use this unofficial patch: as it is unofficial, no one can generate about it correctness, and it is always not targeting for our end users. e.g. you will never use Debian's unstable release for production purpose, but only stable, isn't it?

BTW, we will always need time to prove its correctness. we need helpers (or better says as HACKERS ;p) to test the implementation, and provide feedback, before it can finally goes into core. as proved by the fail of this patch within D6, it will for sure to be another pity issue, if we start the development TOGETHER with the open of D7: it will be too late once again :(

a simple and handy unofficial patch may speed up the development, on the other hand give a big hand to those require for such features ON TIME. so please just let it be :)

Comment #56

hswong3i commented 22 December 2007 at 07:19

Title:

standardize LOBs handling

» Siren #2: Add %c for CLOB, and standardize LOG handling

Comment #57

chx commented 23 December 2007 at 09:20

Title:

Siren #2: Add %c for CLOB, and standardize LOG handling

» Add %c for CLOB, and standardize LOG handling

As Siren is not a project on Drupal.org but an unofficial fork, I am removing it from issue titles to avoid confusion.

Comment #58

hswong3i commented 2 January 2008 at 02:56

Even though DB2 don't explicitly indicate the limitation of VARCHAR as like as that for Oracle (Oracle will only allow for VARCHAR2 with max 4000 characters; above this limitation, please consider to use CLOB with TB-scale storage), it is still logically limited as 4000 characters.

A minor footnote from IBM DB2:

The functions in the SYSFUN schema taking a VARCHAR as an argument will not accept VARCHARs greater than 4 000 bytes long as an argument. However, many of these functions also have an alternative signature accepting a CLOB(1M). For these functions, the user may explicitly cast the greater than 4 000 VARCHAR strings into CLOBs and then recast the result back into VARCHARs of desired length.

This should be a good example why CLOB is something necessary for most database implementation. We are not always as lucky as case of MySQL and PostgreSQL, which we are able to work with TEXT for (nearly) unlimited size. On the other hand, we have CLOB, which usually required for special handling, e.g. database dependent INSERT/UPDATE/DELETE abstraction, decode CLOB as like as BLOB when fetching data, etc.

Since we are now having drupal_write_record() for high level programming abstraction, and (may) have INSERT/UPDATE/DELETE API for driver level abstraction, CLOB should be something much simpler than case of before. Moreover, this logic is now proved as function with Siren: Siren is able to work with Oracle CLOB, on the other hand don't introduce complicated abstraction to MySQL and PostgreSQL.

Comment #59

hswong3i commented 7 February 2008 at 09:34

Title:	Add %c for CLOB, and standardize LOG handling	» Replace some TEXT:BIG with BLOB, based on cross database compatiblilty concern
Category:	feature	» task
Priority:	Normal	» Critical
Status:	Active	» Needs review

Status	File	Size
new	std_lob-2.0.patch	11.3 KB

After an indeed study of pdo_oci, CLOB is not the optimal solution for this issue (pdo_oci currently not support CLOB). On the other hand, I would like to summarize the reason of this issues once again:

Most DB don't support text:big as large storage space as MySQL and PostgreSQL. E.g. DB2 don't support VARCHAR > 4000, Oracle ONLY support VARCHAR2 with max 4000, legacy PHP MSSQL driver only support max 255 characters when pass though. Assume text:big as "unlimited storage size, fully functional as VARCHAR" is ALWAYS NOT TRUTH for cross database compatible concern.
Some DB support CLOB, e.g. Oracle and DB2. BTW, CLOB is not something standardized, and so its support stage is most likely implementation dependent. Therefore adding CLOB support, where map MySQL and PostgreSQL as a dummy mapping to TEXT won't be an optimal solution.
BLOB is something MOST database support, and its definition is ALWAYS clear: (assume) unlimited storage space, raw read-write, no SQL functionality, (assume) used as storage ONLY". As we are requesting something for huge storage, and we all understand that will be a performance impact if we preform SQL functionality with TEXT (which may store up to 4GB in case of MySQL), BLOB seems to be a suitable storage type when compare with CLOB or TEXT.

Moreover, using blob (but not clob) as text:big replace also introduce some other benefits:

Drupal DB API already support BLOB, coming with both db_encode_blob() and db_decode_blob(). Even though Oracle require special BLOB INSERT/UPDATE handling, this can be easily solved by drupal_write_record(). So all stuff needed for a now ready.
Store huge data with blob can give a clear hints for normal contribute developer that "don't preform any SQL function with it, as this is not supported", or else may introduce potential performance impact.
According to my Siren research progress, replace text:big with blob can greatly improve cross database compatibility: oci8, pdo_oci, ibm_db2 and pdo_sqlite are both function correctly without any special problem.

P.S. 'default' => '', is important for Oracle and DB2 implementation: they will need to catch it and replace as EMPTY_BLOB() - the default empty blob placeholder. This is completely different from a simple NULL handling. This is also what http://drupal.org/node/195169 mentioned for.

Since Drupal 7.x is now open for public development, this patch de-fork from latest Siren research progress, and via latest Drupal 7.x CVS HEAD. Fully tested with MySQL and PostgreSQL.

Comment #60

hswong3i commented 7 February 2008 at 09:56

Title:

Replace some TEXT:BIG with BLOB, based on cross database compatiblilty concern

» Replace some TEXT:BIG with BLOB, based on cross database compatibility concern

Status	File	Size
new	std_lob-2.1.patch	15.68 KB

Append corresponding drupal_write_record() handling.

Comment #61

catch

he/him

English

commented 22 August 2008 at 10:13

Status:

Needs review

» Needs work

Not critical, is this still valid now #225450: Database Layer: The Next Generation is in?

Comment #62

catch

he/him

English

commented 22 August 2008 at 10:13

Priority:

Critical

» Normal

Comment #63

hswong3i commented 23 August 2008 at 17:54

@catch: Thanks for your consideration. This issue won't able to be fix with our new D7 DB abstraction layer, as it is the physical limitation of DB itself.

In case of Oracle, only maximum 4000 characters is allowed for VARCHAR2, which come with GROUP BY and other SQL functionality. CLOB have unlimited size but can't function as like as MySQL's TEXT type; on the other hand, its support stage is not such stable as BLOB. Therefore replace some text:big field with BLOB can simply solve the problem. Case is also similar in DB2 and MSSQL.

E.g. {batch} will store a serialized command into DB, which always result as a single text content with more than 4000 characters, and so will buggy in case of Oracle. Node body and block body come with similar situation, too.

I am going to summarize my research progress into individual small patches for CVS HEAD. Hope to have your kindly support :-)

Comment #64

damien tournoud commented 23 August 2008 at 18:01

@hswong3i: Please limit the scope of that issue to changing all TEXT:BIG that are used as BLOB (ie we only store chunk of data in them, we don't filter or order by them). Please also provide an appropriate upgrade path (ie. _update functions).

That mean your patch in #60 could get committed if you remove db_decode_blob() and drupal_write_record changes, and add upgrade paths.

Comment #65

Crell commented 23 August 2008 at 20:38

That sounds like a limitation of Oracle if it has no equivalent to TEXT fields.

With Schema API, though, the exact underlying field type needn't be identical on each database. So we can use TEXT on MySQL and CLOB/BLOB on Oracle, as long as the driver itself handles marshalling data back and forth. That was a deliberate and key design of the new database API. So unless I'm misunderstanding you, we need only use CLOB/BLOB in the mappings for an Oracle driver; we don't need to change everyone else to use BLOB.

That said, don't we need non-opaque data for some of our larger text fields? If we do any sort of processing on them (LIKE for instance) then a BLOB field will break that. It's something to look into.

Comment #66

hswong3i commented 24 August 2008 at 05:12

@Crell: It is truth that we can map different data type with database specific container, dependent on different requirement; but some fundamental functionalities are required for different mapping, too. Therefore replace ALL text:big filed as BLOB in core, or only map text:big as BLOB in case of Oracle within schema, will not help this case: we need to change the mapping case by case within different field definition, but not in schema absolutely.

E.g. We can map text:big (or every text type) as VARCHAR2(4000) in Oracle/DB2/MSSQL, so we can provide GROUP BY or LIKE functionality for all DB backend. Therefore we can utilize text:big for all description field, e.g. {boxex}.info and {term_data}.description.

On the other hand, we need to map some storage only fields as BLOB, e.g. {boxes}.body and {node_revisions}.body, so we can provide enough storage space without SQL functionality. This can also prevent crazily GROUP BY or LIKE SQL against hung storage which result as performance degrade.

My conclusion is: if we really need storage space rather than SQL functionality, remap field as BLOB; else just left it as text:big and let the 4000 characters limitation belongs to Oracle/DB2/MSSQL only. This should be the most fair solution with maximum cross database compatibility.

Comment #67

Crell commented 24 August 2008 at 07:05

I'm still not sure I follow you. It sounds like you're saying we need a "text big, as parsable text" and a "text as opaque value" as separate field types that on MySQL/PostgreSQL/SQLite would both map to TEXT while on Oracle they'd map to varchar and BLOB, respectively? (Why BLOB and not CLOB I still don't understand, but then I don't understand anything about Oracle's design.)

Comment #68

hswong3i commented 24 August 2008 at 11:41

This oracle schema implementation is my latest research progress before our next generation of database abstraction layer commit, which function with both installation and most core operation:

    'text:tiny'       => 'VARCHAR2(4000)',
    'text:small'      => 'VARCHAR2(4000)',
    'text:medium'     => 'VARCHAR2(4000)',
    'text:big'        => 'VARCHAR2(4000)',
    'text:normal'     => 'VARCHAR2(4000)',

    'blob:big'        => 'BLOB',
    'blob:normal'     => 'BLOB',

CLOB is only available in Oracle but not DB2/MSSQL, PHP + PDO + Oracle CLOB come with a lot of bugs , and CLOB type is much different from MySQL/PostgreSQL TEXT type (P.S. DB2 also come with similar limitation):

You cannot specify LOB columns in the ORDER BY clause of a query, or in the GROUP BY clause of a query or in an aggregate function.

You cannot specify a LOB column in a SELECT... DISTINCT or SELECT... UNIQUE statement or in a join. However, you can specify a LOB attribute of an object type column in a SELECT... DISTINCT statement or in a query that uses the UNION or MINUS set operator if the column's object type has a MAP or ORDER function defined on it.

So my suggestion is: as Drupal text type always assume with ORDER BY/GROUP BY/SELECT ... DISTINCT/SELECT... UNIQUE functionality, just let Oracle/DB2/MSSQL map it with VARCHAR(4000) with size limitation; on the other hand, replace some necessary fields as BLOB, trade its (not required) SQL functionality with nearly unlimited storage size.

Comment #69

hass commented 24 August 2008 at 12:05

As I know - MsSQL 2005/2008 can use "varchar(max)" that works like MySQL "LONGTEXT"

Comment #70

hswong3i commented 24 August 2008 at 17:51

@hass: But the main point is legacy PHP MSSQL driver (FreeTDS?) can't handle this correctly (only support up to VARCHAR 255; CLOB INSERT/UPDATE seems buggy). I still not yet have time to test with Microsoft's new PHP driver... Hopefully it should really work fine...

So at this moment, I will still suggest mapping text as VARCHAR(4000) in case of MSSQL for better compatibility :)

UPDATE: Sorry for wrong information: FreeTDS would be crazy with CLOB INSERT/UPDATE and only handle the first 255 characters; both VARCHAR and BLOB are safe. So mapping Drupal text type as VARCHAR(4000) should also be safe.

Comment #71

hass commented 24 August 2008 at 15:48

Sounds like the driver is MsSQL 2000 based... maybe only not yet updated to the latest features... :-( as PDO MsSQL is experimental I expect not everything is already implemented :-). I wouldn't waste my time with SQL2K any more.

Comment #72

hswong3i commented 24 August 2008 at 17:53

A split version for feedback is available: Remap field as BLOB: {aggregator_item}.description.

Comment #73

Crell commented 24 August 2008 at 19:14

OK, I still don't see why we need to change anything on MySQL. Map text:big differently on Oracle and be done with it. No muss, no fuss. The only question is whether or not there are any text:big fields currently that we do need to parse in the database as text, not as an opaque blob. Before we make any schema changes, can you check through core and see if there are any? That will affect what if anything we need to do.

Comment #74

hswong3i commented 25 August 2008 at 08:52

@Crell: Yes, I think we don't need any code base change for MySQL/PostgreSQL schema/abstraction implementation, but may need some footnote for general developers about how the behavior of text:big and blob should be, e.g. footnote for text:

Based on maximum cross database compatibility concern, some rules for using Drupal text type:
- Always assume text:big storage capacity as maximum 4000 characters only.
- Text type should function with the following SQL command: "ORDER BY", "GROUP BY", "SELECT ... DISTINCT" and "SELECT... UNIQUE".
- For storage capacity greater than 4000, use blob instead.

In case of blob:

Based on maximum cross database compatibility concern, some rules for using Drupal blob type:
- Usually support GB-scale storage capacity.
- Blob type should NOT function with the following SQL command: "ORDER BY", "GROUP BY", "SELECT ... DISTINCT" and "SELECT... UNIQUE".
- Always utilize drupal_write_record(), db_insert() and db_update() for blob operation.

Adding above information into DB driver schema definition should give a great hand to general developers, e.g. within MySQL/PostgreSQL's getFieldTypeMap(). Any idea?

Comment #75

hass commented 25 August 2008 at 10:41

I don't like your "recommendation" very much. It is a mayor improvement in MsSQL 2005+ to have varchar(max). I'm really happy that we have this now as it adds compatibility to MySQL and PostgreSQL. Oracle and DB2 don't have it... well they need to fix their software :-))). In MsSQL you cannot compare TEXT fields with ... WHERE column1 = 'Your very long text that is longer than 8000 chars...'. Additional MS have dropped TEXT field support in MsSQL2008. I heard "rumours" from my colleagues that we must change all our TEXT fields before upgrading to 2008 (unverified myself).

Comment #76

hswong3i commented 25 August 2008 at 16:44

Status:

Needs work

» Needs review

Status	File	Size
new	getFieldTypeMap.patch	5.74 KB

@hass: I don't like this "recommendation" too, but it do reflect to the real case: Oracle only support maximum 4000, DB2 only function as expected if smaller than 4000, where MSSQL not support larger than 8000 friendly; for a GCD, document Drupal's text support as maximum 4000 should be a reasonable choice (P.S. Sorry that this issue is not target for MSSQL only, but try to figure out a most fair handling among different database engine, together with maximum functionality).

Patch via CVS HEAD, documentation plus some code sync when compare MySQL and PostgreSQL implementation. May someone give a hand for my poor English?

Comment #77

hswong3i commented 28 August 2008 at 05:59

May someone give a hand for these split patches review?
Remap field as BLOB: {aggregator_item}.description
Remap field as BLOB: {boxes}.body
Remap field as BLOB: {comments}.comment
Remap field as BLOB: {node_revisions}.body and {node_revisions}.teaser
Remap field as BLOB: {watchdog}.variables

Comment #78

hswong3i commented 31 August 2008 at 09:58

Status	File	Size
new	getFieldTypeMap.patch	5.79 KB

Patch update with better documentation.

Comment #79

hswong3i commented 8 October 2008 at 18:22

Updated tasklist, with successful simpletest in both MySQL and PostgreSQL:

A bit buggy with PostgreSQL based on http://drupal.org/node/316095:

Remap field as BLOB: {node_revisions}.body and {node_revisions}.teaser

Comment #80

hass commented 8 October 2008 at 18:46

Shouldn't TEXT not become a CLOB!? BLOB is binary large object... CLOB is character large object

Comment #81

hswong3i commented 8 October 2008 at 19:15

@hass: I have comment for the reason in http://drupal.org/node/147947#comment-720555 :-)

Long story short: BLOB is stable, large in storage size, with less limitation among different database engines, when compare with that of CLOB.

Comment #82

Anonymous (not verified) commented 11 November 2008 at 18:20

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Comment #83

hswong3i commented 12 November 2008 at 14:36

Status:

Needs work

» Needs review

Status	File	Size
new	dbtng-getFieldTypeMap-1226499767.patch	5.77 KB

Patch reroll via CVS HEAD for documentation only.

We don't need to change MySQL implementation, this patch just give developers more idea about limitation between different database engine, and so choose suitable data type based on cross database concern.

Comment #84

hswong3i commented 13 November 2008 at 05:07

Title:

Replace some TEXT:BIG with BLOB, based on cross database compatibility concern

» [DBTNG + XDB] Replace some TEXT:BIG with BLOB

Better title.

Comment #85

16 November 2008 at 22:00

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Comment #86

lilou commented 17 November 2008 at 14:28

Status:

Needs work

» Needs review

See: #335122: Test clean HEAD after every commit and http://pastebin.ca/1258476

Comment #87

hswong3i commented 24 November 2008 at 18:58

Status	File	Size
new	dbtng-text2blob-1227552445.patch	13.44 KB

According to IRC review with chx, this issue is now integrate with following issues:

The following issues are still keep open because of other dependence:

#300203: Remap field as BLOB: {boxes}.body: depend on DBTNG syntax revamp #337212: DBTNG: Block module
#318650: Remap field as BLOB: {menu_router}.access_arguments and {menu_router}.page_arguments: depend on DBTNG syntax revamp #320510: [DBTNG] revamp menu.inc with DBTNG syntax
#313999: [DBTNG + BLOB + hash]: Revamp locales_source.source with BLOB and MD5 hashing: coming with DBTNG syntax revamp + hashing

Comment #88

25 November 2008 at 19:45

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Comment #89

hswong3i commented 27 November 2008 at 05:31

Status:

Needs work

» Needs review

Status	File	Size
new	blob-locales_source_hash-1227760668.patch	10.54 KB
new	blob-menu_router-arguments-1227761227.patch	35.6 KB
new	blob-boxes-body-1227763338.patch	15.59 KB
new	blob-text2blob-1227760222.patch	13.46 KB

All BLOB related patches. Reroll via CVS HEAD.

Comment #90

4 December 2008 at 11:30

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Comment #91

Crell commented 4 January 2010 at 06:46

Version:

7.x-dev

» 8.x-dev

I am not convinced about moving so many fields to BLOB, but since so much has changed in core since the last post I don't think much of this is even relevant anymore. It's also not changing in Drupal 7.

Comment #92

jhedstrom

English

Portland, OR

commented 14 April 2015 at 20:25

Version:	8.0.x-dev	» 8.1.x-dev
Issue tags:		+Needs issue summary update

Probably too late for 8.0.x now too.

Comment #93

14 April 2015 at 20:25

Version:

8.1.x-dev

» 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #94

mgifford

he/him

English

commented 24 March 2016 at 04:08

Assigned:

hswong3i

» Unassigned

Unassigning stale issue. Hopefully someone else will pursue this.

Comment #95

3 August 2016 at 00:04

Dries committed ae0392b on 8.3.x

- Patch #147947 by hswong3i: some files shouldn't defined as text:big...

Dries committed 08c059e on 8.3.x

- Rollback of patch #147947: lacks proper upgrade path.

Comment #96

3 August 2016 at 00:05

Dries committed ae0392b on 8.3.x

- Patch #147947 by hswong3i: some files shouldn't defined as text:big...

Dries committed 08c059e on 8.3.x

- Rollback of patch #147947: lacks proper upgrade path.

Comment #97

3 August 2016 at 00:05

Version:

8.2.x-dev

» 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #98

27 January 2017 at 17:53

Dries committed ae0392b on 8.4.x

- Patch #147947 by hswong3i: some files shouldn't defined as text:big...

Dries committed 08c059e on 8.4.x

- Rollback of patch #147947: lacks proper upgrade path.

Comment #99

27 January 2017 at 17:52

Dries committed ae0392b on 8.4.x

- Patch #147947 by hswong3i: some files shouldn't defined as text:big...

Dries committed 08c059e on 8.4.x

- Rollback of patch #147947: lacks proper upgrade path.

Comment #100

27 January 2017 at 17:52

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #101

27 January 2017 at 17:52

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #102

27 January 2017 at 17:52

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #103

27 January 2017 at 17:52

Version:

8.6.x-dev

» 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #104

27 January 2017 at 17:52

Version:

8.7.x-dev

» 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #105

27 January 2017 at 17:52

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #106

20 March 2020 at 16:52

Dries committed ae0392b on 9.1.x

- Patch #147947 by hswong3i: some files shouldn't defined as text:big...

Dries committed 08c059e on 9.1.x

- Rollback of patch #147947: lacks proper upgrade path.

Comment #107

daffie commented 4 April 2020 at 08:35

Status:

Needs work

» Fixed

Comment #108

daffie commented 4 April 2020 at 08:43

Status:

Fixed

» Closed (fixed)