Replace some TEXT:BIG with BLOB, based on cross database compatibility concern
hswong3i - May 30, 2007 - 10:04
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | database system |
| Category: | task |
| Priority: | critical |
| Assigned: | hswong3i |
| Status: | patch (code needs review) |
Description
currently (from nightly development snapshot from HEAD on 2007-05-30), the following fields are defined as text:big:
dc:/tmp/drupal-6.x-dev# find . -type f -name '*.schema' | xargs fgrep -nH 'text' | grep 'big'
./modules/search/search.schema:9: 'data' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big')
./modules/user/user.schema:30: 'perm' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/user/user.schema:65: 'data' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/node/node.schema:72: 'body' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:73: 'teaser' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74: 'log' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10: 'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77: 'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/drupal/drupal.schema:11: 'slogan' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/drupal/drupal.schema:12: 'mission' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/system/system.schema:10: 'batch' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/system/system.schema:153: 'session' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/system/system.schema:194: 'value' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/dblog/dblog.schema:10: 'message' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/dblog/dblog.schema:11: 'variables' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/block/block.schema:39: 'body' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/comment/comment.schema:12: 'comment' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/comment/comment.schema:19: 'users' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:41: 'image' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:9: 'recipients' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:10: 'reply' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),as mysql driver map text:big as longtext, which support for maximum 4GB size of data, it seems not reasonable to preform any comparison with such fields.
on the other hand, since some fields are really required to preform some testing within SQL directly, i will suggest change them into text:medium, or even smaller. so developer can simply guess that text:big should ONLY use as storage, but not for any SQL comparison.
finally, as oracle only provide VARCHAR2: maximum 4000 characters, can compare, and CLOB: maximum 4GB, can't compare, oracle driver will only map text:big as CLOB for maximum flexibly of sizing, and keep others as VARCHAR2(4000) for maximum compatibility. only keep required fields as text:big will greatly increase cross DB compatibility.

#1
from my point of view, at least the following fields should keep as text:big, as we are not expected to preform any direct action to it, in whatever SQL or PHP level:
./modules/node/node.schema:72: 'body' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),./modules/node/node.schema:73: 'teaser' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74: 'log' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10: 'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77: 'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/block/block.schema:39: 'body' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
#2
Havng big is utterly pointless, medium is already 16M, can you imagine what would happen with poor PHP when you load a node which has a body with more (lot more) than 16M byte?
#3
16M or not it not the only consideration: i am trying to mark
text:bigas a "storage only" type, so whenever developer trying to use it, keep this in mind: don't do comparison to it.this clear instruction can increase the usability of code, and also decrease the difficulty of debug. as mentioned in #2,
text:mediumis totally enough for daily usage.just feel free to let text:big act as symbolic label, set up a suitable "rule of game", and make use of it ;)
#4
this patch change some fields from
text:bigintotext:mediumin a simple reason: they are not target as "storage only". after apply this patch, will only left the following fields astext:big, which also means not expected to take any comparison toward it:dc:/tmp/drupal-6.x-dev# find . -type f -name '*.schema' | xargs fgrep -nH 'text' | grep 'big'./modules/node/node.schema:72: 'body' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:73: 'teaser' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74: 'log' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10: 'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77: 'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/drupal/drupal.schema:11: 'slogan' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/drupal/drupal.schema:12: 'mission' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/block/block.schema:39: 'body' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/comment/comment.schema:12: 'comment' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:10: 'reply' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
#5
What is the problem with 'big' and 'Oracle'? I'd suspect they go hand in hand. :)
Care to shed some light on why this is a problem? I'd like to learn a bit more about this/Oracle.
#6
on the other hand, a lot of additional handling are required for CLOB (http://php.net/oci8). you can seems oracle CLOB as a completely different data type from daily used text type under mysql: mysql seems longtext and text as the same, and so you will even able to run GROUP BY and ORDER BY with it, even it is in 4GB size!
BTW it is also seems reasonable: if you are going to do some SQL comparison, fine, use VARCHAR2 and take action within reasonable size for performance consideration; if you need space, fine, i give CLOB to you, but don't ask for any others ;(
and the truth is: i hate oracle's string handling!
#7
seems reasonable to me
#8
Approved by Moshe, committed by Dries. :)
#9
#10
This patch changes the schema without updating existing databases to match the changes. When upgrading from Drupal 5 to 6, it appears that 13 tables end up not matching the schema. So, either this patch needs to be reverted or existing tables need to be updated.
I did not know about this patch until today. I haven't yet thought about the basic point of this patch thoroughly. Offhand, I don't like the idea that text:big is semantically different than text:medium on some database engines. The type 'text' should be semantically the same (i.e. comparable or not comparable) on all platforms regardless of the size which, as per the docs, is only a hint anyway. So if Oracle can only handle 4000 bytes for a comparable text:big, then that's all text:big can be on Oracle. If there is a database column that can't accept being limited to 4000 bytes on Oracle, it should use type blob or some other type that we define as "big enough but not comparable." If there is a database column that must be longer than 4000 bytes *and* must be comparable, then perhaps that functionality is just not implementable with Oracle (which I find hard to believe). However, as I said, I have not thought this through thoroughly yet.
I would also like to propose two things:
1. All patches to .install or .schema files must be tested with schema.module and produce a zero-mismatch comparison report before they can be committed.
2. All bugs which result in a schema comparison mismatch should be considered critical. We should not accept a bug which results in a database not matching the results of hook_schema. To this end, I have marked this issue as critical. If core committers disagree, they can downgrade it.
#11
i just find some useful information, and hope to share according to this topic (from http://phplens.com/lens/adodb/docs-oracle.htm):
i also found that such special handling of LOBs is not only exists within oracle driver of ADOdb, but also for that of postgres, odbtp, sqlanywhere, informix72, ibase, odbc, mssql and db2:
dc:/home/hswong3i/project/adodb# find drivers/ -type f | xargs fgrep -nH 'function UpdateBlob'drivers/adodb-postgres64.inc.php:347: function UpdateBlobFile($table,$column,$path,$where,$blobtype='BLOB')
drivers/adodb-postgres64.inc.php:442: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-odbtp.inc.php:502: function UpdateBlob($table,$column,$val,$where,$blobtype='image')
drivers/adodb-sqlanywhere.inc.php:144: function UpdateBlob($table,$column,&$val,$where,$blobtype='BLOB')
drivers/adodb-informix72.inc.php:239: function UpdateBlob($table, $column, $val, $where, $blobtype = 'BLOB')
drivers/adodb-ibase.inc.php:597: function UpdateBlobFile($table,$column,$path,$where,$blobtype='BLOB')
drivers/adodb-ibase.inc.php:625: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-odbc.inc.php:567: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-oci8.inc.php:675: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-oci8.inc.php:709: function UpdateBlobFile($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-mssql.inc.php:645: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
drivers/adodb-db2.inc.php:678: function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
ok, let's back to the topic:
this is not the consideration of oracle driver implementation only, but also a cross DB compatibility concern. ADOdb try it, and come out with current solution. i don't really think we can do much better than that of ADOdb for Drupal: ADOdb is a project that starting from Sept 9 2000. maybe we should reuse their progress, and skip something that we are not really needed for, according to performance consideration.
#12
this patch add a new data type "CLOB" into drupal-6.x-dev schema API, and 2 more APIs: db_update_blob() and db_upadte_clob(). according to this change, some core module update its schema from "text:big" to "CLOB" type, with using new APIs for updating its values, e.g. node->body. NULL should input as placeholder for %b and %c during traditional db_query() insert/update query. On the other hand, this patch also rollback some field from "text:medium" into "text:big" for backward compatibility.
as bjaspan mention, all "text" types should remain same behavior among different DB backend: it should be able for compare. "text:big" should not be the special among the others, within oracle driver handling. the new data type "CLOB" can clearly explain its property as: character storage, large enough, but never directly using it for SQL-level compare. therefore developer can have a more strict guild line for their core/contribute development.
BTW, after studying about ADOdb cross DB handling, i found that the handling problem of character large object (CLOB) is not only appear within Oracle, but also among some other DBs. ADOdb propose another method, rather than insert/update CLOB directly: it provide additional API call UpdateBlob() for such LOBs update, after normal insert/update query. it seems to be a good idea, as we don't need to do additional query rewrite (which also means a complete SQL parser and rebuilder, which seems to be mission impossible) for CLOB handling.
additional benefit: as mysql version map "clob" into "longtext" and "text", where pgsql map this to "text", the add of additional API will not affect contribute module development, unless they change their module's schema, and choose to use CLOB for data storage.
P.S.: little bit change about %b handling: will insert/update NULL as placeholder within db_query(), for further more value update by using db_update_blob() and db_update_clob() manually. therefore cache.module require a small change about current blob handling. it may also affect other contribute modules which are currently using BLOB. BTW, as BLOB is not always use, this may not be a great problem.
#13
#14
this patch standardize drupal BLOB/CLOB handling for cross DB compatibility, e.g. db_update_* for LOBs value update, db_encode_*/db_decode_* for LOBs value encoding/decoding. change incude:
for current stage, it is only a dummy mapping for "clob" type among mysql/pgsql, where mapping it as like as that of "text" type. BTW, it will greatly decrease the difficulty for other DB driver implementation, e.g. oralce, mssql and db2.
this patch only change the handling for LOBs type. therefore, unless developer choose to use BLOB/CLOB type, nothing will affect them. on the other hand, if developer choose to use LOBs type, which means that they will have (nearly) unlimited storage size, BTW trade off some query level features, e.g. using GROUP BY, ORDER BY, WHERE, etc.
#15
minor update, patch with latest CVS, fix some pgsql bugs.
complete tested with mysql + mysqli, pass though all core + optional modules installation. patched modules are all functioning :)
#16
@Dries: I marked this RTBC because the patch from comment #4 which was previously committed needs to be rolled back (the author agrees with this as stated in #11). As of today, it still reverse-applies to HEAD. Once the patch is rolled back, the issue should be set to CNW so further development can occur.
@hswong3i: I briefly reviewed (but did not test) your patch and in general it looks good. However, this is clearly an API-changing feature enhancement and, as such, cannot be committed now that D6 has reached code-freeze. We should continue working on this for D7. I realize this means that Oracle support may be (once again) limited for D6 without core patches, but in my opinion we cannot risk slipping in a change like this at this point in the cycle. If Dries/others disagrees, I'm certainly open to reviewing the patch more carefully.
I noticed one problem: db_update_clob/blob() is written to expect '{tablename}' instead of 'tablename' like the other schema api functions. The {tablename} notation is only for when table names are embedded in SQL. Otherwise, confusion will result.
#17
MsSQL introduced in Ms SQL 2005 a varchar(max) data type to solve this CLOB comparison problem... so only older MsSQL Versions have this problem... maybe Oracle works on a new datatype, too?
#18
@bjaspan: i agree that it should be a bit difficult to commit this patch together with drupal-6.x-dev after code freeze, even though it is mainly a API enhancement. BTW, as oracle driver is already closed to drupal-6.x-dev, i will suggest to review the chance of accepting this enhancement. on the other hand, unless developer choose to use the new LOBs handling, none of their code need to be review.
the API of db_update_clob/blob() is still a prototype, we still can change it :)
@hass: hopelessly, oracle don't have such new data type, please refer to http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_eleme... for more information :)
#19
I've rolled back the old patch so we can work more on the new patch.
#20
Aside if we get this solved for Oracle, we will not have any troubles with MsSQL 2000, too :-).
#21
patch for latest CVS, update db_update_clob/blob(), update some document.
@bjaspan: now db_update_clob/blob() will accept real table name directly.
@hass: yes, as i know that is the problem for MSSQL 2000, if oracle driver solve it, MSSQL (and some other else) will also benefit from it :)
#22
patch according latest CVS, update handling for block.module and contact.module, fix minor bug for comment.module.
completely tested modules: cache, aggregator, block, comment, contact and node (all patched modules)
#23
minor update according to v0.7, update code which more make senses with argument input.
#24
updated patch for latest CVS.
#25
version 1.1 update according to latest CVS, with following enhancements:
since this patch is required by latest oracle driver implementation (http://drupal.org/node/39260), please kindly review the chance of accepting this patch after drupal-6.x-dev code freeze, many thanks :)
#26
I had a look at the patch and here are some comments:
1. In general, I think it makes sense to differentiate between 'binary data' and 'character data'.
2.
<?php+ * NOTE: %b and %c should input NULL as placeholder, and further more
+ * update its value manually by using db_update_blob() or db_update_clob().
?>
3. I don't understand why we need db_update_clob() and db_update_blob() functions? If there is no critical need for these functions, I'd prefer to remove them, and to let people write regular db_query("UPDATE ...") statements. If these functions are critical (rather than just handy), document in the code why they are critical.
4.
<?php- return "'". mysql_real_escape_string($data, $active_db) ."'";
+ return !is_null($data) ? "'". mysql_real_escape_string($data, $active_db) ."'" : "''";
?>
That's all for now. Looks promising but needs some more work around the edges. Let's give this patch some prime time so we might be able to include it in D6.
#27
most of the change are function hook for oracle driver, or other driver backend that require special handling to LOBs data.
2 & 3 & 4. for the case of oracle, we can't simply insert a "NULL" value for INSERT as placeholder, but EMPTY_BLOB() or EMPTY_CLOB(). this is because we can't do any update to a LOBs field with NULL value, but EMPTY_BLOB()/EMPTY_CLOB().
on the other hand, we also don't need to do any input value escape to input LOBs value as it will be handled by LOBs->save().
for example, in oracle driver:
<?phpfunction db_encode_blob($data) {
return !is_null($data) ? $data : "EMPTY_BLOB()";
}
function db_encode_clob($data) {
return !is_null($data) ? $data : "EMPTY_CLOB()";
}
?>
therefore, take block.module as example, the insert query will become:
INSERT INTO {boxes} (body, info, format) VALUES (EMPTY_CLOB(), '%s', %d)the update query will become:
SELECT body FROM boxes WHERE bid = %d FOR UPDATEafter we fetch the pointer for "body", we can then run LOBs->save() and update its value:
<?php
// Fetch the SELECTed rows
$rows = array();
while ($row = oci_fetch_assoc($last_result)) {
// Note: Oracle returns all field names in uppercase and associative
// indices in the result array will be uppercased too. We reset it back
// to lower case in here.
$row = array_change_key_case($row, CASE_LOWER);
// Check if LOB object, discard the existing contents, and save a value to it.
if (!(is_object($row[$column]) && $row[$column]->truncate() && $row[$column]->save($value))) {
oci_rollback($active_db);
return FALSE;
}
$rows[] = $row;
}
?>
for more detail reason of this API changes, please compare the differences between mysql/pgsql handling with that of oracle. i feel so sorry to insert some dummy handling to mysql/pgsql version, but this give a great chance for other DB driver implementation :)
#28
Don't forget to tell why we need db_update_clob() and db_update_blob() functions. There are differences in the update command for different DB's. In MsSQL there is a special UPDATETEXT, so we are better to know what we are updating. Additional you sometimes need to use pointers syntax for updating TEXT. Maybe other DBs have different update functions for updated text and binary data. So it's better to split, isn't it?
#29
@hass: great! you give us the answer that we are seeding for: different DB have different LOBs update method, and so we better split it out, and handle it with independent function API. seems db_update_*() can really help in MsSQL too :)
#30
patch according to latest CVS, update some document.
@Dries: i detail the document according to your questions:
2. since oracle will save pointer of LObs into DB table, and further more fetch value by calling LOB->save/load(), NULL (none of pointer) != EMPTY_CLOB() (pointer with empty value, able to use LOB->save/load()). the document update as:
<?php+ * NOTE: %b and %c should input NULL as placeholder, and further more
+ * update its value manually by using db_update_blob() or db_update_clob().
+ *
+ * This is because different database have different implementation about
+ * empty LOBs field. e.g. as MySQL/PgSQL map CLOB to general text type, NULL
+ * input will simply replace as '' (empty string). Update action will also
+ * be a simple UPDATE query.
+ *
+ * On the other hand, as Oracle will save a pointer of CLOB into database,
+ * and update its value by LOB->save/load(), a NULL value means none of pointer
+ * (which not able to update its value). Therefore NULL input will replace as
+ * 'EMPTY_CLOB()' (a placeholder for "CLOB pointer with empty value") for
+ * further more update.
?>
3. similar reason as 2.: since oracle save a point for LOBs value into table, update of its value require additional handling, rather than a simple UPDATE query, an isolated APIs for LOBs value update are requred.
e.g. case of oracle driver: first of all, we need to fetch out LOB's pointer, by SELECT ... FOR UPDATE (current oracle implementation); secondly, start the transaction by oci_execute($last_result, OCI_DEFAULT); try to truncate old value, and then save its new value; rollback whenever face error, or commit transaction if success. it is very complicated, and should be handle individually :)
document of db_update_clob(), for mysql/mysqli, update as:
<?php* NOTE: Since MySQL support direct read/write to LOBs value by INSERT
* or UPDATE query, this function will become a simple wrapper of UPDATE
* query.
?>
4. the different will be shown by comparing MySQL handling with Oracle's. for MySQL, CLOB is a dummy mapping to longtext, so handle as:
<?phpfunction db_encode_clob($data) {
global $active_db;
return !is_null($data) ? "'". mysql_real_escape_string($data, $active_db) ."'" : "''";
}
?>
BTW for oracle, no escape is required for input value (will handle by LOB->save()), but require EMPTY_CLOB() as placeholder during INSERT/UPDATE query (as stated in 2.), code will become:
<?phpfunction db_encode_clob($data) {
return !is_null($data) ? $data : "EMPTY_CLOB()";
}
?>
#31
#32
I'll try to investigate this issue some more as soon time permits. In the mean time, it would be helpful if someone could help you massage the documentation in proper English. It's a little rough around the edges, but nothing we shouldn't be able to help you with. Thanks for the hard work, I'll get to it soon..
#33
@Dries: document updated. detail of changes: http://rafb.net/p/WGZMcE39.html
#34
according to my previous benchmarking result, there is no performance different when comparing LOBs patched with CVS HEAD (at least in case of MySQL). impact for API update should be the only concern of this patch.
on the other hand, with LOBs patch helps, oracle driver (and some other databases with similar physical limitation in VARCHAR storage sizing) will able to implement in a much flexible style, e.g. we are now able to store node body with more than 4000 characters :)
i have no further more suggestion or idea about this patch. LOBs patch is the major dependence of oracle driver for drupal-6.x-dev, based on functionality concern. hope someone may help about its coding or documents improvement :)
#35
patch based on latest CVS HEAD, update comment.module handling.
#36
when studying DB2, IBM's DB2 + Drupal document, and reference to ADOdb for existing implement, i found the following message about BLOB handling from ADOdb:
<?php
/*
Insert a null into the blob field of the table first.
Then use UpdateBlob to store the blob.
Usage:
$conn->Execute('INSERT INTO blobtable (id, blobcol) VALUES (1, null)');
$conn->UpdateBlob('blobtable','blobcol',$blob,'id=1');
*/
function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
{
return $this->Execute("UPDATE $table SET $column=? WHERE $where",array($val)) != false;
}
?>
i think this prove the necessary of LOBs patch: different database comes with different LOBs handling, sometime we need to insert special placeholder during INSERT/UPDATE, and further more update its value by special API :)
#37
and here is the case of MSSQL, within ADOdb implementation:
<?php
/*
Unfortunately, it appears that mssql cannot handle varbinary > 255 chars
So all your blobs must be of type "image".
Remember to set in php.ini the following...
; Valid range 0 - 2147483647. Default = 4096.
mssql.textlimit = 0 ; zero to pass through
; Valid range 0 - 2147483647. Default = 4096.
mssql.textsize = 0 ; zero to pass through
*/
function UpdateBlob($table,$column,$val,$where,$blobtype='BLOB')
{
if (strtoupper($blobtype) == 'CLOB') {
$sql = "UPDATE $table SET $column='" . $val . "' WHERE $where";
return $this->Execute($sql) != false;
}
$sql = "UPDATE $table SET $column=0x".bin2hex($val)." WHERE $where";
return $this->Execute($sql) != false;
}
?>
#38
minor update: patch for latest CVS HEAD, a bit document update, and update node_save() handling based on latest changes :)
P.S. i found this is very helpful when implementing both oracle and DB2 driver. as ADOdb and PDO both come with similar handling, and this patch will change programming logic only if developer choose to use CLOB for (almost) unlimited storage size, i guess it should ready to commit with no question. hope someone may help about its review :)
P.P.S. world is always not such perfect: oracle comes with VARCHAR2 (max. 4000 characters) and CLOB (max. 128TB), where DB comes with VARCHAR (max. 32672 characters) and BLOB/CLOB (max. 2GB). it is not easy to implement MySQL's text:big => longtext mapping for similar functionality: able to use for compare (VARCHAR2/VARCHAR), and also comes with unlimited size (CLOB). we should split size concern (clob) from functionality concern (text), and preform isolated handling :)
#39
2GB or more shouldn't be an issue... i think - no - site will have inline images or website pages of such a size. And people saving images in blobs should be beaten... :-)
Will this patch land for D6? I don't like to spend time on a "database.mssql.inc" if this patch never goes in...
#40
@hass: i point out DB2's CLOB size just because i hope to compare it with MySQL. in case of DB2, we need to choose VARCHAR for functionality, or CLOB for size. neither one can clone MySQL's text:big => longtext handling, we need some trade off :(
P.S. MSSQL is also within my schedule, after Oracle and DB2. this is because we have ZendCore for both Oracle and DB2, so i don't need to spend too much time for system setup :)
#41
patch update based on latest CVS HEAD, with minor code change in node_save(). completely tested with MySQL.
#42
patch for latest CVS
#43
after an indeed studying about ADOdb and PDO database abstract layer, i found that a special LOBs value handling API is almost a must, based on cross database compatibility concern. here is a simple definition of LOBs value, quote from PDO (http://hk.php.net/pdo):
in case of ADOdb, LOBs value will handle by: 1. normal INSERT query, with LOBs value placeholder (NULL for most database, EMPTY_BLOB/EMPTY_CLOB for Oracle); 2. manually update LOBs value by calling
UpdateBlob. here is a simple code snippet for CLOB handling (reference to http://phplens.com/lens/adodb/docs-oracle.htm):<?php$ok = $db->Execute("insert into aTable (id, name, aclob)
values (aSequence.nextVal, 'Name', null)");
if (!$ok) return LogError($db->ErrorMsg());
$db->UpdateClob('aTable', 'aclob', $clobValue, 'id=aSequence.currVal');
?>
in case of PDO, LOBs value will handle by: 1. normal INSERT query, with placeholder as "?"; 2. prepare and bind each value with placeholder, by calling
bindParam()(NOTE: you will need to define data_type as PARAM_LOB explicitly for LOBs value binding); 3. execute transaction. here is a simple code snippet for inserting an image into a database by using PDO (reference to http://hk2.php.net/pdo):<?php
$db = new PDO('odbc:SAMPLE', 'db2inst1', 'ibmdb2');
$stmt = $db->prepare("insert into images (id, contenttype, imagedata) values (?, ?, ?)");
$id = get_new_id(); // some function to allocate a new ID
// assume that we are running as part of a file upload form
// You can find more information in the PHP documentation
$fp = fopen($_FILES['file']['tmp_name'], 'rb');
$stmt->bindParam(1, $id);
$stmt->bindParam(2, $_FILES['file']['type']);
$stmt->bindParam(3, $fp, PDO::PARAM_LOB);
$stmt->beginTransaction();
$stmt->execute();
$stmt->commit();
?>
so what's the case of this patch? case is just similar as both ADOdb and PDO: 1. normal INSERT/UPDATE query, with NULL as LOBs value placeholder (it is because db_encode_blob() and db_encode_clob() will translate this NULL as suitable under layer placeholder); 2. manually update LOBs value by calling
db_update_blob()ordb_update_clob(). here is a simple code snippet for block content insertion:<?phpdb_query("UPDATE {boxes} SET body = %c, info = '%s', format = %d WHERE bid = %d", NULL, $edit['info'], $edit['format'], $delta);
db_update_clob('bid = %d', $delta, db_prefix_tables('{boxes}'), 'body', $edit['body']);
?>
my research result is very simple: a special API is the best, common, and strict forward solution for LOBs value handling. even we may working with PDO in D7 or later (with array-style variable binding), it is also a must, for a hook which telling under layer what type of data we are now inserting, or else we can NEVER handle LOBs value insertion correctly.
moreover, that is no point to revamp our database API, even we are now changing our backend as PDO: our database API is powerful and suitable for most cases. PDO should only act as another under layer database driver supporting (maybe call as database.pdo.inc, or some other else). PDO SHOULDN'T be a replacement of existing implementation.
anyway, if we are just focusing on how PDO can benefit on performance when using with MySQL/PgSQL (which own more than 98% of our clients), but simply forget its powerful cross database ability, this will seems a bit silly... as we take a lot of effort on schema API, database abstract layer, etc, based on cross database concern, this idea is just a step backward ;p
#44
Since I suspect the latest comment is aimed at me based on a conversation in IRC earlier today, I'll jump in here. :-)
First off, no, the PHP 4/MySQL 3-era mysql driver is not at all robust or secure. That we layer a regex on top of it that is as slow as the much more robust and complex code in PDO is ample evidence of that. As long as our lowest common denominator is the mysql extension, we can't do anything interesting in the database system. mysqli is also not part of the standard install of PHP, and many shared hosts don't offer it. (mysql isn't either as of PHP 5.0, but most hosts install it for legacy support.) We can't rely on mysqli for MySQL support, which is what the vast majority of Drupal sites run on. That leaves PDO, which is reasonably cross-database (at the API level at least), reasonably fast (although not as fast as native APIs, it's still not a dog because it's all in C), and, most importantly, we can rely on it being present in any PHP 5 install as it's been part of the default configuration since PHP 5.1. That alone makes it a worthwhile move for D7, even if it weren't for easier Oracle/DB2/MS SQL support.
For data-modification queries, actually, in the D7 API as I'm currently building it you should never, ever use db_query() for an insert/update/delete statement. You should instead do
<?phpdb_insert('node', array('fieldname'=>$value));
?>
Aside from being easier to work with and more robust, it means you're passing not a serialized string to the database system but a data structure. Within the database driver, if necessary, you can do additional escaping, LOB handling, etc. without worrying about regexes and without putting extra work on the module developer.
Which brings up the next point. Right now, virtually all Drupal sites run MySQL or Postgres, with MySQL far and away the leader. Most Drupal developers know from MySQL primarily, I wager. The concept of LOBs and BLOBs is foreign to them, and many will likely see the necessity of it as a bug in those databases. (There are times when I do, too.) We should not build an API around a database that on a good day may be 2-3% of all Drupal installs for the foreseeable future. Most developers won't even have any way to test it to see if they're doing it "right". (How many people here have a copy of Oracle just lying around? :-) )
That sort of logic belongs inside the Oracle/DB2 driver, not exposed to every module developer to be able to forget or get wrong. If we can make that driver's job easier without making the API substantially more complex, we should do so. But not if it means adding a lot of needless, untestable, easy to break work for the vast majority of Drupal developers who are MySQL-based.
#45
Most databases are less then 2 or 4 GB... there are free versions of Oracle and MSSQL. Aside if i have the chance to get it running on our MSSQL2K5 active/active cluster i would move there... why should i spend my time on administering two different DBs if i can do it with one. Today we have simply no choice - eat or die with MySQL or PgSQL.
#46
@Crell: i would like to have more details about how can your abstract layer handle LOBs value correctly? i have no question about inserting data with your array-style input, but where are you obtaining the information of these data type? it will be a bit not acceptable if: "as most data type will bind automatically and correctly by using PDO (unless LOBs, which need for explicitly defined), so we will simply forget LOBs handling in D7"...
@hass: good point. e.g. my client require for Oracle, due to political concern. therefore drupal will NEVER become their choice, as we are not supporting Oracle natively... (even though, we are suitable based on functional concern...)
#47
@Crell: moreover, i am not opposing PDO supporting in D7/8/..., but try to figure out those common problems that we both need to due with. as we are both working for database abstract layer, we should facing with similar problem, no matter we are working with ADOdb or PDO ;)
e.g. db_num_rows() is a common problem among different database. this is not a problem of database abstract layer implementation, but dependent on database natively: some database won't pass the complete result set to client side after SELECT query; on the other hand, pass each row from result set when calling for fetch action (Oracle/DB2). since PDO working as a cross database abstract layer (as what i am now doing, but user level), which can't solve this problem natively, they try to REMOVE this support.
this is just similar as case of LOBs handling: since special API is a must, so we ADD this support. both ADOdb and PDO (moreover, this patch, too) propose for similar API, in order to due with similar difficulty.
LOBs handling is a common problem that we both need to solve; on the other hand, the successful of this patch should also equal to the successful of PDO, in coming future ;)
#48
After an indeed studying about differences between ADOdb, PDO and Drupal 6.x database abstract layer, it show that we are still missing a complete and standardized LOBs handling API, based on cross database compatibility concern (besides MySQL/PgSQL).
Drupal 6.x existing database abstract layer provide db_encode_blob() and db_encode_blob() for BLOB handling only, as PgSQL require special API for BLOB encode/decode (In case of MySQL, it is a dummy abstraction, as MySQL seems BLOB data as like as normal string). BTW, this is not the complete picture of LOBs handling and its difficulties: besides BLOB, it is also a critical concern about limitation on CHAR/VARCHAR/CLOB (Character Large OBject) among most database; moreover, method of LOBs INSERT/UPDATE is also database specific.
LOBs patch for Drupal 6.x (http://drupal.org/node/147947) will able to fill in this missing section, extent and complete our LOBs API functionality, with handy procedures. It should be the most common and strict forward solution when dueling with LOBs related difficulties. On the other hand, it shouldn't be a confusion for normal MySQL users, as it is very closed to the case of existing db_encode_blob() and db_decode_blob() implementation.
We shouldn't miss the chances, but our chances are waiting for this patch :)
(To view the thread, please navigate to http://edin.no-ip.com/html/?q=node/302)
#49
as i am now indeed studying about PDO + MySQL/Oracle/DB2, i find that this patch would also be a very important functional hook for our next generation database abstract layer development, based on cross database concern (even we may move to PDO, as this is a database-level but not database-abstraction-level problem). on the other hand, this patch is stable enough since July 24, 2007, and it functionality is already fully proved by the successful of Oracle/DB2 driver development. it should be RTBC.
#50
patch via latest CVS HEAD
#51
As it is now one day before D6 beta, it seems that not easy to commit this patch on time, together with Oracle and DM2 supporting. Even though it is a bit pity (as all are ready for production), I am now merging this patch into another technical preview of D7 database abstract layer, too (http://drupal.org/node/172541). I will keep on promoting this LOBs handling in our next generation of database abstraction layer.
BTW, if you are looking for unofficial Oracle (http://drupal.org/node/39260) and DB2 (http://drupal.org/node/165788) supporting within D6, please refer to their own issue. I will merge this patch (maybe some other additional patches) with those issues. On the other hand, I will provide patches based on latest CVS HEAD of D6, within D6 life cycle. So you will able to have those unofficial database supporting with a single patch apply :)
#52
That's unbelievable bad information... i hoped to see this committed :-(.
#53
@hass: yes, how pity it is... so let's postpone it into D7?
i will provide unofficial patch for both oracle and db2 within D6 life cycle, so people who are interested in this topic can get such supporting ON TIME, but not sit and waiting for D7 comes. i guess most of the projects needed for such unofficial patch are enterprise-related project, such little tailor-made handling may be acceptable for their professional and skillful developer :)
P.S. the main idea is: i will also benefit on this combined unofficial patch handling. i am now scheduled to move my personal Drupal site into Oracle with D6, and so i will able to keep on develop and test it daily. a single patch can save a lot of our time, during update ;p
#54
I'm not going to use unofficial patches. I have too many dependencies with modules and so on and i don't like to patch them all myself if they are buggy. This makes me going the mysql way another year or more, what suxxx much.
#55
@hass: let's take it easy :)
i will not force anyone to use this unofficial patch: as it is unofficial, no one can generate about it correctness, and it is always not targeting for our end users. e.g. you will never use Debian's unstable release for production purpose, but only stable, isn't it?
BTW, we will always need time to prove its correctness. we need helpers (or better says as HACKERS ;p) to test the implementation, and provide feedback, before it can finally goes into core. as proved by the fail of this patch within D6, it will for sure to be another pity issue, if we start the development TOGETHER with the open of D7: it will be too late once again :(
a simple and handy unofficial patch may speed up the development, on the other hand give a big hand to those require for such features ON TIME. so please just let it be :)
#56
#57
As Siren is not a project on Drupal.org but an unofficial fork, I am removing it from issue titles to avoid confusion.
#58
Even though DB2 don't explicitly indicate the limitation of VARCHAR as like as that for Oracle (Oracle will only allow for VARCHAR2 with max 4000 characters; above this limitation, please consider to use CLOB with TB-scale storage), it is still logically limited as 4000 characters.
A minor footnote from IBM DB2:
This should be a good example why CLOB is something necessary for most database implementation. We are not always as lucky as case of MySQL and PostgreSQL, which we are able to work with TEXT for (nearly) unlimited size. On the other hand, we have CLOB, which usually required for special handling, e.g. database dependent INSERT/UPDATE/DELETE abstraction, decode CLOB as like as BLOB when fetching data, etc.
Since we are now having drupal_write_record() for high level programming abstraction, and (may) have INSERT/UPDATE/DELETE API for driver level abstraction, CLOB should be something much simpler than case of before. Moreover, this logic is now proved as function with Siren: Siren is able to work with Oracle CLOB, on the other hand don't introduce complicated abstraction to MySQL and PostgreSQL.
#5