currently (from nightly development snapshot from HEAD on 2007-05-30), the following fields are defined as text:big
:
dc:/tmp/drupal-6.x-dev# find . -type f -name '*.schema' | xargs fgrep -nH 'text' | grep 'big'
./modules/search/search.schema:9: 'data' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big')
./modules/user/user.schema:30: 'perm' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/user/user.schema:65: 'data' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/node/node.schema:72: 'body' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:73: 'teaser' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/node/node.schema:74: 'log' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:10: 'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/taxonomy/taxonomy.schema:77: 'description' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/drupal/drupal.schema:11: 'slogan' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/drupal/drupal.schema:12: 'mission' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/system/system.schema:10: 'batch' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/system/system.schema:153: 'session' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big')
./modules/system/system.schema:194: 'value' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/dblog/dblog.schema:10: 'message' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/dblog/dblog.schema:11: 'variables' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/block/block.schema:39: 'body' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/comment/comment.schema:12: 'comment' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/comment/comment.schema:19: 'users' => array('type' => 'text', 'not null' => FALSE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:9: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:40: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:41: 'image' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/aggregator/aggregator.schema:60: 'description' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:9: 'recipients' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
./modules/contact/contact.schema:10: 'reply' => array('type' => 'text', 'not null' => TRUE, 'size' => 'big'),
as mysql driver map text:big as longtext, which support for maximum 4GB size of data, it seems not reasonable to preform any comparison with such fields.
on the other hand, since some fields are really required to preform some testing within SQL directly, i will suggest change them into text:medium, or even smaller. so developer can simply guess that text:big should ONLY use as storage, but not for any SQL comparison.
finally, as oracle only provide VARCHAR2: maximum 4000 characters, can compare, and CLOB: maximum 4GB, can't compare, oracle driver will only map text:big as CLOB for maximum flexibly of sizing, and keep others as VARCHAR2(4000) for maximum compatibility. only keep required fields as text:big will greatly increase cross DB compatibility.
Comment | File | Size | Author |
---|---|---|---|
#89 | blob-text2blob-1227760222.patch | 13.46 KB | hswong3i |
#89 | blob-boxes-body-1227763338.patch | 15.59 KB | hswong3i |
#89 | blob-menu_router-arguments-1227761227.patch | 35.6 KB | hswong3i |
#89 | blob-locales_source_hash-1227760668.patch | 10.54 KB | hswong3i |
#87 | dbtng-text2blob-1227552445.patch | 13.44 KB | hswong3i |
Comments
Comment #1
hswong3i CreditAttribution: hswong3i commentedfrom my point of view, at least the following fields should keep as text:big, as we are not expected to preform any direct action to it, in whatever SQL or PHP level:
Comment #2
chx CreditAttribution: chx commentedHavng big is utterly pointless, medium is already 16M, can you imagine what would happen with poor PHP when you load a node which has a body with more (lot more) than 16M byte?
Comment #3
hswong3i CreditAttribution: hswong3i commented16M or not it not the only consideration: i am trying to mark
text:big
as a "storage only" type, so whenever developer trying to use it, keep this in mind: don't do comparison to it.this clear instruction can increase the usability of code, and also decrease the difficulty of debug. as mentioned in #2,
text:medium
is totally enough for daily usage.just feel free to let text:big act as symbolic label, set up a suitable "rule of game", and make use of it ;)
Comment #4
hswong3i CreditAttribution: hswong3i commentedthis patch change some fields from
text:big
intotext:medium
in a simple reason: they are not target as "storage only". after apply this patch, will only left the following fields astext:big
, which also means not expected to take any comparison toward it:Comment #5
Dries CreditAttribution: Dries commentedWhat is the problem with 'big' and 'Oracle'? I'd suspect they go hand in hand. :)
Care to shed some light on why this is a problem? I'd like to learn a bit more about this/Oracle.
Comment #6
hswong3i CreditAttribution: hswong3i commentedon the other hand, a lot of additional handling are required for CLOB (http://php.net/oci8). you can seems oracle CLOB as a completely different data type from daily used text type under mysql: mysql seems longtext and text as the same, and so you will even able to run GROUP BY and ORDER BY with it, even it is in 4GB size!
BTW it is also seems reasonable: if you are going to do some SQL comparison, fine, use VARCHAR2 and take action within reasonable size for performance consideration; if you need space, fine, i give CLOB to you, but don't ask for any others ;(
and the truth is: i hate oracle's string handling!
Comment #7
moshe weitzman CreditAttribution: moshe weitzman commentedseems reasonable to me
Comment #8
Dries CreditAttribution: Dries commentedApproved by Moshe, committed by Dries. :)
Comment #9
(not verified) CreditAttribution: commentedComment #10
bjaspan CreditAttribution: bjaspan commentedThis patch changes the schema without updating existing databases to match the changes. When upgrading from Drupal 5 to 6, it appears that 13 tables end up not matching the schema. So, either this patch needs to be reverted or existing tables need to be updated.
I did not know about this patch until today. I haven't yet thought about the basic point of this patch thoroughly. Offhand, I don't like the idea that text:big is semantically different than text:medium on some database engines. The type 'text' should be semantically the same (i.e. comparable or not comparable) on all platforms regardless of the size which, as per the docs, is only a hint anyway. So if Oracle can only handle 4000 bytes for a comparable text:big, then that's all text:big can be on Oracle. If there is a database column that can't accept being limited to 4000 bytes on Oracle, it should use type blob or some other type that we define as "big enough but not comparable." If there is a database column that must be longer than 4000 bytes *and* must be comparable, then perhaps that functionality is just not implementable with Oracle (which I find hard to believe). However, as I said, I have not thought this through thoroughly yet.
I would also like to propose two things:
1. All patches to .install or .schema files must be tested with schema.module and produce a zero-mismatch comparison report before they can be committed.
2. All bugs which result in a schema comparison mismatch should be considered critical. We should not accept a bug which results in a database not matching the results of hook_schema. To this end, I have marked this issue as critical. If core committers disagree, they can downgrade it.
Comment #11
hswong3i CreditAttribution: hswong3i commentedi just find some useful information, and hope to share according to this topic (from http://phplens.com/lens/adodb/docs-oracle.htm):
i also found that such special handling of LOBs is not only exists within oracle driver of ADOdb, but also for that of postgres, odbtp, sqlanywhere, informix72, ibase, odbc, mssql and db2:
ok, let's back to the topic:
this is not the consideration of oracle driver implementation only, but also a cross DB compatibility concern. ADOdb try it, and come out with current solution. i don't really think we can do much better than that of ADOdb for Drupal: ADOdb is a project that starting from Sept 9 2000. maybe we should reuse their progress, and skip something that we are not really needed for, according to performance consideration.
Comment #12
hswong3i CreditAttribution: hswong3i commentedthis patch add a new data type "CLOB" into drupal-6.x-dev schema API, and 2 more APIs: db_update_blob() and db_upadte_clob(). according to this change, some core module update its schema from "text:big" to "CLOB" type, with using new APIs for updating its values, e.g. node->body. NULL should input as placeholder for %b and %c during traditional db_query() insert/update query. On the other hand, this patch also rollback some field from "text:medium" into "text:big" for backward compatibility.
as bjaspan mention, all "text" types should remain same behavior among different DB backend: it should be able for compare. "text:big" should not be the special among the others, within oracle driver handling. the new data type "CLOB" can clearly explain its property as: character storage, large enough, but never directly using it for SQL-level compare. therefore developer can have a more strict guild line for their core/contribute development.
BTW, after studying about ADOdb cross DB handling, i found that the handling problem of character large object (CLOB) is not only appear within Oracle, but also among some other DBs. ADOdb propose another method, rather than insert/update CLOB directly: it provide additional API call UpdateBlob() for such LOBs update, after normal insert/update query. it seems to be a good idea, as we don't need to do additional query rewrite (which also means a complete SQL parser and rebuilder, which seems to be mission impossible) for CLOB handling.
additional benefit: as mysql version map "clob" into "longtext" and "text", where pgsql map this to "text", the add of additional API will not affect contribute module development, unless they change their module's schema, and choose to use CLOB for data storage.
P.S.: little bit change about %b handling: will insert/update NULL as placeholder within db_query(), for further more value update by using db_update_blob() and db_update_clob() manually. therefore cache.module require a small change about current blob handling. it may also affect other contribute modules which are currently using BLOB. BTW, as BLOB is not always use, this may not be a great problem.
Comment #13
hswong3i CreditAttribution: hswong3i commentedComment #14
hswong3i CreditAttribution: hswong3i commentedthis patch standardize drupal BLOB/CLOB handling for cross DB compatibility, e.g. db_update_* for LOBs value update, db_encode_*/db_decode_* for LOBs value encoding/decoding. change incude:
for current stage, it is only a dummy mapping for "clob" type among mysql/pgsql, where mapping it as like as that of "text" type. BTW, it will greatly decrease the difficulty for other DB driver implementation, e.g. oralce, mssql and db2.
this patch only change the handling for LOBs type. therefore, unless developer choose to use BLOB/CLOB type, nothing will affect them. on the other hand, if developer choose to use LOBs type, which means that they will have (nearly) unlimited storage size, BTW trade off some query level features, e.g. using GROUP BY, ORDER BY, WHERE, etc.
Comment #15
hswong3i CreditAttribution: hswong3i commentedminor update, patch with latest CVS, fix some pgsql bugs.
complete tested with mysql + mysqli, pass though all core + optional modules installation. patched modules are all functioning :)
Comment #16
bjaspan CreditAttribution: bjaspan commented@Dries: I marked this RTBC because the patch from comment #4 which was previously committed needs to be rolled back (the author agrees with this as stated in #11). As of today, it still reverse-applies to HEAD. Once the patch is rolled back, the issue should be set to CNW so further development can occur.
@hswong3i: I briefly reviewed (but did not test) your patch and in general it looks good. However, this is clearly an API-changing feature enhancement and, as such, cannot be committed now that D6 has reached code-freeze. We should continue working on this for D7. I realize this means that Oracle support may be (once again) limited for D6 without core patches, but in my opinion we cannot risk slipping in a change like this at this point in the cycle. If Dries/others disagrees, I'm certainly open to reviewing the patch more carefully.
I noticed one problem: db_update_clob/blob() is written to expect '{tablename}' instead of 'tablename' like the other schema api functions. The {tablename} notation is only for when table names are embedded in SQL. Otherwise, confusion will result.
Comment #17
hass CreditAttribution: hass commentedMsSQL introduced in Ms SQL 2005 a varchar(max) data type to solve this CLOB comparison problem... so only older MsSQL Versions have this problem... maybe Oracle works on a new datatype, too?
Comment #18
hswong3i CreditAttribution: hswong3i commented@bjaspan: i agree that it should be a bit difficult to commit this patch together with drupal-6.x-dev after code freeze, even though it is mainly a API enhancement. BTW, as oracle driver is already closed to drupal-6.x-dev, i will suggest to review the chance of accepting this enhancement. on the other hand, unless developer choose to use the new LOBs handling, none of their code need to be review.
the API of db_update_clob/blob() is still a prototype, we still can change it :)
@hass: hopelessly, oracle don't have such new data type, please refer to http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_eleme... for more information :)
Comment #19
Dries CreditAttribution: Dries commentedI've rolled back the old patch so we can work more on the new patch.
Comment #20
hass CreditAttribution: hass commentedAside if we get this solved for Oracle, we will not have any troubles with MsSQL 2000, too :-).
Comment #21
hswong3i CreditAttribution: hswong3i commentedpatch for latest CVS, update db_update_clob/blob(), update some document.
@bjaspan: now db_update_clob/blob() will accept real table name directly.
@hass: yes, as i know that is the problem for MSSQL 2000, if oracle driver solve it, MSSQL (and some other else) will also benefit from it :)
Comment #22
hswong3i CreditAttribution: hswong3i commentedpatch according latest CVS, update handling for block.module and contact.module, fix minor bug for comment.module.
completely tested modules: cache, aggregator, block, comment, contact and node (all patched modules)
Comment #23
hswong3i CreditAttribution: hswong3i commentedminor update according to v0.7, update code which more make senses with argument input.
Comment #24
hswong3i CreditAttribution: hswong3i commentedupdated patch for latest CVS.
Comment #25
hswong3i CreditAttribution: hswong3i commentedversion 1.1 update according to latest CVS, with following enhancements:
since this patch is required by latest oracle driver implementation (http://drupal.org/node/39260), please kindly review the chance of accepting this patch after drupal-6.x-dev code freeze, many thanks :)
Comment #26
Dries CreditAttribution: Dries commentedI had a look at the patch and here are some comments:
1. In general, I think it makes sense to differentiate between 'binary data' and 'character data'.
2.
I don't understand that explanation. Please consider rewording it for clarity.
3. I don't understand why we need db_update_clob() and db_update_blob() functions? If there is no critical need for these functions, I'd prefer to remove them, and to let people write regular db_query("UPDATE ...") statements. If these functions are critical (rather than just handy), document in the code why they are critical.
4.
Why is that extra check required?
That's all for now. Looks promising but needs some more work around the edges. Let's give this patch some prime time so we might be able to include it in D6.
Comment #27
hswong3i CreditAttribution: hswong3i commentedmost of the change are function hook for oracle driver, or other driver backend that require special handling to LOBs data.
2 & 3 & 4. for the case of oracle, we can't simply insert a "NULL" value for INSERT as placeholder, but EMPTY_BLOB() or EMPTY_CLOB(). this is because we can't do any update to a LOBs field with NULL value, but EMPTY_BLOB()/EMPTY_CLOB().
on the other hand, we also don't need to do any input value escape to input LOBs value as it will be handled by LOBs->save().
for example, in oracle driver:
therefore, take block.module as example, the insert query will become:
the update query will become:
after we fetch the pointer for "body", we can then run LOBs->save() and update its value:
for more detail reason of this API changes, please compare the differences between mysql/pgsql handling with that of oracle. i feel so sorry to insert some dummy handling to mysql/pgsql version, but this give a great chance for other DB driver implementation :)
Comment #28
hass CreditAttribution: hass commentedDon't forget to tell why we need db_update_clob() and db_update_blob() functions. There are differences in the update command for different DB's. In MsSQL there is a special UPDATETEXT, so we are better to know what we are updating. Additional you sometimes need to use pointers syntax for updating TEXT. Maybe other DBs have different update functions for updated text and binary data. So it's better to split, isn't it?
Comment #29
hswong3i CreditAttribution: hswong3i commented@hass: great! you give us the answer that we are seeding for: different DB have different LOBs update method, and so we better split it out, and handle it with independent function API. seems db_update_*() can really help in MsSQL too :)
Comment #30
hswong3i CreditAttribution: hswong3i commentedpatch according to latest CVS, update some document.
@Dries: i detail the document according to your questions:
2. since oracle will save pointer of LObs into DB table, and further more fetch value by calling LOB->save/load(), NULL (none of pointer) != EMPTY_CLOB() (pointer with empty value, able to use LOB->save/load()). the document update as:
3. similar reason as 2.: since oracle save a point for LOBs value into table, update of its value require additional handling, rather than a simple UPDATE query, an isolated APIs for LOBs value update are requred.
e.g. case of oracle driver: first of all, we need to fetch out LOB's pointer, by SELECT ... FOR UPDATE (current oracle implementation); secondly, start the transaction by oci_execute($last_result, OCI_DEFAULT); try to truncate old value, and then save its new value; rollback whenever face error, or commit transaction if success. it is very complicated, and should be handle individually :)
document of db_update_clob(), for mysql/mysqli, update as:
4. the different will be shown by comparing MySQL handling with Oracle's. for MySQL, CLOB is a dummy mapping to longtext, so handle as:
BTW for oracle, no escape is required for input value (will handle by LOB->save()), but require EMPTY_CLOB() as placeholder during INSERT/UPDATE query (as stated in 2.), code will become:
Comment #31
hswong3i CreditAttribution: hswong3i commentedComment #32
Dries CreditAttribution: Dries commentedI'll try to investigate this issue some more as soon time permits. In the mean time, it would be helpful if someone could help you massage the documentation in proper English. It's a little rough around the edges, but nothing we shouldn't be able to help you with. Thanks for the hard work, I'll get to it soon..
Comment #33
hswong3i CreditAttribution: hswong3i commented@Dries: document updated. detail of changes: http://rafb.net/p/WGZMcE39.html
Comment #34
hswong3i CreditAttribution: hswong3i commentedaccording to my previous benchmarking result, there is no performance different when comparing LOBs patched with CVS HEAD (at least in case of MySQL). impact for API update should be the only concern of this patch.
on the other hand, with LOBs patch helps, oracle driver (and some other databases with similar physical limitation in VARCHAR storage sizing) will able to implement in a much flexible style, e.g. we are now able to store node body with more than 4000 characters :)
i have no further more suggestion or idea about this patch. LOBs patch is the major dependence of oracle driver for drupal-6.x-dev, based on functionality concern. hope someone may help about its coding or documents improvement :)
Comment #35
hswong3i CreditAttribution: hswong3i commentedpatch based on latest CVS HEAD, update comment.module handling.
Comment #36
hswong3i CreditAttribution: hswong3i commentedwhen studying DB2, IBM's DB2 + Drupal document, and reference to ADOdb for existing implement, i found the following message about BLOB handling from ADOdb:
i think this prove the necessary of LOBs patch: different database comes with different LOBs handling, sometime we need to insert special placeholder during INSERT/UPDATE, and further more update its value by special API :)
Comment #37
hswong3i CreditAttribution: hswong3i commentedand here is the case of MSSQL, within ADOdb implementation:
Comment #38
hswong3i CreditAttribution: hswong3i commentedminor update: patch for latest CVS HEAD, a bit document update, and update node_save() handling based on latest changes :)
P.S. i found this is very helpful when implementing both oracle and DB2 driver. as ADOdb and PDO both come with similar handling, and this patch will change programming logic only if developer choose to use CLOB for (almost) unlimited storage size, i guess it should ready to commit with no question. hope someone may help about its review :)
P.P.S. world is always not such perfect: oracle comes with VARCHAR2 (max. 4000 characters) and CLOB (max. 128TB), where DB comes with VARCHAR (max. 32672 characters) and BLOB/CLOB (max. 2GB). it is not easy to implement MySQL's text:big => longtext mapping for similar functionality: able to use for compare (VARCHAR2/VARCHAR), and also comes with unlimited size (CLOB). we should split size concern (clob) from functionality concern (text), and preform isolated handling :)
Comment #39
hass CreditAttribution: hass commented2GB or more shouldn't be an issue... i think - no - site will have inline images or website pages of such a size. And people saving images in blobs should be beaten... :-)
Will this patch land for D6? I don't like to spend time on a "database.mssql.inc" if this patch never goes in...
Comment #40
hswong3i CreditAttribution: hswong3i commented@hass: i point out DB2's CLOB size just because i hope to compare it with MySQL. in case of DB2, we need to choose VARCHAR for functionality, or CLOB for size. neither one can clone MySQL's text:big => longtext handling, we need some trade off :(
P.S. MSSQL is also within my schedule, after Oracle and DB2. this is because we have ZendCore for both Oracle and DB2, so i don't need to spend too much time for system setup :)
Comment #41
hswong3i CreditAttribution: hswong3i commentedpatch update based on latest CVS HEAD, with minor code change in node_save(). completely tested with MySQL.
Comment #42
hswong3i CreditAttribution: hswong3i commentedpatch for latest CVS
Comment #43
hswong3i CreditAttribution: hswong3i commentedafter an indeed studying about ADOdb and PDO database abstract layer, i found that a special LOBs value handling API is almost a must, based on cross database compatibility concern. here is a simple definition of LOBs value, quote from PDO (http://hk.php.net/pdo):
in case of ADOdb, LOBs value will handle by: 1. normal INSERT query, with LOBs value placeholder (NULL for most database, EMPTY_BLOB/EMPTY_CLOB for Oracle); 2. manually update LOBs value by calling
UpdateBlob
. here is a simple code snippet for CLOB handling (reference to http://phplens.com/lens/adodb/docs-oracle.htm):in case of PDO, LOBs value will handle by: 1. normal INSERT query, with placeholder as "?"; 2. prepare and bind each value with placeholder, by calling
bindParam()
(NOTE: you will need to define data_type as PARAM_LOB explicitly for LOBs value binding); 3. execute transaction. here is a simple code snippet for inserting an image into a database by using PDO (reference to http://hk2.php.net/pdo):so what's the case of this patch? case is just similar as both ADOdb and PDO: 1. normal INSERT/UPDATE query, with NULL as LOBs value placeholder (it is because db_encode_blob() and db_encode_clob() will translate this NULL as suitable under layer placeholder); 2. manually update LOBs value by calling
db_update_blob()
ordb_update_clob()
. here is a simple code snippet for block content insertion:my research result is very simple: a special API is the best, common, and strict forward solution for LOBs value handling. even we may working with PDO in D7 or later (with array-style variable binding), it is also a must, for a hook which telling under layer what type of data we are now inserting, or else we can NEVER handle LOBs value insertion correctly.
moreover, that is no point to revamp our database API, even we are now changing our backend as PDO: our database API is powerful and suitable for most cases. PDO should only act as another under layer database driver supporting (maybe call as database.pdo.inc, or some other else). PDO SHOULDN'T be a replacement of existing implementation.
anyway, if we are just focusing on how PDO can benefit on performance when using with MySQL/PgSQL (which own more than 98% of our clients), but simply forget its powerful cross database ability, this will seems a bit silly... as we take a lot of effort on schema API, database abstract layer, etc, based on cross database concern, this idea is just a step backward ;p
Comment #44
Crell CreditAttribution: Crell commentedSince I suspect the latest comment is aimed at me based on a conversation in IRC earlier today, I'll jump in here. :-)
First off, no, the PHP 4/MySQL 3-era mysql driver is not at all robust or secure. That we layer a regex on top of it that is as slow as the much more robust and complex code in PDO is ample evidence of that. As long as our lowest common denominator is the mysql extension, we can't do anything interesting in the database system. mysqli is also not part of the standard install of PHP, and many shared hosts don't offer it. (mysql isn't either as of PHP 5.0, but most hosts install it for legacy support.) We can't rely on mysqli for MySQL support, which is what the vast majority of Drupal sites run on. That leaves PDO, which is reasonably cross-database (at the API level at least), reasonably fast (although not as fast as native APIs, it's still not a dog because it's all in C), and, most importantly, we can rely on it being present in any PHP 5 install as it's been part of the default configuration since PHP 5.1. That alone makes it a worthwhile move for D7, even if it weren't for easier Oracle/DB2/MS SQL support.
For data-modification queries, actually, in the D7 API as I'm currently building it you should never, ever use db_query() for an insert/update/delete statement. You should instead do
Aside from being easier to work with and more robust, it means you're passing not a serialized string to the database system but a data structure. Within the database driver, if necessary, you can do additional escaping, LOB handling, etc. without worrying about regexes and without putting extra work on the module developer.
Which brings up the next point. Right now, virtually all Drupal sites run MySQL or Postgres, with MySQL far and away the leader. Most Drupal developers know from MySQL primarily, I wager. The concept of LOBs and BLOBs is foreign to them, and many will likely see the necessity of it as a bug in those databases. (There are times when I do, too.) We should not build an API around a database that on a good day may be 2-3% of all Drupal installs for the foreseeable future. Most developers won't even have any way to test it to see if they're doing it "right". (How many people here have a copy of Oracle just lying around? :-) )
That sort of logic belongs inside the Oracle/DB2 driver, not exposed to every module developer to be able to forget or get wrong. If we can make that driver's job easier without making the API substantially more complex, we should do so. But not if it means adding a lot of needless, untestable, easy to break work for the vast majority of Drupal developers who are MySQL-based.
Comment #45
hass CreditAttribution: hass commentedMost databases are less then 2 or 4 GB... there are free versions of Oracle and MSSQL. Aside if i have the chance to get it running on our MSSQL2K5 active/active cluster i would move there... why should i spend my time on administering two different DBs if i can do it with one. Today we have simply no choice - eat or die with MySQL or PgSQL.
Comment #46
hswong3i CreditAttribution: hswong3i commented@Crell: i would like to have more details about how can your abstract layer handle LOBs value correctly? i have no question about inserting data with your array-style input, but where are you obtaining the information of these data type? it will be a bit not acceptable if: "as most data type will bind automatically and correctly by using PDO (unless LOBs, which need for explicitly defined), so we will simply forget LOBs handling in D7"...
@hass: good point. e.g. my client require for Oracle, due to political concern. therefore drupal will NEVER become their choice, as we are not supporting Oracle natively... (even though, we are suitable based on functional concern...)
Comment #47
hswong3i CreditAttribution: hswong3i commented@Crell: moreover, i am not opposing PDO supporting in D7/8/..., but try to figure out those common problems that we both need to due with. as we are both working for database abstract layer, we should facing with similar problem, no matter we are working with ADOdb or PDO ;)
e.g. db_num_rows() is a common problem among different database. this is not a problem of database abstract layer implementation, but dependent on database natively: some database won't pass the complete result set to client side after SELECT query; on the other hand, pass each row from result set when calling for fetch action (Oracle/DB2). since PDO working as a cross database abstract layer (as what i am now doing, but user level), which can't solve this problem natively, they try to REMOVE this support.
this is just similar as case of LOBs handling: since special API is a must, so we ADD this support. both ADOdb and PDO (moreover, this patch, too) propose for similar API, in order to due with similar difficulty.
LOBs handling is a common problem that we both need to solve; on the other hand, the successful of this patch should also equal to the successful of PDO, in coming future ;)
Comment #48
hswong3i CreditAttribution: hswong3i commentedAfter an indeed studying about differences between ADOdb, PDO and Drupal 6.x database abstract layer, it show that we are still missing a complete and standardized LOBs handling API, based on cross database compatibility concern (besides MySQL/PgSQL).
Drupal 6.x existing database abstract layer provide db_encode_blob() and db_encode_blob() for BLOB handling only, as PgSQL require special API for BLOB encode/decode (In case of MySQL, it is a dummy abstraction, as MySQL seems BLOB data as like as normal string). BTW, this is not the complete picture of LOBs handling and its difficulties: besides BLOB, it is also a critical concern about limitation on CHAR/VARCHAR/CLOB (Character Large OBject) among most database; moreover, method of LOBs INSERT/UPDATE is also database specific.
LOBs patch for Drupal 6.x (http://drupal.org/node/147947) will able to fill in this missing section, extent and complete our LOBs API functionality, with handy procedures. It should be the most common and strict forward solution when dueling with LOBs related difficulties. On the other hand, it shouldn't be a confusion for normal MySQL users, as it is very closed to the case of existing db_encode_blob() and db_decode_blob() implementation.
We shouldn't miss the chances, but our chances are waiting for this patch :)
(To view the thread, please navigate to http://edin.no-ip.com/html/?q=node/302)
Comment #49
hswong3i CreditAttribution: hswong3i commentedas i am now indeed studying about PDO + MySQL/Oracle/DB2, i find that this patch would also be a very important functional hook for our next generation database abstract layer development, based on cross database concern (even we may move to PDO, as this is a database-level but not database-abstraction-level problem). on the other hand, this patch is stable enough since July 24, 2007, and it functionality is already fully proved by the successful of Oracle/DB2 driver development. it should be RTBC.
Comment #50
hswong3i CreditAttribution: hswong3i commentedpatch via latest CVS HEAD
Comment #51
hswong3i CreditAttribution: hswong3i commentedAs it is now one day before D6 beta, it seems that not easy to commit this patch on time, together with Oracle and DM2 supporting. Even though it is a bit pity (as all are ready for production), I am now merging this patch into another technical preview of D7 database abstract layer, too (http://drupal.org/node/172541). I will keep on promoting this LOBs handling in our next generation of database abstraction layer.
BTW, if you are looking for unofficial Oracle (http://drupal.org/node/39260) and DB2 (http://drupal.org/node/165788) supporting within D6, please refer to their own issue. I will merge this patch (maybe some other additional patches) with those issues. On the other hand, I will provide patches based on latest CVS HEAD of D6, within D6 life cycle. So you will able to have those unofficial database supporting with a single patch apply :)
Comment #52
hass CreditAttribution: hass commentedThat's unbelievable bad information... i hoped to see this committed :-(.
Comment #53
hswong3i CreditAttribution: hswong3i commented@hass: yes, how pity it is... so let's postpone it into D7?
i will provide unofficial patch for both oracle and db2 within D6 life cycle, so people who are interested in this topic can get such supporting ON TIME, but not sit and waiting for D7 comes. i guess most of the projects needed for such unofficial patch are enterprise-related project, such little tailor-made handling may be acceptable for their professional and skillful developer :)
P.S. the main idea is: i will also benefit on this combined unofficial patch handling. i am now scheduled to move my personal Drupal site into Oracle with D6, and so i will able to keep on develop and test it daily. a single patch can save a lot of our time, during update ;p
Comment #54
hass CreditAttribution: hass commentedI'm not going to use unofficial patches. I have too many dependencies with modules and so on and i don't like to patch them all myself if they are buggy. This makes me going the mysql way another year or more, what suxxx much.
Comment #55
hswong3i CreditAttribution: hswong3i commented@hass: let's take it easy :)
i will not force anyone to use this unofficial patch: as it is unofficial, no one can generate about it correctness, and it is always not targeting for our end users. e.g. you will never use Debian's unstable release for production purpose, but only stable, isn't it?
BTW, we will always need time to prove its correctness. we need helpers (or better says as HACKERS ;p) to test the implementation, and provide feedback, before it can finally goes into core. as proved by the fail of this patch within D6, it will for sure to be another pity issue, if we start the development TOGETHER with the open of D7: it will be too late once again :(
a simple and handy unofficial patch may speed up the development, on the other hand give a big hand to those require for such features ON TIME. so please just let it be :)
Comment #56
hswong3i CreditAttribution: hswong3i commentedComment #57
chx CreditAttribution: chx commentedAs Siren is not a project on Drupal.org but an unofficial fork, I am removing it from issue titles to avoid confusion.
Comment #58
hswong3i CreditAttribution: hswong3i commentedEven though DB2 don't explicitly indicate the limitation of VARCHAR as like as that for Oracle (Oracle will only allow for VARCHAR2 with max 4000 characters; above this limitation, please consider to use CLOB with TB-scale storage), it is still logically limited as 4000 characters.
A minor footnote from IBM DB2:
This should be a good example why CLOB is something necessary for most database implementation. We are not always as lucky as case of MySQL and PostgreSQL, which we are able to work with TEXT for (nearly) unlimited size. On the other hand, we have CLOB, which usually required for special handling, e.g. database dependent INSERT/UPDATE/DELETE abstraction, decode CLOB as like as BLOB when fetching data, etc.
Since we are now having drupal_write_record() for high level programming abstraction, and (may) have INSERT/UPDATE/DELETE API for driver level abstraction, CLOB should be something much simpler than case of before. Moreover, this logic is now proved as function with Siren: Siren is able to work with Oracle CLOB, on the other hand don't introduce complicated abstraction to MySQL and PostgreSQL.
Comment #59
hswong3i CreditAttribution: hswong3i commentedAfter an indeed study of pdo_oci, CLOB is not the optimal solution for this issue (pdo_oci currently not support CLOB). On the other hand, I would like to summarize the reason of this issues once again:
Moreover, using blob (but not clob) as text:big replace also introduce some other benefits:
P.S.
'default' => '',
is important for Oracle and DB2 implementation: they will need to catch it and replace asEMPTY_BLOB()
- the default empty blob placeholder. This is completely different from a simple NULL handling. This is also what http://drupal.org/node/195169 mentioned for.Since Drupal 7.x is now open for public development, this patch de-fork from latest Siren research progress, and via latest Drupal 7.x CVS HEAD. Fully tested with MySQL and PostgreSQL.
Comment #60
hswong3i CreditAttribution: hswong3i commentedAppend corresponding drupal_write_record() handling.
Comment #61
catchNot critical, is this still valid now #225450: Database Layer: The Next Generation is in?
Comment #62
catchComment #63
hswong3i CreditAttribution: hswong3i commented@catch: Thanks for your consideration. This issue won't able to be fix with our new D7 DB abstraction layer, as it is the physical limitation of DB itself.
In case of Oracle, only maximum 4000 characters is allowed for VARCHAR2, which come with GROUP BY and other SQL functionality. CLOB have unlimited size but can't function as like as MySQL's TEXT type; on the other hand, its support stage is not such stable as BLOB. Therefore replace some text:big field with BLOB can simply solve the problem. Case is also similar in DB2 and MSSQL.
E.g. {batch} will store a serialized command into DB, which always result as a single text content with more than 4000 characters, and so will buggy in case of Oracle. Node body and block body come with similar situation, too.
I am going to summarize my research progress into individual small patches for CVS HEAD. Hope to have your kindly support :-)
Comment #64
Damien Tournoud CreditAttribution: Damien Tournoud commented@hswong3i: Please limit the scope of that issue to changing all TEXT:BIG that are used as BLOB (ie we only store chunk of data in them, we don't filter or order by them). Please also provide an appropriate upgrade path (ie. _update functions).
That mean your patch in #60 could get committed if you remove db_decode_blob() and drupal_write_record changes, and add upgrade paths.
Comment #65
Crell CreditAttribution: Crell commentedThat sounds like a limitation of Oracle if it has no equivalent to TEXT fields.
With Schema API, though, the exact underlying field type needn't be identical on each database. So we can use TEXT on MySQL and CLOB/BLOB on Oracle, as long as the driver itself handles marshalling data back and forth. That was a deliberate and key design of the new database API. So unless I'm misunderstanding you, we need only use CLOB/BLOB in the mappings for an Oracle driver; we don't need to change everyone else to use BLOB.
That said, don't we need non-opaque data for some of our larger text fields? If we do any sort of processing on them (LIKE for instance) then a BLOB field will break that. It's something to look into.
Comment #66
hswong3i CreditAttribution: hswong3i commented@Crell: It is truth that we can map different data type with database specific container, dependent on different requirement; but some fundamental functionalities are required for different mapping, too. Therefore replace ALL text:big filed as BLOB in core, or only map text:big as BLOB in case of Oracle within schema, will not help this case: we need to change the mapping case by case within different field definition, but not in schema absolutely.
E.g. We can map text:big (or every text type) as VARCHAR2(4000) in Oracle/DB2/MSSQL, so we can provide GROUP BY or LIKE functionality for all DB backend. Therefore we can utilize text:big for all description field, e.g. {boxex}.info and {term_data}.description.
On the other hand, we need to map some storage only fields as BLOB, e.g. {boxes}.body and {node_revisions}.body, so we can provide enough storage space without SQL functionality. This can also prevent crazily GROUP BY or LIKE SQL against hung storage which result as performance degrade.
My conclusion is: if we really need storage space rather than SQL functionality, remap field as BLOB; else just left it as text:big and let the 4000 characters limitation belongs to Oracle/DB2/MSSQL only. This should be the most fair solution with maximum cross database compatibility.
Comment #67
Crell CreditAttribution: Crell commentedI'm still not sure I follow you. It sounds like you're saying we need a "text big, as parsable text" and a "text as opaque value" as separate field types that on MySQL/PostgreSQL/SQLite would both map to TEXT while on Oracle they'd map to varchar and BLOB, respectively? (Why BLOB and not CLOB I still don't understand, but then I don't understand anything about Oracle's design.)
Comment #68
hswong3i CreditAttribution: hswong3i commentedThis oracle schema implementation is my latest research progress before our next generation of database abstraction layer commit, which function with both installation and most core operation:
CLOB is only available in Oracle but not DB2/MSSQL, PHP + PDO + Oracle CLOB come with a lot of bugs , and CLOB type is much different from MySQL/PostgreSQL TEXT type (P.S. DB2 also come with similar limitation):
So my suggestion is: as Drupal text type always assume with ORDER BY/GROUP BY/SELECT ... DISTINCT/SELECT... UNIQUE functionality, just let Oracle/DB2/MSSQL map it with VARCHAR(4000) with size limitation; on the other hand, replace some necessary fields as BLOB, trade its (not required) SQL functionality with nearly unlimited storage size.
Comment #69
hass CreditAttribution: hass commentedAs I know - MsSQL 2005/2008 can use "varchar(max)" that works like MySQL "LONGTEXT"
Comment #70
hswong3i CreditAttribution: hswong3i commented@hass: But the main point is legacy PHP MSSQL driver (FreeTDS?) can't handle this correctly (only support up to VARCHAR 255; CLOB INSERT/UPDATE seems buggy). I still not yet have time to test with Microsoft's new PHP driver... Hopefully it should really work fine...
So at this moment, I will still suggest mapping text as VARCHAR(4000) in case of MSSQL for better compatibility :)
UPDATE: Sorry for wrong information: FreeTDS would be crazy with CLOB INSERT/UPDATE and only handle the first 255 characters; both VARCHAR and BLOB are safe. So mapping Drupal text type as VARCHAR(4000) should also be safe.
Comment #71
hass CreditAttribution: hass commentedSounds like the driver is MsSQL 2000 based... maybe only not yet updated to the latest features... :-( as PDO MsSQL is experimental I expect not everything is already implemented :-). I wouldn't waste my time with SQL2K any more.
Comment #72
hswong3i CreditAttribution: hswong3i commentedA split version for feedback is available: Remap field as BLOB: {aggregator_item}.description.
Comment #73
Crell CreditAttribution: Crell commentedOK, I still don't see why we need to change anything on MySQL. Map text:big differently on Oracle and be done with it. No muss, no fuss. The only question is whether or not there are any text:big fields currently that we do need to parse in the database as text, not as an opaque blob. Before we make any schema changes, can you check through core and see if there are any? That will affect what if anything we need to do.
Comment #74
hswong3i CreditAttribution: hswong3i commented@Crell: Yes, I think we don't need any code base change for MySQL/PostgreSQL schema/abstraction implementation, but may need some footnote for general developers about how the behavior of text:big and blob should be, e.g. footnote for text:
In case of blob:
Adding above information into DB driver schema definition should give a great hand to general developers, e.g. within MySQL/PostgreSQL's getFieldTypeMap(). Any idea?
Comment #75
hass CreditAttribution: hass commentedI don't like your "recommendation" very much. It is a mayor improvement in MsSQL 2005+ to have varchar(max). I'm really happy that we have this now as it adds compatibility to MySQL and PostgreSQL. Oracle and DB2 don't have it... well they need to fix their software :-))). In MsSQL you cannot compare TEXT fields with
... WHERE column1 = 'Your very long text that is longer than 8000 chars...'
. Additional MS have dropped TEXT field support in MsSQL2008. I heard "rumours" from my colleagues that we must change all our TEXT fields before upgrading to 2008 (unverified myself).Comment #76
hswong3i CreditAttribution: hswong3i commented@hass: I don't like this "recommendation" too, but it do reflect to the real case: Oracle only support maximum 4000, DB2 only function as expected if smaller than 4000, where MSSQL not support larger than 8000 friendly; for a GCD, document Drupal's text support as maximum 4000 should be a reasonable choice (P.S. Sorry that this issue is not target for MSSQL only, but try to figure out a most fair handling among different database engine, together with maximum functionality).
Patch via CVS HEAD, documentation plus some code sync when compare MySQL and PostgreSQL implementation. May someone give a hand for my poor English?
Comment #77
hswong3i CreditAttribution: hswong3i commentedMay someone give a hand for these split patches review?
Remap field as BLOB: {aggregator_item}.description
Remap field as BLOB: {boxes}.body
Remap field as BLOB: {comments}.comment
Remap field as BLOB: {node_revisions}.body and {node_revisions}.teaser
Remap field as BLOB: {watchdog}.variables
Comment #78
hswong3i CreditAttribution: hswong3i commentedPatch update with better documentation.
Comment #79
hswong3i CreditAttribution: hswong3i commentedUpdated tasklist, with successful simpletest in both MySQL and PostgreSQL:
A bit buggy with PostgreSQL based on http://drupal.org/node/316095:
Comment #80
hass CreditAttribution: hass commentedShouldn't TEXT not become a CLOB!? BLOB is binary large object... CLOB is character large object
Comment #81
hswong3i CreditAttribution: hswong3i commented@hass: I have comment for the reason in http://drupal.org/node/147947#comment-720555 :-)
Long story short: BLOB is stable, large in storage size, with less limitation among different database engines, when compare with that of CLOB.
Comment #82
Anonymous (not verified) CreditAttribution: Anonymous commentedThe last submitted patch failed testing.
Comment #83
hswong3i CreditAttribution: hswong3i commentedPatch reroll via CVS HEAD for documentation only.
We don't need to change MySQL implementation, this patch just give developers more idea about limitation between different database engine, and so choose suitable data type based on cross database concern.
Comment #84
hswong3i CreditAttribution: hswong3i commentedBetter title.
Comment #86
lilou CreditAttribution: lilou commentedSee: #335122: Test clean HEAD after every commit and http://pastebin.ca/1258476
Comment #87
hswong3i CreditAttribution: hswong3i commentedAccording to IRC review with chx, this issue is now integrate with following issues:
The following issues are still keep open because of other dependence:
Comment #89
hswong3i CreditAttribution: hswong3i commentedAll BLOB related patches. Reroll via CVS HEAD.
Comment #91
Crell CreditAttribution: Crell commentedI am not convinced about moving so many fields to BLOB, but since so much has changed in core since the last post I don't think much of this is even relevant anymore. It's also not changing in Drupal 7.
Comment #92
jhedstromProbably too late for 8.0.x now too.
Comment #94
mgiffordUnassigning stale issue. Hopefully someone else will pursue this.
Comment #107
daffie CreditAttribution: daffie commentedComment #108
daffie CreditAttribution: daffie commented