CCK Type is used in a view filter in the facebook_status table, but database row is of type text. This is generally a bad idea. Please change your view(s) to not filter on text or set the max length value in the CCK field to less then 255. facebook_status, facebook_status_recent

Comments

icecreamyou’s picture

Project: Facebook-style Statuses (Microblog) » DB Tuner
Version: 6.x-2.x-dev » 6.x-1.x-dev
Component: Code - Functionality » Code

FBSS has absolutely nothing to do with CCK. DB Tuner is detecting something wrong.

The filter on the text columns in the facebook_status and facebook_status_recent views is {facebook_status}.status <> '' which is fast.

mikeytown2’s picture

Category: support » bug

Could that view be configurable to filter the status text to something other then ''?

Anyway there is a bug in db tuner as I assume most things used in views are CCK fields. Also need a way to disable the warning if the field is not a CCK field. But I would like to warn the admin about a view doing a text search as this is usually killer to your SQL server.

First thing to do is fix the wording so it doesn't say CCK if it's not a CCK table. Second thing is to find the longest value in the text field and see if it would fit in the table as an index. Example code for finding the biggest variable:

SELECT *, Length(value) AS length 
FROM variable
WHERE Length(value) = (SELECT Max(Length(value)) FROM variable)

OR

SELECT LENGTH( value ) , COUNT( * )
FROM variable
GROUP BY LENGTH( value )
ORDER BY LENGTH( value ) DESC

OR ... (need to profile this on a big database (million + rows) and make sure it won't kill it)
If it would fit then recommend looking into adding an index. Do not allow this to be done from the GUI as this alters the DB without the module that created that table knowing about the new limit on that field.

icecreamyou’s picture

Could that view be configurable to filter the status text to something other then ''?

Every view is configurable... the particular filter that is used in this case is exactly the correct one for the intended use case though. In English, the filter represents the argument "don't show blank status updates."

mikeytown2’s picture

gapple’s picture

Title: I got this warning from dbtuner module: CCK Type is used in a view filter in the facebook_status table, but database row is of t » Optimize usage of TEXT columns
StatusFileSize
new30.11 KB
new39.8 KB

My comment from #977672: Convert TEXT columns to VARCHAR if possible:

Since MySQL doesn't support TEXT/BLOB columns for in-memory temporary tables, it could have a potential gain to convert some TEXT fields to VARCHAR. Additionally, MySQL 5.0.3+ also allows varchar columns up to 65,535 characters, depending on character set and subject to the total row size limit, which removes the major disadvantage of VARCHAR fields being limited to 255 characters.

I think a good recommendation is to place maximum size limits on CCK fields if possible, ideally less than 255 characters so that it will utilize a VARCHAR column by default. If the limit is greater it may be possible to modify the column type, subject to the row size limitations.

Some core tables also use TEXT columns for values that may reasonably fit within a large VARCHAR field (e.g. vocabulary.description, term_data.description), and conversion may allow the database to make better use of memory-based temporary tables.

I created a separate issue since this one seemed to be focused on Views queries and column indexes, whereas I propose analyzing all TEXT columns for potential conversion, but they can be reasonably addressed together.

----
I've attached a couple of resource graphs showing temporary table usage on my server. One of the Drupal sites on the machine (the largest and busiest of 4) was modified to use VARCHAR columns where possible. The change in disk tables is most noticeable in the yearly graph, but apparent in the monthly graph as well. Previously it was consistently 30 - 35% of temp tables created on disk, with the current value approximately 8%.

mikeytown2’s picture

MySQL 5.0.3 and up does support varchar lengths up to 65,535 bytes. Do a quick db_version check and go from there.

mikeytown2’s picture

happy with

SELECT *, LENGTH( value ) , COUNT( * )
FROM variable
GROUP BY LENGTH( value )
ORDER BY LENGTH( value ) DESC

Or this

SELECT *, Max(Length(value )) FROM variable

did 2 Million small text rows in 30 seconds (term_data - description); tried to do 500K rows of 8GB of text (node_revisions - body), gave up on that. So queries like this are dependent on the size of the table. I think I will limit this to 96MB (configurable); any table above this size will not be scanned for the largest row.
Table: term_data Column: description
MyISAM - TEXT: 24 Seconds
MyISAM - VARCHAR: 3 Seconds
InnoDB - VARCHAR: 15 Seconds
InnoDB - VARCHAR & index: 5 Seconds
InnoDB - TEXT: 31 Seconds
This is my home computer (xampp on xp) so InnoDB is not tuned at all. Switching the database structures around took a very long time to do (hours per operation).

This is getting complicated so it sounds like I need to create a new menu entry for operations of this nature. Would be good because I can then issue more warnings about this eating your homework.

gapple’s picture

I finally wrote a blog post about my experience converting TEXT columns: http://goldapplesoftware.ca/blog/2011-05/reducing-drupals-disk-temporary...

One of the things I found was that it is possibly to find all TEXT columns in the database using the following query:

SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME 
FROM `information_schema`.`COLUMNS` 
WHERE 
  TABLE_SCHEMA = 'DatabaseName'
  AND
  DATA_TYPE in ('text', 'smalltext', 'mediumtext', 'longtext')
ORDER BY TABLE_NAME

The easiest way to check the size of entries in a text in the column is:

SELECT CHAR_LENGTH(ColumnName) as cl, COUNT(*)
FROM {TableName} 
GROUP BY cl DESC

CHAR_LENGTH() should be used instead of LENGTH(), since UTF-8 will require CHAR_LENGTH() * 2 bytes be available for storage.
LENGTH() will return the number of bytes used for storage, and for UTF-8 will be anywhere between CHAR_LENGTH() and CHAR_LENGTH() * 2, since we don't know how many characters will use one or two bytes.

gapple’s picture

I think the easiest recommendation to start with is to to set any CCK text fields to a maximum size <= 255 if possible.
Then, if any CCK fields have a a maximum size between 256 and approximately 30 000, we can suggest changing the database storage type to VARCHAR. (UTF-8 requires that two bytes of storage be available for every character. With the maximum row size being 64KiB, 64 * 1024 / 2 - [storage size of other columns] characters can be stored in a single column). A warning should be generated if the VARCHAR column's size is smaller than the field's maximum length, as it could result in data loss.

If the CCK field is set to store multiple values or is connected to multiple content types it will be stored in a separate table and row size limits should be less of an issue. If the text field is single-valued and connected to only one content type though, other fields will definitely need to be taken into consideration.

----

Recommending converting other columns will require an analysis of the current data stored in the field, and if a suitable benefit could be realized. While more work, I think maintaining a list of columns suitable for optimization should be created rather than just doing a check on the maximum length of all columns and doing a generic recommendation.

The size of the data column for filefield/imagefield and the attributes column for link is dependent on the field's configuration. The deployment I checked doesn't seem to really use these fields, so their content is very short. I would like to see some more information from other people before attempting a recommendation here.

gapple’s picture

Status: Active » Needs review
StatusFileSize
new2.03 KB
new7.92 KB

Here's a couple patches for some conversion recommendations (the second patch builds off the first)

The first patch checks all CCK Text fields for the potential to be converted to VARCHAR, and makes recommendations based on the fields maximum length setting and the largest value currently stored in the field.
If the column is already converted to a VARCHAR it will perform some sanity checks to make sure that data will not be cut off when inserted, or that the column is not needlessly larger than the field's maximum length.

The second patch makes recommendations on converting the term_data.description column if it doesn't contain too large a value.
If the column is already a VARCHAR, it will check that the largest stored value is not close to the column's size, since the taxonomy module will not limit the length a user enters.

hefox’s picture

Status: Needs review » Needs work
+++ dbtuner.text.inc
@@ -0,0 +1,213 @@
+  if ($output) {
+    return $output;
+  }
+  else {
+    return 'No Recommendations.';
+  }

return $output ? $output : t('No Recommendations').
(at least put t around no recs).

+++ dbtuner.text.inc
@@ -0,0 +1,213 @@
+    $version = version_compare(db_version(), '5.0.3', '>=');

Should this be a more generalized function? e.g. db_tuner_min_dbversion('5.0'3)?

+++ dbtuner.text.inc
@@ -0,0 +1,213 @@
+    $max_lengths[$table][$column] = db_result(db_query('SELECT MAX(CHAR_LENGTH(%s)) FROM {%s}', array($column, $table)));

Not sure if $table or column really needs to be treated like considering they're not user import (right?).

+++ dbtuner.text.inc
@@ -35,6 +37,41 @@
+      $output .= t("This column may be converted to VARCHAR manually to store values up to " . $col_max_size . " characters.");

Improper t, can't be parsed like that.

+++ dbtuner.text.inc
@@ -0,0 +1,213 @@
+  if ($output) {
+    return $output;
+  }
+  else {
+    return 'No Recommendations.';
+  }

return $output ? $output : t('No Recommendations').
(at least put t around no recs).

+++ dbtuner.text.inc
@@ -0,0 +1,213 @@
+    $version = version_compare(db_version(), '5.0.3', '>=');

Should this be a more generalized function? e.g. db_tuner_min_dbversion('5.0'3)?

+++ dbtuner.text.inc
@@ -0,0 +1,213 @@
+    $max_lengths[$table][$column] = db_result(db_query('SELECT MAX(CHAR_LENGTH(%s)) FROM {%s}', array($column, $table)));

Not sure if $table or column really needs to be treated like considering they're not user import (right?).

Haven't tested patch yet. Might as well combine the two since one relies on other to apply properly.