dba_run_script() truncates lines to 8192 bytes

dww - March 31, 2007 - 22:21
Project:Database Administration
Version:4.7.x-1.1
Component:Code
Category:bug report
Priority:critical
Assigned:Unassigned
Status:active
Description

dba_run_script() is doing the following:

$line = fgets($fp, 8192);

this breaks everything when you try to use the "Run script" functionality to restore a backup for a table with really big data. for example, on a tiny test site, work on the 5.x port, if you backup all tables, it fails to restore on {cache_menu}, which is way longer than 8192 characters. :(

this bug basically renders the "Run script" functionality useless, and since that's the only way to restore from backup directly using this module, i'm marking this critical.

#1

Jeremy - April 3, 2007 - 21:01

According to the PHP manual:

The length parameter became optional in PHP 4.2.0. Until PHP 4.3.0, omitting it would assume 1024 as the line length. If the majority of the lines in the file are all larger than 8KB, it is more resource efficient for your script to specify the maximum line length.

So dropping the 8192 would actually make things worse for pre-4.3.0 versions of PHP. Can fgets support larger than 8192 bytes? Can we just increase the length value, or do we need to add looping logic to handle lines longer than 8192 bytes?

#2

sun - April 3, 2007 - 22:13
Title:dba_run_script() needlessly truncates lines to 8192 bytes -- breaks DB restores» dba_run_script() truncates lines to 8192 bytes

I can see only two options here:

  1. Increase the length argument; someone seems to have it successfully increased to 16777216 (16 MB chunks).
  2. Use fread() or file_get_contents() to read the whole file, handle each line separately with a preg_match_all('/.+\n/', $contents) or similar afterwards.

I'd vote for 2., because that variant eliminates any possible line length problems.

#3

dww - April 3, 2007 - 23:28

yeah, i'm more concerned about correctness than speed... no one said the "run script" interface to DBA should be relied on for anything truly performance sensitive. ;)

however, reading the *entire* script into RAM via file_get_contents() could just create other problems: namely, running out of RAM in php, since the script could be huge (think restoring an entire site from a DB dump).

so, perhaps the right solution is to make the code still use the fgets(), but make this code smarter:

          if ($line && strncmp($line, '--', 2) && strncmp($line, '#', 1)) {
            $query .= $line;
            if (strpos($line, ';')) {
              if (db_query($query, FALSE)) {
                if ($edit['verbose']) {
                  drupal_set_message(check_plain($query));
                }
                $count++;
              }
              else {
                drupal_set_message(t('Query failed: %query', array('%query' => theme('placeholder', $query))), 'error');
              }
              $query = NULL;
            }

the problem here is the if (strpos($line, ';')) { -- that's just wrong. ;) if you serialize an array and insert that into the DB, there's going to be ';' all over the place in your data. similarly, node content containing ';' will trigger that, too. :( what we need is to make sure the ';' comes at the *end* of our line for it to count. even then, if you were really unlucky and an escaped ; happened to be byte 8192 in the file, we'd still get it wrong. i'm not entirely positive what to do about that particular edge case. :( seems like we'd have to make sure we were *not* on byte 8192 when we're looking at the ending ';', and if we are, we still read another "line" (upto 8192 bytes) and append that.

perhaps there's a more elegant solution that a) doesn't involve running out of RAM and b) ensures we get an entire SQL statement but no more. i'm just not familiar enough with all the various php string handling functions to have a good proposal off the top of my head.

 
 

Drupal is a registered trademark of Dries Buytaert.