Off-by-one error in drupal_substr() results in "undefined index" error in includes/unicode.inc line 516

pillarsdotnet - October 9, 2008 - 23:38
Project:Drupal
Version:7.x-dev
Component:base system
Category:bug report
Priority:normal
Assigned:Unassigned
Status:duplicate
Description

The attached patch fixes an off-by-one error in the drupal_substr() function of the includes/unicode.inc file.

The original code looks like this:

<?php
    $strlen
= strlen($text);
// ... skipping ...
     
$bytes = $istart; $chars = 0;
      while (
$bytes < $strlen && $chars < $length) {
       
$bytes++;
       
$c = ord($text[$bytes]);
        if (
$c < 0x80 || $c >= 0xC0) {
         
$chars++;
        }
      }
?>

Notice that on the last iteration of the loop, it will try to extract $test[$bytes] where the recently-incremented $bytes value is equal to the length of the $text string. This results in an "undefined index" error, as the element indexes range from [0 .. strlen()-1], not from [1 .. strlen()] as the code would imply.

My corrected version follows:

<?php
    $strlen
= strlen($text);
// ... skipping ...
     
$bytes = $istart+1; $chars = 0;
      while (
$bytes < $strlen && $chars < $length) {
       
$c = ord($text[$bytes]);
       
$bytes++;
        if (
$c < 0x80 || $c >= 0xC0) {
         
$chars++;
        }
      }
?>

Trivial two-line patch attached, created via "cvs diff" from a current checkout of the DRUPAL-6 branch.

AttachmentSize
includes-unicode.inc-off-by-one-fix.diff823 bytes

#1

Dave Reid - October 9, 2008 - 23:41

Can you provide a case that I can use to test this error?

#2

Damien Tournoud - October 9, 2008 - 23:45

Can you provide a case that I can use to test this error?

Better yet, we should have an unit test for this function :)

Related issue: #276453: Tests needed: unicode.inc

#3

pillarsdotnet - October 10, 2008 - 05:26

Dunno about testcases, but on the two Drupal-6 installations I have, it comes up all the time. My experience is perhaps atypical because I use a custom-built php binary that doesn't have the mbstring extension installed.

Along those lines, I'd like to ask: is the extension really worth it, performance-wise? I suppose that having binary code would be quicker than looping character-by-character in PHP, but my installation is memory-starved, and the extension pulls in so many dependency libraries that it triples the size of the resultant (statically-compiled) PHP binary.

#4

Damien Tournoud - December 30, 2008 - 02:51
Version:6.x-dev» 7.x-dev
Status:needs review» needs work

The patch is wrong... the real range is [0 .. strlen($text) - 1], no need to start at 1.

#5

pillarsdotnet - December 30, 2008 - 03:43

@Damien "The patch is wrong ..."

You are correct, and after a long battle trying to make the code work as advertised without generating warnings, I gave up and recompiled PHP with mbstring support.

#6

Damien Tournoud - December 30, 2008 - 14:46
Status:needs work» duplicate

This is now superseded by #352359: Unit test unicode.inc.

 
 

Drupal is a registered trademark of Dries Buytaert.