Off-by-one error in drupal_substr() results in "undefined index" error in includes/unicode.inc line 516
pillarsdotnet - October 9, 2008 - 23:38
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | duplicate |
Jump to:
Description
The attached patch fixes an off-by-one error in the drupal_substr() function of the includes/unicode.inc file.
The original code looks like this:
<?php
$strlen = strlen($text);
// ... skipping ...
$bytes = $istart; $chars = 0;
while ($bytes < $strlen && $chars < $length) {
$bytes++;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
?>Notice that on the last iteration of the loop, it will try to extract $test[$bytes] where the recently-incremented $bytes value is equal to the length of the $text string. This results in an "undefined index" error, as the element indexes range from [0 .. strlen()-1], not from [1 .. strlen()] as the code would imply.
My corrected version follows:
<?php
$strlen = strlen($text);
// ... skipping ...
$bytes = $istart+1; $chars = 0;
while ($bytes < $strlen && $chars < $length) {
$c = ord($text[$bytes]);
$bytes++;
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
?>Trivial two-line patch attached, created via "cvs diff" from a current checkout of the DRUPAL-6 branch.
| Attachment | Size |
|---|---|
| includes-unicode.inc-off-by-one-fix.diff | 823 bytes |

#1
Can you provide a case that I can use to test this error?
#2
Better yet, we should have an unit test for this function :)
Related issue: #276453: Tests needed: unicode.inc
#3
Dunno about testcases, but on the two Drupal-6 installations I have, it comes up all the time. My experience is perhaps atypical because I use a custom-built php binary that doesn't have the mbstring extension installed.
Along those lines, I'd like to ask: is the extension really worth it, performance-wise? I suppose that having binary code would be quicker than looping character-by-character in PHP, but my installation is memory-starved, and the extension pulls in so many dependency libraries that it triples the size of the resultant (statically-compiled) PHP binary.
#4
The patch is wrong... the real range is [0 .. strlen($text) - 1], no need to start at 1.
#5
@Damien "The patch is wrong ..."
You are correct, and after a long battle trying to make the code work as advertised without generating warnings, I gave up and recompiled PHP with mbstring support.
#6
This is now superseded by #352359: Unit test unicode.inc.