Here's another code cleanup for parse_size() as found in common.inc.

It adds support for both kilo(1000) and kibibyte(1024) and sizes up to the infamous yobi/yottabyte.

Since there is still no common.test available* to diff to, I'll post the test for it here:

class CommonParseSizeTestCase extends DrupalWebTestCase {

  /**
   * Implementation of getInfo().
   */
  function getInfo() {
    return array(
      'name' => t('Parse size test'),
      'description' => t('Parse a predefined byte formatted string and compare the integer output with the expected value.'),
      'group' => t('System')
    );
  }

  /**
   * Implementation of setUp().
   */
  function setUp() {
    $this->test_cases = array(
      '1 byte' => 1, // byte
      '1 KB'   => 1000, // kilobyte
      '1 MB'   => 1000000, // megabyte
      '1 GB'   => 1000000000, // gigabyte
      '1 TB'   => 1000000000000, // terabyte
      '1 PB'   => 1000000000000000, // petabyte
      '1 EB'   => 1000000000000000000, // exabyte
      '1 ZB'   => 1000000000000000000000, // zettabyte
      '1 YB'   => 1000000000000000000000000, // yottabyte
      '1 KiB'  => 1024, // kibibyte
      '1 MiB'  => 1024 * 1024, // mebibyte
      '1 GiB'  => 1024 * 1024 * 1024, // gibibyte
      '1 TiB'  => 1024 * 1024 * 1024 * 1024, // tebibyte
      '1 PiB'  => 1024 * 1024 * 1024 * 1024 * 1024, // pebibyte
      '1 EiB'  => 1024 * 1024 * 1024 * 1024 * 1024 * 1024, // exbibyte
      '1 ZiB'  => 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024, // zebibyte
      '1 YiB'  => 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024, // yobibyte
      '2'      => 2, // bytes
      '23kIB'  => 23 * 1024, // kibibyte
      '17  kilobyte'   => 17 * 1000, // kilobyte
      '1900  kibibyte' => 1900 * 1024, // kibibyte
      '12.2 gigabyte'  => 12 * 1000 * 1000 * 1000, // kilobyte
      ' 7 GbYtes'      => 7 * 1000 * 1000 * 1000, // kilobyte
    );
    parent::setUp();
  }

  /**
   * testCommonFormatSize
   */
  function testCommonParseSize() {
    foreach ($this->test_cases as $formatted_size => $bytes) {
      $this->assertTrue(
        ($result = parse_size($formatted_size)) == $bytes,
        $formatted_size . ' == ' . $result . ' bytes (' . $bytes . ') %s'
      );
    }
  }
}

* http://drupal.org/node/265563, http://drupal.org/node/151902 and http://drupal.org/node/265548

Comments

boombatower’s picture

Category: feature » bug
StatusFileSize
new1.62 KB

I came to the same realization when writing #295697: Increase PHP Memory Limit for Simpletest.

The following does not work as it would seem it should:

echo format_size(parse_size('32MB'));

doesn't output 32MB even though that would be expected.

I'm not sure we want to change it to support KiB as that is much less common then KB for 1024 bytes.

Even google uses 1 KB = 1024 bytes.

I wrote the following test before finding this issue.

I think that format_size() should be modified to work with 1024 base as parse_code() does and the test should modified to take that into account. (The array of values changed)

mrharolda’s picture

Even google uses 1 KB = 1024 bytes.
True, but the first hit in Google will tell you otherwise.

There has to got to be a breakpoint in the KB vs KiB mistake somewhere and I'd say Drupal 7 is a nice candidate...

I'd really like to look into the echo format_size(parse_size('32MB')); issue, but my D7 checkout seems to be a bit borked at the moment...

Fatal error: Class 'MergeQuery' not found in /var/www/d7/includes/database/database.inc on line 434

chx’s picture

i do not think it's drupal role to decide in the 1000 vs 1024 debate... please use 1024 bytes 1kbyte.

mrharolda’s picture

@chx: Drupal shouldn't decide it, but IMHO should follow the standards and those clearly say that 1 kilobyte == 1000 bytes... (Dries already agreed on this)

Please have a look at This comment (and the rest of the issue)...

boombatower’s picture

If we go with #4, not my preference: then we need to change parse_size() to work that way.

As of now parse_size uses 1024 and format_size uses 1000. That isn't workable, rather it should be considered a bug.

So which ever solution we choose, would like some more views (possible Dries to confirm, w/e), then we can implement the change.

boombatower’s picture

The issue here is not which is "politically correct" or "scientifically correct", at least in my humble opinion, but the average computer user knows that there files come in MB and GB. They have no idea what MiB vs MB is. It would seem that MB is much more useful. Thus 1024.

Same thing with:
* operating systems showing file sizes.
* food calorie vs actual (scientific) calorie. They tried to correct with capitalization, but it can be assumed whenever talking about a calorie outside of science you are talking about a food calorie.

I think the best solution would be the one that is more useful for "web users" in general. As google did.

beeradb’s picture

I've got to agree with Boombatower on this one, being technically correct means very little if most people don't understand what you're talking about. I'd say go with 1024, even if it's not 'correct' it's a lot more standard.

boombatower’s picture

So can we agree to change this to 1024? If so I'll make a patch, unless someone else does.

mrharolda’s picture

I've got to agree with Boombatower on this one, being technically correct means very little if most people don't understand what you're talking about. I'd say go with 1024, even if it's not 'correct' it's a lot more standard.

Users will never see wich standard Drupal is using (1000 or 1024), since all they'll see is '10 MB' and not the amount of bytes needed for that.

My vote is still for the 'correct' way of calculating 'byte variants' using 1000 bytes as one kilobyte (KB), but I guess I'm quite lonely in this quest... ;)

beeradb’s picture

MadHarold,

I see the point in wanting to be technically correct, but issues such as #295697: Increase PHP Memory Limit for Simpletest show how this is confusing to users. Currently that test shows that users need 33.55 megs of ram to install simpletest. In reality, setting your memory_limit to 32M is enough. This is confusing to users. I would also argue that we should go with what PHP uses, rather than try to fight our platform.

boombatower’s picture

/me thinks beeradb makes a strong case.

boombatower’s picture

Above examples was previously update memory requirement for clarity.

catch’s picture

I agree entirely with boombatower and beeradb on this - we should follow standard practice here and use 1024.

mrharolda’s picture

How about defining a define to define the size of a kilo? ;)

If Drupal decides to change the size in the future, only that define has to be changed...

I (now) agree that using 1024 bytes as a kilobyte (KB) is the way to go, but this might change in the future...

boombatower’s picture

As long as parse and format work the same that would be a major step forward. So a constant with 1024 as current value would be fine.

boombatower’s picture

Title: Replacement code for parse_size() with support for IEC and SI prefixes (+tests) » parse_size() and format_size do not use same Kilo standard

Clarify the current issue.

boombatower’s picture

Title: parse_size() and format_size do not use same Kilo standard » parse_size() and format_size() do not use same kilo standard
dries’s picture

On hindsight, I agree with boombatower and beeradb.

mrharolda’s picture

StatusFileSize
new7.56 KB

I finally had some time to rewrite both format_size() and parse_size() and created a single patch out of that.

The patch consists of:
* Updates and additions to the tests (added cross testing)
* A new constant in bootstrap.inc: define('DRUPAL_KILOBYTE', 1024);
* Updated format_size() and parse_size() to use DRUPAL_KILOBYTE
* parse_size() now is able to parse float sizes (1.23 MB) into bytes

Please review the patch and comment on these changes!

-H-

dries’s picture

Status: Needs review » Needs work

I think this is an acceptable approach.

1. I would extend the code comment (+ * The number of bytes in a kilobyte.) a bit. It is probably worth mentioning the 'competing standard' in this comment so people have a bit more context as to why they might want to change this value.

2. The regex in parse_size() could use some additional documentation as well.

3. + $size = $size / DRUPAL_KILOBYTE; // convert bytes to kilobytes (1000 bytes) -- the code comment is outdated here.

We're close now. :-)

boombatower’s picture

hmm...

echo parse_size(format_size(2000));
echo format_size(parse_size('212KB'));

Results in:

1997
212 KB
mrharolda’s picture

@boombatower:

The 2000 -> 1997 is due to the round(2) in format_size. 2000 / 1024 should be 1.953125 KB instead of 1.95 KB.

format_size(2000) = 1.95 KB
1.95 * DRUPAL_KILOBYTE = 1996.8 Bytes = 1997 Bytes

However:
1.953125 * DRUPAL_KILOBYTE = 2000 Bytes

I've changed the round() in parse_size() into ceil(), since all byte fragments will require a full byte to store it in (which will be present in the next patch).

I'll have a look at the code and comments later today and re-roll a patch for it...

alexanderpas’s picture

a big -1 to any use that designates kilobyte as 1024.
a big +1 to any use that designates kilobyte as 1000, and kibibyte as 1024.

PHP handling kilo as 1024 (if it is), is a PHP bug, and should not be fixed by drupal.

the SI prefixes should only be used in the decimal sense: kilobyte and megabyte denote one thousand bytes and one million bytes respectively, while kibibyte and mebibyte denote 1024 bytes and 1,048,576 bytes respectively. This recommendation has been adopted by SI, IEEE, CIPM, NIST, ISO/IEC and some other leading national and international standards, which now state that the prefixes k, M and G should always refer to powers of ten, even in the context of information technology. (reference: ISO/IEC IEC 80000-13:2008 )

reduced timeline:
1998:
IEC introduces unambigous prefixes for binary multiples (KiB, MiB, GiB etc.), reserving kB, MB, GB and so on for their decimal sense.

2005:
IEC prefixes are adopted by the IEEE after a two-year trial period.

2008:
NIST guidelines require use of IEC prefixes KiB, MiB ... (and not kB, MB) for binary byte multiples

p29, “The names and symbols for the prefixes corresponding to 2 10 , 2 20 , 2 30 , 2 40 , 2 50 , and 2 60 are, respectively: kibi, Ki; mebi, Mi; gibi, Gi; tebi, Ti; pebi, Pi; and exbi, Ei. Thus, for example, one kibibyte would be written: 1 KiB = 2 10 B = 1024 B, where B denotes a byte. Although these prefixes are not part of the SI, they should be used in the field of information technology to avoid the incorrect usage of the SI prefixes.”

also remember this:

decimal value binary value difference
10001 = 103 10241 = 210 2.4%
10002 = 106 10242 = 220 4.9%
10003 = 109 10243 = 230 7.4%
10004 = 1012 10244 = 240 10.0%
10005 = 1015 10245 = 250 12.6%
10006 = 1018 10246 = 260 15.3%
10007 = 1021 10247 = 270 18.1%
10008 = 1024 10248 = 280 20.9%

drupal should follow standards closely, and change when standards are updated, as the drop is always moving, AND SHOUD NOT LAG BEHIND!

also, this has a usability impact, since using the same wording with two different meanings is JUST PLAIN WRONG, and should end RIGHT NOW, Regular users don't know that the units have dual meanings, and we shouldn't continue confusing them in this way.

also "man 7 units" on your linux-box ;)

mrharolda’s picture

@alexanderpas

You could not be more right on this subject, but since PHP itself uses 1024 as a kilo, we're bound to that 'bogus standard'.

As soon as PHP uses the correct standard, Drupal only has to change DRUPAL_KILOBYTE into 1000 et voila!

If you're keen on following the correct standards, you can always change the value into 1000 (=1 kilo) yourself.

alexanderpas’s picture

http://bugs.php.net/bug.php?id=46534

please vote and comment!

boombatower’s picture

@alexanderpas: have you read the above comments? The fact that this is not the standard was discussed, but was decided against...please read above. I don't think this has to do PHP....

If you dislike this I believe all you have to do is change the constant on sites you build.

boombatower’s picture

Status: Needs work » Needs review
StatusFileSize
new7.79 KB

Addressed #20.

dries’s picture

Status: Needs review » Fixed

Committed to CVS HEAD. Thanks.

mrharolda’s picture

@boombatower (and Dries, of course!)

Thank you for addressing the remaining issues for this patch! (and Dries for reviewing and committing)

-H-

alexanderpas’s picture

if PHP fixes it's behaviour, we need to follow suit.

dave reid’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.