Problem/Motivation

(From #1086098: D6 JSON is total garbage - Create RFC 4627 compliant HTML safe JSON by @Heine)

Background:

We are dealing with a number of circumstances:

  • At least two different parsers may process the JSON:
    • HTML parser
    • JSON parser
  • Clients we have no control over need to interpret the JSON: jquery versions, iphone apps, .NET classes, world+dog

This means we need to produce RFC4627 compliant JSON that, at the same time, will not have certain 'special characters' such as ', ", <, > and & be interpreted by an HTML parser.

As RFC4627-2.5 clearly states that "Any character may be escaped", we can avoid special treatment of characters ', ", <, > and & by an HTML parser through simple substitution with a Unicode escape sequence (\uXXXX).

A number of characters MUST be escaped for the JSON parser. These are:

  • "
  • \
  • U+0000 - U+001F

For a number of characters (eg "), it is possible to choose the escape form and either precede them with a backslash (JSON Escape Sequence, eg \"), or use the \uXXXX (Unicode escape sequence, eg \u0027) form. HOWEVER, because we also need to deal with the HTML parser, _it_ may interpret the quote before the the JSON parser even has the chance to run. Because of this, it is advisable to use the Unicode escape sequence (\uXXXX) here as well.

Proposed resolution

In PHP 5.3, compliant JSON that properly encodes HTML special characters can be generated with:

json_encode($data, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);

However, in PHP 5.2, json_encode() does not support these options. So in 5.3 we use the provided PHP function, but in 5.2 we create our own helper to return properly encoded JSON according to the spec.

Remaining tasks

User interface changes

None.

API changes

None.

Original report by @Steve McKenzie

I discovered this when trying to debug when an objective-c parser was not parsing my data.

return '"'. str_replace(array("\r", "\n", "<", ">", "&"),
array('\r', '\n', '\x3c', '\x3e', '\x26'),
addslashes($var)) .'"';

in drupal_to_js is the issue.

im not someone that knows much about encoding but after showing this to people in freenode #iphonedev, they blamed that line.

so i switched and used json_encode and what do you know, it works fine now.

im too tired to give more details right now. this seems like enough anyways.

on another note, its probably about time drupal changes that name to drupal_to_json OR just drop the damn function entirely in favor of requiring json_encode / json_decode in php 5.2.x but obviously we're not discussing that here.

CommentFileSizeAuthor
#199 json-encode-d7-479368-199.patch7.61 KBxjm
#197 json-encode-d8-479368-197.patch3.8 KBxjm
#193 json-encode-d8-479368-193.patch4.55 KBxjm
#190 json-encode-479368-190-d7.patch7.26 KBxjm
#190 interdiff-184-190.txt4.04 KBxjm
#189 json-encode-d8-479368-189.patch3.74 KBxjm
#189 interdiff-183-189.txt1.46 KBxjm
#184 json-encode-d7-479368-184.patch6.91 KBxjm
#183 json-encode-d8-479368-183.patch3.38 KBxjm
#181 json-encode-d8-479368-181.patch3.16 KBxjm
#170 json-encode-d8-479368-170.patch2.27 KBxjm
#167 json-encode-d8-479368-167.patch2.99 KBxjm
#157 drupal-json-encode-479368-157.patch7.73 KBxjm
#155 drupal-json-encode-479368-155-d7.patch7.65 KBxjm
#152 drupal_json_encode.patch1.39 KBkeichee
#148 drupal_json_encode.patch1.41 KBkeichee
#146 drupal_json_encode.patch1.41 KBkeichee
#143 drupal_json_encode.patch595 byteskeichee
#141 drupal_json_encode.patch590 byteskeichee
#127 drupal_to_js_proper_json-D6.patch2.6 KBHeine
#119 do479368-drupal_json_encode.patch7.64 KBHeine
#103 do479368-drupal_json_encode.patch4.89 KBHeine
#101 do479368-drupal_json_encode.patch4.22 KBHeine
#91 do479468-drupal_json_encode-D7.patch3.57 KBHeine
#89 do479468-drupal_to_js-expanded-comments-nostatic.patch2.51 KBHeine
#88 do479468-drupal_to_js-expanded-comments.patch2.68 KBHeine
#71 479368_drupal_to_js_all_escaped.patch1.44 KBmcarbone
#69 479368_drupal_to_js_all_escaped.patch1.42 KBmcarbone
#64 479368_drupal_to_js_all_escaped.patch1.4 KBmcarbone
#61 479368_drupal_to_js_all_escaped.patch1.63 KBmcarbone
#60 479368_drupal_to_js_all_escaped.patch1.63 KBmcarbone
#58 479368_drupal_to_js_all_escaped.patch1.63 KBburningdog
#56 479368_drupal_to_js.patch680 bytesburningdog
#54 479368_drupal_to_js_all_escaped.patch1.75 KBburningdog
#51 479368_drupal_to_js_all_escaped.patch1.58 KBburningdog
#48 479368_drupal_to_js_all_escaped.patch1.31 KBburningdog
#42 479368_drupal_to_js_all_escaped.patch1.31 KBburningdog
#35 479368_drupal_to_js_all_escaped.patch1.38 KBburningdog
#34 479368_drupal_to_js_all_escaped.patch1.33 KBburningdog
#33 479368_drupal_to_js_all_escaped.patch1.35 KBburningdog
#29 479368_drupal_to_js_all_escaped.patch765 bytesburningdog
#28 479368_drupal_to_js_all_escaped.patch765 bytesburningdog
#26 479368_drupal_to_js.patch663 bytesburningdog
#20 479368_drupal_to_js.patch597 bytesburningdog
#16 479368_1.patch2.56 KBasimmonds
#12 479368.patch696 bytesasimmonds
#8 do-479368-u-escape-sequences_6.patch790 bytesHeine
#6 do-479368-u-escape-sequences.patch688 bytesHeine
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Anonymous’s picture

os x character palette:

Character: <
unicode: 003C
UTF8: 3C

so why the \x and lower case? i dont know much about encoding like i said.

Heine’s picture

We need to escape <>& to prevent HTML parsing issues.

Relevant RFC: http://www.ietf.org/rfc/rfc4627.txt?number=4627

json should use \uXXXX, not \xYY to escape characters.

Damien Tournoud’s picture

An simple note, to respect the procedure: this doesn't apply to D7, which uses json_encode() already. 6.x is hence the correct version.

Heine’s picture

@DamZ

function drupal_to_js($var) {
  // json_encode() does not escape <, > and &, so we do it with str_replace()
  return str_replace(array("<", ">", "&"), array('\x3c', '\x3e', '\x26'), json_encode($var));
}

Still uses \x instead of \u

Damien Tournoud’s picture

Version: 6.x-dev » 7.x-dev

Indeed.

Heine’s picture

Status: Active » Needs review
FileSize
688 bytes

I'm having a hard time finding an authorative text on JS string encoding. Is it possible to mix \uXXX and UTF-8 in a string literal?

Heine’s picture

The \uXXX is of course transferred in UTF-8 (or whatever encoding used) and the parser deals with the rest.

Heine’s picture

Here's a patch against D6 for the OP to test.

Status: Needs review » Needs work

The last submitted patch failed testing.

Damien Tournoud’s picture

Issue tags: +Needs tests

It's time for some level tests about that.

EnekoAlonso-1’s picture

This is what I had to change for my iPhone app to load the JSON properly:
common.inc, line 2046 (Drupal 6):

return '"'. str_replace(array("\r", "\n", "<", ">", "&", "\'"),
    // array('\r', '\n', '\x3c', '\x3e', '\x26'), JSON Unicode fix http://drupal.org/node/479368
    array('\r', '\n', '\u003c', '\u003e', '\u0026', '\''),
    addslashes($var)) .'"';

*Note I also added a new replacement for single quotes \'

http://dev.enekoalonso.com

asimmonds’s picture

Status: Needs work » Needs review
FileSize
696 bytes

Looks like this is needed for #653580: Upgrade to jQuery 1.4

Rerolling

mfer’s picture

Priority: Normal » Critical

Marking this as critical since it causing issues with jQuery 1.4 testing. We have a short time table to get jQuery 1.4 in before the first alpha. The plan is to release jQuery 1.4 the day before the D7 alpha so we are doing testing early in preparation.

Damien Tournoud’s picture

Status: Needs review » Reviewed & tested by the community

This makes total sense.

webchick’s picture

Status: Reviewed & tested by the community » Needs work

Does make sense. But can we get some tests for this?

asimmonds’s picture

Status: Needs work » Needs review
FileSize
2.56 KB

json_decode() appears to unescape the HTML-safe characters itself (on PHP 5.2.12), so I've removed the older unescaping from drupal_json_decode(). This probably needs testing on other PHP versions.

The only test I could think of to add to the suite was to test for the escaped characters in the JSON encoded string. All the other unit tests for the JSON functions appear to pick up everything else.

Any more suggestions for the tests?

Damien Tournoud’s picture

Status: Needs review » Reviewed & tested by the community

Agreed this is the only additional test required.

webchick’s picture

Version: 7.x-dev » 6.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Great. Thanks, folks! :D

Committed to HEAD.

Moving down to 6.x.

asimmonds’s picture

Issue tags: -Needs tests

Removing needs tests tag

burningdog’s picture

FileSize
597 bytes

Rolled a patch for D6 based on #11. It adds some characters to correctly escape in JSON in drupal_to_js(). I'd found out that JSON Server was returning un-parseable JSON because it was escaping apostrophe's, which is incorrect.

There's a handy JSON tester online to check correct formatting, which might be helpful in this thread: http://www.jsonlint.com/

Heine’s picture

Status: Patch (to be ported) » Needs work
       return '"'. str_replace(array("\r", "\n", "<", ">", "&"),
-                              array('\r', '\n', '\x3c', '\x3e', '\x26'),
+                              array('\r', '\n', '\x3c', '\x3e', '\x26', '\u003c', '\u003e', '\u0026', '\''),
                               addslashes($var)) .'"';

This stil escapes < etc with \x3c etc. What is \' supposed to replace?

burningdog’s picture

Status: Needs work » Needs review

I'm assuming we're sticking with drupal_to_js() in D6, and not simply changing it to:

function drupal_to_js($var) {
  // json_encode() does not escape <, > and &, so we do it with str_replace().
  return str_replace(array('<', '>', '&'), array('\u003c', '\u003e', '\u0026'), json_encode($var));
}

??

burningdog’s picture

@Heine: \' is supposed the replace the backslash and leave the apostrophe in place.

burningdog’s picture

Status: Needs review » Needs work
Heine’s picture

We cannot switch to json_encode as D6 should support pre 5.2 PHP versions.

Your single quote in the replace array has no corresponding entry in the search array of str_replace.

burningdog’s picture

FileSize
663 bytes

Thanks Heine. Re-rolled.

Heine’s picture

Excellent!

I think we should drop the use of addslashes, and use addcslashes (or expand the str_replace) with the defined escape list of RFC 4627 :

  %x22 /          ; "    quotation mark  U+0022
  %x5C /          ; \    reverse solidus U+005C
  %x2F /          ; /    solidus         U+002F
  %x62 /          ; b    backspace       U+0008
  %x66 /          ; f    form feed       U+000C
  %x6E /          ; n    line feed       U+000A
  %x72 /          ; r    carriage return U+000D
  %x74 /          ; t    tab             U+0009
burningdog’s picture

I vote for str_replace() as it's pretty easy to see what's going with it. Patch attached.

burningdog’s picture

Wait, this one is slightly better.

burningdog’s picture

Status: Needs work » Needs review
c960657’s picture

Status: Needs review » Needs work
+      return '"'. str_replace(array('<', '>', '&', "\'", '"', '\\', '/', '\b', '\f', '\n', '\r', '\t'),
+                              array('\u003C', '\u003E', '\u0026', "'", '\u0022', '\u005C', '\u002F', '\u0008', '\u000C', '\n000A', '\u000D', '\u0009'),
                               addslashes($var)) .'"';

You need to write \b, \f, etc. in double quotes – otherwise you are searching for the literal string (e.g. \t) instead of the character it represents (a tab character).

When called with the string \, the output is "\u005C\u005C". It should be "\u005C" or simply "\\".

When called with \0 (i.e. a string containing one character with the ASCII value 0), the output is "\u005C0". It should be simply \0 enclosed in quotes (not the literal string \0, but the character with the ASCII value 0), or escaped as "\u0000".

I think addcslashes() is better suited for this task.

Heine’s picture

\x00 is not valid in JSON; The quotation mark, reverse solidus, and the control characters (U+0000 through U+001F: \x00 through \x1f) must be escaped.

As \uXXXX escaping is always allowed, I'd vote to ditch add(c)slashes and use only one escape method.

burningdog’s picture

Status: Needs work » Needs review
FileSize
1.35 KB

Ok, I've removed addslashes and now handle everything with str_replace. The function now handles all characters from \x00 to \x1f.

I've also written a way to test that it's all working, by testing the output of drupal_to_js() against json_encode() and printing an error if the outputs don't match. Like so:

<?php

$var = array();
//previous characters which were replaced in drupal_to_js() but don't need to be any more
$var[] = "<";
$var[] = ">";
$var[] = "&";

//characters which are usually handled by addslashes() http://php.net/manual/en/function.addslashes.php
$var[] = "'";
$var[] = '"';
$var[] = '\\';

//other special characters
$var[] = "\'";
$var[] = "\\";
$var[] = "/";
$var[] = "\b";
$var[] = "\f";
$var[] = "\n";
$var[] = "\r";
$var[] = "\t";

//unicode characters
$var[] = "\x00";
$var[] = "\x01";
$var[] = "\x02";
$var[] = "\x03";
$var[] = "\x04";
$var[] = "\x05";
$var[] = "\x06";
$var[] = "\x07";
$var[] = "\x08";
$var[] = "\x0a";
$var[] = "\x0b";
$var[] = "\x0c";
$var[] = "\x0d";
$var[] = "\x0e";
$var[] = "\x0f";
$var[] = "\x10";
$var[] = "\x11";
$var[] = "\x12";
$var[] = "\x13";
$var[] = "\x14";
$var[] = "\x15";
$var[] = "\x16";
$var[] = "\x17";
$var[] = "\x18";
$var[] = "\x1a";
$var[] = "\x1b";
$var[] = "\x1c";
$var[] = "\x1d";
$var[] = "\x1e";
$var[] = "\x1f";

//everything
$var[] = "<>&'\" \'\\ / \b \f \n \r \t \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x1a \x1b \x1c \x1d \x1e \x1f";

//test that drupal_to_js() matches the output of json_encode(). If it doesn't, output an error.
foreach ($var as $x) {
  if (drupal_to_js($x) != json_encode($x)) {
    print "<b>Error!</b> $x<br>";
    print 'drupal_to_js: '. drupal_to_js($x) .'<br>';
    print 'json_encode: '. json_encode($x) .'<p>';
  }
}

?>

Running these all through my version of drupal_to_js() doesn't generate any errors. Patch attached.

burningdog’s picture

Damn - patched against my already patched function. Here's the correct patch.

burningdog’s picture

From #2, need to escape <>& to prevent HTML parsing issues.

This now gives output from the testing function at #33 like so:

Error! <
drupal_to_js: "\u003c"
json_encode: "<"

Error! >
drupal_to_js: "\u003e"
json_encode: ">"

Error! &
drupal_to_js: "\u0026"
json_encode: "&"

Error! <>&'" \'\ / \b  �                           
drupal_to_js: "\u003c\u003e\u0026'\" \\'\\ \/ \\b \f \n \r \t \u0000 \u0001 \u0002 \u0003 \u0004 \u0005 \u0006 \u0007 \b \n \u000b \f \r \u000e \u000f \u0010 \u0011 \u0012 \u0013 \u0014 \u0015 \u0016 \u0017 \u0018 \u001a \u001b \u001c \u001d \u001e \u001f"
json_encode: "<>&'\" \\'\\ \/ \\b \f \n \r \t \u0000 \u0001 \u0002 \u0003 \u0004 \u0005 \u0006 \u0007 \b \n \u000b \f \r \u000e \u000f \u0010 \u0011 \u0012 \u0013 \u0014 \u0015 \u0016 \u0017 \u0018 \u001a \u001b \u001c \u001d \u001e \u001f"

...which is to be expected since we're escaping < > &.

bluetegu’s picture

drupal_to_js failed on my site due to line Separator (U+2028). Is this related? Should we remove similar UTF-8 separators as well?

burningdog’s picture

What do you mean, "failed"? As in the function failed with an error, or that drupal_to_js produced incorrect JSON? I added $var[] = "\u2028"; to my testing function from #33 and drupal_to_js (the patched version) produces the same output as json_encode.

burningdog’s picture

I've modified my drupal_to_js testing code from #33 to use the same escaping that D7 is using with json_encode. It generates no errors now, including the \u2028 unicode character.

<?php

$var = array("<", ">", "&", "'", '"', '\\', "\'", "\\", "/", "\b", "\f", "\n", "\r", "\t", "\x00", "\x01", "\x02", "\x03", "\x04", "\x05", "\x06", "\x07", "\x08", "\x0a", "\x0b", "\x0c", "\x0d", "\x0e", "\x0f", "\x10", "\x11", "\x12", "\x13", "\x14", "\x15", "\x16", "\x17", "\x18", "\x1a", "\x1b", "\x1c", "\x1d", "\x1e", "\x1f", "\u2028", "
", "<>&'\" \'\\ / \b \f \n \r \t \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x1a \x1b \x1c \x1d \x1e \x1f");

//test that drupal_to_js() matches the output of json_encode(). If it doesn't, output an error.
foreach ($var as $x) {
  $drupal = drupal_to_js($x);
  $json = str_replace(array('<', '>', '&'), array('\u003c', '\u003e', '\u0026'), json_encode($x));
  if ($drupal != $json) {
    print "<b>Error!</b> $x<br>";
    print 'drupal_to_js: '. $drupal .'<br>';
    print 'json_encode: '. $json .'<p>';
  }
}
?>
Heine’s picture

\b - \t are already included in the \x00 to \x1f control characters.

There's no need to replace "\'", as we just want to turn the single quote into the proper \u escape sequence, not the reverse solidus followed by a quote.

bluetegu’s picture

Roger #37 Here is what I mean by fail: I use a JQuery ajax call and return data via drupal_to_js. The data is an array that includes several HTML regions (right-sidebar and content). Everything worked fine, until part of the data (a teaser of a post) included the UTF-8 line separator. It caused the ajax call to fail with parsererror error.

bluetegu’s picture

In my case the following one line addition to drupal_to_js works:

$var = preg_replace( '/\p{Z}/u', ' ', $var );  // Remove separator characters
      
return '"'. str_replace(array("\r", "\n", "<", ">", "&"),
                              array('\r', '\n', '\x3c', '\x3e', '\x26'),
                              addslashes($var)) .'"';

Taken from http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_...

burningdog’s picture

Thanks for the feedback, Heine - the single quote is now replaced with \u0027 and I've taken out replacements for \f - \t. Patch attached.

bluetegu, you're saying that in your case drupal_to_js returns a \u2028 character which ajax then couldn't parse? I think that problem lies in the ajax, rather than in drupal_to_js. See the JSON rfc at http://www.ietf.org/rfc/rfc4627.txt - particularly the bit which says, "Insignificant whitespace is allowed before or after any of the six structural characters" and then it does on to list allowed whitespace characters, including Line feed/New line/Carriage return.

Try the attached patch and see if this solves your problem?

bluetegu’s picture

Roger, the patch added multiple unreadable characters to the HTML rendering through ajax I'm doing. So something is wrong here. I guess it replaced some white spaces I had to UTF-8. It didn't help with the problem I had either - but this should be expected as the patch doesn't remove the U2028 characters. I'm too tired to think straight so I'll look again on this tomorrow.

burningdog’s picture

Try running it through this function instead and see if it helps (this is how D7 does it):

<?php
function drupal_json_encode($var) {
  // json_encode() does not escape <, > and &, so we do it with str_replace().
  return str_replace(array('<', '>', '&'), array('\u003c', '\u003e', '\u0026'), json_encode($var));
}
?>
bluetegu’s picture

Hi Roger. #44 works for me. I'll use this instead of my current patch #41. Many thanks.

burningdog’s picture

That's great :) Does running it through the patched drupal_to_js (from #42) work too?

bluetegu’s picture

No, unfortunately it doesn't. See #43.
Looking again at the patch at #42 I think I spotted a typo: You replace "\x0a" with '\u000e' instead of '\u000a'. Once I fixed it, I no longer have the behavior I describe at #43.

burningdog’s picture

Ah, thanks - I've changed that character. Try attached patch again, please.

c960657’s picture

Status: Needs review » Needs work

You missed \x09 and \x19.

In addition to the characters encoded by json_encode(), the function now also encodes < > & " / ', but the Doxygen comment only mentions the three former characters. I don't think we should encode more characters in D6 than in drupal_json_encode() D7. If the three latter characters are considered “unsafe”, they should be encoded in D7 as well.

The last replacement pair is "\'" => 'u0027'. Shouldn't this be "'" => '\u0027', i.e. skip the unnecessary backslash in the first array, and add a missing backslash in the second?

bluetegu’s picture

Roger, it works ok for me, unless I have the separator u2028 (and possibly similar separators). For this to work, you can either use my #41 or the drupal_json_encode code per #44.

burningdog’s picture

Status: Needs work » Needs review
FileSize
1.58 KB

"You missed \x09 and \x19."
Actually, they don't exist - they use octal notation (I think?) which means that after 8 the next 'digit' is "a".

"I don't think we should encode more characters in D6 than in drupal_json_encode() D7."
I'm encoding them here at the suggestion of Heine in #27 and #32, who knows far more about this stuff than I do.

"The last replacement pair is "\'" => 'u0027'. Shouldn't this be "'" => '\u0027', i.e. skip the unnecessary backslash in the first array, and add a missing backslash in the second?"
It's weird, I know, but if the replacement is '\u0027' then running "\'" through the script returns "\\u0027". Keeping the backslash out of the replacement returns \u0027

I've added in a comment to Doxygen to detail the other replaced characters, and I've put the "\'" into a more logical place in the array.

burningdog’s picture

re #50: when I run the following, I get \\u2028 as an output:
print json_encode("\u2028");

I get the same output when I do:
print drupal_to_js("\u2028");

Isn't this expected?

Heine’s picture

\x[0-9A-Fa-f]{1,2} are hexadecimal :)

I don't really care about what the PHP json_encode is doing, but we need to follow the RFC, and then escape & <> to have HTML compatibility.

burningdog’s picture

Ah yes, hexadecimal. My brain wasn't supplying me with that word - too tired.

I've re-read the RFC to see if I've missed anything. Page 4 is the relevant page for our purposes; especially the line which tells us which characters don't need to be escaped:

    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

This means we need to escape \x00 - \x1F, \x22 (a quotation mark "), \x5C (reverse solidus), \x2F (solidus), \x62 (backspace), \x66 (form feed), \x6E (line feed), and \x72 (carriage return). I'd missed some of those - I've added them in now (and added a line in the function comments to mention these). Hopefully I've got everything?

@bluetegu, this might catch your error?

burningdog’s picture

And...just when I thought the function was performing properly, I find that putting in this:

print drupal_to_js("The quick brown fox jumped over the lazy dog");

gives this:

"The quick \u0008\u000Dow\u000A \u000Cox jumped ove\u000D \u0009he lazy dog"

It's late again and I'm too tired to figure out why this is messed up - help?

burningdog’s picture

FileSize
680 bytes

From #54, I'd thought the following characters needed to be escaped: backspace, form feed, line feed and carriage return. That's correct - they do - and as Heine pointed out in #39, all of these are already handled in \x00 - \x1f so we can just take them out here (and, I'd done the character representation incorrectly - a backspace is \b or \x5C\x62 - I'd represented it as just \x62 - which is why "The quick brown fox jumped over the lazy dog" got garbled).

So, the new lines should look like this:

<?php
      return '"'. str_replace(array("\\", "<", ">", "&", '"', '/', "\'", "\x00", "\x01", "\x02", "\x03", "\x04", "\x05", "\x06", "\x07", "\x08", "\x0A", "\x0B", "\x0C", "\x0D", "\x0E", "\x0F", "\x10", "\x11", "\x12", "\x13", "\x14", "\x15", "\x16", "\x17", "\x18", "\x1A", "\x1B", "\x1C", "\x1D", "\x1E", "\x1F"),
                              array('\\\\', '\u003c', '\u003e', '\u0026', '\u0022', '\u002F', 'u0027', '\u0000', '\u0001', '\u0002', '\u0003', '\u0004', '\u0005', '\u0006', '\u0007', '\u0008', '\u000A', '\u000B', '\u000C', '\u000D', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013', '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u001A', '\u001B', '\u001C', '\u001D', '\u001E', '\u001F'),
                              $var) .'"';
?>
Heine’s picture

"\'" should be "'"

The replacement '\\\\' will work, but '\u005C' would be more consistent.

You forgot \x09, \x10 and \x19 and the corresponding replacements.

burningdog’s picture

I've added in \x09 and \x19 (\x10 was already there ;)

Have changed the replacement '\\\\' to '\u005C' for consistency.

As for "\'"...that's what led me to this thread in the first place - I had a node title which wasn't having correct JSON return by drupal_to_json: "This isn't a node title" was returning "This isn\'t a node title" because of the addslashes function. Since we've taken out addslashes() there's no need to replace "\'" with anything.

So ' stays ' and \' becomes \u005C' - which is what we want.

<?php
return '"'. str_replace(array("\\", "<", ">", "&", '"', '/', "\x00", "\x01", "\x02", "\x03", "\x04", "\x05", "\x06", "\x07", "\x08", "\x09", "\x0A", "\x0B", "\x0C", "\x0D", "\x0E", "\x0F", "\x10", "\x11", "\x12", "\x13", "\x14", "\x15", "\x16", "\x17", "\x18", "\x19", "\x1A", "\x1B", "\x1C", "\x1D", "\x1E", "\x1F"),
            array('\u005C', '\u003c', '\u003e', '\u0026', '\u0022', '\u002F', '\u0000', '\u0001', '\u0002', '\u0003', '\u0004', '\u0005', '\u0006', '\u0007', '\u0008', '\u0009', '\u000A', '\u000B', '\u000C', '\u000D', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013', '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C', '\u001D', '\u001E', '\u001F'),
            $var) .'"';
?>

Yes?

burningdog’s picture

The test status for the last patch is "postponed" - how can I get the testing server to test it?

mcarbone’s picture

We definitely need to escape:

Next Line (\u0085)
Line Separator (\u2028)
Paragraph Separator (\u2029)

Note that this wasn't caught by your tests above because you can't test for these by adding "\u2028" -- you have to use "\xE2\x80\xA8".

I found this issue because I copied and pasted a Paragraph Separator into a CCK label unknowingly, which broke AJAX in views (see #682778: Error: http://mysite.org/admin/build/views/ajax/add-item/comm_sites/default/field.).

I've rerolled the patch with these additions.

mcarbone’s picture

Re-submitted without the extra line I added by mistake.

sun’s picture

Quite ugly + unmaintainable. Can we use strtr() with array notion instead of str_replace() ?

mcarbone’s picture

Assigned: Unassigned » mcarbone
Status: Needs review » Needs work
mcarbone’s picture

Status: Needs work » Needs review
FileSize
1.4 KB

Here's a hopefully cleaner version.

dooug’s picture

Used patch from #64 on Drupal 6.17, seems to have cleared up the issues I was having (I believe with ampersand character in the content being converted to json.)

matkeane’s picture

Status: Needs review » Reviewed & tested by the community

Hi,

The patch from #64 also works for me on a server running D6.17 and PHP 5.1.6. Marking this as RTBC as we now have two reports of the patch working.

The bug is a bit of a show-stopper for anybody not able to roll their own function using the PHP json_encode function in 5.2, so it would be great to see this patch committed soon (unless anybody spots a problem with it, of course) - running sites on patched versions of core makes me nervous!

sun’s picture

Status: Reviewed & tested by the community » Needs review

Needs RTBC by Heine or c960657.

c960657’s picture

Status: Needs review » Needs work

We definitely need to escape:

Next Line (\u0085)

Why? AFAICT this character is not mentioned in ECMA-262, contrary to the two other that you mention.

Isn't html_entity_decode('&#'. $i .';', ENT_NOQUOTES, 'UTF-8') equivalent to just chr($i) when $i < 127 ?

+ $replace_pairs

+      // Remaining unicode replacements.
+      for ($i = 0; $i < 32; $i++) {

How about saving storing $replace_pairs in a static variable to prevent this for loop from running in each call?

mcarbone’s picture

Status: Needs work » Needs review
FileSize
1.42 KB

Weird -- it's been six weeks since I wrote the patch, but I seem to remember trying chr(). Well, in any case, it's there now and it seems to be working fine.

Removed the Next Line character and added static variable usage.

c960657’s picture

Status: Needs review » Needs work
+      static $replace_pairs = array();
+
+      if (empty($replace_pairs)) {

The default is never used. The convention is to not specify a default and then check using if (!isset($foo)) {.

Single quote is not escaped.

+          "\\" => '\u005c',
+          "<" => '\u003c',
+          ">" => '\u003e',
+          "&" => '\u0026',
+          '"' => '\u0022',
+          '/' => '\u002f',
+          "\xe2\x80\xa8" => '\u2028', // Line Separator
+          "\xe2\x80\xa9" => '\u2029', // Paragraph Separator

I suggest you put all except the last two keys (and the missing key for the single quote) in single quotes.

BTW I compared the performance of str_replace() and strtr(), and in this case strtr() seems to be much faster (i.e. it should not be changed to str_replace() in #561422: Replace strtr() with str_replace() for db prefixing):

mcarbone’s picture

Status: Needs work » Needs review
FileSize
1.44 KB

"The default is never used."

Are you sure about that? I was following the example of drupal_get_schema in the same file (http://api.drupal.org/api/function/drupal_get_schema/6). But a default is not needed here, so I made the change anyway.

Single quote added.

realityloop’s picture

Status: Needs review » Reviewed & tested by the community

This works as I would expect.

Gábor Hojtsy’s picture

Looking for more testing feedback!

realityloop’s picture

Gábor, are you looking for more feedback from me.. or feedback from more testers?

Gábor Hojtsy’s picture

More testing from others, since it looks like a pretty substantial change.

webchick’s picture

Additionally, when providing testing results, something more substantial than "This works as I would expect." is generally helpful for building confidence in core committers regarding the level of testing. What, specifically, were you seeing your site do before that it's not doing now? How did you explicitly go out of your way to break it that failed? What sort of kooky/unusual set-up did you try the patch on where it did not break? etc.

realityloop’s picture

Sorry webchick, here is an expanded report.

I applied the patch then installed jQuery Update with http://drupal.org/node/775924#comment-3274706 but made it always use jQuery 1.4.2

Tested draggable sort on manage fields section of edit content types, tested views and panels edit screens by adding sorting and removing fields etc.

I also tested views and panels with Uniform enabled http://drupal.org/project/uniform

I also did the above tests with vanilla D6 jQuery version and eveything worked as I would expect

matkeane’s picture

On a server running D6.19 and PHP 5.1.6. (so without the PHP json_encode function in PHP 5.2) supplying a JSON feed to an iphone app, the results go something like this:

Before: Plain vanilla D6.19
Example JSON output:
{ "umm": { "item": { "id": "10092", "language": "fr", "url": "http://example.com/fr/node/10092", "title": "\x3cp\x3eUn miraculé de Lourdes\x3c/p\x3e\n", etc
Result: iphone app developer sends emails along the lines of 'what's all this garbage in the feed, I can't work with this!' and storms off in a huff!

After: Patched D6.19 with the patch in #71
Example JSON output:
{ "umm": { "item": { "id": "10092", "language": "fr", "url": "http:\u002f\u002fexample.com\u002ffr\u002fnode\u002f10092", "title": "\u003cp\u003eUn miraculé de Lourdes\u003c\u002fp\u003e\u000a", etc
Result: Less disgruntled iphone app developer!

As well as encoding tags in \uXXX format as opposed to the problematic \x format, the slashes in the url item are encoded, which didn't happen before applying the patch (but that's just an observation, not a problem, as far as I can see).

klonos’s picture

subscribing...

srobert72’s picture

subscribing

Jackinloadup’s picture

subscribe

rlnorthcutt’s picture

Looking to test this and want to be clear:

1) Apply patch at #71
2) Replace stock jquery with 1.4 (or use jquery update module with 1.4)
3) Try to break
4) record results

That about right?

smasty’s picture

#6: do-479368-u-escape-sequences.patch queued for re-testing.

Déja’s picture

I have installed the #71 patch and updated to jQuery 1.4.2 using the patched linked to at http://drupal.org/node/775924#comment-3274706. After several days of testing and use, I have yet to encounter any issues caused by this patch.

I have also written of this at http://echodittolabs.org/blog/2010/08/drupal-6x-jquery-142-new-possibili... and look forward to its inclusion in the core.

realityloop’s picture

@rlnorthcutt

Thats a good start

safetypin’s picture

Patch in #71 corrected the incompatibility for me in 6.19. Is this ready to be included in the next release, or can I do anything to help make it ready?

markus_petrux’s picture

subscribing

Heine’s picture

Priority: Critical » Major
Status: Reviewed & tested by the community » Needs review
FileSize
2.68 KB

We are dealing with a number of circumstances:

  • At least two different parsers may process the JSON (drupal_add_js setting):
    • HTML parser
    • JSON parser
  • Clients we have no control over need to interpret the JSON: jquery versions, iphone apps, .NET classes, world+dog

This means we need to produce RFC4627 compliant JSON that, at the same time, will not have certain 'special characters' such as ', ", <, > and & be interpreted by an HTML parser.

As RFC4627-2.5 clearly states that "Any character may be escaped", we can avoid special treatment of characters ', ", <, > and & by an HTML parser through simple substitution with a Unicode escape sequence (\uXXXX).

A number of characters MUST be escaped for the JSON parser. These are:

  • "
  • \
  • U+0000 - U+001F

For a number of characters (eg "), it is possible to choose the escape form and either precede them with a backslash (JSON Escape Sequence, eg \"), or use the \uXXXX (Unicode escape sequence, eg \u0027) form. HOWEVER, because we also need to deal with the HTML parser, _it_ may interpret the quote before the the JSON parser even has the chance to run. Because of this, it is advisable to use the Unicode escape sequence (\uXXXX) here as well.

Right now, drupal_to_js is not RFC4627 compliant. This causes problems with a number of non-core JSON consumers.

Violations:

  • Control characters such as U+001F are not escaped
  • It uses the undefined escape sequence \xXX, which only works on certain JSON implementations (likely, only JS eval)
  • It improperly uses the non-existing JSON escape sequence \0 for the 0 byte instead of the Unicode escape sequence \u0000

Additional problem:

  • It risks interpretation of certain characters as special by the HTML parser due to its use of addslashes (\")

Now, enough about the problems of the current implementation, on to the patch.

It correctly escapes U+0000 - U+001F (Ascii 0 - 31) and the potential special HTML characters.

AFAIK we do not have to escape /, U+2028 and U+2029 according to the specs, but naive JSON parsers (eval) may have trouble with it.

As there's no harm in escaping these characters, I've kept them in the revised patch. All this reroll did was to expand the U+0000 - U+001f range and add comments.

Heine’s picture

Now without the static.

Damien Tournoud’s picture

Version: 6.x-dev » 7.x-dev
Status: Needs review » Patch (to be ported)

Back to D7 for \, ", ' and / that we don't escape there.

Heine’s picture

Status: Patch (to be ported) » Needs review
FileSize
3.57 KB
Heine’s picture

json_encode uses JSON escape sequences (\") instead of our preferred Unicode escape sequences (\uXXXX). As json_encode in PHP 5.2.x doesn't have support for the $options array, we need to ditch the function and implement json_encoding ourselves again.

Damien Tournoud’s picture

Hm, any reason not to replace '"' and '\'' manually as we were previously doing?

Heine’s picture

Because we do not want to escape all quotes.

"key" : "value"

Quotes in key and value need to be escaped, but we have to leave the significant quots wrapping the key and value alone.

We can't use check_plain either, because the result of drupal_json_encode has to be consumed by pure JSON parsers.

Damien Tournoud’s picture

*sigh*

Ok, so we can probably have two versions:

  • On PHP >= 5.3.0, use json_encode() with JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP, postprocessing '\\\\' into '\u005c' and '\\/' into '\u002f'
  • Otherwise use our implementation

Are the postprocessing of \ and / really necessary, or can we get away without it?

Damien Tournoud’s picture

If the postprocessing of \ and / is really necessary, we should open a feature request against PHP to add support for that in later versions.

gausarts’s picture

Subscribing. Thanks

mcarbone’s picture

Assigned: mcarbone » Unassigned
realityloop’s picture

This seems to have stalled.. What still needs to be done?

safetypin’s picture

Well, can patch #71 be applied to drupal 6 core? I've had to patch core with this, and I'd love to not have to repatch at the next update, even though this doesn't affect user-facing interfaces in my case and wouldn't take my site down.

It sounds like there's a pretty serious interpreter issue that needs to be addressed, but I don't understand the impact of the various JS interpreters. It also seems like the conversation has moved from v6 to v7. Should this thread be split off so the issue can be addressed separately for both versions?

Heine’s picture

I think we can get away with not escaping \ and / as they are not special in HTML. (/ at least not when not in in the </ or /> combo, and < and > are escaped).

(patch contains a BOM, so it's also a test for the test infrastructure)

Status: Needs review » Needs work

The last submitted patch, do479368-drupal_json_encode.patch, failed testing.

Heine’s picture

Status: Needs work » Needs review
FileSize
4.89 KB

Lovely, json_encode options produce \uXXXX vs. \uxxxx. Adjusting both test and drupal_encode_json_helper to test and produce uppercase hexadecimals.

klonos’s picture

That's for d7. Right?

gausarts’s picture

Tested #71 with jQuery 1.4.2.

Now I got:

jQuery.extend(Drupal.settings, { "basePath": "\u002f",

instead of:

jQuery.extend(Drupal.settings, { "basePath": "/",

Subscribing. Thanks

Heine’s picture

The latest D6 patch for review is on #89.
The latest D7 patch for review is on #103.

klonos’s picture

Thanx Heine for letting us know.

btw, patches in #88 and #89 got accidentally named 479468 instead of 479368

realityloop’s picture

Status: Needs review » Needs work

#89 works with d6's default jquery library, but draggable handles don't appear when managing fields of content types using jQuery 1.4.2.

gausarts’s picture

Just for reference, there is a working patch for tabledrag.js at http://drupal.org/node/893538

Heine’s picture

Status: Needs work » Needs review

jQuery updates are NOT SUPPORTED. Random breakage due to swapping out core components is in fact to be expected.

Please focus on the JSON produced.

Heine’s picture

Status: Needs review » Needs work

On instigation by chx; needs a test for drupal_json_helper as well or we lose coverage on PHP 5.3 testservers.

manu manu’s picture

Status: Needs work » Needs review
Heine’s picture

Status: Needs review » Needs work

Please don't change issues from needs work to needs review - it will be missed in sprints this way.

eatsleepdev’s picture

Status: Needs work » Needs review

nm

Heine’s picture

Status: Needs review » Needs work
chaiwei’s picture

Status: Needs work » Needs review
Heine’s picture

Status: Needs review » Needs work
walidvb’s picture

subscribe.

Heine’s picture

Status: Needs work » Needs review
FileSize
7.64 KB

Patch adds the additional test for the helper to keep coverage on PHP 5.3+

See the following comments for background.

#479368-88: D7: Create RFC compliant HTML safe JSON
#479368-92: D7: Create RFC compliant HTML safe JSON

Any followup not furthering this issue will be unpublished.

trungonly’s picture

Heine’s picture

Title: drupal_to_js provides bad unicode conversions » drupal_json_encode produces invalid JSON
Heine’s picture

Title: drupal_json_encode produces invalid JSON » Create RFC compliant HTML safe JSON

Better title.

This issue is most urgent for D6 and interaction with external JSON consumers:

drupal_json_encode produces JSON that risks being interpreted by HTML parsers (\" escape sequences).
drupal_to_js (D6) produces total garbage for RFC compliant parsers (iPhone, random APIs, later jQuery versions).

Stalski’s picture

Title: Create RFC compliant HTML safe JSON » drupal_to_js provides bad unicode conversions
Status: Needs review » Reviewed & tested by the community

Works and looks great.

Heine’s picture

Title: drupal_to_js provides bad unicode conversions » Create RFC compliant HTML safe JSON

Crosspost

Heine’s picture

Status: Reviewed & tested by the community » Needs review

Needs another review IMO.

Heine’s picture

Drupal 6 backport with a testbot safe name.

mdupont’s picture

+1 for the D6 patch, much needed as commonly used features are exposed to bugs that are hard to track (such as Ajax-enabled Views). I've seen U+2028 characters in text fields if copy-paste is used with some editors.

brad.bulger’s picture

that's just the problem i was trying to solve, a U+2028 character breaking Views Ajax filtering in a D6 site. i applied the last patch from #127 and it just shifted the issue - instead of getting a failure on the Ajax call, there's a syntax error later on when it tries to use the response result. i had to change these lines:

        // While these are allowed unescaped according to ECMA-262, section
        // 15.12.2, they cause problems in some JSON parsers.
        "\xe2\x80\xa8" => '\u2028', // U+2028, Line Separator.
        "\xe2\x80\xa9" => '\u2029', // U+2029, Paragraph Separator.

to set those to blanks instead of the escaped values.

i think the gist of this may be outside the scope of this issue, but i can try to make a simpler test case if anyone would want to see it.

klonos’s picture

...well if this translates to potential issues in the future, I'd be one of them people that'd care to see a test case (and a solution eventually).

TommyChris’s picture

Heine’s picture

I've created #1086098: D6 JSON is total garbage - Create RFC 4627 compliant HTML safe JSON as the issue is most problematic in D6 (per #122). Maybe it can get some traction there.

Heine’s picture

brad, do you know the exact location of the syntax error? This might be a problem that should be solved by not offering e2 80 a8 and friend in _this_ response in the first place.

brad.bulger’s picture

it's the eval() at line 155 of views/js/ajax_view.js if that's of use.

Heine’s picture

Yes, then the problem IMO is in the entity calling drupal_json. callback should be a valid JS callback.

vzlahagen’s picture

andypost’s picture

Do the d.o infrastructure have PHP 5.3 slaves? Anyway this good to go

pillarsdotnet’s picture

#119: do479368-drupal_json_encode.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, do479368-drupal_json_encode.patch, failed testing.

Tor Arne Thune’s picture

Version: 7.x-dev » 8.x-dev
Issue tags: +Needs backport to D7
keichee’s picture

Assigned: Unassigned » keichee
Status: Needs work » Needs review
FileSize
590 bytes

my patch :)

Status: Needs review » Needs work

The last submitted patch, drupal_json_encode_0[1].patch, failed testing.

keichee’s picture

FileSize
595 bytes

okay, as per requested by the testing module, patch will just escape <, > and & only.

keichee’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, drupal_json_encode.patch, failed testing.

keichee’s picture

Status: Needs work » Needs review
FileSize
1.41 KB

added <, >, ', " and & to simpletest test module to comply to drupal_json_encode escape function

please test attached.

Status: Needs review » Needs work

The last submitted patch, drupal_json_encode.patch, failed testing.

keichee’s picture

FileSize
1.41 KB

fine, remove that backslash for double quote

keichee’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
Issue tags: -Needs backport to D7

The last submitted patch, drupal_json_encode.patch, failed testing.

keichee’s picture

Status: Needs work » Needs review
Issue tags: +Needs backport to D7

#146: drupal_json_encode.patch queued for re-testing.

keichee’s picture

Status: Needs review » Needs work
FileSize
1.39 KB

hmm tweaked simple test a little bit. hopefully to see green

keichee’s picture

Status: Needs work » Needs review
catch’s picture

So #152 looks great but that requires 5.3 - however D8 now has a PHP 5.3 requirement so it should be fine for that.

#119 needs a re-roll for D7, no longer applies. Also looks like it needs the test changes from #152 added to it (adding '\').

xjm’s picture

Here's that reroll of #119 for D7. (Edit: missed the changes from #152.)

xjm’s picture

Ah, the character that needs to be added is a single quote, not a slash.

xjm’s picture

Setting to 7.x temporarily to test the D7 patch with the single quote added.

xjm’s picture

Version: 8.x-dev » 7.x-dev
xjm’s picture

Version: 7.x-dev » 8.x-dev

#157 passes, so moving back to 8.x.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Oh I see it's an escaped single quote, that'll teach me to review patches from my phone.

A double quoted single quote by itself is no more readable, at least I spotted something was missing I guess...

#157 I haven't properly reviewed but it looks like Heine and Damien have. One thing we should do is at least sync the json_encode arguments with D8 so they're more easily scannable:

+    return json_encode($var, JSON_HEX_QUOT | JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS);

vs.

+  return json_encode($var, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);

So #152 looks RTBC for D8 to me now that we required 5.3 there, so let's get that in then it looks like at most minor changes to the D7 patch to get that ready.

xjm’s picture

In #152, I don't understand why the double quote is excluded from the patch. I mean, I see that earlier tests broke with it, but is there a logical reason it's excluded?

Looking over it again, I think #157 needs the single quote added to the helper test's $html_unsafe in addition to the change catch suggests. Also, the same question about the double quote applies there. The 5.3 call to json_encode() includes the double quote as an argument, but the 5.2 switch statement does not include it in the same section as < > ' &.

xjm’s picture

Status: Reviewed & tested by the community » Needs review

Tentatively back to NR until I can wrangle an answer to #161 out of someone.

keichee’s picture

i just copied the code from php.net documentation

i dont know why it fails when i include it in simpletests, thats why i removed it.. i guess its a simpletest bug or something?

xjm’s picture

Status: Needs review » Needs work

Looking at the test results for the last one that failed, the test that failed was:

A JSON encoded string does not contain ". Other common.test 2343 DrupalJSONTest->testJSON()

This means that the double quote was not properly escaped. Removing it from the test fixes that test error, but there is an underlying problem escaping it. So, I think this is NW, then.

xjm’s picture

Assigned: keichee » xjm
xjm’s picture

I see now why it is failing. The JSON-encoded string is delimited with double quotes, e.g.:

print json_encode('This is my string with " & \' < > characters', JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);

prints:
"This is my string with \u0022 \u0026 \u0027 \u003C \u003E characters"

with the double quotes at the beginning and end.

So rather than asserting that the string does not contain a double quote, it should assert that the double quote only appears at the beginning and end of the string?

xjm’s picture

Status: Needs work » Needs review
FileSize
2.99 KB

Let's try this.

Status: Needs review » Needs work

The last submitted patch, json-encode-d8-479368-167.patch, failed testing.

xjm’s picture

Oh, $json is getting re-defined as an array and not just the string. We need to test the string delimiters only before this. Reusing the same var seems not-best-practice, but I guess that'd be a separate issue.

xjm’s picture

xjm’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
Issue tags: -Needs backport to D7

The last submitted patch, json-encode-d8-479368-170.patch, failed testing.

xjm’s picture

Status: Needs work » Needs review
Issue tags: +Needs backport to D7

#152: drupal_json_encode.patch queued for re-testing.

xjm’s picture

So, the solution in #152 is now causing taxonomy autocomplete fields to fail, when it didn't as of July 3.

I'll see if #157 has the same problem now.

xjm’s picture

Version: 8.x-dev » 7.x-dev
xjm’s picture

catch’s picture

A taxonomy patch just went in, switching to drupal_implode_tags().

xjm’s picture

xjm’s picture

Status: Needs review » Needs work

So I guess it is maybe failing because the str_replace() in drupal_implode_tags() matches double quotes and not the encoded variant? Do we need to patch that function as well? I don't think not encoding double quotes should be an option.

xjm’s picture

When I actually test this in a browser, the patch does not seem to alter the behavior. E.g., if I have a term foo, bar, it is returned as an autocomplete suggestion when I type foo regardless of whether or not the patch is applied. So I think the conflict is with the new test for #1000736: Term reference autocomplete widget having problems handling terms with commas:

    $this->assertRaw('{"\"' . $term2->name . '\"":"' . $term2->name . '"}', t('Autocomplete returns term %term_name after typing the first 3 letters.', array('%term_name' => $term2->name)));

Without the patch, this is the text when I run the test:
{"\"xc0wcH6G, dfzKJQXl\"":"xc0wcH6G, dfzKJQXl"}

With the patch applied, this is the result:
{"\u0022vuzR52oT, jcy3G4Sp\u0022":"vuzR52oT, jcy3G4Sp"}

So we just need to change the assertion to match \u0022 instead of \".

xjm’s picture

Attached is the same as #170, but adjusts the taxonomy test to match encoded rather than escaped double quotes wrapping the term.

xjm’s picture

Version: 7.x-dev » 8.x-dev
Status: Needs work » Needs review
xjm’s picture

Just a comment fix to match #119.

xjm’s picture

Version: 8.x-dev » 7.x-dev
FileSize
6.91 KB

And, the D7 version with support for 5.2, based on Heine's patch in #119.

I ran JSON and taxonomy form tests for this locally on both 5.2 and 5.3; the tests passed. I also tested the functionality of autocomplete fields and they autocompleted properly with and without commas in the term names.

However, assuming this passes testbot, it would be good to have some people testing it in a situation where compliant JSON is required, e.g. iPhone. A review from Heine would also be good.

For the record, the D6 version of this is in #1086098: D6 JSON is total garbage - Create RFC 4627 compliant HTML safe JSON.

klonos’s picture

Thanx Jess for your hard work on this ;)

xjm’s picture

Tagging issues not yet using summary template.

xjm’s picture

Summary added.

sun’s picture

Status: Needs review » Needs work
Issue tags: -Needs backport to D7
+++ b/includes/common.inc
@@ -4797,14 +4797,120 @@ function drupal_clear_js_cache() {
+  // We do not want to use drupal_static() since the PHP version will never
+  // change during a request.

Shorten to: "PHP version cannot change within a request."

+++ b/includes/common.inc
@@ -4797,14 +4797,120 @@ function drupal_clear_js_cache() {
+  // json_encode on PHP prior to PHP 5.3.0 doesn't support options.
...
+    // json_encode() escapes <, >, ', &, and " using its options parameter.
...
+  // In PHP versions less than 5.3.0, use our helper.
+  return drupal_json_encode_helper($var);

Let's replace the last comment with the first, and also move the second to the last, which clarifies what is actually different below 5.3, and why on earth we need that helper.

+++ b/includes/common.inc
@@ -4797,14 +4797,120 @@ function drupal_clear_js_cache() {
+      return $var ? 'true' : 'false'; // Lowercase necessary!
+    case 'integer':

(and elsewhere) There should be a blank line between switch case statements that do not fall-through to the next. See http://drupal.org/coding-standards

+++ b/includes/common.inc
@@ -4797,14 +4797,120 @@ function drupal_clear_js_cache() {
+        // 15.12.2, they cause problems in some JSON parser.

parser*s*

+++ b/includes/common.inc
@@ -4797,14 +4797,120 @@ function drupal_clear_js_cache() {
+      return '"'. strtr($var, $replace_pairs) .'"';
...
+        return '[ '. implode(', ', $output) .' ]';
...
+      return '{'. implode(', ', $output) .'}';

Bad string concatenation style.

+++ b/includes/common.inc
@@ -4797,14 +4797,120 @@ function drupal_clear_js_cache() {
+      if (empty ($var) || array_keys($var) === range(0, sizeof($var) - 1)) {

No space after "empty".

-3 days to next Drupal core point release.

xjm’s picture

Version: 7.x-dev » 8.x-dev
Status: Needs work » Needs review
FileSize
1.46 KB
3.74 KB

Comment fix to D8 to bring it in line with @sun's comment change recommendation, plus an additional comment clarifying the use of \u0022 in the taxonomy autocomplete test.

xjm’s picture

And, finally, the d7 reroll, incorporating both the comment changes from #189 and sun's comment and code style fixes from #188. Phew!

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Issue summary: View changes

Linked latest patches in issue summary.

catch’s picture

Issue tags: -Needs backport to D7

#189: json-encode-d8-479368-189.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +Needs backport to D7

The last submitted patch, json-encode-d8-479368-189.patch, failed testing.

xjm’s picture

Status: Needs work » Needs review
FileSize
4.55 KB

Rerolled for core/.

xjm’s picture

Assigned: xjm » Unassigned
Tor Arne Thune’s picture

+++ b/sites/default/default.settings.php
@@ -316,7 +316,7 @@ ini_set('session.cookie_lifetime', 2000000);
- * theme. It is located inside 'core/modules/system/maintenance-page.tpl.php'.
+ * theme. It is located inside 'modules/system/maintenance-page.tpl.php'.

Is this correct for D8?

xjm’s picture

Status: Needs review » Needs work

Nope. And... that's weird, and has nothing to do with the patch.

Edit: I bet default.settings.php was write-protected and so didn't get changed properly when I rebased. Oops! Thanks for the catch.

xjm’s picture

Status: Needs work » Needs review
FileSize
3.8 KB

Fixed.

chx’s picture

Status: Needs review » Needs work

In #94 Heine explains why we dont want a dumb string replace in any version so the D7 version is good to go functionality but I would love to banish that abomination into a helper include.

The D8 version is OK because json_encode does the right thing for the Ux0000-Ux0019 characters and the few other special chars JSON-wise and with the new options it even becomes HTML safe.

xjm’s picture

Version: 8.x-dev » 7.x-dev
Status: Needs work » Needs review
FileSize
7.61 KB

Attached creates a new file, json-encode.inc. This is for D7 only.

The JSON and taxonomy autocomplete tests pass locally with PHP 5.2. Setting to D7 temporarily to make sure this patch passes the full test suite for D7.

The D8 patch is still in #197.

xjm’s picture

Version: 7.x-dev » 8.x-dev
chx’s picture

This is good to go.

chx’s picture

Status: Needs review » Reviewed & tested by the community

I mean...

Dries’s picture

Status: Reviewed & tested by the community » Needs work

I reviewed this patch and it looks good. However, it needs a quick re-roll for the 'core'-directory changes.

xjm’s picture

Status: Needs work » Reviewed & tested by the community

See #197. The D8 patch is available there.

xjm’s picture

Issue summary: View changes

Remove nonsensical "only."

Dries’s picture

Version: 8.x-dev » 7.x-dev

Oops, you're right xjm. Committed to 8.x and moving to 7.x for @webchick.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Hm. Seems a bit ick to make a whole separate include file for just this one function. But I guess if it was good enough for Dries, it's good enough for me. ;) Thanks for the expanded test coverage.

Committed and pushed to 7.x.

webchick’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Patch (to be ported)
Issue tags: -Needs backport to D7 +Needs backport to D6

Oopsie. Marking down to D6.

catch’s picture

Version: 6.x-dev » 7.x-dev
Status: Patch (to be ported) » Fixed
klonos’s picture

Title: Create RFC compliant HTML safe JSON » D7: Create RFC compliant HTML safe JSON
Issue tags: -Needs backport to D6

...minor title edit to make this clear(er) + removing the backport tag.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

alphawebgroup’s picture

But after upgrading to D7.12, I have some problems with my ajax interface, which after ajax-reload returns html with russian textes. I use ajax_command_html for ajax output with php 5.2.17.
I've tried to use old version of the drupal_json_encode() function and it looks good...
and drupal_json_encode_helper() returns abracadabra instead of the normal readable text.
I think the problem is in drupal_json_encode_helper() function, lines 23-77...
But I'm not sure...

alphawebgroup’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

@alweb: I'd suggest opening a new issue, with detailed steps to reproduce the problem you're having. Thanks!