Problem/Motivation
(From #1086098: D6 JSON is total garbage - Create RFC 4627 compliant HTML safe JSON by @Heine)
Background:
We are dealing with a number of circumstances:
- At least two different parsers may process the JSON:
- HTML parser
- JSON parser
- Clients we have no control over need to interpret the JSON: jquery versions, iphone apps, .NET classes, world+dog
This means we need to produce RFC4627 compliant JSON that, at the same time, will not have certain 'special characters' such as ', ", <, > and & be interpreted by an HTML parser.
As RFC4627-2.5 clearly states that "Any character may be escaped", we can avoid special treatment of characters ', ", <, > and & by an HTML parser through simple substitution with a Unicode escape sequence (\uXXXX).
A number of characters MUST be escaped for the JSON parser. These are:
- "
- \
- U+0000 - U+001F
For a number of characters (eg "), it is possible to choose the escape form and either precede them with a backslash (JSON Escape Sequence, eg \"), or use the \uXXXX (Unicode escape sequence, eg \u0027) form. HOWEVER, because we also need to deal with the HTML parser, _it_ may interpret the quote before the the JSON parser even has the chance to run. Because of this, it is advisable to use the Unicode escape sequence (\uXXXX) here as well.
Proposed resolution
In PHP 5.3, compliant JSON that properly encodes HTML special characters can be generated with:
json_encode($data, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);
However, in PHP 5.2, json_encode()
does not support these options. So in 5.3 we use the provided PHP function, but in 5.2 we create our own helper to return properly encoded JSON according to the spec.
Remaining tasks
- Test functionality of latest patches on devices which require compliant JSON.
- Drupal 8 (PHP 5.3 only): #479368-197: D7: Create RFC compliant HTML safe JSON
- Drupal 7 (PHP 5.2 and 5.3): #479368-199: D7: Create RFC compliant HTML safe JSON
- Patch for D6 backport is in #1086098: D6 JSON is total garbage - Create RFC 4627 compliant HTML safe JSON
- Review by Heine if possible.
User interface changes
None.
API changes
None.
Original report by @Steve McKenzie
I discovered this when trying to debug when an objective-c parser was not parsing my data.
return '"'. str_replace(array("\r", "\n", "<", ">", "&"),
array('\r', '\n', '\x3c', '\x3e', '\x26'),
addslashes($var)) .'"';
in drupal_to_js is the issue.
im not someone that knows much about encoding but after showing this to people in freenode #iphonedev, they blamed that line.
so i switched and used json_encode and what do you know, it works fine now.
im too tired to give more details right now. this seems like enough anyways.
on another note, its probably about time drupal changes that name to drupal_to_json OR just drop the damn function entirely in favor of requiring json_encode / json_decode in php 5.2.x but obviously we're not discussing that here.
Comment | File | Size | Author |
---|---|---|---|
#199 | json-encode-d7-479368-199.patch | 7.61 KB | xjm |
#197 | json-encode-d8-479368-197.patch | 3.8 KB | xjm |
#193 | json-encode-d8-479368-193.patch | 4.55 KB | xjm |
#190 | json-encode-479368-190-d7.patch | 7.26 KB | xjm |
#190 | interdiff-184-190.txt | 4.04 KB | xjm |
Comments
Comment #1
Anonymous (not verified) CreditAttribution: Anonymous commentedos x character palette:
Character: <
unicode: 003C
UTF8: 3C
so why the \x and lower case? i dont know much about encoding like i said.
Comment #2
Heine CreditAttribution: Heine commentedWe need to escape <>& to prevent HTML parsing issues.
Relevant RFC: http://www.ietf.org/rfc/rfc4627.txt?number=4627
json should use \uXXXX, not \xYY to escape characters.
Comment #3
Damien Tournoud CreditAttribution: Damien Tournoud commentedAn simple note, to respect the procedure: this doesn't apply to D7, which uses json_encode() already. 6.x is hence the correct version.
Comment #4
Heine CreditAttribution: Heine commented@DamZ
Still uses \x instead of \u
Comment #5
Damien Tournoud CreditAttribution: Damien Tournoud commentedIndeed.
Comment #6
Heine CreditAttribution: Heine commentedI'm having a hard time finding an authorative text on JS string encoding. Is it possible to mix \uXXX and UTF-8 in a string literal?
Comment #7
Heine CreditAttribution: Heine commentedThe \uXXX is of course transferred in UTF-8 (or whatever encoding used) and the parser deals with the rest.
Comment #8
Heine CreditAttribution: Heine commentedHere's a patch against D6 for the OP to test.
Comment #10
Damien Tournoud CreditAttribution: Damien Tournoud commentedIt's time for some level tests about that.
Comment #11
EnekoAlonso-1 CreditAttribution: EnekoAlonso-1 commentedThis is what I had to change for my iPhone app to load the JSON properly:
common.inc, line 2046 (Drupal 6):
*Note I also added a new replacement for single quotes \'
http://dev.enekoalonso.com
Comment #12
asimmonds CreditAttribution: asimmonds commentedLooks like this is needed for #653580: Upgrade to jQuery 1.4
Rerolling
Comment #13
mfer CreditAttribution: mfer commentedMarking this as critical since it causing issues with jQuery 1.4 testing. We have a short time table to get jQuery 1.4 in before the first alpha. The plan is to release jQuery 1.4 the day before the D7 alpha so we are doing testing early in preparation.
Comment #14
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis makes total sense.
Comment #15
webchickDoes make sense. But can we get some tests for this?
Comment #16
asimmonds CreditAttribution: asimmonds commentedjson_decode() appears to unescape the HTML-safe characters itself (on PHP 5.2.12), so I've removed the older unescaping from drupal_json_decode(). This probably needs testing on other PHP versions.
The only test I could think of to add to the suite was to test for the escaped characters in the JSON encoded string. All the other unit tests for the JSON functions appear to pick up everything else.
Any more suggestions for the tests?
Comment #17
Damien Tournoud CreditAttribution: Damien Tournoud commentedAgreed this is the only additional test required.
Comment #18
webchickGreat. Thanks, folks! :D
Committed to HEAD.
Moving down to 6.x.
Comment #19
asimmonds CreditAttribution: asimmonds commentedRemoving needs tests tag
Comment #20
burningdog CreditAttribution: burningdog commentedRolled a patch for D6 based on #11. It adds some characters to correctly escape in JSON in drupal_to_js(). I'd found out that JSON Server was returning un-parseable JSON because it was escaping apostrophe's, which is incorrect.
There's a handy JSON tester online to check correct formatting, which might be helpful in this thread: http://www.jsonlint.com/
Comment #21
Heine CreditAttribution: Heine commentedThis stil escapes < etc with \x3c etc. What is \' supposed to replace?
Comment #22
burningdog CreditAttribution: burningdog commentedI'm assuming we're sticking with drupal_to_js() in D6, and not simply changing it to:
??
Comment #23
burningdog CreditAttribution: burningdog commented@Heine: \' is supposed the replace the backslash and leave the apostrophe in place.
Comment #24
burningdog CreditAttribution: burningdog commentedComment #25
Heine CreditAttribution: Heine commentedWe cannot switch to json_encode as D6 should support pre 5.2 PHP versions.
Your single quote in the replace array has no corresponding entry in the search array of str_replace.
Comment #26
burningdog CreditAttribution: burningdog commentedThanks Heine. Re-rolled.
Comment #27
Heine CreditAttribution: Heine commentedExcellent!
I think we should drop the use of addslashes, and use addcslashes (or expand the str_replace) with the defined escape list of RFC 4627 :
Comment #28
burningdog CreditAttribution: burningdog commentedI vote for str_replace() as it's pretty easy to see what's going with it. Patch attached.
Comment #29
burningdog CreditAttribution: burningdog commentedWait, this one is slightly better.
Comment #30
burningdog CreditAttribution: burningdog commentedComment #31
c960657 CreditAttribution: c960657 commentedYou need to write \b, \f, etc. in double quotes – otherwise you are searching for the literal string (e.g.
\t
) instead of the character it represents (a tab character).When called with the string
\
, the output is"\u005C\u005C"
. It should be"\u005C"
or simply"\\"
.When called with \0 (i.e. a string containing one character with the ASCII value 0), the output is
"\u005C0"
. It should be simply \0 enclosed in quotes (not the literal string\0
, but the character with the ASCII value 0), or escaped as"\u0000"
.I think addcslashes() is better suited for this task.
Comment #32
Heine CreditAttribution: Heine commented\x00 is not valid in JSON; The quotation mark, reverse solidus, and the control characters (U+0000 through U+001F: \x00 through \x1f) must be escaped.
As \uXXXX escaping is always allowed, I'd vote to ditch add(c)slashes and use only one escape method.
Comment #33
burningdog CreditAttribution: burningdog commentedOk, I've removed addslashes and now handle everything with str_replace. The function now handles all characters from \x00 to \x1f.
I've also written a way to test that it's all working, by testing the output of drupal_to_js() against json_encode() and printing an error if the outputs don't match. Like so:
Running these all through my version of drupal_to_js() doesn't generate any errors. Patch attached.
Comment #34
burningdog CreditAttribution: burningdog commentedDamn - patched against my already patched function. Here's the correct patch.
Comment #35
burningdog CreditAttribution: burningdog commentedFrom #2, need to escape <>& to prevent HTML parsing issues.
This now gives output from the testing function at #33 like so:
...which is to be expected since we're escaping < > &.
Comment #36
bluetegu CreditAttribution: bluetegu commenteddrupal_to_js failed on my site due to line Separator (U+2028). Is this related? Should we remove similar UTF-8 separators as well?
Comment #37
burningdog CreditAttribution: burningdog commentedWhat do you mean, "failed"? As in the function failed with an error, or that drupal_to_js produced incorrect JSON? I added
$var[] = "\u2028";
to my testing function from #33 and drupal_to_js (the patched version) produces the same output as json_encode.Comment #38
burningdog CreditAttribution: burningdog commentedI've modified my drupal_to_js testing code from #33 to use the same escaping that D7 is using with json_encode. It generates no errors now, including the \u2028 unicode character.
Comment #39
Heine CreditAttribution: Heine commented\b - \t are already included in the \x00 to \x1f control characters.
There's no need to replace "\'", as we just want to turn the single quote into the proper \u escape sequence, not the reverse solidus followed by a quote.
Comment #40
bluetegu CreditAttribution: bluetegu commentedRoger #37 Here is what I mean by fail: I use a JQuery ajax call and return data via drupal_to_js. The data is an array that includes several HTML regions (right-sidebar and content). Everything worked fine, until part of the data (a teaser of a post) included the UTF-8 line separator. It caused the ajax call to fail with parsererror error.
Comment #41
bluetegu CreditAttribution: bluetegu commentedIn my case the following one line addition to drupal_to_js works:
Taken from http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_...
Comment #42
burningdog CreditAttribution: burningdog commentedThanks for the feedback, Heine - the single quote is now replaced with \u0027 and I've taken out replacements for \f - \t. Patch attached.
bluetegu, you're saying that in your case drupal_to_js returns a \u2028 character which ajax then couldn't parse? I think that problem lies in the ajax, rather than in drupal_to_js. See the JSON rfc at http://www.ietf.org/rfc/rfc4627.txt - particularly the bit which says, "Insignificant whitespace is allowed before or after any of the six structural characters" and then it does on to list allowed whitespace characters, including Line feed/New line/Carriage return.
Try the attached patch and see if this solves your problem?
Comment #43
bluetegu CreditAttribution: bluetegu commentedRoger, the patch added multiple unreadable characters to the HTML rendering through ajax I'm doing. So something is wrong here. I guess it replaced some white spaces I had to UTF-8. It didn't help with the problem I had either - but this should be expected as the patch doesn't remove the U2028 characters. I'm too tired to think straight so I'll look again on this tomorrow.
Comment #44
burningdog CreditAttribution: burningdog commentedTry running it through this function instead and see if it helps (this is how D7 does it):
Comment #45
bluetegu CreditAttribution: bluetegu commentedHi Roger. #44 works for me. I'll use this instead of my current patch #41. Many thanks.
Comment #46
burningdog CreditAttribution: burningdog commentedThat's great :) Does running it through the patched drupal_to_js (from #42) work too?
Comment #47
bluetegu CreditAttribution: bluetegu commentedNo, unfortunately it doesn't. See #43.
Looking again at the patch at #42 I think I spotted a typo: You replace "\x0a" with '\u000e' instead of '\u000a'. Once I fixed it, I no longer have the behavior I describe at #43.
Comment #48
burningdog CreditAttribution: burningdog commentedAh, thanks - I've changed that character. Try attached patch again, please.
Comment #49
c960657 CreditAttribution: c960657 commentedYou missed \x09 and \x19.
In addition to the characters encoded by json_encode(), the function now also encodes
< > & " / '
, but the Doxygen comment only mentions the three former characters. I don't think we should encode more characters in D6 than in drupal_json_encode() D7. If the three latter characters are considered “unsafe”, they should be encoded in D7 as well.The last replacement pair is "\'" => 'u0027'. Shouldn't this be "'" => '\u0027', i.e. skip the unnecessary backslash in the first array, and add a missing backslash in the second?
Comment #50
bluetegu CreditAttribution: bluetegu commentedRoger, it works ok for me, unless I have the separator u2028 (and possibly similar separators). For this to work, you can either use my #41 or the drupal_json_encode code per #44.
Comment #51
burningdog CreditAttribution: burningdog commented"You missed \x09 and \x19."
Actually, they don't exist - they use octal notation (I think?) which means that after 8 the next 'digit' is "a".
"I don't think we should encode more characters in D6 than in drupal_json_encode() D7."
I'm encoding them here at the suggestion of Heine in #27 and #32, who knows far more about this stuff than I do.
"The last replacement pair is "\'" => 'u0027'. Shouldn't this be "'" => '\u0027', i.e. skip the unnecessary backslash in the first array, and add a missing backslash in the second?"
It's weird, I know, but if the replacement is '\u0027' then running "\'" through the script returns "\\u0027". Keeping the backslash out of the replacement returns \u0027
I've added in a comment to Doxygen to detail the other replaced characters, and I've put the "\'" into a more logical place in the array.
Comment #52
burningdog CreditAttribution: burningdog commentedre #50: when I run the following, I get \\u2028 as an output:
print json_encode("\u2028");
I get the same output when I do:
print drupal_to_js("\u2028");
Isn't this expected?
Comment #53
Heine CreditAttribution: Heine commented\x[0-9A-Fa-f]{1,2} are hexadecimal :)
I don't really care about what the PHP json_encode is doing, but we need to follow the RFC, and then escape & <> to have HTML compatibility.
Comment #54
burningdog CreditAttribution: burningdog commentedAh yes, hexadecimal. My brain wasn't supplying me with that word - too tired.
I've re-read the RFC to see if I've missed anything. Page 4 is the relevant page for our purposes; especially the line which tells us which characters don't need to be escaped:
This means we need to escape \x00 - \x1F, \x22 (a quotation mark "), \x5C (reverse solidus), \x2F (solidus), \x62 (backspace), \x66 (form feed), \x6E (line feed), and \x72 (carriage return). I'd missed some of those - I've added them in now (and added a line in the function comments to mention these). Hopefully I've got everything?
@bluetegu, this might catch your error?
Comment #55
burningdog CreditAttribution: burningdog commentedAnd...just when I thought the function was performing properly, I find that putting in this:
gives this:
It's late again and I'm too tired to figure out why this is messed up - help?
Comment #56
burningdog CreditAttribution: burningdog commentedFrom #54, I'd thought the following characters needed to be escaped: backspace, form feed, line feed and carriage return. That's correct - they do - and as Heine pointed out in #39, all of these are already handled in \x00 - \x1f so we can just take them out here (and, I'd done the character representation incorrectly - a backspace is \b or \x5C\x62 - I'd represented it as just \x62 - which is why "The quick brown fox jumped over the lazy dog" got garbled).
So, the new lines should look like this:
Comment #57
Heine CreditAttribution: Heine commented"\'" should be "'"
The replacement '\\\\' will work, but '\u005C' would be more consistent.
You forgot \x09, \x10 and \x19 and the corresponding replacements.
Comment #58
burningdog CreditAttribution: burningdog commentedI've added in \x09 and \x19 (\x10 was already there ;)
Have changed the replacement '\\\\' to '\u005C' for consistency.
As for "\'"...that's what led me to this thread in the first place - I had a node title which wasn't having correct JSON return by drupal_to_json: "This isn't a node title" was returning "This isn\'t a node title" because of the addslashes function. Since we've taken out addslashes() there's no need to replace "\'" with anything.
So
'
stays'
and\'
becomes\u005C'
- which is what we want.Yes?
Comment #59
burningdog CreditAttribution: burningdog commentedThe test status for the last patch is "postponed" - how can I get the testing server to test it?
Comment #60
mcarbone CreditAttribution: mcarbone commentedWe definitely need to escape:
Next Line (\u0085)
Line Separator (\u2028)
Paragraph Separator (\u2029)
Note that this wasn't caught by your tests above because you can't test for these by adding "\u2028" -- you have to use "\xE2\x80\xA8".
I found this issue because I copied and pasted a Paragraph Separator into a CCK label unknowingly, which broke AJAX in views (see #682778: Error: http://mysite.org/admin/build/views/ajax/add-item/comm_sites/default/field.).
I've rerolled the patch with these additions.
Comment #61
mcarbone CreditAttribution: mcarbone commentedRe-submitted without the extra line I added by mistake.
Comment #62
sunQuite ugly + unmaintainable. Can we use strtr() with array notion instead of str_replace() ?
Comment #63
mcarbone CreditAttribution: mcarbone commentedComment #64
mcarbone CreditAttribution: mcarbone commentedHere's a hopefully cleaner version.
Comment #65
dooug CreditAttribution: dooug commentedUsed patch from #64 on Drupal 6.17, seems to have cleared up the issues I was having (I believe with ampersand character in the content being converted to json.)
Comment #66
matkeane CreditAttribution: matkeane commentedHi,
The patch from #64 also works for me on a server running D6.17 and PHP 5.1.6. Marking this as RTBC as we now have two reports of the patch working.
The bug is a bit of a show-stopper for anybody not able to roll their own function using the PHP json_encode function in 5.2, so it would be great to see this patch committed soon (unless anybody spots a problem with it, of course) - running sites on patched versions of core makes me nervous!
Comment #67
sunNeeds RTBC by Heine or c960657.
Comment #68
c960657 CreditAttribution: c960657 commentedWhy? AFAICT this character is not mentioned in ECMA-262, contrary to the two other that you mention.
Isn't
html_entity_decode('&#'. $i .';', ENT_NOQUOTES, 'UTF-8')
equivalent to justchr($i)
when $i < 127 ?+ $replace_pairs
How about saving storing $replace_pairs in a static variable to prevent this for loop from running in each call?
Comment #69
mcarbone CreditAttribution: mcarbone commentedWeird -- it's been six weeks since I wrote the patch, but I seem to remember trying chr(). Well, in any case, it's there now and it seems to be working fine.
Removed the Next Line character and added static variable usage.
Comment #70
c960657 CreditAttribution: c960657 commentedThe default is never used. The convention is to not specify a default and then check using
if (!isset($foo)) {
.Single quote is not escaped.
I suggest you put all except the last two keys (and the missing key for the single quote) in single quotes.
BTW I compared the performance of str_replace() and strtr(), and in this case strtr() seems to be much faster (i.e. it should not be changed to str_replace() in #561422: Replace strtr() with str_replace() for db prefixing):
Comment #71
mcarbone CreditAttribution: mcarbone commented"The default is never used."
Are you sure about that? I was following the example of drupal_get_schema in the same file (http://api.drupal.org/api/function/drupal_get_schema/6). But a default is not needed here, so I made the change anyway.
Single quote added.
Comment #72
realityloopThis works as I would expect.
Comment #73
Gábor HojtsyLooking for more testing feedback!
Comment #74
realityloopGábor, are you looking for more feedback from me.. or feedback from more testers?
Comment #75
Gábor HojtsyMore testing from others, since it looks like a pretty substantial change.
Comment #76
webchickAdditionally, when providing testing results, something more substantial than "This works as I would expect." is generally helpful for building confidence in core committers regarding the level of testing. What, specifically, were you seeing your site do before that it's not doing now? How did you explicitly go out of your way to break it that failed? What sort of kooky/unusual set-up did you try the patch on where it did not break? etc.
Comment #77
realityloopSorry webchick, here is an expanded report.
I applied the patch then installed jQuery Update with http://drupal.org/node/775924#comment-3274706 but made it always use jQuery 1.4.2
Tested draggable sort on manage fields section of edit content types, tested views and panels edit screens by adding sorting and removing fields etc.
I also tested views and panels with Uniform enabled http://drupal.org/project/uniform
I also did the above tests with vanilla D6 jQuery version and eveything worked as I would expect
Comment #78
matkeane CreditAttribution: matkeane commentedOn a server running D6.19 and PHP 5.1.6. (so without the PHP json_encode function in PHP 5.2) supplying a JSON feed to an iphone app, the results go something like this:
Before: Plain vanilla D6.19
Example JSON output:
{ "umm": { "item": { "id": "10092", "language": "fr", "url": "http://example.com/fr/node/10092", "title": "\x3cp\x3eUn miraculé de Lourdes\x3c/p\x3e\n", etc
Result: iphone app developer sends emails along the lines of 'what's all this garbage in the feed, I can't work with this!' and storms off in a huff!
After: Patched D6.19 with the patch in #71
Example JSON output:
{ "umm": { "item": { "id": "10092", "language": "fr", "url": "http:\u002f\u002fexample.com\u002ffr\u002fnode\u002f10092", "title": "\u003cp\u003eUn miraculé de Lourdes\u003c\u002fp\u003e\u000a", etc
Result: Less disgruntled iphone app developer!
As well as encoding tags in \uXXX format as opposed to the problematic \x format, the slashes in the url item are encoded, which didn't happen before applying the patch (but that's just an observation, not a problem, as far as I can see).
Comment #79
klonossubscribing...
Comment #80
srobert72 CreditAttribution: srobert72 commentedsubscribing
Comment #81
Jackinloadup CreditAttribution: Jackinloadup commentedsubscribe
Comment #82
rlnorthcuttLooking to test this and want to be clear:
1) Apply patch at #71
2) Replace stock jquery with 1.4 (or use jquery update module with 1.4)
3) Try to break
4) record results
That about right?
Comment #83
smasty CreditAttribution: smasty commented#6: do-479368-u-escape-sequences.patch queued for re-testing.
Comment #84
Déja CreditAttribution: Déja commentedI have installed the #71 patch and updated to jQuery 1.4.2 using the patched linked to at http://drupal.org/node/775924#comment-3274706. After several days of testing and use, I have yet to encounter any issues caused by this patch.
I have also written of this at http://echodittolabs.org/blog/2010/08/drupal-6x-jquery-142-new-possibili... and look forward to its inclusion in the core.
Comment #85
realityloop@rlnorthcutt
Thats a good start
Comment #86
safetypinPatch in #71 corrected the incompatibility for me in 6.19. Is this ready to be included in the next release, or can I do anything to help make it ready?
Comment #87
markus_petrux CreditAttribution: markus_petrux commentedsubscribing
Comment #88
Heine CreditAttribution: Heine commentedWe are dealing with a number of circumstances:
This means we need to produce RFC4627 compliant JSON that, at the same time, will not have certain 'special characters' such as ', ", <, > and & be interpreted by an HTML parser.
As RFC4627-2.5 clearly states that "Any character may be escaped", we can avoid special treatment of characters ', ", <, > and & by an HTML parser through simple substitution with a Unicode escape sequence (\uXXXX).
A number of characters MUST be escaped for the JSON parser. These are:
For a number of characters (eg "), it is possible to choose the escape form and either precede them with a backslash (JSON Escape Sequence, eg \"), or use the \uXXXX (Unicode escape sequence, eg \u0027) form. HOWEVER, because we also need to deal with the HTML parser, _it_ may interpret the quote before the the JSON parser even has the chance to run. Because of this, it is advisable to use the Unicode escape sequence (\uXXXX) here as well.
Right now, drupal_to_js is not RFC4627 compliant. This causes problems with a number of non-core JSON consumers.
Violations:
Additional problem:
Now, enough about the problems of the current implementation, on to the patch.
It correctly escapes U+0000 - U+001F (Ascii 0 - 31) and the potential special HTML characters.
AFAIK we do not have to escape /, U+2028 and U+2029 according to the specs, but naive JSON parsers (eval) may have trouble with it.
As there's no harm in escaping these characters, I've kept them in the revised patch. All this reroll did was to expand the U+0000 - U+001f range and add comments.
Comment #89
Heine CreditAttribution: Heine commentedNow without the static.
Comment #90
Damien Tournoud CreditAttribution: Damien Tournoud commentedBack to D7 for
\
,"
,'
and/
that we don't escape there.Comment #91
Heine CreditAttribution: Heine commentedComment #92
Heine CreditAttribution: Heine commentedjson_encode uses JSON escape sequences (\") instead of our preferred Unicode escape sequences (\uXXXX). As json_encode in PHP 5.2.x doesn't have support for the $options array, we need to ditch the function and implement json_encoding ourselves again.
Comment #93
Damien Tournoud CreditAttribution: Damien Tournoud commentedHm, any reason not to replace '"' and '\'' manually as we were previously doing?
Comment #94
Heine CreditAttribution: Heine commentedBecause we do not want to escape all quotes.
"key" : "value"
Quotes in key and value need to be escaped, but we have to leave the significant quots wrapping the key and value alone.
We can't use check_plain either, because the result of drupal_json_encode has to be consumed by pure JSON parsers.
Comment #95
Damien Tournoud CreditAttribution: Damien Tournoud commented*sigh*
Ok, so we can probably have two versions:
JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP
, postprocessing '\\\\' into '\u005c' and '\\/' into '\u002f'Are the postprocessing of \ and / really necessary, or can we get away without it?
Comment #96
Damien Tournoud CreditAttribution: Damien Tournoud commentedIf the postprocessing of \ and / is really necessary, we should open a feature request against PHP to add support for that in later versions.
Comment #97
gausarts CreditAttribution: gausarts commentedSubscribing. Thanks
Comment #98
mcarbone CreditAttribution: mcarbone commentedComment #99
realityloopThis seems to have stalled.. What still needs to be done?
Comment #100
safetypinWell, can patch #71 be applied to drupal 6 core? I've had to patch core with this, and I'd love to not have to repatch at the next update, even though this doesn't affect user-facing interfaces in my case and wouldn't take my site down.
It sounds like there's a pretty serious interpreter issue that needs to be addressed, but I don't understand the impact of the various JS interpreters. It also seems like the conversation has moved from v6 to v7. Should this thread be split off so the issue can be addressed separately for both versions?
Comment #101
Heine CreditAttribution: Heine commentedI think we can get away with not escaping \ and / as they are not special in HTML. (/ at least not when not in in the </ or /> combo, and < and > are escaped).
(patch contains a BOM, so it's also a test for the test infrastructure)
Comment #103
Heine CreditAttribution: Heine commentedLovely, json_encode options produce \uXXXX vs. \uxxxx. Adjusting both test and drupal_encode_json_helper to test and produce uppercase hexadecimals.
Comment #104
klonosThat's for d7. Right?
Comment #105
gausarts CreditAttribution: gausarts commentedTested #71 with jQuery 1.4.2.
Now I got:
instead of:
Subscribing. Thanks
Comment #106
Heine CreditAttribution: Heine commentedThe latest D6 patch for review is on #89.
The latest D7 patch for review is on #103.
Comment #107
klonosThanx Heine for letting us know.
btw, patches in #88 and #89 got accidentally named 479468 instead of 479368
Comment #108
realityloop#89 works with d6's default jquery library, but draggable handles don't appear when managing fields of content types using jQuery 1.4.2.
Comment #109
gausarts CreditAttribution: gausarts commentedJust for reference, there is a working patch for tabledrag.js at http://drupal.org/node/893538
Comment #110
Heine CreditAttribution: Heine commentedjQuery updates are NOT SUPPORTED. Random breakage due to swapping out core components is in fact to be expected.
Please focus on the JSON produced.
Comment #111
Heine CreditAttribution: Heine commentedOn instigation by chx; needs a test for drupal_json_helper as well or we lose coverage on PHP 5.3 testservers.
Comment #112
manu manu#71: 479368_drupal_to_js_all_escaped.patch queued for re-testing.
Comment #113
Heine CreditAttribution: Heine commentedPlease don't change issues from needs work to needs review - it will be missed in sprints this way.
Comment #114
eatsleepdev CreditAttribution: eatsleepdev commentednm
Comment #115
Heine CreditAttribution: Heine commentedComment #116
chaiwei CreditAttribution: chaiwei commented#71: 479368_drupal_to_js_all_escaped.patch queued for re-testing.
Comment #117
Heine CreditAttribution: Heine commentedComment #118
walidvb CreditAttribution: walidvb commentedsubscribe.
Comment #119
Heine CreditAttribution: Heine commentedPatch adds the additional test for the helper to keep coverage on PHP 5.3+
See the following comments for background.
#479368-88: D7: Create RFC compliant HTML safe JSON
#479368-92: D7: Create RFC compliant HTML safe JSON
Any followup not furthering this issue will be unpublished.
Comment #120
trungonly CreditAttribution: trungonly commented#71: 479368_drupal_to_js_all_escaped.patch queued for re-testing.
Comment #121
Heine CreditAttribution: Heine commentedComment #122
Heine CreditAttribution: Heine commentedBetter title.
This issue is most urgent for D6 and interaction with external JSON consumers:
drupal_json_encode produces JSON that risks being interpreted by HTML parsers (\" escape sequences).
drupal_to_js (D6) produces total garbage for RFC compliant parsers (iPhone, random APIs, later jQuery versions).
Comment #123
Stalski CreditAttribution: Stalski commentedWorks and looks great.
Comment #124
Heine CreditAttribution: Heine commentedCrosspost
Comment #125
Heine CreditAttribution: Heine commentedNeeds another review IMO.
Comment #127
Heine CreditAttribution: Heine commentedDrupal 6 backport with a testbot safe name.
Comment #128
mdupont+1 for the D6 patch, much needed as commonly used features are exposed to bugs that are hard to track (such as Ajax-enabled Views). I've seen U+2028 characters in text fields if copy-paste is used with some editors.
Comment #129
brad.bulger CreditAttribution: brad.bulger commentedthat's just the problem i was trying to solve, a U+2028 character breaking Views Ajax filtering in a D6 site. i applied the last patch from #127 and it just shifted the issue - instead of getting a failure on the Ajax call, there's a syntax error later on when it tries to use the response result. i had to change these lines:
to set those to blanks instead of the escaped values.
i think the gist of this may be outside the scope of this issue, but i can try to make a simpler test case if anyone would want to see it.
Comment #130
klonos...well if this translates to potential issues in the future, I'd be one of them people that'd care to see a test case (and a solution eventually).
Comment #131
TommyChris#71: 479368_drupal_to_js_all_escaped.patch queued for re-testing.
Comment #132
Heine CreditAttribution: Heine commentedI've created #1086098: D6 JSON is total garbage - Create RFC 4627 compliant HTML safe JSON as the issue is most problematic in D6 (per #122). Maybe it can get some traction there.
Comment #133
Heine CreditAttribution: Heine commentedbrad, do you know the exact location of the syntax error? This might be a problem that should be solved by not offering e2 80 a8 and friend in _this_ response in the first place.
Comment #134
brad.bulger CreditAttribution: brad.bulger commentedit's the eval() at line 155 of views/js/ajax_view.js if that's of use.
Comment #135
Heine CreditAttribution: Heine commentedYes, then the problem IMO is in the entity calling drupal_json. callback should be a valid JS callback.
Comment #136
vzlahagen CreditAttribution: vzlahagen commented#71: 479368_drupal_to_js_all_escaped.patch queued for re-testing.
Comment #137
andypostDo the d.o infrastructure have PHP 5.3 slaves? Anyway this good to go
Comment #138
pillarsdotnet CreditAttribution: pillarsdotnet commented#119: do479368-drupal_json_encode.patch queued for re-testing.
Comment #140
Tor Arne Thune CreditAttribution: Tor Arne Thune commentedComment #141
keichee CreditAttribution: keichee commentedmy patch :)
Comment #143
keichee CreditAttribution: keichee commentedokay, as per requested by the testing module, patch will just escape <, > and & only.
Comment #144
keichee CreditAttribution: keichee commentedComment #146
keichee CreditAttribution: keichee commentedadded <, >, ', " and & to simpletest test module to comply to drupal_json_encode escape function
please test attached.
Comment #148
keichee CreditAttribution: keichee commentedfine, remove that backslash for double quote
Comment #149
keichee CreditAttribution: keichee commentedComment #151
keichee CreditAttribution: keichee commented#146: drupal_json_encode.patch queued for re-testing.
Comment #152
keichee CreditAttribution: keichee commentedhmm tweaked simple test a little bit. hopefully to see green
Comment #153
keichee CreditAttribution: keichee commentedComment #154
catchSo #152 looks great but that requires 5.3 - however D8 now has a PHP 5.3 requirement so it should be fine for that.
#119 needs a re-roll for D7, no longer applies. Also looks like it needs the test changes from #152 added to it (adding '\').
Comment #155
xjmHere's that reroll of #119 for D7. (Edit: missed the changes from #152.)
Comment #156
xjmAh, the character that needs to be added is a single quote, not a slash.
Comment #157
xjmSetting to 7.x temporarily to test the D7 patch with the single quote added.
Comment #158
xjmComment #159
xjm#157 passes, so moving back to 8.x.
Comment #160
catchOh I see it's an escaped single quote, that'll teach me to review patches from my phone.
A double quoted single quote by itself is no more readable, at least I spotted something was missing I guess...
#157 I haven't properly reviewed but it looks like Heine and Damien have. One thing we should do is at least sync the json_encode arguments with D8 so they're more easily scannable:
vs.
So #152 looks RTBC for D8 to me now that we required 5.3 there, so let's get that in then it looks like at most minor changes to the D7 patch to get that ready.
Comment #161
xjmIn #152, I don't understand why the double quote is excluded from the patch. I mean, I see that earlier tests broke with it, but is there a logical reason it's excluded?
Looking over it again, I think #157 needs the single quote added to the helper test's
$html_unsafe
in addition to the change catch suggests. Also, the same question about the double quote applies there. The 5.3 call tojson_encode()
includes the double quote as an argument, but the 5.2 switch statement does not include it in the same section as< > ' &
.Comment #162
xjmTentatively back to NR until I can wrangle an answer to #161 out of someone.
Comment #163
keichee CreditAttribution: keichee commentedi just copied the code from php.net documentation
i dont know why it fails when i include it in simpletests, thats why i removed it.. i guess its a simpletest bug or something?
Comment #164
xjmLooking at the test results for the last one that failed, the test that failed was:
This means that the double quote was not properly escaped. Removing it from the test fixes that test error, but there is an underlying problem escaping it. So, I think this is NW, then.
Comment #165
xjmComment #166
xjmI see now why it is failing. The JSON-encoded string is delimited with double quotes, e.g.:
prints:
"This is my string with \u0022 \u0026 \u0027 \u003C \u003E characters"
with the double quotes at the beginning and end.
So rather than asserting that the string does not contain a double quote, it should assert that the double quote only appears at the beginning and end of the string?
Comment #167
xjmLet's try this.
Comment #169
xjmOh,
$json
is getting re-defined as an array and not just the string. We need to test the string delimiters only before this. Reusing the same var seems not-best-practice, but I guess that'd be a separate issue.Comment #170
xjmComment #171
xjmComment #173
xjm#152: drupal_json_encode.patch queued for re-testing.
Comment #174
xjmSo, the solution in #152 is now causing taxonomy autocomplete fields to fail, when it didn't as of July 3.
I'll see if #157 has the same problem now.
Comment #175
xjmComment #176
xjm#157: drupal-json-encode-479368-157.patch queued for re-testing.
Comment #177
catchA taxonomy patch just went in, switching to drupal_implode_tags().
Comment #178
xjmLooks like catch is referring to #1000736: Term reference autocomplete widget having problems handling terms with commas.
Comment #179
xjmSo I guess it is maybe failing because the str_replace() in drupal_implode_tags() matches double quotes and not the encoded variant? Do we need to patch that function as well? I don't think not encoding double quotes should be an option.
Comment #180
xjmWhen I actually test this in a browser, the patch does not seem to alter the behavior. E.g., if I have a term
foo, bar
, it is returned as an autocomplete suggestion when I typefoo
regardless of whether or not the patch is applied. So I think the conflict is with the new test for #1000736: Term reference autocomplete widget having problems handling terms with commas:Without the patch, this is the text when I run the test:
{"\"xc0wcH6G, dfzKJQXl\"":"xc0wcH6G, dfzKJQXl"}
With the patch applied, this is the result:
{"\u0022vuzR52oT, jcy3G4Sp\u0022":"vuzR52oT, jcy3G4Sp"}
So we just need to change the assertion to match
\u0022
instead of\"
.Comment #181
xjmAttached is the same as #170, but adjusts the taxonomy test to match encoded rather than escaped double quotes wrapping the term.
Comment #182
xjmComment #183
xjmJust a comment fix to match #119.
Comment #184
xjmAnd, the D7 version with support for 5.2, based on Heine's patch in #119.
I ran JSON and taxonomy form tests for this locally on both 5.2 and 5.3; the tests passed. I also tested the functionality of autocomplete fields and they autocompleted properly with and without commas in the term names.
However, assuming this passes testbot, it would be good to have some people testing it in a situation where compliant JSON is required, e.g. iPhone. A review from Heine would also be good.
For the record, the D6 version of this is in #1086098: D6 JSON is total garbage - Create RFC 4627 compliant HTML safe JSON.
Comment #185
klonosThanx Jess for your hard work on this ;)
Comment #186
xjmTagging issues not yet using summary template.
Comment #187
xjmSummary added.
Comment #188
sunShorten to: "PHP version cannot change within a request."
Let's replace the last comment with the first, and also move the second to the last, which clarifies what is actually different below 5.3, and why on earth we need that helper.
(and elsewhere) There should be a blank line between switch case statements that do not fall-through to the next. See http://drupal.org/coding-standards
parser*s*
Bad string concatenation style.
No space after "empty".
-3 days to next Drupal core point release.
Comment #189
xjmComment fix to D8 to bring it in line with @sun's comment change recommendation, plus an additional comment clarifying the use of
\u0022
in the taxonomy autocomplete test.Comment #190
xjmAnd, finally, the d7 reroll, incorporating both the comment changes from #189 and sun's comment and code style fixes from #188. Phew!
Comment #190.0
xjmUpdated issue summary.
Comment #190.1
xjmLinked latest patches in issue summary.
Comment #191
catch#189: json-encode-d8-479368-189.patch queued for re-testing.
Comment #193
xjmRerolled for core/.
Comment #194
xjmComment #195
Tor Arne Thune CreditAttribution: Tor Arne Thune commentedIs this correct for D8?
Comment #196
xjmNope. And... that's weird, and has nothing to do with the patch.
Edit: I bet
default.settings.php
was write-protected and so didn't get changed properly when I rebased. Oops! Thanks for the catch.Comment #197
xjmFixed.
Comment #198
chx CreditAttribution: chx commentedIn #94 Heine explains why we dont want a dumb string replace in any version so the D7 version is good to go functionality but I would love to banish that abomination into a helper include.
The D8 version is OK because json_encode does the right thing for the Ux0000-Ux0019 characters and the few other special chars JSON-wise and with the new options it even becomes HTML safe.
Comment #199
xjmAttached creates a new file,
json-encode.inc
. This is for D7 only.The JSON and taxonomy autocomplete tests pass locally with PHP 5.2. Setting to D7 temporarily to make sure this patch passes the full test suite for D7.
The D8 patch is still in #197.
Comment #200
xjmComment #201
chx CreditAttribution: chx commentedThis is good to go.
Comment #202
chx CreditAttribution: chx commentedI mean...
Comment #203
Dries CreditAttribution: Dries commentedI reviewed this patch and it looks good. However, it needs a quick re-roll for the 'core'-directory changes.
Comment #204
xjmSee #197. The D8 patch is available there.
Comment #204.0
xjmRemove nonsensical "only."
Comment #205
Dries CreditAttribution: Dries commentedOops, you're right xjm. Committed to 8.x and moving to 7.x for @webchick.
Comment #206
webchickHm. Seems a bit ick to make a whole separate include file for just this one function. But I guess if it was good enough for Dries, it's good enough for me. ;) Thanks for the expanded test coverage.
Committed and pushed to 7.x.
Comment #207
webchickOopsie. Marking down to D6.
Comment #208
catchD6 is in #1086098: D6 JSON is total garbage - Create RFC 4627 compliant HTML safe JSON due to having to support PHP 4.
Comment #209
klonos...minor title edit to make this clear(er) + removing the backport tag.
Comment #211
alphawebgroupBut after upgrading to D7.12, I have some problems with my ajax interface, which after ajax-reload returns html with russian textes. I use ajax_command_html for ajax output with php 5.2.17.
I've tried to use old version of the drupal_json_encode() function and it looks good...
and drupal_json_encode_helper() returns abracadabra instead of the normal readable text.
I think the problem is in drupal_json_encode_helper() function, lines 23-77...
But I'm not sure...
Comment #211.0
alphawebgroupUpdated issue summary.
Comment #212
xjm@alweb: I'd suggest opening a new issue, with detailed steps to reproduce the problem you're having. Thanks!