Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
The documentation for check_plain() says that it returns,
An HTML safe version of $text, or an empty string if $text is not valid UTF-8.
However, if $text is not valid UTF-8, PHP < 5.4 raises a warning message (and returns empty string). The warning should not be raised.
Steps to reproduce: Call check_plain() with an invalid UTF-8 string.
Comment | File | Size | Author |
---|---|---|---|
#100 | core-check_plain-393538-100.patch | 853 bytes | David_Rothstein |
#84 | core-check_plain-393538-84.patch | 750 bytes | Liam Morland |
#68 | core-check_plain-393538-68.patch | 1.92 KB | Liam Morland |
#65 | core-check_plain-393538-65.patch | 978 bytes | Liam Morland |
#62 | core-check_plain-393538-62.patch | 982 bytes | Liam Morland |
Comments
Comment #2
swentel CreditAttribution: swentel commenteddescription' +> t(''),
should be
description' => t(''),
Comment #3
dmitrig01 CreditAttribution: dmitrig01 commentedCan you tell i didn't try it?
Comment #5
valthebaldBumping version, rerolling for 8.x, marking for backporting
Comment #6
Heine CreditAttribution: Heine commented' Should be escaped as
'
Comment #7
valthebaldCorrected and added another check - for non-HTML quotes
Comment #8
valthebald0x20000 converted to string before passing to check_plain, thanks, PIFR.
Also, taken invalid unicode character according to Wikipedia
Comment #10
valthebaldInteresting.
Should check_plain escape non-HTML quotation marks? Currently it does not.
Comment #12
valthebaldThere is a catch here.
As a part of the test, we pass incorrect unicode sequence to check_plain(), and expect check_plain() to return empty string.
It does, and assertion works fine.
But the test fails due to PHP warning produced by htmlspecialchars() function.
Any idea how to overcome this?
I thought of 2 solutions:
1) Change check_plain() implementation to call @htmlspecialchars() - seems bad, or
2) place set_error_handler() inside test
Comment #13
xjmOf note: https://bugs.php.net/bug.php?id=47494
Comment #14
xjmNot sure the best way to proceed here. As several folks in IRC said,
check_plain()
should really just clean input silently. So do we stick a@
in front, or what? Is it actually cleaning the input, or failing to entirely?Meanwhile, code style nitpicks:
This class should have a docblock. See http://drupal.org/node/1354#classes and http://drupal.org/node/1354#functions.
Should be "Tests check_plain()."
The last argument (message assertion texts for the test report) should not be translated. (Yes, many you will see in the codebase are currently wrapped in t(), but there is an issue to remove them.)
Comment #15
valthebaldAfter some short discussion in IRC,
I have suppressed htmlspecialchars() output in check_plain().
In addition to making tests happy, it gives bonus of anti-flooding defense (massive input containing incorrect unicode sequences)
There is related PHP bug here - https://bugs.php.net/bug.php?id=47494
Comment #16
Niklas Fiekas CreditAttribution: Niklas Fiekas commentedOn invalid UTF8? It should drop the whole string and return an empty one, because there are security problems with trying to actually only correct a few bytes. See ENT_IGNORE flag on htmlspecialchars().
Comment #17
valthebaldComment #18
valthebaldFixed #14 stuff: added docblock, spell fix, and removed t()
Comment #19
xjmI'm not sure the
@
is a good solution, because that would/could hide legitimate feedback on malformed data, no? As Crell put it in https://bugs.php.net/bug.php?id=47494:Wouldn't we then just be expanding the scope of the PHP bug?
Comment #20
Crell CreditAttribution: Crell commentedSuppressing errors is the wrong answer in all use cases except one: ldap_open() triggers a warning on invalid login rather than returning false, so you must use it there. I cannot accurately describe my opinion of that "feature" of PHP in a family-friendly forum.
So no, @ is not the answer here. Rather, we should document the underlying PHP critical bug (I don't care what the PHP devs say, it's a critical bug that they refuse to fix. See previous comment about family-friendly forums.) To keep tests working, probably swap out the error handler right around that line combined with extensive documentation about why PHP is broken.
Incidentally, this is why trigger_error() is never the right answer.
Comment #21
xjmWell, I'm also not sure that adding a special workaround just for the test is a good idea. It seems to me that the test exposes something that could trigger a slough of unwanted notices in the wild, right? So would we want to make a change to
check_plain()
itself?Comment #22
valthebald1. Simpletest explicitly assumes, that code under test must not throw messages. Otherwise, tests fail.
2. That means, that current implementation of check_plain() contains bug.
I hope we all agree on 1 and 2?
Main question that remains, what is desired behavior in case of detected invalid input.
Yes, it is possible to intercept error condition in check_plain().
For example, it is possible to
or
instead of just
My answer to remaining question: valid behavior of mysterios_error_handler() is silent return, so let's not cheat ourselves and use @ suppression.
Any other mysterios_error_handler behavior, including attempt to log error (i.e. with watchdog()) opens vulnerability to flooding DoS attack by sending massive invalid requests. I've got explicit instruction not to log invalid user input during every security code review that I had, be it PHP or not, from different reviewers.
Note 1: error suppressing does not mean that caller of check_plain() cannot detect invalid input. In case of error check_plain() returns empty string.
Note 2: Don't get me wrong - in general, I am a strong opponent of error suppression, but not in case where error suppression is a goal.
Comment #23
valthebaldSomewhat related issue - #652394: Aggressive watchdog message assertion
Comment #24
sunThis is tested already...?
http://api.drupal.org/api/drupal/core--modules--simpletest--tests--commo...
Comment #25
valthebaldThe following piece of code hides the problem instead of solving it (common.test, lines 372-376):
check_plain() messages are suppressed in test, but not in real usage.
So yes, there's no need in a new test, yet the problem with incorrect check_plain() behavior remains
Comment #26
sunThe issued warnings are expected behavior, documented on php.net, nothing wrong with it.
Which incorrect behavior? This issue is categorized as a bug, but I'm missing a concrete bug report in here?
We can certainly extend the tests to verify more expected behavior of htmlspecialchars() to ensure it treats the most important characters as intended. But I fail to see where incorrect behavior is involved?
Lastly, note that code dealing with arbitrary user input normally invokes drupal_validate_utf8() on it, before attempting to process or output it (thus, before check_plain() and related functions are getting involved). That said, the primary source for arbitrary user input does not perform this validation yet; see #943722: Form API accepts values containing invalid UTF-8.
Comment #27
valthebaldValid code does not produce PHP messages.
check_plain() produces PHP messages.
Therefore - check_plain() contains invalid code. I call that incorrect behavior :)
htmlspecialchars() policy on messages is controversial, there is open bug on it (https://bugs.php.net/bug.php?id=47494), and finally, it's behavior is addressed with the new helpful parameters in PHP 5.4. Documentation of the problem does not solve the problem.
Comment #28
droplet CreditAttribution: droplet commenteda soultion from #1090290: admin/people/permissions/roles: htmlspecialchars(): Invalid multibyte sequence in argument in check_plain()
Comment #29
smartango CreditAttribution: smartango commentedhttp://drupal.org/node/652394#comment-6164352 was more related to this
Comment #30
Damien Tournoud CreditAttribution: Damien Tournoud commentedSuppressing errors hides bugs, here or elsewhere. Your logging policy in production is your problem. It is completely independent of this: it happens down the chain, when the error has been caught and sent to the watchdog implementation.
Comment #31
cweagansFixing tags per http://drupal.org/node/1517250
Comment #32
abedzoubi25 CreditAttribution: abedzoubi25 commented#18: check-plain-test-393538-18.patch queued for re-testing.
Comment #34
swentel CreditAttribution: swentel commentedComment #35
Niklas Fiekas CreditAttribution: Niklas Fiekas commentedI have to say I agree with #393538-27: Document that check_plain() can issue PHP messages on invalid UTF-8 input. Also note that #1090290: admin/people/permissions/roles: htmlspecialchars(): Invalid multibyte sequence in argument in check_plain() has been closed as a duplicate of this one. What was the argument for "won't fix"?
Comment #36
mattjbondi CreditAttribution: mattjbondi commentedI use rawurlencode() and decode for my input and works...
Comment #37
asPagurus CreditAttribution: asPagurus commentedI works with Drupal 6.28 and have same problem.
And I Have problem with sequrity. One of bots that walks through my site, has error in its engine (imho) and does not recognize construction as
<a href="http://address" class="something">
. And it trying to get page with address:http://address"%20class="something
And I have many errors in my log about incorrect byte sequences...
But my site is ok :)
And what have I to do?
IMHO it is strong sequrity issue... May be not today but tomorrow....
Sorry for my english :)
Comment #38
Crell CreditAttribution: Crell commentedPlease don't reopen irrelevant issue threads. That should be filed as a new support request.
Comment #39
int_ua CreditAttribution: int_ua commentedHey, are you aware that it's kind of critical? The whole site dies after just trying to add an empty node with UTF-8 title. Is there a different issue for D7 that I can't find?
Comment #40
Liam MorlandThe documentation for check_plain() says that it returns,
But that is not true. If $text is not valid UTF-8, it raises a warning message (and returns empty string). At the very least, the documentation has to be changed to reflect what the function actually does.
But I think the documentation describes what people want it to do. Here are two patches. The first makes check_plain() do exactly what the documentation says. The second has it return FALSE if the UTF-8 is invalid and updates the documentation to match.
Comment #41
valthebaldI like approach from #40. We have an indication of incorrect UTF-8 (format_string() returns FALSE), but no PHP-level warning. Nice!
Comment #42
sunIf I understand https://bugs.php.net/bug.php?id=47494 and http://php.net/manual/en/function.htmlspecialchars.php correctly, then all you have to do is to update to PHP 5.4.
There are new flags like ENT_SUBSTITUTE, but for now, we still want the empty string behavior.
In any case, the warnings seem to be gone with PHP >=5.4.
Off-topic here: I think we actually want to add a PHP version condition that triggers two different code paths there:
But that's for another issue.
Comment #43
Liam MorlandUntil PHP 5.4 is the minimum required, one of the patches in #40 should go in. Perhaps the version test that you suggest could bypass the call to drupal_validate_utf8() for those using 5.4 or above.
Comment #44
Liam MorlandThis is no longer an issue for D8. Reroll for D7.
Comment #45
mgifford44: drupal_393538_check_plain.patch queued for re-testing.
Comment #46
latin_miracle CreditAttribution: latin_miracle commented#44 worked like a charm on 7.26.
Thanks Liam Morland!
Comment #47
valthebaldWorked for me too
Comment #48
BBCSame here. #44 seems to have eliminated the error for me on 7.27.
Thanks!
Comment #50
David_Rothstein CreditAttribution: David_Rothstein commented44: drupal_393538_check_plain_return_false.patch queued for re-testing.
Comment #51
David_Rothstein CreditAttribution: David_Rothstein commented44: drupal_393538_check_plain.patch queued for re-testing.
Comment #52
valthebaldStill looks good
Comment #54
David_Rothstein CreditAttribution: David_Rothstein commented44: drupal_393538_check_plain.patch queued for re-testing.
Comment #55
David_Rothstein CreditAttribution: David_Rothstein commentedLooks like a testbot fluke... but which patch from #44 is RTBC anyway? (On first glance returning an empty string makes more sense to me.)
Also wondering about the performance implications of this. Didn't we deliberately choose a minimum PHP version for Drupal 7 so that (unlike Drupal 6) check_plain() could just be a simple wrapper around htmlspecialchars() with no extra regular expressions or anything like that? See this code from system_requirements():
Comment #57
Liam MorlandIt could be either patch. Since "(string) FALSE" is empty string, it is very unlikely to make a difference. However, the one called drupal_393538_check_plain.patch just fixes the bug instead of making the very minor change to the API documentation.
Comment #58
Liam MorlandHaving said that, I do like sun's suggestion in #42. That would also be an API change, since it removes invalid characters instead of returning empty string.
I think we should commit drupal_393538_check_plain.patch since that is the minimal change to fix the bug, making behavior match documentation. A follow-up issue could look at API improvements.
Comment #59
Liam MorlandThe attached patch is based on sun's idea from #42 combined with my patch from #44. I understand that the warning is only generated on PHP < 5.4, so the check is only done on the older version. That should address the performance concern.
I like sun's other ideas. I think they should go in a separate issue.
Comment #60
Liam MorlandSorry, use this one.
Comment #62
Liam MorlandReroll.
Comment #65
Liam MorlandFix silly typo.
Comment #66
mgiffordLooks like a pretty simple patch. Not sure how it would cause a security problem.
The issue summary could use some work.
It should have unit tests...
I'd also like to have a simple description of how to replicate the problem so we can verify it's been fixed.
Comment #67
Liam MorlandComment #68
Liam MorlandPatch with tests. The only difference in the test is that the error suppression is no longer needed.
Comment #70
Liam MorlandI don't think the failure is related to the patch.
Comment #72
mgiffordPatch applies nicely on SimplyTest.me....
Drupal 7: PHP 5.2.5 or higher (5.3 recommended). - https://www.drupal.org/requirements
So don't we still need the check? It was there in #59.
Comment #73
Liam Morland#68 still does the PHP version check. The only difference with #65 is that it removes the unneeded error suppression from the tests.
#59 had the bug that it would never run version_compare() because $php_lt_54 was always set.
Comment #74
mgiffordSorry, I must have been looking at an earlier version.. It's definitely there in #68.
Comment #77
Liam MorlandRestoring previous status.
Comment #80
mgiffordbad bot...
Comment #82
David_Rothstein CreditAttribution: David_Rothstein commentedThis is still a performance hit on the majority of Drupal sites (which aren't running PHP 5.4 yet)... how much of one I'm not sure, but a new function call which itself calls preg_match() on every single piece of text that passes through check_plain() is not obviously negligible impact.
As others have mentioned above, this warning is documented PHP behavior and we should just be able to add it to the check_plain() documentation also... right?
Comment #83
Liam MorlandTime is healing this wound as PHP 5.3 is no longer supported. This patch just updates the documentation.
Comment #84
Liam MorlandSorry, use this one.
Comment #86
Crell CreditAttribution: Crell commentedI think this is reasonable. Drupal 7 still supports PHP 5.2+ (as in, runs on) but that doesn't mean we need to bend over backward for people running completely unsupported software.
Comment #90
Liam MorlandComment #94
Liam MorlandI love it when a comment-only patch fails testing. ;-)
Comment #95
mgiffordindeed.. wonder how many times we'll have to retest this...
Comment #98
Liam MorlandComment #99
David_Rothstein CreditAttribution: David_Rothstein commentedI tested this on PHP 5.3 and it did not behave as described in the code comment - I did not get a PHP warning. Played around with the PHP configuration a bit and https://bugs.php.net/bug.php?id=47494#1306953388 is an accurate description of how it works (backwards as that is...)
We don't have to go into that much detail in the comment, but we shouldn't imply that there will always be a warning either.
Regarding PHP 5.3, I do think it's important we still pay attention to that since it's still by far the most used version of PHP. PHP 5.4 finally just caught up to PHP 5.2's usage (sadly) and is nowhere near PHP 5.3 yet (24.5% vs. 47.4%, as of today - http://w3techs.com/technologies/details/pl-php/5/all). Also, various Linux distros, etc., will still be providing security support for PHP 5.3 for years to come.
Comment #100
David_Rothstein CreditAttribution: David_Rothstein commentedMaybe just something like this...?
Comment #101
Liam MorlandMake sense to me.
Comment #102
David_Rothstein CreditAttribution: David_Rothstein commentedCommitted to 7.x - thanks!
Comment #104
David_Rothstein CreditAttribution: David_Rothstein commentedActually, not sure how I missed this but that definitely should be the other way around ("string" before the variable).
Fixing now via a quick followup commit.
Comment #107
bmateus CreditAttribution: bmateus commentedThis error just appeared on my D7.38 installation, with PHP 5.3.
Had to solve it by patching includes/bootstrap.inc on line 1566.
Thought the patch was already applyed to the new versions of core. Isn't it?
Comment #108
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented@bmateus, the documentation patch is the one that was committed to core - we didn't commit anything that overrides PHP 5.3's behavior.
Comment #109
cilefen CreditAttribution: cilefen commented#2644106: htmlspecialchars(): Invalid multibyte sequence in argument in check_plain