check_plain() spends 6% of page execution time in self, and 6% in drupal_validate_utf8(), nearly all of this time is just the expense of calling functions thousands of times, not what they're actually doing. Also drupal_validate_utf8() only exists because IE6 is stupid.

By moving the same preg_match() into check_plain(), we save about 50% of the time spent in profiling, this should be backportable.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch’s picture

Title: IE6 and one function call makes every Drupal site slow » IE6 and one function call makes every Drupal site slower

whoops.

sun’s picture

Hm. drupal_validate_utf8() additionally checked for empty()... why not here?

Also, even after some discussion in IRC, I'm still not sure whether we can't entirely drop that IE6 support function. See also #308865: Drop IE6 support in Drupal core.

sun’s picture

Additional insight for why this function exists: https://security.drupal.org/node/116

I'd propose to re-think whether we need this entire check at all. (See also the issue I linked to in my previous follow-up)

catch’s picture

FileSize
1.3 KB

Added the empty(). We should consider dropping at least some aspects of IE6 support in HEAD I think, but we can also backport this to 5/6.

Still twice as fast, this morning check_plain() is taking 6% / 3% instead of 12% / 6% but still good results.

catch’s picture

Title: IE6 and one function call makes every Drupal site slower » Optimize check_plain()

More sober title.

pwolanin’s picture

Title: Optimize check_plain() » IE6 and one function call makes every Drupal site slower

empty() will return TURE for the string '0' as well as various non-strings. Are you sure you want to use that vs. the strlen==0 check?

You could also cast to a string and compare to the empty string, and return the cast variable.

catch’s picture

FileSize
893 bytes

Just checked HEAD and we get between 1-3 empty strings per page, so how about just removing that altogether?

pwolanin’s picture

@catch- makes sense to me - if there are that few we are not gaining anything since it costs lots of extra function calls.

sun’s picture

Status: Needs review » Needs work

We should append a @see drupal_validate_utf8() to the PHPDoc though, and perhaps additionally explain why we are duplicating code.

Speaking of duplication, shouldn't we apply the same to filter_xss() (the second instance) ? So we'd keep drupal_validate_utf8() for (rather special) contrib modules that need it only?

catch’s picture

Status: Needs work » Needs review
FileSize
2.46 KB

Added the phpdoc, did similar in filter_xss() and removed the strlen from drupal_validate_utf8().

sun’s picture

Yay! Thanks for updating!

+ *   valid UTF-8. @see drupal_validate_utf8().

(multiple occurrences) @see should always be on a separate line.

 function check_plain($text) {
-  return drupal_validate_utf8($text) ? htmlspecialchars($text, ENT_QUOTES) : '';
+  // Validate strings as UTF-8 to prevent cross site scripting attacks on
+  // Internet Explorer 6. We duplicate the same preg_match() here to avoid
+  // additional function calls, since check_plain() is often called hundreds of
+  // times per request.
+  return (preg_match('/^./us', $text) == 1) ? htmlspecialchars($text, ENT_QUOTES) : '';

This lengthy comment should rather go into the PHPDoc.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
2.69 KB

Fixed the notice and did some more comment tidying.

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me - nice to have to validated performance improvements.

jrchamp’s picture

The only problem with removing the strlen from drupal_validate_utf8 is that it changes the result for empty string. An empty string is a valid utf8 string, whereas the regex requires at least one character.

catch’s picture

Status: Reviewed & tested by the community » Needs review

Hmm, I'm wondering if we could switch the order - so if the preg_match() returns TRUE we're done, if it doesn't, then we do the strlen check and return true if it's an empty string? It's not a problem in check_plain() because we return an empty string anyway if the preg_match() fails. Or we could just delete that hunk from the patch since drupal_validate_utf8() isn't in the critical path anyway.

Damien Tournoud’s picture

strlen() should be lightening fast. What about benchmarking with and without that check?

catch’s picture

FileSize
1.31 KB

We only need to duplicate the preg_match() in check_plain() since that's in the critical path, so removed other changes.

catch’s picture

Title: IE6 and one function call makes every Drupal site slower » Optimize check_plain()
jrchamp’s picture

Status: Needs review » Reviewed & tested by the community

This was already RTBC. It duplicates a small amount of code (referencing the original) in a critical path which reduces page load by a noticeable amount. Sounds good to me.

catch’s picture

So Moshe pointed out that since xdebug has to measure function calls, it's not the best way to measure performance improvements gained by removing function calls, which was a good point. Beyond this, I also found via micro-benchmarking that having xdebug extension enabled at all (i.e. xdebug on, profiling off) adds overhead to function calls.

So here's micro-benchmarks for before/after check_plain() to verify the speed-up in check_plain() itself. I've attached the benchmarking script to this issue so you can run it locally and/or see the logic:

(three runs, 2,000,000 iterations of each loop).

catch@catch-laptop:~/www/7$ php check_plain.php
nothing: 0.467885017395 seconds
function no_op(): 1.22173714638 seconds
check plain old: 8.99900794029 seconds
check plain new: 5.9074318409 seconds
catch@catch-laptop:~/www/7$ php check_plain.php
nothing: 0.499942064285 seconds
function no_op(): 1.27891206741 seconds
check plain old: 8.88757514954 seconds
check plain new: 5.72876691818 seconds
catch@catch-laptop:~/www/7$ php check_plain.php
nothing: 0.451987028122 seconds
function no_op(): 1.2409799099 seconds
check plain old: 8.81144690514 seconds
check plain new: 5.80533409119 seconds

With xdebug enabled:

nothing: 0.719228029251 seconds
function no_op(): 6.70149278641 seconds
check plain old: 38.8574278355 seconds
check plain new: 24.8503789902 seconds 

Actually in this case, the relative speed up in check_plain() is the same, it's just the absolute times which are different. Either way, it validates this patch, but I'm hoping to figure out how best to verify these changes on some of the other issues.

catch’s picture

FileSize
1.42 KB

Forgot to attach the test script.

catch’s picture

And just for completeness, here's numbers for 10,000 iterations - which it's possible to get in HEAD on some pages(or at least the webgrind screenshots had 8,900).

nothing: 0.0027801990509 seconds
function no_op(): 0.0087149143219 seconds
check plain old: 0.0544929504395 seconds
check plain new: 0.037360906601 seconds
catch@catch-laptop:~/www/7$ php check_plain.php
nothing: 0.00294899940491 seconds
function no_op(): 0.00863099098206 seconds
check plain old: 0.0572259426117 seconds
check plain new: 0.0334417819977 seconds

catch@catch-laptop:~/www/7$ php check_plain.php
nothing: 0.00277781486511 seconds
function no_op(): 0.00840902328491 seconds
check plain old: 0.0663850307465 seconds
check plain new: 0.0283868312836 seconds

That's a saving of around 25-30ms.

sun’s picture

+++ includes/bootstrap.inc	18 Jul 2009 19:54:35 -0000
@@ -1024,11 +1024,21 @@ function drupal_unpack($obj, $field = 'd
+ * check_plain() also validates strings as UTF-8 to prevent cross site scripting

...slightly exceeds 80 chars.

Beer-o-mania starts in 22 days! Don't drink and patch.

catch’s picture

It only does that if you include the + at the beginning of the line, otherwise it's exactly 80 characters ;)

c960657’s picture

In PHP 5.2.5 and newer, htmlspecialchars() automatically handles partial multibyte sequences (according to the release notes).

The following approach reduces the running time of check_plain() even further (approx. 11% AFAICT) when using a PHP 5.2.5 or later. With older versions, it is about as fast as than the current implementation.

function check_plain($text) {
  static $new_php;
  if (!isset($new_php)) {
    $new_php = version_compare(PHP_VERSION, '5.2.5', '>=');
  }
  return ($new_php || preg_match('/^./us', $text) == 1) ? htmlspecialchars($text, ENT_QUOTES, 'UTF-8') : '';
}
sun’s picture

Let's prove this.

NOTE: As of now, this is NOT the patch that is RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

sun’s picture

Status: Needs work » Reviewed & tested by the community

Complete removal of check_plain(): (sorry, my machine is slow)

HEAD without patch:

# ab -c 2 -n 100 http://drupal.test/

Document Path:          /
Document Length:        10865 bytes

Concurrency Level:      2
Time taken for tests:   38.092 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1128800 bytes
HTML transferred:       1086500 bytes
Requests per second:    2.63 [#/sec] (mean)
Time per request:       761.836 [ms] (mean)
Time per request:       380.918 [ms] (mean, across all concurrent requests)
Transfer rate:          28.94 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   13  11.7     16      62
Processing:   547  744 198.3    625    1641
Waiting:      531  728 197.8    609    1625
Total:        562  757 201.3    656    1656

Percentage of the requests served within a certain time (ms)
  50%    656
  66%    828
  75%    844
  80%    844
  90%    937
  95%   1281
  98%   1422
  99%   1656
 100%   1656 (longest request)

Patch applied:

# ab -c 2 -n 100 http://drupal.test/

Document Path:          /
Document Length:        10865 bytes

Concurrency Level:      2
Time taken for tests:   35.529 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1128800 bytes
HTML transferred:       1086500 bytes
Requests per second:    2.81 [#/sec] (mean)
Time per request:       710.589 [ms] (mean)
Time per request:       355.294 [ms] (mean, across all concurrent requests)
Transfer rate:          31.03 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   13   9.3     16      31
Processing:   547  694 146.5    687    1484
Waiting:      531  678 146.5    672    1469
Total:        562  707 145.7    687    1500

Percentage of the requests served within a certain time (ms)
  50%    687
  66%    750
  75%    766
  80%    766
  90%    859
  95%    937
  98%   1344
  99%   1500
 100%   1500 (longest request)

EDIT: I mistakenly copied the benchmarks in the wrong order :-/

sun’s picture

And of course, here is the updated patch.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

sun’s picture

Status: Needs work » Reviewed & tested by the community

Using a concurrency of 1:

HEAD:

# ab -c1 -n 100 http://drupal.test/

Document Path:          /
Document Length:        10865 bytes

Concurrency Level:      1
Time taken for tests:   49.904 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1128800 bytes
HTML transferred:       1086500 bytes
Requests per second:    2.00 [#/sec] (mean)
Time per request:       499.037 [ms] (mean)
Time per request:       499.037 [ms] (mean, across all concurrent requests)
Transfer rate:          22.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   10   7.7     16      16
Processing:   453  487  48.1    469     734
Waiting:      437  471  48.4    453     719
Total:        469  497  48.5    484     734

Percentage of the requests served within a certain time (ms)
  50%    484
  66%    484
  75%    500
  80%    500
  90%    562
  95%    609
  98%    687
  99%    734
 100%    734 (longest request)

Patch:

Document Path:          /
Document Length:        10865 bytes

Concurrency Level:      1
Time taken for tests:   48.169 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1128800 bytes
HTML transferred:       1086500 bytes
Requests per second:    2.08 [#/sec] (mean)
Time per request:       481.694 [ms] (mean)
Time per request:       481.694 [ms] (mean, across all concurrent requests)
Transfer rate:          22.88 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   10   7.7     16      31
Processing:   453  470  41.3    469     766
Waiting:      437  454  41.4    437     750
Total:        453  480  42.3    469     781

Percentage of the requests served within a certain time (ms)
  50%    469
  66%    469
  75%    484
  80%    484
  90%    484
  95%    531
  98%    719
  99%    781
 100%    781 (longest request)
 
webchick’s picture

Status: Reviewed & tested by the community » Needs review

Please don't mark patches RTBC until they're reviewed and tested.

Status: Needs review » Needs work

The last submitted patch failed testing.

Damien Tournoud’s picture

Status: Needs work » Needs review

#26 does make a lot of sense to me, but I would like to see that tested. I would not trust PHP to watch my food ;)

jrchamp’s picture

@Damz: Here is the output from my system testing the code from #26 in the script provided on #22.

nothing: 0.342183113098 seconds
function no_op(): 2.46541881561 seconds
check plain old: 14.3335449696 seconds
check plain new: 9.67884707451 seconds
check plain new 5.2.5: 6.89955997467 seconds

Damien Tournoud’s picture

I meant "test that htmlentities() does the correct thing regarding broken UTF-8 sequences".

Heine’s picture

htmlspecialchars assumes a default charset of ISO-8859-1 where no multibyte codes exist. In ISO-8859-1 mode, htmlspecialchars does not and cannot care about "invalid" multi-byte sequences.

Drupal strings are UTF-8; You must pass 'UTF-8' as the charset argument to get proper behaviour.

htmlspecialchars("Foo\xC0barbaz", ENT_QUOTES, 'UTF-8'); should return an empty string.

pwolanin’s picture

which patch is under consideration? #18? Maybe we should just get that in and have any further discussion to post-freeze.

The code in #26 (no patch) seems to me to be as far as we would want to go in terms of taking advantage of PHP 5.2.5+. We should not require it.

catch’s picture

htmlspecialchars("Foo\xC0barbaz", ENT_QUOTES, 'UTF-8');
That looks plenty of reason to keep a separate check_plain() wrapper. Since this is all internals, I don't see any reason why it couldn't go in after freeze as a performance improvement (and also be backported to D6 once we settle on one). One idea I had is maybe settings a constant for PHP version then checking that rather than the static - not sure how that compares in terms of speed.

sun’s picture

Yes, that was also my gut feeling when I replaced all instances of check_plain() with htmlspecialchars() directly. That won't work out for the "average developer and themer"... Additionally, it seems that the performance gain of that approach is minimal. So let's get back to the BC approach in #26.

Benchmarks, anyone?

smk-ka’s picture

FileSize
1.48 KB

What about dynamically defining check_plain(), depending on the result of the PHP version check? This results in the fastest possible implementation for suitable PHP versions, and a slower one for the rest.

jrchamp’s picture

I'm a big fan of #42. Defining the function so that we don't have to check the static on each call should save us even more time overall.

pwolanin’s picture

Status: Needs review » Needs work

#42 omits the basic optimization of duplicating the preg_match

smk-ka’s picture

Status: Needs work » Needs review
FileSize
1.48 KB

Inlined drupal_validate_utf8() as per #44.

pwolanin’s picture

Status: Needs review » Needs work

In the doxygen, "Uses drupal_validate_utf8()" is not really true. Maybe "uses the same method..."?

Also, the conditional define is abnormal for Drupal core, so it might be worth explaining in the comments why is doesn't look like http://api.drupal.org/api/function/drupal_clone/6 In oter words, this function is called so often for a non-cached page that any sub-calls add significant overhead and we thus put the preg code there directly adn conditionally define the function to avoid doing version_compare each time it's called.

jrchamp’s picture

From #18
+ * check_plain() also validates strings as UTF-8 to prevent cross site scripting
+ * attacks on Internet Explorer 6. We duplicate the preg_match() from
+ * drupal_validate_utf8() here rather than calling the function to avoid
+ * the overhead of an additional function call, since check_plain() may be
+ * called hundreds of times during a request.

Damien Tournoud’s picture

This needs to be benchmarked. As far as I know, conditionally defined functions cannot be cached by most (if not all) op-code caches. I personally would recommend the static variable approach.

pwolanin’s picture

hmm, well we certainly want the function to be cacheable in the opcode.

If we use a static, it should be a normal static (not use drupal_static()) since it doesn't make sense to me that the PHP version would ever change while the code is executing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
1.59 KB

ok, more like #18

pwolanin’s picture

Heine points out that I need === to distinguish FALSE from NULL

jrchamp’s picture

#51 appears to be about 6% faster than #26, and 33% faster than #18

nothing: 0.330275058746 seconds
function no_op(): 2.31866407394 seconds
check plain old: 14.1262290478 seconds
check plain new: 9.65815281868 seconds
check plain new 5.2.5: 6.80544900894 seconds
check plain new 5.2.5 pwolanin: 6.36292600632 seconds

It would be worthwhile to add #38 as a SimpleTest to ensure that the intended functionality remains intact.

sun’s picture

+++ includes/bootstrap.inc
@@ -1120,11 +1120,35 @@ function drupal_unpack($obj, $field = 'data') {
+  static $php525 = NULL;
+  
+  if ($php525 === NULL) {

Just use !isset() like everywhere else in core (and remove the = NULL from the static).

(Also note trailing white-space here)

+++ includes/bootstrap.inc
@@ -1120,11 +1120,35 @@ function drupal_unpack($obj, $field = 'data') {
+    $php525 = version_compare(PHP_VERSION, '5.2.5', '>=');
...
+  if ($php525) {

An additional cast to Boolean may speed up the further tests of the variable.

Otherwise, looks good.

This review is powered by Dreditor.

pwolanin’s picture

@sun - I assumed === might be faster than isset(), but I can do it that way.

The return value of version_compare() will be a boolean.

pwolanin’s picture

FileSize
3.23 KB

minor fixes per sun, plus a simple unit test case.

jrchamp’s picture

nothing: 0.338335990906 seconds
function no_op(): 2.31378507614 seconds
check plain old: 14.4185819626 seconds
check plain new: 9.58456897736 seconds
check plain new 5.2.5: 6.64569997787 seconds
check plain new 5.2.5 pwolanin: 6.31802201271 seconds
check plain new 5.2.5 pwolanin 2nd edition: 6.39244294167 seconds

About 1% slower than #51. Thankfully, isset isn't a function. :)

Thanks pwolanin! Let's get this in soon.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Marking #55 RTBC, wonderful job everyone.

hass’s picture

+

catch’s picture

FileSize
3.64 KB

Added an explicit @todo to drop the ugly when we drop support for either IE6 or PHP5.2.5

Dries’s picture

Status: Reviewed & tested by the community » Needs work

I support this approach, but most of the new PHPdoc at the top of the function needs to be moved (back) into code. The reader of the API documentation doesn't (and shouldn't care) about the fact that we inlined some code. Let's do a quick re-roll.

catch’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
0 bytes

Moved the docs inline. It's a pretty short function, so I just left it as a single block of documentation, no easy way to split it up.

Heine’s picture

Can we have more tests? With overlong three byte encoding for example?

pwolanin’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
3.28 KB

moved the bulk of the comments to the function body.

pwolanin’s picture

some possible test cases can be derived from http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

jrchamp’s picture

@Heine, @pwolanin: Is this what you were thinking?

     $text = check_plain("\xE0\x80\xAF");
     $this->assertEqual($text, '', 'check_plain() rejects invalid sequence "\xE0\x80\xAF"');

Note:
- \x2F succeeds as expected.
- \xC0\x2F fails as expected.
- \xC0\xAF and \xE0\x80\xAF fail in the preg_match as expected, but unexpectedly does not fail in the htmlspecialchars.

The behavior for these 4 test cases is identical in 5.1.6 and 5.2.10, except that in 5.1.6 a PHP Warning is generated for \xC0\x2F from htmlspecialchars.

Perhaps I am doing something wrong; I would appreciate it if someone who knows more about UTF-8 than I do could let me know if this is correct.

pwolanin’s picture

We shoudl build a little bigger test set - but basically, if the PHP internals don't handle this as robustly as the preg_match, we should just go back to the simpler earlier patch that doesn't try to rely on PHP version.

catch’s picture

The last patch here doing that was #18 - http://drupal.org/node/523058#comment-1842404

But that probably needs the comments in-lining per Dries review (since the comments are much the same).

I think we should probably go ahead and commit that, and backport to Drupal 6 too, since there's nothing breakable there. Then that still gives us a window to further optimize if we find away around the PHP wonkiness.

c960657’s picture

I am no expert on this, but here is my understand of this:

AFAICT htmlspecialchars() only rejects partial multibyte sequences followed by a byte < 0x80 (or the end of the string). I think this is fine - the UTF-8 exploit is only relevant if one of the few characters that would otherwise be escaped by htmlspecialchars() (i.e. " < > & ') is "hidden" by a partial multibyte sequence. These few characters all have character codes < 0x80. In other words, htmlspecialchars() does not block all invalid UTF-8 strings, but it does block the dangerous ones.

Here is the change that was made to htmlspecialchars() in PHP 5.2.5:
http://svn.php.net/viewvc/php/php-src/branches/PHP_5_2/ext/standard/html...

pwolanin’s picture

Thanks for the link to the diff. So, it seems like we can proceed as long as we are selective about picking test cases? Or would we rather pay the performance penalty to have more consistent drupal behavior across PHP version? I could imagine see this leading to some hard-to-debug problems.

catch’s picture

I'd be tempted to say commit the slightly slower version - at leastso we can do a straight backport to Drupal 6 for consistency there. Then consider changing this in Drupal 8 (even if we don't require 5.2.6 for D8 it seems reasonable that most servers will actually be running it by then).

For me that means re-rolling #18 with the documentation moved inline + a @todo for the PHP 5.2.6 version + leaving this open once we've done the backport so we can discuss more for Drupal 7 or push it to Drupal 8.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Given the grave performance implications here and the need to backport ASAP I would strongly recommend committing the latest patch and building the test case later. This is an exception to a rule I know but it has good reason.

jrchamp’s picture

@c960657: Thank you for the explanation.

+1 to #63 because it doesn't alter the intent of the preg_match, and has significant performance benefits.

scroogie’s picture

so?

jrchamp’s picture

@scroogie: I'm pretty sure this is the point where more people review and approve the code or it gets committed. It takes a lot of patience sometimes. :)

sun’s picture

Status: Reviewed & tested by the community » Needs review

Given the last comments, it's entirely not clear to me which patch should be committed to D7. Please re-attach.

catch’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
3.28 KB

This one is the one which was marked RTBC most recently.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

sun’s picture

Tests failed.

catch’s picture

Status: Needs work » Needs review
catch’s picture

Status: Needs review » Reviewed & tested by the community
catch’s picture

Would really be nice to see this committed, since the slowness of check_plain() also adds overhead to all the functions which call it, which makes it harder to track down as-yet-unfound bottlenecks when profiling.

Re-posting jrchamp's microbenchmarks, which are the most recent:

nothing: 0.338335990906 seconds
function no_op(): 2.31378507614 seconds
check plain old: 14.4185819626 seconds
check plain new: 9.58456897736 seconds
check plain new 5.2.5: 6.64569997787 seconds
check plain new 5.2.5 pwolanin: 6.31802201271 seconds
check plain new 5.2.5 pwolanin 2nd edition: 6.39244294167 seconds

The last result is for the current patch.

As you can see, this is more than twice as fast as the implementation in HEAD, and the RTBC patch is itself a third faster than the original patch on this issue, making it worth the static for PHP version until we can increase requirements.

I inherited a horrible, horrible site freelancing one time, where the theme was riddled with XSS holes - after explaining to the client that this would need to be cleaned up, they went back to the company which originally build the theme, which came back and explained they never use check_plain() because it 'slows things down'. I'm sure that's not an isolated case of such stupidity.

The only slight issue here is how much we trust the htmlspecialchars() fix in PHP 5.2.5 - c960657 has shown that it's secure, but not as restrictive as our original preg_match(). If we're uneasy about relying on that though, then we have only very slightly slower alternatives, also ready to go, earlier in the issue.

webchick’s picture

Version: 7.x-dev » 6.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Huh. Somehow I had missed Dries's #60 where he points out that he supports this approach apart from some commenting issues. It looks like those comments have been addressed, along with nicely expanded test coverage. Additionally, it looks like this speeds things up quite a bit, according to the microbenchmarks.

Therefore, committed to HEAD. Seems like it might be worth back-porting this to D6?

andypost’s picture

Status: Patch (to be ported) » Needs review
FileSize
1.71 KB

And here's a backport for d6

boombatower’s picture

Didn't read through whole issue, but does this fix or is aware of:

htmlspecialchars(): Invalid multibyte sequence in argument	Warning	bootstrap.inc	1204	check_plain()

when run on PHP 5.3.

andypost’s picture

@boombatower not this patch not about this warning, suppose test should be changed a bit

Also why this improvement still not in?

pwolanin’s picture

I pointed it out to Gabor today - we just need final confirmation that it's working cleanly on D6.

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

Patch applies cleanly and matches what's already in D7.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

For maintenance reasons, I'd add a strong, very visible reference to check_plain() in drupal_validate_utf8(), so if we need modification there, we don't forget to roll it to here as well. When we have multiple copies of code around, we should make sure to modify them in concert.

andypost’s picture

@Gábor Hojtsy purpose of drupal_validate_utf8() is different - just to check a string. I see nothing to modify in drupal_validate_utf8()

Gábor Hojtsy’s picture

@andypost: imagine drupal_validate_utf8() will have a change in a month or a year or two. Will people making that change remember that the same code appears in check_plain() and needs modification? I'm saying we should add a clear comment to drupal_validate_utf8() to notify people in the future that the same code is copied in check_plain().

c960657’s picture

Status: Needs work » Needs review
FileSize
2.58 KB

Added an inline comment in validate_utf8() as requested in #88.

andypost’s picture

Status: Needs review » Reviewed & tested by the community

Suppose this comment is enough

andypost’s picture

There's a follow up for D7 #755758: check_plain throwing warnings

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Committed, thanks.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

greywolfsspirit’s picture

apostrophe is being encoded to html code when used with Advanced Forum module's navigation. If a topic has an apostrophe, say for example, CiCi's Pizza, it's displaying the link as: CiCi's Pizza

Does anyone have any suggestions as to a fix for this?

Oleksa-1’s picture

After this patch #91 was commited to D6.17 a lot such kind of questions "warning: htmlspecialchars() [function.htmlspecialchars]: Invalid multibyte sequence" you can find on forum.

I had to replace in my latest D6.19 file bootstrap.inc with same file from D6.16 to get rid of this warning error.