Comment links are generated via t() - which by definition outputs HTML (even if there's no HTML tags in it by default), however comment.module doesn't set html to TRUE for any of it's 'reply', 'edit', 'delete' links etc. If you had a custom english translation this could result in strong or em tags getting escaped.
With 300 comments per page, this results in 900 unnecesssary calls to check_plain(). The c.5,000 total calls to check_plain() are responsible for 8% of the page load.
ab on the same page with 'administer comments' permission confirms it's about 2%
ab -c1 -n500
HEAD:
0.74 [#/sec] (mean)
Patch:
0.75 [#/sec]
| Comment | File | Size | Author |
|---|---|---|---|
| #1 | ab.txt | 2.91 KB | catch |
| html.patch | 9.12 KB | catch |
Comments
Comment #1
catchrequests per second aren't ideal at this level. Here's time per request from the same runs:
HEAD:
Time per request: 1357.287
Patch:
Time per request: 1328.057
(97.8% time per request with the patch applied - so confirms the 2% suggested by cachegrind.).
Comment #2
agentrickardIt took a few reads to follow, but catch is right. This should be ready to go.
Comment #3
stella commentedYup, it's a simple patch, passes the testing bot and passes coder, so I feel happy marking RTBC.
Comment #4
dries commentedGood catch, catch. I estimate there are many more such calls that could be optimized ... In reviewing the patch, I found
html => TRUEto be slightly counter-intuitive. I thinkescape => FALSEwould be more intuitive, and therefore, encourage more people to use this property for performance related tweaks. We should probably create a DX issue for this, instead of trying to tag it onto this one, but I figured I'd share my experience to get the conversation going.Comment #5
catchI noticed that most of the remaining 4,000 calls are also on comment links (i.e. doing check_plain() on paths like comment/reply/1 for example).
I agree that escape => TRUE might be more useful (or re-using ALREADY_SANITIZED from drupal_set_title()) - then we should consider reworking l() and url() to use this for paths and attributes as well as titles - at which point it becomes more like 6-8% improvement on pages like this instead of 2%. As long as it's opt-in, and we account for checking path aliases etc, it looks worth doing.
Comment #6
dries commentedI agree that this is worth doing. Do you want me to commit this as an interim solution, or do you want to take a crack at the bigger patch so we can benchmark it as a whole?
Comment #7
heine commentedThe html option (former parameter) of l() tells something about the string you pass. Namely that it is HTML, not plaintext, the other string type l() accepts. In that light, the name 'html' makes sense, at least more so than ALREADY_SANITZED or escape.
Please note that url() does not return HTML. These URLs need to be escaped by check_plain / check_url before using it as attribute values to make sure pages are valid (no &, but
&).Comment #8
stella commentedComment #9
webchickThis sounds like this is still 'needs review' atm.
Comment #10
stella commentedoops added tag to wrong issue
Comment #11
dries commentedcatch, I'd like you to come up with a recommendation on how to proceed with this. Can I delegate this to you?
Comment #12
catchHeine's #7 suggests we can't do anything tricky in url() to skip check_plain() for trusted URLs because check_plain() isn't only protecting against XSS, it's also ensuring valid HTML. The arguments for keeping html => TRUE make sense to me too.
So I don't see us saving the other check_plain() calls without a lot of refactoring of l() just to get around this issue, since there's not an obvious path to doing that right now, I think we might as well commit this as is, then take a look at wider changes in another issue (if they're not already won't fix).
Comment #13
catchTurns out there's an existing RTBC bug as a result of these not being set 'html' => TRUE with a test at #439148: login/register link generated when a user isn't allowed to comment, is transformed to html entities. Leaving both patches RTBC - if this goes in first we can add the tests from that one, if that one goes in first this is a tiny re-roll.
Comment #14
dries commentedCommitted! Thanks.
Comment #16
R.Muilwijk commentedClosed duplicate -> http://drupal.org/node/352245#comment-3115944