Comment links are generated via t() - which by definition outputs HTML (even if there's no HTML tags in it by default), however comment.module doesn't set html to TRUE for any of it's 'reply', 'edit', 'delete' links etc. If you had a custom english translation this could result in strong or em tags getting escaped.

With 300 comments per page, this results in 900 unnecesssary calls to check_plain(). The c.5,000 total calls to check_plain() are responsible for 8% of the page load.

ab on the same page with 'administer comments' permission confirms it's about 2%

ab -c1 -n500

HEAD:
0.74 [#/sec] (mean)

Patch:
0.75 [#/sec]

CommentFileSizeAuthor
#1 ab.txt2.91 KBcatch
html.patch9.12 KBcatch

Comments

catch’s picture

StatusFileSize
new2.91 KB

requests per second aren't ideal at this level. Here's time per request from the same runs:

HEAD:
Time per request: 1357.287

Patch:
Time per request: 1328.057

(97.8% time per request with the patch applied - so confirms the 2% suggested by cachegrind.).

agentrickard’s picture

It took a few reads to follow, but catch is right. This should be ready to go.

stella’s picture

Status: Needs review » Reviewed & tested by the community

Yup, it's a simple patch, passes the testing bot and passes coder, so I feel happy marking RTBC.

dries’s picture

Good catch, catch. I estimate there are many more such calls that could be optimized ... In reviewing the patch, I found html => TRUE to be slightly counter-intuitive. I think escape => FALSE would be more intuitive, and therefore, encourage more people to use this property for performance related tweaks. We should probably create a DX issue for this, instead of trying to tag it onto this one, but I figured I'd share my experience to get the conversation going.

catch’s picture

I noticed that most of the remaining 4,000 calls are also on comment links (i.e. doing check_plain() on paths like comment/reply/1 for example).

I agree that escape => TRUE might be more useful (or re-using ALREADY_SANITIZED from drupal_set_title()) - then we should consider reworking l() and url() to use this for paths and attributes as well as titles - at which point it becomes more like 6-8% improvement on pages like this instead of 2%. As long as it's opt-in, and we account for checking path aliases etc, it looks worth doing.

dries’s picture

I agree that this is worth doing. Do you want me to commit this as an interim solution, or do you want to take a crack at the bigger patch so we can benchmark it as a whole?

heine’s picture

The html option (former parameter) of l() tells something about the string you pass. Namely that it is HTML, not plaintext, the other string type l() accepts. In that light, the name 'html' makes sense, at least more so than ALREADY_SANITZED or escape.

Please note that url() does not return HTML. These URLs need to be escaped by check_plain / check_url before using it as attribute values to make sure pages are valid (no &, but &).

stella’s picture

Issue tags: +Coding standards
webchick’s picture

Status: Reviewed & tested by the community » Needs review

This sounds like this is still 'needs review' atm.

stella’s picture

Issue tags: -Coding standards

oops added tag to wrong issue

dries’s picture

catch, I'd like you to come up with a recommendation on how to proceed with this. Can I delegate this to you?

catch’s picture

Status: Needs review » Reviewed & tested by the community

Heine's #7 suggests we can't do anything tricky in url() to skip check_plain() for trusted URLs because check_plain() isn't only protecting against XSS, it's also ensuring valid HTML. The arguments for keeping html => TRUE make sense to me too.

So I don't see us saving the other check_plain() calls without a lot of refactoring of l() just to get around this issue, since there's not an obvious path to doing that right now, I think we might as well commit this as is, then take a look at wider changes in another issue (if they're not already won't fix).

catch’s picture

Turns out there's an existing RTBC bug as a result of these not being set 'html' => TRUE with a test at #439148: login/register link generated when a user isn't allowed to comment, is transformed to html entities. Leaving both patches RTBC - if this goes in first we can add the tests from that one, if that one goes in first this is a tiny re-roll.

dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed! Thanks.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

R.Muilwijk’s picture