#266488 pointed out that check_plain() is not reentrant: you can't call it several times on the same text or any &xxx; entity will get reencoded.

The contract of check_plain() is:

Encode special characters in a plain-text string for display as HTML.

It does not say anything about what to do with entities.

I suggest changing the contract so that pre-existing entities are not escaped, thus making check_plain() reentrant.

The discussion here is on:

  • Is this ok security-wise?
  • Do some user input need special treatment? (what to do when a user types & in a title box: display & or display &?)
CommentFileSizeAuthor
check_plain-reentrant.patch549 bytesdamien tournoud

Comments

heine’s picture

Status: Needs review » Closed (won't fix)

It does not say anything about what to do with entities.

It does say that (emphasis added):

Encode special characters in a plain-text string for display as HTML.

check_plain operates on plaintext strings and ensures they can be displayed in an HTML context.

So, when the following plaintext string is given:

Foo & Bar & Baz

check_plain should output:

Foo & Bar & Baz

When this string is included in HTML it will be shown as:

Foo & Bar & Baz
damien tournoud’s picture

Status: Closed (won't fix) » Needs review

Hum. I guess the dilemma is either:

  • we fix check_plain() to allow HTML entities, or
  • we fix *every* incorrect call to the function (ie. all where the input is *not* plain text, which is the case of the output of check_plain()...).

Which of these two is possible? Which is the easiest?

damien tournoud’s picture

For reference, here is the list of all calls to check_plain() in Drupal 7 (a whole bunch of them, most of them can be suspected of being reentrant): http://api.drupal.org/api/function/check_plain/7/references

Moreover, t() calls check_plain() unconditionally when dealing with '@' and '%' modifiers, which increase the probability of having double-escaped strings.

heine’s picture

As check_plain operates explicitly on plaintext, your first proposition makes no sense. Double escaping is a sign of sloppyness. Fortunately there aren't that many double-escaping issues. The function in core most in need of fixing is theme_links and its immediate callers.

heine’s picture

t() should be though of as returning HTML

chx’s picture

Status: Needs review » Closed (won't fix)

check_plain merely runs an encoding which makes sure that a standards compliant HTML renderer software shows the exact same string you have passed in. There is nothing to fix here. If you pass in things that look like escaped HTML text to your eyes then your HTML renderer will show just that. If you think check_plain is some sort of strong AI which can figure out what to do then you are mistaken. The proposal is horribly wrong on every possible count and will introduce very strange errors.