Several issues report special characters being double-encoded. They boil down to check_plain() calling htmlspecialchars() with double encoding enabled, which is the default. Setting the fourth parameter (double_encode) to FALSE would fix these issues. From http://us2.php.net/htmlspecialchars:

When double_encode is turned off PHP will not encode existing html entities, the default is to convert everything.
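
For illustration, here is a minimal sketch of the difference using plain htmlspecialchars() calls, so no Drupal is needed. The ENT_QUOTES flag and the 'UTF-8' encoding are assumptions about how check_plain() calls it, not something stated in this issue.

```php
<?php
// Illustrative only: a string that already contains an HTML entity.
$already_encoded = 'A &amp; W';

// Default (double_encode = TRUE): the existing entity is encoded again.
$default = htmlspecialchars($already_encoded, ENT_QUOTES, 'UTF-8');

// double_encode = FALSE: the existing entity is left untouched.
$no_double = htmlspecialchars($already_encoded, ENT_QUOTES, 'UTF-8', FALSE);

echo $default, "\n";   // A &amp;amp; W
echo $no_double, "\n"; // A &amp; W
```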

Is there a reason the fourth parameter isn't set to FALSE?

Thanks

Comments

Damien Tournoud

Status: Active » Closed (works as designed)

It's better to fix the cause than to add unpredictable workarounds. This double_encode parameter is a hack we should stay as far away from as possible.

kamkejj

Status: Closed (works as designed) » Active

Setting the fourth parameter of htmlspecialchars() in check_plain() isn't a hack. See http://us2.php.net/htmlspecialchars.

Run this:

<?php
$title = "A & W";
$title = check_plain($title);
echo $title; // result A &amp; W
$title = check_plain($title);
echo $title; // result A &amp;amp; W
$title = check_plain($title);
echo $title; // result A &amp;amp;amp; W

Why is this the intended behavior?

Damien Tournoud

Status: Active » Closed (works as designed)

Controlling the type of text you manipulate is critical. If you don't know at every point whether a piece of text is HTML or plain text, you expose yourself to all kinds of security and non-security issues. The double_encode parameter is a hack that muddies the difference between HTML and plain text. It is very much not welcome.

ice5nake

@Damien Tournoud, while you may be correct, I think I have to disagree with your assessment. Can you explain, or point to a reference for, why setting the double_encode flag to FALSE would be a security problem?

I'd argue that once text has HTML entities in it, it is no longer plain text.

Why would you want to double encode something? The PHP docs do not mention a security implication with this flag as they do with some other flags.

ice5nake

I'd also argue that check_plain() should be idempotent, and with double_encode set to FALSE it would be.
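
A rough sketch of that claim, for the "A & W" input used earlier in this thread. The check_plain_no_double() helper is hypothetical and not part of Drupal; it only assumes check_plain() is a thin wrapper around htmlspecialchars().

```php
<?php
// Hypothetical helper, not part of Drupal: encode with double_encode off.
function check_plain_no_double($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', FALSE);
}

$once = check_plain_no_double('A & W');
$twice = check_plain_no_double($once);

echo $once, "\n";             // A &amp; W
var_dump($once === $twice);   // bool(true)
```

Applying the helper a second time leaves the string unchanged, which is the idempotence being asked for.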

Damien Tournoud

@ice5nake: if your plain-text input is &amp;, you *want* your HTML output to be &amp;amp;. For example, let's create a node called:

How to properly use the &amp; entity in HTML

The $double_encode parameter of htmlspecialchars() is not about idempotence, it's about black magic. While I agree that an idempotent encoding scheme would be better, this is no solution.
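
To make the node title example concrete, here is a small sketch. It assumes the user literally typed the five characters "&amp;" in the title, and it uses htmlspecialchars_decode() as a stand-in for what a browser would display; both are assumptions for illustration.

```php
<?php
// The plain-text title as typed, literally containing "&amp;".
$title = 'How to properly use the &amp; entity in HTML';

$default = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
$no_double = htmlspecialchars($title, ENT_QUOTES, 'UTF-8', FALSE);

// Decoding once approximates what the browser shows for each output.
$shown_default = htmlspecialchars_decode($default, ENT_QUOTES);
$shown_no_double = htmlspecialchars_decode($no_double, ENT_QUOTES);

var_dump($shown_default === $title);   // bool(true)
var_dump($shown_no_double === $title); // bool(false)
echo $shown_no_double, "\n"; // How to properly use the & entity in HTML
```

With double_encode off, the title is displayed with a bare "&" instead of the "&amp;" the author typed, so the plain-text input does not survive the round trip.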