htmLawed lets through character references to invalid Unicode characters such as U+FFFE, whether they are written in hexadecimal (&#xFFFE;) or decimal (&#65534;) form. If the site serves the page as XHTML, such a reference makes the document ill-formed and the whole page breaks.
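A minimal sketch of why the page breaks. It uses Python's standard XML parser as a stand-in for a browser's XHTML parser; that substitution is an assumption, though both enforce the same XML well-formedness rule. Both spellings of U+FFFE are rejected, while a reference to an ordinary character parses. The helper name parses_as_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # The parser reports a reference to an invalid character number.
        return False


# Both references point at the same code point: 0xFFFE == 65534.
hex_ref = "<p>&#xFFFE;</p>"
dec_ref = "<p>&#65534;</p>"
ok_ref = "<p>&#x41;</p>"  # 'A', an ordinary valid character

for doc in (hex_ref, dec_ref, ok_ref):
    print(doc, parses_as_xml(doc))
```

A real XHTML page failing this way typically shows a parse error instead of the content, which is the "whole page will break" symptom described above.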

Comments

alpha2zee

Assigned: Unassigned » alpha2zee
Status: Active » Fixed

This is fixed in the just-released version 2.1 of the module. The module now uses htmLawed 1.0.9, which fixes the issue of entities pointing to invalid characters.

htmLawed neutralizes all entities that refer to invalid characters (any code point other than hexadecimal 9, A, D, 20 to D7FF, and E000 to 10FFFF excluding FFFE and FFFF), as well as entities that refer to discouraged characters in the hexadecimal ranges 7F to 84, 86 to 9F, and FDD0 to FDDF. Entities referring to the remaining discouraged characters (which are discouraged, not invalid) are let through.
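The rule above can be sketched as a small Python filter. This is an illustration under stated assumptions, not htmLawed's own code (htmLawed is PHP). All function names are hypothetical, and the sketch assumes that "neutralizing" means escaping the leading & as &amp; so the reference is displayed as literal text instead of being interpreted.

```python
import re


def is_valid_xml_char(cp: int) -> bool:
    """Valid code points: 9, A, D, 20-D7FF, and E000-10FFFF except FFFE/FFFF."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or (0xE000 <= cp <= 0x10FFFF and cp not in (0xFFFE, 0xFFFF))
    )


def is_neutralized_discouraged(cp: int) -> bool:
    """Discouraged code points that are also neutralized: 7F-84, 86-9F, FDD0-FDDF."""
    return 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F or 0xFDD0 <= cp <= 0xFDDF


def should_neutralize(cp: int) -> bool:
    return not is_valid_xml_char(cp) or is_neutralized_discouraged(cp)


# Matches hexadecimal (&#xFFFE;) and decimal (&#65534;) numeric references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def neutralize_numeric_refs(text: str) -> str:
    """Escape the '&' of each offending reference; leave other references alone."""
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            cp = int(m.group(1), 16)  # hexadecimal form
        else:
            cp = int(m.group(2))  # decimal form
        if should_neutralize(cp):
            return "&amp;" + m.group(0)[1:]  # e.g. &amp;#xFFFE;
        return m.group(0)

    return NUMERIC_REF.sub(repl, text)


print(neutralize_numeric_refs("a&#xFFFE;b&#65;c&#65534;d"))
# → a&amp;#xFFFE;b&#65;c&amp;#65534;d
```

Escaping only the ampersand keeps the original reference visible to the reader while guaranteeing that no invalid character reaches the XML parser.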

alpha2zee

Title: htmLawed allows illegal unicode characters » htmLawed allowing illegal unicode character entity
alpha2zee

Status: Fixed » Closed (fixed)