Node teaser generation fails for this kind of content:
<P style="bla" class="MS-thingy"><SPAN style="hop" class="MS-blabla">Contents</SPAN></P>
(One of my users managed to enter this through a combination of server-side Drupal + wysiwyg + TinyMCE and client-side IE6 + Word.)
It produces this teaser:
<><>Contents<><>
This is caused by the regex that filters the body when reducing it to a teaser, in modules/node/node.module:
preg_replace('#<([/a-z]*)[^>]*>#', '<\1>', $body);
where [/a-z]* matches nothing at the start of an uppercase HTML tag name. The SPAN is therefore consumed by [^>]*, \1 is the empty string, and the result is <>.
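The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the actual node.module code: the wrapper function name `strip_tag_attributes` is hypothetical, and only the regex and replacement string are taken from the report. Note that with this regex alone, closing tags come out as `</>` rather than `<>`, because `/` is in the character class and is captured before the uppercase name stops the match.

```php
<?php
// Hypothetical wrapper around the regex quoted in the report.
function strip_tag_attributes($body) {
  // [/a-z]* captures a lowercase tag name (and a leading slash);
  // [^>]* swallows whatever is left before the closing '>'.
  return preg_replace('#<([/a-z]*)[^>]*>#', '<\1>', $body);
}

$body = '<P style="bla" class="MS-thingy"><SPAN style="hop" class="MS-blabla">Contents</SPAN></P>';

// Uppercase names never enter the capture group, so they are lost.
echo strip_tag_attributes($body), "\n";            // <><>Contents</></>

// The same markup in lowercase keeps its tag names.
echo strip_tag_attributes(strtolower($body)), "\n"; // <p><span>contents</span></p>
```

The second call lowercases the whole string only to show the contrast; it also lowercases the text content, which is usually not wanted in a real teaser.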
My first thought was to add uppercase letters to the character class in the regex.
However, I prefer forcing everything to lowercase at the start, so that we have a somewhat unified input to rely on (damn IE6, we want you to send us lowercase tags, like everybody else!).
I attach a patch with this solution.
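For comparison, here is a rough sketch of the two approaches discussed above. It is not the content of the attached patch, which is not reproduced in this post. All three function names are hypothetical, and the sketch assumes PHP 5.3 or later for the anonymous function. The first approach lowercases tag names only (leaving text and attribute values untouched) before applying the existing regex; the second simply adds the `i` modifier to the regex.

```php
<?php
// Hypothetical helper: lowercase tag names only, not the text content.
function lowercase_tag_names($html) {
  return preg_replace_callback(
    '#<(/?)([a-zA-Z][a-zA-Z0-9]*)#',
    function ($m) {
      // $m[1] is the optional slash, $m[2] the tag name.
      return '<' . $m[1] . strtolower($m[2]);
    },
    $html
  );
}

// Approach 1: normalize the input first, then reuse the regex unchanged.
function strip_attributes_normalized($body) {
  return preg_replace('#<([/a-z]*)[^>]*>#', '<\1>', lowercase_tag_names($body));
}

// Approach 2: leave the input alone and match case-insensitively.
function strip_attributes_case_insensitive($body) {
  return preg_replace('#<([/a-z]*)[^>]*>#i', '<\1>', $body);
}

$body = '<P style="bla" class="MS-thingy"><SPAN style="hop" class="MS-blabla">Contents</SPAN></P>';

echo strip_attributes_normalized($body), "\n";       // <p><span>Contents</span></p>
echo strip_attributes_case_insensitive($body), "\n"; // <P><SPAN>Contents</SPAN></P>
```

Normalizing first means any later code can assume lowercase tags, which is the argument made above; the `i` modifier is the smaller change but leaves mixed-case tags in the output.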
| Comment | File | Size | Author |
|---|---|---|---|
| | node.teaser-tolerance-for-uppercase-html-tags.patch | 515 bytes | guillaume.outters |
Comments
Comment #1
AlexisWilke commented
The teaser code was really bad and I have a patch here:
#221257: text_summary() should output valid HTML and Unicode text
Your patch only takes care of upper/lower case; mine fixes all the problems (that I know of)!
Thank you.
Alexis Wilke