Node teaser generation fails for content like this:

<P style="bla" class="MS-thingy"><SPAN style="hop" class="MS-blabla">Contents</SPAN></P>

(One of my users manages to produce this with the combination of Drupal + wysiwyg + TinyMCE on the server and IE6 + Word on the client.)

The resulting teaser is:

<><>Contents<><>

This is caused by the regex that filters the body when reducing it to a teaser, in modules/node/node.module:

preg_replace('#<([/a-z]*)[^>]*>#', '<\1>', $body);

Here [/a-z]* matches nothing when the tag name is uppercase. The tag name (SPAN, for example) and its attributes are then consumed by [^>]*, so \1 is the empty string, which produces <>.
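The behaviour described above can be reproduced in isolation. This is a minimal sketch that applies the quoted pattern to a single opening tag in both cases; the variable names are only illustrative and are not taken from node.module.

```php
<?php
// The pattern as quoted above from modules/node/node.module.
$pattern = '#<([/a-z]*)[^>]*>#';

// Lowercase tag: group 1 captures "span", so the name is kept and the attributes dropped.
$lower = preg_replace($pattern, '<\1>', '<span style="hop" class="MS-blabla">');

// Uppercase tag: [/a-z]* matches zero characters and [^>]* swallows "SPAN ...".
$upper = preg_replace($pattern, '<\1>', '<SPAN style="hop" class="MS-blabla">');

var_dump($lower); // string(6) "<span>"
var_dump($upper); // string(2) "<>"
```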

My first thought was to add uppercase letters to the character class in the regex.

However, I prefer forcing everything to lowercase at the start, so that we have a reasonably uniform input to rely on (damn IE6, we want you to send us lowercase tags, like everybody else!).

I attach a patch with this solution.
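For illustration only, here is one way the "lowercase first" idea could look. It is a sketch, not the attached patch: the helper name example_lowercase_tag_names() is hypothetical, it lowercases only tag names rather than the whole body (an assumption, so that text content keeps its case), and it uses an anonymous function, which needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper, not the attached patch: lowercase tag names only,
// leaving attributes and text content untouched.
function example_lowercase_tag_names($html) {
  return preg_replace_callback(
    '#<(/?)([a-zA-Z][a-zA-Z0-9]*)#',
    function ($m) {
      // $m[1] is the optional "/" of a closing tag, $m[2] is the tag name.
      return '<' . $m[1] . strtolower($m[2]);
    },
    $html
  );
}

$body = '<P style="bla" class="MS-thingy"><SPAN style="hop" class="MS-blabla">Contents</SPAN></P>';

// Normalize first, then apply the unchanged teaser regex quoted earlier.
$normalized = example_lowercase_tag_names($body);
$teaser = preg_replace('#<([/a-z]*)[^>]*>#', '<\1>', $normalized);

echo $teaser, "\n"; // <p><span>Contents</span></p>
```

Normalizing before the existing regex runs means the regex itself stays untouched, which is the appeal of this approach over widening the character class.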

Comments

Issue tags: +teaser

The teaser code was really bad, and I have a patch here:

#221257: text_summary() should output valid HTML and Unicode text

Your patch only takes care of upper/lower case; mine fixes all the problems (that I know of)!

Thank you.
Alexis Wilke

Status: Needs review » Needs work
Issue tags: -teaser

The last submitted patch, node.teaser-tolerance-for-uppercase-html-tags.patch, failed testing.

Status: Needs work » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.