HTML corrector encodes (entifies) <!--break--> tag

Todd Nienkerk - December 5, 2008 - 15:05
Project: Drupal
Version: 6.6
Component: filter.module
Category: bug report
Priority: normal
Assigned: Unassigned
Status: duplicate
Description

The HTML corrector input filter mistakenly encodes (entifies) the less-than symbol in the "<!--break-->" tag. This results in the string &lt;!--break--> being added to node output in some cases -- for example, in RSS feeds generated by taxonomy.module -- which displays the literal string "<!--break-->" at the summary split.
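As a minimal sketch of the behavior described above, the following isolates the "entify angles" regular expression from _filter_htmlcorrector() (the same line shown in the patch context) and runs it on a body containing the teaser separator. The helper name entify_angles() and the sample markup are assumptions made for illustration; they are not part of filter.module.

```php
<?php
// Hypothetical helper: applies only the entify step quoted from
// _filter_htmlcorrector(), nothing else the filter does.
function entify_angles($text) {
  // A "<" followed by anything other than a letter or "/" becomes "&lt;".
  // The "!" that opens an HTML comment falls into that category.
  return preg_replace('!<([^a-zA-Z/])!', '&lt;\1', $text);
}

// Sample node body with a teaser separator (illustrative input).
$input = '<p>Summary</p><!--break--><p>Body</p>';

echo entify_angles($input), "\n";
// Prints: <p>Summary</p>&lt;!--break--><p>Body</p>
```

The ordinary tags pass through untouched because "<p" and "</p" start with a letter or a slash; only the comment opener is rewritten, which is why the separator ends up as visible text.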

The patch adds a single line to _filter_htmlcorrector() that removes the string "<!--break-->" before entifying the less-than character.

+  // Remove the teaser separator before entifying angle brackets.
+  $text = str_replace('', '', $text);
+
   // Properly entify angles.
   $text = preg_replace('!<([^a-zA-Z/])!', '&lt;\1', $text);

This is a conservative patch intended to fix only the appearance of the "<!--break-->" tag. I propose expanding this change to remove all HTML comments prior to encoding the less-than character. To remove all HTML comments, the line $text = str_replace('', '', $text); above should be changed to:

$text = preg_replace('/<!--(.|\s)*?-->/', '', $text);
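To compare the two variants side by side, here is a rough sketch that applies either the conservative removal of "<!--break-->" or the broader comment-stripping expression before the entify step. The function strip_then_entify(), its $strip_all_comments flag, and the sample input are hypothetical and exist only for this illustration; the real change would live inside _filter_htmlcorrector().

```php
<?php
// Hypothetical wrapper around the two proposed variants; not core code.
function strip_then_entify($text, $strip_all_comments = FALSE) {
  if ($strip_all_comments) {
    // Broader proposal: remove every HTML comment, including multi-line ones.
    $text = preg_replace('/<!--(.|\s)*?-->/', '', $text);
  }
  else {
    // Conservative patch: remove only the teaser separator.
    $text = str_replace('<!--break-->', '', $text);
  }
  // Entify step as quoted from _filter_htmlcorrector().
  return preg_replace('!<([^a-zA-Z/])!', '&lt;\1', $text);
}

// Illustrative input: a separator plus an unrelated multi-line comment.
$input = "<p>Summary</p><!--break--><p>Body</p><!-- editor\nnote -->";

echo strip_then_entify($input), "\n";
// The separator is gone, but the other comment is still entified.

echo strip_then_entify($input, TRUE), "\n";
// Prints: <p>Summary</p><p>Body</p>
```

With the conservative variant, any comment other than the separator is still converted to visible text, which is the gap the broader expression is meant to close. The pattern '/<!--.*?-->/s' should match the same text as the one proposed here while avoiding the alternation inside the repeated group.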

Reproducing the bug

  1. Create a series of nodes using the "Full HTML" input filter.
  2. Give these nodes unique summaries (teasers). That is, instead of "splitting" the node, give these nodes unique teasers that are not intended to be joined with the rest of the body. (This is achieved by unchecking "Show summary in full view" above the node's summary/body textareas.)
  3. Tag each of these nodes with the same term.
  4. View the RSS feed for that term (at example.com/taxonomy/term/TID/0/feed). You should see the literal text "<!--break-->" at the top of each item.

The "<!--break-->" tag is also visible on each node's Dev Load tab.

Attachment | Size | Status | Test result | Operations
html_corrector_break_tag.patch | 523 bytes | Ignored | None | None

#1

Todd Nienkerk - December 5, 2008 - 15:09

Please note that, in the example above, the text "<!--break-->" has been stripped from inside the <code></code> tags. This results in my patch looking like it does nothing. Please view the patch file itself to see the un-stripped code.

(Perhaps the code filter should allow HTML comments?)

#2

Todd Nienkerk - December 11, 2008 - 00:03
Title: HTML corrector endoces (entifies) <!--break--> tag » HTML corrector encodes (entifies) <!--break--> tag

Fixing typo in the issue title.

#3

Damien Tournoud - December 11, 2008 - 00:33

Related, but strangely the opposite: #222926: htmlcorrector filter escapes HTML comments

#4

Todd Nienkerk - December 12, 2008 - 16:43

@Damien Tournoud: Actually, I think that issue is exactly the same. (And, unfortunately, it's showing up in 6.x and 7.x.)

#5

jcnventura - December 14, 2008 - 00:02

Indeed the problem is the same, and had already been identified two years ago (#97182: <!--break --> is transformed into html code with lt and gt), which makes me wonder how the HTML corrector module was merged into core with such a major problem.

João

#6

Todd Nienkerk - December 15, 2008 - 17:24

@jcnventura: I left a message in the issue you posted directing people to the patch posted here. How can we get this into the next core release?

#7

gpk - December 15, 2008 - 17:32
Status: needs review » duplicate

@6: See the roadmap at http://drupal.org/node/222926#comment-1086392.

I'm marking this as duplicate of #222926: htmlcorrector filter escapes HTML comments because the underlying problem is incorrect handling of HTML comments by filter.module, and I don't believe a partial fix like this would ever be committed. Your patch may remain a useful fix though for some people until such time as the other issue is fixed.

Often patches don't get in because people don't test them, so that might be somewhere you can help.

#8

g10tto - January 21, 2009 - 19:47

How does one apply this patch? What file and where inside does one place the code?

#9

Todd Nienkerk - January 21, 2009 - 23:55

g10tto:

First, read this: Applying patches. The patch is applied to filter.module, which is the "component" of this issue (listed above). It's found in the core modules directory: /modules/filter/filter.module.

#10

g10tto - May 30, 2009 - 23:03

This was patched in v6.12; however, I still get issues from time to time (like now) on new nodes, even if I replace the filter.module file with one that I know works on another Drupal site.

#11

drupert55 - July 10, 2009 - 07:29

I see this in 6.13.
