HTML corrector encodes (entifies) <!--break--> tag
| Project: | Drupal |
| Version: | 6.6 |
| Component: | filter.module |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | duplicate |
The HTML corrector input filter mistakenly encodes (entifies) the less-than symbol in the "<!--break-->" tag. This results in the string &lt;!--break--> being added to node output in some cases -- for example, in RSS feeds generated by taxonomy.module -- which displays the literal string "<!--break-->" at the summary split.
The patch adds a single line to _filter_htmlcorrector() that removes the string "<!--break-->" before entifying the less-than character.
+ // Remove the teaser separator before entifying angle brackets.
+ $text = str_replace('', '', $text);
+
// Properly entify angles.
$text = preg_replace('!<([^a-zA-Z/])!', '&lt;\1', $text);
This is a conservative patch intended to fix only the appearance of the "<!--break-->" tag. I propose expanding this change to remove all HTML comments prior to encoding the less-than character. To remove all HTML comments, the line $text = str_replace('', '', $text); above should be changed to:
$text = preg_replace('/<!--(.|\s)*?-->/', '', $text);
Reproducing the bug
- Create a series of nodes using the "Full HTML" input filter.
- Give these nodes unique summaries (teasers). That is, instead of "splitting" the node, give these nodes unique teasers that are not intended to be joined with the rest of the body. (This is achieved by unchecking "Show summary in full view" above the node's summary/body textareas.)
- Tag each of these nodes with the same term.
- View the RSS feed for that term (at example.com/taxonomy/term/TID/0/feed). You should see the literal text "<!--break-->" at the top of each item.
The "<!--break-->" tag is also visible on each node's Dev Load tab.
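The behavior described in this report can be illustrated in isolation. The following is only a minimal sketch: the sketch_* function names are hypothetical and not part of Drupal, and the first function is a simplified stand-in for the entifying step quoted from the patch, not the full _filter_htmlcorrector(). It assumes the stripped string in the patch is "<!--break-->", as the description of the patch states.

```php
<?php
// Hypothetical stand-in for the entifying step only; the real
// _filter_htmlcorrector() does considerably more than this.
function sketch_entify_angles($text) {
  // Encode any "<" that is not followed by a letter or a slash.
  return preg_replace('!<([^a-zA-Z/])!', '&lt;\1', $text);
}

// Conservative fix: remove only the teaser separator before entifying.
function sketch_strip_break($text) {
  return sketch_entify_angles(str_replace('<!--break-->', '', $text));
}

// Broader proposal: remove every HTML comment before entifying.
function sketch_strip_comments($text) {
  return sketch_entify_angles(preg_replace('/<!--(.|\s)*?-->/', '', $text));
}

$teaser = '<!--break--><p>Summary</p><!-- note -->';

// Unpatched behavior: both comments have their "<" encoded.
echo sketch_entify_angles($teaser), "\n";
// prints: &lt;!--break--><p>Summary</p>&lt;!-- note -->

// Conservative fix: the separator is gone, other comments are still encoded.
echo sketch_strip_break($teaser), "\n";
// prints: <p>Summary</p>&lt;!-- note -->

// Broader proposal: no comments survive, so nothing is encoded.
echo sketch_strip_comments($teaser), "\n";
// prints: <p>Summary</p>
```

The second output shows why the conservative patch is only a partial fix: any HTML comment other than the teaser separator is still encoded.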
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| html_corrector_break_tag.patch | 523 bytes | Ignored | None | None |

#1
Please note that, in the example above, the text "<!--break-->" has been stripped from inside the <code></code> tags. This results in my patch looking like it does nothing. Please view the patch file itself to see the un-stripped code.
(Perhaps the code filter should allow HTML comments?)
#2
Fixing typo in the issue title.
#3
Related, but strangely the opposite: #222926: htmlcorrector filter escapes HTML comments
#4
@Damien Tournoud: Actually, I think that issue is exactly the same. (And, unfortunately, it's showing up in 6.x and 7.x.)
#5
Indeed the problem is the same, and had already been identified two years ago (#97182: <!--break --> is transformed into html code with lt and gt), which makes me wonder how the HTML corrector module was merged into core with such a major problem.
João
#6
@jcnventura: I left a message in the issue you posted directing people to the patch posted here. How can we get this into the next core release?
#7
@6: See the roadmap at http://drupal.org/node/222926#comment-1086392.
I'm marking this as duplicate of #222926: htmlcorrector filter escapes HTML comments because the underlying problem is incorrect handling of HTML comments by filter.module, and I don't believe a partial fix like this would ever be committed. Your patch may remain a useful fix though for some people until such time as the other issue is fixed.
Often patches don't get in because people don't test them so that might be somewhere you can help.
#8
How does one apply this patch? What file and where inside does one place the code?
#9
g10tto:
First, read this: Applying patches. The patch is applied to filter.module, which is the "component" of this issue (listed above). It's found in the core modules directory:
/modules/filter/filter.module.
#10
This was patched in the latest version (v6.12); however, I still get issues from time to time (like now) on new nodes, even if I replace the filter.module file with one that I know works on another Drupal site.
#11
I see this in 6.13.