Problem/Motivation

Since moving the Body field to the Field system in Drupal 7, the <!--break--> method of teaser splitting no longer works with Filtered HTML or Plain Text text formats. #18 offers a good analysis of the problem, which is essentially that in Drupal 6, teaser break processing jumped the queue and happened before the filter system. (TODO: Is that true? Would be good to investigate exactly how D6 and D7's code paths have diverged here.) In Drupal 7, the filtering happens later and filter_xss() is stripping out the comment.

Proposed resolution

One suggestion in #37 was to allow HTML comments to pass through filter_xss(). However, it was pointed out in #38 that we've had SAs related to HTML comments in the past. (TODO: Is that actually true? Citation needed. I searched security.drupal.org for "html comment" and came up empty handed.)

The solution that seems to have the most buy-in at the moment is temporarily switching the <!--break--> string to another string in order to escape filter_xss() unscathed. This would have the benefit of also working with other filters which might strip comments, such as http://drupal.org/project/iframe_filter.

Remaining tasks

Review and test the latest patch. However, validation of the approach is necessary, which means some legwork on:

- Investigate and document exactly why D6 deals with this situation fine and D7 doesn't.
- Investigate and document why #559584: filter_xss() and Line break filter break HTML comments didn't solve this problem.
- Get security team input on the options we have for fixing filter_xss() to deal with the teaser splitter (and other HTML comments) instead of being forced to hack around it.

Original report by thamas

The 'break' tag does not work with Filtered HTML, this may create problems at D6 to D7 upgrade.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

thamas’s picture

Title: The 'break' tag does not work with Filterd HTML » The 'break' tag does not work with Filtered HTML
Damien Tournoud’s picture

Status: Active » Postponed (maintainer needs more info)

The break tag is typically not used anymore in the new "Text and summary" field, and we have an upgrade path. Do you have specific concerns about the migration path?

thamas’s picture

Status: Postponed (maintainer needs more info) » Active
guusbosman’s picture

I ran into this when testing D7 beta 2.

http://drupal.org/node/546608 says that the

tag is still used in D7. Also, the text in a node preview still mentions the break tag ("You can insert the delimiter "

" (without the quotes) to fine-tune where your post gets split.").

If we're still supporting the break tag, I think that it's confusing if the Filtered HTML filter doesn't work with it. In D6 the break tag works okay with Filtered HTML.

I often use the break tag at the end of a Body when I don't want the trimmed version to be shorter than the full version.

tobiasb’s picture

Status: Active » Needs review
FileSize
1.24 KB
garpy’s picture

Version: 7.x-dev » 7.0-beta3

I tested this without the patch and was able to recreate the bug. I applied the patch and it fixed the bug. Working on local environment in MAMP. I used Drupal 7 beta3 release.

garpy’s picture

Status: Needs review » Reviewed & tested by the community
tobiasb’s picture

Version: 7.0-beta3 » 7.x-dev
sun.core’s picture

Status: Reviewed & tested by the community » Needs work
+++ modules/filter/filter.module	14 Nov 2010 16:50:04 -0000
@@ -1217,6 +1217,11 @@ function _filter_html_settings($form, &$
+  // filter_xss() removes html comments, so we need a workaround for the teaserbreaker

1) "html" should be all upper-case.

2) Seems to exceed 80 chars.

3) Missing trailing period.

+++ modules/filter/filter.module	14 Nov 2010 16:50:04 -0000
@@ -1217,6 +1217,11 @@ function _filter_html_settings($form, &$
+    $text = str_replace('<!--break-->', '[teaserbreaker]', $text);

[something] is a commonly used macro tag syntax used by contributed modules. This temporary replacement keyword should be more unique.

Powered by Dreditor.

sun.core’s picture

Component: node system » filter.module
sun.core’s picture

Issue tags: +Needs tests

Sorry for spamming, but this also needs tests.

calebgilbert’s picture

I get the same issue with the <!--break--> tag not working as it did in previous versions of Drupal, even with the Full HTML filter.

vito_swat’s picture

Priority: Normal » Major

I think this is major as this cause regression.
I checked this and my results:
1. In full HTML <!--break--> works.
2. In filtered HTML don't work.
3. After adding <!--break--> to allowed tags it also don't work
4. In Plain text this also don't work and <!--break--> is visible to the end user.

Looks like something is really broken here.

federico’s picture

It doesn't work for me in full HTML, if works only if I enable "Limit allowed HTML tags" option

But also, Convert line breaks into HTML (i.e.<br> and <p>) stops working after the

tag, causing the loss of format in nodes with the break tag.

To reproduce:
1- Install drupal 7 dev, do not touch filter module.
1- Create a new node using full html format
2- Paste this

1- This is the first paragraph and the summary. It contains a break comment at the end.<!--break-->

2- This is the second paragraph. 

3- This is the third paragraph. It should be an opening p tag at the beginning and a closing p tag at the end, in the generated source code.

I couldn't apply the patch

sun’s picture

This is indeed a major problem. I'm entirely not sure why the break tag is stripped in D7, but it must have been caused by one of these issues.

scito’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
13.32 KB
11.54 KB

In fact there are two issues:

1) Issue #16 results from #559584: filter_xss() and Line break filter break HTML comments and is a bug of the line/paragraph filter.
$open = ($chunk[1] != '/' || $chunk[1] != '!'); is always true in the function _filter_autop().
Better is $open = ($chunk[1] != '/' && $chunk[1] != '!');

Changes in the patch: I've modified the filter.module and improved the tests. Note: I commented out (marked with TODO) the iframe test since it seems not useful for the line/paragraph filter. Should we treat the <iframe> tag as the other tags such as <pre> in the line break filter? Maybe this was the intention.
$chunks = preg_split('@(<!--.*?-->|</?(?:pre|script|style|object|!--)[^>]*>)@i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

Note: I've replaced in the tests the "found" message text with "expected" as this seems clearer.

2) The issue described in #1 to #15 seems to be a consequence of #372743: Body and teaser as fields.
In Drupal 6 the teaser break tag seems to be treated early in the processing, i.e. before the filters. (Note: Teasers are stored in DB field node_revisions.teaser.) In Drupal 7 the body is an ordinary field and the break tag for the teaser is treated later in the processing, i.e. after the filters. Filters can do what ever they want with the content, e.g. filtering out of comments such as the break tag. One approach is to "protect" the break tag through a temporary replacement as this was proposed in #7. But this is more a workaround than a "real solution". There is no guarantee that this replacement always works. Maybe someone has a better idea.

I've implemented this workaround as a first step in _text_sanitize() of the text.module and added some tests: a combination of display formats (summary_or_trimmed and trimmed) and core filters (Full HTML, Filter HTML, Plain text).

We should probably split these two issues for the further resolution.

karschsp’s picture

#18: breaktag_881006_18.diff queued for re-testing.

scito’s picture

Component: filter.module » field system

I created for the problem described in #16 the new issue #1029108: Comments confuse line/paragraph filter and changed the component of this issue.

scito’s picture

I'm working on another approach where the body is split by the break tag before sanitizing. Teaser and body are filtered/processed separately for 'text_trimmed' or 'text_summary_or_trimmed'.

scito’s picture

This new patch saves the <!--break--> for trimmed displays. A pseudo column 'value_for_trimming' is used by _text_sanitize() in these trimming cases.

This approach seems more proper and feasible.

It's not possible to process teaser and body separately for all 'normal' values (i.e. no trimming cases) since some filters might "correct" missing end tags for the teaser. This is unwanted in a full node view situation. Thus, the new pseudo column 'value_for_trimming' was introduced.

Currently, the 'plain_text' filter encodes and displays <!--break-->. Is this the wished behavior? Or should the break tag be filtered out?

bfroehle’s picture

+++ modules/filter/filter.module
@@ -1603,7 +1603,7 @@ function _filter_autop($text) {
-      $open = ($chunk[1] != '/' || $chunk[1] != '!');
+      $open = ($chunk[1] != '/' && $chunk[1] != '!');

This from the earlier patch should either be included or split off into a different issue.

+++ modules/field/modules/text/text.moduleundefined
@@ -299,6 +304,10 @@ function text_field_formatter_view($entity_type, $entity, $field, $instance, $la
+ * &lt;--break--&gt> tag of the column 'value' which might be lost through

Do you mean &gt; ? Or just <!--break-->?

scito’s picture

FileSize
12.77 KB

This from the earlier patch should either be included or split off into a different issue.

I created a separate issue: #1029108: Comments confuse line/paragraph filter. I think it's easier attacking them independently.

I fixed the comment problem.

yched’s picture

subscribe for now

jenlampton’s picture

Component: field system » filter.module

Think this got moved to the wrong component ;)

yched’s picture

Actually, no. The summary generation does belong to text field code now.

I most probably won't have time to look into this before drupalcon.

yched’s picture

Component: filter.module » field system
Frits1980’s picture

subscribing

vito_swat’s picture

Currently, the 'plain_text' filter encodes and displays <!--break-->. Is this the wished behavior? Or should the break tag be filtered out?

For my common sense break tag definitely should not be visible no matter which built in or exotic filter you use! This is special, drupal only tag to act as teaser splitter. Nothing more.

daftu’s picture

subscribe

Agileware’s picture

In #2 Damien said:

The break tag is typically not used anymore in the new "Text and summary" field, and we have an upgrade path. Do you have specific concerns about the migration path?

Do you mean there is a migration path from <!--break--> to using the summary part of the "Text and summary" field?
Because I just upgraded a site and still have 12000 nodes that have the <!--break--> comment in the body and 110 of those 12000 don't have any text in the body_summary.

scito’s picture

Version: 7.x-dev » 8.x-dev
Issue tags: +Needs backport to D7

Bumping and tagging as per standard policy.

scito’s picture

#1087440: HTML Purifier break Drupal 7.x <!--break--> handling is an issue of the HTML Purifier module concerning this problem.

webernet’s picture

Title: The 'break' tag does not work with Filtered HTML » REgression: 'break' tag doesn't work with Filtered HTML

I've been testing D6 to D7 upgrades and this is definitely a major issue as I don't want to convert all nodes to full HTML.

Damien Tournoud’s picture

Ok, it seems that you need to add !== to the list of allowed tags for the comments to be kept. Not sure if we wouldn't rather want to allow comments to pass-thru filter_xss() by default.

sun’s picture

Title: REgression: 'break' tag doesn't work with Filtered HTML » Regression: 'break' tag doesn't work with Filtered HTML
Issue tags: +Regression

I'm pretty sure that we had several SAs pertaining to HTML comments in the past, so I don't think that making filter_xss() ignore comments is an option.

I think we need to go back to one of the earlier patches and go with the approach of replacing <!--break--> with a different delimiter.

webernet’s picture

FileSize
1.12 KB

Revised version of #7.

Uses time() to make a (hopefully) unique substitute.

Needs tests.

aspilicious’s picture

Status: Needs review » Needs work
+++ b/modules/filter/filter.moduleundefined
@@ -1220,6 +1220,13 @@ function _filter_html_settings($form, &$form_state, $filter, $format, $defaults)
+  // Substitute for the <!--break--> tag as filter_xss() removes HTML comments. ¶
+  $teaser_breaker = strpos($text, '<!--break-->');

trailing whitespace

Powered by Dreditor.

webernet’s picture

Status: Needs work » Needs review
FileSize
1.8 KB

Fixed the whitespace and added a basic test.

Status: Needs review » Needs work

The last submitted patch, break_tag-881006-2.patch, failed testing.

webernet’s picture

Status: Needs work » Needs review
FileSize
1.78 KB

Without the commented out sections.

Status: Needs review » Needs work
Issue tags: -Regression, -Needs backport to D7

The last submitted patch, break_tag-881006-3.patch, failed testing.

webernet’s picture

Status: Needs work » Needs review
Issue tags: +Regression, +Needs backport to D7

#43: break_tag-881006-3.patch queued for re-testing.

scito’s picture

Using a replacement text for breaker tag is not a reliable method. You do not have control over what filters do with your replacement text. The patch of #43 uses the time. Imagine, there might be a (dump) filter which replaces all numbers with images or 'number words'.

Patch #24 avoids this. The breaker tag is not passed to filters. But it's a more complex solution.

I propose to include the tests of #24 in other patches. They are independent of the actual patch.

webernet’s picture

The substitute text is only used within the one filter in order to allow the break tag to bypass filter_xss().

The temporary replacement text isn't seen by any other filters as the original <--break--> text is restored within the same filter, before the next filter even sees it.

Clean’s picture

I've been banging my head and wondering about this for days, I thought I messed up something. At the moment, seem like WYSIWG don't work with summary field also. I have to copy the source code from body field to the summary to actually get things right.

This is ugly and unacceptable, but at least it work on front end. Not sure about impact on SEO anyway.
This is a rally big issue because almost every content website need it ;(

John

xjm’s picture

Tagging issues not yet using summary template.

ibex’s picture

We can fix the built-in core filters (proposition #47) or build an architecture that fixes/avoids the problem for every filter including contributed filters (proposition #24).

Maybe we could split this issue for fixing the core filters in 7.x in a simple way and improve the architecture in 8.x.

webernet’s picture

I hate to bump this, but this is a major regression and it's still at needs review two months and three releases since the latest patch was posted... the issue itself is over a year old.

varkenshand’s picture

Component: filter.module » field system

I agree this is serious and strange behaviour.
In my case I have noticed (D7.7 and D7.8) that the break tag works as intended on a clean install and in a new node.
In my test site however, the tag is ignored when there is an image present at the top (generated with imagecache, that is). Once the image is there, all breaks are ignored. This makes the taxonomy term view unusable.

leoklein’s picture

subscribing

nico.knaepen’s picture

subscribing

Anonymous’s picture

Status: Needs review » Reviewed & tested by the community

this looks good to go to me, it's a very simple patch, with a test.

chx’s picture

Status: Reviewed & tested by the community » Needs review
chx’s picture

Component: field system » filter.module
drmonkeyninja’s picture

Including < !-- > (ignoring the spaces between the brackets) in your allowed HTML tags for the input format will ensure that the break is observed.

catch’s picture

Status: Needs review » Needs work
Issue tags: +Needs issue summary update, +Regression, +Needs backport to D7

The last submitted patch, break_tag-881006-3.patch, failed testing.

smitty’s picture

subscribe

jenlampton’s picture

@smitty - no need to 'subscribe' anymore. Try the "follow" link at top right :)

smitty’s picture

@jenlampton - I tried it. But every time I came back to this issue there was "Follow" again instead of "Following" (in the meantime it seems to work). But anyway: even "Following" seems to do nothing. I couldn't find any place where this notification shows up and I didn't get a message about your new posting.

Is there any documentation about this "Follow" button?

jenlampton’s picture

Status: Needs work » Needs review
FileSize
1.78 KB
1.82 KB

It's surely documented somewhere, I just don't know where. For me "follow" works exactly the same way as adding a comment. it adds issues to the list on my dashboard (which is where I check on them).

And since we're off topic and I'm feeling bad about that, patch re-rolled since core directory move. looks like the old patch still applies cleanly to D7.

varkenshand’s picture

Still persistent ignoring of the break tag, in version 7.10. Both filtered and full html.

jenlampton’s picture

@varkenshand even after applying the patch?

varkenshand’s picture

I never applied a patch in my life, to be honest. In general I use the latest dev versions of modules and the most recent stable Drupal.

Update: I tried several programs to apply patches but frankly they are the worst software I have seen in ages (Windows 7 environment). What I would like to do is
- Download the module
- Apply patch to file
- Upload the module again to test.
This does not seem to be possible at present.

jenlampton’s picture

@varkenshand We've provided a patch that is meant to fix the problem, but before the fix will be included in the next version of Drupal, we will need some help testing it.

You can get some information on applying patches for mac or for windows if you'd like to help us out. :) It's actually really straightforward if you are on Linux, or if you are already using git.

rangermeier’s picture

the patch from #64 works for me (on drupal 7.10). thanks!

steinmb’s picture

Confirm that #64 apply cleanly against D7.10, and even more important, it works :)

ralphb’s picture

Component: field system » filter.module

I can confirm that #64 works against 7.12.

I also agree with #18, though, that the patch is merely a kludge that may break any time. Adding time() is cute but pointless, you could use any fixed random string for that purpose.

oriol_e9g’s picture

I think that we need better tests, not only test _filter_html() function. We also need test a real node creation with the break tag.

Only tests patch should fail, complete patch should pass.

Status: Needs review » Needs work

The last submitted patch, allow-break-tag-onlytests-881006-72.patch, failed testing.

oriol_e9g’s picture

Status: Needs work » Needs review

The tests fail as expected, the full patch pass.

jenlampton’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: +needs backport to 6.x

Works for me :)
I believe this is still an issue in D6 as well, or is that being tracked in this other issue?
#222926: HTML Corrector filter escapes HTML comments

catch’s picture

Status: Reviewed & tested by the community » Needs work

Normally we'd use REQUEST_TIME rather than time().

Since we don't need the time at all, can we do something like hash the break tag, or use an actual random string or number. Either way it needs a comment as to why that's needed rather than a static string.

j0rd’s picture

I'm running into this problem with my trimmed_plaintext module. I need this issue resolved before I can support breaks.

I can't test the latest patch as it was made for D8 and not D7.

With that all said, as a module developer who's having to deal with this, I have a problem with the way Drupal is handling this. Not sure if the patches resolve my particular qualm, but never the less.

In hook_field_formatter_view($entity_type, $entity, $field, $instance, $langcode, $items, $display) I would much prefer that $item['summary'] always be populated with the appropriate summary, be this caused by a BREAK tag or by the end user splitting the field into two parts. It doesn't really make sense, when a BREAK is used, not to have this field populated. I assume this is done, to create simplicity and not having to synchronize a full body with BREAK and the summary it would create. It would on the other hand stop bugs from happening in field_formatter modules.

oriol_e9g’s picture

Status: Needs work » Needs review
FileSize
3.63 KB

Trying again, I have used md5 instead of sha1 or similar because the first is faster.

Only failing tests in #72

oriol_e9g’s picture

@j0rd This is a Drupal 7 version for testing purpose.

oriol_e9g’s picture

xjm’s picture

Assigned: Unassigned » xjm
Status: Needs review » Needs work
+++ b/core/modules/filter/filter.moduleundefined
@@ -1231,6 +1231,14 @@ function _filter_html_settings($form, &$form_state, $filter, $format, $defaults)
+  // Substitute for the <!--break--> tag as filter_xss() removes HTML comments.
+  $teaser_breaker = strpos($text, '<!--break-->');
+  if ($teaser_breaker !== FALSE) {
+    // Md5 hash to make an unique substitute.
+    $break_tag_replacement = '[[[break-' . md5($text) . ']]]';
+    $text = str_replace('<!--break-->', $break_tag_replacement, $text);
+  }

The comments here are a little confusing. The name $teaser_breaker is also a little odd.

+++ b/core/modules/node/node.testundefined
@@ -624,6 +624,42 @@ class SummaryLengthTestCase extends DrupalWebTestCase {
+class SummaryBreak extends DrupalWebTestCase {

This class needs a docblock.

+++ b/core/modules/node/node.testundefined
@@ -624,6 +624,42 @@ class SummaryLengthTestCase extends DrupalWebTestCase {
+   * Creates a node with break delimitator to test the teaser break functionality.

"Delimitator" is not a word. It should be "delimiter." (This change should also get the summary under 80 chars.)

I'll reroll to clean these things up.

xjm’s picture

+++ b/core/modules/node/node.testundefined
@@ -624,6 +624,42 @@ class SummaryLengthTestCase extends DrupalWebTestCase {
+    $this->assertRaw('Lorem ipsum dolor sit amet', 'The node sumamry is found.', 'Node');
...
+    $this->assertRaw('Lorem ipsum dolor sit amet', 'The node sumamry is found.', 'Node');

Oh, also, "summary" is misspelled here.

xjm’s picture

Status: Needs work » Needs review
FileSize
4.31 KB
3.6 KB
2.44 KB

Attached:

  • Simplifies the code a little and uses an md5 of <!--break--> rather than the whole text. (We just need a placeholder that is not found in the text; no need to have the overhead of md5()ing the whole $text.)
  • Clarifies several inline comments.
  • Corrects typos.
  • Removes unnecessary t() from assertion message text. (Reference: #500866: [META] remove t() from assert message).
  • Adds a docblock to the test class.
  • Renames the test class to something a bit more standard.

Status: Needs review » Needs work

The last submitted patch, break-separator-881006-83.patch, failed testing.

xjm’s picture

Whoops. Deleted one line too many. Let's try that again...

xjm’s picture

Status: Needs work » Needs review
FileSize
3.64 KB
2.44 KB

*grumble*

artfulrobot’s picture

I've used this patch except that I've swapped the unnecessary md5 calculation for a literal string.

Status: Needs review » Needs work

The last submitted patch, break-separator-881006-88-mine.patch, failed testing.

xjm’s picture

That's a good suggestion. However, the patch needs to be rolled against 8.x. Also, the comment should be on the preceding line so that it doesn't go over 80 characters.

artfulrobot’s picture

Perhaps I should have not submitted a patch and just described it. So here goes:

The md5("<!--break-->") is unnecessary because the md5 of a constant is always going to be constant.

Drupal is heavy enough without adding further inefficiencies IMHO.

Perhaps xjm could incorporate into patch @ #88 ?

xjm’s picture

Status: Needs work » Needs review
FileSize
621 bytes
3.72 KB

:P

webchick’s picture

Issue summary: View changes

Attempting to summarize the issue to date. This probably still needs work.

webchick’s picture

The approach in this patch makes my spidey sense tingle, so I read through the issue looking for answers to some questions. I didn't find them, but while I was at it I updated the issue summary. ;)

The following is still unclear to me:

- Why exactly is D6 fine with this situation, and D7 is so not? #18 comes closest to answering this question, but it's still a little vague on specifics. I'd love to know what's the actual flow/missing function/whatever in D6 vs. D7 that causes break tags to get screwed up?
- chx's question in #56 seems to still be unanswered. Why didn't #559584: filter_xss() and Line break filter break HTML comments fix this indeed? Related to the above. Possibly answered by the above.
- Damien's proposal in #37 is how I'd expect to fix this. Namely, if our problem is with filter_xss(), let's fix it there, and just allow comments. This was discounted in #38 saying that "I'm pretty sure that we had several SAs pertaining to HTML comments in the past, so I don't think that making filter_xss() ignore comments is an option." However, I search security.drupal.org for "html comment" and found no such issues (doesn't mean they don't exist, just means I'd love to have this option 'officially' ruled out before we do this weird "temporary string switching" behaviour).

If anyone could lend some clarity to these points, I'd feel a lot more comfortable committing the approach in #93.

borazslo’s picture

#88 with #93 Rocks!! Yeah! And thx!

xjm’s picture

Assigned: xjm » Unassigned
litmusdesigns’s picture

#58 is correct.
I can confirm that adding <!--> to allowed formats fixes the issue with DrupalBreak and Filtered HTML without applying a patch

catch’s picture

Status: Needs review » Needs work

That seems like a better fix to me than the md5() trickery. Let's add that as a default.

xjm’s picture

Issue tags: +Needs security review

@catch -- I think HTML comments were rejected as a solution in #38 (see also webchick's review in #94). So maybe we need a security review on whether adding them is safe?

I'm also curious as to how this regression came into being. I spent probably half an hour looking into it a couple weeks ago and could not find a reason for it when comparing the D6 and D7 code for teaser generation.

smitty’s picture

#58 / #97 works fine: After adding <!--> to allowed formats the break in the teaser is correct.

BUT: At the same time you get a line-break in the full text at the position where the <!-break-> is entered.

herzbube’s picture

Adding <!--break--> to the Filtered HTML format (as suggested by #58) does not work for me. I am on D7.14, having upgraded from D6 about a week ago.

I have tried all sorts of things but couldn't get this to work:

  • Disabled all filters except for "Limit allowed HTML tags" to make sure that it's really this filter that causes the problem (it is)
  • Added <!--break--> at the front / end of the list of allowed tags
  • Add a space in front and after the work "break"
  • Cleared the cache
  • Doublechecked in the database that <!--break--> appears verbatim both in the format definition and the node body
smitty’s picture

@herzbube: Just leave out the Word "break". Only enter <!-->.

jenlampton’s picture

Status: Needs work » Needs review
FileSize
782 bytes

The attached patch allows the HTML comment tag for the Filtered HTML text format, as suggested in #98

This would solve all problems for sites upgrading from D6 to D7 (to D8?), and for sites that have not changed the allowed tags for the filtered HTML text format. I think it's not as ideal as #93 for a few reasons. For starters, I'm not sure this will work for sites that have changed the allowed tags on the Filtered HTML text format, or as in my case, created another text format with the HTML filter and a different set of tags than the default.

Additionally, I think we may need to document what <!--> means somewhere, since that was not needed in previous versions of Drupal.

xjm’s picture

Issue tags: +Needs tests

To consider the approach in #104, we need to make very sure any tags within the <!--> still get stripped. Let's try rolling #104 together with the earlier tests form #93, and maybe also explicitly add a test that <!-- <script></script>--> and variants gets stripped properly?

jenlampton’s picture

Status: Needs review » Needs work
FileSize
1.89 KB

I don't know how to incorporate the tests from #93 into this new style Symfony testy thingies, but I did some testing of my own, that failed :/

I created a node with the content:

this is a test <!-- with a <script>bad news</script> comment --> in the test.

And the resulting HTML on the page was:

<p>this is a test </p>
<!-- with a <script>bad news</script> comment --><p> in the test.</p>

We need to make sure the script tags get stripped somehow if we go with this approach, and it would also be nice if we didn't get extra unwanted P tags in the original content.

Oh, but I did I also rework the settings form to include a checkbox for allowing HTML comments (checked by default) instead of having to allow a funny-looking tag to the list of allowed tags, I like this UI better.

I also like this general approach better (allowing all safe HTML comments instead of only the break tag) because it also solves the problem for contrib modules like paging and ad, which are currently broken since they can't allow their stuff to get injected into node bodies (as HTML comments) until this gets fixed. Allowing only the break tag via md5 hash won't help them at all.

Who has strong regex fu? :)

drumm’s picture

We ran into this on Drupal.org, #1830022: Getting involved page content. Looks like we can work around it in D7 by appending ' <!-->' to filtered HTML's allowed tags in a custom update hook.

ergophobe’s picture

I wouldn't say I have strong regex fu and I'm not totally sure I understand what you're trying to achieve. But assuming it's just the use case above, I can make a regex for that.

Perhaps I've oversimplified, but I'm just stripping out attempts to inject script tags in and stripping them out without prejudice.

<script[^>]*>(?>([\n\r]|.)*?</script[^>]*>)

Essentially this is just a "lazy dot" match (.*?) that also catches newline (\n) and return (\r) in case it's multiline input and by taking them in any order it's agnostic with respect to Mac, Win, Lin.

I've also created an atomic group (the ?> operator) that essentially says to match the whole thing as a group. I can't think of a practical case where this will change the match, but what it does is offer some efficiency if the lazy dot fails to match (i.e. there is no closing script tag). Essentially, what would happen otherwise is, failing to match, the regex has to backtrack, but with the atomic group it says "if you don't find this full expression including the script tag at the end, then it's malformed and just keep going."

Limitations:

  1. Nesting. It doesn't expect nesting. So in the case of <script>blah blah<script>bad stuff</script> halb halb</script> it's going to output halb halb </script>. Not too tragic, it just leads to messy output, but that's an edge case.
  2. Unclosed script tag. This assumes the script tag actually closes. Does anyone know what browser behavior is when a script tag isn't closed? I assume it fails entirely, but I haven't tested that. If it doesn't just fail outright, then some additional sanitizing needs to happen here.

It does expect some other possible trickery in event someone is trying to fake out the regex. I don't know just how garbled code can get and a browser will attempt to parse it, but I added the [^>]* matches in the script tag, even the end tag, so that the following are at least made safe, even though the output will not always be well-formed:

  • <script attr="value">bad stuff</script garbageattribute="garbage"> - removed entirely
  • <script title="tricky ></script> stuff">bad stuff</script> - this will end up with malformed HTML, but it should be safe. It will match on the first </script> which is really part of the value of the title attribute and it will end up outputting stuff">bad stuff</script>. Not great, but I think that should be safe.
  • Multiline obfuscations
    <script fakeattr="this is
    my feeble attempt to trick you">bad stuff </script>
  • There might be other possibilities, but I think that should get rid of the case you posted originally and many, many more, but should be pretty safe in terms of leaving allowable code alone.

    Wthin the limits of formatting allowed on d.o, here's the gibberish input I tested against

    asdf asdfe asdf eoeasfakjsn as dfnasl oijf sdknf lsdkfj <script>bad stuff simplest case</script> asdfkj aslkdjfhf

    <script>bad stuff spread over more
    than one line bad stuff</script>

    asdf asdf asdfaesdf asdf asdf asdfaesdf asdf asdf asdfaesdf asdf asdfhlaksjhf <script src="source" otherstuff="otherstuff">bad stuff with an opening tag with attributes bad stuff</script> asdf asdf asdfaesdf asdf asdf asdfaesdf asdf asdf asdfaesdf asdf asdf asdfaesdf asdf asdf asdfaesdf asdf asdf asdfaesdf asdf asdf sdfhlaksjhf <script>bad stuff with stupid closing tag with attribute to trip up simple match bad stuff</script src="asdf"> asdf asdf asdfaesdf asasdfhlaksjhf

    <script
    src="garbage">bad stuff on multiple lines with malformed closing tag</scipt>

    asdf asdf asdfaes<script>not bad stuff, but we still need a closing tag for previous test</script> asdf asdf this is a test <!-- with a <script>bad news - Jen's original test case</script> comment --> in the test. asdfaesdf asdf asdfhlaksjhf <script title="tricky ></script> stuff">bad stuff bad stuff bad stuff</script> asdf asdf asdfaesdf asdf as

jenlampton’s picture

Status: Needs work » Needs review

For the record, we're still telling people to use this broken feature on the node preview page. the text says :

The trimmed version of your post shows what your post looks like when promoted to the main page or when exported for syndication. You can insert the delimiter "<!--break-->" (without the quotes) to fine-tune where your post gets split.

If we can't actually get a fix in for this in D7, can we at least remove the text that tells all our content creators to try this broken feature? :)

I'm changing status back to 'needs review' because I'd like people to look at the patch in 106 again. I'm not sure this has any actual security implications, since whatever gets inserted inside a HTML comment will be - get ready for this - commented out.

koppie’s picture

I'm re-uploading @jenlampton's patch from 109 since the Drupal bot never tested it for some reason. Green stripe, baby! Come on! :-D

I'm removing "needs backport to D6" because (a) I believe it works in D6, and (b) Jen agrees that it doesn't need to be backported, and she's the one who added the tag in the first place (13 months ago).

Status: Needs review » Needs work

The last submitted patch, core-break_tag_filtered_html-881006-110.patch, failed testing.

Atratus’s picture

For the sake of anyone needing to deal with this in a way that doesn't mess with the core, here is how I have dealt with this in a custom module. The following code will add two filters for the text formats. Use the "mangle" filter as the first filter in the text format and the "unmangle" as the last filter. It also deals with extraneous p tags wrapping the break inserted by the wysiwyg editor that I am using.

function mymodule_filter_info() {
  $filters = array();
  $filters['mangle-break'] = array(
    'title' => t('Prevent &lt;!--break--> from being stripped by mangling '),
    'process callback' => '_mymodule_hide_break',
  );
  $filters['unmangle-break'] = array(
    'title' => t('Restore mangled &lt;!--break--> tags '),
    'process callback' => '_mymodule_fix_break',
  );
  return $filters;
}

function _mymodule_hide_break($text, $filter, $format, $langcode, $cache, $cache_id) {
  $mangled = "[[".md5("<!--break-->")."]]";
  $text = str_ireplace('<p><!--break--></p>', '<!--break-->', $text);
  return str_replace('<!--break-->', $mangled, $text);
}

function _mymodule_fix_break($text, $filter, $format, $langcode, $cache, $cache_id) {
  $mangled = "[[".md5("<!--break-->")."]]";
  return str_replace($mangled,'<!--break-->', $text);
}
Atratus’s picture

Issue summary: View changes

Formatting issues.

calebgilbert’s picture

After playing with things some more I wonder if this whole issue isn't actually more simple than it seems...

...the break tag seems to work normally for me after simply going to a content type's "manage display" settings and changing the "body" field format from "Default" to "Summary or trimmed". This begs the question what "Default" even means if not "Summary or trimmed" (see screenshot), but none-the-less the break tag works as expected for me if I change this setting.

UPDATE:
I was on the right track, but changing the body field on the default display settings page is not quite right. Fwiw, and based on my tinkering I believe the following steps demonstrate how the core display settings could/should have shipped in order to have the break tag work properly out of the box:

1. On the default display settings leave the body field to "Default". We need to create a teaser display setting instead. To do this select the "Teaser" option underneath the "Use custom display settings for the following view modes" collapsible fieldset and click save. (screenshot)

2. Choose the teaser display settings you just created and then change the body field on that page to "Summary or trimmed". The break tag behavior should now work for you as expected. (screenshot)

TWD’s picture

subscribing.

May try #112 by Atratus and report.

TWD’s picture

The custom module in #112 Atratus works like a charm!

But it would help other noobs like me I think to explain explicitly the thought process behind it and the necessary configuration setting.

The problem is that core strips out the break tag from filtered HTML.
So Atratus "tricks" the filter by disguising the tag using the .md5 PHP hash function. That's what the "hide_break" callback function is all about.
Then once it has passed the filters, he undisguises the tag using the "fix_break" callback function so that it gets rendered into HTML.

The important point to note is that this custom module adds two "filters" into the Configuration/Content Authoring/Text Formats screen.

1. You need to enable both:
"Prevent <!--break--> from being stripped by mangling "
and
"Restore mangled <!--break--> tags "

2. In the Filter Processing Order on the same page you need to drag the "Prevent Stripping" process so that it is just above the Limit Allowed HTML tags and drag "Restore Mangled" so it is just below the Limit Allowed HTML process.

jhedstrom’s picture

Version: 8.0.x-dev » 7.x-dev

Any mention of the <!--break--> tag was removed in #1510544: Allow to preview content in an actual live environment. Moving this back to 7.x.

w00’s picture

I was bitten by this bug today.

I'd noticed it working occasionally, but mostly not, in previous Drupal encounters.

So I set up a new site yesterday and was pleasantly surprised that my Summary was broken as expected.

Today I added a page or tweaked a page, and saw it *not* working.

Spent far too much time diagnosing the issue. Once done, I used the new-found knowledge to find this page. And I see it's been an issue for... 5 years?!?

As far as I can tell, either I need to:

  1. Apply one of the patches here (after every update I imagine);
  2. I have to grant Full HTML privileges to all users;
  3. Or I have to tell users to copy / paste the summary and to do it after every change to the body.

Really, none of these options are palatable, not even reasonable.

I'll admit I haven't read all 116 comments that precede mine, but as of Drupal 7.34 (applying 7.35 this evening or tomorrow), this issue persists and there doesn't seem to be any good reason.

After the SQL-cleansing bug a while back, now this? I'm thinking something's really wrong here. I understand that bugs can happen (i.e. SQL or this one), but that it hasn't been addressed in the main branch is nothing short of ridiculous, frankly.

herzbube’s picture

@w00 The workaround that I have been living with for 3 years now (and which has been mentioned in the older comments a couple of times) is to insert <!--> into the list of allowed HTML tags. It's not too bad, which is why I suspect that the issue hasn't been given more attention. Just make sure that you add <!--> and not <!--break-->.

w00’s picture

herzbube, Thank you for your reply.

That does, indeed, work.

Not pretty, but acceptable.

Thanks again.

Oh, and for anyone that finds this thread and isn't sure how to accomplish this:

This can be accomplished by going to /admin/config/content/formats/1 (Configuration > Content Authoring > Text Formats > Filtered HTML configure) and selecting Limit Allowed HTML Tags.

Add " <!-->" to the end of the list of tags under the label Allowed HTML tags.

Michael_Lessard_micles.biz’s picture

The quickfix in #118 above works in Drupal 7.38
as long as your Content Types have the proper Manage Display settings, namely Teasers (custom) set to Summary or Trimmed.

I am migrating from D6 to D7 at the moment.

Nb: I fully support the indignation that this is not fixed in core by now. Hope it is fixed in D8, as D6 content is planned to be importable by D8. This bug causes a regression/migration hurdle (as if it wasn't a painful process already), where all your nodes might appear fully where teasers are supposed to be shown.

cilefen’s picture

Status: Needs work » Needs review
FileSize
1.82 KB

According to #116, all mention of this inline tag was removed in Drupal 8.

This is @jenlampton's patch, backported to Drupal 7 and with a hook_update_N() to enable the setting.

Status: Needs review » Needs work

The last submitted patch, 121: regression_break_tag-881006-121.patch, failed testing.

guusbosman’s picture

Haha, it took me 5 years from trying an early beta (comment #6) to finally making the switch to D7. Good too see that there's a work-around (#118 worked fine for me).

massiws’s picture

#118 worked fine for me too.