Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
The HTML Corrector module replaces the < bracket with a < when it is the start of an HTML comment. There are two cases where this is a problem:
- Users have fill HTML access and you want to be able to catch missing tags. Users need to be able to copy in content from an older site which contains HTML comments, or the users want to be able to put in comments for their own use.
- At least one module I maintain (Table of Contents) and possibly others use HTML comments as markup so if the module is disabled nothing is rendered to the user. This worked fine in D5, but if the HTML corrector module is enabled then the markup is rendered.
If this is considered to a valid issue, then let me know and I'll create a patch to fix it. Otherwise, please suggest ways around this!
Thanks,
--Andrew
Comment | File | Size | Author |
---|---|---|---|
#102 | 222926-filter.patch | 6.38 KB | smk-ka |
#99 | filter.patch | 2.37 KB | jenlampton |
#91 | 222926-90-html-filter.patch | 3.19 KB | tic2000 |
#69 | htmlcorrector_comments3.patch | 3.64 KB | dixon_ |
#66 | htmlcorrector_comments2.patch | 3.19 KB | jcnventura |
Comments
Comment #1
henrrrik CreditAttribution: henrrrik commentedThis affects my XStandard module too.
This tweak to the regular expression in the
_filter_htmlcorrector
function seems to do the trick:Comment #2
gpk CreditAttribution: gpk commented(Duplicated at http://drupal.org/node/221252.)
The regex might be marginally simpler as follows:
Yes, this needs a patch.
Once the regex is modified to catch HTML comments as well as other tags, how does http://api.drupal.org/api/function/_filter_htmlcorrector/6 cope with the fact that the comment is never closed in the way that most tags should be? i.e. if you have <!-- comment --> does try to close it by inserting </--> or similar later on?
Comment #3
henrrrik CreditAttribution: henrrrik commentedYes, it closes with a
</!-->
tag.This adjustment to the code that closes the tag fixes that:
Comment #4
gpk CreditAttribution: gpk commentedSee also http://drupal.org/node/97182 (same issue but in the context of htmlcorrector as a contrib module for Drupal 5.x). I'd mark one or other duplicate except that these are now separate projects since htmlcorrector has been taken into core in 6.x.
Comment #5
Spaceman-Spiff CreditAttribution: Spaceman-Spiff commentedAny idea if this will be fixed in a future release or such fix/module/patch has been released?
Will the htmlcorrector.module.comment_fix.patch posted by jcfiala work on 6.x?
Comment #6
jvong CreditAttribution: jvong commentedHello,
I have posted http://drupal.org/node/240312 how do I resolve this issue. Can someone please help?
Thanks,
John
Comment #7
walker_643 CreditAttribution: walker_643 commentedPatching modules/filter/filter.module fixed this (very annoying) issue for me. I hereby vote for such a patch in 6.3/7.x
Comment #8
sunRe-rolled my patch of #97182: <!--break --> is transformed into html code with lt and gt against Drupal 6.x. Should fix this issue without much changes.
Comment #9
sunRe-rolled patch patch against HEAD. Dunno whether this issue should be moved to 7.x as well.
Comment #10
gpk CreditAttribution: gpk commentedMaybe the patch at http://drupal.org/node/269095 is better?
Comment #11
sunRe-rolled patch from #269095: [Filter] FULL HTML filter with HTML corrector enabled break comments <!-- ..... --> for D7 and marked that issue as duplicate.
I did not test the patch yet, but it looks indeed simpler.
Comment #12
jcnventura CreditAttribution: jcnventura commentedThese patches only deal with:
1. Preventing the replacement of '<' by '<'
2. Preventing the insertion of a dummy closing tag for the comments
However, the contents of the comment are still processed by the HTML corrector module.. The attached patch extends #11 to prevent the processing of the contents of the comment. To test this, create a multi-line comment.
Drupal 6.3 currently produces :
With the patch the output is:
The attached patch is against Drupal 6.3, and please, please commit it before D6.4. I have tested this patch and it works nicely for me.
João
PS: Updated on 2008-07-22 because the original patch terminated the comment with any '>' and not only with the first '-->'
Comment #13
Matt BThis works for me - thanks!
Comment #14
jcnventura CreditAttribution: jcnventura commentedI shouldn't be doing this, but since this bug is interfering with so many modules (including the AdSense module), I am raising this to critical hoping that it will be tested/reviewed and merged before Drupal 6.4 is released.
João
Comment #15
Matt Bdidn't make it to Drupal 6.4, I had to repatch when I upgraded (patch still works great). This needs to be fixed, Drupal 6 is not usable for me without it!
Comment #16
stefan_seefeld CreditAttribution: stefan_seefeld commentedI'm reading this discussion as I'm needing some custom markup on user-generated pages, too. I wonder whether it would be useful to allow (or even pre-define) processing instructions (i.e. <?...?>), not only generic xml comment elements, to be preserved such that specific modules can handle them.
Thanks,
Stefan
Comment #17
Matt BStefan - would it not be better for the specific module to implement a filter to do this?
Comment #18
stefan_seefeld CreditAttribution: stefan_seefeld commentedCertainly, the module in question needs to recognize the instruction.
All I'm asking here is whether it is possible to allow for processing instructions to be preserved, i.e. not being filtered out or escaped. If that's already the case, just ignore my request. :-)
Thanks,
Stefan
PS: as a use-case, consider the 'print' module with its ability to generate pdf. I want to inject page-break markers into my nodes, specifically targetted at the pdf generator, but I have no idea whether drupal would allow processing instructions, i.e. whether those can be passed through.
Comment #19
jcnventura CreditAttribution: jcnventura commented@stefan_seefeld: please don't cloud the issue here.. The problem here has nothing to do with inserting input filters like you're suggesting. That's to be left to the modules themselves.
The problem here is that HTML comments are escaped by the HTML corrector, rendering them visible on the normal content..
Joao
PS: As to what we had discussed for the print module, this needs to be fixed because PDF specific-instructions must be placed in the content in a way that is not visible normally and that activates only during PDF generation.. I would prefer to use HTML comments for that as it be less overhead than using an input filter.
Comment #20
stefan_seefeld CreditAttribution: stefan_seefeld commented@jcnventura: sorry, I misunderstood what the issue was. I assumed that the escaping of comments (and possibly processing-instructions) would not only display them when not wanted, but also make it impossible for other modules to pick them up.
I'll follow-up on the rest on the print-specific issue tracker.
Thanks,
Stefan
Comment #21
tcblack CreditAttribution: tcblack commentedThe latest patch appears to work for me.
Comment #22
jcnventura CreditAttribution: jcnventura commentedWell, it seems to work for at least three persons: me, tcblack and Matt B.
Marking it RTBC, in the hopes that it will be committed soon.
João Ventura
Comment #23
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis is a patch to a rather complicated piece of code. It will require a full-featured unit test case before going in.
Comment #24
jcnventura CreditAttribution: jcnventura commented@Damien Tournoud : I agree with what you're saying on the part that this is a patch to a rather complicated piece of code. However, I disagree with the need for a fully-featured test case. Why?
1. Because of this bug in the HTML corrector, that feature is almost unusable since it is breaking a lot of contrib modules.
2. The patch is only 3 lines! It's quite easy to see that the only changes are related to processing of HTML comments.
3. The patch has been tested to work on Drupal 6.4 by at least 3 persons.
Since the tests for the HTML corrector filter haven't yet been done (just looked in D7 CVS), this isn't a simple case of waiting for someone to extend those tests to cover this case. You're proposing that this (simple) fix be put on hold until a very complex set of simpletests are written.
From this thread I can tell you that this is breaking the following modules:
- Table of Contents
- XStandard
- AdSense
João
Comment #25
Damien Tournoud CreditAttribution: Damien Tournoud commented@jcnventura, sorry, but the Html Corrector itself is a rather complicated piece of code. I would like to see what's happen in corner cases like
<!--
inside tags and so on. At this time, the whole filter is untested so we really don't know.I'm also really unsure this could break the modules you listed. For Table of Contents, for example, you just have to be sure to order the filters the right way (the HTML corrector has to be on top). So I would say this is more an annoyance than a breakage.
Comment #26
jcnventura CreditAttribution: jcnventura commentedI was going to put it RTBC again, but I give up. You're right that all those modules aren't broken if the HTML corrector filter is the first one to execute..
However, a lot of people monetize their sites using AdSense (and most of them don't use the AdSense module as it's not clear if using it is not a violation of Google's TOS). So they are ordered by Google to type in the following in their pages:
What do you think happens with that comment in the script? It gets turned into something that is definitively against Google's TOS. In some archaic browsers it will even display the code (the reason why everyone comments the inside of JavaScripts).
The only way to avoid this is to turn off the HTML corrector, which most people don't even know it's there.
However, the patch doesn't need work. I am setting it to needs review. What needs work is the filter module simpletests.
João
Comment #27
Damien Tournoud CreditAttribution: Damien Tournoud commentedOne last thing: if I understand all this right (we don't have a test case, so I can't be 100% sure), the htmlcorrector with the patch in #12 will probably do some silly things like transforming:
<!-- this is a test <i>test</i>
to<!--this is a test <i>test</i></!-->
,while it will not even try to close
<!-- this is a new test
because there are no matching">"
.This definitely needs *first* a full test case, and *only then* a patch.
Comment #28
deviantintegral CreditAttribution: deviantintegral commented@Damien Tournoud: Actually, that's not quite true. In the best case scenario, I'd like to tell users to place the HTML corrector before the tableofcontents filter. Then, there should be less things to break when users forget to do things like close heading tags. Most users setting up filters would generally do HTML corrector followed by HTML filter, followed by additional filters, but that's not possible until this patch lands.
Comment #29
Damien Tournoud CreditAttribution: Damien Tournoud commentedPutting an adsense code in a node content makes no sense at all. The least bad option is to use a block for this and place it in the footer.
Comment #30
sunPlease stop discussing about tests. This has to be fixed in 7.x first, to be back-ported to 6.x. And we have a clear rule for patches against 7.x - everything needs tests, even if there are none yet.
Comment #31
dman CreditAttribution: dman commentedPlacing adsense code in node content is certainly against best practices. But not unreasonable.
.. but ALLOWING SCRIPT TAGS in HTML-filtered input? And then finding that the comments inside the scripts were a problem?!
HTML filters are just protection against potentially malicious code. If you trust yourself enough to break your own system to this extent - don't use HTML filter! FFS.
.dan.
Comment #32
jcnventura CreditAttribution: jcnventura commentedJust a comment on #27: there is no correct way for this. A comment can contain HTML inside (and even other comments), so actually the silly thing that it may do is not so silly (I haven't tested that situation). Where's the boundary if you forget to close the comment?
My personal opinion is that it should be when it reaches the
</body>
tag. That way, when the rest of the content disappears, the user knows that something is amiss.I am creating a link to this issue in the issue for creating the tests for the filter module (#276597: TestingParty08: filter.module).
João
Comment #33
peterx CreditAttribution: peterx commentedPatch from #12 works in my 6.4 site with wide range of comments including comments around javascript. Some of the comments and javascript arrive in AHAH from other Web sites. Do I use this patch or turn off HTML correction? The incoming HTML needs filtering because the HTML was created by Orks for Internet Explorer 0.5 alpha and that rules out switching off HTML filtering and correction. Thank you for the patch.
Comment #34
Matt Bagain, this hasn't made it to Drupal 6.5, so I had to repatch. Without doing this, Drupal 6 is not usable (for me)! Works fine in drupal 6.5
Comment #35
alexanderpas CreditAttribution: alexanderpas commentedroadmap:
- Create Tests (that'll fail) for 7.x (#276597: TestingParty08: filter.module)
- fix 7.x code
- No tests fail anymore
- backport to 6.x
- backport to 5.x html corrector module.
Comment #36
Matt Bpatch works fine on 6.6
Comment #37
veriKami CreditAttribution: veriKami commentedsubscribing :-)
Comment #38
grendzy CreditAttribution: grendzy commentedsubscribing
Comment #39
Todd Nienkerk CreditAttribution: Todd Nienkerk commentedI ran into the same problem in 6.x and created a patch for it: #343236: HTML corrector encodes (entifies) <!--break--> tag. Because the break tag is encoded, it shows up in RSS feeds.
Comment #40
palmstrom CreditAttribution: palmstrom commentedsubscribing
Comment #41
kmillecam CreditAttribution: kmillecam commentedsubscribing
Comment #42
alanBrookland CreditAttribution: alanBrookland commentedsubscribing
Comment #43
timatlee CreditAttribution: timatlee commentedsubscribing
Comment #44
escoles CreditAttribution: escoles commentedAppears to still be an issue for 6.9 -- I've just documented on two 6.9 installs where it will be a problem due to large amounts of legacy content that weren't previously affected by this issue. Hence, many contain HTML comments.
Is this issue re. 6.x or 7.x? "Version" is currently set to 7.x-dev, but most of the discussion is re. 6.x. Can someone help me understand why this issue would be deferred to 7.x? Does this mean that if we need a fix in 6.x, we're always going to have to patch-over the latest HTML Corrector release?
CORRECTION -- I now see the rationale in the "roadmap" here:
http://drupal.org/node/222926#comment-1086392
... so I see that plan seems to be to backport to 6.x after using 7.x efforts to unit test.
(Note that this is a particular problem for people who do a lot of cut & paste blogging. We need the corrector to close un-closed containers that we might miss by not scrutinizing the HTML of what we paste -- or to correct stuff we did in the past. But if we enable it, the comments embedded in the stuff we copy & paste will show.)
Comment #45
Dave Hirschman CreditAttribution: Dave Hirschman commentedThe HTML corrector filter will also substitute the html entity for '<' characters within javascript. For example,
if (x < y) { .... }
produces an error, whereas
if (y > x) {...}
does not.
For now, I guess I'll create a second input filter with HTML corrector turned off and select that for those few nodes where I want to use a bit of javascript or have some html comments.
Comment #46
seanrThis needs to be fixed in 6.x. What the hell is taking so long to get a clearly working patch committed? The problem here is that this filter is enabled by default in core and is breaking things. You cannot get any more critical than that. Please commit the fix!
Comment #47
seanrMarking as critical since it clearly is.
Comment #48
sunSee the required tasks in #222926-35: HTML Corrector filter escapes HTML comments
Comment #49
cameronp CreditAttribution: cameronp commentedsubscribing
Comment #50
jcfiala CreditAttribution: jcfiala commentedI tried to use the latest patch and it failed because of a drupal-6.4 or something in the patch file - I've adjusted the patch file so it runs w/ -p0
Comment #51
deviantintegral CreditAttribution: deviantintegral commentedFYI there is another related issue against D5 with the HTML filter #103563: HTML filter escaping html comments. If you're having problems with D6 even with the HTML corrector disabled, take a look at the patch over there.
Comment #52
elly CreditAttribution: elly commentedthanks for the d6 patch, it works like a charm!
Comment #53
jcnventura CreditAttribution: jcnventura commentedHello again,
Here's my latest attempt. I have to acknowledge that Damien Tournoud (among others) was completely right. After applying some tests*, I concluded that the patch was indeed flawed.
This applies to the current 7.x-dev, and includes a couple of tests that verify it. After applying this patch all the tests in #276597: TestingParty08: filter.module still pass.
One final note: as it is now, the patch only handles proper comments (i.e. it requires that the comment is properly closed). I didn't try to handle that case, as it is too complicated to decide where to close it. I think that the issue of correcting a comment that is not terminated should be handled elsewhere.
João
*developed by wrwrwr in #276597: TestingParty08: filter.module
Comment #54
jcnventura CreditAttribution: jcnventura commentedrevisiting #35:
1. Create Tests (that'll fail) for 7.x ------- done
2. fix 7.x code --------- done
3. No tests fail anymore ------- done
4. backport to 6.x ------ TBD
5. backport to 5.x html corrector module ------- TBD
Comment #55
monotaga CreditAttribution: monotaga commentedsubscribing
Comment #56
grendzy CreditAttribution: grendzy commented#54 looks good. The new test passes only after patching filter.module. I also tinkered with different strings in the test case, and couldn't find any issues. One thing I did notice is that self-closing tags inside of comments get corrected, like <br> gets converted to <br />, which seems harmless.
Also this patch passes coder's style checks.
Comment #57
sunThanks for working on this!
PHPDoc could use a proper description. Replace "2" with "Comments" in function name.
Comment #58
jcnventura CreditAttribution: jcnventura commented@sun: Changed the description and renamed the function as you asked
@grendzy: Thanks for bringing that to my attention. Since what you described went against the code of the changes I made, I went back and analyzed it better. I had forgotten to include the 's' modifer for the PCRE regular expression, so the HTML corrector was still active in multi-line comments. I have added a single-use tag to the (multi-line) html corrector test to verify that this is now working correctly.
I am setting this back to RTBC because the only change to the filter module was the 's' modifier. Feel free to set it back to 'needs review' if you believe that change to be important enough to downgrade it's status.
Comment #59
jcnventura CreditAttribution: jcnventura commentedComment #60
Damien Tournoud CreditAttribution: Damien Tournoud commentedWell, you added a single-line constraint on the "don't apply processing inside of those tags" regexp, please confirm in a test that it doesn't fall apart when the tags are on multiple lines. Example:
or:
Comment #61
jcnventura CreditAttribution: jcnventura commentedI didn't mention it, but the tests in #276597: TestingParty08: filter.module still pass, and one of them already handles the first case (which doesn't get affected at all by the s modifier since the tag is still in the same line).
However, your second case raises a valid point. Previously the tag matching stopped at the end of the line, and now it stops at the closing delimiter (>). Since the code is SUPPOSED to stop at the delimiter, I am pretty sure that simple change actually improved the code.
I will add a new test case to confirm that this doesn't break anything when opening or closing tags are spread over multiple lines.
João
Comment #62
jcnventura CreditAttribution: jcnventura commentedI couldn't come up with a new test, and because of that I remembered that the s modifier only applies to the dot character (http://es.php.net/manual/en/reference.pcre.pattern.modifiers.php).
So in simple terms, the s modifier added between #53 and #58 only modifies the handling of multi-line comments and that is already being tested in the test included in #58.
Based on that, as there's really no work needed, I am setting it to needs review. Note that grendzy had set it to RTBC in #56 (for 6 minutes)..
João
Comment #64
geerlingguy CreditAttribution: geerlingguy commentedSubscribe - this might also fix #348514: Node body does not handle <!--break--> properly.
Comment #65
JeremyFrench CreditAttribution: JeremyFrench commentedThe latest patch seems to change, filter.module and filter.test, should the test not be changed as part of #276597? If it should let me know and I will split the patch in two.
Comment #66
jcnventura CreditAttribution: jcnventura commentedApparently the testing system failed to install Drupal, and the only way to restart it is to re-upload the patch.
Comment #67
EvanDonovan CreditAttribution: EvanDonovan commentedThanks for your work on this! The patch in #66 resolved the issue for me on Drupal 6.11. I had to apply it manually though (one of the hunks failed because the offsets were too different from HEAD).
It also seems to fix #348514: Node body does not handle <!--break--> properly for me.
Comment #68
TBarregren CreditAttribution: TBarregren commentedJust for information: I have made a D6 module that can be used to overcome this problem. See http://drupal.org/project/htmlcomment.
Comment #69
dixon_Re rolled the patch from #66 to apply on a fresh D7 install.
Comment #71
stella CreditAttribution: stella commentedsetting back to needs review, to re-run tests
Comment #72
joshmillerbump
Comment #73
ChrisBryant CreditAttribution: ChrisBryant commentedI understand this also causes problems for the Paging module as seen here:
#400190: Teaser break tag <!--break--> exposed
Comment #75
deekayen CreditAttribution: deekayen commentedtrying out a new testing bot that apparently wasn't working
Comment #76
joshmillerThe new testing bot liked it. I applied the patch, the added test applied with some fuzz.
After reading through the issue que, this patch is ready to be committed.
Josh
Comment #77
tic2000 CreditAttribution: tic2000 commentedYou should keep an eye on #374441: Refactor Drupal HTML corrector (PHP5) too which solves a lot of the problems here. You can see in #45 there the problems that will still remain after this or that one get committed. My vote goes for #374441: Refactor Drupal HTML corrector (PHP5) which will trim the current patch here to 1 line.
Comment #78
jcnventura CreditAttribution: jcnventura commentedCan we then apply this to Drupal 6 to fix the damn bug?? And let the refactored Drupal HTML corrector for Drupal 7?
Comment #79
tic2000 CreditAttribution: tic2000 commented@jcnventura
Not until one of the patches gets committed.
Comment #80
tic2000 CreditAttribution: tic2000 commented#374441: Refactor Drupal HTML corrector (PHP5) got committed.
This patch no longer applies.
To resume what I said in the other issue.
Testing the patch there and the patch here I noticed that:
1. Using Filtered HTML input format comments are removed. I think it shouldn't do this.
2. If the comments have some html tags inside, the result is even worse.
<!-- comment <p>comment</p> -->
will result incomment -->
. If my previous statement is arguable, now for sure something is wrong. It should either remove the comment or (ideally IMO) let it untouched.3. Finally, using Full HTML will not strip the comment, but because of the line brake filter if you write
it will output in source view
.
Point 3 is corrected by the change the patch in this issue does in the _filter_autop() function.
Point 1 and 2 remain to be addressed.
Comment #81
Damien Tournoud CreditAttribution: Damien Tournoud commentedI'm pretty convinced that the only way to properly solve those issues is to move those filters out of crazy pattern matching and into proper manipulation of the DOM tree.
Comment #82
jcnventura CreditAttribution: jcnventura commentedIn reply to #80:
1. I never tried filtered HTML, but that would probably be a bug in that filter and not in the HTML corrector filter, so it should possibly go into another issue.
2. I will try to take a look at it and add a specific test to make sure of it.
3. Glad to know.
One final thing.. Now that this will NEVER make it to Drupal 7.. Can we move it back to Drupal 6 and commit this patch there? This is an itch that needs scratching for several of us third-party module maintainers. Of course the better choice would be to apply the refactored HTML corrector to D6, but I don't think that will happen.
João
Comment #83
tic2000 CreditAttribution: tic2000 commented@jcnventura
Yes, I think we can open separate issues for that and let this one for d6.
The re factored html corrector in D6 can be only done in contrib. Among others things that stops it to make it to core is that it requires PHP5.
@Damien Tournoud
I agree. Too much regexp voodoo now.
Comment #84
tic2000 CreditAttribution: tic2000 commentedComment #85
emdalton CreditAttribution: emdalton commentedI started to see this when I upgraded from D5 to D6. There were legacy comments on some nodes, and all of a sudden they were displaying when they hadn't before. Then I also started to see stray comment fragments in other places, e.g. when inserting views using filters. I've now turned off the HTMLcorrector filter, but this really needs to be addressed.
Comment #86
markus_petrux CreditAttribution: markus_petrux commentedSubscribing
Comment #87
westbywest CreditAttribution: westbywest commentedSubscribing
Comment #88
webchickFixing version. AFAIK none of the comments has said that this isn't a problem in D7, and bugs need to get fixed there first.
Comment #89
webchickComment #90
jcnventura CreditAttribution: jcnventura commentedwebchick, from #80 on, it seems this and the refactored HTML corrector are incompatible. I am assuming that including the test from this patch in the tests for the HTML corrector will make sure that this patch is not necessary in D7.
However, it is still highly annoying in D6, and this patch should be applied to that branch.
Comment #91
tic2000 CreditAttribution: tic2000 commented@webchick
html corrector has no problem with comments now. So either the issue is moved to D6 que or the title is changed to reflect the problems in #80 which comes from autop filter (auto line brake) and html filter (allowed tags).
Even if the title is changed a new issue has to be open for D6 because the solution used in D7 so comments are not escaped can't be used in D6.
Meanwhile a first try to solve the problems in #80.
Comment #92
tic2000 CreditAttribution: tic2000 commentedComment #94
tic2000 CreditAttribution: tic2000 commentedhmm. it fails on the spam deterent test. rel="nofollow" is not added in a link inside a comment. But before the patch it only passes because the comment tag is stripped out so the text inside is displayed.
Comment #95
tic2000 CreditAttribution: tic2000 commentedBack to D6. I'll open a new issue for the problems in #80 to be addressed properly in D7 (#559584: filter_xss() and Line break filter break HTML comments).
This one needs work because the patch in #69 needs a re-roll against D6 branch.
Comment #96
EvanDonovan CreditAttribution: EvanDonovan commentedThanks for championing this, tic2000. It sounds like you are correct - the PHP5-specific changes to the filter in 7.x mean that a different approach is needed here. I will be happy to review once a new patch is rolled.
Comment #97
hgmichna CreditAttribution: hgmichna commentedIsn't the following also related?
Drupal uses XHTML, which inherits from XML the standard way to differentiate between XML/HTML code and embedded non-XML/HTML code by way of the
syntax, as I certainly don't need to remind the web programmers among us.
For example, if you embed HTML or script code in a document and don't want it to be interpreted as HTML by the browser, you have to embed it in these entities, if it contains XML-breaking code. Example:
One thing that's urgently needed is that Drupal keeps away from
<![CDATA[ … ]]>
blocks, including this tag itself, and under no circumstances fiddles with them. If anybody wants to embed HTML, script, or other code and doesn't want it to be interpreted as XML or HTML, he wraps it in this tag, no browser will touch it, and neither should Drupal.By the way, as to #45 by Dave.Hirschman, embedding script in a HTML
<!-- … -->
comment tag makes no sense, as all browsers today understand the script tag. Programmers did that in the very early days of web programming, but no longer. However, we still need the<![CDATA[ … ]]>
construct today, because otherwise literally intended "<" characters could break XML syntax rules.I've tried to alert everybody to the problem in issue #556648: <![CDATA[ escaped, but am not sure whether it should be dealt with here, because the problem seems to be related.
Comment #98
grendzy CreditAttribution: grendzy commentedtagging
Comment #99
jenlamptonSo we're now on D 6.14 and still waiting for a patch to be re-rolled against D6? Let's try this one :-)
Comment #100
mcarbone CreditAttribution: mcarbone commentedPatch applied and seems to be working.
Comment #101
deviantintegral CreditAttribution: deviantintegral commentedThe patch in #99 needs to be taken from the drupal root, not the root of the filter module. As well, when testing a comment, I end up with something like:
<!--test comment-->text</!--test comment-->
Where the "closing" tag is at the end of the document. A closing comment doesn't need to be inserted since it's not an actual tag.
Comment #102
smk-ka CreditAttribution: smk-ka commentedRerolled patch: passing through comments unchanged in both htmlcorrector and autop functions. HTML filter treats comments correctly using code from #91.
Needless to say it would be nice to have this fixed before Nibiru.
Comment #103
mlncn CreditAttribution: mlncn commentedAnd we have a winner! This (#102) keeps HTML Corrector from incorrectly escaping comment tags and keeps the everything else happy too. Tested and also using on production.
benjamin, agaric
Comment #104
hgmichna CreditAttribution: hgmichna commentedKeeps everbody else happy too? Also #97?
Comment #105
smk-ka CreditAttribution: smk-ka commentedNo, I surely missed that comment.
Comment #106
Poieo CreditAttribution: Poieo commentedSubscribing...
Comment #107
neochief CreditAttribution: neochief commented+1
Comment #108
maestrojed CreditAttribution: maestrojed commentedPatch in #102 applied cleanly but did not fix the issue for me. (drupal 6.14
Subscribing...
Comment #109
smk-ka CreditAttribution: smk-ka commentedCan you give more details/an example of what didn't work?
Comment #110
jhedstromThis patch (#102) works for me.
Comment #111
EvanDonovan CreditAttribution: EvanDonovan commentedPatch #102 works for me on a 6.15 install. Can we get this patch applied to Drupal 6.x core first and then deal with #556648: <![CDATA[ escaped separately? That way, we don't have to keep patching this every time that Drupal core is upgraded.
Comment #112
neochief CreditAttribution: neochief commented+1 for #111
Comment #113
JGO CreditAttribution: JGO commentedindeed, please add it to core ! :)
----------------------------
JGO | http://www.e2s.be
----------------------------
Comment #114
alb CreditAttribution: alb commentedfor who use fckeditor with botton break and pagebreak;
1) if in a node there are one < !--break-- > and some < !--pagebreak-- > the node teaser is ok, but when the node is opened appear the string < !--break-- >
2)
but in a similar node, with only the break, is all ok when the node is opened.
Regards problem in the point 1), if in the Input format I remove the filter Html corrector,
the problems not appear more; so the problem is this filter;
if I not delete, but move this filter after Paging filter the problem in the point 1 there isn't, but others problem appears (appear strings of < !--pagebreak-- > and < !--page_filter-- >);
in the patch, is possible to insert also a solution for this problem?
Comment #115
Marc Bijl CreditAttribution: Marc Bijl commentedSubscribing...
In the meantime: any workarounds?
UPDATE
=====
Applied the patch (Drupal 6.15) and seems to work.
However, if the opening tag of a HTML comment is on its own line (the comment below it),
I need to type a space after the opening tag; otherwise it results in a paragraph like this
<p><!--</p>
Comment #116
geerlingguy CreditAttribution: geerlingguy commentedI third #111. I have a lot of legacy content, and many nodes have pasted-from-word comments, and cleaning them up manually will take a ton of time. This patch will let me forget about having to do some dirty regex work just to remove a bunch of comments...
Comment #117
RichieB CreditAttribution: RichieB commentedSubscribing. I'm in the same boat as #116. We have some content that is pasted from MS-Word that leaves visible comments in Drupal 6.15.
Comment #118
torgosPizzaPatch in #102 worked for me on Pressflow 6.15.
Comment #119
Gábor Hojtsy#559584: filter_xss() and Line break filter break HTML comments was marked duplicate of this issue without the D7 issues being fixed. That sounds like a recipe for regressions.
Comment #120
gpk CreditAttribution: gpk commentedI reopened #559584: filter_xss() and Line break filter break HTML comments.
@119: I infer you would be unwilling to commit #102, even though the problem is partially fixed in 7.x via #374441: Refactor Drupal HTML corrector (PHP5) ?
Comment #121
geerlingguy CreditAttribution: geerlingguy commentedJust FYI, for those who need a quick fix (especially if you get a lot of Word copy/pastes on your site), I stuck this into a custom module on my site to set up an input filter, which I enabled and set to run first on my input formats page:
Comment #122
hgmichna CreditAttribution: hgmichna commentedAre you sure the m should not be a g? m can only be wrong.
Comment #123
mcarbone CreditAttribution: mcarbone commentedSeems OK to me: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
Comment #124
hgmichna CreditAttribution: hgmichna commentedThe two most common elements in the universe are hydrogen and stupidity.
Harlan Jay Ellison (American author, born 1934-05-27)
Comment #125
robertjd CreditAttribution: robertjd commentedsubscribe
Comment #126
RichieB CreditAttribution: RichieB commentedI am running Drupal 6.15 with the patch from #102, and it works fine for normal node views, but HTML comments are still shown when I use this code:
I use this to build an overview page of all nodes posted of a certain content type. How do I get rid of the HTML comments when using node_view()?
Comment #127
rfaySubscribing
Comment #128
jenlampton@RichieB, why not use a view?
@smk-ka, there's another issue open for #97, can we commit this one and deal with that one over there? I think getting this fix in now would be better than waiting for that one to get fixed too.
What do others think?
Jen
Comment #129
EvanDonovan CreditAttribution: EvanDonovan commented@jenlampton: I agree. As I said at #111, I think we should commit this and deal with #97 separately. I don't like having to patch core every time I upgrade...
Comment #130
btully CreditAttribution: btully commentedPatch in #102 worked for me on Pressflow 6.15 as well. Great job and many thanks!
Comment #131
lizhenry CreditAttribution: lizhenry commentedI'm having the same issue as @RichieB. The pages I've got here are built with views, but they contain custom fields with php code that calls node_view, and the site has quite a lot of users who cut and paste from Word and I think Outlook . @jenlampton you suggest using views to fix the problem. I'm not sure how to use views to filter out the Microsoft style information that appears as an HTML comment.
Comment #132
geerlingguy CreditAttribution: geerlingguy commentedBump, please - Patch in #102 applied cleanly to three different Drupal 6.15 installs (haven't tested on 6.16 yet), and this is absolutely an essential bug to fix, for all the 100+ reasons stated above :-)
I can't get teaser breaks to work on my site without this patch...
Comment #133
rfayI have this deployed just fine on 6.16.
This is a key bug in Drupal, and I strongly agree that it should be committed.
Comment #134
pgacv2 CreditAttribution: pgacv2 commentedApplied patch in #102 to a 6.16 install and works fine, not escaping comments in a block.
Comment #135
kmonty+1
Comment #136
Kafu CreditAttribution: Kafu commentedSubscribing.
Comment #137
geerlingguy CreditAttribution: geerlingguy commentedJust tested in 6.16 on http://www.archstl.org/, and the patch applies cleanly (offsets are a little different, but both hunks applied). Hopefully we can push this in during the code sprint at DrupalCon SF! http://hojtsy.hu/blog/2010-apr-13/making-drupal-6-even-more-awesome-code...
Comment #138
quicksketchAnother vote in support of putting in #102. As has been noted several times already, Drupal 7 requires a different approach to fixing this issue, since the htmlcorrector has changed significantly and exhibits a different problem than Drupal 6 (see #559584: filter_xss() and Line break filter break HTML comments).
The CDATA issue raised in #97 has an issue over at #556648: <![CDATA[ escaped.
We're running patch #102 on (soon to be finally upgraded) lullabot.com. This patch is ready, any other related tasks have appropriate issues, let's get this fixed for Drupal 6.17.
Comment #139
q0rban CreditAttribution: q0rban commentedsubscribe
Comment #140
EvanDonovan CreditAttribution: EvanDonovan commentedThanks, quicksketch, for the vote of confidence! I'm glad to know that lullabot.com is also waiting on this core patch :)
Comment #141
john.kenney CreditAttribution: john.kenney commentedapplied #102 patch on 6.16 and it resolved nagging problem of
<!--paging_filter-->
printing in my views due to use of paging module.Comment #142
chantra CreditAttribution: chantra commented#102 seems to work fine against 6.16 for me too. Hoping to get this included in 6.x core at some stage.
Why is the test status postponed?
Comment #143
berenddeboer CreditAttribution: berenddeboer commentedApplied #102 and works fine. Please can we get this in core? It's just really annoying to see
in RSS feeds displaying full text.
Comment #144
Gábor HojtsyGreat, thanks for all the testing and support, committed.
Comment #145
Heine CreditAttribution: Heine commentedI nearly got a stroke when I saw #80, but I've since confirmed that HTML comments are removed by the HTML filter.
Comments are equivalent to script, object and embed tags in terms of danger:
Next, comment handling is still a bit odd;
results in
#222926-101: HTML Corrector filter escapes HTML comments also describes an issue, likely in combination with the html corrector, or the html corrector alone.
(edit; example)
Comment #147
Pomax CreditAttribution: Pomax commentedStill broken for the "<" symbol inside script blocks with full html selected:
Placing this on a page in Drupal 6.19 turns the comparator "<" and the bit shift operator "<<" into "<" and "<<" respectively. Oddly, the character ">" is fine.
It does, however, massively break pages with custom on-page javascript as is =(
Comment #148
hgmichna CreditAttribution: hgmichna commentedYou should try the (old-fashioned) HTML comment tags that are otherwise nowhere needed any more:
Same for <style …> tags.
By the way, it is normal and expected that the right angle bracket is accepted. This is, for example, part of the XML spec.
In theory, the <![CDATA[ … ]]> entities should also work, but I think they still don't. Without these we may be breaking XML and XHTML.
Comment #149
Pomax CreditAttribution: Pomax commentedAh! A good idea, but sadly that doesn't help prevent the problem. Drupal 6.19, at least, still turns the "<" and "<<" inside the commented block into < entities.
I changed the article I have on http://projects.nihongoresources.com/utilities/utf_to_hex_converter to use your suggestion, but as can be seen from the source code that page generates, Drupal still filters the text (when editing the page, these characters are shown as normal "<" and "<<").
Comment #150
hgmichna CreditAttribution: hgmichna commentedIsn't it the whole purpose of this thread to correct this defect? Perhaps the correction hasn't made it into the production versions of Drupal yet.
Comment #151
Pomax CreditAttribution: Pomax commentedchanging status to critical, because while left unfixed it gets in prevents any "full html" pages from executing on-page javascript with "<" evaluations, which are frequent. Hopefully someone higher up will quickly act on this bug.
Comment #152
berenddeboer CreditAttribution: berenddeboer commentedPomax, < is invalid inside xhtml, so you really need to write < or something must translate your < into <
Comment #153
hgmichna CreditAttribution: hgmichna commentedThat is not quite correct. < is permitted inside <![CDATA[ … ]]> entities. Hence the correct form for script and style elements in XHTML is, for example:
Here JavaScript can contain all these characters.
This is why Drupal must not encode any characters inside CDATA entities, but it erroneously does.
I cannot tell you how often I see software whose author has not understood the essence of transparency in protocols.
Comment #154
Pomax CreditAttribution: Pomax commentedhgmichan is correct, "<" is permitted in CDATA blocks (normally it's PCDATA, or Parsed Character data; CDATA tells the parser to turn off parsing, skip ahead to the end of the CDATA block, and then resume parsing). However, with explicit CDATA block indicators the "<" and "<<" inside the block are still erroneoously substituted with < and <<
Comment #155
peterconnolly CreditAttribution: peterconnolly commentedSubscribing.
This is a major issue preventing upgrade of client websites past Drupal 6.16
Comment #156
EvanDonovan CreditAttribution: EvanDonovan commentedWorkaround until this gets resolved: turn the htmlcorrector filter off for the the input formats in which you have JS that you need to execute.
If you need something to clean up HTML in those input formats, you could use the HTML Purifier contrib module instead.
Comment #157
Pomax CreditAttribution: Pomax commentedUnless you mean a different html corrector, setting input format set to "Full HTML" does not bypass the problem.
Comment #158
EvanDonovan CreditAttribution: EvanDonovan commented@Pomax: HTML Purifier is a module available for download from drupal.org. It is not the same thing as core's Full HTML format, which contains the HTML Corrector filter by default (although it can be disabled).
Comment #159
adamdicarlo CreditAttribution: adamdicarlo commentedUgh. I'm getting *page breaking* output like this in a teaser view (Drupal 6.19, Views 3.0-alpha3):
because the content starts with an HTML comment. I believe this bug is to blame. This really sucks. (This is in a view where the Node: Teaser field is set to abbreviate to x characters, and "Correct HTML" is turned on.)
Comment #160
adamdicarlo CreditAttribution: adamdicarlo commentedAnother detail: That's the output I get when logged in. When anonymous, the HTML comment tag gets escaped as <!--.
Comment #161
magpie5212 CreditAttribution: magpie5212 commentedI think this is the same bug which is duplicating code I am inserting into pages for IE. (The code is there to fix IE differences to all other browsers with object tags - I change them back to iframes. The following is just dummy text used for debug.)
I traced this to the _filter_autop
The result is:
Comment #162
magpie5212 CreditAttribution: magpie5212 commentedMy problem is cured by removing line 910 of filter.module: $output .= $chunk; Not sure about a test case to find out if this causes any other problems.
Comment #163
bradspry CreditAttribution: bradspry commentedsubscribing
Comment #165
adraskoy CreditAttribution: adraskoy commentedsubscribe
Comment #167
hansamurai CreditAttribution: hansamurai commentedSubscribing...
Edit: this was fixed for me by updating to Drupal 6.20, I was a bit behind.
Comment #168
Danny EnglanderSubscribing, having the same issue.
Comment #169
ndeschildre CreditAttribution: ndeschildre commentedI confirm magpie5212's bug on Drupal 6.20:
When using the "convert line breaks into HTML" filter, it duplicates HTML comments. And commenting the following line (filter.module):
it works.
Comment #170
dpearcefl CreditAttribution: dpearcefl commentedIs this issue fixed in the latest D6? is there any interest in pursuing this issue?
Comment #171
sun@dpearceMN: Please stop doing this.
Comment #172
dpearcefl CreditAttribution: dpearcefl commentedI respectfully asked the questions. Perhaps is some action was shown on this issue, I wouldn't have to ask.
Comment #173
rfay@dpearceMN, we appreciate you checking into these poor orphaned issues. Just don't change the status unnecessarily, because it makes them disappear off other people's radar. And please read the issue before asking whether something has been committed. Even though it's detective work, it can be done. I do wish it was easier.
And yes, this is a critical issue for D6 which I think should have gone in a long time ago. If you can spend some time with it, get a refreshed patch which deals with whatever reason it was set to "needs work" maybe it can get going.
Comment #175
EvanDonovan CreditAttribution: EvanDonovan commented@dpearceMN: So it looks from a read-through of the issue that it is fixed for regular comments, but not for angle brackets inside Javascript, or IE conditional comments. A patch has never been written for those things.
I would recommend that if you need HTML corrector to handle those correctly you use the HTML Purifier contributed module instead. The 6.x HTML corrector, being regex-based, will never be 100% accurate. An alternative would be for someone to take the 7.x HTML corrector, which uses the SimpleXML parser, and make it into a contrib module.
I would suspect that at this point this issue will never be fixed for 6.x, since it is already fully fixed for 7.x and thus there is no possibility of a patch to backport.
Comment #176
dpearcefl CreditAttribution: dpearcefl commentedThen "won't fix" it shall be.
Comment #177
sunThere are still 2-3 years in which someone might step up and fix this in a non-breaking/backwards-compatible way for D6.
Comment #178
EvanDonovan CreditAttribution: EvanDonovan commented@dpearceMN: I didn't mean to suggest the issue should be "won't fix". I was simply suggesting a few alternatives that you could use to resolve the problem since probably the people who would know how to fix it won't have the time.
Comment #180
jenlamptonThis other issue may be a duplicate
#881006: Regression: 'break' tag doesn't work with Filtered HTML
or maybe we can at least steal some of that solution for 6.x as well :)