htmlcorrector filter escapes HTML comments
deviantintegral - February 17, 2008 - 17:29
| Project: | Drupal |
| Version: | 6.x-dev |
| Component: | filter.module |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | reviewed & tested by the community |
| Issue tags: | FilterSystemRevamp |
Description
The HTML Corrector module replaces the < bracket with &lt; when it is the start of an HTML comment. There are two cases where this is a problem:
- Users have full HTML access and you want to be able to catch missing tags. Users need to be able to copy in content from an older site which contains HTML comments, or the users want to be able to put in comments for their own use.
- At least one module I maintain (Table of Contents) and possibly others use HTML comments as markup so if the module is disabled nothing is rendered to the user. This worked fine in D5, but if the HTML corrector module is enabled then the markup is rendered.
If this is considered to be a valid issue, then let me know and I'll create a patch to fix it. Otherwise, please suggest ways around this!
Thanks,
--Andrew

#1
This affects my XStandard module too.
This tweak to the regular expression in the _filter_htmlcorrector function seems to do the trick:
// Properly entify angles.
$text = preg_replace('!<([^a-zA-Z\!/])!', '&lt;\1', $text);
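To see what the tweaked pattern does in isolation, here is a minimal sketch. The wrapper function name is hypothetical; only the preg_replace() call is taken from the comment above, and the rest of _filter_htmlcorrector is not reproduced.

```php
<?php
// Hypothetical wrapper; only the preg_replace() call comes from the comment above.
function entify_stray_angles($text) {
  // Escape "<" unless it is followed by a letter, "!" or "/".
  return preg_replace('!<([^a-zA-Z\!/])!', '&lt;\1', $text);
}

echo entify_stray_angles('1 < 2 <!-- note --> <p>ok</p>');
// prints: 1 &lt; 2 <!-- note --> <p>ok</p>
```

The stray angle bracket is entified, while the comment opener and the regular tags are left alone.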
#2
(Duplicated at http://drupal.org/node/221252.)
The regex might be marginally simpler as follows:
$text = preg_replace('|<([^a-zA-Z!/])|', '&lt;\1', $text);
Yes, this needs a patch.
Once the regex is modified to catch HTML comments as well as other tags, how does http://api.drupal.org/api/function/_filter_htmlcorrector/6 cope with the fact that the comment is never closed in the way that most tags should be? I.e. if you have <!-- comment -->, does it try to close it by inserting </--> or similar later on?
#3
Yes, it closes with a </!--> tag.
This adjustment to the code that closes the tag fixes that:
// Close remaining tags.
foreach ($stack as $closing_tag) {
  if ($closing_tag == '!--') {
    $output .= '-->';
  }
  else {
    $output .= '</'. $closing_tag .'>';
  }
}
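As a rough illustration of the loop above, the following sketch wraps it in a standalone function. The function name and the sample stack are assumptions made for the example; in the real filter the stack is built while parsing the text.

```php
<?php
// Hypothetical helper: builds the closing markup for whatever is left on the stack.
function close_remaining_tags(array $stack) {
  $output = '';
  foreach ($stack as $closing_tag) {
    if ($closing_tag == '!--') {
      // A comment is terminated with "-->" rather than a closing tag.
      $output .= '-->';
    }
    else {
      $output .= '</'. $closing_tag .'>';
    }
  }
  return $output;
}

echo close_remaining_tags(array('!--', 'p'));
// prints: --></p>
```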
#4
See also http://drupal.org/node/97182 (same issue but in the context of htmlcorrector as a contrib module for Drupal 5.x). I'd mark one or other duplicate except that these are now separate projects since htmlcorrector has been taken into core in 6.x.
#5
Any idea if this will be fixed in a future release or such fix/module/patch has been released?
Will the htmlcorrector.module.comment_fix.patch posted by jcfiala work on 6.x?
#6
Hello,
I have posted http://drupal.org/node/240312. How do I resolve this issue? Can someone please help?
Thanks,
John
#7
Patching modules/filter/filter.module fixed this (very annoying) issue for me. I hereby vote for such a patch in 6.3/7.x
#8
Re-rolled my patch of #97182: <!--break --> is transformed into html code with lt and gt against Drupal 6.x. Should fix this issue without much changes.
#9
Re-rolled patch against HEAD. Dunno whether this issue should be moved to 7.x as well.
#10
Maybe the patch at http://drupal.org/node/269095 is better?
#11
Re-rolled patch from #269095: [Filter] FULL HTML filter with HTML corrector enabled break comments <!-- ..... --> for D7 and marked that issue as duplicate.
I did not test the patch yet, but it looks indeed simpler.
#12
These patches only deal with:
1. Preventing the replacement of '<' by '&lt;'
2. Preventing the insertion of a dummy closing tag for the comments
However, the contents of the comment are still processed by the HTML corrector module. The attached patch extends #11 to prevent the processing of the contents of the comment. To test this, create a multi-line comment.
Drupal 6.3 currently produces :
<!-- Testing a comment<br />--><br />
<!-- Testing a comment2<br />
--><br />
<!-- Testing another comment -->
With the patch the output is:
<!-- Testing a comment-->
<!-- Testing a comment2
-->
<!-- Testing another comment -->
The attached patch is against Drupal 6.3, and please, please commit it before D6.4. I have tested this patch and it works nicely for me.
João
PS: Updated on 2008-07-22 because the original patch terminated the comment with any '>' and not only with the first '-->'
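One way to picture "do not process the contents of the comment" is to split the text on complete comments and run the processing only on the pieces in between. This is a sketch of the idea, not the attached patch: the function name and the callback parameter are hypothetical. The non-greedy .*? ends each comment at the first '-->', which is the behaviour the PS above describes.

```php
<?php
// Hypothetical helper: applies $callback to everything outside HTML comments.
function process_outside_comments($text, $callback) {
  // With PREG_SPLIT_DELIM_CAPTURE the result alternates between
  // ordinary text (even indexes) and captured comments (odd indexes).
  $chunks = preg_split('/(<!--.*?-->)/s', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
  foreach ($chunks as $i => $chunk) {
    if ($i % 2 == 0) {
      $chunks[$i] = call_user_func($callback, $chunk);
    }
  }
  return implode('', $chunks);
}

echo process_outside_comments("a<!-- b\nc -->d", 'strtoupper');
```

Here strtoupper stands in for the real processing; the multi-line comment passes through unchanged while the surrounding text is transformed.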
#13
This works for me - thanks!
#14
I shouldn't be doing this, but since this bug is interfering with so many modules (including the AdSense module), I am raising this to critical hoping that it will be tested/reviewed and merged before Drupal 6.4 is released.
João
#15
didn't make it to Drupal 6.4, I had to repatch when I upgraded (patch still works great). This needs to be fixed, Drupal 6 is not usable for me without it!
#16
I'm reading this discussion as I'm needing some custom markup on user-generated pages, too. I wonder whether it would be useful to allow (or even pre-define) processing instructions (i.e. <?...?>), not only generic xml comment elements, to be preserved such that specific modules can handle them.
Thanks,
Stefan
#17
Stefan - would it not be better for the specific module to implement a filter to do this?
#18
Certainly, the module in question needs to recognize the instruction.
All I'm asking here is whether it is possible to allow for processing instructions to be preserved, i.e. not being filtered out or escaped. If that's already the case, just ignore my request. :-)
Thanks,
Stefan
PS: as a use-case, consider the 'print' module with its ability to generate pdf. I want to inject page-break markers into my nodes, specifically targetted at the pdf generator, but I have no idea whether drupal would allow processing instructions, i.e. whether those can be passed through.
#19
@stefan_seefeld: please don't cloud the issue here.. The problem here has nothing to do with inserting input filters like you're suggesting. That's to be left to the modules themselves.
The problem here is that HTML comments are escaped by the HTML corrector, rendering them visible on the normal content..
Joao
PS: As to what we had discussed for the print module, this needs to be fixed because PDF-specific instructions must be placed in the content in a way that is not visible normally and that activates only during PDF generation. I would prefer to use HTML comments for that as it would be less overhead than using an input filter.
#20
@jcnventura: sorry, I misunderstood what the issue was. I assumed that the escaping of comments (and possibly processing-instructions) would not only display them when not wanted, but also make it impossible for other modules to pick them up.
I'll follow-up on the rest on the print-specific issue tracker.
Thanks,
Stefan
#21
The latest patch appears to work for me.
#22
Well, it seems to work for at least three persons: me, tcblack and Matt B.
Marking it RTBC, in the hopes that it will be committed soon.
João Ventura
#23
This is a patch to a rather complicated piece of code. It will require a full-featured unit test case before going in.
#24
@Damien Tournoud : I agree with what you're saying on the part that this is a patch to a rather complicated piece of code. However, I disagree with the need for a fully-featured test case. Why?
1. Because of this bug in the HTML corrector, that feature is almost unusable since it is breaking a lot of contrib modules.
2. The patch is only 3 lines! It's quite easy to see that the only changes are related to processing of HTML comments.
3. The patch has been tested to work on Drupal 6.4 by at least 3 persons.
Since the tests for the HTML corrector filter haven't yet been done (just looked in D7 CVS), this isn't a simple case of waiting for someone to extend those tests to cover this case. You're proposing that this (simple) fix be put on hold until a very complex set of simpletests are written.
From this thread I can tell you that this is breaking the following modules:
- Table of Contents
- XStandard
- AdSense
João
#25
@jcnventura, sorry, but the HTML Corrector itself is a rather complicated piece of code. I would like to see what happens in corner cases like <!-- inside tags and so on. At this time, the whole filter is untested so we really don't know.
I'm also really unsure this could break the modules you listed. For Table of Contents, for example, you just have to be sure to order the filters the right way (the HTML corrector has to be on top). So I would say this is more an annoyance than a breakage.
#26
I was going to put it RTBC again, but I give up. You're right that all those modules aren't broken if the HTML corrector filter is the first one to execute..
However, a lot of people monetize their sites using AdSense (and most of them don't use the AdSense module as it's not clear if using it is not a violation of Google's TOS). So they are ordered by Google to type in the following in their pages:
<script type="text/javascript"><!--
google_ad_client = "pub-xxxxxxxxxx";
/* Drupal 728x90 */
google_ad_slot = "xxxxxxxxxx";
google_ad_width = 728;
google_ad_height = 90;
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
What do you think happens with that comment in the script? It gets turned into something that is definitively against Google's TOS. In some archaic browsers it will even display the code (the reason why everyone comments the inside of JavaScripts).
The only way to avoid this is to turn off the HTML corrector, which most people don't even know is there.
However, the patch doesn't need work. I am setting it to needs review. What needs work is the filter module simpletests.
João
#27
One last thing: if I understand all this right (we don't have a test case, so I can't be 100% sure), the htmlcorrector with the patch in #12 will probably do some silly things like transforming:
<!-- this is a test <i>test</i>
to
<!--this is a test <i>test</i></!-->,
while it will not even try to close
<!-- this is a new test
because there is no matching ">".
This definitely needs *first* a full test case, and *only then* a patch.
#28
@Damien Tournoud: Actually, that's not quite true. In the best case scenario, I'd like to tell users to place the HTML corrector before the tableofcontents filter. Then, there should be less things to break when users forget to do things like close heading tags. Most users setting up filters would generally do HTML corrector followed by HTML filter, followed by additional filters, but that's not possible until this patch lands.
#29
Putting an adsense code in a node content makes no sense at all. The least bad option is to use a block for this and place it in the footer.
#30
Please stop discussing about tests. This has to be fixed in 7.x first, to be back-ported to 6.x. And we have a clear rule for patches against 7.x - everything needs tests, even if there are none yet.
#31
Placing adsense code in node content is certainly against best practices. But not unreasonable.
.. but ALLOWING SCRIPT TAGS in HTML-filtered input? And then finding that the comments inside the scripts were a problem?!
HTML filters are just protection against potentially malicious code. If you trust yourself enough to break your own system to this extent - don't use HTML filter! FFS.
.dan.
#32
Just a comment on #27: there is no correct way for this. A comment can contain HTML inside (and even other comments), so actually the silly thing that it may do is not so silly (I haven't tested that situation). Where's the boundary if you forget to close the comment?
My personal opinion is that it should be when it reaches the </body> tag. That way, when the rest of the content disappears, the user knows that something is amiss.
I am creating a link to this issue in the issue for creating the tests for the filter module (#276597: TestingParty08: filter.module).
João
#33
Patch from #12 works in my 6.4 site with wide range of comments including comments around javascript. Some of the comments and javascript arrive in AHAH from other Web sites. Do I use this patch or turn off HTML correction? The incoming HTML needs filtering because the HTML was created by Orks for Internet Explorer 0.5 alpha and that rules out switching off HTML filtering and correction. Thank you for the patch.
#34
again, this hasn't made it to Drupal 6.5, so I had to repatch. Without doing this, Drupal 6 is not usable (for me)! Works fine in drupal 6.5
#35
roadmap:
- Create Tests (that'll fail) for 7.x (#276597: TestingParty08: filter.module)
- fix 7.x code
- No tests fail anymore
- backport to 6.x
- backport to 5.x html corrector module.
#36
patch works fine on 6.6
#37
subscribing :-)
#38
subscribing
#39
I ran into the same problem in 6.x and created a patch for it: #343236: HTML corrector encodes (entifies) <!--break--> tag. Because the break tag is encoded, it shows up in RSS feeds.
#40
subscribing
#41
subscribing
#42
subscribing
#43
subscribing
#44
Appears to still be an issue for 6.9 -- I've just documented on two 6.9 installs where it will be a problem due to large amounts of legacy content that weren't previously affected by this issue. Hence, many contain HTML comments.
Is this issue re. 6.x or 7.x? "Version" is currently set to 7.x-dev, but most of the discussion is re. 6.x. Can someone help me understand why this issue would be deferred to 7.x? Does this mean that if we need a fix in 6.x, we're always going to have to patch-over the latest HTML Corrector release?
CORRECTION -- I now see the rationale in the "roadmap" here:
http://drupal.org/node/222926#comment-1086392
... so I see that plan seems to be to backport to 6.x after using 7.x efforts to unit test.
(Note that this is a particular problem for people who do a lot of cut & paste blogging. We need the corrector to close un-closed containers that we might miss by not scrutinizing the HTML of what we paste -- or to correct stuff we did in the past. But if we enable it, the comments embedded in the stuff we copy & paste will show.)
#45
The HTML corrector filter will also substitute the html entity for '<' characters within javascript. For example,
if (x < y) { .... }
produces an error, whereas
if (y > x) {...}
does not.
For now, I guess I'll create a second input filter with HTML corrector turned off and select that for those few nodes where I want to use a bit of javascript or have some html comments.
#46
This needs to be fixed in 6.x. What the hell is taking so long to get a clearly working patch committed? The problem here is that this filter is enabled by default in core and is breaking things. You cannot get any more critical than that. Please commit the fix!
#47
Marking as critical since it clearly is.
#48
See the required tasks in #222926-35: htmlcorrector filter escapes HTML comments
#49
subscribing
#50
I tried to use the latest patch and it failed because of a drupal-6.4 or something in the patch file - I've adjusted the patch file so it runs w/ -p0
#51
FYI there is another related issue against D5 with the HTML filter #103563: HTML filter escaping html comments. If you're having problems with D6 even with the HTML corrector disabled, take a look at the patch over there.
#52
thanks for the d6 patch, it works like a charm!
#53
Hello again,
Here's my latest attempt. I have to acknowledge that Damien Tournoud (among others) was completely right. After applying some tests*, I concluded that the patch was indeed flawed.
This applies to the current 7.x-dev, and includes a couple of tests that verify it. After applying this patch all the tests in #276597: TestingParty08: filter.module still pass.
One final note: as it is now, the patch only handles proper comments (i.e. it requires that the comment is properly closed). I didn't try to handle that case, as it is too complicated to decide where to close it. I think that the issue of correcting a comment that is not terminated should be handled elsewhere.
João
*developed by wrwrwr in #276597: TestingParty08: filter.module
#54
revisiting #35:
1. Create Tests (that'll fail) for 7.x ------- done
2. fix 7.x code --------- done
3. No tests fail anymore ------- done
4. backport to 6.x ------ TBD
5. backport to 5.x html corrector module ------- TBD
#55
subscribing
#56
#54 looks good. The new test passes only after patching filter.module. I also tinkered with different strings in the test case, and couldn't find any issues. One thing I did notice is that self-closing tags inside of comments get corrected, like <br> gets converted to <br />, which seems harmless.
Also this patch passes coder's style checks.
#57
Thanks for working on this!
+ /**
+ * Test the HTML corrector.
+ */
+ function testHtmlCorrector2() {
PHPDoc could use a proper description. Replace "2" with "Comments" in function name.
#58
@sun: Changed the description and renamed the function as you asked
@grendzy: Thanks for bringing that to my attention. Since what you described went against the code of the changes I made, I went back and analyzed it better. I had forgotten to include the 's' modifier for the PCRE regular expression, so the HTML corrector was still active in multi-line comments. I have added a single-use tag to the (multi-line) HTML corrector test to verify that this is now working correctly.
I am setting this back to RTBC because the only change to the filter module was the 's' modifier. Feel free to set it back to 'needs review' if you believe that change to be important enough to downgrade its status.
#59
#60
Well, you added a single-line constraint on the "don't apply processing inside of those tags" regexp, please confirm in a test that it doesn't fall apart when the tags are on multiple lines. Example:
<script>my test script
</script>
or:
<pre>test line 1
test line 2
</pre
>
#61
I didn't mention it, but the tests in #276597: TestingParty08: filter.module still pass, and one of them already handles the first case (which doesn't get affected at all by the s modifier since the tag is still in the same line).
However, your second case raises a valid point. Previously the tag matching stopped at the end of the line, and now it stops at the closing delimiter (>). Since the code is SUPPOSED to stop at the delimiter, I am pretty sure that simple change actually improved the code.
I will add a new test case to confirm that this doesn't break anything when opening or closing tags are spread over multiple lines.
João
#62
I couldn't come up with a new test, and because of that I remembered that the s modifier only applies to the dot character (http://es.php.net/manual/en/reference.pcre.pattern.modifiers.php).
So in simple terms, the s modifier added between #53 and #58 only modifies the handling of multi-line comments and that is already being tested in the test included in #58.
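A small check makes the point about the s modifier concrete. The sample strings below are made up for illustration and are not taken from the patch or its tests.

```php
<?php
$comment = "<!-- line one\nline two -->";
$tag = "</pre\n>";

// Without "s" the dot stops at the newline, so the comment is not matched.
$dot_plain = preg_match('/<!--.*?-->/', $comment);
// With "s" the dot also matches newlines.
$dot_s = preg_match('/<!--.*?-->/s', $comment);
// A negated character class matches newlines with or without "s".
$class_plain = preg_match('!</pre[^>]*>!', $tag);

var_dump($dot_plain, $dot_s, $class_plain);
// prints: int(0) int(1) int(1)
```

So a pattern that reaches the closing ">" through a negated character class is unaffected by the modifier, while a pattern that relies on the dot needs it to span lines.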
Based on that, as there's really no work needed, I am setting it to needs review. Note that grendzy had set it to RTBC in #56 (for 6 minutes)..
João
#63
The last submitted patch failed testing.
#64
Subscribe - this might also fix #348514: Node body does not handle <!--break--> properly.
#65
The latest patch seems to change both filter.module and filter.test. Should the test not be changed as part of #276597? If it should, let me know and I will split the patch in two.
#66
Apparently the testing system failed to install Drupal, and the only way to restart it is to re-upload the patch.
#67
Thanks for your work on this! The patch in #66 resolved the issue for me on Drupal 6.11. I had to apply it manually though (one of the hunks failed because the offsets were too different from HEAD).
It also seems to fix #348514: Node body does not handle <!--break--> properly for me.
#68
Just for information: I have made a D6 module that can be used to overcome this problem. See http://drupal.org/project/htmlcomment.
#69
Re rolled the patch from #66 to apply on a fresh D7 install.
#70
The last submitted patch failed testing.
#71
setting back to needs review, to re-run tests
#72
bump
#73
I understand this also causes problems for the Paging module as seen here:
#400190: Teaser break tag <!--break--> exposed
#74
The last submitted patch failed testing.
#75
trying out a new testing bot that apparently wasn't working
#76
The new testing bot liked it. I applied the patch, the added test applied with some fuzz.
After reading through the issue queue, this patch is ready to be committed.
Josh
#77
You should keep an eye on #374441: Refactor Drupal HTML corrector (PHP5) too which solves a lot of the problems here. You can see in #45 there the problems that will still remain after this or that one get committed. My vote goes for #374441: Refactor Drupal HTML corrector (PHP5) which will trim the current patch here to 1 line.
#78
Can we then apply this to Drupal 6 to fix the damn bug?? And let the refactored Drupal HTML corrector for Drupal 7?
#79
@jcnventura
Not until one of the patches gets committed.
#80
#374441: Refactor Drupal HTML corrector (PHP5) got committed.
This patch no longer applies.
To summarize what I said in the other issue:
Testing the patch there and the patch here I noticed that:
1. Using Filtered HTML input format comments are removed. I think it shouldn't do this.
2. If the comments have some html tags inside, the result is even worse.
<!-- comment <p>comment</p> --> will result in comment -->. If my previous statement is arguable, now for sure something is wrong. It should either remove the comment or (ideally IMO) leave it untouched.
3. Finally, using Full HTML will not strip the comment, but because of the line break filter, if you write
<!-- comment -->
<!-- comment <p>comment</p> -->
the result is
<p><!-- comment --><br /><!-- comment<p>comment</p>
<p> --></p>
Point 3 is corrected by the change the patch in this issue does in the _filter_autop() function.
Point 1 and 2 remain to be addressed.
#81
I'm pretty convinced that the only way to properly solve those issues is to move those filters out of crazy pattern matching and into proper manipulation of the DOM tree.
#82
In reply to #80:
1. I never tried filtered HTML, but that would probably be a bug in that filter and not in the HTML corrector filter, so it should possibly go into another issue.
2. I will try to take a look at it and add a specific test to make sure of it.
3. Glad to know.
One final thing.. Now that this will NEVER make it to Drupal 7.. Can we move it back to Drupal 6 and commit this patch there? This is an itch that needs scratching for several of us third-party module maintainers. Of course the better choice would be to apply the refactored HTML corrector to D6, but I don't think that will happen.
João
#83
@jcnventura
Yes, I think we can open separate issues for that and let this one for d6.
The refactored HTML corrector in D6 can only be done in contrib. Among other things, what stops it from making it into core is that it requires PHP5.
@Damien Tournoud
I agree. Too much regexp voodoo now.
#84
#85
I started to see this when I upgraded from D5 to D6. There were legacy comments on some nodes, and all of a sudden they were displaying when they hadn't before. Then I also started to see stray comment fragments in other places, e.g. when inserting views using filters. I've now turned off the HTMLcorrector filter, but this really needs to be addressed.
#86
Subscribing
#87
Subscribing
#88
Fixing version. AFAIK none of the comments has said that this isn't a problem in D7, and bugs need to get fixed there first.
#89
#90
webchick, from #80 on, it seems this and the refactored HTML corrector are incompatible. I am assuming that including the test from this patch in the tests for the HTML corrector will make sure that this patch is not necessary in D7.
However, it is still highly annoying in D6, and this patch should be applied to that branch.
#91
@webchick
html corrector has no problem with comments now. So either the issue is moved to the D6 queue or the title is changed to reflect the problems in #80, which come from the autop filter (auto line break) and the html filter (allowed tags).
Even if the title is changed a new issue has to be open for D6 because the solution used in D7 so comments are not escaped can't be used in D6.
Meanwhile a first try to solve the problems in #80.
#92
#93
The last submitted patch failed testing.
#94
hmm. it fails on the spam deterrent test. rel="nofollow" is not added in a link inside a comment. But before the patch it only passes because the comment tag is stripped out so the text inside is displayed.
#95
Back to D6. I'll open a new issue for the problems in #80 to be addressed properly in D7 (#559584: Handling of html comments in filter module).
This one needs work because the patch in #69 needs a re-roll against D6 branch.
#96
Thanks for championing this, tic2000. It sounds like you are correct - the PHP5-specific changes to the filter in 7.x mean that a different approach is needed here. I will be happy to review once a new patch is rolled.
#97
Isn't the following also related?
Drupal uses XHTML, which inherits from XML the standard way to differentiate between XML/HTML code and embedded non-XML/HTML code by way of the <![CDATA[ … ]]> syntax, as I certainly don't need to remind the web programmers among us.
For example, if you embed HTML or script code in a document and don't want it to be interpreted as HTML by the browser, you have to embed it in these entities, if it contains XML-breaking code. Example:
<script>//<![CDATA[
if (1 < 2) alert("OK");
//]]></script>
One thing that's urgently needed is that Drupal keeps away from <![CDATA[ … ]]> blocks, including this tag itself, and under no circumstances fiddles with them. If anybody wants to embed HTML, script, or other code and doesn't want it to be interpreted as XML or HTML, he wraps it in this tag, no browser will touch it, and neither should Drupal.
By the way, as to #45 by Dave.Hirschman, embedding script in an HTML <!-- … --> comment tag makes no sense, as all browsers today understand the script tag. Programmers did that in the very early days of web programming, but no longer. However, we still need the <![CDATA[ … ]]> construct today, because otherwise literally intended "<" characters could break XML syntax rules.
I've tried to alert everybody to the problem in issue #556648: <![CDATA[ escaped, but am not sure whether it should be dealt with here, because the problem seems to be related.
#98
tagging
#99
So we're now on D 6.14 and still waiting for a patch to be re-rolled against D6? Let's try this one :-)
#100
Patch applied and seems to be working.
#101
The patch in #99 needs to be taken from the drupal root, not the root of the filter module. As well, when testing a comment, I end up with something like:
<!--test comment-->text</!--test comment-->
Where the "closing" tag is at the end of the document. A closing comment doesn't need to be inserted since it's not an actual tag.
#102
Rerolled patch: passing through comments unchanged in both htmlcorrector and autop functions. HTML filter treats comments correctly using code from #91.
Needless to say it would be nice to have this fixed before Nibiru.
#103
And we have a winner! This (#102) keeps HTML Corrector from incorrectly escaping comment tags and keeps everything else happy too. Tested and also using on production.
benjamin, agaric
#104
Keeps everybody else happy too? Also #97?
#105
No, I surely missed that comment.
#106
Subscribing...
#107
+1
#108
Patch in #102 applied cleanly but did not fix the issue for me (Drupal 6.14).
Subscribing...
#109
Can you give more details/an example of what didn't work?
#110
This patch (#102) works for me.
#111
Patch #102 works for me on a 6.15 install. Can we get this patch applied to Drupal 6.x core first and then deal with #556648: <![CDATA[ escaped separately? That way, we don't have to keep patching this every time that Drupal core is upgraded.
#112
+1 for #111
#113
indeed, please add it to core ! :)
----------------------------
JGO | http://www.e2s.be
----------------------------
#114
For those who use FCKeditor with the break and pagebreak buttons:
1) If a node contains one < !--break-- > and some < !--pagebreak-- > markers, the node teaser is OK, but when the node is opened the string < !--break-- > appears.
2) In a similar node with only the break, everything is OK when the node is opened.
Regarding the problem in point 1): if I remove the HTML corrector filter from the input format, the problem no longer appears, so the problem is this filter.
If I do not remove this filter but move it after the Paging filter, the problem in point 1) goes away, but other problems appear (the strings < !--pagebreak-- > and < !--page_filter-- > show up).
Is it possible to include a solution for this problem in the patch as well?
#115
Subscribing...
In the meantime: any workarounds?
UPDATE
=====
Applied the patch (Drupal 6.15) and seems to work.
However, if the opening tag of a HTML comment is on its own line (the comment below it),
I need to type a space after the opening tag; otherwise it results in a paragraph like this
<p><!--</p>
#116
I third #111. I have a lot of legacy content, and many nodes have pasted-from-word comments, and cleaning them up manually will take a ton of time. This patch will let me forget about having to do some dirty regex work just to remove a bunch of comments...
#117
Subscribing. I'm in the same boat as #116. We have some content that is pasted from MS-Word that leaves visible comments in Drupal 6.15.
#118
Patch in #102 worked for me on Pressflow 6.15.
#119
#559584: Handling of html comments in filter module was marked duplicate of this issue without the D7 issues being fixed. That sounds like a recipe for regressions.
#120
I reopened #559584: Handling of html comments in filter module.
@119: I infer you would be unwilling to commit #102, even though the problem is partially fixed in 7.x via #374441: Refactor Drupal HTML corrector (PHP5) ?
#121
Just FYI, for those who need a quick fix (especially if you get a lot of Word copy/pastes on your site), I stuck this into a custom module on my site to set up an input filter, which I enabled and set to run first on my input formats page:
<?php
/**
 * Implementation of hook_filter().
 */
function custom_filter($op, $delta = 0, $format = -1, $text = '', $cache_id = 0) {
  switch ($op) {
    case 'list':
      return array(0 => 'HTML Comment Removal Filter');
    case 'description':
      return t('Removes HTML comments, like those inserted by Microsoft Word or other nefarious applications.');
    case 'process':
      // Remove HTML comments beginning with <!-- and ending with -->.
      // Newlines are swapped for a placeholder first so that the dot in the
      // pattern can match across what were originally several lines.
      $text = str_replace("\n", 'placeholder text string', $text);
      $text = preg_replace('/<!--.*?-->/m', '', $text);
      $text = str_replace('placeholder text string', "\n", $text);
      return $text;
    default:
      return $text;
  }
}
?>
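For comparison, a hedged alternative to the placeholder trick: the PCRE s modifier lets the dot match newlines directly, so no substitution of "\n" is needed. The function name below is hypothetical and this is only a sketch of the 'process' step, not a drop-in replacement for the filter above. In PCRE the m modifier only changes how ^ and $ behave, so it has no effect on a pattern without those anchors.

```php
<?php
// Hypothetical helper: removes complete HTML comments, including multi-line ones.
function strip_html_comments($text) {
  // "s" makes the dot match newlines; ".*?" stops at the first "-->".
  return preg_replace('/<!--.*?-->/s', '', $text);
}

echo strip_html_comments("before<!-- pasted\nfrom Word -->after");
// prints: beforeafter
```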
#122
<?php
…
$text = preg_replace('/<!--.*?-->/m', '', $text);
…
?>
Are you sure the m should not be a g? m can only be wrong.
#123
Seems OK to me: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
#124
The two most common elements in the universe are hydrogen and stupidity.
Harlan Jay Ellison (American author, born 1934-05-27)