Posted by dww on May 14, 2008 at 1:34am
| Project: | HTML2Text |
| Version: | master |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | dww |
| Status: | reviewed & tested by the community |
Issue Summary
<strong>, <em> etc can all have other attributes. The regexps in drupal_html_to_text() don't handle this. Attached patches for D6 and the D5 backport in contrib (http://drupal.org/project/html_to_text)
| Attachment | Size |
|---|---|
| drupal_html_to_text_strong_with_attrs.d6.patch | 855 bytes |
| drupal_html_to_text_strong_with_attrs.d5.patch | 862 bytes |
Comments
#1
Whoops, that was a little too greedy. We need to ensure the rest of the goo is something after at least 1 space, so that we don't accidentally match other tags that start with those names (e.g. confusing
<blockquote>with<b>). This is better.#2
In IRC, pointed out that the
' +'part I added in #1 to match 1 or more spaces can confuse regexp engines since a space is also matched by[^>]. It all works fine, but I guess it can potentially be slower this way. Atomic grouping to the rescue.Tested and it still works. I see absolutely no change in speed, but whatever, I'm not carefully benchmarking any of this. I suppose making the regexp a little more complicated and harder to read is no reason not to optimize something that's not performance intensive in the first place. *smirk*
#3
Yeah, it's fine.
#4
Committed to 6.x. I guess this needs to move to Drupal 5 and 7 as well. Moving to 5 first.
#5
Thanks! Actually, once it moves to 5.x, it needs to switch projects, so let's get this in 7.x core next. chx can always commit this to 5.x contrib now that he's seen it and seen it go into 6.x. ;)
#6
(Oh, and I should note that http://drupal.org/files/issues/258192_drupal_html_to_text_strong_with_at... still applies cleanly to HEAD without offsets).
#7
Committed to CVS HEAD.
#8
"once it moves to 5.x, it needs to switch projects"
#9
@drumm: Sadly, chx disabled the issue queue on the real html_to_text project: http://drupal.org/project/html_to_text. You just moved it here: http://drupal.org/project/html2txt -- which is some old abandoned thing. :(
@chx: this is exactly why it's a pain in the ass not to have the issue queue enabled for your module. Can you please reconsider your decision and re-enable it so there's a place for backport patches like this? If people submit to your queue and you need to bump something to the core queue, that's easy (I end up doing that for update_status vs. update when necessary). While you're thinking about this, can you also either commit the patch and/or give me CVS access so I can take care of obvious stuff like this? ;)
Thanks,
-Derek