Download & Extend

drupal_html_to_text() doesn't support <strong class="foo">

Project:HTML2Text
Version:master
Component:Code
Category:bug report
Priority:normal
Assigned:dww
Status:reviewed & tested by the community

Issue Summary

<strong>, <em> etc can all have other attributes. The regexps in drupal_html_to_text() don't handle this. Attached patches for D6 and the D5 backport in contrib (http://drupal.org/project/html_to_text)

AttachmentSize
drupal_html_to_text_strong_with_attrs.d6.patch855 bytes
drupal_html_to_text_strong_with_attrs.d5.patch862 bytes

Comments

#1

Whoops, that was a little too greedy. We need to ensure the rest of the goo is something after at least 1 space, so that we don't accidentally match other tags that start with those names (e.g. confusing <blockquote> with <b>). This is better.

AttachmentSize
258192_drupal_html_to_text_strong_with_attrs.d6.patch 865 bytes
258192_drupal_html_to_text_strong_with_attrs.d5.patch 852 bytes

#2

In IRC, pointed out that the ' +' part I added in #1 to match 1 or more spaces can confuse regexp engines since a space is also matched by [^>]. It all works fine, but I guess it can potentially be slower this way. Atomic grouping to the rescue.

Tested and it still works. I see absolutely no change in speed, but whatever, I'm not carefully benchmarking any of this. I suppose making the regexp a little more complicated and harder to read is no reason not to optimize something that's not performance intensive in the first place. *smirk*

AttachmentSize
258192_drupal_html_to_text_strong_with_attrs.2.d6.patch 873 bytes
258192_drupal_html_to_text_strong_with_attrs.2.d5.patch 860 bytes

#3

Status:needs review» reviewed & tested by the community

Yeah, it's fine.

#4

Version:6.x-dev» 5.x-dev

Committed to 6.x. I guess this needs to move to Drupal 5 and 7 as well. Moving to 5 first.

#5

Version:5.x-dev» 7.x-dev

Thanks! Actually, once it moves to 5.x, it needs to switch projects, so let's get this in 7.x core next. chx can always commit this to 5.x contrib now that he's seen it and seen it go into 6.x. ;)

#6

(Oh, and I should note that http://drupal.org/files/issues/258192_drupal_html_to_text_strong_with_at... still applies cleanly to HEAD without offsets).

#7

Version:7.x-dev» 5.x-dev

Committed to CVS HEAD.

#8

Project:Drupal core» HTML2Text
Version:5.x-dev» master
Component:base system» Code

"once it moves to 5.x, it needs to switch projects"

#9

@drumm: Sadly, chx disabled the issue queue on the real html_to_text project: http://drupal.org/project/html_to_text. You just moved it here: http://drupal.org/project/html2txt -- which is some old abandoned thing. :(

@chx: this is exactly why it's a pain in the ass not to have the issue queue enabled for your module. Can you please reconsider your decision and re-enable it so there's a place for backport patches like this? If people submit to your queue and you need to bump something to the core queue, that's easy (I end up doing that for update_status vs. update when necessary). While you're thinking about this, can you also either commit the patch and/or give me CVS access so I can take care of obvious stuff like this? ;)

Thanks,
-Derek