drupal_html_to_text() doesn't support <strong class="foo">

dww - May 14, 2008 - 01:34
Project:HTML2Text
Version:HEAD
Component:Code
Category:bug report
Priority:normal
Assigned:dww
Status:reviewed & tested by the community
Description

<strong>, <em> etc can all have other attributes. The regexps in drupal_html_to_text() don't handle this. Attached patches for D6 and the D5 backport in contrib (http://drupal.org/project/html_to_text)

AttachmentSize
drupal_html_to_text_strong_with_attrs.d6.patch855 bytes
drupal_html_to_text_strong_with_attrs.d5.patch862 bytes

#1

dww - May 14, 2008 - 02:07

Whoops, that was a little too greedy. We need to ensure the rest of the goo is something after at least 1 space, so that we don't accidentally match other tags that start with those names (e.g. confusing <blockquote> with <b>). This is better.

AttachmentSize
258192_drupal_html_to_text_strong_with_attrs.d6.patch 865 bytes
258192_drupal_html_to_text_strong_with_attrs.d5.patch 852 bytes

#2

dww - May 16, 2008 - 00:04

In IRC, pointed out that the ' +' part I added in #1 to match 1 or more spaces can confuse regexp engines since a space is also matched by [^>]. It all works fine, but I guess it can potentially be slower this way. Atomic grouping to the rescue.

Tested and it still works. I see absolutely no change in speed, but whatever, I'm not carefully benchmarking any of this. I suppose making the regexp a little more complicated and harder to read is no reason not to optimize something that's not performance intensive in the first place. *smirk*

AttachmentSize
258192_drupal_html_to_text_strong_with_attrs.2.d6.patch 873 bytes
258192_drupal_html_to_text_strong_with_attrs.2.d5.patch 860 bytes

#3

chx - May 16, 2008 - 00:14
Status:needs review» reviewed & tested by the community

Yeah, it's fine.

#4

Gábor Hojtsy - May 19, 2008 - 08:19
Version:6.x-dev» 5.x-dev

Committed to 6.x. I guess this needs to move to Drupal 5 and 7 as well. Moving to 5 first.

#5

dww - May 19, 2008 - 16:04
Version:5.x-dev» 7.x-dev

Thanks! Actually, once it moves to 5.x, it needs to switch projects, so let's get this in 7.x core next. chx can always commit this to 5.x contrib now that he's seen it and seen it go into 6.x. ;)

#6

dww - May 19, 2008 - 16:05

(Oh, and I should note that http://drupal.org/files/issues/258192_drupal_html_to_text_strong_with_at... still applies cleanly to HEAD without offsets).

#7

Dries - May 19, 2008 - 19:25
Version:7.x-dev» 5.x-dev

Committed to CVS HEAD.

#8

drumm - May 26, 2008 - 03:42
Project:Drupal» HTML2Text
Version:5.x-dev» HEAD
Component:base system» Code

"once it moves to 5.x, it needs to switch projects"

#9

dww - May 26, 2008 - 07:38

@drumm: Sadly, chx disabled the issue queue on the real html_to_text project: http://drupal.org/project/html_to_text. You just moved it here: http://drupal.org/project/html2txt -- which is some old abandoned thing. :(

@chx: this is exactly why it's a pain in the ass not to have the issue queue enabled for your module. Can you please reconsider your decision and re-enable it so there's a place for backport patches like this? If people submit to your queue and you need to bump something to the core queue, that's easy (I end up doing that for update_status vs. update when necessary). While you're thinking about this, can you also either commit the patch and/or give me CVS access so I can take care of obvious stuff like this? ;)

Thanks,
-Derek

 
 

Drupal is a registered trademark of Dries Buytaert.