drupal_html_to_text handling of newlines in anchor links

Project: Drupal core
Version: 8.x-dev
Component: system.module
Category: bug report
Priority: normal
Assigned: Unassigned
Status: active

Issue Summary

The function drupal_html_to_text does not handle the situation where there are newlines in the link text. For example, given the test sequence:

$html_text = "<a href=\"http://some.url\">Link \ntext</a>";
$plain_text = drupal_html_to_text($html_text);

We get:

Link text

when we should get:

Link text [1]

[1] http://some.url

The proposed simple fix is to add the 's' (PCRE_DOTALL) regexp modifier to the relevant pattern: a patch is attached.
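
To illustrate why the modifier matters, here is a minimal sketch. The pattern below is an assumption: it is written to resemble the kind of anchor-matching expression involved and is not copied from core or from the attached patch, and the variable names are hypothetical. By default the PCRE dot does not match a newline, so link text containing one is never matched and no footnote is generated; with 's' the dot matches any character.

```php
<?php
// Same input as in the issue summary: a newline inside the link text.
$html_text = "<a href=\"http://some.url\">Link \ntext</a>";

// Illustrative patterns (assumed, not taken from core). They differ only
// in the trailing 's' (PCRE_DOTALL) modifier.
$pattern_without_s = '@<a[^>]+?href="([^"]*)"[^>]*?>(.+?)</a>@i';
$pattern_with_s    = '@<a[^>]+?href="([^"]*)"[^>]*?>(.+?)</a>@is';

// Without 's', '.' stops at the newline, so the anchor is not matched.
$matched_without_s = preg_match($pattern_without_s, $html_text, $m_without);

// With 's', '.' also matches the newline, so URL and link text are captured.
$matched_with_s = preg_match($pattern_with_s, $html_text, $m_with);

var_dump($matched_without_s); // int(0)
var_dump($matched_with_s);    // int(1)
```

In the second case $m_with[1] holds the URL and $m_with[2] holds the link text, newline included, which is what a footnote-generating callback would need.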

For information, I encountered this problem while using the 6.4 simplenews module, which is effectively still in beta. I was passing it a long string of HTML (without any newlines) to send as HTML mail; in that case a plain-text part must also be included in the resulting multipart email message, and a routine called simpletext_html_to_text calls drupal_html_to_text to produce it. Along the way, however, newlines are added and removed in a rather complex way that I don't understand, including in the middle of link text. When drupal_html_to_text is called while such newlines are present, the result is the missing footnotes described above.

Attachment | Size | Status | Test result | Operations
mail.inc_.fixnl_.patch | 606 bytes | Ignored: Check issue status. | None | None

Comments

#1

This problem is still present in Drupal 6.15.
I encountered it while trying to send multipart/alternative emails with the mimemail module,
which uses the core function drupal_html_to_text internally.

I think making the change in core is preferable to patching mimemail,
which would mostly mean duplicating the functionality of drupal_html_to_text.

#2

Version: 6.4 » 8.x-dev

Still present in Drupal 8.