Improve \Drupal\Core\Utility\Mail::htmlToText() [#299138]

Comment	File	Size	Author
#164	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-164.patch	32.32 KB	pillarsdotnet

#247	drupal_html_to_text-improvements-299138-247.patch	55.87 KB	pillarsdotnet

#232	drupal_html_to_text-improvements-299138-232.patch	50.12 KB	pillarsdotnet

#246	drupal_html_to_text-improvements-299138-246.patch	56.36 KB	pillarsdotnet

#243	drupal_html_to_text-improvements-299138-243.patch	55.86 KB	pillarsdotnet

#239	drupal_html_to_text-improvements-299138-239.patch	50.36 KB	pillarsdotnet

#238	drupal_html_to_text-improvements-299138-238.patch	50.18 KB	pillarsdotnet

#237	drupal_html_to_text-improvements-299138-237.patch	50.12 KB	pillarsdotnet

#236	drupal_html_to_text-improvements-299138-236.patch	50.12 KB	pillarsdotnet

#234	drupal_html_to_text-improvements-299138-234.patch	50.1 KB	pillarsdotnet

#231	drupal_html_to_text-improvements-299138-231.patch	50.17 KB	pillarsdotnet

#230	drupal_html_to_text-improvements-299138-230.patch	49.74 KB	pillarsdotnet

#228	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-228.patch	49.18 KB	pillarsdotnet

#227	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-226.patch	46.94 KB	pillarsdotnet

#225	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-225.patch	43.67 KB	pillarsdotnet

#218	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-218.patch	42.53 KB	pillarsdotnet

#216	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-216.patch	42.55 KB	pillarsdotnet

#208	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-208.patch	40.64 KB	pillarsdotnet

#204	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-204.patch	40.75 KB	pillarsdotnet

#205	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-205.patch	40.64 KB	pillarsdotnet

#200	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-200.patch	39.51 KB	pillarsdotnet

#198	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-198.patch	39.33 KB	pillarsdotnet

#190	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-190.patch	37.23 KB	pillarsdotnet

#157	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-157.patch	32.61 KB	pillarsdotnet

#167	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-167.patch	32.78 KB	pillarsdotnet

#192	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-192.patch	38.19 KB	pillarsdotnet

#197	drupal_html_to_text-fix_broken_formatting_and_forget_about_the_tests-299138-197.patch	10.11 KB	pillarsdotnet

#196	drupal_html_to_text-fix_broken_formatting_and_forget_about_the_tests-299138-195.patch	39.64 KB	pillarsdotnet

#193	drupal_html_to_text-fix_broken_formatting_and_forget_about_the_tests-299138-193.patch	9.93 KB	pillarsdotnet

#184	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-184.patch	37.19 KB	pillarsdotnet

#185	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-185.patch	37.2 KB	pillarsdotnet

#182	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-182.patch	35.79 KB	pillarsdotnet

#177	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-177.patch	34.52 KB	pillarsdotnet

#174	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-174.patch	34.52 KB	pillarsdotnet

#171	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-171.patch	34.51 KB	pillarsdotnet

#165	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-164.patch	32.09 KB	stephandale

#166	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-166.patch	32.78 KB	pillarsdotnet

#155	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-155.patch	32.63 KB	pillarsdotnet

#154	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-154.patch	32.47 KB	pillarsdotnet

#153	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-153.patch	32.86 KB	pillarsdotnet

#152	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-152.patch	33.27 KB	pillarsdotnet

#148	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-148.patch	31.07 KB	pillarsdotnet

#147	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-147.patch	31.06 KB	pillarsdotnet

#145	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-145.patch	30.9 KB	pillarsdotnet

#143	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-143.patch	30.89 KB	pillarsdotnet

#141	drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-141.patch	29.18 KB	pillarsdotnet

#140	drupal_html_to_text-tests+fix-299138-136.patch	26.05 KB	stephandale

#135	drupal_html_to_text-test_only-299138-135.patch	8.35 KB	pillarsdotnet

#135	drupal_html_to_text-tests+fix-299138-135.patch	22.17 KB	pillarsdotnet

#133	drupal_html_to_text-test_only-299138-133.patch	8.32 KB	pillarsdotnet

#133	drupal_html_to_text-tests+fix-299138-133.patch	22.28 KB	pillarsdotnet

#134	drupal_html_to_text-test_only-299138-134.patch	8.32 KB	pillarsdotnet

#134	drupal_html_to_text-tests+fix-299138-134.patch	21.62 KB	pillarsdotnet

#130	drupal_html_to_text-test_only-299138-130.patch	7.9 KB	pillarsdotnet

#130	drupal_html_to_text-tests+fix-299138-130.patch	19.76 KB	pillarsdotnet

#121	drupal_html_to_text-tests+fix-299138-121.patch	19.4 KB	pillarsdotnet

#120	drupal_html_to_text-test_only-299138-120.patch	7.93 KB	pillarsdotnet

#120	drupal_html_to_text-tests+fix-299138-120.patch	19.4 KB	pillarsdotnet

#119	drupal_html_to_text-test_only-299138-119.patch	7.99 KB	pillarsdotnet

#119	drupal_html_to_text-tests+fix-299138-119.patch	19.41 KB	pillarsdotnet

#108	drupal_html_to_text-tests_only-299138-107.patch	7.53 KB	pillarsdotnet

#108	drupal_html_to_text-tests+fix-299138-107.patch	16.97 KB	pillarsdotnet

#106	drupal_html_to_text-tests_only-299138-106.patch	7.53 KB	pillarsdotnet

#106	drupal_html_to_text-tests+fix-299138-106.patch	16.98 KB	pillarsdotnet

#105	drupal_html_to_text-tests+fix-299138-105.patch	16.95 KB	pillarsdotnet

#101	drupal_html_to_text-tests_only-299138-101.patch	7.51 KB	pillarsdotnet

#101	drupal_html_to_text-tests+fix-299138-101.patch	16.93 KB	pillarsdotnet

#99	drupal_html_to_text-tests-299138-99.patch	6.99 KB	pillarsdotnet

#99	drupal_html_to_text-tests+fix-299138-99.patch	14.9 KB	pillarsdotnet

#75	drupal_html_to_text-299138-75.patch	1.06 KB	pillarsdotnet

#94	drupal_html_to_text-tests+fix-299138-94.patch	14.73 KB	pillarsdotnet

#92	drupal_html_to_text-tests+fix-299138-92.patch	14.73 KB	pillarsdotnet

#89	drupal_html_to_text-tests_only-299138-89.patch	5.8 KB	pillarsdotnet

#89	drupal_html_to_text-tests+fix-299138-89.patch	10.02 KB	pillarsdotnet

#84	drupal_html_to_text-299138-84.patch	9.54 KB	pillarsdotnet

#82	drupal_html_to_text-299138-82.patch	15.71 KB	pillarsdotnet

#81	drupal_html_to_text-299138-81.patch	14.55 KB	pillarsdotnet

#77	drupal_html_to_text-299138-77.patch	14.57 KB	pillarsdotnet

#78	drupal_html_to_text-299138-78.patch	14.62 KB	pillarsdotnet

#69	mail.inc_.patch	585 bytes	pillarsdotnet

#66	mail.inc_.patch	548 bytes	pillarsdotnet

#68	mail.inc_.patch	545 bytes	pillarsdotnet

#57	drupal.html2text.57.patch	12.23 KB	sun

#52	issue_299138_mail.test.patch	13.28 KB	Tiburón

#37	mail.test-299138-37.patch	13.52 KB	arjenk

#33	mail.test-299138-33.patch	14.14 KB	arjenk

#32	mail.test-299138-32.patch	13.41 KB	arjenk

#29	mail.test-299138-29.patch	13.42 KB	arjenk

#25	mail.test-299138-25.patch	13.21 KB	arjenk

#22	299138-drupal_html_to_text-test_1.patch	13.02 KB	jrglasgow

#21	299138-drupal_html_to_text-test.patch	12.58 KB	jrglasgow

#17	mail.test-299138-17.patch	11.98 KB	arjenk

#13	mail.test-299138-13.patch	11.22 KB	arjenk

#11	mail.test-299138-11.patch	2.89 KB	arjenk

#9	mail.test-299138-9.patch	3.8 KB	arjenk

#8	mail.inc-299138-8.patch	1.51 KB	Kevin Hankens

#5	mail.test	1.48 KB	Kevin Hankens

Comment #1

Kevin Hankens CreditAttribution: Kevin Hankens commented 4 September 2008 at 19:38

One quick question - sorry if this is in the wrong forum. If the $allowed_tags argument is empty, would you consider using the allowed tags list from the Filtered HTML input format settings? This would be more aligned with the bulk of the site content.

I use a lot of div tags in my content and wouldn't want to pass the entire list of allowed tags plus 'div' in an array each time I call drupal_html_to_text.

Log in or register to post comments

Comment #2

Kevin Hankens CreditAttribution: Kevin Hankens commented 4 September 2008 at 14:47

Oh, I forgot to add that for me a div tag would signify a separate section, so I also wanted to see a case for it in the switch stanza that worked like the p tag.

Thanks for letting me share :)

Log in or register to post comments

Comment #3

Kevin Hankens CreditAttribution: Kevin Hankens commented 5 September 2008 at 14:51

I haven't submitted any simpletests before, but I started working on one yesterday for this function. If you are interested, I would be happy to finish it up and send it this way for review. If so, here are a couple of quick questions.

Would the testing be as simple as: a) checking that there are no HTML tags; and b) testing a piece of HTML code against an expected result - or can you foresee something more complex?
There aren't any core tests under Mail, would this require a new Group, or grouped with another?

Thanks!

Kevin

Log in or register to post comments

Comment #4

catch

he/him

English

CreditAttribution: catch commented 6 September 2008 at 00:50

1. That'd probably be fine. Anything else can always be added to later, but the idea here is provide input, pass it through the function, verify expected output, not much more than that.
2. You'd need a new group and a new mail.test - you could copy modules/simpletest/tests/form.test (or some other .inc that already as tests) to get started quickly.

Also feel free to post patches that are still in progress if you think there's something worth looking at, even if it's not quite ready.

Log in or register to post comments

Comment #5

Kevin Hankens CreditAttribution: Kevin Hankens commented 9 September 2008 at 19:52

File	Size
mail.test	1.48 KB

Here's a proposed mail.test file. It's a little rough, but the tests worked, and I was able to make them fail by manipulating the output of drupal_html_to_text(). Here are some considerations:

The line breaks in the expected output are ugly, but I wasn't able to replicate it using new line codes. Perhaps we could just use a simpler piece of HTML for testing.
The regex expression for the HTML matching is something that I've had saved for quite some time, but I don't know the identity of the original author. If this is a problem, we'll need a Drupal coder to replace it with something of ours.
I tried using assertNoPattern() instead of using preg_match with a boolean because it was cleaner, but I couldn't get it to work without it throwing a warning. It appeared to work, but I got the yellow warning bar saying "delimiters cannot be alpha numeric or a back slash"

Otherwise, take a look and see if it's something you can use!

Log in or register to post comments

Comment #6

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 9 September 2008 at 20:03

Status:

Active

» Needs review

Marking needs review.

Log in or register to post comments

Comment #7

Kevin Hankens CreditAttribution: Kevin Hankens commented 9 September 2008 at 20:32

On a related note, here are two things that I noticed in mail.inc that might warrant revision - hope it's ok to compile both into one patch!

The first change would also allow single quotes in <a> tags to generate footnote URLs.
The second creates a case for <div> tags which act like a  tag, since divs are usually a separate piece of info

Thanks!

[edit]Just found an error - will post revision soon[/edit]

Log in or register to post comments

Comment #8

Kevin Hankens CreditAttribution: Kevin Hankens commented 9 September 2008 at 20:37

File	Size
mail.inc-299138-8.patch	1.51 KB

Apologies, I forgot to include div in the list of allowed tags.

One other note, would it make sense to include a blank line before paragraphs, divs, etc.?

Log in or register to post comments

Comment #9

arjenk CreditAttribution: arjenk commented 6 November 2008 at 21:34

File	Size
mail.test-299138-9.patch	3.8 KB

Well, i started reviewing the test patch, but ended writing it. Sorry about that. Some notes:

I solved the the not-so-nice-linebreaks with the use of a define LF
the tests are all defined in an array for readability;
Kevin, i'm not sure what to do with the div patch, since the issue is about the making of tests, i would say move it to another issue with the div patch alone?;
next step should probably be tests for the parameter $allowed_tags, or more for exceptional use, or combined use of tags, or can we accept the scope of the patch as is, considering the size of it?

Log in or register to post comments

Comment #10

drewish CreditAttribution: drewish commented 6 November 2008 at 22:53

Status:

Needs review

» Needs work

A few things:

Why are we using the setUp() function if there's only one test?
Comments need proper capitalization.
I don't think LF is an improvement, what's wrong with "\n"?
It's not clear to me what the second half of the test is doing... it seems like it's testing a separate aspect and would be better served as a separate test with a proper description.

Log in or register to post comments

Comment #11

arjenk CreditAttribution: arjenk commented 8 November 2008 at 06:48

File	Size
mail.test-299138-11.patch	2.89 KB

cleanup

Log in or register to post comments

Comment #12

drewish CreditAttribution: drewish commented 9 November 2008 at 02:34

Status:

Needs work

» Needs review

visually it looks a lot better. haven't tried running them.

Log in or register to post comments

Comment #13

arjenk CreditAttribution: arjenk commented 12 November 2008 at 10:14

File	Size
mail.test-299138-13.patch	11.22 KB

Thanks,

After searching Drupal for references to drupal_html_to_text i stumbled upon the initial patch for this function (#154218: Add HTML to text convertor for email (and wrap mails properly)) by Steven. Here he showed a sample conversion, which gave me the idea to add an more functional unit test to the patch. Pro: it tests the function in a more reallife (tm) use, Cons: i adds a lot of lines to the mail.test file. What do you think?

test results so far:

There is some strange behaviour with utf8 characters, and some trivial bugs in drupal_html_to():

A single <li>foo</li> gives a PHP:notice;
There is always an extra whiteline at the end;
There are trailing whitespaces in the output.

I will make a separate issue for these findings (if not too trivial), after this patch is committed ;-) ...

Please comment.

Log in or register to post comments

Comment #14

16 November 2008 at 22:00

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #15

lilou CreditAttribution: lilou commented 17 November 2008 at 14:14

Status:

Needs work

» Needs review

See: #335122: Test clean HEAD after every commit and http://pastebin.ca/1258476

Log in or register to post comments

Comment #16

catch

he/him

English

CreditAttribution: catch commented 19 November 2008 at 15:44

Category:	bug	» task
Status:	Needs review	» Needs work

There are redundant empty lines in the phpdoc. Also I think this could use some extra inline comments about what's being tested and how.

Log in or register to post comments

Comment #17

arjenk CreditAttribution: arjenk commented 22 November 2008 at 18:05

Status:

Needs work

» Needs review

File	Size
mail.test-299138-17.patch	11.98 KB

Tried to improve comments and generated output.

Log in or register to post comments

Comment #18

catch

he/him

English

CreditAttribution: catch commented 11 January 2009 at 18:38

Component:	tests	» base system
Issue tags:		+Needs tests

Log in or register to post comments

Comment #19

cwgordon7 CreditAttribution: cwgordon7 commented 22 January 2009 at 01:31

It probably doesn't matter, but since this is a test case and should try to be as near the actual case as possible, shouldn't valid HTML be used? For example, the line <ul><li>Drupal<li>Drupal</ul> should probably be <ul><li>Drupal</li><li>Drupal</li></ul>, unless there is a reason for the invalid htm — perhaps you were trying to test the function's handling of such a case?

Log in or register to post comments

Comment #20

22 January 2009 at 09:45

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #21

jrglasgow CreditAttribution: jrglasgow commented 7 March 2009 at 18:39

Status:

Needs work

» Needs review

File	Size
299138-drupal_html_to_text-test.patch	12.58 KB

I added the tests suggested by cwgordon7 in #19 in additional to the tests already there.

Log in or register to post comments

Comment #22

jrglasgow CreditAttribution: jrglasgow commented 7 March 2009 at 18:55

File	Size
299138-drupal_html_to_text-test_1.patch	13.02 KB

I fixed some alignment problem based on the recommendation of cwgordon7

Log in or register to post comments

Comment #23

cwgordon7 CreditAttribution: cwgordon7 commented 7 March 2009 at 21:55

Status:

Needs review

» Reviewed & tested by the community

Code style is good, and all tests passed, so rtbc. :)

Log in or register to post comments

Comment #24

sun

German

Karlsruhe

CreditAttribution: sun commented 24 March 2009 at 01:48

Status:

Reviewed & tested by the community

» Needs work

+      'description'  => t('Test mail.inc function drupal_html_to_text().'),

Spot the double spacing here.

+  function testDrupalHtmlToText() {
+
+    $html_to_text_cases = array(
...
+  function testDrupalHtmlToTextArgs() {
+

Please remove the blank line.

+      '<blockquote>Drupal</blockquote>'                 => '>Drupal' . "\n",

Shouldn't it actually be '> Drupal' ?

+      '<ol><li>Drupal<li>Drupal</ol>'                   => ' 1) Drupal' . "\n" . ' 2) Drupal' . "\n\n",
+      '<ol><li>Drupal</li><li>Drupal</li></ol>'         => ' 1) Drupal' . "\n" . ' 2) Drupal' . "\n\n",

Could we ammend the first line with a comment that states that we're explicitly testing malformed HTML?
Also, there is no test for an UL in an OL and vice-versa.

+      '<h2>Drupal</h2>'                                 => '-------- DRUPAL ' . str_repeat("-", 62). "\n\n",

Concatentation mistake here.

Log in or register to post comments

Comment #25

arjenk CreditAttribution: arjenk commented 25 May 2009 at 18:51

Status:

Needs work

» Needs review

File	Size
mail.test-299138-25.patch	13.21 KB

More code cleanup, coder gives no errors anymore. Also i changed the test to DrupalUnitTestCase for speedup of the test (see #464714: Speed up the tests).

'<blockquote>Drupal</blockquote>'                 => '>Drupal' . "\n",

True, '> Drupal' makes more sense, lets fix this after the unittest gets in. I also added a test with a nested <ul> and <ol>, and visa-versa.

Log in or register to post comments

Comment #26

stella CreditAttribution: stella commented 26 May 2009 at 18:11

Status:

Needs review

» Needs work

Minor fixes required:
There are a number of trailing spaces in the patch. Also should 'Drupal Drupal' not really be 'Drupal Drupal' inline with Drupal standards, or is that   on purpose as part of the test?

Also not sure about the assertion message text below, it could use some simplification:

$this->assertEqual($result, $expected_result, t('Test output of drupal_html_to_text(Long html mail with all sorts of html tags.)'));

Ditto with the assertion messages for the assertions in testDrupalHtmlToTextArgs(). I'd rather English sentences explaining the functionality that passed/failed, rather than a piece of code. And how does the 3rd assertion here differ in functionality from the 1st? They both appear to be testing that the array() overrides the default html tags allowed.

Other than those few things, patch is looking good, great work!

Log in or register to post comments

Comment #27

sun

German

Karlsruhe

CreditAttribution: sun commented 26 May 2009 at 18:21

Also should 'Drupal
Drupal' not really be 'Drupal
Drupal' inline with Drupal standards, or is that
on purpose as part of the test?

Good catch! The answer is: both. We need to test both, since the handling of   and   may break. Whereas the former should go into the "malformed" array section.

The other remarks from stella I agree with.

This is a wonderful & great patch! Let's get it done! :)

Log in or register to post comments

Comment #28

sun

German

Karlsruhe

CreditAttribution: sun commented 26 May 2009 at 18:26

Ah well. And that was a nice test for HTML Filter...

Log in or register to post comments

Comment #29

arjenk CreditAttribution: arjenk commented 27 May 2009 at 17:41

File	Size
mail.test-299138-29.patch	13.42 KB

Thanks for the reviews!

ah, those trailing spaces; believe it on not, they are on purpose. Those are generated by drupal_html_to_text(). Removing them would cause this test to fail. This is added as a comment in the
code.
Added separate tests for the  and  tags.
Tried to improve the test messages to more English sentences; t('Test of drupal_html_to_text() with a long html mail.'). Could use a suggestion here, if this is not what you were looking for.
The idea of the third test of testDrupalHtmlToTextArgs() was to test if passing a second parameter would not break the existing funcionality. I combined this now with passing two tags, so it makes somewhat more sense (...)

Log in or register to post comments

Comment #30

arjenk CreditAttribution: arjenk commented 27 May 2009 at 17:45

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #31

sun

German

Karlsruhe

CreditAttribution: sun commented 27 May 2009 at 18:07

Status:

Needs review

» Needs work

Now that I look at it again:

+class MailIncTestCase extends DrupalUnitTestCase {

I think we should call this DrupalHtmlToTextTestCase.

+      'description' => t('Test mail.inc function drupal_html_to_text().'),

Likewise, mail.inc function can be removed here.

+      $this->assertEqual($result,  $expected_text, t('drupal_html_to_text(@func_call ...)', array('@func_call' => drupal_substr($html_string, 0, 15))));

Duplicate space before $expected_text.

+   * Tests html_to_text() with a full mail example.

s/Tests/Test/ - PHPDoc summaries never use the third form.

+    // This is the html code to test.

HTML we want to write uppercase here.

+    /*
+     * The expected result after html_to_text($html).
+     * Do not remove the trailing whitespaces, as these are generated, and
+     * will otherwise fail this test.
+     */

That's a wrong notation for inline comments. We only use multiline comments for PHPDoc, just use // here.
Also, I believe the function name is drupal_html_to_text(), no?
Can we simplify that to Trailing white-space is required to match the output of drupal_html_to_text(). ?

+   * Test $allowed_tags argument of html_to_text().

One more instance of html_to_text() here.

Log in or register to post comments

Comment #32

arjenk CreditAttribution: arjenk commented 27 May 2009 at 18:34

Status:

Needs work

» Needs review

File	Size
mail.test-299138-32.patch	13.41 KB

re-roll with sun's comments.

Log in or register to post comments

Comment #33

arjenk CreditAttribution: arjenk commented 16 June 2009 at 20:53

File	Size
mail.test-299138-33.patch	14.14 KB

Some small updates according to the latest test guidelines [http://drupal.org/node/325974]:

The getInfo is now a public static, and the descriptions use a more active tense now. Also changed mail to e-mail, and spotted a missed lowercase html. Last: added an mail.test entry to simpletest.info, otherwise it wont
show up in the testlist.

Log in or register to post comments

Comment #34

kscheirer

English

Vallejo

CreditAttribution: kscheirer commented 14 July 2009 at 00:05

will review this when I get a chance

Log in or register to post comments

Comment #35

kscheirer

English

Vallejo

CreditAttribution: kscheirer commented 18 July 2009 at 03:29

Status:

Needs review

» Needs work

All the tests pass, this is a great addition!

just a few quibbles:

Need a newline at the end of modules/simpletest/tests/mail.test to follow coding standards
You have two tests that report with the message "Tests drupal_html_to_text() basic working of the parameter $allowed_tags." - perhaps the second one could explain how it differs from the first?
When reporting the function and parameters used, why truncate at 15 characters? Most of the time the test text will not be seen by anyone, but when one fails it will be useful to see the complete test performed.

Log in or register to post comments

Comment #36

lilou CreditAttribution: lilou commented 19 July 2009 at 12:47

Now, t() is removed in getinfo() : #500866: [META] remove t() from assert message

Log in or register to post comments

Comment #37

arjenk CreditAttribution: arjenk commented 3 September 2009 at 14:29

Status:

Needs work

» Needs review

File	Size
mail.test-299138-37.patch	13.52 KB

# Need a newline at the end of modules/simpletest/tests/mail.test to follow coding standards

changed.

# You have two tests that report with the message "Tests drupal_html_to_text() basic working of the parameter $allowed_tags." - perhaps the second one could explain how it differs from the first?

how about: "Tests drupal_html_to_text() with tags which are not given by the parameter $allowed_tags.". Better suggestions are welcome!

# When reporting the function and parameters used, why truncate at 15 characters? Most of the time the test text will not be seen by anyone, but when one fails it will be useful to see the complete test performed.

ok, looks better indeed. Also removed the t() there as t('drupal_html_to_text()') makes no sense.

#Now, t() is removed in getinfo()

done.

Log in or register to post comments

Comment #38

16 October 2009 at 12:20

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #39

sun.core CreditAttribution: sun.core commented 8 January 2010 at 02:50

Priority:

Critical

» Normal

Tests don't qualify as critical.

Log in or register to post comments

Comment #40

sreynen CreditAttribution: sreynen commented 18 April 2010 at 23:15

These tests seem to be failing due to a bug in the actual function. That is, I think the tests are working as intended, and catching actual errors. Specifically, line breaks aren't be added as expected in drupal_html_to_text(). It's not clear to me whether it's really necessary to have line breaks at the end of many of the examples, but there definitely needs to be a line break in place of an actual page break (e.g. "Drupal Drupal", and there isn't currently.

So I think the function needs to be updated to pass the tests, but I'm not very familiar with the function. Maybe someone more familiar with the function can confirm this?

Log in or register to post comments

Comment #41

gowriabhaya CreditAttribution: gowriabhaya commented 22 April 2010 at 08:28

Title:	TestingParty08: drupal_html_to_text()	» Tests for drupal_html_to_text()
Issue tags:		+TestingPartySF

Code sprint tag

Log in or register to post comments

Comment #42

MichaelCole CreditAttribution: MichaelCole commented 22 April 2010 at 21:09

Working this today. Will fix test and code, and resubmit patch.

Log in or register to post comments

Comment #43

MichaelCole CreditAttribution: MichaelCole commented 22 April 2010 at 22:33

Giving up. Catch was unavailable on IRC.

Problem is how newlines are tested in the tests.

Log in or register to post comments

Comment #44

Eric_A CreditAttribution: Eric_A commented 29 April 2010 at 19:22

#508738: drupal_html_to_text negative padding error

Log in or register to post comments

Comment #45

Eric_A CreditAttribution: Eric_A commented 3 May 2010 at 07:17

The testbot is happy in #508738: drupal_html_to_text negative padding error. Please review! (This issue depends on that one.)

Log in or register to post comments

Comment #46

Eric_A CreditAttribution: Eric_A commented 4 May 2010 at 17:15

Status:	Needs work	» Needs review
Issue tags:	-Needs tests, -TestingPartySF

#37: mail.test-299138-37.patch queued for re-testing.

Log in or register to post comments

Comment #47

4 May 2010 at 18:00

Status:	Needs review	» Needs work
Issue tags:		+Needs tests, +TestingPartySF

The last submitted patch, mail.test-299138-37.patch, failed testing.

Log in or register to post comments

Comment #48

arjenk CreditAttribution: arjenk commented 21 May 2010 at 14:27

Patch 1.26 of mail.inc (http://drupalcode.org/viewvc/drupal/drupal/includes/mail.inc?r1=1.25&r2=...) broke the test by doing:

@@ -466,7 +496,7 @@
         $chunk = $casing($chunk);
       }
       // Format it and apply the current indentation.
-      $output .= drupal_wrap_mail($chunk, implode('', $indent)) . "\n";
+      $output .= drupal_wrap_mail($chunk, implode('', $indent));
       // Remove non-quotation markers from indentation.
       $indent = array_map('_drupal_html_to_text_clean', $indent);
     }

The commit message was "- Patch #356074 by chx, Damien Tournoud: provide a sequences API." Which does not make much sense to me for this change... Perhaps the idea was to replace \n by MAIL_LINE_ENDINGS ?

Log in or register to post comments

Comment #49

sun

German

Karlsruhe

CreditAttribution: sun commented 21 May 2010 at 15:47

Just remove "\n" from all the assertions. As you can see, the input text of those assertions does not contain a newline or BR, so the outlined change in #48 was a fix to not insert newlines, which do not actually exist; therefore making newlines match between input and output.

Log in or register to post comments

Comment #50

kscheirer

English

Vallejo

CreditAttribution: kscheirer commented 1 August 2010 at 05:20

Title:	Tests for drupal_html_to_text()	» Fix and test drupal_html_to_text()
Category:	task	» bug

Man, this took a while to track down. First of all, Dries mixed up his commit messages! Arjenk, the issue for that patch is really #407452: Remove use of drupal_html_to_text() from system mails. Which was basically sun's patch.

sun - what wasn't obvious (because we had no tests for this function!) was that many of the tags were relying on that extra \n to format themselves properly. Stuff like h1, h2, dl, br, p, ol, ul, li, dl, dt, dd, etc. I think your patch was trying to make this function more general - but it doesn't seem to fulfill its original purpose now. Also _drupal_html_to_text_pad() is doing a substr, relying on the fact that there was an extra \n to throw away.

So right now drupal_html_to_text('Drupal Drupal') returns DrupalDrupal, which is probably not what we want. Some of these can be fixed by changing
$chunk = ''; // Ensure blank new-line. to $chunk = "\n"; // Ensure blank new-line. .

But that doesn't take care of all the cases - like '<ul><li>Drupal</li><li><ol><li>Drupal</li><li>Drupal</li></ol></li></ul>' - what text should that result in? Reading RFC 3676 did not make it clear either.

Arjenk - you wrote many of the test cases way back, based on #154218: Add HTML to text convertor for email (and wrap mails properly) - which included some examples that aren't available online anymore. It now looks like we have broken tests for a broken function. This may be a case where we write the test cases first, and then have the function match them - anyone up for it?

Log in or register to post comments

Comment #51

Tiburón CreditAttribution: Tiburón commented 22 August 2010 at 12:01

Assigned:

Unassigned

» Tiburón

Assigning to me.

(DrupalCon CPH 2010 Core Dev Summit)

Log in or register to post comments

Comment #52

Tiburón CreditAttribution: Tiburón commented 22 August 2010 at 14:27

File	Size
issue_299138_mail.test.patch	13.28 KB

The attached patch (based on #37) fixes the expected result of five tests of which four currently parses locally.

The tests for em, i, strong, b and br incorrectly expected a newline at the end of the result with the given input.

The test for br still fails as br is broken in drupal_html_to_text() (allowed but not handled).

In includes/mail.inc the functions drupal_html_to_text() and _drupal_html_to_text_pad() both needs to be fixed.

Log in or register to post comments

Comment #53

arjenk CreditAttribution: arjenk commented 27 August 2010 at 10:59

Wait, i'm puzzled here

First, kscheirer gave a excellent summary. And i agree with his
conclusion that the function is broken, but i'm not sure we can fix it
in the test.

The function is called drupal_html_to_text(), it is (was) a function
that was capable of formatting html to readable, formatted (hear the
newlines here?) text. It did need to add newlines in order to make
something out of the orginal html. To not allow the function to add
newlines, is to break the use of the function. So that is the status
of the current function: broken.

to give an example to illustrate:

[user:name],

<p>Thank you for registering at <a href="[site:url]">[site:name]</a>. Your application for an account is currently pending approval. </br>Once it has been approved, you will receive another e-mail containing information about 

<ul>
<li>how to log in,
<li>set your password, 
<li>and other details.
</p>

--  [site:name] team

(this is not the real text from core) gave the following text email:

test,

Thank you for registering at localhost [1]. Your application for an account
is currently pending approval. Once it has been approved, you will receive
another e-mail containing information about

  * how to log in,
  * set your password,
  * and other details.


    --  localhost team


[1] http://localhost/drupal7/

but now:

test,

Thank you for registering at localhost [1]. Your application for an account
is currently pending approval. Once it has been approved, you will receive
another e-mail containing information about * how to log in, * set your  
password, * and other details.


    --  localhost team
[1] http://localhost/drupal7/

The options are, to my opinion:

If it really is a problem to change the number of newlines, remove
the function from core, and change it with strip_tags, or
likewise. Which effectively removes all tags in mails.
Make a limited drupal_html_to_text() that only accepts (em, i,
strong, b) and removes the others. Perhaps also change the name as
well.
Agree that the fix in #407452: Remove use of drupal_html_to_text() from system mails broke the function, and go back to
there and try to fix it in another way. Perhaps make a test first if
we are dealing with a html text, and then call the function, and
else do not...

(EDIT: fixed the correct issue number)

Log in or register to post comments

Comment #54

sun

German

Karlsruhe

CreditAttribution: sun commented 27 August 2010 at 13:49

Status:

Needs work

» Needs review

Last option 3) in #53 is not possible. I tried to track and keep up with this issue, but didn't manage to do so. I'm not sure whether I understand #50. I like the tests added in #53, but didn't verify in detail whether they are valid. Overall, drupal_html_to_text() implements quite complex logic, so I'm pretty sure that it is possible to achieve any newline behavior we can think of. We should start by making sure that our test contains all expectations. Afterwards fix the logic.

Log in or register to post comments

Comment #55

27 August 2010 at 14:20

Status:

Needs review

» Needs work

The last submitted patch, issue_299138_mail.test.patch, failed testing.

Log in or register to post comments

Comment #56

kscheirer

English

Vallejo

CreditAttribution: kscheirer commented 31 August 2010 at 18:33

We should start by making sure that our test contains all expectations.

Agreed, we're back to square one.

Define what exactly this function is supposed to do
Write tests for it
Fix the function to pass tests

Need to track down when this function came into Drupal and see what its original purpose was?

@sun: The short version of #50 is - your patch in #407452: Remove use of drupal_html_to_text() from system mails broke this function (or at least, it doesn't do what it did before that patch). There were no tests for this function in core to catch that. The old function may not have been correct, but the current version definitely is not.

Log in or register to post comments

Comment #57

sun

German

Karlsruhe

CreditAttribution: sun commented 31 August 2010 at 20:42

Status:

Needs work

» Needs review

File	Size
drupal.html2text.57.patch	12.23 KB

I've slightly fixed the tests (which also means that expectations might be slightly changed).

The tests added here should definitely not try to be "compatible" with the current output of drupal_html_to_text(), but rather assert our expectations.

Our expectations are pretty clear: A proper conversion from HTML to plain text. If you happen to get e-mail notifications from drupal.org, then you may know most details of the expected format already.

Test results are clear, the current function is horribly broken. We should have another round to make sure that the expectations are right, then start to actually fix the bugs.

Log in or register to post comments

Comment #58

sun

German

Karlsruhe

CreditAttribution: sun commented 31 August 2010 at 20:44

Title:	Fix and test drupal_html_to_text()	» drupal_html_to_text() is broken
Component:	base system	» mail system
Assigned:	Tiburón	» sun

Feedback welcome.

Log in or register to post comments

Comment #59

31 August 2010 at 21:30

Status:

Needs review

» Needs work

The last submitted patch, drupal.html2text.57.patch, failed testing.

Log in or register to post comments

Comment #60

arjenk CreditAttribution: arjenk commented 1 September 2010 at 19:14

It would also help to add a test for the issue which led to the troublesome patch. Perhaps it had something to do with wordwrap and adding a newline on long lines? So how about adding the test:

str_repeat('Drupal ', 15) => str_repeat('Drupal ', 11) . "\n" . str_repeat('Drupal ', 4)

Log in or register to post comments

Comment #61

arjenk CreditAttribution: arjenk commented 1 September 2010 at 19:30

rfc3676 :

A generating agent SHOULD:

   o  Ensure all lines (fixed and flowed) are 78 characters or fewer in
      length, counting any trailing space as well as a space added as
      stuffing, but not counting the CRLF, unless a word by itself
      exceeds 78 characters.

   o  Trim spaces before user-inserted hard line breaks.

among others, we should also add some tests to test if this works properly.

Log in or register to post comments

Comment #62

sun

German

Karlsruhe

CreditAttribution: sun commented 3 October 2010 at 16:42

Title:	drupal_html_to_text() is broken	» drupal_html_to_text() formatting is broken and does not have tests
Priority:	Normal	» Major

ok, but what about the existing expectations/assertions in the patch? I need a confirmation that they are correct, before I'll continue with this patch.

Log in or register to post comments

Comment #63

sun.core CreditAttribution: sun.core commented 14 November 2010 at 21:53

Status:

Needs work

» Postponed (maintainer needs more info)

Either we have serious problems here or we don't. Feedback on the added tests would be helpful to move on here (and fix the actual bugs).

Log in or register to post comments

Comment #64

kscheirer

English

Vallejo

CreditAttribution: kscheirer commented 15 November 2010 at 00:47

Well considering we don't even have enough information to write tests, I think we have a serious problem here. No one posting in this issue queue seems to know what the tests should look like. Someone has to know what this function is supposed to do, can we find the person that committed it?

Log in or register to post comments

Comment #65

sun

German

Karlsruhe

CreditAttribution: sun commented 13 January 2011 at 13:36

Status:

Postponed (maintainer needs more info)

» Needs work

We already have tests in the latest patch, the current task is to confirm (or object to) the expectations in the tests.

Only with precisely tested expectations, we can fix drupal_html_to_text().

Log in or register to post comments

Comment #66

pillarsdotnet CreditAttribution: pillarsdotnet commented 10 March 2011 at 23:02

Status:

Needs review

» Needs work

File	Size
mail.inc_.patch	548 bytes

Patch to mail.inc restoring functionality in converting lists.

Log in or register to post comments

Comment #67

pillarsdotnet CreditAttribution: pillarsdotnet commented 10 March 2011 at 21:25

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #68

pillarsdotnet CreditAttribution: pillarsdotnet commented 10 March 2011 at 23:01

File	Size
mail.inc_.patch	545 bytes

That failed. Let's try just reversing the commit that caused the current brokenness.

Log in or register to post comments

Comment #69

pillarsdotnet CreditAttribution: pillarsdotnet commented 11 March 2011 at 05:09

Status:

Needs work

» Needs review

File	Size
mail.inc_.patch	585 bytes

I wish the testbot would make up its mind as to whether it prefers "-p0" or "-p1" in its patches...

EDIT: Now I see -- earlier patch was hand-edited and I deleted a couple lines i shouldn't have.

Log in or register to post comments

Comment #70

pillarsdotnet CreditAttribution: pillarsdotnet commented 11 March 2011 at 01:07

Note: all tests passed.

Log in or register to post comments

Comment #71

Pancho

UTC+2 🇪🇺 EU

CreditAttribution: Pancho commented 25 March 2011 at 16:54

Status:

Needs review

» Needs work

This bug renders the Simplenews module unusable and blocks further progress towards a D7 release there.
Please commit what you have there. #69 fixes the most urgent problem with drupal_html_to_text(). Further tests are a good idea but shouldn't block fixing a major bug. Thanks.

Log in or register to post comments

Comment #72

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 15:05

Title:	Fix the broken formatting in drupal_html_to_text() and also provide tests	» drupal_html_to_text() formatting is broken and does not have tests
Version:	8.x-dev	» 7.x-dev
Assigned:	Unassigned	» sun
Status:	Needs review	» Needs work
Issue tags:	-Needs backport to D6, -Needs backport to D7	+Needs tests, +TestingPartySF

Log in or register to post comments

Comment #73

Pancho

UTC+2 🇪🇺 EU

CreditAttribution: Pancho commented 27 March 2011 at 10:23

Status:

Needs work

» Needs review

@pillarsdotnet: Please reconsider your attitude. (You're not talking to a script kiddie, and even then you don't need to assume it was bullshit.)

Actually, I'm using Simplenews on D7 just as well to deliver our newsletter every two or three days.
So "unusable and blocking further progress" might in fact be a bit overstated. Correct would have been: "out of the box it is unusable, and because the bug is to be solved upstream, it paralyzes further development, plus: because admins don't get it work correctly, they refrain from further testing".

I thought to be specific enough, because it is obvious enough what patch #69 does. However to be even more detailed:
After applying #69 from this queue, drupal_html_to_text() stopped stripping off newlines from my simplemail newsletters (see #1099188: simplenews_html_to_text() strips off newlines).

Now, how come you're obviously not affected by the bug?
You might be using a different backend, such as Mime Mail, bypassing drupal_html_to_text().

Now the title of this major (!) issue here states "drupal_html_to_text() formatting is broken", so it doesn't mainly refer to adding tests. So what I'd like you to consider is fixing "drupal_html_to_text()" asap based on the tests we have now, even if the tests are not 100% complete. Thank you.

Log in or register to post comments

Comment #74

Simon Georges CreditAttribution: Simon Georges commented 17 April 2011 at 15:51

Closing #1099188: simplenews_html_to_text() strips off newlines as a duplicate of this one, and subscribing.
If I may be of any help to help solve this issue, I'm clearly willing to help, to stop people raising this in Simplenews issue queue ;-).

Log in or register to post comments

Comment #75

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 02:58

Assigned:	Unassigned	» sun
Category:	feature	» bug

File	Size
drupal_html_to_text-299138-75.patch	1.06 KB

Bumping, tagging, and re-rolling according to standard policy.

@sun: I still haven't mastered simpletest, but a minimum test requirement should be that:

Chunks of text located within separate block-level HTML tags must be separated by (at least) a newline in the plaintext version.
Chunks of text separated by a   or <hr /> tag must be separated by (at least) a newline in the plaintext version.

Log in or register to post comments

Comment #76

sun

German

Karlsruhe

CreditAttribution: sun commented 17 April 2011 at 18:52

Version:	7.x-dev	» 8.x-dev
Status:	Needs review	» Needs work
Issue tags:	-TestingPartySF	+Needs backport to D7

It would tremendously help if people would re-roll the proper patch. See #57.

Log in or register to post comments

Comment #77

pillarsdotnet CreditAttribution: pillarsdotnet commented 17 April 2011 at 19:11

File	Size
drupal_html_to_text-299138-77.patch	14.57 KB

@sun -- Obviously we want to do more, but at least the single-line change in #75 should be committed ASAP, because it fixes an obvious regression.

Added patch from #57, part of which had already been applied.

Also added my suggested test from #75 above.

Log in or register to post comments

Comment #78

pillarsdotnet CreditAttribution: pillarsdotnet commented 17 April 2011 at 19:08

Status:

Needs review

» Needs work

File	Size
drupal_html_to_text-299138-78.patch	14.62 KB

Sorry; forgot the second argument to assertFalse().

Log in or register to post comments

Comment #79

pillarsdotnet CreditAttribution: pillarsdotnet commented 17 April 2011 at 19:08

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #80

17 April 2011 at 20:32

The last submitted patch, drupal_html_to_text-299138-78.patch, failed testing.

Log in or register to post comments

Comment #81

pillarsdotnet CreditAttribution: pillarsdotnet commented 17 April 2011 at 21:08

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-299138-81.patch	14.55 KB

Trying again...

Log in or register to post comments

Comment #82

pillarsdotnet CreditAttribution: pillarsdotnet commented 17 April 2011 at 21:14

File	Size
drupal_html_to_text-299138-82.patch	15.71 KB

As long as I'm doing major surgery to drupal_html_to_text(), I might as well merge in #345931: Support value attribute in ordered lists.

Log in or register to post comments

Comment #83

17 April 2011 at 22:32

Status:

Needs review

» Needs work

The last submitted patch, drupal_html_to_text-299138-82.patch, failed testing.

Log in or register to post comments

Comment #84

pillarsdotnet CreditAttribution: pillarsdotnet commented 17 April 2011 at 23:05

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-299138-84.patch	9.54 KB

Removed the "Full HTML Document" test because:

I couldn't get it to pass no matter what I did.
I think it's going about things the wrong way.

Log in or register to post comments

Comment #85

pillarsdotnet CreditAttribution: pillarsdotnet commented 17 April 2011 at 23:09

Assigned:

sun

» pillarsdotnet

Log in or register to post comments

Comment #86

sun

German

Karlsruhe

CreditAttribution: sun commented 17 April 2011 at 23:44

Status:

Needs review

» Needs work

Removing a test that has been added for a good reason is a bad idea. The stated reasons are, in fact, critical reasons to keep it.
Merging in other feature requests will do nothing else than to slow down progress on this issue even more.

Log in or register to post comments

Comment #87

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 April 2011 at 00:38

Category:	feature	» bug
Status:	Needs review	» Needs work

@sun -- You fix it, then. This whole issue is a "feature request". The only part that was a bugfix was the patch in #75 which you so offhandedly rejected.

EDIT: Opened #1130198: Regression: line-breaks are mangled by \Drupal\Core\Utility\Mail::htmlToText()

Log in or register to post comments

Comment #88

pillarsdotnet CreditAttribution: pillarsdotnet commented 17 April 2011 at 23:55

Assigned:	pillarsdotnet	» Unassigned
Category:	bug	» feature

Log in or register to post comments

Comment #89

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 April 2011 at 00:02

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-tests+fix-299138-89.patch	10.02 KB

drupal_html_to_text-tests_only-299138-89.patch	5.8 KB

To justify a possible RTBC by someone other than @sun:

Log in or register to post comments

Comment #90

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 April 2011 at 01:10

Category:	bug	» feature
Status:	Needs work	» Needs review

Tests-only: failed (Good!)

Tests+fix: passed (Good!)

Log in or register to post comments

Comment #91

BeatnikDude CreditAttribution: BeatnikDude commented 18 April 2011 at 05:59

subscribe

Log in or register to post comments

Comment #92

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 April 2011 at 15:50

File	Size
drupal_html_to_text-tests+fix-299138-92.patch	14.73 KB

Replaced all "\n" with MAIL_LINE_ENDINGS for consistency. May possibly fail tests now, as tests assume: (MAIL_LINE_ENDINGS === "\n")

This begs the question: Should we write separate tests for each possible value of MAIL_LINE_ENDINGS?

Log in or register to post comments

Comment #93

18 April 2011 at 16:32

Status:

Needs review

» Needs work

The last submitted patch, drupal_html_to_text-tests+fix-299138-92.patch, failed testing.

Log in or register to post comments

Comment #94

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 April 2011 at 17:47

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-tests+fix-299138-94.patch	14.73 KB

Fixed typo.

Log in or register to post comments

Comment #95

Simon Georges CreditAttribution: Simon Georges commented 21 April 2011 at 05:41

Is there still something missing here ?
I'm really looking forward to having it in the 7.1, since we can't release Simplenews without it.

Log in or register to post comments

Comment #96

pillarsdotnet CreditAttribution: pillarsdotnet commented 21 April 2011 at 19:06

You can help by posting a review of the patch in #94. An acceptable review would include the following:

Confirm that all code and comments comply with Drupal coding standards.
Confirm that the tests included with the patch are necessary and sufficient.
Confirm that the patch actually solves the problem. Describe a reproducible test where the current code fails and the patched code succeeds.
Change the issue status from "Needs Review" to "Reviewed and Tested By the Community".

If, after thorough testing, you cannot in good conscience affirm all of the above, please explain what is lacking in the current patch and I will do my best to address your concerns.

Log in or register to post comments

Comment #97

Lars Toomre CreditAttribution: Lars Toomre commented 22 April 2011 at 02:43

@pillarsdotnet:

I spent some time reading through the patch file tonight and the new tests. As a whole, the new patch and tests look good. However, I would suggest that you expand the tests to include some additional cases:

1) Multiple good tags override
$text = '<ul><li>This is a complex text string.</li></ul>';
$result = drupal_html_to_text($text, array('em', 'ul'));

2) Purposefully mal-formed tags in string test (from user input)
$text = '<ul><li>This is a complex text string.</ul></ul>';
$result = drupal_html_to_text($text, array('li','em'));
should be => ' * This is a *complex* text string.'

3) Ignorant user example of highlighting
$text = "This is highlighted text.';
$result = drupal_html_to_text($text, array('B'));
should be => 'This is *highlighted* text.'

Cases 2) and 3) I have experienced in the last few months from users hand-entered node bodies and/or comments. These are examples of the edge cases that I think this particular function needs to test for. Users are notorious for what they might enter without trying to be malicious.

I am unsure whether this function is intended to be used with utf-8 strings. If it is, I would want to add some text strings that include characters outside of the usual ASCII range like the European special vowels and non-Roman language sets.

Finally, I can easily see this function used as part of a larger webpage data extraction function. That typically involves using something like cURL to retrieve a webpage as $text and then start stripping out the tags to get to whatever information is desired. Hence, I think that we also want a test that takes a mocked up web page and demonstrate that this function handles that "complex case" as well.

Edit: Two further thoughts/questions:

1) What happens if 'html' is passed in as the second parameter?
2) Looking at #732992: Allow filter_xss_admin() to accept HTML5 tags, I am wondering if the permissible html tags acceptable as the second variable should be constrained to the acceptable terms used in filter_xss()? If so, how are we going to deal with what is acceptable for HTML5 (being dealt with in that issue)?

Log in or register to post comments

Comment #98

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 03:29

Assigned:	sun	» pillarsdotnet
Category:	bug	» feature
Status:	Needs review	» Needs work

@Lars

I don't understand the significance of testing 'This is highlighted text.'.

Did you mean that the  tag should be supported?: It already is, and there is a test to prove it.
Did you mean that uppercase tags should be supported as well as lowercase tags?: The tag matching is already case-insensitive, but I suppose we should add a test to prove it.

Log in or register to post comments

Comment #99

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 04:19

Assigned:	pillarsdotnet	» Unassigned
Status:	Needs work	» Needs review

File	Size
drupal_html_to_text-tests+fix-299138-99.patch	14.9 KB

drupal_html_to_text-tests-299138-99.patch	6.99 KB

Okay, I've added all the tests that Lars asked for, except for the utf-8 ones. I really don't know what to do about those and would welcome any suggestions.

@further thoughts/questions:

1) What happens if 'html' is passed in as the second parameter?

Nothing. I added a test to prove it. Unsupported tags are stripped and ignored.

2) Looking at #732992: Change filter_xss_admin() to take a variable by default instead of a hardcoded list., I am wondering if the permissible html tags acceptable as the second variable should be constrained to the acceptable terms used in filter_xss()? If so, how are we going to deal with what is acceptable for HTML5 (being dealt with in that issue)?

Same as the previous. Passing unsupported tags into the second parameter will not affect the output, unless someone rewrites the drupal_html_to_text function.

If I get *real* ambitious, I might rewrite the entire function to make it easier to add new tags in the future, but right now I'm just looking for short-term results.

Log in or register to post comments

Comment #100

Lars Toomre CreditAttribution: Lars Toomre commented 22 April 2011 at 05:00

Sorry if my point was not clear enough... I was trying to highlight the case as in capitalized. I know that well formatted XHTML wants all tags in all lower-case, but users have the funny tendency to want to use both and . I thought a test to prove that we could accept either was appropriate.

OK... I have now read through all of the function changes and they make sense to me. I have not installed this patch though so I cannot say that it installs or runs as expected.

The added comments in the code look good, with full sentences ending in periods. I did not open up the patch in a text editor so I am unable to confirm that the comments do not extend beyond line 80. I also was unable to check for extra white spaces at the end of lines. There is one phrase 'see #345931: Support value attribute in ordered lists.' which I am unsure about in a comment. I am pretty sure we are not suppose to reference d.o. issues in comments, except in special cases. Is this the format that a special case should be in?

From reading the code and the tests, this patch appears to solve the problem and passes appropriate tests. I am not prepared to RTBC this however since I cannot confirm that the patch actually applies. Other steps to complete the review of this patch include:

1) A couple of people to also review this patch and make sure that it applies appropriately.
2) Add a couple of utf-8 strings with HTML tags embedded therein to make sure that the regex expressions do not degrade for instance an o with omlats to an ascii o.

Perhaps someone who does not speak English as a primary language could make some suggestions for item #2??

Log in or register to post comments

Comment #101

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 05:47

File	Size
drupal_html_to_text-tests+fix-299138-101.patch	16.93 KB

drupal_html_to_text-tests_only-299138-101.patch	7.51 KB

I did not open up the patch in a text editor so I am unable to confirm that the comments do not extend beyond line 80.

The comments don't. Some of the code does, a little.

I was trying to highlight the  case as in capitalized

See the very last test. It compares drupal_strtoupper(drupal_html_to_text($input)) with drupal_html_to_text(drupal_strtoupper($input)) where $input is a fairly long mix of tags and text.

There is one phrase 'see #345931: Support value attribute in ordered lists.' which I am unsure about in a comment.

You are correct. I have removed the references.

Now running coder review... done. Fixed problems identified by coder.

Log in or register to post comments

Comment #102

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 06:13

Issue tags:

+i18n

Responding to #97 and also to sun's comment in #86:

I think that we also want a test that takes a mocked up web page and demonstrate that this function handles that "complex case" as well.

First of all, this function is not supposed to do as good a job as a dedicated text-mode browser such as Lynx, links, or w3m. If you want a general data extractor, write a module. This is for sending plain-text email messages. That's why it's located in includes/mail.inc instead of modules/filter.

Secondly, producing such a test as you describe basically means taking a big chunk of html, running it through the function, and copying the output to your $expected variable. What does that prove? Only that any future improvements to the function must also be accompanied by a change this huge test. And very likely, the future coders will likewise run the html through the function and retroactively "expect" whatever comes out the other end. I think this is a poor testing methodology. That's what I meant when I said in #84 above, "I think it's going about things the wrong way."

Thirdly, I *have* run web-pages through the original function (as it existed in d6 or in #1130198) and I can testify that the resultant output is acceptably readable. I have about twenty people on my Simplenews mailing list who must receive text-only messages, and they're doing just fine.

Log in or register to post comments

Comment #103

Lars Toomre CreditAttribution: Lars Toomre commented 22 April 2011 at 06:36

OK... Glad to know that you had run a larger more complex text string (webpage) through this function. Skip the thought since you already demonstrated that 'html' tag cannot be 'slipped in' as an exception tag. My only other concern was what would happen if it encountered 'mal formed' HTMl code.

Glad that you ran the code through coder and corrected anything that came from that. Now if we can get the two items I raised previously in #100, we should be good to go!!

P.S. - I have code in contrib module that mimics this function and another that just eliminates one HTML tag. I am curious to test out this function as a possible replacement/enhancement there.

Log in or register to post comments

Comment #104

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 07:13

My only other concern was what would happen if it encountered 'mal formed' HTMl code.

I already added the "malformed HTML" tests that you suggested. If you think those tests are insufficient, please suggest additional ones.

From the patch:

      // Tests malformed HTML tags.
      '<br>Drupal<br>Drupal' => "Drupal\nDrupal",
      '<hr>Drupal<hr>Drupal' => str_repeat('-', 78) . "\nDrupal\n" . str_repeat('-', 78) . "\nDrupal",
      '<ol><li>Drupal<li>Drupal</ol>' => " 1) Drupal\n 2) Drupal\n",
      '<ul><li>Drupal <em>Drupal</em> Drupal</ul></ul>' => " * Drupal /Drupal/ Drupal\n",
      '<ul><li>Drupal<li>Drupal</ol>' => " * Drupal\n * Drupal\n",
      '<ul><li>Drupal<li>Drupal</ul>' => " * Drupal\n * Drupal\n",
      '<ul>Drupal</ul>' => "Drupal\n",
      'Drupal</ul></ol></dl>' => "Drupal",
      // Tests some unsupported HTML tags.
      '<html>Drupal</html>' => "Drupal",
      '<script type="text/javascript">Drupal</script>' => "Drupal",

Log in or register to post comments

Comment #105

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 07:27

Issue tags:

-i18n

File	Size
drupal_html_to_text-tests+fix-299138-105.patch	16.95 KB

Found a bug and fixed it. Dunno how to test for it, but calling drupal_html_to_text('<li>Drupal</li>') would produce a PHP warning.

Log in or register to post comments

Comment #106

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 07:36

File	Size
drupal_html_to_text-tests+fix-299138-106.patch	16.98 KB

drupal_html_to_text-tests_only-299138-106.patch	7.53 KB

Figured it out.

Log in or register to post comments

Comment #107

22 April 2011 at 08:52

Status:

Needs review

» Needs work

The last submitted patch, drupal_html_to_text-tests+fix-299138-106.patch, failed testing.

Log in or register to post comments

Comment #108

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 10:00

File	Size
drupal_html_to_text-tests+fix-299138-107.patch	16.97 KB

drupal_html_to_text-tests_only-299138-107.patch	7.53 KB

Must have brushed against the middle mouse button while editing.

Log in or register to post comments

Comment #109

pillarsdotnet CreditAttribution: pillarsdotnet commented 22 April 2011 at 10:01

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #110

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 13:58

Title:	Fix the broken formatting in drupal_html_to_text() and also provide tests	» drupal_html_to_text() formatting is broken and does not have tests
Category:	bug	» feature
Issue tags:	-Needs backport to D6	+Needs tests

Log in or register to post comments

Comment #111

kscheirer

English

Vallejo

CreditAttribution: kscheirer commented 22 April 2011 at 20:28

don't give up! You're trying to get drupal_html_to_text-tests+fix-299138-107.patch in right? I'll take another look as soon as I can, and thanks for picking this up.

Log in or register to post comments

Comment #112

pillarsdotnet CreditAttribution: pillarsdotnet commented 23 April 2011 at 06:57

Issue tags:

-Needs tests

+Needs backport to D6

Included a patched copy of drupal_html_to_text() in the Mail System 6.x-2.3 release to resolve the bug reported here:

#1135978: Mail System format() method uses drupal_html_to_text() which strips linefeeds from input.

It appears that this bugfix is also applicable to D6. Tagging accordingly.

Also removing the "Needs tests" tag because I honestly believe the tests in this patch are sufficient. If anyone disagrees, please do so constructively. That is, don't just complain -- provide a suggestion for the test you think is missing.

Log in or register to post comments

Comment #113

pillarsdotnet CreditAttribution: pillarsdotnet commented 23 April 2011 at 06:59

Title:

drupal_html_to_text() formatting is broken and does not have tests

» Fix the broken formatting in drupal_html_to_text() and also provide tests

More constructive issue title.

Log in or register to post comments

Comment #114

Heine CreditAttribution: Heine commented 23 April 2011 at 07:32

D6 behaviour is by design and correct. Only pass real HTML to the function. Additional whitespace will be collapsed, see #298708-12: drupal_html_to_text() removes line endings.

Log in or register to post comments

Comment #115

pillarsdotnet CreditAttribution: pillarsdotnet commented 23 April 2011 at 08:02

@Heine -- Are you saying that the DefaultMailSystem::format() method is broken because it blindly applies drupal_html_to_text() to all input?

If so, have you submitted a bug report?

Or perhaps you are claiming that the d6 and d7 versions of drupal_html_to_text() are somehow completely different despite sharing identical doxygen headers?

If so, have you submitted a bug report?

Log in or register to post comments

Comment #116

Heine CreditAttribution: Heine commented 23 April 2011 at 08:27

I was under the impression that this is the bug report dealing with drupal_html_to_text(). The following is what I understand is wrong about the current D7 drupal_html_to_text implementation.

Suppose you pass the following HTML through drupal_html_to_text.

<p>foo</p><p>bar</p><p>baz</p>

Expected output ( aka the 'acceptable plaintext representation') should keep the paragraph divisons of the text with:

foo

bar

baz

Output in D6 is according to expectation, in D7 we get:

foobarbaz

Log in or register to post comments

Comment #117

pillarsdotnet CreditAttribution: pillarsdotnet commented 23 April 2011 at 16:35

The only functions that call drupal_html_to-text() are:

None of the above functions check whether the argument to drupal_html_to_text() is HTML or plaintext.

Heine asserts that drupal_html_to_text() should collapse whitespace, by design. Yet this would break the functionality of the only cases where the function is used.

So I ask Heine, for the record, which of the following is true?

Your assertion was in error, and we should expect drupal_html_to_text() to preserve plaintext formatting.
All the documented use cases are in error, and drupal_html_to_text() should be deleted from drupal core.

Log in or register to post comments

Comment #118

Damien Tournoud CreditAttribution: Damien Tournoud commented 23 April 2011 at 16:58

Category:

feature

» bug

-  $text = substr($text, 0, -1);
+  $text = preg_replace('/\n$/s', '', $text);

Shouldn't that be MAIL_LINE_ENDINGS too?

Other then that, this looks RTBC to me. Also, this is purely a bug report. drupal_html_to_text() was accidentally broken on D7, and this patch fixes it.

@Heine: I think that pillarsdotnet was talking about backporting some additional bug fixes for drupal_html_to_text() that are also in that patch. The behavior around line breaks between the tags should obviously be the same on D6 and D7.

@pillarsdotnet: Heine is not saying that D7 behavior is correct.

Log in or register to post comments

Comment #119

pillarsdotnet CreditAttribution: pillarsdotnet commented 23 April 2011 at 19:30

File	Size
drupal_html_to_text-tests+fix-299138-119.patch	19.41 KB

drupal_html_to_text-test_only-299138-119.patch	7.99 KB

@Damien -- Corrected.

Also corrected doxygen to reflect the current list of supported tags.

Added a test confirming that internal (not leading or trailing) whitespace is preserved; this revealed a bug which I subsequently fixed.

Double-checked that everything passes coder review.

Hopefully this is ready to commit.

Log in or register to post comments

Comment #120

pillarsdotnet CreditAttribution: pillarsdotnet commented 23 April 2011 at 19:51

File	Size
drupal_html_to_text-tests+fix-299138-120.patch	19.4 KB

drupal_html_to_text-test_only-299138-120.patch	7.93 KB

Couldn't resist one more tweak -- it bothered me that some of the "expected" results had trailing whitespace. Fixed.

Log in or register to post comments

Comment #121

pillarsdotnet CreditAttribution: pillarsdotnet commented 23 April 2011 at 19:58

File	Size
drupal_html_to_text-tests+fix-299138-121.patch	19.4 KB

Even better...

Log in or register to post comments

Comment #122

Heine CreditAttribution: Heine commented 23 April 2011 at 20:10

Why should drupal_html_to_text operate on non-HTML?

Now it incorrectly converts HTML suchs as

<p>Foo

bar

baz
</p>

Which renders as

Foo bar baz

The acceptable plaintext representation should match this rendition and display

Foo bar baz

Not what this patch displays.

Shoehorning two incompatible formats into a conversion function indicates that the callers are in the wrong.

Log in or register to post comments

Comment #123

Heine CreditAttribution: Heine commented 23 April 2011 at 20:16

Well, that wasn't exactly fair; the incorrect whitespace treatment is not related to the patch.
I'll open a new issue.

Log in or register to post comments

Comment #124

pillarsdotnet CreditAttribution: pillarsdotnet commented 23 April 2011 at 20:48

@Heine

See the comment starting at line 497:

      // Convert inline HTML text to plain text; not removing line-breaks or
      // white-space, since that breaks newlines when sanitizing plain-text.

I find it interesting that the relevant commit is the same one which provoked this issue in the first place. Obviously there was more intended than the "provide a sequence API" comment would suggest.

I restate my conclusion in #117: either whitespace should be preserved, as required by the above comment and documented use-cases, or the function should be removed entirely. As I also said in #102, this function is not intended as a general-purpose text-mode browser. It is a convenience function to ensure that text emails remain readable, regardless of the input format.

Perhaps the simplest way to satisfy you would be to rename this function something else, like "drupal_sanitize_plaintext", or "drupal_html_or_text_to_plaintext".

Log in or register to post comments

Comment #125

Heine CreditAttribution: Heine commented 23 April 2011 at 21:13

Not really, sending mail is fundamentally broken; the API is misdesigned.

Plaintext should be passed through nl2br or perhaps the linebreak filter before being passed to drupal_html_to_text.

Log in or register to post comments

Comment #126

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 15:04

Title:	Improve drupal_html_to_text()	» Fix the broken formatting in drupal_html_to_text() and also provide tests
Category:	feature	» bug

Log in or register to post comments

Comment #127

Heine CreditAttribution: Heine commented 23 April 2011 at 21:28

If you can't bear reading reviews / musings aimed at the betterment of Drupal, stop posting patches, pillarsdotnet.

/me out

Log in or register to post comments

Comment #128

Damien Tournoud CreditAttribution: Damien Tournoud commented 23 April 2011 at 21:33

Status:

Needs review

» Needs work

Not really, sending mail is fundamentally broken; the API is misdesigned.

Nope, not at all. Mails are handled 100% in HTML in Drupal 7, and are only converted into plaintext (via drupal_html_to_text()) at the very end of the chain (when sending out the mail), because core by default only support sending email serialized as plain-text. You can see a summary of the design on #407452-35: Remove use of drupal_html_to_text() from system mails.

Converting plain-text to HTML before calling drupal_mail() is the responsibility of the caller.

You are completely right in saying that drupal_html_to_text() should collapse whitespace. pillarsdotnet is wrong in expecting that plain-text input should be handled.

Needs work again: we need to fix drupal_html_to_text() so that it properly collapse whitespace. The initial patch from #407452: Remove use of drupal_html_to_text() from system mails was clearly misguided in that particular part.

Log in or register to post comments

Comment #129

Damien Tournoud CreditAttribution: Damien Tournoud commented 23 April 2011 at 21:38

Also see http://drupal.org/update/modules/6/7#email-html for cristal clear statements that every mail is HTML and that we don't handle plain-text.

Log in or register to post comments

Comment #130

pillarsdotnet CreditAttribution: pillarsdotnet commented 23 April 2011 at 22:24

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-tests+fix-299138-130.patch	19.76 KB

drupal_html_to_text-test_only-299138-130.patch	7.9 KB

I give up; you win. Re-rolled as requested. Also removed the comment.

If and when this ever trickles down to D6 it will either need to preserve whitespace or be accompanied by a patch to every place in core that calls drupal_mail().

(looking...)

Hmm... Not such a big list as I feared... Will probably tackle this myself when the time comes.

Log in or register to post comments

Comment #131

pillarsdotnet CreditAttribution: pillarsdotnet commented 24 April 2011 at 02:54

Does anybody care about this? Fixing it would be ... complicated, and I don't have a use-case, so I'm tempted to leave it broken-as-designed:

drupal_html_to_text("<pre>Drupal\n     Drupal\nDrupal</pre>") == "Drupal Drupal Drupal"

The old code doesn't support the <pre> tag at all. I only added it as a synonym for  because it's listed among the block-level tags and I wanted a test that every block-level tag forces a line-break.

Log in or register to post comments

Comment #132

pillarsdotnet CreditAttribution: pillarsdotnet commented 24 April 2011 at 04:43

Found another bug:

drupal_html_to_text('Drupal<br /><br />Drupal') == "Drupal\nDrupal"

Trying to fix it, but I'm having problems.

Log in or register to post comments

Comment #133

pillarsdotnet CreditAttribution: pillarsdotnet commented 24 April 2011 at 17:31

File	Size
drupal_html_to_text-tests+fix-299138-133.patch	22.28 KB

drupal_html_to_text-test_only-299138-133.patch	8.32 KB

Finally got it fixed. <pre> tags are now fully supported and tested, as are multiple consecutive   tags.

The only gotcha is that spaces within <pre> tags turn into unicode non-breaking spaces.

There's probably a cleaner way to do this, but for now it passes all tests so I'm gonna go to bed and await further review.

Log in or register to post comments

Comment #134

pillarsdotnet CreditAttribution: pillarsdotnet commented 24 April 2011 at 06:03

File	Size
drupal_html_to_text-tests+fix-299138-134.patch	21.62 KB

drupal_html_to_text-test_only-299138-134.patch	8.32 KB

Fixed the gotcha and simplified the code.

Log in or register to post comments

Comment #135

pillarsdotnet CreditAttribution: pillarsdotnet commented 25 April 2011 at 07:40

File	Size
drupal_html_to_text-tests+fix-299138-135.patch	22.17 KB

drupal_html_to_text-test_only-299138-135.patch	8.35 KB

Nested <pre> tags are now tested and properly handled.

Log in or register to post comments

Comment #136

pillarsdotnet CreditAttribution: pillarsdotnet commented 29 April 2011 at 00:29

Issue tags:

-Needs backport to D6, -Needs backport to D7

#135: drupal_html_to_text-tests+fix-299138-135.patch queued for re-testing.

Log in or register to post comments

Comment #137

pillarsdotnet CreditAttribution: pillarsdotnet commented 14 May 2011 at 00:05

#135: drupal_html_to_text-tests+fix-299138-135.patch queued for re-testing.

Log in or register to post comments

Comment #138

14 May 2011 at 00:22

Status:

Needs review

» Needs work

The last submitted patch, drupal_html_to_text-tests+fix-299138-135.patch, failed testing.

Log in or register to post comments

Comment #139

pillarsdotnet CreditAttribution: pillarsdotnet commented 14 May 2011 at 00:30

Status:	Needs work	» Needs review
Issue tags:		+Needs backport to D6, +Needs backport to D7

#135: drupal_html_to_text-tests+fix-299138-135.patch queued for re-testing.

Log in or register to post comments

Comment #140

stephandale CreditAttribution: stephandale commented 17 May 2011 at 15:45

File	Size
drupal_html_to_text-tests+fix-299138-136.patch	26.05 KB

I've made some modifications so that consecutive block level elements are separated by a blank line (i.e. two new line characters rather than one) and nested lists are handled better. As part of these changes block level elements output a new line before and afterwards.

The most significant changes are a) the tags now output chunks as they need this control when nested, b) the indent is applied by the following value rather than by the current tag and c) $item is used as the current element of $split so $value can store values only (as they may be needed by closing tags when determining their chunk).

More tests have been added, to test these modifications and test short tags that don't contain a space (which is the more correct use - the space was only recommended as a guideline to support old user agents - see http://www.w3.org/TR/xhtml1/#guidelines)

I also replaced the 3 instances of \r\n in drupal_html_to_text() with MAIL_LINE_ENDINGS as I assumed that they were forgotten earlier. Hopefully there wasn't a reason they hadn't been changed!

I wasn't sure what to name the patch so I kept the same format as the last one and incremented the last integer in its name. Hope this is ok.

P.S. I couldn't find the Coder module for 8.x. Is there any way to automatically test coding guidelines on 8?

Log in or register to post comments

Comment #141

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 May 2011 at 01:32

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-141.patch	29.18 KB

New approach using a DOM tree instead of a custom parser. Looks cleaner to me and passes all tests locally.

Log in or register to post comments

Comment #142

18 May 2011 at 02:12

Status:

Needs review

» Needs work

The last submitted patch, drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-141.patch, failed testing.

Log in or register to post comments

Comment #143

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 May 2011 at 19:23

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-143.patch	30.89 KB

Okay; let's try that again...

Log in or register to post comments

Comment #144

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 May 2011 at 19:23

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #145

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 May 2011 at 20:10

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-145.patch	30.9 KB

Coder griped about using substr instead of drupal_substr and found an indentation error.

Corrected patch.

Log in or register to post comments

Comment #146

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 May 2011 at 20:25

@stephandale --

For patch naming conventions, see Advanced patch contributer guide.
For d8 coder, see #1161796: Initial d8 port of Coder and Coder Review modules.

Log in or register to post comments

Comment #147

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 May 2011 at 20:37

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-147.patch	31.06 KB

Found and fixed a subtle error in character counting.

Log in or register to post comments

Comment #148

pillarsdotnet CreditAttribution: pillarsdotnet commented 18 May 2011 at 21:47

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-148.patch	31.07 KB

Another coder gripe.

Note that the net difference in size between #140 and this patch is to add 31 comment lines and to take away 15 code lines.

Log in or register to post comments

Comment #149

pillarsdotnet CreditAttribution: pillarsdotnet commented 26 May 2011 at 17:45

Code in #148 has been added to Mail System module:

And is being considered for incorporation into the Simplenews module:

#1099188-10: simplenews_html_to_text() strips off newlines

Log in or register to post comments

Comment #150

salvis

he/his

CreditAttribution: salvis commented 29 May 2011 at 20:13

Paste this

$html = "<p>line 1<br />\nline 2<br />line 3\n<br />line 4</p><p>paragraph</p>";
dpm($html);
return drupal_html_to_text($html);

into Devel's Execute PHP textarea (devel/php) and you'll get

line 1 
line 2 line 3
 line 4paragraph

line 1
line 2
line 3
line 4

paragraph

I think line 2 should not be indented.

Log in or register to post comments

Comment #151

salvis

he/his

CreditAttribution: salvis commented 29 May 2011 at 20:44

Status:

Needs review

» Needs work

Here's another test case — I've put the following into a node body:

line 1
line 2

line 4
line 5

0X

where the X is a space, not followed by any newline.

This shows up just fine as

line 1
line 2

line 4
line 5

0

and it results in the markup in the following snippet:

$html = "<p>line 1<br /> line 2</p> <p>line 4<br /> line 5</p> <p>0</p>";
dpm($html);
return drupal_html_to_text($html);

Running the snippet in Devel's Execute PHP function gives:

line 1 line 2 line 4 line 5 0

line 1
line 2

line 4
line 5

Note the indenting again, and especially the missing 0!

Log in or register to post comments

Comment #152

pillarsdotnet CreditAttribution: pillarsdotnet commented 30 May 2011 at 03:56

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-152.patch	33.27 KB

Fixed, and tests added; thanks.

Log in or register to post comments

Comment #153

pillarsdotnet CreditAttribution: pillarsdotnet commented 30 May 2011 at 04:07

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-153.patch	32.86 KB

Turns out I didn't need that block-tag function after all. No functional change from previous patch; just deletion of cruft.

Log in or register to post comments

Comment #154

pillarsdotnet CreditAttribution: pillarsdotnet commented 30 May 2011 at 09:51

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-154.patch	32.47 KB

Slight refactoring; no functional changes.

Log in or register to post comments

Comment #155

pillarsdotnet CreditAttribution: pillarsdotnet commented 30 May 2011 at 10:07

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-155.patch	32.63 KB

Fixed a couple of gripes identified by coder. Again, no functional changes.

Log in or register to post comments

Comment #156

pillarsdotnet CreditAttribution: pillarsdotnet commented 30 May 2011 at 10:10

Assigned:

Unassigned

» pillarsdotnet

Log in or register to post comments

Comment #157

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 18:22

Issue tags:

-Needs committer feedback

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-157.patch	32.61 KB

Minor nitpicks; no code changes.

Applied to Mail System: 6.x-2.17; 7.x-2.17; 8.x-2.17

Proposed patch for:

Simplenews 7.x-1.x

Log in or register to post comments

Comment #158

pillarsdotnet CreditAttribution: pillarsdotnet commented 31 May 2011 at 20:07

Status:

Needs review

» Reviewed & tested by the community

Note that the code in this issue has been incorporated into Mail System since 6.x-2.6/7.x-2.6/8.x-2.6. According to the usage statistics page, that's 1,348 users. Does this qualify as "Reviewed and Tested by the Community" ?

Log in or register to post comments

Comment #159

pillarsdotnet CreditAttribution: pillarsdotnet commented 31 May 2011 at 20:08

Log in or register to post comments

Comment #160

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 31 May 2011 at 20:58

Status:

Reviewed & tested by the community

» Needs review

No. You never mark your own patch RTBC. All patches require peer review.

Log in or register to post comments

Comment #161

pillarsdotnet CreditAttribution: pillarsdotnet commented 1 June 2011 at 02:40

Issue tags:

+Needs committer feedback

Well, it's had plenty of review along the lines of "Looks good but..." and all the noted shortcomings have been fixed and tests added to ensure that they stay fixed.

Marking "Needs committer feedback" in hopes that someone will be willing to stick their neck out.

Log in or register to post comments

Comment #162

salvis

he/his

CreditAttribution: salvis commented 1 June 2011 at 07:30

Status:

Needs review

» Needs work

here's a <a href='/node/123'>node</a> link
here's a node [1] link
[1] /node/123

The link must be fully qualified, like http://example.com/node/123

Also, there should be an empty line between the text and the link list.

Log in or register to post comments

Comment #163

salvis

he/his

CreditAttribution: salvis commented 1 June 2011 at 09:05

I'm going through all the tags. You've done an amazing job with this. Here are a few more comments:

I like your <hX> implementations, but I'd suggest adding 3/2/1 additional newlines above the H1/H2/H3 heading to separate them from the text above.
a<hr />b
currently produces two empty lines above and one below the separator line; there should be just one empty line above and below.
<pre> lines must NOT be line-wrapped.
<ins> and <del> are inline elements, not block elements. Currently you're treating the tags as line breaks, which is not correct. They are typically rendered as underline and strikeout. The former is usually _underline_. I'm not aware of any standard for strikeout, but I'd suggest -strikeout-.
and <strike> are missing.
For ordered lists, more than 99 items are probably rare, but more than 9 are not unreasonable. Could you indent #1 though #9 by an additional space so that they'll line up with #10?
This is debatable, but I'd suggest inserting 2 spaces between table cells, and not wrapping long table rows. This would preserve a little bit of structure.

What is your strategy regarding soft line breaks (http://www.ietf.org/rfc/rfc2646.txt), space stuffing, etc.?

The difficulty in this issue is that it's so complex, and once a patch is committed, it'll be almost impossible to get new issues fixed that might pop up in the future. Putting this into production is certainly a good way to build confidence, but the feedback generated that way is very slow.

I'll incorporate your code into Subscriptions to generate some more feedback (hopefully), but the switch to HTML will take some major surgery before I can actually do this (right now I'm using the D6 mail.inc).

Log in or register to post comments

Comment #164

pillarsdotnet CreditAttribution: pillarsdotnet commented 19 June 2011 at 12:18

Title:	Improve drupal_html_to_text()	» Fix the broken formatting in drupal_html_to_text() and also provide tests
Assigned:	Unassigned	» pillarsdotnet
Category:	feature	» bug
Status:	Needs work	» Needs review
Issue tags:		+Needs committer feedback

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-164.patch	32.32 KB

Re #163:

I like your <hX> implementations, but I'd suggest adding 3/2/1 additional newlines above the H1/H2/H3 heading to separate them from the text above.

I don't like them, and they aren't mine. I suggest emulating Markdown.
a<hr />b currently produces two empty lines above and one below the separator line; there should be just one empty line above and below.

Two out of three text-mode browsers disagree with you. Both w3m and links render two empty lines above and below; only the older and arguably less-sophisticated lynx renders one empty line above and below. None of them, by the way, render headers as you suggest.
<pre> lines must NOT be line-wrapped.

RFC 3676 disagrees.
<ins> and <del> are inline elements, not block elements.

The w3 org says:

These two elements are unusual for HTML in that they may serve as either block-level or inline elements (but not both). They may contain one or more words within a paragraph or contain one or more block-level elements such as paragraphs, lists and tables.

The HTML-5 draft says you have to check all the child tags to decide whether to render them as phrasing content or flow content.
The Web Design Group says that they are inline if e.g., within another inline element or a P.
On reflection, I'm dropping support for both, as neither one has any accurate plain-text representation.
 and <strike> are missing.

And they were missing before, too. This is a bug report, not a feature request. See my comment in #102 above.
For ordered lists, more than 99 items are probably rare, but more than 9 are not unreasonable. Could you indent #1 though #9 by an additional space so that they'll line up with #10?

Since you ask so nicely, and I happen to agree, done. (2 files changed, 8 insertions, 8 deletions)
I'd suggest inserting 2 spaces between table cells, and not wrapping long table rows.

Wrapping is mandatory, according to RFC 3676, and will often be performed by the email-reading software even if we don't.

I considered coding full support for tables, but I thought about the colspan and rowspan attributes, and decided that it wasn't worth the trouble.

Log in or register to post comments

Comment #165

stephandale CreditAttribution: stephandale commented 1 June 2011 at 15:04

Status:

Needs review

» Needs work

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-164.patch	32.09 KB

Edit 01/06/2011 16:03: Patch in comment#167 does the trick.
Edit 01/06/2011 15:49: Bad timing - pillarsdotnet posted at same time. I'll make the mods to his latest patch and repost.

I've found that the use of the PCRE_MULTILINE modifier m in preg_replace() resulted in different behaviour on different servers, possibly due to the difference in the PCRE version though I can't be sure at this stage.

Server 1 (tests pass):

php > $s = "one\ntwo\n";
php > echo preg_replace('/^/m','FOO',$s);
FOOone
FOOtwo

Server 2 (tests fail):

php > $s = "one\ntwo\n";
php > echo preg_replace('/^/m','FOO',$s);
FOOone
FOOtwo
FOO

Note how on server 2 (which has an older PCRE library), the replacement is also made on the trailing new line.

I've added a three line workaround to the drupal_wrap_mail() function, so tests are now green on both servers. The change was made to the last three lines before the return statement and simply removes the $clean_indent if it's present at the end of the text.

  // Apply indentation. We only include non-'>' indentation on the first line.
  $indented_text = preg_replace('/^/m', $clean_indent, $text);
  $indented_text = preg_replace('/' . $clean_indent . '$/', '', $indented_text); // Workaround for differences in multiline behaviour.
  $text = $indent . drupal_substr($indented_text, drupal_strlen($indent));
  return $text;

I updated to the latest Drupal 8.x, applied the patch from comment#157 (http://drupal.org/node/299138#comment-4535020) and made my modifications. The attached patch should include those changes and my modifications. Note that it differs to the patch in comment#157 by more than the three lines I edited, which I can only assume is due to other changes in core since comment#157.

Log in or register to post comments

Comment #166

pillarsdotnet CreditAttribution: pillarsdotnet commented 1 June 2011 at 14:59

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-166.patch	32.78 KB

Re #162:

I agree; done.

Re #165:

Try this slightly simpler workaround and see if it passes on your server 2.

Log in or register to post comments

Comment #167

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 18:21

Assigned:	Unassigned	» pillarsdotnet
Status:	Needs work	» Needs review
Issue tags:		+Needs committer feedback

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-167.patch	32.78 KB

Applied to Mail System: 6.x-2.19; 7.x-2.19; 8.x-2.19

Log in or register to post comments

Comment #168

stephandale CreditAttribution: stephandale commented 1 June 2011 at 15:06

Status:

Needs work

» Needs review

Re #166: It does, yes. Thanks.

Note: I tried patch from #167.

Log in or register to post comments

Comment #169

salvis

he/his

CreditAttribution: salvis commented 2 June 2011 at 09:51

Re #164:

Two out of three text-mode browsers disagree with you. Both w3m and links render two empty lines above and below; only the older and arguably less-sophisticated lynx renders one empty line above and below.

<hr> is a separator. It separates two equal parts. There's no way to tell whether it should belong to the part above or below. Putting it in the middle is the right thing to do. Apparently, all three text-mode browsers agree with me that the space above and below should be equal. That's my point. I'd say one empty line is enough, but if you prefer two, that's perfectly fine with me, as long as the line is in the middle.

None of them, by the way, render headers as you suggest.

Let's take a step back. Headings are not 'footings' and not separators. They announce what comes after them, and as such they semantically belong to what follows. They have very little connection to what's above them. Thus the distance above should be larger than the distance below.

I don't know what the design goals and priorities of the text-mode browsers have been — probably to pack as much information on a 25-line CRT as possible, to minimize the required navigation (= transmission over a slow connection). That's completely irrelevant in a text file (or email message).

The goal here, IMHO, should be to render HTML as text in a way that helps the human reader to understand its structure and content. You're forced to drop a lot of information to convert to plain text, and any hint that you can reasonably provide will be appreciated by the human reader.

<pre> lines must NOT be line-wrapped.

RFC 3676 disagrees.

Huh? Our interpretations of RFC3676 seem to differ. Let me cite:

Text/Plain is usually displayed as preformatted text, often in a fixed font. That is, the characters start at the left margin of the display window, and advance to the right until a CRLF sequence is seen, at which point a new line is started, again at the left margin. When a line length exceeds the display window, some clients will wrap the line, while others invoke a horizontal scroll bar.

Text which meets this description is defined by this memo as "fixed".

RFC3676 goes on to say that there can be problems with this type of text, and that in general text should be flowed. BUT (my interpretation now) if the author explicitly says that he formatted the text exactly as he wants it to appear, then the renderer should comply, if technically possible.

The <pre> tag explicitly asks for the fixed format (preformatted text). Preformatted means that indents should be preserved, spaces should be preserved, line breaks should be preserved, etc. — essentially as much formatting should be preserved as possible. If you don't have a horizontal scrollbar (e.g. on a cell phone), then you obviously have to wrap the lines to avoid clipping them, but if you can avoid it, then you should preserve formatting.
Some email clients may still munge it, but others will do the right thing. If you munge it even before it leaves the site, then you ensure that it'll be equally incomprehensible for both camps. Is that the goal?

See also HTML5. It doesn't say this explicitly, but the intention of <pre> clearly is that line breaks should neither be added nor removed inside <pre>. That's what <pre> is all about.

See my comment in #102 above. --> This is for sending plain-text email messages.

Yes, and it's called drupal_html_to_text(). Thus, if I want to send a node body through email, then this function is the obvious choice. I thought the idea of this exercise was to make the D7 version better than the D6 one, was it not?

_Underlined_ is firmly established. I don't know whether it was supported in the D6 version or not, and it doesn't really matter. Call its absence a pre-existing bug if you want, but that doesn't justify carrying its absence over into your new implementation.

IMO, the goal should be to try to preserve as much information as possible. If a node author decides that underline or strikeout are needed to convey his meaning, then it would be pity to lose those markups when transmitting the node body through text-mode email, given that there is a reasonable way to express them.

Wrapping is mandatory, according to RFC 3676, and will often be performed by the email-reading software even if we don't.

Here's where we differ. Where in RFC 3676 does it say "MUST wrap"? In my reading it says "SHOULD wrap", which means there can be overriding considerations. The author's clearly expressed intention via <pre> or <table> are such an overriding consideration, IMO.

RFC 3676 even says explicitly that Hand-aligned text, such as ASCII tables or art, source code, etc., SHOULD be sent as fixed, not flowed lines.

Log in or register to post comments

Comment #170

salvis

he/his

CreditAttribution: salvis commented 2 June 2011 at 10:42

Status:

Needs review

» Needs work

The patch in #167 now produces

here's a <a href='/node/123'>node</a> link
here's a node [1] link

[1] http://example.com//node/123

Note the extraneous slash. Apparently, there's no test for this yet.

It seems like you have to remove the leading slash before you call url().

Log in or register to post comments

Comment #171

pillarsdotnet CreditAttribution: pillarsdotnet commented 2 June 2011 at 14:36

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-171.patch	34.51 KB

Point by point:

Headings ... have very little connection to what's above them. Thus the distance above should be larger than the distance below.

Please revise the following test code to meet your expectations:

  /**
   * Test that headers are properly separated from surrounding text.
   */
  function testHeaderSeparation() {
    // Text before and after.
    $html = 'Drupal<h1>Drupal</h1>Drupal';
    $text = "Drupal\n\n======== DRUPAL " . str_repeat('=', 61) . "\nDrupal";
    $this->_testHtmlToText($html, $text);
    // Paragraph before; text after.
    $html = '<p>Drupal</p><h1>Drupal</h1>Drupal';
    $text = "Drupal\n\n\n======== DRUPAL " . str_repeat('=', 61) . "\nDrupal";
    $this->_testHtmlToText($html, $text);
    // Text before; paragraph after.
    $html = 'Drupal<h1>Drupal</h1><p>Drupal</p>';
    $text = "Drupal\n\n======== DRUPAL " . str_repeat('=', 61) . "\n\nDrupal";
    $this->_testHtmlToText($html, $text);
    //  Paragraph before and after.
    $html = '<p>Drupal</p><h1>Drupal</h1><p>Drupal</p>';
    $text = "Drupal\n\n\n======== DRUPAL " . str_repeat('=', 61) . "\n\nDrupal";
    $this->_testHtmlToText($html, $text);
  }

Where in RFC 3676 does it say "MUST wrap"?

From RFC 3676:

The Text/Plain media type is the lowest common denominator of Internet email, with lines of no more than 998 characters

From RFC 2046:

NOTE: Some protocols defines a maximum line length. E.g. SMTP [RFC-821] allows a maximum of 998 octets before the next CRLF sequence. To be transported by such protocols, data which includes too long segments without CRLF sequences must be encoded with a suitable content-transfer-encoding.

From RFC 821:

text line

The maximum total length of a text line including the <CRLF> is 1000 characters (but not counting the leading dot duplicated for transparency).

So line wrapping MUST occur before 1000 characters (998 plus CRLF). Outside of <pre> tags, it SHOULD occur at a convenient width for text-mode readers, which historically has been 80 characters (78 plus CRLF). If there is a problem, it lies in the fact that we use the same code to meet both constraints.
I thought the idea of this exercise was to make the D7 version better than the D6 one, was it not?

The idea of this exercise is to "fix the broken formatting in drupal_html_to_text() and also provide tests". Please understand that "broken" means something entirely different from "sub-optimal". I now regret that I overzealously added support for tags that were not originally supported, as I may have unnecessarily delayed adoption of the more important fix. Also note the "Needs backport to D6" tag.
_Underlined_ is firmly established. ... Call its absence a pre-existing bug

Fine; added support for the  tag, even though its actual use is quite rare. Stylesheets would be the far more common use case, but supporting them is way out of scope.
It seems like you have to remove the leading slash before you call url().

Thanks. Fixed, and test added.

Log in or register to post comments

Comment #172

pillarsdotnet CreditAttribution: pillarsdotnet commented 2 June 2011 at 14:37

a<hr />b currently produces two empty lines above and one below the separator line; there should be just one empty line above and below.

I wasted four hours on this one. Try running the following program:

$dom = new DOMDocument;
$dom->loadHTML('<html><body><p>a</p><p><hr /></p><p>b</p></body></html>');
echo $dom->saveXML();

You will discover that <hr /> gets corrected to <hr/> because you can't nest <hr /> inside a  container.

Log in or register to post comments

Comment #173

sreynen CreditAttribution: sreynen commented 2 June 2011 at 14:49

Status:

Needs review

» Reviewed & tested by the community

Looking over this thread, the scope has expanded far beyond the initial bug report, which was addressed a long time ago.

The latest patch solves the original bug (lack of tests) without introducing any new bugs. If there are additional feature requests, or even other bugs that aren't yet fixed by this patch, I suggest we take them to separate issues and close this thread before it reaches 3 years old.

Log in or register to post comments

Comment #174

pillarsdotnet CreditAttribution: pillarsdotnet commented 2 June 2011 at 15:38

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-174.patch	34.52 KB

Re-ran #171 through coder review and found one strlen() that could be replaced by drupal_strlen().

Changing to avoid rejection by pedants, even though it makes absolutely no functional difference. Leaving status at RTBC.

Log in or register to post comments

Comment #175

Lars Toomre CreditAttribution: Lars Toomre commented 2 June 2011 at 16:09

I agree with @sreynen #173 that this patch fixes the original bug report and goes far beyond. Hence, I strongly urge that this patch be applied with the clear understanding that 'better' may not be 'perfect'. However, 'better' is also *much* better than the current state of core Drupal.

This patch adds the initial requested tests and passes tests also included for all of the extra code that was added to better handle the complex challenge of converting HTML text to its plain text equivalent. Edge cases, additional bugs and/or feature requests can be started in separate issues. It is time for this RTBC patch to be applied. It makes HTML-text to plain text conversion so much better!!

Log in or register to post comments

Comment #176

bdragon CreditAttribution: bdragon commented 2 June 2011 at 16:32

Skimmed through the patch and noticed a couple of minor nits in the doxygen docs:

* From somewhere whereabouts line 300ish in includes/mail.inc:

@@ -303,40 +305,47 @@ interface MailSystemInterface {
  * We use delsp=yes wrapping, but only break non-spaced languages when
  * absolutely necessary to avoid compatibility issues.
  *
- * We deliberately use LF rather than CRLF, see drupal_mail().
+ * We deliberately use variable_get('mail_line_endings), MAIL_LINE_ENDINGS)
+ * rather than "\r\n".

Typo, need a ' instead of ) after mail_line_endings.

* From the comment block above function _drupal_html_to_text():

+/**
+ * Helper function for drupal_html_to_text
+ *
+ * Recursively converts $node to text, wrapping and indenting as necessary.

Should have () after drupal_html_to_text there.

Log in or register to post comments

Comment #177

pillarsdotnet CreditAttribution: pillarsdotnet commented 2 June 2011 at 17:39

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-177.patch	34.52 KB

Updated.

Log in or register to post comments

Comment #178

sun

German

Karlsruhe

CreditAttribution: sun commented 2 June 2011 at 17:49

Status:

Reviewed & tested by the community

» Needs work

Needs work for @bdragon's review.

Please also make sure that all the RFC research is documented/summarized in inline comments (including RFC numbers). It would be silly to go through all of this research once again in the future.

Furthermore:

+++ b/includes/mail.inc
@@ -12,6 +12,8 @@
+define('NBSP', html_entity_decode('&nbsp;'));

Should be MAIL_NBSP. Although "NBSP" is not really descriptive. I don't understand why this is a constant, and why we're decoding   to get the character instead of directly specifying the character. This needs to be documented in the (missing) phpDoc.

+++ b/includes/mail.inc
@@ -303,40 +305,47 @@ interface MailSystemInterface {
+ * We deliberately use variable_get('mail_line_endings), MAIL_LINE_ENDINGS)
+ * rather than "\r\n".
...
+  // Convert LF or CRLF into platform-specific line-endings.

I don't understand why we're adding a variable for this. Especially since the actual function code talks about platform-specific line-endings -- so if something is platform-specific, why is it a variable? A short explanation for why it's a variable would be helpful.

+++ b/includes/mail.inc
@@ -303,40 +305,47 @@ interface MailSystemInterface {
+  $text = preg_replace('/\r?\n/', $eol, $text);
...
+    $text = preg_replace('/ +\r?\n/m', $eol, $text);

The first preg_replace() replaces \r\n with platform-specific or variable-defined line endings. The second preg_replace() still tries to match original line endings even though they got already replaced with $eol.

+++ b/includes/mail.inc
@@ -347,177 +356,328 @@ function drupal_wrap_mail($text, $indent = '') {
+ * @param $notes
+ *   The list of footnotes, an associative array of (url => reference number) items.

Exceeds 80 chars.

+++ b/includes/mail.inc
@@ -347,177 +356,328 @@ function drupal_wrap_mail($text, $indent = '') {
+function _drupal_html_to_text(DOMNode $node, array $allowed_tags, array &$notes, $parents = array(), &$count = NULL) {

$parents should use array type-hinting, too.

+++ b/includes/mail.inc
@@ -347,177 +356,328 @@ function drupal_wrap_mail($text, $indent = '') {
+  if (is_null($count)) {

Always use !isset() instead of is_null().

+++ b/includes/mail.inc
@@ -347,177 +356,328 @@ function drupal_wrap_mail($text, $indent = '') {
+        if ( !empty($node->attributes)
...
+          && valid_url($url) ) {

Stray leading and trailing space in if condition.

+++ b/includes/mail.inc
@@ -347,177 +356,328 @@ function drupal_wrap_mail($text, $indent = '') {
+      // Dictionary list.
+      case 'dl':
+        // Start on a newline except inside other lists.
+        if (!in_array('li', $parents)) {
+          $text = $eol;
+        }
+
+      // Dictionary term.
+      case 'dt':

Missing break; or missing inline comment that explains the fall-through to 'dt'.

+++ b/includes/mail.inc
@@ -347,177 +356,328 @@ function drupal_wrap_mail($text, $indent = '') {
+          if ( !empty($node->attributes)
+            && ($value = $node->attributes->getNamedItem('value'))) {
...
+        if ( !empty($node->attributes)
+          && ($value = $node->attributes->getNamedItem('start')) ) {

1) Stray leading and trailing space in if condition.

2) Extra parenthesis around $value can be removed.

3) Shouldn't wrap.

+++ b/includes/mail.inc
@@ -347,177 +356,328 @@ function drupal_wrap_mail($text, $indent = '') {
+      default:
+        break;

The empty default case needs an inline comment for why it's needed (if it is needed).

+++ b/modules/simpletest/tests/mail.test
@@ -43,7 +44,7 @@ class MailTestCase extends DrupalWebTestCase implements MailSystemInterface {
-   * @see DefaultMailSystem
+   * @see DefaultMailSystem()

Bogus change: DefaultMailSystem is a class, not a function.

+++ b/modules/simpletest/tests/mail.test
@@ -63,3 +64,285 @@ class MailTestCase extends DrupalWebTestCase implements MailSystemInterface {
+  function _string_to_html($text) {
+    return '"' .
+      str_replace(
+        array("\n", ' '),
+        array('\n', '&nbsp;'),
+        check_plain($text)
+      ) . '"';

1) Please remove the leading underscore and rename to stringToHtml().

2) Missing phpDoc.

+++ b/modules/simpletest/tests/mail.test
@@ -63,3 +64,285 @@ class MailTestCase extends DrupalWebTestCase implements MailSystemInterface {
+  function _testHtmlToText($html, $text, $args = NULL, $reason = '') {
+    $result = drupal_html_to_text($html, $args);
+    $this->assertEqual(
+      $result,
+      $text,
+      (drupal_strlen($reason) ? $reason . '<br />' : '')
+      . 'html = ' . $this->_string_to_html($html) . '<br />'
+      . 'result = ' . $this->_string_to_html($result) . '<br />'
+      . 'expected = ' . $this->_string_to_html($text)
+    );
+  }

1) Rename to assertHtmlToText().

2) Rename $args to $allowed_tags.

3) $reason should be $message.

4) Use a separate $this->verbose() after assertEqual() to output the debugging info as a page instead of squeezing it into the assertion message.

5) Missing phpDoc.

+++ b/modules/simpletest/tests/mail.test
@@ -63,3 +64,285 @@ class MailTestCase extends DrupalWebTestCase implements MailSystemInterface {
+      'Allowed &lt;b&gt; tag found.'
...
+      'Disallowed &lt;h1&gt; tag not found.'
...
+      'Disallowed &lt;p&gt;, &lt;em&gt;, and &lt;b&gt; tags not found.'
...
+      'Unsupported &lt;html&gt; and &lt;body&gt; tags not found.'

Might make sense to unconditionally run check_plain() on assertHtmlToText()'s $message, so we don't need those <...>.

+++ b/modules/simpletest/tests/mail.test
@@ -63,3 +64,285 @@ class MailTestCase extends DrupalWebTestCase implements MailSystemInterface {
+    $tests = array(
+      array(
+        'html' => "<p>line 1<br />\nline 2<br />line 3\n<br />line 4</p><p>paragraph</p>",

Let's make these a bit cleaner:

$tests = array();
$tests[] = array(
  'html' => '...',
  'text' => '...',
);

Powered by Dreditor.

Log in or register to post comments

Comment #179

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 2 June 2011 at 17:50

Status:

Needs work

» Needs review

Guys, if this gets marked RTBC one more time before it's ready, it's going to the very bottom of my stack.

Log in or register to post comments

Comment #180

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 2 June 2011 at 17:52

Status:	Needs review	» Needs work
Issue tags:	-Needs committer feedback

Log in or register to post comments

Comment #181

pillarsdotnet CreditAttribution: pillarsdotnet commented 2 June 2011 at 20:24

Needs work for @bdragon's review.

Already fixed.
Please also make sure that all the RFC research is documented/summarized in inline comments (including RFC numbers). It would be silly to go through all of this research once again in the future.

If the objective is absolute RFC-compliant pedantry, I'm unqualified for the job. Someone else will have to do it.
Should be MAIL_NBSP. Although "NBSP" is not really descriptive. I don't understand why this is a constant, and why we're decoding   to get the character instead of directly specifying the character. This needs to be documented in the (missing) phpDoc.

Changed globally to chr(160), which is what it evaluates to in my copy of PHP. Added inline comments to explain the magic number.
I don't understand why we're adding a variable for this. Especially since the actual function code talks about platform-specific line-endings -- so if something is platform-specific, why is it a variable? A short explanation for why it's a variable would be helpful.

The explanation is at the top of the file, lines 8-13 inclusive:
```
/**
 * Auto-detect appropriate line endings for e-mails.
 *
 * $conf['mail_line_endings'] will override this setting.
 */
define('MAIL_LINE_ENDINGS', isset($_SERVER['WINDIR']) || strpos($_SERVER['SERVER_SOFTWARE'], 'Win32') !== FALSE ? "\r\n" : "\n");
```
I'm using a variable $eol instead of repeating the code variable_get('mail_line_endings', MAIL_LINE_ENDINGS) but if you prefer the repetition, I will comply.
The first preg_replace() replaces \r\n with platform-specific or variable-defined line endings. The second preg_replace() still tries to match original line endings even though they got already replaced with $eol.

Okay, replaced '/ +\r?\n/' with "/ +$eol/" but I don't think it makes any difference.
Exceeds 80 chars.

Fixed.

$parents should use array type-hinting, too.

Changed. Drupal in general should use type-hinting. It doesn't.

Always use !isset() instead of is_null().

Changed. (PHP is such a screwball language! If we can't use the function, why does it exist?)
Stray leading and trailing space in if condition.

Re-wrote long if conditions using a temporary variable.
Missing break; or missing inline comment that explains the fall-through to 'dt'.

Good catch. Fixed.
The empty default case needs an inline comment for why it's needed (if it is needed).

Added an inline comment.
Bogus change: DefaultMailSystem is a class, not a function.

Fixed, but please file an issue against Coder Review, which claims the parentheses are needed, and against API, which fails to create a link without them.
Please remove the leading underscore and rename to stringToHtml().

Done.
Missing phpDoc.

Added, but is phpDoc really necessary for internal-use-only methods within a test class that don't show up on api.drupal.org anyway?
1) Rename to assertHtmlToText().

2) Rename $args to $allowed_tags.

3) $reason should be $message.

4) Use a separate $this->verbose() after assertEqual() to output the debugging info as a page instead of squeezing it into the assertion message.

5) Missing phpDoc.

All fixed. I didn't know about the verbose() function.
Might make sense to unconditionally run check_plain() on assertHtmlToText()'s $message, so we don't need those &lt;...&gt;.

Done.

Let's make these a bit cleaner:

Done.

Log in or register to post comments

Comment #182

pillarsdotnet CreditAttribution: pillarsdotnet commented 2 June 2011 at 19:15

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-182.patch	35.79 KB

Log in or register to post comments

Comment #183

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 14:04

Title:	drupal_html_to_text() formatting is broken and does not have tests	» Fix the broken formatting in drupal_html_to_text() and also provide tests<li>
Assigned:	Unassigned	» pillarsdotnet
Category:	feature	» bug
Issue tags:	-Needs tests	+Needs backport to D6

Log in or register to post comments

Comment #184

pillarsdotnet CreditAttribution: pillarsdotnet commented 2 June 2011 at 21:16

Title:

Fix the broken formatting in drupal_html_to_text() and also provide tests

» Fix the broken formatting in drupal_html_to_text() and also provide tests<li>

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-184.patch	37.19 KB

Fixed typos, and added a test for wrapping very long text lines. The test fails, however. Trying to figure out why.

Log in or register to post comments

Comment #185

pillarsdotnet CreditAttribution: pillarsdotnet commented 2 June 2011 at 21:15

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-185.patch	37.2 KB

Figured it out.

Log in or register to post comments

Comment #186

salvis

he/his

CreditAttribution: salvis commented 2 June 2011 at 21:20

Title:	Fix the broken formatting in drupal_html_to_text() and also provide tests<li>	» Fix the broken formatting in drupal_html_to_text() and also provide tests
Status:	Needs review	» Needs work
Issue tags:		+Needs committer feedback

The patch produces PHP syntax errors, so it's NW in any case. I think item 2 below (wrapping <pre> and <table> below 80 characters) also makes this NW.

Re #171:

Headings: This was/is a suggestion for discussion — I'd simply add three newlines above the level 1 heading, two for level 2, and 1 for level 3, independently of the context, in order to further emphasize the importance of the heading.

That would give us

  /**
   * Test that headers are properly separated from surrounding text.
   */
  function testHeaderSeparation() {
    // Text before and after.
    $html = 'Drupal<h1>Drupal</h1>Drupal';
    $text = "Drupal\n\n\n\n\n======== DRUPAL " . str_repeat('=', 61) . "\nDrupal";
    $this->_testHtmlToText($html, $text);
    // Paragraph before; text after.
    $html = '<p>Drupal</p><h1>Drupal</h1>Drupal';
    $text = "Drupal\n\n\n\n\n\n======== DRUPAL " . str_repeat('=', 61) . "\nDrupal";
    $this->_testHtmlToText($html, $text);
    // Text before; paragraph after.
    $html = 'Drupal<h1>Drupal</h1><p>Drupal</p>';
    $text = "Drupal\n\n\n\n\n======== DRUPAL " . str_repeat('=', 61) . "\n\nDrupal";
    $this->_testHtmlToText($html, $text);
    //  Paragraph before and after.
    $html = '<p>Drupal</p><h1>Drupal</h1><p>Drupal</p>';
    $text = "Drupal\n\n\n\n\n\n======== DRUPAL " . str_repeat('=', 61) . "\n\nDrupal";
    $this->_testHtmlToText($html, $text);
  }

Wrapping <pre> lines and table rows: I'm sorry, I wasn't communicating well enough. Yes, I agree that lines MUST wrap shortly before 1000. However, right now you're wrapping them before 80, and I thought that was what you were defending. Can we agree that newlines SHOULD NOT be added or removed inside <pre> and <tr> elements as long as lines are not longer than 998 characters?
Try this:
```
$html = "<pre>aaaaaaaaaaaaaaaaaaaaaaa\nbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbb\n ccccccccccccccc</pre>";
dpm($html);
return drupal_html_to_text($html);
```
It gives
- <pre>aaaaaaaaaaaaaaaaaaaaaaa
 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbb
 ccccccccccccccc</pre>
- aaaaaaaaaaaaaaaaaaaaaaa
 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
 bbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbb
 bbbbbbbbbbbbbbbbbbbbbbbbb
 ccccccccccccccc
There are two newlines among the b's which shouldn't be there.

The one space ahead of the c's turns into two spaces. It's a nit, but if you're trying to convert something that depends on exact indenting, then it's maddening.
The thought of doing more to separate table cells briefly occurred to me, too, but just like vertical separators, horizontal separators should be symmetrical IMO, and thus it would take at least " | " (three characters), and it still wouldn't work so well if the cells weren't aligned. Tabs rarely help with alignment either.
However, not getting any clue whatsoever about where a cell ends and the next one starts is not so great. Hence my suggestion to put two spaces between cells, rather than just one. This would usually allow the reader to count the cells in a row and give him a fighting chance to analyze and reconstruct the table structure, if he really wanted to. With only one space, it's completely impossible unless the cell contents are just single words.
Thank you for adding the tag! We're generally not seeing a lot of markup in nodes here on d.o, probably because most people don't particularly like to write HTML by hand. However, we have a list of "Allowed HTML tags" at the bottom of this textarea (<h1> <h2> <h3> <h4> <h5> <h6> <code> <del> <blockquote> <q> <cite> <ul> <ol> <li> <dl> <dt> <dd> <a> <table> <tr> <td> <th> <tbody> <thead> <tfoot> <colgroup> <caption>). This is a reasonable list for a technically oriented site, and it's a good assumption that any of these will be used at some time or other. If they can be rendered sensibly in plain text, and drupal_html_to_text() can support them with reasonable effort, then I'm sure that support will be appreciated in the right context.
I was going to argue that <strike> was on this list, and I think it was yesterday, and I think I've seen it used by people to show where they have made minor edits... It's still on g.d.o, for example here. It would be a pity to lose the <strike> information when such a node is transmitted through email. So, please, would you reconsider adding <strike>?
There's a smaller list in the default 'Filtered HTML' text format: <a> <cite> <blockquote> <code> <ul> <ol> <li> <dl> <dt> <dd>. I think these should all be supported in some way for the sake of consistency. Only two are missing in your implementation: <cite> could be approximated reasonably well as ; &ltcode> is more difficult, because it can be either inline or block. As a block, I think it should behave like <pre> — inline is tough though... Maybe `backticks` (Markdown), or { braces and spaces }?

Log in or register to post comments

Comment #187

salvis

he/his

CreditAttribution: salvis commented 2 June 2011 at 21:21

Status:	Needs work	» Needs review
Issue tags:	-Needs committer feedback

Oh, sorry, our messages crossed. My item 2 referred to your patch #182. Apparently, you've noticed the problem now.

Log in or register to post comments

Comment #188

salvis

he/his

CreditAttribution: salvis commented 2 June 2011 at 21:24

Status:

Needs review

» Needs work

No, the <pre> line is still getting wrapped below 80.

Log in or register to post comments

Comment #189

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 14:05

Title:	Fix the broken formatting in drupal_html_to_text() and also provide tests<li>	» Fix the broken formatting in drupal_html_to_text() and also provide tests
Assigned:	pillarsdotnet	» Unassigned
Status:	Needs review	» Needs work

Log in or register to post comments

Comment #190

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 18:24

Assigned:	pillarsdotnet	» Unassigned
Status:	Needs review	» Needs work

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-190.patch	37.23 KB

Applied to Mail System: 6.x-2.20; 7.x-2.20; 8.x-2.20

Log in or register to post comments

Comment #191

salvis

he/his

CreditAttribution: salvis commented 2 June 2011 at 23:04

Assigned:

pillarsdotnet

» Unassigned

I'm sorry for being a source of frustration for you. You've taken on a big task and done a great job. No one knows your code as well as you do — please don't give up! Take a break, let some feedback accumulate, and then deal with it calmly, rather than frantically shooting patches.

Re #190:

$html = "<pre>aaaaaaaaaaaaaaaaaaaaaaa\nbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbb\n ccccccccccccccc</pre>";
dpm($html);
return drupal_html_to_text($html);

on Devel's Execute PHP page now gives

<pre>aaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbb
ccccccccccccccc</pre>
aaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbb
ccccccccccccccc

The one space in front of the c's now turns into no space, it should be preserved.

Two newlines are inserted between the b's inside the <pre> element, that shouldn't happen, as long as the resulting line is shorter than 998 characters.

The trailing spaces are gone, which is a clear improvement.

Log in or register to post comments

Comment #192

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 17:49

Status:

Needs review

» Needs work

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-192.patch	38.19 KB

Here's a patch with a failing test. I think that making it pass will require a rewrite/replacement of (at least) drupal_wrap_mail() and friends, which would prevent the possibility of a backport.

EDIT: Had to change the test after learning some more about space-stuffing:

  /**
   * Ensure that content within <pre> tags is not changed.
   */
  function testNoWrapWithinPre() {
    // This awkward construct comes from includes/mail.inc lines 8-13.
    $eol = variable_get('mail_line_endings', MAIL_LINE_ENDINGS);
    $text = str_repeat('a', 80)
      . ' ' . str_repeat('b', 80) // Single space.
      . '  ' . str_repeat('c', 80) // Two spaces.
      . "$eol" . str_repeat('d', 80) // Single linefeed.
      . "$eol$eol" . str_repeat('e', 80) // Double linefeed.
      . "$eol " . str_repeat('f', 80) // Linefeed and space.
      . " $eol" . str_repeat('g', 80); // Space and linefeed.
    // The total length of $text is under 1000 octets and should not wrap.
    $html = "<pre>$text</pre>";
    // Space-stuffing will add an extra space to any line starting with a space.
    $text = str_replace("$eol ", "$eol  ", $text);
    $this->assertHtmlToText($html, $text, 'Text within <pre> is unchanged.');
  }

Log in or register to post comments

Comment #193

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 13:10

File	Size
drupal_html_to_text-fix_broken_formatting_and_forget_about_the_tests-299138-193.patch	9.93 KB

Here's an entirely different approach. ;->

Log in or register to post comments

Comment #194

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 13:17

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #195

3 June 2011 at 14:05

Status:

Needs review

» Needs work

The last submitted patch, drupal_html_to_text-fix_broken_formatting_and_forget_about_the_tests-299138-193.patch, failed testing.

Log in or register to post comments

Comment #196

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 15:28

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-fix_broken_formatting_and_forget_about_the_tests-299138-195.patch	39.64 KB

Fixed.

Log in or register to post comments

Comment #197

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 15:35

File	Size
drupal_html_to_text-fix_broken_formatting_and_forget_about_the_tests-299138-197.patch	10.11 KB

Corrected approach from #193.

Log in or register to post comments

Comment #198

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 18:24

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-198.patch	39.33 KB

Applied to Mail System: 6.x-2.21; 7.x-2.21; 8.x-2.21

Log in or register to post comments

Comment #199

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 17:51

Log in or register to post comments

Comment #200

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 19:00

Status:

Needs review

» Needs work

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-200.patch	39.51 KB

Weird. Looks like we somehow broke core.

Log in or register to post comments

Comment #201

salvis

he/his

CreditAttribution: salvis commented 3 June 2011 at 20:14

Please wait and give us some time to think...

In #192 you've pointed to drupal_wrap_mail(). Why would this stop working after having worked for D6?

The goal is to do format=flowed, delsp=yes wrapping, and this is NOT what I thought we should do!

Log in or register to post comments

Comment #202

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 20:30

Actually, I think the most interesting solution is in #197.

Log in or register to post comments

Comment #203

salvis

he/his

CreditAttribution: salvis commented 3 June 2011 at 20:49

After thinking about this some more I believe that RFC 3676 calls for soft wrapping <pre> tags.

I think the proper test for <pre> is the following:

/**
   * Ensure that content within <pre> tags is not changed.
   */
  function testNoWrapWithinPre() {
    // This awkward construct comes from includes/mail.inc lines 8-13.
    $eol = variable_get('mail_line_endings', MAIL_LINE_ENDINGS);
    $html = '<pre>'
      . str_repeat('a', 30) . ' ' . str_repeat('a', 30) . ' ' // Single space.
      . str_repeat('b', 30) . ' ' . str_repeat('b', 30) . '  ' // Two spaces.
      . str_repeat('c', 30) . ' ' . str_repeat('c', 30) . "$eol" // Single newline.
      . str_repeat('d', 30) . ' ' . str_repeat('d', 30) . "$eol$eol" // Double newline.
      . str_repeat('e', 30) . ' ' . str_repeat('e', 30) . "$eol " // Newline and space.
      . str_repeat('f', 30) . ' ' . str_repeat('f', 30) . "$eol  " // Newline and two spaces.
      . str_repeat('g', 30) . ' ' . str_repeat('g', 30) . " $eol" // Space and newline.
      . str_repeat('h', 30) . ' ' . str_repeat('h', 30) . "  $eol" // Two spaces and newline.
      . str_repeat('i', 30) . ' ' . str_repeat('i', 30) . '</pre>'; 
    $text =
      . str_repeat('a', 30) . ' ' . str_repeat('a', 30) . "  $eol" // Two spaces and newline.
      . str_repeat('b', 30) . ' ' . str_repeat('b', 30) . "   $eol" // Three spaces and newline.
      . str_repeat('c', 30) . ' ' . str_repeat('c', 30) . "$eol" // Single newline.
      . str_repeat('d', 30) . ' ' . str_repeat('d', 30) . "$eol$eol" // Double newline.
      . str_repeat('e', 30) . ' ' . str_repeat('e', 30) . "$eol  " // Newline and two spaces.
      . str_repeat('f', 30) . ' ' . str_repeat('f', 30) . "$eol   " // Newline and three spaces.
      . str_repeat('g', 30) . ' ' . str_repeat('g', 30) . "$eol" // Newline only.
      . str_repeat('h', 30) . ' ' . str_repeat('h', 30) . "$eol" // Newline only.
      . str_repeat('i', 30) . ' ' . str_repeat('i', 30) . '</pre>'; 
    // Soft wrapping and space stuffing for lines between about 78 and 998 characters long,
    // according to http ://www.ietf.org/rfc/rfc3676.txt.
    $this->assertHtmlToText($html, $text, 'Text within <pre> treated according to RFC 3676.');
  }

Log in or register to post comments

Comment #204

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 21:33

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-204.patch	40.75 KB

After thinking about this some more I believe that RFC 3676 calls for soft wrapping
 tags.

You can only soft-wrap content that starts at the left margin. You can't soft-wrap anything that is a descendant of a <blockquote>, <li>, <ol>, or <ul> tag. Because of the way we are rendering them, you also can't soft-wrap anything that is a descendant of a , <cite>, , , , or  tag. The more I consider this, the more I think the approach in #197 is the way to go. There is no right way to do things here.

Log in or register to post comments

Comment #205

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 21:18

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-205.patch	40.64 KB

Note that this will get rejected the first time it gets reviewed, as the comments extend beyond column 80.

EDIT: nevermind; code fails with a syntax error anyway.

Log in or register to post comments

Comment #206

salvis

he/his

CreditAttribution: salvis commented 3 June 2011 at 21:01

Actually, I think the most interesting solution is in #197.

You're not serious, are you? I'm pretty sure that spawning a child process is way too slow, and the text-mode browsers are designed to produce output for 25x80 terminals, which is not the same thing as a text email message.

Ah, this is a test, right? I'll readily admit that I'm not able to look at each of your patches, at the rate that you produce them...

Log in or register to post comments

Comment #207

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 21:44

I'm pretty sure that spawning a child process is way too slow,

All tests complete in less than one second on my six-year-old Lenovo T-60.

the text-mode browsers are designed to produce output for 25x80 terminals, which is not the same thing as a text email message.

All of the ones I referenced have command-line options to set the line width. None of them care about the screen height.

I'll readily admit that I'm not able to look at each of your patches, at the rate that you produce them...

You can tell which ones I'm serious about because I commit them to Mail System, which is the only thing that is going to benefit from this issue before D9 ships. And also, interdiff is handy when comparing large patchsets.

Log in or register to post comments

Comment #208

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 21:46

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-208.patch	40.64 KB

#205 has a syntax error and a test error which are now both corrected.

The final test, cut-and-pasted from #203, fails because it now requires behavior which you condemned in #186/#191 and which I fixed in #196.

The comments still extend beyond column 80.

Log in or register to post comments

Comment #209

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 14:07

Status:

Needs work

» Closed (works as designed)

Log in or register to post comments

Comment #210

salvis

he/his

CreditAttribution: salvis commented 3 June 2011 at 22:38

You can only soft-wrap content that starts at the left margin.

I think this is wrong. Space stuffing allows you to have leading spaces:

A generating agent MUST:

   o  Space-stuff lines which start with a space, "From ", or ">".

I believe this has been working in D6, but I'm not sure.

Because of the way we are rendering them, you also can't soft-wrap anything that is a descendant of a , <cite>, , , , or tag.

Why? Is this a limitation of your algorithm (can it be improved?) or do you see something in RFC 3676 that causes difficulties?

The final test, cut-and-pasted from #203, fails because it now requires behavior which you condemned in #191 and which I fixed in #196.

Yes, I got wiser between #191 and #203. I'm sorry for leading you down a path which I now think is a wrong one.

My hat's off to you for your drive, but maybe you're taking "talk is silver, code is gold" a bit too far. The basic problem in this issue is that we not only didn't have any tests, but we didn't even know how things are supposed work and why. In this situation it would be wise to discuss first and make sure we really understand what we're talking about (ideally getting consent from several people), before you start coding. This may seem slower, but ultimately it's probably faster, and certainly less frustrating.

Why would this stop working after having worked for D6?

You're absolutely right.

I was referring to drupal_wrap_mail(), which apparently hadn't been touched since D6. This is what caused me to question whether my test case was correct.

Obviously, the D7 drupal_html_to_text() is badly broken, but I still think that the D6 version was mostly working (except that it made it difficult to send HTML mail). It was broken by inadvertently committing #407452-64: Remove use of drupal_html_to_text() from system mails. The error was noticed a year ago in #299138-48: Improve \Drupal\Core\Utility\Mail::htmlToText(). Your patch in #299138-69: Improve \Drupal\Core\Utility\Mail::htmlToText() and in #1130198: Regression: line-breaks are mangled by \Drupal\Core\Utility\Mail::htmlToText() may have been all that was needed to get things working again, but without tests we can't really know.

There's a tremendous amount of work, know-how, and tuning in the D6 version that we (you, mostly) have had to redo and rebuild. Once you've pulled this off, D7 will be in much better shape, of course, but it takes a lot of learning and hard work. It's not just coding...

Log in or register to post comments

Comment #211

salvis

he/his

CreditAttribution: salvis commented 3 June 2011 at 22:48

I'm pretty sure that spawning a child process is way too slow,

All tests complete in less than one second on my six-year-old Lenovo T-60.

Guessing performance is very difficult. A more relevant test would be how many mails can be pumped through drupal_html_to_text() and sent out in one cron run. This may include spawning sendmail (or something similar), which could explain why that number is notoriously low. Spawning a text-mode browser may be a comparable effort, which might mean that you'd be cutting the number in half.

Not an attractive proposition...

Log in or register to post comments

Comment #212

pillarsdotnet CreditAttribution: pillarsdotnet commented 3 June 2011 at 23:06

Once ~~you've~~ pulled this off, D7 will be in much better shape,

Once someone has pulled this off, D9 will be in much better shape.

There; fixed that for you.

Log in or register to post comments

Comment #213

salvis

he/his

CreditAttribution: salvis commented 3 June 2011 at 23:22

Oh, you're using <del>!

;-)

Log in or register to post comments

Comment #214

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 01:32

Status:

Needs work

» Needs review

Testbots are idle; let's put them to work.

Log in or register to post comments

Comment #215

4 June 2011 at 01:34

Status:

Needs review

» Needs work

The last submitted patch, drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-208.patch, failed testing.

Log in or register to post comments

Comment #216

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 07:04

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-216.patch	42.55 KB

Log in or register to post comments

Comment #217

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 07:05

Status:

Needs review

» Needs work

Log in or register to post comments

Comment #218

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 07:29

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-218.patch	42.53 KB

Log in or register to post comments

Comment #219

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 13:43

Title:	Improve drupal_html_to_text()	» Fix the broken formatting in drupal_html_to_text() and also provide tests
Category:	feature	» bug

Spawning a text-mode browser may be a comparable effort...

My point is that we're rapidly approaching the complexity of (e.g.) html2text. If the goal is to duplicate the entire functionality, we might as well spawn the compiled binary, which is bound to be more efficient than its interpreted-language equivalent.

#1130198 may have been all that was needed to get things working again, but without tests we can't really know.

Re-opened #1130198. ~~Please add relevant tests to that issue.~~

Copied tests from this issue, then adjusted the "expected" values until they passed. Discovered that the current version of drupal_wrap_mail() will happily generate lines of over 2000 characters in length, so ~~removed the RFC-3676 test~~ replaced the testVeryLongLineWrap() assertion with a pass().

Log in or register to post comments

Comment #220

4 June 2011 at 07:32

Issue tags:

-Needs backport to D6, -Needs backport to D7

The last submitted patch, drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-218.patch, failed testing.

Log in or register to post comments

Comment #221

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 07:43

Status:

Needs work

» Needs review

Looks like the testbots broke down around 10pm last night.

Log in or register to post comments

Comment #222

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 10:07

#218: drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-218.patch queued for re-testing.

Log in or register to post comments

Comment #223

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 10:08

#200: drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-200.patch queued for re-testing.

Log in or register to post comments

Comment #224

4 June 2011 at 10:09

Status:	Needs review	» Needs work
Issue tags:		+Needs backport to D6, +Needs backport to D7

The last submitted patch, drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-218.patch, failed testing.

Log in or register to post comments

Comment #225

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 15:13

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-225.patch	43.67 KB

Yay! Got rid of all magic constants except for 160 (the non-breaking space character), 80 (the default line length for soft-wrapping) and 1000 (the default line length for hard-wrapping).

Working on adding support for tables; stay tuned.

Log in or register to post comments

Comment #226

Damien Tournoud CreditAttribution: Damien Tournoud commented 4 June 2011 at 16:05

Title:	Fix the broken formatting in drupal_html_to_text() and also provide tests	» Improve drupal_html_to_text()
Category:	bug	» feature

This issue transformed into a massive feature request. Let's keep the other #1130198: Regression: line-breaks are mangled by \Drupal\Core\Utility\Mail::htmlToText() to fix the regression.

Log in or register to post comments

Comment #227

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 17:06

Title:	Improve drupal_html_to_text()	» Fix the broken formatting in drupal_html_to_text() and also provide tests
Category:	feature	» bug

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-226.patch	46.94 KB

Full table support added.

Log in or register to post comments

Comment #228

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 18:58

File	Size
drupal_html_to_text-fix_broken_formatting_and_add_tests-299138-228.patch	49.18 KB

Added inline comments to explain the strategy for generating tables, improved efficiency slightly, and cleaned up inline comments in test code.

Log in or register to post comments

Comment #229

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 18:59

Title:	Fix the broken formatting in drupal_html_to_text() and also provide tests	» Improve drupal_html_to_text()
Category:	bug	» feature
Status:	Needs work	» Needs review

Log in or register to post comments

Comment #230

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 20:32

File	Size
drupal_html_to_text-improvements-299138-230.patch	49.74 KB

Coder review and cleanups.

Applied to Mail System: 6.x-2.22 release; 7.x-2.22 release; 8.x-2.22 release

Log in or register to post comments

Comment #231

pillarsdotnet CreditAttribution: pillarsdotnet commented 4 June 2011 at 21:31

File	Size
drupal_html_to_text-improvements-299138-231.patch	50.17 KB

Minor changes from actual use-testing.

Log in or register to post comments

Comment #232

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 16:11

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-improvements-299138-232.patch	50.12 KB

Last one; ~~I promise... :-)~~ EDIT: must have had my fingers crossed.

Applied to Mail System: 6.x-2.23 release; 7.x-2.23 release; 8.x-2.23 release

Log in or register to post comments

Comment #233

5 June 2011 at 07:17

Status:

Needs review

» Needs work

The last submitted patch, drupal_html_to_text-improvements-299138-232.patch, failed testing.

Log in or register to post comments

Comment #234

pillarsdotnet CreditAttribution: pillarsdotnet commented 5 June 2011 at 09:25

Status:

Needs review

» Needs work

File	Size
drupal_html_to_text-improvements-299138-234.patch	50.1 KB

Fixed errors in tests.

Log in or register to post comments

Comment #235

pillarsdotnet CreditAttribution: pillarsdotnet commented 5 June 2011 at 09:20

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #236

pillarsdotnet CreditAttribution: pillarsdotnet commented 5 June 2011 at 12:46

Status:

Needs work

» Needs review

File	Size
drupal_html_to_text-improvements-299138-236.patch	50.12 KB

Corrected a one-off error: Need to subtract the length of the line-break character from the maximum line length before wrapping or adding padding.

Log in or register to post comments

Comment #237

pillarsdotnet CreditAttribution: pillarsdotnet commented 5 June 2011 at 18:46

File	Size
drupal_html_to_text-improvements-299138-237.patch	50.12 KB

Removed url() calls from test code and also switched to DrupalWebTestCase.

Log in or register to post comments

Comment #238

pillarsdotnet CreditAttribution: pillarsdotnet commented 5 June 2011 at 19:10

File	Size
drupal_html_to_text-improvements-299138-238.patch	50.18 KB

Corrected the initial calculation of table cell wrapping width, and eliminated a duplicate assignment.

Log in or register to post comments

Comment #239

pillarsdotnet CreditAttribution: pillarsdotnet commented 5 June 2011 at 19:45

File	Size
drupal_html_to_text-improvements-299138-239.patch	50.36 KB

#237 fails, which means that #238 will probably fail too, but #1179448: verbose() results are missing.

Moving verbose() text to assertion message so that I can figure out why the testbots are failing where my local tests are succeeding.

Log in or register to post comments

Comment #240

sun

German

Karlsruhe

CreditAttribution: sun commented 5 June 2011 at 20:01

@pillarsdotnet: In light of #239, it would be helpful if you would run tests and debug test failures on your local box, before submitting patches to drupal.org. The common exception are patches that affect all of Drupal core, which take hours to test locally and only ~30 minutes for the testbot. But especially if it's only a single test-case that needs to be run, it's not really time-consuming to do that on your local box. Probably 90% of all follow-ups on this issue are pure patch/testing noise, which could have been avoided, in turn leading to more solid and focused follow-ups, reviews, and faster progress. Thanks!

Log in or register to post comments

Comment #241

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 14:12

Title:	Fix the broken formatting in drupal_html_to_text() and also provide tests	» Improve drupal_html_to_text()
Category:	bug	» feature
Status:	Closed (works as designed)	» Needs review

I am testing on my local box. Tests pass here but fail on drupal.org. ~~Dunno why.~~

EDIT: Okay, I need to run local tests with clean urls turned off and drupal running from a subdirectory, else the results aren't valid.

Probably 90% of all follow-ups on this issue are pure patch/testing noise

Really? 220 comments worth? (counting non-noise comments... 148 out of 243)
So.... 40% noise, including my (now removed) snarkiness.

Log in or register to post comments

Comment #242

5 June 2011 at 20:34

Status:

Needs review

» Needs work

The last submitted patch, drupal_html_to_text-improvements-299138-239.patch, failed testing.

Log in or register to post comments

Comment #243

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 15:26

Title:	drupal_html_to_text() formatting is broken and does not have tests	» Improve drupal_html_to_text()
Version:	7.x-dev	» 8.x-dev
Assigned:	sun	» Unassigned
Category:	bug	» feature
Status:	Needs work	» Needs review
Issue tags:	-Needs tests, -TestingPartySF	+Needs backport to D6

File	Size
drupal_html_to_text-improvements-299138-243.patch	55.86 KB

Improved table generation accuracy and performance.
Added test for correctly formed deeply-nested tables.
Test also asserts that table generation takes less than one second.

Log in or register to post comments

Comment #244

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 08:33

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #245

pillarsdotnet CreditAttribution: pillarsdotnet commented 12 June 2011 at 00:31

Status:

Needs review

» Needs work

Log in or register to post comments

Comment #246

pillarsdotnet CreditAttribution: pillarsdotnet commented 6 June 2011 at 15:53

Status:	Needs review	» Needs work
Issue tags:		+Needs backport to D7

File	Size
drupal_html_to_text-improvements-299138-246.patch	56.36 KB

Found and fixed a typo -- doxygen comment didn't match parameter name.

Applied to Mail system: 6.x-2.25; 7.x-2.25; 8.x-2.25

Log in or register to post comments

Comment #247

pillarsdotnet CreditAttribution: pillarsdotnet commented 10 June 2011 at 22:10

File	Size
drupal_html_to_text-improvements-299138-247.patch	55.87 KB

Fixed an edge case reported by Seph. Haven't added a test because I haven't yet figured out how to reproduce the problem. CNR for testbot only.

Applied to Mail System: 6.x-2.26; 7.x-2.26; 8.x-2.26

Log in or register to post comments

Comment #248

pillarsdotnet CreditAttribution: pillarsdotnet commented 12 June 2011 at 00:33

Changing back to "needs work" because #247 still lacks the following features:

A test for the edge case reported by Seph.
Support for tags such as <ins>, <del>, and <strike>.
Tag attributes, notably align, rowspan, and colspan.
A distinction between <th> and <td>.
CSS / stylesheet support.

Also, it doesn't end world hunger, save the whales, or promote world peace. Definitely needs work.

Meanwhile, please support #1130198-46: Regression: line-breaks are mangled by \Drupal\Core\Utility\Mail::htmlToText()

Log in or register to post comments

Comment #249

salvis

he/his

CreditAttribution: salvis commented 19 June 2011 at 13:10

Title:	Fix the broken formatting in drupal_html_to_text() and also provide tests	» Improve drupal_html_to_text()
Assigned:	pillarsdotnet	» Unassigned
Category:	bug	» feature
Status:	Needs review	» Needs work
Issue tags:	-Needs committer feedback

@pillarsdotnet: Please stop rewriting history (like #164 that you posted on June 1 and edited just now).

Log in or register to post comments

Comment #250

attiks CreditAttribution: attiks commented 29 June 2011 at 09:59

subscribing

Log in or register to post comments

Comment #251

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 12 August 2013 at 16:48

Version:

8.x-dev

» 9.x-dev

Log in or register to post comments

Comment #252

catch

he/him

English

CreditAttribution: catch commented 5 November 2014 at 15:53

Version:	9.x-dev	» 8.1.x-dev
Issue summary:	View changes
Issue tags:		+API addition, +Needs issue summary update

Could be done in 8.1.x as an API addition.

Log in or register to post comments

Comment #253

ianthomas_uk

he/him

English

Brighton, UK

CreditAttribution: ianthomas_uk commented 18 November 2014 at 13:27

Title:

Improve drupal_html_to_text()

» Improve \Drupal\Core\Utility\Mail::htmlToText()

Log in or register to post comments

Comment #254

18 November 2014 at 13:27

Version:

8.1.x-dev

» 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #255

18 November 2014 at 13:27

Version:

8.2.x-dev

» 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #256

18 November 2014 at 13:27

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #257

18 November 2014 at 13:27

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #258

18 November 2014 at 13:27

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #259

18 November 2014 at 13:27

Version:

8.6.x-dev

» 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #260

18 November 2014 at 13:27

Version:

8.7.x-dev

» 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #261

18 November 2014 at 13:27

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Log in or register to post comments

Comment #262

18 November 2014 at 13:27

Version:

8.9.x-dev

» 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Log in or register to post comments

Comment #263

jungle

He/Him

Chongqing, China

CreditAttribution: jungle as a volunteer commented 17 August 2020 at 22:59

Outdated? htmlToText() is in \Drupal\Core\Mail\MailFormatHelper ATM. \Drupal\Core\Utility\Mail::htmlToText() does not exist in 9.1.x.

Log in or register to post comments

Comment #264

jungle

He/Him

Chongqing, China

CreditAttribution: jungle as a volunteer commented 17 August 2020 at 23:01

Status:

Needs work

» Needs review

Setting to NR for closing this if applicable.

Log in or register to post comments

Comment #265

kscheirer

English

Vallejo

CreditAttribution: kscheirer commented 18 August 2020 at 20:37

Category:	Feature request	» Task
Status:	Needs review	» Needs work
Issue tags:		+testing coverage

Well the original issue (12 years ago?? sheesh) was to add tests for htmlToText() since it's part of Drupal core.

Looking at core/tests/Drupal/Tests/Core/Mail/MailFormatHelperTest.php::MailFormatHelperTest() we still are not testing the htmlToText() function, so I think this issue still applies. Along the way this issue also wanted to improve the function. I haven't read through all the comments, but perhaps it would be easier to commit tests as-is, and then a new issue for improvements. Or close if there is no desire for the improvements anymore.

Log in or register to post comments

Comment #266

18 August 2020 at 20:37

Version:

9.1.x-dev

» 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.