Problem/Motivation
Html::load() + Html::serialize() produces different output across environments.
Steps to reproduce
use Drupal\Component\Utility\Html;
$source = <<<EOT
<table>
<tr>
<td>L</td>
<td>R</td>
</tr>
</table>
EOT;
$document = Html::load($source);
$out = Html::serialize($document);
var_export($out);
Run this in different environments.
Sometimes the output is like so:
<table>
<tr>
<td>L</td>
<td>R</td>
</tr>
</table>
Other times it is like so:
<table><tr><td>L</td>
<td>R</td>
</tr></table>
This has no practical effect in most cases, but it causes inconsistency in unit tests.
(Unforutnately I can't tell which differences in the env cause this inconsistency)
Proposed resolution
Pass either LIBXML_HTML_NODEFDTD or LIBXML_NOBLANKS to DOMDocument->loadHTML(), inside Html::load().
One of them consistently removes these line breaks, the other consistently preserves them.
If BC is a concern, we could add an optional parameter to Html::load(), to control this behavior.
And then in FilterHtml we could provide this parameter.
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
Comment | File | Size | Author |
---|---|---|---|
#19 | 3204929-19.patch | 1.51 KB | longwave |
#14 | 3204929-14.patch | 484 bytes | longwave |
#15 | 3204929-15.patch | 762 bytes | longwave |
Comments
Comment #2
cilefen CreditAttribution: cilefen commentedInteresting! I suppose it is something to do with the system's libxml.
Comment #3
longwaveCan you describe the two different environments a bit (OS, PHP version, etc), it may give someone enough info to find the root cause.
Comment #4
donquixote CreditAttribution: donquixote commentedPossibly. But I was not able to put my finger on it.
So, the PHP extensions are not so different.
I suspect the main difference lies in libxml libraries installed on system level.
Comparing
apt-cache policy libxml2
Comment #5
donquixote CreditAttribution: donquixote commentedBtw I doubt that it has anything to do with php version.
There is no difference between versions here: https://3v4l.org/2jSIa
Comment #6
donquixote CreditAttribution: donquixote commentedhttps://stackoverflow.com/a/36710186/246724
Comment #7
donquixote CreditAttribution: donquixote commentedSo I can confirm that the change is due to the version of the libxml2 library on system level (NOT the php extension).
old: libxml2 2.9.4. Here the line breaks were removed.
new: libxml2 2.9.10. Here the line breaks remain, unless a specific parameter is provided to remove them.
If we want to standardize, we should replicate the behavior from the newer version.
This can be done by passing LIBXML_HTML_NODEFDTD to DOMDocument->loadHTML().
Comment #8
cilefen CreditAttribution: cilefen commentedThat sounds like a plan.
Comment #9
donquixote CreditAttribution: donquixote commentedCould be this commit in libxml2:
https://gitlab.gnome.org/GNOME/libxml2/-/commit/f933c898132f20a50ba39ac6...
Comment #11
longwaveI think I am running into this in existing tests when working on #2441811: Upgrade filter system to HTML5 - that issue would fix this one by using a fixed PHP library for HTML parsing and serialization so any differences should be removed.
Comment #14
longwaveIt's possible this is now affecting core as I get the same failures locally as in HEAD: https://www.drupal.org/pift-ci-job/2431849
Attached patch fixes the filter module tests locally, so let's see how this goes on DrupalCI.
Comment #15
longwaveComment #16
longwaveIf this does fix the tests, as per #7 we should decide whether we want to keep the old behaviour with this flag, or enforce the new behaviour with the other flag and update the tests to match.
Comment #17
andypostLooking at commit https://gitlab.gnome.org/GNOME/libxml2/-/commit/f933c898132f20a50ba39ac6... it looks like #15 should be enough to keep consistency within libxml2 versions
Comment #19
longwaveA similar fix is required for Drupal\Tests\layout_discovery\Kernel\LayoutTest.
I can't reproduce either of the FunctionalJavascript failures locally.
Comment #20
catchIMO if this fixes the tests, let's commit this and open a follow-up for removing the flag (or not if we eventually decided not to).
Comment #22
Spokjei think this patch had given all it can give on fixing the errors in HEAD that are related to this issue.
The remaining failures:
- Seem unrelated to the this patch
- Can't be reproduced locally
So I _think_ they are CI-env related.
I think we should commit this and find out (with the DA) what's going on with the remaining 3 failures.
Bumping priority to Critical, since it's related to failing HEADs.
Comment #26
catchCommitted/pushed to 10.1.x and cherry-picked back through to 9.4.x, thanks!
Comment #27
longwaveOpened #3298822: Preserve whitespace in DOMDocument::loadHTML() calls as followup discussion.