HTML Entities Not Defined
| Project: | XLIFF Tools |
| Version: | 6.x-1.0-beta1 |
| Component: | Code |
| Category: | support request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
I am not an XML expert, so please bear with me here... We are testing XLIFF Tools for exporting a large site for translation. The page content is mixed HTML, and a lot of the pages use special characters such as ™, © and ®, amongst others. When exporting these pages to XLIFF documents I was getting the following error, reported via the Devel module:
DOMDocument::loadXML() [function.DOMDocument-loadXML]: Entity 'trade' not defined in Entity, line: 1
Followed by 'Cannot Modify Headers' errors. Therefore there was no Xliff document produced for download.
I read on the PHP site that to use special entities with DOMDocument::loadXML() I would have to specify an external DTD to support these characters. I modified xliff.module around line 128:
$html = new DOMDocument();
$html->resolveExternals = true;
$html->loadXML('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html><head><title>'. check_plain($node->title) .'</title></head><body>' . $node->body . '</body></html>');

This stops the error and provides the Xliff document to download, but on inspection the document doesn't contain any reference to the trademark symbol etc. (not even a character reference code). It appears these characters get stripped when xml2xliff.xsl is applied. Is this intended behavior, and is there anything we can do about it?
It would be a painful process to add the symbols in manually across hundreds of pages, so we would appreciate any insight you can offer.
Regards,
Paul

#1
I suffered the same issue (but with version 5.x 1.0). As a quick fix, you can change this in xliff.module:
$html->loadXML('
to
$html->loadHTML('
This will forgive all the tags, but you run the risk of not having correctly encoded HTML entities. To complete this 'hack', a routine needs to be added to the code to scan the HTML for HTML entities and transform them into raw characters, for example: "&euro;" becomes a raw "€".
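A rough sketch of what such a routine might look like follows. The function name xliff_example_decode_entities() is made up for illustration and is not part of xliff.module; it also assumes the content is UTF-8 and that PHP 5.3 or later is available (for the closure). It leaves the five entities that XML itself predefines untouched, so the markup is not broken by decoding "&lt;" or "&amp;".

```php
<?php
// Hypothetical helper, not part of xliff.module: replaces named HTML
// entities such as &trade; or &euro; with the raw UTF-8 character.
function xliff_example_decode_entities($html) {
  // Entities predefined by XML itself; decoding these would corrupt markup.
  $xml_builtin = array('amp', 'lt', 'gt', 'quot', 'apos');

  return preg_replace_callback(
    '/&([A-Za-z][A-Za-z0-9]*);/',
    function ($match) use ($xml_builtin) {
      if (in_array($match[1], $xml_builtin, TRUE)) {
        return $match[0];
      }
      // Names PHP does not recognise are returned unchanged, so they
      // would still need handling before the string is parsed as XML.
      return html_entity_decode($match[0], ENT_QUOTES, 'UTF-8');
    },
    $html
  );
}
```

Assuming the helper above, it could be applied to $node->body before the string is handed to loadXML() or loadHTML().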
The better solution, of course, as you specified, is to ensure that the data is valid strict XML and to declare the entities in the head of the generated XML document. I will submit a proper patch for this if I get the chance to do it.
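Until a proper patch exists, here is a minimal sketch of the "declare the entities in the head" idea. It uses an internal DTD subset instead of the external XHTML DTD. The function name and the short entity list are assumptions made for illustration, not the module's actual code, and htmlspecialchars() stands in for Drupal's check_plain() so the snippet runs on its own. The body is assumed to be well-formed XML already.

```php
<?php
// Hypothetical helper, not part of xliff.module: wraps a title and body
// in a document whose DOCTYPE declares the named entities it uses.
function xliff_example_load_with_entities($title, $body) {
  // Illustrative subset only; a real list would cover every entity in use.
  $entities = array('trade' => 8482, 'copy' => 169, 'reg' => 174, 'euro' => 8364);

  $subset = '';
  foreach ($entities as $name => $codepoint) {
    $subset .= '<!ENTITY ' . $name . ' "&#' . $codepoint . ';">';
  }

  $xml = '<!DOCTYPE html [' . $subset . ']>'
    . '<html><head><title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</title></head>'
    . '<body>' . $body . '</body></html>';

  $doc = new DOMDocument();
  // LIBXML_NOENT asks libxml to replace entity references with their
  // declared text, so later processing sees the raw characters.
  if (!$doc->loadXML($xml, LIBXML_NOENT)) {
    return FALSE;
  }
  return $doc;
}
```

With the entities substituted at parse time, the text nodes should contain the actual characters. Whether that alone stops xml2xliff.xsl from dropping them has not been verified here.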