Caveat: I'm not as familiar with XML as I would like.

I'm using Views Datasource to output custom XML data files and am running into an XML parsing error. The file is created just fine, but parsers reject some of the HTML entities that come through.

Specifically, &rsquo; is coming through to the XML file and is causing the parser to error out.

From reviewing the code, it looks like a possible solution is to add &rsquo; to the _views_xml_xmlEntities function (line 180 in views_xml.module).

Two questions:

  1. Is this an acceptable patch, or is there a greater problem implied?
  2. If the &rsquo; entity is missing, are there others that should be added to this function?
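
For context on the second question: XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so any other named entity is undefined unless a DTD declares it. Below is a minimal sketch for spotting such entities in a piece of output. The helper name find_undefined_xml_entities is hypothetical and not part of views_xml.

```php
<?php
// Hypothetical helper (not part of views_xml): returns the names of the
// named entity references in $str that XML does not predefine.
function find_undefined_xml_entities($str) {
  // XML 1.0 predefines exactly these five named entities.
  $predefined = array('amp', 'lt', 'gt', 'quot', 'apos');
  // Capture the name of every named reference; numeric references such as
  // &#8217; do not match because they start with '#'.
  preg_match_all('/&([A-Za-z][A-Za-z0-9]*);/', $str, $matches);
  $undefined = array_diff($matches[1], $predefined);
  // Drop duplicates and reindex from 0.
  return array_values(array_unique($undefined));
}

print_r(find_undefined_xml_entities('It&rsquo;s &lt;b&gt; caf&eacute; &amp; &rsquo;'));
```

Running something like this over a sample of the generated XML should show which entities a mapping function would need to cover.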

I'm happy to do the heavy lifting on a patch, but I wanted to ask first.

Thanks,
JK

Comments


Title: XML output includes undefined entiries » XML output includes undefined entities

Solved it.

I updated the function _views_xml_xmlEntities (line 180 in views_xml.module) to account for the entities defined in Drupal core's unicode.entities.inc file.

I don't use an IDE for my programming, so I can't generate a patch. If anyone would volunteer to translate this into a patch, I would very much appreciate it.

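function _views_xml_xmlEntities($str) {
  // Numeric character references, in the same order as the named entities in $html.
  $xml = array('&#34;','&#38;','&#38;','&#60;','&#62;','&#160;','&#161;','&#162;','&#163;','&#164;','&#165;','&#166;','&#167;','&#168;','&#169;','&#170;','&#171;','&#172;','&#173;','&#174;','&#175;','&#176;','&#177;','&#178;','&#179;','&#180;','&#181;','&#182;','&#183;','&#184;','&#185;','&#186;','&#187;','&#188;','&#189;','&#190;','&#191;','&#192;','&#193;','&#194;','&#195;','&#196;','&#197;','&#198;','&#199;','&#200;','&#201;','&#202;','&#203;','&#204;','&#205;','&#206;','&#207;','&#208;','&#209;','&#210;','&#211;','&#212;','&#213;','&#214;','&#215;','&#216;','&#217;','&#218;','&#219;','&#220;','&#221;','&#222;','&#223;','&#224;','&#225;','&#226;','&#227;','&#228;','&#229;','&#230;','&#231;','&#232;','&#233;','&#234;','&#235;','&#236;','&#237;','&#238;','&#239;','&#240;','&#241;','&#242;','&#243;','&#244;','&#245;','&#246;','&#247;','&#248;','&#249;','&#250;','&#251;','&#252;','&#253;','&#254;','&#255;','&#8501;','&#913;','&#945;','&#8743;','&#8736;','&#8776;','&#8222;','&#914;','&#946;','&#8226;','&#8745;','&#935;','&#967;','&#710;','&#9827;','&#8773;','&#8629;','&#8746;','&#8224;','&#8225;','&#8595;','&#8659;','&#916;','&#948;','&#9830;','&#8709;','&#8195;','&#8194;','&#917;','&#949;','&#8801;','&#919;','&#951;','&#8364;','&#8707;','&#402;','&#8260;','&#915;','&#947;','&#8805;','&#8596;','&#8660;','&#9829;','&#8230;','&#8465;','&#8734;','&#8747;','&#921;','&#953;','&#8712;','&#922;','&#954;','&#923;','&#955;','&#9001;','&#8592;','&#8656;','&#8968;','&#8220;','&#8804;','&#8970;','&#8727;','&#9674;','&#8206;','&#8249;','&#8216;','&#8212;','&#8722;','&#924;','&#956;','&#8711;','&#8211;','&#8800;','&#8715;','&#8713;','&#8836;','&#925;','&#957;','&#338;','&#339;','&#8254;','&#937;','&#969;','&#927;','&#959;','&#8853;','&#8744;','&#8855;','&#8706;','&#8240;','&#8869;','&#934;','&#966;','&#928;','&#960;','&#982;','&#8242;','&#8243;','&#8719;','&#8733;','&#936;','&#968;','&#8730;','&#9002;','&#8594;','&#8658;','&#8969;','&#8221;','&#8476;','&#8971;','&#929;','&#961;','&#8207;','&#8250;','&#8217;','&#8218;','&#352;','&#353;','&#8901;','&#931;','&#963;','&#962;','&#8764;','&#9824;','&#8834;','&#8838;','&#8721;','&#8835;','&#8839;','&#932;','&#964;','&#8756;','&#920;','&#952;','&#977;','&#8201;','&#732;','&#8482;','&#8593;','&#8657;','&#978;','&#933;','&#965;','&#8472;','&#926;','&#958;','&#376;','&#918;','&#950;','&#8205;','&#8204;','&#39;');
  // Named HTML entities: the basic set, Latin-1, then the names from unicode.entities.inc.
  $html = array('&quot;','&amp;','&amp;','&lt;','&gt;','&nbsp;','&iexcl;','&cent;','&pound;','&curren;','&yen;','&brvbar;','&sect;','&uml;','&copy;','&ordf;','&laquo;','&not;','&shy;','&reg;','&macr;','&deg;','&plusmn;','&sup2;','&sup3;','&acute;','&micro;','&para;','&middot;','&cedil;','&sup1;','&ordm;','&raquo;','&frac14;','&frac12;','&frac34;','&iquest;','&Agrave;','&Aacute;','&Acirc;','&Atilde;','&Auml;','&Aring;','&AElig;','&Ccedil;','&Egrave;','&Eacute;','&Ecirc;','&Euml;','&Igrave;','&Iacute;','&Icirc;','&Iuml;','&ETH;','&Ntilde;','&Ograve;','&Oacute;','&Ocirc;','&Otilde;','&Ouml;','&times;','&Oslash;','&Ugrave;','&Uacute;','&Ucirc;','&Uuml;','&Yacute;','&THORN;','&szlig;','&agrave;','&aacute;','&acirc;','&atilde;','&auml;','&aring;','&aelig;','&ccedil;','&egrave;','&eacute;','&ecirc;','&euml;','&igrave;','&iacute;','&icirc;','&iuml;','&eth;','&ntilde;','&ograve;','&oacute;','&ocirc;','&otilde;','&ouml;','&divide;','&oslash;','&ugrave;','&uacute;','&ucirc;','&uuml;','&yacute;','&thorn;','&yuml;','&alefsym;','&Alpha;','&alpha;','&and;','&ang;','&asymp;','&bdquo;','&Beta;','&beta;','&bull;','&cap;','&Chi;','&chi;','&circ;','&clubs;','&cong;','&crarr;','&cup;','&dagger;','&Dagger;','&darr;','&dArr;','&Delta;','&delta;','&diams;','&empty;','&emsp;','&ensp;','&Epsilon;','&epsilon;','&equiv;','&Eta;','&eta;','&euro;','&exist;','&fnof;','&frasl;','&Gamma;','&gamma;','&ge;','&harr;','&hArr;','&hearts;','&hellip;','&image;','&infin;','&int;','&Iota;','&iota;','&isin;','&Kappa;','&kappa;','&Lambda;','&lambda;','&lang;','&larr;','&lArr;','&lceil;','&ldquo;','&le;','&lfloor;','&lowast;','&loz;','&lrm;','&lsaquo;','&lsquo;','&mdash;','&minus;','&Mu;','&mu;','&nabla;','&ndash;','&ne;','&ni;','&notin;','&nsub;','&Nu;','&nu;','&OElig;','&oelig;','&oline;','&Omega;','&omega;','&Omicron;','&omicron;','&oplus;','&or;','&otimes;','&part;','&permil;','&perp;','&Phi;','&phi;','&Pi;','&pi;','&piv;','&prime;','&Prime;','&prod;','&prop;','&Psi;','&psi;','&radic;','&rang;','&rarr;','&rArr;','&rceil;','&rdquo;','&real;','&rfloor;','&Rho;','&rho;','&rlm;','&rsaquo;','&rsquo;','&sbquo;','&Scaron;','&scaron;','&sdot;','&Sigma;','&sigma;','&sigmaf;','&sim;','&spades;','&sub;','&sube;','&sum;','&sup;','&supe;','&Tau;','&tau;','&there4;','&Theta;','&theta;','&thetasym;','&thinsp;','&tilde;','&trade;','&uarr;','&uArr;','&upsih;','&Upsilon;','&upsilon;','&weierp;','&Xi;','&xi;','&Yuml;','&Zeta;','&zeta;','&zwj;','&zwnj;','&apos;');
  // First pass: exact, case-sensitive match (entity names are case-sensitive).
  $str = str_replace($html, $xml, $str);
  // Second pass: case-insensitive fallback for miscased names such as &NBSP;.
  $str = str_ireplace($html, $xml, $str);
  return $str;
}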

Looking at the bigger picture, it seems the XML entities and the HTML entities need to be kept in sync any time you're outputting XML. I think updating the unicode.entities array in core would be the best way to do this, especially if Drupal is moving away from strictly HTML output toward providing content in whatever format is needed. I have no idea how to get this idea pushed to the core team, but if any of you have ideas, please have at it.
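
One way to avoid maintaining two parallel arrays by hand would be to derive the named-to-numeric map from PHP's own HTML 4.01 entity table. This is only a sketch, not how views_xml or core does it: build_entity_map is a hypothetical name, and it assumes PHP 7.2 or later with the mbstring extension (for mb_ord).

```php
<?php
// Hypothetical helper (not part of views_xml or Drupal core): builds a map of
// named HTML 4.01 entities to numeric character references, e.g.
// '&rsquo;' => '&#8217;'. Assumes PHP >= 7.2 with mbstring for mb_ord().
function build_entity_map() {
  $map = array();
  // The table maps each UTF-8 character to its entity, e.g. "’" => '&rsquo;'.
  $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
  foreach ($table as $char => $entity) {
    $map[$entity] = '&#' . mb_ord($char, 'UTF-8') . ';';
  }
  return $map;
}

// strtr() replaces in a single pass, so replaced text is never re-scanned.
echo strtr('It&rsquo;s caf&eacute; &amp; more', build_entity_map());
```

Unlike the case-insensitive second pass in the function above, strtr() only matches entity names with their exact case.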

Best,
JK