problem importing "&" in taxonomy import xml

sissi1212 - March 25, 2009 - 13:35
Project: Taxonomy import/export via XML
Version: 6.x-1.3
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: postponed (maintainer needs more info)
Description

When importing via taxonomy_xml, if a term contains an & with a space on each side, the spaces are removed, which is not the expected behaviour.

#1

dman - March 25, 2009 - 22:24

I've not seen that, and can't think of any code that would do that.
Can you post your sample input file?

Note that a bare '&' in text would be invalid XML in the first place... Is the input valid?

#2

sissi1212 - March 26, 2009 - 10:43

My import runs well, but the spaces around the & are lost. For example, "debate & culture" shows up in the Drupal categories as "debate&culture".

#3

dman - March 26, 2009 - 22:45

Can you post your sample input file?

Note that a bare '&' in text would be invalid XML in the first place... Is the input valid?

#4

dman - March 29, 2009 - 11:08
Status: active » postponed (maintainer needs more info)

#5

sissi1212 - March 30, 2009 - 12:36

Here is my file and thank you for your responsiveness. :)

Attachment    Size
section.txt    5.53 KB

#6

dman - March 30, 2009 - 13:46

You are totally right.
Unpredicted effect there :-/

What is happening is
<name>Culture &amp; debates</name>
goes through the XML tokenizer and produces
[tag][text][entity][text][endtag]
... but (for reasons I'm sure were pretty obvious at one time) every token is being trimmed as it is found.
so
[tag][text(space)][entity][(space)text][endtag]
becomes
[tag][text][entity][text][endtag]

:-/
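
The effect described above can be reproduced outside Drupal. The following is a minimal sketch in Python, not the module's PHP code: it assumes an expat-based parser (which PHP's XML parser functions are also built on), and `parse_name` and its `trim_each_chunk` flag are hypothetical names used only for illustration. Expat normally delivers the text before the entity, the entity itself, and the text after it as separate chunks, so trimming each chunk drops the spaces next to the "&".

```python
import xml.parsers.expat


def parse_name(xml_text, trim_each_chunk):
    """Collect the character data of a small XML fragment.

    trim_each_chunk=True mimics the behaviour described in this thread:
    every piece of text is trimmed as soon as the parser hands it over.
    trim_each_chunk=False keeps the pieces intact and trims only once,
    after they have been joined back together.
    """
    chunks = []

    def on_text(data):
        # Expat calls this once per chunk of character data.
        chunks.append(data.strip() if trim_each_chunk else data)

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return "".join(chunks).strip()


sample = "<name>Culture &amp; debates</name>"
print(parse_name(sample, trim_each_chunk=True))   # spaces around & are lost
print(parse_name(sample, trim_each_chunk=False))  # spaces are preserved
```

Trimming once after the chunks are joined is only one possible way to address the symptom; the attached patch may take a different approach.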

Here's a quick attempt at a fix.
I'm falling asleep here, so may have made a blunder, but it does attack the symptom

Attachment    Size
taxonomy_xml-entity_trimming-413328-20090331.patch    1.33 KB
xml_format.inc-patched.txt    7.12 KB

#7

niQo - April 6, 2009 - 10:14

Hello dman,
I have tried your patch. It resolves the " & " issue but has big side effects.

Here is the result of my test (importing the section.txt uploaded by sissi1212 in comment #5):
- an empty term is added
- term hierarchy is broken: there is no parent term for imported terms

#8

brad bulger - October 18, 2009 - 21:48

it seemed to work OK for me. i modified it slightly to change the assignments but i'd be surprised if that made much of a difference. i verified that it works against the 2.x-dev release as well. (i did get some strange behavior importing the section.txt file involving node types but that's a different issue.)

i also found that having ampersands in your synonym names causes problems - the export function wasn't escaping entities in the synonyms tag value, so the parser chokes on them when it tries to import the result. i ran the list through check_plain() and that seemed to fix it. i'm not sure of the full implications of that, but that's what it was doing for the other tags so it seemed safe enough.
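
The export-side problem can be illustrated with a short sketch. This is Python rather than the module's PHP, and `export_synonym`, `is_well_formed`, and the `<synonym>` tag name are hypothetical stand-ins, not the module's actual functions or output format. `xml.sax.saxutils.escape` plays roughly the role that check_plain() plays in the comment above: it turns "&", "<" and ">" into entities before the value is written out.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape


def export_synonym(value, escape_entities=True):
    """Wrap a synonym in a (hypothetical) <synonym> element."""
    body = escape(value) if escape_entities else value
    return "<synonym>" + body + "</synonym>"


def is_well_formed(xml_text):
    """Return True if expat can parse the fragment without error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


escaped = export_synonym("R&D")
unescaped = export_synonym("R&D", escape_entities=False)
print(escaped, is_well_formed(escaped))      # escaped output can be re-imported
print(unescaped, is_well_formed(unescaped))  # bare & makes the parser reject it
```

Escaping on export and decoding entities on import have to be paired; if only one side is changed, terms either fail to import or come back with literal "&amp;" in their names.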

Attachment    Size
xml_format.trim_.patch    740 bytes
xml_format.synonyms.patch    435 bytes

#9

chrism2671 - November 1, 2009 - 18:49

I'm also having problems related to &. Here's a file I've exported; the module refuses to reimport it.

Attachment    Size
taxonomy_auto_created_vocabulary.xml_.xml_.zip    4.84 KB
 
 
