Problem importing "&" in taxonomy import XML
sissi1212 - March 25, 2009 - 13:35
| Project: | Taxonomy import/export via XML |
| Version: | 6.x-1.3 |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | postponed (maintainer needs more info) |
Description
When importing via taxonomy_xml, if a term name contains an & surrounded by spaces, the spaces are removed, which is not the expected behaviour.

#1
I've not seen that, and can't think of any code that would do that.
Can you post your sample input file?
Note that a bare '&' in text would be invalid XML in the first place... Is the input valid?
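A quick way to answer the "is the input valid?" question is to run the file through any XML parser before importing. The following is only a rough sketch in Python, not the module's PHP code; `is_well_formed` is a hypothetical helper name and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Illustrative inputs: a bare ampersand versus the escaped entity.
bare = "<name>Culture & debates</name>"
escaped = "<name>Culture &amp; debates</name>"

print(is_well_formed(bare))         # False
print(is_well_formed(escaped))      # True
print(ET.fromstring(escaped).text)  # Culture & debates
```

If the first check fails on a real file, the ampersands need to be written as `&amp;` before the import can be expected to work.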
#2
My import runs fine, but the spaces around the & are lost. For example, "debate & culture" shows up in the Drupal categories as "debate&culture".
#3
#4
#5
Here is my file and thank you for your responsiveness. :)
#6
You are totally right.
Unpredicted effect there :-/
What is happening is
<name>Culture & debates</name>
goes through the XML tokenizer and produces
[tag][text][entity][text][endtag]
... but (for reasons I'm sure were pretty obvious at one time) every token is being trimmed as it is found.
So
[tag][text(space)][entity][(space)text][endtag]
becomes
[tag][text][entity][text][endtag]
:-/
Here's a quick attempt at a fix.
I'm falling asleep here, so I may have made a blunder, but it does attack the symptom.
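The trimming effect described in this comment can be reproduced in a few lines. This is a minimal sketch in Python, not the attached patch and not the module's PHP code: the function names are hypothetical, and the chunk boundaries are an assumption about how a parser might split the text around the entity.

```python
def join_trimming_each(chunks):
    # Problem case: trim every chunk as it arrives, then concatenate.
    return "".join(chunk.strip() for chunk in chunks)

def join_then_trim(chunks):
    # Alternative: accumulate chunks untouched, trim once at the end tag.
    return "".join(chunks).strip()

# Assumed chunks for the text inside <name>...</name>:
# [text(space)][entity][(space)text]
chunks = ["Culture ", "&", " debates"]

print(join_trimming_each(chunks))  # Culture&debates
print(join_then_trim(chunks))      # Culture & debates
```

Trimming only once, after the closing tag, still drops leading and trailing whitespace around the whole value but keeps the spaces on either side of the entity.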
#7
Hello dman,
I have tried your patch. It resolves the " & " issue but has big side effects.
Here is the result of my test, importing the section.txt uploaded by sissi1212 (comment #5):
- an empty term is added
- term hierarchy is broken : there is no parent term for imported terms
#8
it seemed to work OK for me. i modified it slightly to change the assignments but i'd be surprised if that made much of a difference. i verified that it works against the 2.x-dev release as well. (i did get some strange behavior importing the section.txt file involving node types but that's a different issue.)
i also found that having ampersands in your synonym names causes problems - the export function wasn't escaping entities in the synonyms tag value, so the parser chokes on them when it tries to import the result. i ran the list through check_plain() and that seemed to fix it. i'm not sure of the full implications of that, but that's what it was doing for the other tags so it seemed safe enough.
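The export-side problem with synonyms can be illustrated the same way. Below is a rough sketch in Python, with `xml.sax.saxutils.escape` standing in for the role check_plain() plays in the PHP module; `synonyms_element` and the `<synonym>` child element are hypothetical and are not taken from the module's actual export format.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def synonyms_element(synonyms):
    # Escape each value before writing it, so '&' becomes '&amp;'.
    # Element names here are illustrative assumptions only.
    items = "".join("<synonym>%s</synonym>" % escape(s) for s in synonyms)
    return "<synonyms>%s</synonyms>" % items

xml_text = synonyms_element(["R&D", "Arts & crafts"])
print(xml_text)

# Round trip: the escaped output parses, and the original text comes back.
root = ET.fromstring(xml_text)
print([el.text for el in root])  # ['R&D', 'Arts & crafts']
```

Without the `escape()` call the generated string would contain a bare ampersand and the parse step would fail, which matches the behaviour reported on re-import.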
#9
I'm also having problems related to &. Here's a file I've exported; it refuses to reimport.