Project: Taxonomy import/export via XML
Version: 6.x-1.3
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: needs review

Issue Summary

When importing via taxonomy_xml, if a term name contains an & surrounded by spaces, the spaces are removed, which is not the expected behaviour.

Comments

#1

I've not seen that, and can't think of any code that would do that.
Can you post your sample input file?

Note that a bare '&' in text would be invalid XML in the first place... Is the input valid?

#2

My import runs fine, but the spaces around the & are lost: for example, "debate & culture" shows up in the Drupal categories as "debate&culture".

#3

Can you post your sample input file?

Note that a bare '&' in text would be invalid XML in the first place... Is the input valid?

#4

Status: active » postponed (maintainer needs more info)

#5

Here is my file and thank you for your responsiveness. :)

Attachment Size
section.txt 5.53 KB

#6

You are totally right.
Unpredicted effect there :-/

What is happening is
<name>Culture &amp; debates</name>
goes through the XML tokenizer and produces
[tag][text][entity][text][endtag]
... but (for reasons I'm sure were pretty obvious at one time) every token is being trimmed as it is found.
so
[tag][text(space)][entity][(space)text][endtag]
becomes
[tag][text][entity][text][endtag]

:-/
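
To make the trimming effect concrete, here is a minimal PHP sketch. It does not use the module's real parser callbacks: `tx_demo_collect()` and the hard-coded chunk list are hypothetical stand-ins, and the chunk boundaries assume the parser hands the entity to the character-data handler as its own piece, as described above.

```php
<?php
// Hypothetical stand-in for a character-data handler that appends each
// chunk to the term name. $trim_each toggles the per-chunk trim().
function tx_demo_collect(array $chunks, $trim_each) {
  $name = '';
  foreach ($chunks as $chunk) {
    $name .= $trim_each ? trim($chunk) : $chunk;
  }
  return $name;
}

// Assumed chunks for <name>Culture &amp; debates</name>:
// text, decoded entity, text.
$chunks = array('Culture ', '&', ' debates');

echo tx_demo_collect($chunks, TRUE), "\n";   // Culture&debates
echo tx_demo_collect($chunks, FALSE), "\n";  // Culture & debates
```

Trimming the assembled value once at the end, rather than every chunk, would be one way to avoid the problem; whether that fits the rest of the importer is not checked here.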

Here's a quick attempt at a fix.
I'm falling asleep here, so I may have made a blunder, but it does attack the symptom.

Attachment Size
taxonomy_xml-entity_trimming-413328-20090331.patch 1.33 KB
xml_format.inc-patched.txt 7.12 KB

#7

Hello dman,
I have tried your patch. It resolves the " & " issue but has big side effects.

Here is the result of my test, importing the section.txt uploaded by sisi12 (comment #5):
- an empty term is added
- the term hierarchy is broken: imported terms have no parent term

#8

It seemed to work OK for me. I modified it slightly to change the assignments, but I'd be surprised if that made much of a difference. I verified that it works against the 2.x-dev release as well. (I did get some strange behavior involving node types when importing the section.txt file, but that's a different issue.)

I also found that having ampersands in your synonym names causes problems: the export function wasn't escaping entities in the synonyms tag value, so the parser chokes on them when it tries to import the result. I ran the list through check_plain() and that seemed to fix it. I'm not sure of the full implications of that, but that's what it was doing for the other tags, so it seemed safe enough.
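
A minimal sketch of the kind of escaping described above, runnable outside Drupal. Assumptions: `htmlspecialchars()` stands in for `check_plain()` (which is a thin wrapper around it), and both the helper name `tx_demo_synonym_element()` and the exact element layout are hypothetical, not taken from the module.

```php
<?php
// Hypothetical helper: escape a synonym before writing it into the export,
// so that a literal & becomes &amp; and the output stays well-formed XML.
function tx_demo_synonym_element($synonym) {
  $escaped = htmlspecialchars($synonym, ENT_QUOTES, 'UTF-8');
  return '<synonyms>' . $escaped . '</synonyms>';
}

echo tx_demo_synonym_element('Research & development'), "\n";
// <synonyms>Research &amp; development</synonyms>
```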

Attachment Size
xml_format.trim_.patch 740 bytes
xml_format.synonyms.patch 435 bytes

#9

I'm also having problems related to &. Here's a file I exported; it refuses to reimport.

Attachment Size
taxonomy_auto_created_vocabulary.xml_.xml_.zip 4.84 KB

#10

#8 works fine for me

#11

When using the patch in #8, you can't import a hierarchical vocabulary anymore.
I made another (standalone) patch, which fixes both problems.

Attachment Size
taxonomy_xml__trim.patch 461 bytes

#12

Status: postponed (maintainer needs more info) » needs review

The patch in #11 causes every term to end with a newline character, which breaks tag matching.
This new patch just trims every whitespace character except a normal space. Simple but effective:

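<?php
// Strip tabs, newlines, carriage returns, NUL bytes and vertical tabs from
// each chunk, but keep ordinary spaces so " & " survives the import.
@$_tx_terms[$_tx_term][$_tx_tag] .= trim($data, "\t\n\r\0\x0B");
?>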
Attachment Size
taxonomy_xml__trim.patch 341 bytes
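
For anyone who wants to try the character mask in isolation, here is a small standalone sketch. `tx_demo_trim_chunk()` and the chunk list are hypothetical; the chunks assume an indented `<name>` element whose entity is delivered as a separate piece.

```php
<?php
// Hypothetical helper using the same mask as the patch: tabs, newlines,
// carriage returns, NUL bytes and vertical tabs go, plain spaces stay.
function tx_demo_trim_chunk($data) {
  return trim($data, "\t\n\r\0\x0B");
}

// Assumed chunks for an indented <name>Culture &amp; debates</name>.
$chunks = array("\n\tCulture ", '&', " debates\n");

$name = '';
foreach ($chunks as $chunk) {
  $name .= tx_demo_trim_chunk($chunk);
}

echo $name, "\n";  // Culture & debates
```

Note that with this mask ordinary leading or trailing spaces in a chunk are kept as well, so input indented with spaces rather than tabs may behave differently; that case is not tested here.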

#13

I'm seeing this problem as well.
Except that when you export a taxonomy, synonyms that contain an & do not get encoded as &amp;.
On import, that bare & causes a parser error.

(I'm also seeing 'term & end' get imported as 'term&end'.)
