Status: Active
Project: Node import
Version: 6.x-1.0-rc4
Component: Code
Priority: Normal
Category: Bug report
Assigned: Unassigned
Reporter:
Created: 24 Apr 2009 at 13:48 UTC
Updated: 25 Apr 2009 at 21:55 UTC
I'm importing to a custom content type that has a taxonomy field with a tagged vocabulary (I hope that makes sense). I want the taxonomy terms to be able to contain ','s (i.e. commas) and I don't want multiple values. So when I'm asked for the multiple value separator I try ':' (colon) or '@' (at), neither of which appears in the terms being imported. However, the import splits on commas anyway and classifies the nodes with multiple terms.
I'm guessing this might be related to some of the other taxonomy issues... but I thought I'd report it in case this problem is new.
Comments
Comment #1
Robrecht Jacques commented
Can you input taxonomy terms/tags containing commas on the content type edit page (e.g. content/add/story)?
Comment #2
Martin.Schwenke commented
Fantastically spotted! Thanks...
So, the answer is: yes, provided I (double-)quote them. So, I can do that in my CSV file...
There is still a very minor problem. When I import "A, B" I lose the space after the comma. That doesn't happen if I add a quoted tag on the content type edit page.
Any ideas? ;-)
Thanks again...
Comment #3
Martin.Schwenke commented
I've tried escaping everything I can think of but I can't find a work-around that lets me keep the spaces. They only disappear if they're immediately after a comma. They display just fine in all the sample data views, but from the import preview onward the spaces are gone.
Comment #4
Martin.Schwenke commented
OK, it is way too late at night and I was changing the separator for the wrong field to '@' and leaving this one as a ','. So, I can now work around this...
The real problem is that the explode/trim in node_import_values() is executed even if $value is protected/delimited by double quotes:
The condition could also check that the 1st and last characters of $value aren't both '"' (i.e. a double-quote). I'm happy to provide a patch... but I'm happy to take advice on which of PHP's many pattern matching functions you prefer in your code... :-)
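To make the proposed condition concrete, here is a minimal sketch in Python rather than the module's PHP. It is not the code from node_import_values(); the function name split_values and its signature are hypothetical, and it only models the idea of skipping the split/trim step when the whole value is wrapped in double quotes.

```python
def split_values(value, separator):
    """Split a raw field value on `separator` and trim each piece,
    unless the whole value is wrapped in double quotes.

    Hypothetical model of the proposed guard, not node_import's code.
    """
    stripped = value.strip()
    is_quoted = (
        len(stripped) >= 2
        and stripped.startswith('"')
        and stripped.endswith('"')
    )
    if is_quoted or separator not in value:
        # Leave a quoted (or single) value untouched, spaces included.
        return [value]
    return [part.strip() for part in value.split(separator)]
```

With this guard, split_values('"A, B"', ',') leaves the value alone, while split_values('A, B', ',') still splits into two trimmed values. A check on only the first and last characters cannot tell one quoted value from a quoted list such as "A, B","C", so that input would also be left unsplit.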
Comment #5
Robrecht Jacques commented
OK, so quoting the term works on the edit page... this means that probably node_import should just quote the tags it provides.
The default multiple values separator is "||" except for free-tagging vocabularies (where it is ","). Would quoting work? ... without testing ... if you provide something like
    this tag has, a comma || this tag doesn't have one
*and* specify that the multiple separator is "||", at first node_import would get the right terms, being
    this tag has, a comma
and
    this tag doesn't have one
but when node_import submits the value it will translate this to
    this tag has, a comma, this tag doesn't have one
The reason why node_import does this is because a tag-vocabulary expects a comma. So that is wrong. Solution for this: submit
    "this tag has, a comma","this tag doesn't have one"
This is a bug and needs to be fixed.
Another bug you apparently spotted is that somehow the value
    "A, B"
is translated to
    "A,B"
Need to think/investigate that one a bit more.
This is unrelated to the other taxonomy bug reports, so keeping this open. Interesting exception case to keep in the SimpleTests I'm writing now...
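The quoting fix described above could look roughly like the following Python sketch. It is an illustration, not node_import's implementation: join_tags is a hypothetical helper, it quotes only the tags that need it rather than every tag, and doubling embedded double quotes is an assumption about how the free-tagging form field escapes them.

```python
def join_tags(tags):
    """Build the comma-separated string a free-tagging field expects.

    Hypothetical helper: a tag containing a comma or a double quote is
    wrapped in double quotes so it is not split again on submission.
    Doubling embedded quotes is an assumed escaping rule.
    """
    quoted = []
    for tag in tags:
        if ',' in tag or '"' in tag:
            tag = '"' + tag.replace('"', '""') + '"'
        quoted.append(tag)
    return ', '.join(quoted)
```

For the two terms in the example, this produces "this tag has, a comma" (quoted) followed by the second tag unquoted, so the comma inside the first term no longer acts as a separator.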
Seems you are not from Holland/Germany if posting this was late at night even if the name is a hint towards those countries. (I'm from Belgium myself)
Comment #6
Robrecht Jacques commented
OK, maybe a reaction on
If you would provide
    this tag has, a comma || this tag doesn't have one
and the multiple separator is || we end up with two values:
    this tag has, a comma
and
    this tag doesn't have one
The fix for the bug you're seeing, I've explained above: just make sure you quote the values when you submit them (as the form element expects).
If you would provide
    this tag has, a comma , this tag doesn't have one
and the multiple separator is , (which is the default for tags) we end up with three values:
    this tag has
    a comma
    this tag doesn't have one
You propose to have something like
    "this tag has, a comma","this tag doesn't have one"
and have it parse as
    this tag has, a comma
and
    this tag doesn't have one
This would mean:
The bug itself (as said before) is easily fixable... the solution in this comment (both of them, although I'd prefer the first one), would need some more work.
Comment #7
Robrecht Jacques commented
Another additional comment: currently if you submit:
    "this tag has, a comma", this tag doesn't have one
you'd also end up with three values:
    "this tag has
    a comma"
    this tag doesn't have one
It was for this case the two options were formulated.
The bug I'd like to fix in -rc5 would be that if you submit:
    this tag has, a comma || this tag doesn't have one
that you would end up with the correct values.
Comment #8
Martin.Schwenke commented
No, you won't get 3 values. You'll get 2 values and you will just lose the space after the comma in the first value. ;-)
So, by example, if $mseparator is set to ',' and you submit
    "this tag has, a comma", this tag doesn't have one
then you get 3 intermediate values:
    "this tag has
    a comma"
    this tag doesn't have one
However, then when you submit for preview or import the first 2 values will be combined into
    "this tag has,a comma"
(because, somehow a (real or implied) comma gets inserted) and you retain the other value
    this tag doesn't have one
So, the quotes still work even though node_import thinks the first 2 values are separate!
I don't understand enough about the actual import process to understand why this happens... and that part of the code is abstract enough that it doesn't help me much... :-) I can't figure out where 'create' methods are set and also can't see a relevant call to implode().
I think this will be really hard to fix "properly" (i.e. without introducing special cases that break other things) and I think the best way of fixing it is to document it:
I'm guessing that people who are using node_import have enough of a clue to understand this. I used node_import for the 1st time last night and, after a bit of hacking, I managed to work out what was happening.
By the way, node_import is awesome. I really didn't believe it would work as well as it does because it is solving such a hard problem. Thanks for your work on it.
Oh, and I'm in Australia... but my parents came here from Germany... and I hope things are good in Belgium! :-)
peace & happiness,
martin
Comment #9
Robrecht Jacques commented
Aha... that's why you lose the space... because the 3 values are trimmed and then combined again, and then taxonomy interprets it as two values (if you have put the quotes) :-)
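The split, trim and recombine sequence can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described in this thread, not node_import's actual code path; round_trip is a hypothetical name, and the fixed "," join stands in for what a tag vocabulary expects.

```python
def round_trip(raw, separator=','):
    """Model the import steps discussed here: split on the multiple
    value separator, trim each piece, then rejoin with a bare comma.

    Hypothetical simplification, not node_import's implementation.
    """
    parts = [part.strip() for part in raw.split(separator)]
    return ','.join(parts)
```

Calling round_trip('"A, B",C') splits into three pieces, trims the space in front of B" and rejoins them, which is where the space after the comma goes missing. With separator '||', round_trip('A, B||C', '||') keeps the space but submits the terms unquoted.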
"A, B",C → "A,B",C → "A,B",C (because node_import adds the "," as tags expect it).
A, B||C → A, B,C → "A, B",C should be fixed and documented.
Hmm.. what with
    A " B
as value? ... need to test that :-)
Comment #10
Martin.Schwenke commented
Do you actually know when you're importing into a tag-based vocabulary? I can't see anything in the code that recognises that... but I might not be looking in the right places. So if you add quotes do you potentially break non-tag cases?
"A, B"||C (or even "A, B"||"C") works fine so I think that requiring the quotes in the input is fine. I wouldn't try to make node_import cleverer than the content type edit form.
A " B in the content type edit page causes just a value of A to be entered... so even unexpected things happen in the content type edit form. :-)
Comment #11
Martin.Schwenke commented
Just one more comment... :-)
If you're going to quote the values before handing them to Drupal then you need to check that they're not already quoted in the input. So then you're stuck with trying to work out what the input means:
That's why I now think the current code is fine and the import fields really just want to allow whatever the content type input form would allow... along with a note somewhere explaining this.