Use spaces for free tagging (usability)
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | taxonomy.module |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | won't fix |
The free tagger in taxonomy.module requires that you use commas to separate terms. IMO this is a rather serious usability issue, as virtually every other tagging solution out there uses spaces. At the very least this will lead to annoyance or confusion for new users.
The attached patch changes this in a way which should please nearly everyone. The system now accepts both commas and spaces, though the commas are now undocumented (to keep things simple). This is acceptable because we can expect terms never to end in or consist of only a comma.
The only situation which changes is that the form foo bar baz is now three tags, instead of just one. To use spaces (or commas) inside a term, you must surround it with quotes, e.g. "foo bar" baz.
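To make the proposed syntax concrete, here is a minimal sketch in Python. It is not the code from tagging.patch; `parse_tags` and its regular expression are hypothetical names chosen for illustration, and the sketch assumes only straight double quotes are used for quoting.

```python
import re

# Hypothetical tokenizer, not taken from tagging.patch.
# A token is either a double-quoted run or a run of characters
# that contains neither whitespace nor a comma.
_TOKEN = re.compile(r'"([^"]*)"|([^\s,]+)')


def parse_tags(text):
    """Split a tag string on spaces or commas, keeping quoted terms whole."""
    tags = []
    for quoted, bare in _TOKEN.findall(text):
        term = (quoted or bare).strip()
        # Skip empty terms and drop exact duplicates.
        if term and term not in tags:
            tags.append(term)
    return tags


print(parse_tags('foo bar baz'))
print(parse_tags('"foo bar" baz'))
```

With this sketch, `foo bar baz` yields three tags and `"foo bar" baz` yields two, matching the behaviour described above.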
Note that this is automatically backwards compatible, as the syntax is only used to edit tags. The tags themselves are always stored individually. So, when you go edit an old node, you automatically get spaces instead of commas.
| Attachment | Size |
|---|---|
| tagging.patch | 3.8 KB |

#1
#2
Because I don't use any of those other tagging things and have only used Drupal, I'm used to commas. So therefore when I enter a tag string as:
yes, no, maybe, maybe not
It's confusing to me when the tags end up:
maybe | no | not | yes
It would be great if the regex could be "smart" enough to figure out which method I'm using (comma-separated or space-separated) and adjust accordingly, and support either spaces or commas so both "types" of people could be satiated. I use a lot of multi-word tags and it's going to be a rather vexing transition to double-quote them. :(
However, if this is the "norm" then I guess I can adjust. :P
#3
This is a tough one. Drupal's approach makes multi-word tags a LOT simpler than other sites like delicious or flickr. However, pretty much every other site has moved to space-separated-tags with double-quotes being used to surround multi-word tags. On a couple of the sites I run I regularly have to clean up after users who enter in tags the way they're accustomed to, creating one gigantic mega-tag in Drupal.
#4
What I was thinking is something like:
- Are there commas in this tag string?
  - If so, are they enclosed in double-quotes?
    - If so, separate by space.
    - If not, separate by comma.
Then we satisfy both "camps" :)
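The decision steps above could be sketched roughly as follows. This is one reading of the idea, in Python rather than Drupal's PHP, and `has_unquoted_comma` and `smart_split` are hypothetical names. It assumes a string with no commas at all is split on spaces.

```python
import re


def has_unquoted_comma(text):
    """True if a comma appears anywhere outside double quotes."""
    # Drop every double-quoted run, then look at what is left.
    return "," in re.sub(r'"[^"]*"', "", text)


def smart_split(text):
    """Split on commas if any unquoted comma exists, otherwise on spaces."""
    if has_unquoted_comma(text):
        pattern = r'"([^"]*)"|([^,"]+)'   # comma-separated mode
    else:
        pattern = r'"([^"]*)"|([^\s"]+)'  # space-separated mode
    tags = []
    for quoted, bare in re.findall(pattern, text):
        term = (quoted or bare).strip()
        if term:
            tags.append(term)
    return tags


print(smart_split('yes, no, maybe, maybe not'))
print(smart_split('apple banana citrus'))
```

Under this reading, `yes, no, maybe, maybe not` keeps `maybe not` as a single tag, while `apple banana citrus` becomes three tags.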
#5
I developed the free tagging system for Drupal.
The use of commas is deliberate and these same concerns were weighed when the patch was originally submitted (certainly check the development list archives) - back in the mean old days of the original folksonomy implementations, there was never the ability to use quoted words - it was ALWAYS one word per delimiter (and because of this, the type of delimiter used had far less weight or literal importance). This started causing all sorts of fun hacks in an attempt to work around the phrase issue (and these have evolved into their own meta-language, with stuff like "people:unconed food:banana" -- native support for this syntax, again in a more advanced way, was also considered as part of the original patch). The fact that people are used to using spaces is a legacy-support issue, not a usability one - it is far more natural to treat comma delimiters as a list, not spaces -- the very idea is built into the English language. Commas almost never delimit a set of words and, when they do, it is in a naturally occurring, and readily recognizable, form (such as "Michael, Jr", "Concord, NH", or "Drupal, LLC").
If we were heavily concerned about comparing our free tagging to "folksonomy" systems (which, if you read the original development threads regarding this feature, our folksonomy is /an entirely different design/ than most folksonomies, which are "owned" and user-specific vocabularies), then we would have named it "folksonomy" and not the more generic "free tagging".
It'd be interesting to explore some sort of Eaton-friendly logic:
if tags are len() > 50, then space-spliced, else comma-spliced
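One possible reading of that rule, sketched in Python: split on commas first, and if any resulting tag is longer than 50 characters, assume the user meant spaces. What exactly `len()` is measured on is not spelled out above, so the per-tag interpretation and the name `length_heuristic_split` are assumptions.

```python
MAX_TAG_LENGTH = 50  # threshold from the comment; what it measures is assumed


def length_heuristic_split(text, limit=MAX_TAG_LENGTH):
    """Comma-split unless that produces an implausibly long tag."""
    by_comma = [t.strip() for t in text.split(",") if t.strip()]
    if any(len(t) > limit for t in by_comma):
        # A very long "tag" probably means the user separated with spaces.
        return text.replace(",", " ").split()
    return by_comma


print(length_heuristic_split("Concord NH, Drupal"))
```

This sketch ignores quoting entirely; it only illustrates the length check.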
#6
Having a check to see which separators are used is not a good solution, because the syntax is only used for editing. When you save the node, the tags are split up, and only their content is saved. When you go back to edit the node, Drupal reassembles the string from the tags. At that point, we don't know what syntax the user used. For example, this procedure also removes unnecessary quotes around tags.
Yes, this goes against the principle of preserving user input as typed, but it's hard to avoid without coming up with a completely different UI, or saving the terms string as-typed somewhere.
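A rough sketch of the reassembly step described here, assuming the space-separated syntax from the patch. `implode_tags` is a hypothetical name, not the function in taxonomy.module, and the sketch does not handle tags that themselves contain double quotes.

```python
def implode_tags(tags, separator=" "):
    """Rebuild the edit-form string from individually stored tags."""
    parts = []
    for tag in tags:
        # Quote only when the tag would otherwise be split apart.
        if " " in tag or "," in tag:
            parts.append('"' + tag + '"')
        else:
            parts.append(tag)
    return separator.join(parts)


# The stored tags carry no trace of how the user originally typed them.
print(implode_tags(["foo bar", "baz"]))
```

Because only the tag contents are stored, a user who typed `"foo" bar` would get `foo bar` back on edit: the unnecessary quotes are gone, and so is any hint of which separator was used.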
#7
Ignore that patch, it got swiped from another issue I was submitting :/.
#8
Time to resurrect this with some *real* user data.
First off, I agree with Steven, et al.... using *spaces* is becoming the standard. Why oh why, should Drupal do this differently? We should not fight everyone else, it leads to tons of confusion for our users.
Second, I like the idea of supporting both spaces and commas, but we should add semi-colons too if we're going to have a flexible system.
Third, I like webchick's idea -- if there are spaces, use quotes, if not, then break only on the comma or semi-colon.
That would solve this problem, IMO.
Here's why... I'm running some very large sites, with user databases in excess of 500,000 in some cases. The tags that people enter would drive you nuts!!!
Breaking up tags by spaces and semi-colons and commas (one of those 3 systems) would fix nearly ALL edge cases. The only edge case it doesn't fix is the user that enters "this review is about my cousin bob" -- the ones that think tags are sentences, but in that case, if we break on spaces, you have a bunch of those words which then eventually sink to the bottom, instead of some huge ass string which causes some oddities.
#9
Sorry, did not see Steven's remark about commas versus spaces.
I do agree with that point -- what I mean is, we should allow spaces, commas, or semi-colons for input -- like Steven's original patch, and break on any of those 3 conditions. When you go back to edit the node, of course only spaces will be shown, which makes 100% sense.
I'll roll an updated patch in a few.
#10
Ok seems like this patch still applies :-)
The only changes I would advocate: catch users that enter comma and semi-colon separated lists and process those correctly. But upon edit, only show them separated by spaces as it should be -- that would provide maximum amount of usability but keep us consistent with other systems specifying "space" instead.
#11
I think webchick's logic in #4 might work, but otherwise, I agree with Morbus that separating by spaces is unnatural and that commas feel good. If we really wanted to fix it and remove ambiguity, we'd have users add tags one at a time (Flickr style) using AJAX if available and CCK-like multiple fields otherwise.
#12
I think a flexible system that allows either spaces, commas, or semi-colons is the best system. However, we should promote spaces as that is the standard but still be flexible with input.
Additionally, we need to lowercase inputted tags too, for consistency with other systems.
#13
*Need* to lowercase? What the frell are you talking about? Why would I want "Concord, NH", "Business Name, LLC", "eBay" and any number of other legitimate capitalizations munged by software that doesn't know any better?
#14
I agree w/ Morbus, I have to say... both on the commas and the "Leaving Stuff the Way I Typed it." However, that's because my use-case for tagging is a convenient way to categorize things that I've written myself.
Social networking sites with thousands of users tagging various bits of content, on the other hand, want it to work the other way. They want "Apples" and "apples" and "ApPlEs" and "APPLES" to be treated the same by the system, and to only show one version (probably all lower-case, so it's consistent). They also want the tagging mechanisms to mimic those in other popular social networking sites such as flickr, which means separating by spaces. And I can see that use case as well.
Sounds like we need:
- A means to store the string as it was originally entered - another table?
- A "Smart Parser" that can figure out whether said string was separated with commas, spaces, or semi-colons.
- (if it doesn't already) A tagging system that treats entered tags in a case-insensitive manner. I'm trying to envision a use case where I want "ebay" and "eBay" to refer to separate things, and can't really think of one.
- Make sure there are proper classes with which to override the display of tags, both on nodes themselves as well as in autocomplete listings, so they can be made all lowercase or uppercase or whatever suits the purposes of the site.
This also sounds like we're WAY over and beyond what we can get into 5.x during a code freeze, so bumping to 6.x. Sorry. :\
#15
Um, free tagging is already case insensitive and case preserving. Whichever casing is used when the tag is created, will be used from that point on.
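For readers unfamiliar with the distinction, a small sketch of "case insensitive and case preserving" in Python. The `TagStore` class is hypothetical and only illustrates the behaviour described; it is not how taxonomy.module stores terms.

```python
class TagStore:
    """Looks tags up case-insensitively, keeps the first casing seen."""

    def __init__(self):
        self._by_key = {}

    def add(self, name):
        # The case-folded form is the lookup key; the original spelling
        # of the first occurrence is what gets stored and returned.
        return self._by_key.setdefault(name.casefold(), name)

    def names(self):
        return list(self._by_key.values())


store = TagStore()
store.add("eBay")
store.add("ebay")   # matches the existing tag, no new entry
print(store.names())
```

Whoever creates the tag first decides its casing; later spellings resolve to the same tag.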
#16
I'm with John Gruber on this one, taken from http://daringfireball.net/2006/11/stikkit
#17
This is almost certainly a dead issue because we're so late in the release cycle, but as long as we're pulling out quotes from bloggers, I'll cite Jakob Nielsen.
Like it or not, the use of spaces to separate tags -- and the use of quotation marks to indicate fully quoted strings -- is emerging as the 'standard' form of free tagging.
At worst, a configurable 'how should tags be split' setting in the taxonomy module would help. People trying to build high traffic sites with Drupal need to be concerned about that stuff, and they're not likely to dive in and hack taxonomy.module to make it happen. But again, that's likely not an issue for 5.
#18
Does "usability" equal "common practice"? I don't think so. Regular users tend to dislike the idea of one-word-per-tag, and that's right. I really doubt that using quotes is easier than the currently used tag separators; it will inevitably result in missing or misplaced quotation marks, which adds more headaches for the web site's admin. You'll have a hard time explaining to your users that "bloggers all over the world prefer this" (btw, blogger.com uses commas), and I don't think that Drupal is blog oriented.
In real life, you don't enumerate things with quotation marks and spaces, isn't usability all about gaining more by learning less?
#19
I agree with you, umonkey.
Here may be a case of other people are wrong and Drupal is right.
I have Delicious installed as an "extension" in my Firefox. Delicious is so great, hu ?
I click on my "Tag" in my Firefox menu. I have a field for tags.
Next to it, here's the frikin instruction : "space separated".
I often want to use 2 words for one tag, so I don't even know how to enter multiple-word tags in Delicious.
Talk about usability.
I agree with people who think it's more natural to use "comma" to separate ideas... not space.
Also the use of quotes have always bothered me. There's a language issue. In french (I am french) we don't use the same quotes as English people do. So if I were to use quotes, I would think :
Can I use this : «
Or that : "
Or this one : '
Or that one : “
And you need two quotes for one tag, to begin it and end it, so do my quotation marks "match" ?
So Jakob's Law of the Internet User Experience is users spend most of their time on other websites.
Jakob Nielsen probably has attention deficit disorder and thinks everyone is like him. Reading him makes me want to reach for Prozac. He's right about many things but he's so, very, extremely negative. People who write books such as "Don't make me think"...! arggghhh It's so condescending! For Nielsen, people are dumb. They don't want usability, they want to browse the Net half-asleep and are not interested in reading nor learning anything. And... ok that felt good ;)
#20
If people as smart as Morbus Iff had written the "popular" sites in question, they'd be doing it the right way (which, I believe, Drupal already does).
#21
For the web? Where you only control YOUR web site, and other developers control the broader environment in which your site lives? Yes. If everyone else does it one way, and you do it another, you will confuse your users. Unless you have a compelling reason to do it your way -- something more compelling than a strong opinion about how you prefer things -- then you do it the way other folks are doing it.
Honestly, I don't care that much one way or another regarding spaces or commas. But suggesting users who expect things to work in the expected way are 'lazy' and stupid... well, that's not exactly compelling. ;-)
My headaches come because my users come on, are used to delicious and other popular sites, and enter long lists of tags without commas, generating huge and unusable 5-word taxonomy terms. I think a behaviour toggle would be nice, at least at the site configuration level, but so be it. Maybe my users are all just stupid, as you suggest. :-)
#22
Drupal does not tell the user how to separate tags. There is a "help text" property for vocabularies, use it. If you don't give your users a clue, what else do you expect? Without a hint they can only try and see if it works. The absence of this hint is a much worse usability problem of your web site.
I didn't write about someone's mental abilities, but there are people who use delicious and the likes, and the ones who don't (the vast majority, I think). The latter tend to expect things to work the natural way, and they don't care about what "other developers" think.
Also see comment #19, in some languages quotes are different. In some languages you have to change the keyboard layout to English to type a quotation mark, then back. If a user has to press five times more buttons, there's a chance she'll just give up (and I can't say I wouldn't).
Finally, does ["a b" "c d" "e f" "g h"] look as good as [a b, c d, e f, g h]?
I don't think there's much to discuss. I'm against this change, if my opinion is to be considered. And I'm absolutely against adding a configuration option for that. I want Drupal to continue getting leaner and meaner, I don't want it to suffer from the checkbox hell.
#23
I'm okay either way on this, and think if this issue is so split, then one more checkbox in a taxonomy settings page would not send us into checkbox purgatory, but rather give site admins more flexibility. The 20 lines of code for this setting won't make us re-adjust to the next notch in our belt either IMO. :P
#24
-1 for switching commas to spaces altogether. I agree that this was probably originally done at delicious and so on because it was easier coding-wise (no hairy %20 in URLs to worry about), not because of any grand master usability plan.
-1 for a configuration option. You wouldn't want this site-wide, you'd want it per user, which means adding another widget to the user profile section that's already got all kinds of stuff in it.
+1 for a smarter regex that works along the lines of #4. The only bad thing about this option is it requires storing the originally entered string somewhere; however, we don't filter anything else on input, so it doesn't really make sense to do it here either.
#25
-1 to get rid of commas completely - I agree with webchick. It should be possible for drupal to figure it out. Also, I can add a good example where tags use commas - Google. Otherwise, it is just impossible to add multi-word tags.
+1 for the "smart" system.
just to add my 0.02
#26
I like commas based on what I'm used to (Google reader).
#27
I'm coming into this late as I just saw it mentioned on IRC. I am working on a site that will have a bunch of free tagging. Since I don't have users, yet, I can't speak to what they are used to but I know space separated tags drive me nuts on other sites. If this gets changed I, as a web admin, definitely want some sort of option or regex. Having Drupal figure it out sounds ideal but, if that can't be done, +1 from me on a config option. It doesn't need to be per user as (webchick?) said. Site wide would be fine by me. I'd set it to commas and put that in the help text for the field.
I love that Drupal does comma separated and am against any change that doesn't give the option to keep using commas.
Michelle
#28
I like commas the way they are now. I don't want two words to be two tags when I mean they should be a single tag.
Space forces a "one word, one tag" paradigm, even if we leave comma as an undocumented feature.
So, -1 from me on the functionality.
#29
Autopatch Results for tagging.patch:
patching file modules/taxonomy/taxonomy.module
Hunk #1 succeeded at 167 (offset 23 lines).
Hunk #2 FAILED at 656.
Hunk #3 FAILED at 772.
Hunk #4 FAILED at 1430.
Hunk #5 succeeded at 1487 with fuzz 2 (offset 71 lines).
3 out of 5 hunks FAILED -- saving rejects to file modules/taxonomy/taxonomy.module.rej
#30
+1 for using spaces over commas.
Power-users of Del.icio.us and Flickr have gotten very used to using spaces over commas, and this is an important usability issue that I've hacked core in the past to get around.
#31
I'm in support of comma delimiters.
Far too often have I had to add a multi-word tag to systems like last.fm or Slashdot and either have it split up or squash the words together. I was hesitant to try out free tagging at all because I thought it might work the same way and was relieved to find it used commas.
Even the "unsightly %20" argument doesn't have weight as spaces are encoded by a clean "+" almost everywhere now. So if there is to be a configuration option, I'd wish one of the choices to be "ignore spaces as part of tags, separate only by comma" - the other two might be "separate only by space" or "separate by space if no commas exist in the text"...
Oh, and of course the help text absolutely has to explain whatever format is being used, whatever else happens.
#32
I like using commas, but I understand the sought for spaces. I suggest we introduce a setting to control this.
#33
massive -1 to spaces and forcing to one tag per word by default, the tags we get are bad enough to clean up as it is, this would kill us; and if you use youtube, the way it separates out tags makes lots of them completely useless to filter by. We're a medium sized site, but we're not free-for-all social networking and don't use tags like them, it's just a convenient way to categorise and list content.
Like Arancaytar said, an admin option would be fine though, and the three options seem reasonable with some documentation and a sensible default.
#34
As Steven has pointed out in the original patch comments, the patch does not change the data or database table itself. It only changes the way newly added tags by users are processed.
Is it possible to provide a toggle for the admin to choose a delimiter method, whether it be comma, space or any other delimiter?
#35
Definitely should be. I just wonder what the default should be. Smart separation sounds best to me, actually: You can usually rely on it. "apple banana citrus" are probably meant to be separate tags, "apple banana,citrus" is meant to group "apple banana" and not "banana,citrus". I doubt that any keyword could conceivably contain a comma, so a single comma should cause the delimiter to be switched to commas.
I suggest allowing quotes to escape spaces though, because using only a single, multi-word tag would otherwise be impossible.
Can this be patched in time for the freeze?
#36
Addendum - re "any other delimiter".
Is this worth the extra coding and extra form field? I can potentially see a border case where someone would want to use ; or even / if it gets really crazy, but beyond that it sounds like overdone flexibility to me...
#37
"I doubt that any keyword could conceivably contain a comma"
Smith, John
Marx, Karl
Lincoln, Abraham
would have to allow for "Smith, John" to enable that, but it's definitely a valid use case. We had a non-free-tagging vocabulary in the "Smith, John" format, and converted all the terms by hand to "John Smith" to avoid user input errors when we changed to free tagging.
#38
Whoops, indeed. I forgot that part (and for a sentence that contains "I doubt that ... conceivably ..." I deserved no better.)
So, allow for escaping quotes in all cases. Incidentally, what should happen if Smith, John (without quotes) were to be entered when the mode is set to "separate by spaces only" - this?
- Smith,
- John
#39
I don't know, but that's just given me a headache even thinking about it!
Probably it should strip the comma? With spaces only, I think people used to Drupal's system would keep typing commas out of habit.
"Smith, John" would still work right?
#40
Yes, perhaps non-alphanumeric characters should be allowed within words but trimmed away on the ends.
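A minimal sketch of that trimming rule in Python, assuming "non-alphanumeric" means anything the regex class `\W` matches (so underscores would be kept). `trim_tag` is a hypothetical helper name.

```python
import re


def trim_tag(tag):
    """Strip non-word characters from both ends, keep them inside."""
    return re.sub(r"^\W+|\W+$", "", tag)


# Splitting "Smith, John" on spaces, then trimming each piece.
print([trim_tag(t) for t in "Smith, John".split()])
print(trim_tag("Smith,John"))
```

In "spaces only" mode this would turn `Smith,` into `Smith`, while leaving the comma inside `Smith,John` alone.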
Smith,John would have to be parsed as one single tag in "spaces only" mode, though - no way around it, otherwise the option makes no sense. Still, because this may be prevalent, I suggest that the default setting be to use commas if they exist (outside of quotes) and to fall back to spaces otherwise - ie. the "flexible" mode.
#41
@all: Ever logged on to Windows with CAPS-LOCK enabled? You'll get a nice speech balloon telling you that your password could be wrong since you're typing all upper-case. It's not a popup window and it's not preventing you from typing further, but you'll immediately recognize it.
We are talking about usability and misinformation here. We all agree that commas are better than spaces. So why stick to spaces at all? The space is not the cause. Instead, a user just needs that important info. Let's use jQuery to attach a nifty DIV with a notice, clarifying that tags are separated by commas, if a user types 'foo ' instead of 'foo,'. And hide that notice on leaving the field.
The required regexp could certainly be enhanced to not show that help/notice if a user typed '"foo "'.
We are doing similar things with the new user password field validator, so why not also here?
If you like to, I'd be open to design such a speech balloon.
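The decision behind such a notice might look something like the sketch below. It is written in Python only to show the logic; the proposal itself is for jQuery in the browser, and `should_show_comma_hint` is a hypothetical name. It assumes straight double quotes and treats an odd number of quotes as "still inside a quoted tag".

```python
def should_show_comma_hint(value):
    """Decide whether to hint that tags are comma-separated."""
    if not value.endswith(" "):
        return False
    # An odd number of double quotes means the user is inside a quoted tag.
    if value.count('"') % 2 == 1:
        return False
    before = value.rstrip(" ")
    # A space after a comma is normal; a space after a word triggers the hint.
    return bool(before) and not before.endswith(",")


print(should_show_comma_hint("foo "))
print(should_show_comma_hint("foo, "))
```

Note that this check cannot tell a finished one-word tag from the first word of a multi-word tag, so it would also fire while someone is typing a phrase.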
#42
sun, that sounds like a great idea to me. It's also a usability improvement that doesn't change existing functionality so would be great for D6.
#43
OMG. That would be so incredibly annoying!
I frequently use tags like "Google Summer of Code," which means that basically all of my posts would get this silly notice, when I already know very well how tags are entered. No thank you.
A warning like that should only pop up if it's at all reasonable to expect that someone has made a mistake. "foo " is no way of indicating that someone has made a mistake. "foo bannana monkey dishwasher" might be, but how does the code differentiate those four words separated by spaces from "Google Summer of Code" which is intended?
#44
I've argued for space-delimited tags in the past, but it appears more and more services are using commas -- it's no longer quite as confusing. In the future, being able to switch the delimiting behavior on the admin side MIGHT be useful, but I don't think that it's as big of a win as it might have been a year or so ago.
I'm in favor of keeping the current behavior.
#45
also, I would read any content tagged "foo bannana monkey dishwasher".
#46
That's a good point. Given Eaton's comments, if other sites are using commas again now, there doesn't seem to be any reason to change the current behaviour.
#47
To be revisited in D7.
#48
Pheeww! Lots of strongly held beliefs in this thread.
All I want to contribute is that I started entering one word tags separated by space and it wouldn't let me. I was surprised, a bit miffed, and disappointed that I have to "read instructions" instead of doing things intuitively.
Really, speaking strictly from a user's point of view, if I enter one word tags, Drupal should not care one way or other whether I enter a space or a comma between tags. So why penalize me or annoy me with an error message?
From the developer's point of view, Drupal cannot guess which way the user is going. I suggest a simple check: the presence of at least one comma would tip Drupal off whether the user is separating by comma or space.
I don't consider embedded comma in a tag such as Concord, NH as a valid tag. People searching for Concord, NH will most probably type concord nh or concord "new haven" or concord, new haven, i.e. 2 tags instead of one tag with an embedded comma.
Standards evolve, depending on the popularity of the sites. If an immensely popular site uses comma to separate tags, people get used to it and get upset if other sites do not follow. If, instead, space rules, then ditto.
It would be great if D7 could handle both space and comma as separator. Or, at the least, give the Admin the choice to decide which way to go (after all, users do not know or care we are using Drupal behind the scenes).
Anyway, my 2c worth.
Love what you guys are doing!
vuxes
#49
Personally I think the best way to handle this would be to handle both transparently (this may or may not conflict with something I said earlier in the issue).
"Concord, NH" is valid IMO - especially if it's "Lincoln, Abraham", but it only works now if quotes are used anyway. I don't see any reason why it couldn't also work with space separators - just a slightly more complex regexp.
#50
Heh, long-running thread resurfaces :-)
my $2 :
-1 on spaces.
Commas are there to break up lists. It's what they do.
Spaces separate words, but not concepts.
Folksonomy tagged sites like youtube and Flickr made a mistake in the early days and now have fragmented, unusable tags or munged comboWordsToMakeAPhrase. They are wrong and have probably regretted it. That's no compelling reason to copy them.
If I tag a blog post [The Big Day] I do not want the system to think I mean [The, Big, Day]
Meh.
#51
nth'd, dman.
Second-guessing the user's input is silly there, regardless of whether it results in warnings or silent re-interpretations of your input. At most there could be a settings option, but even that strikes me as somewhat over-engineered. Half the community tagging sites I know tag with comma separation without a big deal being made of it. Just make sure to document the expected format underneath the text field so nobody gets surprised if they're used to something different.
#52
What was I thinking, dman hit the nail on the head - I really, really hate this "every" "word" "is" "a" "tag" behaviour so must have been going mad on March 1st.
I don't see any reason this couldn't be done in contrib, so going to won't fix it.
#53
YouTube uses spaces for tagging. Quotes can control it; sometimes free tagging is used as keywords.
Also, comma is not the same for other languages, in my language, the comma is written like ، not ,, and free tagging does not recognize ، .
So I think spaces will serve more than comma!
#54
@Marat: Your comment suggests a localization of the separator more than a change to another separator. I believe spaces aren't used in some Asian languages, so what do we do about those? I think the comma is fine for English as it makes semantic sense and is easily understood. I think we should just consider localizing this as an option.
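As a rough illustration of what a localized separator could mean, here is a Python sketch that accepts several comma characters at once. The set of separators and the name `split_localized` are assumptions for this example; the code points listed are the ASCII comma, the Arabic comma (U+060C), the ideographic comma (U+3001) and the fullwidth comma (U+FF0C).

```python
import re

# Assumed separator set for illustration; a real site would configure this.
SEPARATORS = (",", "\u060C", "\u3001", "\uFF0C")


def split_localized(text, separators=SEPARATORS):
    """Split a tag string on any of the given comma characters."""
    pattern = "[" + re.escape("".join(separators)) + "]"
    return [t.strip() for t in re.split(pattern, text) if t.strip()]


print(split_localized("one\u060C two, three"))
```

Spaces stay inside tags here, so multi-word tags keep working whichever comma the user's keyboard produces.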
#55
Please open a new issue for localizing the separator. If this were configurable in some way then we wouldn't have to agree, right? You could have spaces, Bob could use commas, and I could use badgers. Should become a new issue, though.