Problem/Motivation
The Drupal user interface should use the UTF-8 ellipsis character (…, U+2026) instead of "..." strings wherever the intent is to signify an ellipsis: the omission from speech or writing of a word or words that are superfluous or can be understood from contextual clues.
It has also been noted in discussion that parts of the user interface would gain legibility and aesthetics by replacing many existing "..." instances with a single ".".
Proposed resolution
Remove superfluous "..." instances. Update the relevant policies to formally reflect the decision to introduce ellipsis characters, then roll a patch.
Remaining tasks
- Find or write a command to search for the instances that should be changed.
- Get Views to truncate HTML text by using something in core - #2279655: Add a way to truncate HTML strings without counting or damaging HTML elements (and use it in Views) - Html::truncate()
- Create child issues in some sensible scope
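For the search command, one option is a small script that walks a checkout and reports lines where "..." appears inside a quoted string literal, which is where user-facing text usually lives. This is a rough sketch, not an agreed tool: the function name find_triple_dots(), the list of file extensions and the one-line regex for quoted literals are all assumptions. It misses strings that span lines and can report false positives, so its output is a list to review by hand, not a list to change automatically.

```python
import os
import re
from typing import Iterator, Tuple

# Rough heuristic (an assumption, not a Drupal standard): a single- or
# double-quoted literal on one line that contains three consecutive dots.
# Escaped quotes inside the literal are not handled.
QUOTED_TRIPLE_DOTS = re.compile(r"""(['"])(?:(?!\1).)*?\.\.\.(?:(?!\1).)*?\1""")

# Hypothetical choice of file types to scan.
EXTENSIONS = (".php", ".module", ".inc", ".install", ".theme", ".twig", ".js", ".yml")


def find_triple_dots(root: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line with "..." in a quoted string."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if QUOTED_TRIPLE_DOTS.search(line):
                        yield path, number, line.rstrip("\n")
```

Assuming a Drupal checkout in the current directory, iterating over find_triple_dots("core") and printing each path, line number and line gives a grep-like report.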
Done
- #2279105: Remove as many "..." and ellipsis characters from the codebase as possible without altering the meaning of text
- #2279617: _filter_url_trim() should use Unicode::truncate()
- #2279623: search_excerpt() should use unicode ellipsis characters instead of "..."
- #2279635: template_preprocess_username() should use Unicode::truncate()
- #2279681: Views' InOperator filter should use Unicode::truncate() in adminSummary()
- Policy updates
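Several of the child issues above replace hand-rolled truncation that appends "..." with Unicode::truncate(). The Python sketch below only illustrates the behaviour being asked for; it is not Drupal's PHP implementation, and the name truncate_with_ellipsis() and its parameters are hypothetical. It counts characters rather than bytes, can optionally back up to a word boundary, and appends a single U+2026 that counts toward the length limit (that last choice is an assumption about the desired behaviour).

```python
ELLIPSIS = "\u2026"  # U+2026 HORIZONTAL ELLIPSIS, a single character


def truncate_with_ellipsis(text: str, max_length: int, wordsafe: bool = False) -> str:
    """Shorten text to at most max_length characters, ending in one ellipsis.

    Python strings are indexed by code point, so a multibyte UTF-8
    character is never cut in half.
    """
    if len(text) <= max_length:
        return text
    if max_length < 1:
        return ""
    # Reserve one character of the budget for the ellipsis itself.
    cut = text[: max_length - 1]
    # text[max_length - 1] is the first character being dropped. If it is not
    # whitespace, the cut lands mid-word, so back up to the last space.
    if wordsafe and not text[max_length - 1].isspace():
        space = cut.rfind(" ")
        if space > 0:
            cut = cut[:space]
    return cut.rstrip() + ELLIPSIS
```

With wordsafe enabled, "Hello world" limited to 8 characters becomes "Hello…" rather than "Hello w…".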
User interface changes
All "..." strings in the user interface that could appropriately be replaced with an ellipsis character will be replaced.
API changes
None
Original report by @m3avrck
Just as the title suggests, this patch replaces all '...' in core with the proper ellipse character. This character is already used sporadically in core (for example, in pager.inc), and this patch just fixes the rest of core to be consistent. Plus, the ellipse character looks sexier than 3 periods in a row ;-)
Comment | File | Size | Author |
---|---|---|---|
#55 | drupal_48-44987-55.patch | 14.55 KB | mgifford |
#53 | drupal_48-44987-53.patch | 11.19 KB | mgifford |
#51 | drupal_48-44987-51.patch | 12.19 KB | mgifford |
#30 | drupal_48.patch | 6.45 KB | m3avrck |
#29 | drupal_47.patch | 7.17 KB | m3avrck |
Comments
Comment #1
ixis.dylan: I'm seeing "â€" in your patch. Shouldn't that be "…" or "…", or is this some Unicode weirdness?
Comment #2
ixis.dylan: Err. I meant:
Comment #3
m3avrck: No, that weird character is actually correct :-) And no, we shouldn't use HTML entities, as explained by Steven (the UTF guru) here: http://drupal.org/node/44498#comment-65823
If you apply the patch and then load up Drupal in your browser, everything will look great. The reason you see weird characters is that, depending on your editor, you may or may not be in UTF-8 mode.
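The "weird characters" are what the UTF-8 bytes of the ellipsis look like when an editor reads them in a legacy single-byte encoding. A short Python illustration, assuming the editor falls back to Windows-1252 (cp1252), which is a common case but not necessarily the one in this thread:

```python
ELLIPSIS = "\u2026"  # U+2026 HORIZONTAL ELLIPSIS

# In UTF-8 the ellipsis is one character but three bytes.
utf8_bytes = ELLIPSIS.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\xa6'

# An editor that assumes Windows-1252 shows each byte as a separate character.
misread = utf8_bytes.decode("cp1252")
assert misread == "\u00e2\u20ac\u00a6"  # the three characters "â€¦"

# Nothing is lost yet: encoding back to cp1252 and decoding as UTF-8 recovers it.
assert misread.encode("cp1252").decode("utf-8") == ELLIPSIS
```

The damage only becomes permanent if the editor saves the misread characters back to the file in a different encoding.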
Comment #4
Morbus Iff: I disagree with this patch, but it's hard for me to say why. I see the "proper ellipses character" as equivalent to asking Drupal to support curly quotes in all of its output that contains quotes (a lot). Or a proper em dash. And so on. -1.
Comment #5
Gerhard Killesreiter: I am all for nice typography, so I +1 this patch. However, I also believe that all output should be themable. That is, we need a theme_ellipsis() function that defaults to a proper ellipsis.
Comment #6
m3avrck: Why not? There are already some of these in core (like pager.inc), and it just makes things look better, not to mention it improves semantics. It doesn't *interfere* with anything. Is something wrong with more aesthetically pleasing and semantically better output? Note that this only fixes things that are hardcoded; user input is obviously another case with a different scope.
As for the curly quotes, that is another issue, and a purely aesthetic one: one style of quote versus another. The ellipses issue is about semantics: 3 periods != an ellipse. They are different characters, although they might look the same. As screen readers and other accessibility tools become more powerful, I'm quite sure they'll be able to make this distinction, if they don't already. 3 periods could come up as 3 empty sentences, while an actual ellipse character has meaning. Sure, you could program it either way, but why not just be semantically correct from the start?
I'm just trying to make core more semantically correct; feel free to set this to 'won't fix' or whatever you like.
Comment #7
m3avrck: @killes, do we really need a theme_ellipses()? It seems like overkill to me. Why would you want to override this, or what would you set it to? It seems like we might then need theme_quotes() or theme_em_dash() too? Hmm... interesting, themable typography functions...
Comment #8
Gerhard Killesreiter: Maybe a theme function is wrong. Maybe it should be wrapped in t()? I have no idea whether the ellipsis is used in, say, Chinese.
Comment #9
m3avrck: Hmmm, perhaps t() would work, although as a UTF-8 character I don't see it being a problem. Maybe our resident UTF-8 guru can chime in, *cough* Steven *cough* ;-)
Comment #10
m3avrck: Hmm, OK, after talking with killes on IRC I see I misunderstood. Yes, I do see the benefits of using t(), so any occurrences of ellipses can be taken out if that country's language doesn't understand the meaning of the ellipse. As such, that character should be in a t() string if it isn't already. This patch fixes that situation.
However, I can still see the case for the original patch with *no* t()... but here are both for debate; hopefully one goes in :-)
Comment #11
ixis.dylan: I understand the UTF-8 situation now, so no arguments about that. However, there seem to be a lot of cases where ellipses are used without a good reason (unless I don't understand where they should be used, of course):
"Starting updates..."
"more help..."
Do phrases like this deserve an ellipsis? English-speaking people will probably understand what the dots mean, but I think the "standard" usage of the ellipsis is to indicate an omission in a sentence or an incomplete statement. It's a minor point, but if we're taking them seriously enough to replace them with the correct symbols, then perhaps we should ensure that they're necessary in the first place?
Shortening a long URL to "www.domain.com...something.html" or omitting words from a quote are valid uses, but using an ellipsis to indicate a pause at the end of a sentence? I think we should check this first, or we may be making the text less structurally valid.
Comment #12
m3avrck: @leafish_dylan, those are very valid comments.
That is why I introduced the last patch using t() instead. By doing this, countries that don't use ellipses can replace that character with an empty space or a different character, and it will be substituted in. I think this would solve all cases.
Comment #13
Morbus Iff: Quite frankly, "..." (three dots) is wrong, and so is the single-character representation. Prentice Hall's Handbook for Writers, Eleventh Edition, states that an ellipsis mark is "three spaced periods . . ." and that "for an omission within a sentence, use three spaced periods, leaving a space before and after each period. When the omission comes at the end, use four periods." Likewise, most explanations of what an ellipsis is for indicate that it replaces missing words, not that it should be used for "trailing off a thought", which is what most of our in-core examples imply. As such, I'm for removing ellipses from core rather than replacing them with a single character that neither visually indicates the space between the periods, as required, nor supports the spaces before and after the character. Likewise, the Chicago Manual of Style addresses ellipses thusly (while also indicating the correct form of an ellipsis): "For manuscripts, inserting an ellipsis character is a workable method, but it is not the preferred method. It is easy enough for a publisher to search for this unique character and replace it with the recommended three periods plus two nonbreaking spaces (. . .)."
I also worry, in a totally unsubstantiated way, that developers whose text editors are set to a different, non-UTF-8 character set, or that don't support UTF-8 at all, will improperly transcode the UTF-8 mark into something else, which will show up in patches and perhaps be committed accidentally.
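At the character level, the two forms debated here differ as follows. In this minimal Python sketch (the function names are hypothetical), the single character U+2026 HORIZONTAL ELLIPSIS is set against the spaced form of three periods joined by two no-break spaces (U+00A0), the "three periods plus two nonbreaking spaces" form quoted above, which cannot wrap across a line break. It also shows that NFKC normalization folds U+2026 back into three plain periods, which matters for how search and other software may treat the two forms.

```python
import unicodedata


def single_char_ellipsis() -> str:
    """One code point: U+2026 HORIZONTAL ELLIPSIS."""
    return "\u2026"


def spaced_ellipsis() -> str:
    """Three FULL STOPs separated by two NO-BREAK SPACEs (U+00A0)."""
    return "\u00a0".join("...")


for form in (single_char_ellipsis(), spaced_ellipsis()):
    # Print the length in code points and the Unicode name of each one.
    print(len(form), [unicodedata.name(ch) for ch in form])

# NFKC compatibility normalization turns U+2026 into three FULL STOPs.
print(unicodedata.normalize("NFKC", single_char_ellipsis()) == "...")
```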
Comment #14
killes@www.drop.org: Now it gets hilarious. Where did you get those recommendations from? Style manuals for typewriters :-?
Comment #15
Morbus Iff: Chicago Manual of Style at Wikipedia.
Comment #16
ixis.dylan: I've seen style guidelines that recommend placing the ellipsis in brackets, which makes sense. That's quite common when ellipses are used to strip irrelevant crap out of a quote or to shorten a long URL/word. I don't recall ever seeing them spaced out, but Morbus is right (IMO) about removing them. Three dots at the end of a sentence aren't an ellipsis; they indicate that the reader should either pause or expect more to come.
Comment #17
m3avrck: Morbus, interesting points, but those apply to the print world; what about the web world? How about this for the print world:
Obviously, this is impossible with HTML, hence the creation of the ellipsis entity: as a single character, it won't break.
So where does that leave us? It looks like we have a print versus web issue here. But not only that: take a look at how often ellipses appear in software, from installers of various products and such.
Perhaps what is "literary correct" is neither "well accepted" nor "well honored" in the tech world. Maybe that is the real debate here :-)
(ps - you gotta admit Drupal has attracted such a great bunch of developers that we can have an in-depth debate about the use of 'ellipses' in core ;-))
Comment #18
Morbus Iff: Well, obviously, the way to duplicate it is with ". . .", and in some of my research earlier today, I saw various documents suggesting just that approach (nothing official or authoritative, which is why I didn't include them in my previous comment; in a nutshell, they discussed the ellipsis and the em dash, and how misused ellipses were taking the intent away from the em dash). However, I've found no authoritative documents suggesting that the single-character entity is acceptable.
Comment #19
Morbus Iff: Here is the non-authoritative source I was talking about in my previous message: "The real problem with ellipsis, though, is whether to use three periods (which risks breaking an ellipsis over a line break) or a single-character entity… like … at the end of the preceding phrase. All modern word processors support the latter, and it's also supported in Unicode and extended ASCII. However, that support is (to use the technical term) butt-ugly: it's invariably too narrow in commonly used typefaces. One solution is a "virtual entity," such as . . ., but this causes problems with some browsers (especially for those not using a US-ASCII default). Here's a plea for web-font designers to expand the three-dot ellipsis to a reasonable width, probably two ems (and at least 1.5 ems)." Here, the commenter is complaining that the single-entity ellipsis is actually too narrow to be easily recognized.
Comment #20
Morbus Iff: As for "literary correct" vs. the online world, that's an age-old (as old as it could be, at least) debate, spanning chat rooms, email, emoticons and abbreviations. Largely, the opinion is that technology is ruining language and grammar. Hell, I wrote about this back in 1998; it's certainly not a new argument, and "txting", cellphones and SMS have only made it worse.
Comment #21
m3avrck: Morbus, you are correct that the current ellipse entity is indeed too small in many fonts; in most cases it should be at least 1.5 times or double its width, as you have mentioned. But the use of ". . ." won't break across lines? I mean, 'non-breaking space', but I never realized that, if that is indeed the case. Regardless, thanks for the insightful information. This leaves us here:
1. The ellipsis entity is not inherently bad, it is just *too* narrow, correct?
2. Ellipses are wrongly used in many cases.
So what to do? I think in many places in core they could be removed, but in a few they shouldn't be. "Making updates..." seems like a valid place: although it is technically wrong, it is widely accepted and used in installers and other waiting-type scenarios.
pager.inc uses ellipses, and *correctly* uses them to indicate breaks, or so it seems.
Elsewhere? Perhaps it's best to just remove them? I wouldn't be against such a patch, as I'm really for *consistency* one way or the other.
Comment #22
Steven: How it looks is up to font designers, not us. It is semantically correct to use the proper character for it. But personally, I've rarely seen spaced ellipses in full text.
As far as ellipsis usage goes, there is the common GUI convention that commands which require more information (and thus pop up a new dialog) end in an ellipsis, e.g. "Save", but "Save as…". This concept is fuzzy on the web, though, and I don't see how it could be applied to Drupal consistently.
There is also the common (ab)use of ellipsis for processes that take a while (like "Starting updates…", "Uploading file..."). I guess this last usage is wrong, and should be indicated with an animation, throbber or progress bar instead.
Comment #23
m3avrck: So perhaps we need a patch that fixes the ellipses in core as I have mentioned: take them out in places where they aren't needed, and replace any with a throbber animation where needed as well?
Would that satisfy all situations?
Comment #24
Steven: As far as I know, all the Ajax uses already have throbbers.
Comment #25
Morbus Iff: I'm just not seeing support for, or use of, this in the wild:
Some further notes:
The two primary goals behind this issue appear to be a) the single-character entity is more semantic, and b) software doesn't know how to parse "..." correctly. Welp, if we change to a single-character ellipsis for that reason, we simply MUST do the same for quotes: Unicode distinguishes between opening and closing quotes, 201C: LEFT DOUBLE QUOTATION MARK (“), 201D: RIGHT DOUBLE QUOTATION MARK (”), 2018: LEFT SINGLE QUOTATION MARK (‘), 2019: RIGHT SINGLE QUOTATION MARK (’), which is far more semantic than using the same character (") for both opening and closing.
Likewise, we also need to look at our use of dashes in core. In my investigation above, I saw FAR more sites using Unicode's 2014: EM DASH (—) than the ellipsis. As such, we should distinguish between that character, the 2013: EN DASH (–), and the innocent hyphen, and make sure em dashes are properly formatted (per the Chicago Manual of Style, there should be no spaces to the left or right of an em dash).
Comment #26
m3avrck: Morbus, good examples. So none of them use the ellipse entity; interesting.
As for the quotes and EM and EN dashes, I agree 100%. If we are to fix ellipses, we should fix those as well. Looks like we need a "literary patch" to be made, huh? That would be the best semantically, IMO.
Comment #27
Zen: lol! This is a pretty off-the-beaten-track issue :)
More fuel for the fire, from an ALA discussion [google cache - site doesn't seem to load here]:
-K
Comment #28
Zen: heh, a nice bug: the patch in the last post was an erroneous one I'd uploaded in preview mode, but not submitted, over 8 hours ago! Please disregard.
Anyways, I'm all for a translatable ellipsis character. +1.
-K
Comment #29
m3avrck: After much debate, here is a new patch that replaces this with t() and also removes a few unneeded ellipses from core where other indications, such as "etc.", should be used.
Comment #30
m3avrck: A better patch, with a few more ellipses removed after talking with Steven.
Comment #31
joe-b: Since this is a discussion about semantics, might I highlight that an ellipse, which is what is being referred to throughout this fascinating discussion, is a regular oval shape, traced by a point moving in a plane so that the sum of its distances from two other points (the foci) is constant, or resulting when a cone is cut by an oblique plane that does not intersect the base.
However, an ellipsis (pl. ellipses) is the omission from speech or writing of a word or words that are superfluous or able to be understood from contextual clues, or a set of dots indicating such an omission.
From one pedant to a bunch of others!
Comment #32
webchick: Good point. Fixing title.
And this is the second most hilarious issue I've ever read (second only to Eaton's trailing-slash Broadway musical), though I have to say I did learn a lot. ;)
Comment #33
drumm: Needs to be updated to HEAD.
Comment #34
catch: Bumping version.
Comment #35
keith.smith: Where has this issue been all my life?
(and subscribe)
Comment #36
Freso: This is probably one of the most important things to get into D7. Subscribing.
Oh, and also: just because other websites aren't using … doesn't mean that Drupal shouldn't. Drupal has a history of being on the "bleeding edge", and while this edge may not be bleeding as much anymore, we'll obviously still be ahead of other sites and software. :p
Comment #37
Garrett Albright: +1 for proper ellipses in core… yes, and educated quotes. Subscribing.
Comment #38
alexanderpas: +1 and subscribing
Comment #39
Damien Tournoud: So, broadening the scope. It is well known that very-large-scope issues that change everything at once have a lot more chance to be committed in less than a million years.
Comment #40
sun: Awesome. Subscribing
Comment #41
wretched sinner - saved by grace: subscribing
Comment #42
David Latapie: +1 for a typography filter, a language-dependent generalisation of this (French requires a lot of non-breaking spaces, for instance). Please let me know if you know of any.
Comment #43
sun
Comment #44
sun
Comment #45
jzacsh: +1
Comment #46
Anonymous (not verified): Following on from #39 and #42, how about configurable support for ligatures in the filter?
It's not very difficult to implement, would have a big "wow" factor for typography geeks, and would impart a subliminal air of subtle sophistication to all Drupal sites.
Comment #47
smscotten: I like this too, but I'm wondering whether an output filter is the best way to make it happen. I guess an output filter allows typographic treatments to be translated for different audiences, but the sort of thing I'd like to see is an input filter: when I submit quote marks in text, they get translated into <q></q> tags, saved into the database, then styled with CSS.
Though as an output filter, this shouldn't be too hard to write as a module outside of core, should it?
Side note: for ligatures, how do alternate ligature characters affect SEO? If I have a site about "waﬄes", will search engines match that when people search for batter-based cakes cooked on a patterned iron? (Not mentioning the word spelled without ligatures so that I can search and see if I find this page.)
Comment #48
smscotten: Forget that I suggested an input filter: http://drupal.org/node/263002
If there were a truly compelling argument for an input filter, I might want to poke that hornets' nest, but it's not worth rehashing an old argument for. Sorry.
Comment #49
mikl: #847608: Use ellipsis character (…) instead of three dots (...) addresses a tiny bit of this.
Comment #50
catch: Just committed #847608: Use ellipsis character (…) instead of three dots (...)
Comment #51
mgifford: This seems like it should be an easy fix: … rather than ...
Most folks won't notice, but those who do will notice a lot.
Comment #53
mgifford: Somehow I left SystemMenuBlock.php in there. Removed it in this patch, as it was a leftover.
Comment #55
mgifford: Hopefully this fixes the test.
Comment #56
thedavidmeister: It's awesome to see that there's a push to improve typography in Drupal, but this thread from 2006 can't be all things to all people.
There's so much going on here:
- Converting triple periods to ellipsis characters
- Discussing an auto-typography function or extending t() somehow
- Dealing with em dashes
- Dealing with left/right quotes
- Removing triple periods from some places in the UI
The scope here is ludicrous, and it's sad to see something as valuable as typography treated as a second-class discussion. In Drupal core issues that we take seriously, we try to get things done, which means sticking to one goal; if someone has a great new idea, we create a new issue for it. If there are lots of related great ideas, we create a meta issue (a typography meta, anyone?).
Let's bring this right back to what the issue summary (which nobody has updated in 8 years) says: dealing with ellipsis characters. I wholeheartedly encourage someone to open new issues for each of the points I summarised above.
Since the latest patch touches comments as well as markup, we need a policy change for both our coding standards and our user interface style guide before we try to roll patches; otherwise we're just going to make things muddy and inconsistent for ourselves.
I don't see a discussion explaining that replacing periods with ellipses is "correct" in any of these places:
- https://drupal.org/node/604342
- https://drupal.org/style-guide/content
- https://drupal.org/coding-standards
- https://drupal.org/coding-standards/docs
- https://drupal.org/style-guide/typography
I can only presume there is no policy, and that any patch we commit here without one will be undone by later patches within 6 months. FWIW, the patch from #55 misses a lot of instances of "..." in the codebase :(
In the interest of reducing the scope, improving DX, and avoiding inconsistent formatting of comments between 3rd-party code and Drupal, I'd suggest we DON'T try to introduce ellipses into code comments and instead focus on interface changes only for this issue.
Comment #57
mgifford: I can confirm that grepping turns up quite a few "..." instances that the patch misses, but mostly in comments and 3rd-party code. I agree we shouldn't bother changing those.
I'm in favour of moving the policy discussion to a new issue, although having some guidance would be useful for the patch.
There are some interesting comments in #25 when it comes to writing a policy about use of typography characters like this.
Looking for other policies like this, I found some examples:
http://wiki.wesnoth.org/Typography_style_guide
A policy should also cover how we deal with spaces: http://english.stackexchange.com/questions/91653/space-before-three-dots
There are fun examples here: http://www.smashingmagazine.com/2011/08/15/mind-your-en-and-em-dashes-ty...
I'd like to see this done consistently if we can. I do think it is a less important issue, though.
I guess it's just part of a good style guide, which should also include things like #2256367: Consistently use "website" instead of "web site" in Drupal Core docs and UI text and #950534: [policy] Consistently use "email" instead of "e-mail" in Drupal, both of which would benefit from having a policy document of some sort.
Comment #58
mgifford
Comment #59
thedavidmeister: I honestly think step 1 might be "write a patch that removes as many ellipses as possible from the codebase, especially in comments". They're sort of awkward and don't add new meaning or extra information, and, as has been pointed out earlier, they aren't friendly to other languages and so are a barrier to entry for ESL developers.
When we have fewer things to deal with, the policy might be easier to make and uphold.
Comment #60
thedavidmeister: Opened #2279105: Remove as many "..." and ellipsis characters from the codebase as possible without altering the meaning of text.
Comment #61
mgifford: No problem with opening a new issue. I thought that was what I'd done in #55, though; I was just trying to remove some.
I'd really like to avoid the problem we've seen with e-mail -> email. This is a trivial change that's been open since 2006. Let's not try to get it perfect. Let's make small incremental moves in this direction and make it a policy so we can slowly eliminate all of them.
This isn't something that really has a huge impact on anything, but it's a simple annoying problem that can be solved.
Comment #62
thedavidmeister: I'm not saying what you did was wrong or bad. I'm just suggesting we give it a new home with a smaller scope. I'm happy to reroll the relevant parts of what you did in #55 and post them to the other issue.
Lol, too late! This issue is from 2006 >.< That's why I'd like to get patches committed by making the scope smaller and focusing on things that people are less likely to disagree with philosophically or to find other problems with.
Comment #63
thedavidmeister: @mgifford, I put a patch up at the issue linked in #60 that removes a bunch of "..." (without replacing them with an ellipsis). I found a few more issues to open, though.
Comment #64
thedavidmeister: Opened #2279617: _filter_url_trim() should use Unicode::truncate(); it covers part of what #55 did.
Comment #65
thedavidmeister
Comment #66
thedavidmeister: Opened #2279623: search_excerpt() should use unicode ellipsis characters instead of "...", also from #55.
Comment #67
thedavidmeister: #2279635: template_preprocess_username() should use Unicode::truncate()
Comment #68
thedavidmeister: #2279655: Add a way to truncate HTML strings without counting or damaging HTML elements (and use it in Views) - Html::truncate()
Comment #69
thedavidmeister: #2279681: Views' InOperator filter should use Unicode::truncate() in adminSummary()
Comment #70
mgifford: This seems like a great approach. Thanks @thedavidmeister!
Comment #71
mgifford: There's been a lot of progress on this in the last 10 months!
There's a patch that needs a bit of work here: #2279105: Remove as many "..." and ellipsis characters from the codebase as possible without altering the meaning of text
We still don't have a good solution for #2279655: Add a way to truncate HTML strings without counting or damaging HTML elements (and use it in Views) - Html::truncate()
We need a policy about using ellipsis characters.
We need to find where else they are being used. But we're way closer than we were a year ago, largely thanks to @thedavidmeister.
Comment #84
quietone (at PreviousNext): I chatted with benjifisher about this in #ux. He suggested documenting the use of the ellipsis in the User Interface text doc in the wiki. I have added a sentence in the Style section.
I am also updating the issue summary (IS) to show that this work is complete, and changing this to a meta issue to organize the remaining work.