With recent changes to the strings in the HEAD of core, I've been wondering (as I've mentioned before in IRC) about our potential to capture feedback regarding the "readability" of certain strings.
To provide background, I recently pasted an aggregate of all the help texts (some 6,300 words) in Drupal 6 into a "readability index calculator" to determine where it fell on the Flesch-Kincaid scale.
The results:
- Method used: Flesch-Kincaid (English).
- Flesch-Kincaid Grade level: 14.
- Flesch-Kincaid Reading Ease score: 24.
The Flesch-Kincaid Reading Ease score indicates how easy a text is to read. A high score implies an easy text. In comparison, comics typically score around 90, while legalese can score below 10.
The Flesch-Kincaid Grade level indicates the school grade a person must have reached to understand the text. For example, a grade level of 7 means that a seventh grader will be able to understand the text.
The Flesch-Kincaid index determines this score using a formula with inputs including the average number of words in a sentence and the average number of syllables in a word.
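For reference, the two standard Flesch-Kincaid formulas can be sketched in Python. This is a minimal sketch that assumes the word, sentence and syllable counts have already been computed (counting syllables reliably is the hard part and is left out); the function name `flesch_kincaid` is hypothetical and not the API of the calculator I used.

```python
def flesch_kincaid(words: int, sentences: int, syllables: int) -> tuple[float, float]:
    """Return (reading_ease, grade_level) from pre-computed text counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    # Reading Ease: higher means easier (comics ~90, legalese below 10).
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Grade level: approximate U.S. school grade needed to follow the text.
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level
```

For example, 100 words in 5 sentences with 150 syllables gives a Reading Ease of about 59.6 and a grade level of about 9.9. Both scores get worse as sentences get longer and words get more syllables, which is why splitting long help-text sentences and choosing shorter words tends to help.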
I am not a translator, but I suspect that translators are in a unique position to provide feedback about which original strings are the most difficult to parse, understand and translate. The localization server already has an interface built around individual strings, will likely be deployed in some central place on d.o. infrastructure, and could provide the additional service of recording translators' "votes" for difficult-to-translate strings. The strings receiving the most "difficult" votes could then be reviewed to find ways to make them clearer.
I thought I would attempt to implement such a feature, and would appreciate any feedback before I go too far down a wrong path.
- I do not yet know the best interface for this feature. I've attached a couple of mockups that represent some very, very basic "playing around" with this idea. (They are very rough.)
- I do not really know a way to quantify how much additional load this would place on the localization server (to respond to and record "votes" for difficulty), since as far as I know, this is not a feature we currently have.
- I wonder whether the ability to flag a string as "difficult to translate" should be language-dependent. I would expect at least two potential uses for such a flag: (1) the string is badly written and hard to understand in English, and is therefore hard to translate into any language; (2) the string is written OK in English, but its particular form or structure makes it difficult to translate into a specific language. (Thinking about this second possibility is why I constructed the mockups the way I did, with an extra field next to each language-specific translation field.)
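To make the two use cases concrete, here is a minimal in-memory sketch of how votes could be keyed so that both fit in one store: a vote with no language means the English source itself is unclear, and a vote with a language code means it is only hard in that language. The class `DifficultyFlags` and its methods are hypothetical; a real l10n_server implementation would presumably use a database table rather than a Python dict.

```python
from collections import defaultdict
from typing import Optional


class DifficultyFlags:
    """Hypothetical store of "difficult to translate" votes.

    language=None  -> the English source is unclear (use case 1)
    language='hu'  -> the string is hard only in that language (use case 2)
    """

    def __init__(self) -> None:
        # (source string, language or None) -> names of translators who voted
        self._votes: dict[tuple[str, Optional[str]], set[str]] = defaultdict(set)

    def flag(self, source: str, voter: str, language: Optional[str] = None) -> None:
        # A set means repeated clicks by the same translator count only once.
        self._votes[(source, language)].add(voter)

    def total(self, source: str) -> int:
        """Sum of the per-language vote counts for one source string."""
        return sum(len(v) for (s, _), v in self._votes.items() if s == source)

    def most_difficult(self, limit: int = 10) -> list[tuple[str, int]]:
        """Source strings ranked by total votes, highest first."""
        sources = {s for (s, _) in self._votes}
        ranked = sorted(sources, key=lambda s: (-self.total(s), s))
        return [(s, self.total(s)) for s in ranked[:limit]]
```

Keeping the language in the key means the review list can be filtered either way: language-neutral votes point at English wording problems, while clusters of votes from one language point at structural problems for that language.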
Any thoughts would be greatly appreciated (even if the thought is: "I don't think this feature would be very useful.")
| File | Size | Author |
|---|---|---|
| l10n_server_mockup2.png | 18.49 KB | keith.smith |
| l10n_server_mockup1.png | 18.61 KB | keith.smith |
Comments
Comment #1
gábor hojtsy commented
Wow, first, it is good to see that the UTF buttons work for others too. (I was a bit worried about this, but it is not actually related :)
What is related is that another use case here is strings that are impossible to translate by their nature. The Update module in core contains t('ago'), for example, which is, well, not really translatable as-is. Even if it were t('@time ago'), we could not provide a Hungarian translation (maybe other languages are luckier here :), so it does not map to a good translation. So flagging these kinds of strings is also a possible use case.
Since you brought this up on #drupal, I have also been thinking about this. One idea that came up before is the ability to comment on translations, or to save a comment with a suggestion. See http://drupal.org/node/196878. I thought that maybe we could somehow find a crossroads here. Or maybe not :) Comments might be aimed at the author of the original string (e.g. "'ago' is not translatable as-is"), which would provide some context instead of just a flag. However, those comments were originally meant to facilitate discussion among translators, so providing two types of comments might not be the right approach. It might be overkill.
And one last thing: we have support for suggestions, which lead to translations, and we also save all previous suggestions and translations for a string. So we can see whether particular strings had an extraordinary number of translations or suggestions before the team settled on one that fit. This is one way to get information from the system without explicitly asking translators for it. It could only be supplemental, however, as it does not identify strings the translator stared at, then walked off to get a dictionary, checked some words, and entered a translation that nobody was brave enough to modify later, because they did not understand the original text either and did not care that much :) (That's also a common use case: "if it is translated, why touch it again" :)
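The "extraordinary number of suggestions" signal could be approximated with a simple outlier test over the saved history. This is a sketch under stated assumptions: the input is a hypothetical mapping from source string to the number of suggestions and translations recorded for it, and the "more than k standard deviations above the mean" threshold is an arbitrary choice, not anything l10n_server does today.

```python
from statistics import mean, pstdev


def unusually_revised(history_counts: dict[str, int], k: float = 2.0) -> list[str]:
    """Return source strings whose number of saved suggestions/translations
    lies more than k population standard deviations above the mean.

    history_counts (hypothetical input) maps a source string to how many
    suggestions and translations were recorded before one was settled on.
    """
    if not history_counts:
        return []
    counts = list(history_counts.values())
    threshold = mean(counts) + k * pstdev(counts)
    return sorted(s for s, n in history_counts.items() if n > threshold)
```

As noted above, this only catches strings that were argued over; a string translated once and never touched again will never stand out, so it can supplement explicit flags but not replace them.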
This is all just to help the brainstorming, I think this is a good suggestion, and we have a good opportunity to discuss it.
Comment #2
gábor hojtsy commented
One more use case: strings with multiple possible translations. Take the tracker and audio modules. Tracker lets you t('track') users, while audio module displays t('track') information. How does a translator know whether "track" as a verb or "track" as a noun is to be translated? Often these kinds of errors can be fixed by finding better wording (although sometimes that is not easy). Obviously this is a limitation of our current system, where we can only translate one string of characters to another, regardless of where it is used. This might be solved later (or not), but until then, these kinds of cases are also valid reasons for marking something as hard to translate.
Comment #3
psicomante commented
@Gabor
Same situation in Italian. View (a database view in the Views module) is translated differently from View as a verb. Could this be solved by allowing multiple translations of the same string to coexist?
Comment #4
gábor hojtsy commented
Psicomante: again, let's not try to solve that problem here. It is a huge can of worms and has nothing to do with the localization server unless Drupal itself supports it. That discussion belongs in the Drupal issue queue, not in this module's queue.
Comment #5
psicomante commented
Sorry, Gábor, I didn't mean the 'Drupal translation' in Drupal tables, but the translation in l10n_server. It's more difficult to translate a word with multiple meanings than a long sentence.
I didn't know that the same string was also translated the same way across different modules. I'm sorry.
Comment #6
gábor hojtsy commented
Psicomante: we use the same rules as Drupal. If the Views module has "view" as a noun and some links have "view" as a verb, l10n_server still needs to use the same translation for both, because even if we supported different translations, there would be no point until Drupal could take advantage of them. Drupal can only support one translation per language for a given string. So this first needs to be solved in the Drupal issue queue, not here. Unrelated.
Comment #7
keith.smith commented
Bumping this back up to the top of my tracker after seeing a couple of issues that remind me of it.
In http://drupal.org/node/206685, a "by design" core issue, it never occurred to me (I haven't looked, but I'm guessing I modified that string) that "filed in" is very close to "filled in", especially when it is next to a checkbox that is, coincidentally, filled in when you want to indicate "filed in". Say that twelve times fast and then translate it.
In http://drupal.org/user/16678, though my Swedish is practically nonexistent, I imagine this is a discussion regarding "actions" and "triggers", both of which are likely candidates for "translation questions".
I'm not sure I have the PHP street cred to do this yet, but I'll give it a shot unless someone beats me to it.
Comment #8
zirvap commented
I assume you mean http://drupal.org/node/203710. And yes, that is a discussion about translating "actions" and "triggers", but I don't think it would qualify for a "difficult to translate" tag. Sometimes you need a discussion on how to translate a specific word, but that doesn't mean it's hard to translate, just that a decision has to be made.
I do agree that this would be a useful feature, but I'm not sure it would be useful without a comment field to explain what the problem is and how it could be solved. Maybe a string needs to be split up, or several strings combined, or one string might need to be translated in several different ways depending on the context (for instance the "View (noun)" versus "View (verb)" problem).
One use case I just stumbled over (from Organic groups):
"Your !prof is configured to: Always receive email notifications."
coupled with !prof = "personal profile".
In Norwegian, I really, really want to say "Den personlige profilen din er konfigurert til: ...", which is more or less equal to "The personal profile of yours is configured to: ..." But I can hardly translate !prof to the Norwegian equivalent of "The personal profile" without knowing that it will always appear at the beginning of a sentence, or that the Norwegian suffix "-en" (which is equivalent to "the") will always be appropriate.
Comment #9
gábor hojtsy commented
zirvap: I hope you filed that bug for Organic Groups :) If !prof can only be one of a few things (a predefined list, not user-defined), the module should use a separate full string for each of them.
Also, it seems we are moving towards having a generic comment field, and marking the comment for the translation team or for the project author, instead of having just a marker for "difficult to translate". Or we can have a marker and an option to provide more info. If only the marker is used, we can provide a preset comment aimed at the project author. Makes sense?
Comment #10
zirvap commented
Done, thanks for the reminder :-)
This solution makes sense to me. But while the project author should only be notified of comments aimed specifically in his/her direction, the translation team should probably see all comments. In most (all?) cases, it would be useful to ask the translation team for brilliant ideas on how to translate the problematic string, while waiting for the project author to solve the problem for good.
Comment #11
gábor hojtsy commented
Hm, I thought about this some more yesterday and came up with the following, inspired by the above. What about providing a "submit feedback to project author" (or some such) link next to the source text, which leads to the issue queue of the project at hand (given that the project has a drupal.org home link in our DB)? :) After all, we would like to provide feedback to project authors, and that is best handled in their issue queues. GHOP showed that a custom system (i.e. the Google issue queue) does not help much in getting more contributors involved. This is very easy to implement.
We can still add the suggested checkbox to "mark this string difficult to translate" and somehow feed back that information as well to the project authors. But I believe that we already have a tool for textual feedback on projects, and that's the issue queue. It also helps solve the problem more quickly, instead of just sitting in our system.
Comment #12
keith.smith commented
That's a very good idea.
Thinking about the details of that: is the plan for the Localization Server that it will be installed centrally on d.o., with only one *approved* copy running?
If an issue can be automatically populated with the right project, version, and other information, plus some introductory boilerplate text and the translator's comments noting the ambiguity, and that issue is then (a) submitted to the appropriate queue and (b) stored locally in the Server itself, this could be ideal. By storing it locally, if additional translators flag the same string as troublesome *after* the initial report, the Server could "bump" the issue it created. And we'd still be able to produce statistics on the most troublesome phrases.
If there are multiple Localization Servers running, though, there could be duplicate issues (presumably at least one per Server). This is no big deal, since someone can easily mark duplicates (or the "reporting" function that creates the issues can be automatically disabled on non-d.o. Servers). In some ways, it might be better for all _automatic_ issues to be created in a Translators issue queue, or something similar, with someone monitoring it and routing the issues to the individual projects (after dealing with duplicates and bogus reports). In the case of strings that are shared between different projects (like the "view" example), how else would you know which queue to send the ticket to?
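The "store locally and bump" idea can be sketched as a small piece of bookkeeping: the first report for a string files an issue and remembers its number, later reports from other translators bump that same issue, and the stored reporters double as the statistics on troublesome phrases. Everything here is hypothetical: `IssueTracker` is an illustrative name, and `create_issue` / `bump_issue` are injected callables standing in for whatever would actually post to the drupal.org issue queue (not a real drupal.org API).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IssueRecord:
    issue_id: int
    reporters: set[str] = field(default_factory=set)


class IssueTracker:
    """Hypothetical local record of issues the Server has filed."""

    def __init__(self, create_issue: Callable[[str, str], int],
                 bump_issue: Callable[[int, str], None]) -> None:
        self._create = create_issue  # (source, comment) -> new issue number
        self._bump = bump_issue      # (issue number, comment) -> None
        self._issues: dict[str, IssueRecord] = {}

    def report(self, source: str, reporter: str, comment: str) -> int:
        record = self._issues.get(source)
        if record is None:
            # First report for this string: file a new issue and remember it.
            record = IssueRecord(self._create(source, comment), {reporter})
            self._issues[source] = record
        elif reporter not in record.reporters:
            # A different translator hit the same string: bump, don't duplicate.
            record.reporters.add(reporter)
            self._bump(record.issue_id, comment)
        return record.issue_id

    def most_reported(self) -> list[tuple[str, int]]:
        """Strings ranked by number of distinct reporters, highest first."""
        return sorted(((s, len(r.reporters)) for s, r in self._issues.items()),
                      key=lambda item: (-item[1], item[0]))
```

This only prevents duplicates within one Server; as noted above, several independent Servers would each keep their own records and could still file duplicates.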
Will the Server be running on a system such that people are "logged" into it with their d.o. usernames? Or would the tickets all be created in the name of some generic user, like the "patchnewbie" thing we had for a while?
I really like this idea a lot.
Comment #13
gábor hojtsy commented
Yes, the general idea is that a centrally installed localization server will be used. We cannot *stop* others from running their own, but we can discourage it. One good thing about a central service is that it is managed, so you don't need to manage it yourself; it will also be integrated with the drupal.org project/packaging system, so we will feed extraction errors back to project owners automatically, and we will offer downloadable packages of translations without translators ever touching or thinking about CVS. (Right now, translations need to be exported and committed to CVS to be generally available; in the future, CVS use for translations is planned to be forbidden, to get a common workflow for translators.)
So l10n_server will run on a subdomain of drupal.org. Since it will be a separate Drupal instance, not sharing the drupal.org database, users will not enjoy single sign-on. An OpenID server setup on drupal.org is planned (deprecating the old drupal.module-based logins), which would ease logging in on other subdomains as well (but still does not provide single sign-on). So when an issue is submitted, the user also needs to be logged in on drupal.org.
A translators' issue queue might indeed be better than submitting reports to projects directly, but we can be a bit clever here: submit issues to a specific project when we know the string only appears in that project, and to the generic project when it appears in multiple ones. l10n_server could store the drupal.org issue number, which would allow it to direct translators to the issue under discussion and post follow-ups later if needed.
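The routing rule described here is small enough to sketch directly. This assumes the server can look up, for each source string, the set of projects it was extracted from; the `projects_by_string` mapping and the generic queue name `"translators"` are hypothetical placeholders.

```python
GENERIC_QUEUE = "translators"  # hypothetical name of the shared translators' queue


def target_queue(source: str, projects_by_string: dict[str, set[str]]) -> str:
    """Pick the issue queue for feedback on a source string.

    If the string is known to appear in exactly one project, report it there;
    if it appears in several projects (or none we know of), use the generic queue.
    """
    projects = projects_by_string.get(source, set())
    if len(projects) == 1:
        return next(iter(projects))
    return GENERIC_QUEUE
```

Shared strings such as "View" end up in the generic queue, where someone can decide which project (if any) should change its wording.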
Comment #14
joachim commented
+1
As an example, I just saw
"'s flickr", which clearly expects to go after a username. This will fail in a lot of languages; e.g. French needs to say
"le flickr de" with the username after. Obviously t() ought to be used with a @user token.
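The point is that the whole phrase, placeholder included, has to be one translatable string, so the translator controls where the username goes. Below is a simplified Python stand-in for that behaviour; `translate` and the per-language `catalog` dict are hypothetical, not Drupal's PHP implementation. It mirrors two things Drupal's t() does: `@name` values are HTML-escaped while `!name` values are inserted as-is (`%name` is left out here), and placeholders are replaced in a single pass preferring the longest key, similar to PHP's strtr().

```python
import html
import re


def translate(source: str, args: dict[str, str], catalog: dict[str, str]) -> str:
    """Hypothetical, simplified stand-in for Drupal's t()."""
    # Look up the complete source string, placeholders included.
    text = catalog.get(source, source)
    if not args:
        return text
    # Longest keys first, so "@username" is matched before "@user".
    keys = sorted(args, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def substitute(match: re.Match) -> str:
        key = match.group(0)
        value = args[key]
        # "@" placeholders are escaped; "!" placeholders are inserted verbatim.
        return html.escape(value) if key.startswith("@") else value

    return pattern.sub(substitute, text)
```

With the catalog `{"@user's flickr": "le flickr de @user"}`, the French result is "le flickr de joachim", while a string that only contains "'s flickr" could never be reordered that way.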
Comment #15
zirvap commented
joachim, as Gabor reminded me in #9, you should file an issue in the Flickr module issue queue for this specific string, since this feature hasn't been implemented (yet).
Comment #16
joachim commented
Now that I've figured out what the magnifying glass icon does, I see this string is from:
flickrhood: 5.x-1.0 (1)
which is deprecated -- this raises #627840: Make deprecated modules inactive.
Comment #17
gábor hojtsy commented
I'm still where I was two years ago on this:
In other words, we can open an issue queue to submit these if no project can be identified. We need to somehow channel this feedback to drupal.org, so it can be handled in the issue queue and get fixed.
Comment #18
SebCorbin commented