To enable translation of user-entered strings alongside the current support for code-based ones, we should write generic CRUD API functions for the locales_source and locales_target tables.
Currently the locales table data are handled by direct queries from e.g. form submit handlers and the locale() function. These methods can be used only for the 'default' textgroup, the original focus of the locale system.
In contrib the i18nstrings module provides some relevant methods, but these obviously can't be used by core, and i18n is a very large dependency.
A first step would be to write API methods for handling locales_source and locales_target data and adapt the existing calls in locale.module and locale.inc to use these methods where possible.
From there, in separate issues we could begin to introduce localization support for admin-defined strings in core.
Parent issue
#1260628: META: Better translation management API in locale module
Comment | File | Size | Author |
---|---|---|---|
#54 | locale-api-core-update-361597-54.patch | 25.9 KB | loganfsmyth |
#52 | locale-api-core-update-361597-52.patch | 25.9 KB | loganfsmyth |
#49 | locale-api-core-update-361597-49.patch | 26.15 KB | loganfsmyth |
#47 | locale-api-core-update-361597-47.patch | 26.96 KB | loganfsmyth |
#46 | locale-api-core-update-361597-46.patch | 25.8 KB | loganfsmyth |
Comments
Comment #1
nedjoI'm working on a draft set of APIs.
Comment #2
Jose Reyero CreditAttribution: Jose Reyero commentedThis is some patch for translating user defined strings, which is quite related. It translates for now just a pair sample ones like content type title and custom menu items title.
It introduces a new function, tt(), written after the current t() which takes textgroup and a string key. It is the same idea as i18nstrings but quite simplified and uses full caching for a textgroup which should be a performance boost.
Comment #3
nedjoI was originally thinking we needed a set of APIs that could be used by t() (through locale) and also user-defined strings.
One thing I think we should avoid completely is the functionality shared by t() and tt() that insert/updates are called as triggered as fetching translations. We know when user-entered strings are being inserted or updated, so we should never have to say "if a record doesn't exist, create one on the fly". This on the fly insertion has caused perennial misuse of t(). See #361614: Make locale() read only?.
Instead, I think we need separate methods for insert, update, delete, and to fetch translations. Like we have for any other data type. The difference is that for fetching translations we're joining on two tables.
Another need is for loading multiple strings. Currently, we need a separate query for each translatable property of an object. We should at least be able to load all properties at once for a given object. This might mean tracking an optional table name and ID value in the locales_source table. E.g., for a taxonomy term, this would be the taxonomy term table and a tid value. Then we can have a query that loads all translations at once for a term.
As I started into writing locales APIs I found that they needed to be fairly generalized to cover all the existing loading calls, and started into a general read method, [#365899 ]. Not sure yet if that's the right direction or not.
Comment #4
Jose Reyero CreditAttribution: Jose Reyero commentedNedjo,
I agree about the need for such api and making these locale functions read-only. However I was just trying to move on with custom string translation beacuse this is an important step to make things translatable.
Maybe I should post this one in a different thread. Though the case is quite an example about why we need such locale api. Btw there's this _locale_get_cache function you can reuse...
I'll add some notes on that other issue about locale read-only
Anyway, do you think we can move this patch here along with the 'locales data api' or should I move it to a different thread?
Comment #5
nedjoLive from the internationalization code sprint, a first cut at a set of locales apis.
These are mostly just drafted and need testing.
Approach:
1. Introduce three new general-use API methods, drupal_load_record(), drupal_load_records(), and drupal_delete_records().
I tried to avoid introducing these general APIs, honest I did, but I found I couldn't. Without them, the code for handling the locales_source and locales_target tables was largely duplicated. These API functions may be justified by the locale data alone, but probably a lot of core could be rewritten to use them--or some refinement of them.
So I'm not positive this is the way to go, but it seems better than repeating code.
2. Introduce CRUD api functions for both locales_source and locales_target data.
These functions contain no direct queries, instead calling drupal_write_record() and the new _load and _delete methods.
Methods can be used in very simple ways but also support more complex functionality like specifying which fields to return from load functions and matching by an array of conditions.
3. Introduce a method for fetching a translation for a given string.
Here either a single language or an array of languages can be fed in. If an array of languages, a translation in the first matched language will be returned. By default, returns the original string if no translation found, but with the option of returning FALSE instead.
4. Sketch in some initial tests.
Haven't actually run the tests yet, I expect they're failing.
Once these functions are in place, we should be able to convert some of the existing code invoked by t() and by translation imports to use these functions. And they will be the basis for handling user-defined strings alongside the existing code-based ones.
Comment #6
Jose Reyero CreditAttribution: Jose Reyero commented@nedjo,
About your latest patch, I'm not sure that records functions really belong here, they may be better in a different patch, even if not having them adds a little overhead into this one.
Also we should focus first on the part of the api usable by other modules, maybe rewriting the internals later. Something like
locale_get_string($textgroup, $value, $langcode, $key = 'source');
locale_save_string($string); // Object $string having source, tergroup, location, etc..
This should speed up development on other fronts.
Comment #7
Jose Reyero CreditAttribution: Jose Reyero commentedI'm posting this optional patch (that possibly won't work at all) just to illustrate the previous points.
This patch (when finished) would:
- Expand hook_locale() to retrieve some more information (whether the group is cacheable, which field to use as search key...)
- Add a bunch of locale api functions
Comment #8
Pasquallesubscribe
Comment #9
catchsubscribing.
Comment #10
Jose Reyero CreditAttribution: Jose Reyero commentedAll we are doing locale-related needs badly this one, so I'm setting priority to critical. There are like a dozen other issues waiting for this one.
Comment #11
Pasqualle@Jose: if you have some time can you explain this with examples, how the code and how the UI should work..
1)
a. translation of simple texts like forum title, site title
b. group of strings from table columns like project statuses
2) what will be the different compared to tt() function in i18nstring?
3) are there any good docs about this?
4) will this patch enable the option to translate the same string differently based on context? like track can mean CD track and user track, which needs to be translated differently in some languages..
Comment #12
nedjoThis patch includes two other patches: #365899: API methods for schema-based load and delete operations, needed for APIs, and #369423: drupal_write_record() returns FALSE on insert when table has multi-field primary key, needed to fix a bug that made a test fail.
Today I updated the APIs, wrote some more tests, and then began the work of bringing in Jose's improvements.
The key point Jose's patch addressed is: locales_source records will be keyed by different fields depending on the type of data we're talking about.
For code-based strings, the string itself is the key (in combination with the version).
For user-entered strings, it's the location that is unique (maybe in combination with the version--not quite sure of this yet...).
I got part way through reworking my patch to take this into account. This all needs some more thought. Is version relevant for non-code-based strings? If so, how do we use it?
Currently tests are failing in the locale_get_translation() function--the most complex of the functions and the one I'm only partway through reworking.
Jose, if you want to work on fixing this up and post the next patch version that would be great!
Comment #13
Jose Reyero CreditAttribution: Jose Reyero commentedThis is a rewrite and extenstion of nedjo's patch, main changes:
- Drop generic drupal_record_* methods, these may be better in a different core patch
- Split locale() function into locale_string(), which will search by string and locale_location() which will search by location
- Renamed 'target' methods to 'translation' so they refer to the functionality, not to the table
Whatever I think we better focus to define an API so we can use it in other patches while we discuss the implementation of this one. These would be the main local api method signatures:
There are also some group loading and deleting functions. They're 'smarter' than simple record oriented ones (I.e. the locale_delte_strings() knows how to delete source strings and related target strings.
This still needs some polishing and updating the tests
Comment #14
nedjoNice work Jose. It's good to see tt() emerging as a parallel to t(), with appropriate generalizing of some dependent functionality, e.g., string replacement.
I've gone over the patch and made numerous small and a few larger refinements and fixes.
Of note:
* merged locale_string(), locale_location(), and locale_translate() back into locale(). We can use a single function with different arguments.
* removed locale_tt(), which is not needed and contained obsolete code left over from i18nstrings.
* moved all API CRUD functions to locale.module and made them consistent with the methods in #365899: API methods for schema-based load and delete operations
* renamed CRUD functions in the form locale_objectype_operation, e.g., locale_source_save().
* fixed up the tests, including hook_locale() example.
Currently one of the tests introduced in this patch - using tt() to fetch an existing translation - fails. I haven't yet investigated why.
Comment #15
Jose Reyero CreditAttribution: Jose Reyero commentedFixed a few things with latest nedjo's patch:
- Added new api documentation for hook_locale()
- Added back the feature to remove target strings using query conditions for the source table (was in my previous patch)
- Fixed issues with test. The locale_test module had a t() in hook_locale() which caused recursion
- Fixed some issues with tt() and _locale_get_cache() functions and removed the 'text' textgroup. As we have a 'custom' on for testing we don't need it yet.
Still:
- I don't see what can be the case for removing a source string but keeping translations ($remove_translations param)
- Also, there seems to be no use for the locale_source_load_multiple() function.
- Locale tests seem to be broken, independently of this patch, for the uninstall part... ?
Comment #16
nedjoNice fixes, thanks reyero.
Agreed, we can drop this argument.
True. The multiple loader requires very little more code than a single record loader would, but it would be good to apply to use cases in the module. The best would be the complex query we're using for filtering strings in the edit UI. Since this is paged, however, it would have to wait on #299267: Add "Extender" support to SELECT query builder which would add paging support to the D7-style select queries.
Comment #17
stella CreditAttribution: stella commentedRe-roll of patch against latest HEAD. I also included some minor style changes which coder flagged.
Simple tests all pass. I also ran the new ones (not yet committed) from the patch at #369229: Review locale module's simple tests and they passed too.
Cheers,
Stella
Comment #19
stella CreditAttribution: stella commentedRe-rolled patch against latest HEAD.
Cheers,
Stella
Comment #21
stella CreditAttribution: stella commentedMarking for a re-test
Comment #22
chx CreditAttribution: chx commentedI am not sure I want to agree on this. There are no user defined strings, imo. There are variables, there are menu titles, block titles and so on. My current thinking is that we would be better off solving this on a case per case basis instead of a central translation function. Context and UI are two things that make me wonder about the viability of this. For example, for menu entries, you might want a table which shows the various per language paths and allows entering a title for each.
Comment #23
cburschkaThe last test was on March 2 and stella's post on March 13, so I have requested a retest.
(I haven't examined the patch in detail, and honestly can't say whether this is a good idea or not.
Comment #25
nedjo@chx:
Certainly there will continue to be a need for case by case handling. Not every string translation need can or should be handled via the locales system. Menu items may well be a case where we need custom handling, and no doubt there are more.
But many use cases do seem to work.
Note again that this patch doesn't introduce anything particularly new. The result is what we have the beginnings of in D6: a locales system capable of handling different types of strings. This is why we have hook_locale() and a textgroup field in the locales_source table. This system is used extensive in D6 via the Internationalization package for taxonomy vocabularies and terms, menu items, profile fields, node type data, CCK field labels, etc. What's missing from core, and hence what's provided in this patch, is only an API for working with this existing system. It completes part of what was left unfinished in the D6 work of generalizing the locale system to be usable with more than built-in strings.
This patch however will need reworking when #334283: Add msgctxt-type context to t() is applied. That patch introduces a new field to locales_source and hence changes the way that a unique string will be determined. Some rethinking here is needed.
Comment #26
mfbsubscribing
Comment #27
sun.core CreditAttribution: sun.core commentedUnfortunately, too late for D8.
Comment #28
catchDowngrading all D8 criticals to major per http://drupal.org/node/45111
Comment #29
Gábor Hojtsy#1189184: OOP & PSR-0-ify gettext .po file parsing and generation is related and would require this to be done too. Leaving both open to separate work.
Comment #30
Gábor HojtsyI'd like to get in #1188430: Rip out textgroup support from locale module before getting back to this. @chx: this should make you happy I guess :)
Comment #31
Gábor HojtsyI've forked the drupal_replace_string() discussion to #1191614: Make t() formatter available as its own function because it has no relation to CRUD operations for locale objects, which this issue concentrates on. Also it could be an interesting discussion in itself. Please follow and support there :)
Comment #32
Gábor HojtsyI'm hard at work on this as well as #1189184: OOP & PSR-0-ify gettext .po file parsing and generation at http://drupal.org/project/gettextapi for Drupal 7 so we can work out our solution in the wild, get a lot of real life use and then integrate into Drupal 8. The import refactoring is already very well ahead there.
Comment #33
Gábor HojtsyI've just submitted #1215716: Introduce locale_language_save() to start off at the low level of language CRUD for a locale data API :) Please review/comment.
Comment #34
Gábor HojtsyI've documented how this fits to overall Druapl 8 plans at http://groups.drupal.org/node/161589.
Comment #35
andypostSubscribe
Comment #36
plachsubscribe
Comment #37
Gábor HojtsyRetilting to be more specific. Not sure if this should be a META issue or we should try and tackle all the pieces in here at once.
Comment #38
Gábor HojtsyHere are quick extracts from the latest patch above with the functions / tests we actually want in now. The code itself in the functions is not at all up to date, D8 does not have textgroups anymore (but other things did not change yet), so I'm uploading this as .txt (also there are crucial patch headers missing and it would not apply anyway). I've just copy-pasted the relevant things out of the last patch to highlight what we want to keep working on from this patch going forward.
Comment #39
podaroksubscribe
Comment #40
nedjoShould this build on or wait on #1184944: Make entities classed objects, introduce CRUD support?
Comment #41
Gábor Hojtsy@nedjo: translation strings and sources will definitely not become entities. I think refactoring our code to standalone API functions in itself is a very useful string. If we are to move to an OO API for CRUDs all around Drupal, we can move the translations/source API to that too. I think working on abstracting out this code is to be done sooner than later (so many other things depend on it), and then we can move forward from there much easier if we want a different interface for these CRUD operations. What do you think?
Comment #42
loganfsmyth CreditAttribution: loganfsmyth commentedComment #43
plachComment #44
loganfsmyth CreditAttribution: loganfsmyth commentedThis takes what Gabor posted in the txt file and implements it in the current codebase.
I have changed a few of the function signatures to better parallel other places in core, and I updated the tests to match the new functions.
I will go through and replace direct queries with calls to the new API functions, but I wanted to get this posted to show my progress and get some initial comments on the API.
Comment #45
Gábor HojtsyI think the real "in-field test" for this API is if we can convert the existing direct queries to the locales_* tables to this in all of locale module (except maybe locale() that is, which might have a performance problem if we do, not sure). If that works, existing tests should stil pass. Otherwise it looks nice. :) What are the outstanding questions in your view?
Comment #46
loganfsmyth CreditAttribution: loganfsmyth commentedI've updated the patch to include changing core to use the API functions instead of direct DB queries, where it was possible. There are still a few direct queries left that rely on COUNT, MIN, GROUP BY and/or LIKE.
This passes Locale tests locally, so posting it to run the whole test suite.
Comment #47
loganfsmyth CreditAttribution: loganfsmyth commentedNew patch fixing some formatting and a few bugs I noticed.
Comment #49
loganfsmyth CreditAttribution: loganfsmyth commentedApparently my git-fu is not as strong as I thought, sorry. Here's a rerolled version that should actually work.
Comment #50
babasse05 CreditAttribution: babasse05 commented#49: locale-api-core-update-361597-49.patch queued for re-testing.
Comment #51
Berdirnitpicking: space before 'source'
I haven't read the whole issue, is there a reason for not separating this into two arguments, similar to other load_multiple() functions? I think re-using the same argument in different ways is something we're trying to get away from...
wondering if we shouldn't enforce an object here? would make the & unecessary.
The second sentence should probably be moved to a new line (with an empty line between).
The is_array() check is not necessary. condition() handles that automatically.
Same here, in_array() is not necessary.
Same here, why not enforce an object? It's already documented like that..
there is an assertEqual() assertion that could be used here.
Same here. the isset() is imho not necessary. If the test fails then it fails, it doesn't matter if there are additional exceptions or not...
Also, a quick search turned up some more direct queries on these tables:
_locale_export_get_strings() in gettext.inc
_locale_rebuild_js() in locale.inc
locale() in locale.module
locale_form_locale_language_overview_form_alter() in locale.admin.inc
There is also _locale_translate_seek() in locale.pages.inc but that doesn't look lik it's supported by the API and probably shouldn't be.
-17 days to next Drupal core point release.
Comment #52
loganfsmyth CreditAttribution: loganfsmyth commentedAll very good points. I've fixed the comments, added type hinting for stdClass, removed all of the in_array checks from the conditions, and changes the asserts to assertEqual.
Having a separate lid argument for sources makes sense. I've updated the source related functions to have a lids argument first. That's a more difficult question for the translation target tables because they have no primary key. For now I've left those functions mostly as they were.
For the functions you mentioned, each one relies on SQL functions, and I wasn't sure what to do about that.
_locale_export_get_strings() relies on an ORDER BY plid,plural.
_locale_rebuild_js() uses a LIKE condition.
locale() uses LENGTH()
locale_form_locale_language_overview_form_alter() relies on COUNT() and a GROUP BY
Comment #53
BerdirThe & is not necessary anymore.
Neither is the comment.
One remaining in_array() check hiding here :)
Wondering if it would make sense to use $plural = NULL als default value and in that case ignore the argument and delete all occurences of that lid/langcode combination?
Because then, it could for example be used here.
About those functions, I see, haven't looked that close. Maybe we can leave them for now and create a follow-up issue to see if anything can be done about them. So that it doesn't block process here..
Also wondering if the existing tests + the new ones cover the new API functions or if something is missing. Haven't seen the delete functions being used in the tests for example.
-20 days to next Drupal core point release.
Comment #54
loganfsmyth CreditAttribution: loganfsmyth commentedGood catch for the forgotten & and comment. I also fixed the last in_array.
The reason that I have $plural default to 0 instead of NULL is that I felt it would be misleading to have a function called 'locale_translation_delete' and 'locale_translation_delete_multiple' where in fact BOTH of them could remove multiple translations depending on arguments. This is kind of a unique issue here because the translation table has a multicolumn primary key. Is there anywhere else in core that has a combine primary key with API functions for managing them? Maybe we can see what they do.
Comment #55
Gábor HojtsyTalked about this with @berdir in IRC. Basically, our plural storage is pretty broke and we need to resolve that sooner than later. Depending on the number of plural variants you have the strings are stored separate as "@count comments", "@count[2] comments", etc as separate sources. When you go in the UI and want to delete one, your expectation is that you only delete that one, not the whole thing. The plids in translations form a **linked list** of plurals, so if you delete one that can also break this chain and make it broke on export. See _locale_import_one_string() on how this $plid based linked list is created. Basically, each new plural is linked to the previous one as a child. So the UI to delete them one-by-one can be used to destroy multiple ones for export, even though they are in the DB.
Yeah, so all-in-all plural storage is very broken, and IMHO we should move to a simple string vs. string storage like l10n_server does, where the string has internal "serialization" of the singular and plural forms, so we can display in-place editing with all of them AND we can delete them all at once if we want to AND we don't have stale data lingering around if/when you change the number of plurals for a language.
So all that said, I think we either have an issue for that already, or should have one, and I don't think we should fiddle a lot with plural deletion or overall handling for now. I think if we can reproduce the current behavior (delete translations for a single sid when the sid is deleted) which is exposed on the UI, then we should be fine.
Comment #56
Gábor HojtsyWe can alternatively put this issue on hold and work on fixing up plural storage and editing, if you prefer and then map the API to the much cleaner storage. I think we can do either way and do not want to discourage the work here, so ideas welcome on that :)
Comment #57
Gábor HojtsyPutting on hold for now, see above.
Comment #58
andypostThis issue is #532512: Plural string storage is broken, editing UI is missing
This way we could really simplify storage and probably make translation UI saner
Comment #59
Gábor HojtsyAdding UI language translation tag.
Comment #60
Soul88Hello guys.
While thinking about localization import performance I've noticed the following function _locale_import_one_string_db() that actually causes lots of problems. Main problem is that it imports the string one by one. And makes 2 to 3 queries per each row.
Lets now imagine that we already have 5 languages (approximately 350000-400000 in the `locales_destination` table) and now we want to add one more languge (70000-80000 more rows). So we'll need to run 140000-240000 queries and have more than 100000 index rebuilds. The DB is going to die somewhere here.
So the idea is that all the imports will be incredibly slow until we rewrite this function. And it doesn't matter what wrapers will call this function.
I have several ideas on how we can rewrite this function but unfortunately all of them are MySQL specific, as I'm pretty familiar with this DB engine.
I think my best guess was to use temporary table that includes all the fields from both locales_source and locales_destination tables. And after that I'd run a bunch of queries like:
UPDATE t1 INNER JOIN t2 USING(...) SET t1.fld = t2.other_fld WHERE ...
My other approaches where to use:
INSERT INTO tbl1 (<field_list>) VALUES (...), (...), (...) ON DUPLICATE KEY UPDATE ...
But, according to andypost, none of this approaches SQL standart compatible, so will not work on all the DBs.
Additioanlly to that there is no way to use Drupal's db_insert cause it seems that it makes just a number of separate INSERT queries, and not "INSERT INTO tbl (...) VALUES (..), (..), (..)". So we'll still rebuild indexes after each query.
To summarize everything above: we just HAVE TO make the locales import in batch from the DB perspective (not the DRUPALs batch API), but on the other hand it's very hard to do it the way most DB engines could support it.
So where should we go here?
Comment #61
Gábor HojtsyThis is currently being worked on at #1189184: OOP & PSR-0-ify gettext .po file parsing and generation, which would greatly change (basically rewrite) the import/export APIs altogether. I would suggest you go there and help test the performance of that vs. the current solution. That would be a very useful addition there.
Therefore marking this back to postponed and not on the sprint.
Comment #62
loganfsmyth CreditAttribution: loganfsmyth commentedComment #63
Jose Reyero CreditAttribution: Jose Reyero commentedThe idea here was nice but most of the code is somehow obsolete.
I've created this new one to restart with a fresh patch for Drupal 8, #1785086: Introduce a generic API for interface translation strings
Comment #63.0
Jose Reyero CreditAttribution: Jose Reyero commentedAdd parent