Closed (duplicate)
Project:
Drupal core
Version:
7.x-dev
Component:
locale.module
Priority:
Normal
Category:
Task
Assigned:
Reporter:
Created:
30 May 2007 at 07:46 UTC
Updated:
26 Jul 2012 at 14:35 UTC
Jump to comment: Most recent file
Guys,
I did some changes on locale module.
regards,
massa
| Comment | File | Size | Author |
|---|---|---|---|
| #10 | locale_22.patch | 35.21 KB | brmassa |
| #4 | locale_21.patch | 31.64 KB | brmassa |
| locale_20.patch | 38.58 KB | brmassa |
Comments
Comment #1
recidive commentedWow! That's a big patch. Needs review though ;)
Comment #2
gábor hojtsyOK, for us to be able to review this, *do not* include three different changes in one patch! Either have them in order, or *at most* the first and second one together, which have an immediate likeliness to get into core, while the third one will be likely target for debates.
Comment #3
gábor hojtsyBy the way, a few remarks to help you quickly get to better results:
- we have implemented the PO importer with a stream oriented parser with callbacks for each string, because parsing a big PO file into memory (into a big array), and then passing that array around is *very memory intensive* See this issue with explanation and even a graph: http://drupal.org/node/47610 ... so for this reason, this is rather implemented with not one big array but a SAX-like parser. I worked so much on this more then a year ago... It would be a pity to have this memory footprint issue back in again...
- this patch seems to be quite rough, you have stuff commented out, you have calls to _locale_import_message() removed, but not the function itself, and so on...
- you also removed the file name from most errors as the data source might not be a real file, *but* when importing mutliple files in Drupal 6 with the built in batch features, how would a user tell what file to look into/replace/fix/complain about, if he had an error in a file? some creative solution here would be required.
- there is lot of code which breaks Drupal coding standard, like this one does on nearly every occassion:
Comment #4
brmassa commentedGábor,
your right. here the "Import/Export Translation API"
regards,
massa
Comment #5
brmassa commentedGábor,
sorry, while i was writing, you posted another comment:
1* im not familiar with SAX, but using an array, my intention is to provide a atomic operation: or EVERYTHING or NOTHING. PO files doesnt have many problems, but on XML files, if an error is discovered on the middle of the operation, everything can be defected.
2* _locale_import_message() was deleted since the API is not based on files, but contents. Even not implemented on this patch, its possible to import a translation only by copying the PO content on a textearea. its useful when implementing some kinda validator or other operations.
3* the same as before. i dont know how to proceed.
4* i totally agree about the Code standards and i used to adopt it. i used the D5 locale.module, so this error was there too. im gonna fix it. (BTW: was the coder module ported to D6?)
cheers,
massa
Comment #6
gábor hojtsyChanged title to reflect the issue. Awaiting fixes for my remarks.
Comment #7
brmassa commentedGárbor,
there are two trade offs against performance: the operation will be not atomic and creates a too concatenated process.
And generally language imports/exports are not oftem and doesnt use big files.
if you really prefer i do this, it you be easy. so... you decide.
cheers,
massa
PS: your last post was on the same time as mine. take a look there
Comment #8
gábor hojtsyThere is no difference between finding an error in the middle of an XML file and in the middle on a PO file. With stream based parsing, the error only shows up in the middle, regardless of the format. The SAX XML parsing works so that you register a few callbacks to invoke when certain "events" happen throughout parsing. In PO parsing, you could register a function to be called when a string gets parsed. Now this is always the "_import_one_string" function, but as far as I see, you would like to have that replaced and otherwise leave the algorithm in place (you only replaced the import_one_string calls with an array buildup). If SAX is not known for you, a solution for this can be derived from design patterns such as the "template method" pattern. (Just that we don't have classes to inherit from and define our own processing functions, so we can pass on the function to call back). Sematic code:
Instead of building up a big array, the callback gets called for each array. If you have a contrib module which needs a big array, it can build it up in that callback, but it does not fit the constraints a Drupal site could run at shared hosting servers.
So this is a reason to have unclear and confusing error messages to users? We had bug reports before from guys with error messages which did not spell out the file name, so the file name involved in the error was unknown. As I have pointed out earlier, if you have a batch process, which imports 16-20 files (this will be so for *any* foreign language Drupal site with Drupal 6), you will have a very hard time debug the error if you have no file name. When you use textareas or a single HTTP source, maybe it is clear that it is the source of the error, but not for the Drupal core use case. It is not that hard to get an extended error message though. Either "componentize" the string (which is not 100% translation friendly), or have a callback provide the error message with a message code or rather a message key.
This clearly omits the filename if there is no file known, and clearly includes it if we know the file. It is not really language friendly though, so it might make sense to instead of passing the $string along, pass an $error_key and build up the message for that properly:
Or you can use a named callback for this too, just like the string parsing if you wish (again, look at SAX :).
That was an excerpt from your code (an excerpt from locale_get_modules()), so I would not say it comes from the existing code.
Well, we try to push people to use smaller PO files, but nothing stops them from exporting a complete translation from a site and import that on another one. That could be a PO file half a megabyte or bigger in size. Your methods would use twice as much memory for that as the current algorithm if I extrapolate well from the early 2006 findings we had.
Comment #9
gábor hojtsyBTW this is the relevant issue where a filename was/is missing, so debugging is hard: http://drupal.org/node/129709
I also forgot to note that you would actually need "readers" and "writers" as parameters.
The reader would return a line, the writer would write out one string when found. This is slightly more general then the example I provided above (but it does not suggest an *actual* variable name or parameter order set, it is just a simple example to show you the idea).
Comment #10
brmassa commentedGárbor,
i did some small changes. its now much more like you asked.
cheers,
massa
Comment #11
gábor hojtsyLooked through the code. Some comments:
- There is already a hook_locale() in core, maybe you can reuse it more creatively, instead of introducing a possibly confusing hook_localeapi()
- In locale_translate_import_form_submit(), you pass on a file resource id to _locale_import(), then on _locale_import(), you indicate it is $data, which is supposed to be the whole file
- We discussed passing around the whole file data and/or a big array of the parsed strings is a no-go for core, unless you can enlighten us with some benchmarks
- Code comments are unclear and contain spelling errors: eg. "Coordenates all exporting process" is too general to help anything
- We don't start the variable documentation with the type name, and definitely not lowercase
- Receiving a function name to use, and accepting it as-is from a request parameter is part of a disaster recipe... You don't want people to run arbitrary PHP functions on your server
- ' %filename(line %lineno): %line' is not localizable... I would also refrain from printing the actual line contents there as it can easily be gibberish nonsense to the actual Drupal user (many Drupal users seeing these errors are not aware of how PO files are constructed, neither they care about it)
- There is really no value of the call_user_func_array() function here, if you know the exact variable number and order... "call_user_func_array($tables, array("db-store", $current, $language, $mode, $group))", in fact you might only need call_user...() stuff to easily tell that your $tables is a function name in fact (which should be more clear from the variable name from the start, so $tables does not look attractive)
- We did have quite a few bug reports about problematic plural formulas, so removing the file name from that error message does not seem to be a good idea.
- "$strings = _locale_export_get_strings($language, $group);" this is again not going to work nicely, when you export two or three thousand strings... I see Drupal currently does build an array of strings when exporting, we did not look into optimizing that a year ago, as importing was the immediate target for the work. Seems like now is the time.
- We don't have function ending comments like "} // _locale_export_po()"
- There is no such thing as a "Portable Gettext Object"... There is although "Gettext Portable Object"
Comment #12
brmassa commentedGárbor,
1* Performance: i agree. I agree that we should look after performance and tunning, but a billion strings is basicly a 1mb file, and most servers can afford it. Im gonna try to fix it.
2* Comments: can be easily changed. ok!
3* Since im quite busy on my business in Brazil (i sell computers), and taking care of Ecommerce module and some other modules i maintain is a lot time demanding, im going to work on this patch later. Its not critical anyway.
4* Im going to work on Live Translations module. I just did several important changes on the module and i will launch a testing site with it. i just customized the locale import/export feature to help me on this module.
best regards,
massa
Comment #13
gábor hojtsyThe Hungarian Drupal 5.x core translation is a half a megabyte PO file, and contains 2409 strings. So a 1mb file contains around 5000 strings, not a billion. Most people install more modules, and will not use the Drupal core modules only, so these 2409 strings is just the start for a foreign language site. A year ago we found most shared hosting services (for which your automated updates are especially useful, not the control freak professionals) cannot even afford to import this half a megabyte file at once. Look, if you have this whole file read at once, then taken apart to small pieces and stuffed into a big array structure, it gets to *megabytes* of data for this data to be in memory at once. Similarly for exports. Did you actually look at the benchmarks I have pointed you to? Why do I repeat the same arguments over and over again, just to find you think we are in the billion strings dimension?
Comment #14
gábor hojtsyBTW just to reiterate: I very much care about this feature getting into core, or I would not come back and review new patches all over again. I think it should be done right, for the satisfaction of our users. We worked really hard to handle PO imports better in Drupal 6 with the new batch API and lot's of automated tools, and there is no room for stepping back to a performance-wise disastrous solution.
Comment #15
brmassa commentedGárbor,
I think you got me wrong: i really agrees about performance.
Because i dont have your expertise with locale module, i dont know exactly how to do this. This patch doesnt change that much, so its probably better if you take a look and modify it. The file are still loaded at once, but the lastest patch implements the callback function per string, as you recommended. I did only a "beta" code.
Core + contribs PO file may have about 5000 strings, or 1mb. its probably not a that big deal to load it, but compared to your solution that rounds about 0mb, its completly ridiculous. i really believe that it needs the optimizations you pointed. i frankly dont know much how to do and dont have the free time to study.
This feature isnt critical anyway. There is no other than PO files containing drupal translations and im developing a module to, hopefully, turn this a little bit useless.
sorry for any misundestand.
best regards,
massa
PS: how did you do those benchmarks? since im developing the Live Translation module, its going to be quite useful.
Comment #16
gábor hojtsyAs far as I have last tracked, you are going to use PO files to share translations to keep compatibility and hackability (in a good sense :), so this part of Drupal is going to be used more with your module not less.
As far as memory benchmarks go http://php.net/memory_get_usage helps when placed to the places you want to have your memory consumption measured. That shows pretty nicely where does your memory get used.
Comment #17
gábor hojtsyOK, so to move this issue forward, let's get the plural fixes in core at least. They are bug fixes, if I understand this right. If you could describe the bug and have that code to commit, I would test and commit the patch directly. I would love to refactor the locale code to be more modular for things such as live translation, and it would be much better to concentrate on that once we have the bugs fixed.
Thanks for your contributions!
Comment #18
gábor hojtsyBTW I included part of the patch (to refactor DB operations from inser_one_string() into another function) in this cleanup effort: http://drupal.org/node/150521
Comment #19
brmassa commentedGárbor,
plural bug: I dont remember much what was the bug, but i think its this: PO files should be formatted like
It works fine on languages that have only 1 plural. Many languages, however, has more plural forms. studing the code i noted that it ignores this fact, considering only the first.
I believe the entire system has the same design. I dont know much about PO files and localization systems. My "Live Translation" module probably suffers the same problems.
massa
PS: since you are a member of Hungarian translation team also, i invite you to help on LT. im hosting the server on drupal.titanatlas.com
Comment #20
gábor hojtsyI debugged the code a great deal today, and found out that there is only a simple bug in there, which exposes the internal @count[2] type of index to the PO file. I did include a fix and other improvements in the rework of the "export" part of your patch, submitted here: http://drupal.org/node/152670
Unfortunately due to the nature of how plurals are handled, I did not find a way to generate PO files with a "stream" type of flow, as you do in your modified potx.inc and as I suggested we keep doing with the PO import. Unfortunately nothing ensures that we have the plural versions in order from the DB, so we cannot just work with a stream type of parsing as it seems.
If you have time, I would appreciate if you could test the Drupal 6 patch I submitted at http://drupal.org/node/152670 so I can get on to the import part of your refactoring ideas, if that gets committed.
Comment #21
brmassa commentedGabor,
is it already fixed right?
massa
Comment #22
gábor hojtsyThe export refactoring is done in Drupal 6, because there were no interest shown from people for the import refactoring (look at the export refactoring issue mentioned above for an example), I concentrated on other issue, so the import refactoring is not done unfortunately.
Comment #23
brmassa commentedAngie,
XLIFF is more-and-more popular format for translations. Gábor started himself and old module called XLIFF Tools. It might be the time to revisit it, since its MUCH more advanced and complete than GetText (pot and po files) format. Otherwise, i suggest to close this issue.
regards,
massa
Comment #24
gábor hojtsyWell, for Drupal interface translations, the Gettext format is great, because it actually has nice tools for people to work with: translate, merge, etc. XLIFF is still somewhat of a new kid on the block in terms of readily available tool support. Also, it bring a bigger baggage in terms of bandwidth usage for simple string translation such as with Drupal's interface strings.
Comment #25
brmassa commentedGábor,
your right. I only would recommend XLIFF for 2 reasons: 1* Place Drupal 7 in the vanguard tech (along PHP 5+ only, Web Services and other bleeding edge technologies) and 2* it has a features that is not available for PO files: context. Its hard to translate "View" because it can be a verb or an object. We could support it on t() function, just like Qt does. It also provides some other features, so we might consider cost/benefit carefully.
However, if you believe its really not yet the moment (it should be a Drupal 8+ matter or not at all), i suggest to close this issue. I might reopen it in the future.
regards,
massa
Comment #26
gábor hojtsyWell, it is a far cry if only the transport format supports context. Supporting context in the locale API would be a requirement first. And there was no progress there lately as far as I have seen.
Comment #27
gábor hojtsyAlso, discussing that would be highly offtopic for this issue :)
Comment #28
damien tournoud commentedXLIFF is nothing more than an transport format. The fact that it "supports" context means nothing to us: that would be easily implemented using the gettext format using a simple convention (for example: prefixing source strings with the context).
We really need to add context to
t(), but that's another issue *entirely*.Comment #29
brmassa commentedagreed.
Comment #30
damien tournoud commentedSome thoughts about add context to t(): #334283: Add msgctxt-type context to t().
Comment #31
plachCleaning-up the "language system" issue queue as per http://cyrve.com/criticals.
Comment #32
gábor hojtsyRelated #1189184: OOP & PSR-0-ify gettext .po file parsing and generation but no proposal there to support other formats, just decouple the .po file handling. I think it might pave the way for this effort, but I'm not convinced about the special need for supporting more formats. A contrib module will be much easier to do after that patch, since the API should be much more open about it.
Comment #33
gábor hojtsy#1189184: OOP & PSR-0-ify gettext .po file parsing and generation make the API so open to do this that any contrib module should be able to do that. Finally :)