After trying POTX I saw that not a single line is the same as before. Please test it yourself. The results from potx and msgcat differ so much that every line has changed. I don't want to commit this to CVS.

1. Bug: Full site path in the location comments:

#: sites/all/modules/views/views_rss.module:10
msgid "Views RSS: RSS feed"
msgstr ""

2. Bug: Headers not added:

# LANGUAGE translation of Drupal (general)
# Copyright YEAR NAME <EMAIL@ADDRESS>
# Generated from files:
#  views.module,v 1.166.2.32 2007/04/13 16:27:12 merlinofchaos
#  views_ui.module,v 1.44.2.18 2007/04/12 01:29:08 merlinofchaos
#  views_comment.inc,v 1.4.4.6 2007/03/06 22:39:24 merlinofchaos
#  views_node.inc,v 1.30.2.15 2007/04/12 18:44:04 merlinofchaos
#  views_search.inc,v 1.2.2.6 2007/03/03 22:59:33 merlinofchaos
#  views_statistics.inc,v 1.9.2.6 2007/03/06 21:23:27 merlinofchaos
#  views_user.inc,v 1.14.2.7 2007/03/03 22:31:27 merlinofchaos
#  views_upload.inc,v 1.4.4.5 2007/03/03 22:40:31 merlinofchaos
#  views_rss.module,v 1.12.2.5 2007/03/06 21:44:32 merlinofchaos
#  views.install,v 1.21.4.12 2007/04/14 03:35:47 merlinofchaos
#  views.info,v 1.2.2.3 2007/01/19 00:22:45 merlinofchaos
#  views_rss.info,v 1.3.2.3 2007/01/19 00:22:46 merlinofchaos
#  views_theme_wizard.info,v 1.3.2.3 2007/01/19 00:22:46 merlinofchaos
#  views_ui.info,v 1.4.2.3 2007/01/19 00:22:46 merlinofchaos
#

3. Order of strings is totally different

4. Different line breaks

potx result:

#: modules/views_user.inc:121
msgid "The User ID argument allows users to filter a to nodes authored or commented on the specified user ID."

Old extractor.php with msgcat result:

#: modules/views_user.inc:121
msgid ""
"The User ID argument allows users to filter a to nodes authored or commented "
"on the specified user ID."

Comments

gábor hojtsy’s picture

Status: Active » Postponed (maintainer needs more info)

1. What should we do there? Have just the module file name?
2. You mean CVS header lines, or the "generated from" lines you quoted yourself?
3. Randomly? Or is there some noticeable order in them anyway?
4. This is how msgcat works; extractor.php always exported one-line strings. If you run msgcat on the PO file produced by potx, you will get the line breaks you expect from msgcat.
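
To illustrate point 4, here is a minimal Python sketch. It is not part of potx or gettext; `po_msgid` is a hypothetical helper, and escape sequences are deliberately ignored. It joins the quoted fragments of a msgid entry, which suggests that the one-line and the wrapped form quoted in the report carry the same string and differ only in layout:

```python
def po_msgid(lines):
    """Join the quoted fragments of one msgid entry into a single string.

    Simplification: escape sequences such as \\n or \\" are not decoded.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith("msgid "):
            line = line[len("msgid "):]
        # Every remaining fragment is expected to be a double-quoted string.
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            parts.append(line[1:-1])
    return "".join(parts)


# The potx form (one line) and the msgcat form (wrapped) from the report.
one_line = [
    'msgid "The User ID argument allows users to filter a to nodes '
    'authored or commented on the specified user ID."',
]
wrapped = [
    'msgid ""',
    '"The User ID argument allows users to filter a to nodes authored or commented "',
    '"on the specified user ID."',
]

same_string = po_msgid(one_line) == po_msgid(wrapped)
print(same_string)
```

A tool that compares templates entry by entry, rather than line by line, would therefore not see a change here.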

hass’s picture

Status: Postponed (maintainer needs more info) » Active

1. I think it should be relative to the module's directory, not just strip every path. Views, for example, has some submodules. If I place the extractor inside the views module directory and extract, I get the subdirectories in the language files. The problem I see is that one person uses potx from the web and another from the command line, and they get different results!? Keep it the same or we get far too many diffs in CVS.

2. The "generated from" lines are missing with potx. The headers I posted are an example of how it looks with msgcat.

3. To me it looks like potx is a randomizer :-). I don't know why, but msgcat orders the strings totally differently. Check out and compare both results: about 98% of the lines show up as changed although nothing has changed apart from the order...

4. I saw that xgettext has an option named "--no-wrap". I think this is what caused it, isn't it?

gábor hojtsy’s picture

1. hass, with extractor.php you get paths relative to where you have put extractor.php. It could be in a subfolder in case you are translating a submodule, or in the root folder, so you get different paths. What we can do in potx.module is to generate paths as if you had put extractor.php into the folder you selected as the POT generation root. That does not guarantee compatibility with extractor.php, but maybe it works more like how *you* used extractor.php before.

2. Yes, that is exactly because of the join() error you already reported, which I have now marked as a duplicate.

3. Maybe you can attach a pot file done with the extractor+msgcat way and with potx module, so we can do a diff?

4. We are not using xgettext.

hass’s picture

Attachment (status: new, size: 51.63 KB)

3. Here is the potx-generated file; please rename it to .pot.

hass’s picture

Attachment (status: new, size: 47.28 KB)

3. Here is the extractor-generated file; please rename it to .pot. And then compare...

hass’s picture

2. Yes, reproducible. Extract a POT for a module, then go back to the tab. The error is displayed.

gábor hojtsy’s picture

Just committed a fix for subissue 2 here. I tested with several modules on my test site, and it seems to work nicely.

ray007’s picture

Attachment (status: new, size: 2.4 KB)

On top of yours, here is a "fix" for number 1: normalize the filenames in the output relative to the directory we request a POT file for.

Is that to everybody's liking?
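
The idea behind this normalization can be sketched in a few lines of Python. This is an illustration only, not the attached patch: `normalize_location` is a hypothetical name, and the sketch assumes forward-slash paths.

```python
def normalize_location(path, root):
    """Return path relative to root, so output does not depend on the install location.

    Hypothetical helper: strips the requested POT root from a source file path.
    Paths outside the root are returned unchanged.
    """
    prefix = root.rstrip("/") + "/"
    if path.startswith(prefix):
        # Equivalent to stripping a fixed number of leading characters.
        return path[len(prefix):]
    return path


# Two translators with the same module installed in different places.
a = normalize_location(
    "sites/all/modules/views/modules/views_user.inc",
    "sites/all/modules/views",
)
b = normalize_location(
    "sites/www.example.com/modules/views/modules/views_user.inc",
    "sites/www.example.com/modules/views",
)
print(a, b)
```

With the root stripped, both installs yield the same location comment, which is what keeps the site name out of the committed file.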

hass’s picture

@Gabor

1. It's not only my problem or how I used extractor.php before. In general it is possible to place a contributed module in sites/all/modules/*, but I am also able to place it inside sites/www.example.com/modules/* or somewhere else. Now if a translator working from sites/www.example.com/modules/* extracts and commits his files, we will have big and senseless diffs in CVS. In addition, the site name of every translator gets committed into CVS. Awful...

2. I will test later, thx.

gábor hojtsy’s picture

Yes, that seems logical, except that you should start the documentation of $strip_prefix with a capital letter:

+ * @param $strip_prefix
+ * An integer denoting the number of chars to strip form filepath for output

Otherwise seems to be fine, feel free to commit.

Hass, do we know at what column msgcat wraps? As far as I have found out, it wraps at "width of the output device", which is, well, quite ambiguous.

gábor hojtsy’s picture

Hass, I tried to point out that if you put your file in "sites/all/modules" in the hope of getting translation files for all modules (which you got before with extractor.php to the file system), your pot files will be generated with different locations compared to putting extractor.php into "sites/all/modules/pathauto", for example. Shortening the path to all PO files for the modules you use can end up in different results, and no one told translators to put extractor.php in any particular place, so they could be creative.

Now we can enforce a path style with potx.module, which is again better than extractor.php possibly having any type of path style. I just tried to point out that what we come up with might still have diffs against your previous practice, because the previous practice was largely undefined.

hass’s picture

Yes, I understand. I have only put the extractor inside a contributed module's directory (I'm not talking about any core modules) to get all module strings out and not to lose some module strings inside the general.pot of core!!! All the contributed modules' POT files I've seen are built this way... and everyone aware of this general.pot problem will place the extractor inside the module's directory...

Aside, I have a dream... infrastructure will build all language-independent POT templates :-).

gábor hojtsy’s picture

hass, you have a dream, I have a Google SoC project to do that :) So it will happen this summer. Still we need to fix it here, because it will not get right in the build system otherwise.

ray007’s picture

Committed my patch with Gábor's suggested documentation change.

I don't think we can do anything about #3, but the order should stay the same once the pot-files in use are those generated by potx, right?

#4 probably also shouldn't be too hard if we set a fixed number for the cutoff point. If we really want that.
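
A fixed cutoff could look roughly like the following Python sketch. It is an assumption-laden illustration, not msgcat's algorithm: `wrap_msgid` is a hypothetical function, the width of 79 columns is assumed rather than taken from this thread, lines break only after spaces, and escaping is ignored.

```python
def wrap_msgid(text, width=79):
    """Format a msgid entry, wrapping after spaces at an assumed fixed width.

    Sketch only: no escaping, and the default width is an assumption.
    """
    one_line = 'msgid "%s"' % text
    if len(one_line) <= width:
        return [one_line]

    # Keep the trailing space with each word, as wrapped PO files do.
    words = text.split(" ")
    tokens = [w + " " for w in words[:-1]] + [words[-1]]

    lines = ['msgid ""']
    current = ""
    for token in tokens:
        # +2 accounts for the surrounding double quotes.
        if current and len(current) + len(token) + 2 > width:
            lines.append('"%s"' % current)
            current = ""
        current += token
    if current:
        lines.append('"%s"' % current)
    return lines


text = ("The User ID argument allows users to filter a to nodes "
        "authored or commented on the specified user ID.")
for line in wrap_msgid(text):
    print(line)
```

With these assumptions the sketch happens to reproduce the wrapped views_user.inc example from the original report, but that is one data point and not proof of how msgcat chooses its break points.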

hass’s picture

#3: Are we not able to learn how the ordering logic is built and rebuild the same logic? :-)

#4: Last time I edited a file for core, one of the German translators turned it back into a no-wrap version... maybe the no-wrap version should be the preferred way... but as I remember, saving in poEdit will wrap the lines whatever I do... I must test to say for sure.

hass’s picture

Maybe the different order is only caused by the "--use-first" command line option!? Only an idea...

gábor hojtsy’s picture

Looked through the ordering stuff. Diffed the two files you sent me. Noticed that the locations make a big difference (but that is fixed, so run a big search and replace on sites/all/modules/views/ to remove these). Looking at what remains, the following issues appear:

- file list missing from top -> already fixed
- potx version has bad file name at top in parenthesis -> already fixed

- wrapping done on "extractor version"
- "extractor version" has following order: general.pot, translation file for files from subfolders, translation files from the same folder
- potx version has following order: translation files from the same folder, translation files from subfolders, general.pot strings are intermixed with others

So the wrapping accounts for most of the diffs, while reordering accounts for some others. I put "extractor version" in quotes because it is really a combination of extractor.php and msgcat that we are replacing with potx module's web based extractor. The ordering is not really surprising, because the instructions said that you should use this msgcat line: msgcat --use-first general.pot [^g]*.pot (regardless of how buggy that is for module names starting with "g"). So no wonder the general.pot stuff comes first. The other files are used in file time order I guess, but extractor.php seems to process files in the same order as potx.module does; that part of the code did not change.
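
The "g" problem with that pattern can be shown with a small Python sketch. The file names are made up for illustration (`gmap-module.pot` is hypothetical), and Python's `fnmatch` spells the negated class as `[!g]` where the shell command above uses `[^g]`.

```python
from fnmatch import fnmatchcase

# Hypothetical set of generated templates in the working directory.
templates = [
    "general.pot",
    "gmap-module.pot",
    "views-module.pot",
    "views_rss-module.pot",
]

# Python's equivalent of the shell pattern [^g]*.pot.
PATTERN = "[!g]*.pot"

matched = [name for name in templates if fnmatchcase(name, PATTERN)]
# Everything except general.pot that the pattern silently leaves out.
skipped = [name for name in templates
           if name != "general.pot" and name not in matched]

print(matched)
print(skipped)
```

The pattern is meant to exclude only general.pot (which is named explicitly on the command line), yet it drops every template whose name starts with "g".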

gábor hojtsy’s picture

But I forgot to mention that the strings appearing in multiple files do not seem to appear at all in the potx generated version.

gábor hojtsy’s picture

Hass, run the latest potx.module on the views module to compare with your files. The repeated strings appear again, due to the bug fix done on the "generated from files" list. The "generated from files" list is more accurate than what you had before with extractor.php and msgcat, because that only kept your general.pot list, but we have a complete list now.

Ordering and wrapping are surely still different. The ordering differences are mainly due to two factors:

- Repeatedly appearing strings are intermixed in the pot (not at the start as before) -> we can help this by ordering the $strings array by number of occurrences (that would still not be exactly as before, but would push the repeated strings to the beginning).

- Mass ordering differences arise from how msgcat and potx scan through files. Extractor generated files into the current folder. msgcat goes through them in alphabetical order, so you get general.pot, views-module.pot, views_book-inc.pot, views_comment-inc.pot and so on. The inc files are in subfolders, but because the extractor naming algorithm does not distinguish between them, you get that order from msgcat. Now with potx module, you basically get the order of the result of the glob() call in _potx_explore_dir(). It seems to order the results in "brace order" (that is, the order of extensions), then by file name. So you get views.module and the other .module files in the same folder, then the .install, .info, and so on files. Then it goes down to subfolders once all files in the same folder are parsed, and goes on in the same order.

So the difference is that msgcat orders by absolute alphabetical order based on extractor.php results, while potx module orders by "folder deepness", then file extension, then file basename.

That said, it is not impossible to do the same ordering in potx as with extractor.php + msgcat; we would just need to order the $files array by file name, without considering the path to the file. But I don't think that ordering is really logical. Do you?
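
The two orderings described above can be compared with a short Python sketch. Both sort keys are approximations built from the description in this comment, not code from potx or msgcat: the generated template naming (dot replaced by a dash, plus ".pot") and the extension priority list `EXT_ORDER` are assumptions.

```python
import posixpath

# Assumed extension priority, standing in for the "brace order" of glob().
EXT_ORDER = ["module", "install", "info", "inc"]


def msgcat_style_key(path):
    """Alphabetical by generated template name; the folder is ignored."""
    base = posixpath.basename(path)
    return base.replace(".", "-") + ".pot"


def potx_style_key(path):
    """Folder depth first, then extension priority, then file basename."""
    depth = path.count("/")
    stem, ext = posixpath.basename(path).rsplit(".", 1)
    return (depth, EXT_ORDER.index(ext), stem)


files = [
    "modules/views_user.inc",
    "views.module",
    "views_rss.module",
    "modules/views_book.inc",
    "views.install",
    "views.info",
]

msgcat_order = sorted(files, key=msgcat_style_key)
potx_order = sorted(files, key=potx_style_key)
print(msgcat_order)
print(potx_order)
```

Sorting the file list with the first key before extraction would be one way to approximate the old order, at the cost of mixing files from different folders.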

hass’s picture

#1. The paths seem fixed now, thx.

#5. Bug: I have executed potx on the command line and this produced only one "general.pot". In the past this created one file for every submodule and some more files. Is this new behavior correct? It looks really easier for the translator (!), but why is this one not named views.pot...? :-) In the past it was named "[module name]-module.pot", as an example, if only one file was created for a module. However, I'd like to name it after the upper-level directory... and therefore "[module name].pot" :-)

Additionally, I compared a command line generated pot file with the web generated pot file (both created with potx) and they are *different*. They must both be the same or we will have senseless diffs...

Ordering: I don't like this new ordering very much, but I don't know how to fix this yet...

I'm merging files with (example):

msgcat --use-first content_admin-inc.pot content_copy-module.pot fieldgroup-module.pot nodereference-module.pot number-module.pot optionwidgets-install.pot | msgattrib --no-fuzzy -o cck.pot

and not with:

[^g]*.pot

so I have control over the order, which I have done until now with a "dir /b" under Windows... but I thought msgcat would order things the way it likes and not in the order I give on the command line... something new for me, but OK.
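
Why the command line order matters can be sketched in Python. This is a deliberately simplified model of a "use first" merge, not msgcat itself: `merge_use_first` is a hypothetical function, catalogs are plain lists of (msgid, msgstr) pairs, and location comments, headers and fuzzy flags are left out.

```python
def merge_use_first(catalogs):
    """Merge catalogs in the given order, keeping the first entry per msgid.

    Simplified model: output order follows first appearance across inputs.
    """
    merged = {}
    for catalog in catalogs:
        for msgid, msgstr in catalog:
            if msgid not in merged:
                merged[msgid] = msgstr
    # dicts keep insertion order, so this is the order of first appearance.
    return list(merged.items())


# Made-up entries for two templates that share one string.
content_admin = [("Save", ""), ("Field type", "")]
content_copy = [("Export", ""), ("Save", "")]

first_way = [m for m, _ in merge_use_first([content_admin, content_copy])]
other_way = [m for m, _ in merge_use_first([content_copy, content_admin])]
print(first_way)
print(other_way)
```

Under this model, swapping two input files changes the position of every string they contain, which would explain large diffs between otherwise identical templates.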

gábor hojtsy’s picture

Hass, yes, potx-cli.php was modified to use single file mode by default exactly to help translators (removing the msgcat requirement altogether) *and* at the same time to resemble what is generated by potx.module.

It is not named views.pot because we don't know that it should be named views.pot. You can have so many files in the folder with submodules and so on. Maybe we can guess the desired filename by looking one folder up, but that only works if the working directory is the same where your files are (e.g. in autodiscovery mode). Granted, that is the default, but people can do "php potx-cli.php --files=c:/mydrupal/modules/mymodule.module". Maybe this is not very realistic, but it is still a possibility. Naming the resulting file after the name of the folder potx-cli.php runs from seems to be a possible default, but then we need a command line option for that (and better option parsing code, now that it becomes important to support multiple options at a time).
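
Guessing the name from the working folder could be as small as the following Python sketch. `default_pot_name` is a hypothetical helper, forward-slash paths are assumed, and falling back to "general.pot" when no folder name is available is an assumption made here for illustration.

```python
import posixpath


def default_pot_name(working_dir):
    """Derive a template file name from the folder the extractor runs in.

    Hypothetical default: the folder name plus ".pot", else general.pot.
    """
    name = posixpath.basename(working_dir.rstrip("/"))
    return (name or "general") + ".pot"


print(default_pot_name("/home/me/drupal/sites/all/modules/views"))
```

This only gives a sensible result when the extractor runs from the module folder itself, which is the limitation described above.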

"Additional i compared a command line generated pot file with the web generated pot file (both created with potx) and - they are *different*. They must be both the same or we will have senseless diff's..."

Again, it would be nice if you could collect the differences that disturb you. You have the files already, so why would we redo all the hassle you went through already? :)

hass’s picture

#2. The join bug seems fixed, too. As an aside, think about whether there is a way to display the error to the users... I know this will be very difficult and potx should be bug free... but never say never :-). Maybe Ajax will do the trick...

#4. Seems not very important. A no-wrap one-liner is OK. And I cannot find a default msgcat width either. Doesn't matter...

hass’s picture

Attachment (status: new, size: 2.58 KB)

Sorry, here are the diffs between the CMD and Web generated POT... :-)

hass’s picture

Aside, it would be a good idea to put potx-cli.php,v 1.1.2.2 2007/05/02 09:18:15 goba on a blacklist...

The only place I will use php potx-cli.php --files=potx.module is for the POTX module itself :-). There are so many translatable strings inside the potx files that are never displayed... it makes no sense, and the JA.po translation has done so, too.

hass’s picture

Attachment (status: new, size: 1.47 KB)

Forget my patch in #23... I have now rerolled the diffs after removing the potx files from the views directory... :-(.

gábor hojtsy’s picture

Sure, we can put potx-cli.php on a blacklist.

The second patch you sent shows that for some reason, the .info files were not picked up by the command line version. That should have something to do with special handling of .info files in certain modes. This could be in _potx_build_files(), but all I see there is some $filename variable used in CORE and MULTIPLE modes, but that $filename is not defined before it gets used. Maybe it was intended to be $file. Anyway, that should not affect this behaviour because we should be in SINGLE mode...

hass’s picture

Any updates on the open issues?

ray007’s picture

Hmm, different timestamps and the info files missing.
The first one is "works as designed".

Gabor, any idea about the 2nd?

gábor hojtsy’s picture

Well, we made a lot of changes on hass' bug reports, and I think we fixed this along the way. A try with a fresh potx version would be great.

hass’s picture

As I remember, #25 isn't fixed, is it?

gábor hojtsy’s picture

Status: Active » Postponed (maintainer needs more info)

I have just committed a new version, which fixes the .info file issue (among lots of feature additions :). Please grab the latest CVS version and see if it works to your satisfaction.

hass’s picture

Status: Postponed (maintainer needs more info) » Active
Attachment (status: new, size: 1.19 KB)

I'm sorry, but there are diffs... maybe related to http://drupal.org/node/155106 and http://drupal.org/node/155108.

gábor hojtsy’s picture

Status: Active » Fixed

Well, if it is only the potx.inc, then it is fixed with my patch. See the issues you opened. There is no point in keeping three issues open for the same problem.

hass’s picture

Status: Fixed » Active
Attachment (status: new, size: 22.31 KB)

Sorry, I thought this had been partly fixed earlier... This is a diff between good old extractor+msgcat and potx... the diff patch is bigger in size than the whole pot file... so we get big diffs in CVS if different extractors are used.

gábor hojtsy’s picture

Status: Active » Closed (won't fix)

Well, you remember right that we already discussed that msgcat works in the order of the generated file names, while potx works in the order of the source file names and paths, which are quite different. Potx does not know the names of the generated files throughout the process, only at the very end of the file generation cycle, which makes it quite hard to do an msgcat type of merge.

Look, I am trying to concentrate my powers on making the web based translation editor a possibility, which means that all .po files will be wiped from CVS and imported to a database, which will have a nice web interface to edit all translations for all versions of all modules, share translations between them easily and so on. Even with included PO(T) import/export functionality for the desktop geeks, and for people on the road to work offline. I think my energies are better concentrated there. The recent updates to potx were also part of that effort. I API-ified the extraction process, so we can save extracted strings to the database instead of generating template files.

BTW as the diff shows we are better with extracting strings from .info files, so that's good ;)

  • Gábor Hojtsy committed 56e5c2f on 7.x-2.x
    Only mangle with too small files, possibly folding them into general.pot...

  • Gábor Hojtsy committed 56e5c2f on 7.x-3.x
    Only mangle with too small files, possibly folding them into general.pot...