Manually updating the Drupal 5.x core strings to Drupal 6.x showed that fuzzy matching, while it sometimes produces funny and very misleading matches, helps a lot in jumpstarting the translation of new strings. Doing fuzzy matching on ten times as many strings might not be that easy (and the algorithm would still need to be figured out), but we should definitely add fuzzy-matched strings as suggestions for untranslated strings.

Comments

psicomante

If I understood correctly :), it's a great feature :P

gábor hojtsy

Just marked #563228: Suggestion for similar strings as a duplicate. Copying its possibly cleaner explanation here:

The problem:
Often, when a single punctuation mark or word changes in a string, the entire translation(s) are thrown away. This is a problem for long strings.

The solution:
Use similar_text() to find similar source strings (I think 80% similarity, though we could discuss the threshold) and offer the user the translations of those strings as suggestions in a tab. Our battle will be not to bring down the server during the search :)
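A minimal sketch of the proposed lookup, written in Python rather than PHP. It uses `difflib.SequenceMatcher.ratio()` as a stand-in for PHP's `similar_text()` percentage: both score 2 × matched characters / total length, but the two matching algorithms are related, not identical. The 0.8 threshold is the 80% figure floated above; `fuzzy_suggestions()` and the in-memory dictionary of existing translations are hypothetical and say nothing about how localize.drupal.org actually stores strings.

```python
from difflib import SequenceMatcher

# Assumption: the "80%" threshold floated in the issue; open for discussion.
SIMILARITY_THRESHOLD = 0.8


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]: 2 * matched characters / (len(a) + len(b)).

    difflib stands in for PHP's similar_text(); the formula has the same
    shape, but the matching algorithms are not identical.
    """
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_suggestions(new_source: str, translated: dict,
                      threshold: float = SIMILARITY_THRESHOLD):
    """Return (score, old_source, translation) for every already translated
    source string at least `threshold` similar to new_source, best first."""
    matches = []
    for old_source, translation in translated.items():
        score = similarity(new_source, old_source)
        if score >= threshold:
            matches.append((score, old_source, translation))
    matches.sort(key=lambda match: match[0], reverse=True)
    return matches


if __name__ == "__main__":
    # Hypothetical translation memory: source string -> existing translation.
    memory = {
        "Save configuration": "Konfiguration speichern",
        "Delete the selected items": "Ausgewählte Elemente löschen",
    }
    for score, old, translation in fuzzy_suggestions("Save configuration.", memory):
        print(f"{score:.0%}  {old!r} -> {translation!r}")
```

This compares the new string against every translated string one by one, which is exactly where the performance worry comes from.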

Of course, the performance of the match is key here.
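One common way to keep such a scan cheaper, sketched here as an assumption rather than anything the localization server does, is to reject candidates with inexpensive upper bounds before paying for a full comparison. Python's `difflib` provides two: `real_quick_ratio()` (lengths only) and `quick_ratio()` (shared characters, ignoring order). Reusing one `SequenceMatcher` with the new string set once via `set_seq2()` also avoids recomputing its cached data for every candidate. `best_matches()` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def best_matches(new_source: str, candidates, threshold: float = 0.8):
    """Return (score, candidate) pairs with ratio() >= threshold, best first.

    Cheap upper bounds run before the expensive ratio() call, so clearly
    dissimilar candidates are discarded without a full comparison.
    """
    matcher = SequenceMatcher(None)
    # SequenceMatcher caches information about its second sequence, so set
    # the string compared against everything else once, as seq2.
    matcher.set_seq2(new_source)
    results = []
    for candidate in candidates:
        matcher.set_seq1(candidate)
        # Upper bound from lengths alone: 2 * min(len) / (len_a + len_b).
        if matcher.real_quick_ratio() < threshold:
            continue
        # Tighter upper bound from shared characters, ignoring their order.
        if matcher.quick_ratio() < threshold:
            continue
        score = matcher.ratio()
        if score >= threshold:
            results.append((score, candidate))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    existing = ["Save configuration", "Delete", "Save the configuration"]
    print(best_matches("Save configuration.", existing))
```

The bounds only prune; every candidate that survives them still gets a full `ratio()` call, so a very large string set may still need an index or an external translation memory.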

hass

Version: 5.x-1.x-dev » 6.x-1.x-dev

+

zirvap

Related issue: #204150: Differentiate fuzzy and suggestions.

Would it be realistic to get this implemented before (or not too long after) the 15th January string freeze for 7.x? It would potentially save a lot of translation time.

If it's not realistic to do it automatically, we could go for a partly manual procedure for now:

  1. Export 6.x translation
  2. Do a fuzzy matching in a .po editor
  3. Import the fuzzy strings to the localization server as suggestions (or even better as fuzzy strings, ref. #204150: Differentiate fuzzy and suggestions)
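Step 3 needs the matches in a form the localization server can import. As a hypothetical sketch, the helper below writes (new source, old source, old translation) triples as gettext PO entries carrying the standard `#, fuzzy` flag and a `#| msgid` comment recording the previous source string. It omits the PO header entry that a real import file normally needs, and how the server treats the fuzzy flag on import is outside this sketch (see #204150).

```python
def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted PO literal."""
    return (
        text.replace("\\", "\\\\")  # backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )


def fuzzy_po_entries(suggestions) -> str:
    """Build PO text from (new_msgid, old_msgid, old_msgstr) triples.

    Each entry is flagged '#, fuzzy' and keeps the previous source string
    in a '#| msgid' comment so the translator can see what changed.
    """
    blocks = []
    for new_msgid, old_msgid, old_msgstr in suggestions:
        blocks.append(
            "#, fuzzy\n"
            f'#| msgid "{po_escape(old_msgid)}"\n'
            f'msgid "{po_escape(new_msgid)}"\n'
            f'msgstr "{po_escape(old_msgstr)}"\n'
        )
    # PO entries are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(fuzzy_po_entries([
        ("Save configuration.", "Save configuration", "Konfiguration speichern"),
    ]))
```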

(I don't have the knowledge to contribute code, but I can figure out and describe a manual procedure, if there's not time and/or interest enough for the automatic solution.)

zirvap

I've added a rough description of how to manually import fuzzy strings to the handbook; see "Create a file with fuzzy strings for import to the localization server".

gábor hojtsy

@zirvap: I've at least included links to your docs in the Drupal 7 alpha 1 translation announcement: http://localize.drupal.org/node/712

gábor hojtsy

Status: Active » Closed (duplicate)

We should think about this issue in a broader scenario. Since solving this at the PHP level did not perform well (I've been discussing with Xano, who authors the ATR module, applying his work to our database: http://drupal.org/project/atr), it looks like we won't get this without integrating a third-party translation memory. I've opened #740494: Find translation memory to integrate with and implement integration as an umbrella issue for that.

joelpittet


I mentioned this in #2560783-28: Replace !placeholder with :placeholder for URLs in hook_help() implementations

I was thinking it might be more performant if it just did a str_replace(). I was looking at this purely for placeholder changes that break existing translations.

I'm new to this project and its queue; could you point me to the area of the code that was tested in #7 so that I can experiment?

t('!pass to @escape and %em', ['!pass' => '', '@escape' => '', '%em' => '']);

Could be compared as

$fuzzy = str_replace(array_keys(['!pass' => '', '@escape' => '', '%em' => '']), '', '!pass to @escape and %em');
// Result is ' to  and '.
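For comparison, the same stripping idea as a Python sketch. Instead of removing only the keys passed to `t()`, it assumes any `!`, `@`, `%` or `:` followed by word characters is a placeholder, which is only an approximation (an e-mail address or a time such as `10:30` would also match). `strip_placeholders()` and `same_ignoring_placeholders()` are hypothetical helpers.

```python
import re

# Assumption: a placeholder is !, @, % or : followed by word characters.
# This only approximates Drupal's rules; e-mail addresses or times such as
# "10:30" inside a string would also match.
PLACEHOLDER = re.compile(r"[!@%:]\w+")


def strip_placeholders(source: str) -> str:
    """Remove placeholder tokens entirely, like the str_replace() idea."""
    return PLACEHOLDER.sub("", source)


def same_ignoring_placeholders(a: str, b: str) -> bool:
    """True if the two source strings differ only in their placeholders."""
    return strip_placeholders(a) == strip_placeholders(b)


if __name__ == "__main__":
    print(repr(strip_placeholders("!pass to @escape and %em")))  # ' to  and '
    print(same_ignoring_placeholders("Hello !friend", "Hello @friend"))  # True
```

Dropping the placeholder entirely also makes 'Hello @friend' and 'Hello @enemy' compare equal, so this is looser than a prefix-only comparison.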

gábor hojtsy

@joelpittet: it looks like what you are looking at is a possible core change? Both core (all versions) and localize.drupal.org handle 'Hello !friend' and 'Hello @friend' as two distinct strings to translate. In fact, 'hello @friend' would be a third one, etc. So I'm not sure how your logic would work in core. It definitely sounds like a proposal for core, not for the localization server.

joelpittet

I don't know where the comparison that detects a changed string is made, whether in core or here; I have no clue. Any pointers would help.

I'd like to propose that, when comparing strings for changes, swapped placeholders be treated as if they didn't exist, and that existing translations simply get the placeholders swapped to match.
That may need tempering: if the name of the placeholder changed, it could be a genuinely different string, so ignoring only a change of placeholder 'type' (the prefix) would probably hold up better, but that is the proposal.

So @friend/!friend/%friend would all be treated as the same string.
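A sketch of that narrower rule, assuming (hypothetically) that a placeholder is `!`, `@`, `%` or `:` followed by word characters. Only the prefix is neutralized and the name is kept, so the three `friend` variants share one comparison key while a renamed placeholder still counts as a change. `normalize_placeholder_prefixes()` is a hypothetical helper, and the `{name}` marker is an arbitrary choice used only for comparison.

```python
import re

# Assumption: !, @, % or : followed by word characters is a placeholder.
PLACEHOLDER = re.compile(r"[!@%:](\w+)")


def normalize_placeholder_prefixes(source: str) -> str:
    """Replace each placeholder's prefix with a neutral {name} marker.

    The name is kept, so @friend, !friend and %friend map to the same key,
    while a renamed placeholder such as @enemy still produces a different one.
    The key is only for comparison and is never shown to translators.
    """
    return PLACEHOLDER.sub(r"{\1}", source)


def same_modulo_prefixes(a: str, b: str) -> bool:
    """True if a and b differ at most in their placeholder prefixes."""
    return normalize_placeholder_prefixes(a) == normalize_placeholder_prefixes(b)


if __name__ == "__main__":
    print(normalize_placeholder_prefixes("Hello @friend"))  # Hello {friend}
    print(same_modulo_prefixes("Hello !friend", "Hello %friend"))  # True
    print(same_modulo_prefixes("Hello @friend", "Hello @enemy"))  # False
```

Case is left untouched, so 'hello @friend' would still be a distinct string.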

gábor hojtsy

@joelpittet: that would require core changes first and foremost, so a core issue should be opened with a proposal for that. Drupal core currently stores source strings as "binary strings", i.e. every character matters and makes a difference. Looking up translations with possible variance in placeholder modifiers could also be a performance issue to deal with.

hass

This would save translators a lot of time :-). At the very least, the localization server could do this and save time there.

joelpittet

So we'd need to change the data type of locales_source:source from BLOB to TEXT/utf8mb4 or something? And maybe, to deal with performance, store the variations as a hash or something alongside?
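To illustrate the "hash alongside" idea, here is a hypothetical sketch using SQLite from Python's standard library. The exact source stays in a BLOB column, and an extra indexed `variation_hash` column holds a SHA-256 of the prefix-normalized source, so all placeholder variants are found with one indexed equality lookup. The table is only loosely modeled on `{locales_source}`; the `variation_hash` column, the placeholder pattern and the helper functions are assumptions, not Drupal's schema or API.

```python
import hashlib
import re
import sqlite3

# Assumption: !, @, % or : followed by word characters is a placeholder.
PLACEHOLDER = re.compile(r"[!@%:](\w+)")


def variation_hash(source: str) -> str:
    """SHA-256 of the source with placeholder prefixes neutralized."""
    normalized = PLACEHOLDER.sub(r"{\1}", source)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema, only loosely modeled on {locales_source}:
    # the exact source stays a BLOB; the hash column is the addition.
    conn.execute(
        "CREATE TABLE locales_source ("
        " lid INTEGER PRIMARY KEY,"
        " source BLOB NOT NULL,"
        " variation_hash TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX locales_source_variation ON locales_source (variation_hash)"
    )


def add_source(conn: sqlite3.Connection, source: str) -> None:
    conn.execute(
        "INSERT INTO locales_source (source, variation_hash) VALUES (?, ?)",
        (source.encode("utf-8"), variation_hash(source)),
    )


def find_variants(conn: sqlite3.Connection, source: str) -> list:
    """Return lids of stored strings that differ from source at most in
    placeholder prefixes, using one indexed equality lookup."""
    rows = conn.execute(
        "SELECT lid FROM locales_source WHERE variation_hash = ? ORDER BY lid",
        (variation_hash(source),),
    )
    return [lid for (lid,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    for text in ("Hello @friend", "Hello %friend", "Hello @enemy"):
        add_source(conn, text)
    print(find_variants(conn, "Hello !friend"))  # [1, 2]
```

Exact-match lookups on the BLOB column are unaffected; the hash only adds a second, looser way in.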

gábor hojtsy

I am happy to continue a conversation on a core issue.

joelpittet

Here are some related core issues:
#851362: Add hash column to {locales_source} to query faster locale strings
#147947: [DBTNG + XDB] Replace some TEXT:BIG with BLOB (similar, but the opposite direction, since it goes full BLOB)
#2490976: Locale caching algorithm is broken on Non MySQL/PostgreSQL databases

I'm still a bit unsure what to call the new issue and what it should propose for core.

joelpittet

I'm also not familiar with the differences between BLOB and TEXT in terms of size, or with the drawbacks and benefits of converting between them, so I may be getting in over my head here.

All I was hoping to do was review the mechanism that detects whether strings have changed and find a way to make it fuzzier; I can make it performant once I find a solution. I'm building a mental model of how this all works and looking for a way in to experiment with some ideas.