Posted by EvanDonovan on July 22, 2009 at 10:05pm
| Project: | Translation Framework |
| Version: | 6.x-1.2 |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed (fixed) |
Issue Summary
When I submit a node to Google Translate, the translation comes back with the HTML entities escaped and with spaces inserted into closing tags (for example, "</ p>" instead of "</p>"). As a result, the formatting of the translated node is broken. Here is an example.
Original node:
This is a paragraph.
This is a new paragraph with a link.
I like paragraphs, don't you?
Spanish translation:
<div class="fun-stuff">
<p> Se trata de un párrafo. </ p>
<p> Se trata de un nuevo apartado con <a href="http://www.example.com"> un link </ a>. </ p>
Me gusta <p> párrafos, ¿no? </ p>
<br />
<p> <img src="http://www.urbanminstry.org/files/img-thing.jpg" alt="" /> </ p>
</ div>
Comments
#1
For the most part, I was able to resolve this issue by modifying google_translation_postprocess(). See below, where I added several str_replace() calls to undo the things that Google Translate was changing or adding:
<?php
function google_translation_postprocess($translate) {
  // Decode the HTML entities that come back escaped.
  $translate->translation = str_replace('&lt;', '<', $translate->translation);
  $translate->translation = str_replace('&gt;', '>', $translate->translation);
  // Remove the space that gets inserted into closing tags ("</ p>").
  $translate->translation = str_replace('</ ', '</', $translate->translation);
  $translate->translation = str_replace('&quot;', '"', $translate->translation);
  // Strip the added line breaks.
  $translate->translation = str_replace('<br>', '', $translate->translation);
  $translate->translation = str_replace('<br />', '', $translate->translation);
  // Prints the cleaned translation (looks like leftover debugging output).
  echo $translate->translation;
  return $translate;
}
?>
The only remaining issue is that sometimes a paragraph gets broken up into two paragraphs (with the first one being very short and cut off). I believe that this is happening at the points where the text is split to stay under the 500-character limit.
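For anyone who wants to try these replacements outside Drupal, here is a minimal sketch that applies the same chain to a plain string shaped like the Spanish output in the issue summary. The helper name cleanup_translation() is hypothetical and not part of the google_translation module, and it assumes the translation comes back with the entities escaped as &lt;, &gt; and &quot;.

```php
<?php
// Hypothetical helper (not part of google_translation): the same replacements
// as in the function above, applied to a plain string instead of $translate.
function cleanup_translation($text) {
  $search  = array('&lt;', '&gt;', '&quot;', '</ ', '<br />', '<br>');
  $replace = array('<',    '>',    '"',      '</',  '',       '');
  // str_replace() applies the pairs in array order, so the entities are
  // decoded before the "</ " fix runs.
  return str_replace($search, $replace, $text);
}

$sample = '&lt;p&gt; Se trata de un párrafo. &lt;/ p&gt;';
echo cleanup_translation($sample), "\n";
// prints: <p> Se trata de un párrafo. </p>
```

The order of the pairs matters here: if the "</ " replacement ran before the entities were decoded, the closing tags would not be matched yet.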
#2
I had the same problem and implemented the postprocess filtering in a wrapper module, to avoid altering the google_translation module code.
<?php
/**
 * translation_framework plugin that wraps calls to the google_translation module,
 * only providing its own postprocess function to strip out extraneous tags and
 * characters that the Google translation added to very basic Markdown text.
 */
function google_translation_filter_translation_realtime($op = 'info') {
  switch ($op) {
    case 'info':
      $info['google_translation_filter'] = array(
        'name' => t('GOOGLE (markdown friendly)'),
        'preprocess' => 'google_translation_preprocess',
        'translate' => 'google_translation_translate',
        'postprocess' => 'google_translation_filter_postprocess',
        'languages' => 'google_translation_languages',
        'description' => t('Utilize GOOGLE Translation software'),
      );
      return $info;
  }
}

/**
 * Filter out br tags and remove spaces from heading hashes.
 * Would be better implemented with regular expressions.
 */
function google_translation_filter_postprocess($translate) {
  $translate->translation = str_replace('<br> ', '', $translate->translation);
  $translate->translation = str_replace('<br>', '', $translate->translation);
  $translate->translation = str_replace('# ', '#', $translate->translation);
  return $translate;
}
?>
You could have as many of these as you like.
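As the comment on the postprocess function says, regular expressions would probably be a better fit. Here is a rough sketch of what that might look like, written against a plain string so it can be tried outside Drupal. The function name example_regex_cleanup() is hypothetical, and the patterns are my guess at what needs matching, not something taken from the module.

```php
<?php
// Hypothetical regex-based variant of the cleanup; the function name is an
// assumption, and it works on a plain string rather than the $translate object.
function example_regex_cleanup($text) {
  // Remove <br>, <br/> or <br /> in any letter case, plus any whitespace
  // that directly follows the tag.
  $text = preg_replace('#<br\s*/?>\s*#i', '', $text);
  // Remove any run of spaces or tabs that directly follows a hash.
  $text = preg_replace('/#[ \t]+/', '#', $text);
  return $text;
}

echo example_regex_cleanup("<br> # # Título<br />\n"), "\n";
// prints: ##Título
```

Note that this is slightly more aggressive than the str_replace() version: it also swallows newlines after a br tag and handles more than one space after a hash, which may or may not be what you want.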
Andy
p.s. Nice module Darren - thanks.
#3
Thanks, guys. I have put the fixes into the development branch of the module; the CVS commit message for this is at http://drupal.org/cvs?commit=256692.