Czech typographic conventions and SmartyPants

intu.cz - March 23, 2006 - 20:12
Project: SmartyPants
Version: HEAD
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: active
Description

Hello. As a preface, see my long post on the subject. The whole thing boils down to the need to follow Czech typographic conventions on Czech Drupal pages. As I said in that post, I have done violence to the CVS version of SmartyPants. I can send the changed files, but diffs and patches are well beyond my modest abilities, and I feel it is an incredibly dirty hack. The easy part was adding a form item for Czech quotes and handling the rest in the .inc file. But one of the most important requirements still lies ahead of me.

That requirement is illustrated in the attached .gif: single-letter prepositions and conjunctions should never be left at the end of a line of Czech text. Here is a piece of code from the GPL-licensed Texy formatter, which has a filter covering this (and many other Czech typographic conventions).

/**
* AUTOMATIC REPLACEMENTS MODULE CLASS
*/
class TexyQuickCorrectModule extends TexyModule {
    // options
    var $doubleQuotes = array('„', '“');  // left & right double quote („ “)
    var $singleQuotes = array('‚', '‘');  // left & right single quote (‚ ‘)
    var $dash         = '–';                    // dash (–)




    function linePostProcess(&$text)
    {
        if (!$this->allowed) return;

        static $replace;
        if (!$replace) {
            $replaceTmp = array(
              '#(?<!"|\w)"(?!\ |")(.+)(?<!\ |")"(?!")()#U'      // double ""
                                                        => $this->doubleQuotes[0].'$1'.$this->doubleQuotes[1],

              '#(?<!\'|\w)\'(?!\ |\')(.+)(?<!\ |\')\'(?!\')()#UUTF'  // single ''
                                                        => $this->singleQuotes[0].'$1'.$this->singleQuotes[1],

              '#(\S|^) ?\.{3}#m'                        => '$1&#8230;',                       // ellipsis  ...
              '#(\d| )-(\d| )#'                         => "\$1$this->dash\$2",               // en dash    -
              '#,-#'                                    => ",$this->dash",                    // en dash    ,-
              '#(?<!\d)(\d{1,2}\.) (\d{1,2}\.) (\d\d)#' => '$1&#160;$2&#160;$3',              // date 23. 1. 1978
              '#(?<!\d)(\d{1,2}\.) (\d{1,2}\.)#'        => '$1&#160;$2',                      // date 23. 1.
              '# -- #'                                  => " $this->dash ",                   // en dash    --
              '# -&gt; #'                               => ' &#8594; ',                       // right arrow ->
              '# &lt;- #'                               => ' &#8592; ',                       // left arrow <-
              '# &lt;-&gt; #'                           => ' &#8596; ',                       // left right arrow <->
              '#(\d+) ?x ?(\d+) ?x ?(\d+)#'             => '$1&#215;$2&#215;$3',              // dimension sign x
              '#(\d+) ?x ?(\d+)#'                       => '$1&#215;$2',                      // dimension sign x
              '#(?<=\d)x(?= |,|\.|$)#m'                 => '&#215;',                          // 10x (dot escaped so it matches a literal period)
              '#(\S ?)\(TM\)#i'                         => '$1&#8482;',                       // trademark  (TM), U+2122
              '#(\S ?)\(R\)#i'                          => '$1&#174;',                        // registered (R)
              '#\(C\)( ?\S)#i'                          => '&#169;$1',                        // copyright  (C)
              '#(\d{1,3}) (\d{3}) (\d{3}) (\d{3})#'     => '$1&#160;$2&#160;$3&#160;$4',      // (phone) number 1 123 123 123
              '#(\d{1,3}) (\d{3}) (\d{3})#'             => '$1&#160;$2&#160;$3',              // (phone) number 1 123 123
              '#(\d{1,3}) (\d{3})#'                     => '$1&#160;$2',                      // number 1 123

              '#(?<=^| |\.|,|-|\+)(\d+)([:HASHSOFT:]*) ([:HASHSOFT:]*)([:CHAR:])#mUTF'        // space between number and word
                                                        => '$1$2&#160;$3$4',

              '#(?<=^|[^0-9:CHAR:])([:HASHSOFT:]*)([ksvzouiKSVZOUIA])([:HASHSOFT:]*) ([:HASHSOFT:]*)([0-9:CHAR:])#mUTF'
                                                        => '$1$2$3&#160;$4$5',                // space between preposition and word
            );

            $replace = array();
            foreach ($replaceTmp as $pattern => $replacement)
                $replace[ $this->texy->translatePattern($pattern) ] = $replacement;
        }

        $text = preg_replace(array_keys($replace), array_values($replace), $text);
    }



} // TexyQuickCorrectModule

My knowledge of PHP is limited, and I can only dream of understanding the magic regex patterns in SmartyPants or the Texy module. Could anybody help me unravel their mysteries?
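As a starting point, the preposition rule (the last pattern in the Texy excerpt) can be approximated without Texy's placeholder syntax. The following is only a minimal sketch under stated assumptions, not the Texy or SmartyPants implementation: the function name czech_bind_short_words is hypothetical, the input is assumed to be valid UTF-8 plain text (HTML tags and Texy's [:HASHSOFT:] markers are not handled), and lowercase "a" has been added to the letter list, which the Texy pattern does not include.

```php
<?php
/**
 * Hypothetical helper (not part of SmartyPants or Texy).
 * Replaces the plain space after a single-letter Czech preposition or
 * conjunction with a non-breaking space entity, so the short word cannot
 * be left alone at the end of a line.
 */
function czech_bind_short_words($text)
{
    // (?<![\p{L}\p{N}])      the letter must not be preceded by a letter or digit
    // ([ksvzouiaKSVZOUIA])   the single-letter word itself (assumed letter list)
    // ' '                    followed by one plain space
    // (?=[\p{L}\p{N}])       and then a letter or digit (not consumed)
    $result = preg_replace(
        '/(?<![\p{L}\p{N}])([ksvzouiaKSVZOUIA]) (?=[\p{L}\p{N}])/u',
        '$1&#160;',
        $text
    );

    // With the /u modifier preg_replace() returns null on invalid UTF-8;
    // in that case the text is returned unchanged.
    return $result === null ? $text : $result;
}

echo czech_bind_short_words('Sešli jsme se v pondělí u řeky a šli k lesu.'), "\n";
// prints: Sešli jsme se v&#160;pondělí u&#160;řeky a&#160;šli k&#160;lesu.
```

The lookbehind and lookahead keep the neighbouring characters out of the match, so two short words in a row (for example "a v lese") are both handled in a single pass.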

On that note, do you feel that the SmartyPants and i18n modules should join forces to provide a single solution for content and typography localization within Drupal? Something slightly more systematic than my additions to the SmartyPants module?

Thanks for any help

Roman Dergam

Attachment: czech-typography.gif (3.77 KB)