German version

nath - November 13, 2007 - 14:23
Project: Related Block
Version: 5.x-0.x-dev
Component: Code
Category: support request
Priority: normal
Assigned: Unassigned
Status: needs work
Description

I'd like to use this module for a German site. Obviously, the stopwords need to be different for the German version. So far, I have simply commented out the English version of the list and started to create my German version. It would be much nicer to have a clean way to have multiple lists.

I think the easiest way would be to add a variable to the settings where one could choose the language, and then have the code pick the list for the selected language.

Do you have time to implement this or should I have a go myself? I could provide a German list of words.

#1

rhys - November 14, 2007 - 09:24

I would definitely like to add this capability, so if you could provide a German list, I'd be happy to add this functionality.

#2

nath - November 14, 2007 - 12:37

Ok, one problem I already noticed with German texts is that words are split at umlauts (äöü) in the following way:

"Präsentation" is split to the following:
[16] => Pr [17] => auml [18] => sentation

This is likely because of the HTML encoding, which has "Präsentation" as "Pr&auml;sentation". I guess other languages would have similar problems.

#3

nath - November 14, 2007 - 12:53

Ok, the problem with the encoding of umlauts can be solved by adding the following line to the beginning of _related_block_strip:

$text = html_entity_decode($text,ENT_QUOTES,"UTF-8");
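As a rough illustration of why decoding first helps, here is a minimal standalone sketch. The helper `example_split_words()` is hypothetical and only stands in for whatever splitting `_related_block_strip` actually does; it is assumed here to split on anything that is not a letter or digit.

```php
<?php
// Hypothetical stand-in for the module's tokenizer (an assumption, not the
// module's real code): split on every run of non-letter, non-digit characters.
function example_split_words($text) {
  return preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$encoded = 'Pr&auml;sentation';

// Without decoding, "&" and ";" act as separators and the word breaks apart.
$without_decode = example_split_words($encoded);

// Decoding the entity to UTF-8 first keeps the umlaut inside the word.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
$with_decode = example_split_words($decoded);

print_r($without_decode);
print_r($with_decode);
```

The first call yields the three fragments reported in #2; the second yields the single word.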

#4

nath - November 14, 2007 - 16:41

A first version of the German word list:

$overusedwords = array( '', 'aber', 'alle', 'als', 'auch', 'auf', 'aus', 'bei', 'beim', 'bis', 'brauchen', 'da', 'damit', 'dann', 'das', 'dass', 'dem', 'den', 'denn', 'der', 'des', 'die', 'dies', 'diese', 'doch', 'drei', 'durch', 'eigene', 'ein', 'eine', 'einem', 'einen', 'einer', 'es', 'gut', 'für', 'haben', 'hat', 'ich', 'ihnen', 'ihr', 'ihre', 'ihren', 'ihrer', 'im', 'ins', 'ist', 'kann', 'können', 'kommt', 'man', 'mit', 'müssen', 'muss', 'nach', 'neue', 'nicht', 'nur', 'oder', 'per', 'schon', 'sehr', 'seine', 'sich', 'sie', 'sind', 'so', 'sollten', 'sowie', 'über', 'und', 'unter', 'von', 'was', 'welche', 'wenn', 'werden', 'wird', 'wie', 'zum', 'zur');

#5

rhys - November 15, 2007 - 09:19

Thanks. I'm quite busy with another module that I'm about to finish, so I'll try to get to this idea as soon as possible.

#6

spiderman - November 15, 2007 - 20:34

i'm interested in making this work in a Drupal-ish way. my suspicion is that we should follow these instructions to make the module "translatable", and then work out a way to grab the list of stopwords from the relevant .po file, somehow. at the very least, we should take care to integrate with the i18n mechanisms which are in core for D6, for providing language selection options, etc.

perhaps the porterstemmer module has tackled this problem already?

#7

rhys - November 15, 2007 - 21:04

Agreed that the porterstemmer module would be useful for making the related block more relevant, at least for the English language. I'm not sure that this process could be applied to multiple languages.

Also, it seems from the list nath provided that the sets of common words are somewhat similar across languages, but not necessarily the same in terms of direct translation. This does not fit well with the .pot file system, and it also leaves the user unable to specify additional words that should be considered common.

To solve this, I suggest we implement a locale-specific list of common words, stored with variable_get.
This could be set up as an array with the locale as the key and a list of common words, similar to the current one, as the value.
Since we're only dealing with single words, each list could be a string separated by spaces. Combined with explode(' ',$variable), this would provide the array needed to strip out the relevant words for whatever locale is currently selected. It would also allow integration with modules such as the localizer module.

This would also allow an admin-configurable way to edit the appropriate strings. The .pot file could then be used to provide the default common-word string.
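The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the sample word lists are hypothetical, and inside Drupal 5 the `$lists` array would presumably come from something like `variable_get()` with a variable name of the maintainer's choosing, which is not shown here so that the snippet runs on its own.

```php
<?php
// Hypothetical helper: return the common words for a locale as an array.
// $lists is keyed by locale; each value is a space-separated string.
function example_common_words(array $lists, $locale, $fallback = 'en') {
  if (isset($lists[$locale])) {
    $string = $lists[$locale];
  }
  elseif (isset($lists[$fallback])) {
    $string = $lists[$fallback];
  }
  else {
    $string = '';
  }
  // explode() as suggested above; drop empty entries left by extra spaces.
  return array_values(array_filter(explode(' ', $string), 'strlen'));
}

// Hypothetical helper: remove the common words from a list of words.
function example_strip_common_words(array $words, array $common) {
  return array_values(array_diff($words, $common));
}

// Sample data, made up for this example only.
$lists = array(
  'en' => 'the and of to',
  'de' => 'aber alle als auch und',
);

$words = array('katzen', 'und', 'hunde');
$kept = example_strip_common_words($words, example_common_words($lists, 'de'));
print_r($kept);
```

Storing one string per locale keeps the admin form simple (one text area per language), at the cost of not supporting multi-word phrases.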

#8

nath - February 7, 2008 - 10:16

Any news on how we could proceed?

#9

rhys - March 4, 2008 - 16:27

So I'm going to do it in a somewhat Drupal-ish way, which is to have separate files that are included on the basis of locale. I will have a commit sometime soon.

#10

rhys - March 4, 2008 - 16:51
Status: active » needs work

It's totally untested, even for syntax errors. You'll need the word-list files, which should be stored within the updated module. These are located in the "ignore" directory, which will contain the word lists for the various languages.

If you have problems, please let me know immediately, so I can do something about it.
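For readers without the patch at hand, the per-locale file approach might look roughly like the sketch below. It is not taken from the patch: the function name, the `<locale>.inc` file naming, and the convention that each file returns an array are all assumptions made for illustration. Inside the module, the directory would presumably be built from the module path plus "ignore".

```php
<?php
// Hypothetical loader: include the word list for $locale from $directory,
// falling back to $fallback when no file exists for the requested locale.
// Assumes each file is named "<locale>.inc" and returns an array of words.
function example_load_ignore_list($directory, $locale, $fallback = 'en') {
  foreach (array($locale, $fallback) as $candidate) {
    // Accept only simple locale codes so the value cannot leave $directory.
    if (!preg_match('/^[a-z]{2,3}(-[a-z0-9]+)?$/i', $candidate)) {
      continue;
    }
    $file = $directory . '/' . $candidate . '.inc';
    if (is_file($file)) {
      $words = include $file;
      if (is_array($words)) {
        return $words;
      }
    }
  }
  // No usable list found for either locale.
  return array();
}
```

A file such as `ignore/de.inc` would then contain just `<?php return array('aber', 'alle', ...);` under these assumptions, and adding a language means adding one file.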

#11

rhys - March 4, 2008 - 16:53
Attachment: related_block.module.patch (3.57 KB)