Hi,
I managed to collect at least a basic set of stop words for the Czech language. Could you please include them in the Lucene API installation?
Here they are:
a, aby, aj, ale, anebo, ani, aniz, ano, asi, avska, az, ba, bez, bude, budem, budes, by, byl, byla, byli, bylo, byt, ci, clanek, clanku, clanky, co, com, coz, cz, dalsi, design, dnes, do, email, ho, i, jak, jake, jako, je, jeho, jej, jeji, jejich, jen, jeste, jenz, ji, jine, jiz, jsem, jses, jsi, jsme, jsou, jste, k, kam, kde, kdo, kdyz, ke, ktera, ktere, kteri, kterou, ktery, ku, ma, mate, me, mezi, mi, mit, mne, mnou, muj, muze, my, na, nad, nam, napiste, nas, nasi, ne, nebo, nebot, necht, nejsou, není, neni, net, nez, ni, nic, nove, novy, nybrz, o, od, ode, on, org, pak, po, pod, podle, pokud, pouze, prave, pred, pres, pri, pro, proc, proto, protoze, prvni, pta, re, s, se, si, sice, spol, strana, sve, svuj, svych, svym, svymi, ta, tak, take, takze, tamhle, tato, tedy, tema, te, ten, tedy, tento, teto, tim, timto, tipy, to, tohle, toho, tohoto, tom, tomto, tomuto, totiz, tu, tudiz, tuto, tvuj, ty, tyto, u, uz, v, vam, vas, vas, vase, ve, vedle, vice, vsak, vsechen, vy, vzdyt, z, za, zda, zde, ze, zpet, zpravy
I have also attached the stop words as a UTF-8 text file.
Bery
| Comment | File | Size | Author |
|---|---|---|---|
| | czech stop words.TXT | 1.01 KB | xbery |
Comments
Comment #1
cpliakas commented
Awesome!!! Thanks for including these. Since this is not "functionality" per se, I will include it in the 6.x-2.5 release.
Thanks for the contribution,
Chris
Comment #2
xbery commented
Thanks. I know that it is not functionality, but it is somehow connected to it :-)
Glad to contribute,
Lukas
Comment #3
cpliakas commented
Cool. The reason I added the "non functionality" part is because I have a strict policy of not adding new features into stable branches. It helps keep the code in working order without introducing new bugs, although it still happens :-). In this case, your code is adding new configurations so I am completely willing to integrate it into a stable branch. More or less just an FYI.
Thanks,
Chris
Comment #4
meba commented
Why are "zpravy" (news), "clanek" (article), "clanky" (article, plural), "design" stop words?
Comment #5
xbery commented
Well, "design" and maybe "zpravy" are misplaced, I guess, but the others relate to Drupal itself and won't do any good when used as keywords. I have changed the list to fit your suggestions:
a, aby, aj, ale, anebo, ani, aniz, ano, asi, avska, az, ba, bez, bude, budem, budes, by, byl, byla, byli, bylo, byt, ci, co, com, coz, cz, dalsi, dnes, do, ho, i, jak, jake, jako, je, jeho, jej, jeji, jejich, jen, jeste, jenz, ji, jine, jiz, jsem, jses, jsi, jsme, jsou, jste, k, kam, kde, kdo, kdyz, ke, ktera, ktere, kteri, kterou, ktery, ku, ma, mate, me, mezi, mi, mit, mne, mnou, muj, muze, my, na, nad, nam, nas, nasi, ne, nebo, nebot, necht, nejsou, není, neni, net, nez, ni, nic, nove, novy, nybrz, o, od, ode, on, org, pak, po, pod, podle, pokud, pouze, prave, pred, pres, pri, pro, proc, proto, protoze, prvni, pta, re, s, se, si, sice, spol, sve, svuj, svych, svym, svymi, ta, tak, take, takze, tamhle, tato, tedy, tema, te, ten, tedy, tento, teto, tim, timto, tipy, to, tohle, toho, tohoto, tom, tomto, tomuto, totiz, tu, tudiz, tuto, tvuj, ty, tyto, u, uz, v, vam, vas, vas, vase, ve, vedle, vice, vsak, vsechen, vy, vzdyt, z, za, zda, zde, ze, zpet
Also removed "stranka" (page) and a few others.
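
In case anyone wants to try the list outside the module, here is a minimal sketch in Python of how a comma-separated list like the one above could be cleaned up and applied. It is not how the module loads or uses the words; the function names `parse_stop_words` and `remove_stop_words` are made up for illustration, and the tokenizer is a plain whitespace split rather than a real Lucene analyzer. The sample string only uses a few entries from the list, including a repeated word and a stray space before a comma.

```python
def parse_stop_words(raw: str) -> list[str]:
    """Split a comma-separated list, trim whitespace, and drop
    empty entries and duplicates while keeping the original order."""
    seen = set()
    words = []
    for item in raw.split(","):
        word = item.strip().lower()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def remove_stop_words(text: str, stop_words: list[str]) -> list[str]:
    """Return the whitespace-separated tokens of `text` that are not stop words."""
    stop = set(stop_words)  # set lookup is O(1) per token
    return [token for token in text.lower().split() if token not in stop]


# A few entries from the list, with a repeated word and a stray space.
sample = "a, aby, tedy, tema, te, ten, tedy, tento, tim , timto, vas, vas, vase"
stop_words = parse_stop_words(sample)

print(stop_words)
print(remove_stop_words("tento clanek a tim timto design", stop_words))
```

With the revised list, "clanek" and "design" are ordinary keywords again, so they pass through the filter untouched.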
Comment #6
cpliakas commented