Hi,

I managed to collect at least the basic stop words for the Czech language. Could you please include them in the Lucene API installation?

Here they are:

a, aby, aj, ale, anebo, ani, aniz, ano, asi, avska, az, ba, bez, bude, budem, budes, by, byl, byla, byli, bylo, byt, ci, clanek, clanku, clanky, co, com, coz, cz, dalsi, design, dnes, do, email, ho, i, jak, jake, jako, je, jeho, jej, jeji, jejich, jen, jeste, jenz, ji, jine, jiz, jsem, jses, jsi, jsme, jsou, jste, k, kam, kde, kdo, kdyz, ke, ktera, ktere, kteri, kterou, ktery, ku, ma, mate, me, mezi, mi, mit, mne, mnou, muj, muze, my, na, nad, nam, napiste, nas, nasi, ne, nebo, nebot, necht, nejsou, není, neni, net, nez, ni, nic, nove, novy, nybrz, o, od, ode, on, org, pak, po, pod, podle, pokud, pouze, prave, pred, pres, pri, pro, proc, proto, protoze, prvni, pta, re, s, se, si, sice, spol, strana, sve, svuj, svych, svym, svymi, ta, tak, take, takze, tamhle, tato, tedy, tema, te, ten, tedy, tento, teto, tim , timto, tipy, to, tohle, toho, tohoto, tom, tomto, tomuto, totiz, tu, tudiz, tuto, tvuj, ty, tyto, u, uz, v, vam, vas, vas, vase, ve, vedle, vice, vsak, vsechen, vy, vzdyt, z, za, zda, zde, ze, zpet, zpravy

I have also attached the stop words as a UTF-8 text file.

Bery

Attached file: czech stop words.TXT (size: 1.01 KB, author: xbery)

Comments

cpliakas commented:

Category: feature » task
Issue tags: +6.x-2.5

Awesome!!! Thanks for including them. Since this is not "functionality" per se, I will include it in the 6.x-2.5 release.

Thanks for the contribution,
Chris

xbery commented:

Thanks. I know that it is not functionality, but it is somehow connected to it :-)

Glad to contribute,

Lukas

cpliakas commented:

Cool. The reason I added the "non functionality" part is that I have a strict policy of not adding new features to stable branches. It helps keep the code in working order without introducing new bugs, although that still happens :-). In this case, your code only adds new configuration, so I am completely willing to integrate it into a stable branch. More or less just an FYI.

Thanks,
Chris

meba commented:

Status: Active » Needs review

Why are "zpravy" (news), "clanek" (article), "clanky" (articles), and "design" stop words?

xbery commented:

Well, "design" and maybe "zpravy" are misplaced, I guess, but the other words relate to Drupal itself and won't do any good when used as keywords. I have changed the list to fit your suggestions:

a, aby, aj, ale, anebo, ani, aniz, ano, asi, avska, az, ba, bez, bude, budem, budes, by, byl, byla, byli, bylo, byt, ci, co, com, coz, cz, dalsi, dnes, do, ho, i, jak, jake, jako, je, jeho, jej, jeji, jejich, jen, jeste, jenz, ji, jine, jiz, jsem, jses, jsi, jsme, jsou, jste, k, kam, kde, kdo, kdyz, ke, ktera, ktere, kteri, kterou, ktery, ku, ma, mate, me, mezi, mi, mit, mne, mnou, muj, muze, my, na, nad, nam, nas, nasi, ne, nebo, nebot, necht, nejsou, není, neni, net, nez, ni, nic, nove, novy, nybrz, o, od, ode, on, org, pak, po, pod, podle, pokud, pouze, prave, pred, pres, pri, pro, proc, proto, protoze, prvni, pta, re, s, se, si, sice, spol, sve, svuj, svych, svym, svymi, ta, tak, take, takze, tamhle, tato, tedy, tema, te, ten, tedy, tento, teto, tim , timto, tipy, to, tohle, toho, tohoto, tom, tomto, tomuto, totiz, tu, tudiz, tuto, tvuj, ty, tyto, u, uz, v, vam, vas, vas, vase, ve, vedle, vice, vsak, vsechen, vy, vzdyt, z, za, zda, zde, ze, zpet

I also removed "strana" (page) and a few others.
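
For anyone who wants to try the list outside the module, here is a minimal Python sketch of how a comma-separated list like the one above could be parsed and applied. It is only an illustration, not how the Lucene API module loads stop words: the function names (`fold_diacritics`, `parse_stop_words`, `remove_stop_words`) and the `SAMPLE` excerpt are made up for this example. It assumes that repeated entries (such as "tedy" and "vas"), the stray space in "tim ,", and accented variants (such as "není" next to "neni") should all collapse to a single normalized entry.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks so that e.g. 'není' becomes 'neni'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def parse_stop_words(raw):
    """Split a comma-separated list, trim whitespace, fold accents, drop repeats."""
    seen = set()
    words = []
    for item in raw.split(","):
        word = fold_diacritics(item.strip().lower())
        if word and word not in seen:
            seen.add(word)
            words.append(word)  # keep first-seen order
    return words


def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not stop words (compared after folding)."""
    stop_set = set(stop_words)
    return [t for t in tokens if fold_diacritics(t.lower()) not in stop_set]


# Hypothetical excerpt of the list above, keeping its quirks on purpose.
SAMPLE = "ta, tak, tedy, tema, te, ten, tedy, tim , timto, není, neni, vas, vas"

stop_words = parse_stop_words(SAMPLE)
print(stop_words)
print(remove_stop_words(["Není", "tak", "snadné", "hledat"], stop_words))
```

Folding accents on both the list and the incoming tokens is a design choice for this sketch; whether the real index should match "není" and "neni" as the same word depends on how the analyzer is configured.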

cpliakas commented:

Status: Needs review » Closed (won't fix)