Patch (to be ported)
Project:
Search API Database Search
Version:
7.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
4 Oct 2011 at 15:32 UTC
Updated:
14 Jun 2017 at 09:37 UTC
Comments
Comment #1
luksak
I managed to test it with Solr and it doesn't work either. Changing the project.
Comment #2
zambrey commented
Subscribe, just want to know if fulltext search is somehow possible.
Comment #3
drunken monkey
This behaviour is actually not defined in the Search API; it's up to the individual backends how the text is searchable, whether partial matches are supported, etc. Currently, as far as I know, no search backend provides this out of the box, though.
While it's a simple matter of changing a config file for Solr, it's much more complicated for the database backend. We'd probably have to index (and search for) n-grams and provide this as an option along with the current behaviour.
In any case, having this in the DB backend-specific issue queue was right.
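A minimal sketch of the n-gram idea mentioned above, written in Python rather than the module's PHP. It is not the Search API implementation; the function names, the trigram size and the sample documents are assumptions made for illustration only.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Return all contiguous substrings of length n (the word itself if shorter)."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_index(docs, n=3):
    """Map every n-gram of every word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            for gram in ngrams(word, n):
                index[gram].add(doc_id)
    return index


def search(index, docs, term, n=3):
    """Find documents that have a word containing `term`.

    Limitation of this sketch: terms shorter than n only match whole words
    of the same length, because only full-size n-grams are indexed.
    """
    term = term.lower()
    # Candidates must contain every n-gram of the search term.
    candidates = set.intersection(*(index.get(g, set()) for g in ngrams(term, n)))
    # The n-grams may come from different words, so verify against the text.
    return sorted(d for d in candidates
                  if any(term in w for w in docs[d].lower().split()))


docs = {1: "Testing partial search", 2: "Contest results", 3: "Unrelated"}
index = build_index(docs)
print(search(index, docs, "test"))  # [1, 2]
```

The index lookup replaces a full scan with a set intersection, which is where the performance advantage over a LIKE-style scan would come from; the price is a larger index.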
Comment #4
luksak
Thank you for the feedback.
This is going to be necessary I guess, isn't it? Is it going to require a lot of effort?
Could you point me in the right direction with Solr approach? I have got no experience in using it.
Lukas
Comment #5
drunken monkey
Depends on what "this" is.
See, e.g., #1056018: Better document Solr config customization options and #1307784: Fuzzy Search.
Comment #6
luksak
Good search engines (e.g. Google) show you results even if a string only partially matches. This functionality should be in the Search API and also usable by people who do not have Solr. As a first step, the functionality I initially described (MySQL's LIKE '%foo%') would already satisfy me.
For now I will go for the Solr integration. But I will have projects where I do not have the possibility to install Solr on the server but still need a good search.
Edit:
The Solr approach worked perfectly. I had to change the schema.xml: Inside the node
<fieldType name="text" ...
I added two new filters
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
and uncommented the line
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
Edit 2:
I uncommented the line
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
again because somehow this removed the exact matches. Maybe I will understand Solr one day ;)
Comment #7
luksak
I guess this is the solution for this problem: fuzzysearch. I have not tested it since I use Solr now.
Comment #8
btmash commented
Wanted to add a clarification to this regarding apachesolr. EdgeNGramFilterFactory (<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />) only creates n-grams from the beginning or end of an input token. If you want more fuzzy matches, you should try NGramFilterFactory (<filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25" />) instead.
Comment #9
drunken monkey
Thanks for the link, I didn't know that module! Seems great, though. I've added it to the module list on the Search API project page.
Wow, thanks a lot for that! Seems like I've been giving wrong advice to people for over half a year …
Good to know, though.
Although recently I implemented this in Solr (on a private project, not cleanly enough for the official one) by adding wildcards to all words' starts and ends and using the "edismax" request handler (available since 3.5, I think), which also worked very well.
Anyway, still leaving this open as we might one day want to implement this functionality here, too. Maybe by joining efforts with the "Fuzzy Search" maintainer.
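The "wildcards on all words' starts and ends" step could look roughly like the Python sketch below. This is not the code from the private project: the function name is hypothetical, and whether leading wildcards are accepted at all depends on the Solr version and query parser configuration.

```python
def add_wildcards(keys):
    """Wrap each whitespace-separated word in '*' on both sides."""
    # Drop asterisks the user typed so the result never contains '**'.
    words = [w.strip("*") for w in keys.split()]
    return " ".join(f"*{w}*" for w in words if w)


print(add_wildcards("foo bar"))  # *foo* *bar*
```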
Comment #10
luksak
It's great to hear that there is some progress.
How did you use edismax?
I do not understand why there are two DB search server modules. You really should join those two modules.
Comment #11
attiks commented
FYI: have a look at http://drupal.org/project/search_api_string_filter
Comment #12
semiaddict commented
I was able to get partial word searches on fulltext fields by slightly modifying the generated query.
A patch is attached with those modifications.
Note: I am not sure if this has any negative impact on any other portions (facets, sorts, etc).
Comment #13
robmc commented
Thanks for the patch! I can confirm this works. We have testing scheduled which should expose issues. I'll report if any are found.
Cheers,
Rob McCrea
Comment #14
agoradesign commented
Thanks for the patch! I can also confirm that it works as proposed. This behaviour definitely offers a far better user experience!
Comment #15
Anonymous (not verified) commented
#12 seems to be working (I don't see why it shouldn't ^^).
Comment #16
achton
I'd be curious to hear the module maintainer's opinion on the approach in #12. The search_api_string_filter module does not work for fulltext fields, so is not an option in my case.
Comment #17
mrharolda commented
The patch in #12 works, but IMHO it's best to be able to select what word matching you'd like on your site: exact (= ...) or fuzzy (LIKE %...%).
Does the search_api support engine-specific options?
Edit: yes it does, but only on the server part, not for queries ... :(
Comment #18
mrharolda commented
Slightly improved patch with better input filtering ... still looking into making this optional.
Comment #19
achton
It seems this approach breaks the "search keys" feature, i.e. searching for multiple words.
My testing suggests that only the first term is used for fetching results, if the patch in #18 is applied.
Can anyone confirm this?
Comment #20
mrharolda commented
@achton Can't confirm your issue.
I did however find out that nodes with multiple hits show up multiple times in your results. A search on 'tes' with a node with both 'test' and 'testing' in it will give duplicate results. :(
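The duplicate rows can be reproduced outside Drupal. The sketch below uses Python's sqlite3 with a made-up, heavily simplified word table (one row per item and word); the table and column names are assumptions, not the module's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified index table: one row per (item, word).
conn.execute("CREATE TABLE word_index (item_id INTEGER, word TEXT)")
conn.executemany(
    "INSERT INTO word_index VALUES (?, ?)",
    [(1, "test"), (1, "testing"), (2, "tested"), (3, "other")],
)

pattern = "%tes%"

# Plain LIKE: item 1 is returned twice, once per matching word.
plain = [row[0] for row in conn.execute(
    "SELECT item_id FROM word_index WHERE word LIKE ? ORDER BY item_id",
    (pattern,))]

# DISTINCT collapses the repeated item ids.
distinct = [row[0] for row in conn.execute(
    "SELECT DISTINCT item_id FROM word_index WHERE word LIKE ? ORDER BY item_id",
    (pattern,))]

print(plain)     # [1, 1, 2]
print(distinct)  # [1, 2]
```

With an exact match on the word column an item can match a given key only once, which is why the problem only shows up once LIKE is used. Whether DISTINCT or grouping is the better fix in the real query is a separate question that this sketch does not answer.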
Comment #21
colan
That last patch looks good (no duplicate results), and doesn't noticeably harm performance. I RTBCed it, but maybe that's premature if we want to make it an option? I don't see a problem with leaving it as is, but maybe we could create another issue to make it an option if folks want to turn it off?
Comment #22
drunken monkey
Seems logical, I guess, that's the problem with using "LIKE".
I'm also pretty sure it will perform worse than my proposed solution of giving the user the option to index n-grams instead of only whole words. However, this one is of course much easier to implement (save for the little problem mentioned above), and if we let users decide if they want this behaviour (and the associated performance losses) it should be OK, I guess.
So, if you fix the above bug and add the option for users (as you say, including it on the server and using it on all queries is the only way here, regrettably – #1720348: Add the concept of query extenders might change this in the future, though), I would commit this. The option should probably contain a short warning regarding the performance impact.
Comment #23
yenidem commented
Thank you very much @MrHaroldA and @semiaddict #12, this patch worked. I have spent a lot of time on this issue.
Comment #24
semei commented
Will this get committed? I personally also feel like partial matching is absolutely indispensable.
Comment #25
drunken monkey
As said in #22, the current patch still has two problems. If those get addressed, I'll commit the patch. (We should probably add some tests, too, to ensure that behavior doesn't pop up later.)
Comment #26
Anonymous (not verified) commented
Is it a problem to add distinct()?
Comment #27
damienmckenna
Comment #28
drunken monkey
Why should this suddenly not need work? If you think just adding distinct() will work, then please, just do that.
Comment #29
damienmckenna
@drunken: I just changed the status to 'needs review' to trigger testbot; background: I'm helping maintain a site and discovered the above patch had been applied, I wanted to get a quick status check on it, sorry for adding some confusion.
Comment #30
drunken monkey
Ah, OK. Please just note that next time.
At least it's good to know our current tests already catch the problems with this patch.
Comment #31
Johnny vd Laar commented
For people that don't want to hack search api db, here is a piece of code that you can place in your own module that does the same as the patch.
Comment #32
geerlingguy commented
From #22:
Could you say what, exactly, is required to get this patch through? I'm in agreement with #24:
Without partial matching, search is pretty useless in most use cases I've encountered... for me, if I need to worry about performance at all, there's no way I'm going to use database-backed search anyways. At that point, I'll switch to solr for the indexing/searching.
But I'd really like to see this patch (or something like it) committed, so I'm willing to work on whatever improvements are required.
Comment #33
drunken monkey
- Eliminate the duplicate results when multiple words match, and make all tests pass.
- Add a server option to switch this behavior on or off (default to off), with a note that this might have a negative impact on performance.
Comment #34
geerlingguy commented
Okay, I'll try to get to this soon.
Comment #35
Anonymous (not verified) commented
#31 If you don't work with nodes, this code:
basically makes the whole hook inoperative.
So it's better to use:
Comment #36
gilsbert commented
Hi.
Nice news.
+1 waiting for the patch.
Comment #37
styrbaek
The Views pager disappears when using the code in #31.
Comment #37.0
styrbaek
typo
Comment #38
stopshinal commented
What is the status on this? I also need partial matching.
-#31, I tried this code and it didn't seem to have an effect.
Comment #39
jon pugh
Re-rolled on 7.x as of Nov 19th.
Comment #40
mrharolda commented
The patch above still hasn't got an option to enable/disable partial matching, nor does it handle duplicates in the result.
Edit: the db_like() filtering was also stripped from the patch!
Comment #41
markplindsay commented
I was able to get #31's module code working with some modifications. The idea is that you want to get all conditionals using LIKE.
It looks like a Search API query on many fulltext fields (picked out in the Search API UI's Fields tab) is formed using a series of UNIONs. In my case, I had fulltext enabled on some taxonomy term names so users could search by tags. But the LIKE conditionals were not being extended to these taxonomy term name UNIONs. So searching for partial tag names wasn't working.
By running $to_like on the UNION conditionals as well, I was able to get partial matches working with these other fulltext fields. My code isn't perfect and may not accommodate your use case, but maybe it can give you a start.
Comment #42
jncruces
The previous comment worked perfectly for me. I only added the percent sign before the search text to simulate a "contains any word" search.
Comment #43
FranciscoLuz commented
#42 works!
Comment #44
FranciscoLuz commented
Comment #45
Anonymous (not verified) commented
There should also be "starts with" and "ends with" options available in addition to "contains".
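As a rough illustration of what "contains", "starts with" and "ends with" would mean for a LIKE-based search, here is a Python/sqlite3 sketch. The mode names, table and helper functions are assumptions for this example only; the escaping step plays the role that db_like() has in Drupal 7, where user input must not be interpreted as LIKE wildcards.

```python
import sqlite3


def like_pattern(term, mode="contains"):
    """Escape LIKE wildcards in `term` and add '%' according to `mode`."""
    escaped = (term.replace("\\", "\\\\")
                   .replace("%", "\\%")
                   .replace("_", "\\_"))
    if mode == "contains":
        return f"%{escaped}%"
    if mode == "starts_with":
        return f"{escaped}%"
    if mode == "ends_with":
        return f"%{escaped}"
    raise ValueError(f"unknown mode: {mode}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany("INSERT INTO words VALUES (?)",
                 [("test",), ("testing",), ("contest",), ("100%",)])


def matches(term, mode):
    """Return the indexed words matching `term` under the given mode."""
    sql = "SELECT word FROM words WHERE word LIKE ? ESCAPE '\\' ORDER BY word"
    return [row[0] for row in conn.execute(sql, (like_pattern(term, mode),))]


print(matches("test", "contains"))     # ['contest', 'test', 'testing']
print(matches("test", "starts_with"))  # ['test', 'testing']
print(matches("test", "ends_with"))    # ['contest', 'test']
print(matches("%", "contains"))        # ['100%']
```

The last call shows why the escaping matters: without it, a search for "%" would match every word.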
Comment #46
Anonymous (not verified) commented
#42 works but only if this condition is met
is_object($tables['t']['table'])
which is not always the case. I had my search set up to only index the title field, in which case $tables['t']['table'] is just a string and not an object.
Also, the Views pager disappears. Removing the $query->distinct(); line makes it appear again. Duplicates don't seem to appear in my case, but haven't tested properly.
Comment #47
jncruces
#46 is right. Removing $query->distinct(); solved the pager problem.
Thanks.
Comment #48
Johnny vd Laar commented
Attached patch works with the new db structure and also provides a server option to switch on / off this partial search behavior.
Comment #50
Johnny vd Laar commented
Ok I think I missed a var somewhere. Here is a fix for the notice.
Comment #51
FranciscoLuz commented
The patch at #50 worked as advertised.
I am having an issue though with Portuguese characters like ç, ã, é and so on.
Say for instance I am searching for tração but type tracao instead, it won't return the results containing tração.
Does anyone know how I could fix this issue?
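One common way to make tracao match tração is to strip accents from both the indexed text and the search keys before comparing. The Python sketch below shows only that general idea; it is an assumption for illustration and not necessarily how the module or any related issue handles it.

```python
import unicodedata


def fold_accents(text):
    """Decompose characters and drop the combining marks (ç -> c, ã -> a)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("tração"))                            # tracao
print(fold_accents("tração") == fold_accents("tracao"))  # True
```

The folding has to be applied at index time and at search time alike, otherwise the two sides will not agree.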
Comment #52
Anonymous (not verified) commented
#51 see 2100665
Comment #53
drunken monkey
Thanks, that looks great! And when the test bot is happy, it's pretty probable that it will also work correctly – at least when the option is not enabled. However, it would be great if you could also add some tests for this option, so we have some proof that it really works (and can assure it keeps working).
Also, this part is very confusing (though rather brilliant I have to say, after finally having understood it) and thus in dire need of some comment explaining it:
Apart from that, as said, it's great! Thanks a lot again!
Comment #54
caesius commented
Patch needs updating for the latest dev commit which touched the same parts of service.inc.
Comment #55
Johnny vd Laar commented
I've updated my patch with the latest commit, added some comments and added test scripts. Let's see what testbot thinks of it.
Comment #56
drunken monkey
Wow, excellent work, thanks! I wish all contributors to my modules were as good as you …
Anyways, there were still two tiny faults with your patch:
This search is sorted by score, but 1/2/4 and 3/5 have actually the same score (5 and 1, respectively). Therefore, this test passing relies on the database sorting by ID for identical scores, which is not always the case (and should in any case not be relied on here).
Therefore, I added an explicit id ASC sort.
Existing servers won't have this setting, so there should be an empty() around it.
Also, since you only use it once, we don't really need a variable for it, I'd say. (Even though the temptation to keep it and rename it to $match_parts is quite large …)
I also changed some strings and added basic method documentation. Please see the attached patch, I hope it still passes. If it's also still alright with you, I think we can finally commit this!
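The tie-breaking point can be shown with a small Python sketch. The item ids and scores are taken from the description above (items 1, 2 and 4 score 5; items 3 and 5 score 1); the data layout is an assumption for illustration.

```python
# Input order deliberately scrambled, as a database may return ties in any order.
results = [
    {"id": 4, "score": 5}, {"id": 1, "score": 5}, {"id": 5, "score": 1},
    {"id": 2, "score": 5}, {"id": 3, "score": 1},
]

# Sort by score descending, then by id ascending as an explicit tie-breaker.
ordered = sorted(results, key=lambda r: (-r["score"], r["id"]))
ids = [r["id"] for r in ordered]
print(ids)  # [1, 2, 4, 3, 5]
```

Without the second sort key, the relative order inside each score group would depend on the input order, which is exactly what a test should not rely on.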
Comment #57
drunken monkey
Oops, that's the right one.
Comment #58
Poieo commented
The partial matching seems to be working well. However, now I'm getting the following error using today's dev with this patch.
Comment #59
Poieo commented
A little more info that may help... I'm using the Full Text search field and the only time the issue appears is if the search term is a single word. If I use multiple words, the results, including partial, work great.
Comment #60
Johnny vd Laar commented
I didn't encounter that error, can you perhaps post the query that search api db generated?
Comment #61
Poieo commented
I tried to use Views' settings to show the query but I only get the following along with the error message: Query No query was run.
Is there anything else I can do to provide you with this information?
Comment #62
Poieo commented
Under Fulltext search settings in views, under 'Use as', if Search keys is selected I do not get the error, but if Search filter is selected I get the error... this may be unavoidable. I'm pretty sure I need to have Search keys selected anyway.
I tried both settings on a clean install of Drupal Commerce and the error does occur for Search filter.
Comment #63
Johnny vd Laar commented
I can't seem to reproduce this error. I have:
Do you also have the error when you didn't apply the patch?
Comment #64
drunken monkey
Thanks a lot for reporting this problem! After some playing around, I could also reproduce this.
However, getting a reproducible test case, finding the root cause of the problem and then fixing it took several hours. But now I'm done and the attached patch should hopefully solve this and some other problems with specific setups. I've also included tests for the problems I've found (there were several).
Please see if this patch now (or still) works for you!
It's really complex so I'd like to make sure it breaks nobody's setup.
Comment #65
Johnny vd Laar commented
Last patch works for me, but I didn't get the error in the first place. Does it work for you, Poieo?
Comment #66
caesius commented
Patch no longer applies; service.inc has since been updated.
Comment #67
jeroent
Created reroll of this patch.
Comment #68
caesius commented
Works for me. I was previously having an issue with multiple-word searches not working, but this patch fixes that.
Comment #70
drunken monkey
OK, great!
Committed (finally).
Thanks again to everyone working on this, especially Johnny!
Comment #71
Johnny vd Laar commented
You're welcome! Thanks for committing.
Comment #72
jnorell commented
Trying to get this feature to work, what am I missing? I'm using the default search box and results page (not custom view).
Search on parts of a word checkbox.
I then search for known partial terms in title or body (and longer than min length), with no results found.
Thanks...
Comment #74
drunken monkey
#2286329: Incorrect facet counts in multi-word search reports a regression caused by this patch, it seems we accidentally reverted the commit from #1403916-4: Multi word search results sets incorrect count for search_api_facets when Facets is in use.
These added lines were introduced in #55 – do you still know why you added them, Johnny, or can you reconstruct it? I can't find anything wrong with removing those, and the tests also don't fail.
@ jnorell: Please don't comment in fixed issues, otherwise your comments are likely to be overlooked (as happened here). Either re-open them when commenting, or open a new issue.
Have you been able to get this to work again? What you describe looks like it should work, and re-indexing shouldn't be required either. Is the index maybe lying on the wrong server, or did you miss some other mistake in the setup?
Comment #75
Johnny vd Laar commented
Hmm I don't remember anymore why I've done that.
Comment #76
drunken monkey
OK, to be expected, I guess.
Could you maybe test the patch in #2286329-4: Incorrect facet counts in multi-word search and see whether partial searches still work for you after this? Then I'd just commit and we'll assume this somehow got in from an earlier version or something.
Comment #77
drunken monkey
Closing here again since we committed the other issue.
Comment #78
brunorios1 commented
Hi,
I have two nodes with titles: 111222333 and 111-222-333
A search for 111-222-333 returns the 2 nodes as results.
But a search for 111222333 only returns the node with exact match title 111222333.
Is this expected?
How can I create a search that works in both cases?
Thanks!
Comment #79
mrharolda commented
@brunorios1: only Solr supports that kind of search.
Check this for more info: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Tokenizers
Comment #80
brunorios1 commented
Thanks @MrHaroldA,
I'll take a look.
Comment #81
brunorios1 commented
@MrHaroldA,
I managed to do this with search_api_db using the search_api Tokenizer processor with the dash in the Ignorable Characters field.
Thanks again.
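Why ignoring the dash helps can be seen in a small Python sketch: once the ignorable characters are removed from both the indexed titles and the search keys, the two spellings become identical. The function names and the substring comparison are assumptions for illustration, not the processor's actual code.

```python
def normalize(text, ignorable="-"):
    """Remove every ignorable character, then lowercase."""
    table = {ord(ch): None for ch in ignorable}
    return text.translate(table).lower()


def search(titles, keys, ignorable="-"):
    """Return ids of titles containing the keys after normalizing both sides."""
    needle = normalize(keys, ignorable)
    return sorted(nid for nid, title in titles.items()
                  if needle in normalize(title, ignorable))


titles = {1: "111222333", 2: "111-222-333"}
print(search(titles, "111222333"))    # [1, 2]
print(search(titles, "111-222-333"))  # [1, 2]
```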
Comment #82
brunorios1 commented
Can you please confirm if there's a "starts with" option available as mentioned in #45?
My server is configured to search parts of words.
I have an exposed filter (Indexed node: Title) in my view, but I only see "contains" and "doesn't contain" available in the operator options.
Thanks!
Comment #83
Anonymous (not verified) commented
How/where do I enable the partial match/search? I don't see anything in the UI.
Overlooked this is D7.
Comment #84
guilopes commented
Hello @brunorios1, this patch adds an option to search by "Start With".
Comment #85
drunken monkey
Seems I overlooked this while identifying issues that need to be ported to D8. This never made it in there, only three places where, for some reason, the D7 code for this shows through.
Comment #86
drunken monkey
More or less a 1:1 port of the D7 patch.
Comment #87
borisson_
The code looks great and I think the tests for this are very readable and as far as I can see they cover all the changes.
Comment #89
drunken monkey
Great to hear, thanks for reviewing!
Committed.
Comment #90
drunken monkey
Damn, this seems to have broken tests for some reason. Or the test bot is in "random fails" mode again, which actually seems more likely to me, given the test results.
But let's see.
Comment #92
drunken monkey
OK, it seems the test fails are actually unrelated to this issue, but most likely to a change in Core. And that's when I'd hoped with the stable release this sh*t would be over …
See #2668908: Fix new test fails.
Comment #93
nyariv commented
I am getting an SQL error on multiple keyword search with the #86 patch, using a fulltext search contextual filter:
Comment #94
nyariv commented
Comment #95
drunken monkey
Can you reproduce this with a clean installation, using the latest versions of both Core and this module?
If so, please list the steps to reproduce this problem in more detail.
Comment #96
nyariv commented
Tried backtracing the steps on my environment but I could not reproduce it. Looking at the code in Database.php, if either of the lines 1814
or 1872
are removed then the error disappears. It seems the issue is when the $not_nested flag is false and multi word is used and partial matches are on.
Comment #97
nyariv commented
Ok I managed to reproduce on clean install. The steps are:
1) Install profile standard with search api 8.x-1.x, enable all logging messages to display
2) Install Database Search Defaults module
3) Change database server setting 'minimum word length' to 1 and enable 'search parts of word'
4) Create search index view, add search fulltext contextual filter
5) Insert any two different words into arguments and update preview
Comment #98
drunken monkey
Can't reproduce it, sorry. Are you maybe using a database software other than MySQL – e.g., Postgres?
Sorry, forgot to ask that in my previous mail.
Comment #99
drunken monkey
Sorry, disregard the above – it seems I had error display disabled, for some strange reason.
Actually, this really is broken, as the attached patch should show. Setting to "Needs review" for the test bot.
Comment #102
drunken monkey
I think/hope I managed to fix it. Please test/review!
Also, as far as I can see, the minimum character count doesn't influence this at all.
Comment #103
drunken monkey
Also, at least the test needs to be backported.
Comment #106
borisson_
I really like the tests, they are very expressive @drunken_monkey++
I can't help but nitpick at least a little bit.
I think this'd be more readable if it had the surrounding parentheses, because I didn't pick up on the fact that this is a short if statement.
Too much indentation here, I think.
Can we make this annotation more specific? Is string[]|null correct?
Comment #107
drunken monkey
I would have expected nothing less. ;)
Should all be fixed with the attached updated patch.
Not really a "short if statement", just an assignment of a boolean value. But sure, we can add parentheses.
Comment #108
drunken monkey
Drupal.org didn't want my patch …
Comment #109
borisson_
Looks great!
Comment #110
nyariv commented
Tested #108, seems to work well.
Comment #112
drunken monkey
Great, thanks for reviewing and testing!
Committed.
Comment #113
drunken monkey
Comment #114
ricdeters commented
I'm new to Drupal. What is required to get this functionality in Drupal 7?
Comment #115
drunken monkey
The functionality is already present in Drupal 7, you just need to enable it in the search server settings (admin/config/search/search_api/server/SERVER_ID/edit). What still needs to be ported is a small bug fix that might cause problems with this option in some situations.
If you are a developer and want to port the patch, start by adding the assertions from the patch to D7's SearchApiDbTest::searchSuccessPartial() and see if they pass. If not, try to apply the same fix as in the D8 patch (the code is largely the same or similar).
Comment #116
mikemadison commented
I know mostly this is about D8 at this point, but a quick throwback all the way to #6 for D7...
We are using Search API Solr and were running into incomplete partial matching using a views based search page. It took me a bit to understand why this wasn't working, as my schema.xml file already contained the suggested code above...
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
HOWEVER the code wasn't being applied to a text field. Once I added the filter class to the text field definition in the schema.xml, reloaded the Solr core, and then re-indexed the server, it worked great.
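To make the effect of minGramSize and maxGramSize more concrete, the Python sketch below lists which substrings an edge n-gram filter (front edge only) and a full n-gram filter would produce for one token. It only illustrates the concept; the function names are made up and the order of Solr's actual output may differ.

```python
def edge_ngrams(token, min_size=2, max_size=25):
    """Prefixes of `token` with lengths min_size..max_size (front edge only)."""
    upper = min(len(token), max_size)
    return [token[:n] for n in range(min_size, upper + 1)]


def all_ngrams(token, min_size=2, max_size=25):
    """Every substring of `token` with a length between min_size and max_size."""
    upper = min(len(token), max_size)
    return [token[i:i + n]
            for n in range(min_size, upper + 1)
            for i in range(len(token) - n + 1)]


print(edge_ngrams("solr"))  # ['so', 'sol', 'solr']
print(all_ngrams("solr"))   # ['so', 'ol', 'lr', 'sol', 'olr', 'solr']
```

With edge n-grams a search for "sol" matches but "olr" does not, which is the difference between the two filter factories discussed in #8.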
Comment #117
nikolay borisov commented
+1 for http://drupal.org/project/search_api_string_filter for Drupal 7