This patch optionally includes taxonomy terms in search results. It adds a checkbox to the settings page; when the box is checked, term names are added to the indexed text.
It also inserts spaces between elements of the indexed text so that adjacent elements no longer run together (a problem that existed even without this feature request).
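To illustrate the two behaviours described above, here is a minimal sketch in Python rather than the patch's actual PHP. The function name `build_index_text` and its parameters are hypothetical and do not come from the patch; the sketch only assumes that the indexed text is assembled from the title, the body and, optionally, the term names.

```python
def build_index_text(title, body, term_names=(), include_terms=False):
    """Join the pieces of a node into one space-separated string.

    All names here are illustrative assumptions, not the module's API.
    """
    parts = [title, body]
    if include_terms:
        # Mirrors the settings checkbox: terms are only added when enabled.
        parts.extend(term_names)
    # Joining with a space keeps adjacent elements from fusing into one token.
    return " ".join(p.strip() for p in parts if p and p.strip())


node_text = build_index_text(
    "Course catalog", "Fall schedule", ["Engineering"], include_terms=True
)
```

Without the separating space, the title and body above would be indexed as the single token "catalogFall", which is the separation problem the patch also addresses.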
| Comment | File | Size | Author |
|---|---|---|---|
| | apache_solr_include_terms.patch | 2.45 KB | mattconnolly |
Comments
Comment #1
robertdouglass commented
This part (and similar parts) has to get in, one way or another.
The taxonomy terms part needs to be dealt with separately.
Comment #2
janusman commented
Wouldn't it be best to control which fields are queried by using a Solr DisMax query? Then it is not a matter of adding terms at index time, but of choosing the fields at query time. This would work using the qf parameter.
Comment #3
robertdouglass commented
@janusman: I've read the DisMax docs several times now, but could you please explain in detail why this is true?
Thanks!
Comment #4
robertdouglass commented
I've addressed the space issue in DRUPAL-5 and DRUPAL-6--1.
The taxonomy terms issue remains.
Comment #5
janusman commented
Ok, let me rephrase this =)
I think there is no harm in actually adding the taxonomy terms to the "text" portion, since we are already adding the whole node body, including probable "noise" like CCK field names; so I guess I approve the current patch...
What I meant was that, additionally, we can get much better relevancy scoring of results, by using dismax, and specifying per-field boosts. However this is a separate discussion and already in progress at #284923: Boost title relevancy (so you can ignore my previous comment, IMO proceed with the patch, and continue the other discussion in parallel) =)
MAYBE in the future we'll realize that, to maximize what admins can control about what is searched and how each field affects relevancy, putting everything (title, body, taxonomy, etc.) into the "text" field is not the best approach. But for now at least we want 100% recall.
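For readers unfamiliar with the query-time approach discussed here, the following is a rough sketch of a DisMax request with per-field boosts. `defType` and `qf` are standard Solr request parameters; the field names (`title`, `taxonomy_name`, `body`), the boost values, and the helper `dismax_params` are assumptions made for illustration and are not taken from the module's schema or code.

```python
from urllib.parse import urlencode


def dismax_params(keywords, boosts):
    """Build request parameters for a DisMax query with per-field boosts.

    `boosts` maps a field name to its weight; fields and weights used
    below are hypothetical examples.
    """
    # qf is a space-separated list of field^boost pairs.
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"defType": "dismax", "q": keywords, "qf": qf}


params = dismax_params(
    "engineering", {"title": 5, "taxonomy_name": 2, "body": 1}
)
query_string = urlencode(params)  # ready to append to a select URL
```

With this approach the terms stay in their own field and their influence on relevancy is adjusted per request, instead of being fixed when the node is indexed.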
Comment #6
robertdouglass commented
@janusman: got it. Assigning issue to self.
Comment #7
robertdouglass commented
This needs a re-roll. What's this bit for?
Also, what is the goal of the patch? Is it to help find results when searching for term names, or is it to change how the search results are displayed? My feeling is that we already do just the right amount for terms and term names (especially with janusman's hierarchical taxonomy patch, which will likely get committed soon), so this patch should be dealing with the display. Perhaps we should simply (optionally) show taxonomy term names on search results?
Comment #8
robertdouglass commented
I'm reconsidering whether this is critical (i.e., for the 1.0 release). Please opine.
Comment #9
drunken monkey commented
The goal seems to be to add the term names to the Solr text field, which puzzles me, because afaict terms are already included and found when searching the default field. Maybe this gets done via the hook_nodeapi('update index') call.
This would mean that this patch is rather useless in its current form, but I might be wrong.
In any case, I don't think this is critical for 1.0 (apart from the whitespace-between-strings-thing, which already got committed).
Comment #10
robertdouglass commented
Agree with the monkey.
Comment #11
JacobSingh commented
I second the monkey, although I would like to bring up the (not 1.0) conversation:
When people start wanting to index a site and then retrieve that index from another site, or from something non-Drupal, or when using a federated search, it would be nice to provide our rationalized data in a way that is more portable. I realize that storing the term as {VOCAB}/{TERM} also has a downside, because renaming terms becomes a Royal PITA, but it is something to consider...
Best,
Jacob
Comment #12
janusman commented
Does $node->body depend at all on theming functions? If so, should we ensure a "minimum" set of fields to index (e.g. terms)? =) I vote yes for highly relevant fields: title, terms, etc.
In my case, on my current theme, $node->body does NOT contain any terms... here's a sample from a Solr Query (only the "body" field from one node):
... and further down, the indexed taxonomy_name field:
So if we are searching only the Solr field "text" (the default) for "Engineering", this node wouldn't show up, because that word is not in the node body (and therefore not in $text).
I vote for this other patch, however, which includes not only the terms but also all parent terms in a hierarchical vocabulary (perhaps this issue should still be marked "won't fix" and the other one will go through) =)
I can see ppl asking for support because terms aren't indexed when they are using this or that theme... =)
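The idea of indexing a term together with all of its parent terms can be sketched as follows. This is only an illustration in Python, not the code of the other patch; the helper `terms_with_ancestors`, the `parent_of` mapping, and the example vocabulary are hypothetical, and the sketch assumes each term has at most one parent.

```python
def terms_with_ancestors(term, parent_of):
    """Return the term followed by its ancestors, nearest parent first.

    `parent_of` maps a term name to its parent's name; top-level terms
    are simply absent from the mapping. Single-parent hierarchies only.
    """
    names = []
    seen = set()  # guards against accidental cycles in the mapping
    while term is not None and term not in seen:
        names.append(term)
        seen.add(term)
        term = parent_of.get(term)
    return names


# Hypothetical vocabulary: Sciences > Engineering > Civil Engineering
parent_of = {"Civil Engineering": "Engineering", "Engineering": "Sciences"}
names = terms_with_ancestors("Civil Engineering", parent_of)
```

Indexing every name in `names` would let a search for "Engineering" match a node tagged only with "Civil Engineering".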