Closed (fixed)
Project:
Apache Solr Search
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
29 Jan 2009 at 00:57 UTC
Updated:
23 Oct 2010 at 22:08 UTC
Comments
Comment #1
damien tournoud commented: Here is a first patch for this. Do we need a steepness in addition to the boost?
Comment #2
pwolanin commented: Not sure why a steepness would be relevant here if you can set the boost per content type.
Why this instead of just saving an array and letting it be serialized/unserialized?
Also, per Jacob, we (potentially) need a setting to totally exclude certain node types from indexing.
Comment #3
damien tournoud commented: Following an IRC discussion with Peter, here is a new version:
- use bf queries to set the type-specific boosts
- allow nodes to be omitted from indexing completely
Comment #4
damien tournoud commented: And a fixed version.
Comment #5
JacobSingh commented: Nice implementation! This feature will be killer.
I feel there is a usability issue here, though. It's certainly good to remove node types that are not going to be queried, but we need to warn users that nodes of a type are removed from the index when they choose to omit its boost. I imagine many users would assume that "Omit" means "do not boost it", not "remove it".
Also, when they turn on a node type that they had previously not been using, we should warn them and/or force a re-index so they are not confused.
What do you think? How should we make this clear?
Comment #6
JacobSingh commented: Wait, I just reviewed again. I didn't notice that you set the previously omitted nodes to be re-indexed.
Sorry :/
I'll go back through and do a proper review with the patched code later on today.
Comment #7
pwolanin commented: I think that a setting to omit a type from the index should really be on a separate settings form from the boost. We should, by default, not add any bq for content types if we can avoid it.
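The "no bq by default" idea can be sketched as follows. This is a minimal Python illustration, not the module's PHP code. It assumes the content type is stored in a Solr field named `type` and that a boost of 0 means "no bias"; `build_bq_params` is a hypothetical helper name.

```python
def build_bq_params(type_boosts):
    """Return one Solr `bq` clause per content type with a non-zero boost.

    `type_boosts` maps a content type name to its boost; 0 means no bias.
    The field name `type` is an assumption, not taken from the patch.
    """
    clauses = []
    for node_type, boost in sorted(type_boosts.items()):
        if boost:  # types left at the default add nothing to the query
            clauses.append(f"type:{node_type}^{boost}")
    return clauses


print(build_bq_params({"story": 2.0, "page": 0, "forum": 0.5}))
# prints ['type:forum^0.5', 'type:story^2.0']
```

With every boost left at 0 the helper returns an empty list, so an untouched configuration sends no `bq` parameter at all.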
Comment #8
damien tournoud commented: This still needs review, but this is wrong:
This should take the site hash into account too.
Comment #9
pwolanin commented: Ah, sure, at least if you think you might have a multi-site index.
Note, however, that the delete index operation is not currently limited to the current site, so we could go with this for now and handle it better when multi-site support goes back in.
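A delete operation scoped to one site might look like the sketch below. This is a minimal Python illustration, not the module's PHP and not the code under discussion in #8. The field names `type` and `hash` and the helper `build_delete_query` are assumptions made for the example.

```python
def build_delete_query(node_type, site_hash=None):
    """Build a Solr delete-by-query string for one content type.

    Without `site_hash` the query matches that type on every site sharing
    the index; with it, only documents from the given site are matched.
    Both field names (`type`, `hash`) are assumed for illustration.
    """
    query = f"type:{node_type}"
    if site_hash is not None:
        query = f"{query} AND hash:{site_hash}"
    return query


print(build_delete_query("poll"))            # prints type:poll
print(build_delete_query("poll", "abc123"))  # prints type:poll AND hash:abc123
```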
Comment #10
pwolanin commented: Here's a better patch that separates boosts from exclusion. It also correctly handles the case where we 'Reset to defaults'.
Comment #11
dww commented: I'll see if I can make time to review/test this, but I can't promise I will with all the other d6 upgrade issues on my plate... ;)
Comment #12
pwolanin commented
Comment #13
pwolanin commented: Committed to 6.x.
Comment #14
dreed47 commented: I installed this patch and I see two issues with the node exclusion part.
First, the admin page at /admin/settings/apachesolr/index shows a count of all nodes in the system as though they are all to be indexed, even though I've set some node types to be excluded.
Second (and much more important), it seems as though the cron job is pulling nodes that should be excluded. For example, I have it set to process 50 nodes per cron run. It pulls the first 50 that it comes to, they are all excluded node types, so it indexes nothing and waits until the next cron run. For many people this may not be an issue, but for my current use case it is. Say I have 10k nodes of a type that I don't want to index and 1k nodes of a type that I do. The cron indexing should not have to loop through all 11k nodes.
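One way to avoid looping over excluded nodes is to filter them out when the batch is selected. The sketch below is a minimal Python illustration using an in-memory SQLite table as a simplified stand-in for the node table; the schema, the helper names `select_batch` and `count_indexable`, and the sample data are all assumptions, not the module's code.

```python
import sqlite3


def _where(excluded_types):
    """Build a WHERE clause that skips the excluded types (may be empty)."""
    if not excluded_types:
        return ""
    marks = ", ".join("?" for _ in excluded_types)
    return f"WHERE type NOT IN ({marks})"


def select_batch(conn, excluded_types, limit):
    """Return up to `limit` node ids, skipping excluded types in SQL."""
    sql = f"SELECT nid FROM node {_where(excluded_types)} ORDER BY nid LIMIT ?"
    rows = conn.execute(sql, [*excluded_types, limit]).fetchall()
    return [nid for (nid,) in rows]


def count_indexable(conn, excluded_types):
    """Count only the nodes that will actually be indexed."""
    sql = f"SELECT COUNT(*) FROM node {_where(excluded_types)}"
    return conn.execute(sql, list(excluded_types)).fetchone()[0]


# Hypothetical data: ten excluded "log" nodes, then three "story" nodes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT)")
rows = [(n, "log") for n in range(1, 11)] + [(n, "story") for n in range(11, 14)]
conn.executemany("INSERT INTO node VALUES (?, ?)", rows)

print(select_batch(conn, ["log"], 2))   # prints [11, 12]
print(count_indexable(conn, ["log"]))   # prints 3
```

Filtering in the query addresses both points at once: the batch contains only indexable nodes, and the same condition gives the admin page an accurate count.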
Comment #15
dww commented: Haven't confirmed this myself, but it sounds like #14 brings up an important bug in how this works. ;)
Comment #16
pwolanin commented: @dww - yes, I'm aware of those issues and had already imagined we might need a follow-on patch. I'm not convinced that the node_load of non-indexed nodes is a problem. Much more serious is that indexing may hang forever if all the nodes selected for indexing on a given cron run are excluded.
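The hang can be avoided if the indexer's position advances past every node it examined, not only past the ones it sent. The sketch below is a minimal Python illustration under that assumption; it is not the committed patch, and the queue structure and the helper `run_cron_batch` are hypothetical.

```python
def run_cron_batch(queue, position, limit, excluded_types):
    """Process up to `limit` nodes starting at `position`.

    Returns (documents_to_send, new_position). The position moves past
    every node examined, including excluded ones, so a batch made up
    entirely of excluded types is not selected again on the next run.
    """
    batch = queue[position:position + limit]
    documents = [node for node in batch if node["type"] not in excluded_types]
    return documents, position + len(batch)


# Hypothetical queue: two excluded nodes followed by one indexable node.
queue = [
    {"nid": 1, "type": "log"},
    {"nid": 2, "type": "log"},
    {"nid": 3, "type": "story"},
]
position = 0
sent = []
while position < len(queue):
    documents, position = run_cron_batch(queue, position, 2, {"log"})
    sent.extend(doc["nid"] for doc in documents)

print(sent)  # prints [3]
```

If the position were updated only when documents were sent, the first run above would return nothing, leave the position at 0, and every later run would select the same two excluded nodes.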
Comment #17
pwolanin commented: This might be a sufficient fix to prevent the really critical bug.
Comment #18
pwolanin commented: A slightly better refactoring: a separate hook for node exclusion.
Comment #19
damien tournoud commented: Looks like a good idea at first sight.
I commented (on IRC) on the previous version of the patch that we probably don't want to output:
When count($documents) == 0 ;)
Comment #20
dww commented: Yup, +1 on the concept here. Code appears good on visual inspection, though I haven't tested it. I guess I should really set up a Solr instance on my laptop to test stuff like this. ;)
Comment #21
pwolanin commented: @Damien - is it bad to watchdog that we sent 0? It might help with debugging. Committing as is; we can revisit the watchdog call if needed.
@dww - it's really easy to run the example (Jetty) server locally. Grab me in IRC if you want assistance.
Comment #22
pwolanin commented: See follow-up patch: http://drupal.org/node/370796
Comment #24
kentr commented: Just want to confirm: the content type bias is only applied at index time, not at query time (so I must re-index to see the effect)?
Thanks.
[Edit] Found the answer: Content type bias is done at query time.