Right now we append the "extra" information retrieved via node_invoke_nodeapi($node, 'update index'); to the end of the body. However, we also use the body to generate the snippets shown in search results, so text that is only meant to be indexed ends up exposed as if it were part of the content. This includes the empty () problem: http://drupal.org/node/365258

Unfortunately, moving the "extra" information into a separate schema field would also mean that comments no longer show up in search snippets, unless we separate out the 'extra' returned by the comment module and add just that to the body.

Comments

pwolanin’s picture

Status: Active » Needs work
Attached file: new, 2.68 KB

Here's a start on what a patch could look like.

robertdouglass’s picture

I'd like to go one step further with comments and give them their own field. So +1 for the approach in #1 with the addition of getting comments on their own. There is relatively significant demand for being able to search comments alone, or, alternatively, to search while ignoring comments. I also think there will be some demand for not indexing comments in an attempt to reduce index size.

robertdouglass’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev

jpmckinney’s picture

apachesolr_commentsearch addresses the "searching for comments" issue.

The patch has a syntax error:

+    $extra_text = 

I don't think we should special-case the "extra" information added by the comment module.

Maybe what we want to do is not use "body" as the hl.fl, but some other field that lacks the "extra" information?

jpmckinney’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev

pwolanin’s picture

Does the field used for highlighting need to be both indexed and stored? I'd guess it does, which would bloat the index considerably.

jpmckinney’s picture

hl.fl fields must be stored, yes.

pwolanin’s picture

Ok, well I think that means the only reasonable fix is to add the "extra" info into an extra search field that's part of the qf params, so it would be searched, but never part of the snippet.
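A minimal sketch of what that could look like on the query side. The field names `body` and `extra` are hypothetical placeholders, not the names in the actual schema, and the helper function is made up for illustration. Only the Solr parameter names (`qf`, `hl`, `hl.fl`, `defType`) are real.

```php
<?php
// Sketch only: build dismax query parameters where the index-only field
// is searched (qf) but never used for highlighting (hl.fl).
// 'body' and 'extra' are hypothetical field names.
function example_build_search_params($keywords) {
  return array(
    'q' => $keywords,
    'defType' => 'dismax',
    // Fields that are searched.
    'qf' => 'body extra',
    // Highlighting is limited to the body, so text from 'extra'
    // can match a query but cannot end up in a snippet.
    'hl' => 'true',
    'hl.fl' => 'body',
  );
}

$params = example_build_search_params('drupal');
echo http_build_query($params), "\n";
```

With this split, only the field named in hl.fl has to be stored. The "extra" field can be indexed without being stored.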

pwolanin’s picture

Title: separate schema field for indexing "extra" information » separate schema field for indexing comments and "extra" information
Status: Needs work » Needs review
Attached file: new, 10.38 KB

Thinking about this made me realize we should also just go ahead and index comments in a separate field. That allows, at the least, for a different boost for comments versus body text.
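Roughly, the idea can be sketched like this. This is not the code from the patch. The field names (`body`, `comments`, `extra`), the helper functions, and the boost values are all assumptions chosen for illustration.

```php
<?php
// Sketch only: keep body, comments and "extra" text in separate fields
// instead of concatenating everything into the body.
// Field names are hypothetical.
function example_build_index_fields($body, array $comments, array $extra) {
  return array(
    'body' => trim(strip_tags($body)),
    'comments' => trim(strip_tags(implode(' ', $comments))),
    'extra' => trim(strip_tags(implode(' ', $extra))),
  );
}

// Turn a field => boost map into a Solr qf string such as "body^1.0 comments^0.5".
// Boosts are passed as strings so they are emitted exactly as written.
function example_build_qf(array $boosts) {
  $parts = array();
  foreach ($boosts as $field => $boost) {
    $parts[] = $field . '^' . $boost;
  }
  return implode(' ', $parts);
}

$fields = example_build_index_fields('<p>Hello</p>', array('Nice <b>post</b>'), array('tag1'));
$qf = example_build_qf(array('body' => '1.0', 'comments' => '0.5', 'extra' => '0.2'));
```

Because each kind of text lives in its own field, the comment boost can be lowered (or set to zero) without touching how the body is ranked.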

Also we had a weird extra setting:

variable_get('apachesolr_index_comments_with_node', TRUE)

which seems totally redundant to

variable_get('apachesolr_exclude_nodeapi_types', array())

which allows you to exclude comments for any given node type.
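To illustrate why the first setting looks redundant, here is a small sketch. It assumes the exclusion setting is an array keyed by node type and then by module name. That shape is an assumption, and the helper function is hypothetical. In the module the array would come from variable_get() rather than being hard-coded.

```php
<?php
// Sketch only: decide whether comments should be indexed for a node type.
// Assumed shape of the setting: $excluded[node type][module name] = TRUE.
function example_should_index_comments($node_type, array $excluded) {
  return empty($excluded[$node_type]['comment']);
}

// Example: comments are excluded for 'forum' nodes only.
$excluded = array(
  'forum' => array('comment' => TRUE),
);

$index_page_comments = example_should_index_comments('page', $excluded);
$index_forum_comments = example_should_index_comments('forum', $excluded);
```

Under that assumption, turning comment indexing off everywhere is just a matter of excluding 'comment' for every node type, so a separate global toggle adds nothing.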

pwolanin’s picture

Attached file: new, 10.38 KB

re-roll for class name change

pwolanin’s picture

Attached file: new, 10.3 KB

fix comments

cpliakas’s picture

One issue I ran into with this is that users who didn't have the ability to view comments were seeing them in the search results. See #717104: Output of search results show comments even though user has no rights to view. Before applying the patch, we should ensure these permissions are respected.

~Chris

pwolanin’s picture

http://api.drupal.org/api/drupal/modules--comment--comment.module/functi... seems to prevent comments from being indexed at all in this case in D7.

cpliakas’s picture

Agreed. There is room for improvement in how this is handled, mainly in core, but the important part is that no information is disclosed to the end user.

pwolanin’s picture

Status: Needs review » Fixed

committed to 7.x

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.