Hi,

I have figured out how to adjust the amount of data crawled by Nutch and stored in Solr, and how to adjust the length at which that data is presented. One piece still remains:

How do I adjust the teaser length while using the highlight feature?

Example:

I search "justice" on the site and get these returns:

Cases - All terms
03/22/2010 06/07/2010 Samuel A. Alito, Jr. 8-1 1 2 3 4 5 6 7 8 9 … next › last » Cases Justices Advocates ...

http://www.oyez.org/cases

Supreme Court Tour
Supreme Court Tour | The Oyez Project Skip to Navigation Oyez Site Feedback On The Docket Appellate.net Justia SCOTUSblog Cases Justices Advocates Benefactors About Tour Home › Supreme Court Tour › Supreme Court Tour Printer-friendly version Cases Justices Advocates Benefactors About Tour Footer Links ...

http://www.oyez.org/tour

Notice that the first result shows only 100 characters or so, while the second shows about 200? This is an issue. It happens because the word "justice" appears only once in the first page and twice in the second, and the snippet is cut off by some logic in ApacheSolr.

How can I be sure? Maybe it's just all there is?

Simple enough to find out! Go to your Solr admin instance on port 8983, type the exact keyword into the basic query box, and run the query. Then view the page source. There you will see exactly what Solr has in its index for that query. Since I modified Nutch/Solr to keep the whole page, I have, well, the whole page. So I know the ApacheSolr logic is doing the truncation.

What have I done so far?

I have looked all over the code and found a few places of interest.

apachesolr_search.module, lines 1464-1477
search.module, lines 1200-1245

What are my results?

Nothing. I can't seem to adjust the length of the highlighted snippet that ApacheSolr returns.

What I want:

I want to make the length of the entire search snippet the same for all results, let's say 300 characters, AND keep the highlighting, without the character limit logic it currently uses.

Sound tough? Heck yeah it is. This would complete my tutorial http://drupal.org/node/968308 on adjusting teaser lengths using SolrSearch. Any input would be super duper.

Comments

maxmmize:

Based on this:

http://wiki.apache.org/solr/HighlightingParameters

I have been adjusting the fragmenter settings (fragment size, slop, and pattern) in solrconfig.xml to:


  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- fragment size raised to 400 characters for testing -->
      <int name="hl.fragsize">400</int>
      <!-- allow 100% slop on fragment sizes (raised for testing) -->
      <float name="hl.regex.slop">1.0</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{40,400}</str>
    </lst>
  </fragmenter>

I set the values high so that any effect would be obvious. I also cleaned out my Nutch and Solr installs and rebuilt them from vanilla. The results still did not change.

maxmmize:

I turned off highlighting in solrconfig.xml (both in my lib and in my modules folder) and re-ran everything. I still get highlighting, which tells me Drupal is requesting it. Now I just need to find out where and how.

<!-- example highlighter config, enable per-query with hl=true -->
     <str name="hl">false</str>

BTW, shouldn't turning this off prevent the module from running highlighting?

maxmmize:

Found the issue:

The parameters sent by apachesolr_search.module are not picking up the values from solrconfig.xml for some reason (permissions?). I replaced the NULL default for hl.fragsize with 400 and wham, done. Possible bug?

/**
 * Add highlighting settings to the search params.
 *
 * These settings are set in solrconfig.xml.
 * See the defaults there.
 * If you wish to override them, you can via settings.php
 */
function apachesolr_search_highlighting_params($query) {
  $params = array();
  $params['hl'] = variable_get('apachesolr_hl_active', NULL);
  // Local change: the default here was NULL; I hard-coded 400.
  $params['hl.fragsize'] = variable_get('apachesolr_hl_textsnippetlength', 400);
  $params['hl.simple.pre'] = variable_get('apachesolr_hl_pretag', NULL);
  $params['hl.simple.post'] = variable_get('apachesolr_hl_posttag', NULL);
  $params['hl.snippets'] = variable_get('apachesolr_hl_numsnippets', NULL);
  $params['hl.fl'] = variable_get('apachesolr_hl_fieldtohightlight', NULL);
  return $params;
}

maxmmize:

Status: Active » Needs review
jbrauer:

The variables seem to be ignored if they are empty. For example, setting the following in settings.php:

'apachesolr_hl_pretag' => NULL,
'apachesolr_hl_posttag' => NULL,

still returns the matched words wrapped in the default highlight tags. But putting something like '--' as the value causes it to render -- as the pre or post tag. Any value seems to work as long as it's not NULL or ''.

jpmckinney:

Title: Adjust highlighting teaser length » Add UI for highlighting config variables
Version: 6.x-2.x-dev » 7.x-1.x-dev
Category: task » feature
Status: Needs review » Active

An issue needs a patch attached before it can be set to "Needs review".

You can set these variables in your settings.php (or with strongarm). We should probably expose them in the UI.
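For anyone landing here, a minimal sketch of what that could look like in settings.php, so the module file does not need to be edited. The variable names are the ones read by apachesolr_search_highlighting_params() quoted earlier in this thread; the values are illustrative assumptions, not recommended defaults. Note that hl.fragsize is an approximate size per fragment, so the total snippet length also depends on how many snippets are requested.

```php
// Append to settings.php. Entries in $conf override whatever
// variable_get() would otherwise return for the same name.

// Approximate characters per highlighted fragment (sent as hl.fragsize).
// 300 is an assumed value matching the length asked for in the original post.
$conf['apachesolr_hl_textsnippetlength'] = 300;

// Number of fragments per result (sent as hl.snippets). Assumed value:
// 1 keeps the overall teaser close to the fragment size above.
$conf['apachesolr_hl_numsnippets'] = 1;

// Tags wrapped around matched terms. Per the comment above, these are
// only honored when non-empty. The <strong> tags are an assumed choice.
$conf['apachesolr_hl_pretag'] = '<strong>';
$conf['apachesolr_hl_posttag'] = '</strong>';
```

After changing settings.php, re-run the same search and compare the teaser lengths; no reindexing should be needed, because these are query-time parameters.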

Add feature in HEAD first.

pwolanin:

Status: Active » Closed (won't fix)

I don't think we should add a UI for this to the base module.

The OP seems to be discussing some other issue -