Highlighting offset problems

Jody Lynn - April 10, 2009 - 19:24
Project: Apache Solr Search Integration
Version: 6.x-1.x-dev
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: duplicate
Description

I have a problem with highlighting: the highlighted characters are offset so that the highlight starts before the correct terms.

The site stores DocBook XML in the node bodies and renders it as HTML, and the problem only happens on these nodes. The offset varies per instance (I've seen 2, 4, 7, 17, and 23 characters), but when it happens the highlight always starts before it should. See screenshot.

This is using Acquia-hosted Solr.

Attachment: Picture 1.png (132.42 KB)

#1

Jody Lynn - April 10, 2009 - 22:38

It seems to be caused by & # x 2 0 1 9 ; characters being used as apostrophes (spaces added so it doesn't get processed). It looks like apachesolr_strip_ctl_chars and apachesolr_clean_text spare these characters from removal?

Each one of them in my node seems to shift the highlighting after it by one character.
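
To make the symptom concrete, here is a minimal Python sketch of one way a per-apostrophe shift could arise. It assumes, purely for illustration, that match offsets are computed on a copy of the text where each U+2019 has been dropped and are then applied to the stored text that still contains it. The thread does not confirm that this is what Solr or the module actually does, and the helper names (find_offsets, highlight) are hypothetical.

```python
APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the character from this report


def find_offsets(text, term):
    """Return (start, end) character offsets of the first match of term."""
    start = text.find(term)
    if start < 0:
        raise ValueError(f"{term!r} not found")
    return start, start + len(term)


def highlight(text, start, end):
    """Wrap text[start:end] in brackets to stand in for highlight markup."""
    return text[:start] + "[" + text[start:end] + "]" + text[end:]


stored = "It\u2019s the user\u2019s search term"

# Assumption: the text the offsets are computed on has lost each apostrophe.
analyzed = stored.replace(APOSTROPHE, "")

start, end = find_offsets(analyzed, "search")

# Applying those offsets to the stored text highlights too early,
# by one character per apostrophe that precedes the match.
shifted = highlight(stored, start, end)

# Counting the preceding apostrophes gives the drift needed to realign.
drift = stored[: stored.find("search")].count(APOSTROPHE)
corrected = highlight(stored, start + drift, end + drift)

print(shifted)
print(corrected)
```

With two apostrophes before the term, the first highlight lands two characters early, which fits the variable, always-early offsets described above under this assumption.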

#2

Jody Lynn - April 14, 2009 - 17:22
Status: active » duplicate

I think this is a duplicate of #382358: character encoding issues cause Solr highlighter to fail.
I believe I've resolved it for this site by specifying ASCII encoding for the XSL output.
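
For readers unfamiliar with what an ASCII output encoding does: a serializer that is restricted to ASCII has to write any non-ASCII character as a numeric character reference rather than as raw bytes. The Python sketch below is only an analogy using Python's own "xmlcharrefreplace" error handler; it is not the site's stylesheet or XSLT processor, and the function name to_ascii_markup is hypothetical.

```python
def to_ascii_markup(text: str) -> bytes:
    """Encode text as ASCII, writing non-ASCII characters as numeric references."""
    return text.encode("ascii", errors="xmlcharrefreplace")


serialized = to_ascii_markup("user\u2019s guide")
print(serialized)  # b'user&#8217;s guide'
```

The result contains only 7-bit bytes, so the apostrophe reaches later processing steps as a plain-ASCII reference instead of a multi-byte character.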
