Highlighting offset problems
Jody Lynn - April 10, 2009 - 19:24
| Project: | Apache Solr Search Integration |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | duplicate |
Description
I have a problem with highlighting: the highlighted characters are offset so that they start before the correct terms.
The site stores DocBook XML in the node bodies and renders it as HTML, and this problem only happens on these nodes. The offset varies per instance (I've seen 2, 4, 7, 17, and 23 characters), but when it happens the highlight always starts before where it should. See screenshot.
This is using Acquia-hosted Solr.
| Attachment | Size |
|---|---|
| Picture 1.png | 132.42 KB |

#1
It seems to be caused by & # x 2 0 1 9 ; characters being used as apostrophes (spaces added so it doesn't get processed). It looks like these characters are spared from removal in apachesolr_strip_ctl_chars and apachesolr_clean_text? It seems like each one of them in my node causes a one-character offset in the highlighting after it.
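A rough way to check that guess against a node: if each of these apostrophes shifts the highlight by one character, the shift at any match should equal the number of them that occur before it. The Python sketch below only does that counting. The function name and sample text are made up for illustration, and it has nothing to do with the module's actual code.

```python
# Hypothetical helper: counts U+2019 (right single quotation mark)
# characters before a match position, to compare with the observed shift.
RIGHT_QUOTE = "\u2019"


def expected_shift(text: str, match_start: int) -> int:
    """Return how many U+2019 characters occur before match_start."""
    return text.count(RIGHT_QUOTE, 0, match_start)


# Made-up sample text with two curly apostrophes before the search term.
sample = "It\u2019s the author\u2019s docbook node"
pos = sample.index("docbook")
print(expected_shift(sample, pos))  # prints 2
```

If the counts match the offsets seen on the real nodes (2, 4, 7, ...), that would support the one-character-per-apostrophe theory.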
#2
I think this is a duplicate of #382358: character encoding issues cause Solr highlighter to fail.
I believe I've resolved it for this site by specifying ASCII encoding for the XSL output.
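For anyone wondering what ASCII output changes: with an ASCII output encoding, any character outside ASCII has to be written as a numeric character reference, so the rendered body contains no multi-byte characters. This is not the stylesheet itself (there it would presumably be the encoding attribute on xsl:output). It is only a Python sketch of the same effect on a made-up string.

```python
# Made-up sample containing one curly apostrophe (U+2019).
text = "the author\u2019s node"

# Encode to ASCII, replacing anything outside ASCII with an XML
# numeric character reference instead of raising an error.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")

print(ascii_bytes)  # b'the author&#8217;s node'
```

Every byte in the result is plain ASCII, so byte counts and character counts agree for the output.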