use extractFormat as 'text'
pwolanin - October 27, 2009 - 23:42
| Project: | Apache Solr Attachments |
| Version: | 6.x-2.x-dev |
| Component: | Code |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Description
If extractOnly is true, there is an additional input parameter we can use:
extractFormat=xml|text - Default is xml. Controls the serialization format of the extracted content. The xml format is actually XHTML, like passing the -x option to the Tika command-line application, while text is like the -t option.
See also https://issues.apache.org/jira/browse/SOLR-1274.
I had been planning to include this for the last few weeks, ever since I knew my patch had gotten into Solr, but I forgot in my excitement at getting this module working with Solr at all in the last few days.
It probably doesn't matter much, since we are stripping out all tags anyway, but it should give even greater consistency between extracting with Tika and extracting with Solr.
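For illustration, here is a rough sketch of what an extract-only request with extractFormat=text could look like. It is written in Python only to show the query string and is not the module's actual code. The base URL and the /update/extract handler path are assumptions (that path is where the handler is commonly mapped in solrconfig.xml), and build_extract_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed Solr location; adjust to match the local install.
SOLR_BASE_URL = "http://localhost:8983/solr"

# Assumed handler path; the extracting handler may be mapped elsewhere
# in solrconfig.xml.
EXTRACT_PATH = "/update/extract"


def build_extract_url(base_url=SOLR_BASE_URL, extract_format="text"):
    """Build the URL for an extract-only request (hypothetical helper).

    extractOnly=true asks Solr to return the extracted content instead of
    indexing it; extractFormat selects XHTML ("xml") or plain text ("text").
    """
    if extract_format not in ("xml", "text"):
        raise ValueError("extractFormat must be 'xml' or 'text'")
    params = {
        "extractOnly": "true",
        "extractFormat": extract_format,
    }
    return base_url.rstrip("/") + EXTRACT_PATH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url())
```

The file itself would then be sent as the body of a POST to that URL. With extractFormat=text the response already contains plain text, so the tag stripping becomes a safety net rather than the main cleanup step.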

#1
#2
committed
#3
Automatically closed -- issue fixed for 2 weeks with no activity.