use extractFormat as 'text'

pwolanin - October 27, 2009 - 23:42
Project: Apache Solr Attachments
Version: 6.x-2.x-dev
Component: Code
Category: task
Priority: normal
Assigned: Unassigned
Status: closed
Description

If extractOnly is true, there is an additional input parameter we can use:

extractFormat=xml|text - Default is xml. Controls the serialization format of the extracted content. The xml format is actually XHTML, like passing the -x option to the tika command line application, while text is like the -t option.

See also https://issues.apache.org/jira/browse/SOLR-1274.
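
As a rough illustration of the parameters described above, here is a minimal sketch that builds an extract-only request URL. The parameter names extractOnly and extractFormat come from the description; the base URL, the /update/extract handler path, and the helper name build_extract_url are assumptions for illustration and may not match a given Solr setup or what the module's patch actually does.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, extract_format="text"):
    """Build a URL asking Solr to extract content without indexing it.

    solr_base and the '/update/extract' path are assumptions; adjust
    them to wherever the extracting request handler is mounted.
    """
    if extract_format not in ("xml", "text"):
        raise ValueError("extractFormat must be 'xml' or 'text'")
    params = {
        "extractOnly": "true",           # return the extracted content, do not index
        "extractFormat": extract_format  # 'xml' (XHTML, the default) or 'text'
    }
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance.
    print(build_extract_url("http://localhost:8983/solr"))
```

The file itself would still need to be sent as the request body; this only shows how the two parameters fit together in the query string.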

I had planned to include this for the last few weeks, since I knew my patch had gotten into Solr, but I forgot in my excitement at getting this module working with Solr at all over the last few days.

This probably doesn't matter much, since we are stripping out all tags anyway, but it should give even greater consistency between using Tika directly and using Solr.

#1

pwolanin - October 27, 2009 - 23:45
Status: active » needs review
Attachment: text-format-616426-1.patch (897 bytes)

#2

pwolanin - October 27, 2009 - 23:57
Status: needs review » fixed

Committed.

Attachment: text-format-616426-1.patch (897 bytes)

#3

System Message - November 11, 2009 - 00:00
Status: fixed » closed

Automatically closed -- issue fixed for 2 weeks with no activity.
