Postponed
Project:
Xapian integration
Version:
7.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Issue tags:
Reporter:
Created:
15 Oct 2008 at 10:19 UTC
Updated:
18 Sep 2012 at 05:21 UTC
Comments
Comment #1
open-keywords commented
Actually, this post seems to say yes!
http://www.trellon.com/blog/xapian-search-drupal
Comment #2
singularo
While we have not yet added indexing of PDF, DOC, etc., it is in our plans, but it has not been a high priority for our own projects so far.
Patches are welcome ;-)
Comment #3
miiimooo
Hi. I've made a start on this. The patch may not help you, as my scenario might be different from yours: I needed to integrate the fileshare module, so it's a dirty little hack. I do the indexing in an external cron job using omega. The patch adds a tab called 'Files' to the normal search, so there is no need to patch Drupal core. It then returns search results from whatever the cron job indexed. In my case this is the fileshare folders.
Comment #4
marvil07 commented
Moving to 6.x.
Comment #5
marvil07 commented
Changing the title to make a little more sense.
The plan is to support the upload and filefield modules.
IMO we should follow the omega implementation, using external tools (any2text apps), copying and rearranging from there:
format/apps I see easier:
other format/apps:
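To illustrate the omega-style approach described above (handing each file type to an external any2text tool and capturing its output), here is a minimal sketch in Python. It is not code from the xapian module or from omega. The converter table, the function names, and the timeout are assumptions made for illustration, and it assumes `pdftotext` and `antiword` are installed and on the PATH.

```python
import subprocess
from pathlib import Path

# Hypothetical extension -> command table (an assumption, not omega's own list).
# "{path}" is a placeholder that is swapped for the real file path.
CONVERTERS = {
    ".pdf": ["pdftotext", "-enc", "UTF-8", "{path}", "-"],  # "-" writes to stdout
    ".doc": ["antiword", "{path}"],
}


def build_command(path):
    """Return the argv list for the converter matching the extension, or None."""
    template = CONVERTERS.get(Path(path).suffix.lower())
    if template is None:
        return None
    return [str(path) if part == "{path}" else part for part in template]


def extract_text(path, timeout=60):
    """Run the external converter and return its output as text.

    Returns None when no converter is registered, the tool is missing,
    it exits with an error, or it exceeds the timeout.
    """
    command = build_command(path)
    if command is None:
        return None
    try:
        result = subprocess.run(
            command, capture_output=True, timeout=timeout, check=True
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

Keeping the format-to-tool mapping in a plain table means supporting another format is a one-line change, and a missing or failing tool only skips that file instead of aborting the whole indexing run.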
Comment #6
marvil07 commented
Comment #7
marvil07 commented
I think this is going to fit well in 7.x with #923752: Integrating with search_api. Postponing until that gets in.
Comment #8
marvil07 commented
#923752: Integrating with search_api got in!
Comment #9
marvil07 commented
Comment #10
ywarnier commented
The following code might serve as inspiration, at least for the text extraction part (marvil07 also has experience with the code in this project that deals with Xapian and indexing all uploaded documents): http://code.google.com/p/chamilo/source/browse/main/inc/lib/document.lib...
Comment #11
marvil07 commented
Let's finally try this.
Comment #12
marvil07 commented
After exploring, writing some code and constantly rewriting it (aka familiarizing myself with the D7 fields API for the first time, which is not really that similar to CCK, whatever :-p), I ended up thinking that I will create an independent project for getting a plain text version of each field, if possible.
So instead of just another module inside xapian, the integration would be a single module_exists() check.
Attaching the current code (a new xapian submodule), but I hope to move it to its own project soon, before adding code to the xapian project.
Comment #13
marvil07 commented
I started a sandbox project for a module that generates a plain text representation of fields: Plain. Postponing this a little until I move it to a full project.
I also opened a discussion to get feedback about it in the Contributed Module Ideas group: Plain text for fields.
Comment #14
marvil07 commented
See http://drupal.org/project/search_api_attachments and http://drupal.org/sandbox/cpliakas/1145040
Hopefully the code can be merged to unify backend extraction (maybe in converter or in plain) and then integrated (making search_api_attachments pluggable).