SUMMARY:
We have two data sets that we would like to index and search from the same search bar, with results from each returned in the final result set. It feels like I should be able to do this, but I'm not sure where the best place to hack in is, or whether it's even possible.
DETAIL:
- The first data set is, of course, Drupal data (using nid as the index reference).
- The second set is data from an external database, not tied to Drupal's nids at all. It is a catalog of files for which we want to create links to description/download pages.
So, if we search for "trike" in Drupal's search bar, we might get back a couple of nodes with the word "trike" in them, as well as a result linking to a file that has "trike" in its description.
WHERE I'M AT SO FAR:
I've successfully used DataImportHandler (DIH) to bring in data from the external database. Solr itself seems to allow strings for ids, so I have used DIH/schema.xml to prepend "a" to each id that comes in from the external data set. This ensures that an external id won't overwrite a Drupal nid in the main index. I also map the external data into title, content, teaser, etc., so that the records otherwise fit Drupal's existing schema. From the Java DIH webapp, I can search for "trike" and get results from both Drupal and the external data set. But when I search for "trike" in the apachesolr_search box, only the Drupal results are displayed.
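The prefixing step above can be sketched as follows. This is plain Python for illustration only, not DIH configuration; the function name `namespaced_id` and the sample ids are hypothetical, and only the "a" prefix comes from the post.

```python
def namespaced_id(external_id, prefix="a"):
    """Return a string id that cannot equal a purely numeric Drupal nid.

    The prefix "a" is the one mentioned in the post; everything else
    here is a hypothetical illustration.
    """
    return f"{prefix}{external_id}"


# Hypothetical sample data: the same numeric ids exist in both sources.
drupal_ids = {"101", "102"}
external_ids = {namespaced_id(i) for i in (101, 102)}

# With the prefix applied, the two id sets no longer overlap.
no_collisions = drupal_ids.isdisjoint(external_ids)
```

Because every Drupal nid is purely numeric, any non-numeric prefix should be enough to keep the two id spaces apart inside Solr itself.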
IDEAS:
I think that at some point, in either the Drupal apachesolr_search module or the Drupal Solr Query Interface (or somewhere else?), the assumption changes from Solr's "an id is a string" to Drupal's "an id is an int", and so ids containing alpha characters (for instance, id == "a101") are discarded. Perhaps there's a way to hack in and override this without breaking existing functionality? My hope is that if I can identify where Drupal reads the id, I can change those ids to a single, designated nid that I use as a dummy node. I can't do this at the Solr indexing level, obviously, because each record would just keep overwriting that one id as the index was built.
Any ideas? Thanks in advance!
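To make the suspicion concrete, here is a minimal sketch of what such a filter would do. It only illustrates the hypothesis in this post; it is not taken from the apachesolr_search code, and the function name and sample documents are made up.

```python
def keep_integer_ids(docs):
    """Mimic a consumer that silently drops documents whose id is not an integer.

    This models the *hypothesis* only; the real module may behave differently.
    """
    return [doc for doc in docs if str(doc["id"]).isdigit()]


# Hypothetical Solr response containing one Drupal node and one external record.
solr_docs = [
    {"id": "101", "title": "Trike review (Drupal node)"},
    {"id": "a101", "title": "Trike manual (external catalog file)"},
]

visible = keep_integer_ids(solr_docs)
```

If something like this is happening, Solr would return both documents while the Drupal side would show only the first, which matches the symptom described above.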
Comments
Comment #1
pwolanin commented
You might try looking at http://drupal.org/project/distributed_search or maybe http://drupal.org/project/adjustisearch
Comment #2
mausolos commented
I took a look at these, thank you. The distributed search requires OpenID, which is something we're explicitly avoiding for the time being. Adjusti-search would be great if I had an entire secondary engine returning results of its own, but what I actually have is a distinct, secondary index which I'm essentially trying to interleave with the primary index. That said, I could probably figure out something to handle search requests anyway, but if anyone has any other ideas, I'd love to hear them.
Thank you!
Comment #3
mausolos commented
http://drupal.org/node/635556 is related, I think...
Comment #4
Grayside commented
Is there any further guidance on this concept? I was hoping to use Solr to index several different (not all Drupal) sites and create a master search tool in Drupal.
Comment #5
jpmckinney commented
Instead of prepending "a", why not prepend "999999"? That would satisfy this module's expectation that ids are ints.
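A rough sketch of the numeric-prefix idea, in Python for illustration. The helper names are hypothetical, and the 32-bit check is an assumption about how the id might be stored somewhere along the way, not a confirmed property of the module or of Solr.

```python
INT32_MAX = 2**31 - 1  # assumption: some layer stores the id as a signed 32-bit int


def numeric_prefixed_id(external_id, prefix="999999"):
    """Concatenate the digits of the prefix and the external id, then parse as an int."""
    return int(f"{prefix}{external_id}")


def fits_int32(external_id, prefix="999999"):
    """Report whether the prefixed id still fits in a signed 32-bit integer."""
    return numeric_prefixed_id(external_id, prefix) <= INT32_MAX


example = numeric_prefixed_id(101)  # digits "999999" followed by "101"
```

Two caveats follow from the arithmetic: with a six-digit prefix, any external id above 999 no longer fits in a signed 32-bit integer, and a Drupal nid could still collide once nids reach 9999990.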
Comment #6
jpmckinney commented
Marking http://drupal.org/node/610892 duplicate.
Comment #7
scott.whittaker commented
Any success with this method?
Comment #8
pwolanin commented
The concept of a site hash used by the apachesolr_multisitesearch module is perfectly applicable to non-Drupal sites. I suggest you look at that code.
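As a rough illustration of the site-hash idea: each source gets a short hash, and every document id is namespaced by it. The id layout `hash/type/id`, the way the hash is derived, and the source names below are assumptions made for this sketch; check the module code for what it actually does.

```python
import hashlib


def site_hash(source_name, length=6):
    """Derive a short, stable token for a source (hypothetical derivation)."""
    return hashlib.sha1(source_name.encode("utf-8")).hexdigest()[:length]


def document_id(source_name, entity_type, local_id):
    """Build a namespaced document id of the assumed form hash/type/id."""
    return f"{site_hash(source_name)}/{entity_type}/{local_id}"


# Hypothetical sources: the Drupal site and the external file catalog.
drupal_doc = document_id("http://example.com", "node", 1)
catalog_doc = document_id("external-catalog", "file", 1)
```

With ids built this way, local id 1 from the catalog and nid 1 from Drupal can sit in the same index, and a query can be restricted or widened by hash.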
Comment #10
dpalmer commented
mausolos, were you successful in your objective? I am trying to do the same thing now...
Comment #11
mausolos commented
Not so much. I believe I got as far as discovering that I could do it on the Solr side by prepending each id from the foreign data set with, say, an alpha character; this would allow id 1 native to the foreign database to coexist as id a1 next to Drupal nid 1. I was looking at search_api for D7 just last night, however, and it looks like it might have some much better ways of doing this (though I haven't examined them carefully yet, since that project is long behind me).
Unfortunately, given these tools, we concluded that the most realistic ways of doing this without rewriting the Solr search module were one of:
a) Simply designate an "address range" for Drupal nids. Records from our SQL server, which is where the separate catalog lived, would get SQL id + 1000000 or something, and we'd just pray that we never got a million nodes in Drupal (or wherever our cutoff was).
b) Give up and just import the catalog data into Drupal nodes, refreshed every so often by rotating table data (update the table data on a non-live database, transfer it within MySQL from the dummy db to the live db, then trigger the index update).
c) Write a module whose sole purpose is to somehow trick the indexer into thinking it's reading Drupal nodes, when in fact it's reading an XML stream fed via a stored procedure from the SQL server, with the data masked as Drupal nodes (obviously, this was just an idea; it's probably insane and might not even work).
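Option (a) can be sketched like this. The offset of 1000000 is the figure mentioned above; the function names and the guard on nids are hypothetical additions for illustration.

```python
CATALOG_OFFSET = 1_000_000  # cutoff from option (a); Drupal nids must stay below it


def catalog_to_index_id(catalog_id):
    """Shift a catalog id into the range reserved for external records."""
    if catalog_id < 0:
        raise ValueError("catalog ids are expected to be non-negative")
    return catalog_id + CATALOG_OFFSET


def check_nid(nid):
    """Fail loudly instead of 'praying' if Drupal ever reaches the reserved range."""
    if nid >= CATALOG_OFFSET:
        raise ValueError(f"nid {nid} has entered the catalog id range")
    return nid


def index_id_to_source(index_id):
    """Map an id from the shared index back to its source and local id."""
    if index_id >= CATALOG_OFFSET:
        return ("catalog", index_id - CATALOG_OFFSET)
    return ("drupal", index_id)
```

The ids stay integers, so the module's expectations are met, at the cost of a hard ceiling on the number of Drupal nodes.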
Best of luck, I'd love to hear what you end up doing! :)
Comment #12
Raul Cano commented
Hi mausolos,
Maybe it's a bit too late, but did you actually achieve something in D6? I have the same case here (see my comment https://www.drupal.org/node/635556#comment-10190268), but it looks like there is no simple solution for Drupal 6...