Combining Search Files module with Search module
frewfrux - February 2, 2009 - 22:10
| Project: | Search Files |
| Version: | 6.x-2.0-beta1 |
| Component: | Miscellaneous |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs review |
Description
We installed the regular Search module on our website without realizing that it didn't search documents. Then we installed Search Files. I can see that it added a new search page to use for documents, but is there any way to combine this functionality with the regular Search module? It just seems incredibly user-unfriendly to make our users search twice for every bit of information they want. (Not only searching twice, but doing so from different pages.)
Ideally, there should be one search engine that searches the entire site, including all the files.

#1
+1 on this. I may be developing this for a project coming up, but any input/help would be appreciated.
#2
Hi folks,
Any movement on this? In the 6.x version, the search is seamless, but the results are separated out.
#3
RE DRUPAL 6.x:
Here is a code snippet you can add to search_attachments.module. It still needs work, and it works with FileField (CCK), not attachments. You have to change
$node->field_file
to whatever it is in the attachment module; sorry, I'm unclear on that bit.
With this snippet, the file contents are added to the node content when Drupal creates the search index, so when you do a content search, the node will show up in the results if the attached file includes the search terms. So you could disable the attachments tab on the search results.
Again this is a start, hopefully it helps you.
<?php
/**
 * Implementation of hook_nodeapi().
 */
function search_attachments_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  switch ($op) {
    case 'update index':
      // Start from an empty string so .= never appends to an undefined variable.
      $contents = '';
      if (!empty($node->field_file)) {
        foreach ($node->field_file as $file) {
          $contents .= "<p>". _search_attachments_index_file((object)$file) ."</p>";
        }
      }
      return $contents;
      break;
    /*
    case 'view':
      $contents = '';
      if (!empty($node->field_file)) {
        foreach ($node->field_file as $file) {
          $contents .= "<p>". _search_attachments_index_file((object)$file) ."</p>";
        }
      }
      $node->content['filestuff'] = array('#value' => "<h1>File Contents</h1>$contents");
      break;
    */
  }
}
?>
#4
Hi Mark,
Does this patch require the patch posted in http://drupal.org/node/409516 allowing FileSearch to use the CCK field?
Kind regards,
Danielle
#5
I believe it would work with the attachments module as well, but you need to change these lines of code:
<?php
if($node->field_file){
  foreach($node->field_file ...
?>
So that other patch is required to get it working with FileField, but not if you are using the attachments module. Either way, you have to change my code a little.
#6
Hi Mark,
I'm struggling with this :)
How do I find the relationship between the different tables and apply it to your script?
I'm running a test with an attachment (nid 6121). In the node table I can find no reference to an attachment, but I can see the file attached to the node in the files table (fid 3125). The only table I can see which seems to pull them together is the "upload" table, which contains the nid and the fid.
I'm thinking I need to reference the upload table to link the searches together.
Kind regards,
Danielle
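To make the relationship described in #6 concrete, here is a minimal sketch in plain PHP, using arrays as stand-ins for the "upload" and "files" tables. Only nid 6121 and fid 3125 come from the post; the second row, the file paths and the function name example_files_for_node() are invented for illustration. On a real site the Upload module normally does this lookup for you when the node is loaded, so you should not need to query the tables yourself.

```php
<?php
// Hypothetical in-memory stand-ins for rows of the {upload} and {files} tables.
// Only nid 6121 and fid 3125 come from the thread; everything else is invented.
$upload_rows = array(
  array('nid' => 6121, 'fid' => 3125),
  array('nid' => 6200, 'fid' => 3130),
);
$files_rows = array(
  3125 => array('fid' => 3125, 'filepath' => 'sites/default/files/example.pdf'),
  3130 => array('fid' => 3130, 'filepath' => 'sites/default/files/other.pdf'),
);

/**
 * Returns the file rows attached to a node by following nid -> fid -> file.
 */
function example_files_for_node($nid, $upload_rows, $files_rows) {
  $found = array();
  foreach ($upload_rows as $row) {
    // The upload row links a node id to a file id; the files row holds the path.
    if ($row['nid'] == $nid && isset($files_rows[$row['fid']])) {
      $found[] = $files_rows[$row['fid']];
    }
  }
  return $found;
}

$attached = example_files_for_node(6121, $upload_rows, $files_rows);
```

Each entry in $attached carries the 'fid' and 'filepath' that the indexing code needs.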
#7
You shouldn't have to find the table in your DB. The way I went about it is to:
1. Install the Devel module and enable it.
2. Add the following block of code (at the end) to your "search_attachments.module":
<?php
/**
 * Implementation of hook_nodeapi().
 */
function search_attachments_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  if ($node->type == 'link') {
    // Initialise here so neither case appends to an undefined variable.
    $contents = '';
    switch ($op) {
      case 'update index':
        if (!empty($node->field_file)) {
          foreach ($node->field_file as $file) {
            $contents .= "<p>". _search_attachments_index_file((object)$file) ."</p>";
          }
        }
        return $contents;
        break;
      case 'view':
        //dpm($node);
        if (!empty($node->field_file)) {
          foreach ($node->field_file as $file) {
            $contents .= _search_attachments_index_file((object)$file);
          }
          if ($contents) {
            $f[1] = array(
              '#type' => 'fieldset',
              '#title' => t('File Contents'),
              '#description' => t("(This text is required to be here for better search results)"),
              '#collapsible' => TRUE,
              '#collapsed' => TRUE
            );
            $f[1][1] = array(
              '#value' => "<p>". htmlentities($contents) ."</p>",
            );
            $node->content['filestuff']['#value'] = drupal_render($f);
          }
        }
        break;
    }
  }
}
?>
3. Uncomment the line
<?php
//dpm($node);
?>
so that it reads
<?php
dpm($node);
?>
4. View a node with a file attachment. The dpm() function will give you a nice display of the $node object in your web browser.
5. Look through the $node object in your web browser and locate the array or object that contains your file information.
6. Change
<?php
$node->field_file
?>
to the property you found in step 5.
Hope you can find it
#8
subscribing
#9
Subscribing.
#10
Has anyone managed to get this to work with search_files?
I've tried markDrupal's example, modifying it for search_files, but haven't had any luck. Has anyone managed to get it to work?
I've also been reading about Apache Solr, which looks a little complicated to set up (especially if you're trying to search files), and isn't cheap once you start running it on multiple servers. They list a site that features indexing of attached documents, which seems to work like what I'm looking for...
http://drupal.org/node/447564 (see the Institute for the Study of War example)
Thanks
#11
Are you using filefield CCK or the Upload module for attaching files?
You need to identify which variable in the $node object contains the FILE object.
If you need help, you can try downloading the Devel module: http://drupal.org/project/devel
Enable it.
Uncomment the line
//dpm($node);
and view a node with a file attached.
You will get a nicely formatted display of the node object. From there you can locate the FILE object; look for something with a 'fid' and 'filepath' defined.
Once you locate the FILE object, replace
<?php
$node->field_file
?>
with the FILE object you found.
It looks like 4 replacements are needed.
Yours may be
<?php
$node->attachments
?>
or
<?php
$node->field_uploads
?>
If you need more support, try to get a screen grab of the output of the $node object (by using dpm($node)) and post it to this issue.
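If reading through the dpm() output by eye is awkward, the same hunt can be done in code. This is only a sketch in plain PHP: example_find_file_properties() is a made-up helper, and the sample node below is invented, so treat the property names as placeholders for whatever your own node contains. It simply lists every property that holds items with both 'fid' and 'filepath' set, which is the clue described above.

```php
<?php
/**
 * Lists the properties of a node-like object that look like file lists:
 * arrays whose items carry both 'fid' and 'filepath'.
 *
 * Hypothetical helper, not part of any module mentioned in this thread.
 */
function example_find_file_properties($node) {
  $matches = array();
  foreach (get_object_vars($node) as $name => $value) {
    if (!is_array($value)) {
      continue;
    }
    foreach ($value as $item) {
      // Items may be arrays or objects depending on the module; compare as arrays.
      $item = (array) $item;
      if (isset($item['fid']) && isset($item['filepath'])) {
        $matches[] = $name;
        break;
      }
    }
  }
  return $matches;
}

// Invented sample node with one object-style and one array-style file list.
$node = new stdClass();
$node->nid = 6121;
$node->title = 'Example';
$node->files = array(
  3125 => (object) array('fid' => 3125, 'filepath' => 'sites/default/files/example.pdf'),
);
$node->field_file = array(
  array('fid' => 3126, 'filepath' => 'sites/default/files/other.pdf'),
);

$found = example_find_file_properties($node);
```

Whatever names end up in $found are the candidates to use in place of $node->field_file.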
#12
First off, thanks markDrupal for your last post. That helped me get things running.
Now I've got a new question.
I have a couple of larger PDFs that are involved in a search. I find when I search for words from them under the "attachment" tab, it takes about 2 seconds to return the files in question, but when I search the same term under the "Content" tab, it takes 167 seconds.
I found that by commenting out the following line...
search_index($file->fid, 'attachment', $contents);
...in the _search_attachments_index_file function (which is called in the search_attachments_nodeapi example), the 167 second load time was brought down to 2.8 seconds.
Looking at the search_index function in search.module, it seems to be re-indexing the results that it's already retrieved from search_dataset, search_index, etc. I'm just wondering if there will be consequences should I attempt to bypass this function during the retrieval of my search results, or is there a purpose for this that I'm not seeing?
Thanks
#13
Nice catch. Yeah, in my code, every time the node is viewed it is also reindexed. When you do a content search, Drupal renders each node and tries to find the relevant area of content so it can show you a short sample of the node on the search results page. So every time your huge PDF file shows up in the results, it is also reindexed before you get the search results.
I looked at the _search_attachments_index_file() function and it looks like we can easily change it so it doesn't reindex the file on every node view.
I found this bit of code in the _search_attachments_index_file() function that we can use to speed things up:
<?php
$contents = _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath);
?>
Here is the updated function:
<?php
/**
 * Implementation of hook_nodeapi().
 */
function search_attachments_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // You can limit this functionality to only certain node types by defining
  // them here, or change it to TRUE to affect all node types.
  if ($node->type == 'link') {
    global $base_path;
    // Initialise here so neither case appends to an undefined variable.
    $contents = '';
    switch ($op) {
      case 'update index':
        if (!empty($node->field_file)) {
          foreach ($node->field_file as $file) {
            $contents .= "<p>". _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file['filepath']) ."</p>";
          }
        }
        return check_markup($contents);
        break;
      case 'view':
        //dpm($node);
        if (!empty($node->field_file)) {
          foreach ($node->field_file as $file) {
            $contents .= _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file['filepath']);
          }
          if ($contents) {
            $f[1] = array(
              '#type' => 'fieldset',
              '#title' => t('File Contents'),
              '#description' => t("(This text is required to be here for better search results)"),
              '#collapsible' => TRUE,
              '#collapsed' => TRUE
            );
            $f[1][1] = array(
              '#value' => "<p>". check_markup($contents) ."</p>",
            );
            $node->content['filestuff']['#value'] = drupal_render($f);
          }
        }
        break;
    }
  }
}
?>
#14
Thanks again Mark,
There's another little twist, though. I found with the latest code that, when indexing by running cron, the files weren't indexed properly. For example, those large files I spoke about had 2146 rows in search_index with the appropriate sid when I indexed them with your old process, but only 28 rows with the new code (and these words related to the node, not the file).
The quick fix I stuck in for the problem was simply checking the REQUEST_URI value for "search/node", since that string will appear in the URL of the search page. So I use your new method if the node is viewed on the search page, but the old one if it is run outside of that page view.
I'm sure there is probably a better way but for now it's indexing and returning what I'm looking for. I hope to look further into this soon.
Thanks for all your help. Here is my tweaking of your function...
<?php
function search_attachments_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // You can limit this functionality to only certain node types by defining
  // them here, or change it to TRUE to affect all node types.
  if ($node->type == 'link') {
    global $base_path;
    // Initialise here so neither case appends to an undefined variable.
    $contents = '';
    switch ($op) {
      case 'update index':
        // The Upload module stores attachments on $node->files.
        if (!empty($node->files)) {
          foreach ($node->files as $file) {
            if (strpos($_SERVER['REQUEST_URI'], 'search/node/') === 1) {
              $contents .= _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath);
            }
            else {
              $contents .= "<p>". _search_attachments_index_file($file) ."</p>";
            }
          }
        }
        return check_markup($contents);
        break;
      case 'view':
        //dpm($node);
        if (!empty($node->files)) {
          foreach ($node->files as $file) {
            if (strpos($_SERVER['REQUEST_URI'], 'search/node/') === 1) {
              $contents .= _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath);
            }
            else {
              $contents .= _search_attachments_index_file($file);
            }
          }
          if ($contents) {
            $f[1] = array(
              '#type' => 'fieldset',
              '#title' => t('File Contents'),
              '#description' => t("(This text is required to be here for better search results)"),
              '#collapsible' => TRUE,
              '#collapsed' => TRUE
            );
            $f[1][1] = array(
              '#value' => "<p>". check_markup($contents) ."</p>",
            );
            $node->content['filestuff']['#value'] = drupal_render($f);
          }
        }
        break;
    }
  }
}
?>
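One caveat on the REQUEST_URI test in #14: it only matches when "search/node/" starts at position 1, which means the site has to live at the web root with clean URLs. A slightly more forgiving check could look like the sketch below. It is plain PHP, example_is_node_search() is a made-up name, and the base path is passed in as an assumption about how your site is installed. Inside Drupal, comparing arg(0) and arg(1) would probably be the more usual way to ask the same question.

```php
<?php
/**
 * Returns TRUE when a request URI points at the node search page.
 *
 * Hypothetical helper. $base_path is the site's base path, for example '/'
 * or '/drupal/'.
 */
function example_is_node_search($request_uri, $base_path = '/') {
  // Drop any query string.
  $parts = explode('?', $request_uri, 2);
  $path = $parts[0];
  // Strip the base path if the URI starts with it.
  if (strpos($path, $base_path) === 0) {
    $path = (string) substr($path, strlen($base_path));
  }
  // Match the search page itself or anything below it.
  return $path === 'search/node' || strpos($path, 'search/node/') === 0;
}

$at_root = example_is_node_search('/search/node/budget');
$in_subdir = example_is_node_search('/drupal/search/node/budget', '/drupal/');
$node_page = example_is_node_search('/node/6121');
```

Note that this only answers "is this the search page"; it does not change which of the two indexing paths is the right one to take.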
#15
Has anyone come up with an idea to fulfill the original request and make "search" look into "files in attachments" and "files in directories" automatically, without requiring the user to trigger three searches?
#16
Reading about the search interface at http://api.drupal.org/api/group/search/6, it seems that code quite similar to this should do the trick for indexing attachments and file fields, except that it should implement nodeapi('search result') rather than nodeapi('search view'). I think that, as suggested at #363860: incorporate Search Files in Drupal default Search box, this code should be in a separate module, though using the search_files.module for the helper functions.
This would not solve the problem of finding files which are not linked to nodes either as a field or an attachment. This isn't a problem for me, as all my files are linked to nodes, but it wouldn't fulfill the description of the bug.
@maintainers: What do you think of this suggestion?
#17
+1 Subscribing
#18
I think my last suggestion doesn't quite fix the problem. It does do the searching, but when the search results are viewed, the link is to the node that the file is associated with, not the file itself.
To address this, I've made a start on a patch that searches through both the node and seach_attachements_att indices simultaneously. This is done by creating a new version of do_search() called search_files_attachments_do_search(). This is almost identical to the core function, except that it can take an array of $types rather than just one $type. There is then code in search_files_attachments_search() (copied from node.module) to display the node if it is a node rather than a file.
I've deleted what appeared to be a redundant invocation of do_search() from the code.
If you think this is a worthwhile approach, I can clean up the patch by providing docs for search_files_attachments_do_search().
At present this code gets confused by files which are stored by means other than upload module - but I think this is to do with the query which has been commented out in the current dev version, and which I've deleted.
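For anyone trying to picture the "array of $types" change in #18, the core of it is turning a single type comparison into an IN (...) list. The sketch below is plain PHP and is not taken from the patch: example_type_condition() is a made-up name, the "i.type" column alias is an assumption about the query being modified, and 'example_files' is a placeholder type name.

```php
<?php
/**
 * Builds a "type IN (...)" condition with one '%s' placeholder per type.
 *
 * Hypothetical helper; returns the condition string and its arguments.
 */
function example_type_condition($types) {
  // Accept a single type as well as a list of types.
  $types = array_values((array) $types);
  if (count($types) == 0) {
    // No types requested: return a condition that matches nothing.
    return array('1 = 0', array());
  }
  $placeholders = implode(', ', array_fill(0, count($types), "'%s'"));
  return array('i.type IN (' . $placeholders . ')', $types);
}

list($where, $args) = example_type_condition(array('node', 'example_files'));
list($single_where, $single_args) = example_type_condition('node');
```

The condition string and the argument list would then be passed on to the query in place of the single type.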
#19
#20
#21
I've created a little patch for search.module which will integrate the files search into the default search form. I've tested this with search_files-6.x-2.0-beta4 and it works quite well.
Index: search.module
===================================================================
--- search.module (revision 15)
+++ search.module (working copy)
@@ -1147,6 +1147,13 @@
   if (isset($keys)) {
     if (module_hook($type, 'search')) {
       $results = module_invoke($type, 'search', 'search', $keys);
+
+      // Include file results in node search
+      if ($type == 'node') {
+        $file_results = module_invoke('search_files_attachments', 'search', 'search', $keys);
+        $results = array_merge($results, $file_results);
+      }
+
       if (isset($results) && is_array($results) && count($results)) {
         if (module_hook($type, 'search_page')) {
           return module_invoke($type, 'search_page', $results);
#22
If you're not using the search_files_attachments module, then change the following
$file_results = module_invoke('search_files_attachments', 'search', 'search', $keys)
to
$file_results = module_invoke('search_files_directories', 'search', 'search', $keys)
I'm guessing that if you want to combine all three, then you do this. (I haven't tested it, as I don't use the attachments part.)
<?php
// Include file_attachments results in node search
if ($type == 'node') {
  $file_results = module_invoke('search_files_attachments', 'search', 'search', $keys);
  $results = array_merge($results, $file_results);
}
// Include file_directories results in node search
if ($type == 'node') {
  $file_results = module_invoke('search_files_directories', 'search', 'search', $keys);
  $results = array_merge($results, $file_results);
}
?>
Phil
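One thing to watch with the array_merge() calls in #21 and #22: module_invoke() returns nothing when the module is disabled or does not implement the hook, and array_merge() does not accept a non-array argument. A defensive merge might look like the sketch below. It is plain PHP with invented sample data, and example_merge_results() is a made-up name, not something from search.module or Search Files.

```php
<?php
/**
 * Merges several result sets, skipping anything that is not an array.
 *
 * Hypothetical helper for illustration only.
 */
function example_merge_results($result_sets) {
  $merged = array();
  foreach ($result_sets as $set) {
    if (is_array($set)) {
      $merged = array_merge($merged, $set);
    }
  }
  return $merged;
}

// Invented sample data standing in for what each search hook might return.
$node_results = array(array('title' => 'A page'));
$attachment_results = NULL; // For example, the module is not enabled.
$directory_results = array(array('title' => 'report.pdf'));

$all = example_merge_results(array($node_results, $attachment_results, $directory_results));
```

With a guard like this, the same snippet can be left in place whether one, two or all three search modules are enabled.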
#23
While I can confirm that the patch in #21 works, I don't think hacking core is the proper way to go about this (though it might be an okay stopgap solution for some people). I'd prefer to see a solution as in #18. However, there's something wonky with that patch file, I can't get it to apply. Also, from what I can tell it overreaches a bit, cleaning up file names and output and doing other things that I don't think are related to this issue (though they are certainly things that need to be worked on).
#24
How do you manage the pagination?
This way, every module_invoke() call will have its own pagination, and they conflict with each other.
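One possible way around the pagination conflict raised in #24 is to page the combined list once, after merging, instead of letting each search page its own results. The sketch below only shows that idea in plain PHP. It assumes each module can hand back its full, unpaged result list, which the code in this thread does not do as posted, and example_page_of_results() is a made-up name.

```php
<?php
/**
 * Returns one page of a combined result list.
 *
 * Hypothetical helper; $page is zero-based.
 */
function example_page_of_results($results, $page, $per_page = 10) {
  return array_slice($results, $page * $per_page, $per_page);
}

// Invented combined result list of 25 items.
$combined = array();
for ($i = 1; $i <= 25; $i++) {
  $combined[] = array('title' => "Result $i");
}

$first = example_page_of_results($combined, 0);
$last = example_page_of_results($combined, 2);
```

The cost is that every search has to return everything before slicing, which may be slow on a large index.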