We installed the regular Search module on our website without realizing that it doesn't search documents. Then we installed Search Files. I can see that it added a new search page for documents, but is there any way to combine this functionality with the regular Search module? It seems incredibly user-unfriendly to make our users search twice for every bit of information they want (and not only search twice, but do so from different pages).

Ideally, there should be one search engine that searches the entire site, including all the files.

Comment | File | Size | Author
#18 | search_files-268195-18.patch | 6.65 KB | egfrith

Comments

Rob_Feature

+1 on this. I may be developing this for a project coming up, but any input/help would be appreciated.

mooreds

Hi folks,

Any movement on this? In the 6.x version, the search is seamless, but the results are separated out.

markDrupal

RE DRUPAL 6.x:
Here is a code snippet you can add to search_attachments.module. It still needs work, and it works with the FileField CCK field, not attachments. You have to change
$node->field_file
to whatever it is in the attachments module; sorry, I'm unclear on that bit.

With this snippet, the file contents are added to the node content when Drupal builds the search index, so when you do a content search, the node will show up in the results if the attached file contains the search terms. You could then disable the Attachments tab on the search results page.

Again this is a start, hopefully it helps you.

/**
 * Implementation of hook_nodeapi()
 */
function search_attachments_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  switch ($op) {
    case 'update index':
      $contents = '';
      if($node->field_file){
        foreach($node->field_file as $file){
          $contents .= "<p>". _search_attachments_index_file((object)$file) ."</p>";
        }
      }
      return $contents;
    break;
    /*
    case 'view':
      if($node->field_file){
        foreach($node->field_file as $file){
          $contents .= "<p>". _search_attachments_index_file((object)$file) ."</p>";
        }
      }
      $node->content['filestuff'] = array('#value' => "<H1>File Contents</h1>$contents");
      break;
     * 
     */
  }
}
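The hook above appends each attachment's extracted text to whatever the node contributes to the index. Stripped of the Drupal specifics, the idea can be sketched as follows. `example_build_index_text()` and the extractor callback are hypothetical names; the callback stands in for whatever `_search_attachments_index_file()` really does, and the sketch accepts both arrays (FileField items) and objects (Upload records).

```php
<?php
/**
 * Hypothetical helper: returns the extra text a node contributes to the
 * search index, i.e. the extracted text of each attached file wrapped in
 * its own paragraph.
 *
 * $files may mix arrays and objects; each is assumed to carry a 'filepath'.
 * $extract is an assumed callback: path in, plain text out ('' on failure).
 */
function example_build_index_text(array $files, $extract) {
  $contents = '';  // Initialised, so .= never reads an undefined variable.
  foreach ($files as $file) {
    $file = (array) $file;
    if (empty($file['filepath'])) {
      continue;
    }
    $text = (string) call_user_func($extract, $file['filepath']);
    if ($text !== '') {
      $contents .= '<p>' . $text . '</p>';
    }
  }
  return $contents;
}

// Stand-in extractor for illustration only.
function example_extract($path) {
  $texts = array('a.pdf' => 'alpha text', 'b.pdf' => '');
  return isset($texts[$path]) ? $texts[$path] : '';
}

$files = array(array('filepath' => 'a.pdf'), (object) array('filepath' => 'b.pdf'));
echo example_build_index_text($files, 'example_extract');  // Prints: <p>alpha text</p>
```

Files whose extraction yields nothing are skipped, so the index does not fill up with empty paragraphs.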
Dinis

Hi Mark,

Does this patch require the patch posted in http://drupal.org/node/409516 allowing FileSearch to use the CCK field?

Kind regards,
Danielle

markDrupal

I believe it would work with attachments module as well, but you need to change these lines of code

      if($node->field_file){
        foreach($node->field_file ...

So that other patch is required to get it working with FileField, but not if you are using the attachments module. Either way, you have to change my code a little.

Dinis

Hi Mark,

I'm struggling with this :)

How do I find the relationships between the different tables and apply them to your script?

I'm running a test with an attachment (nid 6121). In the node table I can find no reference to an attachment, but I can see the file attached to the node in the files table (fid 3125). The only table I can see that ties them together is the "upload" table, which contains the nid and the fid.

I'm thinking I need to reference the upload table to link the searches together.

Kind regards,
Danielle

markDrupal

You shouldn't have to find the table in your DB. The way I got around it was to:
1. Install the Devel module and enable it.
2. Add the following block of code (at the end) to your "search_attachments.module":

/**
 * Implementation of hook_nodeapi()
 */
function search_attachments_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  if($node->type == 'link'){
    switch ($op) {
      case 'update index':
        $contents = '';
        if($node->field_file){
          foreach($node->field_file as $file){
            $contents .= "<p>". _search_attachments_index_file((object)$file) ."</p>";
          }
        }
        return $contents;
      break;

      case 'view':
        //dpm($node);
        $contents = '';
        if($node->field_file){
          foreach($node->field_file as $file){
            $contents .= _search_attachments_index_file((object)$file);
          }
          if($contents){
            $f[1] = array(
              '#type' => 'fieldset',
              '#title' => t('File Contents'),
              '#description' => t("(This text is required to be here for better search results)"),
              '#collapsible' => TRUE,
              '#collapsed' => TRUE
            );
            $f[1][1] = array(
              '#value' => "<p>". htmlentities($contents) ."</p>",
            );
            $node->content['filestuff']['#value'] = drupal_render($f);
          }
        }
        break;

    }
  }
}

3. Uncomment the line //dpm($node); so that it reads dpm($node);
4. View a node with a file attachment. The dpm() function will give you a nice display of the $node object in your web browser.
5. Look through the $node object in your browser and locate the array or object that contains your file information.
6. Change $node->field_file in the code above to the correct file variable you found in step 5.
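Steps 3 to 6 amount to scanning the $node object for something that holds a 'fid' and a 'filepath'. A hypothetical helper that does the scan for you is sketched below; its result could be passed to dpm() instead of the whole node. The function name and the example node are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: names of the node properties that look like file
 * lists, i.e. arrays containing at least one item (array or object) with
 * both a 'fid' and a 'filepath'.
 */
function example_find_file_properties($node) {
  $found = array();
  foreach (get_object_vars($node) as $name => $value) {
    if (!is_array($value)) {
      continue;
    }
    foreach ($value as $item) {
      $item = (array) $item;
      if (isset($item['fid'], $item['filepath'])) {
        $found[] = $name;
        break;  // One file-like item is enough to report this property.
      }
    }
  }
  return $found;
}

// A made-up node shaped like one with an Upload module attachment.
$node = new stdClass();
$node->title = 'Report';
$node->files = array(3125 => (object) array('fid' => 3125, 'filepath' => 'sites/default/files/report.pdf'));
$node->field_tags = array(array('value' => 'pdf'));
echo implode(', ', example_find_file_properties($node));  // Prints: files
```

Whatever name it reports (files, field_file, field_uploads, ...) is what goes after $node-> in the snippet.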

Hope you can find it

pvhee

subscribing

leici

Subscribing.

kid_baco

Has anyone managed to get this to work with search_files?

I've tried markDrupal's example, modifying it for search_files, but haven't had any luck. Has anyone managed to get it to work?

I've also been reading about Apache Solr, which looks a little complicated to set up (especially if you're trying to search files) and isn't cheap once you start running it on multiple servers. They list a site that features indexing of attached documents, which seems to work like what I'm looking for...

http://drupal.org/node/447564 (see the Institute for the Study of War example)

Thanks

markDrupal

Are you using filefield CCK or the Upload module for attaching files?

You need to identify which variable in the $node object contains the FILE object.

If you need help, you can try downloading the Devel module: http://drupal.org/project/devel
Enable it,
uncomment the line //dpm($node);
and view a node with a file attached.
You will get a nicely formatted display of the node object. From there you can locate the FILE object; look for something with a 'fid' and 'filepath' defined.
Once you locate the FILE object, replace

$node->field_file

with the FILE object you found.
It looks like 4 replacements are needed.
Yours may be

$node->attachments

or

$node->field_uploads

If you need more support, try to get a screen grab of the output of the $node object (by using dpm($node)) and post it to this issue.

kid_baco

Version: 5.x-1.2 » 6.x-2.0-beta1

First off, thanks markDrupal for your last post. That helped me get things running.

Now I've got a new question.

I have a couple of larger PDFs that are involved in a search. When I search for words from them under the "attachment" tab, it takes about 2 seconds to return the files in question, but when I search the same term under the "Content" tab, it takes 167 seconds.

I found that by commenting out the following line...

search_index($file->fid, 'attachment', $contents);

...in the _search_attachments_index_file function (which is called in the search_attachments_nodeapi example), the 167 second load time was brought down to 2.8 seconds.

Looking at the search_index function in search.module, it seems to be re-indexing the results that it has already retrieved from search_dataset, search_index, etc. I'm just wondering whether there will be consequences if I bypass this function while retrieving my search results, or whether it has a purpose I'm not seeing.

Thanks

markDrupal

Nice catch. Yes, with my code, every time the node is viewed it is also reindexed. When you do a content search, Drupal renders each node and tries to find the relevant part of its content so it can show a short excerpt of the node on the search results page. So every time your huge PDF file shows up in the results, it is also reindexed before you get the search results.
I looked at the _search_attachments_index_file() function, and it looks like we can easily change things so the file isn't reindexed on every node view.

I found this bit of code in the _search_attachments_index_file() function that we can use to speed things up

$contents = _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath);
  
/**
* Implementation of hook_nodeapi()
*/
function search_attachments_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
//You can limit this functionality to only certain node types by defining them here, or change it to TRUE to affect all node types
  if($node->type == 'link'){
    global $base_path;
    switch ($op) {
      case 'update index':
        $contents = '';
        if($node->field_file){
          foreach($node->field_file as $file){
            $contents .= "<p>". _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file['filepath']) ."</p>";
          }
        }
        return check_markup($contents);
      break;

      case 'view':
        //dpm($node);
        $contents = '';
        if($node->field_file){
          foreach($node->field_file as $file){
            $contents .= _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file['filepath']);
          }
          if($contents){
            $f[1] = array(
              '#type' => 'fieldset',
              '#title' => t('File Contents'),
              '#description' => t("(This text is required to be here for better search results)"),
              '#collapsible' => TRUE,
              '#collapsed' => TRUE
            );
            $f[1][1] = array(
              '#value' => "<p>". check_markup($contents) ."</p>",
            );
            $node->content['filestuff']['#value'] = drupal_render($f);
          }
        }
        break;

    }
  }
}
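Another way to keep a large PDF from being re-extracted every time its node is rendered is to cache the extracted text, keyed by path and modification time so an edited file is picked up again. A minimal sketch, assuming an extractor callback; `example_cached_file_contents()` is a hypothetical name. A static cache like this only helps within a single request; across requests the same keying idea could be used with Drupal's cache_get()/cache_set().

```php
<?php
/**
 * Hypothetical helper: returns the extracted text of a file, running the
 * (slow) extractor only once per file and modification time within a
 * request. $extract is an assumed callback taking a path and returning text.
 */
function example_cached_file_contents($path, $extract) {
  static $cache = array();
  $mtime = @filemtime($path);  // FALSE for a missing file; cast to 0 below.
  $key = $path . ':' . (int) $mtime;
  if (!array_key_exists($key, $cache)) {
    $cache[$key] = call_user_func($extract, $path);
  }
  return $cache[$key];
}

// Stand-in extractor that counts how often it really runs.
$GLOBALS['extract_calls'] = 0;
function example_counting_extract($path) {
  $GLOBALS['extract_calls']++;
  return 'text of ' . basename($path);
}

$tmp = tempnam(sys_get_temp_dir(), 'doc');
file_put_contents($tmp, 'dummy');
example_cached_file_contents($tmp, 'example_counting_extract');
example_cached_file_contents($tmp, 'example_counting_extract');
echo $GLOBALS['extract_calls'];  // Prints: 1
unlink($tmp);
```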
kid_baco

Thanks again Mark,

There's another little twist, though. I found with the latest code that, when indexing by running cron, the files weren't indexed properly. For example, those large files I mentioned had 2146 rows in search_index with the appropriate sid when I indexed them with your old process, but only 28 rows with the new code (and those words related to the node, not the file).

The quick fix I put in was simply to check the REQUEST_URI value for "search/node", since that string appears in the URL of the search page. So I use your new method when the node is viewed on the search page, and the old one when the code runs outside a page view.

I'm sure there is probably a better way but for now it's indexing and returning what I'm looking for. I hope to look further into this soon.

Thanks for all your help. Here is my tweaking of your function...

function search_attachments_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
//You can limit this functionality to only certain node types by defining them here, or change it to TRUE to affect all node types
  if($node->type == 'link'){
    global $base_path;
    switch ($op) {
      case 'update index':
        $contents = '';
        if($node->file){
          foreach($node->file as $file){

if (strpos($_SERVER['REQUEST_URI'], 'search/node/') !== FALSE){
            $contents .= _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath);
}else{
            $contents .= "<p>". _search_attachments_index_file($file) ."</p>";
}
          }
        }
        return check_markup($contents);
      break;

      case 'view':
        //dpm($node);
        $contents = '';
        if($node->files){
          foreach($node->files as $file){

if (strpos($_SERVER['REQUEST_URI'], 'search/node/') !== FALSE){
            $contents .= _search_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath);
}else{
            $contents .= _search_attachments_index_file($file);
}
          }
          if($contents){
            $f[1] = array(
              '#type' => 'fieldset',
              '#title' => t('File Contents'),
              '#description' => t("(This text is required to be here for better search results)"),
              '#collapsible' => TRUE,
              '#collapsed' => TRUE
            );
            $f[1][1] = array(
              '#value' => "<p>". check_markup($contents) ."</p>",
            );
            $node->content['filestuff']['#value'] = drupal_render($f);
          }
        }
        break;

    }
  }
}
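The REQUEST_URI test can be wrapped in a small helper. The sketch below looks for the search/node/ segment anywhere in the path, so a Drupal site installed in a subdirectory (/drupal/search/node/term) is also covered, which a strpos() result compared against a fixed position would miss; it also accepts non-clean ?q= URLs. The helper name is hypothetical, and inside Drupal checking arg(0) and arg(1) may be the simpler route.

```php
<?php
/**
 * Hypothetical helper: TRUE if the request is for the node search page.
 * The segment may appear anywhere in the path or in a ?q= query string.
 */
function example_is_node_search_request($request_uri) {
  $path = (string) parse_url($request_uri, PHP_URL_PATH);
  $query = urldecode((string) parse_url($request_uri, PHP_URL_QUERY));
  return strpos($path, 'search/node/') !== FALSE
    || strpos($query, 'q=search/node/') !== FALSE;
}

var_dump(example_is_node_search_request('/drupal/search/node/report'));  // bool(true)
var_dump(example_is_node_search_request('/cron.php'));                   // bool(false)
```

In the hook, the condition would then read `if (example_is_node_search_request($_SERVER['REQUEST_URI'])) { ... }`.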
thl

Has anyone come up with an idea to fulfill the original request and make "search" look into "files in attachments" and "files in directories" automatically, without requiring the user to trigger three searches?

egfrith

Reading about the search interface at http://api.drupal.org/api/group/search/6, it seems that code quite similar to this should do the trick for indexing attachments and file fields, except that it should implement nodeapi('search result') rather than nodeapi('search view'). I think that, as suggested at #363860: incorporate Search Files in Drupal default Search box, this code should be in a separate module, though using search_files.module for the helper functions.

This would not solve the problem of finding files which are not linked to nodes either as a field or an attachment. This isn't a problem for me, as all my files are linked to nodes, but it wouldn't fulfill the description of the bug.

@maintainers: What do you think of this suggestion?

livingegg

+1 Subscribing

egfrith

I think my last suggestion doesn't quite fix the problem. It does do the searching, but when the search results are viewed, the link is to the node that the file is associated with, not the file itself.

To address this, I've made a start on a patch that searches through both the node and seach_attachements_att indices simultaneously. This is done by creating a new version of do_search() called search_files_attachments_do_search(). It is almost identical to the core function, except that it can take an array of $types rather than just one $type. There is then code in search_files_attachments_search() (copied from node.module) to display the node if the result is a node rather than a file.

I've deleted what appeared to be a redundant invocation of do_search() from the code.

If you think this is a worthwhile approach, I can clean up the patch by providing docs for search_files_attachments_do_search().

At present this code gets confused by files which are stored by means other than upload module - but I think this is to do with the query which has been commented out in the current dev version, and which I've deleted.
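However the searches are run, showing nodes and files on one results page comes down to merging the result lists into a single ranking. A framework-free sketch of that step, assuming each result is an array with a numeric 'score' (an assumption about the result format; adjust it to whatever your search hooks actually return). The function names are hypothetical.

```php
<?php
/**
 * Hypothetical helper: merges result lists from several search types into
 * one list, highest score first. A NULL or non-array list (for example a
 * module that returned nothing) is skipped rather than passed to
 * array_merge().
 */
function example_merge_search_results(array $lists) {
  $merged = array();
  foreach ($lists as $list) {
    if (is_array($list)) {
      $merged = array_merge($merged, array_values($list));
    }
  }
  usort($merged, 'example_compare_by_score');
  return $merged;
}

/**
 * Sort callback: higher 'score' first; a missing score counts as 0.
 */
function example_compare_by_score($a, $b) {
  $sa = isset($a['score']) ? $a['score'] : 0;
  $sb = isset($b['score']) ? $b['score'] : 0;
  if ($sa == $sb) {
    return 0;
  }
  return $sa > $sb ? -1 : 1;
}

$nodes = array(array('title' => 'Node A', 'score' => 0.4), array('title' => 'Node B', 'score' => 0.9));
$files = array(array('title' => 'report.pdf', 'score' => 0.7));
foreach (example_merge_search_results(array($nodes, $files, NULL)) as $result) {
  echo $result['title'], "\n";  // Node B, report.pdf, Node A
}
```

Scores from different index types are only comparable if they are computed on a similar scale, which is worth checking before trusting the combined order.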

egfrith

Status: Active » Needs review
egfrith

Category: support » feature
dachande

I've created a little patch for search.module which will integrate the files search into the default search form. I've tested this with search_files-6.x-2.0-beta4 and it works quite well.

Index: search.module
===================================================================
--- search.module	(revision 15)
+++ search.module	(working copy)
@@ -1147,6 +1147,13 @@
   if (isset($keys)) {
     if (module_hook($type, 'search')) {
       $results = module_invoke($type, 'search', 'search', $keys);
+
+      // Include file results in node search
+      if ($type == 'node') {
+      	$file_results = module_invoke('search_files_attachments', 'search', 'search', $keys);
+      	$results = array_merge($results, $file_results);
+      }
+
       if (isset($results) && is_array($results) && count($results)) {
         if (module_hook($type, 'search_page')) {
           return module_invoke($type, 'search_page', $results);
Philo72

If you're not using the search_files_attachments module, then change the following

$file_results = module_invoke('search_files_attachments', 'search', 'search', $keys)

to

$file_results = module_invoke('search_files_directories', 'search', 'search', $keys)

I'm guessing that if you want to combine all three, you do this (I haven't tested it, as I don't use the attachments part):

      // Include file_attachments and file_directories results in node search
      if ($type == 'node') {
        foreach (array('search_files_attachments', 'search_files_directories') as $module) {
          $file_results = module_invoke($module, 'search', 'search', $keys);
          // module_invoke() returns NULL if the module doesn't implement the hook.
          if (is_array($file_results)) {
            $results = array_merge((array) $results, $file_results);
          }
        }
      }

Phil

Dane Powell

While I can confirm that the patch in #21 works, I don't think hacking core is the proper way to go about this (though it might be an okay stopgap solution for some people). I'd prefer to see a solution as in #18. However, there's something wonky with that patch file, I can't get it to apply. Also, from what I can tell it overreaches a bit, cleaning up file names and output and doing other things that I don't think are related to this issue (though they are certainly things that need to be worked on).

gianluca.b

How do you manage the pagination?

This way, every module_invoke() call has its own pagination, and they conflict with each other.

punchmonkey

I'd be very interested in seeing some way to combine the search results. A site I'm currently working on will have a large mix of regular node content and PDF attachments added through either Upload core or FileField.

gauravkhambhala

How about pagination? Any updates to get it right?

buckley

+1 for combining the regular search results page with the file results

I see no reason for splitting them up, and it's quite a (major) usability problem.

makangus

All the solutions above have problems with pagination: the last invoked module always takes over the pagination, and each page always displays 20 items instead of 10.
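One way around the competing pagers is to gather the complete result lists first, merge them, and page the combined list yourself instead of letting each module page its own query. A minimal sketch of the paging step, assuming the complete lists can be obtained; the helper name is hypothetical, 10 items per page matches the expected listing size above, and the zero-based page number would come from the `page` query parameter that Drupal's pager reads.

```php
<?php
/**
 * Hypothetical helper: one page of an already merged result list, plus the
 * page count a pager needs. $page is zero-based; out-of-range values are
 * clamped to the first or last page.
 */
function example_page_results(array $results, $page, $per_page = 10) {
  $total = count($results);
  $pages = max(1, (int) ceil($total / $per_page));
  $page = min(max(0, (int) $page), $pages - 1);
  return array(
    'items' => array_slice($results, $page * $per_page, $per_page),
    'page' => $page,
    'pages' => $pages,
  );
}

$all = range(1, 25);  // Stand-in for 25 merged node and file results.
$second = example_page_results($all, 1);
echo implode(',', $second['items']);  // Prints: 11,12,13,14,15,16,17,18,19,20
```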

mstrelan

I have come up with a solution based on #3 and #13. My version does not require the 'view' operation of nodeapi to show the file attachment content. It is based on 6.x-2.0-beta4 and using the standard attachments rather than filefield, but can be modified to use either. Perhaps this should have a config option.

The best part about my method is that the snippet shows that the text is from the attachment, as well as possibly from the node content.

EDIT: Updated code below to filter out irrelevant file attachments from search result.

<?php
/**
 * Implementation of hook_nodeapi()
 */
function search_files_attachments_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  global $base_path;
  if (TRUE) {
    switch ($op) {
      case 'update index':
        $contents = '';
        if ($node->files){
          foreach($node->files as $file){
            $contents .= "<p>". search_files_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath) ."</p>";
          }
        }
        return check_markup($contents);
      break;

      case 'search result':
        if ($node->files) {
          $info = array();
          $keys = str_replace('search/node/', '', $_REQUEST['q']);
          foreach ($node->files as $file){
            $contents = search_files_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath);
            if ($contents) {
              $snippet = search_excerpt($keys, $contents);
              $snippet_plain = strip_tags($snippet);
              $relevance = 0;
              $tmp = array();
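              // preg_quote() keeps search keys like "c++" from breaking the pattern.
              $tmp_relevance = preg_match_all('/' . preg_quote($keys, '/') . '/i', $snippet_plain, $tmp);
              $relevance += $tmp_relevance;
              foreach (explode(' ', $keys) as $key) {
                $tmp_relevance = preg_match_all('/' . preg_quote($key, '/') . '/i', $snippet_plain, $tmp);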
                $relevance += $tmp_relevance;
              }
              if ($relevance) {
                $info['attachments'][] = array(
                  'filename' => $file->filename,
                  'filepath' => $file->filepath,
                  'content' => $snippet,
                  'relevance' => $relevance,
                );
              }
            }
          }
          if (!empty($info['attachments'])) {
            usort($info['attachments'], 'search_files_attachments_relevance_sort');
          }
          return $info;
        }
        break;

    }
  }
}

function search_files_attachments_preprocess_search_result(&$variables) {
  if (isset($variables['info_split']['attachments'])) {
    $info_split = &$variables['info_split'];
    $snippet = &$variables['snippet'];
    $attachments = $info_split['attachments'];
    foreach ($attachments as $attachment) {
      $snippet .= '<p class="search-result-attachment"><strong>Attachment: <em>'. l($attachment['filename'], $attachment['filepath']) .'</em></strong> '. $attachment['content'] . '</p>';
    }
    unset($info_split['attachments']);
    $variables['info'] = implode(' - ', $info_split);
  }
}

function search_files_attachments_relevance_sort($a, $b) {
  if ($a['relevance'] == $b['relevance']) {
    return $a['filename'] < $b['filename'] ? -1 : 1;
  }
  return $a['relevance'] > $b['relevance'] ? -1 : 1;
}
?>
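The relevance count in the code above builds its regular expressions from the search keys. A standalone sketch of the same count, with each key passed through preg_quote() so that terms containing regex metacharacters such as `(`, `+` or `/` cannot break the pattern; the function name is hypothetical.

```php
<?php
/**
 * Sketch of the relevance count: matches of the whole query in the
 * plain-text snippet plus matches of each word. Empty queries and empty
 * words (from repeated spaces) are skipped, since an empty pattern would
 * match at every position.
 */
function example_snippet_relevance($keys, $snippet_plain) {
  if (trim($keys) === '') {
    return 0;
  }
  $relevance = preg_match_all('/' . preg_quote($keys, '/') . '/i', $snippet_plain, $unused);
  foreach (array_filter(explode(' ', $keys), 'strlen') as $key) {
    $relevance += preg_match_all('/' . preg_quote($key, '/') . '/i', $snippet_plain, $unused);
  }
  return $relevance;
}

echo example_snippet_relevance('c++', 'I like C++ and c++');  // Prints: 4
```

Counting the whole query and then each word means a single-word query is counted twice, mirroring the original scoring; that only affects ordering, not which attachments are kept.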
jay_N

Subscribing

boabjohn

@mstrelan: Thanks for a combined approach...happy to test it out but am not a code man. Can I just clarify a couple of points:

1. This module is search_files, not search_attachments (it apparently got combined at some point). In the original instructions by markDrupal @#7, he says to tack on the code at the end of the search_attachments.module

Can I confirm we are talking about the search_files.module?

2. Being very wary of code: I notice that the search_files.module opens with a <?php but does not close it.....the last line being simply a closing '}'

Am I really to copy literally the code snippet above and paste directly after the module's current closing bracket, thus changing the final character of the module to ?>?

Thanks...

Dane Powell

You are right to be wary; files should not be closed by ?>
http://drupal.org/coding-standards

boabjohn

Hi Dane...thanks for the tip: and have you given #29 a go? Results?

mstrelan

Hi boabjohn,

Search attachments is a sub module of Search files, it indexes files that are attached to nodes, rather than indexing files in the files directory. I believe the functions above should be search_attachments_... rather than search_files_attachments... so this will need to be updated in all the places it is referenced.

In regards to the closing php tag - PHP files don't require a closing php tag, but most of the time it is ok to have the closing tag. But as Dane mentioned it is against Drupal's coding standards to close it.

Hope this helps.

Michael

boabjohn

Howdy Michael, so sorry to be slow, and thanks for your patience!

I do want to index only files on the server that are attached to nodes...my files have been uploaded/attached via cck filefield.

So: do I still need to replace all instances of search_files_attachments* with search_attachments*?

Thank you for the guidance...hoping that this work might make it toward the module itself so poor nongs like me can make use of it without tormenting the code...

Cheers,

JB

mstrelan

Actually, my original post was correct about the function names. Mine is meant to work with the Upload module, but it can be adapted to CCK fields by changing $node->files to whatever your field is; for example, if your field is called files, it would be $node->field_files.

amin698uk

Component: Miscellaneous » Search Attachments
Assigned: Unassigned » amin698uk
Category: feature » support

Hi,

I'm a new Drupaller and would like the functionality of Search and Search File Attachments to be combined for our KB website.

There are a number of suggestions and patches proposed above; however, they confuse me, as I'm not sure where I am supposed to place these code snippets, i.e. in which module or script?

Any guidance/assistance would be appreciated.

Thanks
Mo

mstrelan

Hi amin698uk,

Usually it is best to wait until a patch is reviewed and tested and rolled in to a module update. You can try using any of the above suggestions but there are no guarantees as to what will happen.

This particular issue relates to the 6.x-2.x branch of search_files, rather than 6.x-1.x, so first make sure you have that version. If you have that version then in your search_files directory there will be a search_files_attachments.module file. You can paste my code directly to the bottom of that.

The Drupal cache will then need to be flushed. This can be done by going to admin/settings/performance and clicking on the clear all caches button.

You should then keep a close eye on the issue to see if a patch is included in a release, otherwise make sure you don't update the module without re-adding the code.

Hope that helps.
Michael

amin698uk

Assigned: amin698uk » Unassigned

Any assistance on this would be appreciated.

curtaindog

http://drupal.org/node/607852#comment-3043360 presents a workaround for #21 that repages results in code to make the pager happy.

lucascaro

Hi all, the code from #29 worked for me after changing both lines that had

$contents = search_files_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath);

to

$contents = search_files_attachments_get_file_contents($file->filepath);

In my case, the extra path prefix was causing search_files_attachments_get_file_contents() to look for the files in the wrong path (when using drush).

cheers.
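Under drush, $_SERVER['SCRIPT_NAME'] and $_SERVER['SCRIPT_FILENAME'] likely describe the drush script rather than Drupal's index.php, which would explain why the str_replace() construction points at the wrong directory. Assuming the working directory is the Drupal root in both contexts, a path relative to it (as in the change above, or './' . $file->filepath) works for web requests and drush alike. A hypothetical helper that makes the root explicit:

```php
<?php
/**
 * Hypothetical helper: absolute path for a file path stored relative to
 * the Drupal root (e.g. sites/default/files/report.pdf). $root defaults to
 * the current working directory, assumed here to be the Drupal root.
 */
function example_absolute_file_path($filepath, $root = NULL) {
  if ($filepath !== '' && $filepath[0] === '/') {
    return $filepath;  // Already absolute.
  }
  if (strpos($filepath, './') === 0) {
    $filepath = substr($filepath, 2);  // Drop a leading "./".
  }
  if ($root === NULL) {
    $root = getcwd();
  }
  return rtrim($root, '/') . '/' . $filepath;
}

echo example_absolute_file_path('sites/default/files/report.pdf', '/var/www/drupal');
// Prints: /var/www/drupal/sites/default/files/report.pdf
```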

awakenedvoice

Subscribing

benone

subscrr

jimboh

I also would like a single search entry point.
I have added the code in #29.
I only get attachments displayed in the search results when the searched-for word also exists in the node containing the attachments.
I'm also not getting the same number of attachments shown under "Content" as appear under the attachments tab (the correct number),
and I got this warning when a single result was found: usort() [function.usort]: The argument should be an array in /home/content/69/6621969/html/drupal/sites/all/modules/search_files/search_files_attachments.module on line 322.

How would this solution be expected to work? Would you disable the attachments tab?
As an alternative, since I only really need search on the attachments, is it possible to disable the Content/Users tabs?
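The usort() warning is most likely raised when none of a node's attachments passes the relevance check, so $info['attachments'] is never created and usort() receives a missing value. A hypothetical guarded version of the sort (highest relevance first, ties broken by filename):

```php
<?php
/**
 * Hypothetical helper: sorts attachment matches by relevance (descending),
 * then by filename. Returns an empty array when nothing matched, instead
 * of handing usort() a missing value.
 */
function example_sort_attachments($attachments) {
  if (empty($attachments) || !is_array($attachments)) {
    return array();
  }
  usort($attachments, 'example_compare_attachments');
  return $attachments;
}

function example_compare_attachments($a, $b) {
  if ($a['relevance'] != $b['relevance']) {
    return $a['relevance'] > $b['relevance'] ? -1 : 1;
  }
  return strcmp($a['filename'], $b['filename']);
}

$list = array(
  array('filename' => 'b.pdf', 'relevance' => 2),
  array('filename' => 'a.pdf', 'relevance' => 2),
  array('filename' => 'c.pdf', 'relevance' => 5),
);
foreach (example_sort_attachments($list) as $attachment) {
  echo $attachment['filename'], "\n";  // c.pdf, a.pdf, b.pdf
}
```

In the hook itself, wrapping the usort() call in `if (!empty($info['attachments']))` achieves the same guard.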

mstrelan

@jimboh - my method is designed to co-exist with the files search, so users could search for content and files together (in the case where they don't know whether the result should be a file or node content), or they can search specifically for a file (i.e. they know they want a file attachment). It sounds to me like you don't need my modifications; you could probably just do some form alters and redirect to search/files/SEARCH_TERMS.

vsalvans

@mstrelan Thanks!!! it's what I need

I'd like to share my modifications to mstrelan's code for CCK:
Change $node->files to $node->field_your_cck_field_name (e.g. $node->field_curs_pdf)
Change $file->filepath to $file['filepath']
Change return check_markup($contents) to return $contents; (maybe the best way is just to strip all HTML tags first, I don't know)

Then, in place of the "if ($relevance)" block, I put this code:

              if ($relevance) {
                $info['attachments'][] = array(
                  'filename' => $file['filename'],
                  'filepath' => $file['filepath'],
                  'content' => $snippet,
                  'relevance' => $relevance,
                );
              } else {
                $info['attachments'][] = array(
                  'filename' => $file['filename'],
                  'filepath' => $file['filepath'],
                  'content' => '..'.$keys.'..',
                  'relevance' => 1,
                );
              }

The else branch is there because the search_excerpt() function didn't return a valid snippet (PDFs can have many weird issues).

Don't forget to clear the cache to make the search results list appear properly.

Finally, disable the attachments search tab if you use CCK like me.
I like to use the directories search tab, as it gives the user a more specific search, but you can disable it as well.

Thank all for your comments on this issue.
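The else branch above labels every non-matching file with a placeholder built from the keys. A somewhat richer fallback, sketched here as a hypothetical helper, searches the tag-stripped text for the first key (case-insensitively) and returns a short window around it, falling back to the same '..keys..' placeholder only when no key occurs. It works on bytes, so multibyte text may be cut mid-character.

```php
<?php
/**
 * Hypothetical fallback excerpt: strips tags, collapses whitespace, then
 * returns up to $radius bytes either side of the first search word found
 * (case-insensitive). Uses the '..keys..' placeholder if no word occurs.
 */
function example_fallback_excerpt($keys, $text, $radius = 60) {
  $plain = trim(preg_replace('/\s+/', ' ', strip_tags($text)));
  foreach (array_filter(explode(' ', $keys), 'strlen') as $key) {
    $pos = stripos($plain, $key);
    if ($pos !== FALSE) {
      $start = max(0, $pos - $radius);
      $end = min(strlen($plain), $pos + strlen($key) + $radius);
      return ($start > 0 ? '...' : '')
        . substr($plain, $start, $end - $start)
        . ($end < strlen($plain) ? '...' : '');
    }
  }
  return '..' . $keys . '..';
}

echo example_fallback_excerpt('budget', '<p>The   annual Budget report</p>', 5);
// Prints: ...nual Budget repo...
```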

candelas

subscribing

Alan D.

Here is a fully functional example based on mstrelan's code above, for the field field_attachments.

This forces the files to use the private file system and triggers a download attachment in the process.

glodigital.info

name = GloDigital
description = Contains many common code fragments and theming overrides.
package = Other
core = 6.x

glodigital.module


/**
 * Implementation of hook_nodeapi().
 */
function glodigital_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  switch ($op) {
    case 'update index':
      return glodigital_search_files_update_index($node);

    case 'search result':
      return glodigital_search_files_search_result($node);
  }
}

/**
 * Callback of hook_nodeapi() operation 'update index'.
 */
function glodigital_search_files_update_index(&$node) {
  if (!empty($node->field_attachments) && function_exists('search_files_attachments_get_file_contents')) {
    $contents = '';
    foreach($node->field_attachments as $file){
      $file = (object) $file;
      $contents .= "<p>". search_files_attachments_get_file_contents('./' . $file->filepath) ."</p>";
    }
    return check_markup($contents);
  }
}

/**
 * Callback of hook_nodeapi() operation 'search result'.
 */
function glodigital_search_files_search_result(&$node) {
  if (!empty($node->field_attachments) && function_exists('search_files_attachments_get_file_contents')) {
    $info = array();
    $keys = str_replace('search/node/', '', $_REQUEST['q']);

    foreach ($node->field_attachments as $file){
      $file = (object) $file;
      $contents = search_files_attachments_get_file_contents('./' . $file->filepath);
      if ($contents) {
        $snippet = search_excerpt($keys, $contents);
        $snippet_plain = strip_tags($snippet);
        $relevance = 0;
        $tmp = array();
        $tmp_relevance = preg_match_all('/' . preg_quote($keys, '/') . '/i', $snippet_plain, $tmp);
        $relevance += $tmp_relevance;
        foreach (explode(' ', $keys) as $key) {
          $tmp_relevance = preg_match_all('/' . preg_quote($key, '/') . '/i', $snippet_plain, $tmp);
          $relevance += $tmp_relevance;
        }
        if ($relevance) {
          $file = (array) $file;
          $file['content'] = $snippet;
          $file['relevance'] = $relevance;
          $info['attachments'][] = $file;
        }
      }
    }
    if (!empty($info['attachments'])) {
      usort($info['attachments'], 'search_files_attachments_relevance_sort');
    }
    return $info;
  }
}

/**
 * Implements hook_preprocess_search_result().
 */
function glodigital_preprocess_search_result(&$variables) {
  if (!empty($variables['info_split']['attachments'])) {
    $info_split = &$variables['info_split'];
    $snippet = &$variables['snippet'];
    $attachments = $info_split['attachments'];
    foreach ($attachments as $attachment) {
      $file = (object) $attachment;
      $path = $file->filepath;
      if (strpos($path, file_directory_path() . '/') === 0) {
        $path = trim(substr($path, strlen(file_directory_path())), '\\/');
      }
      $href =  url('system/files/'. $path, array('absolute' => TRUE));
      $options = array(
        'attributes' => array(
          'type' => $file->filemime . '; length=' . $file->filesize,
        ),
        'query' => 'download=1',
      );
      // Use the description as the link text if available.
      if (empty($file->data['description'])) {
        $link_text = $file->filename;
      }
      else {
        $link_text = $file->data['description'];
        $options['attributes']['title'] = $file->filename;
      }
      $snippet .= '<p class="search-result-attachment"><strong><em>'
        . theme('filefield_icon', $file) // The filefield icon
        . l($link_text, $href, $options) // Link to the file
        . ' (' . format_size($file->filesize) . ')' // File size
        . '</em></strong> '. $attachment['content'] . '</p>';
    }
    unset($info_split['attachments']);
    $variables['info'] = implode(' - ', $info_split);
  }
}
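The prefix stripping in the preprocess function above turns a stored filepath into the part that follows `system/files/` in a private download URL. A minimal Drupal-free sketch, assuming `$files_dir` stands in for `file_directory_path()` (the function name is hypothetical):

```php
<?php
/**
 * Hypothetical helper: converts a stored filepath such as
 * 'sites/default/files/docs/a.pdf' into a 'system/files/...' path.
 * $files_dir stands in for Drupal's file_directory_path().
 */
function example_private_download_path($filepath, $files_dir) {
  if (strpos($filepath, $files_dir . '/') === 0) {
    // Drop the files directory and any leading slash or backslash.
    $filepath = trim(substr($filepath, strlen($files_dir)), '\\/');
  }
  return 'system/files/' . $filepath;
}

echo example_private_download_path('sites/default/files/docs/a.pdf', 'sites/default/files');
// system/files/docs/a.pdf
```

Paths outside the files directory are passed through unchanged, just as in the preprocess code.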

/**
 * usort() callback: sorts attachments by descending relevance, then by filename.
 */
function search_files_attachments_relevance_sort($a, $b) {
  if ($a['relevance'] == $b['relevance']) {
    return strcmp($a['filename'], $b['filename']);
  }
  }
  return $a['relevance'] > $b['relevance'] ? -1 : 1;
}
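To see the ordering the comparator produces, here is a standalone copy under a hypothetical name, applied with `usort()` to a few made-up attachment arrays. Higher relevance comes first, and ties fall back to an alphabetical `strcmp()` on the filename, which also returns 0 for identical names.

```php
<?php
// Hypothetical copy of the comparator: relevance descending, then filename.
function example_relevance_sort($a, $b) {
  if ($a['relevance'] == $b['relevance']) {
    return strcmp($a['filename'], $b['filename']);
  }
  return $a['relevance'] > $b['relevance'] ? -1 : 1;
}

$attachments = array(
  array('filename' => 'b.pdf', 'relevance' => 2),
  array('filename' => 'c.pdf', 'relevance' => 5),
  array('filename' => 'a.pdf', 'relevance' => 2),
);
usort($attachments, 'example_relevance_sort');
echo implode(', ', array_map(function ($f) { return $f['filename']; }, $attachments));
// c.pdf, a.pdf, b.pdf
```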

/**
 * Implements hook_file_download().
 *
 * Force downloads on files attached to certain content types.
 *
 * This only works when the link has been prefixed with 'system/files/' and the
 * query parameter 'download' is present.
 */
function glodigital_file_download($filepath) {
  if (!isset($_GET['download'])) {
    return NULL;
  }
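  // hook_file_download() receives the path relative to the files directory,
  // while {files}.filepath stores the full path, so build that first.
  $filepath = file_create_path($filepath);
  if ($file = db_fetch_object(db_query("SELECT * FROM {files} WHERE filepath = '%s'", $filepath))) {
    return array(
      'Content-Type: ' . $file->filemime,
      'Content-Disposition: attachment; filename="' . basename($filepath) . '"',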
      'Content-Length: ' . sprintf('%u', filesize($filepath)),
    );
  }
}
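The header list returned by the hook above is what forces the browser to download rather than display the file. A minimal sketch of just that list, assuming the MIME type and size are passed in directly instead of being read from the {files} table (the function name is hypothetical):

```php
<?php
/**
 * Hypothetical helper mirroring the headers returned by the
 * hook_file_download() implementation above.
 */
function example_download_headers($filepath, $mime, $size) {
  return array(
    'Content-Type: ' . $mime,
    // "attachment" asks the browser to save the file instead of opening it.
    'Content-Disposition: attachment; filename="' . basename($filepath) . '"',
    'Content-Length: ' . sprintf('%u', $size),
  );
}

print_r(example_download_headers('sites/default/files/docs/report.pdf', 'application/pdf', 48213));
```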
Alan D.

Version: 6.x-2.0-beta1 » 6.x-2.x-dev
Category: support » feature

Even better, this version handles every FileField field, based on each field's search display settings. Replace the following two functions in the code posted above. It should be generic enough to go into the main module (after renaming the hooks and removing the function_exists() checks).

/**
 * Callback of hook_nodeapi() operation 'update index'.
 */
function glodigital_search_files_update_index(&$node) {
  // Dependency on CCK & Search Files modules.
  if (!defined('NODE_BUILD_SEARCH_INDEX') || !function_exists('content_types') || !function_exists('search_files_attachments_get_file_contents')) {
    return;
  }

  // Gather type information.
  $type = content_types($node->type);
  $contents = '';

  // Loop through finding file fields.
  foreach ($type['fields'] as $field_name => $field) {
    if ($field['type'] == 'filefield') {
      // Only index the field if it is not excluded or hidden from the search index.
      $search_settings = isset($field['display_settings'][NODE_BUILD_SEARCH_INDEX]) ? $field['display_settings'][NODE_BUILD_SEARCH_INDEX] : array();
      if (empty($search_settings['exclude']) && (!isset($search_settings['format']) || $search_settings['format'] != 'hidden')) {
        if (!empty($node->{$field_name})) {
          foreach ($node->{$field_name} as $file) {
            $file = (object) $file;
            $contents .= '<p>' . search_files_attachments_get_file_contents('./' . $file->filepath) . '</p>';
          }
        }
      }
    }
  }

  // Return the text of every indexed file field, not just the first one.
  return $contents ? check_markup($contents) : '';
}

/**
 * Callback of hook_nodeapi() operation 'search result'.
 */
function glodigital_search_files_search_result(&$node) {
  // Dependency on CCK & Search Files modules.
  if (!defined('NODE_BUILD_SEARCH_RESULT') || !function_exists('content_types') || !function_exists('search_files_attachments_get_file_contents')) {
    return;
  }

  // Gather type information.
  $type = content_types($node->type);
  $info = array();
  $keys = str_replace('search/node/', '', $_REQUEST['q']);

  // Loop through finding file fields.
  foreach ($type['fields'] as $field_name => $field) {
    if ($field['type'] == 'filefield') {
      // Only search the field if it is not excluded or hidden from the search results.
      $search_settings = isset($field['display_settings'][NODE_BUILD_SEARCH_RESULT]) ? $field['display_settings'][NODE_BUILD_SEARCH_RESULT] : array();
      if (empty($search_settings['exclude']) && (!isset($search_settings['format']) || $search_settings['format'] != 'hidden')) {
        if (!empty($node->{$field_name})) {
          foreach ($node->{$field_name} as $file) {
            $file = (object) $file;
            $contents = search_files_attachments_get_file_contents('./' . $file->filepath);
            if ($contents) {
              $snippet = search_excerpt($keys, $contents);
              $snippet_plain = strip_tags($snippet);
              $tmp = array();
              // Escape the keywords so characters such as '.' or '/' match literally.
              $relevance = preg_match_all('/' . preg_quote($keys, '/') . '/i', $snippet_plain, $tmp);
              foreach (preg_split('/\s+/', $keys, -1, PREG_SPLIT_NO_EMPTY) as $key) {
                $relevance += preg_match_all('/' . preg_quote($key, '/') . '/i', $snippet_plain, $tmp);
              }
              if ($relevance) {
                $file = (array) $file;
                $file['content'] = $snippet;
                $file['relevance'] = $relevance;
                $info['attachments'][] = $file;
              }
            }
          }
        }
      }
    }
  }

  // Sort the matches from every file field together, then return them once.
  if (!empty($info['attachments'])) {
    usort($info['attachments'], 'search_files_attachments_relevance_sort');
  }
  return $info;
}
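Both functions above decide which fields to read with the same filter: the field must be a filefield, and its search display must be neither excluded nor hidden. A minimal Drupal-free sketch of that filter, assuming a plain array shaped like the 'fields' element of `content_types()` and a stand-in constant in place of `NODE_BUILD_SEARCH_INDEX` (both the constant and the function name are hypothetical). It collects every matching field:

```php
<?php
// Stand-in for Drupal 6's NODE_BUILD_SEARCH_INDEX build-mode key.
define('EXAMPLE_SEARCH_INDEX', 'search_index');

/**
 * Hypothetical sketch: returns the names of all filefields whose
 * search-index display is neither excluded nor hidden.
 */
function example_indexable_file_fields(array $fields) {
  $names = array();
  foreach ($fields as $field_name => $field) {
    if ($field['type'] != 'filefield') {
      continue;
    }
    $settings = isset($field['display_settings'][EXAMPLE_SEARCH_INDEX])
      ? $field['display_settings'][EXAMPLE_SEARCH_INDEX] : array();
    $format = isset($settings['format']) ? $settings['format'] : '';
    if (empty($settings['exclude']) && $format != 'hidden') {
      $names[] = $field_name;
    }
  }
  return $names;
}

$fields = array(
  'field_attachments' => array('type' => 'filefield', 'display_settings' => array(EXAMPLE_SEARCH_INDEX => array('format' => 'default'))),
  'field_images' => array('type' => 'filefield', 'display_settings' => array(EXAMPLE_SEARCH_INDEX => array('format' => 'hidden'))),
  'field_summary' => array('type' => 'text', 'display_settings' => array()),
  'field_private' => array('type' => 'filefield', 'display_settings' => array(EXAMPLE_SEARCH_INDEX => array('exclude' => 1))),
);
print_r(example_indexable_file_fields($fields));
```

Only `field_attachments` survives: `field_images` is hidden, `field_private` is excluded, and `field_summary` is not a file field.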