We installed the regular Search module on our website without realizing that it didn't search documents. Then we installed Search Files. I can see that it added a new search page for documents, but is there any way to combine this functionality with the regular Search module? It seems incredibly non-user-friendly to make our users search twice for every bit of information they want. (Not only searching twice, but doing so from different pages.)
Ideally, there should be one search engine that searches the entire site, including all the files.
Comment | File | Size | Author |
---|---|---|---|
#18 | search_files-268195-18.patch | 6.65 KB | egfrith |
Comments
Comment #1
Rob_Feature commented:
+1 on this. I may be developing this for an upcoming project, but any input or help would be appreciated.
Comment #2
mooreds commented:
Hi folks,
Any movement on this? In the 6.x version, the search is seamless, but the results are separated out.
Comment #3
markDrupal commented:
RE DRUPAL 6.x:
Here is a code snippet you can add to attachment_search.module. It still needs work, and it works with FileField CCK rather than attachments. You have to change
$node->field_file
to whatever it is in the attachment module; sorry, I'm unclear on that bit.
With this snippet, the file contents are added to the node content when Drupal creates the search index, so when you do a content search, the node will show up in the results if the attached file includes the search terms. You could then disable the attachments tab on the search results.
Again, this is a start; hopefully it helps you.
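The snippet referred to in this comment is not preserved in the thread. As a rough illustration only, here is a minimal sketch of the idea, assuming a hypothetical module named "example", a FileField CCK field named field_file, and a stand-in helper example_extract_text() in place of the module's real text extraction. It uses the 'update index' operation of Drupal 6's hook_nodeapi(), whose return value is added to the text indexed for the node; the original snippet may have used a different operation.

```php
<?php
// Hypothetical helper: stands in for the module's real text extraction
// (PDF/Word conversion and so on). Here it only reads plain-text files.
function example_extract_text($filepath) {
  return is_readable($filepath) ? (string) file_get_contents($filepath) : '';
}

// Sketch of a Drupal 6 hook_nodeapi() implementation for a hypothetical
// module named "example". Only the 'update index' operation is handled.
function example_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  if ($op != 'update index') {
    return NULL;
  }
  $text = array();
  // Assumption: a FileField CCK field named field_file, whose items are
  // arrays with a 'filepath' key. Adjust this to match your own $node.
  $items = isset($node->field_file) ? (array) $node->field_file : array();
  foreach ($items as $item) {
    if (!empty($item['filepath'])) {
      $text[] = example_extract_text($item['filepath']);
    }
  }
  // The returned string is appended to the text indexed for this node.
  return implode(' ', $text);
}
```

Because the extra text is only produced while the node is being indexed, nothing is extracted when the node is merely viewed.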
Comment #4
Dinis commented:
Hi Mark,
Does this patch require the patch posted in http://drupal.org/node/409516 allowing FileSearch to use the CCK field?
Kind regards,
Danielle
Comment #5
markDrupal commented:
I believe it would work with the attachments module as well, but you need to change these lines of code
So that other patch is required to get it working with FileField, but not if you are using the attachments module. Either way, you have to change my code a little.
Comment #6
Dinis commented:
Hi Mark,
I'm struggling with this :)
How do I find the relationships between the different tables and apply them to your script?
I'm running a test with an attachment, (nid 6121). In the node table I can find no reference to an attachment, also I can see the file attached to the node in the files table (fid 3125). The only table I can see which seems to pull them together is the "upload" table which contains the nid and the fid.
I'm thinking I need to reference the upload table to link the searches together.
Kind regards,
Danielle
Comment #7
markDrupal commented:
You shouldn't have to find the table in your DB. The way I went about it is to
1. Install the Devel module and enable it.
2. Add the following block of code (at the end) to your "search_attachments.module".
3. Uncomment the line
//dpm($node);
==>dpm($node);
4. View a node with a file attachment. The dpm function will give you a nice display of the $node object in your web browser.
5. Look through the $node object in your web browser and locate the array or object that contains your file information.
6. Change
$node->field_file
in the code above to the correct file variable you found in step 5.
Hope you can find it.
Comment #8
pvhee commented:
subscribing
Comment #9
leici commented:
Subscribing.
Comment #10
kid_baco commented:
Has anyone managed to get this to work with search_files?
I've tried markDrupal's example, modifying it for search_files, but haven't had any luck. Has anyone managed to get it to work?
I've also been reading about Apache Solr, which looks a little complicated to set up (especially if you're trying to search files), and isn't cheap once you start running it on multiple servers. They list a site that features indexing of attached documents, which seems to work like what I'm looking for...
http://drupal.org/node/447564 (see the Institute for the Study of War example)
Thanks
Comment #11
markDrupal commented:
Are you using FileField CCK or the Upload module for attaching files?
You need to identify which variable in the $node object contains the FILE object.
If you need help, you can try downloading the Devel module: http://drupal.org/project/devel
Enable it.
Uncomment the line
//dpm($node);
and view a node with a file attached.
You will get a nicely formatted display of the node object. From there you can locate the FILE object; look for something with a 'fid' and 'filepath' defined.
Once you locate the FILE object, replace
with the FILE object you found.
It looks like 4 replacements are needed.
Yours may be
or
If you need more support, try to get a screen grab of the output of the $node object (by using dpm($node)) and post it to this issue.
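If reading the dpm($node) output by eye is awkward, the same "look for something with a 'fid' and 'filepath'" check can be done in code. The following is a minimal sketch in plain PHP; example_find_file_items() is a hypothetical helper, not part of Search Files or Devel, and it assumes file items are either objects (as with the Upload module) or arrays (as with FileField CCK).

```php
<?php
// Recursively collect anything inside $data that has both 'fid' and
// 'filepath', whether it is an object or an array. The returned array maps
// a readable property path to the file path found there.
function example_find_file_items($data, $path = '$node', $depth = 5) {
  $found = array();
  if ($depth <= 0) {
    // Stop descending so unusually deep structures cannot run away.
    return $found;
  }
  foreach ((array) $data as $key => $value) {
    if (!is_array($value) && !is_object($value)) {
      continue;
    }
    $item = (array) $value;
    $here = is_object($data) ? $path . '->' . $key : $path . "['" . $key . "']";
    if (isset($item['fid']) && isset($item['filepath'])) {
      $found[$here] = $item['filepath'];
    }
    else {
      $found += example_find_file_items($value, $here, $depth - 1);
    }
  }
  return $found;
}
```

The keys of the returned array show which part of the $node object to use in place of $node->field_file.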
Comment #12
kid_baco commented:
First off, thanks markDrupal for your last post. That helped me get things running.
Now I've got a new question.
I have a couple of larger PDFs that are involved in a search. When I search for words from them under the "attachment" tab, it takes about 2 seconds to return the files in question, but when I search for the same term under the "Content" tab, it takes 167 seconds.
I found that by commenting out the following line...
search_index($file->fid, 'attachment', $contents);
...in the _search_attachments_index_file function (which is called in the search_attachments_nodeapi example), the 167 second load time was brought down to 2.8 seconds.
Looking at the search_index function in search.module, it seems to be re-indexing the results that it has already retrieved from search_dataset, search_index, etc. I'm just wondering whether there will be consequences should I attempt to bypass this function during the retrieval of my search results, or whether there is a purpose for this that I'm not seeing.
Thanks
Comment #13
markDrupal commented:
Nice catch. Yes, in my code, every time the node is viewed it is also reindexed. When you do a content search, Drupal renders each node and tries to find the relevant area of content so it can show you a short sample of the node on the search results page. So every time your huge PDF file shows up in the results, it is also reindexed before you get the search results.
I looked at the _search_attachments_index_file() function, and it looks like we can easily change it so it doesn't reindex the file on every node view.
I found this bit of code in the _search_attachments_index_file() function that we can use to speed things up
Comment #14
kid_baco commented:
Thanks again Mark,
There is another little twist, though. I found with the latest code that, when indexing by running cron, the files weren't indexed properly. For example, those large files I spoke about had 2146 rows in search_index with the appropriate sid when I indexed them with your old process, but only 28 rows with the new code (and these words related to the node, not the file).
The quick fix I stuck in for the problem was simply checking the REQUEST_URI value for "search/node", since that string will appear in the URL of the search page. So I use your new method if viewed on a page, but the old one if run outside of the page view.
I'm sure there is probably a better way but for now it's indexing and returning what I'm looking for. I hope to look further into this soon.
Thanks for all your help. Here is my tweaking of your function...
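The tweaked function itself is not preserved in the thread. The REQUEST_URI check it describes could look roughly like the sketch below; example_is_search_page() is a hypothetical name, and the only assumption taken from the comment is that the search page URL contains "search/node".

```php
<?php
// Returns TRUE when the request looks like a core content search page.
// Assumption: the search URL contains "search/node", as described above.
// The value is cast to a string because REQUEST_URI may be missing when
// the code runs outside a normal page request.
function example_is_search_page($request_uri) {
  return strpos((string) $request_uri, 'search/node') !== FALSE;
}

// Usage: pick the fast path on search pages and the full indexing path
// everywhere else (for example during cron).
$uri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
$mode = example_is_search_page($uri) ? 'use stored text' : 'extract and index';
```

Matching on the URL is a quick fix, as the comment says; it will misfire if the search path is aliased or changed.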
Comment #15
thl commented:
Is anyone coming up with an idea to fulfill the original request and make "search" look into "files in attachments" and "files in directories" automatically, without requiring the user to trigger three searches?
Comment #16
egfrith commented:
Reading about the search interface at http://api.drupal.org/api/group/search/6, it seems that code quite similar to this should do the trick for indexing attachments and file fields, except that it should implement nodeapi('search result') rather than nodeapi('search view'). I think that, as suggested at #363860: incorporate Search Files in Drupal default Search box, this code should be in a separate module, though using the search_files.module for the helper functions.
This would not solve the problem of finding files which are not linked to nodes either as a field or an attachment. This isn't a problem for me, as all my files are linked to nodes, but it wouldn't fulfill the description of the bug.
@maintainers: What do you think of this suggestion?
Comment #17
livingegg commented:
+1 Subscribing
Comment #18
egfrith commented:
I think my last suggestion doesn't quite fix the problem. It does do the searching, but when the search results are viewed, the link is to the node that the file is associated with, not the file itself.
To address this, I've made a start on a patch that searches through both the node and seach_attachements_att indices simultaneously. This is done by creating a new version of do_search() called search_files_attachments_do_search(). This is almost identical to the core function, except that it can take an array of $types rather than just one $type. There is then code in search_files_attachments_search() (copied from node.module) to display the node if it is a node rather than a file.
I've deleted what appeared to be a redundant invocation of do_search() from the code.
If you think this is a worthwhile approach, I can clean up the patch by providing docs for search_files_attachments_do_search().
At present this code gets confused by files which are stored by means other than upload module - but I think this is to do with the query which has been commented out in the current dev version, and which I've deleted.
Comment #19
egfrith commented:
Comment #20
egfrith commented:
Comment #21
dachande commented:
I've created a little patch for search.module which will integrate the files search into the default search form. I've tested this with search_files-6.x-2.0-beta4 and it works quite well.
Comment #22
Philo72 commented:
If you're not using the search_files_attachments module, then change the following
$file_results = module_invoke('search_files_attachments', 'search', 'search', $keys)
to
$file_results = module_invoke('search_files_directories', 'search', 'search', $keys)
I'm guessing that if you want to combine all three, then you do this. (I haven't tested it, as I don't use the attachments part.)
Phil
Comment #23
Dane Powell commented:
While I can confirm that the patch in #21 works, I don't think hacking core is the proper way to go about this (though it might be an okay stopgap solution for some people). I'd prefer to see a solution as in #18. However, there's something wonky with that patch file; I can't get it to apply. Also, from what I can tell, it overreaches a bit, cleaning up file names and output and doing other things that I don't think are related to this issue (though they are certainly things that need to be worked on).
Comment #24
gianluca.b commented:
How do you manage the pagination?
This way, every module_invoke will have its own pagination, and they will conflict with each other.
Comment #25
punchmonkey commented:
I'd be very interested in seeing some way to combine the search results. A site I'm currently working on will have a large mix of regular node content and PDF attachments added through either Upload core or FileField.
Comment #26
gauravkhambhala commented:
How about pagination? Any updates to get it right?
Comment #27
buckley commented:
+1 for combining the regular search results page with the file results
I see no reason for splitting them up, and it's quite a (major) usability problem.
Comment #28
makangus commented:
All the solutions above have problems with pagination: the last invoked module always takes over the pagination, and each page always displays 20 items instead of 10.
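One way to picture a fix for the "20 items instead of 10" symptom is to merge the result arrays first and then cut a single page out of the combined list. This is a minimal sketch in plain PHP, not code from any of the patches above; example_page_results() is a hypothetical helper, and it assumes the full (unpaged) result list from each search is available, which is not the case when each search has already paged its own results.

```php
<?php
// Merge several modules' result arrays and return one page of the combined
// list. $page is zero-based, like the "page" query argument Drupal uses.
function example_page_results(array $result_sets, $page = 0, $per_page = 10) {
  $all = array();
  foreach ($result_sets as $results) {
    // Skip anything that is not an array (for example an empty result).
    if (is_array($results)) {
      $all = array_merge($all, array_values($results));
    }
  }
  return array(
    'total' => count($all),
    'items' => array_slice($all, $page * $per_page, $per_page),
  );
}
```

The 'total' value is what a single shared pager would need in order to show the right number of pages.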
Comment #29
mstrelan commented:
I have come up with a solution based on #3 and #13. My version does not require the 'view' operation of nodeapi to show the file attachment content. It is based on 6.x-2.0-beta4 and uses the standard attachments rather than filefield, but can be modified to use either. Perhaps this should have a config option.
The best part about my method is that the snippet shows that the text is from the attachment, as well as possibly from the node content.
EDIT: Updated code below to filter out irrelevant file attachments from search result.
Comment #30
jay_N commented:
Subscribing
Comment #31
boabjohn commented:
@mstrelan: Thanks for a combined approach. I'm happy to test it out but am not a code man. Can I just clarify a couple of points:
1. This module is search_files, not search_attachments (it apparently got combined at some point). In the original instructions by markDrupal at #7, he says to tack on the code at the end of the search_attachments.module
Can I confirm we are talking about the search_files.module?
2. Being very wary of code: I notice that the search_files.module opens with a
<?php
but does not close it; the last line is simply a closing '}'.
Am I really to copy the code snippet above literally and paste it directly after the module's current closing bracket, thus changing the final character of the module to
?>
?
Thanks...
Comment #32
Dane Powell commented:
You are right to be wary; files should not be closed by
?>
http://drupal.org/coding-standards
Comment #33
boabjohn commented:
Hi Dane, thanks for the tip. Have you given #29 a go? Results?
Comment #34
mstrelan commented:
Hi boabjohn,
Search attachments is a sub module of Search files, it indexes files that are attached to nodes, rather than indexing files in the files directory. I believe the functions above should be search_attachments_... rather than search_files_attachments... so this will need to be updated in all the places it is referenced.
In regards to the closing php tag - PHP files don't require a closing php tag, but most of the time it is ok to have the closing tag. But as Dane mentioned it is against Drupal's coding standards to close it.
Hope this helps.
Michael
Comment #35
boabjohn commented:
Howdy Michael, so sorry to be slow, and thanks for your patience!
I do want to index only files on the server that are attached to nodes. My files have been uploaded/attached via CCK filefield.
So: do I still need to replace all instances of search_files_attachments* with search_attachments*?
Thank you for the guidance...hoping that this work might make it toward the module itself so poor nongs like me can make use of it without tormenting the code...
Cheers,
JB
Comment #36
mstrelan commented:
Actually, my original post was correct with the function names. Mine is meant to work with the Upload module, but it can be adapted to CCK fields by changing
$node->files
to whatever your field is. For example, if your field is called files, it would be
$node->field_files
Comment #37
amin698uk commented:
Hi,
I'm a new Drupaller and would like the functionality of Search and Search Files attachments to be combined for our KB website.
There are a number of suggestions and patches proposed above, but this confuses me, as I'm not sure where I am to place these code snippets, i.e. which module or script?
Any guidance/assistance would be appreciated.
Thanks
Mo
Comment #38
mstrelan commented:
Hi amin698uk,
Usually it is best to wait until a patch is reviewed and tested and rolled in to a module update. You can try using any of the above suggestions but there are no guarantees as to what will happen.
This particular issue relates to the 6.x-2.x branch of search_files, rather than 6.x-1.x, so first make sure you have that version. If you have that version then in your search_files directory there will be a search_files_attachments.module file. You can paste my code directly to the bottom of that.
The Drupal cache will then need to be flushed. This can be done by going to admin/settings/performance and clicking on the clear all caches button.
You should then keep a close eye on the issue to see if a patch is included in a release, otherwise make sure you don't update the module without re-adding the code.
Hope that helps.
Michael
Comment #39
amin698uk commented:
Any assistance on this would be appreciated.
Comment #40
curtaindog commented:
http://drupal.org/node/607852#comment-3043360 presents a workaround for #21 that repages results in code to make the pager happy.
Comment #41
lucascaro commented:
Hi all, the code from #29 worked for me after changing both lines that had
$contents = search_files_attachments_get_file_contents(str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . $base_path . $file->filepath);
to
$contents = search_files_attachments_get_file_contents($file->filepath);
In my case the extra path argument was causing search_files_attachments_get_file_contents to look for the files in the wrong path (when using drush).
cheers.
Comment #42
awakenedvoice commented:
Subscribing
Comment #43
benone commented:
subscribing
Comment #44
jimboh commented:
I would also like a single search entry point.
I have added the code in #29.
I only get attachments displayed in the search results when the searched-for word also exists in the node containing the attachments.
I'm also not getting the same number of attachments shown under "Content" as appears if I use the attachments tab (the correct number).
I also got this error when a single result was found:
warning: usort() [function.usort]: The argument should be an array in /home/content/69/6621969/html/drupal/sites/all/modules/search_files/search_files_attachments.module on line 322.
How would this solution be expected to work? Would you disable the attachments tab?
As an alternative, as I only really need search on the attachments, is it possible to disable the Content/Users tabs?
Comment #45
mstrelan commented:
@jimboh - my method is designed to co-exist with the files search, so users can search for content and files together (in the use case where they don't know whether the result should be a file or node content), or they can search specifically for a file (i.e. they know they want a file attachment). It sounds to me like you don't need my modifications; you could probably just do some form alters and redirect to search/files/SEARCH_TERMS
Comment #46
vsalvans commented:
@mstrelan Thanks!!! It's what I need.
I'd like to share my modifications for CCK in mstrelan's code:
Change $node->files to $node->field_your_cck_field_name (e.g. $node->field_curs_pdf)
Change $node->filepath to $file['filepath']
Change return check_markup($contents) to return $contents; // maybe the best way is just to strip all HTML tags first; I don't know.
Then, where "if ($relevance)" is, I put this code
The "search_excerpt" function didn't return a valid snippet (PDFs can have many weird issues).
Don't forget to clear the cache to make the search results list appear properly.
Finally, disable the attachments search tab if you use CCK like me.
I like to use the directories search tab, as it gives the user a more specific search, but you can disable it as well.
Thanks, all, for your comments on this issue.
Comment #47
candelas commented:
subscribing
Comment #48
Alan D. commented:
Here is a fully functional field example based on mstrelan's code above for the field field_attachments.
This forces the files to use the private file system and triggers a download attachment in the process.
glodigital.info
glodigital.module
Comment #49
Alan D. commented:
Even better, this one does all filefield types based on the field search display settings. Replace the following two functions in the code posted above. This would be generic enough to go into the main module (with hook name changes and the function_exists() checks removed).