Here's a patch to make it work with filefield
ericduran - March 21, 2009 - 17:41
| Project: | Search Files |
| Version: | 6.x-2.x-dev |
| Component: | Search Attachments |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs review |
Description
Hi,
Great module. I'm not sure what the stats are for people using the upload module vs. the filefield module in D6, but I think most people use filefield over upload because of its flexibility with CCK.
So here's a patch to make it work with filefield.
It works with whichever module you have enabled, either filefield or upload.
If you have both, it will only work with filefield. I could make it work with both, but I don't think anyone would use both in their setup.
| Attachment | Size |
|---|---|
| patch.txt | 1.97 KB |

#1
patch does not apply
#2
#3
I'm going to try this out. Definitely something I am interested in using.
#4
Hey, it works for me! at least with PDFs!! great job.
I did have to manually change the SQL query to:
if($filefielM){$searchQuery = "SELECT f.* , d.data, u.nid FROM {files} f JOIN {content_type_link} u ON u.field_file_fid = f.fid INNER JOIN {search_dataset} d ON f.fid = d.sid WHERE fid = %d"; }
#5
subscribing
#6
I took a look at the original patch and the update from comment #4, independent of whether the patches apply or not. There is a problem of approach.
The problem is that the patch supplied is very specific to how you have set up your CCK types on your site. The original patch would only work if you had created exactly one FileField CCK type on your whole site, and you called the type "Files", called the CCK field "field_files", and furthermore that it was a single-valued field, and probably that there weren't any other CCK fields on that data type that were multiple-valued. All of that made CCK put the data for that field into the database table "content_type_files", with column name "field_files_fid".
The "correction" in #4 worked for that person because his data type happened to be named "Link" and the field was named "field_file", so on his site the database table was called "content_type_link" and the column "field_file_fid".
Obviously, neither of these patches is general enough to use for the generic situation, where someone could have created multiple CCK types with different file attachment fields, and they might be single-valued or multiple-valued (CCK handles these two cases differently for storage), and they might name their fields and content types anything they wanted.
So this approach is not going to work in general. I don't have another suggestion to make yet, but am working on something for another module, so I might have one in the next few days.
#7
See also http://drupal.org/node/335890 -- one of these should be marked as a duplicate of the other, probably.
#8
A patch based on the one above, but it should work for any CCK filefields on the system, and will allow searching both attachments and CCK file fields, not one or the other, as the patch above does. The patch was rolled against 6.x-2.0-beta4. Give it a shot and let me know if it works!
#9
Testing patch in #8.
#10
The patch in #8 will still not work in all cases. The reason is that CCK is not consistent in how it stores field data. If the fields are single-valued, it will put them into the table for the content type. If they are multiple-valued, they will get their own table.
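The storage rule described above can be sketched as a small standalone function. This is only an illustration of the naming convention mentioned in this thread ("content_type_" plus the type name for single-valued fields, "content_" plus the field name for multiple-valued ones). The function name example_filefield_storage() is hypothetical, and CCK may handle cases this sketch ignores, so real code should ask CCK rather than guess.

```php
<?php
/**
 * Hypothetical helper (not part of CCK or Search Files): guess the table
 * and column holding a FileField's fid, using only the naming convention
 * discussed in this issue.
 */
function example_filefield_storage($field_name, $type_name, $multiple) {
  // Multiple-valued fields get their own table, named after the field.
  // Single-valued fields are stored in the table for the content type.
  $table = $multiple ? 'content_' . $field_name : 'content_type_' . $type_name;
  return array(
    'table' => $table,
    'column' => $field_name . '_fid',
  );
}

// The site the original patch assumed: single-valued "field_files" on "files".
$original = example_filefield_storage('field_files', 'files', FALSE);
// The same field, if it were multiple-valued instead.
$multi = example_filefield_storage('field_files', 'files', TRUE);
```

With these inputs, $original points at table "content_type_files" and column "field_files_fid", while $multi points at table "content_field_files". That difference is why a hard-coded table name breaks as soon as a field is multiple-valued.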
For an example of how to get searching by files working with filefield attachments, check out what the Search by Page module does. It has a sub-module (included in the distribution) called sbp_attach.module that works well [see function sbp_attach_sbp_paths()].
Project page: http://drupal.org/project/search_by_page
CVS view of most recent version of sbp_attach.module file: http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/search_by_p...
Just as a note, this module allows you to select which FileField fields to index for search as attachments, and it also allows you to restrict only to "listed" files or else index everything. There's a loop starting with
foreach ( $fieldlist as $fieldname => $value ) {
in that function that does the work of figuring out where the database storage is for each of the selected fields on each content type.
#11
The patch in #8 did not work for me, possibly for the reasons jhodgdon describes in #10. Filefield data still does not make it into the search index, while core upload module attachments do just fine.
#12
As an addendum, here's how I got around this bit with CCK in a custom module I wrote for my site in D5. Not sure if it's the properly "Drupalish" way of handling it...
Edit: $fieldname is a string with the machine name of the field, in case it's not obvious.
<?php
// First, check if a table named for the field exists.
// (CCK stores field data in two different sorts of tables).
$db_field_tablename = db_escape_table("content_" . $fieldname);
$db_fieldname = db_escape_table($fieldname . "_value");
if (db_table_exists($db_field_tablename)) {
  $db_tablename = $db_field_tablename;
  $query_fieldname = $db_fieldname;
}
else {
  $db_tablename = db_escape_table("content_type_" . $typename);
  $query_fieldname = $fieldname;
}
// Match field value if one was passed.
if ($field_value) {
  $q = "SELECT DISTINCT(c.`nid`) "
    . "FROM {".$db_tablename."} as c LEFT JOIN {node} as n "
    . "ON n.nid = c.nid "
    . "WHERE c.`$db_fieldname` LIKE '%s' "
    . "AND n.type = '%s' "
    ;
  $r = db_query_range($q,$field_value,$typename,0,1);
  return db_result($r);
}
?>
#13
Your idea in #12 is better than the previous patches and will probably work in many/all cases, but the correct way to figure out what DB table and column are being used for a particular field is to ask CCK, via the content_fields() and content_database_info() functions. See #10.
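As a rough sketch of what that could look like: the function below builds a lookup query from an array shaped like the return value of content_database_info() (a 'table' key, plus a 'columns' array whose entries each carry a 'column' name). The function name example_filefield_nid_query() and the query itself are assumptions for illustration, not code from any of the patches in this issue.

```php
<?php
/**
 * Hypothetical helper: build a query that finds the nodes using a file,
 * given storage info shaped like the result of content_database_info().
 */
function example_filefield_nid_query(array $db_info) {
  // Take the table and column from CCK's answer instead of hard-coding them.
  $table = $db_info['table'];
  $column = $db_info['columns']['fid']['column'];
  // {curly braces} let Drupal apply a table prefix; %d is the fid placeholder.
  return "SELECT c.nid FROM {" . $table . "} c WHERE c." . $column . " = %d";
}

// Inside Drupal 6 with CCK enabled, the input would come from CCK, e.g.:
//   $field = content_fields('field_files', 'files');
//   $db_info = content_database_info($field);
// A hand-written array stands in for it here so the sketch runs on its own.
$db_info = array(
  'table' => 'content_field_files',
  'columns' => array('fid' => array('column' => 'field_files_fid')),
);
$query = example_filefield_nid_query($db_info);
```

Because the table and column come from the storage info, the same function would cover both single-valued and multiple-valued fields without any guessing about names.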
#14
I will test as soon as a new patch is out.
#15
Good work on #8, but multi-value filefields are widely used.
I think being able to search CCK filefields is a basic feature for Search Files.
If someone adds multi-value support to the patch in #8, could it be a candidate for commit?
#16
Subscribing
#17
I spun a new patch, this time accounting for single and multiple file fields using the CCK API for grabbing table and field names as suggested by jhodgdon in #13. Someone want to give it another try? It seems to work just fine.
#18
I tested the patch in #17 against 2.x-dev, and it does find the file.
However, I got an error on the search results page:
user warning: Table 'sandbox.upload' doesn't exist query: SELECT SUM(i.score * t.count) AS score FROM search_index i INNER JOIN search_total t ON i.word = t.word INNER JOIN files AS f ON f.fid = i.sid LEFT JOIN upload u USING (fid) LEFT JOIN node n USING (nid) WHERE f.status = 1 AND (i.word = 'git' OR i.word = 'commit') AND i.type = 'file' GROUP BY i.type, i.sid HAVING COUNT(*) >= 2 ORDER BY score DESC LIMIT 0, 1 in /Users/pesh/Sites/sandbox.dev/modules/search/search.module on line 946.
And then under the individual search result, there are funny words listed (see img).
#19
Hmmm...did you disable/uninstall the upload module after installing the patched version? The 'funny' text is lorem ipsum, and it came up because it was named 'git.txt'.
#20
Core upload is disabled.
There is lorem ipsum in there for sure, but it's not the most common; it also says node and story++. Turns out the Latin came from some nodes I had autogenerated with the Devel module. That's interesting, because those nodes don't have attachments! The node with "git.txt" attached is a story node with just a few letters in the body field. The file itself is a simple git cheat sheet.
#21
subscribe