deleted attachment remains in search index
drenton - January 29, 2008 - 20:01
| Project: | Search attachments |
| Version: | 5.x-3.0 |
| Component: | Code |
| Category: | support request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | postponed |
Jump to:
Description
Hello All,
When we delete an attachment, is there a way to remove it from the search index ?
The entire node is not being deleted, just the attachment. The attachment, however, remains in the revision history.
Thanks,
Dale

#1
No, not without manually deleting the attachment's rows from search_dataset and search_index. However, this problem will not occur in version 5.x-4, which is currently under development. In that version, if you delete the file, it is removed permanently from the search index.
If you can tell me which module was used to upload the file and attach it to the node (upload, webfm, etc.), I can supply SQL that will let you delete the file's rows in search_dataset and search_index.
#2
This question has made me aware of a bug in the driver for the upload module (which I assume you are using since this issue has come up) -- it doesn't take into account entries in Drupal's file_revisions table. So, the correct answer to your question should have been: in the upcoming version, if you delete a file from a node that was attached using upload.module, its text will be removed from the search index if it is not attached to the current version of the node. Thanks for bringing this to my attention.
Assuming you are using the core upload module to attach the file, here is SQL to delete its text from the index tables. You will need the nid of the node the file was attached to in order to perform the first SQL query.
Be sure to back up your database before deleting anything.
mysql> select * from files where nid = 1347;
+-----+------+------------------------+------------------------------+------------+----------+
| fid | nid | filename | filepath | filemime | filesize |
+-----+------+------------------------+------------------------------+------------+----------+
| 186 | 1347 | testing.txt | files/testing.txt | text/plain | 219 |
+-----+------+------------------------+------------------------------+------------+----------+
1 row in set (0.00 sec)
// Note the fid of your file, in the first column of your results; in this example it is 186.
mysql>delete from search_dataset where type = 'file_upload' and sid = 186;
Query OK, 1 row affected (0.00 sec)
mysql>delete from search_index where type = 'file_upload' and sid = 186;
Query OK, 23 rows affected (0.01 sec)
There is no need to rebuild your index at this point.
#3
Yes, you are correct, we are using the core upload module.
In the meantime I've created a cron job that runs daily to clean up any old attachments. Does it look ok ?
Thanks
<?php
$link = mysql_connect('localhost', 'user', 'password') or die('Could not connect: ' . mysql_error());
mysql_select_db('drupal') or die('Could not select database');
$query = 'SELECT nid, vid FROM node';
$result = mysql_query($query) or die('Query failed: ' . mysql_error());
while ($line = mysql_fetch_array($result, MYSQL_ASSOC)) {
$query2 = "SELECT f.fid FROM files f LEFT JOIN file_revisions r ON r.fid = f.fid WHERE f.nid={$line['nid']} AND r.vid<>{$line['vid']}";
$result2 = mysql_query($query2) or die('Query failed: ' . mysql_error());
while ($line2 = mysql_fetch_array($result2, MYSQL_ASSOC)) {
$query3 = "DELETE FROM search_dataset WHERE type = 'file_upload' AND sid = {$line2['fid']}";
$query4 = "DELETE FROM search_index WHERE type = 'file_upload' AND sid = {$line2['fid']}";
$result3 = mysql_query($query3) or die('Query failed: ' . mysql_error());
$result4 = mysql_query($query4) or die('Query failed: ' . mysql_error());
}
}
mysql_free_result($result);
mysql_free_result($result2);
mysql_free_result($result3);
mysql_free_result($result4);
mysql_close($link);
?>
#4
I didn't test the script as a cron job but I did set up a test case and ran the $query2 SQL, and it returned expected results. $query3 and $query4 look good too. I'd say back up your db and run the cron job to see if it deletes the orphaned file_upload entries.
#5
Looks like it did the trick.
I changed $query2 to this though :
$query2 = "SELECT f.fid FROM files f, file_revisions r WHERE r.fid = f.fid AND f.nid={$line['nid']} AND r.vid<>{$line['vid']}";
Thanks
#6
I had to make some changes. New cron job is :
<?php
$link = mysql_connect('localhost', 'user', 'password') or die('Could not connect: ' . mysql_error());
mysql_select_db('drupal') or die('Could not select database');
$sql = "SELECT nid, vid FROM node";
$nodes = mysql_query($sql) or die('Query failed: ' . mysql_error());
while ($working_node = mysql_fetch_array($nodes, MYSQL_ASSOC)) {
$sql = "SELECT DISTINCT f.fid FROM files f, file_revisions fr WHERE f.fid = fr.fid AND
f.nid = {$working_node['nid']} AND fr.vid <> {$working_node['vid']}";
$node_files = mysql_query($sql) or die('Query failed: ' . mysql_error());
while ($working_node_file = mysql_fetch_array($node_files, MYSQL_ASSOC)) {
// check to see if this fid is being used by the current node vid
$sql = "SELECT n.vid FROM node n, file_revisions fr WHERE n.vid = fr.vid AND
n.vid = {$working_node['vid']} AND fr.fid = {$working_node_file['fid']}";
$check_node = mysql_query($sql) or die('Query failed: ' . mysql_error());
if (mysql_num_rows($check_node) == 0) {
$sql = "DELETE FROM search_dataset WHERE type = 'file_upload' AND sid = {$working_node_file['fid']}";
$delete_dataset = mysql_query($sql) or die('Query failed: ' . mysql_error());
$sql = "DELETE FROM search_index WHERE type = 'file_upload' AND sid = {$working_node_file['fid']}";
$delete_index = mysql_query($sql) or die('Query failed: ' . mysql_error());
}
}
}
mysql_close($link);
?>