This issue is a proposal to change the way search_attachments works. It is based on a file called 'nextgen.txt' distributed with version 5.x-3-dev.
Currently, search_attachments has two significant design limitations: 1) it decides whether to reindex a file based on whether the parent node has been altered, not whether the file itself has been altered, and 2) it cannot index files uploaded via FTP (i.e., not through one of the file management modules) or files not attached to a node.
To remedy these limitations, search_attachments could use a database table (call it search_attachments_files) that records the managing module and filepath of each managed file, as well as the filepath of every file placed in the /files directory or any other directory that might hold FTPed files. The site admin would be able to configure which directories fall into this last category, and 'none' could be stored in the managing-module column for FTPed files.
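A minimal sketch of what one row of the proposed table might hold, written in Python purely for illustration (the module itself would be PHP). The column names `fid` and `nid`, the class `FileRow`, and the helper `register()` are hypothetical: `fid` is a row ID, `nid` records the parent node if there is one, and `mtime` anticipates the modification-time column described next.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FileRow:
    """One row of the proposed search_attachments_files table (hypothetical columns)."""
    fid: int              # hypothetical row ID
    module: str           # managing module, or 'none' for FTPed files
    filepath: str         # path of the file on disk
    nid: Optional[int]    # hypothetical: parent node ID, None if not attached
    mtime: float = 0.0    # last mtime seen via stat(); 0.0 means never indexed


def register(table: Dict[str, FileRow], module: str, filepath: str,
             nid: Optional[int] = None) -> FileRow:
    """Add a file unless its filepath is already registered; return its row."""
    if filepath in table:
        return table[filepath]
    # Sequential IDs; this sketch assumes rows are never deleted.
    row = FileRow(fid=len(table) + 1, module=module, filepath=filepath, nid=nid)
    table[filepath] = row
    return row
```

For example, `register(table, "upload", "files/report.pdf", nid=12)` would record a managed attachment, while `register(table, "none", "files/ftp/manual.pdf")` would record an FTPed file with no parent node. Keying the table by filepath keeps a file from being registered twice.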
This table would also record each file's modification time (mtime) as returned by PHP's stat() function. In hook_cron(), search_attachments would iterate through the search_attachments_files table, call stat() on each file to get its current mtime, and compare that value with the recorded one. If stat() returned a more recent mtime, the file would be reindexed by calling search_index() and the recorded mtime updated. For files not attached to a node, the 'sid' parameter passed to search_index() could be the file's row ID in search_attachments_files; for files attached to nodes, the node's ID would be passed, as is currently the case. In search results, files not attached to nodes would probably be displayed with a subset of the information currently shown.
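The cron pass above can be sketched as follows, again in Python for illustration only. The `index(sid, type, filepath)` callback is a hypothetical stand-in for search_index(), and the type strings "node" and "search_attachments" are assumptions: passing unattached files under a separate type is one way to keep their row IDs from colliding with node IDs, which the proposal does not specify.

```python
import os
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class FileRow:
    fid: int              # hypothetical row ID in search_attachments_files
    filepath: str
    nid: Optional[int]    # parent node ID, or None if not attached to a node
    mtime: float = 0.0    # last recorded mtime; 0.0 means never indexed


def cron_reindex(rows: Iterable[FileRow],
                 index: Callable[[int, str, str], None]) -> int:
    """Reindex every file whose current mtime is newer than the recorded one.

    `index(sid, type, filepath)` is hypothetical and stands in for
    search_index(). Returns the number of files reindexed.
    """
    count = 0
    for row in rows:
        try:
            current = os.stat(row.filepath).st_mtime
        except FileNotFoundError:
            continue  # file vanished; a real implementation might drop the row
        if current > row.mtime:
            if row.nid is not None:
                # Attached file: the sid is the parent node's ID, as today.
                index(row.nid, "node", row.filepath)
            else:
                # Unattached file: the sid is the file's row ID (hypothetical type).
                index(row.fid, "search_attachments", row.filepath)
            row.mtime = current  # remember what was indexed
            count += 1
    return count
```

Because the recorded mtime is updated after each reindex, an unchanged file is skipped on the next cron run and is picked up again only once stat() reports a newer mtime.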
The rows of search_attachments_files would be populated by iterating through each driver and determining which files had been added since the last check. To find files not handled by any file manager, the /files directory and any other configured directories would be scanned for eligible files, and any found would be registered in the table. This synchronization would need to run each time cron.php runs, before indexing takes place.
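A rough Python sketch of this synchronization step, under stated assumptions: each driver is modeled as a hypothetical callable returning `(filepath, nid)` pairs for the files its module knows about, "eligible" is simplified to "any regular file", and `sync_files()` and `FileRow` are illustrative names, not part of the module.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class FileRow:
    fid: int
    module: str           # managing module, or 'none' for FTPed files
    filepath: str
    nid: Optional[int]
    mtime: float = 0.0


# Hypothetical driver interface: yields (filepath, nid) pairs for the files
# that one file management module currently manages.
Driver = Callable[[], Iterable[Tuple[str, Optional[int]]]]


def sync_files(table: Dict[str, FileRow], drivers: Dict[str, Driver],
               watched_dirs: List[str]) -> List[FileRow]:
    """Register files added since the last run; return the newly added rows."""
    added: List[FileRow] = []

    def add(module: str, path: str, nid: Optional[int]) -> None:
        key = os.path.abspath(path)  # normalize so both passes agree on keys
        if key not in table:
            row = FileRow(fid=len(table) + 1, module=module, filepath=key, nid=nid)
            table[key] = row
            added.append(row)

    # 1. Files known to a file management module are registered under it.
    for module, driver in drivers.items():
        for path, nid in driver():
            add(module, path, nid)

    # 2. Any other file in a watched directory is treated as FTPed.
    for base in watched_dirs:
        for dirpath, _dirnames, filenames in os.walk(base):
            for name in filenames:
                add("none", os.path.join(dirpath, name), None)
    return added
```

The drivers are consulted before the directory scan because managed uploads usually live under /files as well; scanning first would register them with module 'none'.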
Any feedback on this proposal is welcome.