Unified Document Management
Proposed by: dldege
Problem Statement
Drupal lacks a unified document/file management system. Currently, core provides the upload module, which handles basic file uploading and attachment of files to other nodes. The files themselves are not nodes and therefore do not take advantage of all the services built up in Drupal for nodes (taxonomy, RSS, etc.). While this works for simply attaching a screenshot, PDF, etc., it does not lend itself well to more advanced document sharing scenarios. It has also led to a myriad of contributed node modules that try to specialize the handling of one media type such as images, video, and audio, all of which generally re-implement their own file upload management. The problem is further exacerbated when more than one module comes into existence for the same file type; images are a great example. The end product is very confusing, both conceptually and from the administrative view, since it's not clear how modules work together (if at all) or when to use the core file upload and when not to.
Proposal
This problem could be alleviated by developing a unified document node system responsible for all core services of document handling, such as file upload, download, and server storage management. Specialized document viewing, editing, etc. would be relegated to other contributed modules that respond to a set of document-specific hooks exposed by the module. For example, the existing video and audio modules would become specialized viewers/editors. Instead of defining new node types, they would simply implement the unified document hooks to provide all the specialized functionality they currently provide (video playback, audio ID3 tagging, and so on). They would be responsible for managing any special database fields needed for the given document but would not be required to do any file management or node management. In addition, the document module could be used without any specialized modules to provide a basic, hierarchical document system supporting basic file/folder operations, upload/download, and organization.
Goals
- GOAL 1 - File Nodes
- The document module introduces hierarchical file/folder node types. It moves files into Drupal as nodes instead of ancillary objects (attachments). As such, the nodes (files/folders) can take advantage of RSS, taxonomy, search, node access, and any other facilities built into Drupal for nodes.
- GOAL 2 - File Storage
- Files need to be stored somewhere. The document module could roll its own file storage, but why not use something existing if possible? It could use the core file API, but that API is very basic and not well suited to this hierarchical node approach. fileapi would be perfect as the back-end file storage API for the document module because it abstracts out the physical storage used and provides a single API. It would also work as a flat file store, which leaves hierarchy management to document.module. This is a good thing for many reasons, such as allowing quick and easy folder renames, moves, etc. without physically changing any of the stored files. fileapi can also continue to stand on its own for other module developers with other specialized tasks.
- GOAL 3 - File Type Customization API and hooks
- Specialized file type handling will be desired: videos, audio, PDF, OpenOffice, etc. The document.module does no special handling of any file type. It only provides the upload, storage, and organization of your files. To allow customization, other modules, which I'm calling "document extension modules", could add the specialized handling. This works provided the document module offers a decent enough API and hooks to let module authors tweak the needed aspects of a document node at all life-cycle points. This frees these developers to code features for the file type while relying on document for the actual file handling. These modules no longer need to create their own node types.
- GOAL 4 - Presentation Customization (Theme)
- Users will want to control how the file/folder information is displayed and how interactive it is. Robust theme support is a must for this, as it is for all well-designed modules.
Document module
The document.module would implement two new node types.
Rough diagram - http://12.152.208.71/circadia/files/soc.png
- Folder - A container for files and other folder nodes - folder nodes would create the document hierarchy.
- File - A node directly related to a single file that is uploaded to the Drupal server at node creation time.
The new nodes would flow into Drupal just like any other. For example, your front page could have all new folders and documents worked into the content flow just like blog posts or pages. Folders and files could be tagged with taxonomy terms, syndicated with RSS, incorporated into views, and so on. The document module could provide one or more relevant blocks and/or a dynamic menu system that mimics the file/folder hierarchy.
File system
The document module would be responsible for tracking the folder/file relationships in the database, enforcing naming conventions, and doing all file system operations such as renames and deletes. The module could use the core file system API for managing files. However, it would probably be preferable to write something new, or use something like the filemanager module, to implement a more robust system that handles a large number of files, file name collisions, and so on. It is probably not advisable to mirror the file/folder hierarchy on the physical file system. That would be hard to manage when renaming or moving folders, because instead of a single database edit you now have a potentially large file system operation to keep everything in sync. It would be better to keep all parent/child relationships and other pertinent information in the database and simply maintain some type of flat or bin-based document pool on the disk.
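The following is a minimal Python sketch of how the hierarchy could live in the database while the bytes sit in a flat, bin-based pool. Drupal itself is PHP, so this only models the idea. Everything in it is a hypothetical choice, not part of any existing module:

- the `DocumentStore` class;
- the in-memory dicts standing in for a database table and the disk pool;
- the SHA-1 bin naming.

Renaming or moving a folder touches one record, and no stored file moves.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DocNode:
    nid: int
    name: str
    parent: Optional[int]              # None marks a top-level item
    is_folder: bool
    storage_key: Optional[str] = None  # set only for file nodes


class DocumentStore:
    """Hypothetical sketch: hierarchy in a table, file bytes in a flat, binned pool."""

    def __init__(self, bins: int = 256):
        self.bins = bins
        self.nodes: Dict[int, DocNode] = {}  # stands in for a database table
        self.pool: Dict[str, bytes] = {}     # stands in for the on-disk pool
        self._next_nid = 1

    def _pool_key(self, nid: int) -> str:
        # Spread files across numbered bins so no single directory grows huge.
        digest = hashlib.sha1(str(nid).encode()).hexdigest()
        bin_no = int(digest[:8], 16) % self.bins
        return f"{bin_no:03d}/{digest}"

    def add(self, name: str, parent: Optional[int] = None,
            data: Optional[bytes] = None) -> int:
        nid = self._next_nid
        self._next_nid += 1
        node = DocNode(nid, name, parent, is_folder=data is None)
        if data is not None:
            node.storage_key = self._pool_key(nid)
            self.pool[node.storage_key] = data
        self.nodes[nid] = node
        return nid

    def rename(self, nid: int, new_name: str) -> None:
        self.nodes[nid].name = new_name      # one row update; the pool is untouched

    def move(self, nid: int, new_parent: Optional[int]) -> None:
        self.nodes[nid].parent = new_parent  # likewise, no file is copied

    def path(self, nid: int) -> str:
        # Rebuild the virtual path by walking parent links up to the top.
        parts = []
        node = self.nodes.get(nid)
        while node is not None:
            parts.append(node.name)
            node = self.nodes.get(node.parent) if node.parent is not None else None
        return "/" + "/".join(reversed(parts))
```

For example, suppose you create a folder with `docs = store.add("docs")` and a file with `f = store.add("report.pdf", parent=docs, data=b"%PDF")`. Calling `store.rename(docs, "archive")` then changes `store.path(f)` to `/archive/report.pdf`, while `store.nodes[f].storage_key` stays the same.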
Hooks
The document module could take a cue from Drupal core and implement its own specific subset of module hooks, allowing specialized document type modules to be created. Some of these hooks would be similar to those already found in hook_nodeapi() but would be specific to document nodes, so as not to confuse or complicate the use of hook_nodeapi(). For example,
- hook_document_file_view(&$node) - called when a file node is being viewed. Modules can add in file type specific view content to $node->content[]
Similarly, hooks would be defined for all other applicable document handling operations such as file or folder create, delete, update, rename(?), move(?), and so on.
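To make the hook idea concrete, here is a minimal Python sketch of how the document module might dispatch `hook_document_file_view` to extension modules. It is loosely modeled on how Drupal's `module_invoke_all()` calls `modulename_hookname()` functions. The registry, the `implements` decorator, the dict-shaped node and the two extension modules are all hypothetical stand-ins. Because the node is a mutable dict, implementations modify it in place, which is what `&$node` achieves in PHP.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for Drupal's module system:
# module name -> {hook name -> implementation}.
MODULES: Dict[str, Dict[str, Callable[[dict], None]]] = {}


def implements(module: str, hook: str):
    """Register the decorated function as `module`'s implementation of `hook`."""
    def register(func: Callable[[dict], None]) -> Callable[[dict], None]:
        MODULES.setdefault(module, {})[hook] = func
        return func
    return register


def invoke_all(hook: str, node: dict) -> None:
    """Call every module's implementation of `hook`, in registration order."""
    for hooks in MODULES.values():
        impl = hooks.get(hook)
        if impl is not None:
            impl(node)  # implementations mutate the node in place


# Two hypothetical document extension modules.
@implements("video", "document_file_view")
def video_document_file_view(node: dict) -> None:
    if node["mime"].startswith("video/"):
        node["content"]["player"] = f'<video src="{node["url"]}"></video>'


@implements("audio", "document_file_view")
def audio_document_file_view(node: dict) -> None:
    if node["mime"].startswith("audio/"):
        node["content"]["id3"] = "ID3 tag summary would go here"
```

Each extension module only adds to the node's content. Neither one handles uploads or storage, which is the division of labor the proposal describes.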
Theme Support
Example: http://12.152.208.71/circadia/files/documents.png
Default theme functions would be created in the module for theming all aspects of a folder view and file view. In addition, the module would attempt to call document type specific theme functions so that document extension modules could also define specific theme support for their types (video or audio for example).
When a folder node is viewed, it would theme a display of all the subfolders and files it contains. For each file item, it would call theme functions specific to the file type if they exist, or fall back to the default file theming. This could be done by using the file's MIME type or extension as the theme function name. Alternatively, the document module could maintain a more direct relationship between files and helper modules via some registration process.
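One way to realize the MIME-type fallback is to derive candidate theme function names, from most to least specific, and use the first one that exists. The Python sketch below rests on two assumptions:

- a hypothetical naming scheme: `document_file_<major>_<minor>`, then `document_file_<major>`, then the default `document_file`;
- a registry dict standing in for real theme functions.

```python
import re
from typing import Callable, Dict, List

# Hypothetical registry: theme hook name -> function returning markup.
THEME_FUNCTIONS: Dict[str, Callable[[dict], str]] = {}


def theme_function(name: str):
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        THEME_FUNCTIONS[name] = func
        return func
    return register


def _clean(part: str) -> str:
    # Turn MIME fragments like "vnd.ms-powerpoint" into "vnd_ms_powerpoint".
    return re.sub(r"[^a-z0-9]+", "_", part.lower()).strip("_")


def candidates(mime: str) -> List[str]:
    """Candidate theme names, most specific first, default last."""
    major, _, minor = mime.partition("/")
    names = []
    if major and minor:
        names.append(f"document_file_{_clean(major)}_{_clean(minor)}")
    if major:
        names.append(f"document_file_{_clean(major)}")
    names.append("document_file")
    return names


def theme_file(file: dict) -> str:
    for name in candidates(file["mime"]):
        if name in THEME_FUNCTIONS:
            return THEME_FUNCTIONS[name](file)
    raise LookupError("no default 'document_file' theme function registered")


@theme_function("document_file")
def default_file(file: dict) -> str:
    return f'<a href="{file["url"]}">{file["name"]}</a>'


@theme_function("document_file_video")
def video_file(file: dict) -> str:
    return f'<video src="{file["url"]}"></video>'
```

Keying on the MIME type is one of the two options mentioned above. The same lookup would work with file extensions.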
Summary
A unified document system would be a much-needed addition to Drupal. It would solve much of the frustration users currently encounter in figuring out, case by case, how to work with files on their sites. Moving files out of the attachment model and into full-blown nodes opens up many more interesting possibilities. It also makes it easier for module developers to focus on media-specific modules without having to worry about file management. The end solution is much easier for Drupal administrators to set up and understand. This project seems doable in the Summer of Code time frame and is of appropriate complexity for a student developer.
Disclaimer
Obviously, the idea is not fully fleshed out and needs more detail; comments and suggestions would be great. I do have a skeleton module developed that provides a starting point.
- Discussion Points
- Access Control - how should the document module implement access control, and what granularity of permissions would be most flexible without becoming overly confusing or cumbersome?
- How to support the node revision system - should it archive copies of files if a new version is uploaded into the file node? Should it be designed to allow a more robust, third-party version system (CVS, SVN...)? Or should it not support revisions of the actual file at all, only of the node content fields?
- Attachment/Node Reference - Should it implement a node reference system so that other nodes can easily reference any file or folder, or should this be left to CCK et al.? Another idea here would be to introduce a document input filter that would allow you to reference documents inline in your posts. For example, it might be something like [inline:document/test/test.jpg], which would reference that node and return img markup. Another form might be [link:document/test/test.pdf], which would return an a tag. In this way any document could be referenced in a post.
- Should this replace upload.module and its functionality? Or should it be a separate component, with upload.module still available for "node file attachments" (i.e., you could even attach files to file or folder nodes?)? What about user pictures and other site files - leave them as is?
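The inline reference syntax floated above could be handled by a simple input filter. The Python sketch below is an assumption about how such a filter might behave:

- `[inline:document/...]` becomes an `<img>` tag;
- `[link:document/...]` becomes an `<a>` tag;
- a hypothetical `resolve` callback maps a document path to a URL, returning `None` for unknown paths, which are left untouched.

```python
import html
import re
from typing import Callable, Optional

# Matches [inline:document/...] or [link:document/...]; the path ends at "]" or whitespace.
DOC_REF = re.compile(r"\[(inline|link):(document/[^\]\s]+)\]")


def document_filter(text: str, resolve: Callable[[str], Optional[str]]) -> str:
    """Replace document references with <img> or <a> markup (hypothetical filter)."""

    def replace(match: "re.Match[str]") -> str:
        kind, path = match.group(1), match.group(2)
        url = resolve(path)
        if url is None:
            return match.group(0)  # unknown document: leave the text as written
        src = html.escape(url, quote=True)
        label = html.escape(path.rsplit("/", 1)[-1])
        if kind == "inline":
            return f'<img src="{src}" alt="{label}" />'
        return f'<a href="{src}">{label}</a>'

    return DOC_REF.sub(replace, text)
```

Because a reference names a document path rather than a physical storage location, posts never need to know where the file actually lives on the server.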
Comments on this proposal
Data storage - file system vs. database
jrt - February 21, 2007 - 14:54
Have you considered storing the document data in the database, instead of/in addition to storing it on the file system?
There are some (big) disadvantages to storing the data in a db, but you could easily handle revisions by just pointing the record at a different BLOB, without external dependencies on a VCS. You could also associate access control directly with the file data and even allow the same virtual file to point at different data, depending on the user's roles.
Just something to consider...
======================================================
======================================================
Interesting idea
dldege - February 21, 2007 - 15:01
Storing the files as BLOBs would make file node revisions much easier to implement. One drawback is that all file access (HTTP GET requests) would have to be handled by PHP, since it requires a database query and then a transfer of the bits. On the other hand, you could add a simple caching system like the one imagecache uses. It would keep a cached copy of the file on disk, created on the first GET request (and flushed when/if a new file revision is created).
I don't know enough about BLOBs and database performance to say for sure whether this would be a good idea, but it certainly makes some aspects of file management easier.
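Here is a minimal Python sketch of the BLOB-plus-cache idea, using the standard library's sqlite3 as a stand-in for Drupal's database. It works as follows:

- every revision is stored as a BLOB;
- the first read of a file writes a cached copy to disk;
- later reads are served from that copy;
- saving a new revision deletes the copy.

The table layout, the class name and the `db_reads` counter are illustrative assumptions, not how imagecache actually works.

```python
import os
import sqlite3
import tempfile


class BlobFileStore:
    """Hypothetical sketch: file revisions kept as BLOBs, latest one cached on disk."""

    def __init__(self, cache_dir: str):
        self.db = sqlite3.connect(":memory:")  # stand-in for the site database
        self.db.execute(
            "CREATE TABLE revisions (fid INTEGER, vid INTEGER, data BLOB, "
            "PRIMARY KEY (fid, vid))"
        )
        self.cache_dir = cache_dir
        self.db_reads = 0  # how many GETs had to go to the database

    def _cache_path(self, fid: int) -> str:
        return os.path.join(self.cache_dir, f"file-{fid}")

    def save_revision(self, fid: int, data: bytes) -> int:
        (latest,) = self.db.execute(
            "SELECT COALESCE(MAX(vid), 0) FROM revisions WHERE fid = ?", (fid,)
        ).fetchone()
        vid = latest + 1
        self.db.execute("INSERT INTO revisions VALUES (?, ?, ?)", (fid, vid, data))
        # Flush the cached copy so the next GET sees the new revision.
        try:
            os.remove(self._cache_path(fid))
        except FileNotFoundError:
            pass
        return vid

    def get(self, fid: int) -> bytes:
        path = self._cache_path(fid)
        if os.path.exists(path):  # cache hit: no database access
            with open(path, "rb") as fh:
                return fh.read()
        self.db_reads += 1
        row = self.db.execute(
            "SELECT data FROM revisions WHERE fid = ? ORDER BY vid DESC LIMIT 1",
            (fid,),
        ).fetchone()
        if row is None:
            raise KeyError(fid)
        data = bytes(row[0])
        with open(path, "wb") as fh:  # populate the cache on first request
            fh.write(data)
        return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache:
        store = BlobFileStore(cache)
        store.save_revision(1, b"first draft")
        store.get(1)
        store.get(1)  # served from the disk cache
        store.save_revision(1, b"second draft")
        print(store.get(1), store.db_reads)
```

Older revisions stay in the table, so rolling back would just mean reading a different `vid`. That is the revision benefit noted in the comment above.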
dLd
======================================================
======================================================
For small files that's ok
Wim Leers - February 23, 2007 - 02:45
For small files that's ok (.txt, .doc, .jpg's perhaps), but what about uploading files of 60 MB or so? That would fill up your database very quickly.
@dldege: It's a very interesting idea, and indeed a very real need for the Drupal project. I'm curious how this proposal will further evolve.
======================================================
======================================================
Core?
rcross - February 23, 2007 - 20:59
Is this something that would be aimed at being included in core? Personally I would really like to see something like this get into core. Something to consider also would be to provide a bit of an abstraction layer and then let certain parts be configurable. For example, be able to support both database storage and file system storage. Then you could also abstract the revision system so that by default it uses the database to keep track, but that if svn/cvs are present on the server you could use that instead for more sophisticated systems.
I also think this type of abstraction would be key to making this a really usable module. Otherwise it will likely turn into another "alternate file system module". If it's not a core module or a very popular module, it's unlikely you will see consolidation of the other file modules such as audio/video. There won't be any incentive for them to change their modules, and depending on an external module is undesirable for complexity's sake.
As for your discussion points:
-Access Control - again, for flexibility, the more control the better. I would expect to see something like "read title/attributes of file", "read/download file", "create new file", "update/change existing file", "read folder title/attributes", "read/list folder contents", "create folder", "update folder". And probably some other controls like "admin file system", "manage file system access", and maybe something else.
-Node revision - going along with my abstraction point, I think that by default revisions should/could be managed by the database, since not everyone will have access to CVS/SVN. However, it should allow for using SVN/CVS if it's available (and the user wants it). I think revisioning of files is a pretty important feature of a robust file system module, so I don't think it should only keep revisions of the node fields. Revisioning only the node fields could be an optional configuration, much like creating a new revision is a check-box option on nodes now.
-Referencing - there definitely needs to be some good way of referencing files, but I don't know the best way of doing that. That might be OK for something like CCK to do. But since CCK isn't in core yet, depending on a third-party module for something fairly key might be a mistake, though there is a lot of talk about having CCK in core by the next release. I also think the resolution of this issue directly relates to whether or not this should replace upload.module.
-upload.module - I think if this module is really well done (which I'm sure it will be ;-) ) and is able to be included in core, then it should definitely replace the upload module. I think it could/should be the main file system module, so it would replace user pictures and the like as well. As you said, maybe some of these specialized fields will just be a subset of features, or modules will add features to it.
- On a side note, something else I would like this module to handle is bringing in files that are already on the local file system. A good example is really large files: it is usually not a good idea to upload a 50 MB file. It would be better to FTP the file to the server and then put it under Drupal's management through this interface. The same applies if the server is part of an intranet where people might just drop files onto it through file sharing or something similar (it doesn't have to be FTP).
Good Luck!!
EDIT: I also just saw this http://drupal.org/project/fileapi which might be something to look at or to work with.
--Ryan
www.ryancross.com
www.jamescrossinc.com
======================================================
======================================================
While I think this has
dldege - February 25, 2007 - 20:50
While I think this has potential as a core module, to work as a Summer of Code project the scope has to be limited to what a student with unknown Drupal experience can do in three months. This project may not even make SoC. If it does, and a good, flexible design is created and the student does a good job on the basics, it could get out into the community. It could then hopefully gain the momentum to add the more advanced features without a complete refactoring.
I agree that as a contrib module it would be hard for it to gain that momentum, since it would become just one more alternative for the file management/file sharing problem. Still, CCK, Views, and others have gained enough adoption that they are moving toward core, so it could happen.
I like the idea of abstracting out the file management so that there would be options for the actual file storage (CVS, SVN, Drupal database, file system, NFS, etc.). Again, this might be too much for SoC. Maybe http://drupal.org/project/fileapi would be an appropriate starting point; I'll have to take a look at how it works. Filemanager was decent too. Or, for SoC, simply storing the files as BLOBs might be best, as long as the API allows that to be changed easily if desired.
As far as FTP (SFTP, SCP, other) vs. HTTP upload goes I think that both should be included since I always find myself as the admin wanting to just put files on the server the fastest way and then point Drupal at them vs. the user upload metaphor in a social sharing use case.
For access control I was pondering whether it even made sense to go further and have per-user or per-group permissions at the file or folder level. That sounds hard to administer and probably out of scope for SoC. The best first round is probably similar to what you suggest: view, edit, create, download, modify, etc. of a file or folder.
Thanks for the great followup. We'll see where this goes.
dLd
======================================================
======================================================
Sounds good
rcross - February 26, 2007 - 23:22
I think you are probably right about the scope of the SoC project. The right thing to do would be to build the appropriate abstraction (some of which could be pulled from existing projects) and then focus on a single, simple implementation. As you said, implement the database storage, as long as you provide for other storage methods.
As for access control, I'm still unclear about access control in D5. I thought there was supposed to be a unified ACL-type implementation for all access control that also arbitrates between the different access control modules. The access control modules would then be more about the interface for managing those controls. But I'm not sure how much of this works from inside core (I'd like to think it is in core). I point this out because user/group permissions on a file-by-file or folder-by-folder basis would, I think, be the realm of an access control module. The actual file module (i.e. this one, UDM) would just provide the hooks for the various permissions. No?
Also, the FTP access was meant as an "also do", not an "instead of", so I agree with you.
--Ryan
www.ryancross.com
www.jamescrossinc.com
P.S. The other reason that might be considered is whether the core contributors would actually be interested in having a good file management module included in core. Getting their support (and possible guidance) up front would go a long way toward eventually getting this into core.
======================================================
======================================================
P.S. The other reason that
dldege - February 27, 2007 - 13:16
P.S. The other reason that might be considered is whether the core contributors would actually be interested in having a good file management module included in core. Getting their support (and possible guidance) up front would go a long way toward eventually getting this into core.
I agree, hopefully someone from the core team will read this and provide some input. I'm hoping the SoC facilitators can provide some feedback as well so I can continue to define the goals and scope.
dLd
======================================================
======================================================
I got in touch with dopry
dldege - March 5, 2007 - 11:27
I got in touch with dopry. He's interested in this proposal and in the potential of using his fileapi module as the back-end file system API for the document module. fileapi abstracts out the physical storage of the file using a driver metaphor, which lets you seamlessly swap out the actual storage scheme. For example, there is a driver for PHP file system storage (the basic "store files on the local disk" method), and he's working on one for Amazon S3. Other drivers could be things like SQL BLOB storage, SVN, CVS, etc. It looks promising. I'm waiting to hear back with his thoughts and whether he'd be interested in joining me as a mentor on this project if it's accepted.
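The driver metaphor can be sketched as an abstract interface with interchangeable backends. This Python illustration does not reproduce fileapi's real API, and every name in it is hypothetical:

- the `StorageDriver` class and its `put`/`get`/`delete` methods;
- `LocalDiskDriver`, for the local-disk case;
- `MemoryDriver`, which merely stands in for a database-backed or remote (e.g. Amazon) driver.

Code written against the interface, such as `copy_between`, works unchanged whichever driver is plugged in.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict


class StorageDriver(ABC):
    """Hypothetical driver interface (not fileapi's actual API)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class LocalDiskDriver(StorageDriver):
    """Stores each file under a flat key in one directory."""

    def __init__(self, root: str):
        self.root = root

    def _path(self, key: str) -> str:
        return os.path.join(self.root, key)

    def put(self, key: str, data: bytes) -> None:
        with open(self._path(key), "wb") as fh:
            fh.write(data)

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as fh:
            return fh.read()

    def delete(self, key: str) -> None:
        os.remove(self._path(key))


class MemoryDriver(StorageDriver):
    """Stands in for a database-backed or remote driver in this sketch."""

    def __init__(self):
        self.blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: str) -> bytes:
        return self.blobs[key]

    def delete(self, key: str) -> None:
        del self.blobs[key]


def copy_between(src: StorageDriver, dst: StorageDriver, key: str) -> None:
    # Only the interface is used here, so any pair of drivers works.
    dst.put(key, src.get(key))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        disk, memory = LocalDiskDriver(root), MemoryDriver()
        disk.put("file-1", b"hello")
        copy_between(disk, memory, "file-1")  # e.g. migrating to another backend
        disk.delete("file-1")
        print(memory.get("file-1"))
```

Because keys are flat identifiers, the folder hierarchy stays entirely in the document module, as discussed further down.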
dLd
======================================================
======================================================
A little Confused
rcross - March 5, 2007 - 23:43
I'm a little confused now. If fileapi takes care of the storage abstraction, what else is the purpose of this proposed module? The only thing that comes to mind would be implementing more functions on that abstraction layer, i.e. revisioning and permissions. But I would like to see those things included in fileapi, not in a separate package. So, what else would this module provide?
--Ryan
www.ryancross.com
www.jamescrossinc.com
======================================================
======================================================
fileapi is just a backend
dldege - March 6, 2007 - 17:42
fileapi is just a back-end storage solution; it covers "where" files go on the server. It doesn't actually integrate the file into Drupal in any meaningful way. The document module would introduce the file node, which would have a relationship to some file physically stored by fileapi. (Again, the actual storage 'driver' shouldn't matter to the document module.) It would also offer all the hooks, etc. needed to make the file node a useful entity in Drupal rather than just a file that can be linked to with a URL. The folder node would allow files to be arranged in a hierarchy independently of taxonomies and other facilities in Drupal (which would also still be valid on file/folder node types).
A big distinction is that the document module, not fileapi, would maintain that hierarchy. Fileapi would just be a document store, so it would be easy to change the hierarchy of files without changing their physical storage.
The document extension modules would then respond to document module hooks to add specialized viewing, etc. of specific file types like audio, video, PDF, PPT, and so on.
Here's a screenshot from my proof-of-concept module that shows how the default theming might work for folder/file nodes.
http://12.152.208.71/circadia/files/documents.png
I'll try to draw up a simple block diagram that shows how I think all the parts would work.
UPDATE: http://12.152.208.71/circadia/files/soc.png
dLd
======================================================
======================================================
In the course of my daily
dldege - March 5, 2007 - 12:49
In the course of my daily feed reading I came across this post, http://www.chapterthreellc.com/node/42, about building a digital library of sorts in Drupal. The post shows the inherent flexibility of Drupal and contributed modules. Still, I think it makes a good use case for how this document system could simplify and improve Drupal for any type of document sharing. It's also a good example of documents that are more than just file attachments.
dLd
======================================================
======================================================
I just came across the acl
dldege - March 8, 2007 - 13:39
I just came across the acl project. It could potentially be leveraged for very flexible file/folder permissions, leaving the document module to worry only about simple, customary Drupal permissions.
dLd
======================================================
======================================================
Scale back the scope.
robertDouglass - March 9, 2007 - 08:10
It's too ambitious. It needs to be smaller. And I need dopry to go over it with a fine-toothed comb and edit/endorse it. Thanks.
- Robert Douglass
-----
Lullabot | my Drupal book
======================================================
======================================================
I thought I was fairly
dldege - March 9, 2007 - 12:12
I thought it was fairly doable, with the emphasis on allowing for extensibility. I'm not suggesting that the Summer of Code project implement any extensions (like the audio or video module or the acl support; that was just brainstorming). I'm just suggesting that the hooks be there to allow this, while the document module runs self-contained as a basic file/folder node system. The proposal probably needs to enumerate the goals a little better.
I already have a module started that implements the basic design, with both node types defined and the basic file/folder hierarchy database management in place. A few example extension hooks are also implemented. The major missing component is the actual file upload/storage support, which is where fileapi comes in. The student would need to marry the document module with fileapi (perhaps requiring fileapi changes). For fileapi they would only use the local storage driver (already part of fileapi) and not work with more complicated drivers such as S3, StreamLoad, etc. In other words, the student would start out with a fair amount of usable code.
Please let me know what items you feel are too ambitious or out of scope. dopry is also planning to review and weigh in on the design as soon as he has time.
Thanks for the input.
dLd
======================================================
======================================================
Still confused
rcross - March 11, 2007 - 23:56
This basically boils down to providing hierarchical structure on top of fileapi. I think that is either unnecessary (it could be incorporated into fileapi instead of being a separate module) or just wrong. I think this module started out with great possibilities, but it sounds like fileapi is the appropriate place for 99% of this. Please correct me if I'm wrong. Either way, it definitely sounds like the proposal needs to be rewritten.
--Ryan
www.ryancross.com
www.jamescrossinc.com
======================================================
======================================================
Goals
dldege - March 12, 2007 - 10:42
GOAL 1 - File Nodes
The document module introduces hierarchical file/folder node types. It moves files into Drupal as nodes instead of ancillary objects (attachments). As such, the nodes (files/folders) can take advantage of RSS, taxonomy, search, node access, and any other facilities built into Drupal for nodes. fileapi does not define nodes; it's simply a utility module to be used for file management. As it stands, it's simply a possible replacement for the core file API functions.
GOAL 2 - File Storage
Files need to be stored somewhere. The document module could roll its own file storage, but why not use something existing if possible? It could use the core file API, but that API is very basic and not well suited to this hierarchical node approach. It was your suggestion (and a good one) for me to check out fileapi. It would be perfect as the back-end file storage API for the document module because, as we discussed, it abstracts out the physical storage used and provides a single API. It would also work as a flat file store, which leaves hierarchy management to document.module. This is a good thing for many reasons, such as allowing quick and easy folder renames, moves, etc. without physically changing any of the stored files. fileapi can also continue to stand on its own for other module developers with other specialized tasks.
GOAL 3 - File Type Customization
Specialized file type handling will be desired: videos, audio, PDF, OpenOffice, etc. The document.module does no special handling of any file type. It only provides the upload, storage, and organization of your files into nodes. To allow customization, other modules, which I'm calling "document extension modules", could add the specialized handling. This works provided the document module offers a decent enough API and hooks to let module authors tweak the needed aspects of a document node at all life-cycle points. This frees these developers to code features for the file type while relying on document for the actual file handling. These modules no longer need to create their own node types.
GOAL 4 - Presentation Customization (Theme)
Users will want to control how the file/folder information is displayed and how interactive it is. Robust theme support is a must for this, as it is for all well-designed modules.
The end product is a unification of how documents are created and shared in Drupal. This would simplify things for site admins and module developers.
dLd
======================================================
======================================================
Dries weighs in
dldege - March 17, 2007 - 10:03
Please see the current discussion with Dries regarding this proposal which will probably lead to a refinement of the design and goals.
http://buytaert.net/suggestions-for-drupal-core#comment-1360
dLd
======================================================
======================================================
yep
dman - March 17, 2007 - 11:21
I've had to address this task several times now.
I've worked on filebrowser extensions as well as a few custom modules to assist better attachment management, 'attach existing', 'detach', 'browse'. And a filter to scan pages for direct file links and enhance them with (size/icon) decorations and redirects to wrapper nodes.
As soon as the requirements become interesting, I recommend dealing with attachment documents as node types. As described in this proposal - it gives us all the other advantages of taxonomy classification. Not to mention access control and versioning.
My biggest annoyance so far with doc management in Drupal has been its tendency to take ownership of the file naming and storage of everything under /files. I do a lot of integrating from legacy sites, and I don't want a filesystem API that keeps renaming or trying to move my current files. So my requirement from this project is seamless integration with the existing site structure and storage pattern. Manage the metadata with me, for sure, but don't tell me where the files have to be stored!
As for the proposal ... well, I don't actually see the folder/hierarchy feature as being at all necessary. Once you are managing file nodes properly, this functionality should come from other existing node management features. So if we wanted to trim it down to a doable project, skip that till later.
The use of this new flavour of files should probably parallel the current way of adding an upload as much as possible. Ideally, it should look exactly the same, with a few optional extra features. Perhaps the existing node_file system could even be hijacked to use this. Perhaps not; it's looking very dated.
.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/
======================================================
======================================================
Yes, there seems a lot of
dldege - March 19, 2007 - 15:41
Yes, there seem to be a lot of compelling reasons why files should become nodes, especially when the file is the main content being shared. I can go either way on it, but as a programmer it seems logical to use the core API object (a node) for files. Here are some notes on files as nodes compiled by another SoC person.
http://kzo.net/log/handling-images-documents-and-files-in-drupal
As for the proposal ... well, I don't actually see the folder/hierarchy feature as being at all necessary. Once you are managing file nodes properly, this functionality should come from other existing node management features. So if we wanted to trim it down to a doable project, skip that till later.
I agree this can be removed, especially with Dries promoting the development of generic node trees support. That + taxonomy should give you all you need for organizing.
Here is my current thinking of how things could work.
1. A file manager that allows for files in Drupal that aren't nodes but aren't just node attachments either.
2. A Drupal interface for managing the above.
3. A method for attaching files to a node from the above file store - not the current upload attach method.
4. A specialized file node (my current proposal) that, instead of having you upload the file there, lets you browse the file store, attach a single file (see #3), and then uses the document API hooks I'm proposing to allow document extension modules to specialize the display of the file. Maybe 3 and 4 are all part of one file attachment module and a file node is still not necessary?
5. Remove the folder node from the proposal, don't address the file/folder hierarchy, and allow that to be addressed by a separate project: a generalized node tree module.
dLd
======================================================
======================================================
Node trees
dldege - March 19, 2007 - 15:48
which, I see, already has a SoC proposal :)
dLd
======================================================
======================================================
What about webdav ?
flunardelli - March 19, 2007 - 08:24
Maybe a server/client WebDAV implementation, or integration with an existing server (Subversion, mod_dav, Jakarta Slide), could be a great solution.
A WebDAV core module could open a universe of possibilities, such as a content repository or a CalDAV calendar service.
There is some work in progress: http://drupal.org/node/26292
======================================================
======================================================
I'm interested
nic28 - March 24, 2007 - 09:31
Hi everyone.
I'm a student and I'm very interested in working on this project for SoC 2007. Here are some of my thoughts:
file storage abstraction with fileapi:
Sounds good. In the end, this would mean that the document module would be smaller, thus more maintainable. And I think the separation of these two systems is totally logical.
Revisions:
I think that supporting revisions is very important for this system to be successful. Supporting a full-fledged version control system could be very useful, but it definitely has drawbacks. In any case, as long as we have a single API to access previous versions of files, we have the flexibility to modify the underlying way revisions are stored, and possibly add new methods in the future. For a start, I think the regular Drupal "check to create a new revision" would be good.
Referencing:
I think the idea of being able to put documents inside regular nodes (with an "[inline:document/test/test.jpg]"-like syntax, as mentioned in the proposal) is a very powerful concept. Just imagine: you could have an image module that displays many different file formats by transcoding them to a regular web format. So you could simply write "[inline:my_image.svg]" and the image module would create a nice GIF or JPEG of your SVG file to be displayed in your page. Also, browsers will hopefully support native SVG rendering in the future. If that day comes, the image module can simply be modified to output the SVG code without creating the bitmap beforehand, and site creators wouldn't have to change anything about their Drupal application at all (except to update their image modules)!
Idea:
I think it could be useful if you could simply point Drupal to a certain folder on your server and have it automatically register all the correct file and folder nodes representing that folder tree. Then, if you *changed* the folder structure on the server, you could run an update script (accessible through a button) and Drupal would update which *files* are linked to which *folders*. The problem here would be: if the file /flower/rose.jpg was renamed to /misc/rose1.jpg, how do we know that the new /misc/rose1.jpg is the old /flower/rose.jpg?
Use case: Jim received an archive file full of images from his buddy Joe, who just returned from a LAN party. Joe has already organized the images in folders. Jim is lazy and doesn't want to tell Drupal exactly which files are in which folders, so he just untars the archive and gives it to Drupal, which creates all the file and folder nodes.
Now, maybe allowing users to subsequently edit the local file hierarchy isn't as useful, but I think we should at least allow users to upload a folder with all its files and subfolders automatically.
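Paths alone cannot answer the rename question above. One possible approach, sketched here under the assumption that a moved file keeps identical contents, is to record a content hash for every file at registration and, on the next scan, pair paths that disappeared with new paths carrying the same hash. The helper names (`scan`, `detect_moves`) are hypothetical and not part of any existing Drupal module.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """MD5 hex digest of a file's contents, read in 64 KiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root: Path) -> Dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def detect_moves(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Pair paths that vanished since the last scan with new paths of identical content."""
    added_by_digest: Dict[str, List[str]] = {}
    for path, digest in new.items():
        if path not in old:
            added_by_digest.setdefault(digest, []).append(path)
    moves: Dict[str, str] = {}
    for path, digest in old.items():
        if path in new:
            continue  # still present at the same path
        candidates = added_by_digest.get(digest)
        if candidates:
            moves[path] = candidates.pop(0)
    return moves
```

With `old = {"flower/rose.jpg": d}` and `new = {"misc/rose1.jpg": d}`, `detect_moves` pairs the two paths, so the existing file node could be re-pointed instead of recreated. A file that is edited and moved at the same time still looks like a delete plus an add.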
Any comments?
Nicolas
======================================================
======================================================
input
dman - March 24, 2007 - 11:43
Here are some partial pieces you may be interested in:
The filebrowser_extensions module provides a browser and renderer to explore directory trees, either as a GUI extension or embedded in the middle of the page.
The directory lister you suggest I've already implemented, by adapting the upload module to allow you to attach a directory as an attachment (using the existing 'files' table). When displayed, it renders all the files currently in the directory. No 'update' needed.
Also, the file_filter.module there contains a few patterns for re-finding lost/moved content. It doesn't even require a custom tag; it just reads standard HTML, detects links to resource files, and manipulates them as a filter.
Those versions of the code are for 4.7. I'd be interested in seeing whether someone else wants to pull off the 'attach' sort of functionality again. The internals of the way that works in Drupal currently are horrid.
.dan.
======================================================
======================================================
very interesting
nic28 - March 24, 2007 - 14:28
Looks like there could be some useful stuff in there. The filebrowser_extensions in particular I could most probably use. I'll make sure to look into all your modules before writing anything if I'm selected for this project.
Nicolas
======================================================
======================================================
Thanks for the interest
dldege - March 26, 2007 - 10:29
nic28, thanks for your interest and input.
I definitely think fileapi is the place to start for the underlying file storage. Dries is already in support of a model similar to this for core: for each file uploaded, the user has the ability to choose where/how it's stored (this is the driver model in fileAPI). I also think fileAPI should be reworked a bit to implement its API as the bucket + key method used by Amazon S3 for all drivers. This allows the user/developer to fully control the organization of the files without implementing a hierarchy. I was originally proposing hierarchy management in the document module, but I can now see why this should be left to another module. Consider folder officially removed from the proposal. In any case, the file storage should not need to be touched when the hierarchy changes; this will make things much simpler and faster.
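A minimal sketch of what a bucket + key driver could look like, assuming a local-disk backend. `LocalBucketStore` and its `put`/`get`/`delete` methods are hypothetical names, not fileAPI's actual interface, and hashing the key into the on-disk name is one design choice among several.

```python
import hashlib
import re
from pathlib import Path

BUCKET_NAME = re.compile(r"[a-z0-9_-]+")


class LocalBucketStore:
    """Hypothetical local-disk driver: objects are addressed only by (bucket, key).

    A key may look like a path ("document/test/test.jpg"), but it is hashed
    into a flat file name, so regrouping documents in the Drupal database
    never requires touching what is on disk.
    """

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, bucket: str, key: str) -> Path:
        # Restrict bucket names so they cannot escape the storage root.
        if not BUCKET_NAME.fullmatch(bucket):
            raise ValueError(f"invalid bucket name: {bucket!r}")
        name = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return self.root / bucket / name

    def put(self, bucket: str, key: str, data: bytes) -> None:
        path = self._path(bucket, key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, bucket: str, key: str) -> bytes:
        return self._path(bucket, key).read_bytes()

    def delete(self, bucket: str, key: str) -> None:
        self._path(bucket, key).unlink(missing_ok=True)
```

Hashing keeps storage flat at the cost of human-readable names on disk, which is exactly the trade-off some participants in this thread object to; a driver could instead store the key verbatim.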
I'm on the fence about file revisions. I'm not sure we need a full blown system like say what SharePoint tries to do. It might be better to not have revisions in the core file manager but allow the file node module to do this in conjunction with the standard node revision system.
I like the idea of an input filter for the file store, especially if extension modules are allowed to hook into it so that you can get rendered output specific to the file type. For example,
[file:document/test/test.jpg] might result in inline img output of a thumbnail (say, if imagecache was hooked into the filter), and [file:document/test/test.mpg] could go to an extension module that did something specific for video. The input filter is the best way to have fine control over the node body content and would be far more flexible than the basic attachment model or CCK fields, which would require theming to do the same layout. I'm a fan of CCK, but I don't think it's the hammer for every nail in Drupal these days. In this model, the input filter would be mostly for appearance, whereas a CCK node reference could let a developer make a more concrete association between two nodes for a given application.
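The per-type dispatch described above could be sketched as a regex-based filter that looks up a renderer by file extension and falls back to a plain link. This is an illustration in Python, not Drupal's filter hook API; `RENDERERS`, `register`, and the `/thumbs/` URL (a stand-in for an imagecache-style module) are all hypothetical.

```python
import html
import re
from pathlib import PurePosixPath
from typing import Callable, Dict

FILE_TAG = re.compile(r"\[file:([^\]]+)\]")

# Hypothetical registry: extension modules claim file types by extension.
RENDERERS: Dict[str, Callable[[str], str]] = {}


def register(ext: str, renderer: Callable[[str], str]) -> None:
    RENDERERS[ext.lower()] = renderer


def default_renderer(path: str) -> str:
    """Fallback when no extension module claims the type: a plain link."""
    safe = html.escape(path, quote=True)
    return f'<a href="/{safe}">{safe}</a>'


def filter_body(text: str) -> str:
    """Replace each [file:...] tag with output from the renderer for its extension."""

    def replace(match: "re.Match[str]") -> str:
        path = match.group(1).strip()
        ext = PurePosixPath(path).suffix.lstrip(".").lower()
        return RENDERERS.get(ext, default_renderer)(path)

    return FILE_TAG.sub(replace, text)


# Stand-in for an imagecache-style module: JPEGs become inline thumbnails.
register("jpg", lambda p: f'<img src="/thumbs/{html.escape(p, quote=True)}" alt="">')
```

Here `[file:document/test/test.jpg]` becomes an `<img>` tag while `[file:document/test/test.mpg]` falls back to a link until a video module registers `mpg`.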
Rather than pointing to an existing folder, it might be better to have some sort of mass import. For example, at fluxiom you can upload a .zip file which gets exploded, and each file is added. You could provide this for an existing zip/tar on the server and eliminate the upload step. The reason I suggest this is that I feel the key + bucket approach is better than managing files/folders on disk.
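A mass import along those lines could start from the archive rather than a live folder. The sketch below uses Python's standard `zipfile` module; `store` stands for whatever hypothetical call writes one object into the file store, and reusing each member's path inside the archive as its key is an assumption. Tar support (via `tarfile`) is left out.

```python
import zipfile
from pathlib import PurePosixPath
from typing import Callable, List


def import_archive(zip_path: str, store: Callable[[str, bytes], None]) -> List[str]:
    """Explode an archive, passing each member to store(key, data); return the keys."""
    imported: List[str] = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # folders become key prefixes, not stored objects
            key = PurePosixPath(info.filename)
            # Never trust archive paths: skip absolute names and ".." segments.
            if key.is_absolute() or ".." in key.parts:
                continue
            store(key.as_posix(), zf.read(info))
            imported.append(key.as_posix())
    return imported
```

Because folder names survive as key prefixes, the organization Joe already did could later be mapped onto taxonomy or a node tree without re-uploading anything.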
dLd
======================================================
======================================================
My wishlist
alexh - March 24, 2007 - 12:05
Hello to everyone,
I started reading this with great interest as I am about to develop with Drupal an intranet application which needs a lot of document management - Drupal's weakest point, in my opinion. I like the proposal, but:
* I don't see a need for folders - that can be done with node relativity module.
* I don't see a need for files as nodes - I'd rather see files as CCK fields (there is already a module, but it is too basic, not really working, and not available for 5)
These are my additional ideas/wishes for a file/document module:
* Start from filefield module.
* Add support for private download method (then you have access control)
* Add per field setting for the root directory used to store the files
* Store the original filename in a database table, but in the filesystem use the fid + extension, e.g. the file "image.jpg" with fid 1 is stored as 1.jpg (this allows duplicate filenames in the system; the download could provide the original, a cleaned-up, or an otherwise derived filename)
* Store the files in separate sub-directories to avoid overly long file listings, e.g. fids 1-100 in sub-directory root/0, 101-200 in root/1, etc., or something similar
* Add an optional duplicate file check using an MD5 hash (this might sound like a strange wish in combination with allowing duplicate filenames, but it makes sense in a more closed environment where different people share documents)
* Another nice-to-have: support for resuming downloads
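The naming and sub-directory scheme in the list above comes down to simple arithmetic on the fid. A small sketch, assuming a 100-files-per-directory split as in the example, plus an `md5_index` lookup (hypothetical; in practice probably a database column) for the optional duplicate check:

```python
import hashlib
from pathlib import PurePosixPath
from typing import Dict, Optional

FILES_PER_DIR = 100  # fids 1-100 -> root/0, 101-200 -> root/1, ...


def storage_path(root: str, fid: int, original_name: str) -> str:
    """On-disk path: sub-directory chosen by fid range, file named fid + extension."""
    if fid < 1:
        raise ValueError("fid must be a positive integer")
    subdir = (fid - 1) // FILES_PER_DIR
    ext = PurePosixPath(original_name).suffix  # ".jpg"; empty if the name has none
    return f"{root}/{subdir}/{fid}{ext}"


def find_duplicate(data: bytes, md5_index: Dict[str, int]) -> Optional[int]:
    """Return the fid of an already-stored file with identical content, if any."""
    return md5_index.get(hashlib.md5(data).hexdigest())
```

The original filename never appears on disk; it stays in the database so the download can hand it back to the user.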
Maybe some of these points make sense to some of you? Just my ideas....
Alex
======================================================
======================================================
Yes, when the documents
dldege - March 26, 2007 - 15:39
Yes, when the documents become the focus, the current Drupal system is not very robust. Based on the feedback so far, the best approach seems to be a system that can handle both cases: documents as simple attachments and documents as more full-blown content.
Folder has been removed; I agree this can be done with other node organizing tools and/or the new node tree module if it's developed.
I don't think files need to be nodes at the core level but I DO see cases for a file node module for turning the files in the Drupal file store into "nodeable" content that can be put into taxonomies, RSS feeds, search, and the like.
Again, my current thinking is
1. A file manager that allows for files in Drupal that aren't nodes but aren't just node attachments either.
2. A Drupal interface for managing the above.
3. A method for attaching files to a node from the above file store (not the current upload attach method), and/or an input filter for putting file content into the node body, etc.
4. A specialized file node (my current proposal) that, instead of having you upload the file there, lets you browse the file store, attach a single file (see #3), and then uses the document API hooks I'm proposing to allow document extension modules to specialize the display of the file. This could be optional.
5. Remove the folder node from the proposal, don't address the file/folder hierarchy, and allow that to be addressed by a separate project: a generalized node tree module.
I like CCK, but I think it adds a lot of overhead and complexity for the average user. Another thing I don't like: say you want one node with 5 files displayed and the next one with only 1. How do you set up CCK to do this? The input filter idea would not require any fixed number of "fields".
Your file storage suggestion is how filemanager works. I used it for one project and it was good, but it had some complexity I didn't need. I definitely think the names/hierarchies, etc. need to be in the Drupal database and the files in a flat bin/bucket storage system, fileAPI or filemanager style.
I like your other ideas: private or one-time downloads, resuming, and MD5 if it's not a big performance issue. One thing to keep in mind is allowing files, when applicable, to be transferred by Apache rather than going through Drupal, which requires a lot more processing.
Keep up the discussion - thanks.
dLd
======================================================
======================================================
The filemanager storage
dman - March 26, 2007 - 16:53
The filemanager storage bucket approach is precisely the reason I abandoned it.
Files are hard enough to find as it is without the system obfuscating my filenames into a random hash. For me, Drupal is not the only interface I'll ever use to manage my site, and I don't want it withholding my file info.
So - allow for arbitrary 'save as' filenames - within bounds.
And/or
Provide a filename template using tokens for sites that wish to customize it
$filepath = "%filespath/%user/%suffix/%nid-%filename-%date.%suffix"
to get
files/mark/pdf/1192-layout_diagram-2007-03-23.pdf
Several other modules have tried to hack in renaming storage. I say do it with a token string like this.
Note that even the files path is configurable.
Maybe create profiles to allow different users to have different file storage roots. The way the system currently rewrites paths into /files/ is (IMO) the worst bit of the problem.
(because my tasks have been importing/overlaying legacy sites with Drupal replacements without breaking any extant links to files, e.g. external bookmarks to /documents/2005/report.pdf, and associating the new CMS pages with them)
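A token template like the one above is straightforward to expand. A minimal sketch, assuming each token is a run of lowercase letters after `%` and that the values are already sanitized (no `..` or stray slashes); `expand` is a standalone illustration, not an existing Drupal API.

```python
import re
from datetime import date
from typing import Dict

TOKEN = re.compile(r"%([a-z]+)")


def expand(template: str, values: Dict[str, str]) -> str:
    """Replace %token placeholders; unknown tokens are left untouched."""
    return TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), template)


template = "%filespath/%user/%suffix/%nid-%filename-%date.%suffix"
values = {
    "filespath": "files",
    "user": "mark",
    "suffix": "pdf",
    "nid": "1192",
    "filename": "layout_diagram",
    "date": date(2007, 3, 23).isoformat(),
}
print(expand(template, values))  # files/mark/pdf/1192-layout_diagram-2007-03-23.pdf
```

Since the greedy `[a-z]+` match stops at the first non-letter, `%filespath` and `%filename` never collide, and a site can rearrange its storage layout by changing only the template string.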
.dan.
======================================================
======================================================
CCK: you like it - I love it!!! ;-)
alexh - March 26, 2007 - 14:49
Maybe I should describe a little bit the background for which I intend to use such a document management module, because it is not the average website. It's rather an extranet of an international organisation with lots of people working on lots of documents (drafting, commenting, discussing in meetings, revising, approving, publishing, etc.). Currently, we use a (commercial) product which offers a folder hierarchy to do this on the web. The main problem, is that we have lots of deep folder structures, nobody having complete overview, with documents buried somewhere, probably many duplicates...
From my perspective, CCK is wonderful in connection with document management because it lets you define any kind of meta-information fields for your files, e.g. author (not the Drupal author field, which is rather the user uploading the document), date of preparation, abstract, etc. I can also relate the documents to other nodes, e.g. meetings (events), with the node_reference field. Also, I like to use taxonomy to categorize my documents and build the navigation on that; that's why I wouldn't like to have files separated from nodes. If you have lots (thousands) of files, you need the power of nodes, CCK, and taxonomy to structure them; otherwise you get lost.
BTW, in this context, I would also like to see a module that can relate documents to each other, where relations between documents can have attributes, e.g. doc A "is annex of" doc B, doc C "replaces" doc B, doc D "references" doc C, etc. I know the relationship module does that, but it is not yet available for Drupal 5 and has lots of bugs and too much added complexity.
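The attribute-carrying relations described here amount to a small typed, directed graph. This sketch covers the data model only, using a hypothetical `DocumentRelations` class keyed by document id. It does not reflect how the relationship module stores its data.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class DocumentRelations:
    """Hypothetical store of typed, directed relations between document ids."""

    def __init__(self) -> None:
        # source id -> {(relation, target id), ...}
        self._edges: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def relate(self, source: str, relation: str, target: str) -> None:
        self._edges[source].add((relation, target))

    def targets(self, source: str, relation: str) -> Set[str]:
        """Documents that `source` points to with the given relation."""
        return {t for r, t in self._edges.get(source, set()) if r == relation}

    def sources(self, relation: str, target: str) -> Set[str]:
        """Documents pointing at `target` with the given relation (e.g. what replaces B)."""
        return {s for s, edges in self._edges.items() if (relation, target) in edges}
```

With the three example relations recorded, `sources("replaces", "B")` answers "which document supersedes B?" without any extra fields on the documents themselves.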
And yes, CCK filefield can handle a variable number of files per node: just check "multiple values" when defining it. Database-wise, it does this with a separate table for that field referencing nid and vid (but revisions still don't work).
But you are probably right that filemanager (with the attachment module) is closer to what I described (private files, duplicate names); I will take a close look at it.
======================================================
======================================================
I noticed a typo in my
dldege - March 26, 2007 - 15:44
I noticed a typo in my previous reply and it should have read
I don't think files need to be nodes at the core level but I DO see cases for a file node module for turning the files in the Drupal file store into "nodeable" content that can be put into taxonomies, RSS feeds, search, and the like.
I think we need files in a flat file store that can be reused over and over again in node content and/or be nodes themselves if desired.
Don't get me wrong: CCK is great for what you describe, but you need a node to do it. A file as a node makes sense here. You are definitely thinking it through and seem to get all the nuances.
dLd
======================================================
======================================================
File nodes – how would we use them?
hbfkf - March 29, 2007 - 03:52
I am thinking about the consequences of implementing files as nodes. One question that is coming up is how we would make use of files (= file nodes) then? Probably somehow like this:
1. WYSIWYG editors would allow me to browse file nodes (of a specific type like "image", for instance). I would select one and the editor "inserts" it.
2. Attachments: same story here. There would have to be a way to browse file nodes and select one/several. It would also be nice to be able to create an attachment (a file node) inline so I do not have to go through two steps (create file node, then attach).
3. Etc.
Both examples call for a standard way to select/find/browse file nodes (rather than every module implementing its own way).
From the two examples I get the feeling that it would be nice if file nodes provided different representations of their content. The browsing mechanism would then allow me to choose one particular representation like:
* a "small thumbnail"
* an "image preview of a PDF"
* the "file as a link"
* etc.
Maybe the Unified Document Management should be designed with a "filtering mechanism" in mind. I could add "filters" to a document type that allow the document to be "rendered" in different versions (like above: "render to small thumbnail 60px width at most", etc.), and when a node is referenced, you can choose from the available filters (maybe there is only one): attachments would use the "file as a link" filter, while a WYSIWYG editor would use, for image file nodes, a filter that outputs an image.
Some comments above mention such a filtering mechanism, I see. Has anybody thought about the concrete details? For instance: are there Drupal mechanisms that allow such filtering and that we could reuse?
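One way to picture the concrete details is a registry of named representations per document type, with "file as a link" always available as the fallback. A sketch under assumptions: the document dictionary shape and the function names (`add_filter`, `available`, `render`) are hypothetical and do not correspond to an existing Drupal mechanism.

```python
import html
from typing import Callable, Dict, List

Doc = Dict[str, str]  # hypothetical shape: {"type": ..., "url": ..., "title": ...}
Renderer = Callable[[Doc], str]

_FILTERS: Dict[str, Dict[str, Renderer]] = {}


def as_link(doc: Doc) -> str:
    """The "file as a link" representation, available for every type."""
    return f'<a href="{html.escape(doc["url"], quote=True)}">{html.escape(doc["title"])}</a>'


def add_filter(doc_type: str, name: str, renderer: Renderer) -> None:
    _FILTERS.setdefault(doc_type, {})[name] = renderer


def available(doc_type: str) -> List[str]:
    """Representations a file browser or WYSIWYG editor could offer for this type."""
    return sorted({"file as a link", *_FILTERS.get(doc_type, {})})


def render(doc: Doc, name: str) -> str:
    """Render doc with the named filter, falling back to a plain link."""
    renderer = _FILTERS.get(doc["type"], {}).get(name)
    return renderer(doc) if renderer else as_link(doc)


add_filter(
    "image",
    "small thumbnail",
    lambda d: f'<img src="{html.escape(d["url"], quote=True)}" style="max-width:60px" alt="">',
)
```

A browsing widget would call `available()` to list the choices, and the attachment list or editor would call `render()` with whichever name the user picked.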
Kaspar
P.S. This is actually independent of whether or not files are implemented as nodes: Even if there is, as dldege suggests on comment http://drupal.org/node/120677#comment-212764, only an option to have files as nodes, we could use such a filtering mechanism.
======================================================
======================================================
Why is WebFM being ignored?
robmilne - March 31, 2007 - 20:40
I'm the developer of WebFM and I just happened to stumble on to this thread. I'm amazed that my solution to most of these requirements is being completely ignored.
GOAL 1 - File Nodes:
I agree with Dries but for different reasons. I too originally envisioned files as nodes but realized the folly of reorganizing something that is already well organized. Do you really think that you can do a better job of organizing/accessing files than the native filesystem on the server? Synchronizing the database representation of files with a shared hierarchical filesystem is potentially problematic, but I believe the concurrency issues can be dealt with by using PHP's sysvsem extension to create file locks. Users that don't have access to sysvsem should probably get a constrained model of access. WebFM is currently in this mode since it is an admin tool only. My current plans for the module mostly revolve around permissions, so that the module will be available to users to manage their own subsets of the filesystem.
GOAL 2 - File Storage
I have made my bed with the existing file.inc functions, which depend on the universal filesystem path set in settings. This doesn't mean I have to like it. The webfm_file.inc file could be expanded to dispense with file.inc so that fileroot could be located outside the webroot. This is vastly superior in terms of security. Optionally, all files could be base64 encoded with a PHP 404 header at the top and an embedded die() call to prevent any possible trojan. I've yet to test the loss of responsiveness with this scenario.
GOAL 3 - File Type Customization
Never has it been more possible to initiate a specific action on a specific file extension than it is with WebFM. Check out this extension of WebFM to see how easily it can run processes on uploaded files. The power here is significant and potentially devastating to a system, so it must be handled appropriately.
GOAL 4 - Presentation Customization (Theme)
Currently contained inside webfm.css since the interface is dynamically built by javascript via the DOM.
I've never lobbied to have this module put into core because of its non-degradable nature (JavaScript is absolutely required), but it does deserve to be considered for its merits, which IMHO are many. My first investigations of file managers were of pure PHP implementations, but the interfaces were so overwhelmingly kludgy and slow that I jettisoned this tenet of Drupal faith. If I could do anything differently, I might have tried to put WebFM on top of DAV, but then again I'd still have to rely on non-standard PHP extensions being compiled (sysvsem presents the same issue).
======================================================
======================================================
Honestly, when I made the
dldege - April 3, 2007 - 17:46
Honestly, when I made the proposal I didn't know about WebFM. Since then I have been introduced to it. I agree there is a lot of overlap, and I've checked out your WebFM demo; it's very cool. It is not a case of being ignored or of slighting your work in any way.
I honestly think there are use cases for files as nodes, or at least a file node that has a 1-to-1 attachment to a single file while files themselves are not nodes. I mentioned the benefits of this: RSS, taxonomy, and other node services. I'm not saying I can organize the content better than the filesystem, but rather that the files shouldn't have to be organized on the filesystem. That storage could be flat, bin-based, or whatever. The organization could be done in the Drupal database with node hierarchies, taxonomy, etc. I'm not 100% sure it's a great idea or will even work well; it's simply that: an idea that sounds good to me conceptually.
For customization I'm proposing a different way of going about it using module hooks but it does look like your extensions are very similar in their usage. I saw files being viewed in drupal as nodes that could be customized (audio, video, etc.) not just links out to the content or formatted pages of content. I was also thinking of an input filter for easy inline linking of any file node that could also be custom per type.
One other nice thing about files as nodes is that they can be browsed and managed in the normal node page way without any javascript requirements because they are just content like any other content. I also saw this as being for document sharing or some user contributed use case - not just a site admin tool.
So, there is a lot of overlap, some differences - maybe we could conspire to get the best of both worlds?
Regardless, this proposal generated some good discussion, but it's not gaining a lot of traction toward being selected.
dLd
======================================================
======================================================
My goal
robmilne - April 5, 2007 - 23:04
My goal is to provide a more intuitive organizing principle to large quantities of documents - more intuitive because it is how people presently keep track of large quantities of data on their own systems. Taxonomy is all fine and good (especially for multiple associations) but I don't want to organize my pc hardrive with taxonomy. Taxonomy can never be as efficient as the native filesystem scheme since it is encumbered by database latencies (not to mention possible vagaries). The metadata component of webfm is fledgling but it could definitely link into taxonomy.
I don't believe that linking a file type to a specific behaviour requires making the file a node. I'm just not enslaved by the concept of nodes. I've made many custom modules that do not rely on whole node representations. For example: In a case where one needs to present a custom page that accesses multiple types of nodes and or clickable elements. This is a common case where the page is determined by path (hook_menu) rather than node#. Clicking on a file representation (either via javascript or php form element) can generate a programmatic response instead of a node presentation response. This way we aren't locked into a specific behaviour for a specific file type. It all depends on how clever the developer is to link a process to the user choice of a file.
======================================================
======================================================
worth a look...
mbria - June 8, 2007 - 05:03
I have to say that before testing webfm I was unsure about its stability and usability.
Now I can say it's a jewel that is worth a look.
I like the approach suggested at the beginning of this thread of taking advantage of other Drupal modules (taxonomy, views, cck, actions, workflow...), but understanding nodes as a wrapper for files is not against this idea.
Let me explain:
Imagine you are an author developing a new article for a magazine. You have version 1 of the document, and you upload it to your "unified repository", add metadata, and so on. Then you extend the document to version 2. Do we need two nodes for this? It's good to store both files in the filesystem, and we can enable/disable revisions if we like, but this is just a version upgrade, so it would be nice if metadata were handled in the node layer while file management is done by webfm.
To make this possible, it seems obvious to me that webFM needs a kind of "versioning system" and probably "check-in/check-out" flags, but in any case this proposal requires a "filesystem wrapper", and webFM could be the one.
robmilne, my only concern is about the new file management APIs in Drupal 6... does or will webfm follow those APIs? Are you talking with core developers to join efforts?
Back to the node/file discussion: I believe it's essential to follow Drupal's main philosophy, "Everything is a node, and the node is everything", but it's logical to think that the best way to store files is the filesystem, and there (plus offering a fast and usable interface) is where webFM can help.
Any advances with the project?
======================================================
======================================================
Versioning
robmilne - June 16, 2007 - 08:28
I am beginning to think seriously about file versioning, and it could be relatively simple to do via fid swapping with back links to the parent and forward links to the child. The beauty of this scheme is the inheritance of attachment/metadata and the runtime nature of determining lineage. Permissions work will precede this (2.x).
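One reading of the link-based versioning idea, sketched under assumptions: each version is a separate fid with a back link to its parent and a forward link to its child, and lineage is walked at runtime. The fid swapping mentioned above (which is what lets attachments and metadata carry over) is not modelled here, and `VersionChain` is a hypothetical name, not part of WebFM.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileVersion:
    fid: int
    parent: Optional[int] = None  # back link to the previous version
    child: Optional[int] = None   # forward link to the next version


class VersionChain:
    """Hypothetical version lineage kept as parent/child links between fids."""

    def __init__(self) -> None:
        self.files: Dict[int, FileVersion] = {}

    def add(self, fid: int) -> None:
        self.files[fid] = FileVersion(fid)

    def new_version(self, current: int, new_fid: int) -> None:
        """Record new_fid as the successor of current (only the newest can be superseded)."""
        if self.files[current].child is not None:
            raise ValueError(f"fid {current} already has a newer version")
        self.files[new_fid] = FileVersion(new_fid, parent=current)
        self.files[current].child = new_fid

    def latest(self, fid: int) -> int:
        """Follow forward links, so a reference to any version resolves to the newest."""
        while self.files[fid].child is not None:
            fid = self.files[fid].child
        return fid

    def lineage(self, fid: int) -> List[int]:
        """Whole history, oldest first, computed at runtime from the links."""
        while self.files[fid].parent is not None:
            fid = self.files[fid].parent
        chain = [fid]
        while self.files[fid].child is not None:
            fid = self.files[fid].child
            chain.append(fid)
        return chain
```

An attachment that stores fid 1 could call `latest(1)` at display time to pick up newer versions without being rewritten, which approximates the inheritance described above.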
I haven't really looked at 6.0 yet and how jquery/fapi changes will affect WebFM. Currently the module is standalone (can coexist with a flat filesys) and I don't see why it can't remain that way. The limitation of this approach of course is integration with other contrib modules. Interestingly there have been no inquiries about my module from core developers.
======================================================
======================================================
I like the idea
dikini - April 1, 2007 - 09:02
+1
Not sure about the folders as nodes
Why not reuse taxonomy, like, for example forum?
Ok, there are benefits to both approaches, but still, what is your take on that?
Update
You can check the code I used to do a directory browser using taxonomy terms for directories, like here. It's quite old, so not workable as a 6.0 target, but it's trivial enough to get the idea. With fapi2's programmatic node submissions, this idea becomes feasible.
======================================================
======================================================
Folder has been removed
dldege - April 3, 2007 - 17:53
Folder has been removed since there are other ways to organize nodes including some interest in developing a generalized node hierarchy module that would also deprecate the custom book stuff.
I like your demo, similar navigation to what I had in mind. I'll look at your code in more detail.
dLd
======================================================
======================================================
I concur!
cpill - April 24, 2007 - 08:59
Well,
IF the folder was a node, then you could have data associated with it, as on a Mac, where this info is kept in .DS_Store files. Thus, if you wanted to add descriptions, you could define the default:
- sorting
- view
- info per file to show
- etc
More flexible for the future, and taxonomies can be associated with folders, which adds a dimension of flexibility that OS filesystems don't have. Drupal could be used for document archive management in a serious way... actually, I just found this because I was looking for a module that does this for an archive of PDF documents that will have thumbnails associated with each one. There are potentially thousands already that need managing.
just some thoughts
Alexander Whillas
Taylor Square Designs
Berlin, Germany
======================================================
======================================================