I haven't seen a clear set of features that the document management discussion is referencing, nor have I seen a clear strategy. The basic feature request that I've seen discussed is the ability to "attach" files/nodes to a node. I believe that is covered in the feature set listed below. I will lay out a high-level proposal for adding robust document management to Drupal core, and you shoot holes in it. Sound fun? For today, I will list a feature set to discuss. How to implement these features will be discussed later.

Features that a basic document management system has:
1. Ability to add documents to the system.
2. Ability to store those documents.
3. Ability to retrieve those documents that have been added.
(if it sounds like I'm stating the obvious, you are accurate)
...adding complexity...
4. Allowing multiple methods of adding documents to the system.
5. Associating documents with other documents (file attachments).
6. Associating meta-data with documents (taxonomy, owner, etc).
7. Searching for documents.
...even more complexity...
8. Maintaining documents in a context-sensitive way (specific node type, location, etc., and a generic hook_node_create($data) hook).
9. Representing/rendering documents in an abbreviated way (mini-preview, thumbnail, theme_thumb_view(..), etc).
10. Using document content and meta-data as search key indexes.
11. Using an event model/triggering system to filter incoming and outgoing files (e.g. hook_new_file($node), hook_render_file($node)). This could also be used for input content filtering as well as virus cleansing.

I know this list is not complete and doesn't address everyone's concerns (or in some cases, may only address my concerns), so I ask for this basic feature list to be scrutinized. I realize that many of these features exist already. They are mentioned here merely to complete the description. I could go on to hypothesize how this set of features would be implemented, but I would like some feedback before I taint the waters with any more of my own ideas. Besides, I need to go to work ;)

Comments

javanaut’s picture

Ok, so no comment all day. I'll give my opinion, as I am so eager to be talking about this that I'll even talk to myself.

For reference, I have provided a mime_registry module (now in CVS) as well as a moblog.module that uses it and mail_handler to create nodes from emails with attachments. The moblog.module does not use an optimized method of associating files with nodes, but is instead a prototype of basic mime_registry usage.

So, features 1-3 are handled quite easily by the filestore(2) module as well as several other techniques. My problem with this is that it is too generic. e.g. When images are stored in filestore(2), it is a waste of context when the image module would do such a better job of managing it.

Feature 4 needs to be expanded upon. Currently, each node type defines one way to create a node (nodeapi). This may not be flexible enough for multiple avenues of node creation. Many nodeapi hooks rely on form submission as their primary means of collecting node data. This is not useful when the node's data comes from a mailbox for example.

Feature 5 could be handled fairly easily with an attachment table and a special field of the node object (call it "attachments") that contains an array of attached nodes. The attachment table would need basically two fields, parent nid and child nid. Maybe a relation type field could be defined as well(?). node_load(..) could look these up and create them when loading a node. Editing the list of attached nodes would require additional form elements to create/edit/view/delete attachments.
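To make the attachment table idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not Drupal code. The table and column names (attachment, parent_nid, child_nid, relation) and this stand-in node_load are assumptions for illustration; only the two nid fields and the tentative relation type come from the description above.

```python
import sqlite3

# Hypothetical schema: only "parent nid", "child nid" and an optional
# relation type are taken from the proposal; all names are assumptions.
def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT)")
    db.execute(
        "CREATE TABLE attachment ("
        " parent_nid INTEGER NOT NULL,"
        " child_nid INTEGER NOT NULL,"
        " relation TEXT NOT NULL DEFAULT 'attachment',"
        " PRIMARY KEY (parent_nid, child_nid))"
    )
    return db

def attach(db, parent_nid, child_nid, relation="attachment"):
    """Record that child_nid is attached to parent_nid."""
    db.execute(
        "INSERT INTO attachment VALUES (?, ?, ?)",
        (parent_nid, child_nid, relation),
    )

def node_load(db, nid):
    """Load a node and fill its 'attachments' field with the attached nodes.

    Attachments are loaded recursively, so this sketch assumes the
    attachment graph contains no cycles.
    """
    row = db.execute(
        "SELECT nid, type, title FROM node WHERE nid = ?", (nid,)
    ).fetchone()
    if row is None:
        return None
    node = {"nid": row[0], "type": row[1], "title": row[2]}
    children = db.execute(
        "SELECT child_nid FROM attachment WHERE parent_nid = ? ORDER BY child_nid",
        (nid,),
    ).fetchall()
    node["attachments"] = [node_load(db, child) for (child,) in children]
    return node
```

Because the primary key is the (parent, child) pair, the same child can appear under several parents, which leaves room for attachments with more than one parent.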

Feature 6 is covered in the current nodeapi arrangement by way of each node module maintaining a pertinent set of data about each node. With the current state of most node type implementations, the more flexible node addition mechanism (Feature 4) may conflict with the status quo. Since the data may not be coming from an HTTP form submission, there needs to be another way of providing this metadata. Perhaps enforce the "fields" hook and use it to create an extra nodeapi argument that is an array of metadata items. This might keep node module authors from having to directly reference the POST variables, thus making it possible to create nodes using a common API.
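A rough sketch of what such a common creation API could look like, written in Python rather than as Drupal code. Everything named here (NODE_FIELDS, create_node, from_form, from_mail) is hypothetical. The point is only that node creation takes a plain array of metadata, so a web form and a mailbox can both feed it.

```python
# Hypothetical field declarations, standing in for what a "fields" hook
# might return for each node type.
NODE_FIELDS = {
    "image": {"title": str, "filename": str},
    "story": {"title": str, "body": str},
}

def create_node(node_type, metadata):
    """Build a node from a plain metadata dict, whatever its source."""
    fields = NODE_FIELDS[node_type]
    missing = [name for name in fields if name not in metadata]
    if missing:
        raise ValueError(f"missing fields for {node_type}: {missing}")
    node = {"type": node_type}
    for name, cast in fields.items():
        node[name] = cast(metadata[name])
    return node

def from_form(post_vars):
    """Adapter for an HTTP form submission (post_vars is a plain dict)."""
    return create_node(post_vars["type"], post_vars)

def from_mail(subject, attachment_name):
    """Adapter for an email with one attachment; no POST variables involved."""
    return create_node("image", {"title": subject, "filename": attachment_name})
```

With this arrangement the node type never reads the request directly; each input channel only has to produce the metadata array.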

Feature 7 will be properly supported as long as there is an appropriate node type for each type of file that is in the repository. There isn't much that a document management framework can do to better that (other than making sure that the proper node is used if it exists).

Feature 8 needs discussion. I have proposed a mime registry scheme to determine how to classify documents. The main problem keeping it from working right has to do with features 4 and 5.

Feature 9 is implemented by some node types, but there is no standard API for representing a minimized/thumbnail version of a node. Exactly how big to render a node should probably be defined before this discussion gets much further. Maybe there should be several rendering sizes for a node? A scale from 1 to 10?

Feature 10 seems like an elaboration of feature 7. I guess I was thinking something this morning that's not with me now.

Feature 11 would provide a framework for allowing all modules to meddle in the affairs of each other. Correct me if this already exists or is being planned, but I would like my modules to be able to react to new nodes being created - any module. This would be useful for content filters, virus scanners, indexing services, etc.
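As an illustration of the kind of triggering system meant here, a minimal sketch in Python follows. The names register and invoke_all, and the convention that a callback returning False vetoes the node, are assumptions made for this example and do not describe an existing Drupal API.

```python
from collections import defaultdict

# Hypothetical registry: hook name -> list of callbacks from any module.
_hooks = defaultdict(list)

def register(hook_name, callback):
    """Let a module subscribe to an event such as 'new_file'."""
    _hooks[hook_name].append(callback)

def invoke_all(hook_name, node):
    """Call every subscriber in registration order.

    A callback may modify the node in place. Returning False rejects the
    node and stops further processing (an assumed convention).
    """
    for callback in _hooks[hook_name]:
        if callback(node) is False:
            return False
    return True
```

A virus scanner would register a callback that returns False for infected files, while an indexing service would register one that records the node and returns nothing.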

I also propose some means of specifying/determining the "body" of the node's content. Nodes that maintain most of their data in external files would need a way of saying so and providing the filename. Nodes whose data is primarily in a node property need a way of saying so as well. Alternatively, if data is in an LDAP database or some other arbitrary content storage system, I think there should be some means of specifying it. This will allow content cleansing/filtering/indexing modules to prepare documents for storage.

javanaut’s picture

I threw together (literally) a quick diagram of what it would look like to attach a node to another node.

http://nullcraft.org/drupal_dev/attached_node/

I'll fill this in with more detail as time permits.

WhiteFire’s picture

This is just some ramblings... :)

Feature 7 will be properly supported as long as there is an appropriate node type for each type of file that is in the repository. There isn't much that a document management framework can do to better that (other than making sure that the proper node is used if it exists).

Why do this with node types? How about having a new API class of file handlers? This way you could have a png_module, gif_module, msword_module, pdf_module, html_module, etc. Have each return a capabilities list, such as "thumbnail", "full_display", etc.

Then different nodes can decide what to do with their attachment(s).
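A small sketch of the file handler idea, in Python for illustration only. The class names, the HANDLERS registry and render_attachment are all hypothetical; the capability names "thumbnail" and "full_display" are the ones suggested above.

```python
class PngHandler:
    mime_type = "image/png"

    def capabilities(self):
        return ["thumbnail", "full_display"]

    def render(self, capability, filename):
        if capability == "thumbnail":
            return f"<img class='thumb' src='{filename}'>"
        return f"<img src='{filename}'>"

class PdfHandler:
    mime_type = "application/pdf"

    def capabilities(self):
        return ["full_display"]

    def render(self, capability, filename):
        return f"<a href='{filename}'>Download PDF</a>"

# Hypothetical registry keyed on MIME type.
HANDLERS = {h.mime_type: h for h in (PngHandler(), PdfHandler())}

def render_attachment(mime_type, filename, preferred):
    """Render with the first capability in 'preferred' that the handler supports."""
    handler = HANDLERS.get(mime_type)
    if handler is None:
        return None
    for capability in preferred:
        if capability in handler.capabilities():
            return handler.render(capability, filename)
    return None
```

A node that wants thumbnails where possible would pass ["thumbnail", "full_display"] and fall back to a plain link for formats that cannot be thumbnailed.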

That is actually its own question... do you allow binding of more than one document to a node? I tend to view a node as its own "document", so I see a 1:1 relationship between the two, personally.

Hmmmm....

There needs to be a way for a theme to tell each content module what sort of thumbnails it will need. Like a small one for index pages, and a larger one when directly displaying a node.

~ WhiteFire

ccourtne’s picture

Yes, you do need to allow binding of more than one document per node. Consider a site which is publishing a story which explains a mathematical theory. This story wants to include three diagrams to help illustrate the concept. These diagrams have no context outside the document. They should not have their own comments, they don't make sense to search for, nor do you want them to remain after you delete the story.

You should read the posts on the image support thread.

It is a good thought to allow different size thumbnails on a preview vs. full story page.

javanaut’s picture

How about specifying _whether attachments persist after parent is deleted_ at the point of their creation? And check/uncheck an "allow comments" box to specify whether to allow comments or not? Perhaps we could set a "visibility" flag for nodes that would keep some attachments from being listed (on homepage, in galleries, blocks, etc), but still viewable when linked to/referred to directly. That sounds like a core modification, though.
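Here is a rough Python sketch of the "persist after parent is deleted" flag, using plain dicts instead of database tables. The structure of the link records and the name delete_node are assumptions. It also simplifies one case: a non-persistent attachment is deleted even if another parent still refers to it.

```python
def delete_node(nodes, links, nid):
    """Delete a node and any attachments not flagged to persist.

    nodes: dict mapping nid -> node data.
    links: list of dicts {"parent": nid, "child": nid, "persist": bool},
           where "persist" was chosen when the attachment was created.
    """
    nodes.pop(nid, None)
    children = [link for link in links if link["parent"] == nid]
    # Drop every link that mentions the deleted node, as parent or child.
    links[:] = [
        link for link in links
        if link["parent"] != nid and link["child"] != nid
    ]
    for link in children:
        if not link["persist"]:
            delete_node(nodes, links, link["child"])
```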

It is a good thought to allow different size thumbnails on a preview vs. full story page.

Theoretically, a file oriented node could use the $node->teaser to render a "thumbnail" view of it.

I have neglected nedjo's image discussion lately. It appears to be getting lively :)

WhiteFire’s picture

One could make the argument that this is what Stories are for with leaf nodes that have comments disabled. Regardless of that, I think I agree with you.

It does complicate gallery type nodes where you want to inline the image on the node's view page.

I'll go look at the image support thread now. ;)

~ WhiteFire

javanaut’s picture

Thanks for your ramblings!

Why do this with node types?

One of the things that I've learned about dealing with the Drupal developer community (I'm relatively new here) is that the development of new core APIs is met with resistance. That's probably for good reason, as new APIs mean more modifications for module authors to make in the future. IMO, the node concept is flexible enough to accommodate document management for the most part.

I like the idea of node types being able to specify their capabilities in a standard and useful format. Maybe a node capability API could return an array keyed on capability name with theme function names as values. Each function could take two params: a $node and the name of the capability being invoked (at a minimum). This is kind of implemented already by the hook mechanism: just define a function called nodename_render_abridged_version(...), for example, to render a brief version of a node. If that function doesn't exist, it is not capable of doing it.
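The naming convention described above can be sketched as follows. This is Python rather than Drupal code, and the function names, the KNOWN_CAPABILITIES list and the lookup through the module namespace are all assumptions. A node type has a capability only if a function following the naming pattern exists.

```python
# Example "modules": a capability exists only if its function is defined.
def image_render_abridged(node, capability):
    return f"[thumb:{node['filename']}]"

def image_render_full(node, capability):
    return f"<img src='{node['filename']}'>"

def story_render_full(node, capability):
    return node["body"]

# Hypothetical list of capability names to probe for.
KNOWN_CAPABILITIES = ("abridged", "full")

def node_capabilities(node_type, namespace=None):
    """Return {capability name: function name} for one node type."""
    namespace = globals() if namespace is None else namespace
    found = {}
    for capability in KNOWN_CAPABILITIES:
        func = namespace.get(f"{node_type}_render_{capability}")
        if callable(func):
            found[capability] = func.__name__
    return found

def render(node, capability, namespace=None):
    """Invoke the capability, or return None if the node type lacks it."""
    namespace = globals() if namespace is None else namespace
    name = node_capabilities(node["type"], namespace).get(capability)
    if name is None:
        return None
    return namespace[name](node, capability)
```

In this sketch the story type defines no abridged renderer, so asking for that capability returns None and the caller can fall back to something else.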

Another reason to use a node as storage for a document instead of a file handler type is that not all "documents" fall neatly into a file format. For example, if you wanted to attach a recipe to a story, as well as an image of the finished product, you could use existing node types to handle this.

Then different nodes can decide what to do with their attachment(s).

That is exactly what the existing node API is capable of. The only missing piece is how to go about using the data to create a node. This seems almost trivial through a web interface ("click here to attach document"..(new window pops up).."choose type"..."enter form fields".."save"..(update parent window via javascript and close window)..). Attachment node is saved and parent node gets a reference to it.

As far as attachment associations go, my initial model for document attachments is the one that most people have used: your email inbox. The example of a MIME document is what comes to mind. That said, I never considered restricting the association to 1:1. Perhaps 1:1 file-to-node, but there could be 1:n nodes-to-nodes. I guess it could be that some nodes use multiple files, but that would be up to the module author to decide what his/her node requires. I've even toyed with the idea of attaching pre-existing nodes to new nodes. This would provide m:n associations where attachments could have multiple parents. For that matter, attachments (in true MIME fashion) could have attachments themselves. e.g. What if a really hot user posts an image of his/herself and lots of other users want to post blog entries with it attached to share the wealth? ;)

I have started work on an attached_node module, and kinda sorta have an idea how it will work, at least for web interfacing document management. It uses the mime_registry module to help figure out what to do with otherwise uncategorized file content, but for the most part lets the user decide what to attach and how. I'll post the code when it actually does something useful.

The issue of alternate attached node creation strategies (post by mail, etc) is still unresolved. Maybe soon. I would like to develop a general solution to document storage, not just one that is completely web-specific.

WhiteFire’s picture

What if a really hot user posts an image of his/herself and lots of other users want to post blog entries with it attached to share the wealth?

A filter that takes something like [inline src=nid.attachment size=small|medium|large|full|NxN] to post the image inline perhaps? Actually, that would be useful for nodes to do to themselves, omitting the nid before the dot.
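A quick sketch of how such a filter could parse the tag, in Python for illustration. The regular expression follows the syntax given above; the output URL layout (/node/NID/attachment/NAME?size=...) and the function name expand_inline are made up for the example.

```python
import re

# Matches [inline src=NID.NAME size=SIZE]; the nid may be omitted so that a
# node can refer to its own attachment, as in [inline src=.NAME size=SIZE].
INLINE_TAG = re.compile(
    r"\[inline\s+src=(?P<nid>\d*)\.(?P<attachment>[\w-]+)"
    r"\s+size=(?P<size>small|medium|large|full|\d+x\d+)\]"
)

def expand_inline(text, current_nid):
    """Replace each inline tag with an <img> element (hypothetical URL scheme)."""
    def replace(match):
        nid = match.group("nid") or str(current_nid)
        name = match.group("attachment")
        size = match.group("size")
        return f'<img src="/node/{nid}/attachment/{name}?size={size}">'
    return INLINE_TAG.sub(replace, text)
```

Tags with an unrecognized size do not match the pattern and are left in the text unchanged.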

Or were you thinking about as an attachment to their message?

~ WhiteFire

javanaut’s picture

I was trying to come up with an example of a multi-parent attachment example (m:n relationship). I guess my mind was heading for the gutter ;)