I am interested in developing a node registry that associates node types, content-types and filename extensions with one another. There are many situations where a chunk of data needs to be inserted into the CMS and the right node type has to be chosen. The two metadata items typically available are the content-type and/or the filename extension. If these could be used to create nodes, it would eliminate a lot of the hard-coding that currently goes on.

In my vision, an admin could edit the mapping to direct certain file types to become whatever node type he/she specifies. Each module would then have to implement a hook to parse the data chunk into a proper format and create a node/nodes in response. Both content-type and extension could be expressed as regular expressions.

An example of this would be a mapping with the following entries (a dash marks a field left unset):

  content-type     extension  node type            weight
  image/.*         -          image                0
  audio/.*         -          filestore2           0
  application/.*   -          filestore2           5
  -                exe        antivirus_filestore  -2
  -                pif        antivirus_filestore  -2
  -                .*htm.*    story                10

A default node type would have to be specified. It would not take part in the registry's node type resolution unless specifically requested. Conflicts between multiple entries that would all match are resolved by the weight parameter: entries would be tried in ascending order of weight (i.e. the "exe" entry would match before getting to application/.*). Perhaps a registry_resolve function could return the function name of a node creation hook:

  function drupal_registry_resolve($query, $return_default=0) {
    // $query is an array containing keys content-type and/or extension
    // a non-zero value for $return_default would return the value specified for the default node type
    ...
    return $nodetype . "_create_hook";
  }

This would be called like:

  $new_node_data = array('uid' => $owners_uid, 'title' => 'hello.world', 'content-type' => 'video/mpeg', 'data' => $some_video_data_chunk);

  $nodehook = drupal_registry_resolve($new_node_data, 1);

  // no specific registry entry was found, use default (filestore for now..)
  $newnodes = $nodehook($new_node_data);

  // node has now been saved

If a node_create_hook doesn't know what to do with the given data, it should return false or nothing at all. Otherwise, it should return an array containing the newly created node(s).
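To make the weight-ordered matching concrete, here is a minimal sketch of how the elided part of such a resolver might work. Everything in it is an assumption for illustration: the names example_registry_entries, example_registry_resolve and example_field_matches, the in-memory array standing in for an admin-editable mapping, the 'extension' key, and the choice to anchor each pattern and match case-insensitively. It is written for a current PHP version and is not taken from any existing module.

```php
<?php
// Hypothetical in-memory copy of the example mapping. An empty pattern
// means the entry does not constrain that field.
function example_registry_entries() {
  return array(
    array('content_type' => 'image/.*', 'extension' => '', 'node_type' => 'image', 'weight' => 0),
    array('content_type' => 'audio/.*', 'extension' => '', 'node_type' => 'filestore2', 'weight' => 0),
    array('content_type' => 'application/.*', 'extension' => '', 'node_type' => 'filestore2', 'weight' => 5),
    array('content_type' => '', 'extension' => 'exe', 'node_type' => 'antivirus_filestore', 'weight' => -2),
    array('content_type' => '', 'extension' => 'pif', 'node_type' => 'antivirus_filestore', 'weight' => -2),
    array('content_type' => '', 'extension' => '.*htm.*', 'node_type' => 'story', 'weight' => 10),
  );
}

// True if the pattern is empty (unconstrained) or matches the whole value.
function example_field_matches($pattern, array $query, $key) {
  if ($pattern === '') {
    return true;
  }
  if (!isset($query[$key])) {
    return false;
  }
  // Use '#' as delimiter because content-types contain '/'; escape any '#'.
  $regex = '#^(?:' . str_replace('#', '\\#', $pattern) . ')$#i';
  return preg_match($regex, $query[$key]) === 1;
}

// Returns the hook name of the first matching entry in ascending weight
// order, the default type's hook if one is given, or false otherwise.
function example_registry_resolve(array $registry, array $query, $default_type = null) {
  usort($registry, function ($a, $b) {
    return $a['weight'] - $b['weight'];
  });
  foreach ($registry as $entry) {
    if (example_field_matches($entry['content_type'], $query, 'content-type')
        && example_field_matches($entry['extension'], $query, 'extension')) {
      return $entry['node_type'] . '_create_hook';
    }
  }
  return $default_type === null ? false : $default_type . '_create_hook';
}
```

With this sketch, a query carrying content-type application/octet-stream and extension exe resolves to the antivirus_filestore hook, because the weight -2 entry is tried before application/.* at weight 5. A video/mpeg query matches nothing and falls through to whatever default is passed in.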

Some issues that this brings up are:

  1. Does the poster (not current user...cron?) have permission to create nodes of this type?
  2. If we did know which permission to check against for node creation, user_access would have to be modified to accept a uid other than the current user's uid.
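The second issue could be sketched roughly as follows: a permission check that takes the uid to check as an explicit argument instead of assuming the current user. This is not Drupal's user_access. The function names, the "create <type>" permission naming and the uid-to-permissions array are all hypothetical and only illustrate the shape of the change.

```php
<?php
// Hypothetical check against an explicit uid. $permissions_by_uid maps
// a uid to the list of permission strings granted to that user.
function example_access_for_uid($permission, $uid, array $permissions_by_uid) {
  $granted = isset($permissions_by_uid[$uid]) ? $permissions_by_uid[$uid] : array();
  return in_array($permission, $granted, true);
}

// Assumes permissions are named "create <node type>", which is an
// illustration only; each module would decide its own permission name.
function example_may_create($node_type, $uid, array $permissions_by_uid) {
  return example_access_for_uid('create ' . $node_type, $uid, $permissions_by_uid);
}
```

A cron-driven caller such as a mail importer would pass the poster's uid rather than relying on whoever happens to be logged in when cron runs.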

I could see the mailhandler module using this to sort out message attachments. The file upload process could use it to store arbitrary file types in a more appropriate format than a raw blob of data. Content could be created using alternative means such as WebDAV. Many groupware applications store all kinds of files, and many of them. This could make such collections much easier to search, manage and use, thanks to the additional context that the more specific modules bring.

Though this could be implemented as a module and a patch, I think it belongs in Drupal core. It would need to become part of the module API to be of any use to anyone. Your thoughts and comments are requested.

Comments

Steven’s picture

To me it doesn't really make sense: the node system is a generic system, so you cannot just go around inserting any sort of data as any sort of type. This 'generic handler'-hook would have to be some sort of magic decoder as soon as you move to more advanced node types and setups. For example, how would you deal with taxonomies?

Also, I think your example of antivirus_filestore is a bit odd: such functionality should be part of filestore or even the file system itself: you can't expect the admin to know which types are dangerous.

What would the purpose of this be? Your example seems limited to uploading/retrieving files with various protection methods.

javanaut’s picture

Your reply brings up the need for some clarification. This would be more useful for document management than for generic node management. Generally, any node type whose content would typically be maintained in an external file or file format would find this tool useful. Document management modules would be most likely to use it (filestore, mailhandler, webdav, etc.). A generic (intelligent?) file upload module could be implemented that would call on this registry to contextualize the document.

Consider this example (apologies if the antivirus_filestore example was confusing):
You want the copy writers at your office to be able to edit documents via webdav. You use a webdav module. They all, for some reason, prefer to use MS Word as their medium of choice. You use a msword module to manage the document (render as HTML, do basic editing, render as Word doc, etc.). The webdav module would use the content-type passed to it to determine that it should use the msword module to store the content. User id could come from the webdav login. Note: I'm using non-existent modules as examples so that you're not caught up on the specifics of what a module does, but more how they would use the registry feature.

To answer your taxonomy question: For webdav, the folder name would provide the taxonomy information. For mailhandler, a mail command could be specified to set the taxonomy. Filestore could just ask for it on the upload page. A user preference could be referenced to calculate this as well, maybe the time of day...it's up to the calling module to figure out.

The purpose would be to add a Document Management framework to the Content Management System. There are currently several implementations of document management systems, but they're either very specific (image module) or very generic (filestore module). I think that maintaining files in a more appropriate context will help with several aspects of their lifespan. A major bonus would be finding them using a search engine. Example: If the metadata of an image is stored (by using jhead to parse exif data for example) then you have much more data to search on than just filename or taxonomy.

I've seen document collections grow huge and unmanageable. The more metadata and contextualization you can provide, the easier they are to manage.

moshe weitzman’s picture

i see how a content type registry is useful. nice examples ... you might have to write a simple document mgmt module to illustrate the power of this registry.

javanaut’s picture

I'll start my examples with a module (mime_registry.module?) containing a _mailhandler hook. It will look up the content type of each attachment and create a node for it if a proper _mime_create function exists for the configured node type. I'll also define an image_mime_create function for creating images from the mail attachments (hello moblog) as well as a filestore_mime_create function.

The beauty of this solution won't be realized until there are _mime_create implementations for multiple modules that can all handle the same type of content (e.g. multiple image/image gallery modules).
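As a rough illustration of the flow described above, the sketch below walks a list of attachments, looks up the node type configured for each content type, and calls that type's _mime_create function only if it exists. It is not the mime_registry code. The function example_process_attachments, the exact-match $type_map, the attachment array keys and the stand-in demo_image_mime_create hook are all assumptions, as is reusing the "return false or an array of nodes" convention from the original proposal.

```php
<?php
// Stand-in hook for a hypothetical "demo_image" node type. A real hook
// would parse the data and save nodes; this one only describes them.
function demo_image_mime_create(array $attachment) {
  return array(array('type' => 'demo_image', 'title' => $attachment['filename']));
}

// For each attachment, find the configured node type and call its
// <type>_mime_create function if one is defined. Attachments with no
// mapping, no hook, or a hook that declines (returns a non-array) are skipped.
function example_process_attachments(array $attachments, array $type_map) {
  $created = array();
  foreach ($attachments as $attachment) {
    $content_type = $attachment['content-type'];
    if (!isset($type_map[$content_type])) {
      continue;
    }
    $hook = $type_map[$content_type] . '_mime_create';
    if (!function_exists($hook)) {
      continue;
    }
    $nodes = $hook($attachment);
    if (is_array($nodes)) {
      $created = array_merge($created, $nodes);
    }
  }
  return $created;
}
```

Because the hook name is derived from the configured node type, pointing a content type at a different image module would only mean changing the mapping, which is where several competing _mime_create implementations would start to pay off.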

javanaut’s picture

I've finished a prototype of the basic module. I included an implementation of the mailhandler hook to act as a mime_registry client, and a mime_create hook for the filestore module (included in the same source file). Here is where I intend to keep updates to this module until I get my CVS access straightened out.

javanaut’s picture

Just an FYI, I've recently committed to cvs the cocktail of modules required to support the moblog module. From the readme:

Requirements:
* mail_handler.module must be installed
This module handles interacting with your mail server to pull postings from there.

* mime_registry.module must be installed
This module helps you determine which type of files get created into which types of nodes.

* mailalias.module *should* be installed
Lets users specify multiple email addresses (your email addy is how mail_handler knows who posted what).

* attached_node.module *should* be installed
Handy means of linking to nodes from other nodes. Adds flexibility for your users in how links to other nodes are displayed.

* You must have access to a POP/IMAP server for receiving content.

* You must be sure to set up the cron job to run frequently (like every 5 minutes).

I wrote the mime_registry, attached_node and moblog modules. I thank others for mailhandler and mailalias, two excellent modules. I suppose you would also want to install the current cvs version of the image and filestore2 modules, as I'm sure most mobloggers aren't necessarily posting plain text.

There are still some missing features and random bugs, but they will hopefully be worked out by the time drupal 4.5 comes out.

moshe weitzman’s picture

thanks for the compliment on mailhandler and mailalias. i wrote mailhandler with the specific hope that moblog like apps would be written on top of them. kudos to you, javanaut.

i will soon be using these modules regularly.

javanaut’s picture

Thank you, too!

Once d4.5 comes out, I'll be more inclined to polish up these modules. Currently, mime_registry digs into the internals of other modules to create nodes, and at this point, all supported modules are moving targets. I'm quickly throwing together a video module that's basically 90% filestore2 (thanks gordon!) with some video format/conversion features added onto it. My phone posts audio and video in 3gpp format, which most browsers have no clue what to do with. MPlayer to the rescue! I've been working on it for about 2 hours, and I'm half done ;)

javanaut’s picture

Simple Mime Registry:
mime_registry.tgz

Buggy, incomplete (but mostly working) Moblog:
moblog.module

Note, these were developed as a single module, then were quickly refactored. There may be remnants of one still resident in the other.

I'm starting to collect modules that I'd like to publish on drupal.org. I know Kjartan has been on sabbatical, so I will wait until he is ready.

moshe weitzman’s picture

i suggest completing the cvs access form again if you did so already.

looking forward to seeing your modules in Contrib.

javanaut’s picture

..well, one of them was recycled:

mime_registry.module: as per discussion above
moblog.module: uses mailhandler+mime_registry to let users post via email. Still slightly buggy, but it's at least in CVS now.
friendlist.module: social networking module (n-degrees of separation, friend-of-a-friend-of-a-friend-of-a...). This mod is incomplete.

I'm personally interested in developing a WebDAV module that uses this, as well as a generic "attached_file" analog. First, however, I think a better way of attaching and associating nodes needs to come about. I also think much of this functionality for document management belongs in drupal core.