Batch import files as nodes

xurizaemon - July 27, 2009 - 23:47
Project:File import
Version:6.x-1.0-beta3
Component:Code
Category:feature request
Priority:normal
Assigned:xurizaemon
Status:active
Description

We needed to batch-import file attachments from an old site and create one node per attachment.

As most of these files were .doc, .pdf or .txt documents, we also considered automatically extracting each node's body from its document.

This is only partial code so far, but we're continuing to work on it and hope to improve it further.

#1

xurizaemon - July 28, 2009 - 00:10
Title:Batch import a directory of files» Batch import files as nodes

OK, so there are still a lot of TODO items in this patch, but hopefully it's a step in the right direction.

Some notes:

  • Some form elements have been added but don't do anything yet, e.g. the "Move or copy?" element. See the TODO list at the top of file_import.batch.inc.
  • I've restricted changes in the main module to just adding the menu entry. The rest is all in a separate file for simplicity.
  • I've used ereg patterns to define which files are matched for import, but I think this should be simplified for Regular Humans to use, e.g. a comma-separated list of extensions, each mapped to a content extractor tool.
  • PDF imports failed badly: extracting useful text from PDFs via Ghostscript is non-trivial.
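Here's a rough sketch of what the simpler "extensions mapped to extractor tools" setting could look like, written in Python for brevity rather than the module's PHP. The settings format (one line per tool, `ext, ext: command`) and the function names are hypothetical; antiword and pdftotext are just examples of real command-line extractors someone might list.

```python
from __future__ import annotations

from pathlib import PurePath
from typing import Dict, Optional


def parse_extractor_settings(text: str) -> Dict[str, str]:
    """Parse lines like 'doc, rtf: antiword' into {'doc': 'antiword', 'rtf': 'antiword'}.

    Hypothetical settings format: each non-empty line is a comma-separated
    list of extensions, a colon, then the command that extracts plain text.
    Lines starting with '#' are treated as comments.
    """
    mapping: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        exts, sep, command = line.partition(":")
        if not sep or not command.strip():
            raise ValueError(f"expected 'ext[, ext...]: command', got {raw_line!r}")
        for ext in exts.split(","):
            ext = ext.strip().lower().lstrip(".")
            if ext:
                mapping[ext] = command.strip()
    return mapping


def extractor_for(filename: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the extractor command for a file, or None if its extension isn't listed."""
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return mapping.get(suffix)
```

Usage: `parse_extractor_settings("doc: antiword\ntxt, text: cat")` gives a lookup in which `extractor_for("Report.DOC", ...)` returns `"antiword"` and an unlisted type such as `image.png` returns `None`, i.e. it is simply not imported. Matching is case-insensitive on the final extension only, which is probably all most people need compared with writing ereg patterns.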

Some changes and future steps which this work suggests:

  • The code in file_import_form_submit() which attaches a file to a node can be abstracted into a new function, file_import_attach_file_to_node(), so that it can be re-used in multiple places throughout the module.
  • Going forward, I'd like to make attaching an action triggered when file_import_attach_file_to_node() is called, so that we could also import files and attach them by other methods, e.g. as a CCK filefield, an Ubercart file download, or a video for the video/op_video module, as well as a core node file attachment (currently the only option).
  • Node title generation could be smarter; at present the title simply matches the filename.
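To illustrate the pluggable-attachment idea, here's a minimal Python sketch of a single entry point that dispatches to a named attach method. It is a plain handler registry, not Drupal's actions/trigger API, and every name in it (`attach_file_to_node`, the `"upload"`/`"filefield"` method names, the `field_file` key) is hypothetical; nodes are modelled as plain dicts.

```python
from __future__ import annotations

from typing import Callable, Dict

# Hypothetical: a node is a plain dict and a handler attaches a file path to it.
AttachHandler = Callable[[dict, str], None]

_ATTACH_HANDLERS: Dict[str, AttachHandler] = {}


def register_attach_method(name: str) -> Callable[[AttachHandler], AttachHandler]:
    """Decorator that registers a handler under a method name."""
    def decorator(handler: AttachHandler) -> AttachHandler:
        _ATTACH_HANDLERS[name] = handler
        return handler
    return decorator


@register_attach_method("upload")
def attach_as_upload(node: dict, path: str) -> None:
    # Stand-in for a core node file attachment, currently the only option.
    node.setdefault("files", []).append(path)


@register_attach_method("filefield")
def attach_as_filefield(node: dict, path: str) -> None:
    # Stand-in for a CCK filefield; the field name is made up for illustration.
    node.setdefault("field_file", []).append({"filepath": path})


def attach_file_to_node(node: dict, path: str, method: str = "upload") -> None:
    """One entry point, in the spirit of the proposed file_import_attach_file_to_node()."""
    try:
        handler = _ATTACH_HANDLERS[method]
    except KeyError:
        raise ValueError(f"unknown attach method: {method!r}") from None
    handler(node, path)
```

Usage: `attach_file_to_node(node, "files/minutes.doc")` behaves like today's default, while `method="filefield"` routes the same file elsewhere. Supporting Ubercart downloads or video would then mean registering one more handler rather than touching the import loop.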

Attached are the patch and a downloadable copy of the module which you can easily experiment with.

To try it out:

  1. Install the module attached :)
  2. Visit admin/settings/file_import/batch and set up the commands used to extract content from each file type.
  3. Visit admin/content/file_import/batch and import a directory of files. You should see one node created per file, with the node body extracted from the file and the node title matching the filename.
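For anyone wanting to reason about what the batch import does, here's a rough Python sketch of the same flow: walk a directory, pick an extractor by extension, and build one node record per file with the body from the extracted text and the title from the filename. This is not the module's PHP code; `import_directory`, `command_extractor`, `read_text` and the node dict layout are hypothetical, and using the filename without its extension as the title is an assumption.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Callable, Dict, List

Extractor = Callable[[Path], str]


def command_extractor(*argv: str) -> Extractor:
    """Wrap an external tool that prints a file's text on stdout (e.g. antiword)."""
    def run(path: Path) -> str:
        # The file path is appended as the last argument; a non-zero exit raises.
        result = subprocess.run([*argv, str(path)], capture_output=True, text=True, check=True)
        return result.stdout
    return run


def read_text(path: Path) -> str:
    # Plain-text files need no external tool.
    return path.read_text(encoding="utf-8", errors="replace")


def import_directory(directory: Path, extractors: Dict[str, Extractor]) -> List[dict]:
    """Build one node record per matching file: title from filename, body from content."""
    nodes = []
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        extract = extractors.get(path.suffix.lower().lstrip("."))
        if extract is None:
            continue  # no extractor configured for this file type, so skip it
        nodes.append({"title": path.stem, "body": extract(path), "files": [str(path)]})
    return nodes
```

Usage: `import_directory(Path("/path/to/old/files"), {"txt": read_text, "doc": command_extractor("antiword")})` returns the node records in filename order (antiword must be installed for the .doc entry). Files with no configured extractor are skipped rather than imported with an empty body, which seems the safer default for a bulk import.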

I know the UI has some gaping holes and I'd really welcome any complaints and suggestions for how this could be made more awesome. Please test and give feedback!

Thanks

Attachment                                  Size
532754-file_import-directory_import.patch   18.74 KB
532754-file_import-batch.tgz                15.29 KB