.doc / .odt parser using Docvert

thatistosay - January 4, 2009 - 13:01
Project: Document Import API
Version: 6.x-2.x-dev
Component: Code
Category: task
Priority: normal
Assigned: Unassigned
Status: needs work
Description

I've started working on a parser that uses Docvert ( http://holloway.co.nz/docvert/ ) to import Word and ODT documents. It requires a current version of Docvert to be installed (configured to convert Word documents, for .doc support). Right now it's at the 'it mostly works for my set-up' stage :).

- Import of metadata works (it simply forwards the ODT-style metadata that Docvert outputs)
- A basic, untested Parse function is written, but it is not called by the current dev package of docapi...
- No admin UI yet; the Docvert path is hardcoded
- Requires: the zip and XMLReader PHP extensions, and Docvert.
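For readers unfamiliar with the "ODT-style metadata" mentioned above: an ODT package is a zip archive whose `meta.xml` holds Dublin Core and OpenDocument metadata. The sketch below is Python, not the PHP from the patch, and only illustrates the idea. It assumes the archive (an .odt file or, hypothetically, Docvert's output zip) contains an ODF `meta.xml`; the function name `read_odt_metadata` is made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument (ODF) specification for meta.xml
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
}


def read_odt_metadata(archive_bytes: bytes, member: str = "meta.xml") -> dict:
    """Return {"prefix:name": text} for each child of <office:meta>.

    Assumption: the archive contains an ODF meta.xml at `member`.
    Repeated elements (e.g. several meta:keyword) overwrite earlier ones.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        root = ET.fromstring(zf.read(member))
    meta = root.find("office:meta", NS)
    result = {}
    if meta is None:
        return result
    for child in meta:
        tag = child.tag
        if tag.startswith("{"):
            # ElementTree writes namespaced tags as "{uri}localname";
            # map the URI back to its usual ODF prefix.
            uri, _, local = tag[1:].partition("}")
            prefix = next((p for p, u in NS.items() if u == uri), uri)
            tag = f"{prefix}:{local}"
        result[tag] = (child.text or "").strip()
    return result
```

A real import would probably keep repeated elements such as `meta:keyword` as a list rather than a flat dict.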

This is a request for comments, I suppose. If there is anything fundamentally wrong with this approach, I'd like to know about it! I'll be continuing to brush this into shape for my purposes, and will post updates when I have them. Any help would be appreciated!

Attachment: docapi_docvert.patch (7.01 KB)

#1

thatistosay - January 22, 2009 - 16:22

Is anyone actually interested in this? I have the parse function working, and have successfully imported documents into new Drupal nodes using Docvert. Because of the state of docapi, I had to write my own import routine so that the parse function actually gets called. I will post my updated code soon, but it would be nice to hear comments on this work. Is docapi still alive?

#2

Summit - February 2, 2009 - 15:52

Subscribing, willing to test!
Greetings,
Martijn

#3

bradfordcp - February 19, 2009 - 14:12

It is still alive, albeit lacking quite a few key areas. I am in the middle of porting a module now, but will look at your patch before the week is over. Thanks for your help!

~Chris

#4

thatistosay - March 5, 2009 - 13:32

Ok, here's a small update...

- Still no UI, but it now assumes Docvert is at [host]/docvert
- Instead of storing the zip files, it now stores directories with their extracted contents, so the images can be used
- Image links in the document body are automatically prefixed with the appropriate path
- A cron job clears out Docvert directories that have no docapi document of the same name in the docapi library
(this could be cleaner...)
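The two housekeeping steps in the list above (prefixing image paths and the cron cleanup) could look roughly like this. Again, this is a Python sketch, not the PHP from the patch: `prefix_image_links`, `remove_orphan_dirs`, the storage layout (one directory per document, named after the docapi library document) and the base URL are all hypothetical.

```python
import os
import re
import shutil

# Hypothetical pattern: a double-quoted src attribute on an <img> tag whose
# value is relative (no URL scheme such as "http:", not starting with "/").
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\ssrc=")(?![a-z][a-z0-9+.-]*:|/)([^"]+)(")',
    re.IGNORECASE,
)


def prefix_image_links(html: str, base_url: str) -> str:
    """Prefix relative <img src="..."> values with base_url."""
    base = base_url.rstrip("/") + "/"
    return IMG_SRC.sub(lambda m: m.group(1) + base + m.group(2) + m.group(3), html)


def remove_orphan_dirs(storage_dir: str, library_names: set) -> list:
    """Delete subdirectories of storage_dir whose name is not in library_names.

    Assumption: one directory per imported document, named after the
    corresponding docapi library document. Returns the removed names.
    """
    removed = []
    for name in sorted(os.listdir(storage_dir)):
        path = os.path.join(storage_dir, name)
        # Plain files are left alone; only unmatched directories are removed.
        if os.path.isdir(path) and name not in library_names:
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

A regex is enough for the simple markup Docvert produces in this scenario; for arbitrary HTML an actual HTML parser would be safer.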

I suppose using an existing image module for storing images might be better than my approach, but the above is enough for me, and it makes it easy to remove image files when the document is removed from the library.

Attachment: docapi_docvert.patch (8.68 KB)

#5

Dinis - March 10, 2009 - 20:23

Very interesting project, will start testing.

#6

estebandido - April 28, 2009 - 14:36

I'm very interested in this project, and willing to help test. I just discovered it, so I haven't grokked it yet, but I will look for documentation.

#7

pvanerk - July 23, 2009 - 12:00

I am also very interested in this project. If you need help testing, please let me know!

#8

kriskras - September 16, 2009 - 13:30

Awesome idea! Unfortunately I always get:

# Could not convert this file.
# A valid parser was not found, please manually select from the list of available parsers.

Even when I manually select the doc/odt parser from the advanced settings I get this error.

I hope you still have the time to sort this out!

#9

developer-x - October 3, 2009 - 19:42

Is this still being worked on? I'd be happy to test. If you could supply some installation instructions, that would be great.

#10

Moophz - October 9, 2009 - 06:48

Good day everyone, where can I download the docapi_docvert add-on from?
Thanks anyway.

#11

thatistosay - October 11, 2009 - 11:55

A couple of comments. Docapi itself is rather incomplete. As it stands, it doesn't actually work (the parser is never called). This means my Docvert plugin is only usable when one manually calls its functions in code.
I am using it in this way as a method of posting content to my site, and am considering making it separate from docapi, which seems rather dead. A shame, really.
