I searched around to see if there is a Microsoft Word/Excel import module. Unfortunately, all the posts that I was able to find were about a year old. Is there a Word/Excel import module in development, or has this idea been passed off as unnecessary?

This is the one hurdle that is keeping Drupal from being implemented at my work. My users would need to specify the Word document. After that, Drupal would need to rip out any images and move them to the files location. Drupal should then attach the document to the node (e.g. with the attachment module) and generate the node body from the document's contents while stripping out comments and any other hidden fields.

I have also considered using the attachment module with the htmlarea module. The users would have to export their Word docs to HTML and then copy and paste into the htmlarea. And even then, the images would not work correctly. This didn't seem like a solid solution for this need.
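To illustrate the image problem: when Word saves a document as HTML, the images usually land in a companion folder and the `<img>` tags point at it with relative paths, which break once the HTML is pasted elsewhere. Below is a minimal Python sketch (not a Drupal module) of rewriting those paths. The function name `rewrite_image_paths` and the `/files/imports` location are assumptions made up for the example.

```python
import re
from pathlib import PurePosixPath

# Matches the src attribute of an <img> tag, quoted with " or '.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)

# Absolute URLs (scheme:...) and site-root paths (/...) are left untouched.
ABSOLUTE = re.compile(r'^(?:[a-z][a-z0-9+.-]*:|/)', re.IGNORECASE)


def rewrite_image_paths(html, files_url):
    """Point relative <img> sources at files_url.

    Returns (new_html, images), where images is a list of
    (original_src, file_name) pairs that still have to be copied
    into the files location by the caller.
    """
    base = files_url.rstrip("/")
    images = []

    def replace(match):
        prefix, quote, src = match.groups()
        if ABSOLUTE.match(src):
            return match.group(0)
        # Keep only the file name, whatever folder Word put it in.
        name = PurePosixPath(src.replace("\\", "/")).name
        images.append((src, name))
        return f"{prefix}{quote}{base}/{name}{quote}"

    return IMG_SRC.sub(replace, html), images
```

The returned list tells the caller which files to copy. The copying itself, and any name collisions between documents, are left out of this sketch.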

I am aware that going from RTF files or OpenOffice documents would probably be easier. However, Word is what's used (in this company, as well as many others) and would have to be supported by any DMS to be considered.

Thanks,
- DF

Comments

binford2k’s picture

This tool would probably be a good place to start for your module.

http://www.winfield.demon.nl/ (antiword)

It does text extraction and formatting pretty well. It even does a pretty good job of rendering tables.

If you wanted to go all out and get full html, pics, etc, then this might be a better choice:

http://holloway.co.nz/docvert/ (docvert)

However, it depends on OpenOffice (and X and all its requirements) so it may not be the best choice for a server environment. It all depends on your needs.
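For the antiword route, a rough Python sketch of calling it from a script might look like the following. It assumes the `antiword` binary is installed and on the PATH, and that `-w 0` puts each paragraph on a single line as the man page describes. The helper names and the `run` parameter (there only so the call can be swapped out for testing) are made up for this example.

```python
import shutil
import subprocess


def antiword_command(doc_path, width=0):
    """Build the antiword command line for plain-text extraction."""
    # -w sets the line width; 0 is assumed to mean "one paragraph per line".
    return ["antiword", "-w", str(width), str(doc_path)]


def extract_text(doc_path, run=subprocess.run):
    """Return the text antiword prints for a .doc file.

    Raises RuntimeError if antiword cannot be found, and
    subprocess.CalledProcessError if antiword exits with an error.
    """
    if run is subprocess.run and shutil.which("antiword") is None:
        raise RuntimeError("antiword is not installed or not on PATH")
    result = run(
        antiword_command(doc_path),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode output using the locale encoding
        check=True,           # raise if antiword reports a failure
    )
    return result.stdout
```

Something like `extract_text("story.doc")` would then hand back plain text to use as a node body. Images and rich formatting are not covered by this approach.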

andrewgearhart’s picture

I was curious if anybody has made any movement on this. I too am interested in the ability to convert Word -> HTML.

Andrew 'Mickey Knox' Gearhart
Web Developer kinda guy

cog.rusty’s picture

If you use TinyMCE for a wysiwyg input format, there is a "paste from Word" button. I do not use it myself because I have no control over the consistency of style, but you could look it up and try it.

Another solution, without a wysiwyg editor, is to require the users to save as HTML in MS Word. Then you clean up the HTML using the htmltidy module as a filter.
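To give an idea of what that cleanup involves, here is a small Python sketch that strips some of the markup Word typically adds when saving as HTML. It is not the htmltidy module, just a regex-based stand-in for illustration. The function name `clean_word_html` is made up, and removing every `style` attribute is a deliberately blunt assumption.

```python
import re


def clean_word_html(html):
    """Strip common Word-specific markup from HTML saved by MS Word."""
    # HTML comments, including conditional blocks such as
    # <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Bare conditional markers such as <![if !supportLists]> and <![endif]>;
    # the content between them is kept.
    html = re.sub(r"<!\[(?:if|endif)[^>]*>", "", html, flags=re.IGNORECASE)
    # Namespaced Office tags such as <o:p>, </o:p>, <v:shape ...>;
    # only the tags are removed, any text inside them stays.
    html = re.sub(r"</?[a-z][a-z0-9]*:[^>]*>", "", html, flags=re.IGNORECASE)
    # class attributes that start with "Mso", e.g. class=MsoNormal
    html = re.sub(
        r"\s+class\s*=\s*(\"Mso[^\"]*\"|'Mso[^']*'|Mso[^\s>]*)",
        "",
        html,
        flags=re.IGNORECASE,
    )
    # All quoted inline style attributes (blunt, but keeps styling consistent).
    html = re.sub(
        r"\s+style\s*=\s*(\"[^\"]*\"|'[^']*')",
        "",
        html,
        flags=re.IGNORECASE,
    )
    return html
```

Regexes are fragile on arbitrary HTML, so a real filter would more likely rely on a proper parser or on HTML Tidy itself. The sketch only shows which kinds of markup need to go.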

andrewgearhart’s picture

In my particular situation... I need a way to do it programmatically. The problem I have is that the editorial department where I work is creating all of their stories in MS Word and uses a piece of horrendous software to "index"... well... indexing is probably too strong a word... point to a series of sequentially numbered Word documents. The software is also responsible for retrieving stories off of the AP newswire and storing them in RTF format. Sooo... the goal was to be able to query for the items in the database (MS SQL) and then rip the content out of the MS Word documents so I can stop copying/pasting from Word into a TinyMCE editor.
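The batch part of that could be sketched roughly like this in Python. Everything specific here is an assumption made for illustration: that the database query yields `(story_id, title)` rows, that the documents are named `<id>.doc` or `<id>.rtf` in one directory, and that some `extract` function (antiword, an RTF converter, whatever works) is passed in by the caller.

```python
from pathlib import Path


def collect_stories(rows, doc_dir, extract):
    """Yield (story_id, title, text) for each row whose document exists.

    rows    -- iterable of (story_id, title) pairs, e.g. from a database query
    doc_dir -- directory holding the numbered documents (hypothetical layout)
    extract -- callable taking a Path and returning the document text
    """
    for story_id, title in rows:
        # Assumed naming scheme: one file per story, <id>.doc or <id>.rtf.
        for suffix in (".doc", ".rtf"):
            path = Path(doc_dir) / f"{story_id}{suffix}"
            if path.exists():
                yield story_id, title, extract(path)
                break  # stop at the first matching file for this story
```

Rows without a matching file are skipped silently here. A real import would probably want to log them.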

I'd love to eventually migrate them away from the other software... but I would need a method to be able to drop content from a web browser onto a page layout application such as QuarkXPress or Adobe InDesign.

Ahh... the wonders of "modern" publishing!
Andrew 'Mickey Knox' Gearhart
Web Developer kinda guy

twohills’s picture

But TinyMCE has an auto-cleanup option ("Automatically cleanup MS Office/Word HTML will be executed automatically on paste operations. (Only works in Internet Explorer)"), which does seem to work nicely, and the end user has to do nothing different at all.

As I recall htmlarea has nothing??

I haven't looked at wgHTML