POOOO, the Pet OpenOffice Odt Obtainer, reads OpenOffice Odt format files in to nodes for presentation on your Web site.
Well, it will when I finish the code.
People asked about sucking various XML formats in to Drupal nodes. This is about sucking OO through POOOO. You need OO version 2.
The following links are discussions on XML input and editing. POOOO can be pushed in to some of the solutions.
http://drupal.org/node/5887 Create articles from HTML, RTF, Microsoft Word, Microsoft Excel documents
http://drupal.org/node/33737 Best format for flexible content import, export, and publishing
http://drupal.org/node/33843 Bitflux Editor module
I have a class to unzip an OpenOffice Odt file into text strings. I have XSL transformations working in my Web site. I will add an XSL transformation to merge the OO Odt component strings into one XML document. The XSLT will have to know about the classes available in CSS. I use only a few CSS classes in my site theme so will start with them. Perhaps other people can then test with other themes.
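For anyone who wants to experiment with the unzip step before the module exists, here is a minimal sketch in Python (the class described above is PHP; Python is used here only to keep the example short). The names read_odt_parts, build_sample_odt and ODT_PARTS are invented for this example, and the sample package is a stand-in, not a complete ODT file.

```python
import io
import zipfile

# XML members commonly found in an OpenDocument package; a file may omit some.
ODT_PARTS = ("content.xml", "styles.xml", "meta.xml")


def read_odt_parts(odt_bytes):
    """Return {member name: XML text} for the XML parts found in the package."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        names = set(package.namelist())
        for name in ODT_PARTS:
            if name in names:
                parts[name] = package.read(name).decode("utf-8")
    return parts


def build_sample_odt():
    """Build a tiny stand-in package in memory (not a complete, valid ODT)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "<content>Hello</content>")
        package.writestr("styles.xml", "<styles/>")
    return buffer.getvalue()
```

Calling read_odt_parts(build_sample_odt()) gives a dictionary of strings that a later transformation step could merge into one XML document.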
My requirement is to store the original text in parallel with the node so that I can rerun the transformation when the transformation changes. Once stable, the transformation could be performed during the upload.
I also want to edit the XML online using one of the XML editors, which is another project mentioned in another post.
If someone wants to think about the module side of the code then I would be happy to have some input and eventually some help. My module will manually upload to table node_xml and transform across to the final node. I will make the node id the same in both tables. I will manually create the XSLT. This will suit the 0.18 percent of people who are paranoid about data accuracy. I am happy to have the other 99.82 percent of people set up other options.
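The two-table idea can be sketched as follows, using Python and an in-memory SQLite database purely for illustration. Only the table name node_xml comes from the post; the column names, the simplified node table, and the functions open_store, upload and rerun are assumptions, and a real Drupal node table looks different.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a simplified node table and node_xml."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, body TEXT)")
    db.execute("CREATE TABLE node_xml (nid INTEGER PRIMARY KEY, xml TEXT)")
    return db


def upload(db, nid, xml, transform):
    """Keep the original XML and write the transformed body under the same nid."""
    db.execute("REPLACE INTO node_xml (nid, xml) VALUES (?, ?)", (nid, xml))
    db.execute("REPLACE INTO node (nid, body) VALUES (?, ?)", (nid, transform(xml)))
    db.commit()


def rerun(db, transform):
    """Regenerate every node body from the stored XML after the transform changes."""
    rows = db.execute("SELECT nid, xml FROM node_xml").fetchall()
    for nid, xml in rows:
        db.execute("UPDATE node SET body = ? WHERE nid = ?", (transform(xml), nid))
    db.commit()
```

Because the original XML is never overwritten by rerun, a changed transformation can be applied to old uploads without asking anyone to upload again.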
There could be a configuration option to bypass the intermediate table and stuff the transformed XML straight in to the node. Someone else could go mad with the administration options.
The XSLT will do what I do in some non Drupal sites. Paragraph styles will become divisions and text styles will become spans. The styles in my OpenOffice documents will look the same as the CSS in my site. Initially I will convert Microsoft Word documents using OpenOffice. One day I may bring across my Word converter.
Comments
headings, paragraphs, and span working.
I converted a small OO document to the following text.
OO presents style Heading 1 as Heading_20_1 in the XML and I change it to <h1>.
Character styles become spans with their class set to the style name.
Paragraph styles become paragraphs with their class set to the style name. These could be changed to divisions so that you have more freedom with the formatting.
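The mapping described in this comment can be sketched in a few lines. This is Python rather than the XSLT the post uses, so treat it as an illustration of the rule and not as the actual stylesheet. The function name convert and the HEADING_STYLES table are invented here, and extending the Heading_20_1 pattern to levels 2 to 6 is an assumption.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_ATTR = "{%s}style-name" % TEXT_NS
# Heading_20_1 comes from the post; levels 2 to 6 are assumed to follow suit.
HEADING_STYLES = {"Heading_20_%d" % n: "h%d" % n for n in range(1, 7)}


def convert(node):
    """Return an HTML element for one ODF text element, recursing into children."""
    local = node.tag.split("}")[-1]
    style = node.get(STYLE_ATTR, "")
    if local in ("p", "h"):
        # Block elements: heading styles become <hN>, everything else <p>.
        html = ET.Element(HEADING_STYLES.get(style, "p"))
    else:
        # Character-level elements become spans.
        html = ET.Element("span")
    if style and style not in HEADING_STYLES:
        html.set("class", style)  # only the class name is carried across
    html.text = node.text
    for child in node:
        child_html = convert(child)
        child_html.tail = child.tail
        html.append(child_html)
    return html
```

Switching the "p" default to "div" would give the division variant mentioned above.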
http://petermoulding.com/technology/content_management_systems/drupal/
petermoulding.com/web_architect
this is great!
I was just thinking today that this would be awesome to have in Drupal. Now I can't code, but I would be more than willing to help test this module once you are ready :)
Great stuff!
This will be a nice complement to other XML formats such as DocBook and DITA. It might be nice to support Microsoft Office XML also :)
Djun
--
puregin
files system
hello there,
This sounds like a brilliant initiative. I was drawn to this post because I thought we had another ranting user ;)
Anyway, keep an eye on betterupload.module. I am transforming it into a general file "thing", as discussed at DrupalCon.
In a nutshell: For you the interesting part is that it will fire hooks, based on mime types on uploads of files. I will post more about all this when I have more available.
Just posting this comment to make you aware of this.
---
if you dont like the choices being made for you, you should start making your own.
---
[Bèr Kessels | Drupal services www.webschuur.com]
application/vnd.oasis.opendocument.text
When you get mimetype application/vnd.oasis.opendocument.text, then fire.
We should be able to test a few OO documents using your upload.
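To make the mime-type idea concrete, here is a small Python sketch of handlers keyed by mime type. It is not the betterupload.module API, which had not been published at the time of this thread; on_upload, fire and import_odt are hypothetical names.

```python
ODT_MIME = "application/vnd.oasis.opendocument.text"

# Registry of upload handlers, keyed by mime type.
_handlers = {}


def on_upload(mime_type):
    """Decorator that registers a function to run for uploads of one mime type."""
    def register(func):
        _handlers.setdefault(mime_type, []).append(func)
        return func
    return register


def fire(mime_type, data):
    """Call every handler registered for this mime type and collect the results."""
    return [handler(data) for handler in _handlers.get(mime_type, [])]


@on_upload(ODT_MIME)
def import_odt(data):
    # Placeholder: a real handler would unzip and transform the document.
    return "would convert %d bytes of ODT" % len(data)
```

An upload of any other mime type simply finds no handlers and returns an empty list.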
http://petermoulding.com/technology/content_management_systems/drupal/
petermoulding.com/web_architect
ODT != open office
I would like to stress the fact that ODT is a *general* open document format. It is the native OO.org format, but it is NOT an Open Office document per se.
KOffice supports it, AbiWord supports it and there are rumors that MS Office will support it too (haha).
In any case: ODT is the open document format from OASIS, not just from open office.
---
if you dont like the choices being made for you, you should start making your own.
---
[Bèr Kessels | Drupal services www.webschuur.com]
I am working only with OO 2
I am working with files created by OO 2.0. My sample files probably use 0.012 percent of the OpenDocument range. Within the ODT format there are lots of different formats defined by namespaces.
One namespace is office: which is probably unique to OO. The office: namespace points to urn:oasis:names:tc:opendocument:xmlns:office:1.0.
You do not have to support everything that OO puts in the XML because a lot of it is replaced by your CSS. The OO file contains definitions of styles but you replace them with the styles defined in CSS. All I transfer is the class name. If you try to transfer all the formatting from the OO file to the node then you have to transform the styles to CSS styles for inclusion in the page. You can then get conflicts when two nodes include two different styles with the same name. By transferring only the class name you remove duplication errors. The OO documents adopt your standard CSS styles.
You might need a translation table at some stage. The admin page could let you put in aliases for your CSS styles. If one document defines code as style Code and another document uses style example then both could be translated to class code.
I used Abiword way back when computers used sheets of slate instead of floppy disks and Abiword used a format where styles were put in long strings that were nothing to do with XML. They may have used ODT but that does not mean they used all the facilities of ODT. Hopefully they have or are converting to the full ODT so that the one processor can process both Abiword and OO in the same load.
Abiword used to have everything in one XML file but OO has the same data spread over several files with those separate files zipped together. They both create ice cream but one produces passionfruit ice cream while the other produces chilli mango ice cream with olives on top. (Don't panic, I have recipes for both.)
I use XSL and one function from ZLIB to separate out the OO data. Some time ago I did write PHP code to expand a zip file but that was for a different compression method to the one used by OO. My code for OO processes the file descriptors in PHP and then inflates the text using gzinflate. That gets around the problem with the PHP ZLIB functions working only with files on disk. My code does everything in memory.
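The in-memory approach can be sketched like this. It is a Python illustration, not the punzip class: zlib.decompress with wbits set to -15 handles raw deflate data, which is the same job gzinflate does in PHP. The function names are invented, the sketch walks the central directory rather than trusting local headers, and it ignores zip64, encryption and archive comments that happen to contain the end-of-directory signature.

```python
import io
import struct
import zipfile
import zlib

EOCD = struct.Struct("<IHHHHIIH")              # end of central directory record
CENTRAL = struct.Struct("<IHHHHHHIIIHHHHHII")  # central directory file header
LOCAL = struct.Struct("<IHHHHHIIIHH")          # local file header


def unzip_in_memory(data):
    """Return {name: bytes} for every stored or deflated member of a zip string."""
    eocd_at = data.rfind(b"PK\x05\x06")
    if eocd_at < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = EOCD.unpack_from(data, eocd_at)
    entry_count, position = fields[4], fields[6]
    members = {}
    for _ in range(entry_count):
        entry = CENTRAL.unpack_from(data, position)
        if entry[0] != 0x02014B50:
            raise ValueError("bad central directory signature")
        method, packed_size = entry[4], entry[8]
        name_len, extra_len, comment_len = entry[10], entry[11], entry[12]
        local_at = entry[16]
        name_at = position + CENTRAL.size
        name = data[name_at:name_at + name_len].decode("utf-8")
        position = name_at + name_len + extra_len + comment_len
        # The local header repeats the name and has its own extra field length.
        local = LOCAL.unpack_from(data, local_at)
        start = local_at + LOCAL.size + local[9] + local[10]
        packed = data[start:start + packed_size]
        if method == 0:        # stored without compression
            members[name] = packed
        elif method == 8:      # deflate: raw stream, no zlib or gzip header
            members[name] = zlib.decompress(packed, -15)
        else:
            raise ValueError("unsupported compression method %d" % method)
    return members


def make_sample_zip():
    """Build a small zip in memory so the function above can be tried out."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("content.xml", "<a>" + "x" * 200 + "</a>")
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                         compress_type=zipfile.ZIP_STORED)
    return buffer.getvalue()
```

Nothing touches the disk: the archive arrives as a byte string and each member leaves as a byte string.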
The unzipping is in class punzip and the OO XML to XML is in class pet_openoffice. I like consistent naming conventions, consistently different. :-)
I have other code that calls PHP functions from XSLT which means I can pick up information from an administration interface. If you want to translate class names based on entries in a table or Drupal variables, then that can be done through a PHP function. You could form a committee now to work on what you might need in the administration interface. I could maintain the classes and the XSLT. A module could load the OO document in to memory via an upload or anything else and then pass the string through my class and then save the result in to a node.
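A class-name translation of the kind mentioned here could be as small as the following Python sketch. The "style = class" settings format and the names parse_aliases and css_class_for are assumptions made for the example; the real lookup would be a PHP function called from the XSLT.

```python
def parse_aliases(settings_text):
    """Parse lines of the form 'OO style name = css class' into a dictionary."""
    aliases = {}
    for line in settings_text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        style, css_class = line.split("=", 1)
        aliases[style.strip()] = css_class.strip()
    return aliases


def css_class_for(style_name, aliases):
    """Return the site CSS class for a style name, or the name itself if no alias."""
    return aliases.get(style_name, style_name)
```

With the settings text "Code = code" and "example = code" on separate lines, both document styles end up as class code, and any style without an alias passes through unchanged.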
I can contribute documentation for the transformation. I have test pages set up with test documents. I just added a template to convert ordered lists in to ordered lists. OO uses a predefined class name for the list. I have not tried any more examples.
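For readers who have not looked inside content.xml, a list conversion along these lines might look like the sketch below. It is Python, not the XSLT template described above. The text:list and text:list-item element names are from the OpenDocument 1.0 schema; the function name list_to_ol is invented, and treating every list as ordered is a simplification, since the numbering actually lives in the list style.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
T = "{%s}" % TEXT_NS


def list_to_ol(odf_list):
    """Convert one text:list element to an <ol>, keeping the list style as class."""
    ol = ET.Element("ol")
    style = odf_list.get(T + "style-name")
    if style:
        ol.set("class", style)
    for item in odf_list.findall(T + "list-item"):
        li = ET.SubElement(ol, "li")
        # Flatten the item's paragraphs to plain text; nested lists and
        # spans inside the item are not preserved in this sketch.
        li.text = " ".join("".join(p.itertext()) for p in item.findall(T + "p"))
    return ol
```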
http://petermoulding.com/technology/content_management_systems/drupal/
petermoulding.com/web_architect
OASIS on OpenDocument
From (with comments!):
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
OpenDocument v1.0 Specification
The OpenDocument v1.0 specification document is available in PDF format (706 pages!) at
http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1....
and in OpenOffice.org XML format at
http://www.oasis-open.org/committees/download.php/12573/OpenDocument-v1.....
The specification defines three Relax-NG schemas:
1. The schema for office documents (defined in chapters 1 to 16),
2. the normative schema for the manifest file used by the OpenDocument package format (defined in chapter 17), and
3. the strict schema for office documents that permits only meta information and formatting properties contained in this specification itself (defined in appendix A).
These three schemas are also available separately:
1. OpenDocument v1.0 Relax-NG Schema (extracted from chapter 1 to 16 of the specification):
http://www.oasis-open.org/committees/download.php/12571/OpenDocument-sch...
2. OpenDocument v1.0 Manifest Relax-NG Schema (extracted from chapter 17 of the specification):
http://www.oasis-open.org/committees/download.php/12570/OpenDocument-man...
3. OpenDocument v1.0 Strict Relax-NG Schema (extracted from appendix A of the specification):
http://www.oasis-open.org/committees/download.php/12569/OpenDocument-str...
OpenDocument FAQ
Many questions (but nothing important!) about OpenDocument are addressed in the FAQ at
http://www.oasis-open.org/committees/office/faq.php
---
http://petermoulding.com/technology/content_management_systems/drupal/
petermoulding.com/web_architect
I read the 706 page specification
Over a cup of tea, a very large cup, I read the 706 page specification. The OpenOffice files produced by OpenOffice 2.0 are different to the specification. I will base my work on the files instead of the specification.
http://petermoulding.com/technology/content_management_systems/drupal/
petermoulding.com/web_architect
Table styles
The OpenDocument specification specifies three types of styles. I created a document with two of those style types but OO squashed both into one type. It will not materially affect the XSLT but may mean the XSLT will break when OO fix up their creation of styles.
OO lets you create table styles and table content styles. Both create a paragraph around the content of a cell and neither applies a style to a table. I will have to make the XSLT find the paragraph style and apply that as the table class name. The XSLT can then throw away the paragraph element. There will be a real problem if anybody uses paragraphs within table cells and then styles the paragraphs.
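The table workaround described above (take the paragraph style as the table class, then discard the paragraph element) can be sketched in Python as follows. This is an illustration of the idea, not the XSLT itself. The function name table_to_html is invented, only the first paragraph style found is used, and header rows, spans and nested tables are ignored.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TB = "{%s}" % TABLE_NS
TX = "{%s}" % TEXT_NS


def table_to_html(odf_table):
    """Convert a table:table element to <table>, classed by its first cell style."""
    table = ET.Element("table")
    for row in odf_table.findall(TB + "table-row"):
        tr = ET.SubElement(table, "tr")
        for cell in row.findall(TB + "table-cell"):
            td = ET.SubElement(tr, "td")
            paragraphs = cell.findall(TX + "p")
            # Throw away the paragraph elements and keep only their text.
            td.text = " ".join("".join(p.itertext()) for p in paragraphs)
            for p in paragraphs:
                style = p.get(TX + "style-name")
                if style and table.get("class") is None:
                    table.set("class", style)  # first paragraph style wins
    return table
```

As the comment warns, this breaks down as soon as someone deliberately styles individual paragraphs inside cells, because those styles are lost.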
It is a pity OO defined their XML for tables without looking at CSS or any one of many excellent table definition systems that existed long before OO created their first table. Some aspects of the OO approach are similar to using a da Vinci surgical system, http://www.intuitivesurgical.com/products/davinci_surgicalsystem/, to open a can of beans while others are similar to using the empty bean can to perform a heart lung transplant.
OO lets you create a border around a table without placing the border in a style or anywhere else that is useful. This is one of many inconsistencies in the way OO handles the presentation of tables. The easiest approach is to throw away OO's styling of tables and apply a CSS style. The CSS style has to be a fixed style or created from the paragraph style name.
If you autoformat a table, the autoformat is not carried through to the document. You cannot use autoformat to provide a class name for a table. I tried removing autoformat from a table but could not. OpenOffice must have all the logic hard coded in to their application.
If you find a way in OO to style a table using a named style then please post detailed steps here so that I can add a test case to my XSLT development.
http://petermoulding.com/technology/content_management_systems/drupal/
petermoulding.com/web_architect
I also have created a
I also have created a quick-and-dirty XSLT stylesheet at http://books.evc-cit.info/odf_utils/odt_to_xhtml.html; it's LGPL, so feel free to use any part of it that you find useful.
OpenDocument importing would be useful
This will be a useful addition to Drupal. Last year, when I first noticed that the OpenDocument format was mainly a zip file containing XML and images, I started thinking about how to create a Content Management System using OpenOffice.org 2.0 for all (or most) content creation. Drupal already fills the CMS role nicely. Now we just need the OD handling part and we'll be good to go. :-)
What is the current status of this project?
4.7.0
I am rewriting stuff for Drupal 4.7.0. OD is the lowest priority. Perhaps the end of May.
petermoulding.com/web_architect
Is this still being worked
Is this still being worked on?
I'd like to see the ability to import ODT files into Drupal nodes... It's the one feature that's lacking for me at the moment.
--
\/ushi - xushi.co.uk
/\ socialprotest.com
Replacement
It doesn't appear that it's being worked on. But I've thought of another solution, which I have yet to try out.
1. Convert ODT to MediaWiki
- http://wiki.services.openoffice.org/wiki/Odt2Wiki
2. Use a drupal filter that allows MediaWiki input, either pearwiki_filter or http://drupal.org/project/flexifilter
HTH,
-=-=-=-
http://www.jdarx.info/