Forwarding a copy to the mailing list:
Ticket 426 on Trac has not been updated to reflect the Greenstone
Team's more recent discussions on the topic. It seems likely that,
where Open Office is installed, we will try to use it as an alternative
means of converting various documents, particularly the different kinds
of MS Office documents such as .docx and .ppt. Where it is not
installed, we may fall back on the current conversion tools.
You are right that a plugin to handle Open Office documents is also a
requirement. Our current thinking is not to parse .docx or other MS
Office formats ourselves, but rather to use Open Office's ability to
convert them. Similarly, we might use Open Office itself to convert Open
Office documents, or otherwise fall back on the OpenDocument plugin (see
below).
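To make the "use Open Office where installed, else fall back" idea
concrete, here is a minimal Python sketch. It is not Greenstone code:
conversion_command and the extension list are hypothetical, and it
assumes an Open Office/LibreOffice install that exposes an soffice
binary accepting --headless --convert-to html --outdir, plus the wv
package's wvHtml (assumed form: wvHtml input.doc output.html) as the
existing converter for legacy .doc files.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical set of extensions we would hand to Open Office.
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
                     ".odt", ".ods", ".odp"}


def conversion_command(doc: str, outdir: str,
                       which: Callable[[str], Optional[str]] = shutil.which
                       ) -> Optional[List[str]]:
    """Return a command line that converts `doc` to HTML, or None.

    Prefers Open Office (assumed to be on PATH as `soffice`) and falls
    back to wvHtml for legacy .doc files only. `which` is injectable so
    the choice can be tested without either program installed.
    """
    suffix = Path(doc).suffix.lower()
    if suffix not in OFFICE_EXTENSIONS:
        return None

    soffice = which("soffice")
    if soffice:
        # Headless conversion; the HTML file is written into outdir.
        return [soffice, "--headless", "--convert-to", "html",
                "--outdir", outdir, doc]

    if suffix == ".doc":
        wvhtml = which("wvHtml")
        if wvhtml:
            # Assumed wvHtml usage: input file, then output file.
            target = str(Path(outdir) / (Path(doc).stem + ".html"))
            return [wvhtml, doc, target]

    # No converter available for this format on this machine.
    return None
```

A plugin would then run the returned list with subprocess.run; a None
result means the document has to be skipped or handled some other way.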
It sounds as though the OOXML plugin you intend to write might not
require Open Office to be installed in order to convert documents to
HTML. Is that right? That would be very helpful.
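Both formats are indeed zip packages of XML parts, so a standalone
plugin would not need Open Office just to open them. The Python sketch
below (describe_package is a hypothetical name) tells the two package
types apart and finds the main XML part: ODF packages carry a mimetype
entry and content.xml, while OOXML packages carry [Content_Types].xml
and a _rels/.rels relationship pointing at the main part, such as
word/document.xml. It assumes transitional OOXML relationship URIs only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Namespace of OPC relationship parts (_rels/.rels) in OOXML packages.
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type of the main document part in *transitional* OOXML.
# Strict OOXML uses a different URI and is not handled here.
OFFICE_DOC_REL = ("http://schemas.openxmlformats.org/officeDocument/"
                  "2006/relationships/officeDocument")


def describe_package(data: bytes) -> Optional[Tuple[str, str]]:
    """Return (family, main_part) for an ODF or OOXML package, else None."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())

        # ODF: a `mimetype` entry plus content.xml holding the body.
        if "mimetype" in names and "content.xml" in names:
            return ("ODF", "content.xml")

        # OOXML: [Content_Types].xml plus a package-level relationship
        # naming the main part (word/document.xml, xl/workbook.xml, ...).
        if "[Content_Types].xml" in names and "_rels/.rels" in names:
            rels = ET.fromstring(zf.read("_rels/.rels"))
            for rel in rels.findall(f"{{{PKG_REL_NS}}}Relationship"):
                if rel.get("Type") == OFFICE_DOC_REL:
                    return ("OOXML", rel.get("Target", "").lstrip("/"))
    return None
```

The container handling is shared; the real work in either plugin lies
in interpreting the very different XML vocabularies inside.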
> I haven't dug into the OpenDocument plugin,
At present, OpenDocumentPlugin.pm seems to extract only the text from a
document, judging by its header comments:
# Processes OASIS Open Document format.
# Word processing document: .odt, template: .ott
# Spreadsheet document: .ods, template: .ots
# Presentation document: .odp, template: .otp
# Graphics document: .odg, template: .otg
# Formulas document: .odf, template: .otf (not supported)
#This basically extracts any text out of the document, but not much else.
# this inherits ReadXMLFile, and therefore offers -xslt option, but does
# nothing with it.
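For comparison, text-only extraction of the kind those comments
describe can be sketched in a few lines of Python. This is not the
plugin's Perl code; odf_plain_text is a hypothetical name, and the
sketch simply joins the text of every text:p and text:h element in
content.xml, discarding styles, images and table layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Union

# ODF text namespace: paragraphs are text:p, headings are text:h.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def odf_plain_text(source: Union[str, bytes]) -> str:
    """Return one line per paragraph/heading found in content.xml.

    `source` is a file path or the raw bytes of an .odt/.ods/.odp file.
    This is deliberately rough: space elements such as text:s are not
    expanded, and paragraphs nested inside notes may appear twice.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("content.xml"))

    lines = []
    for element in root.iter():
        if element.tag in PARAGRAPH_TAGS:
            # itertext() also collects text inside nested spans and links.
            lines.append("".join(element.itertext()))
    return "\n".join(lines)
```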
Doug Carter wrote:
> Hi all,
> Can anyone give me a rough development status on this? All of the
> converters mentioned on "http://trac.greenstone.org/ticket/426" are
> deficient in one way or another. Plus most of them are concerned only
> with docx and not the whole OOXML family of documents.
> I noticed that you've already got an OpenDocument plugin available.
> This may be an ignorant question, but isn't the structure of an ODT
> file similar to OOXML (a zip file containing some XML and images)?
> I haven't dug into the OpenDocument plugin, but I'm wondering why it
> wouldn't be a good starting place to develop an OOXML plugin.
> Before I get too far looking into this, I'd like to know if someone
> is already working on it. I don't want to duplicate the effort.
> On Thu, Dec 04, 2008 at 06:52:10PM -0800, Jeff Crump wrote:
>> Another message from Greenstone, with a link to the converter "to do"
>> ticket. There are lots of potential conversion tools in the ticket
>> report. They are open to our input on the tools (and I'm guessing you
>> may have already made decisions about some of them).
>> -----Original Message-----
>> From: Anupama of Greenstone Team
>> [mailto:email@example.com]
>> Sent: Thursday, December 04, 2008 6:39 PM
>> To: Jeff Crump; Katherine Don
>> Subject: Re: [greenstone-users] Microsoft Office 2007 revisited
>> Hello Jeff,
>> > Is this a matter of finding a pre-existing converter that works with
>> > Office 2007, or writing it yourselves?
>> It is indeed a matter of finding a pre-existing open-source converter
>> tool and embedding it into Greenstone's building workflow by wrapping it
>> in a plugin.
>> I have added a "To Do" ticket for this, as promised (see
>> http://trac.greenstone.org/ticket/426), and will ask whether we can
>> expedite our work on it. There are many people using Office
>> documents who want Greenstone to be compatible with the more recent
>> formats, so supporting them is certainly the only way to proceed.
>> The ticket I filed has links to some conversion tools for us to
>> consider. Is there any *open-source* tool that you know of and prefer
>> for its ability to convert these documents accurately? If so, we can
>> give it preference when we decide which one to incorporate. If you do
>> not know of any, don't worry.
>> Thanks for your valuable input,
>> Jeff Crump wrote:
>>> Hi Anupama,
>>> Thank you VERY much for clarifying this! I asked my original question
>>> badly. Without the ability to convert Office 2007 documents to HTML and
>>> make them searchable full-text, we would have no reason to upgrade
>>> from 2.52.
>>> I'm afraid our organization will soon be upgrading entirely to Office
>>> 2007. Our Greenstone Digital Library is actually our organization's main
>>> document repository: documents go in and out each day (we rebuild the
>>> library every night), so being unable to convert Office 2007 documents
>>> isn't really an option for us.
>>> Is this a matter of finding a pre-existing converter that works with
>>> Office 2007, or writing it yourselves?
>>> Thanks and best regards,
>>> -----Original Message-----
>>> From: Anupama of Greenstone Team
>>> Sent: Wednesday, December 03, 2008 6:17 PM
>>> To: Jeff Crump
>>> Subject: Re: [greenstone-users] Microsoft Office 2007 revisited
>>> Hello Jeff,
>>> Greenstone at present is still using the wvWare tool to convert MS
>>> Word documents to HTML. It is not able to handle the more recent Word
>>> We intend to look into alternative conversion tools both for Word and
>>> other MS Office formats. I'll add in a "To Do" ticket about this.
>>> Jeff Crump wrote:
>>>> Hi, we're on Greenstone 2.52. We are considering an upgrade to 2.8x,
>>>> but our main requirement is that Microsoft Office 2007 documents build
>>>> I was told on this list to try 2.8, but we can't find any
>>>> documentation anywhere saying that Office 2007 (Word, Excel, PowerPoint,
>>>> etc.) is supported in 2.8.
>>>> Can anyone point me to this documentation? Or can anyone verify from
>>>> experience that Office 2007 documents are supported in Greenstone 2.8 or
>>>> If all we need is the right plugin, rather than the full upgrade,
>>>> that would be great to know, too.
>>>> Thanks again,
>>>> greenstone-users mailing list