On Thu, 2004-11-18 at 08:22, Manfred Jung wrote:
> Is there a plugin for OO.org documents?
> If not
> 1. what would it take to make one?
Fortunately, OO.org stores its documents in open, easily parsed formats,
unlike Microsoft Word. For example, an OpenOffice.org text document (.sxw)
is a ZIP file that contains a number of XML files. For example:
$ unzip Invitation.sxw
All the user-supplied text is inside the content.xml file. If you just
wanted to do very simple extraction of text for indexing, you could
merely remove all XML tags and index the remaining text.
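
The simple extraction described above could be sketched roughly as follows.
This is only an illustration in Python, not Greenstone code (a real plugin
would be written in Perl); the function names strip_tags and
extract_sxw_text are made up for this example, and it assumes content.xml
is UTF-8 encoded.

```python
import html
import re
import zipfile


def strip_tags(xml_text):
    """Replace every XML tag with a space and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", xml_text)
    # Decode entities such as &amp; only after the tags are gone.
    return " ".join(html.unescape(no_tags).split())


def extract_sxw_text(path):
    """Return the plain text of content.xml inside an .sxw (ZIP) archive."""
    with zipfile.ZipFile(path) as archive:
        # Assumption: content.xml is stored as UTF-8.
        xml_text = archive.read("content.xml").decode("utf-8")
    return strip_tags(xml_text)
```

Replacing tags with a space rather than nothing keeps words from adjacent
paragraphs from running together, which matters for indexing.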
In terms of doing this inside Greenstone, probably the easiest
way would be to take an existing plugin, rename it, and modify it.
You would need to supply a 'read_file' function that is given the .sxw
filename and does the unzipping to a temporary directory.
I suggest looking at the MARC plugin for this, and maybe at the
TEXT plugin for the minimum of what a simple plugin has to do.
Greenstone on Windows comes with an unzip.exe binary, and nearly all
Unix platforms have unzip installed on the system.
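
To illustrate the unzip-to-a-temporary-directory step, here is a rough
sketch in Python. It is not a Greenstone plugin and does not use the
Greenstone plugin API; the name read_sxw_content is hypothetical, and it
uses Python's zipfile module in place of the external unzip binary. It
also assumes content.xml is UTF-8 encoded.

```python
import os
import tempfile
import zipfile


def read_sxw_content(path):
    """Unzip an .sxw file into a temporary directory and return content.xml as text."""
    # The temporary directory and everything in it are removed
    # automatically when the with block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(path) as archive:
            archive.extractall(tmp_dir)
        content_path = os.path.join(tmp_dir, "content.xml")
        # Assumption: content.xml is stored as UTF-8.
        with open(content_path, encoding="utf-8") as handle:
            return handle.read()
```

A Perl read_file function would follow the same shape: create a temporary
directory, run unzip into it, read content.xml, then clean up.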
> 2. is there any documentation for that?
The Greenstone Developer's Manual talks about plugins. I assume that
the OO.org file formats (Writer, Impress, etc.) are documented on Sun
Microsystems' website somewhere, but they are XML anyway, so it should be
straightforward enough to figure out.
> 3. Anyone willing to work jointly on this project?
I personally don't have time to help on this, sorry.