From: Katherine Don
Date: Tue, 21 Feb 2006 16:49:30 +1300
Subject: Re: [greenstone-devel] Word and RDF files
Hi NataLy

> I have a problem with adding PDF and MS Word files . These files are very
> big, consisting 300-500 pages.
> I use plugins: HTMLPlug, WordPlug, RTFPlug, PDFlug. During "import"
> operation I have error: files can't be processed with any plugin.
> What is reason of this problem?
In the GLI, try setting the mode to expert (File->Preferences->Mode),
and then rebuilding. This should tell you what the error was. If it
doesn't, set the verbosity option in import options to 5 and rebuild.

> Can I add Word or PDF files without any processing? I want to have only
> icons of PDF and MS Word files (srcicon); when user clicks on icon file
> opens.
You can process them using UnknownPlug - see
If you do this, no text will be extracted from the files, and you won't be
able to search on the text, only on any metadata you have added.

> Does Greenstone support few files in one document? For example, one book has
> html, pdf versions and few audio or video files.
Yes. If you are using version 2.63, you can set some options on the plugins.
If you have images or other files linked to from an html file, then they
will be included anyway. If you have other files, you can use the
associate_ext and associate_tail_re options.

For example, if I have book.pdf, with a word version, book.doc and a
couple of supplementary images, book_1.jpg and book_2.jpg, then I can
add the following options to PDFPlug:

-associate_ext doc
-associate_tail_re _\d\.jpg

Note that all files have the same root (book).
The PDF document should end up with all the other files as associated
documents (check the archive file to make sure). There should also be
metadata like equivlink, jpg.1, jpg.2, jpg.assoclink. Again, check the
archive file to see what metadata has been assigned.
This metadata can then be used in the document format statements to link
to the other files.
Note that this feature is very new, and may be a bit buggy, so please
let us know if you have any problems with it.
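
To illustrate which files the two options are meant to pick up, here is a
rough sketch in Python. It is not Greenstone's code (the plugins are written
in Perl), and the function name group_associated is made up for this example.
It assumes that a file is associated with the primary document when it has
the same root followed either by one of the listed extensions or by a tail
that matches the regular expression, and that the tail pattern is meant to be
an underscore, a single digit, and ".jpg".

```python
import re


def group_associated(filenames, primary_ext, associate_ext=(), associate_tail_re=None):
    """Group files with the primary document whose root they share.

    Hypothetical helper for illustration only; it mimics the idea behind
    -associate_ext and -associate_tail_re, not Greenstone's actual logic.
    """
    # Find the primary documents (e.g. every *.pdf) and remember their roots.
    primaries = {}
    for name in filenames:
        root, dot, ext = name.rpartition(".")
        if dot and ext == primary_ext:
            primaries[root] = name

    groups = {primary: [] for primary in primaries.values()}
    tail = re.compile(associate_tail_re) if associate_tail_re else None

    for name in filenames:
        for root, primary in primaries.items():
            if name == primary or not name.startswith(root):
                continue
            rest = name[len(root):]  # what follows the shared root
            if rest.startswith(".") and rest[1:] in associate_ext:
                groups[primary].append(name)  # same root, listed extension
            elif tail and tail.fullmatch(rest):
                groups[primary].append(name)  # same root, tail matches regex
    return groups


files = ["book.pdf", "book.doc", "book_1.jpg", "book_2.jpg", "other.jpg"]
result = group_associated(files, "pdf", associate_ext=("doc",),
                          associate_tail_re=r"_\d\.jpg")
print(result)
```

In this sketch book.doc, book_1.jpg and book_2.jpg are grouped under
book.pdf, while other.jpg is left alone because it does not share the root.
Note that a pattern without the backslash before the d (that is, _d.jpg)
would match a literal letter "d" rather than a digit, so the images would
not be picked up.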


