|Date||Thu, 18 Aug 2005 10:06:40 +1200|
|Subject||[greenstone-users] Instructions for improved Word, PPT, PDF Plugins|
These are some simple instructions for trying out the improved Word, PPT, and PDF plugins.
WordPlug: Switching on -windows_scripting uses VBScript to convert the Word document to HTML. It also enables user-defined header settings for up to three levels and the extraction of metadata from the document. These features are only available on Windows.
WordPlug -windows_scripting -level1_header (level1Header1|level1Header2|...) -level2_header (level2Header1|level2Header2|...) -extracted_word_metadata_fields Author<Creator>,Subject,Keyword<Subject>,...
The header options take a regular expression listing the user-defined heading styles that may occur in the documents you collected. By default, the document is split on <H1>, <H2> tags and so forth. Word documents that use the built-in "Heading 1", "Heading 2" styles are automatically mapped to <H1>, <H2> respectively. If your Word documents already use these styles, you do not need to set these options: the enhanced hierarchical sectioning happens automatically.
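To make the header matching concrete, here is a rough Python sketch of the idea. It is not the plugin's actual code (Greenstone plugins are written in Perl), and the style names "ChapterTitle", "PartTitle" and "SectionTitle" are made up for illustration. It also assumes a style name must match the whole expression, which may differ from what WordPlug really does.

```python
import re

# Hypothetical values for -level1_header and -level2_header.
LEVEL_PATTERNS = {
    1: "ChapterTitle|PartTitle",
    2: "SectionTitle",
}


def header_level(style_name, patterns=LEVEL_PATTERNS):
    """Return the section level for a Word style name, or None for body text."""
    # User-defined heading styles are checked first, lowest level first.
    for level in sorted(patterns):
        if re.fullmatch(patterns[level], style_name):
            return level
    # Built-in "Heading N" styles map straight to level N (<H1>, <H2>, ...).
    builtin = re.fullmatch(r"Heading ([1-6])", style_name)
    if builtin:
        return int(builtin.group(1))
    return None


for style in ["PartTitle", "SectionTitle", "Heading 2", "Normal"]:
    print(style, "->", header_level(style))
```

With these made-up patterns, a paragraph styled "PartTitle" would start a top-level section, while "Normal" paragraphs stay inside the current section.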
With the -extracted_word_metadata_fields option, a comma-separated list of metadata fields needs to be specified. This works similarly to the HTMLPlug metadata_fields option. Use 'tag<tagname>' to have the contents of the first 'tag' put in a metadata element called 'tagname'. Capitalise each name as you want the metadata capitalised in Greenstone, since the tag extraction is case insensitive. This option is only available when windows_scripting is on.
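The 'tag<tagname>' notation can be read as a small mapping from document properties to Greenstone metadata names. The Python sketch below only illustrates that reading and is not how the plugin is implemented. The function names and the sample property values are invented for the example, and it assumes that a bare name such as Subject maps to itself.

```python
def parse_metadata_fields(spec):
    """Parse 'Author<Creator>,Subject' into {lowercased tag: metadata name}."""
    mapping = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.endswith(">") and "<" in item:
            # 'tag<tagname>': store the tag's contents under 'tagname'.
            tag, name = item[:-1].split("<", 1)
        else:
            # Bare name: keep it, capitalised exactly as written.
            tag = name = item
        # Tag matching is case insensitive, so key on the lowercased tag.
        mapping[tag.strip().lower()] = name.strip()
    return mapping


def map_properties(properties, mapping):
    """Rename extracted document properties according to the mapping."""
    metadata = {}
    for tag, value in properties.items():
        name = mapping.get(tag.lower())
        if name is not None:
            metadata.setdefault(name, []).append(value)
    return metadata


fields = parse_metadata_fields("Author<Creator>,Subject,Keyword<Subject>")
# Invented sample properties, with mixed capitalisation on purpose.
props = {"AUTHOR": "J. Smith", "subject": "Digital libraries", "Keyword": "Greenstone"}
result = map_properties(props, fields)
print(result)
```

Here both the Subject and Keyword properties end up under the Subject metadata element, and Author is stored as Creator, whatever capitalisation the document itself uses.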
PPTPlug: Under Windows, PowerPoint documents can be converted to HTML, TEXT, GIF, PNG, or JPEG. On Linux, the documents can only be converted to HTML and TEXT. To enable conversion to the image types on Windows, you need to switch on windows_scripting. Once the document has been converted, it can then be processed by PagedImgPlug, and each slide in the PPT document will be displayed as a single image.
PPTPlug -windows_scripting -convert_to pagedimg_gif
PDFPlug -convert_to pagedimg_gif
Hope these instructions are useful. Please let us know if any of this does not work the way it should. It will help us to improve and stabilise the new features.