I forgot to answer the rest of your question: it does work with Word and PDF files (and RTF).

For PDF files I use the line

plugin PDFPlug -no_metadata -use_sections 1

and this gives me numbered pages.

For Word documents it is exactly the same as with the HTML; you just embed the <!-- <section><description> etc. --> in the body of the documents.

Good luck,
s.

Stephen.DeGabrielle@cdu.edu.au
Sent by: greenstone-users-bounces@list.scms.waikato.ac.nz
17/11/2003 09:59 AM
To: nathan shan <canelib@yahoo.com>
cc: greenstone-users@list.scms.waikato.ac.nz
bcc:
Subject: [greenstone-users] Re: Table of contents
Hi Nathan,
You need to have this line in your collect.cfg file (p43 Developers Guide):
format DocumentContents true
You also need tagging like that in the demo collection document "The Courier - N°158 - July - August 1996 Dossier Communication and the media - Country report Cape Verde". Look at the source file ec158e.htm in the gsdl/collect/demo/import folder.
You will see that the TOC is generated from the Title metadata in the Description blocks of the Sections (as per p36 Developers Guide).
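To make the Title-to-TOC relationship concrete, here is a minimal Python sketch that reads the Section tagging and prints the nesting it implies. It is only an illustration of the idea, not how Greenstone itself builds the TOC; the function name extract_toc, the regular expression, and the sample text are my own assumptions, and it expects the simple layout shown in this thread (one Title per Description block).

```python
import re

# Matches either an opening <Section> together with its Title metadata,
# or a closing </Section>. Assumes one Title per Description block.
TOKEN = re.compile(
    r'<Section>\s*<Description>\s*<Metadata name="Title">(.*?)</Metadata>'
    r'|</Section>',
    re.DOTALL,
)

def extract_toc(html):
    """Return (depth, title) pairs in document order; depth 0 is the outermost Section."""
    toc = []
    depth = 0
    for match in TOKEN.finditer(html):
        if match.group(1) is not None:   # opening <Section> with a Title
            toc.append((depth, match.group(1).strip()))
            depth += 1
        else:                            # closing </Section>
            depth -= 1
    return toc

# Hypothetical sample, shaped like the demo document's tagging.
sample = """
<!-- <Section> <Description> <Metadata name="Title">Document title</Metadata> </Description> -->
<!-- <Section> <Description> <Metadata name="Title">Chapter 1</Metadata> </Description> -->
<!-- <Section> <Description> <Metadata name="Title">Subsection 1.1</Metadata> </Description> -->
text
<!-- </Section> </Section> <Section> <Description> <Metadata name="Title">Chapter 2</Metadata> </Description> -->
text
<!-- </Section> </Section> -->
"""

for depth, title in extract_toc(sample):
    print("  " * depth + title)
```

Each Section opened inside another one becomes a deeper TOC entry, which is why the closing tags matter as much as the opening ones.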
You can also have paged documents with a page flipper / go-to-page-number selector by making the chapter titles 1, 2, 3, etc.
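If the pages are already available as separate chunks of HTML, the numbered titles can be generated rather than typed by hand. The following Python sketch shows one way to do that; the function name paged_document and its arguments are hypothetical, and it assumes titles are plain text that does not contain "--" (which would end the HTML comment early).

```python
def paged_document(title, pages):
    """Wrap each page in a Section titled 1, 2, 3, ... inside one document-level Section."""
    def open_section(section_title):
        return ('<!-- <Section> <Description> '
                f'<Metadata name="Title">{section_title}</Metadata> '
                '</Description> -->')

    lines = [open_section(title)]            # document-level Section
    for number, body in enumerate(pages, start=1):
        lines.append(open_section(number))   # page Section, titled by its number
        lines.append(body)
        lines.append('<!-- </Section> -->')  # closes the page
    lines.append('<!-- </Section> -->')      # closes the document
    return "\n".join(lines)

# Hypothetical two-page document.
print(paged_document("Report", ["<P>first page</P>", "<P>second page</P>"]))
```

The output goes in the BODY of the HTML file, in the same way as the hand-tagged example in this message.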
Look below to see a chopped-up version, where I have tried to show the TOC hierarchy by inserting some CAPS text of my own.
<HTML> <HEAD> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252"> <META NAME="Generator" CONTENT="Microsoft Word 97"> <TITLE><<TOC1>> The Courier - N°158 - July - August 1996 Dossier Communication and the media - Country report Cape Verde</TITLE> </HEAD> <BODY>
<B><FONT FACE="Arial" SIZE=2><P ALIGN="JUSTIFY"></P> <!-- <Section> <Description> <Metadata name="Title">DOCUMENT TITLE: The Courier - N°158 - July - August 1996 Dossier Communication and the media - Country report Cape Verde</Metadata> </Description> -->
<!-- <Section> <Description> <Metadata name="Title">CHAPTER 1: Meeting point (HAS NO TEXT OF ITS OWN - JUST CONTAINS ANOTHER SECTION)</Metadata> </Description> --> </B><P ALIGN="JUSTIFY"></P> <B><P></P> <!-- <Section> <Description> <Metadata name="Title">CHAPTER SUBSECTION Robert Ménard, Director of 'Reporters sans frontières'</Metadata> </Description> --> [TEXT HERE]
<!-- </Section> [CLOSES SUBSECTION] </Section>[CLOSES CHAPTER] <Section>[ANOTHER CHAPTER] <Description> <Metadata name="Title">Acknowledgements</Metadata> </Description> --> [lots of text here...] <!-- </Section>[CLOSES CHAPTER] </Section>[CLOSES DOCUMENT] --> </BODY> </HTML>
---END OF FILE-------
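Since every Section has to be closed in the right place for the hierarchy to come out as intended, a quick balance check on a tagged file can save a rebuild. This is a rough Python sketch, not part of Greenstone; the function name check_sections is hypothetical, and it only counts <Section> and </Section> tags without validating the Description blocks.

```python
import re

def check_sections(html):
    """Return True if every <Section> is closed and no </Section> precedes its opener."""
    depth = 0
    for tag in re.findall(r'</?Section>', html):
        depth += 1 if tag == '<Section>' else -1
        if depth < 0:          # a close with nothing open
            return False
    return depth == 0          # everything opened was closed

# Hypothetical fragments: the first is balanced, the second leaves a Section open.
print(check_sections('<!-- <Section> --> text <!-- </Section> -->'))
print(check_sections('<!-- <Section> <Section> </Section> -->'))
```

Annotations such as [CLOSES CHAPTER] next to the tags do not affect the check, because only the tags themselves are matched.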
Hi
When a document is displayed (especially a journal article, book or thesis), whether through browsing or searching, is it possible to display its table of contents (i.e. the different headings and subheadings, just like bookmarks in an MS Word document) and to select and view whichever portion of the document is required? If so, please tell me how it can be done.
[There is a mention of "Tagging source document files" in the GSDL documentation. Is this possible only with HTML documents, or also with other documents such as PDF and Word?]
Shanmuganathan
_______________________________________________ greenstone-users mailing list greenstone-users@list.scms.waikato.ac.nz https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users