[greenstone-users] Section classifiers

From Vladimir R. Risojevic
Date Sat Jan 17 07:12:33 2009
Subject [greenstone-users] Section classifiers
In-Reply-To (003401c977ef$d1ea6ba0$75bf42e0$-gov-ar)
Diego,

I'm glad that you liked the site. Thank you.

I also think that Books from the Past is not native Greenstone, but I like it
very much. If someone from the Greenstone team could give me some ideas about
where to start, I could try to modify the source code to enable this feature,
as well as to show both the table of contents and the goto box, which I
already mentioned in my first post.

Thank you very much for your comments.

Regards,

Vladimir


On Fri, 16 Jan 2009 13:33:45 -0200, Diego Spano wrote
> Hi Vladimir,
>
> Very nice site!
>
> I think that Books from the Past is not Greenstone in its native
> version. I think the people who built it modified the source code
> to get that behaviour. The point is that when you enter a section,
> you get a list of pages, so GS will not show anything until you
> select one of those pages. The key, I think, is to give the
> instruction to show the first page of the section automatically.
> But I don't know how...
>
> Diego
>
> -----Original Message-----
> From: Vladimir R. Risojevic [mailto:vlado@etfbl.net]
> Sent: Friday, January 16, 2009 13:05
> To: dspano@orsna.gov.ar
> CC: greenstone-users@list.scms.waikato.ac.nz
> Subject: Re: [greenstone-users] Section classifiers
>
> Diego,
>
> Thanks for the tip; I hadn't thought of that. I'll try to use it
> to detect empty pages and skip them. At the moment, I have something
> similar to your collection, see screenshots, or
> http://www.nubrs.rs.ba/dl/ and I'm pretty satisfied with that. What
> I would like to do is to skip pages like the one on screen2.jpg,
> which is the header page of a section I presume. But I'm not sure
> how to do it. Something like that is done on
> http://www.booksfromthepast.org, which is also based on Greenstone
> as stated on http://www.booksfromthepast.org/Aboutus.asp?l=en,
> although they obviously modified Greenstone. Any ideas how that
> could be done?
>
> Regards,
>
> Vladimir
>
> ----- Original Message -----
> From: "Diego Spano" <dspano@orsna.gov.ar>
> To: "'Vladimir R. Risojevic'" <vlado@etfbl.net>;
> <greenstone-users@list.scms.waikato.ac.nz>
> Sent: Thursday, January 15, 2009 1:40 PM
> Subject: RE: [greenstone-users] Section classifiers
>
> Vladimir,
>
> There is another way. If you take a look at the doc.xml file, you will
> find that empty pages and "normal" pages differ in one thing: the
> empty ones don't have [Image] metadata. So you can handle that
> problem with a format statement like this:
>
> {If}{[Image],html_code_to_show_the_image}
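
The same check can be made outside Greenstone, for example to list which sections would be skipped by such a format statement. The following is only a rough Python sketch: it assumes that doc.xml nests Section elements, each with a Description holding Metadata elements, and the sample document, its titles, and the Image value are invented for illustration. The helper name sections_without_image is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Invented sample; the layout of a real doc.xml is an assumption here.
SAMPLE_DOC_XML = """<Archive>
  <Section>
    <Description><Metadata name="Title">Book Title</Metadata></Description>
    <Section>
      <Description><Metadata name="Title">Chapter 1</Metadata></Description>
      <Section>
        <Description>
          <Metadata name="Title">1</Metadata>
          <Metadata name="Image">page1.tif</Metadata>
        </Description>
      </Section>
    </Section>
  </Section>
</Archive>"""


def sections_without_image(doc_xml):
    """Return the Title of every Section that has no Image metadata."""
    root = ET.fromstring(doc_xml)
    titles = []
    # iter() walks the Section elements in document order, nested ones included.
    for section in root.iter("Section"):
        description = section.find("Description")
        metas = description.findall("Metadata") if description is not None else []
        names = {meta.get("name") for meta in metas}
        if "Image" not in names:
            title = next((m.text for m in metas if m.get("name") == "Title"), None)
            titles.append(title)
    return titles


print(sections_without_image(SAMPLE_DOC_XML))
```

In this sample only the page section carries Image metadata, so the book and chapter sections are the ones reported.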
>
> I also have a collection built with PagedImgPlug and PageGroups. I'm sending
> 3 screenshots just to show you how I get the section list and the
> image. I'm displaying TIFF images with an embed plugin.
>
> Regards.
>
> Diego
>
> -----Original Message-----
> From: greenstone-users-bounces@list.scms.waikato.ac.nz
> [mailto:greenstone-users-bounces@list.scms.waikato.ac.nz] On Behalf Of
> Vladimir R. Risojevic
> Sent: Wednesday, January 14, 2009 20:29
> To: greenstone-users@list.scms.waikato.ac.nz
> Subject: RE: [greenstone-users] Section classifiers
>
> Diego,
>
> Thank you for your post. Unfortunately, I don't have the
> "-headerpage" option set. I can avoid empty pages if I don't use the
> PageGroup, but instead assign metadata with section titles to the
> first pages of sections. But that doesn't feel right because these
> item files don't represent the structure of the document anymore.
>
> Regards,
>
> Vladimir
>
> On Wed, 14 Jan 2009 10:12:29 -0200, Diego Spano wrote
> > Vladimir,
> >
> > I just want to help with one of your problems. You mentioned
> > "the problem of empty introductory pages for each section. Is it
> > possible to avoid them, because they break the continuity of the
> > material".
> >
> > Do you have the option "-headerpage" set in PagedImgPlug? If you have
> > it, remove it and rebuild the collection. This way there will be no
> > more blank pages!
> >
> > Hope this helps.
> >
> > Diego
> >
> > -----Original Message-----
> > From: greenstone-users-bounces@list.scms.waikato.ac.nz
> > [mailto:greenstone-users-bounces@list.scms.waikato.ac.nz] On Behalf Of
> > Vladimir R. Risojevic
> > Sent: Wednesday, January 14, 2009 6:28
> > To: kjdon@cs.waikato.ac.nz
> > CC: greenstone-users@list.scms.waikato.ac.nz
> > Subject: Re: [greenstone-users] Section classifiers
> >
> > Dear Katherine,
> >
> > Thank you very much for your answer. I apologize for not answering
> > your questions earlier. My first question concerned the classifiers.
> > Now I understand that sorting the metadata is the default behavior
> > of classifiers, and I have figured out that SectionList has a -sort
> > nosort option that suppresses the sorting. In this way I am able to
> > produce a classifier which is essentially a table of contents for
> > the chapters in my book. This is satisfactory functionality to start
> > with.
> >
> > I obviously didn't research that thoroughly before my previous post.
> > I still haven't experimented with the Hierarchy classifier but I will
> > try it as soon as possible.
> >
> > However, I still don't understand what a top-level bookshelf is in the
> > case of SectionList -metadata dc.Title -sort nosort. I have only one
> > document in the collection and it seems that this is somehow connected.
> > In the future there will probably be more documents in this
> > collection.
> >
> > At the moment, I am in the process of deciding what would be the best
> > way to organize metadata in order to comply with standards and ensure
> > future interoperability. That is why I decided to use PageGroups to
> > define a logical structure for chapters and sections.
> > This approach, on the other hand, causes the problem of empty
> > introductory pages for each section. Is it possible to avoid them,
> > because they break the continuity of the material?
> >
> > As I said, I am pretty satisfied with a table of contents implemented
> > as a SectionList classifier, but I would like to investigate the
> > possibilities of having both the table of contents and a goto box on a
> > document page. Could you give me some clues about where to start in the
> > source code to achieve this functionality?
> >
> > I hope that I answered your questions, but I also asked some new
> > ones...
> >
> > Thank you very much for your patience.
> >
> > Regards,
> >
> > Vladimir
> >
> > --
> >
> > Hi Vladimir
> >
> > I would just like some clarification on what you are wanting.
> >
> > When you say table of contents, do you mean on the document page
> > itself, or as a classifier?
> >
> > In the standard Greenstone interface (not sure if you have done much
> > modification or not) we have:
> >
> > classifiers: Built using AZList, SectionList, etc. Accessed from the
> > navigation bar. Not shown on a document page (the link is shown but
> > the content is not). Sorting is always done as this is the point of
> > classifiers, to organise the documents into some structure so that
> > they can be found easily.
> >
> > document page: for multi-section documents, we have a choice of
> > navigation structures: table of contents, and goto page box. Currently
> > Greenstone lets you have one or the other.
> >
> > Do you have many documents in your collection, or just one?
> > Do you want a "table of contents" on the document page or as a classifier?
> >
> > If you are interested in doing coding, you may be able to get a table
> > of contents and a goto box on the same document page.
> >
> > The Hierarchy classifier can be used to produce a classifier with a
> > fixed order. You write a structure file, and assign metadata to the
> > documents based on that structure. Then that structure is used in the
> > classifier instead of sorting. This only works at the document level,
> > not at section level.
> >
> > When you are searching, the search results come back either ranked or
> > in build order. This depends a little bit on which indexer you are
> > using. MG/MGPP: If you do a "some" search then the results are ranked,
> > if you do an "all" search then the results are in build order. Lucene:
> > you have an option to sort by rank or by metadata that you have built
> > indexes on.
> >
> > Build order is the order in which documents were processed during the
> > build. This can be changed using the -sortmeta option to import.
> > Unfortunately, whole documents are processed at once, so you can never
> > change the order of sections inside a document.
> >
> > You may want to use Lucene as your indexer. The user can then choose
> > what to sort search results by. If you want a fixed order, then you
> > can modify the macros so that the sort option is not displayed but is
> > hard-coded to a specific field. To do section sorting, you may need to
> > add the -sections_index_document_metadata unless_section_metadata_exists
> > option to buildcol, unless all sections have the metadata you want to
> > sort by (which they may do in your case).
> >
> > I hope you can understand all this.
> > Regards,
> > Katherine
> >
> > Vladimir R. Risojevic wrote:
> > > Dear all,
> > >
> > > I have a PagedImage collection with the following structure:
> > >
> > > <PagedDocument>
> > >   <Metadata name="dc.Title">Book Title</Metadata>
> > >   <PageGroup>
> > >     <Metadata name="dc.Title">Chapter 1</Metadata>
> > >     <Page pagenum="1" imgfile="page1.tif" txtfile="page1.txt" />
> > >     ...
> > >   </PageGroup>
> > >   <PageGroup>
> > >     <Metadata name="dc.Title">Chapter 2</Metadata>
> > >     <Page ... />
> > >     ...
> > >   </PageGroup>
> > >   ...
> > > </PagedDocument>
> > >
> > > 1. I would like to have a table of contents with sections (Chapter 1,
> > > Chapter 2, etc.). To this end I built a paged document and created a
> > > classifier
> > >   SectionList -metadata dc.Title
> > > which produced a list of sections sorted in some strange order (my
> > > titles are in Cyrillic script and I know that Unicode sorting is not
> > > quite right with SectionList), but there is no way to turn off
> > > sorting. I would like section titles to appear in the same order as
> > > in the item file. Moreover, there is a top bookshelf which is always
> > > expanded, labeled "Title", and clicking on it crashes the server. I
> > > tried with Latin metadata: the list is alphabetically sorted and
> > > everything else is the same.
> > >
> > > Then I tried
> > >   AZSectionList -metadata dc.Title
> > > There aren't many sections and an hlist is not produced. Everything
> > > is the same as before except for the top bookshelf, which is missing.
> > >
> > >   AZCompactSectionList -metadata dc.Title -doclevel section
> > > returns nothing for Cyrillic script, and for Latin script is the same
> > > as AZSectionList except that chapters are bookshelves.
> > >
> > > Finally,
> > >   GenericList -metadata dc.Title -classify_sections
> > > sorts Cyrillic metadata alphabetically. I tried to add some additional
> > > metadata and use the -sort_leaf_nodes_using option but it didn't work,
> > > probably because these are not leaf nodes.
> > >
> > > When I build a hierarchical document the order of sections in the
> > > list is the same as with a paged document. However, when I remove
> > > -classify_sections from GenericList then sections are in the same
> > > order as in the item file, which is fine.
> > >
> > > > I can live with a hierarchical document (although I would like to
> > > > have something else, see 3. below) but I would like to know whether
> > > > there is a way to avoid sorting the titles of sections. Well, maybe
> > > > AZ* classifiers have to be sorted, as their name suggests, but what
> > > > about SectionList and GenericList?
> > > > Also, I don't think that I understand the difference between
> > > > AZSectionList and AZCompactSectionList.
> > >
> > > > 2. The documents are OCR'ed so I want to add full-text searching.
> > > > When I build a search index on full text at the section level, the
> > > > search results are a list of pages which is not sorted in any way.
> > > > Contrary to the above, here I would like to sort the list. I tried
> > > > the -sortmeta ex.Title option but that didn't help. Is there a way
> > > > to sort the search results according to the page numbers?
> > >
> > > > 3. For me the holy grail of the organization of this collection is
> > > > to have a paged document with prev/next buttons, a goto box and a
> > > > table of contents (as produced with GenericList above) which is
> > > > always present, similar to hierarchical documents. I've built a
> > > > few collections with Greenstone and I don't see how this is possible
> > > > with standard Greenstone. Please correct me if I'm wrong, or tell me
> > > > whether it would be possible to modify Greenstone to allow for this
> > > > and, if the answer is positive, give me some pointers about where
> > > > to look in the source code, because I would like to try to do it.
> > >
> > > > I apologize for this extremely long post but I would like to get
> > > > some things straight, and to achieve some functionality for the
> > > > collections I'm building.
> > >
> > > Thank you very much in advance.
> > >
> > > Best regards,
> > >
> > > Vladimir Risojevic
> > >
> > >
> > >
> > >
> > >