[greenstone-users] Section classifiers

From Katherine Don
Date Mon Mar 23 14:27:36 2009
Subject [greenstone-users] Section classifiers
In-Reply-To (20090116180601-M70698-etfbl-net)
Hi Vladimir

Books from the past is obviously quite customized. A quick look (at only
two books, I could be wrong) suggests that the books are linear lists of
sections, and that is why toc and goto page both work.

I think that the toc is output by vlistbrowserclass and the goto box by
pagedbrowserclass. Which class is used depends on what type of document
it is.

I think a goto page feature might be quite difficult for a hierarchical
doc. If the user types page 4, for example, then it needs to be
translated to the correct section id, e.g. 1.4 or 2.1, 2.2 etc. How do
you tell which section it should be? You would need to write code for that.
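
As a rough sketch of what such code might do (this is not Greenstone code):
walk the hierarchy in document order, count each leaf section as one page,
and record its dotted section id. The nested-list input and the function
name are made up for illustration, and it assumes every leaf section holds
exactly one page.

```python
from typing import Dict, List, Union

# Hypothetical representation of a document hierarchy: each entry is either
# the string "page" (a leaf section holding one page) or a nested list
# (an internal section such as a chapter).
Node = Union[str, List["Node"]]


def page_to_section_ids(children: List[Node]) -> Dict[int, str]:
    """Map 1-based page numbers to dotted section ids in document order."""
    mapping: Dict[int, str] = {}

    def walk(nodes: List[Node], prefix: str) -> None:
        for position, node in enumerate(nodes, start=1):
            section_id = f"{prefix}.{position}" if prefix else str(position)
            if isinstance(node, list):
                # Internal node: descend, it has no page of its own.
                walk(node, section_id)
            else:
                # Leaf node: it becomes the next page number.
                mapping[len(mapping) + 1] = section_id

    walk(children, "")
    return mapping


# Two chapters: the first with four pages, the second with two.
book = [["page", "page", "page", "page"], ["page", "page"]]
lookup = page_to_section_ids(book)
print(lookup[4])  # section id to use for "goto page 4"
print(lookup[5])  # first page of the second chapter
```

With this layout page 4 maps to section 1.4 and page 5 to section 2.1.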

Do you want the structure of the doc to be visible to the user? Even if
the item file has the structure in it, you could modify the plugin to save
the doc as a linear list of sections, and then the goto box will work. But
the user won't see the nice hierarchy.
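
A similar effect might be had without touching the plugin, by flattening
the item file before import. The sketch below is only an illustration under
that assumption: the sample item file follows the PageGroup structure quoted
further down in this thread, the extra pages and the function name are
invented, and the chapter titles are simply dropped.

```python
import xml.etree.ElementTree as ET

# Sample item file; pages 2 and 3 are invented for the example.
ITEM_XML = """<PagedDocument>
  <Metadata name="dc.Title">Book Title</Metadata>
  <PageGroup>
    <Metadata name="dc.Title">Chapter 1</Metadata>
    <Page pagenum="1" imgfile="page1.tif" txtfile="page1.txt" />
    <Page pagenum="2" imgfile="page2.tif" txtfile="page2.txt" />
  </PageGroup>
  <PageGroup>
    <Metadata name="dc.Title">Chapter 2</Metadata>
    <Page pagenum="3" imgfile="page3.tif" txtfile="page3.txt" />
  </PageGroup>
</PagedDocument>"""


def flatten(item_xml: str) -> ET.Element:
    """Return a new root with every Page placed directly under it."""
    source = ET.fromstring(item_xml)
    flat = ET.Element(source.tag, source.attrib)
    # Keep document-level metadata (direct children of the root only).
    for meta in source.findall("Metadata"):
        flat.append(meta)
    # iter() visits elements in document order, so page order is preserved.
    for page in source.iter("Page"):
        flat.append(ET.Element("Page", page.attrib))
    return flat


flat_doc = flatten(ITEM_XML)
print(ET.tostring(flat_doc, encoding="unicode"))
```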

For hiding empty sections:
OID.fc can be used to get the first child of a doc/section. How about
modifying the format statement (DocumentVList) so that the link to the
document goes to the first child if it's an internal node?

Instead of [link][icon][/link] in DocumentVList, try eg
{If}{[PageNum],[link],<a href='/cgi-bin/library?e=d-00000-00---off-0gsarch--00-0----0-10-0---0---0direct-10---4-----dfr--0-1l--11-en-50---20-about-Alexander+Dobreff--00-0-21-00-0-0-11----0-0-&a=d&d=[DocOID].fc'>}[icon][/link]
This mucks up the link for the top icon (closing the document and going
back to the classifier). So a better way would be to have a metadata
element in the item file for the nodes you want to hide ([Invisible] for
example), and use {If}{[Invisible],...fc,[link]} instead.
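
Adding that metadata by hand could get tedious for a large book. As a
sketch only, a small script could add it to every PageGroup in the item
file. The value "1" and the function name are assumptions for illustration,
and whether the plugin then exposes the element as [Invisible] would need
checking.

```python
import xml.etree.ElementTree as ET

# Sample item file following the PageGroup structure quoted in this thread.
SAMPLE_ITEM = """<PagedDocument>
  <Metadata name="dc.Title">Book Title</Metadata>
  <PageGroup>
    <Metadata name="dc.Title">Chapter 1</Metadata>
    <Page pagenum="1" imgfile="page1.tif" txtfile="page1.txt" />
  </PageGroup>
  <PageGroup>
    <Metadata name="dc.Title">Chapter 2</Metadata>
    <Page pagenum="2" imgfile="page2.tif" txtfile="page2.txt" />
  </PageGroup>
</PagedDocument>"""


def mark_invisible(item_xml: str) -> str:
    """Add an Invisible metadata element to every PageGroup that lacks one."""
    root = ET.fromstring(item_xml)
    for group in root.iter("PageGroup"):
        already_marked = any(
            meta.get("name") == "Invisible" for meta in group.findall("Metadata")
        )
        if not already_marked:
            flag = ET.Element("Metadata", {"name": "Invisible"})
            flag.text = "1"  # assumed value; any non-empty text would do here
            group.insert(0, flag)
    return ET.tostring(root, encoding="unicode")


marked_xml = mark_invisible(SAMPLE_ITEM)
print(marked_xml)
```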

Hope this helps a little bit,

Regards,
Katherine

Vladimir R. Risojevic wrote:
> Diego,
>
> I'm glad that you liked the site. Thank you.
>
> I also think that Books from the Past is not native Greenstone, but I like it
> very much. If someone from the Greenstone team could give me some ideas where
> to start I could try to do some modifications to the source code to enable
> this feature as well as showing both the table of contents and the goto box,
> which I already mentioned in my first post.
>
> Thank you very much for your comments.
>
> Regards,
>
> Vladimir
>
>
> On Fri, 16 Jan 2009 13:33:45 -0200, Diego Spano wrote
>
>> Hi Vladimir,
>>
>> Very nice site!!!.
>>
>> I think that bookfromthepast is not Greenstone in its native
>> version. I think that people who do the job modified the source code
>> to get that behaviour. The case is that when you enter a section,
>> you get a list of pages, so GS will not show anything until you
>> select one of those pages. The key point I think is to give the
>> instruction to show the first page of the section automatically!.
>> But I don't know how....
>>
>> Diego
>>
>> -----Original Message-----
>> From: Vladimir R. Risojevic [mailto:vlado@etfbl.net]
>> Sent: Friday, January 16, 2009 13:05
>> To: dspano@orsna.gov.ar
>> CC: greenstone-users@list.scms.waikato.ac.nz
>> Subject: Re: [greenstone-users] Section classifiers
>>
>> Diego,
>>
>> Thanks for the tip, I haven't thought about it. I'll try to use it
>> to detect empty pages and skip them. At the moment, I have something
>> similar to your collection, see screenshots, or
>> http://www.nubrs.rs.ba/dl/ and I'm pretty satisfied with that. What
>> I would like to do is to skip pages like the one on screen2.jpg,
>> which is the header page of a section I presume. But I'm not sure
>> how to do it. Something like that is done on
>> http://www.booksfromthepast.org, which is also based on Greenstone
>> as stated on http://www.booksfromthepast.org/Aboutus.asp?l=en,
>> although they obviously modified Greenstone. Any ideas how that
>> could be done?
>>
>> Regards,
>>
>> Vladimir
>>
>> ----- Original Message -----
>> From: "Diego Spano" <dspano@orsna.gov.ar>
>> To: "'Vladimir R. Risojevic'" <vlado@etfbl.net>;
>> <greenstone-users@list.scms.waikato.ac.nz>
>> Sent: Thursday, January 15, 2009 1:40 PM
>> Subject: RE: [greenstone-users] Section classifiers
>>
>> Vladimir,
>>
>> There is another way. If you take a look at the doc.xml file, you can
>> see that empty pages and "normal" pages differ in one thing: the
>> empty ones don't have [Image] metadata. So you can manage that
>> problem with a format statement like this:
>>
>> {If}{[Image],html_code_to_show_the_image}
>>
>> I have a collection built with pagedimgplug too and Pagegroups. I'm sending
>> 3 screenshots just to show you how I get the section list and the
>> image. I'm displaying tiff images with an embed plugin.
>>
>> Regards.
>>
>> Diego
>>
>> -----Original Message-----
>> From: greenstone-users-bounces@list.scms.waikato.ac.nz
>> [mailto:greenstone-users-bounces@list.scms.waikato.ac.nz] On Behalf Of
>> Vladimir R. Risojevic
>> Sent: Wednesday, January 14, 2009 20:29
>> To: greenstone-users@list.scms.waikato.ac.nz
>> Subject: RE: [greenstone-users] Section classifiers
>>
>> Diego,
>>
>> Thank you for your post. Unfortunately, I don't have the
>> "-headerpage" option set. I can avoid empty pages if I don't use the
>> PageGroup, but instead assign metadata with section titles to the
>> first pages of sections. But that doesn't feel right because these
>> item files don't represent the structure of the document anymore.
>>
>> Regards,
>>
>> Vladimir
>>
>> On Wed, 14 Jan 2009 10:12:29 -0200, Diego Spano wrote
>>
>>> Vladimir,
>>>
>>> Just want to give some help to one of your problems. You said that
>>> "the problem of empty introductory pages for each section. Is it
>>> possible to avoid them, because they break the continuity of the
>>> material".
>>>
>>> Do you have the option "-headerpage" set in PagedImgPlug? If you have
>>> it, remove it and rebuild the collection. This way there will be no
>>> more blank pages!
>>>
>>> Hope this helps..
>>>
>>> Diego
>>>
>>> -----Original Message-----
>>> From: greenstone-users-bounces@list.scms.waikato.ac.nz
>>> [mailto:greenstone-users-bounces@list.scms.waikato.ac.nz] On Behalf Of
>>> Vladimir R. Risojevic
>>> Sent: Wednesday, January 14, 2009 6:28
>>> To: kjdon@cs.waikato.ac.nz
>>> CC: greenstone-users@list.scms.waikato.ac.nz
>>> Subject: Re: [greenstone-users] Section classifiers
>>>
>>> Dear Katherine,
>>>
>>> Thank you very much for your answer. I apologize for not answering
>>> your questions earlier. My first question was concerned with the
>>> classifiers.
>>>
>>> Now, I understand that sorting of the metadata is the default behavior
>>> of classifiers, and I have figured out that SectionList has -sort
>>> nosort option that suppresses the sorting. In this way I am able to
>>> produce a classifier which is essentially a table of contents for
>>> chapters in my book. This is a satisfactory functionality to start
>>> with.
>>>
>>> I obviously didn't research that thoroughly before my previous post.
>>> I still haven't experimented with the Hierarchy classifier but I will
>>> try it as soon as possible.
>>>
>>> However, I still don't understand what a top-level bookshelf is in the
>>> case of SectionList -metadata dc.Title -sort nosort. I have only one
>>> document in the collection and it seems that this is somehow connected.
>>> In the future there will probably be more documents in this
>>> collection.
>>>
>>> At the moment, I am in the process of deciding what would be the best
>>> way to organize metadata in order to comply with standards and ensure
>>> future interoperability. That is why I decided to use PageGroups to
>>> define a logical structure for chapters and sections.
>>> This approach, on the other hand, causes the problem of empty
>>> introductory pages for each section. Is it possible to avoid them,
>>> because they break the continuity of the material?
>>>
>>> As I said I am pretty satisfied with a table of contents implemented
>>> as a SectionList classifier but I would like to investigate the
>>> possibilities of having both the table of contents and a goto box on a
>>> document page. Could you give me some clues where to start with the
>>> source code to achieve this functionality?
>>>
>>> I hope that I answered your questions, but I also asked some new
>>> ones...
>>>
>>> Thank you very much for your patience.
>>>
>>> Regards,
>>>
>>> Vladimir
>>>
>>> --
>>>
>>> Hi Vladimir
>>>
>>> I would just like some clarification on what you are wanting.
>>>
>>> When you say table of contents, do you mean on the document page
>>> itself, or as a classifier?
>>>
>>> In the standard greenstone interface (not sure if you have done much
>>> modification or not) we have:
>>>
>>> classifiers: Built using AZList, Section list etc. Accessed from the
>>> navigation bar. Not shown on a document page (the link is shown but
>>> the content is not). Sorting is always done as this is the point of
>>> classifiers, to organise the documents into some structure so that
>>> they can be found easily.
>>>
>>> document page: for multi section documents, we have a choice of
>>> navigation structures: table of contents, and goto page box. Currently
>>> greenstone lets you have one or the other.
>>>
>>> Do you have many documents in your collection, or just one?
>>> Do you want a "table of contents" on the document page or as a classifier?
>>>
>>> If you are interested in doing coding, you may be able to get a table
>>> of contents and a goto box on the same document page.
>>>
>>> The Hierarchy classifier can be used to produce a classifier with a
>>> fixed order. You write a structure file, and assign metadata to the
>>> documents based on that structure. Then that structure is used in the
>>> classifier instead of sorting. This only works at the document level,
>>> not at section level.
>>>
>>> When you are searching, the search results come back either ranked or
>>> in build order. This depends a little bit on which indexer you are
>>> using. MG/MGPP: If you do a "some" search then the results are ranked,
>>> if you do an "all" search then the results are in build order. Lucene:
>>> you have an option to sort by rank or by metadata that you have built
>>> indexes on.
>>>
>>> Build order is the order that documents were processed during build.
>>> This can be changed using -sortmeta option to import. Unfortunately,
>>> whole documents are processed at once, so you can never change the
>>> order of sections inside a document.
>>>
>>> You may want to use Lucene as your indexer. The user can then choose
>>> what to sort search results by. If you want a fixed order, then you
>>> can modify macros so that the sort option is not displayed, but is
>>> hard coded to a specific field. To do section sorting, you may need to
>>> add -sections_index_document_metadata unless_section_metadata_exists
>>> option to buildcol, unless all sections have the metadata you want to
>>> sort by (which they may do in your case).
>>>
>>> I hope you can understand all this.
>>> Regards,
>>> Katherine
>>>
>>> Vladimir R. Risojevic wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have a PagedImage collection with the following structure:
>>>>
>>>> <PagedDocument>
>>>>   <Metadata name="dc.Title">Book Title</Metadata>
>>>>   <PageGroup>
>>>>     <Metadata name="dc.Title">Chapter 1</Metadata>
>>>>     <Page pagenum="1" imgfile="page1.tif" txtfile="page1.txt" />
>>>>     ...
>>>>   </PageGroup>
>>>>   <PageGroup>
>>>>     <Metadata name="dc.Title">Chapter 2</Metadata>
>>>>     <Page ... />
>>>>     ...
>>>>   </PageGroup>
>>>>   ...
>>>> </PagedDocument>
>>>>
>>>> I would like to have a table of contents with sections (Chapter 1,
>>>> Chapter 2, etc.). To this end I built a paged document and created a
>>>> classifier SectionList -metadata dc.Title which produced a list of
>>>> sections sorted in some strange order (my titles are in Cyrillic
>>>> script and I know that Unicode sorting is not quite right with
>>>> SectionList), but there is no way to turn off sorting - I would like
>>>> section titles to appear in the same order as in the item file.
>>>> Moreover, there is a top bookshelf which is always expanded, labeled
>>>> "Title", and clicking on it crashes the server. I tried with Latin
>>>> metadata and the list is alphabetically sorted and everything else
>>>> is the same.
>>>>
>>>> Then I tried
>>>> AZSectionList -metadata dc.Title
>>>> There aren't many sections and a hlist is not produced. Everything
>>>> is the same as before except for the top bookshelf which is missing.
>>>> AZCompactSectionList -metadata dc.Title -doclevel section returns
>>>> nothing for Cyrillic script and for Latin script is the same as
>>>> AZSectionList except that chapters are bookshelves.
>>>> Finally,
>>>> GenericList -metadata dc.Title -classify_sections sorts Cyrillic
>>>> metadata alphabetically. I tried to add some additional metadata and
>>>> use -sort_leaf_nodes_using option but it didn't work, probably
>>>> because these are not leaf nodes.
>>>>
>>>> When I build a hierarchical document the order of sections in the
>>>> list is the same as with a paged document. However, when I remove
>>>> -classify_sections from GenericList then sections are in the same
>>>> order as in the item file, which is fine.
>>>>
>>>> I can live with a hierarchical document (although I would like to
>>>> have something else, see 3. below) but I would like to know whether
>>>> there is a way to avoid sorting the titles of sections. Well, maybe AZ*
>>>> classifiers have to be sorted, which is suggested by their name, but
>>>> what about SectionList and GenericList?
>>>>
>>>> Also, I don't think that I understand the difference between
>>>> AZSectionList and AZCompactSectionList.
>>>>
>>>> 2. The documents are OCR'ed so I want to add the full text searching.
>>>> When I build a search index on full text at the section level in the
>>>> search results I get a list of pages which is not sorted in any way.
>>>> Contrary to the above here I would like to sort the list. I tried
>>>> the -sortmeta ex.Title option but that didn't help. Is there a way
>>>> to sort the search results according to the page numbers?
>>>>
>>>> 3. For me the holy grail of the organization of this collection is
>>>> to have a paged document with prev/next buttons, a goto box and a
>>>> table of contents (as produced with GenericList above) which is
>>>> always present, similar to hierarchical documents. I've built a
>>>> few collections with Greenstone and I don't see how this is possible
>>>> with standard Greenstone. Please correct me if I'm wrong, or give me
>>>> some suggestions as to whether it would be possible to modify Greenstone
>>>> to allow for this, and if the answer is positive give me some pointers
>>>> where to look in the source code because I would like to try to do it.
>>>>
>>>> I apologize for this extremely long post but I would like to get
>>>> some things straight, and to achieve some functionality for the
>>>> collections I'm building.
>>>>
>>>> Thank you very much in advance.
>>>>
>>>> Best regards,
>>>>
>>>> Vladimir Risojevic