[greenstone-users] Section classifiers

From Vladimir R. Risojevic
Date Sun Jan 4 10:20:11 2009
Subject [greenstone-users] Section classifiers
Dear all,

1. I have a PagedImage collection with the following structure:

<PagedDocument>
<Metadata name="dc.Title">Book Title</Metadata>
<PageGroup>
<Metadata name="dc.Title">Chapter 1</Metadata>
<Page pagenum="1" imgfile="page1.tif" txtfile="page1.txt" />
...
</PageGroup>
<PageGroup>
<Metadata name="dc.Title">Chapter 2</Metadata>
<Page ... />
...
</PageGroup>
...
</PagedDocument>

I would like to have a table of contents with sections (Chapter 1, Chapter 2,
etc.). To this end I built a paged document and created a classifier
SectionList -metadata dc.Title
which produced a list of sections sorted in an unexpected order. My titles are
in Cyrillic script, and I know that Unicode sorting is not quite right with
SectionList, but there seems to be no way to turn off sorting, and I would like
the section titles to appear in the same order as in the item file. Moreover,
there is a top bookshelf, labeled "Title", which is always expanded, and
clicking on it crashes the server. With Latin metadata the list is sorted
alphabetically and everything else is the same.
Then I tried
AZSectionList -metadata dc.Title
There aren't many sections, so an hlist is not produced. Everything is the same
as before, except that the top bookshelf is missing.
AZCompactSectionList -metadata dc.Title -doclevel section
returns nothing for Cyrillic script; for Latin script the result is the same as
with AZSectionList, except that the chapters are bookshelves.
Finally,
GenericList -metadata dc.Title -classify_sections
sorts Cyrillic metadata alphabetically. I tried adding some additional
metadata and using the -sort_leaf_nodes_using option, but it didn't work,
probably because these are not leaf nodes.

When I build a hierarchical document the order of sections in the list is the
same as with a paged document. However, when I remove -classify_sections from
GenericList then sections are in the same order as in the item file, which is
fine.

I can live with a hierarchical document (although I would like to have
something else, see 3. below), but I would like to know whether there is a way
to avoid sorting the section titles. Maybe the AZ* classifiers have to sort,
as their names suggest, but what about SectionList and GenericList?
Also, I don't think I understand the difference between AZSectionList and
AZCompactSectionList.
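
To make the desired ordering concrete, here is a minimal sketch in plain
Python. It is not Greenstone code and says nothing about how the classifiers
work internally; it only contrasts item-file order with a sorted order. The
sample titles and the helper name section_titles are made up for illustration;
the element and attribute names are the ones from the item file above.

```python
import xml.etree.ElementTree as ET

# Hypothetical item file with Cyrillic section titles (sample data only).
ITEM_XML = """<PagedDocument>
  <Metadata name="dc.Title">Book Title</Metadata>
  <PageGroup>
    <Metadata name="dc.Title">Увод</Metadata>
    <Page pagenum="1" imgfile="page1.tif" txtfile="page1.txt" />
  </PageGroup>
  <PageGroup>
    <Metadata name="dc.Title">Глава 1</Metadata>
    <Page pagenum="2" imgfile="page2.tif" txtfile="page2.txt" />
  </PageGroup>
  <PageGroup>
    <Metadata name="dc.Title">Глава 2</Metadata>
    <Page pagenum="3" imgfile="page3.tif" txtfile="page3.txt" />
  </PageGroup>
  <PageGroup>
    <Metadata name="dc.Title">Додатак</Metadata>
    <Page pagenum="4" imgfile="page4.tif" txtfile="page4.txt" />
  </PageGroup>
</PagedDocument>"""


def section_titles(item_xml):
    """Return the dc.Title of every PageGroup, in item-file order."""
    root = ET.fromstring(item_xml)
    titles = []
    # iter() walks the tree in document order, so no sorting takes place.
    for group in root.iter("PageGroup"):
        meta = group.find("Metadata[@name='dc.Title']")
        if meta is not None and meta.text:
            titles.append(meta.text.strip())
    return titles


if __name__ == "__main__":
    in_file_order = section_titles(ITEM_XML)
    print("wanted (item-file order):", in_file_order)
    # A plain code-point sort moves the introduction to the end.
    print("sorted by code point:    ", sorted(in_file_order))
```

The first list is the order I would like the table of contents to have; the
second shows how any sort, even a correct one, rearranges the sections.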

2. The documents are OCR'ed, so I want to add full-text searching. When I
build a search index on the full text at the section level, the search results
are a list of pages which is not sorted in any way. Contrary to the above, here
I would like the list to be sorted. I tried the -sortmeta ex.Title option, but
that didn't help. Is there a way to sort the search results by page number?
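
The ordering I have in mind is numeric, not alphabetical. A minimal sketch in
plain Python, assuming each hit carries its page number as a string, as the
pagenum attribute in the item file does. The hit structure and the name
sort_hits_by_page are hypothetical and do not reflect how Greenstone
represents search results.

```python
# Hypothetical search hits; "pagenum" is a string, as in the item file.
HITS = [
    {"title": "Page 10", "pagenum": "10"},
    {"title": "Page 33", "pagenum": "33"},
    {"title": "Page 2", "pagenum": "2"},
]


def sort_hits_by_page(hits):
    """Return a new list of hits ordered by numeric page number."""
    # Converting to int avoids the string ordering "10" < "2" < "33".
    return sorted(hits, key=lambda hit: int(hit["pagenum"]))


if __name__ == "__main__":
    for hit in sort_hits_by_page(HITS):
        print(hit["pagenum"], hit["title"])
```

Sorting on the raw strings would put page 10 before page 2, which is why the
key converts the page number to an integer first.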

3. For me the holy grail for organizing this collection is a paged document
with prev/next buttons, a goto box, and a table of contents (as produced with
GenericList above) which is always present, similar to the one in
hierarchical documents. I've built a few collections with Greenstone and I
don't see how this is possible with standard Greenstone. Please correct me if
I'm wrong, or tell me whether it would be possible to modify Greenstone to
allow for this. If the answer is yes, please give me some pointers on where to
look in the source code, because I would like to try to do it.

I apologize for this extremely long post, but I would like to get some things
straight and to achieve some functionality for the collections I'm building.

Thank you very much in advance.

Best regards,

Vladimir Risojevic