[greenstone-users] Re: New bug in List classifier....

From Diego Spano
Date Thu Jun 16 04:43:37 2011
Subject [greenstone-users] Re: New bug in List classifier....
In-Reply-To (4DF7285B-5010100-cs-waikato-ac-nz)
Hi Anu, PDFPlugin with "pdfbox_conversion" now seems to be extracting
sections OK! Thanks!

You also said:

" I will look into the List classifier shortly. One question to do
with it: is this something that was working in List in GS 2.83 and has
now stopped working (is it a bug?), or is the allvalues option simply
absent in List (never been implemented for List)?"

The allvalues option is absent in 2.83 too. The classifier works the same
in both versions.

Regards

Diego

Diego Spano
Prodigio Consultores
Capital Federal - Argentina
Tel: (54 11) 5093-5313
http://ar.linkedin.com/in/diegospano
www.prodigioconsultores.com

On Tue, Jun 14, 2011 at 6:22 AM, Greenstone Team
<greenstone_team@cs.waikato.ac.nz> wrote:
> Hi Diego,
>
>> Hi Anu, how are you?
>
> I'm very good thanks. How are you keeping?
>
> First, I have some good news! Remember how you wanted PDFPlugin's
> "use_sections" to work with PDFBox? I think I've got it to do so now. I
> think I have committed everything related to it. To try it out:
>
> 1. Quit GLI and the Greenstone server if running
>
> 2. Move your old GS284/perllib/plugins/PDFPlugin.pm out of your Greenstone
> installation and replace with the updated version from
> http://trac.greenstone.org/browser/main/trunk/greenstone2/perllib/plugins/PDFPlugin.pm?format=txt
>
> 3. Rename (or move out of the way) your pdfbox extension in your
> Greenstone2.84/ext folder.
>
> 4. Finally, grab the latest version of the Greenstone PDFBox extension by
> visiting http://trac.greenstone.org/browser/gs2-extensions/pdf-box/trunk and
> choosing the zip or tar.gz as best suits you. Decompress the downloaded
> archived file into your Greenstone2.84/ext folder.
>
>
> I will look into the List classifier shortly. One question to do with it: is
> this something that was working in List in GS 2.83 and has now stopped
> working (is it a bug?), or is the allvalues option simply absent in List
> (never been implemented for List)?
>
> Tell me if the PDFPlugin's use_sections option now works with PDFBox when
> you get the chance to try it out.
> I'll keep you informed about any developments regarding List.pm (hopefully
> I'll be able to work out how to solve it).
> See you,
> Anu
>
> Diego Spano wrote:
>>
>> Hi Anu, how are you?
>>
>> I noticed one thing with the List classifier. You can specify more than
>> one metadata field to use, separated with ";", but the classifier
>> will only use the first one!
>>
>> In the hierarchy classifier and AZList you have the "allvalues" option
>> that enables you to use all metadata values found, but List seems to
>> be different. This is not the right way to work!
>>
>> Could you take a look?
>>
>> Cheers
>>
>> Diego
>>
>> On Wed, May 11, 2011 at 2:25 AM, <ak19@cs.waikato.ac.nz> wrote:
>>
>>>
>>> Sorry for the multiple e-mails Diego,
>>>
>>> I just noticed that my comments to the changes I've just committed to
>>> List.pm in order to fix the bug you described in your e-mail (see your
>>> e-mail below) are not very clear on how the changes tried to solve the
>>> problem.
>>>
>>> In case you are interested:
>>> The code was gathering all the documents under dc.Creator and then it was
>>> gathering all those under a subclassifier for that dc.Creator. In your
>>> example, this means it was now gathering all those documents for which
>>> dc.Type was specified for that creator. However, the code was throwing
>>> away all those documents under that dc.Creator for which no dc.Type was
>>> specified. This is all just as you predicted was happening.
>>>
>>> The solution was to simply preserve which documents had no
>>> subclassification metadata (like dc.Type in the example) specified for
>>> the higher level metadata (e.g. dc.Creator) and then put those documents
>>> under the dc.Creator, even when there is no dc.Type for them.
>>>
>>> It just took a while to work out at what point the docs were being lost
>>> and so at which point I had to make sure they were preserved and added as
>>> children to the higher level classification metadata (like dc.Creator in
>>> your situation).
>>>
>>>
>>> One question. Assuming the committed changes fixed the dc.Creator/dc.Type
>>> problem, is the unicode issue that you found, where non-English characters
>>> in List bookshelf names were no longer displaying correctly, the sole
>>> outstanding issue to do with List or is there more?
>>>
>>> Will write back when I have looked further into that.
>>> Regards,
>>> Anu
>>>
>>>
>>>
>>>>
>>>> Anu, this is the first email I sent to greenstone team.....
>>>> Cheers
>>>> Diego
>>>>
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Diego Spano <diegospano@gmail.com>
>>>> Date: Tue, Apr 26, 2011 at 5:01 PM
>>>> Subject: Bug in List classifier
>>>> To: greenstone_team@cs.waikato.ac.nz
>>>>
>>>>
>>>> Hi you all..
>>>>
>>>> I defined a classifier like this:
>>>>
>>>> classify List -metadata dc.Creator/dc.Type -bookshelf_type always
>>>> -partition_type_within_level none
>>>>
>>>> This should create a bookshelf for every Creator (even when it has
>>>> only one item) and for those documents that have a Type value, it
>>>> should create a hierarchy like
>>>>
>>>> Creator --> Type --> Document.
>>>>
>>>> The problem is that it only creates bookshelves for those items that have
>>>> both metadata values: Creator and Type. See the attached pdf. There you can
>>>> see that "Baccarin Ricardo" should be a bookshelf with documents
>>>> inside (he has only one document but I specified "-bookshelf_type
>>>> always"). On the other hand, "BOLSA DE CEREALES DE BUENOS AIRES.
>>>> DEPARTAMENTO DE ESTIMACIONES Y PROYECCIONES AGRCOLAS" is the creator
>>>> of a document that also has dc.Type metadata. In this case, it works ok.
>>>>
>>>> Note that, as in the email I sent last week, the List classifier seems to
>>>> handle utf8 values incorrectly. Where you read "PROYECCIONES AGRCOLAS" it
>>>> should say "PROYECCIONES AGRÍCOLAS".
>>>>
>>>> Is it a bug or am I doing something wrong?
>>>>
>>>> I tested it with 2.83 and 2.84...
>>>>
>>>>
>>>> Regards!!!
>>>>
>>>> Diego
>>>>
>>>>
>>>
>>>
>
>
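
The List.pm fix described in the thread above (keeping documents that have a
dc.Creator but no dc.Type, and placing them directly under the creator's
bookshelf) can be illustrated with a minimal Python sketch. This is not the
Perl code in List.pm. The function name build_bookshelves, the dictionary
layout, the document ids, and the sample dc.Type value "Report" are all
assumptions made for illustration only.

```python
def build_bookshelves(docs, top="dc.Creator", sub="dc.Type"):
    """Group documents as top-level value -> sub-level value -> document ids.

    Documents that have the top-level metadata but lack the sub-level
    metadata are kept directly under the top-level bookshelf instead of
    being dropped (the behaviour the fix in the thread aims for).
    """
    shelves = {}
    for doc in docs:
        top_value = doc.get(top)
        if top_value is None:
            continue  # no top-level metadata: not classified in this sketch
        shelf = shelves.setdefault(top_value, {"children": {}, "docs": []})
        sub_value = doc.get(sub)
        if sub_value is None:
            # Preserve the document under the creator even without a type.
            shelf["docs"].append(doc["id"])
        else:
            shelf["children"].setdefault(sub_value, []).append(doc["id"])
    return shelves


# Hypothetical sample data loosely modelled on the example in the thread.
docs = [
    {"id": "doc1", "dc.Creator": "Baccarin Ricardo"},
    {"id": "doc2", "dc.Creator": "BOLSA DE CEREALES", "dc.Type": "Report"},
]
shelves = build_bookshelves(docs)
```

With this sample, "Baccarin Ricardo" still gets a bookshelf holding doc1,
which corresponds to the "-bookshelf_type always" expectation in the original
report, while doc2 sits under its creator's "Report" sub-shelf.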