[greenstone-users] Pdf metadata, accents and Greenstone 2.84

From Pier Luigi ROSSI
Date Thu Apr 14 00:28:22 2011
Subject [greenstone-users] Pdf metadata, accents and Greenstone 2.84
In-Reply-To (20110401083013-7ADDBA9A257-fx404-security-mail-net)
Dear Greenstone Team,

after many tests with PDF files and their metadata (WITH Latin accents)
in the 2.84 beta
(remember my earlier complaints),
I now use the final 2.84 release for my PDF files with accents in the metadata.
I found bugs.

I used one PDF file, attached to this mail.

I tested the standard PDF metadata fields: Author, Keywords, Subject, Title,
with and without pdfbox.

ex.Encoding is UTF8

ex.NumPages = 0 My opinion : BAD value
ex.PDF.PageCount = 15 My opinion : OK

ex.Title = accents and data OK
ex.Author = ABSENT
ex.Subject = ABSENT
ex.Keywords = ABSENT

ex.PDF.Title = BAD accent encoding
ex.PDF.Author = BAD accent encoding
ex.PDF.Subject = BAD accent encoding
ex.PDF.Keywords = BAD accent encoding AND BAD DATA: split on spaces (why
does the plug-in split them?)

ex.XMP.Title = BAD accent encoding
ex.XMP.Author = ABSENT
ex.XMP.Description = accents and data OK (the Subject field in the PDF)
ex.XMP.Keywords = BAD accent encoding AND BAD DATA: split on spaces (why
does the plug-in split them?)
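
The two symptoms reported for the Keywords field can be illustrated with a minimal Python sketch. It is not Greenstone or plug-in code: the sample keyword string and the three function names are invented for the example, and reading UTF-8 bytes as Latin-1 is only an assumed cause of the bad accents, not something confirmed for 2.84.

```python
# Illustration only: this is not Greenstone or plug-in code.
# The sample string is made up; it is not taken from the attached PDF.
import re


def split_on_whitespace(keywords: str) -> list[str]:
    """Split at every run of whitespace (what the reported output looks like)."""
    return keywords.split()


def split_on_separators(keywords: str) -> list[str]:
    """Split only at commas or semicolons, so multi-word keywords stay whole."""
    return [part.strip() for part in re.split(r"[;,]", keywords) if part.strip()]


def misread_utf8_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode those bytes as Latin-1.

    Assumption: this is one common way accented characters get damaged.
    """
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    sample = "géologie marine; océan Indien"
    print(split_on_whitespace(sample))
    print(split_on_separators(sample))
    print(misread_utf8_as_latin1(sample))
```

With this sample, splitting on whitespace breaks "géologie marine" into two separate values, while splitting on the separators keeps each keyword phrase intact.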


For us, this is a big problem!

Regards

--
Pier Luigi ROSSI
IRD
32, avenue Henri Varagnat
93140 Bondy
France

Tel : 33 (0)1 48 02 56 96
Fax : 33 (0)1 48 47 30 88

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01947.pdf
Type: application/pdf
Size: 266701 bytes
Desc: not available
Url : https://list.scms.waikato.ac.nz/mailman/private/greenstone-users/attachments/20110413/b8c0dc86/01947-0001.pdf