Dear Greenstone Team,
After many tests in the 2.84 beta on PDF files and their metadata (WITH Latin
accents; remember my earlier mails),
I am now using the final 2.84 release for my PDF files with accents in their metadata.
I have found bugs.
The PDF file I used is attached to this mail.
I tested the standard PDF metadata fields (Author, Keywords, Subject, Title),
with and without pdfbox. These are the results:
ex.Encoding = UTF8
ex.NumPages = 0 (in my opinion: BAD value)
ex.PDF.PageCount = 15 (in my opinion: OK)
ex.Title = accents and data OK
ex.Author = ABSENT
ex.Subject = ABSENT
ex.Keywords = ABSENT
ex.PDF.Title = BAD accent encoding
ex.PDF.Author = BAD accent encoding
ex.PDF.Subject = BAD accent encoding
ex.PDF.Keywords = BAD accent encoding AND BAD DATA: the value is split on spaces (why does the plug-in split it?)
ex.XMP.Title = BAD accent encoding
ex.PDF.Author = ABSENT
ex.PDF.Description = accents and data OK (the Subject field in the PDF)
ex.PDF.Keywords = BAD accent encoding AND BAD DATA: the value is split on spaces (why does the plug-in split it?)
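To illustrate the two symptoms (I do not know the real cause inside the plug-in), here is a small Python sketch. It assumes that the bad accents come from UTF-8 bytes being read as Latin-1, and that the keywords are separated by semicolons. The sample strings and function names are invented for the example; they are not taken from the attached file or from Greenstone.

```python
def misdecode(text: str) -> str:
    """Simulate the assumed fault: UTF-8 bytes read as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def split_on_space(keywords: str) -> list[str]:
    """What the bad result looks like: every space starts a new value."""
    return keywords.split(" ")


def split_on_separator(keywords: str, sep: str = ";") -> list[str]:
    """What I would expect: one value per separator-delimited keyword."""
    return [k.strip() for k in keywords.split(sep) if k.strip()]


if __name__ == "__main__":
    # Hypothetical keyword string, not the content of the attached PDF.
    sample = "sols tropicaux; érosion hydrique"
    print(misdecode("érosion"))
    print(split_on_space(sample))
    print(split_on_separator(sample))
```

With this sample, splitting on spaces gives four fragments, while splitting on the separator keeps the two multi-word keywords intact.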
For us, this is a big problem!
Regards
--
Pier Luigi ROSSI
IRD
32, avenue Henri Varagnat
93140 Bondy
France
Tel : 33 (0)1 48 02 56 96
Fax : 33 (0)1 48 47 30 88
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01947.pdf
Type: application/pdf
Size: 266701 bytes
Desc: not available
Url : https://list.scms.waikato.ac.nz/mailman/private/greenstone-users/attachments/20110413/b8c0dc86/01947-0001.pdf