[greenstone-users] Load fulltext from separate file

From Katherine Don
Date Wed Mar 25 11:42:09 2009
Subject [greenstone-users] Load fulltext from separate file
In-Reply-To (49B16F73-2010902-touro-edu)

No, not exactly. But whether you can make it work depends on what you
want to achieve.
Do you want the text extracted from the PDF as well as the text from a
separate file? If not, you could process the text file as the primary
document, then associate the PDF with it. You can then choose to display
either the PDF or the text to the user, but the PDF won't be processed
in any way, i.e. no metadata or text extraction. This can be done
without coding.

If you want the text from both files (PDF and text) to be the content of
a document, then you would need to modify a plugin, which means writing
Perl code.

Alternatively, you could save the text as metadata instead, e.g. have a
"fulltext" metadata element. This would get associated with the PDF
document, and will be available for search/display depending on how you
set up indexes and format statements.
For this, instead of plain text files, you would need to supply the text
in our metadata.xml format.
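
As a rough illustration of the metadata approach, here is a minimal
Python sketch that builds a metadata.xml from existing text files. It is
not part of Greenstone. It assumes each PDF has a text file with the same
base name in the same folder (report.pdf and report.txt), that the text
is UTF-8, and that the metadata element is called "fulltext" as in the
example above. The function names are made up for illustration. The file
name is escaped because Greenstone treats FileName as a regular
expression.

```python
import re
from pathlib import Path
from xml.sax.saxutils import escape

# Standard header of a Greenstone metadata.xml file.
HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE DirectoryMetadata SYSTEM '
    '"http://greenstone.org/dtd/DirectoryMetadata/1.0/DirectoryMetadata.dtd">\n'
    '<DirectoryMetadata>\n'
)


def build_metadata_xml(pairs, element="fulltext"):
    """Return metadata.xml content for (pdf_filename, text) pairs.

    'element' is the metadata name to store the text under; "fulltext"
    is only the example name used in this thread.
    """
    parts = [HEADER]
    for pdf_name, text in pairs:
        # FileName is matched as a regular expression, so escape the dot etc.
        file_re = escape(re.escape(pdf_name))
        parts.append("  <FileSet>\n")
        parts.append(f"    <FileName>{file_re}</FileName>\n")
        parts.append("    <Description>\n")
        # escape() protects &, < and > in the extracted text.
        parts.append(
            f'      <Metadata name="{element}">{escape(text)}</Metadata>\n'
        )
        parts.append("    </Description>\n")
        parts.append("  </FileSet>\n")
    parts.append("</DirectoryMetadata>\n")
    return "".join(parts)


def write_metadata_xml(directory):
    """Pair each *.pdf with a same-named *.txt and write metadata.xml."""
    directory = Path(directory)
    pairs = []
    for pdf in sorted(directory.glob("*.pdf")):
        txt = pdf.with_suffix(".txt")  # assumed naming convention
        if txt.is_file():
            pairs.append((pdf.name, txt.read_text(encoding="utf-8")))
    out = directory / "metadata.xml"
    out.write_text(build_metadata_xml(pairs), encoding="utf-8")
    return out
```

You would run write_metadata_xml() on the folder holding the PDFs before
importing. The text files would probably then need to be moved out of the
import folder, otherwise they are likely to be picked up as separate
documents.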

I can help with this if you like. If you are not sure whether these
approaches will achieve what you want, then please describe what you are
trying to achieve in more detail and I might be able to suggest which
one is best.


Yitzchak Schaffer wrote:
> Greetings:
> We have a collection of PDFs we'd like to ingest. The PDFs do have
> text embedded, but we would like to specify other files to load from.
> CONTENTdm includes the option of doing this by specifying the filename
> for fulltext in the metadata CSV file. Does Greenstone have this
> functionality available out of the box?
> Many thanks,