[greenstone-users] RE: BibTex and PDF

From: Yao, Jixian (Research)
Date: Wed, 3 Dec 2003 17:58:12 -0500
Subject: [greenstone-users] RE: BibTex and PDF
Katherine,
 
Thanks for your suggestions. I like idea "A" better, simply because the pdfs can be made searchable that way, and the bibtex records replace the text converted from the pdf, which saves disk space. That is better than I expected.
 
One thing is not clear, though: suppose I convert my bibtex records and annotations to metadata files. How do I relate the .xml to the pdf when importing and building? Thanks.
 
Jason
 
-----Original Message-----
From: Katherine Don [mailto:kjdon@cs.waikato.ac.nz]
Sent: Wednesday, December 03, 2003 4:59 PM
To: Yao, Jixian (Research); rfergu@music.mcgill.ca
Cc: 'greenstone-users@list.scms.waikato.ac.nz'
Subject: Re: BibTex and PDF

Hi Jason and Robert

You are both trying to do similar things, so I'll answer you together.

I am not sure if anyone has done something like this before - judging from the lack of response to Robert's email, I guess not.

Greenstone is not good at associating documents together, so you won't be able to do what you want without a bit of extra work.
There are several solutions depending on what you want to achieve.

1. Treat the bibtex entry, the pdf file and the annotation all as separate documents. They will all be searchable, and you may end up with eg a bibtex entry and its pdf file both in the results list. If you add metadata to each item (by editing the bibtex file and writing metadata.xml file for the pdf) you can then link from one to the other. Eg add a bibtexlink as metadata to the pdf, so when you display the pdf you can have a link to the bibtex entry, with href='[bibtexlink]'.
You will need to create this metadata, and create your own format statements to use it.

2. Combine all the information about one file into a single greenstone document. This way there is only one item in a search list or browse list per (pdf/bibtex/annotation) combination. What you need to end up with at the end of importing is a greenstone archive document with the text of the pdf as the content, and other bits as metadata, such as the link to the original pdf if you want that available, all the bibtex fields etc.
What metadata you will need depends on what you want to be able to search on and display. If you only want to display the entire bibtex record, and do no searching on the fields, you could add the record as a single metadata element. If you added each field as a separate metadata element, then you can do searching/browsing by any of the fields.

There are two ways I can think of to achieve this.
A. Convert the bibtex records (and annotations) into metadata.xml files, relating the appropriate data to each pdf document.
Then use only the pdfs and the metadata files in the collection.
B. Write a plugin that somehow joins all the bits together. Eg you could modify the bibtex plugin to look for the pdf document and do the conversion to html and add that as content. Or modify the pdf plugin to look for a bibtex entry and add that as metadata.
There will need to be somewhere something that matches the pdf file to the appropriate record in the bibtex file.
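
As a rough illustration of option A, here is a minimal Python sketch that writes one FileSet per pdf. It assumes the DirectoryMetadata layout used by Greenstone metadata.xml files, where FileName is treated as a regular expression. It also assumes a naming convention that is purely hypothetical: each pdf is named after its bibtex citation key (example2003.pdf for the key example2003). The function name, the sample record and the field names are made up for the example, and parsing the .bib file itself is left out.

```python
import os
import re
import xml.etree.ElementTree as ET


def build_metadata_xml(entries, pdf_names):
    """Build metadata.xml content from already-parsed BibTeX records.

    entries:   {citation_key: {metadata_name: value}}
    pdf_names: file names of the pdfs in the import directory
    Assumption: a pdf belongs to the record whose key equals its base name.
    """
    root = ET.Element("DirectoryMetadata")
    for pdf_name in sorted(pdf_names):
        key = os.path.splitext(pdf_name)[0]
        fields = entries.get(key)
        if fields is None:
            continue  # no BibTeX record found for this pdf, so skip it
        file_set = ET.SubElement(root, "FileSet")
        # FileName is read as a regular expression, so escape the literal name.
        ET.SubElement(file_set, "FileName").text = re.escape(pdf_name)
        description = ET.SubElement(file_set, "Description")
        # One Metadata element per field, so each field can be searched/browsed.
        for name, value in fields.items():
            ET.SubElement(description, "Metadata", name=name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, not taken from a real collection.
entries = {"example2003": {"Title": "An Example Paper", "Year": "2003"}}
xml_text = build_metadata_xml(entries, ["example2003.pdf", "unmatched.pdf"])
print(xml_text)
```

The output would still need the usual XML declaration and DOCTYPE line added before being saved as metadata.xml next to the pdfs. To keep the whole bibtex record for display only, a single Metadata element holding the raw entry could be written instead of one element per field.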

So there you go. Lots of ideas - I hope you can come up with something that suits your needs.
Regards,
Katherine Don
 

"Yao, Jixian (Research)" wrote:

Hi,

I am trying to build my lab's digital library with mainly pdf files.
Most pdf files don't have the fields that we want to search, i.e. Abstracts,
Notes, etc. We also have bibtex info for these pdf files. The bibtex file
has a lot more information about the pdf file. I'd like to build a
collection that combines both pdf and biblio info (the pdf alone is
searchable), so that when a user searches for, say, a title, it will display
the biblio record along with a link pointing to the pdf file.

Greenstone seems to treat .bib and .pdf files as separate, unrelated
documents, and even if I add a URL to the bibtex file, it is displayed as
plain text rather than as a clickable link.

Any suggestions would be appreciated. (I am new to Greenstone)

Thanks,

Jason

rfergu@music.mcgill.ca wrote:
  Dear List,

  I have several pdfs of papers, their bibtex entries, and some annotations I wrote.  I would like to
  keep these all as a greenstone database (so I can search the files and view them as html).  I have
  seen people periodically post similar interest on this list.

  I would prefer not to reinvent the wheel.  Has anyone done this, and would you mind showing me
  your example?

  Kind Regards,
  Robert Ferguson