Re: [greenstone-users] PDF full text file and additional HTML abstract(metadata) file

From Marcin Malik
Date Mon, 4 Jun 2007 12:16:17 +0200
Subject Re: [greenstone-users] PDF full text file and additional HTML abstract(metadata) file
In-Reply-To (c0a9e0f40706010248o345973afmc1e35fab350ff9a-mail-gmail-com)
Hello Xiao
 
 
Thank you for your help. Using the format statement you sent solved the problem, and everything works fine. And yes, we are using the mgpp indexer.
 
 
Once again thank you very much.
 
 
Best regards
 
Marcin Malik, University of Podlasie Main Library, Poland
 
 
----- Original Message -----
From: xiao
Sent: Friday, June 01, 2007 11:48 AM
Subject: Re: [greenstone-users] PDF full text file and additional HTML abstract(metadata) file



On 5/31/07, Marcin Malik <maliczek1@tlen.pl> wrote:

 

I'm preparing a Greenstone database of PhD theses from our university. The full text of each thesis is in PDF format, and I use UnknownPlug because I don't want to convert the files to HTML. Alongside the full-text PDF file, I would also like to display an additional HTML file containing only the bibliographic data, abstract and author's keywords, all taken from metadata. I can create that file, and it is shown beside the full-text PDF file. The only problem is that the HTML file which displays the abstract and keywords from metadata also contains the text "This document has no text.", which I don't want to see and don't know how to hide. I know this text appears because I don't use the PDF plugin, but I can't hide it because the statement {If}{[Text] ne 'This document has no text. ',[Text]} in the DocumentText format statement doesn't work at all.


 

Here is an example of one of those HTML (metadata) files:

 

------------------------------------------------------------------------------------------------------------

This document has no text.

TITLE: Funkcje osoki aloesowatej (Stratiotes aloides L.) w ekosystemie starorzecza Bugu : rozprawa doktorska / Małgorzata Strzałek ; Akademia Podlaska. Wydział Rolniczy. - 2006. - Promotor: dr hab. Lech Kufel

KYEWORDS: Bug (rzeka); ekosystem

 

 

Best Regards

 

Marcin Malik, University of Podlasie, Poland


_______________________________________________
greenstone-users mailing list
greenstone-users@list.scms.waikato.ac.nz
https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users


Hello Marcin,

The statement {If}{[Text] ne 'This document has no text. ',[Text]} doesn't work because [Text] contains more than the words inside the quotes. If you view the source of the web page that displays 'This document has no text. ', you'll see tags around the sentence: <Doc> ... </Doc>. You built your collection using the mgpp indexer (right?). The mgpp indexer builds its indexes from <Doc> blocks, while mg does not, so the statement in question works with mg but not with mgpp. One way around the problem is to condition the statement on a real piece of metadata. In your case, since you used the unknown plugin to process the PDF files, the following can be used:
{If}{[Plugin] ne 'UnknownPlug',[Text]}
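
To illustrate why the two conditions behave differently, here is a minimal Python sketch of the string comparison only. It is not Greenstone code: the function names are made up for illustration, and the exact value of [Text] under mgpp (the sentence wrapped directly in <Doc> tags, with no extra whitespace) is an assumption.

```python
# Illustrative sketch only; this mimics the comparisons, not Greenstone itself.
NO_TEXT = "This document has no text. "

# Assumption: under mg, [Text] holds the bare sentence.
mg_text = NO_TEXT
# Assumption: under mgpp, [Text] holds the sentence wrapped in <Doc> tags.
mgpp_text = "<Doc>" + NO_TEXT + "</Doc>"


def text_condition(text):
    """Mimics {If}{[Text] ne 'This document has no text. ',[Text]}."""
    return text if text != NO_TEXT else ""


def plugin_condition(plugin, text):
    """Mimics {If}{[Plugin] ne 'UnknownPlug',[Text]}."""
    return text if plugin != "UnknownPlug" else ""


# mg: the strings match exactly, so nothing is displayed.
print(repr(text_condition(mg_text)))
# mgpp: the <Doc> tags make the strings differ, so the placeholder is still displayed.
print(repr(text_condition(mgpp_text)))
# Testing the plugin name does not depend on how [Text] is wrapped.
print(repr(plugin_condition("UnknownPlug", mgpp_text)))
```

Because the second condition tests metadata rather than the text itself, it gives the same result with either indexer.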

Best
xiao

--
Greenstone Digital Library
New Zealand