Re: [greenstone-users] PDF plugin and number of pages

From Michael Dewsnip
Date Fri, 24 Sep 2004 09:26:59 +1200
Subject Re: [greenstone-users] PDF plugin and number of pages
In-Reply-To (4152DB25-4040602-unesco-org-uy)
Hi Eduardo,

> > In terms of the "number of pages" metadata, luckily this isn't too difficult to add. The pdftohtml program that Greenstone uses creates anchor tags in the HTML output (<a name=1>, <a name=2> etc.) at the start of each page. It is fairly simple to look for these tags and count them to
> > determine the number of pages. I've added this code near the end of gsdl/perllib/plugins/PDFPlug.pm:
> [...]
>
> It works! Thanks.

Great!
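
For anyone reading this in the archive, the counting idea can be sketched on its own as follows. This is only an illustration, not the code that went into PDFPlug.pm: the helper name count_pdf_pages and the sample HTML string are made up for the example, and I'm assuming the anchor names are plain page numbers, with or without quotes.

```perl
use strict;
use warnings;

# Count the per-page anchors (<a name=1>, <a name=2>, ...) that pdftohtml
# writes at the start of each page.
# count_pdf_pages is a hypothetical helper name, not part of Greenstone.
sub count_pdf_pages {
    my ($html) = @_;
    # In list context, a /g match with no capture groups returns every match.
    my @anchors = $html =~ /<a\s+name\s*=\s*["']?\d+["']?\s*>/gi;
    return scalar @anchors;
}

# Made-up sample standing in for the converted HTML of a three-page PDF.
my $html = '<a name=1></a>Page one<br>'
         . '<a name=2></a>Page two<br>'
         . '<a name="3"></a>Page three';

my $num_pages = count_pdf_pages($html);
print "NumPages: $num_pages\n";   # prints "NumPages: 3"
```

In the plugin, the resulting number would then be passed to add_utf8_metadata in the same way as the other metadata values.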

> > If you want to add the file size metadata yourself you'll need to determine the size of the original file, then call add_utf8_metadata as I've done above.
>
> $filestat = stat($filename);
> $doc_obj->add_metadata($doc_obj->get_top_section(), "FileSize", $filestat[7]);
>
> I added the code above in sub read, BasPlug.pm, right after:
>
> $doc_obj->add_utf8_metadata($doc_obj->get_top_section(), "Plugin", "$self->{'plugin_type'}");
>
> But it doesn't work. What am I missing? I thought of BasPlug because I would like to have the FileSize element in all files, not just PDFs.

You're right that this should go into BasPlug. Unfortunately, the inheritance structure of the Greenstone plugins isn't very clean, so you need to put it in other places as well. PDFPlug inherits from ConvertToPlug, which overrides BasPlug's read function, so you'll need to add the code there too.
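
To show why the override matters, here is a small standalone sketch. The package names BasePlugSketch and ConvertPlugSketch are invented stand-ins, not the real Greenstone classes, and I'm assuming the overriding read does not call the base class version.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a base plugin whose read() sets the metadata.
package BasePlugSketch;
sub new { my ($class) = @_; return bless {}, $class; }
sub read {
    my ($self, $meta) = @_;
    $meta->{FileSize} = 'set by base read';
    return 1;
}

# Hypothetical stand-in for a subclass that replaces read() completely.
package ConvertPlugSketch;
our @ISA = ('BasePlugSketch');
sub read {
    my ($self, $meta) = @_;
    # No call to $self->SUPER::read($meta), so the base code never runs.
    $meta->{Converted} = 1;
    return 1;
}

package main;

my %meta;
ConvertPlugSketch->new->read(\%meta);
my $status = exists($meta{FileSize}) ? 'set' : 'missing';
print "FileSize is $status\n";   # prints "FileSize is missing"
```

Code added only to the base class read is skipped for any plugin that overrides read in this way, which is why it has to be repeated in the overriding function.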

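Also, stat() returns a list, so you need to store it in an array (your $filestat[7] reads from the array @filestat, not from the scalar $filestat):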

@filestat = stat($filename);

Actually, I've just looked up the stat function in a Perl book, and it notes that if you are only interested in the size of a file, you can use the -s file test operator. So the code becomes:

$doc_obj->add_metadata($doc_obj->get_top_section(), "FileSize", (-s $filename));
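
If you want to check the two approaches outside Greenstone first, a minimal standalone sketch might look like this. The temporary file and its 1234-byte contents are made up for the test and stand in for the real $filename.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file with known contents so the size is predictable.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "x" x 1234;
close $fh or die "close failed: $!";

# stat() returns a list; element 7 is the size in bytes.
my @filestat = stat($filename);
my $size_from_stat = $filestat[7];

# The -s file test returns the size directly (false for an empty or missing file).
my $size_from_test = -s $filename;

print "stat: $size_from_stat, -s: $size_from_test\n";   # prints "stat: 1234, -s: 1234"
```

Because -s gives a false value for a zero-byte file, you may prefer ((-s $filename) || 0) if you want empty files to get a FileSize of 0.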

All the best,

Michael