Re: Extracting HTML meta data

From Stefan Boddie
Date Wed, 25 Sep 2002 16:26:26 +1200
Subject Re: Extracting HTML meta data
In-Reply-To (B9B68F24-112B%eric-morgan-infomotions-com)
>
> I have two questions, and the first one is about extracting HTML metadata.
>
> I have a set of .shtml files that contain HTML meta tags looking like this:
>
> <META NAME="creator" CONTENT="Morgan, Eric Lease">
> <META NAME="title" CONTENT="Review of some ebook technology">
> <META NAME="abstract" CONTENT="This column describes my experience...">
> <META NAME="date" CONTENT="2001-01-01">
> <META NAME="subject" CONTENT="CIL (Computers In Libraries); ebooks; ">
>
> How do I edit my collect.cfg file to extract the meta data correctly? I am
> specifically interested in the values of the subject and abstract tags.
>

You should be able to do this by adding the option
"-metadata_fields Creator,Title,Abstract,Date,Subject"
to the HTMLPlug line in your collect.cfg file. That is, make the line look
something like the following (all on one line):

plugin HTMLPlug -metadata_fields Creator,Title,Abstract,Date,Subject
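
If you want to check beforehand which of those fields your .shtml files
actually carry, a small standalone script along these lines may help. This is
only a sketch in Python and is not Greenstone code: the names MetaCollector
and extract_meta are made up for illustration, and matching the tag names
without regard to case is an assumption on my part.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> values for a chosen set of fields.

    Hypothetical helper for checking files by hand; not part of Greenstone.
    """

    def __init__(self, fields):
        super().__init__()
        # Map lower-cased name -> capitalisation requested by the caller
        # (assumption: names are compared case-insensitively).
        self.wanted = {field.lower(): field for field in fields}
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in self.wanted:
            self.found[self.wanted[name]] = attributes.get("content") or ""


def extract_meta(html_text, fields):
    """Return a dict of the requested meta fields found in html_text."""
    parser = MetaCollector(fields)
    parser.feed(html_text)
    return parser.found


if __name__ == "__main__":
    sample = (
        '<META NAME="creator" CONTENT="Morgan, Eric Lease">\n'
        '<META NAME="subject" CONTENT="CIL (Computers In Libraries); ebooks; ">\n'
    )
    print(extract_meta(sample, ["Subject", "Abstract"]))
```

A field that is requested but missing from the file (Abstract in the sample
above) is simply left out of the result, so you can spot files that lack a tag.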

Stefan.