From | Stefan Boddie |
Date | Wed, 25 Sep 2002 16:26:26 +1200 |
Subject | Re: Extracting HTML meta data |
In-Reply-To | (B9B68F24-112B%eric-morgan-infomotions-com) |
>
> I have two questions, and the first one is about extracting HTML metadata.
>
> I have a set of .shtml files that contain HTML meta tags looking like this:
>
> <META NAME="creator" CONTENT="Morgan, Eric Lease">
> <META NAME="title" CONTENT="Review of some ebook technology">
> <META NAME="abstract" CONTENT="This column describes my experience...">
> <META NAME="date" CONTENT="2001-01-01">
> <META NAME="subject" CONTENT="CIL (Computers In Libraries); ebooks; ">
>
> How do I edit my collect.cfg file to extract the meta data correctly? I am
> specifically interested in the values of the subject and abstract tags.
>

You should be able to do this by adding a "-metadata_fields" option to the
HTMLPlug plugin line in your collect.cfg file:

plugin HTMLPlug -metadata_fields Creator,Title,Abstract,Date,Subject

Stefan.
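
Before rebuilding the collection, it can help to confirm that the meta tags are actually present and readable in the .shtml files. The following is a minimal sketch in Python, independent of Greenstone: it does not reproduce how HTMLPlug works internally, and the names MetaCollector, extract_meta, and sample are hypothetical, introduced only for illustration. It collects NAME/CONTENT pairs from META tags and lowercases the names so that "Subject" and "subject" compare equal.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # arrives here as tag "meta" with attribute "name".
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if name is not None and content is not None:
            # Lowercase the field name so lookups are case-insensitive.
            self.meta[name.lower()] = content


def extract_meta(html_text):
    """Return a dict mapping lowercased meta names to their content."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta


# Sample input built from the tags quoted in the question.
sample = """<html><head>
<META NAME="creator" CONTENT="Morgan, Eric Lease">
<META NAME="subject" CONTENT="CIL (Computers In Libraries); ebooks; ">
<META NAME="abstract" CONTENT="This column describes my experience...">
</head><body></body></html>"""

if __name__ == "__main__":
    meta = extract_meta(sample)
    for field in ("subject", "abstract"):
        print(field, "=", meta.get(field))
```

If a field prints as None for one of your files, the tag is missing or malformed in that file, and no plugin option will be able to pick it up.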