Re: Indexes

From: Roman Chyla
Date: Wed, 05 Mar 2003 11:32:48 +0100 (CET)
Subject: Re: Indexes
> >
> > Is it possible to create indexes other than the standard ones of
> > section, document and paragraph? By using metadata, for example?
> > And if so, how do you tag the HTML documents?
> >
>

Hi Melina,

I think you want to extract some special parts of the text, other
than the HTML META tags. This is how I did it (Perl folks, excuse
me!):

Look at BasPlug or HTMLPlug, pick a procedure that does something
similar, and follow its code. I adapted the emailaddress code inside
BasPlug to extract the text enclosed in
<div class="citace">...</div>:

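# Inside the plugin: $textref points to the document text, $doc_obj is
# the document object and $thissection is the current section.
# Non-greedy .*? with /g captures each citace div separately.
my @citace = ($$textref =~ /<div\s+class="citace"[^>]*>(.*?)<\/div>/sg);
foreach my $citace (@citace) {
    $doc_obj->add_utf8_metadata($thissection, "Citace", $citace);
}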
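To try the pattern outside Greenstone, here is a minimal standalone sketch. It assumes a hypothetical stub class, MockDoc, standing in for the real document object; only the add_utf8_metadata method name and its (section, name, value) argument order come from the snippet above, and the stub simply records what it receives. The sample HTML and the empty section id are made up for the demo.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Greenstone's document object: it only
# records each (section, metadata name, value) triple it is given.
package MockDoc;

sub new {
    my ($class) = @_;
    return bless { meta => [] }, $class;
}

sub add_utf8_metadata {
    my ($self, $section, $field, $value) = @_;
    push @{ $self->{meta} }, [$section, $field, $value];
}

package main;

my $html = <<'HTML';
<p>Intro</p>
<div class="citace">First quotation</div>
<p>Body text</p>
<div class="citace">Second
quotation</div>
HTML

my $textref     = \$html;       # plugins work on a reference to the text
my $doc_obj     = MockDoc->new;
my $thissection = "";           # hypothetical section id for this demo

# Non-greedy (.*?) with /g captures each citace div separately;
# /s lets . match the newline inside the multi-line div.
my @citace = ($$textref =~ /<div\s+class="citace"[^>]*>(.*?)<\/div>/sg);
foreach my $citace (@citace) {
    $doc_obj->add_utf8_metadata($thissection, "Citace", $citace);
}

print scalar(@citace), " Citace values\n";
print "$_->[2]\n" for @{ $doc_obj->{meta} };
```

Running it prints `2 Citace values` followed by the two captured texts, the second one spanning two lines.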

I know now that it can be done better, but it works.

You can then build an index from this metadata, e.g.:

index Title,Citace,References


The tricky part is getting the extraction right. A very useful tool
for building regular expressions is available here:
http://anso.virtualave.net/delphi_stuff.htm
(although I am not sure whether regular expressions in Perl are
exactly the same as in other languages; I think the basics are the
same).
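A common extraction pitfall is greediness. The sketch below (plain Perl, with a made-up sample string) compares a greedy `.*` pattern with a non-greedy `.*?` one: the greedy version runs from the first citace div to the last `</div>` and swallows everything in between, while the non-greedy version stops at the nearest `</div>` each time.

```perl
use strict;
use warnings;

my $text = '<div class="citace">A</div><p>x</p><div class="citace">B</div>';

# Greedy: .* extends as far as it can, so a single match spans both divs.
my @greedy = ($text =~ /<div\s+class="citace">(.*)<\/div>/sg);

# Non-greedy: .*? stops at the first </div>, so each div is captured separately.
my @lazy = ($text =~ /<div\s+class="citace">(.*?)<\/div>/sg);

print "greedy: ", join(" | ", @greedy), "\n";
print "lazy:   ", join(" | ", @lazy), "\n";
```

Neither pattern copes with a `<div>` nested inside a citace div; for that kind of markup an HTML parser is the safer choice.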

Have a nice day,
Roman