[greenstone-users] Emailplug error

From: Stephen DeGabrielle
Date: Thu, 5 May 2005 12:15:05 +0930
Subject: [greenstone-users] Emailplug error
Hi,

I am trying to process
http://list.cs.brown.edu/pipermail/plt-scheme.mbox/plt-scheme.mbox
(I download it first; it is 30 MB).

I am a bit stumped as to why this call fails:

909> # convert to unicode
910> $self->convert2unicode($charset, \$text);
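The error in the import log further down comes from Perl's method dispatch: `$self->convert2unicode(...)` only works when `$self` holds a blessed object (or a usable package name). `text_from_part`, quoted at the end of this message, takes `$self` from its first argument, so one plausible cause (not confirmed here) is a call site elsewhere in EMAILPlug.pm that invokes `text_from_part(...)` as a plain function instead of `$self->text_from_part(...)`, which puts the part text into `$self`. The sketch below reproduces that mechanism with a hypothetical `DemoPlug` package and a simplified `convert2unicode` that takes the text by reference (an assumption about the real signature). It uses an empty part, which triggers the "without a package or object reference" wording on any Perl version; with non-empty text the exact wording can differ between Perl releases.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the plugin class; not the real EMAILPlug.
package DemoPlug;

sub new { my ($class) = @_; return bless {}, $class; }

# Simplified stand-in for convert2unicode (assumed signature): takes a
# charset and a reference to the text, and modifies the text in place.
sub convert2unicode {
    my ($self, $charset, $textref) = @_;
    $$textref = "[$charset] $$textref";
    return;
}

sub text_from_part {
    my $self = shift;
    my $text = shift || '';
    $self->convert2unicode('ascii', \$text);   # the equivalent of line 910
    return $text;
}

package main;

my $plug = DemoPlug->new;

# Method call: Perl passes the object as the first argument, so $self is
# the blessed plugin object and the inner method call works.
my $good = $plug->text_from_part("hello");
print "method call: $good\n";

# Plain function call: the first argument (here an empty part) lands in
# $self, so "$self->convert2unicode" has no object or package to dispatch on.
my $survived = eval { DemoPlug::text_from_part(""); 1 };
print "function call died: $@" unless $survived;
```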

The only change from the default collection is that I added mbox to the
-process_exp option of the EMAILPlug plugin directive.

I am tempted to disable that part of the plugin, but I think it contains
text that people will want to find, so I'd rather not.

--

import.pl> segment 281 - EMAILPlug: processing plt-scheme.mbox
import.pl> segment 282 - EMAILPlug: processing plt-scheme.mbox
import.pl> segment 283 - EMAILPlug: processing plt-scheme.mbox
import.pl> Can't call method "convert2unicode" without a package or
object reference at C:\Program
Files\Greenstone\perllib\plugins\EMAILPlug.pm line 910.
import.pl> Command failed.
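The log reports only the line where Perl gave up (line 910), not which code called `text_from_part`. One way to find the caller, sketched here as a temporary debugging aid rather than a fix, is to check `$self` at the top of the sub with `Scalar::Util::blessed` and call `Carp::confess`, which dies with a full stack trace. `DemoPlug` and `buggy_caller` are hypothetical stand-ins; `buggy_caller` imitates the suspected plain-function call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the plugin class; only the guard matters here.
package DemoPlug;

use Carp qw(confess);
use Scalar::Util qw(blessed);

sub new { my ($class) = @_; return bless {}, $class; }

sub text_from_part {
    my $self = shift;
    # Temporary debugging guard: if this sub was called as a plain function,
    # $self is not an object; confess() dies with a stack trace naming the
    # caller, instead of failing later at the convert2unicode call.
    confess("text_from_part called without a plugin object")
        unless blessed($self);
    my $text = shift || '';
    return $text;
}

# Hypothetical caller with the suspected bug: a plain function call.
sub buggy_caller {
    my ($part) = @_;
    return text_from_part($part);
}

package main;

my $plug = DemoPlug->new;
my $good = $plug->text_from_part("hello");   # method call: returns "hello"
print "$good\n";

my $survived = eval { DemoPlug::buggy_caller("\nbody text"); 1 };
print "caught:\n$@" unless $survived;        # trace lists DemoPlug::buggy_caller
```

The `DemoPlug::text_from_part(...) called at ... line N` entry in the trace gives the file and line of the call that passed a non-object; the guard can be removed once that call is found.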


--EMAILPlug.pm-
# Process a MIME part. Return "" if we can't decode it.
# should only be called for parts with type "text/*" ?
sub text_from_part {
    my $self = shift;
    my $text = shift || '';
    my $part_header = $text;

    # check for empty part header (leading blank line)
    if ($text =~ /^\s*\r?\n/) {
        $part_header="Content-type: text/plain; charset=us-ascii";
    } else {
        $part_header =~ s/\r?\n\r?\n(.*)$//s;
        $text=$1; if (!defined($text)) {$text="";}
    }
    $part_header =~ s/\r?\n[\t ]+/ /gs; #unfold
    $part_header =~
        /content-type:\s*([\w.\-\/]+).*?charset="?([^;"\s]+)"?/is;
    my $type=$1;
    my $charset=$2;
    if (!defined($type)) {$type="";}
    if (!defined($charset)) {$charset="ascii";}
    my $encoding="";
    if ($part_header =~ /^content-transfer-encoding:\s*([^\s]+)/mis) {
        $encoding=$1; $encoding=~tr/A-Z/a-z/;
    }
    # Content-Transfer-Encoding is per-part
    if ($encoding ne "") {
        if ($encoding =~ /quoted-printable/) {
            $text=qp_decode($text);
        } elsif ($encoding =~ /base64/) {
            $text=base64_decode($text);
        } elsif ($encoding !~ /[78]bit/) { # leave 7/8 bit as is.
            # rfc2045 also allows binary, which we ignore (for now).
            my $outhandle=$self->{'outhandle'};
            print $outhandle "EMAILPlug: unknown transfer encoding: $encoding\n";
            return "";
        }
    }
    if ($type eq "text/html") {
        # only get stuff between <body> tags, or <html> tags.
        $text =~ s@^.*<html[^>]*>@@is;
        $text =~ s@</html>.*$@@is;
        $text =~ s/^.*?<body[^>]*>//si;
        $text =~ s/<\/body>.*$//si;
    }
    elsif ($type eq "text/xml") {
        $text=~s/</&lt;/g;$text=~s/>/&gt;/g;
        $text="<pre>\n$text\n</pre>\n";
    }
    # convert to unicode
    $self->convert2unicode($charset, \$text);

    $text =~ s@_@\\_@g; # protect against GS macro language
    return $text;
}
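The `$charset` handed to `convert2unicode` at line 910 comes from the part-header handling at the top of `text_from_part`. The hypothetical helper `parse_part_header` below re-creates that logic outside the plugin so it can be tried on sample parts; it is a sketch, not Greenstone code. One deliberate difference: it captures in list context instead of reading `$1`/`$2` afterwards, because Perl does not reset `$1`/`$2` when a match fails, so a failed match could otherwise pick up values left over from an earlier match.

```perl
use strict;
use warnings;

# Hypothetical helper re-creating the header handling of text_from_part.
# Returns (type, charset, body) for one MIME part.
sub parse_part_header {
    my ($text) = @_;
    my $part_header = $text;

    if ($text =~ /^\s*\r?\n/) {
        # Empty part header (leading blank line): assume plain ASCII text.
        $part_header = "Content-type: text/plain; charset=us-ascii";
    } else {
        # The header ends at the first blank line; the rest is the body.
        $text = ($part_header =~ s/\r?\n\r?\n(.*)$//s) ? $1 : "";
    }
    $part_header =~ s/\r?\n[\t ]+/ /gs;    # unfold continuation lines

    # List-context capture: both values are undef if the match fails.
    my ($type, $charset) =
        $part_header =~ /content-type:\s*([\w.\-\/]+).*?charset="?([^;"\s]+)"?/is;
    $type    = ""      unless defined $type;
    $charset = "ascii" unless defined $charset;
    return ($type, $charset, $text);
}

my @folded = parse_part_header("Content-Type: text/plain;\r\n\tcharset=\"iso-8859-1\"\r\n\r\nHello\r\n");
my @blank  = parse_part_header("\nJust text\n");
my @nocs   = parse_part_header("Content-Type: text/plain\n\nbody\n");

printf "%s | %s\n",   @folded[0, 1];   # text/plain | iso-8859-1
printf "%s | %s\n",   @blank[0, 1];    # text/plain | us-ascii
printf "'%s' | %s\n", @nocs[0, 1];     # '' | ascii
```

The third case shows that when a part has no `charset=` parameter the whole Content-Type match fails, so the type comes back empty and the charset falls back to `ascii`.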


--

Stephen De Gabrielle