Re: [greenstone-users] Asking about Greenstone Software!

From Stephen DeGabrielle
DateThu, 14 Apr 2005 01:26:35 +0930
Subject Re: [greenstone-users] Asking about Greenstone Software!
In-Reply-To (20050413063917-22058-qmail-web50508-mail-yahoo-com)
Using unknownplug is best

here are copies of the older plugins that I think were used in the
videocollection

You can take a look at them to see how they worked.

I think you could probably find more in the list archives
http://www.nzdl.org/gsarchives
put video mtv in the search box
to get this as the first hit
----
>From Michael Dewsnip
DateMon, 18 Oct 2004 17:15:59 +1300
Subject Re: [greenstone-devel] Patch for UnknownPlug.pm to handle
filenameswith spaces - and begging for video plugins.
In-Reply-To(1276bce126f9a0.126f9a01276bce-email.bigpond.com)
Hi Stephen,

> PS Does anyone have plugins for video formats? I found mention on MTVPlug and MovPlug in the music video demo collect.cfg file but they are not included in the distro.
> (Or should I sniff around cvs?)

Both MTVPlug and MovPlug are very specific plugins for the music video
demo collection that process HTML files from http://www.mtv.com --
they are, unfortunately, far less exciting than you
would think, and don't do anything with movie files. I'd wager that
http://www.mtv.com has changed sufficiently for them to not work any
more, as well.

However, there should be some video plugins on the horizon: one choice
of project for the "Digital Libraries" course running here at Waikato
is to create a plugin for a video format. I'm not
sure how many students are choosing this option (or the quality of
code we'll get!), but I imagine there will be a few and they should
end up in Greenstone eventually.

If you want to mess with Perl for a day or so it wouldn't be too hard
to write your own. The easiest path is to download a Perl module from
http://www.cpan.org that reads the video format you
are interested in, then take something like OggVorbisPlug.pm and
modify it to extract the video file information (using the CPAN perl
module) and add it as metadata.

Regards,

Michael
---

On 4/13/05, Duc Nguyen <ronevn@yahoo.com> wrote:
>
> Dear Sirs/Madams,
>
>
>
> My name is Dang Duc Nguyen, a library staff of Natural Sciences University, Ho Chi Minh City, Vietnam.
>
>
>
> As we know, Greenstone software is very helpful in developing modern library. So, we have used it to build many collections for our library. As many libraries in the world, we intend to use this software as the main technology in building a modern digital library.
>
>
>
> In order to meet the needs of many patrons, we would like to find out about function how to build the video collection.
>
> Please share with us your documents, tips about this procedure. We very happy to hear your advice, construction and your experience too.
>
>
>
> Best regards,
>
>
>
> --------------------------------------------------------
>
> Dang Duc Nguyen
>
> Graduated Library, Natural Sciences University
>
> 227 Nguyen Van Cu Street, Ward 5, Dist. 5
>
> Ho Chi Minh City, Vietnam
>
> --------------------------------------------------------
>
> ________________________________
Do you Yahoo!?
> Yahoo! Small Business - Try our new resources site!
>
>
> _______________________________________________
> greenstone-users mailing list
> greenstone-users@list.scms.waikato.ac.nz
> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
>
>
>

--

--

Stephen De Gabrielle
-- http://users.bigpond.com/spdegabrielle/automata


<<attachment>>
Type: text/plain
Filename: MovPlug.pm

###########################################################################
#
# MovPlug.pm --
# A component of the Greenstone digital library software
# from the New Zealand Digital Library Project at the
# University of Waikato, New Zealand.
#
# Copyright (C) 1999 New Zealand Digital Library Project
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
###########################################################################

# This plugin processes an HTML file from www.mtv.com

package MovPlug;

use plugin;
use BasPlug;
use util;

sub BEGIN {
@ISA = ('BasPlug');
}

sub new {
my ($class) = @_;
$self = new BasPlug ();

return bless $self, $class;
}


# return 0 if this class might recurse using $pluginfo
sub is_recursive {
my $self = shift (@_);

return 0; # not recursive
}


sub begin {
my $self = shift (@_);
my ($pluginfo, $base_dir, $processor, $maxdocs) = @_;

$self->{'pluginfo'} = $pluginfo;
$self->{'base_dir'} = $base_dir;
$self->{'processor'} = $processor;
$self->{'maxdocs'} = $maxdocs;
}

sub pretty_print_filesize
{
my ($filename) = @_;
my $a_meg = 1048576;
my $a_half_meg = 524288;
my $a_kilo = 1024;

my $filesize = -s $filename;
my $formatted_size
= ($filesize > $a_half_meg) ? sprintf("%.2f MBytes",$filesize/$a_meg)
: sprintf("%.2f KBytes",$filesize/$a_kilo);

return $formatted_size;
}


# return number of files processed, undef if can't process
# Note that $base_dir might be "" and that $file might
# include directories
sub read {
my $self = shift (@_);
my ($pluginfo, $base_dir, $file, $metadata, $processor, $maxdocs) = @_;

my $filename = &util::filename_cat($base_dir, $file);

if ($file !~ m/.mov(.Z|.gz)?$/)
{
return undef; # let something else have a go at it
}

return 0 if (!-e $filename);

return 0 if (!exists $metadata->{'Artist'}); # not yet ready to be processed


# restrict area of videos for testing purposes
# if ($file !~ m/video/clips/b//)
# {
#return return 0;
# }

# create a new document
my $doc_obj = new doc ($file, "indexed_doc");
my $cursection = $doc_obj->get_top_section();


$metadata->{'Qtsize'} = pretty_print_filesize($filename);

# associate snapshot image with document
system("/bin/rm /tmp/frame1.ppm") if (-e "/tmp/frame1.ppm");
system("/bin/rm /tmp/frame1.gif") if (-e "/tmp/frame1.gif");

my $gen_cmd = "gsanim $filename";
system($gen_cmd);


my $gif_filename1 = "/tmp/frame1.gif";
my $gif_cmd1 = "pnmscale -xysize 50 50 /tmp/frame1.ppm | ppmquant 256 ";
$gif_cmd1 .= "| ppmtogif > $gif_filename1";


system($gif_cmd1);
$doc_obj->associate_file ($gif_filename1, "frame1.gif", "image/gif");

# check to see if there is an mpeg version of file
my $mpeg_filename = $filename;
$mpeg_filename =~ s/.mov$/.mpg/;
if (-e $mpeg_filename)
{
print STDERR "**** Adding mpeg video counter-part to collection ";
$doc_obj->associate_file ($mpeg_filename, "video.mpeg", "video/mpeg");
$metadata->{'hasmpeg'} = "1";
$metadata->{'Mpegsize'} = pretty_print_filesize($mpeg_filename);
}

# check to see if there is an realvideo version of file
my $rm_filename = $filename;
$rm_filename =~ s/.mov$/.rm/;
if (-e $rm_filename)
{
print STDERR "**** Adding realvideo counter-part to collection ";
$doc_obj->associate_file ($rm_filename, "video.rm", "video/pn-realvideo");
$metadata->{'hasrm'} = "1";
$metadata->{'rmsize'} = pretty_print_filesize($rm_filename);

# generate one-line rm file to stream video content
$rm_stream_filename = $rm_filename;
$rm_stream_filename =~ s/.rm$/_stream.rm/;
open(RMSOUT,">$rm_stream_filename")
|| die "Unable to open $rm_stream_filename for RealVideo stream:$!";

my $rm_url = $rm_filename;
$rm_url =~ s/$ENV{'GSDLCOLLECTDIR'}/http://www.cosc.canterbury.ac.nz/~davidb/gsdl/collect/musvid/;

print RMSOUT "$rm_url ";
close RMSOUT;
$doc_obj->associate_file ($rm_stream_filename, "video_stream.rm", "video/pn-realvideo");

}

my $text = join(' ',$metadata->{'Artist'}, $metadata->{'Title'});
$doc_obj->add_text ($cursection, $text);

# metadata table
$self->extra_metadata($doc_obj,$cursection,$metadata);

# add OID
$doc_obj->set_OID ();

# process the document
$processor->process($doc_obj);

return 1; # one file processed
}


1;


<<attachment>>
Type: text/plain
Filename: MTVPlug.pm

###########################################################################
#
# MTVPlug.pm --
# A component of the Greenstone digital library software
# from the New Zealand Digital Library Project at the
# University of Waikato, New Zealand.
#
# Copyright (C) 1999 New Zealand Digital Library Project
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
###########################################################################

# This plugin processes an HTML file from www.mtv.com

package MTVPlug;

use plugin;
use BasPlug;
use util;

use File::Basename;

sub BEGIN {
@ISA = ('BasPlug');
}

sub new {
my ($class) = @_;
$self = new BasPlug ();

$self->{'all_mtv_video'} = [];

return bless $self, $class;
}


# return 0 if this class might recurse using $pluginfo
sub is_recursive {
my $self = shift (@_);

return 1; # recursive
}


sub eval_dir_dots
{
# evaluate any "../" to next directory up
# evaluate any "./" as here
#--
my ($self,$filename) = @_;

my $dirsep_os = &util::get_os_dirsep();
my @dirsep = split(/$dirsep_os/,$filename);

my @eval_dirs = ();
foreach $d (@dirsep)
{
if ($d eq "..")
{
pop(@eval_dirs);
}
elsif ($d eq ".")
{
# do nothing!
}
else
{
push(@eval_dirs,$d);
}
}

return &util::filename_cat(@eval_dirs);
}

sub capitalise_word
{
my ($start,$rest) = @_;

return $start.lc($rest);
}

sub parse_mtv_file
{
my ($self,$url_dir) = @_;

my @video_list = ();

my $artist = undef;

my $html_text = "";
my $line;
while (defined($line=<FILE>))
{
chop $line;
$html_text .= $line;
}


if ($html_text =~ m/<title>([^>]*)</title>/i)
{
$artist = $1;
$artist =~ s/^s+//;
$artist =~ s/s+$//;

if ($artist =~ m/[A-Z]([A-Z]|s|-)+/)
{
$artist =~ s/([A-Z])([A-Z]+)/&capitalise_word($1,$2)/ge;
}
}

while ($html_text =~ s/<as+href=("?)([^"]*)("?)[^>]*>([^>]*)</a>//i)
{
my $url = $2;
my $song_title = $4;


if ($url =~ m/.mov$/i)
{

if (!defined($artist))
{
print STDERR "Warning: no artist detected for $song_title ";
}

# make absolute URL
my $url = "$url_dir/$url";
$url = $self->eval_dir_dots($url);
$url = "http:/$url";

my $video_item = { "artist" => $artist,
"title" => $song_title,
"input_url" => $url };

push(@video_list,$video_item);

}
}

return (@video_list);
}


sub begin {
my $self = shift (@_);
my ($pluginfo, $base_dir, $processor, $maxdocs) = @_;

$self->{'pluginfo'} = $pluginfo;
$self->{'base_dir'} = $base_dir;
$self->{'processor'} = $processor;
$self->{'maxdocs'} = $maxdocs;
}


# return number of files processed, undef if can't process
# Note that $base_dir might be "" and that $file might
# include directories
sub read {
my $self = shift (@_);
my ($pluginfo, $base_dir, $file, $metadata, $processor, $maxdocs) = @_;

my $filename = &util::filename_cat($base_dir, $file);

if ($file eq "")
{
return undef; # top level dir
}

if ($filename =~ m/.((gif|jpeg|jpg)(.gz|.Z)?)$/i)
{
return 0; # ignore it
}

if ($filename =~ m/.((gif|jpeg|jpg)(.gz|.Z)?)$/i)
{
return 0; # ignore it
}

if ($filename =~ m/.((mpg|mpeg)(.gz|.Z)?)$/i)
{
return 0; # ignore it
}

# if ($filename =~ m/.(mov(.gz|.Z)?)$/i)
# {
#return 0; # ignore it
# }

if ($filename !~ m/.(html?(.gz|.Z)?)$/i || (!-e $filename))
{
return undef;
}

my $gz = (defined $2) ? 1 : 0;

my $mtv_filename = $filename;
if ($gz)
{
open (FILE, "zcat $mtv_filename |")
|| die "MTVPlug::read - zcat can't open $mtv_filename ";
}
else
{
open (FILE, $mtv_filename)
|| die "MTVPlug::read - can't open $mtv_filename ";
}

# found an HTML file
print STDERR "MTVPlug: processing $file "
if $processor->{'verbosity'};

my $all_mtv_videos = $self->{'all_mtv_videos'};
my $url_dir = File::Basename::dirname($file);

my @video_list = $self->parse_mtv_file($url_dir);
push(@$all_mtv_videos,@video_list);

$self->{'all_mtv_videos'} = $all_mtv_videos;


close(FILE);

return 1; # one file processed
}


sub plugin_mov_doc
{
my ($self,$item) = @_;

my $artist = $item->{'artist'};
my $title = $item->{'title'};
my $input_url = $item->{'input_url'};

print STDERR " Artist = $artist ";
print STDERR " Title = $title ";
print STDERR " Input URL = $input_url ";

if ($input_url =~ m/^http:///)
{
my $without_http = $input_url;
$without_http =~ s/^http:////; # remove http://

if ($without_http =~ m//$/)
{
# directory will be saved as dir/index.html
$without_http .= "index.html";
}

# set title and artist to the specified entry
my $metadata = {};
$metadata->{'Title'} = $title;
$metadata->{'Artist'} = $artist;
$metadata->{'URL'} = "/gsdl/collect/gsarch/index/$without_http";

# call plug in
my @plugin_args
= ($self->{'pluginfo'}, $self->{'base_dir'}, $without_http,
$metadata, $self->{'processor'},
$self->{'maxdocs'});

my $count = &plugin::read(@plugin_args);
}
else
{
print STDERR "Warning: unsupported type of URL $input_url ";
}
}


sub end {
my ($self) = @_;

my $all_mtv_videos = $self->{'all_mtv_videos'};

my $v;
foreach $v ( @$all_mtv_videos )
{
$self->plugin_mov_doc($v);
}
}

1;