Re: [greenstone-users] Asking about Greenstone Software!

From Michael Dewsnip
DateMon, 18 Apr 2005 09:48:22 +1200
Subject Re: [greenstone-users] Asking about Greenstone Software!
In-Reply-To (595b9ab205041308567dc9e665-mail-gmail-com)
Hi,

Just a quick note about the video plugins from the Digital Libraries
course: we did get a few (AVI, Quicktime, RealMedia), but they haven't
been added into the Greenstone source yet. They were all very similar,
most using the Video::Info module from http://www.cpan.org to read
information from the video files. If you know a little bit of Perl it is
not difficult to modify a simple plugin like OggVorbisPlug.pm to use the
Video::Info module to create a video plugin for your needs.

We may be able to give out the video plugins the students created,
provided they give their permission to do so.

Regards,

Michael

Stephen DeGabrielle wrote:

>Using unknownplug is best
>
>here are copies of the older plugins that I think were used in the
>videocollection
>
>You can take a look at them to see how they worked.
>
>I think you could probably find more in the list archives
>http://www.nzdl.org/gsarchives
>put video mtv in the search box
>to get this as the first hit
>----
>>From Michael Dewsnip
>DateMon, 18 Oct 2004 17:15:59 +1300
>Subject Re: [greenstone-devel] Patch for UnknownPlug.pm to handle
>filenameswith spaces - and begging for video plugins.
>In-Reply-To(1276bce126f9a0.126f9a01276bce-email.bigpond.com)
>Hi Stephen,
>
>
>
>>PS Does anyone have plugins for video formats? I found mention on MTVPlug and MovPlug in the music video demo collect.cfg file but they are not included in the distro.
>>(Or should I sniff around cvs?)
>>
>>
>
>Both MTVPlug and MovPlug are very specific plugins for the music video
>demo collection that process HTML files from http://www.mtv.com --
>they are, unfortunately, far less exciting than you
>would think, and don't do anything with movie files. I'd wager that
>http://www.mtv.com has changed sufficiently for them to not work any
>more, as well.
>
>However, there should be some video plugins on the horizon: one choice
>of project for the "Digital Libraries" course running here at Waikato
>is to create a plugin for a video format. I'm not
>sure how many students are choosing this option (or the quality of
>code we'll get!), but I imagine there will be a few and they should
>end up in Greenstone eventually.
>
>If you want to mess with Perl for a day or so it wouldn't be too hard
>to write your own. The easiest path is to download a Perl module from
>http://www.cpan.org that reads the video format you
>are interested in, then take something like OggVorbisPlug.pm and
>modify it to extract the video file information (using the CPAN perl
>module) and add it as metadata.
>
>Regards,
>
>Michael
>---
>
>
>
>On 4/13/05, Duc Nguyen <ronevn@yahoo.com> wrote:
>
>
>>Dear Sirs/Madams,
>>
>>
>>
>>My name is Dang Duc Nguyen, a library staff of Natural Sciences University, Ho Chi Minh City, Vietnam.
>>
>>
>>
>>As we know, Greenstone software is very helpful in developing modern library. So, we have used it to build many collections for our library. As many libraries in the world, we intend to use this software as the main technology in building a modern digital library.
>>
>>
>>
>>In order to meet the needs of many patrons, we would like to find out about function how to build the video collection.
>>
>>Please share with us your documents, tips about this procedure. We very happy to hear your advice, construction and your experience too.
>>
>>
>>
>>Best regards,
>>
>>
>>
>>--------------------------------------------------------
>>
>>Dang Duc Nguyen
>>
>>Graduated Library, Natural Sciences University
>>
>>227 Nguyen Van Cu Street, Ward 5, Dist. 5
>>
>>Ho Chi Minh City, Vietnam
>>
>>--------------------------------------------------------
>>
>> ________________________________
>>
>>
>Do you Yahoo!?
>
>
>> Yahoo! Small Business - Try our new resources site!
>>
>>
>>_______________________________________________
>>greenstone-users mailing list
>>greenstone-users@list.scms.waikato.ac.nz
>>https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
>>
>>
>>
>>
>>
>
>
>
>
>
>------------------------------------------------------------------------
>
>###########################################################################
>#
># MovPlug.pm --
># A component of the Greenstone digital library software
># from the New Zealand Digital Library Project at the
># University of Waikato, New Zealand.
>#
># Copyright (C) 1999 New Zealand Digital Library Project
>#
># This program is free software; you can redistribute it and/or modify
># it under the terms of the GNU General Public License as published by
># the Free Software Foundation; either version 2 of the License, or
># (at your option) any later version.
>#
># This program is distributed in the hope that it will be useful,
># but WITHOUT ANY WARRANTY; without even the implied warranty of
># MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
># GNU General Public License for more details.
>#
># You should have received a copy of the GNU General Public License
># along with this program; if not, write to the Free Software
># Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
>#
>###########################################################################
>
># This plugin processes an HTML file from www.mtv.com
>
>package MovPlug;
>
>use plugin;
>use BasPlug;
>use util;
>
>sub BEGIN {
> @ISA = ('BasPlug');
>}
>
>sub new {
> my ($class) = @_;
> $self = new BasPlug ();
>
> return bless $self, $class;
>}
>
>
># return 0 if this class might recurse using $pluginfo
>sub is_recursive {
> my $self = shift (@_);
>
> return 0; # not recursive
>}
>
>
>sub begin {
> my $self = shift (@_);
> my ($pluginfo, $base_dir, $processor, $maxdocs) = @_;
>
> $self->{'pluginfo'} = $pluginfo;
> $self->{'base_dir'} = $base_dir;
> $self->{'processor'} = $processor;
> $self->{'maxdocs'} = $maxdocs;
>}
>
>sub pretty_print_filesize
>{
> my ($filename) = @_;
> my $a_meg = 1048576;
> my $a_half_meg = 524288;
> my $a_kilo = 1024;
>
> my $filesize = -s $filename;
> my $formatted_size
> = ($filesize > $a_half_meg) ? sprintf("%.2f MBytes",$filesize/$a_meg)
>: sprintf("%.2f KBytes",$filesize/$a_kilo);
>
> return $formatted_size;
>}
>
>
>
>
># return number of files processed, undef if can't process
># Note that $base_dir might be "" and that $file might
># include directories
>sub read {
> my $self = shift (@_);
> my ($pluginfo, $base_dir, $file, $metadata, $processor, $maxdocs) = @_;
>
> my $filename = &util::filename_cat($base_dir, $file);
>
> if ($file !~ m/.mov(.Z|.gz)?$/)
> {
>return undef; # let something else have a go at it
> }
>
> return 0 if (!-e $filename);
>
> return 0 if (!exists $metadata->{'Artist'}); # not yet ready to be processed
>
>
># restrict area of videos for testing purposes
># if ($file !~ m/video/clips/b//)
># {
>#return return 0;
># }
>
>
>
> # create a new document
> my $doc_obj = new doc ($file, "indexed_doc");
> my $cursection = $doc_obj->get_top_section();
>
>
> $metadata->{'Qtsize'} = pretty_print_filesize($filename);
>
> # associate snapshot image with document
> system("/bin/rm /tmp/frame1.ppm") if (-e "/tmp/frame1.ppm");
> system("/bin/rm /tmp/frame1.gif") if (-e "/tmp/frame1.gif");
>
> my $gen_cmd = "gsanim $filename";
> system($gen_cmd);
>
>
> my $gif_filename1 = "/tmp/frame1.gif";
> my $gif_cmd1 = "pnmscale -xysize 50 50 /tmp/frame1.ppm | ppmquant 256 ";
> $gif_cmd1 .= "| ppmtogif > $gif_filename1";
>
>
> system($gif_cmd1);
> $doc_obj->associate_file ($gif_filename1, "frame1.gif", "image/gif");
>
> # check to see if there is an mpeg version of file
> my $mpeg_filename = $filename;
> $mpeg_filename =~ s/.mov$/.mpg/;
> if (-e $mpeg_filename)
> {
>print STDERR "**** Adding mpeg video counter-part to collection ";
>$doc_obj->associate_file ($mpeg_filename, "video.mpeg", "video/mpeg");
>$metadata->{'hasmpeg'} = "1";
>$metadata->{'Mpegsize'} = pretty_print_filesize($mpeg_filename);
> }
>
> # check to see if there is an realvideo version of file
> my $rm_filename = $filename;
> $rm_filename =~ s/.mov$/.rm/;
> if (-e $rm_filename)
> {
>print STDERR "**** Adding realvideo counter-part to collection ";
>$doc_obj->associate_file ($rm_filename, "video.rm", "video/pn-realvideo");
>$metadata->{'hasrm'} = "1";
>$metadata->{'rmsize'} = pretty_print_filesize($rm_filename);
>
># generate one-line rm file to stream video content
>$rm_stream_filename = $rm_filename;
>$rm_stream_filename =~ s/.rm$/_stream.rm/;
>open(RMSOUT,">$rm_stream_filename")
> || die "Unable to open $rm_stream_filename for RealVideo stream:$!";
>
>my $rm_url = $rm_filename;
>$rm_url =~ s/$ENV{'GSDLCOLLECTDIR'}/http://www.cosc.canterbury.ac.nz/~davidb/gsdl/collect/musvid/;
>
>print RMSOUT "$rm_url ";
>close RMSOUT;
>$doc_obj->associate_file ($rm_stream_filename, "video_stream.rm", "video/pn-realvideo");
>
> }
>
> my $text = join(' ',$metadata->{'Artist'}, $metadata->{'Title'});
> $doc_obj->add_text ($cursection, $text);
>
> # metadata table
> $self->extra_metadata($doc_obj,$cursection,$metadata);
>
> # add OID
> $doc_obj->set_OID ();
>
> # process the document
> $processor->process($doc_obj);
>
> return 1; # one file processed
>}
>
>
>1;
>
>
>------------------------------------------------------------------------
>
>###########################################################################
>#
># MTVPlug.pm --
># A component of the Greenstone digital library software
># from the New Zealand Digital Library Project at the
># University of Waikato, New Zealand.
>#
># Copyright (C) 1999 New Zealand Digital Library Project
>#
># This program is free software; you can redistribute it and/or modify
># it under the terms of the GNU General Public License as published by
># the Free Software Foundation; either version 2 of the License, or
># (at your option) any later version.
>#
># This program is distributed in the hope that it will be useful,
># but WITHOUT ANY WARRANTY; without even the implied warranty of
># MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
># GNU General Public License for more details.
>#
># You should have received a copy of the GNU General Public License
># along with this program; if not, write to the Free Software
># Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
>#
>###########################################################################
>
># This plugin processes an HTML file from www.mtv.com
>
>package MTVPlug;
>
>use plugin;
>use BasPlug;
>use util;
>
>use File::Basename;
>
>sub BEGIN {
> @ISA = ('BasPlug');
>}
>
>sub new {
> my ($class) = @_;
> $self = new BasPlug ();
>
> $self->{'all_mtv_video'} = [];
>
> return bless $self, $class;
>}
>
>
># return 0 if this class might recurse using $pluginfo
>sub is_recursive {
> my $self = shift (@_);
>
> return 1; # recursive
>}
>
>
>sub eval_dir_dots
>{
> # evaluate any "../" to next directory up
> # evaluate any "./" as here
> #--
> my ($self,$filename) = @_;
>
> my $dirsep_os = &util::get_os_dirsep();
> my @dirsep = split(/$dirsep_os/,$filename);
>
> my @eval_dirs = ();
> foreach $d (@dirsep)
> {
>if ($d eq "..")
>{
> pop(@eval_dirs);
>}
>elsif ($d eq ".")
>{
> # do nothing!
>}
>else
>{
> push(@eval_dirs,$d);
>}
> }
>
> return &util::filename_cat(@eval_dirs);
>}
>
>sub capitalise_word
>{
> my ($start,$rest) = @_;
>
> return $start.lc($rest);
>}
>
>sub parse_mtv_file
>{
> my ($self,$url_dir) = @_;
>
> my @video_list = ();
>
> my $artist = undef;
>
> my $html_text = "";
> my $line;
> while (defined($line=<FILE>))
> {
> chop $line;
> $html_text .= $line;
> }
>
>
> if ($html_text =~ m/<title>([^>]*)</title>/i)
> {
> $artist = $1;
> $artist =~ s/^s+//;
> $artist =~ s/s+$//;
>
> if ($artist =~ m/[A-Z]([A-Z]|s|-)+/)
>{
> $artist =~ s/([A-Z])([A-Z]+)/&capitalise_word($1,$2)/ge;
>}
> }
>
> while ($html_text =~ s/<as+href=("?)([^"]*)("?)[^>]*>([^>]*)</a>//i)
> {
> my $url = $2;
> my $song_title = $4;
>
>
> if ($url =~ m/.mov$/i)
>{
>
> if (!defined($artist))
> {
> print STDERR "Warning: no artist detected for $song_title ";
> }
>
> # make absolute URL
> my $url = "$url_dir/$url";
> $url = $self->eval_dir_dots($url);
> $url = "http:/$url";
>
> my $video_item = { "artist" => $artist,
> "title" => $song_title,
> "input_url" => $url };
>
> push(@video_list,$video_item);
>
>}
> }
>
> return (@video_list);
>}
>
>
>sub begin {
> my $self = shift (@_);
> my ($pluginfo, $base_dir, $processor, $maxdocs) = @_;
>
> $self->{'pluginfo'} = $pluginfo;
> $self->{'base_dir'} = $base_dir;
> $self->{'processor'} = $processor;
> $self->{'maxdocs'} = $maxdocs;
>}
>
>
># return number of files processed, undef if can't process
># Note that $base_dir might be "" and that $file might
># include directories
>sub read {
> my $self = shift (@_);
> my ($pluginfo, $base_dir, $file, $metadata, $processor, $maxdocs) = @_;
>
> my $filename = &util::filename_cat($base_dir, $file);
>
> if ($file eq "")
> {
>return undef; # top level dir
> }
>
> if ($filename =~ m/.((gif|jpeg|jpg)(.gz|.Z)?)$/i)
> {
>return 0; # ignore it
> }
>
> if ($filename =~ m/.((gif|jpeg|jpg)(.gz|.Z)?)$/i)
> {
>return 0; # ignore it
> }
>
> if ($filename =~ m/.((mpg|mpeg)(.gz|.Z)?)$/i)
> {
>return 0; # ignore it
> }
>
># if ($filename =~ m/.(mov(.gz|.Z)?)$/i)
># {
>#return 0; # ignore it
># }
>
> if ($filename !~ m/.(html?(.gz|.Z)?)$/i || (!-e $filename))
> {
>return undef;
> }
>
> my $gz = (defined $2) ? 1 : 0;
>
> my $mtv_filename = $filename;
> if ($gz)
> {
>open (FILE, "zcat $mtv_filename |")
> || die "MTVPlug::read - zcat can't open $mtv_filename ";
> }
> else
> {
>open (FILE, $mtv_filename)
> || die "MTVPlug::read - can't open $mtv_filename ";
> }
>
> # found an HTML file
> print STDERR "MTVPlug: processing $file "
>if $processor->{'verbosity'};
>
> my $all_mtv_videos = $self->{'all_mtv_videos'};
> my $url_dir = File::Basename::dirname($file);
>
> my @video_list = $self->parse_mtv_file($url_dir);
> push(@$all_mtv_videos,@video_list);
>
> $self->{'all_mtv_videos'} = $all_mtv_videos;
>
>
> close(FILE);
>
> return 1; # one file processed
>}
>
>
>
>
>sub plugin_mov_doc
>{
> my ($self,$item) = @_;
>
> my $artist = $item->{'artist'};
> my $title = $item->{'title'};
> my $input_url = $item->{'input_url'};
>
> print STDERR " Artist = $artist ";
> print STDERR " Title = $title ";
> print STDERR " Input URL = $input_url ";
>
> if ($input_url =~ m/^http:///)
> {
>my $without_http = $input_url;
>$without_http =~ s/^http:////; # remove http://
>
>if ($without_http =~ m//$/)
>{
> # directory will be saved as dir/index.html
> $without_http .= "index.html";
>}
>
># set title and artist to the specified entry
>my $metadata = {};
>$metadata->{'Title'} = $title;
>$metadata->{'Artist'} = $artist;
>$metadata->{'URL'} = "/gsdl/collect/gsarch/index/$without_http";
>
># call plug in
>my @plugin_args
> = ($self->{'pluginfo'}, $self->{'base_dir'}, $without_http,
> $metadata, $self->{'processor'},
> $self->{'maxdocs'});
>
>my $count = &plugin::read(@plugin_args);
> }
> else
> {
>print STDERR "Warning: unsupported type of URL $input_url ";
> }
>}
>
>
>sub end {
> my ($self) = @_;
>
> my $all_mtv_videos = $self->{'all_mtv_videos'};
>
> my $v;
> foreach $v ( @$all_mtv_videos )
> {
>$self->plugin_mov_doc($v);
> }
>}
>
>1;
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>greenstone-users mailing list
>greenstone-users@list.scms.waikato.ac.nz
>https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
>
>