From | Stephen.DeGabrielle@nt.gov.au |
Date | Thu, 13 Jan 2005 08:52:51 +0930 |
Subject | [greenstone-devel] plugin perl problems |
Hi,

Thinking about this last night, I have decided my problem is that I don't understand how multi-part/multi-page documents get generated. The same goes for multi-document plugins like MARCPlug or OAIPlug. Any help understanding what is going on with these sorts of plugins is appreciated.

Thanks,
s.

--

Have a look at the following snippet (it starts somewhere near line 100 of PDFPlug.pm):

####
# following title_sub removes "Page 1" added by pdftohtml, and a leading
# "1", which is often the page number at the top of the page. Bad Luck
# if your document title actually starts with "1 " - is there a better way?
#my $self = new ConvertToPlug ($class, @args, "-title_sub", '^(Page\s+\d+)?(\s*1\s+)?');
my $self = new ConvertToPlug ($class, @args);
$self->{'plugin_type'} = "PDFPlug";
if ($use_sections) {
    $self = new ConvertToPlug ($class, @args, "-title_sub", '^(Page\s+\d+)?(\s*1\s+)?');
    $self->{'use_sections'}=1;
}
####

It is part of 'sub new' in PDFPlug.pm. It is my attempt to fix PDFPlug so that it doesn't override a title_sub specified in PDFPlug's arguments with the '^(Page\s+\d+)?(\s*1\s+)?' argument, which is needed to remove the "Page 1" added by pdftohtml (as noted in the comments).

I thought I had got it right, but attempts to rebuild a small collection seem to have killed my ability to generate sections at all (my test documents included the Greenstone Developer's Guide).

Any help/suggestions appreciated.

regards,
s.

----------
Stephen De Gabrielle
8922 0887
http://www.birdguides.com/html/vidlib/species/Carduelis_cannabina.htm
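One possible approach to the override problem is to build the argument list first and construct the plugin only once. The sketch below assumes PDFPlug's arguments arrive as the flat "-name, value" list shown in the snippet. It appends the default -title_sub only when sections are in use and the caller has not already supplied one. The helper add_default_title_sub and the option -other_option are hypothetical and are not Greenstone code. The last lines assume the pattern is applied to the title as a plain substitution, which is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Greenstone): append the default
# -title_sub only when sections are used AND the caller has not already
# given a -title_sub, so a user-specified pattern is never overridden.
sub add_default_title_sub {
    my ($use_sections, @args) = @_;
    my $default = '^(Page\s+\d+)?(\s*1\s+)?';
    # grep in scalar context returns the number of matching elements
    my $has_title_sub = grep { $_ eq '-title_sub' } @args;
    push @args, '-title_sub', $default if $use_sections && !$has_title_sub;
    return @args;
}

# With the adjusted list, the object would be built once, e.g.
#   my $self = new ConvertToPlug ($class, @adjusted_args);
# and plugin_type / use_sections set afterwards, so a second
# constructor call cannot discard them.

my @user  = add_default_title_sub(1, '-title_sub', '^Chapter\s+');
my @plain = add_default_title_sub(1, '-other_option', 'value');
print join(' ', @user), "\n";   # user's pattern kept, no default added
print join(' ', @plain), "\n";  # default pattern appended

# What the default pattern strips from a pdftohtml-style title
my $title = "Page 1 1 Introduction";
(my $clean = $title) =~ s/^(Page\s+\d+)?(\s*1\s+)?//;
print "$clean\n";               # prints "Introduction"
```

Building the list before construction keeps a single ConvertToPlug object. In the snippet above, the second `new ConvertToPlug` inside the `if` replaces the object that already had 'plugin_type' set.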