[greenstone-users] Re: Problem with Greenstone Preview

From: Anupama of Greenstone Team
Date: Sun Apr 17 17:48:12 2011
Subject: [greenstone-users] Re: Problem with Greenstone Preview
In-Reply-To: (559573-47134-qm-web402-biz-mail-mud-yahoo-com)
> Hello Anupama, Please look at some of the PDF files
> returning errors when we build and preview in Greenstone for Linux;
> when we use the same files in Greenstone for Windows and carry out
> build and preview, there is no error and the operations run successfully.
> Please see what you can do for us.
> Thanks.
> Ogo Adesanya

Hello Ogo,

1. Two of the three PDFs you sent process fine for me on both Windows
and Linux, both with the default setting for PDFPlugin (which uses
PDF_to_html) and with the PDFBox plugin extension turned on.
(An aside about these two PDFs: I suggest using the default PDFPlugin,
without PDFBox, to process the file "some pharmacological effect.PDF",
since the PDF_to_html tool that Greenstone uses by default preserves the
images in this case. The other PDF can be processed with either plugin.
See http://wiki.greenstone.org/wiki/gsdoc/tutorial/en/enhanced_pdf.htm for
details on how to configure one instance of a plugin with specific
settings for some documents, while another instance of the same plugin
with different settings processes other documents of the same type.)

The third PDF, "Computer Science 7.pdf", fails to convert, and
Greenstone displays the underlying problem that PDFBox encountered:

"Exception in thread "main" java.io.IOException: You do not have
permission to extract text"

If you can ask the copyright holder to remove the permission restrictions
on the document and re-save it as a PDF, you may be able to convert this
last document as well.
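Before contacting the copyright holder, you could check which of your PDFs carry such a restriction. This is a minimal sketch, not a Greenstone feature: it assumes the pdfinfo tool from poppler-utils is installed (it is not part of Greenstone) and that pdfinfo reports restrictions in its usual "Encrypted: yes (... copy:no ...)" form. The function name copy_permission is hypothetical.

```shell
# Hypothetical helper: reads the output of poppler's pdfinfo on standard
# input and prints "no" when the "Encrypted:" line shows copy:no (copying
# or extracting text is not allowed), otherwise prints "yes".
copy_permission() {
    if grep -q '^Encrypted:.*copy:no'; then
        echo no
    else
        echo yes
    fi
}

# Usage (requires poppler-utils, which is not part of Greenstone):
# pdfinfo "Computer Science 7.pdf" | copy_permission
```

A result of "no" would be consistent with the PDFBox error above, although PDFBox performs its own check, so the two need not agree in every case.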

2. I think these files may be failing on your Linux machine because,
for example, environment variables are set up differently from what
Greenstone expects. But to diagnose what is going wrong, I really need
the error output of your build process after you press the Build button
in the Create panel. Could you look through my previous e-mail to your
colleague Boye Adesanya on this topic (which I will copy below), where I
gave instructions on how to set GLI to Expert mode? Then rebuild the
collection and mail me the section of the build output that contains the
error messages.

You can also raise the verbosity of the output: in GLI's Create panel,
go to Import Options (on the left) and set the verbosity field to 5,
then do the same for the verbosity field under Build Options. Now
rebuild; this may give even more detailed output about the errors that
occur.
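Because differently set-up environment variables are one suspect, it could also help to send the relevant part of your environment along with the build output. The sketch below is hypothetical (report_gs_env is not a Greenstone script); it assumes that Greenstone's setup.bash sets GSDLHOME and GSDLOS, and that PDF conversion needs perl (and, for PDFBox, java) on the PATH. Run it in the terminal you build from, after "source setup.bash".

```shell
# Hypothetical helper (not part of Greenstone): prints the variables that
# setup.bash is assumed to set, and where perl and java are found on PATH.
# The body runs in a subshell ( ... ) so its loop variables do not leak.
report_gs_env() (
    for var in GSDLHOME GSDLOS; do
        # Portable indirect lookup: val gets the value of the variable named $var
        eval "val=\${$var-}"
        if [ -n "$val" ]; then
            echo "$var=$val"
        else
            echo "$var=(unset)"
        fi
    done
    for tool in perl java; do
        # command -v prints the tool's location, or fails if it is not on PATH
        loc=$(command -v "$tool") || loc='not found on PATH'
        echo "$tool: $loc"
    done
)

# Usage, in the x-term where you ran "source setup.bash":
# report_gs_env
```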


BELOW FOLLOWS THE EMAIL SENT TO Boye Adesanya ON 04/04/11 17:29. I want
you to at least try the bit embedded between >>>> and <<<<.

Hello Boye,

There isn't enough information about how things are going wrong, but
there are two things I can think of that you could try (please also
answer my questions below).
We know that Greenstone can process these same PDF files on Windows,
but that your Linux installation is unable to process them. Perhaps
your environment is not set up correctly.

- You appear to be using GLI (the Greenstone Librarian Interface
application). Are any errors shown in the output? More detailed error
messages will appear if you go to File > Preferences > Mode, tick
"Expert" and click OK. Rebuild your collection; this time there may be
more specific error messages in the build log area of the Create panel.
Please copy the relevant section of the output (the error messages) and
send it to us.

- If your Linux system is Ubuntu and the Greenstone version you are
using is 2.83 or earlier, the problem may lie with Perl. In that case,
see suggestion 2 below.

1. Otherwise, try the following first. We're going to build from the
command line instead of from GLI (the Greenstone Librarian Interface
application), just to check whether the problem has something to do with
the environment GLI runs in.

a) Open a Linux terminal (x-term) and go into your Greenstone
installation folder:
$ cd /type/the/full/path/to/your/greenstone/installation/

b) Next, set up the Greenstone environment by typing the following in
your x-term:
$ source setup.bash

c) Run the import script, which is the first step of the build process,
and provide the name of the collection you wish to build as its argument:
$ import.pl -removeold <type your collection's name>

Are there any errors at this stage? (Check the text that scrolls past in
the terminal while the import.pl command runs.)

d) Next, run the second step of the build process, once again providing
the collection name as its argument:
$ buildcol.pl -removeold <type your collection's name>

Once again, does the output show any errors?

e) If all went well, rename the folder "building" inside your collection
folder to "index":
$ cd collect/<type your collection's name>
$ mv building index

! If the above worked, then the environment GLI runs in when it is
launched differs from the environment in which the command-line scripts
work.

f) If steps c and d above showed no errors to do with your PDF files,
then go back to your Greenstone installation folder and run the
Greenstone server from there to visit your collection page:
$ cd /type/the/full/path/to/your/greenstone/installation/
$ ./gs2-server.sh
This should launch a dialog. Press its central button to go to your
Greenstone digital library's home page, and from there navigate to your
collection.
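Steps a) to e) can also be run in one go. The sketch below wraps them in a shell function; gs_cmdline_build is a hypothetical name, not a Greenstone script. It stops at the first step that fails, assuming that import.pl and buildcol.pl exit with a non-zero status on failure, and it removes any old index folder first, because mv would otherwise move building inside an existing index folder instead of renaming it.

```shell
# Hypothetical helper: steps a) to e) above in one function.
# $1 = full path to the Greenstone installation, $2 = collection name.
# The body runs in a subshell ( ... ), so the cd and the environment
# set up by setup.bash do not affect the calling terminal.
gs_cmdline_build() (
    gsdl=$1
    coll=$2
    set --                                      # don't pass our arguments on to setup.bash
    cd "$gsdl" || exit 1                        # step a
    . ./setup.bash                              # step b ('.' is the portable spelling of 'source')
    import.pl -removeold "$coll" || exit 1      # step c
    buildcol.pl -removeold "$coll" || exit 1    # step d
    cd "collect/$coll" || exit 1                # step e
    rm -rf index                                # so mv renames building instead of nesting it
    mv building index
)
```

For example, from bash: gs_cmdline_build /full/path/to/your/greenstone/installation <collection name>, then continue with step f).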

2. If suggestion 1 did not work and your work is urgent, I think it
would be a good idea to try the latest version of Greenstone, 2.84
(released last Friday). This version can work with a plugin extension
that lets it cope with later versions of the PDF format. Doing so may
bypass your Linux-specific problems with handling these PDFs:

a) Download the Greenstone 2.84 installer for *Linux* by clicking the
link at the top of http://www.greenstone.org/download
(or visit http://sourceforge.net/projects/greenstone/ and press the
green Download button)

b) Then run the installer to install Greenstone 2.84. Make sure to
install it in a different location from your previous Greenstone
installation.

c) Next, point your browser to
Click the small red "download" link on that page and make sure to
save this file (the PDFBox Greenstone extension) into your Greenstone
2.84 installation's "ext" folder.

d) Use a terminal (x-term) to cd into your Greenstone 2.84 installation
folder and then into its ext folder, where you saved the tar.gz file
downloaded above. Then extract the archive in that location:
$ cd /type/the/full/path/to/your/Greenstone 2.84/installation/
$ cd ext
$ tar -xvzf pdf-box-java.tar.gz

e) Copy your collection folder across from the old Greenstone
installation into the new one:
$ cp -r /<full/path/to/your/OLD-Greenstone-installation>/collect/<type name of PDF collection> /<full/path/to/your/Greenstone

(Note that "collect" is the name of the directory that contains the
collection to be copied. The "collect" folder exists in every standard
Greenstone installation, so type it exactly as shown; just replace the
text inside the <> marks.)

f) In a *fresh* terminal (this is important, so make sure to open a
brand new x-term), go back to your Greenstone installation folder and
run GLI from here:
$ cd /full/path/to/your/Greenstone 2.84/installation/
$ ./gli/gli.sh

You need a fresh x-term because, when GLI runs this time, it will set up
the Greenstone environment all over again, and this time it will detect
the new PDFBox extension that you downloaded and unpacked in steps c
and d.

g) Go to File > Open. Click the "Change Dir..." button at the bottom
and, in the dialog that appears, make sure it is pointing to the collect
folder inside your new Greenstone 2.84 installation. (If it is not, use
the Change Dir dialog to go to the Greenstone 2.84 installation's collect
directory.) Now open your collection. This should open the collection
you copied into Greenstone 2.84.

h) Go into the Design panel. Make sure that on the left hand side,
"Document Plugins" is selected. Then, to the right, double click on the
PDFPlugin in the list of Document plugins. In the Plugin Configuration
dialog that appears, scroll down to the section titled "Autoload
Converters" and tick the checkbox next to "pdfbox_conversion".
This tells GLI to use the PDFBox extension to process PDF files. This
extension to PDFPlugin allows newer PDF versions to be processed as
well.
i) Now go to GLI's Create Panel and click the Build button. Hopefully
there will be no errors this time and your PDFs will get processed. Then
click the Preview button to preview your collection.
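Steps d) and e) can likewise be sketched as one shell function. gs284_prepare is a hypothetical name; the archive name pdf-box-java.tar.gz comes from step d), and the three arguments stand for the placeholder paths and collection name used in those steps.

```shell
# Hypothetical helper: steps d) and e) above.
# $1 = full path to the OLD Greenstone installation
# $2 = full path to the Greenstone 2.84 installation
# $3 = name of the PDF collection
gs284_prepare() (
    old=$1
    new=$2
    coll=$3
    # step d: unpack the PDFBox extension inside the new installation's ext folder
    tar -xvzf "$new/ext/pdf-box-java.tar.gz" -C "$new/ext" || exit 1
    # step e: copy the collection folder into the new installation's collect folder
    cp -r "$old/collect/$coll" "$new/collect/"
)
```

Quoting the arguments also copes with installation folders whose names contain spaces, such as "Greenstone 2.84". After running it, continue with step f) in a fresh x-term.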

If you still have problems after trying the above, please send us the
error messages from the build output in your next e-mail. Also tell us
which version of Greenstone you are using and which Linux distribution
(Ubuntu, for example).

Best of luck,

> Sandton Consulting Ltd wrote:
>
> Hello,
> We are a company interested in Greenstone Digital Library.
> We have a problem with PDF files in the Greenstone Linux environment.
> When we attach PDF files and go on to Design and build, the PDF
> files are rejected. The same PDF files are accepted, and we can
> build with them, in Greenstone for Windows.
> Please help us solve this problem with designing and building PDF
> files in Greenstone for Linux.
> Please, it is urgent.
> Thanking you
> Boye Adesanya