[greenstone-users] Re:Apache Server crashing during Preview Search

From Greenstone Team
Date Fri Jul 29 19:28:34 2011
Subject [greenstone-users] Re:Apache Server crashing during Preview Search
In-Reply-To (CAM3bH=2-VAzpyrzEF0rRO1REmE44Tqx3YF=fYGN65m2-VtMVfQ-mail-gmail-com)
Hi Ted,

Here's what my testing has revealed so far. Note that I'm using
the current Greenstone source code from SVN (which is an improvement
from 2.84, but not in this matter) and, like you, am also working on
Windows.

1. I was able to reproduce both problems: the *.doc you sent me crashed
wvWare. And, when using the OpenOfficePlugin, the *.odt would be built
but upon searching would result in a blank page instead of search results.

2. I went back to considering the *.doc file. I have OpenOffice
installed and therefore decided to try to use the OpenOffice Extension
for Greenstone to convert the doc to html instead of wvWare. (For more
information on this plugin extension, refer to
http://wiki.greenstone.org/wiki/index.php/2.84_Release_Notes#Extensions)
I switched on the extension in the WordPlugin, and tried to build the
collection with it. While there were no failures or crashes, it does not
appear to have preserved the comments that you wanted, which meant these
were not indexed and not searchable.

3. Finally, I tried to build a new collection containing the same *.doc
on a machine where we have a recent version of Microsoft Word installed.
The idea was to use Windows native scripting this time and let Word
itself convert the document to html. I turned on the windows_scripting
option in the WordPlugin and rebuilt the collection containing the *.doc
you sent me. This time, the comments were indexed too, and were
searchable. The remaining drawback is that when you view the document,
all the comments appear at the end, being taken out of their original
context where they referred to various parts of the main body of the text.

I think the results are perhaps due to your document being more complex
than the average ones we have so far had to work with. The comments seem
to be something that the version of OpenOffice I have here (version 3)
is unable to extract. Even Microsoft Word can only work with it to some
extent, being able to extract it, yet putting all the comments at the
end instead of preserving their original locations.


If you wish to try the third method I attempted above, then you will
need to make some minor changes to your Greenstone installation, since
the script to launch Windows native scripting has been updated since
2.84. The instructions to obtain and use that script follow
below.

Tell us how you get on,
Anupama

1. On a Windows machine, stop GLI and the Greenstone server if either is
running.

2. Open up a file browser and go into your Greenstone installation directory.

3. Go into its bin\script folder.

4. Rename the file gsConvert.pl to gsConvert.pl_bak

5. Visit
http://trac.greenstone.org/browser/main/trunk/greenstone2/bin/script/gsConvert.pl?format=txt
and save the file there into your Greenstone 2.84 installation's bin\script folder.

6. Visit
http://trac.greenstone.org/export/24168/main/trunk/binaries/windows/bin/docx2html.vbs

and save this second file into your Greenstone 2.84 installation's bin\windows folder.

Use a file browser to go into that bin\windows folder to check that the
downloaded file has indeed been saved with the ".vbs" extension
and not as ".vbs.txt". In the latter case, you will need to rename it to
plain ".vbs".

7. To test it out, start up GLI and create a new collection with your *.doc file.

8. In the Design Pane, go to Document Plugins and configure your WordPlugin by ticking
the "windows_scripting" option, then click OK to close the Configure Plugin dialog.

9. Build your collection.
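
For anyone who would rather script steps 2 to 6 than do them by hand, here
is a minimal Python sketch. It is only an illustration, not part of
Greenstone: the function names (download, update_scripts) and the temporary
file name gsConvert.pl.new are made up for this example, and it assumes the
installation directory contains bin\script and bin\windows as described
above. The two URLs are the ones given in steps 5 and 6 and may have moved
since this message was written. Stop GLI and the server first (step 1).

```python
import urllib.request
from pathlib import Path

# URLs copied from steps 5 and 6 above; they may no longer resolve.
GSCONVERT_URL = ("http://trac.greenstone.org/browser/main/trunk/"
                 "greenstone2/bin/script/gsConvert.pl?format=txt")
DOCX2HTML_URL = ("http://trac.greenstone.org/export/24168/main/trunk/"
                 "binaries/windows/bin/docx2html.vbs")


def download(url, dest):
    """Fetch url and write the raw bytes to dest (a Path)."""
    with urllib.request.urlopen(url) as response:
        dest.write_bytes(response.read())


def update_scripts(gs_home, fetch=download):
    """Back up gsConvert.pl and install the two updated scripts.

    gs_home is the Greenstone installation directory (assumed layout:
    bin/script and bin/windows underneath it). fetch can be swapped
    out, for example to test without network access.
    """
    script_dir = Path(gs_home) / "bin" / "script"
    windows_dir = Path(gs_home) / "bin" / "windows"

    original = script_dir / "gsConvert.pl"
    backup = script_dir / "gsConvert.pl_bak"
    new_copy = script_dir / "gsConvert.pl.new"  # hypothetical temp name

    # Step 5: download first, so a failed download leaves the old file alone.
    fetch(GSCONVERT_URL, new_copy)

    # Step 4: keep the original as gsConvert.pl_bak (never overwrite a backup).
    if original.exists() and not backup.exists():
        original.rename(backup)
    new_copy.replace(original)

    # Step 6: save under the exact name, which avoids the ".vbs.txt" problem.
    vbs = windows_dir / "docx2html.vbs"
    fetch(DOCX2HTML_URL, vbs)

    return original, vbs
```

To use it, call update_scripts with the path to your own Greenstone
installation directory, then continue with steps 7 to 9 in GLI as usual.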


Ted Maust wrote:
> Anupama,
>
> I used all the Greenstone defaults for my collection, then loaded a
> few emails (which worked fine) and the two files attached. The .doc
> file crashed the wvWare.exe and the .odt file shuts down the preview
> when I try searching. I've found a workaround by saving the .doc file
> as a PDF and uploading that, but it's less than ideal. I think the
> problems stem from the comments along the side, and perhaps the length
> of the document. Thank you for helping me with this! If you have any
> more questions about my process that generated this error, I'll be
> glad to be more specific. I'm using 2.84.
>
> Thanks!
>
> Ted
>
> On Wed, Jul 20, 2011 at 10:13 PM, Greenstone Team
> <greenstone_team@cs.waikato.ac.nz
> <mailto:greenstone_team@cs.waikato.ac.nz>> wrote:
>
> Hello Ted,
>
> If you would send me the document that crashes the server (I may
> not need the others, since they don't crash the server), then I
> will try to reproduce the problem here. Alternatively, if you find
> the same problem manifests when using any other documents of that
> type, then we can try with any .odt file here.
>
> Could you also tell us exactly which steps you followed in
> creating your collection, especially where that document is
> concerned. E.g. what plugins you added to the default set, what
> indexes you created and what metadata you added to the particular
> document. You could also tell me how you set up your browsing
> classifiers for the collection, although that may not be necessary
> as it was the search that failed. Finally, if you could tell me
> how you set up your search, such as whether it was the default
> search box, or whether you changed preferences to advanced or
> fielded search, what index field(s) you searched on, perhaps even
> the search term(s).
>
> To explain it a different way: if you tell us how to recreate your
> collection here and how to attempt the same search, by following
> the same steps, we could try to recreate the context in which
> searching on your document failed and caused the server crash.
>
> We'd like very much to eliminate such bugs, therefore any help you
> can give us in this matter will help us make Greenstone more
> robust. Thanks,
> Anupama
>
> Ted Maust wrote:
>
> Hello,
>
> I think I identified the problem I was having earlier, and so
> I just wanted to test my diagnosis out with all of you.
>
> The problem stems from the fact that the Google Document I was
> trying to include in a collection has comments along the side.
> In exporting it, I created four versions, a zipped HTML (which
> is okay but lacks the comments), a PDF (which also cuts out
> the comments and has some formatting mistakes in just a few
> places with words going off the side etc.), and then a .doc
> and a .odt file. The .doc file causes wvWare.exe to close
> during build and so doesn't get included in the collection.
> The .odt makes it into the collection, but I think it can't be
> fully indexed. For this reason, it is fine for browsing
> functions but when I search, it crashes the server somehow.
> Ideas? Solutions? If possible I would like to retain the
> comments and have them be searchable as well.
>
>
> Thanks!
>
> Ted Maust