[greenstone-users] Using Greenstone as a webography : some problems with php files

From ak19@cs.waikato.ac.nz
Date Wed Mar 26 14:07:55 2008
Subject [greenstone-users] Using Greenstone as a webography : some problems with php files
In-Reply-To (996a19c0803210819s699ffce7hb0f7f0f5f0d96c7b-mail-gmail-com)
Hi again,

The solution I described in my previous response does not work after all;
it causes GLI to report an error instead. However, here is a partial
solution that I have tried out and that appears to work on a php file that
I manually saved under the filename "search.php@query=php-gtk-pcntl":

1. Set GLI to run in Expert mode:
File > Preferences > Mode tab, then tick Expert.

2. In GLI's Design tab, select HTMLPlug and press the Configure Plugin
button.

3. Set the process_exp field to:
(?i)(\.html?|\.shtml|\.shm|\.asp|\.php\d?|\.cgi|.+\?.+=.*|.+@.+=.*)$

4. Now click on the Gather tab of GLI and drag and drop your php@article
files. The change made in step 3 should mean that GLI no longer warns you
that "None of Greenstone's plugins are expected to process the file
'search.php@query=php-gtk-pcntl'. Check that the file has the correct
extension. If it is correct then you may have to use UnknownPlug to
process this file."

5. I built my collection containing the file
search.php@query=php-gtk-pcntl and it is indexed and browseable for me. I
have attached this file so that you can try to build a test collection
with it and see whether it works for you as well.

I included a regular html file in my trial collection alongside the file
named search.php@query=php-gtk-pcntl and they both work, so we know the
change has not broken the usual behaviour for html files at least.
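
To see which filenames the expression from step 3 accepts, it can be tried
outside Greenstone. The following is only a rough sketch in Python, not
Greenstone code. It assumes that the dots and the question mark in the
expression are meant as escaped literals (\. and \?), that Python's re
module treats this pattern the same way Perl does, and that the baseline is
simply the same expression without the last alternative. The name
"spip.php@article12" is a made-up example of a mirrored Spip page.

```python
import re

# Baseline (assumption): the step 3 expression without the "@" alternative.
WITHOUT_AT = r"(?i)(\.html?|\.shtml|\.shm|\.asp|\.php\d?|\.cgi|.+\?.+=.*)$"

# Expression from step 3, which adds the ".+@.+=.*" alternative.
WITH_AT = r"(?i)(\.html?|\.shtml|\.shm|\.asp|\.php\d?|\.cgi|.+\?.+=.*|.+@.+=.*)$"


def accepted(pattern, filename):
    """Return True if the pattern matches somewhere up to the end of filename."""
    return re.search(pattern, filename) is not None


if __name__ == "__main__":
    names = [
        "index.html",                      # ordinary HTML file
        "search.php@query=php-gtk-pcntl",  # the test file from this message
        "spip.php@article12",              # hypothetical Spip name, no "="
    ]
    for name in names:
        print(name, accepted(WITHOUT_AT, name), accepted(WITH_AT, name))
```

In this sketch the test file is accepted only once the "@" alternative is
added, and the ordinary html file is accepted either way. Note that the new
alternative requires an "=" after the "@", so a name without one, such as
the made-up "spip.php@article12", is not accepted by either expression.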

However, I do not yet know whether the steps described above will work
with the mirrored web pages you have. Could you try them out on your
collection of php@article files?

Regards,
Anupama


> Hello everyone,
>
> I discovered Greenstone a couple of months ago, and I decided to use it
> in an information project about the French students' strike last November.
> Our plan was to use Greenstone as a tool that links documents to their
> real sources (for legal reasons). So the value "file_as_url" is
> indispensable.
>
> To build indexes and organize our documents logically, I used the
> mirroring module. Most documents cause no problems; the problems
> arise with the PHP files I gathered.
> Indeed, some of the mirrored websites use a CMS such as Spip:
>
> 1) The URL has been modified. Normally it would be spip.php?article... but
> it became spip.php@article...
> 2) Greenstone can't identify the file type because of the text after the
> extension, so the UnknownPlug plugin indexes the document.
>
> Is there any solution to configure Greenstone to index correctly that kind
> of file, in order to get all my it