[greenstone-users] Large dynamic collection in Greenstone 2? A few questions....

From Sharyn Wise
Date Fri Dec 24 15:41:18 2010
Subject [greenstone-users] Large dynamic collection in Greenstone 2? A few questions....
In-Reply-To (FFC62F89E22F419C9EB4910E50E4B397-DSC46)
Thanks again to Diego for this helpful information. It's very much
appreciated.
Sharyn

On Thu, Dec 23, 2010 at 5:19 AM, Diego Spano <dspano@anac.gov.ar> wrote:

> Sharyn,
>
> With version 2.83, when you move from Windows to Linux you have to rebuild
> the collection, because the internal database stores "/" or "\" depending on
> where you built the index. The new version (2.84) will fix that
> problem, so you can copy from one to the other and the collection will be
> ready for new documents to be added.
>
> About the depositor: if you take a look at the macro file deposit.dm, it has
> "steps" you can define. Each step is associated with a script. I think that
> you can modify these scripts as you want.
>
> Read this:
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.8060&rep=rep1&type=pdf
>
> It is a document explaining depositor and how it works.
>
> Hope this helps.
>
> Diego Spano
>
> ------------------------------
> *From:* sharyn.wise@gmail.com [mailto:sharyn.wise@gmail.com] *On Behalf Of *Sharyn
> Wise
> *Sent:* Wednesday, 15 December 2010 17:30
> *To:* Cathy Chang
> *CC:* dspano@anac.gov.ar; greenstone-users@list.scms.waikato.ac.nz
> *Subject:* Re: [greenstone-users] Large dynamic collection in Greenstone 2?
> A few questions....
>
> Hi Diego and Cathy
>
> Diego: Thanks so very much for sharing your valuable experience. Would it
> be a viable option to build the DL on a windows server and port it to Linux
> if performance looks to be becoming an issue?
>
> Also can anyone advise if the depositor page can be modified to include
> client side field validation?
>
> Cathy: I had the same problem initially (using greenstone 2.83). You don't
> say which Greenstone or Windows versions you are using, but on the page
> "2.83 release notes" in the wiki, under "known issues and patches" there is
> an alternative library.cgi you can download to prevent this problem
> happening under Windows XP (here's the link, but it may be scrubbed:
> 2.83-windows-library.cgi<http://wiki.greenstone.org/wiki/gsdoc/patches/2.83-windows-library.cgi>).
> I'm actually running Windows 7, but this file fixed the problem anyway, so
> it may be worth a try. Back up first and you've nothing to lose!
>
> cheers
> Sharyn
>
> On Wed, Dec 15, 2010 at 12:59 PM, Cathy Chang <Cathy.Chang@wintec.ac.nz> wrote:
>
>> Hi, Diego!
>>
>>
>>
>> Your experience using Depositor is really helpful. I am working on
>> configuring Depositor on Windows. After enabling Depositor, I chose one
>> collection to deposit into, and the error message below appeared. Did I get
>> a configuration step wrong? I cannot figure it out. Any advice on
>> configuration? Thanks!
>>
>>
>>
>> *Internal Server Error*
>>
>> The server encountered an internal error or misconfiguration and was
>> unable to complete your request.
>>
>> Please contact the server administrator, admin@example.com and inform
>> them of the time the error occurred, and anything you might have done that
>> may have caused the error.
>>
>> More information about this error may be available in the server error
>> log.
>>
>> Cheers,
>>
>> Cathy
>>
>> *From:* greenstone-users-bounces@list.scms.waikato.ac.nz [mailto:
>> greenstone-users-bounces@list.scms.waikato.ac.nz] *On Behalf Of *Diego
>> Spano
>> *Sent:* Wednesday, 15 December 2010 9:21 a.m.
>> *To:* sharynwise@optushome.com.au;
>> greenstone-users@list.scms.waikato.ac.nz
>> *Subject:* RE: [greenstone-users] Large dynamic collection in Greenstone
>> 2? A few questions....
>>
>>
>>
>> Sharyn, I will tell you about my experience. GS is totally viable for that
>> project.
>>
>>
>>
>> "The project is ongoing, so it is a requirement that new documents/audio
>> files can be added dynamically at any time. I was thinking of using the
>> Depositor for that functionality and possibly scheduling full builds
>> nightly? "
>>
>>
>>
>> Sure. Depositor is a good tool to let users archive their files. You must
>> configure Depositor not to build the collection online, so that you can set
>> cron tasks to do the import/build at night.
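
A minimal sketch of the nightly import/build job described above. It assumes the Greenstone environment is already set up in the shell that cron starts, so that Perl can find import.pl and buildcol.pl on the PATH. The function names, the runner parameter and the collection name "mycol" are illustrative and not part of Greenstone. Options for incremental building differ between Greenstone releases, so they are passed in as arguments rather than hard-coded.

```python
import subprocess
from typing import Callable, List, Sequence


def build_commands(
    collection: str,
    extra_import_args: Sequence[str] = (),
    extra_build_args: Sequence[str] = (),
) -> List[List[str]]:
    """Return the import and build command lines for one collection.

    "perl -S" asks Perl to look the script up on the PATH.
    Any extra arguments are assumptions supplied by the caller.
    """
    return [
        ["perl", "-S", "import.pl", *extra_import_args, collection],
        ["perl", "-S", "buildcol.pl", *extra_build_args, collection],
    ]


def run_nightly(
    collection: str,
    runner: Callable[..., object] = subprocess.run,
    extra_import_args: Sequence[str] = (),
    extra_build_args: Sequence[str] = (),
) -> None:
    """Run import then build, stopping at the first failing command."""
    for cmd in build_commands(collection, extra_import_args, extra_build_args):
        # check=True makes subprocess.run raise if the command fails,
        # so a broken import is never followed by a build.
        runner(cmd, check=True)


# Example call from a script started by cron (collection name is hypothetical):
# run_nightly("mycol")
```

A crontab entry such as "0 2 * * * python3 /path/to/nightly_build.py" (path hypothetical) would run the job at 02:00 every night. In Greenstone 2 the freshly built indexes are written to the collection's building folder and still have to be moved into index before they go live.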
>>
>>
>>
>>
>>
>> "One problem that occurs to me is that even the minimal rebuild
>> triggered by the Depositor might become very time-consuming as the
>> collection grows (is there a way of calculating this?). Secondly, will
>> Greenstone scale to handle large numbers of largish audio files (ie approx.
>> an hour's recording per file, totalling possibly hundreds of hours) as well
>> as hundreds (possibly thousands) of documents? Finally can the depositor
>> interface be modified to allow javascript validation of input metadata and
>> if so how?"
>>
>>
>>
>> You must use Lucene as the indexing engine. Lucene allows incremental
>> building, so only new documents are added to an existing index. About
>> large audio files: they are not a problem. Greenstone does not extract any
>> text from audio files, so the time it takes to import an audio file is the
>> time the operating system takes to copy that file from the import folder to
>> the archive folder. These files are simply managed as file system objects;
>> there is no conversion process. I think that it takes more time to process a
>> PDF (it is converted to HTML to extract the text) than an audio file. I have
>> a collection with more than 13,000 PDFs, and every night a few more are added
>> without problems. I also have another collection with documents composed of
>> TIFF images and text files from OCR. That collection now has more than
>> 700,000 images with a full-text index.
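
To illustrate the idea behind incremental building (only documents that arrived since the last build need processing), here is a small sketch that lists the files in an import folder modified after a given time. This is only an illustration of the principle and is not how Greenstone or Lucene track changes internally. The function name and the use of file modification times are assumptions made for the example.

```python
from pathlib import Path
from typing import List


def files_newer_than(import_dir: str, last_build_time: float) -> List[Path]:
    """Return files under import_dir modified after last_build_time.

    last_build_time is a POSIX timestamp (seconds since the epoch),
    for example the modification time of a marker file written at the
    end of the previous build.
    """
    root = Path(import_dir)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime > last_build_time
    )
```

The length of the returned list gives a rough idea of how much work a nightly run has to do, which stays small as long as only a few documents are deposited each day.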
>>
>>
>>
>> Some recommendations:
>>
>>
>>
>> - Use Linux, not Windows. File system management is better.
>>
>> - You can put the import, archive and index folders on different disks for
>> better performance.
>>
>> - You can also modify some of the GS processes to avoid copying the files
>> from /archives to /index/assoc. You can create a link from index that points
>> to /archives. With this modification you keep only one copy of the
>> original file, and you also reduce processing time because you don't
>> have to copy those big files again.
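
The last recommendation can be sketched as follows: replace the copy of a large file under index/assoc with a symbolic link to the single copy kept under archives. This is a standalone illustration and not the actual modification to the Greenstone build scripts. The function name and the example paths are hypothetical, and it assumes a Linux file system where symbolic links can be created freely.

```python
import os
from pathlib import Path


def link_to_archive(archive_file: str, assoc_file: str) -> None:
    """Make assoc_file a symlink pointing at archive_file.

    Any existing copy (or stale link) at assoc_file is removed first,
    so only one real copy of the data remains on disk.
    """
    target = Path(archive_file).resolve()
    link = Path(assoc_file)
    if not target.is_file():
        raise FileNotFoundError(f"archive file not found: {target}")
    if link.is_symlink() or link.exists():
        link.unlink()
    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(target, link)


# Example (paths are hypothetical):
# link_to_archive("archives/HASH0123.dir/talk.mp3",
#                 "index/assoc/HASH0123.dir/talk.mp3")
```

The web server must be allowed to follow symbolic links for files served this way to remain reachable.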
>>
>>
>>
>> Hope this helps.
>>
>>
>>
>> Diego
>>
>>
>>
>> Diego Spano
>> Prodigio Consultores
>> Capital Federal - Argentina
>> Tel: (54 11) 5093-5313
>> http://ar.linkedin.com/in/diegospano
>> www.prodigioconsultores.com
>>
>> ------------------------------
>>
>> *From:* greenstone-users-bounces@list.scms.waikato.ac.nz [mailto:
>> greenstone-users-bounces@list.scms.waikato.ac.nz]
>> *On Behalf Of *Sharyn Wise
>> *Sent:* Tuesday, 14 December 2010 15:30
>> *To:* greenstone-users@list.scms.waikato.ac.nz
>> *Subject:* [greenstone-users] Large dynamic collection in Greenstone 2? A
>> few questions....
>>
>> Hi all
>>
>> I'm looking at whether Greenstone 2 is a viable and scalable solution for
>> building a large dynamic collection. The collection will be composed of
>> data from a three-year research project, primarily documents (PDF, DOC,
>> email text) and audio files.
>>
>> The project is ongoing, so it is a requirement that new documents/audio
>> files can be added dynamically at any time. I was thinking of using the
>> Depositor for that functionality and possibly scheduling full builds
>> nightly?
>>
>> One problem that occurs to me is that even the minimal rebuild
>> triggered by the Depositor might become very time-consuming as the
>> collection grows (is there a way of calculating this?). Secondly, will
>> Greenstone scale to handle large numbers of largish audio files (ie approx.
>> an hour's recording per file, totalling possibly hundreds of hours) as well
>> as hundreds (possibly thousands) of documents? Finally can the depositor
>> interface be modified to allow javascript validation of input metadata and
>> if so how?
>>
>> I'd be very interested to hear the community's and Greenstone team's
>> thoughts on these questions, any other potential problems you foresee, and
>> any recommendations. Thanks in advance!
>> cheers
>> Sharyn
>>
>> _______________________________________________
>> greenstone-users mailing list
>> greenstone-users@list.scms.waikato.ac.nz
>> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
>>
>>
>