Re: [greenstone-users] Greenstone 2.5: Images are not seen after importing from MS Word files

From Katherine Don
Date Tue, 29 Jun 2004 10:45:53 +1200
Subject Re: [greenstone-users] Greenstone 2.5: Images are not seen after importing from MS Word files
In-Reply-To (BAY15-F7ft9RQBUenEC000031a2-hotmail-com)
Hi Raitis

I have had a look at your test files.
There are two different problems.
Firstly, on Linux/Win XP the documents with embedded images work fine.
The HTML files are generated in the tmp directory and the images are
copied there too.
However, the document with a link to an image doesn't work, because the
conversion script doesn't copy over the image: the link stays like
imagesb.jpg, and there is no imagesb.jpg in the tmp directory.
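Until the conversion script handles this, one possible workaround is to copy the linked files next to the generated HTML yourself. The following is only a minimal Python sketch of that idea and is not part of Greenstone; the names LinkCollector and copy_linked_files are hypothetical, and it assumes the links are relative to the directory that holds the Word files.

```python
import shutil
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def copy_linked_files(html_file, source_dir, dest_dir):
    """Copy files referenced by relative links from source_dir into dest_dir.

    Returns the list of relative names that were copied.
    """
    collector = LinkCollector()
    collector.feed(Path(html_file).read_text(encoding="utf-8", errors="replace"))

    copied = []
    for link in collector.links:
        parsed = urlparse(link)
        # Skip web URLs, mailto: links and in-page anchors.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        relative = Path(unquote(parsed.path))
        # Only handle plain relative links that stay inside source_dir.
        if relative.is_absolute() or ".." in relative.parts:
            continue
        source = Path(source_dir) / relative
        target = Path(dest_dir) / relative
        if source.is_file() and not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            copied.append(str(relative))
    return copied
```

For a document such as D2.doc, you would call it with the generated HTML file, the import directory that contains the linked JPG files, and the tmp directory.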

The second problem is that there seems to be a bug when you run under Win98:
here you get doubled paths like
C:\Program Files\gsdl\collect\docfiles\tmp\C:\Program
Files\gsdl\collect\docfiles\tmp\D10.jpg
Newer versions of Windows don't seem to have this problem.
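I have not tracked down the cause, so the following is an illustration rather than a diagnosis. A doubled path of this shape is what you get when a directory is prepended to a file reference that is already absolute. This Python sketch shows the pattern and a guard against it; the function names and the example paths are hypothetical, and ntpath is used so that Windows path rules apply on any platform.

```python
import ntpath  # Windows path rules, usable on any platform


def naive_join(tmp_dir, image_ref):
    """Prepend tmp_dir unconditionally (the pattern that doubles a path)."""
    return tmp_dir + "\\" + image_ref


def safe_join(tmp_dir, image_ref):
    """Prepend tmp_dir only when image_ref is not already absolute."""
    if ntpath.isabs(image_ref):
        return ntpath.normpath(image_ref)
    return ntpath.normpath(ntpath.join(tmp_dir, image_ref))


if __name__ == "__main__":
    tmp_dir = r"C:\gsdl\tmp"                # hypothetical directory
    absolute_ref = r"C:\gsdl\tmp\D10.jpg"   # reference that is already absolute
    print(naive_join(tmp_dir, absolute_ref))
    print(safe_join(tmp_dir, absolute_ref))
    print(safe_join(tmp_dir, "D10.jpg"))
```

With the naive version the absolute reference ends up with the directory in front of it a second time, while the guarded version returns the same file path for both the absolute and the relative reference.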

I don't have time to look into this now - it will have to go onto our
TODO list.
If possible, I suggest that you use documents with embedded images
rather than links, and use a newer version of Windows.

Sorry I couldn't be of more help
Regards,
Katherine Don

Raitis Brodezhonok wrote:
> Hello!
>
> I have tried to create a test collection from Word .doc files with pictures
> inside.
> Some documents contain hyperlinks to JPG files, and others contain pictures
> inserted into the Word document from the JPG files.
> All source files are located in the same directory "Bildes".
> I use Win98 and Word 97.
>
> Then, using GLI, I created the "docfiles" collection:
> 1) Dragged and dropped the directory "Berni" from the left pane to the right.
> 2) Did a simple Enrich.
> 3) Changed nothing in Design.
> 4) In the Create tab, just selected verbosity=3.
>
> Result:
> The HTML files are created, but the images are not shown, because they
> do not exist in the docfiles/index/assoc
> directory.
>
> What is wrong? See my marked comments (**Comment) in the log file below.
>
> Best regards,
> Raitis.
> ===================================
>
> s
> Command: C:\PROGRA~1\GSDL\BIN\WINDOWS\PERL\BIN\Perl.exe -S C:\Program
> Files\gsdl\bin\script\import.pl -gli -language en -importdir C:\Program
> Files\gsdl\collect\docfiles\import docfiles -removeold -verbosity 3
> import.pl> Removing current contents of the archives directory...
> import.pl> RecPlug: getting directory C:\Program
> Files\gsdl\collect\docfiles\import
> import.pl> RecPlug: preparing metadata for Bildes
> import.pl> RecPlug recurring: Bildes
> import.pl> RecPlug: getting directory C:\Program
> Files\gsdl\collect\docfiles\import\Bildes
> import.pl> RecPlug: found metadata in C:\Program
> Files\gsdl\collect\docfiles\import\Bildes\metadata.xml
> import.pl> RecPlug: preparing metadata for a1.jpg
> import.pl> File "a1.jpg" matches filespec ".*"
> import.pl> RecPlug recurring: a1.jpg
> import.pl> RecPlug: preparing metadata for a2.jpg
> import.pl> File "a2.jpg" matches filespec ".*"
> import.pl> RecPlug recurring: a2.jpg
> import.pl> RecPlug: preparing metadata for a3.jpg
> import.pl> File "a3.jpg" matches filespec ".*"
> import.pl> RecPlug recurring: a3.jpg
> import.pl> RecPlug: preparing metadata for a4.jpg
> import.pl> File "a4.jpg" matches filespec ".*"
> import.pl> RecPlug recurring: a4.jpg
> import.pl> RecPlug: preparing metadata for D1.doc
> import.pl> File "D1.doc" matches filespec ".*"
> import.pl> File "D1.doc" matches filespec "D1.doc"
> import.pl> RecPlug recurring: D1.doc
> import.pl> Converting D1.doc to HTML format
> import.pl> I won't mmap that file, using a slower method
> import.pl> Diagnostic: (./wvWare.c:1276) picture 0x01 here, at offset 0
> in Data Stream, obj is 0, ole is 0
> import.pl>
> import.pl> WordPlug: WARNING: C:\Program
> Files\gsdl\collect\docfiles\tmp\D1.html was read using utf8 encoding but
> appears to be encoded as iso_8859_1.
> import.pl> WordPlug: passing Bildes\D1.doc on to HTMLPlug
> import.pl> HTMLPlug: processing D1.html
> import.pl> extracted "Title" metadata "Test fails D1"
> import.pl> extracted "GENERATOR" metadata "wvWare/wvWare version 0.7.1"
> import.pl> docsave::process couldn't copy the associated file C:\Program
> Files\gsdl\collect\docfiles\tmp\C:\Program
> Files\gsdl\collect\docfiles\tmp\D10.jpg to 0.jpg
> **Comment **********************************
> 1) Why is the path to the associated file a double path?
> ************************************
> import.pl> RecPlug: preparing metadata for D2.doc
> import.pl> File "D2.doc" matches filespec ".*"
> import.pl> File "D2.doc" matches filespec "D2.doc"
> import.pl> RecPlug recurring: D2.doc
> import.pl> Converting D2.doc to HTML format
> import.pl> I won't mmap that file, using a slower method
> import.pl> Diagnostic: (./wvWare.c:1225) field began
> import.pl>
> import.pl> Diagnostic: (./field.c:340) command HYPERLINK "a2.jpg" ,
> ret is 0
> import.pl>
> import.pl> WordPlug: WARNING: C:\Program
> Files\gsdl\collect\docfiles\tmp\D2.html was read using utf8 encoding but
> appears to be encoded as iso_8859_1.
> import.pl> WordPlug: passing Bildes\D2.doc on to HTMLPlug
> import.pl> HTMLPlug: processing D2.html
> import.pl> extracted "Title" metadata "D2 testa fails"
> import.pl> extracted "GENERATOR" metadata "wvWare/wvWare version 0.7.1"
> import.pl> docsave::process couldn't copy the associated file C:\Program
> Files\gsdl\collect\docfiles\tmp\a2.jpg to 0.jpg
> **Comment *****************************************
> Here the hyperlink to the jpg file is prepared.
> The Word files are in the same directory as the jpg files:
> "C:\Program Files\gsdl\collect\testcoll\import\Bildes"
>
> The file a2.jpg is not generated in the tmp directory. (?)
> ******************************************
> import.pl> RecPlug: preparing metadata for D3.doc
> import.pl> File "D3.doc" matches filespec ".*"
> import.pl> File "D3.doc" matches filespec "D3.doc"
> import.pl> RecPlug recurring: D3.doc
> import.pl> Converting D3.doc to HTML format
> import.pl> I won't mmap that file, using a slower method
> import.pl> Diagnostic: (./wvWare.c:1276) picture 0x01 here, at offset 0
> in Data Stream, obj is 0, ole is 0
> import.pl>
> import.pl> WordPlug: WARNING: C:\Program
> Files\gsdl\collect\docfiles\tmp\D3.html was read using utf8 encoding but
> appears to be encoded as iso_8859_2.
> import.pl> WordPlug: passing Bildes\D3.doc on to HTMLPlug
> import.pl> HTMLPlug: processing D3.html
> import.pl> extracted "Title" metadata "D3 testa fails"
> import.pl> extracted "GENERATOR" metadata "wvWare/wvWare version 0.7.1"
> import.pl> docsave::process couldn't copy the associated file C:\Program
> Files\gsdl\collect\docfiles\tmp\C:\Program
> Files\gsdl\collect\docfiles\tmp\D30.jpg to 0.jpg
> import.pl> RecPlug: preparing metadata for D4.doc
> import.pl> File "D4.doc" matches filespec ".*"
> import.pl> File "D4.doc" matches filespec "D4.doc"
> import.pl> RecPlug recurring: D4.doc
> import.pl> Converting D4.doc to HTML format
> import.pl> I won't mmap that file, using a slower method
> import.pl> WordPlug: WARNING: C:\Program
> Files\gsdl\collect\docfiles\tmp\D4.html was read using utf8 encoding but
> appears to be encoded as iso_8859_2.
> import.pl> WordPlug: passing Bildes\D4.doc on to HTMLPlug
> import.pl> HTMLPlug: processing D4.html
> import.pl> extracted "Title" metadata "D4 testa fails"
> import.pl> extracted "GENERATOR" metadata "wvWare/wvWare version 0.7.1"
> import.pl> *********************************************
> import.pl> Import complete
> import.pl> *********************************************
> import.pl> * 4 documents were considered for processing
> import.pl> * 4 were processed and included in the collection
> import.pl> Command complete.
> import.pl> Extracting new metadata from archive files.
> import.pl> Extracted 9 pieces of metadata for HASH013a.dir.
> import.pl> Extracted 9 pieces of metadata for HASH52e6.dir.
> import.pl> Extracted 9 pieces of metadata for HASH01f4.dir.
> import.pl> Extracted 9 pieces of metadata for HASH0146.dir.
> import.pl> Archived metadata extraction complete.
> Command: C:\PROGRA~1\GSDL\BIN\WINDOWS\PERL\BIN\Perl.exe -S C:\Program
> Files\gsdl\bin\script\buildcol.pl -gli -language en docfiles -verbosity 3
> buildcol.pl> *** creating the compressed text
> buildcol.pl> collecting text statistics
> buildcol.pl> ArcPlug: processing C:\Program
> Files\gsdl\collect\docfiles\archives\archives.inf
> buildcol.pl> GAPLug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH52e6.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH01f4.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH0146.dir\doc.xml
> buildcol.pl> Stats (Compressing text from section:text)
> buildcol.pl> Total bytes in collection: 5984
> buildcol.pl> Total bytes in section:text: 5984
> buildcol.pl> creating the compression dictionary
> buildcol.pl> compressing the text
> buildcol.pl> ArcPlug: processing C:\Program
> Files\gsdl\collect\docfiles\archives\archives.inf
> buildcol.pl> GAPLug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH52e6.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH01f4.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH0146.dir\doc.xml
> buildcol.pl> Stats (Compressing text from section:text)
> buildcol.pl> Total bytes in collection: 5984
> buildcol.pl> Total bytes in section:text: 5984
> buildcol.pl> *** building index document:text in subdirectory dtx
> buildcol.pl> creating index dictionary
> buildcol.pl> ArcPlug: processing C:\Program
> Files\gsdl\collect\docfiles\archives\archives.inf
> buildcol.pl> GAPLug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH52e6.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH01f4.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH0146.dir\doc.xml
> buildcol.pl> Stats (Creating index document:text)
> buildcol.pl> Total bytes in collection: 5984
> buildcol.pl> Total bytes in document:text: 5984
> buildcol.pl> inverting the text
> buildcol.pl> ArcPlug: processing C:\Program
> Files\gsdl\collect\docfiles\archives\archives.inf
> buildcol.pl> GAPLug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH52e6.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH01f4.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH0146.dir\doc.xml
> buildcol.pl> Stats (Creating index document:text)
> buildcol.pl> Total bytes in collection: 5984
> buildcol.pl> Total bytes in document:text: 5984
> buildcol.pl> ivf.pass2 : M
> buildcol.pl> create the weights file
> buildcol.pl> .
> buildcol.pl> L = 2.954812
> buildcol.pl> U = 5.046188
> buildcol.pl> B = 1.002093
> buildcol.pl> creating 'on-disk' stemmed dictionary
> buildcol.pl> creating stem indexes
> buildcol.pl> deleting docfiles.trc
> buildcol.pl> deleting docfiles.ic
> buildcol.pl> deleting docfiles.id
> buildcol.pl> deleting docfiles.ict
> buildcol.pl> deleting docfiles.idh
> buildcol.pl> deleting docfiles.ii
> buildcol.pl> deleting docfiles.invf.state.-495985
> buildcol.pl> deleting docfiles.chunk.state.-495985
> buildcol.pl> deleting docfiles.chunks.-495985
> buildcol.pl> deleting docfiles.w
> buildcol.pl> deleting docfiles.tmp
> buildcol.pl> *** building index document:Title in subdirectory dtt
> buildcol.pl> creating index dictionary
> buildcol.pl> ArcPlug: processing C:\Program
> Files\gsdl\collect\docfiles\archives\archives.inf
> buildcol.pl> GAPLug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH52e6.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH01f4.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH0146.dir\doc.xml
> buildcol.pl> Stats (Creating index document:Title)
> buildcol.pl> Total bytes in collection: 5984
> buildcol.pl> Total bytes in document:Title: 55
> buildcol.pl> inverting the text
> buildcol.pl> ArcPlug: processing C:\Program
> Files\gsdl\collect\docfiles\archives\archives.inf
> buildcol.pl> GAPLug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH52e6.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH01f4.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH0146.dir\doc.xml
> buildcol.pl> Stats (Creating index document:Title)
> buildcol.pl> Total bytes in collection: 5984
> buildcol.pl> Total bytes in document:Title: 55
> buildcol.pl> ivf.pass2 : M
> buildcol.pl> create the weights file
> buildcol.pl> .
> buildcol.pl> L = 1.415829
> buildcol.pl> U = 1.960516
> buildcol.pl> B = 1.001272
> buildcol.pl> creating 'on-disk' stemmed dictionary
> buildcol.pl> creating stem indexes
> buildcol.pl> deleting docfiles.trc
> buildcol.pl> deleting docfiles.ic
> buildcol.pl> deleting docfiles.id
> buildcol.pl> deleting docfiles.ict
> buildcol.pl> deleting docfiles.idh
> buildcol.pl> deleting docfiles.ii
> buildcol.pl> deleting docfiles.invf.state.-519057
> buildcol.pl> deleting docfiles.chunk.state.-519057
> buildcol.pl> deleting docfiles.chunks.-519057
> buildcol.pl> deleting docfiles.w
> buildcol.pl> deleting docfiles.tmp
> buildcol.pl> *** building index document:Source in subdirectory dsr
> buildcol.pl> creating index dictionary
> buildcol.pl> ArcPlug: processing C:\Program
> Files\gsdl\collect\docfiles\archives\archives.inf
> buildcol.pl> GAPLug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH52e6.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH01f4.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH0146.dir\doc.xml
> buildcol.pl> Stats (Creating index document:Source)
> buildcol.pl> Total bytes in collection: 5984
> buildcol.pl> Total bytes in document:Source: 24
> buildcol.pl> ***************
> buildcol.pl> WARNING: There is very little or no text to process for
> document:Source
> buildcol.pl> Was this your intention?
> buildcol.pl> ***************
> buildcol.pl> inverting the text
> buildcol.pl> ArcPlug: processing C:\Program
> Files\gsdl\collect\docfiles\archives\archives.inf
> buildcol.pl> GAPLug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH52e6.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH01f4.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH0146.dir\doc.xml
> buildcol.pl> Stats (Creating index document:Source)
> buildcol.pl> Total bytes in collection: 5984
> buildcol.pl> Total bytes in document:Source: 24
> buildcol.pl> ***************
> buildcol.pl> WARNING: There is very little or no text to process for
> document:Source
> buildcol.pl> Was this your intention?
> buildcol.pl> ***************
> buildcol.pl> ivf.pass2 : M
> buildcol.pl> create the weights file
> buildcol.pl> .
> buildcol.pl> L = 1.386294
> buildcol.pl> U = 1.386294
> buildcol.pl> B = 1.000000
> buildcol.pl> creating 'on-disk' stemmed dictionary
> buildcol.pl> creating stem indexes
> buildcol.pl> deleting docfiles.trc
> buildcol.pl> deleting docfiles.ic
> buildcol.pl> deleting docfiles.id
> buildcol.pl> deleting docfiles.ict
> buildcol.pl> deleting docfiles.idh
> buildcol.pl> deleting docfiles.ii
> buildcol.pl> deleting docfiles.invf.state.-518917
> buildcol.pl> deleting docfiles.chunk.state.-518917
> buildcol.pl> deleting docfiles.chunks.-518917
> buildcol.pl> deleting docfiles.w
> buildcol.pl> deleting docfiles.tmp
> buildcol.pl> *** creating the info database and processing associated files
> buildcol.pl> ArcPlug: processing C:\Program
> Files\gsdl\collect\docfiles\archives\archives.inf
> buildcol.pl> GAPLug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH52e6.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH01f4.dir\doc.xml
> buildcol.pl> GAPLug: processing HASH0146.dir\doc.xml
> buildcol.pl> *** creating auxiliary files
> buildcol.pl> Command complete.
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> greenstone-users mailing list
> greenstone-users@list.scms.waikato.ac.nz
> https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users