Re: [greenstone-users] Importing CDS/ISIS db failure...arcinfo::save_info couldn't write

From: Michael Dewsnip
Date: Wed, 01 Mar 2006 15:57:04 +1300
In-Reply-To: (43F5538B-3010508-cs-waikato-ac-nz)

Hi Ruben,

OK, we think we've fixed this. You can download a new version of
RecPlug.pm from
http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/temp-2.63/RecPlug.pm
(overwrite the existing RecPlug.pm in Greenstone's "perllib/plugins"
directory), then rebuild the collection.

Regards,

Michael

Katherine Don wrote:

>Hi Ruben
>
>This appears to be a bug in Greenstone. The metadata.xml files end up
>encoded in UTF-8, but by the time the metadata names reach the archive
>doc.xml files, they are no longer in UTF-8, hence the XML parse error.
>
>We'll try and have a look at this next week, but Michael and I may both
>be away, so it may be the week after.
>
>Cheers,
>Katherine
>
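The failure Katherine describes can be reproduced outside Greenstone. The sketch below uses Python's bundled Expat binding (Perl's XML::Parser is also built on Expat) on a hypothetical, heavily simplified doc.xml fragment: the file declares UTF-8, but a metadata name containing "à" is written as raw code page 850 bytes. The field name `Località` and the `<Metadata name=...>` layout are illustrative assumptions, not Greenstone's exact archive format.

```python
import xml.parsers.expat


def check_xml(data: bytes) -> str:
    """Parse `data` with Expat; return "ok" or Expat's error message."""
    parser = xml.parsers.expat.ParserCreate()  # honours the encoding declaration
    try:
        parser.Parse(data, True)  # True: this is the final (only) chunk
    except xml.parsers.expat.ExpatError as err:
        return xml.parsers.expat.ErrorString(err.code)
    return "ok"


def make_doc(name_bytes: bytes) -> bytes:
    # Hypothetical doc.xml fragment: it promises UTF-8, and the metadata
    # *name* is spliced in as raw bytes, whatever their real encoding.
    return (b'<?xml version="1.0" encoding="UTF-8"?>\n'
            b'<Metadata name="' + name_bytes + b'">x</Metadata>')


name = "Localit\u00e0"  # hypothetical Italian field name ending in "à"

# Bug: the name reaches doc.xml still encoded as DOS code page 850.
print(check_xml(make_doc(name.encode("cp850"))))
# Fix: decode from the source encoding first, then write UTF-8.
print(check_xml(make_doc(name.encode("utf-8"))))
```

With the CP850 bytes, Expat should reject the document with the same "not well-formed (invalid token)" message seen in the buildcol.pl log; once the name is re-encoded as UTF-8, the fragment parses.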
>ruben pandolfi wrote:
>
>
>>Thank you Michael, thank you guys!
>>
>>Now I can import the DB correctly :-), and setting dos-850 does show
>>the correct charset, great.
>>
>>I have imported it correctly, and now I want to explode the .mst so
>>that I can use gsdl to add/edit metadata and, where available,
>>associate full-text documents with the relevant record.
>>
>>Unfortunately I get the same encoding error; perhaps there is another
>>fix for this?
>>
>>import.pl> NULPlug processing "/var/www/gsdl/collect/babel/import/EM20/0019.nul"
>>import.pl> NULPlug processing "/var/www/gsdl/collect/babel/import/EM20/0020.nul"
>>import.pl> *********************************************
>>import.pl> Import complete
>>import.pl> *********************************************
>>import.pl> * 20 documents were considered for processing
>>import.pl> * 20 were processed and included in the collection
>>import.pl> Command complete.
>>import.pl> Extracting new metadata from archive files.
>>import.pl> Archived metadata extraction complete.
>>Command: /var/www/gsdl/bin/script/buildcol.pl -gli -language en -collectdir /var/www/gsdl/collect/ -removeold babel
>>buildcol.pl> *** creating the compressed text
>>buildcol.pl> collecting text statistics
>>buildcol.pl> ArcPlug: processing /var/www/gsdl/collect/babel/archives/archives.inf
>>buildcol.pl> GAPlug: processing HASHedda.dir/doc.xml
>>buildcol.pl> **** Error is:
>>buildcol.pl> not well-formed (invalid token) at line 12, column 23, byte 509 at /usr/lib/perl5/XML/Parser.pm line 187
>>buildcol.pl> WARNING: No plugin could process HASHedda.dir/doc.xml
>>buildcol.pl> GAPlug: processing HASH01c2.dir/doc.xml
>>buildcol.pl> **** Error is:
>>
>>and finally:
>>
>>buildcol.pl> WARNING: No plugin could process HASH73fe.dir/doc.xml
>>buildcol.pl> *** creating auxiliary files
>>buildcol.pl> arcinfo::save_info couldn't write /var/www/gsdl/collect/babel/archives/HASH73fe.dir/doc.xml/archives.inf
>>buildcol.pl> Command failed.
>>
>>Thank you again,
>>
>>Ruben
>>
>>Michael Dewsnip wrote:
>>
>>>Hi Ruben,
>>>
>>>It turns out your problem is caused by a bug in ISISPlug -- obviously
>>>you're the first person to try it on a database with non-ASCII
>>>characters in the field names! (The .fdt file wasn't being read using
>>>the encoding provided.)
>>>
>>>I've fixed this; you can download a new version of ISISPlug.pm from
>>>http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/temp-2.63/ISISPlug.pm
>>>(this should overwrite your existing ISISPlug.pm file in Greenstone's
>>>"perllib/plugins" directory).
>>>
>>>Regards,
>>>
>>>Michael
>>>
>>>PS Your database seems to be a bit inconsistent: it contains data for
>>>tags that are not defined in the .fdt file. For example, the .mst file
>>>seems to have two Date tags: 45 and 50, but only 50 is defined in the
>>>.fdt file.
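The consistency check in the PS can be automated once the tag definitions and record contents have been extracted. This Python sketch assumes the .fdt has already been parsed into a set of tag numbers and each .mst record into a dict mapping tag number to value; reading the ISIS file formats themselves is out of scope, and the sample data is invented to mirror the Date example (tags 45 and 50, only 50 defined).

```python
from collections import Counter


def undefined_tags(defined, records):
    """Count how often each tag NOT declared in the .fdt appears in the records.

    defined -- set of tag numbers declared in the .fdt (assumed already parsed)
    records -- list of {tag: value} dicts, one per .mst record (assumed already parsed)
    """
    counts = Counter()
    for record in records:
        for tag in record:
            if tag not in defined:
                counts[tag] += 1
    return counts


# Invented sample: two date tags appear in the data, but only 50 is defined.
fdt_tags = {50}
mst_records = [{45: "1987", 50: "1987"}, {50: "1990"}, {45: "1992"}]
print(dict(undefined_tags(fdt_tags, mst_records)))
```

Counting occurrences, rather than just listing the tags, gives a rough idea of how much data would be silently ignored by a plugin that only trusts the .fdt.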
>>>
>>>ruben pandolfi wrote:
>>>
>>>>Hi,
>>>>
>>>>John R. McPherson wrote:
>>>>
>>>>>Normally, a "not well-formed" error in the XML Parser means that a
>>>>>source file has badly encoded data, and the plugin has not detected
>>>>>this and has made a non-UTF-8 archive .xml file. It might also mean
>>>>>that the plugin has used or passed in an invalid XML tag.
>>>>
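If a "not well-formed" error does come from a non-UTF-8 archive .xml file, a quick check is to find the first byte sequence that is not valid UTF-8. The following Python sketch is a diagnostic aid under that assumption, not part of Greenstone, and the script name in the usage comment is hypothetical. It reports a 1-based line, a 0-based byte column and a 0-based byte offset, which should help locate the spot XML::Parser complains about, although its column counting may not match Expat's exactly.

```python
import sys


def first_invalid_utf8(data: bytes):
    """Locate the first byte sequence in `data` that is not valid UTF-8.

    Returns (line, column, offset), or None if `data` decodes cleanly:
    line is 1-based, column is the 0-based byte position within that line,
    offset is the 0-based byte position within the whole buffer.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        offset = err.start
        before = data[:offset]
        line = before.count(b"\n") + 1
        column = offset - (before.rfind(b"\n") + 1)
        return line, column, offset
    return None


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 find_bad_utf8.py doc.xml ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, first_invalid_utf8(fh.read()))
```

Pointing it at one of the failing archive files, such as HASHedda.dir/doc.xml, should show whether the offending bytes sit in a metadata name, as Katherine suspected.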
>>>>Yes, I can see there is an encoding problem.
>>>>
>>>>Anyway, I have set GAPlug, ArcPlug, RecPlug and ISISPlug to dos 850
>>>>(I'm 50% sure this is the correct code, although I thought it was
>>>>called IBM 850).
>>>>
>>>>It contains Italian, French and Portuguese characters.
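On the naming question: "DOS 850" and "IBM 850" refer to the same code page (IBM code page 850, the DOS Latin-1 page), and Python, for example, accepts `ibm850` and `850` as aliases of its `cp850` codec. The sketch below uses Python purely as an illustration, not anything Greenstone runs, and the sample string is invented. It also shows why picking the right code page matters: decoding the bytes with the wrong one (here Latin-1) raises no error, it silently produces the wrong characters.

```python
import codecs

# Both spellings resolve to the same codec.
print(codecs.lookup("ibm850").name, codecs.lookup("850").name)

# Invented sample of letters typical of Italian, French and Portuguese records.
sample = "àèìòù éêëç ãõâô"
raw = sample.encode("cp850")  # how a DOS-era database would store them

print(raw.decode("cp850") == sample)  # right code page: round-trips
print(raw.decode("latin-1"))          # wrong code page: no error, but garbled text
```

Because the wrong choice fails silently, checking a few accented field names or values by eye after import remains worthwhile.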
>>>>
>>>>>Most of the plugins are careful enough to convert any wrongly encoded
>>>>>metadata/text into the correct encoding, so perhaps the ISIS plugin
>>>>>doesn't. Are you able to make your input documents available for
>>>>>testing? That might be the quickest way for a developer to work out
>>>>>where the problem is.
>>>>
>>>>If someone has time and wants to check ;-), you can temporarily
>>>>download the complete ISIS db files here:
>>>>
>>>>http://www.evk2cnr.org/ruben/Babel809.zip
>>>>
>>>>Thank you for your help!
>>>>
>>>>ruben
>>>>
>>>>John R. McPherson wrote:
>>>>
>>>>>On Sat, Feb 11, 2006 at 02:54:15PM +0100, ruben pandolfi wrote:
>>>>>
>>>>>>Jonathan Gorman wrote:
>>>>>>
>>>>>>>Check "How do I fix XML::Parser errors during import.pl?" in the FAQ.
>>>>>>>
>>>>>>>Jon Gorman
>>>>>>>
>>>>>>Thank you Jon,
>>>>>>
>>>>>>I do not think the error is due to Perl.
>>>>>>
>>>>>>In fact I only have warnings from Perl:
>>>>>>
>>>>>>buildcol.pl> not well-formed (invalid token) at line 31, column 34, byte 1572 at /usr/lib/perl5/XML/Parser.pm line 187
>>>>>>buildcol.pl> WARNING: No plugin could process HASH7bca/b456434f/1d719200/0bs809.dir/doc.xml
>
>_______________________________________________
>greenstone-users mailing list
>greenstone-users@list.scms.waikato.ac.nz
>https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users