Re: [greenstone-users] Arabic CDS/ISIS

From: Michael Dewsnip
Date: Wed, 20 Dec 2006 13:11:18 +1300
Subject: Re: [greenstone-users] Arabic CDS/ISIS
In-Reply-To: (BAY121-F17ADF90F67B1635837F9AECFCA0-phx-gbl)

Dear Kamal,

Please try upgrading your Greenstone
"perllib/plugins/MetadataXMLPlug.pm" file with this version:
http://www.cs.waikato.ac.nz/~mdewsnip/greenstone/temp-2.72/MetadataXMLPlug.pm.
Does this fix the problem?

All the best,

Michael

kamal khalafala wrote:

>
> Dear Michael,
> Thank you very much for fixing the core problem with ISISPlug, but the
> .FDT text still needs some work. You had solved this problem earlier
> (version 2.63). It looks
> just like this in the Design --> Search index --> New Index area, and in the exp. fields as well.
> ------------------------------------------------------------------------------------------------------------------------
>
> FDT file
>
> -□□□□□□□^all □□□□□□□ □□□□□□^all
> □□□□□□□^all □□□□□□□□□□^all
> defaultindex □□□□□□□^all
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>
> The ISISPlug doc.xml from version 2.72 is OK:
>
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <!DOCTYPE Archive SYSTEM
> "http://greenstone.org/dtd/Archive/1.0/Archive.dtd">
> <Archive>
> <Section>
> <Description>
> <Metadata name="gsdlsourcefilename">import\UKARB.MST</Metadata>
> <Metadata name="gsdldoctype">indexed_doc</Metadata>
> <Metadata name="Language">ar</Metadata>
> <Metadata name="Encoding">windows_1256</Metadata>
> <Metadata name="Source">UKARB.MST</Metadata>
> <Metadata name="SourceSegment">4</Metadata>
> <Metadata name="Plugin">ISISPlug</Metadata>
> <Metadata name="□□□□□□□□□">□□□</Metadata>
> <Metadata name="□□□□□□□□□^all">□□□</Metadata>
> <Metadata name="□□□□□□□">□□□□□□ □□□□□□□□ □ □□□□□ □□□□□□□</Metadata>
> <Metadata name="□□□□□□□^all">□□□□□□ □□□□□□□□ □ □□□□□ □□□□□□□</Metadata>
> <Metadata name="□□□□□□(□□□)">□□□ □□□□□□ □□□□</Metadata>
> <Metadata name="□□□□□□(□□□)^all">□□□ □□□□□□ □□□□</Metadata>
> <Metadata name="□□□□□□(□□□□□)">□□□□ □□□□□□</Metadata>
> <Metadata name="□□□□□□(□□□□□)^all">□□□□ □□□□□□</Metadata>
> <Metadata name="□□□□□□□□□□□^*">□□□□□□□□□□□</Metadata>
> <Metadata name="□□□□□□□□□□□">□□□□□□□□□□□, □□□□□□ □□□□□□□□</Metadata>
> <Metadata name="□□□□□□□□□□□^all">□□□□□□□□□□□, □□□□□□ □□□□□□□□</Metadata>
> <Metadata name="□□□□□□□□□□">1996</Metadata>
> <Metadata name="□□□□□□□□□□^all">1996</Metadata>
> <Metadata name="□□□□□□□□□□□">659□.</Metadata>
> <Metadata name="□□□□□□□□□□□^all">659□.</Metadata>
> <Metadata name="□□□□□□□□^sub">□□□□□ □□□□□□□</Metadata>
> <Metadata name="□□□□□□□□^sub">□□□□□□ □□□□□□□□</Metadata>
> <Metadata name="□□□□□□□□">&amp;lt;□□□□□ □□□□□□□&amp;gt;&amp;lt;□□□□□□
> □□□□□□□□&amp;gt;</Metadata>
> <Metadata name="□□□□□□□□^all">&amp;lt;□□□□□
> □□□□□□□&amp;gt;&amp;lt;□□□□□□ □□□□□□□□&amp;gt;</Metadata>
> <Metadata name="□□□□□□□^sub">□□□□□□□</Metadata>
> <Metadata name="□□□□□□□">&amp;lt;□□□□□□□&amp;gt;</Metadata>
> <Metadata name="□□□□□□□^all">&amp;lt;□□□□□□□&amp;gt;</Metadata>
> <Metadata name="□□□□□□□□□□">106</Metadata>
> <Metadata name="□□□□□□□□□□^all">106</Metadata>
> <Metadata name="ISISRawRecord">tag=60 data=□□□
>
> tag=200 data=□□□□□□ □□□□□□□□ □ □□□□□ □□□□□□□
>
> tag=300 data=□□□ □□□□□□ □□□□
>
> tag=320 data=□□□□ □□□□□□
>
> tag=400 data=^□□□□□□□□□□□^□□□□□□ □□□□□□□□
>
> tag=440 data=1996
>
> tag=460 data=659□.
>
> tag=610 data=658.3
>
> tag=620 data=&amp;lt;□□□□□ □□□□□□□&amp;gt;&amp;lt;□□□□□□ □□□□□□□□&amp;gt;
>
> tag=640 data=&amp;lt;□□□□□□□&amp;gt;
>
> tag=801 data=106</Metadata>
> <Metadata name="FileFormat">CDS/ISIS</Metadata>
> <Metadata name="Identifier">HASH0151de88f18fe6002c100a5cs4</Metadata>
> <Metadata
> name="assocfilepath">HASH0151/de88f18f/e6002c10/0a5cs4.dir</Metadata>
> </Description>
> <Content>&lt;table cellpadding=&quot;4&quot;
> cellspacing=&quot;0&quot;&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□
> □□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
> valign=top&gt;□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
> valign=top&gt;□□□□□□ □□□□□□□□ □ □□□□□
> □□□□□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□
> (□□□)&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td valign=top&gt;□□□ □□□□□□
> □□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□
> (□□□□□)&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td valign=top&gt;□□□□
> □□□□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□
> □□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
> valign=top&gt;□□□□□□□□□□□, □□□□□□
> □□□□□□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□
> □□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
> valign=top&gt;1996&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□
> □□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
> valign=top&gt;659□.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
> valign=top&gt;&amp;lt;□□□□□ □□□□□□□&amp;gt;&amp;lt;□□□□□□
> □□□□□□□□&amp;gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
> valign=top&gt;&amp;lt;□□□□□□□&amp;gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
> valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□
> □□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
> valign=top&gt;106&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</Content>
> </Section>
> </Archive>
> -------------------------------------------------------------------------------------------------
>
> The displayed [Text] is OK:
>
> □□□□□ □□ □□□□
> □□□□□□□ □□□□□ □□□□ □□□□□□□□ □ □□□□□□□□□ : □□□□□□□ □□□□□□ □□□□ □□□□□□□□
> □ □□□□□□□□□ □ □□□□□ □□□□
> □□□□□□ (□□□) □□□□ □□□□ □□□ □□□□
> □□□□□□ (□□□□) □□□ □□□□□□□□ □ □□□□□□□□□ : □□□□□ □□□□
> □□□□□ □□□□□ 2005
> □□□□□□□□ □□□□□ □□□□□□□□ □□□□□□□□ □□□□□□
> □□□□□□□ □□□□ □□□□□□□□
> --------------------------------------------------------------
> But when I try to display a record with a format that includes
> metadata (Title), the Arabic encoding of the output text is not correct.
> ---------------------------------------------------------------------
>
>
> The output of NULPlug:
>
> Log file (mgpp index):
> □□□: C:\Program Files\Greenstone\bin\windows\perl\bin\Perl.exe -S
> C:\Program Files\Greenstone\bin\script\import.pl -gli -language ar
> -collectdir C:\Program Files\Greenstone\collect -removeold hhhh
> import.pl> □□□□□ □□□□□□□□□ □□□□□□□ □□ □□□□ □□□□□□□
> import.pl> RecPlug: getting directory C:\Program Files\Greenstone\collect\hhhh\import
> import.pl> RecPlug: getting directory C:\Program Files\Greenstone\collect\hhhh\import\ISA
> import.pl> MetadataXMLPlug: processing ISA\metadata.xml
> import.pl> NULPlug processing "C:\Program Files\Greenstone\collect\hhhh\import\ISA\00000001.nul"
> import.pl> Wide character in print at C:\Program Files\Greenstone/perllib/doc.pm line 342.
> import.pl> Wide character in print at C:\Program Files\Greenstone\perllib\plugouts\GAPlugout.pm line 94.
> import.pl> NULPlug processing "C:\Program Files\Greenstone\collect\hhhh\import\ISA\00000002.nul"
> import.pl> Wide character in print at C:\Program Files\Greenstone/perllib/doc.pm line 342.
> import.pl> Wide character in print at C:\Program Files\Greenstone\perllib\plugouts\GAPlugout.pm line 94.
> import.pl> NULPlug processing "C:\Program Files\Greenstone\collect\hhhh\import\ISA\00000003.nul"
> import.pl> Wide character in print at C:\Program Files\Greenstone/perllib/doc.pm line 342.
> import.pl> Wide character in print at C:\Program Files\Greenstone\perllib\plugouts\GAPlugout.pm line 94.
> import.pl> NULPlug processing "C:\Program Files\Greenstone\collect\hhhh\import\ISA\00000004.nul"
> import.pl> Wide character in print at C:\Program Files\Greenstone/perllib/doc.pm line 342.
> import.pl> Wide character in print at C:\Program Files\Greenstone\perllib\plugouts\GAPlugout.pm line 94.
> import.pl> NULPlug processing "C:\Program Files\Greenstone\collect\hhhh\import\ISA\00000005.nul"
> import.pl> Wide character in print at C:\Program Files\Greenstone/perllib/doc.pm line 342.
> -------------------------------------
>
> import.pl> *********************************************
> import.pl> □□□□□ □□□□□□□□□
> import.pl> *********************************************
> import.pl> * 27 □□□□□ □□ □□□□□□□□ □□□□□□□□
> import.pl> * 27 □□□ □□□□□□□□ □ □□□□□□□ □□□□□□□□
> import.pl> □□□□□□ □□□□□□□ □□□□ □□□□□□□□ .
> import.pl> □□□□□□□ □□□□□□ □□□□□□ □□□□□ □□ □□□□□ □□□□□□□.
> import.pl> □□□□□ □□□□□ □□□□□□ □□□□□□□□ □□□□□□□□□.
> □□□: C:\Program Files\Greenstone\bin\windows\perl\bin\Perl.exe -S
> C:\Program Files\Greenstone\bin\script\buildcol.pl -gli -language ar
> -collectdir C:\Program Files\Greenstone\collect -removeold hhhh
> buildcol.pl> *** creating the compressed text
> buildcol.pl> collecting text statistics (mgpp_passes -T1)
> buildcol.pl> ArcPlug: □□□□□□ C:\Program Files\Greenstone\collect\hhhh\archives\archives.inf
> buildcol.pl> GAPlug: processing HASH0116.dir\doc.xml
> buildcol.pl> GAPlug: processing HASH1ee4.dir\doc.xml
> buildcol.pl> GAPlug: processing HASH0147.dir\doc.xml
> buildcol.pl> GAPlug: processing HASH7c8b.dir\doc.xml
> buildcol.pl> GAPlug: processing HASH013a.dir\doc.xml
> buildcol.pl> GAPlug: processing HASH011b.dir\doc.xml
> ---------------------------------------------------------------------------------
>
> buildcol.pl> GAPlug: processing HASH0125.dir\doc.xml
> buildcol.pl> Wide character in print at C:\Program Files\Greenstone/perllib/basebuildproc.pm line 413.
> buildcol.pl> Wide character in print at C:\Program Files\Greenstone/perllib/basebuildproc.pm line 413.
> buildcol.pl> Wide character in print at C:\Program Files\Greenstone/perllib/basebuildproc.pm line 413.
> buildcol.pl> Wide character in print at C:\Program Files\Greenstone/perllib/basebuildproc.pm line 413.
> buildcol.pl> Wide character in print at C:\Program Files\Greenstone/perllib/basebuildproc.pm line 413.
> buildcol.pl> Wide character in print at C:\Program Files\Greenstone/perllib/basebuildproc.pm line 413.
> buildcol.pl> Warning: No metadata values assigned to exp.□□□□□□□.
> buildcol.pl> *** outputting information for classifier: CL1
> buildcol.pl> *** outputting information for classifier: CL2
> buildcol.pl> *** outputting information for classifier: CL3
> buildcol.pl> *** outputting information for classifier: CL4
> buildcol.pl> *** outputting information for classifier: oai
> buildcol.pl> *** creating auxiliary files
> buildcol.pl> □□□□□□ □□□□□□□ □□□□ □□□□□□□□ .
>
> ======================================================================
> The NULPlug doc.xml is not OK, yet, surprisingly, the .FDT (exp.) field names are OK:
>
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <!DOCTYPE Archive SYSTEM
> "http://greenstone.org/dtd/Archive/1.0/Archive.dtd">
> <Archive>
> <Section>
> <Description>
> <Metadata name="gsdlsourcefilename">import\ISA\00000016.nul</Metadata>
> <Metadata name="gsdldoctype">indexed_doc</Metadata>
> <Metadata name="Plugin">NULPlug</Metadata>
> <Metadata name="Source">00000016.nul</Metadata>
> <Metadata name="FileSize">0</Metadata>
> <Metadata name="null_file">00000016.nul</Metadata>
> <Metadata
> name="exp.□□□□□□□^*">&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#136;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Ugrave;&#130;&Ugrave;&#136;&Ugrave;&#133;&Ugrave;&#138;
> &Ugrave;&#132;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;</Metadata>
>
> <Metadata
> name="exp.□□□□□">&Ugrave;&#134;&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Ugrave;&#132;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#132;&Ugrave;&#137;
> &Ugrave;&#132;&Ugrave;&#132;&Oslash;□&Oslash;□&Oslash;□&Oslash;□
> &Ugrave;&#136;
> &Oslash;□&Ugrave;&#132;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;□&Oslash;□,
> &Oslash;&ordf;1977</Metadata>
> <Metadata
> name="exp.□□□□□^*">&Ugrave;&#134;&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Ugrave;&#132;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#132;&Ugrave;&#137;
> &Ugrave;&#132;&Ugrave;&#132;&Oslash;□&Oslash;□&Oslash;□&Oslash;□
> &Ugrave;&#136;
> &Oslash;□&Ugrave;&#132;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;□&Oslash;□</Metadata>
>
> <Metadata
> name="exp.□□□□□□□□□□□□□□□">&Ugrave;&#134;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;&#140;
> &Oslash;□&Oslash;□&Ugrave;&#133;&Oslash;□</Metadata>
> <Metadata name="exp.□□□□□□□□□□□^*">&Oslash;□70 &Oslash;□ .</Metadata>
> <Metadata
> name="exp.□□□□□□□□□□□□□□□□">&amp;lt;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;&amp;gt;&amp;lt;&Ugrave;&#133;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□&Oslash;□&Oslash;&ordf;&amp;gt;&amp;lt;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#131;&Oslash;□
> &Oslash;□&Ugrave;&#130;&Oslash;□&Ugrave;&#129;&Ugrave;&#138;&Oslash;□&amp;gt;</Metadata>
>
> <Metadata
> name="exp.□□□□□□□">&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#136;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Ugrave;&#130;&Ugrave;&#136;&Ugrave;&#133;&Ugrave;&#138;
> &Ugrave;&#132;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;</Metadata>
>
> <Metadata name="exp.□□□□□□□□□□□">&Oslash;□70 &Oslash;□ .</Metadata>
> <Metadata name="exp.ISISRawRecord">tag=24
> data=&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#136;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Ugrave;&#130;&Ugrave;&#136;&Ugrave;&#133;&Ugrave;&#138;
> &Ugrave;&#132;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;
>
>
> tag=26
> data=^&Ugrave;&#134;&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Ugrave;&#132;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#132;&Ugrave;&#137;
> &Ugrave;&#132;&Ugrave;&#132;&Oslash;□&Oslash;□&Oslash;□&Oslash;□
> &Ugrave;&#136;
> &Oslash;□&Ugrave;&#132;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;□&Oslash;□^&Oslash;&ordf;1977
>
>
> tag=30 data=^&Oslash;□70 &Oslash;□ .
>
> tag=69 data=&amp;lt;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;&amp;gt;&amp;lt;&Ugrave;&#133;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□&Oslash;□&Oslash;&ordf;&amp;gt;&amp;lt;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#131;&Oslash;□
> &Oslash;□&Ugrave;&#130;&Oslash;□&Ugrave;&#129;&Ugrave;&#138;&Oslash;□&amp;gt;
>
>
> tag=70
> data=&Ugrave;&#134;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;&#140;
> &Oslash;□&Oslash;□&Ugrave;&#133;&Oslash;□</Metadata>
> <Metadata
> name="exp.□□□□□□□□□□□□□□□□^sub">&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
> &Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;</Metadata>
>
> <Metadata
> name="exp.□□□□□□□□□□□□□□□□^sub">&Ugrave;&#133;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□&Oslash;□&Oslash;&ordf;</Metadata>
>
> <Metadata
> name="exp.□□□□□□□□□□□□□□□□^sub">&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#131;&Oslash;□
> &Oslash;□&Ugrave;&#130;&Oslash;□&Ugrave;&#129;&Ugrave;&#138;&Oslash;□</Metadata>
>
> <Metadata name="Title">00000016</Metadata>
> <Metadata name="Identifier">HASH01dcee5f55f9dd478eaff67f</Metadata>
> <Metadata name="assocfilepath">HASH01dc.dir</Metadata>
> </Description>
> <Content>&Ugrave;&#135;&Oslash;□&Oslash;□
> &Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Ugrave;&#132;&Ugrave;&#129;
> &Ugrave;&#132;&Oslash;□
> &Ugrave;&#138;&Oslash;&ordf;&Oslash;□&Ugrave;&#133;&Ugrave;&#134;
> &Ugrave;&#134;&Oslash;□.</Content>
> </Section>
> </Archive>
> ==================================================================================================================
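The `&Oslash;` / `&Ugrave;` values in the NULPlug doc.xml above are consistent with Arabic text that was correctly converted to UTF-8, then read back as if each byte were a Latin-1 character and written out as entities. Every Arabic letter in U+0600 to U+06FF takes two bytes in UTF-8, starting with 0xD8 to 0xDB, and 0xD8 and 0xD9 are Ø and Ù in Latin-1; for example `&Ugrave;&#132;` is the byte pair D9 84, which is U+0644 (lam), and `&Oslash;&#140;` is D8 8C, the Arabic comma. The Python sketch below is only a diagnostic aid under that assumption, not Greenstone code: the function name `repair_double_encoded` is hypothetical, and it expects each entity to stand for exactly one original byte. It cannot decode the values quoted above as they stand, because some of their characters were lost in transit (the □ marks).

```python
import html.entities
import re

# A named entity such as &Oslash; or a decimal reference such as &#132;
_ENTITY = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def repair_double_encoded(text: str) -> str:
    """Rebuild the original UTF-8 bytes from Latin-1 entities and decode them.

    Hypothetical helper. Assumes every entity (and every literal character
    between entities) stands for exactly one byte of the original UTF-8 text.
    Raises KeyError for an unknown entity name, ValueError for a code point
    above 255, UnicodeEncodeError for a literal non-Latin-1 character, and
    UnicodeDecodeError if the rebuilt bytes are not valid UTF-8.
    """
    raw = bytearray()
    pos = 0
    for match in _ENTITY.finditer(text):
        # Literal text between entities (spaces, digits, '.') maps byte-for-byte
        raw.extend(text[pos:match.start()].encode("latin-1"))
        ref = match.group(1)
        if ref.startswith("#"):
            code = int(ref[1:])
        else:
            code = html.entities.name2codepoint[ref]
        raw.append(code)  # bytearray.append rejects values outside 0..255
        pos = match.end()
    raw.extend(text[pos:].encode("latin-1"))
    return raw.decode("utf-8")


if __name__ == "__main__":
    # The Arabic word "maktabat" (libraries), mangled the same way as the exp. values
    sample = ("&Ugrave;&#133;&Ugrave;&#131;&Oslash;&ordf;"
              "&Oslash;&uml;&Oslash;&sect;&Oslash;&ordf;")
    print(repair_double_encoded(sample))
```

For example, `repair_double_encoded("&Oslash;&ordf;1977")` returns `"ت1977"`. Decoding this way only confirms the diagnosis; the actual fix belongs wherever the plugin decodes or re-encodes the text.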
>
> Referring to my email concerning version 2.71, I checked the
> exploded file and found that the number of exploded records is
> never more than 15000.
> Over the past fifteen years CDS/ISIS has been widely used in the Arabic
> region. Once the above problems are solved, Greenstone will be a practical
> solution (with its multilingual interface)
> for many institutions and skilled individuals, letting them get the most
> out of Greenstone.
> Thank you once again, Michael.
> All the best,
> Kamal
> *************************************************************************
>
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>greenstone-users mailing list
>greenstone-users@list.scms.waikato.ac.nz
>https://list.scms.waikato.ac.nz/mailman/listinfo/greenstone-users
>
>