[greenstone-users] Arabic CDS/ISIS

From kamal khalafala
Date Sun, 17 Dec 2006 09:27:50 +0000
Subject [greenstone-users] Arabic CDS/ISIS
Dear Michael,
Thank you very much for fixing the core problem in ISISPlug. However, the
.FDT text still needs some work; you solved this problem earlier (version
2.63). It looks like this in the Design-->Search index-->New Index area,
and in ex. as well.
------------------------------------------------------------------------------------------------------------------------
FDT file

-□□□□□□□^all □□□□□□□ □□□□□□^all □□□□□□□^all
□□□□□□□□□□^all
defaultindex□□□□□□□^all
-----------------------------------------------------------------------------------------------------------------------------------------------------


The ISISPlug doc.xml in version 2.72 is OK:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Archive SYSTEM
"http://greenstone.org/dtd/Archive/1.0/Archive.dtd">
<Archive>
<Section>
<Description>
<Metadata name="gsdlsourcefilename">importUKARB.MST</Metadata>
<Metadata name="gsdldoctype">indexed_doc</Metadata>
<Metadata name="Language">ar</Metadata>
<Metadata name="Encoding">windows_1256</Metadata>
<Metadata name="Source">UKARB.MST</Metadata>
<Metadata name="SourceSegment">4</Metadata>
<Metadata name="Plugin">ISISPlug</Metadata>
<Metadata name="□□□□□□□□□">□□□</Metadata>
<Metadata name="□□□□□□□□□^all">□□□</Metadata>
<Metadata name="□□□□□□□">□□□□□□ □□□□□□□□ □ □□□□□ □□□□□□□</Metadata>
<Metadata name="□□□□□□□^all">□□□□□□ □□□□□□□□ □ □□□□□ □□□□□□□</Metadata>
<Metadata name="□□□□□□(□□□)">□□□ □□□□□□ □□□□</Metadata>
<Metadata name="□□□□□□(□□□)^all">□□□ □□□□□□ □□□□</Metadata>
<Metadata name="□□□□□□(□□□□□)">□□□□ □□□□□□</Metadata>
<Metadata name="□□□□□□(□□□□□)^all">□□□□ □□□□□□</Metadata>
<Metadata name="□□□□□□□□□□□^*">□□□□□□□□□□□</Metadata>
<Metadata name="□□□□□□□□□□□">□□□□□□□□□□□, □□□□□□ □□□□□□□□</Metadata>
<Metadata name="□□□□□□□□□□□^all">□□□□□□□□□□□, □□□□□□ □□□□□□□□</Metadata>
<Metadata name="□□□□□□□□□□">1996</Metadata>
<Metadata name="□□□□□□□□□□^all">1996</Metadata>
<Metadata name="□□□□□□□□□□□">659□.</Metadata>
<Metadata name="□□□□□□□□□□□^all">659□.</Metadata>
<Metadata name="□□□□□□□□^sub">□□□□□ □□□□□□□</Metadata>
<Metadata name="□□□□□□□□^sub">□□□□□□ □□□□□□□□</Metadata>
<Metadata name="□□□□□□□□">&amp;lt;□□□□□ □□□□□□□&amp;gt;&amp;lt;□□□□□□
□□□□□□□□&amp;gt;</Metadata>
<Metadata name="□□□□□□□□^all">&amp;lt;□□□□□
□□□□□□□&amp;gt;&amp;lt;□□□□□□ □□□□□□□□&amp;gt;</Metadata>
<Metadata name="□□□□□□□^sub">□□□□□□□</Metadata>
<Metadata name="□□□□□□□">&amp;lt;□□□□□□□&amp;gt;</Metadata>
<Metadata name="□□□□□□□^all">&amp;lt;□□□□□□□&amp;gt;</Metadata>
<Metadata name="□□□□□□□□□□">106</Metadata>
<Metadata name="□□□□□□□□□□^all">106</Metadata>
<Metadata name="ISISRawRecord">tag=60 data=□□□

tag=200 data=□□□□□□ □□□□□□□□ □ □□□□□ □□□□□□□

tag=300 data=□□□ □□□□□□ □□□□

tag=320 data=□□□□ □□□□□□

tag=400 data=^□□□□□□□□□□□^□□□□□□ □□□□□□□□

tag=440 data=1996

tag=460 data=659□.

tag=610 data=658.3

tag=620 data=&amp;lt;□□□□□ □□□□□□□&amp;gt;&amp;lt;□□□□□□ □□□□□□□□&amp;gt;

tag=640 data=&amp;lt;□□□□□□□&amp;gt;

tag=801 data=106</Metadata>
<Metadata name="FileFormat">CDS/ISIS</Metadata>
<Metadata name="Identifier">HASH0151de88f18fe6002c100a5cs4</Metadata>
<Metadata
name="assocfilepath">HASH0151/de88f18f/e6002c10/0a5cs4.dir</Metadata>
</Description>
<Content>&lt;table cellpadding=&quot;4&quot;
cellspacing=&quot;0&quot;&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□
□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
valign=top&gt;□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
valign=top&gt;□□□□□□ □□□□□□□□ □ □□□□□
□□□□□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□
(□□□)&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td valign=top&gt;□□□ □□□□□□
□□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□
(□□□□□)&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td valign=top&gt;□□□□
□□□□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□
□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td valign=top&gt;□□□□□□□□□□□,
□□□□□□ □□□□□□□□&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□
□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
valign=top&gt;1996&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□
□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
valign=top&gt;659□.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
valign=top&gt;&amp;lt;□□□□□ □□□□□□□&amp;gt;&amp;lt;□□□□□□
□□□□□□□□&amp;gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
valign=top&gt;&amp;lt;□□□□□□□&amp;gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td
valign=top&gt;&lt;nobr&gt;&lt;b&gt;□□□
□□□□□□□&lt;/b&gt;&lt;/nobr&gt;&lt;/td&gt;&lt;td
valign=top&gt;106&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</Content>
</Section>
</Archive>
-------------------------------------------------------------------------------------------------
The display of [text] is OK:

□□□□□□□ □□□□
□□□□□□□□□□□□ □□□□ □□□□□□□□ □ □□□□□□□□□ : □□□□□□□ □□□□□□ □□□□ □□□□□□□□ □
□□□□□□□□□ □ □□□□□ □□□□
□□□□□□ (□□□)□□□□ □□□□ □□□ □□□□
□□□□□□ (□□□□)□□□ □□□□□□□□ □ □□□□□□□□□ : □□□□□ □□□□
□□□□□ □□□□□2005
□□□□□□□□□□□□□ □□□□□□□□ □□□□□□□□ □□□□□□
□□□□□□□□□□□ □□□□□□□□
--------------------------------------------------------------
But when you try to display a record with a format that includes metadata
(Title), the Arabic encoding of the output text is not correct.
---------------------------------------------------------------------


The output of the NULPlug

Log file, index mgpp:
□□□: C:Program FilesGreenstonebinwindowsperlbinPerl.exe -S C:Program
FilesGreenstonebinscriptimport.pl -gli -language ar -collectdir
C:Program FilesGreenstonecollect -removeold hhhh
import.pl> □□□□□ □□□□□□□□□ □□□□□□□ □□ □□□□ □□□□□□□
import.pl> RecPlug: getting directory C:Program
FilesGreenstonecollecthhhhimport
import.pl> RecPlug: getting directory C:Program
FilesGreenstonecollecthhhhimportISA
import.pl> MetadataXMLPlug: processing ISAmetadata.xml
import.pl> NULPlug processing "C:Program
FilesGreenstonecollecthhhhimportISA00000001.nul"
import.pl> Wide character in print at C:Program
FilesGreenstone/perllib/doc.pm line 342.
import.pl> Wide character in print at C:Program
FilesGreenstoneperllibplugoutsGAPlugout.pm line 94.
import.pl> NULPlug processing "C:Program
FilesGreenstonecollecthhhhimportISA00000002.nul"
import.pl> Wide character in print at C:Program
FilesGreenstone/perllib/doc.pm line 342.
import.pl> Wide character in print at C:Program
FilesGreenstoneperllibplugoutsGAPlugout.pm line 94.
import.pl> NULPlug processing "C:Program
FilesGreenstonecollecthhhhimportISA00000003.nul"
import.pl> Wide character in print at C:Program
FilesGreenstone/perllib/doc.pm line 342.
import.pl> Wide character in print at C:Program
FilesGreenstoneperllibplugoutsGAPlugout.pm line 94.
import.pl> NULPlug processing "C:Program
FilesGreenstonecollecthhhhimportISA00000004.nul"
import.pl> Wide character in print at C:Program
FilesGreenstone/perllib/doc.pm line 342.
import.pl> Wide character in print at C:Program
FilesGreenstoneperllibplugoutsGAPlugout.pm line 94.
import.pl> NULPlug processing "C:Program
FilesGreenstonecollecthhhhimportISA00000005.nul"
import.pl> Wide character in print at C:Program
FilesGreenstone/perllib/doc.pm line 342.
-------------------------------------

import.pl> *********************************************
import.pl> □□□□□ □□□□□□□□□
import.pl> *********************************************
import.pl> * 27 □□□□□ □□ □□□□□□□□ □□□□□□□□
import.pl> * 27 □□□ □□□□□□□□ □ □□□□□□□ □□□□□□□□
import.pl> □□□□□□ □□□□□□□ □□□□ □□□□□□□□ .
import.pl> □□□□□□□ □□□□□□ □□□□□□ □□□□□ □□ □□□□□ □□□□□□□.
import.pl> □□□□□ □□□□□ □□□□□□ □□□□□□□□ □□□□□□□□□.
□□□: C:Program FilesGreenstonebinwindowsperlbinPerl.exe -S C:Program
FilesGreenstonebinscriptbuildcol.pl -gli -language ar -collectdir
C:Program FilesGreenstonecollect -removeold hhhh
buildcol.pl> *** creating the compressed text
buildcol.pl> collecting text statistics (mgpp_passes -T1)
buildcol.pl> ArcPlug: □□□□□□ C:Program
FilesGreenstonecollecthhhharchivesarchives.inf
buildcol.pl> GAPlug: processing HASH0116.dirdoc.xml
buildcol.pl> GAPlug: processing HASH1ee4.dirdoc.xml
buildcol.pl> GAPlug: processing HASH0147.dirdoc.xml
buildcol.pl> GAPlug: processing HASH7c8b.dirdoc.xml
buildcol.pl> GAPlug: processing HASH013a.dirdoc.xml
buildcol.pl> GAPlug: processing HASH011b.dirdoc.xml

---------------------------------------------------------------------------------
buildcol.pl> GAPlug: processing HASH0125.dirdoc.xml
buildcol.pl> Wide character in print at C:Program
FilesGreenstone/perllib/basebuildproc.pm line 413.
buildcol.pl> Wide character in print at C:Program
FilesGreenstone/perllib/basebuildproc.pm line 413.
buildcol.pl> Wide character in print at C:Program
FilesGreenstone/perllib/basebuildproc.pm line 413.
buildcol.pl> Wide character in print at C:Program
FilesGreenstone/perllib/basebuildproc.pm line 413.
buildcol.pl> Wide character in print at C:Program
FilesGreenstone/perllib/basebuildproc.pm line 413.
buildcol.pl> Wide character in print at C:Program
FilesGreenstone/perllib/basebuildproc.pm line 413.
buildcol.pl> Warning: No metadata values assigned to exp.□□□□□□□.
buildcol.pl> *** outputting information for classifier: CL1
buildcol.pl> *** outputting information for classifier: CL2
buildcol.pl> *** outputting information for classifier: CL3
buildcol.pl> *** outputting information for classifier: CL4
buildcol.pl> *** outputting information for classifier: oai
buildcol.pl> *** creating auxiliary files
buildcol.pl> □□□□□□ □□□□□□□ □□□□ □□□□□□□□ .
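
The log above repeats the same Perl warning many times. The following is a minimal Python sketch for tallying which source locations emit it, assuming the log has been saved as text and that the warnings wrap across lines as in the excerpt above. The function name count_wide_char_warnings is hypothetical and is not part of Greenstone.

```python
import re
from collections import Counter

# Matches one warning even when the path is wrapped onto the next line.
WARNING = re.compile(
    r"Wide character in print at\s+(.+?)\s+line\s+(\d+)\.", re.S
)


def count_wide_char_warnings(log_text):
    """Return a Counter keyed by (path, line number) for each warning."""
    counts = Counter()
    for path, line in WARNING.findall(log_text):
        # Collapse the line wrap inside the path into a single space.
        counts[(" ".join(path.split()), int(line))] += 1
    return counts


if __name__ == "__main__":
    sample_log = (
        "import.pl> Wide character in print at C:Program\n"
        "FilesGreenstone/perllib/doc.pm line 342.\n"
        "import.pl> Wide character in print at C:Program\n"
        "FilesGreenstoneperllibplugoutsGAPlugout.pm line 94.\n"
        "import.pl> Wide character in print at C:Program\n"
        "FilesGreenstone/perllib/doc.pm line 342.\n"
    )
    for (path, line), n in count_wide_char_warnings(sample_log).most_common():
        print(n, path, line)
```

A short list of distinct locations (here doc.pm, GAPlugout.pm and basebuildproc.pm) suggests the strings reach only a few print statements without an output encoding, which may help narrow down where the Arabic text is being damaged.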

======================================================================
The doc.xml from NULPlug is not OK, although, surprisingly, the .FDT or exp. text is OK:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Archive SYSTEM
"http://greenstone.org/dtd/Archive/1.0/Archive.dtd">
<Archive>
<Section>
<Description>
<Metadata name="gsdlsourcefilename">importISA00000016.nul</Metadata>
<Metadata name="gsdldoctype">indexed_doc</Metadata>
<Metadata name="Plugin">NULPlug</Metadata>
<Metadata name="Source">00000016.nul</Metadata>
<Metadata name="FileSize">0</Metadata>
<Metadata name="null_file">00000016.nul</Metadata>
<Metadata
name="exp.□□□□□□□^*">&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#136;&Oslash;□
&Oslash;□&Ugrave;&#132;&Ugrave;&#130;&Ugrave;&#136;&Ugrave;&#133;&Ugrave;&#138;
&Ugrave;&#132;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
&Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;</Metadata>
<Metadata
name="exp.□□□□□">&Ugrave;&#134;&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Ugrave;&#132;&Oslash;□
&Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#132;&Ugrave;&#137;
&Ugrave;&#132;&Ugrave;&#132;&Oslash;□&Oslash;□&Oslash;□&Oslash;□
&Ugrave;&#136;
&Oslash;□&Ugrave;&#132;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;□&Oslash;□,
&Oslash;&ordf;1977</Metadata>
<Metadata
name="exp.□□□□□^*">&Ugrave;&#134;&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Ugrave;&#132;&Oslash;□
&Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#132;&Ugrave;&#137;
&Ugrave;&#132;&Ugrave;&#132;&Oslash;□&Oslash;□&Oslash;□&Oslash;□
&Ugrave;&#136;
&Oslash;□&Ugrave;&#132;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;□&Oslash;□</Metadata>
<Metadata
name="exp.□□□□□□□□□□□□□□□">&Ugrave;&#134;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;&#140;
&Oslash;□&Oslash;□&Ugrave;&#133;&Oslash;□</Metadata>
<Metadata name="exp.□□□□□□□□□□□^*">&Oslash;□70 &Oslash;□ .</Metadata>
<Metadata
name="exp.□□□□□□□□□□□□□□□□">&amp;lt;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
&Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;&amp;gt;&amp;lt;&Ugrave;&#133;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□&Oslash;□&Oslash;&ordf;&amp;gt;&amp;lt;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#131;&Oslash;□
&Oslash;□&Ugrave;&#130;&Oslash;□&Ugrave;&#129;&Ugrave;&#138;&Oslash;□&amp;gt;</Metadata>
<Metadata
name="exp.□□□□□□□">&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#136;&Oslash;□
&Oslash;□&Ugrave;&#132;&Ugrave;&#130;&Ugrave;&#136;&Ugrave;&#133;&Ugrave;&#138;
&Ugrave;&#132;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
&Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;</Metadata>
<Metadata name="exp.□□□□□□□□□□□">&Oslash;□70 &Oslash;□ .</Metadata>
<Metadata name="exp.ISISRawRecord">tag=24
data=&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#136;&Oslash;□
&Oslash;□&Ugrave;&#132;&Ugrave;&#130;&Ugrave;&#136;&Ugrave;&#133;&Ugrave;&#138;
&Ugrave;&#132;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
&Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;

tag=26
data=^&Ugrave;&#134;&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Oslash;□&Ugrave;&#132;&Oslash;□
&Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#132;&Ugrave;&#137;
&Ugrave;&#132;&Ugrave;&#132;&Oslash;□&Oslash;□&Oslash;□&Oslash;□
&Ugrave;&#136;
&Oslash;□&Ugrave;&#132;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;□&Oslash;□^&Oslash;&ordf;1977

tag=30 data=^&Oslash;□70 &Oslash;□ .

tag=69 data=&amp;lt;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
&Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;&amp;gt;&amp;lt;&Ugrave;&#133;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□&Oslash;□&Oslash;&ordf;&amp;gt;&amp;lt;&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#131;&Oslash;□
&Oslash;□&Ugrave;&#130;&Oslash;□&Ugrave;&#129;&Ugrave;&#138;&Oslash;□&amp;gt;

tag=70 data=&Ugrave;&#134;&Oslash;□&Ugrave;&#138;&Oslash;□&Oslash;&#140;
&Oslash;□&Oslash;□&Ugrave;&#133;&Oslash;□</Metadata>
<Metadata
name="exp.□□□□□□□□□□□□□□□□^sub">&Ugrave;&#131;&Oslash;&ordf;&Oslash;□
&Oslash;□&Ugrave;&#132;&Oslash;□&Oslash;□&Ugrave;&#129;&Oslash;□&Ugrave;&#132;</Metadata>
<Metadata
name="exp.□□□□□□□□□□□□□□□□^sub">&Ugrave;&#133;&Ugrave;&#131;&Oslash;&ordf;&Oslash;□&Oslash;□&Oslash;&ordf;</Metadata>
<Metadata
name="exp.□□□□□□□□□□□□□□□□^sub">&Ugrave;&#133;&Oslash;□&Oslash;□&Ugrave;&#131;&Oslash;□
&Oslash;□&Ugrave;&#130;&Oslash;□&Ugrave;&#129;&Ugrave;&#138;&Oslash;□</Metadata>
<Metadata name="Title">00000016</Metadata>
<Metadata name="Identifier">HASH01dcee5f55f9dd478eaff67f</Metadata>
<Metadata name="assocfilepath">HASH01dc.dir</Metadata>
</Description>
<Content>&Ugrave;&#135;&Oslash;□&Oslash;□
&Oslash;□&Ugrave;&#132;&Ugrave;&#133;&Ugrave;&#132;&Ugrave;&#129;
&Ugrave;&#132;&Oslash;□
&Ugrave;&#138;&Oslash;&ordf;&Oslash;□&Ugrave;&#133;&Ugrave;&#134;
&Ugrave;&#134;&Oslash;□.</Content>
</Section>
</Archive>
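
The entity runs in the metadata values above (for example &Ugrave;&#132;, which is the byte pair D9 84, the UTF-8 encoding of U+0644) look like UTF-8 text whose bytes were each treated as a Latin-1 character and then written out as entities. The following is a minimal Python sketch of reversing that, under exactly this assumption. The function names are hypothetical, and this illustrates the symptom only; it is not how Greenstone or NULPlug processes the data.

```python
import re
from html.entities import name2codepoint

# Named (&Oslash;), decimal (&#132;) and hexadecimal (&#x84;) references.
ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#\d+|\w+);")


def entity_to_char(match):
    """Turn one entity into the character with that exact code point."""
    ref = match.group(1)
    if ref[:2] in ("#x", "#X"):
        return chr(int(ref[2:], 16))
    if ref.startswith("#"):
        return chr(int(ref[1:]))
    if ref in name2codepoint:
        return chr(name2codepoint[ref])
    return match.group(0)  # unknown name: leave it untouched


def repair_double_encoded(text):
    """Assumes every character is one byte of the original UTF-8 text."""
    chars = ENTITY.sub(entity_to_char, text)
    # latin-1 maps code points 0-255 straight back to bytes 0-255.
    return chars.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    sample = "&Ugrave;&#132;&Ugrave;&#131;&Oslash;&ordf;"
    print(repair_double_encoded(sample))
```

The references are converted by hand rather than with html.unescape because that function follows HTML5 rules and remaps most numeric references in the range 128 to 159 to Windows-1252 characters, which would lose the raw byte values. The function raises an error if the input already contains characters above U+00FF, so it is only meant for values damaged in this particular way. It also cannot be applied to this archived copy as-is, because some bytes are displayed here as boxes.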
==================================================================================================================
Referring to my email concerning version 2.71, I checked the exploded
files and found that the maximum number of exploded records is no more
than 15000.
Over the past fifteen years CDS/ISIS has been widely used in the Arab
region. Once the above problems are solved, Greenstone (with its
multilingual interface) will be a practical solution for many institutions
and skilled individuals, who can then come and maximize the value of using
it.
Thank you once again, Michael, for this.
All the best,
Kamal
*************************************************************************
