[greenstone-users] urgent problem Please Help

From Amin Hedjazi
Date Sun Jan 18 01:04:51 2009
Subject [greenstone-users] urgent problem Please Help
Hi everyone,
today I ran into a problem and I have absolutely no idea where to look
to solve it.
Here it is:
I am indexing some txt collections containing Persian text, which should be
in UTF-8 encoding.
When I index the text using the mg and mgpp indexers there aren't any
problems, but when I index it using Lucene,
the text shown as the content of those files is not readable. It looks
something like this:

□?□?????□ ?□?□???□???? ?(c)?□???? ?????? ?? ???□ ???□?□?□?□?□??
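
For illustration, garbling of this kind typically appears when UTF-8 bytes are decoded with a single-byte charset. The following is a minimal Python sketch, assuming (not confirmed for Greenstone or Lucene) that something similar happens somewhere in the build or display path. The sample string is hypothetical and is not taken from the collection.

```python
# Hypothetical sample text; any Persian string would behave the same way.
original = "سلام دنیا"

# The text as it would be stored in a UTF-8 encoded file.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes with the wrong charset (here Latin-1) produces
# one junk character per byte instead of the intended letters.
garbled = utf8_bytes.decode("latin-1")

# As long as no bytes were lost, the damage can be reversed by
# re-encoding with the wrong charset and decoding as UTF-8.
restored = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(restored)
```

If the unreadable text can be restored with a round trip like this, the stored bytes are probably intact and only the charset used to read them is wrong. If it cannot, some bytes were likely replaced or dropped earlier in the process.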

I also tried this with document types other than txt and got the same
problem as before.
Now the strange thing is that the search still finds where the text is,
but it doesn't show any content when I click the document to
display it.

This also happens with the older GS3 version I have (2 months old); the one
I am using now is fresh from the SVN.
In the old version I get these exceptions when trying to index the doc.xml
files created by the import process:

Doc: 22parse error:
buildcol.pl> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
buildcol.pl> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
buildcol.pl> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
buildcol.pl> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
buildcol.pl> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
buildcol.pl> at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
buildcol.pl> at javax.xml.parsers.SAXParser.parse(Unknown Source)
buildcol.pl> at org.greenstone.LuceneWrapper.Indexer.index(Indexer.java:117)
buildcol.pl> at org.greenstone.LuceneWrapper.IndexXML.indexFile(IndexXML.java:65)
buildcol.pl> at org.greenstone.LuceneWrapper.GS2LuceneIndexer.main(GS2LuceneIndexer.java:110)

The effect is the same there as well: the text is not readable.
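
Since the trace ends in a SAX parse error, one way to narrow this down would be to test a doc.xml file outside of Greenstone. The following is a rough Python sketch of such a check, not part of Greenstone itself; the function name `check_doc_xml` is hypothetical. It assumes the file is meant to be UTF-8 and only reports whether the bytes decode and whether the XML is well-formed.

```python
import xml.etree.ElementTree as ET


def check_doc_xml(data: bytes) -> list:
    """Return a list of problems found in the raw bytes of an XML file."""
    problems = []

    # Step 1: the bytes should decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"invalid UTF-8 at byte offset {err.start}")

    # Step 2: the document should be well-formed XML. Characters that are
    # not allowed in XML (most control characters) also fail here.
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        problems.append(f"XML parse error at line {line}, column {column}")

    return problems


# Hypothetical in-memory samples standing in for real doc.xml contents.
good = "<Doc>سلام</Doc>".encode("utf-8")
bad_bytes = b"<Doc>\xd8</Doc>"       # truncated multi-byte sequence
bad_control = b"<Doc>\x01</Doc>"     # valid UTF-8, but illegal in XML

for sample in (good, bad_bytes, bad_control):
    print(check_doc_xml(sample))
```

For a real file the bytes could be read with `open(path, "rb").read()`, where `path` points at one of the doc.xml files produced by the import step. A reported line and column would show which part of the document the parser rejects.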

Please Help