[greenstone-users] Lucene indexer

From Michael Dewsnip
Date Fri Dec 21 15:45:36 2007
Subject [greenstone-users] Lucene indexer
In-Reply-To (00c201c83c33$71dff340$7c3401c8-diegos)
Hi Diego,

It looks like you're using the GNU version of Java -- this has never
worked very well for us. Please try downloading the Sun version of Java
and see if this works better.
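
To confirm which Java is actually being picked up, a small check along these lines may help. This is only a sketch: the class name WhichJava is made up for illustration, and it relies on nothing beyond the standard system properties.

```java
public class WhichJava {
    // Collects the standard system properties that identify the running JVM.
    static String describe() {
        return "java.vendor=" + System.getProperty("java.vendor") + "\n"
             + "java.vm.name=" + System.getProperty("java.vm.name") + "\n"
             + "java.version=" + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        // Run this with the same "java" command that the build uses.
        System.out.println(describe());
    }
}
```

The libgcj frames in the stack trace are what point to GNU Java here, so the vendor and VM name printed should change once the Sun version comes first on the PATH.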

There is also a problem with HTML entities and Lucene that has been
fixed since the Greenstone v2.80 release. If using the Sun version of
Java doesn't fix your problem, let me know and I'll send you the patched
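
For reference, U+26 in the error message is the code point of the ampersand character, which suggests the parser met an "&" that it could not read as the start of an entity reference. The sketch below is not the Greenstone patch; it only illustrates, with the standard JAXP SAX API and a hypothetical class name AmpersandCheck, that a bare ampersand makes a document ill-formed while an escaped one parses.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandCheck {
    // Returns true if the string parses as well-formed XML, false if the
    // SAX parser reports an error (DefaultHandler rethrows fatal errors).
    static boolean parses(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parses("<Doc>Tom & Jerry</Doc>"));     // false
        System.out.println(parses("<Doc>Tom &amp; Jerry</Doc>")); // true
    }
}
```

The exact wording of the exception depends on the parser in use, so the message from the Sun version of Java will not necessarily match the one in the trace.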

All the best,


DL Consulting
Greenstone Digital Library and Digitisation Specialists

Diego Spano wrote:
> Hi List, when I want to build indexes, I get the following error:
> Starting to index <xml doc on stdin>
> [ Doc: 9parse error:
> org.xml.sax.SAXParseException: not a name start character: "U+26"
> at gnu.xml.stream.SAXParser.parse(libgcj.so.7rh)
> at javax.xml.parsers.SAXParser.parse(libgcj.so.7rh)
> at org.greenstone.LuceneWrapper.Indexer.index(Indexer.java:117)
> at org.greenstone.LuceneWrapper.IndexXML.indexFile(IndexXML.java:65)
> at org.greenstone.LuceneWrapper.GS2LuceneIndexer.main(GS2LuceneIndexer.java:110)
> Caused by: javax.xml.stream.XMLStreamException: not a name start character: "U+26"
> at gnu.xml.stream.XMLParser.error(libgcj.so.7rh)
> at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.7rh)
> at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.7rh)
> at gnu.xml.stream.XMLParser.readCharData(libgcj.so.7rh)
> at gnu.xml.stream.XMLParser.next(libgcj.so.7rh)
> at gnu.xml.stream.XMLParser.hasNext(libgcj.so.7rh)
> at gnu.xml.stream.SAXParser.parse(libgcj.so.7rh)
> ...4 more
> This happens for many documents. Any help? GS version is 2.74 running
> on CentOS 5.
> Diego Spano