This sounds like the Word to HTML conversion failed, or was performed
badly. What exactly do you see when you view the HTML version of the
document?

You might like to start by finding the doc.xml files for the Word
documents you are having trouble with. These are the "Greenstone
versions" of the documents and show all extracted and assigned metadata,
and the text of the documents. They are located in the archives folder
of the collection and are produced by the import process. You'll need to
look at the Source metadata to check whether a given doc.xml file is the
one you are looking for (this will be easier if you build a very small
collection that exhibits the problem).

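If the archives folder holds more than a handful of documents, a small script can do the Source lookup for you. The following is only a rough sketch: it assumes the Source value is stored in doc.xml as <Metadata name="Source">filename</Metadata>, and the function name find_doc_xml is made up for this example. Adjust the pattern if your doc.xml files look different.

```python
import os
import re
import sys

# Assumption: Source is recorded as <Metadata name="Source">filename</Metadata>.
SOURCE_RE = re.compile(
    r'<Metadata\s+name="Source"[^>]*>(.*?)</Metadata>', re.DOTALL
)


def find_doc_xml(archives_dir, source_name):
    """Return the doc.xml paths whose Source metadata equals source_name."""
    matches = []
    for root, _dirs, files in os.walk(archives_dir):
        if "doc.xml" not in files:
            continue
        path = os.path.join(root, "doc.xml")
        with open(path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
        # A document may carry several Source entries (one per section).
        sources = [value.strip() for value in SOURCE_RE.findall(text)]
        if source_name in sources:
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for found in find_doc_xml(sys.argv[1], sys.argv[2]):
            print(found)
```

Saved under a name of your choosing, it takes the path to the collection's archives folder and the Word file name as its two arguments, and prints each matching doc.xml path.
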
Once you've found the offending doc.xml files you'll be able to narrow
this down to an import or build/run-time problem. If the text of the
Word file *is* present then this is a build or run-time problem (not
very likely). If the text of the Word file is not present in the doc.xml
file then it is an import problem: the Word to HTML conversion has gone
badly (this is not uncommon, since the Word format is so complex), or
the WordPlug has done something wrong. You can try running the wvWare
program on the Word files directly, and see what output you get.
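
The check for whether the text is present can also be scripted. This is a sketch under assumptions, not a definitive test: it assumes the document text sits in <Content>...</Content> elements as escaped HTML, and the function names are invented for this example. Looking at the doc.xml file by eye remains the more reliable check.

```python
import html
import re

# Assumption: document text lives in <Content>...</Content> as escaped HTML.
CONTENT_RE = re.compile(r"<Content>(.*?)</Content>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extracted_text_length(doc_xml_text):
    """Count non-whitespace characters of text found in all Content elements."""
    total = 0
    for body in CONTENT_RE.findall(doc_xml_text):
        # Unescape the stored HTML, then drop the tags to leave plain text.
        plain = TAG_RE.sub(" ", html.unescape(body))
        total += len("".join(plain.split()))
    return total


def diagnose(doc_xml_text):
    """Apply the rule above: text present means build/run time, else import."""
    if extracted_text_length(doc_xml_text) > 0:
        return "text present: look at build or run time"
    return "text missing: look at import (conversion or plugin)"
```

Read a doc.xml file into a string and pass it to diagnose() to get a first hint about which side of the process to investigate.
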

I hope this helps you identify the underlying problem, and we can work
on it from there.

Rich Robinson wrote:
> As a followup to my previous message, actually, only two new
> documents that I added today fail to display the HTML version of the
> Word doc. I'm mystified...does the version of Word have anything to
> do with it? The Word docs themselves download fine.
> Thanks much -
> Rich R.