From | p shanthi |
Date | Wed, 30 Oct 2002 15:55:21 +0530 |
Subject | RE: RE: pdf to html errors |
Hi,

I have created a collection using GSDL 2.38 on an XP system. The main source
files are all in PDF format. The alignment is not good in the converted HTML
files.

I would like to hide the link to the HTML file, but I am not able to. I
inserted

format CL1vlist "<td>[srclink][srcicon][/srclink]</td><td>[title]</td>"

in the collection configuration, but it is not making any difference. I
remember applying the same technique in version 2.37, where it worked fine.

Can you please help me out?

Thanks,
P. Shanthi
Information Services,
Hindustan Lever Research Centre & Unilever Research India,
64 Main Road, Whitefield,
Bangalore, INDIA
Phone No: +91-80 8451505 Extn 121
Fax No: +91-80 8453086

-----Original Message-----
From: Jared Potter [SMTP:jpotter@mercycorps.org]
Sent: Wednesday, October 30, 2002 5:06 AM
To: greenstone@tripath.colosys.net
Subject: RE: pdf to html errors

An easy workaround for the images-of-text problem that John described is to
insert a cover page that has a title and perhaps the author. That gives
Greenstone enough text to at least include the file, even if it won't build a
full-text index of it. To do that, you will need a copy of Adobe Acrobat 5 or
some other similar software.

-jared

Jared Potter
Mercy Corps
Portland, Oregon

-----Original Message-----
From: John R. McPherson [mailto:jrm21@cs.waikato.ac.nz]
Sent: Tuesday, October 29, 2002 12:52 PM
To: greenstone@tripath.colosys.net
Subject: Re: pdf to html errors

Bhakti P Beke wrote:
>
> Hi,
>
> I have also come across a similar problem. I am working on GSDL version
> 2.38. Though I have submitted a query from the Greenstone Support website,
> I am still waiting for the reply.
>
> Bhakti
>
> On Tue, 29 Oct 2002 Marie-Jose Quintard wrote:
> >Good afternoon,
> >
> >I have created a new collection and have imported a few books which are
> >available in PDF format (scanner used = HP9100C digital sender).
> >When I add these books to the new collection, a lot of errors are
> >generated when pdftohtml is executed, and most of these books can't be
> >converted to HTML.
> >
> >Any solutions or suggestions to solve this problem would be welcome.
> >
> >Thanks,

Hi,

It would help if you told us things such as which operating system you are
using and what kind of error messages you are getting. Otherwise it is too
difficult for us to know what is going wrong.

Having said that, we are aware of some problems when converting PDF files on
some versions of Microsoft Windows, although I'm not sure we have come up
with a 100% foolproof workaround yet. Windows NT, 2000 and XP seem not to see
these problems as often as Windows 95 and 98 do.

Also remember that some PDF files don't actually contain text, but contain
images of text, and the text cannot easily be extracted from these files.
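
One rough way to check whether a PDF is of the images-of-text kind before
importing it is to see how much text a converter can pull out of it. The
following is only a minimal sketch under stated assumptions: it uses the
separate pdftotext command-line tool (from Xpdf or Poppler), which must be
installed and on the PATH, and not the pdftohtml converter that Greenstone
runs. The function names and the 20-character threshold are hypothetical
choices for illustration.

```python
import shutil
import subprocess


def looks_image_only(extracted_text, min_chars=20):
    """Return True if the extracted text is too short to be a real text layer.

    min_chars is an arbitrary, assumed threshold; tune it for your documents.
    """
    # Drop all whitespace (including the form feeds pdftotext puts between
    # pages) and count what is left.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars


def extract_pdf_text(pdf_path):
    """Run pdftotext on pdf_path and return whatever text it extracts."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    # Passing "-" as the output file makes pdftotext write to standard output.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling looks_image_only(extract_pdf_text("book.pdf")) on a scanned book
(the file name is a placeholder) would then flag files that probably need
OCR or a cover page before a full-text index is possible. A short but
genuine text document could be flagged as well, so treat the result as a
hint rather than a verdict.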

John McPherson