Re: HTML entities

From Tod Olson
Date Fri, 07 Feb 2003 17:52:43 -0600
Subject Re: HTML entities
In-Reply-To (150BDEEA-3952-11D7-9582-0050E4D04E27-reltech-org)
>>>>> "J" == James R Adair <jadair@reltech.org> writes:

J> I want to display a variety of standard HTML entities in
J> Greenstone. The entity refs (e.g., &uuml;) are in the HTML files
J> in the import directory, but after I run import.pl and buildcol.pl,
J> the resulting files display garbage (in the case of &uuml;, Ã¼). I
J> assume this is some sort of attempt to generate Unicode, but what I
J> really want is just to have the entity pass straight through so
J> that the Web browser can interpret it properly. Any suggestions?

Ã¼ seems likely to be the UTF-8 encoding of ü (bytes 0xC3 0xBC)
displayed as ISO-8859-1. Maybe it's being converted to UTF-8 twice
(I've had that), or maybe the browser is set to use a specific
character set.
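
To make the two possibilities concrete, here is a small Python sketch
(not anything Greenstone itself runs; the function names are made up
for illustration). It shows that a single UTF-8 conversion read as
ISO-8859-1 and a double UTF-8 conversion read as UTF-8 both end up
looking the same on screen.

```python
def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then (wrongly) read the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")


def double_encode(text: str) -> bytes:
    """Convert to UTF-8 twice, as if the first result were Latin-1 input."""
    return misread_as_latin1(text).encode("utf-8")


def repair(garbled: str) -> str:
    """Undo one wrong Latin-1 reading of UTF-8 bytes."""
    return garbled.encode("latin-1").decode("utf-8")


u_umlaut = "\u00fc"  # the character that &uuml; stands for

# Case 1: converted once, but the browser assumes ISO-8859-1.
shown_case_1 = misread_as_latin1(u_umlaut)

# Case 2: converted twice, and the browser correctly assumes UTF-8.
shown_case_2 = double_encode(u_umlaut).decode("utf-8")

print(u_umlaut.encode("utf-8"))   # the two raw bytes
print(shown_case_1, shown_case_2)
print(repair(shown_case_1))
```

Since both cases look identical in the browser, the page alone can't
tell you which one you have; you need to look at the stored bytes.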

Check the imported document in the collection's archives directory.
You can get the document ID from the URL (the "d" parameter, value
probably starts "HASH...") and look that up in archives.inf. That
should let you know whether the problem has to do with importing the
document.
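
The lookup can be scripted along these lines. This is only a sketch
under assumptions: the URL below is a made-up example, I am assuming
that anything after the first "." in the "d" value is a section
suffix, and I am assuming archives.inf holds one whitespace-separated
"ID path" pair per line. Check your own file before relying on it.

```python
from urllib.parse import urlparse, parse_qs


def doc_id_from_url(url):
    """Return the top-level document ID from the "d" parameter, or None."""
    values = parse_qs(urlparse(url).query).get("d")
    if not values:
        return None
    # Assumption: "HASH....2.1" means section 2.1 of document "HASH...".
    return values[0].split(".", 1)[0]


def find_in_archives_inf(lines, doc_id):
    """Return the path listed for doc_id, or None if it is not listed.

    Assumes each line looks like "<ID><whitespace><path>".
    """
    for line in lines:
        fields = line.split()
        if fields and fields[0] == doc_id:
            return fields[1] if len(fields) > 1 else ""
    return None


# Hypothetical URL and file contents, for illustration only.
url = "http://localhost/gsdl/cgi-bin/library?a=d&c=demo&d=HASH01ab23cd.2.1"
sample_lines = ["HASH01ab23cd\tHASH01ab.dir/doc.xml\n"]

doc_id = doc_id_from_url(url)
print(doc_id, find_in_archives_inf(sample_lines, doc_id))
```

With a real collection you would pass the open archives.inf file in
place of sample_lines, then open the file it points to and look at
the bytes around the problem character.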

Also, look at what character set your browser is using and check
whether that makes sense wrt the charset settings in main.cfg.
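
If you save the HTTP Content-Type header and the page source, a
sketch like the following can pull out the charset each one declares
so you can compare them with what you expect from main.cfg. The
helper names and sample strings are mine, and the regular expression
is a rough heuristic, not a real HTML parser.

```python
import re
from email.message import Message

# Rough pattern for charset=... inside a <meta> tag.
META_RE = re.compile(
    r'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the lowercased charset of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(html):
    """Return the lowercased charset declared in a <meta> tag, or None."""
    match = META_RE.search(html)
    return match.group(1).lower() if match else None


# Made-up sample values, for illustration only.
header = "text/html; charset=UTF-8"
page = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'

print(charset_from_header(header), charset_from_meta(page))
```

A mismatch between the two, as in the sample values, is one way to
end up with the kind of garbage described above.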

These have both helped me track down character set issues in the past.

Tod A. Olson <tod@uchicago.edu>
Programmer / Analyst
The University of Chicago Library

"How do you know I'm mad?" said Alice.
"If you weren't mad, you wouldn't have come here," said the Cat.