Volume 3: No. 16

Conrad F. Sabourin at UMontreal has an NLP bibliographic database with 67K references (more than 13K related to AI) indexed by 3,400 keywords. Compilation has taken 15 years. Thematic subsets will soon be published in hardcopy, but there's still time if you'd like to send in your own list of publications. Citations include: NL interfaces, 3K entries; text understanding, 4K; parsing, 7K; computational morphology, 2K; text generation, 2K; speech analysis and synthesis, 3K; speech understanding, 3K; text information extraction, 2K; full-text information retrieval, 3K; computer translation, 7K; linguistics, 3K; psycholinguistics, 1.6K; literary computing, 3K; statistical linguistics, 2.4K; computer-assisted language teaching, 5.5K; electronic document processing, 2.3K; computational lexicography, 3K; OCR, 3K; character-level processing, 2K; computer-mediated communication, 2K; and corpus-based dialect study, 1K. P.O. Box 187, Snowdon, Montreal, Quebec, H3X 3T4. [LINGUIST, 4/16/93.]

Mark A. Mandel has compiled a bibliography of dictionaries and reference works that compare US and British pronunciation and spelling. [LINGUIST, 3/16/93.]

Grady Ward is revising his for-profit Moby lexical databases. Contact him during April if there are features you'd like to see incorporated. [4/11/93.]

The Masterpiece Library CD-ROM from UUtah contains 1300 public-domain texts and is planned for release at $39.95. Browsing and search software for Macs and PCs is included. To volunteer as a reviewer, contact Pacific HiTech, (800) 765-8369, (801) 278-2666 Fax. [Cliff Miller, LINGUIST, 3/31/93.]

Project Gutenberg provides machine-readable texts such as the Bible, Koran, CIA World Factbook, Anne of Green Gables, presidential speeches, etc. Contact Michael Hart, or subscribe to the Project Gutenberg e-conference by sending a "sub gutnberg your name" message to listserv@uiucvmd. [Ellie Fogarty, VPIEJ-L, 3/19/93.]

RLIN users can access Rutgers' CETH-managed inventory of machine-readable texts in the humanities with RLIN's SEL FIL MDF database-selection command. [James O'Donnell, HUMANISTS, 4/9/93.]

Common Lisp source code for the Xerox part-of-speech tagger, tagger-1-0.tar.Z, is available for FTP from pub/tagger on [Doug Cutting, LINGUIST, 4/16/93.]

LEXA, a set of MS-DOS programs for lexical data processing, is available from the Norwegian Computing Centre for the Humanities for about $100. The 40+ programs (with a common GUI) carry out lexical analysis and information retrieval for linguistic investigation of text corpora. LEXA was written by Raymond Hickey of UMunich. It comes as 4 diskettes with a 3-volume manual of 750 pages. Post a "send icame" message to, or use FTP or Gopher to get the file from directory icame on [Knut Hofland, Humanist, 3/23/93.]

E'ci is a PC/Mac program that sorts and optionally responds to mail based on keywords in the first 60 lines. Send a message to and ask E'ci -- in your own words -- to introduce herself. "She" will return a file with suggestions for use. [Laszlo G. Boros (boros@ohstmvsa.bitnet), PACS-L, 4/8/93.]
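
The keyword sorting described above might look roughly like the following Python sketch. It is not E'ci's actual code: the folder names, the keyword table, and the function name `classify_message` are all hypothetical, and only the 60-line window comes from the description.

```python
# Hypothetical keyword table: folder name -> keywords that route a message there.
RULES = {
    "jobs": ["position", "vacancy", "salary"],
    "conferences": ["call for papers", "workshop", "symposium"],
}

SCAN_LINES = 60  # only the first 60 lines are examined, as in the description


def classify_message(text, rules=RULES, scan_lines=SCAN_LINES):
    """Return the folder whose keywords occur most often in the first lines.

    Falls back to "inbox" when no keyword matches at all.
    """
    head = "\n".join(text.splitlines()[:scan_lines]).lower()
    best_folder, best_hits = "inbox", 0
    for folder, keywords in rules.items():
        hits = sum(head.count(keyword) for keyword in keywords)
        if hits > best_hits:
            best_folder, best_hits = folder, hits
    return best_folder
```

An automatic reply could then be chosen per folder. Counting raw substring hits is the simplest possible rule and is used here only for illustration.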

Need to convert word-processor text to ASCII? The TO5 program in TXTOUT13.ZIP is very good for WordPerfect 5.0 conversions on PCs. FTP it (binary mode) from pub/msdos/txtutl on, or from and other Simtel20 mirrors. Other files there convert to and from Unix format (CRLF15B.ZIP, LFCRLF11.ZIP), WordStar (F-11.ZIP, UNSOFT11.ZIP, WSASCII.ZIP, WSCONV.BAS, WSX21A.ZIP), and EBCDIC, ISO, or arbitrary character sets (CTRANS15.ZIP). [Elliott Parker, CARR-L, 3/28/93.]
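
For readers without those utilities, the two simplest conversions mentioned above (MS-DOS versus Unix line endings, and EBCDIC to plain text) can be sketched in a few lines of Python. This is an illustration only, not the behavior of the listed ZIP files. The function names are hypothetical, and `cp037` (US/Canada EBCDIC) is an assumed code page; other EBCDIC variants need a different codec.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Convert MS-DOS line endings (CR LF) to Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Convert Unix line endings to MS-DOS ones without doubling existing CRs."""
    return crlf_to_lf(data).replace(b"\n", b"\r\n")


def ebcdic_to_text(data: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes to a string; cp037 is an assumed code page."""
    return data.decode(codec)
```

Working on bytes rather than text is deliberate: it matches the advice to transfer such files in binary mode, so that nothing rewrites the line endings along the way.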

The forthcoming Release BZ of the electronic journal SCHOLAR will include notes on Project Xanadu, the British National Corpus (1M words), a new catalog for the Oxford Text Archive, humanities conference and resource announcements, and several linguistics job openings. For access, send a "sub scholar your name" message to [Joe Raben, LINGUIST, 4/14/93.]

The Chinese Knowledge Information Processing Group (CKIP) at Academia Sinica is offering four frequency counts from a Chinese newspaper corpus of 20M characters (9.5M words after segmentation). There were 4,666 distinct characters. The most common 14,956 words constitute more than 99.9995 percent of the corpus. 19,907 verbs and 21,368 nouns appear more than twice. The reports run 30-300 pages and cost $5-$20 each. Contact Miss Tsai Shu-hui of the Computational Linguistic Society of the R.O.C. (ROCLING) at, 886-2-788-1638, 886-2-788-1638 Fax. [LINGUIST, 3/16/93.]
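
Statistics of the kind quoted above (how much of a corpus the N most common words cover, and how many words occur more than twice) are simple to compute once a text has been segmented. The Python sketch below assumes the tokens are already available as a list. It does not reproduce CKIP's segmentation or data, and the function names are hypothetical.

```python
from collections import Counter


def coverage(tokens, top_n):
    """Fraction of all tokens accounted for by the top_n most frequent types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / total


def types_above(tokens, min_count):
    """Number of distinct types that occur more than min_count times."""
    return sum(1 for freq in Counter(tokens).values() if freq > min_count)
```

With the toy token list `"a a a b b c".split()`, `coverage(tokens, 1)` is 0.5 because the single most common type supplies three of the six tokens, and `types_above(tokens, 2)` is 1.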

Need a memorable password or a new host name? Here are a few hand-picked from 3rd-order Markov output: accomis, actir, adilint, apide, argenic, arnic, aroxide, auding, audivist, barm, bartive, chandivity, charide, chatory, coluted, contomic, cycleus, dion, ditiples, enic, firmas, forund, grevigy, haride, hydrocaric, hydroke, hydrome, hydropod, hydrot, hythar, hythemiatic, impaltic, implect, inalus, inertion, isolorte, istron, lumbin, lutrumber, matebroge, maton, mearboid, menal, mulfure, neumicide, numassy, orgent, ortic, oxentic, palum, paric, parlig, parivis, partil, pervatis, poider, pronsis, propoilic, quarse, rachat, ractic, senal, solrin, sumad, themetry, umberache, umisian, vache, varide, volring, watic, zintal.
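
A generator for words like these can be sketched in Python as below. The sketch assumes that "3rd-order" means each letter is chosen according to the three letters before it. The training list is a tiny hypothetical one, the names `train` and `generate` are illustrative, and the program that produced the list above is not shown in the source.

```python
import random
from collections import defaultdict

ORDER = 3                # letters of context; the assumed meaning of "3rd-order"
START, END = "^", "$"    # padding symbols, assumed not to occur in the words


def train(words, order=ORDER):
    """Map each context of `order` characters to the letters seen after it."""
    model = defaultdict(list)
    for word in words:
        padded = START * order + word.lower() + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate(model, rng, order=ORDER, max_len=12):
    """Sample one pseudo-word by walking the chain from the start context."""
    context, letters = START * order, []
    while len(letters) < max_len:
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        letters.append(nxt)
        context = context[1:] + nxt
    return "".join(letters)


# Tiny hypothetical training list; a real run would use a large dictionary.
WORDS = ["hydrogen", "hydroxide", "chloride", "argentic", "chromatic",
         "inertia", "insertion", "isolate", "partial", "particle"]

model = train(WORDS)
rng = random.Random(0)   # fixed seed so that runs are repeatable
samples = [generate(model, rng) for _ in range(5)]
```

With so few training words most samples simply reproduce an input word. Novel blends appear once the word list is large enough for contexts to be shared, and the output still has to be hand-picked, as it was above.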

If you prefer French: anire, artait, avansur, baser, auculecasse, cereducula, chaqu'ent, chaquest, comble, d'accera, d'expelie, danductris, danien, deserce, doperside, espassage, esprommece, etteme, exisurs, fainution, ficuperce, foriquoire, fortions, inutentien, kaus, laintiela, lasseur, leute, logenotil, lormalere, maqis, mativeral, motique, nouscie, onsiqu'ent, opposigne, ouvoiten, peutique, phestil, phypotion, prempla, priqu'es, psychaques, puration, qualimecme, rementrel, reutemes, sciele, serniquis, sondiation, sustibles, tourelaise, toutalors, unermus. [Robert S. Fritzius, 12/29/92.] (Is there an expert system or network for recognizing likely words? For cryptanalysis, perhaps, or data entry validation, or recovering damaged files?)