Expanding Access to Science and Technology (UNU, 1994, 462 pages)
Session 4: Intelligent access to information: Part 2
The Japan Information Center of Science and Technology (JICST) is translating abstracts of scientific and technical papers from Japanese into English using an improved version of the Mu machine translation system, which we developed in 1985. Sentences in abstracts are comparatively long, and their structures tend to be very complex. A certain amount of pre-editing is therefore performed on the original Japanese abstracts, and the translated English abstracts are post-edited as well. Pre-editing and post-editing are complementary: when heavy pre-editing is done, the post-editing is light, and vice versa. JICST is currently measuring which balance of pre-editing and post-editing is best in terms of overall cost.
The original system had a dictionary of about 70,000 words. Of these, about 20,000 were common words and the rest were terms from computer science and electrical engineering. This dictionary was far from sufficient for JICST, which had to translate abstracts from a wide range of scientific and technical fields. JICST therefore expanded the vocabulary to 200,000 words, which improved the translation rate, though not to a satisfactory level. Only when the vocabulary reached 540,000 words was the target translation rate achieved.
Measuring translation quality at JICST is very difficult because both pre-editing and post-editing are performed to bring the output to a quality acceptable to general readers. JICST reports that the cost of machine translation, including pre-editing and post-editing, is about 60 per cent of that of human translation, and that translation speed has improved significantly.
Many private companies are involved in machine translation. Although they do not yet make a profit from it, they invest heavily because they regard natural language processing as a key technology for the future information society. The systems listed below are typical of those in Japan; some other companies are developing small systems for personal computers. Systems marked with an asterisk are R&D systems.
ALT J/E: Information Processing Laboratories, NTT*
AS-TRANSAC: Toshiba Corp.
ATLAS-I: Fujitsu, Ltd.
ATLAS-II: Fujitsu, Ltd.
DUET-E/1: Sharp Corp.
HICATS E/J: Hitachi, Ltd.
HICATS J/E: Hitachi, Ltd.
KAN-TRAN III: Carozelia Japan*
LAMB: Canon Inc.*
MELTRAN-J/E: Mitsubishi Electric Corp.
MU: Kyoto University*
MU2: JICST
PAROLE: Matsushita Electric Industrial Co., Ltd.
PENSEE, PENSEE II: Oki Industrial Co., Ltd.
PIVOT: NEC Corp.
RMT E/J: Ricoh Co., Ltd.
SHALT: IBM Japan, Ltd.
SHALT/JETS: IBM Japan, Ltd.
STAR: CATENA Resource (on Unix)
SYSTRAN: Systran
THE TRANSLATOR: CATENA Resource (on Macintosh)
Translation Word Processor SWP-7800: Sanyo Electric Co., Ltd.
UNNAMED: Nippon-Data General Corp.*
Two or three nationwide online machine translation services are in commercial use over computer networks.