
ARPA Image Understanding Workshop, pages 817-826, 1994.

Document Understanding Research At Maryland

David Doermann
Document Processing Group, Center for Automation Research, University of Maryland, College Park, MD 20742-3275

Abstract

Research in the Computer Vision Laboratory at the University of Maryland covers a broad range of topics and applications related to visual processing. This paper highlights some recent and ongoing research in the area of Document Understanding. The topics addressed include page segmentation, page decomposition, stroke interpretation, logo recognition, forms processing and the generation of synthetic data.

1 Introduction

The goal of document understanding has evolved far beyond the age-old task of character recognition. The term understanding suggests that the problem involves a great deal more than the classification of a finite set of symbols. In many cases, the visual and perceptual processes required to understand complex documents can parallel those necessary to process more general classes of scenes.
Document understanding research at the University of Maryland focuses on applying nontraditional approaches to the processing of handwritten as well as printed documents. Many of the techniques used are motivated by general image understanding and computer vision problems. Such approaches have shown promise for a variety of difficult document understanding tasks.
One of the general characteristics of our approach is an attempt to circumvent the traditional "staged" approach to analysis and treat document understanding in terms of cooperation between high- and low-level processes. In simple cases, this may involve only providing feedback mechanisms, but it may also involve computing confidence measures and dealing with multiple interpretations.
We have also found that in many applications it is desirable to avoid traditional "threshold and thin" approaches which destroy information. Such applications benefit from deriving high level constraints prior to lower level detailed processing and building a representation general enough to support multiple alternatives at higher levels.
In this paper we review several areas of our research, including page decomposition, segmentation and semantic analysis, document recovery and its applications, and the generation of synthetic data.

2 Page Decomposition

The structural analysis of documents involves the derivation of the logical or semantic meaning of a set of salient fields or regions within a document. For example, if we are given a newspaper page, the structural analysis may involve using the layout to identify headlines, locate bylines, group paragraphs from different columns which belong to the same article, or associate a picture with the article which references it and the photographer who took it. In general the problem involves using attributes and structural relationships of the document to label document components within the contextual rules dictated by the document class or type (memo, letter, journal article, newspaper, etc.). Our ability to label these components in a meaningful way is due, in part, to our ability to understand the functionality of the document. By knowing the intent of the document, we associate the document with a document class and invoke a model space which defines a general description of which types of components we expect to find and how they may be arranged. We then attempt to instantiate the model and label components in a way which is consistent with the model expectations. If the class of documents is known, the interpretation is constrained by the layout characteristics which make the document an instance of that class. Unfortunately, the class is not always known a priori to an automated system, and must be inferred from the document, in