ARPA Image Understanding Workshop, pages 817-826, 1994.
Document Understanding Research At Maryland
David Doermann
Document Processing Group, Center for Automation Research
University of Maryland, College Park, MD 20742-3275
Abstract
Research in the Computer Vision Laboratory at the University of Maryland covers a broad range of topics and applications related to visual processing. This paper highlights some recent and ongoing research in the area of Document Understanding. The topics addressed include page segmentation, page decomposition, stroke interpretation, logo recognition, forms processing and the generation of synthetic data.
1 Introduction
The goal of document understanding has evolved far beyond the age-old task of character recognition. The term understanding suggests that the problem involves a great deal more than the classification of a finite set of symbols. In many cases, the visual and perceptual processes required to understand complex documents can parallel those necessary to process more general classes of scenes.
Document understanding research at the University of Maryland focuses on applying nontraditional approaches to the processing of handwritten as well as printed documents. Many of the techniques used are motivated by general image understanding and computer vision problems. Such approaches have shown promise for a variety of difficult document understanding tasks.
One of the general characteristics of our approach is an attempt to circumvent the traditional "staged" approach to analysis and treat document understanding in terms of cooperation between high- and low-level processes. In simple cases, this may involve only providing feedback mechanisms, but it may also involve computing confidence measures and dealing with multiple interpretations.
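As an illustration only, the following minimal sketch shows one way a low-level process might pass several scored interpretations upward so that a high-level constraint can choose among them. It is not the method used in the work described here: the candidate labels, the confidence values, the lexicon, and the function name interpret are all hypothetical, and multiplying per-character confidences is an assumed scoring rule.

```python
from itertools import product

# Hypothetical low-level output: for each character position, several
# candidate labels with confidence scores (illustrative values only).
candidates = [
    [("c", 0.6), ("e", 0.4)],
    [("a", 0.5), ("o", 0.5)],
    [("t", 0.7), ("l", 0.3)],
]

# Hypothetical high-level knowledge: a small lexicon of valid words.
lexicon = {"cat", "eat", "col"}


def interpret(candidates, lexicon):
    """Return lexicon-consistent readings ranked by joint confidence."""
    readings = []
    # Keep every combination of low-level alternatives instead of
    # committing to the single best label at each position.
    for combo in product(*candidates):
        word = "".join(label for label, _ in combo)
        score = 1.0
        for _, conf in combo:
            score *= conf  # assumed rule: product of confidences
        # The high-level constraint prunes readings the lexicon rejects.
        if word in lexicon:
            readings.append((word, score))
    return sorted(readings, key=lambda r: r[1], reverse=True)


ranked = interpret(candidates, lexicon)
for word, score in ranked:
    print(f"{word}: {score:.2f}")
```

Because the low-level stage does not commit early, readings that are locally weaker but globally consistent remain available to the higher level.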
We have also found that in many applications it is desirable to avoid traditional "threshold and thin" approaches, which destroy information. Such applications benefit from deriving high-level constraints prior to lower-level detailed processing and building a representation general enough to support multiple alternatives at higher levels.
In this paper we review several areas of our research, including page decomposition, segmentation and semantic analysis, document recovery and its applications, and the generation of synthetic data.
2 Page Decomposition
The structural analysis of documents involves the derivation of the logical or semantic meaning of a set of salient fields or regions within a document. For example, if we are given a newspaper page, the structural analysis may involve using the layout to identify headlines, locate bylines, group paragraphs from different columns which belong to the same article, or associate a picture with the article which references it and the photographer who took it. In general the problem involves using attributes and structural relationships of the document to label document components within the contextual rules dictated by the document class or type (memo, letter, journal article, newspaper, etc.). Our ability to label these components in a meaningful way is due, in part, to our ability to understand the functionality of the document. By knowing the intent of the document, we associate the document with a document class and invoke a model space which defines a general description of which types of components we expect to find and how they may be arranged. We then attempt to instantiate the model and label components in a way which is consistent with the model expectations. If the class of documents is known, the interpretation is constrained by the layout characteristics which make the document an instance of that class. Unfortunately, the class is not always known a priori to an automated system, and must be inferred from the document, in