
1994 International Conference on Pattern Recognition

Page Segmentation Using Decision Integration and Wavelet Packets

Kamran Etemad†‡, David Doermann† and Rama Chellappa†‡

†Document Processing Group, Center for Automation Research

‡Department of Electrical Engineering

University of Maryland

College Park, MD 20742


A new algorithm for layout-independent document page segmentation is suggested. Text, image and graphics regions in a document image are treated as three different "texture" classes. Soft local decisions on small blocks are made using wavelet packet based feature vectors. Segmentation is performed by propagating and integrating soft local decisions over neighboring blocks, within and across scales. The "uncertainties" associated with local decisions are reduced as more contextual evidence is incorporated in the process of decision integration. The majority, taken over weighted combined votes, determines the final decision. The suggested algorithm is based on parallel independent computations which have low complexity. It can also be applied to other signal and image segmentation tasks.

1 Introduction

Recent advances in information and communications technologies have increased the need for, and therefore the interest in, automated reading and processing of documents. Efficient storage and transmission of documents, as well as archiving and information retrieval for document databases and "digital libraries", have become important research issues. For coding or understanding document images it is essential to identify text, image and graphics regions, as physical segments of the page, in order to be able to process them properly. Therefore the task of document page segmentation is one of the primary stages of most document processing systems. In this paper we describe a new method of layout-independent physical page segmentation, in which few assumptions are made about the document's textual and graphical attributes or layout structure. The system is designed in such a way that, as hypotheses about document components are made and verified, more domain-specific post-processing can occur.

The support of this research by the Advanced Research Projects Agency (ARPA Order No. A550), under contract MDA 9049-3C-7217, is gratefully acknowledged.

Other methods of page segmentation and layout analysis described in the literature can be broadly classified into bottom-up and top-down approaches. Bottom-up techniques such as connected components [1] start from the pixel level and merge regions into progressively larger components (e.g. characters, words, text lines, paragraphs). In top-down methods the page is first split into blocks, and these blocks are identified and subdivided appropriately. For example, one might first locate columns and then split them into paragraphs, text lines, or even words. Examples of top-down methods include recursive projection profile cuts [2,3] and run-length smoothing or constrained run-length algorithms [4]. There are also hybrid methods that combine the top-down and bottom-up approaches. In some approaches, after detecting major blocks, simple statistical tests classify them as text or non-text regions [3,5]. Black pixel density, black/white ratio or transitions, average vertical or horizontal run length, and cross-line correlations [5] are some of the features used in these post-classification stages.
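To make the post-classification features mentioned above concrete, the following is a minimal sketch of how three of them (black pixel density, horizontal black/white transitions, and average horizontal black run length) might be computed for one binary block. The function name `block_features`, the dictionary keys, and the convention that 1 denotes a black pixel are assumptions made for illustration; they are not taken from the cited methods [3,5].

```python
def block_features(block):
    """Simple statistics for a binary block given as rows of 0/1 values.

    Assumed convention (illustrative only): 1 = black pixel, 0 = white pixel.
    """
    n_pixels = sum(len(row) for row in block)
    n_black = sum(sum(row) for row in block)

    transitions = 0  # black/white changes between horizontal neighbors
    black_runs = 0   # number of maximal horizontal runs of black pixels
    for row in block:
        prev = 0  # treat the position left of each row as white
        for i, pix in enumerate(row):
            if i > 0 and pix != row[i - 1]:
                transitions += 1
            if pix == 1 and prev == 0:
                black_runs += 1
            prev = pix

    return {
        "black_density": n_black / n_pixels if n_pixels else 0.0,
        "h_transitions": transitions,
        "mean_h_black_run": n_black / black_runs if black_runs else 0.0,
    }


if __name__ == "__main__":
    sample = [
        [0, 1, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
    ]
    print(block_features(sample))
```

A classifier built on such features would still need thresholds or training data chosen for a particular document class, which is one way the dependence on prior assumptions enters these approaches.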

Most of these techniques rely on prior knowledge or assumptions about generic document layout structure and textual and graphical attributes. Utilizing such knowledge results in simple, elegant and efficient page decomposition systems but also limits the range of applicability of the algorithm. Noise and degradations, multi-directional and curved text lines, touching or overlapping components, gray level or inverted text, complex layout structures, and differences in language, font size and other textual attributes are among