| ![]() | |||||||||
SPATIAL JOIN PROCESSING AND DATA
PLACEMENT IN PARALLEL SPATIAL
DATABASES
Nick Koudas Kenneth C. Sevcik
Computer Systems Research Institute
University of Toronto
Toronto, Ontario
Canada
ABSTRACT
In this paper we study the problem of computing spatial joins in a Shared Nothing parallel data base architecture. We propose an algorithm to perform the join of two or more spatial data sets. The proposed algorithm is general and is implemented on commodity hardware using a message passing interface. Unlike previous work, we do not make the restrictive assumption that data can fit in main memory and we do not make use of operations special to a particular model of parallel computation. The proposed algorithm exploits properties of spatial joins and makes use of the idea of 'bulk internode requests' which effectively reduces both processor and IO time requirements. Next we examine variants of the algorithm and identify performance benefits of different options. Our experimental results are based on real data sets for a wide range of parameters. Finally, we study the performance implications of different page striping policies on the proposed algorithm. We divide striping policies into two categories, namely two-dimensional,
which involve alternating between dimensions and one-dimensional which involve a dimension at a time. We present a detailed performance evaluation of policies from both categories involving both real and synthetic data sets.
Our conclusions are that two-dimensional striping techniques adapt better to data skew (non uniformity) and consistently outperform one-dimensional techniques. In addition the relative performance of one-dimensional techniques appears very sensitive to the amount of data placed on each node during striping.
Keywords: Parallel data bases, spatial access methods, shared nothing architecture.
1 INTRODUCTION
Parallel Data Base Systems have been a topic of active research for the last decade [DG92]. During these years there have been numerous contributions in areas like Data Placement [GD90],[CABK88] and Parallel Database Algorithms [SD90], [SD89].