Issues in the Design and Implementation of User-Level DMA
Evangelos P. Markatos Manolis G.H. Katevenis
George Kalokerinos Gregory Maglis
George Milolidakis Thanos Oikonomou
Institute of Computer Science (ICS)
Foundation for Research & Technology - Hellas (FORTH)
P.O.Box 1385, Science and Technology Park of Crete,
Heraklion, Crete, GR-711-10 GREECE
markatos@ics.forth.gr
Technical Report 182, ICS-FORTH
URL: http://www.ics.forth.gr/proj/arch-vlsi/telegraphos.html
1 Introduction
The goal of several current supercomputing projects is to demonstrate supercomputer performance at workstation cost. Such a supercomputer is built by interconnecting a set of high-performance workstations through a high-speed SCI (Scalable Coherent Interface) interconnect. Host workstations are connected to the network through Dolphin's PCI-SCI interface [5], which plugs into the PCI I/O bus of the workstation.
This interface has been carefully designed to achieve supercomputer-like communication performance over a network of workstations (NOW). To achieve very low message-passing latency, the interface implements a remote-write operation (also called direct deposit), which is initiated by an ordinary store instruction to a non-local memory location. Using the remote-write primitive, a processor can write a message directly into its destination memory with regular store instructions to non-local memory locations. For example, suppose that a processor holds a single-word message in variable source and wants to send it to a remote processor that will store it in variable destination. The sending processor can send the message by executing a single assignment statement:
destination := source ;
Most compilers will translate the above assignment statement into a two-instruction sequence:
LOAD Register1 FROM source address ;
STORE Register1 TO destination address ; // remote write
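The remote-write path can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the PCI-SCI interface's actual programming model: the pointer `remote_window` stands in for a hypothetical address range that the interface maps onto another node's memory, and the function names `remote_write_word` and `remote_write_message` are illustrative only. Any ordinary buffer can be passed in place of the mapped window, so the sketch runs on a single machine. The `volatile` qualifier makes the compiler emit every store, as it must for memory-mapped hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: send a single-word message by remote write.
 * On SCI hardware, `destination` would point into a remotely mapped
 * window; in this sketch it may point at any local variable. */
static void remote_write_word(volatile uint32_t *destination,
                              const uint32_t *source)
{
    uint32_t reg = *source;  /* LOAD Register1 FROM source address     */
    *destination = reg;      /* STORE Register1 TO destination address */
}

/* Hypothetical helper: send an n-word message as a sequence of remote
 * writes. The processor issues one store per word and stays busy for
 * the entire transfer. Returns the number of stores issued. */
static size_t remote_write_message(volatile uint32_t *remote_window,
                                   const uint32_t *msg, size_t nwords)
{
    assert(msg != NULL || nwords == 0);
    size_t stores = 0;
    for (size_t i = 0; i < nwords; i++) {
        /* volatile: each store is emitted; none are merged or elided */
        remote_window[i] = msg[i];
        stores++;
    }
    return stores;
}
```

The loop in `remote_write_message` issues one store per word, so the processor is occupied for the whole transfer; this per-word cost is what makes long sequences of remote writes expensive for large messages.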
The LOAD instruction fetches the single-word message into a processor register, and the STORE instruction sends the message directly to its destination; this STORE is the remote-write operation. Although remote writes achieve very low latency for short messages, sending a large message as a long sequence of remote writes is expensive, because the processor must issue one store per word and remains busy for the entire transfer.¹ To overcome this problem, the PCI-SCI interface (like similar high-speed interfaces) provides a DMA operation. The DMA engine transfers a large chunk of data from the host computer's main memory into the network interface, and from there into the SCI network, without keeping the host processor busy during the transfer. The host processor is needed only to initialize the DMA transfer and to be notified of its completion; during the transfer, it is free to do other useful work. Besides freeing the processor, DMA transfers generate little memory-bus traffic, since they transfer data directly from the
¹ To draw an analogy from real life, consider the fax. The fastest way to send a short document is probably to fax it to the recipient. When faxing a document, the sender essentially deposits the information directly in the receiver's office (much like a remote-write or direct-deposit operation). However, sending long documents (e.g., several books) by fax is not a good idea, as it results in slow and expensive data transfer.