Dynamic Parallel Schedules (DPS) is a framework for developing parallel applications
on distributed memory computers, such as clusters of PCs or workstations. DPS uses a high-level
graph-based application description model, allowing for complex application designs. DPS also
provides a dynamic application execution environment, allowing for malleable and
fault-tolerant applications. The framework is provided as an
open-source (GPL), cross-platform (Linux, Solaris, Windows, etc.) C++ library, which allows
DPS applications to run on a wide range of heterogeneous clusters.
Some aspects of DPS application development and execution are highlighted below:
High-level: DPS describes applications as directed acyclic flow graphs
composed of user-defined operations.
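The flow-graph idea can be sketched in plain C++. This is an illustrative toy, not the actual DPS API: the names FlowGraph, Op, and run are invented for the example, which only shows how a directed acyclic graph of user-defined operations can be driven by passing each operation's output along its outgoing edges.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch of a flow graph: nodes are user-defined operations,
// edges carry data values. Illustrative only, not the DPS API.
struct FlowGraph {
    using Op = std::function<int(int)>;
    std::map<int, Op> ops;                 // node id -> operation
    std::map<int, std::vector<int>> succ;  // node id -> successor node ids

    // Run the graph from a source node, feeding each operation's result
    // to its successors; assumes the graph is acyclic (a DAG).
    int run(int source, int input) {
        std::queue<std::pair<int, int>> work;  // (node, value) tokens
        work.push({source, input});
        int last = input;
        while (!work.empty()) {
            auto [node, value] = work.front();
            work.pop();
            last = ops[node](value);
            for (int s : succ[node]) work.push({s, last});
        }
        return last;  // value produced by the final operation
    }
};
```

A real flow graph would also carry typed data objects between operations and distribute them across nodes; the sketch only captures the DAG-of-operations structure.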
Compositional: DPS allows arbitrary nesting of flow graphs, even across application
boundaries. This enables the creation of complex applications and of parallel components.
Parallel components can be exported and used by multiple applications simultaneously.
Pipelined: DPS applications are pipelined and multithreaded by construction,
allowing maximal overlap of computation and communication.
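The overlap can be illustrated with a small two-stage pipeline in standard C++ threads. This is a sketch of the general technique, not DPS code: one thread plays the role of a user-defined compute operation while a second thread concurrently "transmits" finished results, so result i is sent while result i+1 is still being computed.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Illustrative two-stage pipeline (not DPS code): a compute thread and a
// "communication" thread run concurrently over a shared FIFO channel.
std::vector<int> pipeline(const std::vector<int>& inputs) {
    std::queue<int> channel;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    std::vector<int> sent;

    std::thread compute([&] {
        for (int x : inputs) {
            int r = x * x;  // stand-in for a user-defined operation
            { std::lock_guard<std::mutex> lk(m); channel.push(r); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
    });

    std::thread communicate([&] {
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !channel.empty() || done; });
            if (channel.empty() && done) break;
            sent.push_back(channel.front());  // stand-in for a network send
            channel.pop();
        }
    });

    compute.join();
    communicate.join();
    return sent;
}
```

In a distributed setting the second stage would be an actual message send to another processing node, which is where the overlap pays off.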
Dynamic: Flow graphs and the mapping of operations to processing nodes are specified
dynamically at runtime. Flow graphs can be invoked at any time.
Malleable: The mapping of operations to processing nodes can be modified during execution.
Processing nodes can be added or removed. Computations can be moved from one processing node to
another, e.g. for load balancing.
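Malleability rests on the mapping of operations to processing nodes being ordinary runtime data that can be rewritten while the application runs. The following is a hypothetical illustration of that idea, not the DPS API; the Mapping type and migrate function are invented for the example.

```cpp
#include <map>
#include <string>

// Hypothetical: the operation-to-node mapping as plain runtime data.
using Mapping = std::map<std::string, int>;  // operation name -> node id

// Reassign every operation currently on from_node to to_node, e.g. for
// load balancing or after removing a node; returns how many moved.
int migrate(Mapping& mapping, int from_node, int to_node) {
    int moved = 0;
    for (auto& [op, node] : mapping) {
        if (node == from_node) {
            node = to_node;
            ++moved;
        }
    }
    return moved;
}
```

A real runtime would additionally transfer each operation's state to the new node before resuming it; the sketch shows only the remapping step.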
Analysis tools: DPS comes with a rich set of tools for monitoring processing nodes,
launching parallel applications, and performing post-mortem trace analysis.
Fault tolerant: In case of processing node failure, DPS automatically recovers the
state of the parallel application and continues its execution on a reduced set of processing nodes.
DPS can handle multiple successive and/or simultaneous failures during a single execution. The runtime
overheads caused by the fault-tolerance mechanism are low (typically below 5%) thanks to DPS's pipelining
and automatic overlapping of computations and communications.