OFI/LLC is a communication library designed for high performance computing. It sits on top of network hardware drivers and provides communication functions to applications and parallel language runtimes. It has the following goals.
- Provide scalable application performance by exploiting new technologies at every layer, i.e., OS, processor, and network hardware
- Support new programming models requiring inter-job/inter-domain communication
- Provide portability across different supercomputer architectures through a unified interface
Scalable communication faces four important issues as supercomputers grow to exascale and beyond. The first issue is that communication latency can take a substantial portion of application execution time. The second issue is that per-node memory consumption of the communication library can grow to the order of gigabytes. The third issue is that new programming models with new communication patterns have been introduced. Figure 1 shows a data assimilation application that uses inter-job communication: the simulation job communicates with the data assimilation job. Figure 2 shows a genome analysis application that uses inter-domain communication: a genome sequencer residing in a facility remote from the supercomputer site generates data and sends it to the calculation job running at the supercomputer site.
|Figure 1: A data assimilation application with inter-job communication.|
|Figure 2: A genome analysis application with inter-domain communication.|
The fourth issue is that a wide range of supercomputer architectures must be supported.
In response to this situation, we propose a new low-level communication library called LLC, which addresses the four issues with four corresponding solutions. The first solution is to exploit new features available in the OS, processor, and network hardware. For example, the Tofu interconnect allows the communication library to choose a DMA engine, which fetches the payload from memory and injects it into the local router, so as to avoid resource contention among communication flows. The second solution is to dynamically acquire and release the memory areas that hold the communication state of endpoint pairs, and to share those memory areas among processes on one node. The third solution is to provide an interface whose abstraction makes inter-job/inter-domain communication easy to describe: communication participants belonging to different network domains or jobs are represented in the same way as those belonging to the same network domain or the same job. The fourth solution is to use a unified interface, under which OS-specific and hardware-specific optimizations are hidden to reduce the variations.
Figure 3 shows the relationships among applications, parallel language runtimes (e.g. MPI and PGAS), OFI/LLC, and network drivers. OFI/LLC provides functions to applications and parallel language runtimes, while a network driver provides communication functions to LLC. The interface each layer presents to the layer immediately above it becomes more abstract going up the hierarchy.
|Figure 3: Relationships between OFI/LLC and related software.|
- Processor: x86_64 architecture
- Network hardware: Mellanox ConnectX, ConnectX-2, ConnectX-3, Connect-IB
- OS: CentOS 7.1, 7.2
- License: GPL v2
Masamichi Takagi, Norio Yamaguchi, Balazs Gerofi, Atsushi Hori, Yutaka Ishikawa: “Adaptive transport service selection for MPI with InfiniBand network”, In Proc. ExaMPI workshop held in conjunction with SC, Article No. 3, pp. 1-10, 2015.
Norio Yamaguchi, Masamichi Takagi, Atsushi Hori, Yutaka Ishikawa: “Memory Consumption Reduction in Communication Libraries to Support Massively Parallel Execution”, In IPSJ SIG Notes, 2015-HPC-152(8), pp. 1-10, Dec. 2015, (in Japanese).
Norio Yamaguchi, Masayuki Hatanaka, Masamichi Takagi, Atsushi Hori, Yutaka Ishikawa: “Design and Implementation of a Portable Communication Library for High Performance Computing”, In IPSJ SIG Notes, 2014-HPC-145(15), pp. 1-9, Jul. 2014, (in Japanese).
Masamichi Takagi, Yuichi Nakamura, Atsushi Hori, Balazs Gerofi, Yutaka Ishikawa: “Revisiting rendezvous protocols in the context of RDMA-capable host channel adapters and many-core processors”, In Proc. EuroMPI, pp. 85-90, 2013.