THESIS
τ-Lop: Scalably And Accurately Modeling Contention And Mapping Effects In Multi-Core Clusters
2016-01-29
Tecnología De Computadores, Señales Y Comunicaciones
Computer Technology
ADVISORS
Díaz Martín, Juan Carlos (Advisor)
EXAMINATION COMMITTEE
Carretero Pérez, Jesús (Chair)
Garcia Sanchez, Jose Daniel (Secretary)
González Sánchez, José Luis (Member)
Lastovetsky, Alexey (Member)
Plaza Miguel, Antonio José (Member)
DESCRIPTION
Modern HPC multi-core platforms are complex systems composed of heterogeneous processors and a hierarchy of shared communication channels. Achieving optimal performance of MPI applications on such platforms is not trivial. Formal analysis using parallel performance models helps describe the behavior and communication complexity of algorithms, with the goal of predicting their cost and improving their performance.
Currently accepted communication models, such as the representative LogGP, were originally conceived to predict the cost of algorithms on single-processor clusters as a sequence of point-to-point transmissions characterized by network latency and bandwidth parameters. Although multiple extensions have been proposed to cover issues arising from the complexity of current platforms, such as contention and channel hierarchy, these specific extensions are not enough to meaningfully and accurately model anything beyond simple algorithms. As modern supercomputers are built from cheap commodity boards with a growing number of cores, intra-node communication becomes progressively more relevant, as does the resulting contention in the communication channels. These heterogeneous high-performance computing platforms need new approaches to communication performance modeling that address their complexity.
This work unveils the reasons for the poor fit of the cited representative models in this domain and proposes a new model named τ–Lop, which addresses the challenge of accurately modeling MPI communications on heterogeneous multi-core clusters. τ–Lop is based on the concept of concurrent transfers and applies it to meaningfully represent the behavior of algorithms on platforms with hierarchical shared communication channels, taking into account the effects of contention and of the mapping of processes onto processors. The model demonstrates the ability to predict, with high accuracy, the cost of advanced algorithms and communication mechanisms used by mainstream MPI implementations such as MPICH and Open MPI. In addition, an exhaustive and reproducible methodology for measuring the parameters of the model is described.