Detecting Performance Outliers for Task-based HPC Applications in multi-[CPU|GPU|Node] clusters
By Lucas Schnorr (Porto Alegre), November 16, 2017
Programming paradigms in High-Performance Computing have
been shifting towards task-based models which are capable of
adapting readily to heterogeneous and scalable
supercomputers. Detecting performance outliers in such environments
is particularly difficult because it must consider architecture
heterogeneity and variability. In this work we show how we
employed a very simple performance model to highlight task outliers
in the well-known tiled dense Cholesky factorization running
on top of StarPU-MPI, a runtime for task-based applications. This
work has been integrated into our visualization framework based on the
R programming language and the tidyverse meta-package. Experiments
have been conducted in a controlled environment using the Chifflet
cluster at Lille, part of the Grid'5000 infrastructure, using up to
eight nodes, each one equipped with 28 cores and two GPUs. The
preliminary results, derived from collected traces, indicate that
explicitly binding the MPI and GPU-managing threads within
StarPU alleviates the issue, leading to performance gains.
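
The abstract does not spell out the performance model, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each trace record carries a task type, the kind of resource it ran on, and a duration, and that a task is an outlier when its duration exceeds the mean plus k standard deviations of its own (task type, resource kind) group. Grouping by resource kind is what keeps CPU and GPU executions of the same kernel from being compared against each other. The field names `kind`, `resource`, and `duration`, the function `flag_outliers`, and the threshold rule are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_outliers(tasks, k=2.0):
    """Return tasks slower than mean + k * stdev of their own group.

    Assumption: `tasks` is a list of dicts with hypothetical keys
    'kind' (e.g. "dgemm"), 'resource' (e.g. "CPU" or "GPU") and
    'duration' (any consistent time unit).
    """
    # Group durations by (task type, resource kind) so that heterogeneous
    # resources are never compared against each other.
    groups = defaultdict(list)
    for task in tasks:
        groups[(task["kind"], task["resource"])].append(task["duration"])

    # One threshold per group; groups with a single sample are skipped
    # because a sample standard deviation needs at least two values.
    thresholds = {
        key: mean(durations) + k * stdev(durations)
        for key, durations in groups.items()
        if len(durations) >= 2
    }

    return [
        task
        for task in tasks
        if (task["kind"], task["resource"]) in thresholds
        and task["duration"] > thresholds[(task["kind"], task["resource"])]
    ]


# Made-up durations for illustration only; these are not measured data.
tasks = (
    [{"kind": "dgemm", "resource": "CPU", "duration": 10.0} for _ in range(9)]
    + [{"kind": "dgemm", "resource": "CPU", "duration": 30.0}]
    + [{"kind": "dgemm", "resource": "GPU", "duration": 1.0} for _ in range(5)]
)
outliers = flag_outliers(tasks)
```

With these made-up values, only the single 30.0 CPU task is flagged. The much shorter GPU tasks are judged against their own group and none of them is reported.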