The team’s goal is to develop effective software tools and methods for assessing the correctness and improving the performance of next-generation HPC applications.
We propose to achieve scientific and technological advances in the following research directions:
Determining whether an HPC program behaves as expected on every execution is challenging due to non-deterministic execution and imprecise analysis information at compile time. We plan to target codes using SPMD, data, and task parallelism, or hybrid parallelism (e.g., SPMD+OpenMP), across a combination of widely used programming models. We will target several classes of bugs: (1) data races; (2) deadlocks; and (3) compliance with specifications. Our previous experience with code optimization using one-sided communication indicates that programmers very easily introduce races when starting to optimize for communication overlap. This is particularly pertinent now that code modernization efforts targeting MPI 3.0 one-sided communication primitives are underway. Our previous experience with runtime interoperability, resource managers, and synchronization in hybrid models indicates that deadlock is a common occurrence, especially at very large scale. Our previous experience with hybrid MPI+OpenMP codes revealed many examples where the runtime API specification was violated due to concurrency bugs.
Communication optimizations are important for end-to-end application scalability. Several current efforts explore new APIs that provide non-blocking, split-phase communication and synchronization; examples include MPI 3 RMA and the non-blocking collectives proposal. Yet these APIs are still in the design stage and not deployed in applications. The first thrust of our optimization work will dynamically replace communication and synchronization patterns in applications with their non-blocking counterparts, maximize overlap, and replace expensive synchronization with simpler patterns (e.g., replacing collectives with point-to-point operations). These optimizations will provide both performance improvements and a vehicle for rapid prototyping of algorithms and runtime APIs. The second thrust of our effort will perform cross-module optimizations in applications. The insight is that applications are increasingly modular, composing multiple third-party libraries, solvers, and runtimes, e.g., MPI and OpenMP. Today, very little support for cross-layer transformations exists and no such optimizations are performed. Our goal is to provide a toolkit enabling cross-runtime optimization: the equivalent of compiling the MPI and OpenMP stacks together with the application and performing inter-procedural optimization. Based on our preliminary experience, this methodology has significant potential for performance improvement, enabling optimizations not attainable by hand.
The framework developed during this project will provide mechanisms that aid developers in reasoning about correctness and profitability. Our dynamic analyses will examine traces of test executions to discover potential communication and synchronization optimizations. The framework will then present these optimization opportunities, along with the corresponding transformations, to developers in an understandable form for manual verification.