Parallel Sequence Alignment of Whole Chromosomes with Hundreds of GPUs and Pruning (by Alba Cristina Magalhaes Alves de Melo, University of Brasilia)
– February 15, 2018
Biological sequence alignment is a fundamental operation in Bioinformatics, used routinely worldwide. Smith-Waterman is the exact algorithm used to compare two sequences, obtaining the optimal alignment in quadratic time and space. In order to accelerate Smith-Waterman, many GPU-based strategies have been proposed in the literature. However, aligning DNA sequences of millions of characters, or Base Pairs (MBP), is still a very challenging task. In this talk, we discuss related work in the area of parallel biological sequence alignment and present our multi-GPU strategy to align DNA sequences with up to 249 million characters on 384 GPUs. In order to achieve this, we propose an innovative speculation technique that is able to parallelize a phase of the Smith-Waterman algorithm which is inherently sequential. We combined our speculation technique with sophisticated buffer management and fine-grain linear-space matrix processing strategies to obtain our parallel algorithm. As far as we know, this is the first implementation of Smith-Waterman able to retrieve the optimal alignment between sequences with more than 50 million characters. We will also present a pruning technique for a single GPU that is able to prune more than 50% of the Smith-Waterman matrix and still retrieve the optimal alignment. We will show the results obtained on the Keeneland cluster (USA), where we compared all the human x chimpanzee homologous chromosomes (ranging from 26 MBP to 249 MBP). The human x chimpanzee chromosome 5 comparison (180 MBP x 183 MBP) attained 10.35 TCUPS (Trillions of Cells Updated Per Second) using 384 GPUs. In this case, we processed 45 petacells and produced the optimal alignment in 53 minutes and 7 seconds, with a speculation hit ratio of 98.2%.
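For readers unfamiliar with the recurrence that all these GPU strategies accelerate, a minimal two-row (linear-space) sketch of the Smith-Waterman score computation is given below. The scoring parameters (match, mismatch, gap) are illustrative assumptions, not the settings used in the talk, and the sketch only computes the optimal score; it does not implement the speculation or pruning techniques described above.

```python
# Minimal Smith-Waterman local alignment (score only), two-row linear space.
# Scoring parameters are illustrative, not the values used in the talk.
def smith_waterman_score(a, b, match=1, mismatch=-3, gap=-2):
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores never drop below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("ACACACTA", "AGCACACA"))
```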
Short Bio: Alba Cristina Magalhaes Alves de Melo obtained her PhD degree in Computer Science from the Institut National Polytechnique de Grenoble (INPG), France, in 1996. In 2008, she did a postdoc at the University of Ottawa, Canada; in 2011, she was invited as a Guest Scientist at Université Paris-Sud, France; and in 2013 she did a sabbatical at the Universitat Politècnica de Catalunya, Spain. Since 1997, she has worked in the Department of Computer Science at the University of Brasilia (UnB), Brazil, where she is now a Full Professor. She is also a CNPq Research Fellow, level 1D, in Brazil. She was the Coordinator of the Graduate Program in Informatics at UnB for several years (2000-2002, 2004-2006, 2008, 2010, 2014) and coordinated international collaboration projects with the Universitat Politècnica de Catalunya, Spain (2012, 2014-2016) and with the University of Ottawa, Canada (2012-2015). In 2016, she received the Brazilian CAPES Award for “Advisor of the Best PhD Thesis in Computer Science”. Her research interests are High Performance Computing, Bioinformatics and Cloud Computing. She has advised 2 postdocs, 4 PhD theses and 22 MSc dissertations. Currently, she advises 4 PhD students and 2 MSc students. She is a Senior Member of the IEEE and a Member of the Brazilian Computer Society. She has given invited talks at Universität Karlsruhe, Germany, Université Paris-Sud, France, Universitat Politècnica de Catalunya, Spain, the University of Ottawa, Canada, and the Universidad de Chile, Chile. She currently has 91 papers listed in DBLP (www.informatik.uni-trier.de/~ley/db/indices/a-tree/m/Melo:Alba_Cristina_Magalhaes_Alves_de.html).
Randomized Load Balancing: Asymptotic Optimality of Power-of-d-Choices with Memory (by Jonatha Anselmi, Inria Bordeaux)
– March 8, 2018
In multi-server distributed queueing systems, the access of stochastically arriving jobs to resources is often regulated by a dispatcher. A fundamental problem is to design a load balancing algorithm that minimizes the delays experienced by jobs. During the last two decades, the power-of-d-choices algorithm, based on the idea of dispatching each job to the least loaded of $d$ servers sampled uniformly at random at the arrival of the job itself, has emerged as a breakthrough in the foundations of this area due to its versatility and appealing asymptotic properties. We consider the power-of-d-choices algorithm with the addition of a local memory that keeps track of the latest observations collected over time on the sampled servers. Each job is then sent to a server with the lowest observation. We show that this algorithm is asymptotically optimal in the sense that the load balancer can always assign each job to an idle server in the large-server limit. This holds true if and only if the system load $\lambda$ is less than $1-\frac{1}{d}$. If this condition is not satisfied, we show that queue lengths are bounded by $j^\star+1$, where $j^\star\in\mathbb{N}$ is given by the solution of a polynomial equation. This is in contrast with the classic version of the power-of-d-choices algorithm, where queue lengths are unbounded. Our upper bound $j^\star+1$ on the size of the most loaded server is tight and increases slowly as $\lambda$ approaches its critical value from below. For instance, when $\lambda = 0.995$ and $d=2$ (respectively, $d=3$), we find that no server will contain more than $5$ (respectively, $3$) jobs in equilibrium. Our results quantify and highlight the importance of using memory as a means to enhance performance in randomized load balancing.
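A toy event-driven simulation of the dispatching rule described above may help fix ideas: each arrival probes $d$ servers, refreshes the dispatcher's local memory with their true queue lengths, and joins the server with the smallest value currently stored in memory. The Poisson-arrival/exponential-service setup, the parameter values, and the choice to increment the memory entry of the chosen server after dispatch are illustrative modelling assumptions of this sketch, not claims about the paper's exact model.

```python
import heapq, random
from collections import Counter

def simulate_pod_memory(n=100, d=2, lam=0.45, mu=1.0, horizon=100_000, seed=1):
    """Toy simulation of power-of-d-choices with memory (assumed M/M/1 servers)."""
    rng = random.Random(seed)
    queues = [0] * n                  # true queue length of each server
    memory = {}                       # server id -> last observed queue length
    events = [(rng.expovariate(n * lam), "arrival", -1)]   # (time, kind, server)
    seen = Counter()                  # queue length joined by each arriving job
    jobs = 0
    while jobs < horizon:
        t, kind, s = heapq.heappop(events)
        if kind == "arrival":
            jobs += 1
            for srv in rng.sample(range(n), d):       # probe d random servers
                memory[srv] = queues[srv]             # refresh observations
            target = min(memory, key=memory.get)      # lowest observation wins
            queues[target] += 1
            memory[target] += 1                       # dispatcher knows it sent a job
            if queues[target] == 1:                   # server was idle: start service
                heapq.heappush(events, (t + rng.expovariate(mu), "departure", target))
            seen[queues[target]] += 1
            heapq.heappush(events, (t + rng.expovariate(n * lam), "arrival", -1))
        else:                                         # departure
            queues[s] -= 1
            if queues[s] > 0:
                heapq.heappush(events, (t + rng.expovariate(mu), "departure", s))
    return seen

print(simulate_pod_memory())   # distribution of queue lengths joined by jobs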
Obtaining Dynamic Scheduling Policies with Simulation and Machine Learning (by Danilo Santos, Datamove)
– March 15, 2018
Abstract: Dynamic scheduling of tasks in large-scale HPC platforms is normally accomplished using ad-hoc heuristics, based on task characteristics, combined with some backfilling strategy. Defining heuristics that work efficiently in different scenarios is a difficult task, especially when considering the large variety of task types and platform architectures. In this work, we present a methodology based on simulation and machine learning to obtain dynamic scheduling policies. Using simulations and a workload generation model, we can determine the characteristics of tasks that lead to a reduction in the mean slowdown of tasks in an execution queue. Modeling these characteristics using a nonlinear function and applying this function to select the next task to execute in a queue improved the mean task slowdown in synthetic workloads. When applied to real workload traces from highly different machines, these functions still resulted in performance improvements, attesting to the generalization capability of the obtained heuristics.
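The sketch below illustrates the general idea of ranking queued tasks with a nonlinear score instead of plain FCFS. The features (waiting time, requested cores, estimated runtime), the functional form and the coefficients are hypothetical placeholders; the actual function in this work is obtained from simulation and machine learning, not hand-written.

```python
import math

# Hypothetical nonlinear scoring of queued tasks; highest score runs next.
def score(task, now):
    wait = now - task["submit_time"]              # time spent waiting so far
    return (math.log1p(wait)
            - 0.5 * math.log1p(task["requested_cores"])
            - 0.3 * math.log1p(task["estimated_runtime"]))

def pick_next(queue, now):
    # Replace the FCFS head-of-queue choice with a score-based choice.
    return max(queue, key=lambda t: score(t, now))

queue = [
    {"id": 1, "submit_time": 0.0,  "requested_cores": 64, "estimated_runtime": 3600},
    {"id": 2, "submit_time": 50.0, "requested_cores": 4,  "estimated_runtime": 300},
]
print(pick_next(queue, now=100.0)["id"])
```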
A Class of Stochastic Multilayer Networks: Percolation, Exact and Asymptotic Results (by Philippe Nain, Inria, Lyon)
– March 22, 2018
Abstract:
In this talk, we will introduce a new class of stochastic multilayer networks. A stochastic multilayer network is the aggregation of M networks (one per layer) where each is a subgraph of a foundational network G. Each layer network is the result of probabilistically removing links and nodes from G. The resulting network includes any link that appears in at least K layers. This model, which is an instance of a non-standard site-bond percolation model, finds applications in wireless communication networks with multichannel radios, multiple social networks with overlapping memberships, transportation networks, and, more generally, in any scenario where a common set of nodes can be linked via co-existing means of connectivity. Percolation, exact and asymptotic results will be presented.
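To make the construction concrete, here is a small sketch of the aggregation rule described above: M layers are obtained by independently and probabilistically removing nodes and links from a foundational network G, and a link is kept in the aggregate if it survives in at least K layers. The deletion probabilities and the example graph are illustrative assumptions.

```python
import random

def aggregate(G_edges, nodes, M=4, K=2, q_node=0.2, q_edge=0.3, seed=0):
    """Build M random layers from G and keep links present in >= K layers."""
    rng = random.Random(seed)
    survivors = {e: 0 for e in G_edges}
    for _ in range(M):
        kept_nodes = {v for v in nodes if rng.random() > q_node}   # node removal
        for (u, v) in G_edges:
            # a link survives in this layer if both endpoints survive
            # and the link itself is not removed
            if u in kept_nodes and v in kept_nodes and rng.random() > q_edge:
                survivors[(u, v)] += 1
    return [e for e, count in survivors.items() if count >= K]

nodes = range(6)
G_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
print(aggregate(G_edges, nodes))   # links of the aggregated network
```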
Parallel Space-Time Kernel Density Estimation (by Erik Saule, University of North Carolina)
– March 28, 2018
The exponential growth of available data has increased the need for interactive exploratory analysis. Datasets can no longer be understood through manual crawling and simple statistics. In Geographical Information Systems (GIS), the dataset is often composed of events localized in space and time, and visualizing such a dataset involves building a map of where the events occurred.
We focus in this work on events that are localized in three dimensions (latitude, longitude, and time), and on computing the first step of the visualization pipeline, space-time kernel density estimation (STKDE), which is the most computationally expensive. Starting from a gold-standard implementation, we show how algorithm design and engineering, parallel decomposition, and scheduling can be applied to bring near real-time computing to space-time kernel density estimation. We validate our techniques on real-world datasets extracted from infectious disease, social media, and ornithology.
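As background, a naive evaluation of STKDE at a single grid point is sketched below: each event contributes a kernel weight in space and time, and the weights are summed and normalized by the bandwidths. The product-Epanechnikov kernels and the bandwidth values are common textbook choices used here for illustration, not necessarily those of the talk, and the parallel decomposition the talk is about is deliberately not shown.

```python
import numpy as np

def epanechnikov(u):
    # standard 1D Epanechnikov kernel, zero outside [-1, 1]
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def stkde(x, y, t, events, hs=1.0, ht=1.0):
    """Naive STKDE at point (x, y, t); events has shape (n, 3) = (x_i, y_i, t_i)."""
    dx = (x - events[:, 0]) / hs
    dy = (y - events[:, 1]) / hs
    dt = (t - events[:, 2]) / ht
    # separable spatial kernel times temporal kernel, normalized by bandwidths
    k = epanechnikov(dx) * epanechnikov(dy) * epanechnikov(dt)
    return k.sum() / (len(events) * hs * hs * ht)

events = np.array([[0.0, 0.0, 0.0], [0.5, 0.2, 0.1], [2.0, 2.0, 5.0]])
print(stkde(0.1, 0.1, 0.0, events))
```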
Polyhedral Optimization at Runtime (by Manuel Selva)
– March 29, 2018
The polyhedral model has proven to be very useful for optimizing and parallelizing a particular class of compute-intensive application kernels. A polyhedral optimizer needs affine functions describing loop bounds, memory accesses and branching conditions. Unfortunately, this information is not always available at compile time. To broaden the scope of polyhedral optimization opportunities, runtime information can be considered. This talk will highlight the challenges of integrating polyhedral optimization into runtime systems:
- When and how to detect opportunities for polyhedral optimization?
- How to model the observed runtime behavior in a polyhedral fashion?
- How to deal at runtime with the complexity of polyhedral algorithms?
These challenges will be illustrated in the context of both the APOLLO framework, targeting C and C++ applications, and the JavaScript engine from Apple.
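The contrast below illustrates what "affine" means in this context. The first kernel has loop bounds and memory accesses that are affine functions of the loop indices and parameters, so it is analyzable at compile time; in the second, the inner trip count and the access pattern depend on input data, which is the kind of behavior that can only be observed (or speculated on) at runtime. Python is used purely for illustration; the frameworks mentioned in the talk target C/C++ and JavaScript.

```python
def affine_kernel(A, x, y, n):
    # bounds 0..n and accesses A[i][j], x[j], y[i] are all affine in (i, j, n)
    for i in range(n):
        for j in range(n):
            y[i] += A[i][j] * x[j]

def runtime_dependent_kernel(rows, cols, vals, x, y):
    # sparse matrix-vector product in CSR form: the inner trip count and the
    # column indices come from the data, so affinity is not known statically
    for i in range(len(rows) - 1):
        for k in range(rows[i], rows[i + 1]):
            y[i] += vals[k] * x[cols[k]]

A = [[1, 2, 0], [0, 3, 0], [4, 0, 5]]
x, y = [1, 1, 1], [0, 0, 0]
affine_kernel(A, x, y, 3)
print(y)

rows, cols, vals = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
y2 = [0.0, 0.0]
runtime_dependent_kernel(rows, cols, vals, x, y2)
print(y2)
```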