2nd RNALands meeting in Vienna

The second meeting of the RNALands project took place at the TBI in Vienna on November 5-6, 2015.

Program:

  • Thursday, Nov. 5th
    •   10:00 Welcome address + Morning session
      Gregor Entzian – Introduction + Local flooding
      Maria Waldl – Data collection + Format
      Andrea Tanzer – RNA editing site: A case study
    •   12:00 Lunch
    •   14:00 Afternoon session
      Alain Denise – Frameshift and kinetics
      Ronny Lorenz & Yann Ponty – Iterated 2D projections + Clustering
      Loic Paulevé – Sliding windows + Exact integration
      Hélène Touzet & Yann Ponty – Counting and sampling locally optimal (?) structures
    •   18:00 Free discussions + Departure for restaurant (probably Wickerl)
  • Friday, Nov. 6th
    Various discussions in smaller groups

AMIB and McGill (Canada)

March 6th to March 10th: J.-M. Steyaert is visiting our partners at McGill, J. Waldispühl and M. Blanchette.

[Internship/PhD] Algorithmic aspects of RNA locally optimal structures

Scientific context. RiboNucleic Acids (RNAs) are single-stranded macromolecules which can be crudely described as sequences of 20 to 3,000 letters over the four-letter alphabet {A,C,G,U}. Due to its single-stranded nature, any synthesized RNA molecule undergoes a folding process in which connections, known as base pairs and mediated by hydrogen bonds, are established between its nucleotides. The outcome of RNA folding is a large variety of structures, or spatial conformations, which strongly determine the function of a given RNA within cellular mechanisms. From a computer science perspective, the structure(s) of an RNA can be acceptably abstracted as a graph or a tree.
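
To make the graph/tree abstraction concrete, here is a minimal Python sketch. It assumes the common dot-bracket notation for secondary structures (matching parentheses for base pairs, dots for unpaired positions), which this offer does not itself specify; the function names are hypothetical and chosen for illustration only.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of base pairs (i, j), 0-based, of a dot-bracket string."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def pairs_to_tree(pairs):
    """Map each base pair to the pairs directly nested inside it.

    The key None collects the top-level pairs. Assumes a non-crossing structure.
    """
    children = {None: []}
    enclosing = []  # stack of pairs that may still enclose the next pair
    for i, j in sorted(pairs):
        # Drop pairs that close before position i: they cannot enclose (i, j).
        while enclosing and enclosing[-1][1] < i:
            enclosing.pop()
        parent = enclosing[-1] if enclosing else None
        children.setdefault(parent, []).append((i, j))
        children.setdefault((i, j), [])
        enclosing.append((i, j))
    return children
```

The list of pairs is the edge set of the graph view, while the nesting map gives the tree view: for "((...))(...)", the two outermost pairs are children of the root and the inner pair hangs below the first of them.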

Internship objective. The main goal of this internship is the design and implementation of one (or several) algorithm(s) for the enumeration and random generation of RNA locally optimal structures. Such structures are local minima of the energy function: they act as kinetic traps in the energy landscape, and are generally believed to significantly slow down, or even disrupt, the folding of structured RNAs. Their enumeration is a mandatory first step towards an efficient in silico analysis of the kinetic behavior of an RNA from its sequence. Previous work [1] has resulted in a polynomial-time algorithm for the enumeration of locally optimal structures in a combinatorial setting based on base-pair maximization. This algorithm will be the starting point of the internship, and it will be completed/extended to capture the complete set of features supported by the more accurate and realistic Turner model [2].
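
As a point of reference for the base-pair maximization setting, the following Python sketch enumerates locally optimal structures by brute force. It is not the polynomial-time algorithm of [1]: it lists every non-crossing structure (exponentially many) and keeps those to which no base pair can be added. The definition used here (maximality for inclusion), the set of allowed pairs (AU, GC, GU), the minimum hairpin loop size min_loop and all function names are assumptions made for illustration.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def structures(seq, min_loop=3):
    """Yield every non-crossing set of allowed base pairs as a tuple of (i, j)."""
    def rec(i, j):
        # Region seq[i..j] is too short to contain any base pair.
        if j - i < min_loop + 1:
            yield ()
            return
        # Case 1: position i is unpaired.
        yield from rec(i + 1, j)
        # Case 2: position i pairs with k, splitting the region in two.
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in CANONICAL:
                for inner in rec(i + 1, k - 1):
                    for outer in rec(k + 1, j):
                        yield ((i, k),) + inner + outer

    yield from rec(0, len(seq) - 1)


def can_add(pair, pairs):
    """True if `pair` shares no position with, and crosses none of, `pairs`."""
    a, b = pair
    for i, j in pairs:
        if len({a, b, i, j}) < 4:
            return False  # a position would be paired twice
        if i < a < j < b or a < i < b < j:
            return False  # crossing pairs (pseudoknot)
    return True


def locally_optimal(seq, min_loop=3):
    """Yield the structures to which no allowed base pair can be added."""
    n = len(seq)
    candidates = [
        (a, b)
        for a in range(n)
        for b in range(a + min_loop + 1, n)
        if (seq[a], seq[b]) in CANONICAL
    ]
    for s in structures(seq, min_loop):
        if not any(can_add(c, s) for c in candidates):
            yield s
```

Such an exhaustive version is only usable on very short sequences, but it can serve as a test oracle when validating a more efficient algorithm.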

Environment. The intern will be hosted by the Bonsai team (Inria Lille Nord Europe and LIFL), and co-supervised by Hélène Touzet and Yann Ponty (CNRS LIX/PIMS Vancouver). The internship may start between January and April 2015, and will last from 4 to 6 months. A monthly internship stipend (gratification de stage) may be allocated to the successful candidate.
This work is part of the French/Austrian project RNALands, recently funded by ANR and FWF, whose aim is the design of efficient predictive methods for RNA kinetics. It may be followed by a PhD at LIX (Computer Science Dept of Ecole Polytechnique, Palaiseau), and include research visits to the other partners of the project, in Lille and Vienna.

Candidate profile. The ideal candidate for this internship is a fifth-year student in Computer Science with a strong background in algorithms/data structures and/or bioinformatics, and a real taste for algorithm design and implementation. Prior experience in C/C++, plus a scripting language of the candidate’s choice (Python, Ruby, Perl…), is required. A background in Molecular Biology/Biochemistry, or demonstrated intellectual curiosity for the subject, will be considered a plus.
To apply, please send:

  • a complete resume;
  • a cover letter stating your objectives and expectations;
  • a copy of your academic record/transcript for the past two years;
  • the contact details of 2-3 references;

to Hélène Touzet (helene.touzet@lifl.fr) and Yann Ponty (yann.ponty@lix.polytechnique.fr).

References

  • [1] Azadeh Saffarian, Mathieu Giraud, Antoine de Monte, and Hélène Touzet. RNA locally optimal secondary structures. J Comput Biol, 19(10):1120–1133, Oct 2012.
  • [2] D. H. Mathews, J. Sabina, M. Zuker, and D. H. Turner. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol, 288(5):911–940, May 1999.

[PhD] Ab-Initio Classification And Detection Of Non-Coding RNAs From Thermodynamics Principles

Biological context. Once overlooked by a protein-centric view of cellular mechanisms, noncoding RNAs (ncRNAs) have recently been found to play many unsuspected roles (regulation, self-maturation, genome defense...), either alone or in complex with a protein. The action of ncRNAs has also been associated with diseases such as cancer (Lu et al., 2005), autism (Nakatani et al., 2009) and Alzheimer’s disease (Faghihi et al., 2008). More generally, the conclusions of the ENCODE effort (ENCODE Project Consortium, 2007), which analyzed a portion of the human genome, showed that a very large majority of DNA is transcribed at some stage of cell life or in some cellular context. This contrasts starkly with the small proportion (2-3%) of the genome that codes for proteins, suggesting that a large number of currently unknown ncRNAs might be involved in cellular mechanisms. This conclusion is further supported by the explosive growth of the number of functional families indexed in the reference Rfam database (Griffiths-Jones et al., 2003) (176 in 2005, 574 in 2007, 1500 in 2011, and 2497 in 2015). Understanding which of the remaining transcripts lead to functional ncRNAs is one of the key challenges of RNA computational biology.

Beyond Minimal Free Energy (MFE) models. The functional role of an ncRNA is mainly determined by its structure. The secondary structure of RNA, a computationally tractable relaxation of the 3D structure, constitutes a valuable tool for RNA bioinformaticians, e.g. for the characterization of families (consensus structures of the Rfam database (Griffiths-Jones et al., 2003)) and the prediction of folding (Zuker and Stiegler, 1981). At the core of these methods, the Turner model assigns free energies to components (or loops) of the secondary structure, and structure prediction can be performed from a single sequence by minimizing the free energy (Zuker and Stiegler, 1981). More recently, this approach was extended, based on the assumption that the different secondary structures compatible with a sequence co-exist within a Boltzmann distribution, yielding slightly more sensitive and more specific predictions on ncRNAs (Ding et al., 2005). Finally, features of the Boltzmann distribution, e.g. the expectation and variance of the free energy, can be efficiently extracted for a given sequence (Ding et al., 2014). Taking these features into account was shown by Miklos et al. (2005) to better discriminate between mRNAs and random sequences than the sole consideration of the free energy. Extending this approach to other additive features of compatible structures in the Turner model may lead to an alternative characterization of ncRNA families based on thermodynamic signatures.
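
To illustrate what extracting the expectation and variance of the free energy means, here is a minimal Python sketch under strong simplifying assumptions. It does not use the Turner model: each base pair contributes a fixed toy energy pair_energy, so the energy of a structure depends only on its number of base pairs. A dynamic programming pass counts the structures by number of pairs, and the Boltzmann moments follow from these counts. The allowed pairs, min_loop, the default parameter values and the function names are all hypothetical.

```python
import math
from functools import lru_cache

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_count_polynomial(seq, min_loop=3):
    """Return counts, where counts[k] is the number of structures with k base pairs."""
    def add(p, q):
        size = max(len(p), len(q))
        return [(p[t] if t < len(p) else 0) + (q[t] if t < len(q) else 0)
                for t in range(size)]

    def mul(p, q):
        r = [0] * (len(p) + len(q) - 1)
        for a, pa in enumerate(p):
            for b, qb in enumerate(q):
                r[a + b] += pa * qb
        return r

    @lru_cache(maxsize=None)
    def rec(i, j):
        # Too short for any base pair: only the empty structure.
        if j - i < min_loop + 1:
            return (1,)
        total = list(rec(i + 1, j))  # position i unpaired
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in CANONICAL:
                # Pair (i, k) adds one base pair: shift coefficients by one.
                term = [0] + mul(rec(i + 1, k - 1), rec(k + 1, j))
                total = add(total, term)
        return tuple(total)

    return list(rec(0, len(seq) - 1))


def boltzmann_energy_moments(counts, pair_energy=-1.0, rt=0.616):
    """Return (mean, variance) of the toy energy in the Boltzmann ensemble.

    rt is the product RT, about 0.616 kcal/mol at 37 degrees Celsius.
    """
    energies = [k * pair_energy for k in range(len(counts))]
    weights = [c * math.exp(-e / rt) for c, e in zip(counts, energies)]
    z = sum(weights)  # partition function
    mean = sum(w * e for w, e in zip(weights, energies)) / z
    second = sum(w * e * e for w, e in zip(weights, energies)) / z
    return mean, second - mean * mean
```

For example, boltzmann_energy_moments(pair_count_polynomial("GGGAAACCC")) returns the mean and variance of the toy energy over all structures of that sequence. With a realistic energy model the energy no longer depends on the number of pairs alone, and the moments have to be propagated directly inside the dynamic programming recurrences.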

Toward thermodynamics-based models for ncRNA detection. The goal of this project is to contribute a unifying algorithmic framework for the computation of RNA thermodynamic signatures, and to test their discriminatory power in the classification and identification of ncRNA families. Such signatures will primarily include the moments of the distribution of additive features (free energy, #hairpins, #unpaired positions...) in the Boltzmann ensemble. By capturing the whole folding landscape of ncRNAs in a weighted ensemble, these signatures are expected to be less prone to inaccuracies, for instance in the case of multistable or pseudoknotted RNAs. Although the project may eventually integrate evolutionary data (conservations or covariations) to improve its predictions, its primary emphasis is on the extraction of sequence-only signals, since: 1) most in silico approaches for the classification/detection of ncRNAs rely on the MFE paradigm, which has been shown to be a limiting factor in many contexts, e.g. for the RNA folding problem; and 2) such signals are associated with natural biochemical interpretations, from which bottom-up, mechanistic biological hypotheses can be established and tested.
Towards this goal, the candidate will build on and extend a previous contribution by Ponty and Saule (2011) to compute arbitrary moments in generic dynamic programming schemes. He/she will combine grammar transformations with algebraic dynamic programming techniques (Sauthoff et al., 2013) to automatically generate code for each feature. These features will be systematically computed and integrated in a machine learning approach to scan for occurrences of ncRNAs belonging to existing classes (ncRNA classification problem), and to uncover new signals for the detection of novel classes of ncRNAs (ncRNA detection problem).

Main tasks. After a thorough literature search, especially critical in the context of an interdisciplinary project, the candidate will design and implement a compiler which, for any desired feature and statistical moment, will generate a suitable grammar for the Bellman’s GAP compiler (Sauthoff et al., 2013). The compilation of this grammar will yield the necessary C code for an automatic extraction of features (or correlations of features) in the Boltzmann distribution. The main reason for this compilation step is to build on an existing implementation of a related problem that uses the latest thermodynamic parameters (the Vienna package (Hofacker et al., 1994)), whose reimplementation would be a tedious and unrewarding task.
Secondly, a list of features of interest will be established, and a corresponding family of programs will be automatically produced using the tool developed in the first step. The moments of these features will be systematically computed on selected Rfam families (Griffiths-Jones et al., 2003), and the candidate will test their capacity to discriminate RNA sequences belonging to different functional families. To that end, a list of natural hypotheses will be tested, coupled with exploratory machine learning approaches based on the Weka toolbox (Bouckaert et al., 2010).
Finally, depending on the outcome of the previous phases, the extracted signal will be validated on a larger scale, and compared against or used jointly with evolutionary information (covariation) to detect novel ncRNA sequences, with a special emphasis on multistable RNAs (riboswitches).

Contact: yann.ponty@lix.polytechnique.fr

References

  • Jun Lu, Gad Getz, Eric A Miska, et al. MicroRNA expression profiles classify human cancers. Nature, 435(7043):834-838, Jun 2005. doi: 10.1038/nature03702.
  • Jin Nakatani, Kota Tamada, Fumiyuki Hatanaka, et al. Abnormal behavior in a chromosome engineered mouse model for human 15q11-13 duplication seen in autism. Cell, 137(7):1235-1246, Jun 2009. doi: 10.1016/j.cell.2009.04.024.
  • Mohammad Ali Faghihi, Farzaneh Modarresi, Ahmad M Khalil, et al. Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of beta-secretase. Nat Med, 14(7):723-730, Jul 2008. doi: 10.1038/nm1784.
  • The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447:799-816, 2007.
  • Sam Griffiths-Jones, Alex Bateman, Mhairi Marshall, Ajay Khanna, and Sean R Eddy. Rfam: an RNA family database. Nucleic Acids Res, 31(1):439-441, Jan 2003.
  • M. Zuker and P. Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res, 9:133-148, 1981.
  • Y. Ding, C. Y. Chan, and C. E. Lawrence. RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. RNA, 11:1157-1166, 2005.
  • Yang Ding, William A. Lorenz, Ivan Dotu, Evan Senter, and Peter Clote. Computing the probability of RNA hairpin and multiloop formation. J Comput Biol, 21(3):201-218, Mar 2014. doi: 10.1089/cmb.2013.0148.
  • Istvan Miklos, Irmtraud M Meyer, and Borbala Nagy. Moments of the Boltzmann distribution for RNA secondary structures. Bull Math Biol, 67(5):1031-1047, Sep 2005. doi: 10.1016/j.bulm.2004.12.003.
  • Yann Ponty and Cédric Saule. A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms. In WABI – 11th Workshop on Algorithms in Bioinformatics – 2011, Saarbrücken, Germany, 2011.
  • Georg Sauthoff, Mathias Möhl, Stefan Janssen, and Robert Giegerich. Bellman’s GAP-a language and compiler for dynamic programming in sequence analysis. Bioinformatics, 29(5):551-560, Mar 2013. doi: 10.1093/bioinformatics/btt022.
  • I. L. Hofacker, W. Fontana, P. F. Stadler, et al. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie / Chemical Monthly, 125(2):167-188, 1994.
  • Remco R. Bouckaert, Eibe Frank, Mark A. Hall, et al. WEKA – Experiences with a Java Open-Source Project. Journal of Machine Learning Research, 11:2533-2541, 2010.

[PhD] Towards quantitative modeling of cell metabolism and its control by external environment

Sorry, this entry is only available in French.

Visit of Indrajit Saha

On the invitation of Mireille Régnier, Indrajit Saha, ERCIM post-doctoral fellow at Wroclaw University,
Poland, will visit AMIB from February 22nd until February 27th.

A talk is scheduled at LIX on February 26th at 14:30, in the Philippe Flajolet room.

Inria-Industry day

On February 11th, AMIB took part in the Inria-Industry days on the theme of bioinformatics and digital tools for health products.

Software presentation

Integrative Approaches for Modeling Biomolecular Complexes 2013

Co-organized by Inria, Nice University and McGill University.

Nice, May 29th-31st.

See the site.

Best Application Paper at EGC’2013 conference

Best application paper award at the EGC 2013 conference for “Identification de complexes protéine-protéine par combinaison de classifieurs” by Thomas Bourquard, Damien de Vienne and Jérôme Azé.

Visit of Professor Davit Saakian

On the invitation of J.-M. Steyaert and L. Schwartz, D. Saakian, invited professor at Academia Sinica,
Republic of China, Taiwan, will visit AMIB from January 21st until January 25th.

Talks are scheduled at LIX on January 21st at 14:00 and January 22nd at 11:00.