Genome Information Analysis

Genome sequences are more than simple strings: behind them are real molecules with real structures, which carry information about complex biological mechanisms. ‘Meaning’ is hidden behind the ‘visible’ sequence. Recent research has revealed that genomes are dynamically regulated; for instance, cell differentiation is related to structural changes in the genome.

 We have been developing software for genome sequence analysis, especially for the large amounts of data produced by high-throughput sequencers, in order to extract information within a stochastic framework.


RNA Informatics

Since the discovery of RNA interference and microRNAs, a large number of functional non-coding RNAs have been found. They are transcribed but not translated into proteins, and they play various roles in cells that are not limited to the repression of translation.

 We have developed theories and leading software in the field of RNA informatics, such as CentroidFold, one of the most accurate tools for RNA secondary structure prediction. The probability of any specific RNA secondary structure, even the most stable one, is astronomically small, because RNA structures undergo thermodynamic fluctuation. We are developing various methods to extract useful information from the probability distribution of RNA secondary structures.
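
 The point about small probabilities can be illustrated with a minimal sketch. It is a toy model, not CentroidFold or its energy model: every base pair is assumed to contribute the same hypothetical energy (`PAIR_ENERGY`), hairpin loops need at least `MIN_LOOP` unpaired bases, and pseudoknots are ignored. Under these assumptions, the Boltzmann probability of one structure s is exp(-E(s)/RT) / Z, where Z sums the same weight over all possible structures.

```python
import math
from functools import lru_cache

# Toy assumptions (hypothetical values, not measured energy parameters).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3         # minimum number of unpaired bases enclosed by a pair
PAIR_ENERGY = -2.0   # kcal/mol per base pair, same for every pair
RT = 0.616           # kcal/mol, roughly 37 degrees Celsius


def partition_function(seq, pair_energy=PAIR_ENERGY):
    """Sum of Boltzmann weights over all nested structures of seq."""
    weight = math.exp(-pair_energy / RT)

    @lru_cache(maxsize=None)
    def z(i, j):
        # Too short to hold a pair (or empty): only the open structure.
        if j - i <= MIN_LOOP:
            return 1.0
        total = z(i, j - 1)  # case 1: base j is unpaired
        for k in range(i, j - MIN_LOOP):  # case 2: base j pairs with base k
            if (seq[k], seq[j]) in PAIRS:
                total += z(i, k - 1) * z(k + 1, j - 1) * weight
        return total

    return z(0, len(seq) - 1)


def max_pairs(seq):
    """Largest number of base pairs in any nested structure (Nussinov-style)."""

    @lru_cache(maxsize=None)
    def m(i, j):
        if j - i <= MIN_LOOP:
            return 0
        best = m(i, j - 1)
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                best = max(best, m(i, k - 1) + m(k + 1, j - 1) + 1)
        return best

    return m(0, len(seq) - 1)


def mfe_probability(seq, pair_energy=PAIR_ENERGY):
    """Probability of one minimum-energy structure in the toy ensemble."""
    best_weight = math.exp(-max_pairs(seq) * pair_energy / RT)
    return best_weight / partition_function(seq, pair_energy)


if __name__ == "__main__":
    for seq in ["GAAAC", "GGGAAACCC", "GGGAAACCCGGGAAACCC"]:
        print(len(seq), mfe_probability(seq))
```

 Because the number of competing structures grows exponentially with sequence length, the weight of any single structure becomes a vanishing fraction of Z for realistic RNAs. This is why quantities derived from the whole distribution are more informative than one predicted structure.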

 Recently, it has been shown that the modification of genomic DNA is essential to the regulation of processes such as cell differentiation. Modification also plays an important role in RNA. In order to predict the structures of RNAs that include modified bases, we are trying to determine the energy parameters of modified bases by combining molecular dynamics (MD) simulations with melting temperature scaling experiments. The results will be incorporated into various analysis tools for RNA secondary structures.




Biological Sequence Design

We are studying the design of genome sequences for the efficient production of target materials by microorganisms. In the AMED project, we designed antibody gene clusters. In the NEDO project, we are using machine learning to optimize DNA sequences for efficient production, based on a large number of experimentally produced combinations of DNA sequences. In such designs, the efficiency of mRNA translation, as well as that of transcription, should be optimized to improve productivity. This area offers a wide range of research subjects, such as the relationship between translation efficiency and mRNA structure.


Privacy-Preserving Calculations

We expect that valuable information can be extracted from large amounts of data, including the DNA sequences of personal genomes, using AI technologies such as machine learning. Recently, privacy-preserving data mining technologies, which safely process sensitive data in encrypted form, have become important. In the CREST project, we are developing a general framework for delegated computation that enables easy implementation of various privacy-preserving services.
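
To make the idea of computing on encrypted data concrete, here is a minimal sketch using the Paillier cryptosystem, a well-known additively homomorphic scheme. It is an illustration only and is not the framework developed in the CREST project. The scenario (a data owner encrypts per-individual allele counts, and a server adds them without seeing them), the function names, and the tiny hard-coded primes are all assumptions chosen for readability; the key size is far too small to be secure.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Toy key generation from two tiny primes (insecure, for illustration)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (n, lam, mu)  # public key n, private key (n, lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so no exponentiation is needed here.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def add_encrypted(n, ciphertexts):
    """Server side: multiply ciphertexts, which adds the hidden plaintexts."""
    n2 = n * n
    total = 1  # a valid encryption of 0
    for c in ciphertexts:
        total = total * c % n2
    return total


def decrypt(private_key, c):
    """Recover the plaintext using the private key."""
    n, lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


if __name__ == "__main__":
    public_n, private_key = keygen()
    allele_counts = [0, 1, 2, 1, 2]  # hypothetical per-individual counts
    encrypted = [encrypt(public_n, m) for m in allele_counts]
    encrypted_total = add_encrypted(public_n, encrypted)  # done by the server
    print(decrypt(private_key, encrypted_total))  # done by the data owner
```

The server only ever handles ciphertexts, yet the data owner decrypts a value equal to the sum of the original counts, provided that sum stays below n. Richer computations need additional techniques such as fully homomorphic encryption or secure multi-party computation.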