Bioinformatics

Biological sciences currently produce a wealth of data, especially at the molecular level. Over the last decade, the growth rate of these data has surpassed Moore’s law of doubling every 18 months, with a recently estimated doubling time of 7 months. These diverse high-throughput data are often denoted as ‘omics’ to collectively characterize their different sources – genomics, transcriptomics, proteomics, metabolomics, etc. The variety and volume of these data have induced a fundamental shift in molecular biology research. The field of bioinformatics emerged from the ambition to develop more powerful tools for analyzing omics data and drawing conclusions from them.

The bioinformatics team at BioSense is developing tools and methods for different domains within molecular biology, such as functional genomics, metagenomics, quantitative trait loci (QTL) mapping and genome-wide association studies (GWAS) for traits of significance, and mapping of abiotic and biotic stress resistance. To meet the computational and analytical challenges, we rely on machine learning, graph algorithms, information theory, data compression and data/knowledge integration. Most of the data we handle fall within the agricultural domain, mainly crop genomics, genome-phenome interactions and genotype-by-environment interactions, but also include microbiome data related to both crop and human health.

Bioinformatics algorithms also play an important role in the emerging technology of DNA-based data storage. The large amounts of data being generated call for efficient storage solutions. DNA-based data storage has been shown to have great potential in this regard, owing to its longevity and enormous information density, and is becoming an attractive alternative to conventional information storage systems. Within this domain, our group focuses on analyzing the fundamental limits of such systems, devising new storage architectures, and developing the corresponding algorithms. Some of the problems we are currently working on are sequence design, error-correction mechanisms, and encoding/decoding algorithms.
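
To give a flavour of what an encoding/decoding algorithm for DNA-based storage involves, the sketch below maps binary data to a nucleotide sequence and back. It is a deliberately naive illustration and not the scheme used by our group: the two-bits-per-nucleotide mapping and the function names `encode` and `decode` are assumptions made for this example. Practical schemes would additionally have to deal with sequence design constraints and error correction.

```python
# Hypothetical mapping of two bits to one nucleotide (an assumption for illustration).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bits first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(); expects a strand whose length is a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of four")
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"DNA")
    print(strand)
    print(decode(strand))
```

At two bits per nucleotide, each byte becomes four bases, so the mapping is lossless but offers no protection against synthesis or sequencing errors.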