Omics data analysis: computational infrastructure and data stewardship
Speaker: Dr. Lennart Karssen

Affordable technology has been the motor behind data generation in many different fields. In the (biomedical) life sciences this is exemplified by the large-scale collection of MRI data, as well as the generation of what is currently known as 'omics data. From genomics (microarray-based genotyping of large cohorts, followed by next-generation sequencing techniques) to, more recently, metabolomics and transcriptomics, the sheer wealth of data available to us is huge.

However, this ever-increasing data size poses some very practical problems for researchers in the field. We (both bioinformaticians and other researchers!) have to think up front not only about where the data needs to be stored and processed, but also about what needs to be backed up and where. And who will pay for this? These questions are closely linked to data stewardship: "how do we keep our data secure?", "where did the data come from and how was it processed?".

In this lecture I will address several issues of data stewardship, based on my practical experience. I will also provide an overview of the types of compute and storage solutions that are currently available, from small local servers to large cloud instances, showing the trade-offs that have to be made in each case. Given the need for reliable storage of our big data, I will give a short overview of modern data storage, for example next-generation file systems like ZFS.

Lennart Karssen obtained his MSc in experimental physics at Utrecht University (The Netherlands), followed by a PhD in experimental atomic and optical physics at the same university in 2008 with a thesis titled “Trapping ultracold atoms with ultrashort laser pulses”. In 2009 he worked at the National University of Rwanda as a postdoc in the Research Commission and as a senior lecturer at the Department of Physics. Upon his return to The Netherlands he worked for some time as a Unix/Linux consultant at a company called Snow. From 2010 to 2013 Lennart Karssen worked as a postdoc in bioinformatics and server guru in the Genetic Epidemiology group of Prof. Cornelia van Duijn at the Erasmus University Medical Centre, Rotterdam, The Netherlands. At the Erasmus MC he worked on various projects, including the Genome of the Netherlands project, which used a parent-offspring design to create a reference set for genomic imputations, and the exome sequencing project of the GRIP study, which sequenced the exomes of ~1300 people from an isolated population with more than 3000 deeply phenotyped people within a complex pedigree of more than 23,000 individuals spanning 23 generations. Since late 2013 Lennart has worked together with Dr. Yurii Aulchenko in their company, PolyOmica.

Currently he is working on several projects, including two EU-funded research projects: MIMOmics, which develops methods for integrated analysis of multiple omics datasets, and in which PolyOmica leads the work package on “Data Integration and Distributed Computing”; and PainOmics, which takes a multi-dimensional omics approach to stratification of patients with low back pain. Here PolyOmica leads the work package on “Integrated models for identification of biomarkers and potential new therapeutic targets”.

Together with computer scientists, data generators and data analysts he currently works on a tool that speeds up genome-wide association analysis by several orders of magnitude when using many (e.g. 10^2 – 10^5) phenotypes and 10^6 – 10^8 genotypes, as are currently found in e.g. metabolomics studies. Lennart has (co-)supervised 1 PhD thesis in Genetic Epidemiology, as well as 1 MSc thesis and 4 BSc theses in experimental physics, and recently obtained an MSc in health sciences with specialisation in genetic epidemiology from the Erasmus University Rotterdam.

