In recent years, machine learning has started to help map, understand and predict the molecular biology of single cells. We develop methods that address specific biological hypotheses and originate from different areas of machine learning.
Before joining the field in 2015 as a postdoc with Fabian Theis, I developed computational techniques for predicting the emergent behavior of models of strongly correlated quantum materials, basic models of quantum computers, and chemical reactions in solar cells.
The introduction of RNA velocity in single cells has opened up new ways of studying cellular differentiation in scRNA-seq [LaManno18]. RNA velocity describes the rate of gene expression change for an individual gene at a given time point based on the ratio of its spliced and unspliced messenger RNA (mRNA). With scVelo, we solve the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to a wide variety of systems comprising transient cell states, which are common in development and in response to perturbations. The paper made it onto the cover of Nature Biotechnology.
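To make the idea concrete, here is a minimal NumPy sketch of the simpler steady-state formulation of RNA velocity, not scVelo's likelihood-based dynamical model. It assumes the standard splicing kinetics du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s; the function names, parameter values and the quantile-based fit are illustrative assumptions.

```python
import numpy as np

def simulate_induction(alpha, beta, gamma, t):
    """Analytic solution of du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s
    with u(0) = s(0) = 0 (requires beta != gamma)."""
    u = alpha / beta * (1.0 - np.exp(-beta * t))
    s = (alpha / gamma * (1.0 - np.exp(-gamma * t))
         + alpha / (gamma - beta) * (np.exp(-gamma * t) - np.exp(-beta * t)))
    return u, s

def fit_steady_state_ratio(u, s, quantile=0.95):
    """Estimate gamma/beta as a least-squares slope through the origin,
    fitted only on cells in the upper quantile of spliced counts."""
    mask = s >= np.quantile(s, quantile)
    return float(u[mask] @ s[mask] / (s[mask] @ s[mask]))

def steady_state_velocity(u, s, ratio):
    """Velocity in units of beta: positive means the gene is being induced."""
    return u - ratio * s

# Hypothetical gene: each "cell" is observed at a different latent time.
t = np.linspace(0.0, 20.0, 400)
u, s = simulate_induction(alpha=2.0, beta=1.0, gamma=0.5, t=t)

ratio = fit_steady_state_ratio(u, s)
v = steady_state_velocity(u, s, ratio)
```

Cells early in the induction carry an excess of unspliced mRNA and get a positive velocity, while cells near steady state get a velocity close to zero. The steady-state fit breaks down when no cells have reached equilibrium, which is the situation the dynamical model is meant to handle.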
We showed that generative models are able to predict single-cell perturbation responses out-of-distribution [P27]. In principle, this approach should enable training models to predict the effects of disease and disease treatment across cell types and species. While the first implementation of the approach (scGen) relied on latent space vector arithmetic, we recently published an end-to-end-trained model based on a conditional variational autoencoder (trVAE) [P29] and a deep factor model [P32]. We wrote a review about the emerging field [P31].
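The latent space vector arithmetic can be sketched in a few lines of NumPy. This is only an illustration of the arithmetic step, assuming that cells have already been encoded into a latent space; the encoder and decoder are omitted, and all names and the synthetic data are hypothetical.

```python
import numpy as np

def perturbation_vector(z_control, z_perturbed):
    """Difference of latent means between perturbed and control cells."""
    return z_perturbed.mean(axis=0) - z_control.mean(axis=0)

def predict_perturbed(z_control_new, delta):
    """Shift control cells of an unseen cell type by the learned vector."""
    return z_control_new + delta

rng = np.random.default_rng(0)
latent_dim = 8
true_shift = np.full(latent_dim, 1.5)  # assumed perturbation effect

# Cell type A: control and perturbed cells are both observed.
z_ctrl_a = rng.normal(size=(200, latent_dim))
z_pert_a = rng.normal(size=(200, latent_dim)) + true_shift

# Cell type B: only control cells are observed.
z_ctrl_b = rng.normal(loc=3.0, size=(150, latent_dim))

delta = perturbation_vector(z_ctrl_a, z_pert_a)
z_pred_b = predict_perturbed(z_ctrl_b, delta)
```

In a full pipeline, the predicted latent vectors would be passed through a decoder to obtain predicted expression profiles. The sketch assumes that the perturbation acts as the same linear shift in every cell type, which is the limitation that motivates end-to-end-trained models.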
Partition-based graph abstraction (PAGA) aims to reconcile clustering with manifold learning by explaining variation using both discrete and continuous latent variables [P26]. PAGA generates coarse-grained maps of manifolds with complex topologies in a computationally efficient and robust way. In [P24], we used it to infer the first lineage tree of a whole complex animal - a Science breakthrough of the year 2018. It was benchmarked as the best-performing trajectory inference method overall in a review of ~70 methods by Saelens et al. (Nat. Biotechnol., 2019) [tweet]. PAGA also builds on diffusion pseudotime [P19], which defined a robust global measure of similarity among cells.
Scanpy [P23] is a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. Together with the underlying anndata, it has become widely used and has led to a small ecosystem. It has been selected as an Essential Open Source Software for Science by CZI among 32 projects, alongside giants such as numpy, pandas, scikit-learn, matplotlib, and others. See software.
Existing methods for learning latent representations for single-cell RNA-seq data are based on autoencoders and factor models where the former are hard to interpret and the latter have limited flexibility. Here, we introduce a framework for learning interpretable autoencoders based on regularized linear decoders, decomposing variation into interpretable components using prior knowledge.
Using large-scale imaging data, we show how to reconstruct continuous biological processes using deep learning for the examples of cell cycle and disease progression in diabetic retinopathy [P20]. Read more.
The goal of the Data Science Bowl 2017 was to predict lung cancer from tomography scans. With $1M in total prize money, it was the most highly endowed machine learning competition of 2017. We won 7th prize among nearly 2.4k teams and more than 10k participants, the best result among all German teams.
Tensor trains (MPS, DMRG) constitute - together with quantum Monte Carlo and the numerical renormalization group - the key numerical approach for tackling the exponential computational complexity of models of strongly correlated materials and quantum computers.
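As a small illustration of the format, the following NumPy sketch decomposes a dense tensor into tensor-train cores by successive truncated singular value decompositions (the textbook TT-SVD construction, not a DMRG solver). The function names and the random test tensor are illustrative assumptions.

```python
import numpy as np

def tensor_train(tensor, max_rank):
    """Decompose a dense tensor into TT cores of shape (r_left, d, r_right)
    using successive SVDs, keeping at most max_rank singular values."""
    dims = tensor.shape
    cores = []
    rank = 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(S))
        cores.append(U[:, :new_rank].reshape(rank, dims[k], new_rank))
        # Carry the remaining weight (S @ Vt) over to the next site.
        mat = S[:new_rank, None] * Vt[:new_rank]
        rank = new_rank
        mat = mat.reshape(rank * dims[k + 1], -1)
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def contract(cores):
    """Multiply the cores back together into a dense tensor."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.squeeze(axis=(0, -1))

rng = np.random.default_rng(0)
psi = rng.normal(size=(2,) * 6)  # generic state of 6 two-level sites

exact_cores = tensor_train(psi, max_rank=8)   # 8 is the maximal bond dimension here
approx_cores = tensor_train(psi, max_rank=2)  # lossy compression

error_exact = np.linalg.norm(contract(exact_cores) - psi)
error_approx = np.linalg.norm(contract(approx_cores) - psi)
```

A dense tensor of n sites stores d^n numbers, whereas the cores store at most n * d * r^2 numbers for bond dimension r. The compression is only accurate when the truncated singular values are small, which is why a random tensor like the one above compresses poorly, while weakly entangled physical states compress well.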
We developed a way to use tensor trains within dynamical mean-field theory to enable the simulation of previously inaccessible emergent properties of strongly correlated materials [O6,P12-P18] - this worked to some degree, but turned out to be a hard problem. This is computational many-body physics at the interface of quantum information and field theory. With U. Schollwöck and A. Millis.
The low energy conversion efficiency of established solar cells is largely due to chemical imperfections of the material, at which photo-excited charge carriers recombine. While at Bosch Research, I established models for material syntheses to optimize processes for the minimization of such imperfections [O5,P8-P11]. Mathematically, these models reduce to diffusion-reaction equations. I wrote proprietary software, which was productionized at Bosch Solar Energy. With P. Pichler.
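For readers unfamiliar with this class of equations, here is a minimal sketch of a generic one-dimensional diffusion-reaction equation, dc/dt = D * d²c/dx² - k * c, integrated with explicit finite differences and no-flux boundaries. It is not the proprietary model; the single species, the first-order reaction term and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def step(c, D, k, dx, dt):
    """One explicit Euler step of dc/dt = D * d2c/dx2 - k * c
    with no-flux (reflecting) boundaries."""
    padded = np.pad(c, 1, mode="edge")  # mirror the edge values
    laplacian = (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx**2
    return c + dt * (D * laplacian - k * c)

# Hypothetical parameters; explicit stability needs D * dt / dx**2 <= 0.5.
D, k = 1.0, 0.3
dx, dt = 0.1, 0.002
n_steps = 500

x = np.arange(0.0, 10.0, dx)
c0 = np.exp(-((x - 5.0) ** 2))  # initial concentration peak
c = c0.copy()
for _ in range(n_steps):
    c = step(c, D, k, dx, dt)
```

With no-flux boundaries, diffusion only redistributes the species, so the total amount decays purely through the reaction term. Realistic process models couple several species with nonlinear reaction terms and usually need implicit time stepping.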
The quantum Rabi model is the basic model for understanding decoherence of a qubit that is coupled to a bath, and hence, a basic model for the technical foundations of quantum computing [P6,P7]. By exploiting a recent exact solution of the static system, we established several dynamical properties, among others, Schrödinger-cat-like states that show particular robustness towards decoherence. With D. Braak.
During my studies, I focused on emergent properties of quantum many-body systems and their applications. Using a phenomenological theory of superconductivity (Bogoliubov-de Gennes), we showed how grain boundaries and strong correlations affect high-temperature superconductivity [P5]. With T. Kopp.
Collapse and revival oscillations and coherent expansions have been suggested for realizing matter-wave lasers. The following two projects [P2,P4] provided the first in-depth models in one- and two-dimensional lattices. With M. Rigol.
We investigated the non-equilibrium behavior of quantum many-body systems [P1-P4], in particular, the fundamental problem of how such systems transition from an excited state to equilibrium. This happens through chaotic dynamics in the classical case, but is an active area of research in the quantum case. We showed that the transition proceeds through an intermediate prethermalized plateau, for which we developed a statistical theory. I contributed the central analytical calculation [T1] to the highly cited paper [P3] during a summer lab project. With M. Kollar.
During high school, I tried to gain a better understanding of how philosophical and political ideas stimulate change in society and culture. In my thesis, I investigated why J.-P. Sartre publicly supported the German terrorist group RAF during his visit to Stammheim in 1974 [O1]. For more context, see Der Spiegel (2013).