Inference of gene regulation using pseudotemporal ordering of single cell snapshots

F. Alex Wolf

Lunch Seminar, February 3rd 2016

Institute of Computational Biology

Helmholtz Zentrum München

Thanks to ...

Philipp  ▷  reveal.js, jupyter notebook for teaching

Laleh  ▷  pseudotime, diffusion maps

Maren  ▷  biology, bioinformatics, ...

Niklas Köhler, Philipp Eulenberg  ▷  deep learning

Thomas  ▷  classification, imaging flow cytometry data

Fabian, Norbert, Valeriya Naumova  ▷  teaching, ML basics

Robert Küffner, Fabian  ▷  GRN inference

Motivation

Standard approach to GRN inference

  • perturb system (drug, ...)
  • $\boldsymbol{x}_i \in \mathbb{R}^D$ state of single condition $i$, averaged over all cells
  • little or no dynamic information

Single-cell based GRN inference

  • take snapshot of evolving system
  • $\boldsymbol{x}_i \in \mathbb{R}^D$ state of single cell $i$
  • dense but hidden dynamic information
  • How can we reveal and exploit this?

Introduction: Correlation

  • Correlation does not imply causation.
  • Absence of correlation does not imply absence of causation.

example from Haghverdi, Buettner & Theis, Bioinformatics (2015)
data from M. Strasser

Introduction: Is $Y$ causal for $X$?

  • Granger causality Granger, Econometrica (1969) $$ \text{Var}(X|\text{Universe}) < \text{Var}(X|\text{Universe}\backslash Y) $$
  • Transfer entropy Schreiber, Phys. Rev. Lett. (2000) $$ \text{Entropy}(X|Y) < \text{Entropy}(X)$$

    ▷  uncertainty in prediction $\simeq$ information flow Barnett et al., Phys. Rev. Lett. (2009)

    ▷  applicable to stochastic system

  • Convergent cross mapping (CCM) Sugihara et al., Science (2012), Takens' Theorem (1980) $$ \text{Var}(X|Y) = 0 \text{ if "} Y \text{ is coupled to } X \text{"} $$

    ▷  functional coupling

    ▷  applicable to deterministic (chaotic) system
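
The Granger criterion above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the "Universe" is reduced to the one-step past of $X$ itself, predictions are linear least-squares fits without intercept, and the toy system (with $Y$ driving $X$) is hypothetical. The helper names `residual_variance` and `granger_ratio` are not from any library.

```python
import random

def residual_variance(target, predictors):
    """Mean squared residual of a least-squares fit (no intercept) of
    `target` on one or two predictor series."""
    n = len(target)
    if len(predictors) == 1:
        (p,) = predictors
        a = sum(u * t for u, t in zip(p, target)) / sum(u * u for u in p)
        fitted = [a * u for u in p]
    else:
        p, q = predictors
        spp = sum(u * u for u in p)
        sqq = sum(v * v for v in q)
        spq = sum(u * v for u, v in zip(p, q))
        spt = sum(u * t for u, t in zip(p, target))
        sqt = sum(v * t for v, t in zip(q, target))
        det = spp * sqq - spq * spq
        # Solution of the 2x2 normal equations.
        a = (spt * sqq - sqt * spq) / det
        b = (sqt * spp - spt * spq) / det
        fitted = [a * u + b * v for u, v in zip(p, q)]
    return sum((t - f) ** 2 for t, f in zip(target, fitted)) / n

def granger_ratio(x, y):
    """Var(X | past of X and Y) / Var(X | past of X).
    Values well below 1 suggest that Y helps to predict X."""
    target, x_past, y_past = x[1:], x[:-1], y[:-1]
    full = residual_variance(target, [x_past, y_past])
    restricted = residual_variance(target, [x_past])
    return full / restricted

# Hypothetical toy system: Y drives X, but X does not drive Y.
rng = random.Random(0)
x, y = [0.0], [0.0]
for _ in range(2000):
    x_next = 0.5 * x[-1] + 0.6 * y[-1] + rng.gauss(0, 1)
    y_next = 0.8 * y[-1] + rng.gauss(0, 1)
    x.append(x_next)
    y.append(y_next)

print("Y -> X variance ratio:", granger_ratio(x, y))
print("X -> Y variance ratio:", granger_ratio(y, x))
```

In this toy setting the ratio for the direction $Y \to X$ should come out clearly below 1, while the ratio for $X \to Y$ should stay close to 1.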

CCM for temporal data Sugihara et al., Science (2012)

▷  predict $X_{g'}$ given $X_{g}$:

$$~~~~\hat X_{t g'} | M_{g} = {\textstyle\sum_{\scriptscriptstyle \underline{x}_{t'g}\in\, \mathcal{N}_{t g}}} \!\!\!\!\!\!\!\overbrace{w(\underline{x}_{tg},\underline{x}_{t'g})}^{\text{"selects } t' \text{ based on } M_g \text{"}} \!\!\!\!\!x_{t'g'}$$
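
A minimal sketch of the cross-map estimate above, under stated assumptions: $M_g$ is built as a delay embedding of $X_g$, $\mathcal{N}_{tg}$ holds the `dim + 1` nearest delay vectors, and $w$ is an exponential weighting relative to the nearest distance (in the spirit of simplex projection). The function names, the embedding dimension and the coupled logistic maps used as toy data are illustrative choices, not the implementation used in the talk.

```python
import math

def delay_embed(series, dim, lag=1):
    """Return (time index, delay vector) pairs: the shadow manifold M_g."""
    start = (dim - 1) * lag
    return [(t, tuple(series[t - k * lag] for k in range(dim)))
            for t in range(start, len(series))]

def cross_map(source, target, dim=2, lag=1):
    """Estimate target[t] from the neighbours of the source delay vector at t."""
    points = delay_embed(source, dim, lag)
    estimates, observed = [], []
    for t, vec in points:
        # dim + 1 nearest neighbours of the delay vector, excluding t itself.
        neighbours = sorted(
            (math.dist(vec, other), s) for s, other in points if s != t
        )[: dim + 1]
        d_min = max(neighbours[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in neighbours]
        total = sum(weights)
        # Weighted average of the target at the selected time points t'.
        estimates.append(
            sum(w * target[s] for w, (_, s) in zip(weights, neighbours)) / total
        )
        observed.append(target[t])
    return estimates, observed

def pearson(a, b):
    """Pearson correlation, used here as the cross-map skill."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

# Hypothetical toy data: two logistic maps, with x weakly driving y.
x, y = [0.4], [0.2]
for _ in range(299):
    x_next = x[-1] * (3.8 - 3.8 * x[-1])
    y_next = y[-1] * (3.5 - 3.5 * y[-1] - 0.1 * x[-1])
    x.append(x_next)
    y.append(y_next)

print("skill of estimating x from M_y:", pearson(*cross_map(y, x)))
print("skill of estimating y from M_x:", pearson(*cross_map(x, y)))
```

Note the direction: if $X_g$ drives $X_{g'}$, information about $X_g$ is contained in the manifold of $X_{g'}$, so the estimate of $X_g$ from $M_{g'}$ is the one expected to be skillful.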

CCM for snapshot data

▷  predict $X_{g'}$ given $X_{g}$:

$$~~~~\hat X_{t g'} | M_{g} = {\textstyle\sum_{\scriptscriptstyle \underline{x}_{t'g}\in\, \mathcal{N}_{t g}}} \!\!\!\!\!\!\!\overbrace{w(\underline{x}_{tg},\underline{x}_{t'g})}^{\text{"selects } t' \text{ based on } M_g \text{"}} \!\!\!\!\!x_{t'g'}$$

Bendall, Davis, ... Peer & Nolan, Cell (2014)
Trapnell, Cacchiarelli, ... Mikkelsen & Rinn, Nat. Biotechn. (2014)
Haghverdi, Büttner, ... Buettner & Theis, submitted (2016)
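
For snapshot data the index $t$ is not measured. One way to read the formula above is that a pseudotime, estimated beforehand with one of the cited methods, takes the role of $t$. The sketch below shows only this ordering step and assumes the pseudotime values are already given; `order_by_pseudotime` and the three toy cells are hypothetical.

```python
def order_by_pseudotime(cells, pseudotime):
    """cells: one expression vector per cell; pseudotime: one value per cell.
    Returns one series per gene, ordered along pseudotime."""
    order = sorted(range(len(cells)), key=lambda i: pseudotime[i])
    n_genes = len(cells[0])
    return [[cells[i][g] for i in order] for g in range(n_genes)]

# Hypothetical snapshot: three cells, two genes, pseudotime assumed known.
cells = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.4]]
pseudotime = [0.7, 0.0, 0.3]
series = order_by_pseudotime(cells, pseudotime)
print(series)
```

The resulting per-gene series can then be treated like temporal data when building $M_g$, keeping in mind that pseudotime is itself an estimate.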

CCM for snapshot data

Straightforward improvements

  • use monotonicity / variance
  • use relative predictivity and time lag  ▷  transitivity

Conceptual comparisons

  • Transfer Entropy
  • GENIE3: Random Forests

    Huynh-Thu, ..., Geurts, PLOS One (2010)
    $$ \sum_{i=1}^N (x_{ig} - f_g(\boldsymbol{x}_i))^2 $$

What is already there?

  • ODE based model Ocone, Haghverdi, Müller & Theis, Bioinformatics (2015)
     ▷  edges using GENIE3, then optimize for $ u_g(\hat{\boldsymbol{x}})$
    $$\frac{d}{dt} \hat x_g = \alpha u_g(\hat{\boldsymbol{x}}) - \lambda \hat x_g,~~ \textstyle p(\mathcal{D}|\theta) \propto \prod_{t=1}^T \exp\big(-\frac{(x_{tg} - \hat x_g(t,\theta))^2}{2 \sigma^2}\big) $$
  • Discrete state space model Moignard, Woodhouse ... Goettgens, Nat. Biotech. (2015)
    • generate discrete state graph of 1-gene transitions
    • check consistency of trial Boolean networks
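
The ODE-based model in the first item can be sketched as follows. This is a toy illustration under assumptions, not the published implementation: integration uses forward Euler, the regulation function `u_toy` (gene 0 with constant input, gene 1 activated by gene 0 through a saturating term) is hypothetical, and the log-likelihood is summed over time points and genes and given up to an additive constant.

```python
def simulate(u, x0, alpha, lam, dt, n_steps):
    """Forward-Euler integration of dx_g/dt = alpha * u_g(x) - lam * x_g."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(n_steps):
        inputs = u(x)
        x = [xg + dt * (alpha * ug - lam * xg) for xg, ug in zip(x, inputs)]
        trajectory.append(list(x))
    return trajectory

def log_likelihood(data, prediction, sigma):
    """Gaussian log-likelihood up to a constant:
    -sum_t sum_g (x_tg - xhat_g(t))^2 / (2 sigma^2)."""
    squared_error = sum(
        (d - p) ** 2
        for row_d, row_p in zip(data, prediction)
        for d, p in zip(row_d, row_p)
    )
    return -squared_error / (2 * sigma ** 2)

def u_toy(x):
    """Hypothetical regulation: gene 0 is always on, gene 1 is activated by gene 0."""
    return [1.0, x[0] / (1.0 + x[0])]

# Noise-free "data" generated with alpha = 2, then scored for three alphas.
data = simulate(u_toy, [0.0, 0.0], alpha=2.0, lam=1.0, dt=0.001, n_steps=1000)
for alpha in (1.0, 2.0, 3.0):
    prediction = simulate(u_toy, [0.0, 0.0], alpha=alpha, lam=1.0,
                          dt=0.001, n_steps=1000)
    print(alpha, log_likelihood(data, prediction, sigma=0.1))
```

In an actual fit, the parameters $\theta$ (here only $\alpha$) would be chosen to maximize this likelihood; the loop above merely scans three candidate values.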

Summary

Convergent cross mapping (CCM) Sugihara et al., Science (2012)

  • detects "causality"=couplings in noisy deterministic systems
    (even without noise, such systems might seem stochastic when they are "merely" chaotic)
  • is alternative to Transfer Entropy / Granger Causality
    (for "purely" stochastic systems)

Pseudotime + CCM for GRN inference

  • pseudotime: reveals structure of manifold associated with "deterministic part" of a GRN (i.e. a dynamic system)
  • pseudotime + CCM: qualitatively better than naive approaches
  • many questions still open ...

Thank you!