Photo: Sylvain Crouzet

I am an Associate Professor in the Department of Mathematics at the
University of Wisconsin-Madison, where I am affiliated with the
Probability Group and the Applied Mathematics Group. I am also an
Affiliate Faculty in the Department of Statistics. I was previously
at UCLA. My CV can be found here.

I work at the intersection of **applied probability**, **statistics** and
**theoretical computer science**, with an emphasis on **biological
applications**. More details on my research interests and publications
can be found here. My work is currently supported by NSF grant
DMS-1149312 (CAREER).

I am an associate editor for the Annals of Applied Probability.
To submit a paper, follow this link.

### News

Jointly with Antonio Auffinger
and Jian Ding,
we are organizing a special session on Probability Theory at the
AMS Central Fall Sectional Meeting
at Loyola University in Chicago on October 2-4, 2015.
The schedule can be found here.

### Recent Preprints

Phase transition in the sample complexity of likelihood-based
phylogeny inference

Submitted, 2015. With A. Sly.

Reconstructing evolutionary trees from molecular sequence data
is a fundamental problem in computational biology. Stochastic
models of sequence evolution are closely related to spin systems
that have been extensively studied in statistical physics and
that connection has led to important insights on the theoretical
properties of phylogenetic reconstruction algorithms as well as the
development of new inference methods. Here, we study maximum
likelihood, a classical statistical technique which is perhaps
the most widely used in phylogenetic practice because of its
superior empirical accuracy.
At the theoretical level, except for its consistency, that is,
the guarantee of eventual correct reconstruction as the size of
the input data grows, much remains to be understood about the
statistical properties of maximum likelihood in this context.
In particular, the best bounds on the sample complexity or
sequence-length requirement of maximum likelihood, that is,
the amount of data required for correct reconstruction, are
exponential in the number, n, of tips, far from known lower
bounds based on information-theoretic arguments. Here we close
the gap by proving a new upper bound on the sequence-length
requirement of maximum likelihood that matches up to constants
the known lower bound for some standard models of evolution.
More specifically, for the r-state symmetric model of sequence
evolution on a binary phylogeny with bounded edge lengths, we
show that the sequence-length requirement behaves logarithmically
in n when the expected amount of mutation per edge is below what
is known as the Kesten-Stigum threshold. In general, the
sequence-length requirement is polynomial in n. Our results imply
moreover that the maximum likelihood estimator can be computed
efficiently on randomly generated data provided sequences are as above.

Species trees from gene trees despite a high rate of lateral genetic
transfer: A tight bound

Submitted, 2015. With C. Daskalakis.

Reconstructing the tree of life from molecular sequences
is a fundamental problem in computational biology. Modern
data sets often contain a large number of genes which can
complicate the reconstruction problem due to the fact that
different genes may undergo different evolutionary histories.
This is the case in particular in the presence of lateral
genetic transfer (LGT), whereby a gene is inherited from a
distant species rather than an immediate ancestor. Such an
event produces a gene tree which is distinct from
(but related to) the species phylogeny.
In previous work, a stochastic model of LGT was introduced
and it was shown that the species phylogeny can be reconstructed
from gene trees despite surprisingly high rates of LGT. Both
lower and upper bounds on this rate were obtained, but a large
gap remained. Here we close this gap, up to a constant. Specifically,
we show that the species phylogeny can be reconstructed perfectly
even when each edge of the tree has a constant probability of being
the location of an LGT event. Our new reconstruction algorithm builds
the tree recursively from the leaves. We also provide a matching bound
in the negative direction (up to a constant).

Distance-based species tree estimation
under the coalescent: information-theoretic trade-off between number of loci and sequence length

Submitted, 2015. With E. Mossel.

Conference abstract in Proceedings of RANDOM 2015, 931-942.

We consider the reconstruction of a phylogeny
from multiple genes under the multispecies coalescent.
We establish a connection with the sparse signal detection
problem, where one seeks to distinguish between
a distribution and a mixture of the distribution
and a sparse signal. Using this connection,
we derive an information-theoretic trade-off
between the number of genes, $m$, needed for an accurate
reconstruction and the sequence length, $k$, of the
genes. Specifically, we show that to detect
a branch of length $f$, one needs $m = \Theta(1/[f^{2} \sqrt{k}])$ genes.

Phase transition on the convergence rate of parameter estimation under an Ornstein-Uhlenbeck diffusion on a tree

Submitted, 2014. With Cecile Ane and Lam Si Tung Ho.

Diffusion processes on trees are commonly used in evolutionary biology to model the joint distribution of continuous traits, such as body mass, across species.
Estimating the parameters of such processes
from tip values
presents challenges because of the intrinsic correlation
between the observations produced by the shared evolutionary
history, thus violating the standard independence assumption
of large-sample theory.
For instance, Ho and Ané [HoAne13] recently proved
that the mean (also known in this context as selection optimum)
of an Ornstein-Uhlenbeck process on a tree
cannot be estimated consistently from
an increasing number of tip observations
if the tree height is bounded.
Here, using a fruitful connection to the so-called
reconstruction problem in probability theory,
we study the convergence rate of parameter estimation
in the unbounded height case.
For the mean of the process,
we provide a necessary and
sufficient condition for the consistency of the maximum
likelihood estimator (MLE)
and establish a phase transition on its convergence rate
in terms of the growth of the tree. In particular, we show that
a loss of $\sqrt{n}$-consistency
(i.e., the variance of the MLE becomes $\Omega(n^{-1})$,
where $n$ is the number of tips)
occurs when the tree growth is larger than a threshold
related to the phase transition of the reconstruction problem.
For the covariance parameters, we give a novel, efficient
estimation method which achieves
$\sqrt{n}$-consistency under natural assumptions on the tree.
Our theoretical results provide practical suggestions for biologists on the design of experiments.

A full list of publications is available here.

### Lecture Notes

A graduate course on stochastic processes in evolutionary genetics

The first semester of graduate probability theory

### Currently Teaching

MATH 632 - Introduction to Stochastic Processes

### Contact Information

Office: Van Vleck 823

Phone: 608-263-3053

Fax: 608-263-8891

lastname[at]math[dot]wisc[dot]edu

Department of Mathematics

University of Wisconsin-Madison

480 Lincoln Drive

Madison, WI 53706