# Difference between revisions of "Colloquia"

(→December 4, 2020, Federico Ardila (San Francisco)) |
(→February 8, 2021 [Mon 4-5pm], Mohamed Ndaoud (USC)) |
||

(21 intermediate revisions by 3 users not shown) | |||

Line 6: | Line 6: | ||

<!--- in Van Vleck B239, '''unless otherwise indicated'''. ---> | <!--- in Van Vleck B239, '''unless otherwise indicated'''. ---> | ||

− | = | + | =Spring 2021= |

− | == | + | == January 27, 2021 '''[Wed 4-5pm]''', [https://sites.google.com/view/morganeaustern/home Morgane Austern] (Microsoft Research) == |

− | (Hosted by | + | (Hosted by Roch) |

+ | |||

+ | '''Asymptotics of learning on dependent and structured random objects''' | ||

+ | |||

+ | Classical statistical inference relies on numerous tools from probability theory to study | ||

+ | the properties of estimators. However, these same tools are often inadequate to study | ||

+ | modern machine learning problems that frequently involve structured data (e.g., networks) or | ||

+ | complicated dependence structures (e.g., dependent random matrices). In this talk, we | ||

+ | extend universal limit theorems beyond the classical setting. | ||

+ | |||

+ | Firstly, we consider distributionally “structured” and dependent random objects, i.e., | ||

+ | random objects whose distributions are invariant under the action of an amenable group. | ||

+ | We show, under mild moment and mixing conditions, a series of universal second- and | ||

+ | third-order limit theorems: central limit theorems, concentration inequalities, the Wigner | ||

+ | semicircle law, and Berry-Esseen bounds. The utility of these will be illustrated by | ||

+ | a series of examples in machine learning, networks, and information theory. Secondly, | ||

+ | by building on these results, we establish the asymptotic distribution of the | ||

+ | cross-validated risk with the number of folds allowed to grow at an arbitrary rate. Using | ||

+ | this, we study the statistical speed-up of cross-validation compared to a train-test split | ||

+ | procedure, which reveals surprising results even when used on simple estimators. | ||

+ | |||

+ | == January 29, 2021, [https://sites.google.com/site/isaacpurduemath/ Isaac Harris] (Purdue) == | ||

+ | |||

+ | (Hosted by Smith) | ||

+ | |||

+ | == February 1, 2021 '''[Mon 4-5pm]''', [https://services.math.duke.edu/~nwu/index.htm Nan Wu] (Duke) == | ||

− | + | (Hosted by Roch) | |

− | + | '''From Manifold Learning to Gaussian Process Regression on Manifolds''' | |

− | + | In this talk, I will review the concepts in manifold learning and discuss a famous manifold learning algorithm, the Diffusion Map. I will present recent research results that theoretically justify that the Diffusion Map reveals the underlying topological structure of a dataset sampled from a manifold in a high-dimensional space. Moreover, I will show how these theoretical results apply to regression problems on manifolds and to real-life ecological problems. | |

− | ( | + | == February 5, 2021, [https://hanbaeklyu.com/ Hanbaek Lyu] (UCLA) == |

− | + | (Hosted by Roch) | |

− | + | '''Dictionary Learning from dependent data samples and networks''' | |

− | + | Analyzing group behavior of systems of interacting variables is a ubiquitous problem in many fields including probability, combinatorics, and dynamical systems. This problem also naturally arises when one tries to learn essential features (dictionary atoms) from large and structured data such as networks. For instance, independently sampling some number of nodes in a sparse network hardly detects any edges between adjacent nodes. Instead, we may perform a random walk on the space of connected subgraphs, which will produce more meaningful but correlated samples. Just as classical results in probability were first developed for independent variables and then gradually generalized to dependent variables, many algorithms in machine learning first developed for independent data samples now need to be extended to correlated data samples. In this talk, we discuss some new results that accomplish this, including some for online nonnegative matrix and tensor factorization for Markovian data. A unifying technique we develop for handling dependence in data samples is to condition on the distant past rather than the recent history. As an application, we present a new approach for learning "basis subgraphs" from network data that can be used for network denoising and edge inference tasks. We illustrate our method using several synthetic network models as well as Facebook, arXiv, and protein-protein interaction networks, on which it achieves state-of-the-art performance for such network tasks when compared to several recent methods. | |

− | ( | + | == February 8, 2021 '''[Mon 4-5pm]''', [https://sites.google.com/view/mndaoud/home Mohamed Ndaoud] (USC) == |

− | + | (Hosted by Roch) | |

− | + | '''SCALED MINIMAX OPTIMALITY IN HIGH-DIMENSIONAL LINEAR REGRESSION: A NON-CONVEX ALGORITHMIC REGULARIZATION APPROACH''' | |

− | + | The question of fast convergence in the classical problem of high-dimensional linear regression has been extensively studied. Arguably, one of the fastest procedures in practice is Iterative Hard Thresholding (IHT). Still, IHT relies strongly on knowledge of the true sparsity parameter s. In this work, we present a novel fast procedure for estimation in high-dimensional linear regression. Taking advantage of the interplay between estimation, support recovery, and optimization, we achieve both optimal statistical accuracy and fast convergence. The main advantage of our procedure is that it is fully adaptive, making it more practical than state-of-the-art IHT methods. Our procedure achieves optimal statistical accuracy faster than, for instance, classical algorithms for the Lasso. Moreover, we establish sharp optimal results for both estimation and support recovery. As a consequence, we present a new iterative hard thresholding algorithm for high-dimensional linear regression that is scaled minimax optimal (that is, it achieves, when possible, the estimation error of the oracle that knows the sparsity pattern), fast, and adaptive. | |

− | ( | + | == February 12, 2021, [https://sites.math.washington.edu/~blwilson/ Bobby Wilson] (University of Washington) == |

− | + | (Hosted by Smith) | |

− | + | == February 19, 2021, [http://www.mauricefabien.com/ Maurice Fabien] (Brown)== | |

− | + | (Hosted by Smith) | |

− | ( | + | == February 26, 2021, [https://www.math.ias.edu/avi/home Avi Wigderson] (Princeton IAS) == |

− | + | (Hosted by Gurevitch) | |

− | + | == March 12, 2021, [] == | |

− | |||

− | + | (Hosted by ) | |

− | + | == March 26, 2021, [] == | |

− | + | (Hosted by ) | |

− | + | == April 9, 2021, [] == | |

− | + | (Hosted by ) | |

− | + | == April 23, 2021, [] == | |

+ | |||

+ | (Hosted by ) | ||

== Past Colloquia == | == Past Colloquia == | ||

+ | |||

+ | [[Colloquia/Fall2020|Fall 2020]] | ||

[[Colloquia/Spring2020|Spring 2020]] | [[Colloquia/Spring2020|Spring 2020]] |

## Latest revision as of 11:04, 20 January 2021

**The UW Madison Mathematics Colloquium is ONLINE on Fridays at 4:00 pm.**

# Spring 2021

## January 27, 2021 **[Wed 4-5pm]**, Morgane Austern (Microsoft Research)

(Hosted by Roch)

**Asymptotics of learning on dependent and structured random objects**

Classical statistical inference relies on numerous tools from probability theory to study the properties of estimators. However, these same tools are often inadequate to study modern machine learning problems that frequently involve structured data (e.g., networks) or complicated dependence structures (e.g., dependent random matrices). In this talk, we extend universal limit theorems beyond the classical setting.

Firstly, we consider distributionally “structured” and dependent random objects, i.e., random objects whose distributions are invariant under the action of an amenable group. We show, under mild moment and mixing conditions, a series of universal second- and third-order limit theorems: central limit theorems, concentration inequalities, the Wigner semicircle law, and Berry-Esseen bounds. The utility of these will be illustrated by a series of examples in machine learning, networks, and information theory. Secondly, by building on these results, we establish the asymptotic distribution of the cross-validated risk with the number of folds allowed to grow at an arbitrary rate. Using this, we study the statistical speed-up of cross-validation compared to a train-test split procedure, which reveals surprising results even when used on simple estimators.
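
For readers unfamiliar with the two procedures the abstract compares, the following is a minimal sketch of a K-fold cross-validated risk next to a single train-test split risk. It is only an illustration and not the speaker's analysis: the estimator (a sample mean), the squared-error loss, and the function names `kfold_risk` and `split_risk` are all assumptions chosen for simplicity.

```python
def kfold_risk(data, k):
    """K-fold cross-validated squared-error risk of the sample-mean estimator."""
    n = len(data)
    # Fold i holds every k-th observation starting at index i.
    folds = [data[i::k] for i in range(k)]
    total = 0.0
    for i, held_out in enumerate(folds):
        # Fit on all folds except the held-out one.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        mu = sum(train) / len(train)
        # Accumulate the loss on the held-out fold.
        total += sum((x - mu) ** 2 for x in held_out)
    # Every observation is used for testing exactly once.
    return total / n


def split_risk(data, n_test):
    """Squared-error risk from one train-test split (last n_test points held out)."""
    train, test = data[:-n_test], data[-n_test:]
    mu = sum(train) / len(train)
    return sum((x - mu) ** 2 for x in test) / len(test)
```

The cross-validated estimate averages the loss over all n observations, whereas the single split averages over only `n_test` of them; that difference in how the data are reused is the setting in which a speed-up can be asked about.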

## January 29, 2021, Isaac Harris (Purdue)

(Hosted by Smith)

## February 1, 2021 **[Mon 4-5pm]**, Nan Wu (Duke)

(Hosted by Roch)

**From Manifold Learning to Gaussian Process Regression on Manifolds**

In this talk, I will review the concepts in manifold learning and discuss a famous manifold learning algorithm, the Diffusion Map. I will present recent research results that theoretically justify that the Diffusion Map reveals the underlying topological structure of a dataset sampled from a manifold in a high-dimensional space. Moreover, I will show how these theoretical results apply to regression problems on manifolds and to real-life ecological problems.

## February 5, 2021, Hanbaek Lyu (UCLA)

(Hosted by Roch)

**Dictionary Learning from dependent data samples and networks**

Analyzing group behavior of systems of interacting variables is a ubiquitous problem in many fields including probability, combinatorics, and dynamical systems. This problem also naturally arises when one tries to learn essential features (dictionary atoms) from large and structured data such as networks. For instance, independently sampling some number of nodes in a sparse network hardly detects any edges between adjacent nodes. Instead, we may perform a random walk on the space of connected subgraphs, which will produce more meaningful but correlated samples. Just as classical results in probability were first developed for independent variables and then gradually generalized to dependent variables, many algorithms in machine learning first developed for independent data samples now need to be extended to correlated data samples. In this talk, we discuss some new results that accomplish this, including some for online nonnegative matrix and tensor factorization for Markovian data. A unifying technique we develop for handling dependence in data samples is to condition on the distant past rather than the recent history. As an application, we present a new approach for learning "basis subgraphs" from network data that can be used for network denoising and edge inference tasks. We illustrate our method using several synthetic network models as well as Facebook, arXiv, and protein-protein interaction networks, on which it achieves state-of-the-art performance for such network tasks when compared to several recent methods.

## February 8, 2021 **[Mon 4-5pm]**, Mohamed Ndaoud (USC)

(Hosted by Roch)

**SCALED MINIMAX OPTIMALITY IN HIGH-DIMENSIONAL LINEAR REGRESSION: A NON-CONVEX ALGORITHMIC REGULARIZATION APPROACH**

The question of fast convergence in the classical problem of high-dimensional linear regression has been extensively studied. Arguably, one of the fastest procedures in practice is Iterative Hard Thresholding (IHT). Still, IHT relies strongly on knowledge of the true sparsity parameter s. In this work, we present a novel fast procedure for estimation in high-dimensional linear regression. Taking advantage of the interplay between estimation, support recovery, and optimization, we achieve both optimal statistical accuracy and fast convergence. The main advantage of our procedure is that it is fully adaptive, making it more practical than state-of-the-art IHT methods. Our procedure achieves optimal statistical accuracy faster than, for instance, classical algorithms for the Lasso. Moreover, we establish sharp optimal results for both estimation and support recovery. As a consequence, we present a new iterative hard thresholding algorithm for high-dimensional linear regression that is scaled minimax optimal (that is, it achieves, when possible, the estimation error of the oracle that knows the sparsity pattern), fast, and adaptive.
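
As background for the abstract's remark that IHT depends on the sparsity parameter s, here is a minimal sketch of textbook Iterative Hard Thresholding. It is not the adaptive procedure announced in the talk: the function names, the step size of 1.0, the iteration count, and the noiseless synthetic data in `demo` are all illustrative assumptions.

```python
import random


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v; set the rest to zero."""
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:s])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def iht(X, y, s, step=1.0, n_iter=50):
    """Textbook IHT for y ~ X beta with at most s nonzero coefficients.

    Note that s must be supplied by the caller, which is the dependence
    on the true sparsity that the abstract points out.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residual of the current fit.
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Gradient step on the least-squares loss (1/2n) * ||y - X beta||^2.
        moved = [
            beta[j] + step * sum(X[i][j] * residual[i] for i in range(n)) / n
            for j in range(p)
        ]
        # Project back onto s-sparse vectors.
        beta = hard_threshold(moved, s)
    return beta


def demo(seed=0):
    """Noiseless synthetic example (assumed setup): 500 samples, 10 features, 2 nonzeros."""
    rng = random.Random(seed)
    n, p, s = 500, 10, 2
    true_beta = [0.0] * p
    true_beta[2], true_beta[7] = 3.0, -2.0
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(X[i][j] * true_beta[j] for j in range(p)) for i in range(n)]
    return true_beta, iht(X, y, s)
```

Calling `demo()` returns the true coefficient vector alongside the IHT estimate, so the two can be compared entry by entry. Passing a wrong value of `s` to `iht` is a quick way to see why an adaptive alternative is attractive.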

## February 12, 2021, Bobby Wilson (University of Washington)

(Hosted by Smith)

## February 19, 2021, Maurice Fabien (Brown)

(Hosted by Smith)

## February 26, 2021, Avi Wigderson (Princeton IAS)

(Hosted by Gurevitch)

## March 12, 2021, []

(Hosted by )

## March 26, 2021, []

(Hosted by )

## April 9, 2021, []

(Hosted by )

## April 23, 2021, []

(Hosted by )