Difference between revisions of "SIAM Student Chapter Seminar"
Revision as of 15:00, 21 November 2018
- When: Every Other Wednesday at 2:15 pm (except as otherwise indicated)
- Where: 901 Van Vleck Hall
- Organizers: Ke Chen
- To join the SIAM Chapter mailing list: email [join-siam-chapter@lists.wisc.edu]
Fall 2018
date | speaker | title |
---|---|---|
Sept. 12 | Ke Chen (Math) | Inverse Problem in Optical Tomography |
Sept. 26 | Kurt Ehlert (Math) | How to bet when gambling |
Oct. 10 | Zachary Hansen (Atmospheric and Oceanic Sciences) | Land-Ocean contrast in lightning |
Oct. 24 | Xuezhou Zhang (Computer Science) | An Optimal Control Approach to Sequential Machine Teaching |
Nov. 7 | Cancelled | |
Nov. 21 | Cancelled due to Thanksgiving | |
Nov. 28 | Xiaowu Dai (Statistics) | Toward the Theoretical Understanding of Large-batch Training in Stochastic Gradient Descent |
Abstracts
Sep 12: Ke Chen (Math)
Inverse Problem in Optical Tomography
I will briefly talk about my research on inverse problems for radiative transfer equations, which are commonly used to model the transport of neutrons or other particles through a medium. Such inverse problems consider the following question: given multiple measurements collected at the boundary of the domain of interest, is it possible to reconstruct the optical properties of the interior of the medium? In this talk, I will show that the stability of this problem deteriorates as the Knudsen number gets smaller. The talk will be introductory, and any graduate student is welcome to join us.
Sept 26: Kurt Ehlert (Math)
How to bet when gambling
When gambling, typically casinos have an edge. But sometimes we can gain an edge by counting cards or other means. And sometimes we have an edge in the biggest casino of all: the financial markets. When we do have an advantage, then we still need to decide how much to bet. Bet too little, and we leave money on the table. Bet too much, and we risk financial ruin. We will discuss the "Kelly criterion", which is a betting strategy that is optimal in many senses.
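For the simple case of a binary bet, the Kelly fraction has a closed form; the following minimal sketch illustrates it (the win probability and odds here are made-up inputs, not values from the talk):

```python
# Kelly criterion for a binary bet: win with probability p, receiving
# net odds b (win b units per unit staked), otherwise lose the stake.
# The optimal fraction of bankroll to wager is
#   f* = (b*p - q) / b,  where q = 1 - p.

def kelly_fraction(p, b):
    """Fraction of bankroll to bet; 0 when there is no edge."""
    q = 1.0 - p
    f = (b * p - q) / b
    return max(f, 0.0)  # never bet when the expected edge is negative

# Example: 60% chance to win an even-money bet (b = 1) -> stake 20%.
print(kelly_fraction(0.6, 1.0))  # -> 0.2
```

Betting less than f* "leaves money on the table" in the sense of slower long-run bankroll growth, while betting more increases the risk of ruin, which is the trade-off the abstract describes.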
Oct 10: Zachary Hansen (Atmospheric and Oceanic Sciences)
Land-Ocean contrast in lightning
Land surfaces have orders of magnitude more lightning flashes than ocean surfaces. One explanation for this difference is that land surfaces may generate greater convective available potential energy (CAPE), which fuels stronger thunderstorms. Using a high resolution cloud-resolving atmospheric model, we test whether an island can produce stronger thunderstorms just by having a land-like surface. We find that the island alters the distribution of rainfall but does not produce stronger storms. An equilibrium state known as boundary layer quasi-equilibrium follows, and is explored in more detail.
Oct 24: Xuezhou Zhang (Computer Science)
An Optimal Control Approach to Sequential Machine Teaching
Given a sequential learning algorithm and a target model, sequential machine teaching aims to find the shortest training sequence to drive the learning algorithm to the target model. We present the first principled way to find such shortest training sequences. Our key insight is to formulate sequential machine teaching as a time-optimal control problem. This allows us to solve sequential teaching by leveraging key theoretical and computational tools developed over the past 60 years in the optimal control community. Specifically, we study the Pontryagin Maximum Principle, which yields a necessary condition for optimality of a training sequence. We present analytic, structural, and numerical implications of this approach on a case study with a least-squares loss function and gradient descent learner. We compute optimal training sequences for this problem, and although the sequences seem circuitous, we find that they can vastly outperform the best available heuristics for generating training sequences.
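As a toy sketch of the setting only (not the paper's time-optimal control solution): a gradient-descent learner updates on squared loss, and a naive greedy teacher picks each training example to pull the model toward a target. All names and quantities here (`w_star`, the learning rate, the label rule) are illustrative assumptions:

```python
import numpy as np

# Learner: gradient descent on squared loss, w <- w - eta * (w @ x - y) * x.
# Teacher: greedily choose (x, y) so each step moves w straight toward w_star.
eta = 0.1
w = np.zeros(2)                  # learner's initial model
w_star = np.array([1.0, -0.5])   # teacher's target model

for _ in range(50):
    direction = w_star - w
    x = direction / np.linalg.norm(direction)  # teach along the error direction
    # Pick the label so that the gradient step is exactly eta * (w_star - w).
    y = w @ x + np.linalg.norm(direction)
    w = w - eta * (w @ x - y) * x

print(np.round(w, 3))  # close to w_star
```

With this greedy scheme the error shrinks by a factor (1 - eta) per step; the abstract's point is that the truly time-optimal sequences found via Pontryagin's principle can be far less intuitive ("circuitous") yet reach the target faster than such heuristics.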
Nov 7: Cancelled
Nov 21: Cancelled
Nov 28: Xiaowu Dai (Statistics)
Toward the Theoretical Understanding of Large-batch Training in Stochastic Gradient Descent
Stochastic gradient descent (SGD) is used almost ubiquitously for nonconvex optimization tasks, including the training of deep neural networks. Recently, the hypothesis that "large batch SGD tends to converge to sharp minimizers of training function" has received increasing attention. We develop new theory to justify this hypothesis. In particular, we establish new properties of SGD in both finite-time and asymptotic regimes, using tools from empirical process theory and partial differential equations. We also build a connection between the stochasticity in SGD and the idea of smoothing splines in nonparametric statistics. We include numerical experiments to corroborate these theoretical findings.
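As an illustrative sketch only (not the analysis in the talk), the role of batch size in SGD's per-step gradient noise can be seen on a synthetic least-squares problem; the data, hyperparameters, and function names here are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic least-squares problem: y = X @ w_true + noise.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def sgd(batch_size, lr=0.05, epochs=100):
    """Plain mini-batch SGD on the mean-squared loss."""
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

# Smaller batches inject more gradient noise per step -- the "stochasticity"
# the abstract relates to implicit smoothing; both settings recover w_true
# on this easy convex problem.
for bs in (4, 200):
    print(bs, np.linalg.norm(sgd(bs) - w_true))
```

On a convex problem like this both batch sizes converge; the sharp-versus-flat-minima question the abstract addresses arises in the nonconvex setting, where the extra noise from small batches is conjectured to bias SGD toward flatter minimizers.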