Matthias Seeger

Principal Machine Learning Scientist at Amazon

Contact: mseeger [@] gmail [DOT] com, matthis [@] amazon [DOT] com

[Google Scholar]
[dblp]
[LinkedIn]
[GitHub]

Short Bio

Matthias W. Seeger received a Ph.D. from the School of Informatics, University of Edinburgh, UK, in 2003 (advisor: Christopher Williams). He was a research fellow with Michael Jordan and Peter Bartlett at the University of California, Berkeley, from 2003, and with Bernhard Schoelkopf at the Max Planck Institute for Intelligent Systems, Tuebingen, Germany, from 2005. He led a research group at Saarland University, Saarbruecken, Germany, from 2008, and was an assistant professor at the Ecole Polytechnique Federale de Lausanne (EPFL) from fall 2010. He joined Amazon as a machine learning scientist in 2014. He received the ICML Test of Time Award in 2020.

Research Interests

For a long time, my interests have centered on Bayesian learning and decision making with probabilistic models, from gaining theoretical understanding to making these methods work at large scale in practice. I have worked on the theory and practice of Gaussian processes and Bayesian optimization, scalable variational approximate inference algorithms, Bayesian compressed sensing, and active learning for medical imaging. I have also worked on demand forecasting, hyperparameter tuning (Bayesian optimization) applied to deep learning (NLP), and AutoML.

More recently, I have become excited about large language models and the data creation and annotation challenges that come with them. I am one of the scientists behind Amazon Q, which transforms the way customers build, optimize, and operate applications and workloads on AWS.

Publications

Conference:

  • (2023) D. Salinas, J. Golebiowski, A. Klein, M. Seeger, C. Archambeau. Optimizing Hyperparameters with Conformal Quantile Regression. International Conference on Machine Learning: 29876-29893. [pdf]

  • (2022) D. Salinas, M. Seeger, A. Klein, V. Perrone, M. Wistuba, C. Archambeau. Syne Tune: A Library for Large Scale Hyperparameter Tuning and Reproducible Research. AutoML Conference. [openreview]

  • (2022) A. Makarova, H. Shen, V. Perrone, A. Klein, J. B. Faddoul, A. Krause, M. Seeger, C. Archambeau. Automatic Termination for Hyperparameter Optimization. AutoML Conference. [openreview]

  • (2021) E. Lee, D. Eriksson, V. Perrone, M. Seeger. A Nonmyopic Approach to Cost-Constrained Bayesian Optimization. Uncertainty in Artificial Intelligence: 568-577. [pdf]

  • (2021) L. Tiao, A. Klein, M. Seeger, E. Bonilla, C. Archambeau, F. Ramos. BORE: Bayesian Optimization by Density Ratio Estimation. International Conference on Machine Learning: 10289-10300. [pdf]

  • (2021) V. Perrone, H. Shen, A. Zolic, I. Shcherbatyi, A. Ahmed, T. Bansal, M. Donini, F. Winkelmolen, R. Jenatton, J. B. Faddoul, B. Pogorzelska, M. Miladinovic, K. Kenthapadi, M. Seeger, C. Archambeau. Amazon SageMaker Automatic Model Tuning: Scalable Black-box Optimization. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining: 3463-3471. [pdf]

  • (2020) L. Tiao, A. Klein, M. Seeger, C. Archambeau, E. Bonilla, F. Ramos. Bayesian Optimization by Density Ratio Estimation. NeurIPS 2020 Workshop on Meta-learning. [pdf]

  • (2020) C. Nguyen, T. Hassner, M. Seeger, C. Archambeau. LEEP: A New Measure to Evaluate Transferability of Learned Representations. International Conference on Machine Learning 37. [pdf]

  • (2019) V. Perrone, H. Shen, M. Seeger, C. Archambeau, R. Jenatton. Learning Search Spaces for Bayesian Optimization: Another View of Hyperparameter Transfer Learning. Neural Information Processing Systems 32: 12751-12761. [pdf]

  • (2018) V. Perrone, R. Jenatton, M. Seeger, C. Archambeau. Scalable Hyperparameter Transfer Learning. Neural Information Processing Systems 31: 6846-6856. [pdf]

  • (2018) S. Rangapuram, M. Seeger, J. Gasthaus, L. Stella, Y. Wang, T. Januschowski. Deep State Space Models for Time Series Forecasting. Neural Information Processing Systems 31: 7796-7805. [pdf], [code]

  • (2017) J. Boese, V. Flunkert, J. Gasthaus, T. Januschowski, D. Lange, D. Salinas, S. Schelter, M. Seeger, Y. Wang. Probabilistic Demand Forecasting at Scale. PVLDB 10(12): 1694-1705. [pdf]

  • (2017) R. Jenatton, C. Archambeau, J. Gonzalez, M. Seeger. Bayesian Optimization with Tree-structured Dependencies. International Conference on Machine Learning 34: 1655-1664. [pdf]

  • (2016) M. Seeger, D. Salinas, V. Flunkert. Bayesian Intermittent Demand Forecasting for Large Inventories. Neural Information Processing Systems 29: 4646-4654 (oral presentation). [pdf]

  • (2015) Y. J. Ko, M. Seeger. Expectation Propagation for Rectified Linear Poisson Regression. Asian Conference on Machine Learning 7. [pdf]

  • (2014) M. Khan, Y. J. Ko, M. Seeger. Scalable Collaborative Bayesian Preference Learning. Artificial Intelligence and Statistics 17: 475-483. [pdf]

  • (2013) M. Khan, A. Aravkin, M. Friedlander, M. Seeger. Fast Dual Variational Inference for Non-Conjugate Latent Gaussian Models. International Conference on Machine Learning 30. [pdf]

  • (2012) M. Seeger, G. Bouchard. Fast Variational Bayesian Inference for Non-Conjugate Matrix Factorization Models. Artificial Intelligence and Statistics 15. [pdf]

  • (2012) Y. J. Ko, M. Seeger. Large Scale Variational Bayesian Inference for Structured Scale Mixture Models. International Conference on Machine Learning 29. [pdf]

  • (2011) M. Seeger, H. Nickisch. Fast Convergent Algorithms for Expectation Propagation Approximate Bayesian Inference. Artificial Intelligence and Statistics 14. [pdf]

  • (2010) M. Seeger. Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing. Neural Information Processing Systems 23: 1633-1641. [pdf]

  • (2010) M. Seeger. Gaussian Covariance and Scalable Variational Inference. International Conference on Machine Learning 27. [pdf]

  • (2010) N. Srinivas, A. Krause, S. Kakade, M. Seeger. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. International Conference on Machine Learning 27. [pdf] (ICML 2020 Test of Time Award)

  • (2009) M. Seeger. Sparse Linear Models: Variational Approximate Inference and Bayesian Experimental Design. Journal of Physics: Conference Series, 197(012001). [pdf]

  • (2009) H. Nickisch, M. Seeger. Convex Variational Bayesian Inference for Large Scale Generalized Linear Models. International Conference on Machine Learning 26: 761-768. [pdf]

  • (2009) M. Seeger, H. Nickisch, R. Pohmann, B. Schoelkopf. Bayesian Experimental Design of Magnetic Resonance Imaging Sequences. Neural Information Processing Systems 21: 1441-1448. [pdf]

  • (2009) D. Nguyen-Tuong, J. Peters, M. Seeger. Local Gaussian Process Regression for Real Time Online Model Learning. Neural Information Processing Systems 21. [pdf]

  • (2008) M. Seeger, H. Nickisch. Compressed Sensing and Bayesian Experimental Design. International Conference on Machine Learning 25. [pdf]

  • (2008) S. Gerwinn, J. Macke, M. Seeger, M. Bethge. Bayesian Inference for Spiking Neuron Models with a Sparsity Prior. Neural Information Processing Systems 21: 529-536. [pdf]

  • (2007) M. Seeger. Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods. Neural Information Processing Systems 20: 1233-1240. [pdf], [code]

  • (2007) M. Seeger, F. Steinke, K. Tsuda. Bayesian Inference and Optimal Design in the Sparse Linear Model. Artificial Intelligence and Statistics 11. [pdf]

  • (2007) M. Seeger, S. Gerwinn, M. Bethge. Bayesian Inference for Sparse Generalized Linear Models. European Conference on Machine Learning 2007: 298-309. [pdf]

  • (2006) S. Kakade, M. Seeger, D. Foster. Worst-Case Bounds for Gaussian Process Models. Neural Information Processing Systems 19. [pdf]

  • (2006) Y. Shen, A. Ng, M. Seeger. Fast Gaussian Process Regression Using KD-Trees. Neural Information Processing Systems 19. [pdf]

  • (2005) Y.-W. Teh, M. Seeger, M. Jordan. Semiparametric Latent Factor Models. Artificial Intelligence and Statistics 10. [pdf]

  • (2003) N. Lawrence, M. Seeger, R. Herbrich. Fast Sparse Gaussian Process Methods: The Informative Vector Machine. Neural Information Processing Systems 16: 609-616. [pdf]

  • (2003) M. Seeger, C. Williams, N. Lawrence. Fast Forward Selection to Speed Up Sparse Gaussian Process Regression. Artificial Intelligence and Statistics 9. [pdf]

  • (2002) M. Seeger. Covariance Kernels from Bayesian Generative Models. Neural Information Processing Systems 15: 905-912. [pdf]

  • (2001) C. Williams, M. Seeger. Using the Nystroem Method to Speed Up Kernel Machines. Neural Information Processing Systems 14: 682-688. [pdf]

  • (2001) M. Seeger, J. Langford, N. Megiddo. An Improved Predictive Accuracy Bound for Averaging Classifiers. International Conference on Machine Learning 18: 290-297. [pdf]

  • (2000) C. Williams, M. Seeger. The Effect of the Input Density Distribution on Kernel-based Classifiers. International Conference on Machine Learning 17: 1159-1166. [link]

  • (2000) M. Seeger. Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel Classifiers. Neural Information Processing Systems 13: 603-609. [pdf]

Journal:

  • (2012) N. Srinivas, A. Krause, S. Kakade, M. Seeger. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Transactions on Information Theory, 58: 3250-3265. [pdf]

  • (2011) M. Seeger, H. Nickisch. Large Scale Bayesian Inference and Experimental Design for Sparse Linear Models. SIAM Journal on Imaging Sciences, 4(1): 166-199. [pdf]

  • (2010) M. Seeger, D. Wipf. Variational Bayesian Inference Techniques. IEEE Signal Processing Magazine, 27(6): 81-91. [pdf]

  • (2010) M. Seeger, H. Nickisch, R. Pohmann, B. Schoelkopf. Optimization of k-Space Trajectories for Compressed Sensing by Bayesian Experimental Design. Magnetic Resonance in Medicine, 61(1): 116-126. [PubMed]

  • (2008) M. Seeger. Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods. Journal of Machine Learning Research, 9: 1147-1178. [pdf], [code]

  • (2008) M. Seeger. Bayesian Inference and Optimal Design in the Sparse Linear Model. Journal of Machine Learning Research, 9: 759-813. [pdf]

  • (2008) M. Seeger, S. Kakade, D. Foster. Information Consistency of Nonparametric Gaussian Process Methods. IEEE Transactions on Information Theory, 54(5): 2376-2382. [pdf]

  • (2007) F. Steinke, M. Seeger, K. Tsuda. Experimental Design for Efficient Identification of Gene Regulatory Networks using Sparse Bayesian Models. BMC Systems Biology, 1(51). [pdf]

  • (2004) M. Seeger. Gaussian Processes for Machine Learning. International Journal of Neural Systems, 14(2): 69-106. [pdf]

  • (2002) M. Seeger. PAC-Bayesian Generalization Error Bounds for Gaussian Process Classification. Journal of Machine Learning Research, 3: 233-269. [pdf]

Book Chapters:

  • (2007) M. Seeger. Gaussian Process Belief Propagation. In G. Bakir, T. Hofmann, B. Schoelkopf (eds.); Predicting Structured Data: 301-318. [pdf]

  • (2006) M. Seeger. A Taxonomy for Semi-Supervised Learning Methods. In O. Chapelle, B. Schoelkopf, A. Zien (eds.); Semi-Supervised Learning: 15-32. [pdf]

Technical Reports:

  • (2021) R. Grazzi, V. Flunkert, D. Salinas, T. Januschowski, M. Seeger, C. Archambeau. Meta-Forecasting by combining Global Deep Representations with Local Adaptation. [arxiv]

  • (2021) A. Makarova, H. Shen, V. Perrone, A. Klein, J. B. Faddoul, A. Krause, M. Seeger, C. Archambeau. Automatic Termination for Hyperparameter Optimization. [arxiv]

  • (2020) A. Klein, L. Tiao, T. Lienart, C. Archambeau, M. Seeger. Model-based Asynchronous Hyperparameter and Neural Architecture Search. [arxiv], [code]

  • (2020) E. Lee, V. Perrone, C. Archambeau, M. Seeger. Cost-aware Bayesian Optimization. [arxiv]

  • (2019) V. Perrone, I. Shcherbatyi, R. Jenatton, C. Archambeau, M. Seeger. Constrained Bayesian Optimization with Max-Value Entropy Search. [arxiv]

  • (2019) M. Seeger. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. Addendum. [pdf]

  • (2017) M. Seeger, S. Rangapuram, Y. Wang, D. Salinas, J. Gasthaus, T. Januschowski, V. Flunkert. Approximate Bayesian Inference in Linear State Space Models for Intermittent Demand Forecasting at Scale. [arxiv]

  • (2017) M. Seeger, A. Hetzel, Z. Dai, E. Meissner, N. Lawrence. Auto-Differentiating Linear Algebra. [arxiv], [code]

  • (2010) N. Srinivas, A. Krause, S. Kakade, M. Seeger. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. [arxiv]

  • (2010) M. Seeger, H. Nickisch. Large Scale Bayesian Inference and Experimental Design for Sparse Linear Models. [arxiv]

  • (2008) M. Seeger, S. Kakade, D. Foster. Addendum to Information Consistency of Nonparametric Gaussian Process Methods. [pdf]

  • (2005) M. Seeger, Y.-W. Teh, M. Jordan. Semiparametric Latent Factor Models. [pdf]

  • (2005) M. Seeger. Expectation Propagation for Exponential Families. [pdf]

  • (2004) M. Seeger. Low Rank Updates for the Cholesky Decomposition. [pdf], [code]

  • (2004) M. Seeger, M. Jordan. Sparse Gaussian Process Classification With Multiple Classes. [pdf]

  • (2000) M. Seeger. Learning with Labeled and Unlabeled Data. [pdf]

PhD Theses:

  • (2017) Y. J. Ko. Applications of Approximate Learning and Inference for Probabilistic Models. Ecole Polytechnique Federale de Lausanne (M. Grossglauser, M. Seeger, advisors). [link]

  • (2003) M. Seeger. Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations. University of Edinburgh, UK (C. Williams, advisor). [link]

Patents:

  • (2022) S. Rangapuram, J. Gasthaus, T. Januschowski, M. Seeger, L. Stella. Artificial intelligence system combining state space models and neural networks for time series forecasting. US Patent 11,281,969.

  • (2020) M. Seeger, G. Duncan, J. Gasthaus. Intermittent demand forecasting for large inventories. US Patent 10,748,072.

Lecture Notes

  • (2012) Pattern Classification and Machine Learning, taught at EPFL. [pdf]

Software

Together with colleagues at Amazon (David Salinas, Aaron Klein, Martin Wistuba), I created and maintain [Syne Tune], a package for state-of-the-art distributed hyperparameter optimization. Together with Asmus Hetzel and Zhenwen Dai, I introduced linear algebra operators (Cholesky decomposition, LU decomposition, singular value decomposition) into [MXNet].
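To give a flavor of what Syne Tune provides, here is a minimal sketch of a tuning experiment with asynchronous successive halving (ASHA). The training script train_model.py, the hyperparameter names, and the reported metric are hypothetical placeholders, and API details may differ across Syne Tune versions.

    # Minimal Syne Tune sketch (hypothetical script and metric names).
    from syne_tune import StoppingCriterion, Tuner
    from syne_tune.backend import LocalBackend
    from syne_tune.config_space import loguniform, randint
    from syne_tune.optimizer.baselines import ASHA

    # Search space; "epochs" is the maximum resource per trial.
    config_space = {
        "learning_rate": loguniform(1e-5, 1e-1),
        "num_layers": randint(1, 8),
        "epochs": 20,
    }

    tuner = Tuner(
        # train_model.py is assumed to report "validation_error" once
        # per epoch via syne_tune.Reporter.
        trial_backend=LocalBackend(entry_point="train_model.py"),
        scheduler=ASHA(
            config_space,
            metric="validation_error",
            mode="min",
            resource_attr="epoch",
            max_resource_attr="epochs",
        ),
        stop_criterion=StoppingCriterion(max_wallclock_time=1800),
        n_workers=4,  # number of trials running in parallel
    )
    tuner.run()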
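Similarly, a small sketch of the differentiable Cholesky operator in MXNet's classic NDArray API; the matrix values are made up for illustration.

    import mxnet as mx

    # A small symmetric positive definite matrix.
    a = mx.nd.array([[4.0, 1.0],
                     [1.0, 3.0]])

    # Lower-triangular Cholesky factor L with A = L L^T.
    l = mx.nd.linalg.potrf(a)

    # Because the operator is differentiable, it can sit inside a larger
    # computation graph, e.g. a Gaussian log-likelihood, which needs
    # log det A = 2 * sum(log diag(L)).
    a.attach_grad()
    with mx.autograd.record():
        l = mx.nd.linalg.potrf(a)
        logdet = 2.0 * mx.nd.sum(mx.nd.log(mx.nd.diag(l)))
    logdet.backward()
    print(a.grad.asnumpy())  # d(log det A) / dA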