Optimization and Automated Machine Learning
Can we use Machine Learning techniques to improve Machine Learning processes themselves? Automated Machine Learning (AutoML) is about removing (some of) the human element from choosing ML parameters and methods. This gives rise to a difficult optimization problem where a single performance evaluation can take a long time, so fast convergence is desirable. Our group is therefore dealing with the following questions:
- How can we perform optimization as efficiently as possible when single function evaluations are expensive? We tackle this “expensive black-box optimization problem” with “Model-Based Optimization” (sometimes called “Bayesian Optimization”), which itself relies on machine learning methods.
- How can we use optimization methods to automatically improve Machine Learning methods for a given problem? Here we may have to choose not only the hyperparameters of a given Machine Learning algorithm, but also the algorithm itself, in combination with possible preprocessing methods and/or methods of ensembling multiple algorithms.
- Does the AutoML method actually lead to better outcomes? A problem that arises whenever optimization is performed (by machines or, implicitly, by humans!) is that a method may perform well on the training data but generalize poorly when used in the wild. This occurs not only when an algorithm is optimized on a training dataset and then performs worse on new, unseen data, but also when researchers develop a method that works well on common benchmark datasets yet fails to perform well in real-world applications. We therefore investigate ways to evaluate and compare AutoML methods on robust and meaningful benchmarks that tell us whether these methods are truly useful.
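The model-based optimization loop behind the first question can be sketched in a few lines: fit a surrogate model on the points evaluated so far, maximize an acquisition function such as expected improvement, evaluate the proposed point with the expensive objective, and repeat. The sketch below is illustrative only, not the mlrMBO/mlr3mbo implementation; all function names, the grid-based candidate search, and the fixed kernel length-scale are our simplifying assumptions.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.15):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-5):
    """GP posterior mean and std at candidate points Xs, given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, y_best):
    """EI for minimization: expected amount by which we improve on y_best."""
    z = (y_best - mu) / sigma
    Phi = np.array([0.5 * (1.0 + erf(t / sqrt(2.0))) for t in z])
    phi = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return sigma * (z * Phi + phi)

def bayes_opt(f, n_init=4, n_iter=10, seed=0):
    """Minimize an expensive black-box f on [0, 1] with a GP surrogate."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)          # candidate points
    X = rng.uniform(0.0, 1.0, n_init)          # initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        mu, sigma = gp_posterior(X, y, grid)
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X, y = np.append(X, x_next), np.append(y, f(x_next))  # expensive call
    i = int(np.argmin(y))
    return X[i], y[i]
```

With only 14 objective evaluations (4 initial, 10 sequential), `bayes_opt(lambda x: (x - 0.65) ** 2)` typically locates the minimum of a smooth one-dimensional function to within a few grid cells; this evaluation thrift is exactly what makes the approach attractive when each evaluation is a full model training run.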
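The second question, jointly selecting an algorithm and its hyperparameters, can be pictured as a hierarchical search space: a top-level choice of learner whose value determines which hyperparameters even exist. A toy sketch with a random-search baseline follows; the learner names, parameter ranges, and the `evaluate` callback are all made up for illustration, and real systems search far richer pipeline spaces with far smarter optimizers.

```python
import math
import random

# Hypothetical hierarchical search space: the chosen learner determines which
# hyperparameters are active ("int" = integer-uniform, "log" = log-uniform).
SPACE = {
    "knn":  {"k": ("int", 1, 25)},
    "tree": {"max_depth": ("int", 1, 30), "min_split": ("int", 2, 20)},
    "svm":  {"C": ("log", 1e-3, 1e3), "gamma": ("log", 1e-4, 1e1)},
}

def sample_config(rng):
    """Draw a learner, then only the hyperparameters that learner has."""
    algo = rng.choice(sorted(SPACE))
    params = {}
    for name, (kind, lo, hi) in SPACE[algo].items():
        if kind == "int":
            params[name] = rng.randint(lo, hi)
        else:  # log-uniform draw for scale-type parameters
            params[name] = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return algo, params

def random_search(evaluate, n=50, seed=1):
    """Baseline optimizer: keep the best of n random configurations."""
    rng = random.Random(seed)
    best_score, best_cfg = -math.inf, None
    for _ in range(n):
        algo, params = sample_config(rng)
        score = evaluate(algo, params)  # e.g. cross-validated accuracy
        if score > best_score:
            best_score, best_cfg = score, (algo, params)
    return best_score, best_cfg
```

In practice `evaluate` would train the chosen pipeline and return resampled performance, and the random sampler would be replaced by the model-based optimizer above, operating on this mixed, hierarchical space.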
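The pitfall behind the third question can be demonstrated directly: selecting the "best" of many candidates on the same data we report performance on yields an optimistically biased score, even when every candidate is pure noise. This synthetic sketch is ours, not taken from any publication below.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sel, n_test, n_candidates = 100, 100, 500

# True labels and candidate "models" are pure coin flips: no candidate has
# any real skill, so honest accuracy on either split is about 50%.
y = rng.integers(0, 2, n_sel + n_test)
preds = rng.integers(0, 2, (n_candidates, n_sel + n_test))

# Pick the candidate that looks best on the selection split ...
acc_sel = (preds[:, :n_sel] == y[:n_sel]).mean(axis=1)
best = int(np.argmax(acc_sel))

# ... its selection-split accuracy is well above chance, purely by luck,
# while its accuracy on the untouched split falls back to ~50%.
acc_best_sel = acc_sel[best]
acc_best_test = (preds[best, n_sel:] == y[n_sel:]).mean()
```

The gap between `acc_best_sel` and `acc_best_test` is the optimization bias; honest evaluation therefore requires data (or entire benchmark tasks) that the selection process never saw.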
Members
Name | Position
--- | ---
Dr. Janek Thomas | PostDoc
Dr. Michel Lang | PostDoc
Florian Karl | PhD Student
Florian Pfisterer | PhD Student
Julia Moosbauer | PhD Student
Katharina Rath | PhD Student
Lennart Schneider | PhD Student
Martin Binder | PhD Student
Philipp Müller | PhD Student
Stefan Coors | PhD Student
Tobias Pielok | PhD Student
Projects and Software
- AutoML Benchmark: Reproducible Benchmarks for AutoML Systems.
- autoxgboost: Automatic Tuning and Fitting of XGBoost.
- autoxgboostMC: Multi-Objective Automatic Tuning and Fitting of XGBoost.
- miesmuschel: Mixed Integer Evolutionary Strategies.
- mlr3automl: Automated Machine Learning with `mlr3`.
- mlr3hyperband: Multi-Armed Bandit Approach to Hyperparameter Tuning for `mlr3`.
- mlr3mbo: Model-based optimization with `mlr3`.
- mlrMBO: Model-based optimization with `mlr`.
- mosmafs: Multi-Objective Simultaneous Model and Feature Selection.
- YAHPO Gym: Benchmarking suite for HPO.
Publications
- Schalk D, Bischl B, Rügamer D (2022) Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models. arXiv preprint arXiv:2210.07723. link | pdf.
- Rügamer D, Bender A, Wiegrebe S, Racek D, Bischl B, Müller C, Stachl C (2022) Factorized Structured Regression for Large-Scale Varying Coefficient Models. Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), Springer International Publishing. link | pdf.
- Rügamer D (2022) Additive Higher-Order Factorization Machines. arXiv preprint arXiv:2205.14515. link | pdf.
- Schneider L, Schäpermeier L, Prager RP, Bischl B, Trautmann H, Kerschke P (2022) HPO X ELA: Investigating Hyperparameter Optimization Landscapes by Means of Exploratory Landscape Analysis. In: Rudolph G, Kononova AV, Aguirre H, Kerschke P, Ochoa G, Tušar T (eds) Parallel Problem Solving from Nature – PPSN XVII, pp. 575–589. Springer International Publishing, Cham. link | pdf.
- Gijsbers P, Bueno MLP, Coors S, LeDell E, Poirier S, Thomas J, Bischl B, Vanschoren J (2022) AMLB: an AutoML Benchmark. arXiv preprint arXiv:2207.12560. link | pdf.
- Karl F, Pielok T, Moosbauer J, Pfisterer F, Coors S, Binder M, Schneider L, Thomas J, Richter J, Lang M, others (2022) Multi-Objective Hyperparameter Optimization – An Overview. arXiv preprint arXiv:2206.07438. link | pdf.
- Schneider L, Pfisterer F, Thomas J, Bischl B (2022) A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 2136–2142. Association for Computing Machinery, New York, NY, USA. link | pdf.
- Pargent F, Pfisterer F, Thomas J, Bischl B (2022) Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics, 1–22. link | pdf.
- Schneider L, Pfisterer F, Kent P, Branke J, Bischl B, Thomas J (2022) Tackling Neural Architecture Search With Quality Diversity Optimization. International Conference on Automated Machine Learning, pp. 9–1. PMLR. link | pdf.
- Moosbauer J, Binder M, Schneider L, Pfisterer F, Becker M, Lang M, Kotthoff L, Bischl B (2022) Automated Benchmark-Driven Design and Explanation of Hyperparameter Optimizers. IEEE Transactions on Evolutionary Computation 26, 1336–1350. link | pdf.
- Pfisterer F, Schneider L, Moosbauer J, Binder M, Bischl B (2022) YAHPO Gym – An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. International Conference on Automated Machine Learning, pp. 3–1. PMLR. link | pdf.
- *Coors S, *Schalk D, Bischl B, Rügamer D (2021) Automatic Componentwise Boosting: An Interpretable AutoML System. ECML-PKDD Workshop on Automating Data Science. link | pdf.
- Bischl B, Binder M, Lang M, Pielok T, Richter J, Coors S, Thomas J, Ullmann T, Becker M, Boulesteix A-L, Deng D, Lindauer M (2021) Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges. arXiv preprint arXiv:2107.05847. link | pdf.
- Gijsbers P, Pfisterer F, Rijn JN van, Bischl B, Vanschoren J (2021) Meta-Learning for Symbolic Hyperparameter Defaults. 2021 Genetic and Evolutionary Computation Conference Companion (GECCO ’21 Companion). link.
- Gerostathopoulos I, Plášil F, Prehofer C, Thomas J, Bischl B (2021) Automated Online Experiment-Driven Adaptation – Mechanics and Cost Aspects. IEEE Access 9, 58079–58087. link | pdf.
- Kaminwar SR, Goschenhofer J, Thomas J, Thon I, Bischl B (2021) Structured Verification of Machine Learning Models in Industrial Settings. Big Data. link.
- Binder M, Pfisterer F, Lang M, Schneider L, Kotthoff L, Bischl B (2021) mlr3pipelines – Flexible Machine Learning Pipelines in R. Journal of Machine Learning Research 22, 1–7. link | pdf.
- Schneider L, Pfisterer F, Binder M, Bischl B (2021) Mutation is All You Need. 8th ICML Workshop on Automated Machine Learning. pdf.
- Binder M, Pfisterer F, Bischl B (2020) Collecting Empirical Data About Hyperparameters for Data Driven AutoML. AutoML Workshop at ICML 2020. pdf.
- Binder M, Moosbauer J, Thomas J, Bischl B (2020) Multi-Objective Hyperparameter Tuning and Feature Selection Using Filter Ensembles. Proceedings of the 2020 Genetic and Evolutionary Computation Conference, pp. 471–479. Association for Computing Machinery, New York, NY, USA. link | pdf.
- Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020) Benchmark for filter methods for feature selection in high-dimensional classification data. Computational Statistics & Data Analysis 143, 106839. link | pdf.
- Sun X, Bommert A, Pfisterer F, Rahnenführer J, Lang M, Bischl B (2020) High Dimensional Restrictive Federated Model Selection with Multi-objective Bayesian Optimization over Shifted Distributions. In: Bi Y, Bhatia R, Kapoor S (eds) Intelligent Systems and Applications, pp. 629–647. Springer International Publishing, Cham. link | pdf.
- Ellenbach N, Boulesteix A-L, Bischl B, Unger K, Hornung R (2020) Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning. Journal of Classification, 1–20. link | pdf.
- Pfisterer F, Thomas J, Bischl B (2019) Towards Human Centered AutoML. arXiv preprint arXiv:1911.02391. link | pdf.
- Pfisterer F, Beggel L, Sun X, Scheipl F, Bischl B (2019) Benchmarking time series classification – Functional data vs machine learning approaches. arXiv preprint arXiv:1911.07511. link | pdf.
- Pfisterer F, Coors S, Thomas J, Bischl B (2019) Multi-Objective Automatic Machine Learning with AutoxgboostMC. arXiv preprint arXiv:1908.10796. link | pdf.
- Sun X, Lin J, Bischl B (2019) ReinBo: Machine Learning Pipeline Search and Configuration with Bayesian Optimization Embedded Reinforcement Learning. CoRR abs/1904.05381. link | pdf.
- Probst P, Boulesteix A-L, Bischl B (2019) Tunability: Importance of Hyperparameters of Machine Learning Algorithms. Journal of Machine Learning Research 20, 1–32. link | pdf.
- Gijsbers P, LeDell E, Thomas J, Poirier S, Bischl B, Vanschoren J (2019) An Open Source AutoML Benchmark. CoRR abs/1907.00909. link | pdf.
- Schüller N, Boulesteix A-L, Bischl B, Unger K, Hornung R (2019) Improved outcome prediction across data sources through robust parameter tuning. 221. link | pdf.
- Rijn JN van, Pfisterer F, Thomas J, Bischl B, Vanschoren J (2018) Meta Learning for Defaults – Symbolic Defaults. NeurIPS 2018 Workshop on Meta Learning. link | pdf.
- Kühn D, Probst P, Thomas J, Bischl B (2018) Automatic Exploration of Machine Learning Experiments on OpenML. arXiv preprint arXiv:1806.10961. link | pdf.
- Thomas J, Coors S, Bischl B (2018) Automatic Gradient Boosting. ICML AutoML Workshop. link | pdf.
- Bischl B, Richter J, Bossek J, Horn D, Thomas J, Lang M (2017) mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions. arXiv preprint arXiv:1703.03373. link | pdf.
- Kotthaus H, Richter J, Lang A, Thomas J, Bischl B, Marwedel P, Rahnenführer J, Lang M (2017) RAMBO: Resource-Aware Model-Based Optimization with Scheduling for Heterogeneous Runtimes and a Comparison with Asynchronous Model-Based Optimization. International Conference on Learning and Intelligent Optimization, pp. 180–195. Springer. link | pdf.