Empirical Machine Learning

Machine learning has a rich history as a multidisciplinary field, and its diverse perspectives have led to different approaches to research. On the one hand, it has been approached from a theoretical standpoint that emphasizes the mathematical properties and technical details of learning algorithms: how they work internally, which mathematical principles they rest on, and where their theoretical limitations lie. On the other hand, machine learning has also been embraced as an empirical science, in which researchers value practical relevance and the ability to address real-world challenges. While theoretical results are highly important, the community also recognizes the importance of demonstrating algorithms’ effectiveness in practical applications.

To achieve this, researchers often formulate hypotheses about the behavior and performance of algorithms. They then put these algorithms to the test by benchmarking them on a variety of problem instances, collecting empirical data in the process. Subsequently, the collected data is analyzed, with the ultimate goal of drawing meaningful and insightful conclusions regarding the hypotheses at hand.
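The workflow described above (hypothesis, benchmark, analysis) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the group's method: the per-instance scores are made-up numbers, the function name paired_permutation_test is hypothetical, and a paired sign-flip permutation test stands in for whatever analysis a real study would choose.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided paired sign-flip permutation test on the mean score difference.

    Null hypothesis: on each problem instance the two algorithms are
    exchangeable, so the sign of each paired difference is arbitrary.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both algorithms must be scored on the same instances")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_permutations):
        # Randomly flip the sign of each paired difference.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical accuracies of two algorithms on eight benchmark instances.
scores_a = [0.81, 0.79, 0.90, 0.85, 0.77, 0.88, 0.83, 0.80]
scores_b = [0.78, 0.75, 0.88, 0.80, 0.76, 0.84, 0.80, 0.77]

p_value = paired_permutation_test(scores_a, scores_b)
print(f"estimated p-value: {p_value:.4f}")
```

The test is paired because both algorithms are evaluated on the same instances. A small p-value would count as evidence against the hypothesis that the two algorithms perform equally on this set of instances, and says nothing about instances outside the benchmark.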

This focus group, Empirical Machine Learning, specializes in exploring these empirical aspects, including hypothesis formulation, problem benchmarking, data analysis, and statistical inference methods. It was jointly established with the working group Biometry in Molecular Medicine at the Institute for Medical Information Processing, Biometry, and Epidemiology.

Members

Name                              Position
Prof. Dr. Bernd Bischl            Professor
Prof. Dr. Anne-Laure Boulesteix   Professor
Prof. Dr. Matthias Feurer         Professor
Dr. Giuseppe Casalicchio          PostDoc
Dr. Moritz Herrmann               PostDoc
Marc Becker                       Research Software Engineer
Martin Binder                     PhD Student
Sebastian Fischer                 PhD Student
Julian Lange                      PhD Student
Maximilian Mandl                  PhD Student
Christina Sauer                   PhD Student
Lennart Schneider                 PhD Student

Publications

Herrmann M, Lange FJD, Eggensperger K, Casalicchio G, Wever M, Feurer M, Rügamer D, Hüllermeier E, Boulesteix A-L, Bischl B (2024) Position: Why We Must Rethink Empirical Research in Machine Learning. Proceedings of the 41st International Conference on Machine Learning, pp. 18228–18247. PMLR.
Kohli R, Feurer M, Bischl B, Eggensperger K, Hutter F (2024) Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning. Data-centric Machine Learning (DMLR) workshop at the International Conference on Learning Representations (ICLR).
Nagler T, Schneider L, Bischl B, Feurer M (2024) Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization. Advances in Neural Information Processing Systems, pp. 40486–40533.
Gijsbers P, Bueno MLP, Coors S, LeDell E, Poirier S, Thomas J, Bischl B, Vanschoren J (2024) AMLB: an AutoML Benchmark. Journal of Machine Learning Research 25, 1–65.
Ott F, Raichur NL, Rügamer D, Feigl T, Neumann H, Bischl B, Mutschler C (2023) Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression. arXiv:2208.00919.
Bischl B, Binder M, Lang M, Pielok T, Richter J, Coors S, Thomas J, Ullmann T, Becker M, Boulesteix A-L, Deng D, Lindauer M (2023) Hyperparameter Optimization: Foundations, Algorithms, Best Practices, and Open Challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1484.
Feurer M, Eggensperger K, Bergman E, Pfisterer F, Bischl B, Hutter F (2023) Mind the Gap: Measuring Generalization Performance Across Multiple Objectives. In: Crémilleux B, Hess S, Nijssen S (eds) Advances in Intelligent Data Analysis XXI. IDA 2023, pp. 130–142. Springer, Cham.
Schneider L, Bischl B, Thomas J (2023) Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models. Proceedings of the Genetic and Evolutionary Computation Conference, pp. 538–547.
Fischer S, Harutyunyan L, Feurer M, Bischl B (2023) OpenML-CTR23 – A curated tabular regression benchmarking suite. AutoML Conference 2023 (Workshop).
Karl F, Pielok T, Moosbauer J, Pfisterer F, Coors S, Binder M, Schneider L, Thomas J, Richter J, Lang M, Garrido-Merchán EC, Branke J, Bischl B (2023) Multi-Objective Hyperparameter Optimization in Machine Learning – An Overview. ACM Transactions on Evolutionary Learning and Optimization 3, 1–50.
Ott F, Rügamer D, Heublein L, Hamann T, Barth J, Bischl B, Mutschler C (2022) Benchmarking Online Sequence-to-Sequence and Character-based Handwriting Recognition from IMU-Enhanced Pens. International Journal on Document Analysis and Recognition (IJDAR).
Klaß A, Lorenz S, Lauer-Schmaltz M, Rügamer D, Bischl B, Mutschler C, Ott F (2022) Uncertainty-aware Evaluation of Time-Series Classification for Online Handwriting Recognition with Domain Shift. IJCAI-ECAI 2022, 1st International Workshop on Spatio-Temporal Reasoning and Learning.
Moosbauer J, Binder M, Schneider L, Pfisterer F, Becker M, Lang M, Kotthoff L, Bischl B (2022) Automated Benchmark-Driven Design and Explanation of Hyperparameter Optimizers. IEEE Transactions on Evolutionary Computation 26, 1336–1350.
Pargent F, Pfisterer F, Thomas J, Bischl B (2022) Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics, 1–22.
Pfisterer F, Schneider L, Moosbauer J, Binder M, Bischl B (2022) Yahpo Gym – An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. International Conference on Automated Machine Learning, pp. 3–1. PMLR.
Schneider L, Pfisterer F, Kent P, Branke J, Bischl B, Thomas J (2022) Tackling Neural Architecture Search With Quality Diversity Optimization. International Conference on Automated Machine Learning, pp. 9–1. PMLR.
Nießl C, Herrmann M, Wiedemann C, Casalicchio G, Boulesteix A-L (2022) Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12, e1441.
Pfisterer F, Rijn JN van, Probst P, Müller A, Bischl B (2021) Learning Multiple Defaults for Machine Learning Algorithms. 2021 Genetic and Evolutionary Computation Conference Companion (GECCO ’21 Companion).
Sonabend R, Király FJ, Bender A, Bischl B, Lang M (2021) mlr3proba: An R Package for Machine Learning in Survival Analysis. Bioinformatics.
Binder M, Pfisterer F, Lang M, Schneider L, Kotthoff L, Bischl B (2021) mlr3pipelines - Flexible Machine Learning Pipelines in R. Journal of Machine Learning Research 22, 1–7.
Schneider L, Pfisterer F, Binder M, Bischl B (2021) Mutation is All You Need. 8th ICML Workshop on Automated Machine Learning.
Bischl B, Casalicchio G, Feurer M, Gijsbers P, Hutter F, Lang M, Mantovani RG, Rijn JN van, Vanschoren J (2021) OpenML Benchmarking Suites. In: Vanschoren J, Yeung S (eds) Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks.
Moosbauer J, Herbinger J, Casalicchio G, Lindauer M, Bischl B (2021) Explaining Hyperparameter Optimization via Partial Dependence Plots. Advances in Neural Information Processing Systems (NeurIPS 2021) 34.
Moosbauer J, Herbinger J, Casalicchio G, Lindauer M, Bischl B (2021) Towards Explaining Hyperparameter Optimization via Partial Dependence Plots. 8th ICML Workshop on Automated Machine Learning (AutoML).
Binder M, Moosbauer J, Thomas J, Bischl B (2020) Multi-Objective Hyperparameter Tuning and Feature Selection Using Filter Ensembles. Proceedings of the 2020 Genetic and Evolutionary Computation Conference, pp. 471–479. Association for Computing Machinery, New York, NY, USA.
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020) Benchmark for filter methods for feature selection in high-dimensional classification data. Computational Statistics & Data Analysis 143, 106839.
Pfisterer F, Beggel L, Sun X, Scheipl F, Bischl B (2019) Benchmarking time series classification – Functional data vs machine learning approaches. arXiv preprint arXiv:1911.07511.
Probst P, Boulesteix A-L, Bischl B (2019) Tunability: Importance of Hyperparameters of Machine Learning Algorithms. Journal of Machine Learning Research 20, 1–32.
Schüller N, Boulesteix A-L, Bischl B, Unger K, Hornung R (2019) Improved outcome prediction across data sources through robust parameter tuning. 221.
Lang M, Binder M, Richter J, Schratz P, Pfisterer F, Coors S, Au Q, Casalicchio G, Kotthoff L, Bischl B (2019) mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software 4, 1903.
Rijn JN van, Pfisterer F, Thomas J, Bischl B, Vanschoren J (2018) Meta Learning for Defaults–Symbolic Defaults. NeurIPS 2018 Workshop on Meta Learning.
Horn D, Demircioğlu A, Bischl B, Glasmachers T, Weihs C (2018) A Comparative Study on Large Scale Kernelized Support Vector Machines. Advances in Data Analysis and Classification, 1–17.
Kühn D, Probst P, Thomas J, Bischl B (2018) Automatic Exploration of Machine Learning Experiments on OpenML. arXiv preprint arXiv:1806.10961.
Bischl B, Richter J, Bossek J, Horn D, Thomas J, Lang M (2017) mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions. arXiv preprint arXiv:1703.03373.
Lang M, Bischl B, Surmann D (2017) batchtools: Tools for R to work on batch systems. The Journal of Open Source Software 2.
Probst P, Au Q, Casalicchio G, Stachl C, Bischl B (2017) Multilabel Classification with R Package mlr. The R Journal 9, 352–369.
Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017) OpenML: An R package to connect to the machine learning platform OpenML. Computational Statistics, 977–991.
Horn D, Bischl B (2016) Multi-objective Parameter Configuration of Machine Learning Algorithms using Model-Based Optimization. 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8. IEEE.
Bischl B, Lang M, Kotthoff L, Schiffner J, Richter J, Studerus E, Casalicchio G, Jones ZM (2016) mlr: Machine Learning in R. The Journal of Machine Learning Research 17, 1–5.
Bischl B, Kerschke P, Kotthoff L, Lindauer M, Malitsky Y, Fréchette A, Hoos H, Hutter F, Leyton-Brown K, Tierney K, Vanschoren J (2016) ASlib: A Benchmark Library for Algorithm Selection. Artificial Intelligence 237, 41–58.
Casalicchio G, Bischl B, Boulesteix A-L, Schmid M (2015) The residual-based predictiveness curve: A visual tool to assess the performance of prediction models. Biometrics 72, 392–401.
Mantovani RG, Rossi ALD, Vanschoren J, Bischl B, Carvalho ACPLF (2015) To tune or not to tune: Recommending when to adjust SVM hyper-parameters via meta-learning. 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
Bossek J, Bischl B, Wagner T, Rudolph G (2015) Learning feature-parameter mappings for parameter tuning via the profile expected improvement. Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp. 1319–1326. Association for Computing Machinery.
Brockhoff D, Bischl B, Wagner T (2015) The Impact of Initial Designs on the Performance of MATSuMoTo on the Noiseless BBOB-2015 Testbed: A Preliminary Study. Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp. 1159–1166. Association for Computing Machinery, Madrid, Spain.
Lang M, Kotthaus H, Marwedel P, Weihs C, Rahnenführer J, Bischl B (2015) Automatic model selection for high-dimensional survival analysis. Journal of Statistical Computation and Simulation 85, 62–76.
Bischl B, Lang M, Mersmann O, Rahnenführer J, Weihs C (2015) BatchJobs and BatchExperiments: Abstraction Mechanisms for Using R in Batch Environments. Journal of Statistical Software 64, 1–25.
Mersmann O, Preuss M, Trautmann H, Bischl B, Weihs C (2015) Analyzing the BBOB Results by Means of Benchmarking Concepts. Evolutionary Computation Journal 23, 161–185.
Vanschoren J, Rijn JN van, Bischl B, Casalicchio G, Feurer M (2015) OpenML: A Networked Science Platform for Machine Learning. 2015 ICML Workshop on Machine Learning Open Source Software (MLOSS 2015), pp. 1–3.
Bischl B, Schiffner J, Weihs C (2014) Benchmarking Classification Algorithms on High-Performance Computing Clusters. In: Spiliopoulou M, Schmidt-Thieme L, Janning R (eds) Data Analysis, Machine Learning and Knowledge Discovery, pp. 23–31. Springer.
Vanschoren J, Rijn JN van, Bischl B, Torgo L (2014) OpenML: Networked Science in Machine Learning. SIGKDD Explorations Newsletter 15, 49–60.
Bischl B, Schiffner J, Weihs C (2013) Benchmarking local classification methods. Computational Statistics 28, 2599–2619.
Schiffner J, Bischl B, Weihs C (2012) Bias-variance analysis of local classification methods. In: Gaul W, Geyer-Schulz A, Schmidt-Thieme L, Kunze J (eds) Challenges at the Interface of Data Analysis, Computer Science, and Optimization, pp. 49–57. Springer, Berlin Heidelberg.
Bischl B, Mersmann O, Trautmann H, Weihs C (2012) Resampling Methods for Meta-Model Validation with Recommendations for Evolutionary Computation. Evolutionary Computation 20, 249–275.
Stüber AT, Coors S, Schachtner B, Weber T, Rügamer D, Bender A, Mittermeier A, Öcal O, Seidensticker M, Ricke J, et al. (2023) A Comprehensive Machine Learning Benchmark Study for Radiomics-Based Survival Analysis of CT Imaging Data in Patients With Hepatic Metastases of CRC. Investigative Radiology, 10–1097.