The chair typically offers various thesis topics each semester in the areas of computational statistics, machine learning, data mining, optimization, and statistical software. You are welcome to suggest your own topic as well.
Before you apply for a thesis topic, make sure that you fit the following profile:
- Knowledge in machine learning.
- Good R or Python skills.
Before you start writing your thesis you must look for a supervisor within the group.
Send an email to the contact person listed in the potential thesis topic files with the following information:
- Planned starting date of your thesis.
- Thesis topic (from the list of thesis topics or your own suggestion).
- Previously attended classes on machine learning and programming with R.
Your application will only be processed if it contains all required information.
Potential Thesis Topics
Below is a list of potential thesis topics.
Positive-unlabeled Learning on Text Data
Positive-unlabeled learning (PU learning) describes binary classification tasks in which the data has annotated positive labels but no negative labels. This is a special case of semi-supervised learning and is related to one-class classification. It is a common problem, especially in tasks with many observations, where it is impossible to annotate all samples and the focus lies on the positive labels. Within this thesis, PU learning should be applied in a text classification setting on a data set provided by the Fraunhofer IMW, one of the two Fraunhofer Institutes involved in this project.
The master student is expected to become familiar with the different PU learning approaches and to apply the most promising ones to an internal data set provided by the Fraunhofer IMW.
Take a look at this exposé for more information on this topic.
Learning Set and Irregular Data using Deep Meta-learning
Meta-learning, also known as “learning to learn”, aims to design models that can learn new skills or adapt to new environments rapidly from only a few training examples. There are three common approaches: 1) metric-based: learn an efficient distance metric; 2) model-based: use a (recurrent) network with external or internal memory; 3) optimization-based: optimize the model parameters explicitly for fast learning.
A typical machine or deep learning algorithm, such as regression or classification, is designed for fixed-dimensional data instances. Extending these algorithms to inputs or outputs that are permutation-invariant sets rather than fixed-dimensional vectors is not trivial. However, most available data are unstructured and set-valued, for example point clouds, video or audio clips, and time series. Learning representations of unstructured data is very valuable in many domains, such as health care, and has a high impact on machine learning and deep learning research.
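To make the idea of a permutation-invariant set input concrete, here is a minimal sketch in plain Python. It assumes a sum-pooling design in which every element is mapped through the same feature function and the results are added up. The names `encode_element` and `encode_set` and the hand-picked feature map are illustrative assumptions; in a real model the feature map would be a learned network.

```python
import math

def encode_element(x):
    """Hypothetical per-element feature map (fixed here, learned in practice)."""
    return [x, x * x, math.tanh(x)]

def encode_set(elements):
    """Sum-pool the per-element features so the result ignores element order."""
    pooled = [0.0, 0.0, 0.0]
    for x in elements:
        for i, value in enumerate(encode_element(x)):
            pooled[i] += value
    return pooled

a = encode_set([0.5, -1.0, 2.0])
b = encode_set([2.0, 0.5, -1.0])   # same set, different order

# The two encodings agree up to floating-point rounding.
print(all(math.isclose(p, q) for p, q in zip(a, b)))
```

Because the pooled vector always has the same length, sets of different sizes can be passed to any downstream model that expects fixed-dimensional input.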
Here, we offer different research topics as master theses. For more information, please visit the Google Doc.
Deep Reinforcement Learning
Deep reinforcement learning combines deep neural networks with a reinforcement learning structure that enables agents to learn the best actions possible in a virtual environment in order to attain their goals. That is, it unites function approximation and target optimization, mapping state-action pairs to expected rewards.
Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective (goal) or how to maximize a particular dimension over many steps; for example, they can maximize the points won in a game over many moves. DRL algorithms can start from a blank slate, and under the right conditions, they achieve superhuman performance.
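The mapping from state-action pairs to expected rewards can be illustrated with a small sketch. It deliberately swaps the deep network for a plain lookup table (tabular Q-learning), so it shows the reinforcement learning part only, not deep RL. The five-state chain environment, the function names, and all hyperparameter values are assumptions made for illustration.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right

def step(state, action):
    """Toy deterministic environment: reward 1 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Learn a table q[state][action] of expected discounted rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Q-learning is off-policy, so a purely random behaviour policy suffices here.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q_table = q_learning()
# Greedy action per non-terminal state according to the learned table.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In deep reinforcement learning, the table `q` is replaced by a neural network that approximates the same quantity for state spaces that are too large to enumerate.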
Here, we offer different research topics as master theses, including:
1) Investigating DRL in combination with other ML approaches such as meta-learning, lifelong learning, active learning, generative models, and Bayesian deep learning.
2) Systematically studying and benchmarking the value function, the state and reward functions, actions, and different hyperparameters.
3) Studying and benchmarking policy evaluation techniques.
4) Investigating the application of DRL in computer vision or NLP.
For details, please refer to the Google Doc.
A Neural Network-based Approach for Feature-weighted Elastic Net
In this master thesis you will implement the recently proposed fwelnet algorithm within a neural network. fwelnet can be defined in a neural network either with conventional layers or with cvxlayers. In simulation studies you will compare the two implementations against each other and against an existing R package. You will further investigate the model performance when using more complex additive predictors or deep neural networks to feed information into the model parameters.
For details, please see this Markdown file.
Deep Distributional Regression
Deep Distributional Regression (DDR) extends classical statistical regression models such as generalized linear or additive models as well as generalized additive models for location, scale and shape to regression models with potentially many deep neural networks. In this bachelor thesis you will investigate the performance of one DDR framework in a large simulation and benchmark study and thereby help to generate new ideas and suggestions for improvement in the field of DDR.
For details refer to the Google Doc.
Semi-Structured Deep Distributional Regression Penalties
The Semi-Structured Deep Distributional Regression (SDDR) framework extends classical statistical regression models such as generalized linear or additive models as well as generalized additive models for location, scale and shape to regression models with potentially many deep neural networks in the linear predictor of each parameter. SDDR allows for different penalties, including L1, L2, or other custom penalties such as smoothing penalties. In deep learning, further penalties such as L0 penalties are available for neural networks. In this thesis you will extend the SDDR framework by implementing new penalties and investigate the performance of your implementation in simulations and benchmarks.
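As a reminder of what the penalties named above compute, the following sketch evaluates them on a plain weight vector. The function name `penalty` and its arguments are hypothetical and are not part of the SDDR framework.

```python
def penalty(weights, kind, strength=1.0):
    """Return the penalty term that would be added to a training loss."""
    if kind == "l0":
        value = sum(1 for w in weights if w != 0)   # number of non-zero weights
    elif kind == "l1":
        value = sum(abs(w) for w in weights)        # sum of absolute values
    elif kind == "l2":
        value = sum(w * w for w in weights)         # sum of squares
    else:
        raise ValueError(f"unknown penalty kind: {kind}")
    return strength * value

weights = [0.0, -2.0, 0.5]
print({kind: penalty(weights, kind) for kind in ("l0", "l1", "l2")})
```

The exact L0 count shown here is not differentiable, so gradient-based training would need a relaxation of it.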
For details refer to the Google Doc.
Analysis of Overfitting Effects During Model Selection
In this master thesis you will analyze whether model selection and hyperparameter tuning can themselves suffer from overfitting. For details refer to the Google Doc.
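One way to see why this question arises is a small simulation, sketched here under strong simplifying assumptions: every candidate "model" is a pure coin flip with a true accuracy of 0.5, and the candidate with the best validation accuracy is selected. The function names and all sizes are illustrative.

```python
import random

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def select_best_random_model(n_candidates=200, n_val=50, n_test=5000, seed=1):
    """Pick the coin-flip 'model' with the best validation accuracy."""
    rng = random.Random(seed)
    val_labels = [rng.randint(0, 1) for _ in range(n_val)]
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]
    best_val, best_test = -1.0, None
    for _ in range(n_candidates):
        # Each candidate guesses at random, so its true accuracy is 0.5.
        val_acc = accuracy([rng.randint(0, 1) for _ in range(n_val)], val_labels)
        test_acc = accuracy([rng.randint(0, 1) for _ in range(n_test)], test_labels)
        if val_acc > best_val:
            best_val, best_test = val_acc, test_acc
    return best_val, best_test

best_val, best_test = select_best_random_model()
print(f"validation accuracy of selected model: {best_val:.2f}")
print(f"test accuracy of selected model:       {best_test:.2f}")
```

The validation score of the winner is optimistically biased because it is the maximum over many noisy estimates, while its accuracy on fresh data stays near chance.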
Hierarchical word embeddings using hyperbolic neural networks
In this master thesis you will investigate recent advancements in geometric deep learning with a specific focus on natural language processing. Geometric deep learning operates in spaces with non-zero curvature. Fields of application include computer vision (three-dimensional objects), genetics, neuroscience, and virtually any kind of graph data. Hyperbolic deep learning in particular is currently of interest to the machine learning community. These models are especially promising when there are hierarchical structures in the data. For details refer to the Google Doc.
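As a concrete instance of a space with non-zero curvature, the sketch below computes distances in the Poincaré ball, one common model of hyperbolic space. Choosing this particular model is an assumption made for illustration; the topic description does not prescribe it.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the open unit ball."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_norm_u) * (1 - sq_norm_v)))

# The same Euclidean gap of 0.1 is much longer near the boundary than near the origin.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
print(near_origin, near_boundary)
```

Distances grow rapidly towards the boundary of the ball, which leaves room for the many leaves of a tree-like hierarchy.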
R package for Counterfactual Explanation Methods
Machine learning models have demonstrated great success in many prediction tasks at the cost of being less interpretable. Counterfactual explanations, or counterfactuals for short, are a useful method for explaining single predictions of a model. In this master thesis you will familiarize yourself with the literature on counterfactual explanation methods and integrate some of the methods into the counterfactuals R package. Furthermore, you will evaluate the performance of the integrated methods and their runtimes.
For details refer to the Google Doc.
Analyzing the Permutation Feature Importance
The permutation feature importance (PFI) assesses the importance of features by computing the drop in out-of-bag performance after permuting a considered feature. It was initially introduced for random forests, although the main idea can also be performed in a model-agnostic fashion. However, many properties are not well-studied and several research questions are still open. The aim of this thesis is to study the model-agnostic PFI and conduct empirical studies to answer some of these unanswered research questions.
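The model-agnostic variant can be sketched in a few lines of plain Python. This version measures the increase in mean squared error on a given data set rather than on out-of-bag samples. The function names, the toy data, and the stand-in model are all assumptions made for illustration.

```python
import random

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, feature, n_repeats=10, seed=0):
    """Average increase in loss after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    increases = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this feature and the target
        X_perm = [row[:feature] + [value] + row[feature + 1:]
                  for row, value in zip(X, column)]
        permuted = mean_squared_error(y, [predict(row) for row in X_perm])
        increases.append(permuted - baseline)
    return sum(increases) / n_repeats

# Toy data: the target depends on feature 0 only; feature 1 is pure noise.
rng = random.Random(42)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
y = [3.0 * row[0] for row in X]
model = lambda row: 3.0 * row[0]   # stand-in for any fitted black-box model

importances = [permutation_importance(model, X, y, j) for j in range(2)]
print(importances)
```

Only the `predict` callable is needed, which is what makes the procedure independent of the model class.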
For details refer to the Google Doc.
Given the increasing usage of automated prediction systems in the context of high-stakes decisions, a growing body of research focuses on tools and methods for detecting and mitigating biases and unfairness in algorithmic decision-making. A particularly promising post-processing method, Multiaccuracy-Boost, has been proposed by Kim et al. (2019), adapting the multicalibration framework of Hébert-Johnson et al. (2018). The underlying fairness notion, multiaccuracy, promotes the idea of subgroup fairness and requires accurate predictions not only for marginal populations, but also for subpopulations that may be defined by complex intersections of many attributes. In contrast to other Fair ML approaches, Multiaccuracy-Boost does not harm the overall utility of a prediction model by imposing a fairness constraint in the model training process, but rather aims at improving prediction performance for a large set of subpopulations post training.
Multiaccuracy-Boost attempts to correct the predictions of a given (black box) machine learning model by employing a post-processing algorithm on a labelled validation set. In summary, the algorithm repeatedly updates the initial predictions by “nudging” predictions towards the true outcome for subgroups where high errors are observed. Such subgroups may be discovered by employing an auditor (e.g., a ridge regression or a decision tree) to search for functions that correlate with the current residuals. The updated predictions may then again be subject to further updates, akin to the boosting idea.
Given the ubiquitous use of ML models in crucial areas and growing concerns of unfair predictions for minority subpopulations, Multiaccuracy-Boost should be widely accessible in the form of a free and open-source software package. For the paper Multiaccuracy: Black-box post-processing for fairness in classification (Kim et al. 2019), a Python implementation has been developed that can serve as a valuable starting point.
For more information, refer to this document.
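The post-processing loop described above can be illustrated with a strongly simplified sketch. It does not reproduce the published algorithm. The auditor is replaced by a search over hand-listed subgroups (lists of row indices), and the update is a plain additive shift clipped to [0, 1]. The function name, the parameters `alpha`, `eta` and `max_iter`, and the toy data are all assumptions.

```python
def multiaccuracy_sketch(preds, labels, subgroups, alpha=0.01, eta=1.0, max_iter=50):
    """Repeatedly nudge predictions for the subgroup with the largest mean residual."""
    preds = list(preds)
    for _ in range(max_iter):
        worst_gap, worst_members = 0.0, None
        for members in subgroups:
            # Mean residual (label minus prediction) within this subgroup.
            gap = sum(labels[i] - preds[i] for i in members) / len(members)
            if abs(gap) > abs(worst_gap):
                worst_gap, worst_members = gap, members
        if worst_members is None or abs(worst_gap) <= alpha:
            break  # every listed subgroup is accurate on average up to alpha
        for i in worst_members:
            preds[i] = min(1.0, max(0.0, preds[i] + eta * worst_gap))
    return preds

# Toy validation set: the initial model predicts 0.5 for everyone.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
group_a, group_b = [0, 1, 2, 3], [4, 5, 6, 7]
everyone = group_a + group_b
updated = multiaccuracy_sketch([0.5] * 8, labels, [group_a, group_b, everyone])
print(updated)
```

In this toy example the initial predictions are accurate on average over the whole population but biased within each subgroup, which is the situation that multiaccuracy is meant to detect.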
The disputation of a thesis lasts about 60-90 minutes and consists of two parts. Only the first part is relevant for the grade; it takes 30 minutes for a bachelor thesis and 40 minutes for a master thesis. Here, the student is expected to summarize the main results of the thesis in a presentation. The supervisor(s) will ask questions regarding the content of the thesis in between. In the second part (after the presentation), the supervisors will give detailed feedback and discuss the thesis with the student. This will take about 30 minutes.
- How do I prepare for the disputation?
You have to prepare a presentation. If there is a longer gap between handing in your thesis and the disputation, you might also want to reread your thesis.
- How many slides should I prepare?
That’s up to you, but you have to respect the time limit. Preparing more than 20 slides for a bachelor’s presentation or more than 30 slides for a master’s presentation is very likely a bad idea.
- Where do I present?
Bernd’s office, in front of the big TV. At least one PhD will be present, maybe more. If you want to present in front of a larger audience in the seminar room or the old library, please book the room yourself and inform us.
- English or German?
We do not care, you can choose.
- What do I have to bring with me?
A document (Prüfungsprotokoll, the examination record) for the disputation, which you get from the examination office (“Prüfungsamt”, Frau Maxa or Frau Höfner). Your laptop or a USB stick with the presentation. You can also email Bernd a PDF.
- How does the grading work?
You will be graded on the quality of the thesis, the presentation, and the oral discussion of the work. The grade is mainly determined by the written thesis itself, but it can improve or drop depending on the presentation and your answers to the defense questions.
- What should the presentation cover?
The presentation should cover your thesis, including motivation, introduction, description of new methods, and results of your research. Please do NOT explain already existing methods in detail here; instead, focus on novel work and the results.
- What kind of questions will be asked after the presentation?
The questions will be directly connected to your thesis and related theory.
Student Research Projects
We are always interested in mentoring interesting student research projects. Please contact us directly with an interesting research idea. In the future you will also be able to find research project topics below.
Currently we are not offering any student research projects.
For more information, please visit the official web page Studentische Forschungsprojekte (Lehre@LMU).
Current Theses (With Working Titles)
|Probabilistic Deep Learning of Liver Failure in Therapeutical Cancer Treatment||MA|
|Model agnostic Feature Importance by Loss Measures||MA|
|Model-agnostic interpretable machine learning methods for multivariate time series forecasting||MA|
|Normalizing Flows for Interpretability Measures||MA|
|Representation Learning for Semi-Supervised Genome Sequence Classification||MA|
|Comparison of Machine Learning Models For Competing Risks Survival Analysis||MA|
|Multi-state modeling in the context of predictive maintenance||MA|
Completed Theses (LMU Munich)
|mlr3automl - Automated Machine Learning in R||MA||2021|
|Knowledge distillation - Compressing arbitrary learners into a neural net||MA||2020|
|Personality Prediction Based on Mobile Gaze and Touch Data||MA||2020|
|Identifying Subgroups induced by Interaction Effects||MA||2020|
|Benchmarking: Tests and Visualizations||MA||2019|
|Methodik, Anwendungen und Interpretation moderner Benchmark-Studien am Beispiel der Risikomodellierung bei akuter Cholangitis||MA||2019|
|Machine Learning pipeline search with Bayesian Optimization and Reinforcement Learning||MA||2019|
|Visualization and Efficient Replay Memory for Reinforcement Learning||BA||2019|
|Neural Network Embeddings for Categorical Data||BA||2019|
|Localizing phosphorylation sites by deep learning-based fragment ion intensity||MA||2019|
|Average Marginal Effects in Machine Learning||MA||2019|
|Wearable-based Severity Detection in the Context of Parkinson’s Disease Using Deep Learning Techniques||MA||2018|
|Bayesian Optimization under Noise for Model Selection in Machine Learning||MA||2018|
|Interpretable Machine Learning - An Application Study using the Munich Rent Index||MA||2018|
|Automatic Gradient Boosting||MA||2018|
|Efficient and Distributed Model-Based Boosting for Large Datasets||MA||2018|
|Linear individual model-agnostic explanations - discussion and empirical analysis of modifications||MA||2018|
|Extending Hyperband with Model-Based Sampling Strategies||MA||2018|
|Reinforcement learning in R||MA||2018|
|Anomaly Detection using Machine Learning Methods||MA||2018|
|Configuration of deep neural networks using model-based optimization||MA||2017|
|Kernelized anomaly detection||MA||2017|
|Automatic model selection and hyperparameter optimization||MA||2017|
|mlrMBO / RF distance based infill criteria||MA||2017|
|Kostensensitive Entscheidungsbäume für beobachtungsabhängige Kosten||BA||2016|
|Implementation of 3D Model Visualization for Machine Learning||BA||2016|
|Eine Simulationsstudie zum Sampled Boosting||BA||2016|
|Implementation and Comparison of Stacking Methods for Machine Learning||MA||2016|
|Runtime estimation of ML models||BA||2016|
|Process Mining: Checking Methods for Process Conformance||MA||2016|
|Implementation of Multilabel Algorithms and their Application on Driving Data||MA||2016|
|Stability Selection for Component-Wise Gradient Boosting in Multiple Dimensions||MA||2016|
|Detecting Future Equipment Failures: Predictive Maintenance in Chemical Industrial Plants||MA||2016|
|Fault Detection for Fire Alarm Systems based on Sensor Data||MA||2016|
|Laufzeitanalyse von Klassifikationsverfahren in R||BA||2015|
|Benchmark Analysis for Machine Learning in R||BA||2015|
|Implementierung und Evaluation ergänzender Korrekturmethoden für statistische Lernverfahren bei unbalancierten Klassifikationsproblemen||BA||2014|
Completed Theses (Supervised by Bernd Bischl at TU Dortmund)
|Anwendung von Multilabel-Klassifikationsverfahren auf Medizingerätestatusreporte zur Generierung von Reparaturvorschlägen||MA||2015|
|Erweiterung der Plattform OpenML um Ereigniszeitanalysen||MA||2015|
|Modellgestützte Algorithmenkonfiguration bei Feature-basierten Instanzen: Ein Ansatz über das Profile-Expected-Improvement||Dipl.||2015|
|Modellbasierte Hyperparameteroptimierung für maschinelle Lernverfahren auf großen Daten||MA||2015|
|Implementierung einer Testsuite für mehrkriterielle Optimierungsprobleme||BA||2014|
|R-Pakete für Datenmanagement und -manipulation großer Datensätze||BA||2014|
|Lokale Kriging-Verfahren zur Modellierung und Optimierung gemischter Parameterräume mit Abhängigkeitsstrukturen||BA||2014|
|Kostensensitive Algorithmenselektion für stetige Black-Box-Optimierungsprobleme basierend auf explorativer Landschaftsanalyse||MA||2013|
|Exploratory Landscape Analysis für mehrkriterielle Optimierungsprobleme||MA||2013|
|Feature-based Algorithm Selection for the Traveling-Salesman-Problem||BA||2013|
|Implementierung und Untersuchung einer parallelen Support Vector Machine in R||Dipl.||2013|
|Sequential Model-Based Optimization by Ensembles: A Reinforcement Learning Based Approach||Dipl.||2012|
|Vorhersage der Verkehrsdichte in Warschau basierend auf dem Traffic Simulation Framework||BA||2011|
|Klassifikation von Blutgefäßen und Neuronen des menschlichen Gehirns anhand von ultramikroskopierten 3D-Bilddaten||BA||2011|
|Uncertainty Sampling zur Auswahl optimaler Sampler aus der trunkierten Normalverteilung||BA||2011|
|Over-/Undersampling für unbalancierte Klassifikationsprobleme im Zwei-Klassen-Fall||BA||2010|