Natural Language Processing

Research

This group focuses on methodological and applied research in the context of natural language processing (NLP), including (but not limited to) the following topics:

We have ongoing collaborations with the Bavarian Academy of Sciences (M. Schöffel), the MISODA working group at LMU (C. Heumann, E. Garces Arias), and the University of Applied Sciences Munich (V. Thurner, S. Thiemichen, S. Urchs).

Teaching

We are actively developing the Deep Learning for Natural Language Processing (DL4NLP) course together with colleagues from LMU Munich and the University of Vienna.

Members

Name       Position
Dr. Matthias Aßenmacher       Lead
Helen Alber       PhD Student
Esteban Garces Arias       (External) Collaborating PhD Student
Matthias Schöffel       (External) Collaborating PhD Student
Stefanie Urchs       (External) Collaborating PhD Student
Michael Sawitzki       Student Assistant (DL4NLP Lecture)

Students / Thesis supervision

Supervised Theses/Projects (since 01/2022)
Title       Type (BA = Bachelor's thesis, MA = Master's thesis)       Completed
Combining Large Language Models and Topic Clustering for Metadata-enriched Temporal Evolution Path Detection       MA       2025
Comparing Metrics for the Evaluation of Decoding Strategies on Text Summarization Tasks       BA       2025
Guided Topic Modeling for Customer Feedback Analysis: Incorporating Prior Information and Document-specific Covariates       MA       2025
Enhancing Information Retrieval Via Cognitively Motivated Document Expansion       MA       2024
Übersicht über zertifizierte industrielle KI-Produkte (Overview of certified industrial AI products)       Consulting       2024
Clustering Embeddings from Large Language Models for Retrospective Event Detection       MA       2024
Synthetic Opinions: Utilizing Large Language Models for Generating Responses to Open-Ended Survey Questions       MA       2024
Text-based geographical assignment of tweets       Consulting       2024
Exploring Hyperparameter Selection Strategies for Topic Clustering with Large Language Models       MA       2024
Exploring Strategies for Informed Initial Pool Selection in Deep Active Learning with Pre-Trained Language Models       MA       2024
NLP in Insurance: Leveraging Language Models to Automatize Disease Classification       Consulting       2024
Automatic transcription of handwritten Franconian using Deep Learning       MA       2024
Advanced Knowledge Editing in Large Language Models       MA       2023
Robust, Explainable, and Unbiased Text Classification of Insurance Claims       MA       2023
Topic Classification of News Headlines       Consulting       2023
Integrating Domain Knowledge into Transformer-based Approaches to Vulnerability Detection       MA       2023
Transformer-Based Language Models for Multiple Choice Question Answering       BA       2023
Interslavic Natural Language Processing       MA       2023
A Comparative Study of Large Language Models for Text-to-Code Generation       BA       2023
ICON: ICD-10 Coding using Natural Language Processing       MA       2023
Quantization in Large Language Models       MA       2023
Natural Language Processing for Systematic Literature Reviews: An Application to Immersive Design Research       Consulting       2023
Digitizing Handwritten Old Occitan Cards using Vision and Language Models       MA       2023
Enhancing stance prediction by utilizing party manifestos       MA       2023
A tailored OCR-System for the Medieval Latin Dictionary for the Bavarian Academy of Sciences and Humanities       Consulting       2023
Application of neural topic models to Twitter data from German politicians       BA       2023
Examining and Mitigating Gender Bias in German Word Embeddings       BA       2023
Domain transfer across country, time, and modality in multiclass classification       BA       2022
How Different is Stereotypical Bias in Different Languages? Analysis of Multilingual Language Models       MA       2022
Predicted Sentiments of Customer Texts as Covariates for Time Series Forecasting       MA       2022
A Comparative Evaluation of the Utility of linguistic Features for Part-of-Speech-Tagging       BA       2022
Evaluating pre-trained language models on partially unlabeled multilingual economic corpora       MA       2022
Leveraging pairwise constraints for topic discovery in weakly annotated text data       MA       2022
Word Embedding Evaluation with Intrinsic Evaluators       MA       2022
A selection of older theses and projects, partly co-supervised with Christian Heumann, can be found here.

Publications

  1. Schöffel M, Wiedener M, Garces Arias E, Ruppert P, Heumann C, Aßenmacher M (2025) Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan. Accepted at the 5th International Conference on Natural Language Processing for Digital Humanities.
  2. Mironov M, Marquard A, Racek D, Heumann C, Thurner PW, Aßenmacher M (2025) A Geoparsing Pipeline for Multilingual Social Media Posts from Ukraine. Accepted at the Third International Workshop on Geographic Information Extraction from Texts (GeoExT).
  3. Wuttke A, Aßenmacher M, Klamm C, Lang MM, Würschinger Q, Kreuter F (2025) AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers. Accepted at the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature.
  4. Garces Arias E, Li M, Heumann C, Aßenmacher M (2025) Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation. Proceedings of the 31st International Conference on Computational Linguistics, pp. 9992–10020. Association for Computational Linguistics, Abu Dhabi, UAE.
  5. Ma B, Yoztyurk B, Haensch A-C, Wang X, Herklotz M, Kreuter F, Plank B, Aßenmacher M (2024) Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study.
  6. Garces Arias E, Rodemann J, Li M, Heumann C, Aßenmacher M (2024) Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation. Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 15060–15080. Association for Computational Linguistics, Miami, Florida, USA.
  7. Garces Arias E, Blocher H, Rodemann J, Li M, Heumann C, Aßenmacher M (2024) Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework.
  8. Aßenmacher M, Karrlein L, Schiele P, Heumann C (2024) Introducing wwm-german-18k - Can LLMs Crack the Million? (Or Win at Least 500 Euros?) Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024), pp. 287–296. Association for Computational Linguistics, Trento.
  9. Stephan A, Zhu D, Aßenmacher M, Shen X, Roth B (2024) From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks.
  10. Urchs S, Thurner V, Aßenmacher M, Heumann C, Thiemichen S (2024) Detecting Gender Discrimination on Actor Level Using Linguistic Discourse Analysis. Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pp. 140–149. Association for Computational Linguistics, Bangkok, Thailand.
  11. Aßenmacher M, Stephan A, Weissweiler L, Çano E, Ziegler I, Härttrich M, Bischl B, Roth B, Heumann C, Schütze H (2024) Collaborative Development of Modular Open Source Educational Resources for Natural Language Processing. Proceedings of the Sixth Workshop on Teaching NLP, pp. 43–53. Association for Computational Linguistics, Bangkok, Thailand.
  12. Pavlopoulos J, Kougia V, Garces Arias E, Platanou P, Shabalin S, Liagkou K, Papadatos E, Essler H, Camps J-B, Fischer F (2024) Challenging Error Correction in Recognised Byzantine Greek. Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024), pp. 1–12. Association for Computational Linguistics, Bangkok, Thailand.
  13. Mittermeier A, Aßenmacher M, Schachtner B, Grosu S, Dakovic V, Kandratovich V, Sabel B, Ingrisch M (2024) Automatische ICD-10-Codierung [Automatic ICD-10 coding]. Die Radiologie, 1–7.
  14. Deiseroth B, Meuer M, Gritsch N, Eichenberg C, Schramowski P, Aßenmacher M, Kersting K (2024) Divergent Token Metrics: Measuring degradation to prune away LLM components – and optimize quantization. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 6764–6783. Association for Computational Linguistics, Mexico City, Mexico.
  15. Mayer L, Heumann C, Aßenmacher M (2024) Can OpenSource beat ChatGPT? - A Comparative Study of Large Language Models for Text-to-Code Generation. Proceedings of the 9th edition of the Swiss Text Analytics Conference, pp. 1–20. Association for Computational Linguistics, Chur, Switzerland.
  16. Aßenmacher M, Sauter N, Heumann C (2024) Classifying multilingual party manifestos: Domain transfer across country, time, and genre. Proceedings of the 9th edition of the Swiss Text Analytics Conference, pp. 21–31. Association for Computational Linguistics, Chur, Switzerland.
  17. Debelak R, Koch T, Aßenmacher M, Stachl C (2024) From Embeddings to Explainability: A Tutorial on Transformer-Based Text Analysis for Social and Behavioral Scientists.
  18. Gruber C, Hechinger K, Aßenmacher M, Kauermann G, Plank B (2024) More Labels or Cases? Assessing Label Variation in Natural Language Inference. Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language, pp. 22–32. Association for Computational Linguistics, Malta.
  19. Garces Arias E, Pai V, Schöffel M, Heumann C, Aßenmacher M (2023) Automatic Transcription of Handwritten Old Occitan Language. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 15416–15439. Association for Computational Linguistics, Singapore.
  20. Öztürk IT, Nedelchev R, Heumann C, Garces Arias E, Roger M, Bischl B, Aßenmacher M (2023) How Different Is Stereotypical Bias Across Languages? 3rd Workshop on Bias and Fairness in AI (co-located with ECML-PKDD 2023).
  21. Witte M, Schwenzow J, Heitmann M, Reisenbichler M, Aßenmacher M (2023) Potential for Decision Aids based on Natural Language Processing. Proceedings of the European Marketing Academy, 52nd, (114322).
  22. Aßenmacher M, Rauch L, Goschenhofer J, Stephan A, Bischl B, Roth B, Sick B (2023) Towards Enhancing Deep Active Learning with Weak Supervision and Constrained Clustering. Proceedings of the 7th Workshop on Interactive Adaptive Learning (co-located with ECML-PKDD 2023).
  23. Akkus C, Chu L, Djakovic V, Jauch-Walser S, Koch P, Loss G, Marquardt C, Moldovan M, Sauter N, Schneider M, Schulte R, Urbanczyk K, Goschenhofer J, Heumann C, Hvingelby R, Schalk D, Aßenmacher M (2023) Multimodal Deep Learning. arXiv preprint arXiv:2301.04856.
  24. Koch P, Nuñez GV, Garces Arias E, Heumann C, Schöffel M, Häberlin A, Aßenmacher M (2023) A tailored Handwritten-Text-Recognition System for Medieval Latin. First Workshop on Ancient Language Processing (ALP 2023).
  25. Rauch L, Aßenmacher M, Huseljic D, Wirth M, Bischl B, Sick B (2023) ActiveGLAE: A Benchmark for Deep Active Learning with Transformers. Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023.
  26. Schulze P, Wiegrebe S, Thurner PW, Heumann C, Aßenmacher M (2023) A Bayesian approach to modeling topic-metadata relationships. AStA Advances in Statistical Analysis 108, 333–349.
  27. Urchs S, Thurner V, Aßenmacher M, Heumann C, Thiemichen S (2023) How Prevalent is Gender Bias in ChatGPT? - Exploring German and English ChatGPT Responses. 1st Workshop on Biased Data in Conversational Agents (co-located with ECML-PKDD 2023).
  28. Aßenmacher M, Dietrich M, Elmaklizi A, Hemauer EM, Wagenknecht N (2022) Whitepaper: New Tools for Old Problems.
  29. Koch P, Aßenmacher M, Heumann C (2022) Pre-trained language models evaluating themselves - A comparative study. Proceedings of the Third Workshop on Insights from Negative Results in NLP, pp. 180–187. Association for Computational Linguistics, Dublin, Ireland.
  30. Lebmeier E, Aßenmacher M, Heumann C (2022) On the current state of reproducibility and reporting of uncertainty for Aspect-based Sentiment Analysis. Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), Springer International Publishing, Grenoble, France.
  31. Goschenhofer J, Ragupathy P, Heumann C, Bischl B, Aßenmacher M (2022) CC-Top: Constrained Clustering for Dynamic Topic Discovery. Workshop on Ever Evolving NLP (EvoNLP), Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.