Stefan Baumann

I'm an ELLIS PhD student in the CompVis group, advised by Björn Ommer (LMU) and Peter Kontschieder (Meta). My research focuses on furthering our understanding of generative models (primarily diffusion) and on using these insights to exert more control over them.

Prior to my PhD, I spent a wonderful time as an ML research intern at Intel Labs (Emergent AI Research, German Ros, Benjamin Ummenhofer) and Sony (AI Speech and Sound Group, Stefan Uhlich, Giorgio Fabbro, Thomas Kemp), and as an ML researcher at a startup (Duc Tam Nguyen). I obtained my Bachelor's in EECS and my Master's in Signal Processing and Machine Learning at KIT.

I'm always open to collaborations and to supervising Master's theses for exceptionally experienced and motivated students (primarily at LMU Munich & TU Munich, but potentially also at other institutions). Just drop me an email :)

Email / LinkedIn / Scholar / Twitter / Bluesky / GitHub

Updates

June 2025 One first-author paper (preprint out soon!) and a paper on extremely fast diffusion training accepted to ICCV 2025! See you in Honolulu!
February 2025 Two first-author papers on fine-grained diffusion control and noise-free diffusion features (as an Oral) accepted to CVPR 2025!
December 2024 One paper accepted to AAAI 2025 as an Oral!
July 2024 Three papers were accepted to ECCV 2024! 🎉
May 2024 Our work on ⏳ Hourglass Diffusion Transformers was accepted to ICML 2024!
November 2023 I started my PhD as an ELLIS student in the CompVis lab @ LMU Munich.

Research

I'm interested in generative AI and its applications to computer vision. Representative papers are highlighted.

	TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training Felix Krause, Timy Phan, Ming Gui, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer ICCV, 2025 arXiv / Code Extremely fast diffusion transformer training via token routing.
	CleanDIFT: Diffusion Features without Noise Nick Stracke, Stefan Andreas Baumann, Kolja Bauer, Frank Fundel, Björn Ommer CVPR, 2025 (Oral) Project Page / arXiv / Code / Twitter Better diffusion features by eliminating the need to add noise.
	Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Melvin Sevi, Vincent Tao Hu, Björn Ommer CVPR, 2025 Project Page / arXiv / Code / Colab / Twitter T2I diffusion models already knew how to do fine-grained control; we just had to learn how to leverage this capability.
	DepthFM: Fast Monocular Depth Estimation with Flow Matching Ming Gui, Johannes Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer AAAI, 2025 (Oral) Project Page / arXiv / Code / Twitter Efficient generative monocular depth estimation via flow matching from noisy RGB to depth.
	CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models Nick Stracke, Stefan Andreas Baumann, Joshua M Susskind, Miguel Angel Bautista, Björn Ommer ECCV, 2024 Project Page / arXiv / Code LoRAs don't have to be static! They can also introduce new conditioning into foundation models more efficiently and effectively than previous methods.
	ZigMa: Zigzag Mamba Diffusion Model Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Björn Ommer ECCV, 2024 Project Page / arXiv / Code / Twitter Scan order matters for SSMs in vision tasks.
	Boosting Latent Diffusion with Flow Matching Johannes Fischer, Ming Gui, Pingchuan Ma, Nick Stracke, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer ECCV, 2024 (Oral) Project Page / arXiv / Code Making high-resolution T2I diffusion fast and increasing resolutions to multiple megapixels by adding flow matching-based super-resolution stages in latent space.
	Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Katherine Crowson, Stefan Andreas Baumann*, Alex Birch, Tanishq Mathew Abraham, Daniel Z Kaplan, Enrico Shippole ICML, 2024 Project Page / arXiv / Code / Twitter Incredibly efficient transformer-based image diffusion models that can generate megapixel-resolution images in pixel space.

This website is based on Jon Barron's website, whose code is available on GitHub.