Stefan Baumann

I'm an ELLIS PhD student in the CompVis group, advised by Björn Ommer (LMU) and Peter Kontschieder (Meta). My research focuses on furthering our understanding of generative models (primarily diffusion models) and on using these insights to exert more control over them.

Prior to my PhD, I spent a wonderful time as an ML research intern at Intel Labs (Emergent AI Research, German Ros, Benjamin Ummenhofer) and Sony (AI Speech and Sound Group, Stefan Uhlich, Giorgio Fabbro, Thomas Kemp), and as an ML researcher at a startup (Duc Tam Nguyen). I obtained my Bachelor's in EECS and my Master's in Signal Processing and Machine Learning at KIT.

I'm always open to collaborations and supervising Master's theses (primarily at LMU Munich & TU Munich, but potentially also other institutions). Just drop me an email :)

Email  /  LinkedIn  /  Scholar  /  Twitter  /  GitHub




I'm interested in generative AI and its applications to computer vision. Representative papers are highlighted.

CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models
Nick Stracke, Stefan Andreas Baumann, Joshua M Susskind, Miguel Angel Bautista, Björn Ommer
Preprint, 2024
Project Page / arXiv / Code

LoRAs don't have to be static! They can also introduce new conditioning into foundation models more efficiently and effectively than previous methods.

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, Björn Ommer
Preprint, 2024
Project Page / arXiv / Code / Colab / Twitter

T2I diffusion models already knew how to do fine-grained control; we just had to figure out how to leverage this capability.

ZigMa: Zigzag Mamba Diffusion Model
Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Björn Ommer
Preprint, 2024
Project Page / arXiv / Code / Twitter

Scan order matters for SSMs in vision tasks.

DepthFM: Fast Monocular Depth Estimation with Flow Matching
Ming Gui*, Johannes Fischer*, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer
Preprint, 2024
Project Page / arXiv / Code / Twitter

Efficient generative monocular depth estimation via flow matching from noisy RGB to depth.

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Katherine Crowson*, Stefan Andreas Baumann*, Alex Birch*, Tanishq Mathew Abraham, Daniel Z Kaplan, Enrico Shippole
ICML, 2024
Project Page / arXiv / Code / Twitter

Incredibly efficient transformer-based image diffusion models that can generate megapixel-resolution images in pixel space.

Boosting Latent Diffusion with Flow Matching
Johannes Fischer*, Ming Gui*, Pingchuan Ma*, Nick Stracke, Stefan Andreas Baumann, Björn Ommer
Preprint, 2023
Project Page / arXiv / Code

Making high-resolution T2I diffusion fast and increasing resolutions to multiple megapixels by adding flow-matching-based super-resolution stages in latent space.

This website is based on Jon Barron's website, whose code is available on GitHub.