Kaifeng Zhao - 赵铠枫

I am a third-year PhD student in the Computer Vision and Learning Group (VLG) at ETH Zürich, under the supervision of Prof. Siyu Tang. Additionally, I have had the pleasure of collaborating with Thabo Beeler. I obtained my Master's degree in Computer Science with distinction from ETH Zürich in 2022, and my Bachelor's degree in Computer Science from Beihang University in 2019.

My research focuses on the intersection of computer vision and computer graphics, particularly in human motion modeling and the synthesis of human-scene interaction behaviors. My doctoral research is supported by the Swiss Data Science Center (SDSC) PhD fellowship.

I'm actively looking for research internship opportunities in 2025. Please reach out if you have any openings that I might be a fit for.

Email / Google Scholar / Twitter / LinkedIn / Github


Publications


DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control
Kaifeng Zhao, Gen Li, Siyu Tang

DART achieves high-quality and efficient (>300 frames per second) motion generation conditioned on online streams of text prompts. Furthermore, by integrating latent space optimization and reinforcement learning-based control, DART enables a variety of motion generation applications with spatial constraints and goals, including motion in-betweening, waypoint goal reaching, and human-scene interaction generation.

EgoGen: An Egocentric Synthetic Data Generator
Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang
CVPR, 2024 (Oral Presentation)

EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.

DIMOS: Synthesizing Diverse Human Motions in 3D Indoor Scenes
Kaifeng Zhao, Yan Zhang, Shaofei Wang, Thabo Beeler, Siyu Tang
ICCV, 2023

In this work, we propose a method to generate a sequence of natural human-scene interaction events in real-world complex scenes, as illustrated in this figure. The human first walks to sit on a stool (yellow to red), then walks to another chair to sit down (red to magenta), and finally walks to and lies on the sofa (magenta to blue).

COINS: Compositional Human-Scene Interaction Synthesis with Semantic Control
Kaifeng Zhao, Shaofei Wang, Yan Zhang, Thabo Beeler, Siyu Tang
ECCV, 2022

We propose COINS, a method for COmpositional INteraction Synthesis with Semantic Control. Given an action and an object instance as the semantic specification, our method generates virtual humans naturally interacting with the scene objects.

Academic Service

Teaching


Template adapted from Siwei Zhang's website.