Yichen Li

I am a final-year Ph.D. candidate in EECS at MIT CSAIL, advised by Prof. Antonio Torralba. My Ph.D. research focuses on generative foundation models for multimodal perception and interactive world understanding.

I enjoy principled development towards:

  • Multimodal and World Models: approaches from video models, interactions, and multimodality.
  • Post-training and RL: general RL algorithms for faster convergence with zeroth-order optimization and dense backpropagation.
  • Model Architecture: effective mechanisms of learning and hardware-inspired architecture design.

Recognizing the difficulty of academic ML research, I started the Open Research Seeds effort to help junior students. Before coming to MIT, I worked with Prof. Leonidas Guibas and Prof. Gordon Wetzstein at Stanford.

Email: yichenl [at] mit [dot] edu

Google Scholar / Twitter / GitHub

profile photo

Photo credit: Jiayuan Mao

New
Open Research Seeds

The Open Research Seeds effort offers deliberately unconventional research ideas to help junior students get started in resource-scarce academia.

Technical and Perspective Blogs
Projective Attention and Projective Lesson of Attention

A geometric lens on attention and controlled architecture experiments, including key-normalized variants and random/shuffled controls.

ESES: Efficient and Stable Evolutionary RL for LLM Post-training

Explores sample-efficient evolutionary strategies for language-model post-training through low-rank parameter updates and quantization-aware variants.

VARL: Reinforcing Video Autoregressive Generation

Studies rollout-efficient reinforcement learning for autoregressive video generation by densifying reward feedback over shorter generated blocks.

Publications

Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models rl
Shuchen Xue, Chongjian Ge, Shilong Zhang, Yichen Li, Zhi-Ming Ma
ICML, 2026
[paper]   [code]

MultiModal Action Conditioned Video Generation video multimodal
Yichen Li, Antonio Torralba
ICCV, 2025
[paper]   [project page]   [code]

Generalized Dynamics Generation towards Physical World Model physics video robotics
Yichen Li, Zhiyi Li, Brandon Feng, Antonio Torralba
Preprint, 2025
[paper]   [project page]

Learning to Jointly Understand Visual and Tactile Signals multimodal
Yichen Li, Yilun Du, Chao Liu, Mike Foshey, Francis Williams, Joshua B. Tenenbaum, Wojciech Matusik, Antonio Torralba
ICLR, 2024
[paper]   [project page]   [dataset]

Category-Level Multi-Part Multi-Joint 3D Shape Assembly robotics
Yichen Li, Kaichun Mo, Yueqi Duan, He Wang, Jiequan Zhuang, Lin Shao, Wojciech Matusik, Leonidas Guibas
CVPR, 2024
[paper]   [project page]   [data]   [code]

Learning Preconditioners for Conjugate Gradient PDE Solver physics
Yichen Li, Peter Yichen Chen, Tao Du, Wojciech Matusik
ICML, 2023
[paper]   [video]   [project page]   [code]

Revisiting Image-Language for Open-ended Phrase Detection multimodal
Bryan Plummer, Kevin Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko
TPAMI, 2019
[paper]

ASAP: Automated Sequence Planning for Complex Assembly with Physical Feasibility robotics
Yunsheng Tian, Karl D.D. Willis, Bassel Al Omari, Jieliang Luo, Pingchuan Ma, Yichen Li, Farhad Javid, Edward Gu, Joshua Jacob, Shinjiro Sueda, Hui Li, Sachin Chitta, Wojciech Matusik
ICRA, 2024
[paper]   [project page]   [dataset]   [code]

Assemble Them All: Physics-Based Planning for Generalizable Assembly by Disassembly robotics
Yunsheng Tian, Jie Xu, Yichen Li, Jieliang Luo, Shinjiro Sueda, Hui Li, Karl D.D. Willis, Wojciech Matusik
SIGGRAPH Asia, 2022
[paper]   [project page]   [code]

3D Part Assembly from A Single Image other
Yichen Li*, Kaichun Mo*, Lin Shao, Minhyuk Sung, Leonidas Guibas
ECCV, 2020
[paper]   [project page]   [code]

Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation other
Xingchao Peng*, Yichen Li*, Kate Saenko
ECCV, 2020
[paper]   [project page]   [code]

Professional Experience
NVIDIA Research — Summer 2024

Built a unified physics simulation framework for soft, articulated, and rigid body dynamics. Designed anisotropic Young's modulus learning across physics regimes. Achieved a 51% error reduction over the team baseline.

Adobe Research — Summer 2025

Built RL post-training systems for video diffusion models, including zeroth-order evolutionary optimization and dense per-frame reward methods.

NVIDIA Research — Summer 2023

Built a Gaussian kernel-based architecture for fast point cloud processing as a PointNet replacement.

Adobe Research — Summer 2021

Built a video layer decomposition system using source separation methods.

NVIDIA Research — Summer 2020

Built a point cloud completion system using raycasting-based data generation. A US patent was filed.


Academic Services
Workshop Organizer: CVPR 2026: Multimodal Learning, Sense of Space; RSS 2025: Multimodal & MultiSensory Robotics; ECCV 2024: Geometry in the Large Model Era
Conference Reviewer: CVPR, ICCV, ECCV, ICML, ICLR, NeurIPS, ACM SIGGRAPH
Journal Reviewer: ACM TOG, IEEE-TPAMI

Random Thoughts

Teaching
CS231N: Convolutional Neural Networks for Visual Recognition (Spring 2021)

Course Assistant (CA)

CS468: Geometric Algorithms: Non-Euclidean Methods (Fall 2020)

Course Assistant (CA)

6.S898: Deep Learning (Fall 2023)

Course Assistant (CA)


Awards
  • Robert J. Shillman Fellowship
  • College Prize For Excellence in Computer Science (GPA Rank: 1st)