Yichen Li

ICCV 2025: Multimodal Action Conditioned Video Generation. Sole-authored end-to-end: jointly conditions on language, bounding-box trajectories, and reference frames. 40% FVD improvement, trained on 8×H100 GPUs.
ICLR 2024: Learning Generic Multimodal Embeddings. An autodecoder approach to learning a single generic embedding for any given set of input modalities, which can serve as an extended embedding for any large foundation model.
ICML 2026: AWM: Advantage Weighted Matching. Established mathematical equivalence between DDPO and DSM objectives. Aligns RL fine-tuning with diffusion pretraining.
Blog 2025: ESES: Efficient and Stable Evolutionary RL. Zeroth-order evolutionary RL using LoRA perturbations and quantized base weights for memory-efficient population-based post-training.
Blog 2026: Projective Attention. A geometric lens on attention and controlled architecture experiments, including key-normalized variants and random/shuffled controls.
CLI 2026: Hardware-Aware Architecture Compiler. Hardware-aware model architecture calculus for architectures and deltas on the quality–latency–memory Pareto front.

Open Research Seeds

The Open Research Seeds effort offers deliberately unconventional research ideas to help junior researchers get started in resource-scarce academia.

Technical and Perspective Blogs

Projective Attention and Projective Lesson of Attention

A geometric lens on attention and controlled architecture experiments, including key-normalized variants and random/shuffled controls.

ESES: Efficient and Stable Evolutionary RL for LLM Post-training

Explores sample-efficient evolutionary strategies for language-model post-training through low-rank parameter updates and quantization-aware variants.

VARL: Reinforcing Video Autoregressive Generation

Studies rollout-efficient reinforcement learning for autoregressive video generation by densifying reward feedback over shorter generated blocks.

Publication

	Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models (tags: rl). Shuchen Xue, Chongjian Ge, Shilong Zhang, Yichen Li, Zhi-Ming Ma. ICML, 2026. [paper] [code]
	MultiModal Action Conditioned Video Generation (tags: video, multimodal). Yichen Li, Antonio Torralba. ICCV, 2025. [paper] [project page] [code]
	Generalized Dynamics Generation towards Physical World Model (tags: physics, video, robotics). Yichen Li, Zhiyi Li, Brandon Feng, Antonio Torralba. Preprint, 2025. [paper] [project page]
	Learning to Jointly Understand Visual and Tactile Signals (tags: multimodal). Yichen Li, Yilun Du, Chao Liu, Chao Liu, Mike Foshey, Francis Williams, Joshua B. Tenenbaum, Wojciech Matusik, Antonio Torralba. ICLR, 2024. [paper] [project page] [dataset]
	Category-Level Multi-Part Multi-Joint 3D Shape Assembly (tags: robotics). Yichen Li, Kaichun Mo, Yueqi Duan, He Wang, Jiequan Zhuang, Lin Shao, Wojciech Matusik, Leonidas Guibas. CVPR, 2024. [paper] [project page] [data] [code]
	Learning Preconditioners for Conjugate Gradient PDE Solver (tags: physics). Yichen Li, Peter Yichen Chen, Tao Du, Wojciech Matusik. ICML, 2023. [paper] [video] [project page] [code]
	Revisiting Image-Language for Open-ended Phrase Detection (tags: multimodal). Bryan Plummer, Kevin Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko. TPAMI, 2019. [paper]
	ASAP: Automated Sequence Planning for Complex Assembly with Physical Feasibility (tags: robotics). Yunsheng Tian, Karl D.D. Willis, Bassel Al Omari, Jieliang Luo, Pingchuan Ma, Yichen Li, Farhad Javid, Edward Gu, Joshua Jacob, Shinjiro Sueda, Hui Li, Sachin Chitta, Wojciech Matusik. ICRA, 2024. [paper] [project page] [dataset] [code]
	Assemble Them All: Physics-Based Planning for Generalizable Assembly by Disassembly (tags: robotics). Yunsheng Tian, Jie Xu, Yichen Li, Jieliang Luo, Shinjiro Sueda, Hui Li, Karl D.D. Willis, Wojciech Matusik. SIGGRAPH Asia, 2022. [paper] [project page] [code]
	3D Part Assembly from A Single Image (tags: other). Yichen Li, Kaichun Mo, Lin Shao, Minhyuk Sung, Leonidas Guibas. ECCV, 2020. [paper] [project page] [code]
	Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation (tags: other). Xingchao Peng, Yichen Li, Kate Saenko. ECCV, 2020. [paper] [project page] [code]

Professional Experience

NVIDIA Research — Summer 2024

Built a unified physics simulation framework for soft, articulated, and rigid body dynamics. Designed anisotropic Young's modulus learning across physics regimes. 51% error reduction over team baseline.

Adobe Research — Summer 2025

Built RL post-training systems for video diffusion models including zeroth-order evolutionary optimization and dense per-frame reward methods.

NVIDIA Research — Summer 2023

Built a Gaussian kernel-based architecture for fast point cloud processing as a PointNet replacement.

Adobe Research — Summer 2021

Built a video layer decomposition system using source separation methods.

NVIDIA Research — Summer 2020

Built a point cloud completion system utilizing raycasting-based data generation. US Patent filed.

Academic Services

Workshop Organizer: CVPR 2026: Multimodal Learning, Sense of Space; RSS 2025: Multimodal & MultiSensory Robotics; ECCV 2024: Geometry in the Large Model Era
Conference Reviewer: CVPR, ICCV, ECCV, ICML, ICLR, NeurIPS, ACM SIGGRAPH
Journal Reviewer: ACM TOG, IEEE-TPAMI

Random Thoughts

弱水 (Weak Water)

Teaching
	CS231N: Convolutional Neural Networks for Visual Recognition (Spring 2021) Course Assistant (CA)
	CS468: Geometric Algorithms: Non-Euclidean Methods (Fall 2020) Course Assistant (CA)
	6.S898: Deep Learning (Fall 2023) Course Assistant (CA)

Awards

Robert J. Shillman Fellowship
College Prize For Excellence in Computer Science (GPA Rank: 1st)