Zihan Wang is an AI researcher at Northwestern University, where he works on vision-language models, robotics, and reinforcement learning. Previously, he interned at DeepSeek, contributing to projects like DeepSeek-V2.
Zihan's homepage: https://zihanwang314.github.io/
(00:00) - Introduction
(01:13) - Zihan's Background, CS and AI Research in China
(11:09) - DeepSeek, Human Capital Flow from PRC to US
(16:07) - DeepSeek, Open Source and AI Research
(31:52) - Model Size and Performance Constraints
(33:01) - Data Bottleneck in Pre-trained Models
(34:12) - Transformer Architecture and Scaling Laws
(36:30) - Efficiency in Model Training
(47:44) - Chain of Experts Architecture
(01:01:06) - Future of AI and Robotics
Audio-only version and transcript:
https://www.manifold1.com/episodes/robots-small-models-and-rl-with-deepseek-alumnus-zihan-wang-86
Fantastic discussion, especially the SLM/COE part.
It’s always very interesting what clever people come up with when they are faced with constraints in technology, resources, etc.