Intro
Hi! — I'm Zixuan Zhang, an LLM developer working on post-training, RL, and agents for Rufus foundation models at Amazon.
I have been a major contributor to post-training and RL recipe design for Rufus model releases from 2024 to 2026, covering in-house pre-trained and open-source base models, dense to ultra-sparse MoE architectures, and model sizes from small to large. In particular, I lead RL recipe development for reasoning and agentic coding, achieving state-of-the-art performance on math and STEM reasoning, competitive coding, and terminal-based agentic task solving. My work focuses on synthetic data generation, RL recipe development, and environment design and scaling across reasoning, coding, and many other product-specific agentic applications in shopping scenarios.
I received my Ph.D. in Computer Science from UIUC, advised by Prof. Heng Ji. I am honored to have had my Ph.D. thesis, Completing the knowledge lifecycle for language models, advised by Prof. ChengXiang Zhai, Prof. Hanghang Tong, Dr. Kevin Small, and Dr. Scott Wen-tau Yih. Before that, I received my B.S. in Computer Science from Shanghai Jiao Tong University (SJTU), where I did deep reinforcement learning research under the supervision of Prof. Junchi Yan.
Here is my CV (last updated Apr 02 2026).
What I have worked on:
- Synthetic Data Generation
  - Synthesizing challenging math/STEM problems and agentic coding tasks
  - Data filtering, curation, rejection sampling, and evaluation pipelines to ensure data quality
- RL Recipe Development
  - RL algorithms and training configurations
  - Dynamic sampling, multi-source data mixing, curriculum learning, and difficulty/quality controls
  - Training/inference collaboration, unbiased data middlewares, and on-policy staleness controls
- Environment Design and Scaling
  - Designing verifiable and non-verifiable graders for more reliable rewards
  - Implementing and scaling sandboxed and containerized environments for agentic tasks
Contact