Zhi Zhang
PhD Candidate@UCLA. Reinforcement Learning, LLM Post-Training, Agentic AI.
👋 Hi! I am Zhi Zhang (Zach), a PhD candidate in the Department of Statistics and Data Science at UCLA. I’m fortunate to be advised by Prof. Arash Amini.
Previously, I completed an Applied Scientist internship at AWS AI (Summer 2025), where I developed AERO (Adaptive Efficient Rollout Optimization) for RL-based LLM fine-tuning, and an AI Research internship at eBay (Summer 2024), where I built ReflexAgent for agentic NER. I hold degrees from Northwestern University (Ph.D. in CS, completed with M.S.), UC Davis (M.S. in Statistics), and Georgia Tech (M.S. in CS).
My research focuses on LLM post-training and RL fine-tuning (GRPO, PPO, RLHF), compute efficiency optimization, Agentic AI systems, and multi-agent reinforcement learning. I’m particularly interested in:
- RL Post-Training for LLMs: Developing efficient algorithms for reinforcement learning-based fine-tuning of large language models
- LLM Efficiency: Reducing computational costs while maintaining or improving model performance
- Agentic AI: Building intelligent agents that can reason, plan, and execute complex tasks
- Multi-Agent RL: Designing algorithms for cooperative and competitive multi-agent systems
I have published at top venues including ICLR (Spotlight), AISTATS, NeurIPS, and AAMAS, and serve as a reviewer for ICML, NeurIPS, AISTATS, and ICLR.
News
| Date | News |
|---|---|
| Jan 26, 2026 | Two papers accepted to ICLR 2026! 🎉 |
| Oct 1, 2025 | 🔬 Completed Applied Scientist internship at AWS AI! Developed AERO for efficient RL-based LLM fine-tuning. |
| Jan 15, 2025 | 📄 Two papers accepted to AISTATS 2025: Quantile Additive Trend Filtering and Lifelong RL with PAC-Bayes! |
Selected Publications
- [Preprint] Train Less, Learn More: Adaptive and Efficient Rollout Optimization for Group-Based Reinforcement Learning. 2026. arXiv.
- [ICLR] EDIVAL-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing. In International Conference on Learning Representations, 2026.
- [ICLR] Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions. In International Conference on Learning Representations, 2026.
- [AISTATS] Quantile Additive Trend Filtering. In International Conference on Artificial Intelligence and Statistics, 2025. arXiv.
- [AISTATS] Statistical Guarantees for Lifelong Reinforcement Learning Using PAC-Bayes Theory. In International Conference on Artificial Intelligence and Statistics, 2025.
- [ICLR] Reinforcement Learning under a Multi-agent Predictive State Representation Model: Method and Theory. In International Conference on Learning Representations, 2022.
- [AAMAS] Integrating Independent and Centralized Multi-agent Reinforcement Learning for Traffic Signal Network Optimization. In International Conference on Autonomous Agents and Multi-Agent Systems, 2020.
- [AISTATS] Multivariate Time Series Forecasting by Graph Attention Networks with Theoretical Guarantees. In International Conference on Artificial Intelligence and Statistics, 2024.
Education
- University of California, Los Angeles, 2022.09 - Present
  PhD in Statistics and Data Science
- Northwestern University, 2020 - 2022
  PhD in Computer Science (completed with M.S.)
Experience
- Amazon AWS AI Labs, Applied Scientist Intern, Summer 2025
- eBay, AI Research Intern, Summer 2024