Yudi Zhang (张钰荻)

I am a final-year Ph.D. student at Eindhoven University of Technology, under the supervision of Mykola Pechenizkiy and Meng Fang. I am also very fortunate to work closely with Prof. Yali Du at King’s College London and Prof. Biwei Huang at the University of California, San Diego. I am currently a visiting student at the Max Planck Institute for Intelligent Systems, where I work at the intersection of reinforcement learning (RL) and large language models (LLMs), under the supervision of Dr. Shiwei Liu. I was an intern at Microsoft, mentored by Dr. Lu Wang. I obtained my Master's and Bachelor's degrees from Shandong University (SDU), supervised by Prof. Wei Zhang.

My research focuses on reinforcement learning, especially RL for LLMs and causal RL.

I am currently on the job market and actively looking for collaboration and visiting opportunities. If you are interested, feel free to contact me by email.

✨ News

May 2026: One paper was accepted to RLC 2026. See you in Canada!
May 2026: One paper was accepted to ICML 2026. See you in Korea!
Apr 2026: Started a new journey at the Max Planck Institute for Intelligent Systems!
Feb 2026: One paper was accepted to AAMAS 2026.
Feb 2026: I’m co-organizing a tutorial on reward modeling for LLMs at CPAL. See you in Tübingen! Slides
Oct 2025: One paper was accepted to NeurIPS 2025 as a spotlight. ✨
May 2025: I will give a tutorial at the OxML Summer School. 🧑‍🏫
May 2025: Two papers were accepted to TMLR.
Jan 2025: RuAG was accepted to ICLR 2025. 🚀
Oct 2024: MACCA and CAST were accepted to the NeurIPS 2024 CRL Workshop.
Oct 2024: I will give a talk at the Women in AI & Robotics Reading Group.
Dec 2023: One paper was accepted to AAAI 2024.
Oct 2023: I will give a talk at RLChina. 💬
Sep 2023: Two papers were accepted to NeurIPS 2023. 🎉
Oct 2022: I started my PhD journey at TU/e. 🌱

🧑‍💻 Internship

Apr 2026 – present Max Planck Institute for Intelligent Systems — visiting student, supervised by Dr. Shiwei Liu.
Mar – Oct 2024 Microsoft — research intern, mentored by Dr. Lu Wang.

📚 Service & activities

Reviewer: TMLR, IEEE Transactions on Artificial Intelligence, NeurIPS, ICML, ICLR, ACL, AAAI, AISTATS, AAMAS.
Tutorial: Reward Modeling in Large Language Models: Principles, Methods, and Challenges (CPAL 2026). Slides
Teaching assistant: Generative AI at OxML 2024; 2IIG0 Data Mining and Machine Learning (2025).
Supervised MSc theses:
- Olivier T. Schipper (Apr 2025), PillagerBench: a benchmark and framework for competitive multi-agent Minecraft environments, published in IEEE CoG.
- Niels P.G.T. van Beuningen (Jul 2025), HearthGym: A Gymnasium Benchmark for Advanced Hearthstone AI Research.
- Dirk Michielsen (Feb 2026), HearthstoneGUI: GUI Agent for Hearthstone.
- Lan Xie (ongoing).
Leadership: Vice President, Student Union, School of Control Science and Engineering, Shandong University (2018); Captain (Deputy Head), “Lianxin” Volunteer Teaching Program, Shandong University (2018); Class Monitor, Automation Class 1 (Cohort 2015), Shandong University (2015–2019).

🌟 Awards

Travel awards: NeurIPS 2023, ICLR 2025.
Honors: Outstanding Graduate of Shandong Province (2019).
Competitions: 2nd Prize, Chinese Graduate Mathematical Modeling Competition (2019); 1st Prize, National Electronic Design Competition, Shandong Province (2017); Champion, International Aquatic Robot Competition (2018, 2019).
Scholarships: First-Class Scholarship (2017–2021); Outstanding Student Special Scholarship (2019, top 2%), etc.

💻 Programming skills

Languages: Python, C/C++, Bash.
ML / LLM tooling: PyTorch, TensorFlow/Keras, TRL, Verl, MS-Swift, PEFT/LoRA, vLLM.
NLP / RL algorithms: PPO/GRPO, DPO, RLOO, A3C, SAC, DDPG.
Systems & robotics: Linux, Git, Docker, Gym/Gymnasium, ROS, NVIDIA Jetson Xavier.
Compute & platforms: Snellius, Slurm, H100/A100/V100/RTX 4090/2080 Ti.