I am a final-year Ph.D. student at Eindhoven University of Technology, under the supervision of Mykola Pechenizkiy and Meng Fang. I am also very fortunate to work closely with Prof. Yali Du at King’s College London and Prof. Biwei Huang at the University of California San Diego. I am currently a visiting student at the Max Planck Institute for Intelligent Systems, where I work at the intersection of reinforcement learning (RL) and large language models (LLMs), under the supervision of Dr. Shiwei Liu. I was an intern at Microsoft, mentored by Dr. Lu Wang. I obtained my Master's and Bachelor's degrees from Shandong University (SDU), supervised by Prof. Wei Zhang.
My research focuses on reinforcement learning, especially RL for LLMs and causal RL.
I am currently on the job market and actively looking for collaboration and visiting opportunities. If you are interested, feel free to contact me. Email
✨ News
- Apr 2026: Started a new journey at the Max Planck Institute for Intelligent Systems!
- Feb 2026: One paper was accepted to AAMAS 2026.
- Feb 2026: I’m co-organizing a tutorial on reward modeling for LLMs at CPAL. See you in Tübingen! Slides
- Oct 2025: One paper was accepted to NeurIPS 2025 as a spotlight. ✨
- May 2025: I will give a tutorial at the OxML Summer School. 🧑‍🏫
- May 2025: Two papers were accepted to TMLR.
- Jan 2025: RuAG was accepted to ICLR 2025. 🚀
- Oct 2024: MACCA and CAST were accepted to the NeurIPS 2024 CRL Workshop.
- Oct 2024: I will give a talk at the Women in AI & Robotics Reading Group.
- Dec 2023: One paper was accepted to AAAI 2024.
- Oct 2023: I will give a talk at RLChina. 💬
- Sep 2023: Two papers were accepted to NeurIPS 2023. 🎉
- Oct 2022: I started my PhD journey at TU/e. 🌱
🧑‍💻 Internship
- Apr 2026 – present Max Planck Institute for Intelligent Systems — visiting student, supervised by Dr. Shiwei Liu.
- Mar – Oct 2024 Microsoft — research intern, mentored by Dr. Lu Wang.
📚 Service & activities
- Reviewer: TMLR, IEEE Transactions on Artificial Intelligence, NeurIPS, ICML, ICLR, ACL, AAAI, AISTATS, AAMAS.
- Tutorial: Reward Modeling in Large Language Models: Principles, Methods, and Challenges (CPAL 2026). Slides
- Teaching assistant: Generative AI in OxML 2024; 2IIG0 Data Mining and Machine Learning (2025).
- Supervised MSc theses:
  - Olivier T. Schipper (Apr 2025), PillagerBench: a benchmark and framework for competitive multi-agent Minecraft environments, published in IEEE CoG.
  - Niels P.G.T. van Beuningen (Jul 2025), HearthGym: A Gymnasium Benchmark for Advanced Hearthstone AI Research.
  - Dirk Michielsen (Feb 2026), HearthstoneGUI: GUI Agent for Hearthstone.
  - Lan Xie (ongoing).
- Leadership: Vice President, Student Union, School of Control Science and Engineering, Shandong University (2018); Captain (Deputy Head), “Lianxin” Volunteer Teaching Program, Shandong University (2018); Class Monitor, Automation Class 1 (Cohort 2015), Shandong University (2015–2019).
🌟 Awards
- Travel awards: NeurIPS 2023, ICLR 2025.
- Honors: Outstanding Graduate of Shandong Province (2019).
- Competitions: 2nd Prize, Chinese Graduate Mathematical Modeling Competition (2019); 1st Prize, National Electronic Design Competition, Shandong Province (2017); Champion, International Aquatic Robot Competition (2018, 2019).
- Scholarships: First-Class Scholarship (2017–2021); Outstanding Student Special Scholarship (2019, top 2%), etc.
💻 Programming skills
- Languages: Python, C/C++, Bash.
- ML / LLM tooling: PyTorch, TensorFlow/Keras, TRL, Verl, MS-Swift, PEFT/LoRA, vLLM.
- NLP / RL algorithms: PPO/GRPO, DPO, RLOO, A3C, SAC, DDPG.
- Systems & robotics: Linux, Git, Docker, Gym/Gymnasium, ROS, NVIDIA Jetson Xavier.
- Compute & platforms: Snellius, Slurm, H100/A100/V100/RTX 4090/2080 Ti.
