I am a final-year Ph.D. student at Eindhoven University of Technology, under the supervision of Mykola Pechenizkiy and Meng Fang. I am also very fortunate to work closely with Prof. Yali Du at King’s College London and Prof. Biwei Huang at the University of California San Diego. I am currently a visiting student at the Max Planck Institute for Intelligent Systems, where I work at the intersection of reinforcement learning (RL) and large language models (LLMs) under the supervision of Dr. Shiwei Liu. I was an intern at Microsoft, mentored by Dr. Lu Wang. I obtained my Master's and Bachelor's degrees at Shandong University (SDU), supervised by Prof. Wei Zhang.

My research focuses on reinforcement learning, especially RL for LLMs and causal RL.

I am currently on the job market and actively looking for collaboration and visiting opportunities. If you are interested, feel free to contact me. Email

✨ News

  • Apr 2026: Started a new journey at the Max Planck Institute for Intelligent Systems!
  • Feb 2026: One paper was accepted to AAMAS 2026.
  • Feb 2026: I’m co-organizing a tutorial on reward modeling for LLMs at CPAL. See you in Tübingen! Slides
  • Oct 2025: One paper was accepted to NeurIPS 2025 as a spotlight. ✨
  • May 2025: I will give a tutorial at the OxML Summer School. 🧑‍🏫
  • May 2025: Two papers were accepted by TMLR.
  • Jan 2025: RuAG was accepted to ICLR 2025. 🚀
  • Oct 2024: MACCA and CAST were accepted to the NeurIPS 2024 CRL Workshop.
  • Oct 2024: I will give a talk at the Women in AI & Robotics Reading Group.
  • Dec 2023: One paper was accepted to AAAI 2024.
  • Oct 2023: I will give a talk at RLChina. 💬
  • Sep 2023: Two papers were accepted to NeurIPS 2023. 🎉
  • Oct 2022: I started my PhD journey at TU/e. 🌱

🧑‍💻 Internship

  • Apr 2026 – present Max Planck Institute for Intelligent Systems — visiting student, supervised by Dr. Shiwei Liu.
  • Mar – Oct 2024 Microsoft — research intern, mentored by Dr. Lu Wang.

📚 Service & activities

  • Reviewer: TMLR, IEEE Transactions on Artificial Intelligence, NeurIPS, ICML, ICLR, ACL, AAAI, AISTATS, AAMAS.
  • Tutorial: Reward Modeling in Large Language Models: Principles, Methods, and Challenges (CPAL 2026). Slides
  • Teaching assistant: Generative AI in OxML 2024; 2IIG0 Data Mining and Machine Learning (2025).
  • Supervised MSc theses:
  • Leadership: Vice President, Student Union, School of Control Science and Engineering, Shandong University (2018); Captain (Deputy Head), “Lianxin” Volunteer Teaching Program, Shandong University (2018); Class Monitor, Automation Class 1 (Cohort 2015), Shandong University (2015–2019).

🌟 Awards

  • Travel awards: NeurIPS 2023, ICLR 2025.
  • Honors: Outstanding Graduate of Shandong Province (2019).
  • Competitions: 2nd Prize, Chinese Graduate Mathematical Modeling Competition (2019); 1st Prize, National Electronic Design Competition, Shandong Province (2017); Champion, International Aquatic Robot Competition (2018, 2019).
  • Scholarships: First-Class Scholarship (2017–2021); Outstanding Student Special Scholarship (2019, top 2%), etc.

💻 Programming skills

  • Languages: Python, C/C++, Bash.
  • ML / LLM tooling: PyTorch, TensorFlow/Keras, TRL, Verl, MS-Swift, PEFT/LoRA, vLLM.
  • NLP / RL algorithms: PPO/GRPO, DPO, RLOO, A3C, SAC, DDPG.
  • Systems & robotics: Linux, Git, Docker, Gym/Gymnasium, ROS, NVIDIA Jetson Xavier.
  • Compute & platforms: Snellius, Slurm, H100/A100/V100/RTX 4090/2080 Ti.