About Me
I am a fifth-year PhD candidate at the University of Alberta, supervised by Dr. A. Rupam Mahmood and Dr. Dale Schuurmans.
I’m currently developing verifiable coding agents that can formally guarantee alignment between agent behaviour and user intentions. This line of work targets a fundamental challenge in AI safety: ensuring that autonomous systems execute tasks transparently, without unexpected or unverified behaviours.
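To make “formally guarantee” concrete, here is a minimal Lean sketch (illustrative only: our agent work targets Dafny, and `absDiff` with its specification is a made-up example, not code from any of my projects). Once the theorem compiles, the machine-checked proof guarantees the implementation satisfies its specification for all inputs.

```lean
-- Illustrative only: a hypothetical function paired with a machine-checked spec.
def absDiff (a b : Nat) : Nat :=
  if a ≥ b then a - b else b - a

-- Specification: the result does not depend on argument order.
-- If this theorem compiles, the guarantee holds for every input.
theorem absDiff_comm (a b : Nat) : absDiff a b = absDiff b a := by
  unfold absDiff
  split <;> split <;> omega
```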
My technical approach draws on two key areas: latent reasoning mechanisms in diffusion models and structured exploration via hierarchical reinforcement learning. I’d be excited to discuss how latent reasoning or structured exploration might advance AI research.
Previously, I worked on using data effectively for reinforcement learning: designing and analyzing algorithms that leverage datasets subject to distribution shift, and investigating how data should be collected. My research philosophy centers on understanding empirical problems from a theoretical perspective and guiding algorithm design with theoretical insights.
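For readers outside RL, here is a toy sketch of the distribution-shift problem (a one-step bandit with made-up numbers, not an algorithm from my papers): data gathered under one policy is reweighted with importance sampling to evaluate a different policy.

```python
# Toy illustration of distribution-shift correction via importance sampling.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two actions. The behaviour policy collects the data;
# the target policy is the one we want to evaluate.
behaviour = np.array([0.8, 0.2])    # action probabilities during collection
target = np.array([0.3, 0.7])      # action probabilities to evaluate
true_rewards = np.array([1.0, 2.0])

# Collect data under the behaviour policy.
actions = rng.choice(2, size=10_000, p=behaviour)
rewards = true_rewards[actions] + rng.normal(0.0, 0.1, size=actions.shape)

# Importance-sampling weights correct for the mismatch in action frequencies.
weights = target[actions] / behaviour[actions]
estimate = np.mean(weights * rewards)

print(f"IS estimate: {estimate:.3f}")              # approx. 0.3*1.0 + 0.7*2.0
print(f"true value:  {target @ true_rewards:.3f}")  # = 1.700
```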
News
2025.02: Our tutorial “Advancing Offline Reinforcement Learning: Essential Theories and Techniques for Algorithm Developers” was accepted at AAAI 2025 in Philadelphia, PA, presented by Fengdi Che, Chenjun Xiao, Ming Yin, and Csaba Szepesvári!
2024.07: Our paper “Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation” was presented at ICML 2024 as a spotlight paper (3.5% acceptance rate).
Publications
Veri-Code Team (role: equal-contribution first author and project lead). Re:Form – Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny. arXiv preprint arXiv:2507.16331. 2025.
Fengdi Che, Bryan Chan, Chen Ma, A. Rupam Mahmood. AVG-DICE: Stationary Distribution Correction by Regression. RLC 2025.
Fengdi Che, Ming Yin, Chenjun Xiao, Csaba Szepesvári. A Tutorial: An Intuitive Explanation of Offline Reinforcement Learning Theory. arXiv preprint arXiv:2508.07746. Presented at AAAI 2025.
Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A. Ramirez, Christopher K. Harris, A. Rupam Mahmood, Dale Schuurmans. Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation. ICML 2024 (spotlight, 3.5% acceptance rate).
Fengdi Che, Gautham Vasan, A. Rupam Mahmood. Correcting Discount-Factor Mismatch in On-Policy Policy Gradient Methods. ICML 2023 (27.9% acceptance rate).
Jiamin He, Fengdi Che, Yi Wan, A. Rupam Mahmood. Consistent Emphatic Temporal-Difference Learning. UAI 2023 (31.2% acceptance rate).
Fengdi Che, Xiru Zhu, Doina Precup, David Meger, Gregory Dudek. Bayesian Q-learning with Imperfect Expert Demonstrations. 3rd Offline RL Workshop, 2022.