Kefan Dong

About Me

I am a researcher at OpenAI, working on reinforcement learning. Previously, I completed my PhD in Computer Science at Stanford University, where I was extremely fortunate to be advised by Prof. Tengyu Ma. Before that, I studied as an undergraduate at the Institute for Interdisciplinary Information Sciences, Tsinghua University (a.k.a. the Yao Class). I am broadly interested in LLM reasoning, reinforcement learning theory, online learning, and algorithm design.

Publications and Preprints

STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
Kefan Dong, Tengyu Ma
ICML 2025

Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically
Kefan Dong, Arvind Mahankali, Tengyu Ma
The 4th Workshop on Mathematical Reasoning and AI, NeurIPS 2024

Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time
Arvind Mahankali*, Jeff Z. Haochen*, Kefan Dong, Margalit Glasgow, Tengyu Ma
NeurIPS 2023

Toward $L_\infty$-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields
Kefan Dong, Tengyu Ma
COLT 2023

First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains
Kefan Dong, Tengyu Ma
ICLR 2023
(spotlight) Workshop on Distribution Shifts (DistShift), NeurIPS 2022

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making (video, slides)
Kefan Dong, Tengyu Ma
ICLR 2023

Model-based Offline Reinforcement Learning with Local Misspecification (slides)
Kefan Dong*, Yannis Flet-Berliac*, Allen Nie*, Emma Brunskill
AAAI 2023

Design of Experiments for Stochastic Contextual Linear Bandits
Andrea Zanette*, Kefan Dong*, Jonathan Lee*, Emma Brunskill
NeurIPS 2021

Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature (video, slides)
Kefan Dong, Jiaqi Yang, Tengyu Ma
NeurIPS 2021

Refined Analysis of FPL for Adversarial Markov Decision Processes
Yuanhao Wang, Kefan Dong
Theoretical Foundations of Reinforcement Learning Workshop, ICML 2020

Multinomial Logit Bandit with Low Switching Cost
Kefan Dong*, Yingkai Li*, Qin Zhang, Yuan Zhou
ICML 2020

On the Expressivity of Neural Networks for Deep Reinforcement Learning
Kefan Dong*, Yuping Luo*, Tengyu Ma
ICML 2020

Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank
Kefan Dong*, Jian Peng*, Yining Wang*, Yuan Zhou*
COLT 2020

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
Kefan Dong*, Yuanhao Wang*, Xiaoyu Chen, Liwei Wang
ICLR 2020

Exploration via Hindsight Goal Generation
Zhizhou Ren, Kefan Dong, Yuan Zhou, Qiang Liu, Jian Peng
NeurIPS 2019