EPIC: PAC-Bayes Bounds for Lifelong Reinforcement Learning

Statistical Guarantees for Lifelong Reinforcement Learning using PAC-Bayes Theory

Zhi Zhang¹, Chris Chow², Yasi Zhang¹, Yanchao Sun³, Haochen Zhang¹, Eric Hanchen Jiang¹, Han Liu⁴, Furong Huang⁵, Yuchen Cui¹, Oscar Hernan Madrid Padilla¹

¹University of California, Los Angeles ²Work completed while at Niantic Labs ³Apple Inc. ⁴Northwestern University ⁵University of Maryland, College Park

Abstract

We present a method that provides robust convergence guarantees for reinforcement learning algorithms in the lifelong learning setting.

PAC-Bayes theory is employed to derive loss bounds on RL models parameterized as shared "world policies" that are periodically updated and sampled from during each distinct task encountered in the agent's lifetime.

We provide an expression for sample complexity in terms of RL regret and develop a relationship between the algorithm's generalization performance and the number of tasks stored in memory.

Algorithm

Initialize the policy prior distribution.

Draw a policy from the policy prior.

For the lifetime of the agent:

  Receive a new task.
  If time to update:
    1. Roll out trajectories.
    2. Update the default policy.
  Otherwise, act according to the current policy in this task.
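
The steps above can be sketched as a short control loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `lifelong_loop`, `sample_policy`, `rollout`, and `update_default` are hypothetical, "time to update" is assumed to mean every `update_every` tasks, the memory is assumed to hold the most recent tasks, and the toy update rule is a placeholder rather than the PAC-Bayes update.

```python
import random
from collections import deque

_rng = random.Random(0)  # fixed seed so the toy example is reproducible


def lifelong_loop(task_stream, prior, sample_policy, rollout, update_default,
                  memory_size, update_every):
    """Skeleton of the lifelong loop; every callable is a hypothetical placeholder."""
    default = prior                        # initialize the policy prior distribution
    policy = sample_policy(default)        # draw a policy from the policy prior
    memory = deque(maxlen=memory_size)     # assumption: keep the most recent tasks
    n_updates = 0
    for k, task in enumerate(task_stream, start=1):   # receive a new task
        memory.append(task)
        if k % update_every == 0:          # assumption: "time to update" is periodic
            trajectories = [rollout(policy, t) for t in memory]  # 1. roll out
            default = update_default(default, trajectories)      # 2. update default
            policy = sample_policy(default)  # assumption: resample after an update
            n_updates += 1
        else:
            rollout(policy, task)          # act with the current policy
    return default, n_updates


# Toy instantiation (illustrative only): a policy is one scalar, the default
# policy is a Gaussian (mean, std), and a task is a target value to match.
def sample_policy(default):
    mean, std = default
    return _rng.gauss(mean, std)


def rollout(policy, task):
    # Returns (action, target, reward) for a one-step toy "trajectory".
    return policy, task, -(policy - task) ** 2


def update_default(default, trajectories, step=0.5):
    # Placeholder rule: move the mean toward the average stored target.
    mean, std = default
    target = sum(t for _, t, _ in trajectories) / len(trajectories)
    return mean + step * (target - mean), std


tasks = [1.0] * 40
final_default, n_updates = lifelong_loop(
    tasks, prior=(0.0, 0.1), sample_policy=sample_policy, rollout=rollout,
    update_default=update_default, memory_size=4, update_every=4)
```

Passing the sampling, rollout, and update steps as callables keeps the loop structure separate from any particular RL algorithm, so a real policy class and update rule could be substituted without changing the control flow.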

Bound

\[
\begin{aligned}
\text{expected loss} &\le \text{training error} + \mathscr{R}\big(\mathbb{D}_{KL}(P \| \underline{P})\big), \\
\text{with}\quad \mathscr{R}\big(\mathbb{D}_{KL}(P \| \underline{P})\big) &:= \frac{2N^{1/2}H}{K^{1/2}} \cdot \frac{\lambda r}{1-\alpha} \sqrt{\frac{1 - \alpha^{2(K/N-1)}}{s_{\min}(1-\alpha^2)}} + \frac{2N^{1/2}H}{K^{(1-\gamma)/2}}.
\end{aligned}
\]

Given a policy \(P\), a prior \(\underline{P}\) on the policy, \(N\) stored tasks each of horizon \(H\), and \(K\) observed tasks with learning rate \(\alpha\), the bound above holds with probability at least \(1 - 2\exp(-K^\gamma)\) for any \(0 < \gamma < 1\).
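
To get a feel for how the bound scales, the expression for \(\mathscr{R}\) and the confidence level can be evaluated numerically. The sketch below is a direct transcription of the formula; the function names are hypothetical, \(\lambda\), \(r\), and \(s_{\min}\) are treated as plain positive inputs because this page does not define them, and the restrictions \(0 < \alpha < 1\) and \(K \ge N\) are assumptions added so that the square root is real.

```python
import math


def epic_complexity_term(N, H, K, alpha, gamma, lam, r, s_min):
    """Evaluate the additive term R in the bound (names are illustrative)."""
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1)")
    if not 0 < alpha < 1:
        raise ValueError("assumed here: alpha in (0, 1)")
    if K < N:
        raise ValueError("assumed here: K >= N, so the square root is real")
    # sqrt((1 - alpha^{2(K/N - 1)}) / (s_min (1 - alpha^2)))
    root = math.sqrt((1 - alpha ** (2 * (K / N - 1))) / (s_min * (1 - alpha ** 2)))
    first = 2 * math.sqrt(N) * H * (lam * r / (1 - alpha)) * root / math.sqrt(K)
    second = 2 * math.sqrt(N) * H / K ** ((1 - gamma) / 2)
    return first + second


def confidence_level(K, gamma):
    """Probability with which the bound holds: 1 - 2 exp(-K^gamma)."""
    return 1 - 2 * math.exp(-(K ** gamma))


# Example with made-up values: more observed tasks K tightens the term.
small_K = epic_complexity_term(N=4, H=10, K=100, alpha=0.5, gamma=0.5,
                               lam=1.0, r=1.0, s_min=1.0)
large_K = epic_complexity_term(N=4, H=10, K=10000, alpha=0.5, gamma=0.5,
                               lam=1.0, r=1.0, s_min=1.0)
```

Note the trade-off in \(\gamma\): a larger \(\gamma\) raises the confidence \(1 - 2\exp(-K^\gamma)\) but slows the decay of the second term, which shrinks as \(K^{-(1-\gamma)/2}\).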

BibTeX

@article{zhang2024statistical,
  title={Statistical guarantees for lifelong reinforcement learning using pac-bayesian theory},
  author={Zhang, Zhi and Chow, Chris and Zhang, Yasi and Sun, Yanchao and Zhang, Haochen and Jiang, Eric Hanchen and Liu, Han and Huang, Furong and Cui, Yuchen and Padilla, Oscar Hernan Madrid},
  journal={arXiv preprint arXiv:2411.00401},
  year={2024}
}
