Statistical Guarantees for Lifelong Reinforcement Learning using PAC-Bayes Theory

Zhi Zhang1, Chris Chow2, Yasi Zhang1, Yanchao Sun3, Haochen Zhang1, Eric Hanchen Jiang1, Han Liu4, Furong Huang5, Yuchen Cui1, Oscar Hernan Madrid Padilla1
1University of California, Los Angeles 2Work completed while at Niantic Labs 3Apple Inc. 4Northwestern University 5University of Maryland, College Park

EPIC - Empirical PAC-Bayes that Improves Continuously

Abstract

We present a method that provides robust convergence guarantees for reinforcement learning algorithms in the lifelong learning setting.

PAC-Bayes theory is employed to derive loss bounds on RL models parameterized as shared "world policies" that are periodically updated and sampled from during each distinct task encountered in the agent's lifetime.

We provide an expression for sample complexity in terms of RL regret and develop a relationship between the algorithm's generalization performance and the number of tasks stored in memory.

Algorithm

  1. Initialize the policy prior distribution.
  2. Draw a policy from the policy prior.
  3. For the lifetime of the agent:
    1. Receive a new task.
    2. If time to update:
      1. Roll out trajectories.
      2. Update the default policy.
    3. Otherwise, act according to the current policy in this task.
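
The control flow of the steps above can be sketched in Python. This is only a toy illustration, not the authors' implementation. It rests on assumptions that do not come from the page: the policy is a single float, the prior is a Gaussian, each task is a random target value, the update schedule is "every `update_every` tasks", and the update rule is a simple moving average. The name `lifelong_loop` and all of its parameters are hypothetical.

```python
import random


def lifelong_loop(num_tasks, update_every, seed=0):
    """Toy version of the loop: a policy is one float, a task is a target value."""
    rng = random.Random(seed)

    # Step 1: initialize the policy prior (assumed Gaussian here).
    prior_mean, prior_std = 0.0, 1.0
    # Step 2: draw a policy from the policy prior.
    policy = rng.gauss(prior_mean, prior_std)

    events = []
    # Step 3: loop over the lifetime of the agent.
    for k in range(1, num_tasks + 1):
        # Step 3.1: receive a new task (hypothetical: a random target).
        task_target = rng.uniform(-1.0, 1.0)

        if k % update_every == 0:
            # Step 3.2.1: roll out trajectories (here, noisy views of the target).
            rollouts = [task_target + rng.gauss(0.0, 0.1) for _ in range(5)]
            # Step 3.2.2: update the default policy (placeholder moving average).
            prior_mean += 0.5 * (sum(rollouts) / len(rollouts) - prior_mean)
            # Assumption: a fresh policy is drawn from the updated prior.
            policy = rng.gauss(prior_mean, prior_std)
            events.append(("update", k))
        else:
            # Otherwise: act with the current policy and record a toy loss.
            loss = (policy - task_target) ** 2
            events.append(("act", k, loss))
    return events
```

Calling `lifelong_loop(6, 3)` yields four "act" events and two "update" events, with updates on the third and sixth tasks.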

Bound

\[
\require{mathtools}
\begin{aligned}
&\text{expected loss} \le \text{training error} + \mathscr R(\mathbb{D}_{KL}(P \| \underline{P})), \\
&\text{with} \\
&\mathscr R(\mathbb{D}_{KL}(P \| \underline{P})) \coloneqq \frac{2N^{1/2}H \frac{\lambda r}{1-\alpha} \sqrt{ \frac{1 - \alpha^{2(K/N-1)}}{s_{\min}(1-\alpha^2)} }}{K^{1/2}} + \frac{2N^{1/2}H }{K^{(1-\gamma)/2}}.
\end{aligned}
\]

Given a policy \(P\), a prior \(\underline{P}\) on the policy, \(N\) stored tasks each of horizon \(H\), and \(K\) observed tasks with learning rate \(\alpha\), the bound above holds with probability at least \(1 - 2\exp(-K^\gamma)\) for any \(0 < \gamma < 1\).
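
To get a feel for how the right-hand side scales, the term \(\mathscr R\) and the confidence level can be evaluated numerically. The sketch below transcribes the displayed formula term by term. The function names are hypothetical, and \(\lambda\), \(r\), and \(s_{\min}\) are not defined on this page, so they are passed in as plain positive numbers. It also assumes \(0 < \alpha < 1\) and \(K \ge N\), so that the quantity under the square root is non-negative.

```python
import math


def epic_regularizer(N, H, K, alpha, gamma, lam, r, s_min):
    """Evaluate the two-term regularizer from the bound (direct transcription)."""
    # First term: 2 sqrt(N) H * (lam r / (1 - alpha)) * sqrt(...) / sqrt(K).
    inner = (1.0 - alpha ** (2.0 * (K / N - 1.0))) / (s_min * (1.0 - alpha ** 2))
    first = 2.0 * math.sqrt(N) * H * (lam * r / (1.0 - alpha)) * math.sqrt(inner)
    first /= math.sqrt(K)
    # Second term: 2 sqrt(N) H / K^((1 - gamma) / 2).
    second = 2.0 * math.sqrt(N) * H / K ** ((1.0 - gamma) / 2.0)
    return first + second


def bound_confidence(K, gamma):
    """Probability level 1 - 2 exp(-K^gamma) at which the bound holds."""
    return 1.0 - 2.0 * math.exp(-(K ** gamma))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(epic_regularizer(N=4, H=10, K=400, alpha=0.5, gamma=0.5,
                           lam=1.0, r=1.0, s_min=1.0))
    print(bound_confidence(K=400, gamma=0.5))
```

One property that follows directly from the formula: when \(K = N\), the exponent \(2(K/N - 1)\) is zero, so the first term vanishes and only \(2N^{1/2}H / K^{(1-\gamma)/2}\) remains.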


Results

BibTeX

@article{zhang2024statistical,
  title={Statistical guarantees for lifelong reinforcement learning using pac-bayesian theory},
  author={Zhang, Zhi and Chow, Chris and Zhang, Yasi and Sun, Yanchao and Zhang, Haochen and Jiang, Eric Hanchen and Liu, Han and Huang, Furong and Cui, Yuchen and Padilla, Oscar Hernan Madrid},
  journal={arXiv preprint arXiv:2411.00401},
  year={2024}
}