Bound
\begin{aligned}
\require{mathtools}
&\text{expected loss} \le \text{training error} + \mathscr R\bigl(\mathbb{D}_{KL}(P \| \underline{P})\bigr), \\
&\text{with} \quad \mathscr R\bigl(\mathbb{D}_{KL}(P \| \underline{P})\bigr) \coloneqq \frac{2N^{1/2}H \, \frac{\lambda r}{1-\alpha} \sqrt{ \frac{1 - \alpha^{2(K/N-1)}}{s_{\min}(1-\alpha^2)} }}{K^{1/2}} + \frac{2N^{1/2}H}{K^{(1-\gamma)/2}}.
\end{aligned}

Given a policy \(P\), a prior \(\underline{P}\) on the policy, \(N\) stored tasks each of horizon \(H\), and \(K\) observed tasks with learning rate \(\alpha\), the bound above holds with probability at least \(1 - 2\exp(-K^\gamma)\) for any \(0 < \gamma < 1\).
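As a minimal numeric sketch of the bound's complexity term, the function below evaluates \(\mathscr R\) and the failure probability \(2\exp(-K^\gamma)\) directly from the formula. The constants `lam` (\(\lambda\)), `r`, and `s_min` are not defined in the surrounding text, so they are taken here as assumed problem-dependent inputs; the sketch also assumes \(K \ge N\) so the term \(\alpha^{2(K/N-1)}\) stays in \([0, 1]\).

```python
import math


def regularizer(N, H, K, alpha, gamma, lam, r, s_min):
    """Evaluate the complexity term R from the bound.

    N: number of stored tasks; H: horizon per task; K: number of
    observed tasks; alpha: learning rate; gamma: confidence exponent
    (0 < gamma < 1). lam, r, s_min are problem-dependent constants
    (assumed inputs, not defined in the source text).
    """
    # First term: 2 sqrt(N) H * (lam r / (1 - alpha))
    #             * sqrt((1 - alpha^{2(K/N - 1)}) / (s_min (1 - alpha^2)))
    #             / sqrt(K)
    inner = (1 - alpha ** (2 * (K / N - 1))) / (s_min * (1 - alpha ** 2))
    first = 2 * math.sqrt(N) * H * (lam * r / (1 - alpha)) * math.sqrt(inner) / math.sqrt(K)
    # Second term: 2 sqrt(N) H / K^{(1 - gamma)/2}
    second = 2 * math.sqrt(N) * H / K ** ((1 - gamma) / 2)
    return first + second


def failure_probability(K, gamma):
    """The bound holds with probability at least 1 - 2 exp(-K^gamma)."""
    return 2 * math.exp(-K ** gamma)
```

Both terms shrink as \(K\) grows, so observing more tasks tightens the bound while the failure probability decays as \(2\exp(-K^\gamma)\).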