I don't think you can prove that (forgive me if I don't sit through a 2h video). They're all susceptible to the deadly triad (function approximation + bootstrapping + off-policy training), and AFAIK there are no convergence proofs of any kind for the big model-free deep RL algorithms. It would've been big news if someone had proved that a real-world version of PPO/DQN/IMPALA actually converges in the limit. Sutton & Barto's book and the earlier proofs only cover cases where you drop the nonlinear function approximator or make some similar simplification.
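For anyone who hasn't seen the triad bite, here's a minimal sketch in the spirit of the two-state "w → 2w" example in Sutton & Barto's off-policy approximation chapter. Assumptions: linear features x(s1)=1 and x(s2)=2 (so v(s1)=w, v(s2)=2w), reward 0, and semi-gradient TD(0) updates applied only on the s1 → s2 transition, which is what an off-policy sampling distribution can do. The true value is 0 everywhere, yet w blows up whenever gamma > 0.5. Note this is *linear* approximation; nonlinear nets don't make things easier.

```python
def td0_two_state(w0=1.0, alpha=0.1, gamma=0.9, steps=100):
    """Semi-gradient TD(0) on the s1 -> s2 transition only (off-policy toy)."""
    w = w0
    history = [w]
    for _ in range(steps):
        v_s1 = 1.0 * w  # feature x(s1) = 1
        v_s2 = 2.0 * w  # feature x(s2) = 2
        td_error = 0.0 + gamma * v_s2 - v_s1  # reward is 0
        # Gradient of v(s1) w.r.t. w is x(s1) = 1.
        w += alpha * td_error * 1.0
        history.append(w)
    return history


# Each step multiplies w by 1 + alpha * (2 * gamma - 1).
diverging = td0_two_state(gamma=0.9)   # factor 1.08 per step
converging = td0_two_state(gamma=0.4)  # factor 0.98 per step
print(diverging[-1], converging[-1])
```

Nothing here is exotic: function approximation (shared w), bootstrapping (target uses v(s2)), and off-policy updates (never updating s2) are all present, and that's enough.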
(History stacking may turn POMDPs into MDPs, but I don't know if it handles the particularly adversarial nature of games like poker. That's quite different from stacking ALE frames.)
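For reference, ALE-style history stacking is roughly the following: a minimal sketch (the class name `FrameStack` and its methods are hypothetical, not any particular library's API) that keeps the last k observations and feeds them to the agent as one "state". That works when the missing information is short-lived (e.g. ball velocity from consecutive frames); it doesn't obviously help when the hidden state is an opponent's cards and the opponent is actively playing to exploit you.

```python
from collections import deque


class FrameStack:
    """Keeps the last k observations and exposes them as a single tuple."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame drops off automatically

    def reset(self, obs):
        # Pad the window with copies of the first observation.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(obs)
        return tuple(self.frames)

    def step(self, obs):
        self.frames.append(obs)
        return tuple(self.frames)


fs = FrameStack(4)
print(fs.reset(0))  # window padded with the first observation
print(fs.step(1))   # newest observation appended on the right
```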