BTW, Bellman actually coined the term "curse of dimensionality" [1]; I got that confused with "combinatorial explosion", since the two are synonyms in the contexts where I typically encounter them [2].
[1]: https://en.wikipedia.org/wiki/Curse_of_dimensionality
[2]: https://en.wikipedia.org/wiki/Combinatorial_explosion
OpenAI has a pretty good introduction to the Bellman equations in their Spinning Up in Deep RL lessons [3]. Sutton's work in reinforcement learning also talks about Bellman's work quite a bit. Though Bellman was actually studying what he called dynamic programming problems, his work is now considered foundational in reinforcement learning.
[3]: https://spinningup.openai.com/en/latest/
Uh, and as for the dual-mode observations, the person who brought them to my attention was Noam Brown, not Bellman or Norvig. If you haven't already checked out his work, I recommend it above both Norvig's and Bellman's. He has some great talks on YouTube, and I consider it a shame they aren't more widely viewed [4].
[4]: https://www.youtube.com/watch?v=cn8Sld4xQjg