Great project. We've found that the VNC Universe environments are hard for today's RL algorithms, primarily due to their async nature. We're currently working on a new set of Universe environments without VNC; I'm very happy to see others inspired by the core ideas of Universe as well.
I took a lot of inspiration from Universe and am grateful for OpenAI's work on RL in general :). I probably wouldn't have started on this project if a company like OpenAI hadn't already decided it was a worthy goal.
The way I see it, having hooks into the engines themselves helps with what the article talks about - not needing to go through VNC or other _glue_ to get realtime data. A hook could send the framebuffers directly from the game/simulation and feed the actions back into it. And framebuffers are just one direction; we could instead stream the coordinates, the current payoff, etc.
Also, having such plugins would help with the adoption in both directions - games now have an always updating/learning AI (might need a network connection + cloud backend), and researchers can have training/testing environments.
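To make the engine-hook idea concrete, here is a minimal sketch of streaming structured state (coordinates and payoff) instead of framebuffers. Everything in it is an assumption for illustration: `ToyGame`, `state_message`, and the JSON message format are made up and do not come from muniverse, Universe, or any real engine.

```python
import json


class ToyGame:
    """Hypothetical stand-in for a game engine with a per-tick hook."""

    def __init__(self):
        self.player_x = 0
        self.score = 0

    def tick(self, action):
        # Apply the agent's action directly inside the game loop.
        if action == "right":
            self.player_x += 1
        elif action == "left":
            self.player_x -= 1
        # Toy payoff: one point for every tick spent standing on x == 3.
        if self.player_x == 3:
            self.score += 1

    def state_message(self):
        # Serialize structured state instead of a framebuffer.
        return json.dumps({"player_x": self.player_x, "score": self.score})


def agent(message):
    # Toy agent: walk right until x reaches 3, then stay put.
    state = json.loads(message)
    return "right" if state["player_x"] < 3 else "stay"


game = ToyGame()
for _ in range(5):
    game.tick(agent(game.state_message()))
final_state = json.loads(game.state_message())
print(final_state)
```

In a real plugin the message would go over a socket or pipe rather than a direct function call, but the loop has the same shape: state out, action in, no screen scraping.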
Source: Am an AI research scientist.
However, I think this approach is bad. Machine vision is a separate problem from reinforcement learning. You shouldn't need to be able to do both well. Machine vision consumes a ton of processing power and researcher time in figuring out the hyperparameters. And all it's doing is recovering information that's already in memory, like the location of various objects and the score. It really limits what can be done. E.g. the famous Atari-playing AIs by DeepMind were limited to no memory and only knowing the last few frames, because backpropagating through thousands of frames was too expensive.
Because of the way NNs work, it's trivial to separate out the machine vision into a separate module. So if you have a good RNN reinforcement learning system, you can easily add a machine vision learning system to it later if you need.
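A toy sketch of that separation, with no actual neural nets involved: the policy only ever sees a feature tuple, so the encoder in front of it can read game memory directly or be swapped for a vision module later. All names here (`encode_from_memory`, `encode_from_pixels`, the 1/2 cell markers) are hypothetical and chosen just for the example.

```python
def encode_from_memory(state):
    # Read the features straight out of (hypothetical) game memory.
    return (state["player_x"], state["target_x"])


def encode_from_pixels(frame):
    # Stand-in "vision" module: scan a grid where 1 marks the player
    # and 2 marks the target, and recover the same features.
    player_x = target_x = None
    for row in frame:
        for x, cell in enumerate(row):
            if cell == 1:
                player_x = x
            elif cell == 2:
                target_x = x
    return (player_x, target_x)


def policy(features):
    # The decision logic never needs to know where the features came from.
    player_x, target_x = features
    if target_x > player_x:
        return "right"
    if target_x < player_x:
        return "left"
    return "stay"


state = {"player_x": 1, "target_x": 4}
frame = [[0, 1, 0, 0, 2]]

action_from_memory = policy(encode_from_memory(state))
action_from_pixels = policy(encode_from_pixels(frame))
print(action_from_memory, action_from_pixels)
```

Both paths feed the same `policy`, which is the point: the RL part can be developed against the memory encoder first.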
Despite the flaws, the nice thing about VNC is its universality: it can support any app on a computer. Using HTML5 in a browser limits the scope of things we could encapsulate as environments, and makes it less of a "universe".
However, there is a difference between the universality of the tech stack and that of the exposed interface. In my opinion, the future universe would be rich clusters of RL environments with a unified API, each implemented using a different underlying technology to meet the desired synchronicity and frame performance.
HTML5 could deliver one such cluster.
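A rough sketch of what "unified API, different backends" could mean in practice. The `Environment` interface below is hypothetical (loosely in the reset/step style familiar from RL toolkits) and is not the actual Universe or muniverse API; `CountdownEnv` is a toy backend standing in for an HTML5 game, an emulator, or anything else.

```python
from abc import ABC, abstractmethod


class Environment(ABC):
    """Hypothetical unified interface shared by every backend."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the first observation."""

    @abstractmethod
    def step(self, action):
        """Apply an action and return (observation, reward, done)."""


class CountdownEnv(Environment):
    """Toy backend: the episode ends after a fixed number of steps."""

    def __init__(self, start=3):
        self.start = start
        self.remaining = start

    def reset(self):
        self.remaining = self.start
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        done = self.remaining == 0
        return self.remaining, 1.0, done


def run_episode(env, policy):
    # Works unchanged with any backend that implements Environment.
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


total_reward = run_episode(CountdownEnv(start=3), policy=lambda obs: 0)
print(total_reward)
```

The agent code in `run_episode` does not care how a backend produces frames or how synchronous it is; that choice stays inside each implementation.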
Here's an example of a guy who made a general game playing algorithm that brute forces its way through any NES game: https://www.youtube.com/watch?v=xOCurBYI_gY This isn't necessarily interesting from an AI perspective - the playing algorithm is just brute force. But it shows what can be done with the platform: easily reloading previous states and exploring counterfactual futures (which is exactly the sort of thing RL algorithms do). He also has a cool algorithm for finding the objective function of an arbitrary game by watching a human play and seeing which memory addresses increment. That is much easier to use than writing OCR code to read the score and game-over states from the screen.
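A much-simplified sketch of the "see which memory addresses increment" idea. This is not the algorithm from the video, just an illustration. It assumes RAM snapshots taken during human play are available as equal-length `bytes` objects; `increasing_addresses` is a made-up name.

```python
def increasing_addresses(snapshots):
    """Return addresses whose value never decreases and ends higher than it started."""
    if len(snapshots) < 2:
        return []
    candidates = []
    for addr in range(len(snapshots[0])):
        values = [snap[addr] for snap in snapshots]
        never_decreases = all(b >= a for a, b in zip(values, values[1:]))
        if never_decreases and values[-1] > values[0]:
            candidates.append(addr)
    return candidates


# Four toy snapshots of a 4-byte "RAM".
snapshots = [
    bytes([0, 5, 3, 9]),
    bytes([1, 5, 2, 9]),
    bytes([1, 5, 7, 9]),
    bytes([4, 5, 8, 9]),
]
score_candidates = increasing_addresses(snapshots)
print(score_candidates)
```

Here only address 0 qualifies: address 1 and address 3 never change, and address 2 dips from 3 to 2. A real score counter can wrap around or span several bytes, so this filter is only a starting point.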
That's a bit premature for a project that was just released less than 7 months ago, isn't it?
https://blog.openai.com/universe/
Edit: that said, the project seems to have some interesting and needed improvements (esp. time adjustment). Glad to see dialog between muniverse and OpenAI here.
https://github.com/namuol/muniverse
If I had more time I'd submit a PR to integrate it...