Are there open source vision libraries that are low-latency enough to play this in real time? Assuming the robot could keep up with some regular servos. Would be a very interesting project.
Edit: I'm thinking I could run an Android emulator, and then use MonkeyRunner with Python OpenCV processing my monitor to do the work. Anyone who has any relevant experience please reply, I'd love to hear if this is feasible.
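To make the vision half concrete, here's a toy sketch of what template matching does, in pure numpy so it runs anywhere; all shapes and the "sprite" here are made up, and in practice you'd capture real monitor frames and let OpenCV's cv2.matchTemplate do this search far faster than the naive loop below:

```python
# Naive sum-of-squared-differences template matching: what
# cv2.matchTemplate(..., cv2.TM_SQDIFF) computes, written out by hand.
import numpy as np

def match_template(frame, template):
    """Return (row, col) of the best sum-of-squared-differences match."""
    fh, fw = frame.shape
    th, tw = template.shape
    best_score, best_pos = float("inf"), (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw]
            score = np.sum((patch - template) ** 2)
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

# Tiny synthetic example: find a 2x2 "bird" sprite inside a 6x6 frame.
frame = np.zeros((6, 6))
frame[3:5, 2:4] = 1.0           # plant the sprite at row 3, col 2
template = np.ones((2, 2))
print(match_template(frame, template))  # -> (3, 2)
```

For real-time play the search would run on a small cropped region of the screen, not the whole frame.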
I was wondering if this would work for Flappy Bird. Given the speed of the game, I think it would be quite a challenge.
P.S. If you are interested in the mindstorms robot part, I'll put a shameless plug for a robot I built some time ago (based on some work by David Singleton). Code is on github but here is a pycon video: http://pyvideo.org/video/1195/self-driving-lego-mindstorms-r...
I'm working on training Tapster to play Flappy Bird. I'm going down the route of webcam + OpenCV.
(Only with Flappy Bird.) They say it's addictive, but you're taking this to another level: "Man, I need to build a Lego Mindstorms robot with a webcam and stylus and OpenCV - I've gotten to 194 but I NEED more. I need to take the human element out of this equation...."
It's really poorly formatted and has unused code in it, but it was fun to automate and watch. http://pastebin.com/yTmdWgfC
To use it just install the bookmarklet, click on it and then start the game. Press space to make it die.
I suspect you could cut the training time by an order of magnitude with a different optimization algorithm, or just by varying alpha. For example, have you considered trying something like a line search? http://en.wikipedia.org/wiki/Line_search
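For concreteness, here's a sketch of the backtracking flavor of line search on a toy quadratic; the objective and constants are placeholders, not the actual training loss:

```python
# Backtracking line search: instead of a fixed alpha, shrink the step
# each iteration until the Armijo sufficient-decrease condition holds.
def f(x):
    return x * x          # toy objective standing in for the real loss

def grad(x):
    return 2 * x

def backtracking_step(x, alpha=1.0, beta=0.5, c=1e-4):
    """Halve alpha until the step gives enough decrease, then take it."""
    g = grad(x)
    while f(x - alpha * g) > f(x) - c * alpha * g * g:
        alpha *= beta
    return x - alpha * g

x = 10.0
for _ in range(20):
    x = backtracking_step(x)
print(abs(x) < 1e-6)      # converges quickly to the minimum at 0
```

The point is that each iteration adapts its own step size, so there's no alpha to hand-tune at all.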
Reinforcement learning tries to find the optimal policy of action from a given position in a discrete state space.
The policy is roughly a map from state -> action. But it understands the temporal nature of the world.
It will start knowing nothing, then discover that hitting a pipe is really bad, and therefore that states close to nearly hitting a pipe are almost as bad. Over tons of iterations the expected negative value of hitting a pipe bleeds across the expected value of the state space.
With regression and optimization, it's never clear what you are optimizing against. Obviously hitting a pipe is bad. But what about the states near a pipe? Are they good or bad? There is no natural metric in the problem that tells you the distance from the pipe or what the corrective action should be.
So that's the major difference between reinforcement and supervised learning.
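That "bleeding" is easy to see in a minimal example. Here's a sketch on a 1-D chain of states where the last state is "hit the pipe" (reward -1); since the only action is "advance", this is really just TD value estimation, but it shows the pipe's negative value propagating backwards:

```python
# TD(0) value updates on a 5-state chain; state 4 is the pipe (terminal).
n_states = 5
V = [0.0] * n_states      # estimated value of each state
gamma = 0.9               # discount factor
alpha = 0.5               # learning rate

for episode in range(200):
    s = 0
    while s < n_states - 1:
        s_next = s + 1
        r = -1.0 if s_next == n_states - 1 else 0.0
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

# States closer to the pipe end up "almost as bad" as the pipe itself.
print([round(v, 2) for v in V])  # -> [-0.73, -0.81, -0.9, -1.0, 0.0]
```

The agent was never told that state 2 is dangerous; that value emerged purely from discounted backups of the crash penalty.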
Can someone comment on the game's physics? I am assuming:
- constant horizontal velocity
- gravity
- flapping implemented via impulse
- collisions are handled with bounding boxes
Maybe someone who knows something about optimal control (ODEs) can say whether this is analytically solvable? Of course there's still the practical stuff (numerical integration, I/O lag) to deal with, but I'm optimistic.

Set flap_height = bottom of next pipe + constant.
If bird height < flap_height, flap.
Done!
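To check that this rule actually produces stable flight, here's a sketch with invented physics constants (the real game's gravity and flap impulse differ):

```python
# Bang-bang controller: flap whenever the bird drops below flap_height.
GRAVITY = -0.4        # made-up per-frame gravity
FLAP_IMPULSE = 3.0    # made-up vertical velocity set by a flap
MARGIN = 8.0          # the "+ constant" above the bottom of the next pipe

def step(y, vy, pipe_bottom):
    """One frame: flap iff below flap_height, then integrate."""
    flap_height = pipe_bottom + MARGIN
    if y < flap_height:
        vy = FLAP_IMPULSE      # flapping resets vertical velocity
    vy += GRAVITY
    y += vy
    return y, vy

# Simulate 200 frames against a fixed pipe whose gap bottom is at 40.
y, vy = 50.0, 0.0
heights = []
for _ in range(200):
    y, vy = step(y, vy, pipe_bottom=40.0)
    heights.append(y)

# The bird settles into a bounded oscillation around flap_height.
print(min(heights[50:]), max(heights[50:]))
```

With these constants the bird bounces in a band of roughly ten units around flap_height, so the margin constant has to be tuned so that band clears both pipe edges.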
Let distance between ground and top of screen=100.
Bird position after a flap at y=y0, time=t0:
y(t) = y0 + 13.8 - 147 * (t - t0 - 0.288)^2
(https://news.ycombinator.com/item?id=7229018 -- note the original comment had "+ 0.288", but if you plot the graph this was obviously a mistake)

I tried it out in my own HTML5/JS Flappy clone (mine's not at all interesting or even done, but I felt I had to give it a try), and the movement seems really accurate.
I have no idea how they got to those numbers, however.
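For what it's worth, a few properties fall straight out of the formula (taking y0 = 0, t0 = 0 for simplicity):

```python
# The trajectory is an inverted parabola with its vertex at t - t0 = 0.288,
# so 13.8 is the apex height above the flap origin, and the arc is
# symmetric: the bird returns to its starting height at t = 2 * 0.288.
def y(t, y0=0.0, t0=0.0):
    return y0 + 13.8 - 147.0 * (t - t0 - 0.288) ** 2

apex = y(0.288)   # 13.8 units above the flap origin
start = y(0.0)    # ~1.61, i.e. the formula doesn't pass exactly through y0
print(apex, start)
```

That nonzero y(0) suggests the fitted curve includes a little of the pre-flap motion, which may be part of why the constants look so arbitrary.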
- Have a neural network with 6 inputs for:
- player x
- player y
- acceleration speed
- acceleration direction
- next pipe mid-point x
- next pipe mid-point y
- Two outputs of 0 or 1 for click or no-click

The fitness score would be how close the player is to the pipe mid-point. Hopefully, this would cause the bird to stay as close as possible and fly between the pipes. The genetic algorithm would select the best neural network that knows when to flap based on the current input state.
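A rough sketch of that setup, with every constant invented, the two click/no-click outputs collapsed into a single threshold for brevity, and the "game" replaced by a stand-in simulator; selection here is simple truncation plus Gaussian mutation:

```python
# Evolve the weights of a tiny fixed-topology net (6 inputs -> 4 hidden
# ReLU units -> flap/no-flap) with a bare-bones genetic algorithm.
import random

N_IN, N_HID = 6, 4
N_WEIGHTS = N_IN * N_HID + N_HID

def decide(weights, inputs):
    """Forward pass; returns True to flap."""
    w1, w2 = weights[:N_IN * N_HID], weights[N_IN * N_HID:]
    hidden = [max(0.0, sum(w1[h * N_IN + i] * inputs[i]
                           for i in range(N_IN))) for h in range(N_HID)]
    return sum(w2[h] * hidden[h] for h in range(N_HID)) > 0.0

def fitness(weights):
    """Negative mean distance to the pipe mid-point over a short toy run."""
    y, vy, mid_y = 50.0, 0.0, 55.0
    total = 0.0
    for t in range(60):
        inputs = [t, y, vy, 1.0 if vy > 0 else -1.0, 30.0, mid_y]
        if decide(weights, inputs):
            vy = 2.5                  # made-up flap impulse
        vy -= 0.4                     # made-up gravity
        y += vy
        total += abs(y - mid_y)
    return -total / 60.0

def evolve(pop_size=30, generations=25):
    rng = random.Random(0)
    pop = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 3]   # keep the best third unchanged
        pop = elite + [[w + rng.gauss(0, 0.2) for w in rng.choice(elite)]
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

Swapping in the real game as the fitness function is the expensive part, since every individual in every generation needs a full playthrough.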
I go by a rule of thumb like this:
- If you are able to collect lots of training data (inputs and valid outputs) then use back-propagation. It's faster and you might get better results.
- If you don't know the outputs for inputs or if there are simply too many possible combinations (such as in the case of a game), then use a genetic algorithm. It's effectively a search engine that finds the best solution within the problem space (the solution being the optimal weights for the neural network).
Using Neural Networks and Genetic Algorithms http://primaryobjects.com/CMS/Article105.aspx
Do you know some good literature for machine learning 101? Where to start?
Is that plausible? Or just my imagination?
https://dl.dropboxusercontent.com/u/8554242/available-for-2-...
(It's there in the background of the main menu.)