The Game That Learns

Vsauce! Kevin here, and I’ve built a computer capable of explaining how you get smarter. Out of these matchboxes, some colorful beads and… Shrek.

Real quick, before we get into the game we’re about to play, I wanna tell you about the game that I’ve been playing. I partnered with Raid: Shadow Legends on this video, and if you follow me on Instagram, you know that I’m a huge fan of RPGs. Well, Raid is the most immersive champion-collecting experience you’ll get on a smartphone. It has a deep story, detailed graphics, giant boss fights, and hundreds of champions to collect and customize. And you can play it free. So to check it out, download Raid using only my link in the description to get 50,000 silver immediately and a free epic champion courtesy of the dev team. Thanks again to them for supporting Vsauce2. Go check out their game; it’s amazing how far mobile gaming has come.

Now let’s get back to the inner workings of our game. Okay. 24 matchboxes, all filled with beads and covered in potential moves for the game we’re about to play, and… this is our computer. Now, Shrek comes along and… wait. How is THIS a computer? Aren’t computers just, like, electronic machines that run software? What IS a computer?

Well, the earliest computer was YOU. Or… your ancestors. They used calculating machines like the abacus to input information and get a result, but we were the ones computing. The human operators of early calculating machines were literally called “computers.”

Okay, back to our matchboxes. Once we introduce a game board and Shrek and the gang, this matchbox-and-bead setup processes our input and gives us an output. And not only that… it also learns. This is not just a computer; this is an artificial intelligence machine capable of matching wits with the brightest minds humanity has to offer. At a game called Hexapawn. Here’s how.

Hexapawn is based on chess: each player has 3 pawns on a board with just 9 squares. The pieces move like chess pawns, too. They can go forward one space if that space is empty; if it’s occupied, they can’t go forward. Sorry, Donkey. They can, however, move diagonally, but only to take an opponent’s piece. There are three ways to win: get a pawn to the other side of the board, take all of your opponent’s pieces, or leave your opponent without a possible move, like a checkmate in chess.

Our setup works like this: I’ve got 24 matchboxes here, and each one corresponds to a possible position of the pieces on the board in a given round. I’ve got my Team Kevin pawns vs. the computer’s Team Shrek. And do you know what that means? That means that we’ve officially turned Hexapawn into: Shreksapawn.

Alright. Let’s play. The human, that’s me, always goes first. Wait. Why? Because recreational mathematician Martin Gardner said so. He actually created Hexapawn and its rules so he could build a simplified version of a 304-matchbox computer called MENACE. 15 years after helping the British break the Nazis’ codes in World War II, Donald Michie invented MENACE, a matchbox machine that learns to master Tic-Tac-Toe. And now, 59 years later, I’m on YouTube playing Shreksapawn.

Since I go first, my moves occur only in the odd-numbered rounds: 1, 3, 5, and 7. Therefore, the matchboxes are grouped by possible Team Shrek moves in rounds 2, 4, and 6. One of us is guaranteed to win before Round 8, so Team Shrek has no Round 8 moves.

Each box contains one colored bead for each potential move from that board position. So this first box, for example, has a green, a blue, and a purple bead. And I’ve cut a hole at the bottom of the box that will only allow one bead at a time to fall out. So I’ll just shake the box and let one bead out. And it’s purple. That means if my pawn was here and it was Team Shrek’s move, Team Shrek would make the purple arrow move. Like this. If a blue bead had fallen out, then Team Shrek would’ve made the blue arrow move. And if it was a green bead, then Team Shrek would’ve made the green arrow move and taken my pawn.

Okay, so that’s how Team Shrek will move. Team Kevin will move however I want Team Kevin to move, because I’m Kevin, and then we’ll play back and forth until there’s a winner.

Alright, Round 1: Fight! I decide to move Lord Farquaad forward. For Round 2, I use this box to determine Team Shrek’s move. So we’ll give it a shake. Whoa! Let’s try that again. And it’s the green move. So Donkey moves forward. Now it’s my turn, and I decide that, look, I can just take Princess Fiona when I move diagonally and win the game. That’s it.

Now here’s the important part. When Team Shrek makes a losing move, I remove that bead from the box. That way the computer can’t make the same bad move the next time this situation comes up. By removing its losing beads, the computer learns to play better. When Team Shrek wins, instead of removing the bead, I just put it back in the box.

Okay, I’m gonna play a bunch of games now, and I’ll keep track of wins over here: a K when I win and an S for a Shrek win. Here we go.

Okay, I’ve played 14 games. I started off winning a lot more than I was losing… and then things changed. Out of the last 7 games, Team Shrek has won 6 of them. The computer is clearly getting better at the game… but is it really learning? I mean, I’m just taking beads out of matchboxes. How is that learning? What is learning?

At the most basic level, learning is acquiring new knowledge or a new skill, or modifying an existing behavior. Every time I take a bead out of a matchbox, the computer loses a behavior that leads it to failure. That increases the probability that the computer’s move each round leads it to success, which in our case means winning Shreksapawn. After a sufficient number of games, the computer will evolve to play perfectly. Since the second player can always force a win at Hexapawn, playing perfectly means Team Shrek never loses. My Team Shrek computer may not be thinking on its own, but it is learning.

And it can also learn in a different way. Removing beads is basically a form of learning by punishment. When Team Shrek makes a bad move, I’m punishing the computer for being wrong. I don’t have to worry about the computer feeling bad about losing. These matchboxes aren’t gonna get frustrated and quit playing and run away crying and slam the door in my face and tell me I’m not their real dad.

But what happens if, instead of punishing my computer, I reward it? Instead of just putting the winning bead back in the box when the computer wins, I could add another bead of the same color. That increases the probability of the matchbox producing a winning bead, which reduces the probability of a losing bead appearing. The computer would still eventually reach perfect play, because I’ll still remove the losing beads, but it will take longer: since it’s winning more often, it makes fewer of the losing moves that get their beads removed. If it could feel, it would probably feel better about winning more often along its longer journey toward perfection. So the fastest way to perfect play is by punishing the computer’s mistakes. But the way to win as many games as possible along the way is to reward its victories.

To improve at Hexapawn, our matchbox computer actually uses a simple form of what’s now called reinforcement learning. Moves that lead to losses get weeded out and moves that lead to wins survive, a lot like natural selection, the process that drives biological evolution.

The beads of learning in your life may be refined by punishment. Put your hand on a hot stove once, and you learn, “Ow! That’s painful.” So you remove the touch-hot-stove-bead from your brain. They may also be augmented by rewards. “My parents bought me ice cream for getting an A on my exam.” Add another get-good-grades-bead to your matchbox head computer.

Hexapawn is an obscure, academic game from over 50 years ago, and you can make a matchbox computer that learns to win every time. But by allowing this matchbox computer full of colored beads to learn, the player who’s learning a bit more about learning is… you. And as always, thanks for watching.

If you wanna make your own matchbox computer (oh, I lost a bead), download my template for free over at Twitter.com/VsauceTwo. That’s at Vsauce T, W, O. If you wanna watch more Vsauce2 videos, just click over here, and if you aren’t subscribed to Vsauce2, then maybe you should put a “subscribe to Vsauce2” bead in your brain. Wow. That was weirdly creepy.
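
For readers who want to experiment, here is a minimal Python sketch of the bead rules described in the video. It is an illustration written for this page, not code from the video or from Gardner: the `Matchbox` class and its method names are hypothetical, and moves are simply labeled by bead color. Shaking the box draws one remaining bead uniformly at random, `punish` confiscates one bead of a losing move, and `reward` adds an extra bead of a winning move (the reward variant).

```python
import random
from collections import Counter


class Matchbox:
    """One matchbox: a bag of colored beads, one color per possible move.

    Hypothetical sketch; the class and method names are illustrative only.
    """

    def __init__(self, moves):
        # Start with exactly one bead per legal move, as in the video.
        self.beads = Counter({move: 1 for move in moves})

    def total(self):
        return sum(self.beads.values())

    def is_empty(self):
        return self.total() == 0

    def draw(self, rng=random):
        """Shake the box: each remaining bead is equally likely to fall out."""
        bag = list(self.beads.elements())  # elements() skips zero counts
        if not bag:
            raise ValueError("empty matchbox: no move left to play")
        return rng.choice(bag)

    def probability(self, move):
        """Chance that `move` is drawn on the next shake."""
        total = self.total()
        return self.beads[move] / total if total else 0.0

    def punish(self, move):
        """Losing move: confiscate one bead of that color."""
        if self.beads[move] > 0:
            self.beads[move] -= 1

    def reward(self, move, extra=1):
        """Winning move (reward variant): add `extra` beads of that color."""
        self.beads[move] += extra


if __name__ == "__main__":
    box = Matchbox(["green", "blue", "purple"])
    box.punish("green")  # the green move led to a loss
    box.reward("blue")   # the blue move led to a win
    print(box.probability("green"), box.probability("blue"))  # 0.0 0.6666666666666666
```

After punishment the green move can never be drawn again, while the reward leaves blue holding 2 of the 3 remaining beads. That is the trade-off from the video in miniature: punishment prunes mistakes, and rewards make the good move come up more often in the meantime.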

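Because Hexapawn is so small, the claim that a fully trained Team Shrek can win every time can be checked by exhaustive search. The sketch below is a plain minimax solver, a different technique from the matchbox learner, written for this page as an illustration; the board encoding (a 9-cell tuple with Team Kevin’s home row first) and all function names are assumptions. It treats reaching the far row as a win and having no legal move (blocked, or out of pawns) as a loss, which covers all three win conditions. The `mirror` helper illustrates the point in the author’s reply in the comments: a position and its left-right mirror image have the same outcome, so one matchbox can serve both.

```python
from functools import lru_cache

# Board: tuple of 9 cells, index = row * 3 + col (assumed encoding).
# Row 0 is Team Kevin's (White's) home row; row 2 is Team Shrek's (Black's).
# "W" = Kevin pawn (moves up), "B" = Shrek pawn (moves down), "." = empty.
START = tuple("WWW...BBB")


def other(player):
    return "B" if player == "W" else "W"


def legal_moves(board, player):
    """All (from, to) moves: one step forward onto an empty square,
    or one step diagonally forward onto an opponent's pawn."""
    step = 1 if player == "W" else -1
    moves = []
    for i, cell in enumerate(board):
        if cell != player:
            continue
        row, col = divmod(i, 3)
        r = row + step
        if not 0 <= r <= 2:
            continue
        if board[r * 3 + col] == ".":
            moves.append((i, r * 3 + col))
        for c in (col - 1, col + 1):
            if 0 <= c <= 2 and board[r * 3 + c] == other(player):
                moves.append((i, r * 3 + c))
    return moves


def play(board, move):
    """Return the board after `move`; a capture simply overwrites the target."""
    src, dst = move
    cells = list(board)
    cells[dst] = cells[src]
    cells[src] = "."
    return tuple(cells)


def mirror(board):
    """Reflect the board left-to-right (column a <-> column c)."""
    return tuple(board[r * 3 + (2 - c)] for r in range(3) for c in range(3))


@lru_cache(maxsize=None)
def can_force_win(board, player):
    """True if `player`, whose turn it is, can force a win from `board`."""
    far_row = 2 if player == "W" else 0
    for move in legal_moves(board, player):
        if move[1] // 3 == far_row:
            return True  # reaching the far side wins at once
        if not can_force_win(play(board, move), other(player)):
            return True  # this move leaves the opponent in a lost position
    # No legal move (blocked or no pawns left), or every move loses.
    return False


if __name__ == "__main__":
    print(can_force_win(START, "W"))  # False: the first player cannot force a win
    opening = play(START, (0, 3))  # Team Kevin's left pawn steps forward
    print(can_force_win(opening, "B"), can_force_win(mirror(opening), "B"))  # True True
```

Run as a script, it reports that the first player cannot force a win from the starting position, which is why a perfectly trained Team Shrek never loses, and that an opening and its mirror image lead to the same result.
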
100 thoughts on “The Game That Learns”

  1. I wanted to respond to two types of comments that have appeared more than once. I read all the comments and really appreciate when you all dig deep into these topics.

    First, we're missing some matchboxes because we don't actually need them! Some matchboxes work for two scenarios — once for the board position they display, and also for the board position that is a mirror image of it. The computer learns both board positions at the same time, but yes, at first glance it appears as though I just left some out. Martin Gardner didn't think they were necessary, either.

    Second, Hexapawn is a much simpler version of chess, so terms like "checkmate" and "stalemate" aren't exactly the same. They're simpler, too. In chess, checkmate is achieved when your opponent's king is in check and there is no legal move that gets it out of check. A stalemate occurs when the player to move is not in check but has no legal move. A stalemate results in a draw.

    So, when that occurs in Hexapawn, it has the trappings of a stalemate but has the result and the spirit of a checkmate — the win is awarded to the player who moves in a way that creates a stalemate for their opponent. Because the situation results in a win instead of a draw, I thought it was more appropriate to compare it to checkmate, though it may have been clearer to avoid the language of "checkmate" entirely.

  2. I am starting to think that I am literally the only person who never put his hand on a hot plate.
    And I somehow feel less educated because of that.

  3. that was a great video!
    fucking lived it 😀
    ai and learning is crazy interesting
    this is my jam ^o^
    Thanks Kevin

  4. This took too long, but if you want the template, here it is https://mobile.twitter.com/VsauceTwo/status/1107733737364770817

  5. You know wouldn't it be cool if the machine also learned from other's mistake than just itself's because now perfecting and more wins can both happen… Bet the creator of hexapon didn't think of that!

  6. This was a brilliant video!!!! I was really excited about the Menace simulation that StandUpMaths did but this is WAY more practical. So SO happy you made this & YouTube recommended it! Definitely trying this out.

  7. using the real hexapawn to show people how it's done
    vsauce: nah
    using shrekapawn to get more view…
    vsauce:stonk!!

  8. Really interesting way of showing a brute-force depth-first-search, although this algorithm doesn't work for difficult games like go, chess because the state is too large for brute-force algorithms.

  9. Actually in chess if your opponent can't move its check, checkmate is when it they move their king you won't get them

  10. If you start off moving the middle piece forward you win every time though, the computer can't learn to react to that if you play it well.

  11. Literaly a best physical example of natural selection and computational evolution I've seen thus far, and I am tempted to do 3 things

    1. Run a C++ script that does this
    2. Do this IRL
    3. See if humans/boxes and or C++ could do this on a much grander scale (i.e. full game of chess)

    For a full game of chess, I'm thinking the bead corresponds to a piece and all possible moves, but this would take years to truly make (more chess combinations than people on earth circa 1985 if I remember correctly) let alone truly have a winning "computer". On top of that, I would need literal THOUSANDS of INDIVIDUAL BEADS per Altoids tin (face it… match boxes won't work)… thank whatever all benevolent deity you believe in for C++ and actual electronic computers….

  12. I have a question, in retrospect I think the answer should be fairly obvious. Wouldn’t it just be faster to both reward and punish the computer to get it to learn faster? In my mind, the answer is a resounding yes. If you removed bad beads and add in good beads? I think it would learn twice as fast, wouldn’t it?

  13. How about you set up one box set like this and another one for playing on the odd turns, and then play them against each other and see how quickly they learn? Or maybe adapt it to a slightly bigger game; say, how many matchboxes would be needed for a 4×4 Octapawn game?

  14. I have a question: What would happen if two perfectly trained "computers" would play against each other?

  15. Me: Some-
    A chain of comments: Body once told me the world is gonna rule me-
    You: I ain't the sharpest tool in the shed

    Please continue:
    SOME BODY-

  16. Technically it can learn but its actually bot learning lets say the computer does a good move and the bead is removed next time it could do a bad because it cant do a good one again

  17. 1:26

    Kevin: What IS a Computer?

    A computer is a machine or device that performs processes, calculations and operations based on instructions provided by a software or hardware program. It is designed to execute applications and provides a variety of solutions by combining integrated hardware and software components.

    There you go, Happy?