AlphaZero: DeepMind’s New Chess AI | Two Minute Papers #216


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. After defeating pretty much every highly ranked professional player in the game of Go, Google DeepMind has now ventured into the realm of chess. They recently challenged not the best humans, no-no-no, that was long ago. They challenged Stockfish, the best computer chess engine in existence, in quite possibly the most exciting chess-related event since Kasparov's matches against Deep Blue. I will note that DeepMind told me this is a preliminary version of the paper, so we shall have an initial look now, and perhaps make a part 2 video with the newer results when the final paper drops.

AlphaZero is based on a neural network and reinforcement learning, and is trained entirely through self-play after being given the rules of the game. It is not to be confused with AlphaGo Zero, which played Go. It is also noted that this is not simply AlphaGo Zero applied to chess; this is a new variant of the algorithm. The differences include:
– one, the rules of chess are asymmetric: for instance, pawns only move forward, and castling differs between the kingside and the queenside, and this asymmetry means that neural network-based techniques have a harder time with the game.
– two, the algorithm not only has to predict a binary win-or-loss probability for a given move; draws are also a possibility, and that has to be taken into consideration. Sometimes a draw is the best we can do, actually; see the sketch after this list.
There are many more changes over the previous incarnation of the algorithm; please make sure to have a look at the paper for the details.
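To make point two concrete, here is a minimal sketch of such a three-outcome value target in Python. It follows the convention that the value network is trained toward the game outcome seen from the side to move's perspective; the function name and the string encoding are my own illustration, not from the paper:

    def value_target(outcome: str) -> float:
        # Game outcome z from the side to move's perspective:
        # win = +1, draw = 0, loss = -1.
        # A two-outcome game such as Go only needs the {+1, -1} cases;
        # chess needs the explicit draw in the middle.
        return {"win": 1.0, "draw": 0.0, "loss": -1.0}[outcome]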
Before we start with the results and more details, a word on Elo ratings for perspective. The Elo rating is a number that measures the relative skill level of a player. Currently, the human player with the highest Elo rating, Magnus Carlsen, is hovering around 2800. This man played chess blindfolded against 10 opponents simultaneously in Vienna a couple of years ago and won most of those games. That's how good he is. And Stockfish is one of the best current chess engines, with an Elo rating over 3300. A difference of 500 Elo points means that if it were to play against Magnus Carlsen, it would be expected to win at least 95 games out of 100, though it is noted that there is a rule suggesting a hard cutoff at around a 400-point difference.
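As a quick sanity check on these numbers, here is the standard logistic Elo model in a minimal Python sketch. The formula is the usual Elo expected-score model, not something from the paper; the ratings plugged in are the approximate ones mentioned above:

    import math

    def expected_score(elo_a: float, elo_b: float) -> float:
        """Expected score of player A against player B under the
        logistic Elo model (a win counts 1 point, a draw 0.5)."""
        return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

    def elo_gap(score: float) -> float:
        """Inverse direction: the rating difference implied by a score."""
        return -400.0 * math.log10(1.0 / score - 1.0)

    # A roughly 500-point gap (Stockfish ~3300 vs. Carlsen ~2800):
    print(expected_score(3300, 2800))  # ~0.95, i.e. about 95 points out of 100
    # AlphaZero's 28 wins and 72 draws give (28 + 72 * 0.5) / 100 = 0.64:
    print(elo_gap(0.64))               # ~100 Elo points above Stockfish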
The two algorithms then played each other: AlphaZero versus Stockfish. They were both given 60 seconds of thinking time per move, which is considered plenty given that both algorithms take around 10 seconds at most per move. And here are the results: AlphaZero was able to outperform Stockfish in about 4 hours of learning from scratch. They played 100 games – AlphaZero won 28 times, drew 72 times, and never lost to Stockfish. Holy mother of papers, do you hear that?
Stockfish is already unfathomably powerful compared to even the best human prodigies, and AlphaZero basically crushed it after four hours of self-play. And it was run on similar hardware to AlphaGo Zero: one machine with 4 Tensor Processing Units. This is hardly commodity hardware, but given the trajectory of the improvements we've seen lately, it might very well be in a couple of years. Note that Stockfish does not use machine learning and is a handcrafted algorithm. People like to refer to computer opponents in games as AI, but it is not doing any sort of learning. So, you know what the best part is?
AlphaZero is a much more general algorithm that can also play Shogi, also referred to as Japanese chess, at an extremely high level. And this is one of the most interesting points – AlphaZero would be highly useful even if it were slightly weaker than Stockfish, because it is built on more general learning algorithms that can be reused for other tasks without investing significant human effort. But in fact, it is more general, and it also crushes Stockfish. With every paper from DeepMind, the algorithm becomes better AND more and more general. I can tell you, this is very, very rarely the case. Total insanity. Two more interesting tidbits about the paper:
one, all the domain knowledge the algorithm is given is stated precisely for clarity. Two, one might think that as computers and processing power improve over time, all we have to do is add more brute force to the algorithm and just evaluate more positions. If you think this is the case, have a look at this – it is noted that AlphaZero was able to reliably defeat Stockfish WHILE evaluating ten times fewer positions per second. Maybe we could call this the AI equivalent of intuition: in other words, being able to identify a small number of promising moves and focusing on them.
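One way to picture this intuition is the move-selection rule used by the AlphaZero family of algorithms, where the policy network's prior concentrates the search on a handful of promising moves. Below is a minimal sketch of that PUCT rule from the AlphaGo Zero line of work; the data layout and the c_puct constant here are illustrative, not taken from the paper:

    import math

    def puct_select(children, c_puct=1.5):
        """Pick the child node maximizing Q + U. Each child carries:
        Q (mean value of its subtree), P (prior probability from the
        policy network), and N (visit count). The prior P is what steers
        visits toward a few promising moves instead of brute force."""
        total_n = sum(child["N"] for child in children)

        def score(child):
            # The exploration bonus shrinks as a move accumulates visits.
            u = c_puct * child["P"] * math.sqrt(total_n) / (1 + child["N"])
            return child["Q"] + u

        return max(children, key=score)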
Chills run down my spine as I read this paper. Being a researcher is the best job in the world. And we are even being paid for this. Unreal.

This is a hot paper; there is a lot of discussion out there on it, with lots of chess experts analyzing and trying to make sense of the games. I had a ton of fun reading and watching through some of these. As always, Two Minute Papers encourages you to explore and read more, and the video description is full of useful materials. You will find videos with some really cool analysis from Grandmaster Daniel King, International Chess Master Daniel Rensch, and the YouTube channel ChessNetwork. All quality materials.

And if you have enjoyed this episode, and you think that 8 of these videos a month is worth a few dollars, please throw a coin our way on Patreon, or, if you favor cryptocurrencies instead, you can throw Bitcoin or Ethereum our way. Your support has been amazing as always, and thanks so much for staying with us through thick and thin, even in times when weird Patreon decisions happen. Luckily, this last one has been reverted. I am honored to have supporters like you, Fellow Scholars. Thanks for watching and for your generous support, and I'll see you next time!

81 thoughts on “AlphaZero: DeepMind’s New Chess AI | Two Minute Papers #216”

  1. >Two Minute Papers
    >6 Minute video

    Not complaining at all! Not worth rebranding the channel over such a small gripe, but I find it funny that as the channel has grown, the videos seem to get longer and longer. Again, I have no problem with this and enjoy some of the added depth and explanations (and would hope many videos are 5+ minutes in the future!), but it's a bit funny considering the name of the channel.

    Love the work, Károly, especially the course on rendering/ray tracing that you made available from the university in Vienna. I'm not fully through it yet, but it's been a pleasure to work through in my free time.

  2. For some reason people seem to latch onto the "only 4 TPUs used", both in AlphaZero and in AlphaGo Zero.

    Please clarify that this is only for the fully trained network, i.e. just to get the next move out of a playing AlphaZero.

    During training, according to the paper 5000 first-generation TPUs and 64 second-generation TPUs were used.

    Károly did clarify that this cannot yet be done on "commodity hardware", but the way things are presented both here and elsewhere, the 4 TPU figure is what sticks in people's minds.

  3. God! I love your videos. I use them as a means to select my next paper to read, as they contain tons of really interesting knowledge. What you say is totally true: 'What a time to be alive' 😀

  4. There are still debates going on about Stockfish being handicapped by not being allowed to use its opening and endgame databases. They also say the hardware the two ran on was not comparable. But certainly, as an avid chess player, I was really excited about this news. Unlike most engines', AlphaZero's play was way more human-like and easy to understand. Those ten games they published are super interesting and educational. I would love to see the other ones.

  5. If it were to play dark chess (where you can only see the enemy pieces that can be captured), would it need an even more generalized algorithm?

  6. Hi, love this channel. Not to pick holes, but winning 50% or more of chess games blindfolded is a trick and has nothing to do with the ability to play chess. Lots of popular magicians have performed the same trick.
    But then I'm an idiot and know nothing of such things.

  7. Says this video was published 2 hours ago so 10:30am Eastern on December 21st. I watched this video at least a week ago. Weird.

  8. It can play Shogi? Wow. It is massively more complex, because when you capture a piece in shogi, it is not out of the game. It goes to your pocket, and you can drop it on any free square as your move (restrictions only for pawns). I wonder what kind of ideas and new strategies the AI will come up with. Humans spend a massive number of initial moves on castling. And AlphaZero played pretty aggressively in the chess games.

  9. Sir, I watch a lot of YouTube on a lot of different subjects. And I mean all the time. But your channel is my absolute favorite. I get psyched every time I see you have posted something. Thank you so much for doing this. Not only are you exciting and full of fantastic info, you keep it short and to the point. Please, keep up the good work. My family and I are cheering you on!

  10. I am very happy that AI development is going forward. But one needs to keep in mind that AlphaZero played on much faster hardware than Stockfish, and Stockfish was not allowed to have any pre-calculated databases, which AlphaZero was allowed to have. Playing one version of Stockfish against another where this is taken into consideration results in close to 10-to-0 wins for the one on faster hardware with opening books. I would be very interested in seeing a fair comparison between Stockfish and AlphaZero; maybe A0 will win in that case as well!

  11. Some people say the 1 minute per move time limit in these games was very non-standard and not something Stockfish was well suited for. Usually in chess, time limits are for the whole game, not fixed per move.

  12. Do we know anything about the hardware Stockfish was run on? Was the comparison between AlphaZero and Stockfish done fairly or was it done in favour of AlphaZero?

  13. 25/50 draws playing white, 47/50 draws playing black. Could it be that in chess, once players surpass a certain level, they will always end up in a draw?

  14. I understand this is a video about chess. BUT! Did I see correctly that AlphaZero defeated a 3-day-trained AlphaGo Zero (which is already better than Lee Sedol)? And if so, with 4 hours of training?

  15. I saw most of the games. Very good play. This “Immortal Zugzwang” type of game reminded me of what I did 10 years ago against one forum administrator (who probably played using some kind of computer program). He lost and he was very confused about that; he could not figure out how I managed to beat him. Well, having a background in AI research probably helped — I had been doing AI stuff (voice recognition and neural networks for the Romanian language) since 1998.

    But I have my reservations – this works only if they play fast games, blitz chess. Also, it's one thing to play at a certain level of depth, sequentially, as Deep Blue played against Kasparov (here the machine wins every time), and completely another to play on intuition, "thinking in parallel", with patterns, as AlphaZero does. Stockfish had 70 million moves analyzed and sorted sequentially; AlphaZero had just 70,000, but it analyzed and used all of them at the same time.

    It's like comparing a man and a woman going to a big mall. A man will check every store and optimize the route (to a degree of accuracy and effectiveness, as it is near impossible to check every single item and store – the analogy with the game of chess is that it has around 10 to the 120th power of moves, the Shannon number, while there are only 10 to the 80th atoms in the Universe). A girl will wander around everywhere, but in the end, she will find the best things to buy. Every single time. That's the real power of any modern AI.

    For this reason I am pretty sure the first functional super-AI will be "a girl" AI, as this is how AI "thinks" all the time (girls are way better at multitasking and lateral thinking, I guess everyone knows that).

    🙂

    https://www.youtube.com/watch?v=DsSXNMCPTtw

    This was played on Yahoo chess, around 2008 or 2009, I guess during the Christmas holidays. I clearly remember how I tried to lure him into that position; most of my moves were chosen with this type of “positional fork” in mind. I guess AlphaZero has a similar type of thinking, constantly searching for these patterns.

    Classic positional edge & luring tactics used against one very aggressive player.

  16. AlphaZero is so fast it's making two minutes turn into six. JK, just keep this great content coming, informative and inspirational!

  17. I tried to get into shogi, but I found no good online strategy help and had no good opponents to play against. So I went back to chess.

  18. If you beat someone 84–16, you get 282.84 Elo points; a 90% result is 361 Elo points; and a 98% result is 581.362 points. AlphaZero beat Stockfish with 28 wins and 72 draws, a 64% result, which means 116 Elo points: 3300 + 116 = 3416 Elo points for AlphaZero.

  19. Can I get some clarification on the statement at 5:00?
    If you have a better heuristic for exploration, it makes perfect sense to have higher confidence with the same number of evaluations or, in this case, the same confidence with fewer evaluations.
    I am not sure how I can count that as "using less computational power", because creating the heuristic involves exploring all those extra states, "caching" the results and later reusing that cache.

  20. Even if they were equal in skill, that optimization in CPU cost with regard to how many plies are evaluated per move is by itself completely mind-blowing in my opinion. That it wins at the same time is just crazy O_O I did my master's thesis with MCTS with ANN evaluators too, but that was in 2011, and the methods must really have evolved :O

  21. 1:05 the sudden jump in Elo Rating after a period of stagnation from 17hrs to 27hrs is interesting & scary!!! I wonder what caused the jump

  22. People seem to latch onto the difference in hardware for stockfish and A0. There are 2 key points to make here:

    1. A0 evaluated 1000x fewer board positions; its learned evaluation function is just much, much better, and it is extracted from data through self-play.
    2. A0 lends itself much better to parallelism than Stockfish. The algorithms used in Stockfish don't gain a lot from increasing the number of CPUs thrown at them.

    So yes, A0 utilizes many more teraflops, but Stockfish is unable to gain from them. It would gain most from higher CPU frequencies, but we've reached a limit there. Its algorithms are just inferior; this is the biggest problem.

  23. Chess is a game of finite paths to two potential outcomes. It has always been the simplest thing in the world for a computer to do well. Why does anyone even bother researching such trivial pursuits?

  24. When Magnus Carlsen won against all those people at the chess tournament, it was just a trick. He's not playing chess, he's just remembering the moves from each opponent and playing them against the other opponents. Mathematically, he'll always win a few of them. So yes, it is a trick, but it is still really impressive remembering and replaying all of those moves against all of those opponents. If you want more info on the technique, Derren Brown did a video on it. https://www.youtube.com/watch?v=rIAXIubSTkc

  25. You listed the most boring differences, what about the change from a pool of players to a single player? The fact that the algorithm got better when it was simplified is the biggest kicker for me

  26. Very interesting, thanks.  Nice to hear Jerry and Daniel getting a shoutout, I have enjoyed their analysis of the AlphaZero games, I'll check out the other guy now.  🙂

  27. AlphaZero beat Stockfish with a 64% score; that is 101 points, so 3300 + 101 = 3401. A 75% score is 190 Elo points, 95% is 463 points, and 99% is 656 Elo points: 3300 + 656 = 3956 Elo points. With a computer over 4000 strong, the game of chess would be solved.

  28. Well, Carlsen's blindfold simul was impressive because it was also timed. But unfortunately, the people moving his pieces for him and calling out his opponents' moves were utterly incompetent, often neglecting to call out moves for minutes after they were made, and a few times not even moving Magnus's pieces at all when Magnus gave his replies! And he still won most of the games!

    But blindfold simuls normally are not that hard for a strong player; Carlsen could play 40-50 people blindfolded if he wanted to.

  29. Taking nothing away from AlphaZero, but it wasn't playing the strongest version of Stockfish, and Stockfish was handicapped quite fundamentally (hardware, database, and time constraints). I'd like to see a proper rematch that both sides are happy with.

  30. So will they let AlphaZero self-learn for two months non-stop, and come back to try again against the highest settings and hardware of Stockfish? I think they need to do this to prove that after two months of continuous self-learning, this AI will become much, much more powerful than the 4-hour version, and can beat Stockfish in every game they play.

  31. Machine Elo is actually deflated compared to human Elo. I don't know by how much, but some say several hundred points; it is advised that you search this stuff out yourself.

  32. The Queen's Gambit graph is odd. The record is 1/47/2 after 1. d4 d5, significantly worse than after 1… Nf6. Still, AlphaZero chooses to play 1… d5 against itself ~10% of the time! Would love to see its record against itself from these positions.

  33. I don't know why this is stated wrongly everywhere, even by someone who is supposed to have read the paper. AlphaZero got to the level of Stockfish after 4 hours of training (so you might say it "mastered" chess in just 4 hours), but the actual 100-game match happened after AlphaZero had 8 hours of training. Granted, this doesn't change the magnitude of this success, but it's annoying to hear the emphasis on "4 hours" all the time when it's simply wrong.

  34. I so want a version of AlphaZero that starts out as a beginner, on hardware that I can afford, and then to play it forever for fun. I've always wanted a computer that might play like a human and not like a super-processing juggernaut.

  35. You mentioned ChessNetwork. His video analyses of the Alpha Zero/Stockfish chess games are terrific! Check them out on YouTube, anyone who's interested in learning more.

  36. A nice video, but it would be better if you could also explain how they are doing it, along with what they are doing with the results.
    Cheers!!

  37. Arguing about time settings and databases misses the point in my opinion. The AI beat Stockfish as Black three times. That's really all that needs to be said. From our perspective, they might as well be two gods duking it out on the board.

  38. AlphaZero's progress in chess skyrocketed, and then the learning curve flattened after a few hours. It hit some sort of ceiling, yet it still improves every once in a while. Like Bruce Lee said, there are no limits, only plateaus.
