

There is some jitter in the results, despite increasing the number of games to 1,000 in the second half. Even so, the nonlinear relationship between ELO and compute is clearly visible.
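How much jitter to expect from a given number of games can be estimated with a rough error bar. The sketch below uses the standard logistic Elo formula and a normal approximation for the score; it is not necessarily how cutechess-cli computes its estimate. The win/draw/loss counts are hypothetical and only chosen for illustration; they are not results from this experiment.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) in Elo, using a normal approximation."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points per game).
    variance = (wins + 0.25 * draws) / n - score ** 2
    stderr = math.sqrt(variance / n)
    return (elo_diff(score),
            elo_diff(score - z * stderr),
            elo_diff(score + z * stderr))

# Hypothetical match results with the same score ratio but different lengths.
for wins, draws, losses in [(30, 44, 26), (300, 440, 260)]:
    est, low, high = elo_with_margin(wins, draws, losses)
    games = wins + draws + losses
    print(f"{games} games: {est:+.0f} ELO (about {low:+.0f} to {high:+.0f})")
```

With the same score ratio, the interval for 1,000 games is much narrower than for 100 games, but it does not vanish, which is consistent with some jitter remaining.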
We benchmark our own machine, on which the experiments are run. This can be done with the Stockfish parameter "bench". For simplicity, suppose our machine performs at 10 x 777 kNodes/s = 7.8 MNodes/s. That's the ballpark of recent (2020) 4-core CPUs. Now we want to play a game at 17.5 MNodes per move on a machine running at 7.8 MNodes/s. Clearly, each move can only take 17.5 / 7.8 = 2.24 seconds. A whole 40-move game therefore lasts about 90 seconds.

To build a ladder of SF towards slower machines, we let this version of SF8 play a set of games at a 90s time control against the same version at half that (45s). The most well-established tool to compare chess engines is cutechess-cli. It is a command-line interface to play two engines (or two versions of the same engine) against each other. In the end, it nicely summarizes the results and includes a differential ELO estimate. A command may be:

cutechess-cli -fcp cmd=stockfish proto=uci tc=40/90 -scp cmd=stockfish proto=uci tc=40/45 -games 100

How badly does the version with less compute perform? In this experiment, after running 100 games, we get a 14 ELO difference. That's much less than the usual statement of 70 ELO per doubling of compute. Why is that? We can see the same effect in similar experiments done by others (1, 2): The ELO gain diminishes (flattens) at high compute. On the other hand, when we reduce compute to very low levels, the curve steepens dramatically. The full result list from my experiment gives the ELO loss for each halving of compute.
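The time-control arithmetic and the halving ladder can be sketched in a few lines of Python. The node figures are the ones quoted in the post. The helper names are hypothetical, and the cutechess-cli flags are copied verbatim from the command in the post rather than checked against any particular cutechess-cli version, so treat the argument list as an assumption to verify locally.

```python
TARGET_NODES_PER_MOVE = 17.5e6   # node budget per move, from the post
LOCAL_NODES_PER_SECOND = 7.8e6   # benchmarked speed of our machine, from the post
MOVES_PER_GAME = 40

def move_seconds(nodes_per_move, nodes_per_second):
    """Seconds a move may take to stay within the node budget."""
    return nodes_per_move / nodes_per_second

def time_control_ladder(start_seconds, rungs):
    """Time controls obtained by repeatedly halving the starting value."""
    return [start_seconds / 2 ** i for i in range(rungs)]

def cutechess_args(tc_first, tc_second, games):
    """Argument list mirroring the command in the post (flags unverified)."""
    return [
        "cutechess-cli",
        "-fcp", "cmd=stockfish", "proto=uci", f"tc={MOVES_PER_GAME}/{tc_first:g}",
        "-scp", "cmd=stockfish", "proto=uci", f"tc={MOVES_PER_GAME}/{tc_second:g}",
        "-games", str(games),
    ]

per_move = move_seconds(TARGET_NODES_PER_MOVE, LOCAL_NODES_PER_SECOND)
game_seconds = round(per_move * MOVES_PER_GAME)
ladder = time_control_ladder(game_seconds, 4)
args = cutechess_args(ladder[0], ladder[1], 100)

print(f"{per_move:.2f} s per move, {game_seconds} s per {MOVES_PER_GAME}-move game")
print("ladder of time controls (s):", ladder)
print(" ".join(args))
```

Each rung of the ladder plays one time control against the next slower one, so the ELO losses can be chained from fast hardware down to slow hardware.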

With that established, we can calculate the ELO as a function of kNodes/s. The 40 moves in 15 minutes leave 22.5 seconds per move (on average). That's 17.5 MNodes per move to achieve 3302 ELO.
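As a quick check of this arithmetic, here is a minimal sketch. The 777 kNodes/s reference speed is an assumption: it is taken from the "10 x 777 kNodes/s" figure quoted elsewhere in the post and is consistent with 17.5 MNodes at 22.5 seconds per move.

```python
MOVES_PER_CONTROL = 40
CONTROL_SECONDS = 15 * 60  # 40 moves in 15 minutes

# Assumption: speed of the reference machine in nodes per second.
REFERENCE_NODES_PER_SECOND = 777_000

def seconds_per_move(moves, control_seconds):
    """Average thinking time per move for the given time control."""
    return control_seconds / moves

def nodes_per_move(nodes_per_second, move_seconds):
    """Nodes searched per move at a given engine speed."""
    return nodes_per_second * move_seconds

move_time = seconds_per_move(MOVES_PER_CONTROL, CONTROL_SECONDS)
budget = nodes_per_move(REFERENCE_NODES_PER_SECOND, move_time)

print(f"{move_time:.1f} s per move")
print(f"{budget / 1e6:.1f} MNodes per move")
```

Expressing strength per node budget rather than per second is what lets the same ELO figure be transferred to machines of different speed.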

64bit 1 CPU is the Athlon (this can also be verified with the historical version of that list). This is an important baseline, because it is cross-calibrated to dozens of other engines.

To compare human grandmasters, we take the ELO over time for Kasparov and Carlsen. Between the ages of 13 and 21, Carlsen's rating grew from 2000 ELO to grandmaster strength, faster than any engine :-)

Deep Blue

The marker for "Deep Blue" in the year 1997 is a bit arbitrarily set to 2900 ELO. At the time, Kasparov had 2860 ELO, and Deep Blue won, although the match was close.

The main part is the Stockfish 8 experiment.
I explored measuring AI or hardware overhang in August 2020 using chess. Hardware overhang is when sufficient compute is available, but the algorithms are suboptimal. I examined the strongest chess engine of 2020, Stockfish 8, performing at 3,400 ELO under tournament conditions. When reducing compute to 1997 levels (equivalent to a Pentium-II 300 MHz), its ELO score was still ~3,000. That is an important year: In 1997, the IBM supercomputer "Deep Blue" defeated the world chess champion Garry Kasparov. With Stockfish, no supercomputer would have been required. I estimated that SF8 drops to Kasparov level on a 486-DX4 100 MHz, available already in 1994. To sum it up, the hardware overhang in chess is about 10 years, or 2-3 orders of magnitude in compute.

About a year later, in July 2021, Paul Christiano asked similar questions: How much compute would the old engine need to match the current engines? What is the influence of RAM (size and speed), opening books, endgame tables, pondering? Also, my old post gave some insights, but it can be improved by sharing the sources and making it reproducible. That's the aim of the current post (the other questions will be addressed in a later post).

Reproducing chess scaling from 2020

History of PC Programs (ELO by year)

As a baseline of engine performance over the years, we plot the winner from the yearly rating list of the Swedish Chess Computer Association.
