Optimizing card game simulations
Table of Contents
Introduction
Recently I’ve been implementing a card game some friends showed me in Go as a side-project. In this journey I ended up implementing Bots of various levels of difficulty and a simulation package to make them play against each other and validate that they indeed play better (or at least win more games) than their previous levels. The performance of the simulation was improved from roughly 2000 games in 30s to 10000 games in 3s.
The game is called Tincho
and if you wish to play it’s available (with low resources) at tincho.manuelpepe.com
In this entry I’ll showcase some iterations I went through improving the performance of the simulation routine. I won’t show the full implementation to avoid explaining a lot of useless context, but we’ll go through the main ideas. At the end I’ll add the results of various simulations performed on the existing bots.
One game
I started simply with a Result
type and a Play
function that can match just two bot strategies against each other:
type Result struct {
Winner int // index of the winning strategy
TotalRounds int
TotalTurns int
}
func Play(ctx context.Context, strat, strat2 bots.Strategy) (Result, error) {
// setup to start game
return Result{ /* ... */ }, nil
}
This allowed me to code a simple test like:
func TestEasyVsMedium(t *testing.T) {
winsForMedium := 0
for i := 0; i < 100; i++ {
res, err := Play(context.Background(), bots.NewEasyStrategy(), bots.NewMediumStrategy())
assert.NoError(t, err)
winsForMedium += res.Winner
}
// medium should win 80% of the time
assert.GreaterOrEqual(t, winsForMedium, 80)
}
Multiple games
I didn’t want to write the loop each time, also I need some kind of summary of the data to avoid scrolling through hundreds of games.
So I added the Compete
function to play a bunch of games, returning all their results:
// `func() bots.Strategy` is used to build a clean strategy each game instead of
// reusing the same struct as they most likely hold game state.
func Compete(ctx context.Context, iters int, strat, strat2 func() bots.Strategy) ([]Result, error) {
results := make([]Result, iters)
for i := 0; i < iters; i++ {
result, err := Play(ctx, strat(), strat2())
if err != nil {
return nil, fmt.Errorf("error on round %d: %w", i, err)
}
results[i] = result
}
return results, nil
}
And a separate function that summarizes this list of results into a Summary
struct:
type Summary struct {
Strat1Summary StratSummary
Strat2Summary StratSummary
TotalRounds int
TotalTurns int
}
type StratSummary struct {
Wins int
Rounds MinMaxMean
Turns MinMaxMean
}
type MinMaxMean struct {
Min int
Max int
Mean int
Sum int
}
func Summarize(results []Result) Summary {
// a big loop aggregating the results
return Summary{ /* ... */ }
}
Which allows me to write tests like:
func TestEasyVsMediumCompeteSummary(t *testing.T) {
easy := func() bots.Strategy {
return bots.NewEasyStrategy()
}
medium := func() bots.Strategy {
return bots.NewMediumStrategy()
}
res, err := Compete(context.Background(), 1000, easy, medium)
assert.NoError(t, err)
summary := Summarize(res)
fmt.Printf("Summary: %+v\n", summary)
}
At this point, running 2000 iterations (games) takes about 30 seconds. It’s not that bad, but it got better.
Concurrency
You may notice that the games are being run sequentially. For the next iteration, I introduced concurrency with goroutines:
func Compete(ctx context.Context, iters int, strat, strat2 func() bots.Strategy) ([]Result, error) {
ctx, cancelPendingGames := context.WithCancel(ctx)
outs := make(chan Result)
errs := make(chan error)
// start every game in a different goroutine
for i := 0; i < iters; i++ {
go func() {
result, err := Play(ctx, strat(), strat2())
if err != nil {
select {
case <-ctx.Done():
return
case errs <- fmt.Errorf("error on round: %w", err):
return
}
}
select {
case <-ctx.Done():
return
case outs <- result:
return
}
}()
}
// collect all results
results := make([]Result, iters)
for i := 0; i < iters; i++ {
select {
case result := <-outs:
results[i] = result
case err := <-errs:
cancelPendingGames()
return nil, err
}
}
cancelPendingGames()
return results, nil
}
The test code doesn’t change, but we can now run 2000 games in 10 seconds from 30 seconds previously.
More concurrency
The next thing I went for was the Summary processing, it happens at the end of the routine but may very well be calculated as we gather the results. This also has the side-effect of reducing the memory footprint of our program, as we don’t need to keep the results for N games in memory at the same time.
So I removed the Summarize()
function and modified the way results are gathered in Compete()
:
func Compete(ctx context.Context, iters int, strat, strat2 func() bots.Strategy) (Summary, error) {
ctx, cancelPendingGames := context.WithCancel(ctx)
outs := make(chan Result)
errs := make(chan error)
// start every game in a different goroutine
for i := 0; i < iters; i++ {
// this code is the same as above
}
// collect all results in a goroutine and summarize them
var finalResChan = make(chan Summary)
var finalErrChan = make(chan error)
go func() {
summary := Summary{}
for i := 0; i < iters; i++ {
select {
case result := <-outs:
// update summary
case err := <-errs:
finalErrChan <- err
return
}
}
finalResChan <- summary
}()
// wait for the final result
select {
case res := <-finalResChan:
cancelPendingGames()
return res, nil
case err := <-finalErrChan:
cancelPendingGames()
return OldSummary{}, err
}
}
This change doesn’t increase the time performance that much, still at rougly 2000 games per 10 seconds, but it should decrease the memory consumption (I really haven’t measured this yet)
Limiting concurrency
Now, goroutines are cheap right? Yes, but they are not free. At this point, all goroutines are started at the same time which at around 10k iterations in my computer, results in an OOM (Out Of Memory) kill of the process.
To fix this, I introduced a limit to the amount of workers that can be spawned on Compete()
:
func Compete(ctx context.Context, iters int, strat, strat2 func() bots.Strategy) (Summary, error) {
ctx, cancelPendingGames := context.WithCancel(ctx)
outs := make(chan Result)
errs := make(chan error)
// start worker goroutines
pending := make(chan struct{})
workerCount := min(iters, 10000)
for i := 0; i < workerCount; i++ {
go func() {
for {
select {
case <-ctx.Done():
return
case <-pending:
result, err := Play(ctx, strat(), strat2())
if err != nil {
select {
case <-ctx.Done():
return
case errs <- fmt.Errorf("error on round: %w", err):
}
}
select {
case <-ctx.Done():
return
case outs <- result:
}
}
}
}()
}
// start queueing goroutine
go func() {
for i := 0; i < rounds; i++ {
select {
case <-ctx.Done():
return
case pending <- struct{}{}:
}
}
}()
// the remaining code is the same as above for summarizing and returning the results
}
With the last change, we limited the amount of worker goroutines to an arbitrary max of 10000 goroutines, and made them long lived by looping and consuming from a pending
channel that is being filled by a separate queueing goroutine.
This last change allows us to run simulations of any number of games without fear of being OOM killed.
Analysis with pprof
Now the modifications of the Compete
procedure stop. The last two modifications did not yield significant improvements specifically on performance. The next step was to dig through the application logic for inefficiencies outside of the simulation routine. This falls out of scope of this entry, but I’ll mention a few more things.
The first step to optimize the application was to run go tool pprof
while running a simulation of a fair number of games. In this case, a lot of time was being occupied by json.Unmarshall
given how I initially structured some messages with json.RawMessage
. I replaced that with an implementation that uses generics for an extra performance boost. I go into a bit more detail in this other entry if you’re interested. This in fact yielded a big performance boost.
Testing for leaks and races
When working with goroutines, remember to always run go test
with the -race
parameter to check for data races. They are easy to introduce and could cause subtile bugs.
I also introduced uber-go/goleak
to my tests to check for goroutine leaks.
Running the simulation outside of go test
Going back to the first iteration, I started with one match between two bots and wrote a test with a loop in it. From that point on I moved the loop to the library implementation, but kept running the simulations inside test functions.
When using go test
the execution of any program will be a lot slower than it would be with go run
, as go has to keep track of a lot more things, mostly to be able to work with debuggers.
If you’r running simulations from test functions, moving them to their own package that can be compiled and run without the test suite on will drastically improve performance.
Results
Finally, here are the results of simulating multiple matches agains the three bot difficulties available at tincho.manuelpepe.com:
# E = Easy
# M = Medium
# H = Hard
$ go run cmd/sim/main.go --all
=== RUN: EvE
0: {Wins:5036 Rounds:{Min:2 Max:8 Mean:3 Sum:19291} Turns:{Min:1 Max:392 Mean:73 Sum:369592}}
1: {Wins:4964 Rounds:{Min:2 Max:8 Mean:3 Sum:18792} Turns:{Min:1 Max:279 Mean:71 Sum:356501}}
Total Games: 10000
Total Rounds: {Min:2 Max:8 Mean:3 Sum:38083}
Total Turns: {Min:1 Max:392 Mean:72 Sum:726093}
--- OK (3.622077403s)
=== RUN: EvM
0: {Wins:731 Rounds:{Min:2 Max:7 Mean:4 Sum:2979} Turns:{Min:1 Max:127 Mean:36 Sum:26643}}
1: {Wins:9269 Rounds:{Min:2 Max:8 Mean:3 Sum:36684} Turns:{Min:0 Max:174 Mean:51 Sum:480269}}
Total Games: 10000
Total Rounds: {Min:2 Max:8 Mean:3 Sum:39663}
Total Turns: {Min:0 Max:174 Mean:50 Sum:506912}
--- OK (2.637042901s)
=== RUN: EvH
0: {Wins:6 Rounds:{Min:4 Max:7 Mean:4 Sum:29} Turns:{Min:10 Max:51 Mean:28 Sum:171}}
1: {Wins:9994 Rounds:{Min:2 Max:9 Mean:4 Sum:40156} Turns:{Min:3 Max:125 Mean:52 Sum:520547}}
Total Games: 10000
Total Rounds: {Min:2 Max:9 Mean:4 Sum:40185}
Total Turns: {Min:3 Max:125 Mean:52 Sum:520718}
--- OK (2.317444913s)
=== RUN: MvM
0: {Wins:5170 Rounds:{Min:2 Max:10 Mean:4 Sum:25649} Turns:{Min:1 Max:206 Mean:58 Sum:303541}}
1: {Wins:4830 Rounds:{Min:2 Max:10 Mean:5 Sum:24202} Turns:{Min:0 Max:235 Mean:60 Sum:290082}}
Total Games: 10000
Total Rounds: {Min:2 Max:10 Mean:4 Sum:49851}
Total Turns: {Min:0 Max:235 Mean:59 Sum:593623}
--- OK (2.937989917s)
=== RUN: MvH
0: {Wins:127 Rounds:{Min:3 Max:12 Mean:6 Sum:877} Turns:{Min:5 Max:157 Mean:64 Sum:8243}}
1: {Wins:9873 Rounds:{Min:2 Max:14 Mean:5 Sum:53097} Turns:{Min:1 Max:191 Mean:70 Sum:698592}}
Total Games: 10000
Total Rounds: {Min:2 Max:14 Mean:5 Sum:53974}
Total Turns: {Min:1 Max:191 Mean:70 Sum:706835}
--- OK (3.042727078s)
=== RUN: EvMvH
0: {Wins:165 Rounds:{Min:2 Max:6 Mean:3 Sum:582} Turns:{Min:1 Max:95 Mean:29 Sum:4948}}
1: {Wins:1373 Rounds:{Min:2 Max:6 Mean:3 Sum:5063} Turns:{Min:1 Max:138 Mean:46 Sum:64351}}
2: {Wins:8462 Rounds:{Min:2 Max:7 Mean:3 Sum:32076} Turns:{Min:1 Max:152 Mean:59 Sum:507158}}
Total Games: 10000
Total Rounds: {Min:2 Max:7 Mean:3 Sum:37721}
Total Turns: {Min:1 Max:152 Mean:57 Sum:576457}
--- OK (3.137018959s)
As you can see, after all optimizations we got it down from 30s for 2k to 3s for 10k.
You might also notice that there is no HvH
simulation. Currently, when I match two Hard bots against each other, there is a good chance that they get into a loop of winning one -10 round each with low score added to the other’s bot hand, eventually timing out the simulation as the game never ends.
As a last point, I went back and made the strategy inputs variadic to allow for simulations with more than two bots, that is the EvMvH
simulation.
This was a fun journey and I may come back to this at some point.
Thank you for reading, hope you found something interesting here, and let me know if you have any comments.