We compute the sample variance of the rewards online and store the running value in each node. This greatly reduces the number of summations needed to obtain the variance during the selection phase. Although the bookkeeping adds overhead for selection algorithms that do not use the variance, the cost is not substantial.
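As an illustration of what an online variance update can look like, here is a minimal sketch using Welford's algorithm. The source does not say which update rule the crate uses, so the choice of Welford's method, the `RewardStats` type, and all field and method names below are assumptions made for this example, not the crate's actual API.

```rust
/// Hypothetical per-node reward statistics (illustrative names only).
#[derive(Debug, Default, Clone, Copy)]
pub struct RewardStats {
    /// Number of rewards observed so far.
    count: u64,
    /// Running mean of the rewards.
    mean: f64,
    /// Running sum of squared deviations from the mean.
    m2: f64,
}

impl RewardStats {
    /// Folds one reward into the running statistics in O(1) time
    /// (Welford's update, assumed here for illustration).
    pub fn update(&mut self, reward: f64) {
        self.count += 1;
        let delta = reward - self.mean;
        self.mean += delta / self.count as f64;
        // Uses the updated mean, which keeps the update numerically stable.
        let delta2 = reward - self.mean;
        self.m2 += delta * delta2;
    }

    /// Running mean of the observed rewards (0.0 before any update).
    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Sample variance (n - 1 denominator); `None` with fewer than two rewards.
    pub fn sample_variance(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some(self.m2 / (self.count - 1) as f64)
        }
    }
}
```

With a structure like this, backpropagation would call `update` once per visited node, and the selection phase would read `sample_variance` directly instead of summing over the node's stored rewards.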
rustic_mcts
An extensible implementation of Monte Carlo Tree Search (MCTS) using an arena allocator.