This field would in theory be useful for a transposition table, but we do
not currently support one. As such, I don't want to burden the
implementation with the field until it proves necessary.
We calculate the sample variance of the rewards online, storing the value
in the node. This greatly reduces the number of summations needed to
compute the variance during the selection phase. While this adds some
overhead to other selection algorithms, the cost is not substantial.
This is a basic working implementation of the MCTS algorithm. However, it
is currently slow compared with other implementations and makes
sub-optimal choices when playing tic-tac-toe, so some modifications are
needed.