Dealing with loops in game trees - Part 1

Here I present data structures and algorithms suitable for solving deterministic two-player games of perfect information.

I've recently had modest success in solving life and death problems in go (baduk/weiqi) using novel (as far as I know) algorithms so I thought a write up would be in order.

A game tree is usually though of as a non-cyclic directed graph where the positions are nodes and the moves are directed edges. The non-cyclic nature is usually enforced with some kind of no-loops policy. This increases the number of possible positions exponentially as the move history must now be included in the position definition. Let's drop this requirement and simply talk about directed graphs where loops are allowed. We're also going to adopt the negamax approach so that we don't have to deal with separate maximizing and minimizing nodes.

Let's look at some examples.

Here we have a simple negamax game tree. You can verify that aside from the leaf nodes that have no children each node's value is the maximum of the additive inverses of their children's values.

How about a loop?

Nothing too complex going on here. Each node's value is still the negamax of their children's and the solution is unique.

How about this?

If we tried to keep things single valued we would run into trouble in this case. Let's assume the first node chooses the 1 leaf getting the value of -1. Now the second leaf chooses between the first node and the 2 leaf ending up with a value of +1.

While this is an internally consistent state of affairs we can try different assumptions and see where that takes us. Let's say that the second node chooses the 2-leaf getting the value of -2. Now the first node chooses between the second node and the 1-leaf ending up with a value of +2.

The same graph but a different internally consistent answer. This is no good. We could even choose the value 0 for both nodes and things would stay internally consistent.

You can verify that the negamax condition still holds.

Let's look at an even simpler graph with a loop.

Here the node connects to itself and a leaf with the value of 2. Let's try setting it's value to -2 which would be the correct answer if the node didn't connect to itself.

However if we re-calculate the value, we get 2 which is the negamax of -2 and 2.

If we now re-calculate we get -2 again and the value keeps oscillating. We could also try setting the value to 1 and we would get oscillation between -1 and 1. However if we set the value high or low enough like -3 we get something different. First we get the value of 3, then the value of -2 and then 2 where the value keeps oscillating.

The solution to dealing with game graphs like this lies in establishing bounds on the values that survive the negamax condition, oscillatory or not. The algorithm is as follows:

For each leaf node initialize the lower and upper bound to the value of the leaf.
For each non-leaf node initialize the lower bound to negative infinity and the upper bound to positive infinity.
For each non-leaf node set the lower bound to the negamax of its children's upper bounds.
For each non-leaf node set the upper bound to the negamax of its children's lower bounds.
Repeat steps 3. and 4. until nothing changes.

During each step the lower bounds get higher and upper bounds get lower. Also the lower bound cannot get higher than upper bound so the algorithm is guaranteed to terminate.

Calculating the lower bounds for the first problematic graph we get the following result.

And now we can calculate the upper bounds. It's easily verified that this is the final result.

For the second problematic graph the final result is as follows.

We can now calculate the lower and upper bounds for arbitrary negamax graphs.

Well, given that we can fit the entire graph in memory anyway. Even with symmetry reduction this quickly becomes impractical for realistic game graphs.

This concludes Part 1. In Part 2 we're going to look at partial game graphs and effective expansion strategies to work out the final lower and upper bounds of the root node.