Evaluation of the game tree

Next: Correctness Up: Design of the planner Previous: Example problem Contents

Evaluation of the game tree

The second stage of deciding a strategy is the evaluation of the game tree in the context of the agent's belief state. Evaluation is based on the minimax algorithm commonly used to evaluate game trees, since this algorithm returns the strategy of maximum expected utility [59]. This algorithm is a recursive function which, for any given node, finds the minimax strategy, with respect to the other agent's utility function, for each of its children, and returns the one of maximum utility, using the agent's own utility function. While minimax was originally used in evaluating zero-sum games such as chess, where it takes the maximum of the minimum strategy selected by the opponent, it is used here to select the maximum strategy of the each agent from the that agent's perspective. Since agents have different beliefs and different utility functions an alternating perspective must be taken.

There are chance nodes in the game tree which must be considered. In principle, the game tree should be evaluated using minimax in every belief state outcome, and a weighted sum taken, according to the expected utility rule. In each possible belief state, the chance nodes can be removed from the tree according to the value of the state. This leaves an ordinary game tree, with no chance nodes, that can be evaluated using plain minimax. A weighted sum of utility is taken over these trees, according to the probability of each belief state. One way to do this is to check each possible belief state, finding its corresponding utility and probability. In practice, a better approach is for the evaluator to traverse the game tree, and each time a chance node is encountered, a pair of belief models is generated from the current one. In the first of these, the belief addressed by the chance node is set to true (a probability of one). In the second of these, it is set to false (a probability of zero). Each belief model is then propagated down one of the branches of the chance node. A weighted sum is taken at the chance node. Rather than generating all of the outcomes from the start, this mechanism generates the outcomes on demand as chance nodes are encountered.

To implement the evaluation module, a modified form of the standard minimax algorithm should be used [59]. This algorithm returns the best "play" of the game tree, that is, a path representing the maximum utility strategies that each player is expected to choose in the course of the game. The standard algorithm is a recursive function that returns the best play from a game tree by obtaining the best play from the following choice node, and then evaluating each of the strategies available at the current choice node in the context of the following play. The difference between the standard minimax algorithm and the one that must be used here is that the agents disagree about the best play. Each has different beliefs, and so rather than compute one global best play, a best play must be obtained from the perspective of each agent, so that its best strategy can be obtained. To do this, the central minimax function should be first modified to return not a path as the best play, but a tree, with branching at nodes that represent the questionable beliefs of the two agents. This represents the best play path, but over all outcomes of the belief state. An "expected-utility" function should be used to evaluate such a play, by calculating the expected utility over the leaf nodes of the play tree, taken in the context of a belief model. To compute the best play, the minimax function should, for each of the alternatives at the root of the game tree, call minimax at level two of the belief model to obtain the best act of the second agent at the second level of the game tree. At the third level of the game tree, minimax should be called at level one of the belief model to obtain the best act for the first agent, and so on. Therefore minimax is called recursively, alternating between levels one and two of the belief models to obtain the best play following each of the alternatives at the root. Once best plays have been obtained for each of the alternatives at the root, the evaluation function is applied to choose between them, and the one chosen, along with the play that follows it, is returned as the minimax play.

The expected-utility function terminates at the leaves of the game tree, where a function should be used to calculate the utility of the parsed dialogue history by adding up the contributions from each of the acts in the plan (see section 2.13).

Subsections

Next: Correctness Up: Design of the planner Previous: Example problem Contents

bmceleney 2006-12-19