Stochastic Games
Stochastic games represent dynamic interactions in which the environment changes in response to players' behavior. As Shapley put it, "In a stochastic game the play proceeds by steps from position to position, according to transition probabilities controlled jointly by the two players."
A stochastic game is played by a set of players. In each stage of the game, the play is in a given state (or position, in Shapley's language), taken from a set of states, and every player chooses an action from a set of available actions. The collection of actions that the players choose, together with the current state, determines the stage payoff that each player receives, as well as the probability distribution according to which the new state is selected.
Stochastic games extend the model of strategic-form games, which is attributable to von Neumann, to dynamic situations in which the environment changes in response to the players' choices.
Stochastic games - Complexity
The complexity of stochastic games arises from the fact that the choices made by the players have two contradictory effects. First, together with the current state, the players' actions determine the immediate payoff that each player receives. Second, the current state and the players' actions influence the choice of the new state, which determines the potential for future payoffs. In particular, when choosing his actions, each player has to balance these two forces, a decision that may often be difficult. This dichotomy is also present in one-player sequential decision problems, but it becomes harder to handle when the evolution of the state is influenced by several players at once.
Stochastic game - Definition
A stochastic game is a collection of normal-form games that the agents play repeatedly. The particular game played at any time depends probabilistically on the previous game played and the actions of the agents in that game. It can be viewed as a probabilistic finite state machine in which the states are the games and the transition labels are joint action-payoff pairs.
A stochastic game, also called a Markov game, is defined by the following components (a minimal representation sketch follows the list):
• a finite set Q of states (games);
• a set N = {1, ..., n} of agents;
• for each agent i, a finite set Ai of possible actions;
• a set of strategies Si(q) for each player i at each state q ∈ Q;
• a transition probability function P : Q × A1 × ... × An × Q → [0, 1], where P(q, a1, ..., an, q') is the probability of transitioning to state q' if the action profile (a1, ..., an) is used in state q;
• for each agent i, a real-valued payoff function ri : Q × A1 × ... × An → R (the set of real numbers), i.e., a reward that depends on the state and on the actions of all the players;
• each stage game is played at a set of discrete times t.
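As a concrete illustration, the following is a minimal Python sketch of how these components might be represented; the class name and field names are illustrative choices, not part of any standard library.

from dataclasses import dataclass
from typing import Dict, List, Tuple

JointAction = Tuple[str, ...]          # one action name per agent

@dataclass
class StochasticGame:
    states: List[str]                                   # Q
    agents: List[int]                                   # N = {1, ..., n}
    actions: Dict[int, List[str]]                       # Ai for each agent i
    # transitions[(q, joint_action)] -> distribution over next states q'
    transitions: Dict[Tuple[str, JointAction], Dict[str, float]]
    # rewards[(q, joint_action)] -> payoff vector, one entry per agent
    rewards: Dict[Tuple[str, JointAction], Tuple[float, ...]]

    def step_distribution(self, q: str, a: JointAction) -> Dict[str, float]:
        """Return P(q' | q, a1, ..., an) as a dictionary over next states."""
        return self.transitions[(q, a)]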
In the current discussion, the following assumptions are made with respect to stochastic games for the purpose of solving them:
1. The length of the game is not known in advance (an infinite horizon is used).
2. The rewards and transition probabilities do not change over time.
3. Only Markov strategies are considered when solving the game.
A history of length t in a stochastic game is the sequence of states that the game visited in the first t stages, together with the actions that the players played in the first t-1 stages. A strategy of a player is a prescription for how to play the game; that is, a function that assigns to every finite history an action to play should that history occur. A behavior strategy of a player is a function that assigns to every finite history a lottery over the set of available actions.
Earlier, a history was just a sequence of actions. But now there are action profiles rather than individual actions, and each profile has several possible outcomes. Thus a history is a sequence ht = (q0, a0, q1, a1, ..., at-1, qt), where t is the number of stages. As before, the two most common methods to aggregate stage payoffs into an overall payoff are the average reward and the future discounted reward.
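A minimal sketch of the two aggregation schemes, assuming a finite prefix r0, r1, ..., rT-1 of an agent's stage rewards and an illustrative discount factor beta in (0, 1):

def average_reward(rewards):
    """Average of the stage rewards observed so far (limit-of-means criterion)."""
    return sum(rewards) / len(rewards)

def discounted_reward(rewards, beta=0.9):
    """Future discounted reward: r0 + beta*r1 + beta^2*r2 + ..."""
    return sum((beta ** t) * r for t, r in enumerate(rewards))

stage_rewards = [1.0, 0.0, 2.0, 1.0]       # illustrative stage payoffs
print(average_reward(stage_rewards))        # 1.0
print(discounted_reward(stage_rewards))     # 1.0 + 0.0 + 1.62 + 0.729 = 3.349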
Stochastic games and MDPs
Note that stochastic games generalize both Markov Decision Processes (MDPs) and repeated games. An MDP is a stochastic game with only one player. A repeated game is a stochastic game with only one state; examples include the Iterated Prisoner's Dilemma, Roshambo, and the Iterated Battle of the Sexes.
Strategies for Solving Stochastic Games
• For agent i, a deterministic strategy specifies a choice of action for i at every stage of every possible history.
• A mixed strategy is a probability distribution over deterministic strategies.
• Several restricted classes of strategies are of interest:
• As in extensive-form games, a behavioral strategy is a mixed strategy in which the mixing takes place at each history independently.
• A Markov strategy is a behavioral strategy such that, for each time t, the distribution over actions depends only on the current state. However, the distribution may be different at time t than at time t'.
• A stationary strategy is a Markov strategy in which the distribution over actions depends only on the current state (not on the time t). A small representation sketch is given after this list.
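As an illustration, the following minimal Python sketch (type aliases and names are illustrative, not from any standard library) contrasts the two notions: a Markov strategy may condition on the time step, while a stationary strategy conditions on the state only.

from typing import Callable, Dict

State = str
ActionDistribution = Dict[str, float]     # action -> probability

# A Markov strategy: the distribution over actions may depend on (time, state).
MarkovStrategy = Callable[[int, State], ActionDistribution]

# A stationary strategy: the distribution over actions depends on the state only.
StationaryStrategy = Callable[[State], ActionDistribution]

def as_markov(pi: StationaryStrategy) -> MarkovStrategy:
    """Every stationary strategy is a Markov strategy that ignores the time step."""
    return lambda t, q: pi(q)

# Example: a stationary strategy that mixes 50/50 in every state.
uniform: StationaryStrategy = lambda q: {"a": 0.5, "b": 0.5}
print(as_markov(uniform)(7, "q0"))        # {'a': 0.5, 'b': 0.5}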
Zero-sum game
A zero-sum game is defined to have a "value" v if
(i) player 1 has a strategy (which is then said to be "optimal") that ensures that his expected overall payoff over time does not fall below v, no matter what strategy is followed by player 2, and
(ii) the symmetric property holds when the roles of the two players are exchanged.
Shapley proved the existence of a value.
Because the parameters that define the game are independent of time, the situation the players face today, if the play is in a certain state, is the same situation they would face tomorrow if the play were in that state tomorrow.
In particular, one expects to have optimal strategies that are stationary Markov, that is, strategies that depend only on the current state of the game. Shapley proved that such optimal strategies indeed exist, and characterized the value as the unique fixed point of a nonlinear functional operator, a two-player version of the dynamic programming principle.
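To make the fixed-point characterization concrete, here is a minimal sketch of value iteration with the Shapley operator for a discounted zero-sum stochastic game. It assumes NumPy and SciPy are available, that the stage payoffs at each state s form a matrix R[s] for the maximizing player, and that P[s][a][b] is a probability vector over next states; the function names are illustrative.

import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes), via linear programming."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0            # variables: mixed strategy x, value v; maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])    # for every column j: v - sum_i x_i M[i, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0   # x is a probability distribution
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    return linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds).x[-1]

def shapley_value_iteration(num_states, R, P, discount=0.9, iterations=100):
    """Iterate v(s) <- val[ R[s] + discount * sum_{s'} P[s][a][b][s'] * v(s') ] for every state s."""
    v = np.zeros(num_states)
    for _ in range(iterations):
        v_new = np.empty_like(v)
        for s in range(num_states):
            m, n = R[s].shape
            aux = np.array([[R[s][a, b] + discount * np.dot(P[s][a][b], v)
                             for b in range(n)] for a in range(m)])
            v_new[s] = matrix_game_value(aux)    # value of the auxiliary one-shot game at s
        v = v_new
    return v

Because the operator is a contraction with modulus equal to the discount factor, the iteration converges to the unique fixed point, which is the value of the game.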
The
existence of stationary Markov optimal strategies implies that, to play well, a
player needs to know only the current state. In particular, the value of the
game does not change if players receive partial information on each other's actions,
and/or if they forget previously visited states.
Solving stochastic games
Nash equilibrium in game theory
A Nash equilibrium is a concept of game theory describing an outcome of a game in which no player has an incentive to deviate from his chosen strategy after considering the opponents' choices. In other words, an individual can receive no incremental benefit from changing actions, assuming the other players keep their strategies fixed. A game may have multiple Nash equilibria or none at all.
The Nash equilibrium is the solution to a game in which two or more players each have a strategy and, with each participant considering the opponents' choices, no player has anything to gain by switching his strategy. In a Nash equilibrium, each player's strategy is optimal given the decisions of the other players, although the equilibrium outcome need not be the one every player would most desire. To quickly test whether a Nash equilibrium holds, reveal each player's strategy to the other players. If no one changes his strategy, a Nash equilibrium has been found.
For example, imagine a game between Anil and Sunil. In this simple game, both players can choose strategy A, to receive 1, or strategy B, to lose 1. Logically, both players choose strategy A and receive a payoff of 1. If one revealed Sunil's strategy to Anil and vice versa, one would see that neither player deviates from the original choice. Knowing the other player's move means little and does not change either player's behavior. The outcome (A, A) therefore represents a Nash equilibrium.
Prisoner's Dilemma
The prisoner's dilemma is a common situation analyzed in game theory that can employ the Nash equilibrium. In this game, two criminals are arrested and each is held in solitary confinement with no means of communicating with the other. The prosecutors do not have the evidence to convict the pair, so they offer each prisoner the opportunity either to betray the other by testifying that the other committed the crime or to cooperate by remaining silent. If both prisoners betray each other, each serves five years in prison. If A betrays B but B remains silent, prisoner A is set free and prisoner B serves 10 years in prison, and vice versa. If both remain silent, each serves just one year in prison. The Nash equilibrium in this example is for both players to betray each other. Even though mutual cooperation leads to a better outcome, if one prisoner cooperates while the other betrays, the cooperating prisoner's outcome is strictly worse.
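The following minimal Python sketch encodes the payoffs above (years in prison as negative utilities) and enumerates the pure-strategy Nash equilibria of the resulting bimatrix game; the helper function is illustrative, not from a library.

import numpy as np

# Payoffs from the text: both betray -> 5 years each; one betrays -> 0 vs 10 years;
# both silent -> 1 year each. Rows/columns: 0 = Betray, 1 = Silent.
A = np.array([[-5,   0],    # row player's payoff
              [-10, -1]])
B = np.array([[-5, -10],    # column player's payoff
              [ 0,  -1]])

def pure_nash_equilibria(A, B):
    """Return all pure-strategy Nash equilibria (i, j) of the bimatrix game (A, B)."""
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            row_best = A[i, j] >= A[:, j].max()   # row player cannot gain by deviating
            col_best = B[i, j] >= B[i, :].max()   # column player cannot gain by deviating
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

print(pure_nash_equilibria(A, B))   # [(0, 0)] -> (Betray, Betray)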
Irreducible stochastic game
A stochastic game is said to be irreducible if every state (game) can be reached with positive probability regardless of the strategies adopted.
Theorem: Every 2-player, general-sum, average-reward, irreducible stochastic game has a Nash equilibrium.
A payoff profile is feasible if it is a convex combination of the outcomes in a game, where the coefficients are rational numbers.
There is a folk theorem similar to the one for repeated games: if (p1, p2) is a feasible pair of payoffs such that each pi is at least as large as agent i's minimax value, then (p1, p2) can be achieved in equilibrium through the use of enforcement.
Backgammon - an Example of a Two-Player Zero-Sum Stochastic Game
• For two-player zero-sum stochastic games, the folk theorem still applies, but it becomes vacuous (empty).
• The situation is similar to what happens in repeated games: the only feasible pair of payoffs is the pair of minimax payoffs.
• One example of a two-player zero-sum stochastic game is Backgammon.
• Two agents take turns. Before his/her move, an agent must roll the dice.
• The set of available moves depends on the result of the dice roll.
• Mapping Backgammon onto a Markov game is straightforward, but slightly awkward. The basic idea is to give each move a stochastic outcome by combining it with the dice roll that comes after it.
• Every state is a pair: (current board, current dice configuration).
• Initial set of states = {initial board} × {all possible results of agent 1's first dice roll}.
• Set of possible states after agent 1's move = {the board produced by agent 1's move} × {all possible results of agent 2's dice roll}, and vice versa for agent 2's move. One can extend the minimax algorithm to deal with this (a small state-representation sketch follows this list).
• But it is easier if one does not try to combine the moves and the dice rolls; these two events should be kept separate.
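As an illustration of the (board, dice) state encoding and of keeping moves and dice rolls as separate events, here is a minimal Python sketch; the names and the simplified board representation are illustrative only.

import itertools
from typing import List, Tuple

Board = Tuple[int, ...]                 # simplified placeholder for a backgammon position
DiceRoll = Tuple[int, int]
State = Tuple[Board, DiceRoll]          # (current board, current dice configuration)

def all_dice_rolls() -> List[DiceRoll]:
    """The 21 distinct rolls of two dice (the order of the dice does not matter)."""
    return [(d1, d2) for d1, d2 in itertools.product(range(1, 7), repeat=2) if d1 <= d2]

def successor_states(board_after_move: Board) -> List[State]:
    """After a move, the next states pair the resulting board with every possible
    dice roll of the other agent; the roll is the separate chance event."""
    return [(board_after_move, roll) for roll in all_dice_rolls()]

print(len(all_dice_rolls()))            # 21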
Using the expectiminimax algorithm for solving stochastic games
• This algorithm is used to solve two-player zero-sum games in which each agent's move has a deterministic outcome.
• In addition to the two agents' moves, there are chance moves.
• The algorithm gives optimal play (highest expected utility). The algorithm is as follows.
function EXPECTIMINIMAX(s) returns an expected utility
    if s is a terminal state then return Max's payoff at s
    else if s is a "chance" node then
        return Σ P(s' | s) EXPECTIMINIMAX(s')
    else if it is Max's move at s then
        return max{EXPECTIMINIMAX(result(a, s)) : a is applicable to s}
    else return min{EXPECTIMINIMAX(result(a, s)) : a is applicable to s}
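The following minimal Python sketch mirrors the pseudocode above. The game interface (is_terminal, payoff, node_type, chance_outcomes, actions, result) is a hypothetical placeholder, not a library API.

def expectiminimax(state, game):
    """Expected-utility value of `state` for Max in a zero-sum game with chance nodes."""
    if game.is_terminal(state):
        return game.payoff(state)                 # Max's payoff at a terminal state
    kind = game.node_type(state)                  # "chance", "max" or "min"
    if kind == "chance":
        # Weight each successor's value by its probability.
        return sum(p * expectiminimax(s2, game)
                   for p, s2 in game.chance_outcomes(state))
    values = (expectiminimax(game.result(a, state), game)
              for a in game.actions(state))
    return max(values) if kind == "max" else min(values)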
• While finding a solution for a two-player stochastic game, the following points are to be noted.
1. Dice rolls increase the branching factor. There are 21 possible rolls with 2 dice. Given the dice roll, there are about 20 legal moves on average; for some dice rolls, it can be much higher. At depth 4 this gives 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes.
2. As depth increases, the probability of reaching a given node shrinks and the value of lookahead is diminished. In such situations, α-β pruning is less effective.
3. The TD-Gammon program uses only depth-2 search but has a very good evaluation function, and it achieves world-champion-level play.
4. Its evaluation function was created automatically using a machine learning technique called Temporal Difference learning; hence the 'TD' in the name 'TD-Gammon' stands for Temporal Difference learning.
Discounted Stochastic Games
In Shapley's original model, after every stage the play stops with a positive probability that may depend on the current state and on the players' actions; this is equivalent to a model in which players discount their future payoffs according to a discount factor that depends on the current state and on the players' actions. The game is called "discounted" if all stopping probabilities equal the same constant, and one minus this constant is called the "discount factor." Models of discounted stochastic games are prevalent in economics, where the discount factor has a clear economic interpretation.
In
non-zero-sum games, a collection of strategies, one for each player, is a
"(Nash) equilibrium" if no player can profit by deviating from his
strategy, assuming all other players follow their prescribed strategies.
Stochastic games - Applications
Stochastic games give a model for a large variety of dynamic interactions and are therefore useful in modeling real-life situations that arise in, e.g., economics, political science, and operations research. For the analysis of the game to provide decisive predictions and recommendations, the data that define it must have special features. Because applications are usually motivated by the search for clear-cut conclusions, only highly structured models of stochastic games have been studied.
The
significance of stochastic games is threefold. First, by modeling a dynamic
situation as a stochastic game, researchers must understand the structure of
the problem they face. Second, to simplify the model, they have to realize
which aspects of the model do not affect the outcome and can be dispensed with.
Third, the qualitative predictions of the model sometimes provide useful
conclusions. We provide here two applications of stochastic games.
1. One area that has been extensively studied as a stochastic game is the over-exploitation of a common resource. For example, Levhari and Mirman studied a fishery war between two countries. The state variable is the quantity of fish in a given area, which grows exponentially in the absence of human intervention. Each of the two countries has to determine the quantity of fish it allows its fishermen to catch, so as to maximize its long-run utility. The authors concluded that, in equilibrium, the fish population will be smaller than the population that would have resulted had the two countries cooperated and maximized their joint utility. The phenomenon of over-exploitation of a common resource is known in economics as the "tragedy of the commons."
2. Another application of stochastic games is that of market games with money. The origin of inflation has been studied in a market game with a continuum of agents and a central bank. At every stage, each player receives a random endowment of a perishable commodity, decides how much to lend to or borrow from the central bank, and consumes the amount that he has after this transaction. The conclusion of the analysis is that the mere presence of uncertainty in the endowments leads to inflation.
Solved Example
Example 5.1: Explain game playing with examples.
or
Explain the following terms in game playing with examples.
i) Minimax (Position, Depth, Player)
ii) Deep Enough (Position, Depth)
Solution:
Game playing - Game playing is one area in which substantial progress has been made in scaling up beyond toy problems. An IBM program named DEEP BLUE beat the reigning world chess champion, Garry Kasparov, by 3.5 to 2.5 in a six-game match. Championship performance has been achieved by sophisticated search algorithms, high-speed computers, and chess-specific hardware.
i) Minimax (Position, Depth, Player)
• In the minimax game-playing strategy, one can use a recursive procedure that needs to return not one but two results:
a) the backed-up value of the path it chooses, and
b) the path itself.
• Assuming that MINIMAX returns a structure containing both results, we have two functions, VALUE and PATH, that extract the separate components.
• Initially the MINIMAX procedure is called by specifying three parameters as MINIMAX(Position, Depth, Player), where
• Position indicates a board position,
• Depth indicates the current depth of the search, and
• Player indicates the player to move.
• So the initial call to compute the best move from the position CURRENT should be
MINIMAX(CURRENT, 0, PLAYER-ONE) if player one is to move, or
MINIMAX(CURRENT, 0, PLAYER-TWO) if player two is to move.
ii) Deep Enough (Position, Depth)
• A critical issue in the design of the MINIMAX procedure is when to stop the recursion and simply call the static evaluation function.
• There are a variety of factors that may influence this decision. They include:
• Has one side won?
• How many ply have we already explored?
• How promising is this path?
• How much time is left?
• How stable is the configuration?
• Thus a function DEEP-ENOUGH is assumed to evaluate all of these factors and to return TRUE if the search should be stopped at the current level and FALSE otherwise.
• One simple implementation of DEEP-ENOUGH takes two parameters, Position and Depth. It ignores its Position parameter and simply returns TRUE if its Depth parameter exceeds a constant cut-off value. A combined sketch of MINIMAX and DEEP-ENOUGH is given below.
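The following is a minimal Python sketch of this scheme. The game interface (static_value, moves, apply_move, opponent) is a hypothetical placeholder, and the cut-off depth is an illustrative constant; the values are not taken from the text.

from typing import List, NamedTuple

CUTOFF_DEPTH = 4                          # illustrative constant cut-off value

class Result(NamedTuple):
    value: float                          # backed-up value of the chosen path (VALUE)
    path: List[object]                    # the path itself (PATH)

def deep_enough(position, depth) -> bool:
    """Simple implementation: ignore the position, stop once the depth exceeds the cut-off."""
    return depth > CUTOFF_DEPTH

def minimax(position, depth, player, game) -> Result:
    """Return both the backed-up value and the path, from `player`'s point of view."""
    if deep_enough(position, depth) or not game.moves(position, player):
        return Result(game.static_value(position, player), [])
    best = None
    for move in game.moves(position, player):
        successor = game.apply_move(position, move)
        # The opponent's best value, negated, is this player's value for the move.
        opponent_result = minimax(successor, depth + 1, game.opponent(player), game)
        value = -opponent_result.value
        if best is None or value > best.value:
            best = Result(value, [move] + opponent_result.path)
    return best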
Review Questions
1. Explain the minimax procedure. Is this procedure a breadth-first or a depth-first search? (Refer section 5.11)
2. Explain various strategies of game playing. (Refer sections 5.9 and 5.10)