Stochastic Games
Stochastic games represent dynamic interactions in which the environment changes in response to players' behavior. As Shapley put it, "In a stochastic game the play proceeds by steps from position to position, according to transition probabilities controlled jointly by the two players."
A stochastic game is played by a set of players. In each stage of the game, the play is in a given state (or position, in Shapley's language), taken from a set of states, and every player chooses an action from a set of available actions. The collection of actions that the players choose, together with the current state, determines the stage payoff that each player receives, as well as the probability distribution according to which the new state is selected.
Stochastic games extend the model of strategic-form games, which is attributable to von Neumann, to dynamic situations in which the environment changes in response to the players' choices.
Stochastic games - Complexity
The complexity of stochastic games arises from the fact that the choices made by the players have two contradictory effects. First, together with the current state, the players' actions determine the immediate payoff that each player receives. Second, the current state and the players' actions influence the choice of the new state, which determines the potential for future payoffs. In particular, when choosing his actions, each player has to balance these two forces, a decision that may often be difficult. This dichotomy is also present in one-player sequential decision problems, but it becomes harder to handle when the evolution of the state is influenced by several players at once.
Stochastic game - Definition
A stochastic game is a collection of normal-form games that the agents play repeatedly. The particular game played at any time depends probabilistically on the previous game played and the actions of the agents in that game. It can be viewed as a probabilistic finite state machine in which the states are the games and the transition labels are joint action-payoff pairs.
A stochastic game, also called a Markov game, is defined by the following components (a minimal representation sketch follows the list):
• a finite set Q of states (games);
• a set N = {1, ..., n} of agents;
• for each agent i, a finite set Ai of possible actions;
• a set of strategies Si(q) for each player i at each state q ∈ Q;
• a transition probability function P : Q × A1 × ... × An × Q → [0, 1], where P(q, a1, ..., an, q') is the probability of transitioning to state q' if the action profile (a1, ..., an) is used in state q;
• for each agent i, a real-valued payoff function ri : Q × A1 × ... × An → R (the set of real numbers), i.e., a reward that depends on the state and on the actions of all the players;
• each stage game is played at a set of discrete times t.
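As a concrete illustration, the following is a minimal Python sketch of how these components might be represented; the class name and field names are illustrative choices, not part of any standard library.

from dataclasses import dataclass
from typing import Dict, List, Tuple

JointAction = Tuple[str, ...]          # one action name per agent

@dataclass
class StochasticGame:
    states: List[str]                                   # Q
    agents: List[int]                                   # N = {1, ..., n}
    actions: Dict[int, List[str]]                       # Ai for each agent i
    # transitions[(q, joint_action)] -> distribution over next states q'
    transitions: Dict[Tuple[str, JointAction], Dict[str, float]]
    # rewards[(q, joint_action)] -> payoff vector, one entry per agent
    rewards: Dict[Tuple[str, JointAction], Tuple[float, ...]]

    def step_distribution(self, q: str, a: JointAction) -> Dict[str, float]:
        """Return P(q' | q, a1, ..., an) as a dictionary over next states."""
        return self.transitions[(q, a)]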
In the current discussion, the following assumptions are made with respect to stochastic games for the purpose of solving them:
1. The length of the game is not known in advance (an infinite horizon is used).
2. The rewards and transition probabilities do not change over time.
3. Only Markov strategies are considered when solving the game.
A history of length t in a stochastic game is the sequence of states that the game visited in the first t stages, together with the actions that the players played in the first t-1 stages. A strategy of a player is a prescription for how to play the game; that is, a function that assigns to every finite history an action to play should that history occur. A behavior strategy of a player is a function that assigns to every finite history a lottery over the set of available actions.
Earlier, a history was just a sequence of actions. But now there are action profiles rather than individual actions, and each profile has several possible outcomes. Thus a history is a sequence ht = (q0, a0, q1, a1, ..., at-1, qt), where t is the number of stages. As before, the two most common methods to aggregate stage payoffs into an overall payoff are the average reward and the future discounted reward.
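A minimal sketch of the two aggregation schemes, assuming a finite prefix r0, r1, ..., rT-1 of an agent's stage rewards and an illustrative discount factor beta in (0, 1):

def average_reward(rewards):
    """Average of the stage rewards observed so far (limit-of-means criterion)."""
    return sum(rewards) / len(rewards)

def discounted_reward(rewards, beta=0.9):
    """Future discounted reward: r0 + beta*r1 + beta^2*r2 + ..."""
    return sum((beta ** t) * r for t, r in enumerate(rewards))

stage_rewards = [1.0, 0.0, 2.0, 1.0]       # illustrative stage payoffs
print(average_reward(stage_rewards))        # 1.0
print(discounted_reward(stage_rewards))     # 1.0 + 0.0 + 1.62 + 0.729 = 3.349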
Stochastic games and MDPs
Note that stochastic games generalize both Markov Decision Processes (MDPs) and repeated games. An MDP is a stochastic game with only one player. A repeated game is a stochastic game with only one state; examples include the Iterated Prisoner's Dilemma, Roshambo, and the Iterated Battle of the Sexes.
Strategies for Solving Stochastic Games
• For agent i, a deterministic strategy specifies a choice of action for i at every stage of every possible history.
• A mixed strategy is a probability distribution over deterministic strategies.
• Several restricted classes of strategies are of interest:
• As in extensive-form games, a behavioral strategy is a mixed strategy in which the mixing takes place at each history independently.
• A Markov strategy is a behavioral strategy such that, for each time t, the distribution over actions depends only on the current state. However, the distribution may be different at time t than at time t'.
• A stationary strategy is a Markov strategy in which the distribution over actions depends only on the current state (not on the time t). A small representation sketch is given after this list.
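As an illustration, the following minimal Python sketch (type aliases and names are illustrative, not from any standard library) contrasts the two notions: a Markov strategy may condition on the time step, while a stationary strategy conditions on the state only.

from typing import Callable, Dict

State = str
ActionDistribution = Dict[str, float]     # action -> probability

# A Markov strategy: the distribution over actions may depend on (time, state).
MarkovStrategy = Callable[[int, State], ActionDistribution]

# A stationary strategy: the distribution over actions depends on the state only.
StationaryStrategy = Callable[[State], ActionDistribution]

def as_markov(pi: StationaryStrategy) -> MarkovStrategy:
    """Every stationary strategy is a Markov strategy that ignores the time step."""
    return lambda t, q: pi(q)

# Example: a stationary strategy that mixes 50/50 in every state.
uniform: StationaryStrategy = lambda q: {"a": 0.5, "b": 0.5}
print(as_markov(uniform)(7, "q0"))        # {'a': 0.5, 'b': 0.5}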
Zero-sum game
A zero-sum game is defined to have a "value" v if
(i) player 1 has a strategy (which is then said to be "optimal") that ensures that his expected overall payoff over time does not fall below v, no matter what strategy is followed by player 2, and
(ii) the symmetric property holds when the roles of the two players are exchanged.
Shapley proved the existence of a value.
Because the parameters that define the game are independent of time, the situation the players face today, if the play is in a certain state, is the same situation they would face tomorrow if the play were in that state tomorrow.
In particular, one expects to have optimal strategies that are stationary Markov, that is, strategies that depend only on the current state of the game. Shapley proved that such optimal strategies indeed exist, and characterized the value as the unique fixed point of a nonlinear functional operator, a two-player version of the dynamic programming principle.
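To make the fixed-point characterization concrete, here is a minimal sketch of value iteration with the Shapley operator for a discounted zero-sum stochastic game. It assumes NumPy and SciPy are available, that the stage payoffs at each state s form a matrix R[s] for the maximizing player, and that P[s][a][b] is a probability vector over next states; the function names are illustrative.

import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes), via linear programming."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0            # variables: mixed strategy x, value v; maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])    # for every column j: v - sum_i x_i M[i, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0   # x is a probability distribution
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    return linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds).x[-1]

def shapley_value_iteration(num_states, R, P, discount=0.9, iterations=100):
    """Iterate v(s) <- val[ R[s] + discount * sum_{s'} P[s][a][b][s'] * v(s') ] for every state s."""
    v = np.zeros(num_states)
    for _ in range(iterations):
        v_new = np.empty_like(v)
        for s in range(num_states):
            m, n = R[s].shape
            aux = np.array([[R[s][a, b] + discount * np.dot(P[s][a][b], v)
                             for b in range(n)] for a in range(m)])
            v_new[s] = matrix_game_value(aux)    # value of the auxiliary one-shot game at s
        v = v_new
    return v

Because the operator is a contraction with modulus equal to the discount factor, the iteration converges to the unique fixed point, which is the value of the game.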
The
existence of stationary Markov optimal strategies implies that, to play well, a
player needs to know only the current state. In particular, the value of the
game does not change if players receive partial information on each other's actions,
and/or if they forget previously visited states.
Solving stochastic games
Nash equilibrium in game theory
A Nash equilibrium is a concept of game theory describing an outcome of a game in which no player has an incentive to deviate from his chosen strategy after considering the opponents' choices. In other words, an individual can receive no incremental benefit from changing actions, assuming the other players keep their strategies fixed. A game may have multiple Nash equilibria or none at all.
The Nash equilibrium is the solution to a game in which two or more players each have a strategy and, with each participant considering the opponents' choices, no player has anything to gain by switching his strategy. In a Nash equilibrium, each player's strategy is optimal given the decisions of the other players, although the equilibrium outcome need not be the one every player would most desire. To quickly test whether a Nash equilibrium holds, reveal each player's strategy to the other players. If no one changes his strategy, a Nash equilibrium has been found.
For example, imagine a game between Anil and Sunil. In this simple game, both players can choose strategy A, to receive 1, or strategy B, to lose 1. Logically, both players choose strategy A and receive a payoff of 1. If one revealed Sunil's strategy to Anil and vice versa, one would see that neither player deviates from the original choice. Knowing the other player's move means little and does not change either player's behavior. The outcome (A, A) therefore represents a Nash equilibrium.
Prisoner's Dilemma
The prisoner's dilemma is a common situation analyzed in game theory that can employ the Nash equilibrium. In this game, two criminals are arrested and each is held in solitary confinement with no means of communicating with the other. The prosecutors do not have the evidence to convict the pair, so they offer each prisoner the opportunity either to betray the other by testifying that the other committed the crime or to cooperate by remaining silent. If both prisoners betray each other, each serves five years in prison. If A betrays B but B remains silent, prisoner A is set free and prisoner B serves 10 years in prison, and vice versa. If both remain silent, each serves just one year in prison. The Nash equilibrium in this example is for both players to betray each other. Even though mutual cooperation leads to a better outcome, if one prisoner cooperates while the other betrays, the cooperating prisoner's outcome is strictly worse.
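The following minimal Python sketch encodes the payoffs above (years in prison as negative utilities) and enumerates the pure-strategy Nash equilibria of the resulting bimatrix game; the helper function is illustrative, not from a library.

import numpy as np

# Payoffs from the text: both betray -> 5 years each; one betrays -> 0 vs 10 years;
# both silent -> 1 year each. Rows/columns: 0 = Betray, 1 = Silent.
A = np.array([[-5,   0],    # row player's payoff
              [-10, -1]])
B = np.array([[-5, -10],    # column player's payoff
              [ 0,  -1]])

def pure_nash_equilibria(A, B):
    """Return all pure-strategy Nash equilibria (i, j) of the bimatrix game (A, B)."""
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            row_best = A[i, j] >= A[:, j].max()   # row player cannot gain by deviating
            col_best = B[i, j] >= B[i, :].max()   # column player cannot gain by deviating
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

print(pure_nash_equilibria(A, B))   # [(0, 0)] -> (Betray, Betray)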
Irreducible stochastic game
A stochastic game is said to be irreducible if every state (game) can be reached with positive probability regardless of the strategies adopted.
Theorem: Every 2-player, general-sum, average-reward, irreducible stochastic game has a Nash equilibrium.
A payoff profile is feasible if it is a convex combination of the outcomes in a game, where the coefficients are rational numbers.
There is a folk theorem similar to the one for repeated games: if (p1, p2) is a feasible pair of payoffs such that each pi is at least as large as agent i's minimax value, then (p1, p2) can be achieved in equilibrium through the use of enforcement.
Backgammon - an Example of a Two-Player Zero-Sum Stochastic Game
• For two-player zero-sum stochastic games, the folk theorem still applies, but it becomes vacuous (empty).
• The situation is similar to what happens in repeated games: the only feasible pair of payoffs is the pair of minimax payoffs.
• One example of a two-player zero-sum stochastic game is Backgammon.
• Two agents take turns. Before his/her move, an agent must roll the dice.
• The set of available moves depends on the result of the dice roll.
• Mapping Backgammon onto a Markov game is straightforward, but slightly awkward. The basic idea is to give each move a stochastic outcome by combining it with the dice roll that comes after it.
• Every state is a pair: (current board, current dice configuration).
• Initial set of states = {initial board} × {all possible results of agent 1's first dice roll}.
• Set of possible states after agent 1's move = {the board produced by agent 1's move} × {all possible results of agent 2's dice roll}, and vice versa for agent 2's move. One can extend the minimax algorithm to deal with this (a small state-representation sketch follows this list).
• But it is easier if one does not try to combine the moves and the dice rolls; these two events should be kept separate.
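As an illustration of the (board, dice) state encoding and of keeping moves and dice rolls as separate events, here is a minimal Python sketch; the names and the simplified board representation are illustrative only.

import itertools
from typing import List, Tuple

Board = Tuple[int, ...]                 # simplified placeholder for a backgammon position
DiceRoll = Tuple[int, int]
State = Tuple[Board, DiceRoll]          # (current board, current dice configuration)

def all_dice_rolls() -> List[DiceRoll]:
    """The 21 distinct rolls of two dice (the order of the dice does not matter)."""
    return [(d1, d2) for d1, d2 in itertools.product(range(1, 7), repeat=2) if d1 <= d2]

def successor_states(board_after_move: Board) -> List[State]:
    """After a move, the next states pair the resulting board with every possible
    dice roll of the other agent; the roll is the separate chance event."""
    return [(board_after_move, roll) for roll in all_dice_rolls()]

print(len(all_dice_rolls()))            # 21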
Using the expectiminimax algorithm for solving stochastic games
• This algorithm is used to solve two-player zero-sum games in which each agent's move has a deterministic outcome.
• In addition to the two agents' moves, there are chance moves.
• The algorithm gives optimal play (highest expected utility). The algorithm is as follows.
function EXPECTIMINIMAX(s) returns an expected utility
    if s is a terminal state then return Max's payoff at s
    else if s is a "chance" node then
        return Σ P(s' | s) EXPECTIMINIMAX(s')
    else if it is Max's move at s then
        return max{EXPECTIMINIMAX(result(a, s)) : a is applicable to s}
    else return min{EXPECTIMINIMAX(result(a, s)) : a is applicable to s}
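The following minimal Python sketch mirrors the pseudocode above. The game interface (is_terminal, payoff, node_type, chance_outcomes, actions, result) is a hypothetical placeholder, not a library API.

def expectiminimax(state, game):
    """Expected-utility value of `state` for Max in a zero-sum game with chance nodes."""
    if game.is_terminal(state):
        return game.payoff(state)                 # Max's payoff at a terminal state
    kind = game.node_type(state)                  # "chance", "max" or "min"
    if kind == "chance":
        # Weight each successor's value by its probability.
        return sum(p * expectiminimax(s2, game)
                   for p, s2 in game.chance_outcomes(state))
    values = (expectiminimax(game.result(a, state), game)
              for a in game.actions(state))
    return max(values) if kind == "max" else min(values)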
• While finding a solution for a two-player stochastic game, the following points are to be noted.
1. Dice rolls increase the branching factor. There are 21 possible rolls with 2 dice. Given the dice roll, there are about 20 legal moves on average; for some dice rolls, it can be much higher. At depth 4 this gives 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes.
2. As depth increases, the probability of reaching a given node shrinks and the value of lookahead is diminished. In such situations, α-β pruning is less effective.
3. The TD-Gammon program uses only depth-2 search but has a very good evaluation function, and it achieves world-champion-level play.
4. Its evaluation function was created automatically using a machine learning technique called Temporal Difference learning; hence the 'TD' in the name 'TD-Gammon' stands for Temporal Difference learning.
Discounted Stochastic Games
In Shapley's original model, after every stage the play stops with a positive probability that may depend on the current state and on the players' actions; this is equivalent to a model in which players discount their future payoffs according to a discount factor that depends on the current state and on the players' actions. The game is called "discounted" if all stopping probabilities equal the same constant, and one minus this constant is called the "discount factor." Models of discounted stochastic games are prevalent in economics, where the discount factor has a clear economic interpretation.
In
non-zero-sum games, a collection of strategies, one for each player, is a
"(Nash) equilibrium" if no player can profit by deviating from his
strategy, assuming all other players follow their prescribed strategies.
Stochastic games - Applications
Stochastic games give a model for a large variety of dynamic interactions and are therefore useful in modeling real-life situations that arise in, e.g., economics, political science, and operations research. For the analysis of the game to provide decisive predictions and recommendations, the data that define it must have special features. Because applications are usually motivated by the search for clear-cut conclusions, only highly structured models of stochastic games have been studied.
The
significance of stochastic games is threefold. First, by modeling a dynamic
situation as a stochastic game, researchers must understand the structure of
the problem they face. Second, to simplify the model, they have to realize
which aspects of the model do not affect the outcome and can be dispensed with.
Third, the qualitative predictions of the model sometimes provide useful
conclusions. We provide here two applications of stochastic games.
1. One area that has been extensively studied as a stochastic game is the over-exploitation of a common resource. For example, Levhari and Mirman studied a fishery war between two countries. The state variable is the quantity of fish in a given area, which grows exponentially in the absence of human intervention. Each of the two countries has to determine the quantity of fish it allows its fishermen to catch, so as to maximize its long-run utility. The authors concluded that, in equilibrium, the fish population will be smaller than the population that would have resulted had the two countries cooperated and maximized their joint utility. The phenomenon of over-exploitation of a common resource is known in economics as the "tragedy of the commons."
2. Another application of stochastic games is that of market games with money. The origin of inflation has been studied in a market game with a continuum of agents and a central bank. At every stage, each player receives a random endowment of a perishable commodity, decides how much to lend to or borrow from the central bank, and consumes the amount that he has after this transaction. The conclusion of the analysis is that the mere presence of uncertainty in the endowments leads to inflation.
Solved Example
Example 5.1: Explain game playing with examples.
or
Explain the following terms in game playing with examples.
i) Minimax (Position, Depth, Player)
ii) Deep Enough (Position, Depth)
Solution:
Game playing - Game playing is one area in which substantial progress has been made in scaling up beyond toy problems. An IBM program named DEEP BLUE beat the reigning world chess champion, Garry Kasparov, by 3.5 to 2.5 in a six-game match. Championship performance has been achieved by sophisticated search algorithms, high-speed computers, and chess-specific hardware.
i) Minimax (Position, Depth, Player)
• In the minimax game-playing strategy, one can use a recursive procedure that needs to return not one but two results:
a) the backed-up value of the path it chooses, and
b) the path itself.
• Assuming that MINIMAX returns a structure containing both results, we have two functions, VALUE and PATH, that extract the separate components.
• Initially the MINIMAX procedure is called by specifying three parameters as MINIMAX(Position, Depth, Player), where
• Position indicates a board position,
• Depth indicates the current depth of the search, and
• Player indicates the player to move.
• So the initial call to compute the best move from the position CURRENT should be
MINIMAX(CURRENT, 0, PLAYER-ONE) if player one is to move, or
MINIMAX(CURRENT, 0, PLAYER-TWO) if player two is to move.
ii) Deep Enough (Position, Depth)
• A critical issue in the design of the MINIMAX procedure is when to stop the recursion and simply call the static evaluation function.
• There are a variety of factors that may influence this decision. They include:
• Has one side won?
• How many ply have we already explored?
• How promising is this path?
• How much time is left?
• How stable is the configuration?
• Thus a function DEEP-ENOUGH is assumed to evaluate all of these factors and to return TRUE if the search should be stopped at the current level and FALSE otherwise.
• One simple implementation of DEEP-ENOUGH takes two parameters, Position and Depth. It ignores its Position parameter and simply returns TRUE if its Depth parameter exceeds a constant cut-off value. A combined sketch of MINIMAX and DEEP-ENOUGH is given below.
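The following is a minimal Python sketch of this scheme. The game interface (static_value, moves, apply_move, opponent) is a hypothetical placeholder, and the cut-off depth is an illustrative constant; the values are not taken from the text.

from typing import List, NamedTuple

CUTOFF_DEPTH = 4                          # illustrative constant cut-off value

class Result(NamedTuple):
    value: float                          # backed-up value of the chosen path (VALUE)
    path: List[object]                    # the path itself (PATH)

def deep_enough(position, depth) -> bool:
    """Simple implementation: ignore the position, stop once the depth exceeds the cut-off."""
    return depth > CUTOFF_DEPTH

def minimax(position, depth, player, game) -> Result:
    """Return both the backed-up value and the path, from `player`'s point of view."""
    if deep_enough(position, depth) or not game.moves(position, player):
        return Result(game.static_value(position, player), [])
    best = None
    for move in game.moves(position, player):
        successor = game.apply_move(position, move)
        # The opponent's best value, negated, is this player's value for the move.
        opponent_result = minimax(successor, depth + 1, game.opponent(player), game)
        value = -opponent_result.value
        if best is None or value > best.value:
            best = Result(value, [move] + opponent_result.path)
    return best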
Review Questions
1. Explain the minimax procedure. Is this procedure a breadth-first or a depth-first search? (Refer section 5.11)
2. Explain various strategies of game playing. (Refer sections 5.9 and 5.10)