An AI game player is nothing but an intelligent agent with a clear goal: to win the game and defeat all other players. Artificial Intelligence experiments have achieved surprising results when it comes to games. Today, no human can defeat an AI in the game of chess.
The game Go was the last game where human grandmasters could consistently defeat a computer player. However, in 2017, Google DeepMind's AlphaGo defeated the world's top-ranked Go player.
Intelligent Agents in Games
An intelligent agent plays according to the rules of the game. The agent senses the current state of the game through its sensors, evaluates the utility of potential moves, and, once it finds the best possible move, performs the action using its actuators. The agent finds the best possible action to reach its goal based on the information it has. Actions are either rewarded or punished. The carrot and the stick are classic examples of reward and punishment. Imagine a donkey in front of a cart. You hold a carrot in front of the donkey's eyes, so the animal starts walking toward it. As soon as the donkey stops, the rider applies the stick. This is a crude mechanism, but rewards and punishment control living organisms to some extent. The same happens to humans at school, at work, and in everyday life: instead of carrots and sticks, we have income and legal punishment to shape our behavior.
In most games and gamified applications, a good sequence of actions results in a reward. When a human player feels rewarded, a neurotransmitter called dopamine is released. Dopamine is often referred to as the chemical of reward: when a human achieves a goal or completes a task, dopamine is released, and this makes you feel happy. Humans tend to act in ways that maximize their happiness. This cycle of action and reward is called a compulsion loop. Intelligent agents, on the other hand, are only interested in their goal, which is to maximize their reward and minimize their punishment.
When modeling games, we must determine their state space. An action causes a state transition. When we explore the consequences of all possible actions, we get a decision tree. This tree goes deeper as we start exploring the possible future actions of all players until the game ends.
The strength of a computer player lies in evaluating millions of possible moves each second. Therefore, game AI often boils down to a search exercise. When we explore all of the possible sequences of moves in a game, we get the state tree of the game.
Consider a chess AI. What is the problem with evaluating all possible moves by building a state tree consisting of all of the possible sequences of moves?
Complexity-wise, generalized chess is an EXPTIME-complete game: the number of possible moves explodes combinatorially.
White starts with 20 possible moves: each of the 8 pawns may advance either one or two squares, and each of the two knights has two possible destinations. Black can then reply with any of its own twenty moves. There are already 20*20 = 400 possible positions after just one move per player.
After White's second move, we get 8,902 possible move sequences, and this number just keeps on growing. Take just seven moves, and you have to search through 10,921,506 possible positions.
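To get a feel for this growth, here is a minimal sketch that treats the branching factor as a constant 20 moves per half-move (a simplification for illustration; the real number of legal moves varies with the position):

```python
# Rough size of the chess search tree, assuming a constant
# branching factor of 20 legal moves per half-move.
BRANCHING_FACTOR = 20

def move_sequences(depth):
    """Number of move sequences after `depth` half-moves."""
    return BRANCHING_FACTOR ** depth

move_sequences(2)   # 400, matching the count after one move per player
move_sequences(10)  # already more than ten trillion sequences
```

Even under this simplified model, ten half-moves produce more sequences than any player could ever examine one by one.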
The average length of a chess game is approximately 40 moves. Some exceptional games take more than 200 moves to finish.
As a consequence, the computer player simply does not have time to explore the whole state space. Therefore, the search activity has to be guided with proper rewards, punishment, and simplifications of the rules.
Breadth First Search and Depth First Search
Creating a game AI is often a search exercise. Therefore, we need to be familiar with the two primary search techniques: Breadth First Search (BFS) and Depth First Search (DFS).
These search techniques are applied on a directed rooted tree. A tree is a data structure consisting of nodes and edges connecting those nodes, such that any two nodes of the tree are connected by exactly one path:
Figure 1.3: A directed rooted tree
When the tree is rooted, there is a special node in the tree called the root, where we begin our traversal. A directed tree is a tree where the edges may only be traversed in one direction. Nodes may be internal nodes or leaves. Internal nodes have at least one edge through which we can leave the node. A leaf has no edges pointing out from the node.
In AI search, the root of the tree is the starting state. We traverse from this state by generating the successor nodes of the search tree. Search techniques differ in the order in which they visit these successor nodes.
Suppose we have a tree defined by its root, and a function that generates all the successor nodes from the root. In this example, each node has a value and a depth. We start from 1 and may either increase the value by 1 or 2. Our goal is to reach the value 5.
root = {'value': 1, 'depth': 1}

def succ(node):
    if node['value'] == 5:
        return []
    elif node['value'] == 4:
        return [{'value': 5, 'depth': node['depth'] + 1}]
    else:
        return [
            {'value': node['value'] + 1, 'depth': node['depth'] + 1},
            {'value': node['value'] + 2, 'depth': node['depth'] + 1}
        ]
We will first perform BFS on this example:
def bfs_tree(node):
    nodes_to_visit = [node]
    visited_nodes = []
    while len(nodes_to_visit) > 0:
        current_node = nodes_to_visit.pop(0)
        visited_nodes.append(current_node)
        nodes_to_visit.extend(succ(current_node))
    return visited_nodes

bfs_tree(root)
The output is as follows:
[{'depth': 1, 'value': 1},
{'depth': 2, 'value': 2},
{'depth': 2, 'value': 3},
{'depth': 3, 'value': 3},
{'depth': 3, 'value': 4},
{'depth': 3, 'value': 4},
{'depth': 3, 'value': 5},
{'depth': 4, 'value': 4},
{'depth': 4, 'value': 5},
{'depth': 4, 'value': 5},
{'depth': 4, 'value': 5},
{'depth': 5, 'value': 5}]
Notice that breadth first search finds the shortest path to a leaf first, because it enumerates all nodes in the order of increasing depth.
If we had to traverse a graph instead of a directed rooted tree, breadth first search would look different: whenever we visit a node, we would have to check whether the node had been visited before. If the node had been visited before, we would simply ignore it.
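A minimal sketch of that graph variant, using a small hypothetical adjacency-list graph (the neighbors dictionary is illustrative and not part of the chapter's exercises):

```python
def bfs_graph(start, neighbors):
    """BFS over a graph; `neighbors` maps each node to its adjacent nodes."""
    visited = [start]
    queue = [start]
    while queue:
        current = queue.pop(0)
        for node in neighbors[current]:
            if node not in visited:   # skip nodes we have seen before
                visited.append(node)
                queue.append(node)
    return visited

# A small example graph containing a cycle: A-B, A-C, B-C, C-D.
graph = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B', 'D'], 'D': ['C']}
bfs_graph('A', graph)  # ['A', 'B', 'C', 'D']
```

Without the visited check, the cycle between A, B, and C would make the traversal loop forever.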
In this chapter, we only use breadth first traversal on trees. Depth First Search is surprisingly similar to Breadth First Search. The difference between the two is the order in which the nodes are visited: while BFS visits all the children of a node before visiting any other nodes, DFS digs deep into the tree first:
def dfs_tree(node):
    nodes_to_visit = [node]
    visited_nodes = []
    while len(nodes_to_visit) > 0:
        current_node = nodes_to_visit.pop()
        visited_nodes.append(current_node)
        nodes_to_visit.extend(succ(current_node))
    return visited_nodes

dfs_tree(root)
The output is as follows:
[{'depth': 1, 'value': 1},
{'depth': 2, 'value': 3},
{'depth': 3, 'value': 5},
{'depth': 3, 'value': 4},
{'depth': 4, 'value': 5},
{'depth': 2, 'value': 2},
{'depth': 3, 'value': 4},
{'depth': 4, 'value': 5},
{'depth': 3, 'value': 3},
{'depth': 4, 'value': 5},
{'depth': 4, 'value': 4},
{'depth': 5, 'value': 5}]
As you can see, the DFS algorithm digs deep fast. It does not necessarily find the shortest path first, but it is guaranteed to find a leaf before exploring a second path.
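The same depth-first order of exploration can also be written recursively. This sketch restates the succ function from earlier so that it runs on its own; note that the recursive version expands children in the order succ lists them, whereas the stack-based dfs_tree pops the last successor first, so the two branches are explored in opposite order:

```python
def succ(node):
    # Successor function from the earlier example: increase the
    # value by 1 or 2 until we reach the goal value of 5.
    if node['value'] == 5:
        return []
    elif node['value'] == 4:
        return [{'value': 5, 'depth': node['depth'] + 1}]
    return [
        {'value': node['value'] + 1, 'depth': node['depth'] + 1},
        {'value': node['value'] + 2, 'depth': node['depth'] + 1}
    ]

def dfs_tree_recursive(node):
    """Recursive DFS: visit the node, then fully explore each child."""
    visited = [node]
    for child in succ(node):
        visited.extend(dfs_tree_recursive(child))
    return visited

dfs_tree_recursive({'value': 1, 'depth': 1})  # 12 nodes in total
```

Both variants visit the same 12 nodes of this tree; only the order of the branches differs.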
In game AI, the BFS algorithm is often better suited to evaluating game states, because DFS may wander down extremely deep move sequences. At the start of a chess game, for instance, a DFS algorithm can easily get lost exploring one very long line of play while ignoring promising alternatives.
Exploring the State Space of a Game
Let's explore the state space of a simple game: Tic-Tac-Toe.
In Tic-Tac-Toe, a 3x3 game board is given. Two players play this game: one plays with the sign X, and the other with the sign O. X starts the game, and the players then alternate moves. The goal of the game is to get three of your own signs in a row horizontally, vertically, or diagonally.
Let's denote the cells of the Tic-Tac-Toe board as follows:
Figure 1.4: Tic-Tac-Toe Board
In the following example, X started at position 1. O retaliated at position 5, X made a move at position 9, and then O moved to position 3:
Figure 1.5: Tic-Tac-Toe Board with noughts and crosses
This was a mistake by the second player, because now X can place a sign on cell 7, creating a fork: two ways of winning the game. It does not matter whether O defends by moving to cell 4 or 8 – X will win the game by selecting the other cell.
Note
You can try out the game at http://www.half-real.net/tictactoe/.
For simplicity, we will only explore the state space belonging to the cases where the AI player starts. We will start with an AI player that plays randomly, placing a sign in a random empty cell. After playing with this AI player, we will create a complete decision tree. Once we generate all possible game states, you will experience their combinatorial explosion. As our goal is to make these complexities simple, we will use several different techniques to make the AI player smarter and to reduce the size of the decision tree. By the end of this experiment, we will have a decision tree with fewer than 200 different game endings, and, as a bonus, the AI player will never lose a single game.
To make a random move, you will have to know how to choose a random element from a list using Python. We will use the choice function of the random library:

from random import choice
choice([2, 4, 6, 8])

The output is a random element of the list, for example, 6.
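If you want reproducible results while experimenting, you can seed the random number generator: with the same seed, choice produces the same sequence of picks. This is a debugging convenience, not something the exercises require:

```python
from random import seed, choice

seed(42)                       # fix the random state
first = choice([2, 4, 6, 8])
seed(42)                       # reset to the same state
second = choice([2, 4, 6, 8])
first == second                # True: same seed, same pick
```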
Note
We will use the factorial notation in the following exercise. Factorial is denoted by the exclamation mark ("!"). By definition, 0! = 1, and n! = n*(n-1)!. In our example, 9! = 9*8! = 9*8*7! = … = 9*8*7*6*5*4*3*2*1 = 362,880.
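Python's standard library provides factorial directly, which is a handy way to check the numbers used in the exercise below (a convenience, not part of the exercise itself):

```python
from math import factorial

factorial(0)  # 1, by definition
factorial(9)  # 362880, the number of orderings of nine cells
```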
Exercise 2: Estimating the Number of Possible States in Tic-Tac-Toe Game
Make a rough estimate of the number of possible states on each level of the state space of the Tic-Tac-Toe game:
- In our estimation, we will not stop until all of the cells of the board have been filled. A player might win before the game ends, but for the sake of uniformity, we will continue the game.
- The first player will choose one of the nine cells. The second player will choose one out of the eight remaining cells. The first player can then choose one out of the seven remaining cells. This goes on until either player wins the game, or the first player is forced to make the ninth and last move.
- The number of possible decision sequences is therefore 9! = 362,880. Some of these sequences are invalid, because a game may end in fewer than nine moves. It takes at least five moves to win a game, because the first player needs to move three times.
- To calculate the exact size of the state space, we would have to calculate the number of games that are won in five, six, seven, and eight steps. This calculation is simple, but due to its exhaustive nature, it is out of scope for us. We will therefore settle for the magnitude of the state space.
Note
After generating all possible Tic-Tac-Toe games, researchers counted 255,168 possible games. Out of those games, 131,184 were won by the first player, 77,904 were won by the second player, and 46,080 games ended with a draw. Visit http://www.half-real.net/tictactoe/allgamesoftictactoe.zip to download all possible Tic-Tac-Toe games.
Even a simple game like Tic-Tac-Toe has a lot of states. Just imagine how hard it would be to start exploring all possible chess games. Therefore, we can conclude that brute-force search is rarely ideal.
Exercise 3: Creating an AI Randomly
In this section, we'll create a framework for the Tic-Tac-Toe game for experimentation. We will model the game on the assumption that the AI player always starts the game. Create a function that prints the internal representation of the board, allow your opponent to enter a move, and let the AI choose its move randomly. Then, determine whether a player has won. To complete this exercise correctly, you will need to have completed the previous exercises:
- We will import the choice function from the random library:

from random import choice
- We will model the nine cells as a simple string for simplicity. A nine-character-long Python string stores these cells in the order "123456789". Let's determine the index triples that must contain matching signs for a player to win the game:

combo_indices = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
    [0, 3, 6],
    [1, 4, 7],
    [2, 5, 8],
    [0, 4, 8],
    [2, 4, 6]
]
- Let's define the sign constants for empty cells, the AI, and the opponent player:
EMPTY_SIGN = '.'
AI_SIGN = 'X'
OPPONENT_SIGN = 'O'
- Let's create a function that prints a board. We will add an empty row before and after the board so that we can easily read the game state:
def print_board(board):
    print(" ")
    print(' '.join(board[:3]))
    print(' '.join(board[3:6]))
    print(' '.join(board[6:]))
    print(" ")
- We will describe a move of the human player. The input arguments are the board, a row number from 1 to 3, and a column number from 1 to 3. The return value is a board containing the new move:
def opponent_move(board, row, column):
    index = 3 * (row - 1) + (column - 1)
    if board[index] == EMPTY_SIGN:
        return board[:index] + OPPONENT_SIGN + board[index+1:]
    return board
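To verify the index arithmetic, here is the same function restated with the two sign constants so the snippet runs on its own. Row 2, column 3 maps to index 3 * (2 - 1) + (3 - 1) = 5:

```python
EMPTY_SIGN = '.'
OPPONENT_SIGN = 'O'

def opponent_move(board, row, column):
    # Convert a 1-based (row, column) pair to a 0-based string index.
    index = 3 * (row - 1) + (column - 1)
    if board[index] == EMPTY_SIGN:
        return board[:index] + OPPONENT_SIGN + board[index+1:]
    return board  # occupied cell: the board is returned unchanged

opponent_move(EMPTY_SIGN * 9, 2, 3)  # '.....O...'
```

Note that a move onto an occupied cell leaves the board unchanged rather than raising an error.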
- It is time to define a random move of the AI player. We will generate all possible moves with the all_moves_from_board function, and then select a random move from the list of possible moves:

def all_moves_from_board(board, sign):
    move_list = []
    for i, v in enumerate(board):
        if v == EMPTY_SIGN:
            move_list.append(board[:i] + sign + board[i+1:])
    return move_list

def ai_move(board):
    return choice(all_moves_from_board(board, AI_SIGN))
- After defining the moves, we have to determine whether a player has won the game:
def game_won_by(board):
    for index in combo_indices:
        if board[index[0]] == board[index[1]] == board[index[2]] != EMPTY_SIGN:
            return board[index[0]]
    return EMPTY_SIGN
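A quick self-contained check of this function, restating combo_indices and the empty sign from the earlier steps:

```python
EMPTY_SIGN = '.'
combo_indices = [
    [0, 1, 2], [3, 4, 5], [6, 7, 8],   # rows
    [0, 3, 6], [1, 4, 7], [2, 5, 8],   # columns
    [0, 4, 8], [2, 4, 6]               # diagonals
]

def game_won_by(board):
    # Return the winner's sign, or the empty sign if nobody has won.
    for index in combo_indices:
        if board[index[0]] == board[index[1]] == board[index[2]] != EMPTY_SIGN:
            return board[index[0]]
    return EMPTY_SIGN

game_won_by('XXX.OO...')  # 'X': the top row is complete
game_won_by('X...O....')  # '.': nobody has won yet
```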
- Last, but not least, we will create a game loop so that we can test the interaction between the computer player and the human player (in the examples that follow, we will carry out an exhaustive search instead):
def game_loop():
    board = EMPTY_SIGN * 9
    empty_cell_count = 9
    is_game_ended = False
    while empty_cell_count > 0 and not is_game_ended:
        if empty_cell_count % 2 == 1:
            board = ai_move(board)
        else:
            row = int(input('Enter row: '))
            col = int(input('Enter column: '))
            board = opponent_move(board, row, col)
        print_board(board)
        is_game_ended = game_won_by(board) != EMPTY_SIGN
        empty_cell_count = sum(
            1 for cell in board if cell == EMPTY_SIGN
        )
    print('The game has ended.')
- Use the game_loop function to run the game:

game_loop()
As you can see, even an AI player that moves randomly may win from time to time if its opponent makes a mistake.
Activity 1: Generating All Possible Sequences of Steps in a Tic-Tac-Toe Game
This activity will explore the combinatorial explosion that occurs when two players play randomly. Building on the previous results, we will write a program that generates all possible sequences of moves between a computer player and a human player. Since we assume the human player may make any possible move and the computer player is playing randomly, we will examine the wins, losses, and draws produced by two randomly playing players:
- Create a function that maps the all_moves_from_board function onto each element of a list of board states. This way, we will have all of the nodes of a decision tree.
- The decision tree starts with [EMPTY_SIGN * 9] and expands after each move. Let's create a filter_wins function that takes finished games out of the list of moves and appends them to arrays containing the board states won by the AI player and by the opponent.
- Then, create a count_possibilities function that prints the number of decision tree leaves that ended with a draw, were won by the first player, and were won by the second player.
- We have up to nine steps in each game. In the 0th, 2nd, 4th, 6th, and 8th iterations, the AI player moves; in all other iterations, the opponent moves. We create all possible moves in every step and take finished games out of the move list.
- Finally, execute the count_possibilities function to experience the combinatorial explosion.
As you can see, the tree of board states consists of 266,073 leaves. The count_possibilities
function essentially implements a BFS algorithm to traverse all the possible states of the game. Notice that we count some states multiple times, because placing an X in the top-right corner in step 1 and in the top-left corner in step 3 leads to the same states as placing them in the opposite order. If we implemented the detection of duplicate states, we would have to check fewer nodes. However, at this stage, due to the limited depth of the game, we'll omit this step.
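Duplicate detection could be sketched as follows: keep a set of boards we have already generated, and keep each distinct board only once. This is an illustration on plain board strings, not part of the activity's solution:

```python
def expand_unique(boards, sign, empty_sign='.'):
    """Generate all successor boards, keeping each distinct board once."""
    seen = set()
    unique_successors = []
    for board in boards:
        for i, cell in enumerate(board):
            if cell == empty_sign:
                successor = board[:i] + sign + board[i+1:]
                if successor not in seen:   # skip duplicate states
                    seen.add(successor)
                    unique_successors.append(successor)
    return unique_successors

# Two boards whose successors overlap: each has 8 empty cells, but the
# board 'X.......X' can be reached from both, so only 15 boards remain.
len(expand_unique(['X........', '........X'], 'X'))  # 15 instead of 16
```

Because strings are hashable, the set membership check is cheap, so deduplication costs little while shrinking the tree at every level.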
A decision tree is very similar to the data structure examined by count_possibilities
. In a decision tree, we explore the utility of each move by investigating all possible future steps up to a certain extent. In our example, we could calculate the utility of the first moves by observing the number of wins and losses after fixing the first few moves.
Note
The root of the tree is the initial state. An internal state of the tree is a state in which a game has not been ended and moves are possible. A leaf of the tree contains a state where a game has ended.
The solution for this activity can be found on page 258.