StarCraft II Learning environment full overview (VI)

gema.parreno.piqueras
3 min read · Oct 6, 2018

Lab: Jumping into the machine learning agent

“Willingly.” — Protoss Mothership unit quote when given an order

Into the machine learning agent

Here is an overview of the agent's code: an informal walkthrough and descriptive inspection, intended to help others understand and improve the implementation. In a functional overview, the code performs the following steps.

  • Import statements from the pySC2, Keras and keras-rl libraries
  • Load actions from the API
  • Configure flags and parameters (a rough sketch of these first steps follows the list)
  • Configure the processor with observations and batches
  • Define the environment
  • Define the agent's DNN architecture
  • Train the agent on the game
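
As a rough sketch of those first steps, the import and configuration section might look like the snippet below. The flag names and values are placeholders for illustration, not the original script's.

from absl import flags
from pysc2.env import sc2_env       # used later to define the environment
from pysc2.lib import actions

from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

# Load actions from the pySC2 API (no_op is the simplest available action)
_NO_OP = actions.FUNCTIONS.no_op.id

# Configure flags and parameters (placeholder names and values)
FLAGS = flags.FLAGS
flags.DEFINE_string("map_name", "MoveToBeacon", "Mini-game map to play.")
flags.DEFINE_integer("screen_resolution", 64, "Resolution of the screen feature layers.")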

Agent Overview and DNN Architecture

We understand the agent as the learner, the decision maker. Many different agents could be used for this challenge. The goal of the agent is to learn a policy (a control strategy) that maximizes the expected return (the cumulative, discounted reward). The agent uses knowledge of state transitions of the form (st, at, st+1, rt+1) in order to learn and improve its policy. In DQN, we use a neural network as a function approximator for the Q-values.

Deep Q-Learning is a model-free, off-policy algorithm that defines the agent. If you want to know more about the basis of the agent, jump into Step 3 and read What is Deep Reinforcement Learning?
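
As a rough illustration of the idea (a simplified sketch, not the keras-rl internals), the DQN learning target bootstraps from a separate target network: for each transition, the Q-value of the action taken is regressed toward rt+1 + gamma * max Q_target(st+1). The q_network and target_network names below are hypothetical stand-ins for two copies of the network defined later.

import numpy as np

def dqn_targets(q_network, target_network, states, actions, rewards, next_states, dones, gamma=0.99):
    # Current Q-value estimates for every action in each state
    targets = q_network.predict(states)
    # Q-values of the next states, taken from the slowly updated target network
    next_q = target_network.predict(next_states)
    for i in range(len(states)):
        if dones[i]:
            targets[i, actions[i]] = rewards[i]  # terminal state: no bootstrapping
        else:
            targets[i, actions[i]] = rewards[i] + gamma * np.max(next_q[i])
    return targets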

The agent looks at the feature layers through a CNN-LSTM network. This is an LSTM architecture specifically designed for sequence prediction problems with spatial inputs, such as images or videos. The architecture uses Convolutional Neural Network (CNN) layers for feature extraction on the input data, combined with an LSTM to support sequence prediction.

In a general approach, this neural network architecture has been used for activity recognition and image and video description.

from keras.models import Sequential
from keras.layers import (Conv2D, Activation, MaxPooling2D, Dropout,
                          Flatten, Dense, Reshape, LSTM)

def neural_network_model(input, actions):
    model = Sequential()
    # Define the CNN feature-extraction block
    print(input)  # print the input shape passed in
    model.add(Conv2D(64, kernel_size=(5, 5), input_shape=input))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None))
    model.add(Dropout(0.3))
    model.add(Conv2D(128, kernel_size=(3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None))
    model.add(Dropout(0.3))
    model.add(Conv2D(256, kernel_size=(3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None))
    model.add(Dropout(0.3))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Reshape((1, 256)))
    # Add some memory: an LSTM on top of the flattened CNN features
    model.add(LSTM(256))
    model.add(Dense(actions, activation='softmax'))
    model.summary()
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model

The neural network architecture is made of three Conv2D layers with different kernel sizes and dropout, followed by a final LSTM layer.
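
For instance, assuming a hypothetical 64x64 single-channel feature-layer input and 10 available actions (illustrative values, not the script's actual shapes), the model could be built with:

model = neural_network_model((64, 64, 1), 10)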

Policy and Agent

from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

# Policy: linearly annealed epsilon-greedy exploration
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr="eps", value_max=1, value_min=0.2,
                              value_test=.0, nb_steps=1e6)

The policy helps with the selection of the action to take in the environment. The linear annealing policy computes a current threshold value and passes it to an inner policy, which chooses the action. The threshold value follows a linear function that decreases over time. In this case we use eps-greedy action selection, which means that a random action is selected with probability eps. The value_max and value_min thresholds regulate how much the agent explores the environment before gradually sticking to what it knows.
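
To make the annealing concrete, the sketch below reproduces the idea (not keras-rl's exact code): epsilon starts at value_max, decreases linearly over nb_steps, and then stays at value_min.

def annealed_eps(step, value_max=1.0, value_min=0.2, nb_steps=1e6):
    # Linear decay from value_max to value_min over nb_steps, clamped afterwards
    fraction = min(float(step) / float(nb_steps), 1.0)
    return value_max - fraction * (value_max - value_min)

# annealed_eps(0) -> 1.0, annealed_eps(5e5) -> 0.6, annealed_eps(1e6) -> 0.2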

from rl.agents.dqn import DQNAgent
from keras.optimizers import Adam

# Agent: memory and processor are configured earlier in the script
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               enable_double_dqn=True, enable_dueling_network=True,
               nb_steps_warmup=500, target_model_update=1e-2, policy=policy,
               batch_size=150, processor=processor, delta_clip=1)
dqn.compile(Adam(lr=.001), metrics=["mae", "acc"])

The keras-rl DQNAgent class builds the agent. The model argument refers to the neural network coded above, so if you change the model you can use a different neural network as the approximation function; nb_actions takes the number of actions available to the agent, which are printed in the console when you run the agent. The target_model_update and delta_clip parameters relate to optimization and stable learning in deep reinforcement learning: target_model_update controls how often the weights are copied to the target network, and delta_clip refers to the Huber loss implementation.
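
For intuition on delta_clip: the Huber loss is quadratic for small errors and linear for large ones, which keeps big TD errors from producing exploding gradients. A minimal sketch of the function (not the keras-rl implementation):

import numpy as np

def huber_loss(error, delta=1.0):
    # Quadratic inside [-delta, delta], linear outside
    quadratic = 0.5 * np.square(error)
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.where(np.abs(error) <= delta, quadratic, linear)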

Other callbacks, such as TensorBoard, are added in order to visualize the learning.
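
A hedged sketch of how such a callback might be attached during training; the log directory and step count are placeholders, and env stands for the SC2 environment wrapper defined elsewhere in the script.

from keras.callbacks import TensorBoard

callbacks = [TensorBoard(log_dir="./logs")]
dqn.fit(env, nb_steps=1e6, callbacks=callbacks, verbose=2)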

The DQN algorithm has its own challenges. In the keras-rl library you can implement replay memory, the target network and the Huber loss in a straightforward way through hyperparameters. For further evolutions and variations of DQN, you can study Double Deep Q-Learning, Dueling Q-Learning and the Rainbow algorithm.

Congratulations!

Now you have gone through the agent's code and the main concepts of DRL.

Thanks for being around!

Click below to go to the next step:

https://medium.com/@gema.parreno.piqueras/starcraft-ii-learning-environment-full-overview-vii-8b0775c1f3a7
