StarCraft II Learning environment full overview (IV)

gema.parreno.piqueras
5 min read · Oct 6, 2018

Theory: pySC2 environment

(Sound effect: Protoss Probe unit selection quote)

Overview

This section offers a distilled version of the pySC2 research paper published by DeepMind and Blizzard.

StarCraft II presents itself as a grand challenge for Reinforcement Learning for several reasons. The imperfect information of the game forces the player to guess what the opponent is doing: because of the fog of war, it is a Partially Observable Markov Decision Process. Besides, the huge action space (a combinatorial space of roughly 10^8 possibilities) urges the need for hierarchy management, which might change as the player's tech strategy evolves. There is also an economy to manage. Finally, real-time simultaneity, with fast-paced decisions and multitasking, makes the credit assignment problem a necessity to solve.

As the paper underlines, the StarCraft II Learning Environment offers a new environment for exploring DRL algorithms and architectures. Here we will extract some useful concepts that help to understand it. The full potential of this environment has yet to be seen, as it is a multi-agent environment at both a high and a low level: at the high level, several players compete for control over the map and resources, and ultimately to win the match; at the low level, each player controls an army that must cooperate to achieve a common goal.

The StarCraft II Learning Environment release consists of three sub-components, two of which are covered here:

  • SC2 API: used to start a game, get observations, take actions and review replays.
  • pySC2: a Python environment that wraps the SC2 API to ease the interaction between agents and StarCraft II. It defines the action and observation specifications (see the sketch below).
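To get a feel for how these pieces fit together, here is a minimal sketch of running an agent through pySC2 (assuming pysc2 and StarCraft II are installed; the map name, screen/minimap resolutions and step_mul are illustrative choices, not values mandated by the paper):

```python
# Minimal sketch: run pySC2's built-in random agent on a mini-game map.
from absl import app
from pysc2.agents import random_agent
from pysc2.env import run_loop, sc2_env
from pysc2.lib import features

def main(unused_argv):
    with sc2_env.SC2Env(
        map_name="MoveToBeacon",  # one of the standard mini-games
        players=[sc2_env.Agent(sc2_env.Race.terran)],
        agent_interface_format=features.AgentInterfaceFormat(
            feature_dimensions=features.Dimensions(screen=84, minimap=64)),
        step_mul=8,  # game steps per agent step
    ) as env:
        agent = random_agent.RandomAgent()
        run_loop.run_loop([agent], env, max_episodes=1)

if __name__ == "__main__":
    app.run(main)
```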

Game description and reward structure

The ultimate goal of the game is to win, so the reward structure is defined as win (+1) / tie (0) / loss (-1). This ternary win/tie/loss signal is sparse, so it is accompanied by the Blizzard score, which gives an intuition about the player's development during the game (training units, constructing buildings and researching tech) and might, in some cases, be used as a reinforcement learning reward. To win, the player should accumulate resources, construct production buildings, amass an army and eliminate all of the opponent's buildings, although other strategies might also lead to victory.
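As a rough illustration of the two signals, here is how an agent might read both the ternary reward and the Blizzard score from a pySC2 time-step (a sketch; the field names follow pySC2's observation spec, with score_cumulative[0] holding the overall score):

```python
# Sketch: read the game reward and the Blizzard score from a time-step.
def inspect_rewards(timestep):
    # +1 win / 0 tie / -1 loss at the end of an episode, 0 otherwise
    game_reward = timestep.reward
    # Blizzard score: cumulative in-game score (units, buildings, research...)
    blizzard_score = timestep.observation["score_cumulative"][0]
    return game_reward, blizzard_score
```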

Actions and observations

Observations

The game exposes observations as RGB pixels, available as rgb_screen and rgb_minimap (the view a human player would see), and as a feature-layer structure interpreted by pySC2. Within the features we must differentiate between screen feature layers and minimap feature layers: the former correspond to the main screen view, the latter to the minimap in the bottom-left corner, which carries information beyond the screen.

Besides that, it offers structured information: 10 tensors carrying different data relevant to the player.

Feature Layer view, initialized with the game
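As a hedged sketch of how the two kinds of feature layers are accessed in practice (layer names follow pysc2.lib.features; which layers you pick depends on your agent):

```python
# Sketch: pull individual screen and minimap feature layers from an observation.
from pysc2.lib import features

def describe_observation(obs):
    screen = obs.observation["feature_screen"]    # stacked screen feature layers
    minimap = obs.observation["feature_minimap"]  # stacked minimap feature layers
    # unit_type identifies what occupies each screen pixel;
    # player_relative marks friendly/neutral/enemy on the minimap
    unit_type = screen[features.SCREEN_FEATURES.unit_type.index]
    player_relative = minimap[features.MINIMAP_FEATURES.player_relative.index]
    return unit_type, player_relative
```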

Actions

Function actions, available at pysc2.lib.actions, were created to cope with this huge action space. Actions are the output of the agent at every time-step t, that is, the decisions the agent must take at each step. pySC2 exposes the list of actions available at each step, and each action takes its own argument or list of arguments.
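For instance, a typical pattern checks the available actions before issuing a function call (a sketch; the coordinates are arbitrary, and the argument layout follows pysc2.lib.actions.FUNCTIONS):

```python
# Sketch: issue a function action only when the engine reports it available.
from pysc2.lib import actions

_SELECT_ARMY = actions.FUNCTIONS.select_army.id
_ATTACK_SCREEN = actions.FUNCTIONS.Attack_screen.id

def choose_action(obs):
    if _ATTACK_SCREEN in obs.observation["available_actions"]:
        # arguments: [queued flag], [x, y] screen coordinates
        return actions.FunctionCall(_ATTACK_SCREEN, [[0], [23, 42]])
    # otherwise select the whole army first; argument: [select_add flag]
    return actions.FunctionCall(_SELECT_ARMY, [[0]])
```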

Agent Neural Net architectures

In DQN, the neural network acts as a function approximator for the Q-values. In the pySC2 research, DeepMind tried several neural network architectures, including the Atari-net and CNN+LSTM [fig4] architectures. Both process spatial (screen) and non-spatial (vector of features) inputs. The difference between them lies in how the features are combined: Atari-net flattens the spatial and non-spatial features into a single layer and then unrolls the output into a spatial action policy over coordinates (x and y) plus a value estimate, whereas the FullyConv architecture proposed in the pySC2 paper combines the spatial and non-spatial features while preserving the spatial structure of the input through the network.

FullyConv architecture
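To make the "preserve the spatial structure" idea concrete, here is a minimal sketch of a FullyConv-style network in PyTorch (an assumption-laden toy: 17 screen layers at 84x84, no minimap or non-spatial inputs, and channel sizes chosen for brevity rather than taken from the paper):

```python
# Sketch of the FullyConv idea: padded convolutions keep the input resolution,
# so the spatial policy can be read directly off a 1x1 convolution.
import torch
import torch.nn as nn

class FullyConvSketch(nn.Module):
    def __init__(self, in_channels=17, screen_size=84):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # logits over (x, y) screen coordinates for spatial actions
        self.spatial_policy = nn.Conv2d(32, 1, kernel_size=1)
        # flattened state feeds the value estimate
        self.value = nn.Linear(32 * screen_size * screen_size, 1)

    def forward(self, screen):
        state = self.conv(screen)                               # (B, 32, 84, 84)
        spatial_logits = self.spatial_policy(state).flatten(1)  # (B, 84*84)
        value = self.value(state.flatten(1))                    # (B, 1)
        return spatial_logits, value
```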

Results

Here we have the comparison across different mini-games from the pySC2 paper, giving an overview of how the different neural net architectures behave. On the Y axis we find the score of the game; on the X axis, the training time. As we can see, FullyConv and Atari-net start out giving better results than FullyConv with memory (CNN+LSTM) in the DefeatRoaches mini-game. However, this changes in the long run, and so does the relative performance of the architectures across challenges: the graph above illustrates how early performance favours certain architectures, but in the end CNN+LSTM shows the best performance. In the DefeatRoaches case, learning quicker at the start does not mean better performance once the network stabilizes.

What are mini-games?

Mini-games are bounded challenges of the game worth investigating in isolation. Their purpose is to test a subset of actions or game mechanics with a clear reward structure. These maps must be configured as Markov Decision Processes, with an initial setup, timing and reward structure that encourages the agent to learn a desired behaviour and to avoid local optima. The reward structure may mirror the full game (win/tie/loss). These mini-games are characterized by restricted action sets, custom reward functions and/or time limits. If you want to see more mini-games, you can visit the community mini-games repository or build your own, transforming the DefeatRoaches mini-game into a melee of your own design in a DefeatWhatever configuration (see the sketch below).
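Registering such a custom map with pySC2 amounts to subclassing the map definition, mirroring how the shipped mini-games are declared (a sketch; DefeatWhatever is the hypothetical name from above, and it assumes you have placed DefeatWhatever.SC2Map in the mini_games map folder):

```python
# Sketch: declare a custom mini-game map so pySC2 can find it by name.
from pysc2.maps import lib

class DefeatWhatever(lib.Map):
    directory = "mini_games"     # subfolder of the StarCraft II Maps directory
    players = 1
    score_index = 0              # use the map's custom score as the reward
    game_steps_per_episode = 0   # let the map's own triggers end the episode
    step_mul = 8
```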

Tackling the whole SC2 problem is really hard, so mini-games step in to solve a smaller problem first. You might think that mini-games are only useful if the knowledge they produce is portable to the larger AI game, but it really depends on your goal. There is still room for great mini-games to come up with interesting nested problems for both micro and macro in SC2. [8]

Let's deep dive into the code

https://medium.com/@gema.parreno.piqueras/starcraft-ii-learning-environment-full-overview-v-34f5cb2a2220
