Welcome, AI Traineree!

As you may have noticed, the current version of the documentation is rather modest. True. But be patient, young one, as this is only the beginning. Hopefully.

It is always a motivational boost and a driver to see that someone is using the project. Feel free to let me know if you have any questions or anything, and I’ll surely try to help.

Getting started

What is this?

Have you heard about DeepMind or recent advancements in Artificial Intelligence, like beating humans at Go, StarCraft 2 or Dota 2? The AI Traineree is almost the same. Almost in the sense that it’s unlikely to achieve the same results and those algorithms aren’t provided (yet), but at least we use the same terminology. That’s something, right?

AI Traineree is a collection of (some) Reinforcement Learning algorithms. The emphasis is on the Deep part, as in Deep Learning, but there are/will be some more traditional algorithms as well. Yes, we are fully aware that there are already some excellent packages which provide similar code; however, we think we still provide some value, especially in:

  • Multi agents.

    The goal is to focus on multi agent environments and algorithms. It might be a bit modest right now but that’s simply because we want to establish a baseline.

  • Implementation philosophy.

    Many look-alike packages have a tendency to pass the environment as an input to the agent’s instance. We consider this a big no-no. The agent lives in the environment; it lives thanks to the environment. This distinction alone makes the algorithms’ implementations different, as the sketch below illustrates.
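
To make this concrete, here is a minimal sketch of an interaction loop in which the agent only ever receives observations and rewards, never the environment itself. The method names (act, step) mirror the agent API used in the examples below, but treat this as a concept sketch rather than the exact library implementation; in practice the EnvRunner takes care of this loop for you.

def interact_episode(task, agent, max_iterations: int = 1000) -> float:
    """Run a single episode. The agent only sees observations; it never holds `task`."""
    score = 0.0
    obs = task.reset()
    for _ in range(max_iterations):
        action = agent.act(obs)                          # decision based on observation only
        next_obs, reward, done, _ = task.step(action)    # the task/environment advances
        agent.step(obs, action, reward, next_obs, done)  # the transition is passed explicitly
        score += reward
        obs = next_obs
        if done:
            break
    return score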

Installation

Currently the only way to install the package is to download and install it from the GitHub repository, i.e. https://github.com/laszukdawid/ai-traineree.

Assuming that this isn’t your first git project, the steps are:

$ git clone https://github.com/laszukdawid/ai-traineree.git
$ cd ai-traineree
$ python setup.py install

Issues or questions

Is there something that doesn’t work, or you aren’t sure whether it should, or do you simply have a question? The best way is to create a GitHub issue (https://github.com/laszukdawid/ai-traineree/issues).

Public tickets are really the best way. If something isn’t obvious to you, then chances are that others have the same question. Be a friend and help them discover the answer.

In case you have questions or offers that you would like to discuss in private, feel free to reach me at ai-traineree@dawid.lasz.uk .

Citing

If you found this project useful and would like to cite it, we suggest the following BibTeX entry.

@misc{ai-traineree,
    author = {Laszuk, Dawid},
    title = {AI Traineree: Reinforcement learning toolset},
    year = {2020},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/laszukdawid/ai-traineree}},
}

Examples

Single agent

DQN on CartPole

This example uses the CartPole environment provided by OpenAI Gym. If you don’t have Gym installed, you can get it with pip install gym.

from ai_traineree.agents.dqn import DQNAgent
from ai_traineree.runners.env_runner import EnvRunner
from ai_traineree.tasks import GymTask

task = GymTask('CartPole-v1')
agent = DQNAgent(task.obs_space, task.action_space, n_steps=5)
env_runner = EnvRunner(task, agent)

# Learning
scores = env_runner.run(reward_goal=100, max_episodes=300, force_new=True)

# Check what we have learned by rendering
env_runner.interact_episode(render=True)

Multi agent

IQL on Prison

This example uses the Prison environment provided by PettingZoo. Prison is a simple environment where all agents are independent and have a simple task: alternately touch opposite walls. To install the environment, execute pip install pettingzoo[butterfly].

from ai_traineree.multi_agent.iql import IQLAgents
from ai_traineree.runners.multiagent_env_runner import MultiAgentCycleEnvRunner
from ai_traineree.tasks import PettingZooTask
from pettingzoo.butterfly import prison_v2 as prison

env = prison.env(vector_observation=True)
task = PettingZooTask(env)
task.reset()

config = {
    'device': 'cpu',
    'update_freq': 10,
    'batch_size': 200,
    'agent_names': env.agents,
}
agents = IQLAgents(task.obs_space, task.action_space, task.num_agents, **config)

env_runner = MultiAgentCycleEnvRunner(task, agents, max_iterations=9000)  # optionally pass a data_logger for metrics
scores = env_runner.run(reward_goal=20, max_episodes=50, eps_decay=0.95, log_episode_freq=1, force_new=True)

More examples

These are only some selected examples. There are many more provided in the repository as individual files. They live in the examples directory, or you can browse them directly here: https://github.com/laszukdawid/ai-traineree/tree/master/examples.

The easiest way to run them is to check out the git repository and install the package (see the note below). Examples can be run as modules from the root directory, i.e. the directory with the setup.cfg file. To run the cart_dqn example, execute:

$ python -m examples.cart_dqn

Note

Examples use some libraries that aren’t provided in the default package installation. To install all necessary packages, make sure to install AI Traineree with the [examples] extras. If you are using pip to install packages, then you should use pip install -e .[examples].

Agents

DQN

Rainbow

PPO

DDPG

D3PG

D4PG

SAC

Multi agents

Usage of “agents” in this case could be a bit misleading. These are entities, or algorithms, that understand how to organize internal agents so that they get better at interacting with the environment.

The distinction between these and many individual agents is that some interaction between the agents is assumed. It isn’t a single agent that tries to do something in the environment and could consider other agents as part of the environment. A typical case for multi agents is when they need to achieve a common goal. Consider cooperative games, like not letting a ball fall on the ground, or team sports, where one team tries to capture a flag and the other tries to stop them.

MADDPG

IQL

Buffers

Basis

This class is the abstraction for all buffers. In short, each buffer should support adding new samples and sampling from the buffer. Additional classes are required for saving or resuming the whole state. All buffers internally store data as Experience objects, but on sampling these are converted into torch Tensors or numpy arrays.

class ai_traineree.types.experience.Experience(**kwargs)

Basic data unit to hold information.

It typically represents one whole cycle of observation - action - reward. It is the data type used to store experiences in experience buffers.
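
As a rough illustration of that contract, here is a toy buffer that keeps Experience objects internally and converts values to torch Tensors on sampling. This is a sketch only, not the library’s ReplayBuffer; the keyword names passed to Experience (obs, action, reward, done) and attribute access on it are assumptions to verify against the class.

import random

import torch

from ai_traineree.types.experience import Experience


class ToyBuffer:
    """Illustrative buffer: stores Experience objects and supports uniform sampling."""

    def __init__(self, buffer_size: int = 1000):
        self.buffer_size = buffer_size
        self.data = []

    def add(self, **kwargs) -> None:
        # Every sample is kept internally as an Experience.
        self.data.append(Experience(**kwargs))
        if len(self.data) > self.buffer_size:
            self.data.pop(0)

    def sample(self, batch_size: int = 32):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        # On sampling, stored values are converted into torch Tensors.
        rewards = torch.tensor([exp.reward for exp in batch])
        return batch, rewards


buffer = ToyBuffer()
buffer.add(obs=[0.0, 1.0], action=0, reward=1.0, done=False)  # keyword names are illustrative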

Replay Buffer

The most basic buffer. Supports uniform sampling.

Replay Experience Buffer (PER)

Rollout Buffer

Policies

Networks

Networks are divided depending on their context. For some reason it’s common to find the convention of heads and bodies, and that’s why we’re keeping it here. If you haven’t heard of these before, think about the Frankenstein monster. A body is not a whole body but rather a body part, e.g. arms and legs. Obviously(!), they don’t work by themselves, so you need a head which will control them. Some heads take body parts explicitly and build the whole monstrosity, and some heads are predefined to closely match the suggestion in a paper. So, in general, a head is more complex and does more than a body, but for some agents a single body part, e.g. a fully connected network, is good enough.
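
As a concept sketch in plain PyTorch (these are not the library’s actual classes): a body is a reusable network chunk, and a head wraps one or more bodies and adds the task-specific output on top.

import torch
import torch.nn as nn


class FcBody(nn.Module):
    """A 'body': a reusable feature extractor, e.g. a small fully connected net."""

    def __init__(self, in_features: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class QHead(nn.Module):
    """A 'head': composes a body and adds the output layer that produces Q-values."""

    def __init__(self, body: nn.Module, hidden: int, num_actions: int):
        super().__init__()
        self.body = body
        self.out = nn.Linear(hidden, num_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.out(self.body(obs))


head = QHead(FcBody(in_features=4), hidden=64, num_actions=2)
q_values = head(torch.rand(1, 4))  # the head drives the body end-to-end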

Bodies

Heads

Environment Runners

Single agent

Multi agent

Tasks

In short, a Task is a bit more than an environment. A Task takes an environment, e.g. CartPole, as an input, but it also handles state transformation and reward shaping. A Task also aims to be compatible with OpenAI Gym’s API; some environments aren’t compatible, so we make them so.
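
Below is a concept sketch of what a task adds on top of a raw environment, namely observation transformation and reward shaping gathered in one place. The wrapper is purely illustrative (it is not GymTask); check GymTask and PettingZooTask for the actual interfaces.

import gym
import numpy as np


class ShapedTask:
    """Illustrative task-like wrapper: environment + state transform + reward shaping."""

    def __init__(self, env_name: str = "CartPole-v1"):
        self.env = gym.make(env_name)

    def reset(self):
        return self._transform(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        shaped_reward = reward - 0.01  # e.g. a small step penalty (reward shaping)
        return self._transform(obs), shaped_reward, done, info

    @staticmethod
    def _transform(obs):
        # e.g. cast/normalize the raw observation (state transformation)
        return np.asarray(obs, dtype=np.float32)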

Development

Philosophy

  • Agents are independent from environment. No interaction is forced.

  • All agents should have the same concise APIs.

  • Modular components but simplicity over forced modularity.

Concepts

State vs Observation vs Features

State is objective information about the environment. It is given from an external entity’s point of view. Access to states isn’t guaranteed, even if one has full control over the environment.

Observation is from the agent’s perspective. Its domain is defined by the agent’s senses, and its values depend on the agent’s state, e.g. position.

Features are context dependent but generally relate to the output of some transformation. We can transform an observation into a different space, e.g. projecting a camera RGB image into an embedding vector, or modify its values, e.g. normalizing a tensor.

Example: Consider a basketball game as an environment. A spectator is someone who might have access to the state information. In this case, a state would consist of all players’ positions, ball possession, time and score. However, despite being able to see everything, they wouldn’t know whether any player is feeling bad or whether some places on the field have a draft. An agent, in this situation, is a single player. Everything that they see and know is their observation. Although they might be able to deduce the positions of all players, it will often happen that some players are behind others or that they are looking in a different direction. Similarly to the spectator, they don’t know about other players’ stamina levels, but they know their own, which also has an impact on the play. Their physical state and internal thoughts are features.

Code-wise, the state is the output of the environment. An observation is what an agent can get and deals with on input. A feature is anything that goes through any transformations, e.g. bodies and heads. A specific case of a feature is an action.
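
Expressed as a toy pipeline (all names and values below are illustrative):

import numpy as np

# State: whatever the environment emits, from an external point of view.
state = {"positions": np.random.rand(10, 2), "score": (48, 51), "time_left": 12.3}

# Observation: the slice of the state this particular agent can actually perceive.
observation = state["positions"][:5]  # e.g. only the players in the field of view

# Feature: any transformation of the observation, e.g. flattening and normalizing.
feature = (observation.flatten() - observation.mean()) / (observation.std() + 1e-8)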
