Environment Runners

Single agent

class ai_traineree.runners.env_runner.EnvRunner(task: ai_traineree.types.task.TaskType, agent: ai_traineree.agents.AgentBase, max_iterations: int = 100000, **kwargs)

EnvRunner, short for Environment Runner, is meant to be used as a module that runs your experiments. It is expected that the environment is wrapped in a Task which provides the typical step and act methods. Any agent can be used, as there are no compatibility checks, e.g. whether the output is discrete.

Examples

>>> env_runner = EnvRunner(task, agent)
>>> env_runner.run()
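
A slightly fuller setup sketch, assuming a Gym-style task wrapper and a DQN agent; the GymTask and DQNAgent import paths and constructor signatures below are assumptions, not guarantees from this page:

>>> from ai_traineree.runners.env_runner import EnvRunner
>>> from ai_traineree.tasks import GymTask               # assumed Gym wrapper location
>>> from ai_traineree.agents.dqn import DQNAgent         # assumed agent import path
>>> task = GymTask("CartPole-v1")                        # wrap a Gym environment as a Task
>>> agent = DQNAgent(task.obs_space, task.action_space)  # assumed constructor signature
>>> env_runner = EnvRunner(task, agent)
>>> scores = env_runner.run(reward_goal=195.0, max_episodes=500)
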
__init__(task: ai_traineree.types.task.TaskType, agent: ai_traineree.agents.AgentBase, max_iterations: int = 100000, **kwargs)

Expects the environment to come as a TaskType and the agent as an AgentBase.

Keyword Arguments
  • window_len (int) – Length of the score averaging window. Default: 50.

  • data_logger – An instance of Data Logger, e.g. TensorboardLogger.

  • logger_level – Logging level. Default: logging.INFO.
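
For example, a minimal construction sketch using these keyword arguments; the TensorboardLogger import path is an assumption:

>>> import logging
>>> from ai_traineree.loggers import TensorboardLogger  # assumed import path
>>> data_logger = TensorboardLogger()                    # writes metrics to Tensorboard
>>> env_runner = EnvRunner(
...     task, agent,
...     window_len=100,                # average scores over the last 100 episodes
...     data_logger=data_logger,
...     logger_level=logging.DEBUG,    # more verbose stdout logging
... )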

info(**kwargs)

Writes out the current state to the provided loggers. Writing to stdout is done through Python's logger, whereas all metrics are expected to be handled via a DataLogger. Currently supported are Tensorboard and Neptune (neptune.ai). To use one of these, pass it as data_logger.

interact_episode(train: bool = False, eps: float = 0, max_iterations: Optional[int] = None, render: bool = False, render_gif: bool = False, log_interaction_freq: Optional[int] = 10, full_log_interaction_freq: Optional[int] = 1000) → Tuple[Union[int, float], int]
load_state(file_prefix: str)

Loads state with the highest episode value for given agent and environment.

log_data_interaction(**kwargs)
log_episode_metrics(**kwargs)

Uses DataLogger, e.g. TensorboardLogger, to store env metrics.

log_logger(**kwargs)

Writes out env logs via logger (either stdout or a file).

logger = <Logger EnvRunner (INFO)>
reset()

Resets the EnvRunner. The task env and the agent are preserved.

run(reward_goal: float = 100.0, max_episodes: int = 2000, test_every: int = 10, eps_start: float = 1.0, eps_end: float = 0.01, eps_decay: float = 0.995, log_episode_freq: int = 1, log_interaction_freq: int = 10, gif_every_episodes: Optional[int] = None, checkpoint_every: Optional[int] = 200, force_new: bool = False) → List[float]

Evaluates the agent in the environment. The evaluation stops when the agent's average reward over the last self.window_len episodes reaches reward_goal, or when the number of episodes reaches max_episodes.

To help with debugging, one can set gif_every_episodes to a positive integer; a gif of the episode evaluation is then written to disk every that many episodes.

Every checkpoint_every (default: 200) episodes the Runner stores the current state of the runner and the agent. These states can be used to resume a previous run. By default the runner checks whether there is an ongoing run for the combination of the environment and the agent.

Parameters
  • reward_goal – Average reward goal to achieve.

  • max_episodes – After how many episodes to stop regardless of the score.

  • test_every – Number of episodes between agent test runs (without learning). Default: 10.

  • eps_start – Epsilon-greedy starting value.

  • eps_end – Epsilon-greedy lowest value.

  • eps_decay – Epsilon-greedy decay value, eps[i+1] = eps[i] * eps_decay.

  • log_episode_freq – Number of episodes between state logging.

  • gif_every_episodes – Number of episodes between storing last episode as a gif.

  • checkpoint_every – Number of episodes between storing the whole state, so that in case of failure the run can be safely resumed.

  • force_new – Flag whether to resume from a previously stored state (False), or to start learning from a clean state (True).

Returns

All obtained scores from all episodes.
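
A hedged usage sketch of run(); the reward threshold and episode counts below are arbitrary example values:

>>> scores = env_runner.run(
...     reward_goal=195.0,        # stop once the windowed average reward reaches this
...     max_episodes=1000,        # hard cap on the number of episodes
...     eps_start=1.0, eps_end=0.05, eps_decay=0.99,
...     checkpoint_every=100,     # store runner + agent state every 100 episodes
...     force_new=True,           # ignore any previously stored run state
... )
>>> len(scores)                   # one score per completed episode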

save_state(state_name: str)

Saves the current state of the runner and the agent.

Files are stored with appended episode number. Agents are saved with their internal saving mechanism.
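
A minimal checkpointing sketch; the state name below is hypothetical, and load_state picks the file with the highest appended episode number:

>>> env_runner.save_state("cartpole_dqn")              # hypothetical state name
>>> env_runner.load_state(file_prefix="cartpole_dqn")  # restores the latest matching state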

seed(seed)

Multi agent

class ai_traineree.runners.multiagent_env_runner.MultiAgentCycleEnvRunner(task: ai_traineree.tasks.PettingZooTask, multi_agent: ai_traineree.types.agent.MultiAgentType, mode: str = 'coop', max_iterations: int = 100000, **kwargs)

MultiAgentCycleEnvRunner has the same purpose as the EnvRunner but is intended for environments that support multiple agents. It is expected that the environment is wrapped in a Task which provides the typical step and act methods. Any agent can be used, as there are no compatibility checks, e.g. whether the output is discrete.

Examples

>>> ma_env_runner = MultiAgentCycleEnvRunner(task, multi_agent)
>>> ma_env_runner.run()
__init__(task: ai_traineree.tasks.PettingZooTask, multi_agent: ai_traineree.types.agent.MultiAgentType, mode: str = 'coop', max_iterations: int = 100000, **kwargs)

Expects the environment to come as the TaskType and the agent as the MultiAgentBase.

Parameters
  • task – An OpenAI gym API compatible task.

  • multi_agent – An instance which handles interactions between multiple agents.

  • mode – Type of interaction between agents. Currently only coop is supported, which means that the reward is cumulative for all agents.

  • max_iterations – How many iterations can one episode have.

Keyword Arguments
  • window_len (int) – Length of the averaging window for average reward. Default: 100.

  • data_logger – An instance of Data Logger, e.g. TensorboardLogger. Default: None.

  • state_dir (str) – Dir path where states are stored. Default: run_states.

  • debug_log (bool) – Whether to produce extensive logging. Default: False.
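
A construction sketch, assuming a PettingZoo environment and an already-built multi-agent instance; the pettingzoo import and the multi_agent object below are assumptions:

>>> from pettingzoo.mpe import simple_spread_v2   # assumed PettingZoo environment
>>> from ai_traineree.tasks import PettingZooTask
>>> from ai_traineree.runners.multiagent_env_runner import MultiAgentCycleEnvRunner
>>> task = PettingZooTask(simple_spread_v2.env())
>>> # multi_agent is any MultiAgentType implementation, e.g. independent learners
>>> runner = MultiAgentCycleEnvRunner(task, multi_agent, mode="coop", window_len=100)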

info(**kwargs)

Writes out the current state to the provided loggers. Writing to stdout is done through Python's logger, whereas all metrics are expected to be handled via a DataLogger. Currently supported are Tensorboard and Neptune (neptune.ai). To use one of these, pass it as data_logger.

interact_episode(eps: float = 0, max_iterations: Optional[int] = None, render: bool = False, render_gif: bool = False, log_interaction_freq: Optional[int] = None) → Tuple[Dict[str, Union[int, float]], int]
load_state(file_prefix: str)

Loads state with the highest episode value for given agent and environment.

log_data_interaction(**kwargs)
log_episode_metrics(**kwargs)

Uses data_logger, e.g. Tensorboard, to store env metrics.

log_logger(**kwargs)

Writes out env logs via logger (either stdout or a file).

logger = <Logger MAEnvRunner (INFO)>
reset() → None

Resets the instance. The task and the agent are preserved.

run(reward_goal: float = 100.0, max_episodes: int = 2000, eps_start=1.0, eps_end=0.01, eps_decay=0.995, log_episode_freq=1, gif_every_episodes: Optional[int] = None, checkpoint_every=200, force_new=False) → List[Dict[str, Union[int, float]]]

Evaluates the Multi Agent in the environment. The evaluation stops when the agents' average reward over the last self.window_len episodes reaches reward_goal, or when the number of episodes reaches max_episodes.

To help with debugging, one can set gif_every_episodes to a positive integer; a gif of the episode evaluation is then written to disk every that many episodes.

Every checkpoint_every (default: 200) episodes the Runner stores the current state of the runner and the agent. These states can be used to resume a previous run. By default the runner checks whether there is an ongoing run for the combination of the environment and the agent.
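
A sketch of reading the returned per-episode scores; run() returns one dictionary per episode, and the keys are assumed to match the task's agent names:

>>> episode_scores = runner.run(reward_goal=-30.0, max_episodes=500)
>>> episode_scores[-1]   # e.g. {'agent_0': -12.3, 'agent_1': -11.8}  (keys are an assumption)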

save_state(state_name: str)

Saves the current state of the runner and the multi_agent.

Files are stored with appended episode number. Agents are saved with their internal saving mechanism.

seed(seed: int) → None

Sets the provided seed on the multi agent and the task.

class ai_traineree.runners.multiagent_env_runner.MultiAgentEnvRunner(task: ai_traineree.types.task.MultiAgentTaskType, multi_agent: ai_traineree.types.agent.MultiAgentType, mode: str = 'coop', max_iterations: int = 100000, **kwargs)

MultiAgentEnvRunner has the same purpose as the EnvRunner but is intended for environments that support multiple agents. It is expected that the environment is wrapped in a Task which provides the typical step and act methods. Any agent can be used, as there are no compatibility checks, e.g. whether the output is discrete.

Example

>>> ma_env_runner = MultiAgentEnvRunner(task, multi_agent)
>>> ma_env_runner.run()
__init__(task: ai_traineree.types.task.MultiAgentTaskType, multi_agent: ai_traineree.types.agent.MultiAgentType, mode: str = 'coop', max_iterations: int = 100000, **kwargs)

Expects the environment to come as the TaskType and the agent as the MultiAgentBase.

Parameters
  • task – An OpenAI gym API compatible task.

  • multi_agent – An instance which handles interactions between multiple agents.

  • mode – Type of interaction between agents. Currently only coop is supported, which means that the reward is cumulative for all agents.

  • max_iterations – How many iterations can one episode have.

Keyword Arguments
  • window_len (int) – Length of the averaging window for average reward.

  • data_logger – An instance of Data Logger, e.g. TensorboardLogger.

  • state_dir (str) – Dir path where states are stored. Default: run_states.

  • debug_log (bool) – Whether to produce extensive logging. Default: False.
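
A construction and run sketch mirroring the Cycle runner above; note that this runner's run() returns a list of per-agent score lists rather than dictionaries. The task and multi_agent objects are assumed to be built as in the earlier sketches:

>>> runner = MultiAgentEnvRunner(
...     task, multi_agent,
...     mode="coop",
...     state_dir="run_states",   # where checkpoints are written
...     debug_log=True,           # extensive logging
... )
>>> scores = runner.run(max_episodes=300)
>>> scores[-1]                    # per-agent rewards of the last episode, e.g. [1.0, 0.5]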

info(**kwargs)

Writes out the current state to the provided loggers. Writing to stdout is done through Python's logger, whereas all metrics are expected to be handled via a DataLogger. Currently supported are Tensorboard and Neptune (neptune.ai). To use one of these, pass it as data_logger.

interact_episode(eps: float = 0, max_iterations: Optional[int] = None, render: bool = False, render_gif: bool = False, log_interaction_freq: Optional[int] = None) → Tuple[List[Union[int, float]], int]
load_state(state_prefix: str)

Loads state with the highest episode value for given agent and environment.

log_data_interaction(**kwargs)
log_episode_metrics(**kwargs)

Uses data_logger, e.g. Tensorboard, to store env metrics.

log_logger(**kwargs)

Writes out env logs via logger (either stdout or a file).

reset()

Resets the runner. The task env and the agent are preserved.

run(reward_goal: float = 100.0, max_episodes: int = 2000, eps_start=1.0, eps_end=0.01, eps_decay=0.995, log_episode_freq=1, gif_every_episodes: Optional[int] = None, checkpoint_every=200, force_new=False) → List[List[Union[int, float]]]

Evaluates the multi_agent in the environment. The evaluation stops when the agents' average reward over the last self.window_len episodes reaches reward_goal, or when the number of episodes reaches max_episodes.

To help with debugging, one can set gif_every_episodes to a positive integer; a gif of the episode evaluation is then written to disk every that many episodes.

Every checkpoint_every (default: 200) episodes the Runner stores the current state of the runner and the agent. These states can be used to resume a previous run. By default the runner checks whether there is an ongoing run for the combination of the environment and the agent.

save_state(state_name: str)

Saves the current state of the runner and the multi_agent.

Files are stored with appended episode number. Agents are saved with their internal saving mechanism.

seed(seed: int)