Multi agents

Usage of “agents” in this case could be a bit misleading. These are entities, or algorithms, that understand how to organize internal agents so that they interact with the environment more effectively.

The distinction between these and many individual agents is that some interaction between the agents is assumed. This is not a single agent that tries to do something in the environment and may treat other agents as part of that environment. The typical multi-agent case is one where agents need to achieve a common goal. Consider cooperative games, such as keeping a ball from falling on the ground, or team sports where one team tries to capture a flag and the other tries to stop them.

MADDPG

class ai_traineree.multi_agents.maddpg.MADDPGAgent(obs_space: ai_traineree.types.dataspace.DataSpace, action_space: ai_traineree.types.dataspace.DataSpace, num_agents: int, **kwargs)
__init__(obs_space: ai_traineree.types.dataspace.DataSpace, action_space: ai_traineree.types.dataspace.DataSpace, num_agents: int, **kwargs)

Initialization of the Multi Agent DDPG.

All keyword arguments are also passed to the underlying DDPG agents; a minimal instantiation sketch follows the keyword list below.

Parameters
  • obs_space (DataSpace) – Observation space describing the state dimensionality.

  • action_space (DataSpace) – Action space describing the action dimensionality.

  • num_agents (int) – Number of agents.

Keyword Arguments
  • hidden_layers (tuple of ints) – Shape for fully connected hidden layers.

  • noise_scale (float) – Default: 1.0. Noise amplitude.

  • noise_sigma (float) – Default: 0.5. Noise standard deviation.

  • actor_lr (float) – Default: 0.001. Learning rate for actor network.

  • critic_lr (float) – Default: 0.001. Learning rate for critic network.

  • gamma (float) – Default: 0.99. Discount factor.

  • tau (float) – Default: 0.02. Soft-update (soft copy) coefficient.

  • gradient_clip (optional float) – Max norm for gradient clipping. If None, no clipping is applied.

  • batch_size (int) – Number of samples drawn per learning step.

  • buffer_size (int) – Number of previous samples kept in the replay buffer.

  • warm_up (int) – Number of samples to collect before learning starts.

  • update_freq (int) – Number of samples between learning sessions.

  • number_updates (int) – Number of learning cycles per learning session.
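
A minimal instantiation sketch is shown below. The DataSpace fields (dtype, shape, low, high) and the environment dimensions are illustrative assumptions, not the library's guaranteed constructor; check the DataSpace definition in your installed version.

    from ai_traineree.multi_agents.maddpg import MADDPGAgent
    from ai_traineree.types.dataspace import DataSpace

    # Illustrative spaces; the DataSpace keyword fields below are an assumption.
    obs_space = DataSpace(dtype="float", shape=(4,), low=-1.0, high=1.0)
    action_space = DataSpace(dtype="float", shape=(2,), low=-1.0, high=1.0)

    agents = MADDPGAgent(
        obs_space,
        action_space,
        num_agents=2,
        hidden_layers=(128, 128),  # forwarded to every underlying DDPG agent
        actor_lr=1e-3,
        critic_lr=1e-3,
        gamma=0.99,
        tau=0.02,
        batch_size=64,
        warm_up=1000,
        update_freq=10,
        number_updates=1,
    )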

act(agent_name: str, experience: ai_traineree.types.experience.Experience, noise: float = 0.0) ai_traineree.types.experience.Experience

Get actions from all agents. Action selection is synchronized across agents. An interaction sketch follows the return description below.

Parameters
  • agent_name – (str) Name of the agent for which the action is requested.

  • experience – (Experience) Experience containing the current observation(s). Positions need to be consistent across agents.

  • noise – (float) Scale for the noise to include.

Returns

List of actions that each agent wants to perform

Return type

actions
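
For orientation, a rough single-step interaction sketch follows. The environment object env, its API, the Experience keyword fields (obs, action, reward, done, next_obs) and the action attribute on the returned Experience are assumptions for illustration; verify the Experience definition in your installed version.

    from ai_traineree.types.experience import Experience

    # Hypothetical environment step; `env` and the Experience fields are assumed.
    obs = env.reset()
    for agent_name in agents.agent_names:
        # Request a (synchronized) action for this agent.
        experience = agents.act(agent_name, Experience(obs=obs[agent_name]), noise=0.1)
        next_obs, reward, done = env.step(agent_name, experience.action)
        # Feed the completed transition back so the agent can learn from it.
        agents.step(
            agent_name,
            Experience(
                obs=obs[agent_name],
                action=experience.action,
                reward=reward,
                done=done,
                next_obs=next_obs[agent_name],
            ),
        )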

action_space: ai_traineree.types.dataspace.DataSpace
agent_names: List[str]
agents: List[ai_traineree.types.agent.AgentType]
commit()
get_state() Dict[str, dict]

Returns agents’ internal states

learn(experiences, agent_name: str) None

Update the critics and actors of all the agents.

load_state(*, path: Optional[str] = None, agent_state: Optional[dict] = None) None

Loads the state into the Multi Agent.

The state can be provided either via a path to a file that contains the state (see save_state) or directly via agent_state; a round-trip sketch follows the parameter list below.

Parameters
  • path – (str) A path where the state was saved via save_state.

  • agent_state – (dict) Already loaded state kept in memory.
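
A small round-trip sketch, assuming the agents instance from the earlier example and a writable local path; whether get_state output matches the format load_state expects is an assumption here.

    # Persist the full multi-agent state; component states go through torch.save.
    agents.save_state("/tmp/maddpg_checkpoint.pt")

    # Restore later from the same file...
    agents.load_state(path="/tmp/maddpg_checkpoint.pt")

    # ...or pass an already loaded state dict directly (format compatibility
    # with get_state is assumed).
    agents.load_state(agent_state=agents.get_state())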

log_metrics(data_logger: ai_traineree.loggers.data_logger.DataLogger, step: int, full_log: bool = False)
property loss: Dict[str, float]
model: str = 'MADDPG'
num_agents: int
obs_space: ai_traineree.types.dataspace.DataSpace
reset()
reset_agents()
save_state(path: str)

Saves current state of the Multi Agent instance and all related agents.

All states are stored via PyTorch’s save function.

Parameters

path – (str) String path to a location where the state is stored.

seed(seed: int) None
state_dict() Dict[str, Any]

Returns description of all agents’ components.

step(agent_name: str, experience: ai_traineree.types.experience.Experience) None
update_targets()

Soft update of the target networks.
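
The docstring does not spell out the update; the DDPG-style soft (Polyak) update it refers to typically looks like the sketch below, where tau is the keyword argument documented above. This is an illustration of the technique, not the library's exact implementation.

    import torch

    def soft_update(target: torch.nn.Module, source: torch.nn.Module, tau: float) -> None:
        """Polyak averaging: target <- tau * source + (1 - tau) * target."""
        with torch.no_grad():
            for t_param, s_param in zip(target.parameters(), source.parameters()):
                t_param.data.mul_(1.0 - tau).add_(tau * s_param.data)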

IQL

class ai_traineree.multi_agents.iql.IQLAgents(obs_space: ai_traineree.types.dataspace.DataSpace, action_space: ai_traineree.types.dataspace.DataSpace, num_agents: int, **kwargs)
__init__(obs_space: ai_traineree.types.dataspace.DataSpace, action_space: ai_traineree.types.dataspace.DataSpace, num_agents: int, **kwargs)

Independent Q-Learning

A set of independent Q-Learning agents (DQN implementation) organized to work as a Multi Agent. These agents use the defaults of the DQNAgent class. All keyword parameters are passed to each agent; a minimal instantiation sketch follows the keyword list below.

Parameters
  • obs_space (DataSpace) – Observation space describing the state dimensionality.

  • action_space (DataSpace) – Action space describing the action dimensionality.

  • num_agents (int) – Number of agents.

Keyword Arguments
  • hidden_layers (tuple of ints) – Shape for fully connected hidden layers.

  • noise_scale (float) – Default: 1.0. Noise amplitude.

  • noise_sigma (float) – Default: 0.5. Noise standard deviation.

  • actor_lr (float) – Default: 0.001. Learning rate for actor network.

  • gamma (float) – Default: 0.99. Discount factor.

  • tau (float) – Default: 0.02. Soft-update (soft copy) coefficient.

  • gradient_clip (optional float) – Max norm for gradient clipping. If None, no clipping is applied.

  • batch_size (int) – Number of samples drawn per learning step.

  • buffer_size (int) – Number of previous samples kept in the replay buffer.

  • warm_up (int) – Number of samples to collect before learning starts.

  • update_freq (int) – Number of samples between learning sessions.

  • number_updates (int) – Number of learning cycles per learning session.
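
A minimal instantiation sketch, reusing the illustrative DataSpace assumption from the MADDPG example; since IQL wraps DQN agents, the discrete action space shown here is a placeholder to adapt to your environment.

    from ai_traineree.multi_agents.iql import IQLAgents
    from ai_traineree.types.dataspace import DataSpace

    # Illustrative spaces; the DataSpace keyword fields are an assumption.
    obs_space = DataSpace(dtype="float", shape=(4,), low=-1.0, high=1.0)
    action_space = DataSpace(dtype="int", shape=(1,), low=0, high=3)  # e.g. 4 discrete actions

    agents = IQLAgents(
        obs_space,
        action_space,
        num_agents=2,
        hidden_layers=(64, 64),  # forwarded to every underlying DQNAgent
        gamma=0.99,
        batch_size=32,
        warm_up=500,
        update_freq=4,
    )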

act(agent_name: str, experience: ai_traineree.types.experience.Experience, noise: float = 0.0) ai_traineree.types.experience.Experience
action_space: ai_traineree.types.dataspace.DataSpace
agent_names: List[str]
agents: List[ai_traineree.types.agent.AgentType]
commit() None

This method does nothing.

Since all agents are completely independent, there is no need to synchronize them.

get_state()

Returns agents’ internal states

load_state(path: str)

Reads the whole agent state from a local file.

log_metrics(data_logger: ai_traineree.loggers.data_logger.DataLogger, step: int, full_log: bool = False)
property loss: Dict[str, float]
model: str = 'IQL'
num_agents: int
obs_space: ai_traineree.types.dataspace.DataSpace
reset() None

Resets all agents’ states.

reset_agents()
save_state(path: str)

Saves the whole agent state into a local file.

seed(seed: int)
state_dict() Dict[str, dict]

Returns description of all agents’ components.

step(agent_name: str, experience: ai_traineree.types.experience.Experience) None