Multi agents

Usage of “agents” in this case could be a bit misleading. These are entities, or algorithms, that understand how to organize internal agents so that they interact with the environment more effectively.

The distinction between these and many individual agents is that some interaction between the agents is assumed. This is not a single agent that tries to do something in the environment and may treat other agents as part of that environment. The typical multi-agent case is one where agents need to achieve a common goal. Consider cooperative games, such as keeping a ball from falling on the ground, or team sports where one team tries to capture a flag and the other tries to stop them.

MADDPG

class ai_traineree.multi_agents.maddpg.MADDPGAgent(obs_space: ai_traineree.types.dataspace.DataSpace, action_space: ai_traineree.types.dataspace.DataSpace, num_agents: int, **kwargs)
__init__(obs_space: ai_traineree.types.dataspace.DataSpace, action_space: ai_traineree.types.dataspace.DataSpace, num_agents: int, **kwargs)

Initialization of the Multi Agent DDPG.

All keyword arguments are also passed to the underlying DDPG agents; a minimal instantiation sketch follows the keyword list below.

Parameters
  • obs_space (DataSpace) – Observation space describing the state dimensionality.

  • action_space (DataSpace) – Action space describing the action dimensionality.

  • num_agents (int) – Number of agents.

Keyword Arguments
  • hidden_layers (tuple of ints) – Shape for fully connected hidden layers.

  • noise_scale (float) – Default: 1.0. Noise amplitude.

  • noise_sigma (float) – Default: 0.5. Noise standard deviation.

  • actor_lr (float) – Default: 0.001. Learning rate for actor network.

  • critic_lr (float) – Default: 0.001. Learning rate for critic network.

  • gamma (float) – Default: 0.99. Discount factor.

  • tau (float) – Default: 0.02. Soft-update (soft copy) coefficient.

  • gradient_clip (optional float) – Max norm for gradient clipping. If None, no clipping is applied.

  • batch_size (int) – Number of samples drawn per learning step.

  • buffer_size (int) – Number of previous samples kept in the replay buffer.

  • warm_up (int) – Number of samples to collect before learning starts.

  • update_freq (int) – Number of samples between learning sessions.

  • number_updates (int) – Number of learning cycles per learning session.
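
A minimal instantiation sketch is shown below. The DataSpace fields (dtype, shape, low, high) and the environment dimensions are illustrative assumptions, not the library's guaranteed constructor; check the DataSpace definition in your installed version.

    from ai_traineree.multi_agents.maddpg import MADDPGAgent
    from ai_traineree.types.dataspace import DataSpace

    # Illustrative spaces; the DataSpace keyword fields below are an assumption.
    obs_space = DataSpace(dtype="float", shape=(4,), low=-1.0, high=1.0)
    action_space = DataSpace(dtype="float", shape=(2,), low=-1.0, high=1.0)

    agents = MADDPGAgent(
        obs_space,
        action_space,
        num_agents=2,
        hidden_layers=(128, 128),  # forwarded to every underlying DDPG agent
        actor_lr=1e-3,
        critic_lr=1e-3,
        gamma=0.99,
        tau=0.02,
        batch_size=64,
        warm_up=1000,
        update_freq=10,
        number_updates=1,
    )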

act(agent_name: str, experience: ai_traineree.types.experience.Experience, noise: float = 0.0) ai_traineree.types.experience.Experience

Get actions from all agents. Action selection is synchronized across agents. An interaction sketch follows the return description below.

Parameters
  • agent_name – (str) Name of the agent for which the action is requested.

  • experience – (Experience) Experience containing the current observation(s). Positions need to be consistent across agents.

  • noise – (float) Scale for the noise to include.

Returns

List of actions that each agent wants to perform

Return type

actions
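
For orientation, a rough single-step interaction sketch follows. The environment object env, its API, the Experience keyword fields (obs, action, reward, done, next_obs) and the action attribute on the returned Experience are assumptions for illustration; verify the Experience definition in your installed version.

    from ai_traineree.types.experience import Experience

    # Hypothetical environment step; `env` and the Experience fields are assumed.
    obs = env.reset()
    for agent_name in agents.agent_names:
        # Request a (synchronized) action for this agent.
        experience = agents.act(agent_name, Experience(obs=obs[agent_name]), noise=0.1)
        next_obs, reward, done = env.step(agent_name, experience.action)
        # Feed the completed transition back so the agent can learn from it.
        agents.step(
            agent_name,
            Experience(
                obs=obs[agent_name],
                action=experience.action,
                reward=reward,
                done=done,
                next_obs=next_obs[agent_name],
            ),
        )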

action_space: ai_traineree.types.dataspace.DataSpace
agent_names: List[str]
agents: List[ai_traineree.types.agent.AgentType]
commit()
get_state() Dict[str, dict]

Returns agents’ internal states

learn(experiences, agent_name: str) None

Update the critics and actors of all the agents.

load_state(*, path: Optional[str] = None, agent_state: Optional[dict] = None) None

Loads the state into the Multi Agent.

The state can be provided either via a path to a file that contains the state (see save_state) or directly via agent_state; a round-trip sketch follows the parameter list below.

Parameters
  • path – (str) A path where the state was saved via save_state.

  • agent_state – (dict) Already loaded state kept in memory.
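
A small round-trip sketch, assuming the agents instance from the earlier example and a writable local path; whether get_state output matches the format load_state expects is an assumption here.

    # Persist the full multi-agent state; component states go through torch.save.
    agents.save_state("/tmp/maddpg_checkpoint.pt")

    # Restore later from the same file...
    agents.load_state(path="/tmp/maddpg_checkpoint.pt")

    # ...or pass an already loaded state dict directly (format compatibility
    # with get_state is assumed).
    agents.load_state(agent_state=agents.get_state())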

log_metrics(data_logger: ai_traineree.loggers.data_logger.DataLogger, step: int, full_log: bool = False)
property loss: Dict[str, float]
model: str = 'MADDPG'
num_agents: int
obs_space: ai_traineree.types.dataspace.DataSpace
reset()
reset_agents()
save_state(path: str)

Saves current state of the Multi Agent instance and all related agents.

All states are stored via PyTorch’s save function.

Parameters

path – (str) String path to a location where the state is stored.

seed(seed: int) None
state_dict() Dict[str, Any]

Returns description of all agents’ components.

step(agent_name: str, experience: ai_traineree.types.experience.Experience) None
update_targets()

Soft update of the target networks.
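
The docstring does not spell out the update; the DDPG-style soft (Polyak) update it refers to typically looks like the sketch below, where tau is the keyword argument documented above. This is an illustration of the technique, not the library's exact implementation.

    import torch

    def soft_update(target: torch.nn.Module, source: torch.nn.Module, tau: float) -> None:
        """Polyak averaging: target <- tau * source + (1 - tau) * target."""
        with torch.no_grad():
            for t_param, s_param in zip(target.parameters(), source.parameters()):
                t_param.data.mul_(1.0 - tau).add_(tau * s_param.data)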

IQL

class ai_traineree.multi_agents.iql.IQLAgents(obs_space: ai_traineree.types.dataspace.DataSpace, action_space: ai_traineree.types.dataspace.DataSpace, num_agents: int, **kwargs)
__init__(obs_space: ai_traineree.types.dataspace.DataSpace, action_space: ai_traineree.types.dataspace.DataSpace, num_agents: int, **kwargs)

Independent Q-Learning

A set of independent Q-Learning agents (DQN implementation) organized to work as a Multi Agent. These agents use the defaults of the DQNAgent class. All keyword parameters are passed to each agent; a minimal instantiation sketch follows the keyword list below.

Parameters
  • obs_space (DataSpace) – Observation space describing the state dimensionality.

  • action_space (DataSpace) – Action space describing the action dimensionality.

  • num_agents (int) – Number of agents.

Keyword Arguments
  • hidden_layers (tuple of ints) – Shape for fully connected hidden layers.

  • noise_scale (float) – Default: 1.0. Noise amplitude.

  • noise_sigma (float) – Default: 0.5. Noise standard deviation.

  • actor_lr (float) – Default: 0.001. Learning rate for actor network.

  • gamma (float) – Default: 0.99. Discount factor.

  • tau (float) – Default: 0.02. Soft-update (soft copy) coefficient.

  • gradient_clip (optional float) – Max norm for gradient clipping. If None, no clipping is applied.

  • batch_size (int) – Number of samples drawn per learning step.

  • buffer_size (int) – Number of previous samples kept in the replay buffer.

  • warm_up (int) – Number of samples to collect before learning starts.

  • update_freq (int) – Number of samples between learning sessions.

  • number_updates (int) – Number of learning cycles per learning session.
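
A minimal instantiation sketch, reusing the illustrative DataSpace assumption from the MADDPG example; since IQL wraps DQN agents, the discrete action space shown here is a placeholder to adapt to your environment.

    from ai_traineree.multi_agents.iql import IQLAgents
    from ai_traineree.types.dataspace import DataSpace

    # Illustrative spaces; the DataSpace keyword fields are an assumption.
    obs_space = DataSpace(dtype="float", shape=(4,), low=-1.0, high=1.0)
    action_space = DataSpace(dtype="int", shape=(1,), low=0, high=3)  # e.g. 4 discrete actions

    agents = IQLAgents(
        obs_space,
        action_space,
        num_agents=2,
        hidden_layers=(64, 64),  # forwarded to every underlying DQNAgent
        gamma=0.99,
        batch_size=32,
        warm_up=500,
        update_freq=4,
    )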

act(agent_name: str, experience: ai_traineree.types.experience.Experience, noise: float = 0.0) ai_traineree.types.experience.Experience
action_space: ai_traineree.types.dataspace.DataSpace
agent_names: List[str]
agents: List[ai_traineree.types.agent.AgentType]
commit() None

This method does nothing.

Since all agents are completely independent, there is no need to synchronize them.

get_state()

Returns agents’ internal states

load_state(path: str)

Reads the whole agent state from a local file.

log_metrics(data_logger: ai_traineree.loggers.data_logger.DataLogger, step: int, full_log: bool = False)
property loss: Dict[str, float]
model: str = 'IQL'
num_agents: int
obs_space: ai_traineree.types.dataspace.DataSpace
reset() None

Resets all agents’ states.

reset_agents()
save_state(path: str)

Saves the whole agent state into a local file.

seed(seed: int)
state_dict() Dict[str, dict]

Returns description of all agents’ components.

step(agent_name: str, experience: ai_traineree.types.experience.Experience) None