Networks

Networks are divided depending on their context. The convention of heads and bodies is common in the literature, so we keep it here. If you haven't heard of these before, think about the Frankenstein monster. A body is not a whole body but rather a body part, e.g. arms and legs. Obviously(!), they don't work by themselves, so you need a head that controls them. Some heads take body parts explicitly and build the whole monstrosity, and some heads are predefined to closely match the suggestion in a paper. So, in general, a head is more complex and does more than a body, but for some agents a single body part, e.g. a fully connected network, is good enough.

Bodies

ai_traineree.networks.bodies.ActorBody

alias of ai_traineree.networks.bodies.FcNet

class ai_traineree.networks.bodies.ConvNet(input_dim: Sequence[int], **kwargs)
__init__(input_dim: Sequence[int], **kwargs)

Convolution Network.

Constructs a layered network over torch.nn.Conv2d. The number of layers is set based on the hidden_layers argument. To update other arguments, e.g. kernel_size or bias, pass either a single value or a tuple of the same length as hidden_layers.

Quick reminder from the PyTorch doc (https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html).

Keyword Arguments
  • in_channels (int) – Number of channels in the input image

  • hidden_layers (tuple of ints) – Number of channels in each hidden layer

  • kernel_size (int or tuple) – Size of the convolving kernel

  • stride (int or tuple, optional) – Stride of the convolution. Default: 1

  • padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0

  • padding_mode (string, optional) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’

  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

Examples

>>> config = {"hidden_layers": (300, 200, 100), "kernel_size": 6, "gate": F.relu}
>>> net = ConvNet(input_dim=(10, 10, 3), **config)
>>> config = {"hidden_layers": (64, 32, 64), "kernel_size": (3, 4, 3), padding: 2, "gate": F.relu}
>>> net = ConvNet(input_dim=(20, 10, 1), **config)
forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

property output_size
reset_parameters()
training: bool
class ai_traineree.networks.bodies.CriticBody(in_features: Sequence[int], inj_action_size: int, out_features: Sequence[int] = (1,), hidden_layers: Optional[Sequence[int]] = (100, 100), inj_actions_layer: int = 1, **kwargs)

Extension of the FcNet which includes actions.

Mainly used to estimate the state-action value function in actor-critic agents. Actions are injected (by default) into the first hidden layer; this can be changed via inj_actions_layer.

Since the main purpose of this network is value function estimation, the output is a single value.

__init__(in_features: Sequence[int], inj_action_size: int, out_features: Sequence[int] = (1,), hidden_layers: Optional[Sequence[int]] = (100, 100), inj_actions_layer: int = 1, **kwargs)
Parameters
  • in_features (tuple of ints) – Dimension of the input features.

  • inj_action_size (int) – Dimension of the action vector that is injected into inj_actions_layer.

  • out_features (tuple of ints) – Dimension of the critic’s output. Default: (1,).

  • hidden_layers (tuple of ints) – Shape of the hidden layers. Default: (100, 100).

  • inj_actions_layer (int) – An index of the layer that will have actions injected as an additional input. By default that’s the first hidden layer, i.e. (state) -> (out + actions) -> (out) … -> (output). Default: 1.

Keyword Arguments
  • bias (bool) – Whether to include bias in network’s architecture. Default: True.

  • gate (callable) – Activation function for each layer, except the last. Default: Identity layer.

  • gate_out (callable) – Activation function after the last layer. Default: Identity layer.

  • device – Device where to allocate memory. CPU or CUDA. Default CUDA if available.
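A minimal usage sketch based on the signature above. The state and action sizes are arbitrary, and passing device as a string is assumed to work as it does for FcNet:

>>> import torch
>>> from ai_traineree.networks.bodies import CriticBody
>>> critic = CriticBody(in_features=(8,), inj_action_size=2, hidden_layers=(100, 100), device="cpu")
>>> state = torch.rand(1, 8)    # batch of one 8-dimensional state
>>> action = torch.rand(1, 2)   # matching 2-dimensional action vector
>>> q_value = critic(state, action)  # single state-action value per sample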

forward(x, actions)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters()
training: bool
class ai_traineree.networks.bodies.FcNet(in_features: Sequence[int], out_features: Sequence[int], hidden_layers: Optional[Sequence[int]] = (200, 100), last_layer_range=(- 0.0003, 0.0003), bias: bool = True, **kwargs)

For the activation layer we use tanh by default, which was observed to be much better than, e.g., ReLU for policy networks [1]. The last gate, however, might be changed depending on the actual task.

References
1

“What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study” by M. Andrychowicz et al. (2020). Link: https://arxiv.org/abs/2006.05990

__init__(in_features: Sequence[int], out_features: Sequence[int], hidden_layers: Optional[Sequence[int]] = (200, 100), last_layer_range=(- 0.0003, 0.0003), bias: bool = True, **kwargs)

Fully Connected network with default APIs.

Parameters
  • in_features (sequence of ints) – Shape of the input.

  • out_features (sequence of ints) – Shape of the output.

  • hidden_layers – Shape of the hidden layers. If None, then the output is directly computed from the input.

  • last_layer_range – The range for the uniform distribution that initializes the last layer.

Keyword Arguments
  • gate (optional torch.nn.layer) – Activation function for each layer, except the last. Default: torch.tanh.

  • gate_out (optional torch.nn.layer) – Activation function after the last layer. Default: Identity layer.

  • device (torch.device or str) – Device where to allocate memory. CPU or CUDA.
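A minimal usage sketch; the layer sizes are arbitrary and the input is assumed to be a batched tensor on the same device as the network:

>>> import torch
>>> from ai_traineree.networks.bodies import FcNet
>>> net = FcNet(in_features=(4,), out_features=(2,), hidden_layers=(64, 64), device="cpu")
>>> out = net(torch.rand(1, 4))  # expected output shape: (1, 2)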

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters()
training: bool
class ai_traineree.networks.bodies.NoisyLayer(in_features: Sequence[int], out_features: Sequence[int], sigma: float = 0.4, factorised: bool = True)
__init__(in_features: Sequence[int], out_features: Sequence[int], sigma: float = 0.4, factorised: bool = True)

A linear layer with added noise perturbations in training as described in [1]. For a fully connected network of NoisyLayers see NoisyNet.

Parameters
  • in_features (tuple ints) – Dimension of the input.

  • out_features (tuple ints) – Dimension of the output.

  • sigma (float) – Used to initialize the noise distribution. Default: 0.4.

  • factorised – Whether to use independent Gaussian (False) or Factorised Gaussian (True) noise. For DQN and Dueling nets, [1] suggests using factorised noise as it’s quicker.

References

1

“Noisy Networks for Exploration” by Fortunato et al. (ICLR 2018), https://arxiv.org/abs/1706.10295.
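A minimal usage sketch with arbitrary sizes; calling reset_noise() between learning steps is a common pattern for noisy layers, not something this API enforces:

>>> import torch
>>> from ai_traineree.networks.bodies import NoisyLayer
>>> layer = NoisyLayer(in_features=(4,), out_features=(2,), sigma=0.4, factorised=True)
>>> out = layer(torch.rand(1, 4))  # noisy linear transformation of the input
>>> layer.reset_noise()            # resample the noise, e.g. between learning steps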

forward(x) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

static noise_function(x)
reset_noise()
reset_parameters() None
training: bool
class ai_traineree.networks.bodies.NoisyNet(in_features: Sequence[int], out_features: Sequence[int], hidden_layers: Optional[Sequence[int]] = (100, 100), sigma=0.4, factorised=True, **kwargs)
__init__(in_features: Sequence[int], out_features: Sequence[int], hidden_layers: Optional[Sequence[int]] = (100, 100), sigma=0.4, factorised=True, **kwargs)
Parameters
  • in_features (tuple ints) – Dimension of the input.

  • out_features (tuple ints) – Dimension of the output.

  • hidden_layers (sequence ints) – Sizes of the hidden layers. The length of the sequence denotes the number of hidden layers and its values are the numbers of nodes per layer. If None is passed then the input goes straight to the output. Default: (100, 100).

  • sigma (float) – Variance value for generating noise in noisy layers. Default: 0.4 per layer.

  • factorised (bool) – Whether to use independent Gaussian (False) or Factorised Gaussian (True) noise. For DQN and Dueling nets, [1] suggests using factorised noise as it’s quicker.

Keyword Arguments
  • gate (callable) – Function to apply after each layer pass. For the best performance it is suggested to use non-linear functions such as tanh. Default: tanh.

  • gate_out (callable) – Function to apply on network’s exit. Default: identity.

  • device (str or torch.device) – Whether and where to cast the network. Default is CUDA if available else cpu.

References

1

“Noisy Networks for Exploration” by Fortunato et al. (ICLR 2018), https://arxiv.org/abs/1706.10295.
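A minimal usage sketch; the sizes are arbitrary and the batched input tensor is assumed to live on the same device as the network:

>>> import torch
>>> from ai_traineree.networks.bodies import NoisyNet
>>> net = NoisyNet(in_features=(8,), out_features=(4,), hidden_layers=(64, 64), device="cpu")
>>> values = net(torch.rand(1, 8))  # e.g. one estimate per output node
>>> net.reset_noise()               # resample noise in all noisy layers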

forward(x) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_noise() None
reset_parameters() None
training: bool
class ai_traineree.networks.bodies.ScaleNet(scale: Union[float, int])
forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
ai_traineree.networks.bodies.hidden_init(layer: torch.nn.modules.module.Module)
ai_traineree.networks.bodies.layer_init(layer: torch.nn.modules.module.Module, range_value: Optional[Tuple[float, float]] = None, remove_mean=True)

Heads

Heads are built on brains. Like in real life, heads do all the difficult parts: receiving stimuli, being above everything else and not falling apart. You take the brains out and they just do nothing. Lazy. The most common use case is one head containing one brain. But who are we to say what you can and cannot do. You want two brains and a head within your head? Sure, go crazy.

What we’re trying to do here is to keep things relatively simple. Unfortunately, not everything can be achieved [citation needed] with a serial topology, and at some point you’ll need branching. Heads are “special” in that each is built on networks/brains and will likely need some special piping when attaching it to your agent.

class ai_traineree.networks.heads.CategoricalNet(num_atoms: int = 21, v_min: float = - 20.0, v_max: float = 20.0, in_features: Optional[Sequence[int]] = None, out_features: Optional[Sequence[int]] = None, hidden_layers: Sequence[int] = (200, 200), net: Optional[ai_traineree.networks.NetworkType] = None, device: Optional[torch.device] = None)

Computes discrete probability distribution for the state-action Q function.

CategoricalNet [1] learns significantly differently compared to the other nets here. For this reason it isn’t suitable as a simple replacement in most (current) agents. Please check whether the Agent supports it.

The algorithm is used in the RainbowNet, but not through this particular net.

References

1

“A Distributional Perspective on Reinforcement Learning” (2017) by M. G. Bellemare, W. Dabney, R. Munos. Link: http://arxiv.org/abs/1707.06887

__init__(num_atoms: int = 21, v_min: float = - 20.0, v_max: float = 20.0, in_features: Optional[Sequence[int]] = None, out_features: Optional[Sequence[int]] = None, hidden_layers: Sequence[int] = (200, 200), net: Optional[ai_traineree.networks.NetworkType] = None, device: Optional[torch.device] = None)
Parameters
  • num_atoms – Number of atoms that discretize the probability distribution.

  • v_min – Minimum (edge) value of the shifted distribution.

  • v_max – Maximum (edge) value of the shifted distribution.

  • net – (Optional) A network used for estimation. If net is provided then hidden_layers has no effect.

  • in_features – Size of the observation (network input).

  • out_features – Length of the output (e.g. the action size).

  • hidden_layers – Shape of the hidden layers that are fully connected networks.

Note that either net or both in_features and out_features need to be not None. If in_features and out_features are provided, then the default net is created as a fully connected network with hidden_layers sizes.
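A minimal construction sketch; the feature sizes and atom settings are arbitrary, and the forward call assumes a batched observation tensor:

>>> import torch
>>> from ai_traineree.networks.heads import CategoricalNet
>>> net = CategoricalNet(num_atoms=51, v_min=-10.0, v_max=10.0, in_features=(8,), out_features=(4,), device=torch.device("cpu"))
>>> probs = net(torch.rand(1, 8))  # probability mass over num_atoms atoms for each output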

dist_projection(rewards: torch.Tensor, masks: torch.Tensor, discount: float, prob_next: torch.Tensor) torch.Tensor
Parameters
  • rewards – Tensor containing rewards that are used as offsets for each distribution.

  • masks – Tensor indicating whether the iteration is terminal. Usually masks = 1 - dones.

  • discount – Discounting value for added Q distributional estimate. Typically gamma or gamma^(n_steps).

  • prob_next – Probability estimates based on transitioned (next) states.

forward(*args) torch.Tensor

Passes *args through the net with proper handling.

mean(values)
reset_paramters()
training: bool
class ai_traineree.networks.heads.DoubleCritic(in_features: Sequence[int], action_size: int, body_cls: ai_traineree.networks.NetworkTypeClass, **kwargs)
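The class is documented only through its signature, so the snippet below is a hedged construction sketch: it assumes body_cls expects a body class such as CriticBody, with arbitrary state and action sizes. act(states, actions) and forward(state, actions) then take state and action tensors.

>>> from ai_traineree.networks.bodies import CriticBody
>>> from ai_traineree.networks.heads import DoubleCritic
>>> critic = DoubleCritic(in_features=(8,), action_size=2, body_cls=CriticBody)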
act(states, actions)
forward(state, actions)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters()
training: bool
class ai_traineree.networks.heads.DuelingNet(in_features: Sequence[int], out_features: Sequence[int], hidden_layers: Sequence[int], net_fn: Optional[Callable[[...], ai_traineree.networks.NetworkType]] = None, net_class: Optional[ai_traineree.networks.NetworkTypeClass] = None, **kwargs)
__init__(in_features: Sequence[int], out_features: Sequence[int], hidden_layers: Sequence[int], net_fn: Optional[Callable[[...], ai_traineree.networks.NetworkType]] = None, net_class: Optional[ai_traineree.networks.NetworkTypeClass] = None, **kwargs)
Parameters
  • in_features (tuple of ints) – Dimension of the input features.

  • out_features (tuple of ints) – Dimension of the network output.

  • hidden_layers (tuple of ints) – Shape of the hidden layers.

  • net_fn (optional func) –

  • net_class (optional class) –

Keyword Arguments

device – Device where to allocate memory. CPU or CUDA. Default CUDA if available.
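A minimal usage sketch with arbitrary sizes; a string device argument is assumed to be accepted, as it is for the other networks here:

>>> import torch
>>> from ai_traineree.networks.heads import DuelingNet
>>> net = DuelingNet(in_features=(8,), out_features=(4,), hidden_layers=(100, 100), device="cpu")
>>> q_values = net(torch.rand(1, 8))  # e.g. one Q value estimate per discrete action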

act(x)
forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters() None
training: bool
class ai_traineree.networks.heads.NetChainer(net_classes: List[ai_traineree.networks.NetworkTypeClass], **kwargs)

Chains nets into one happy family.

As it stands, it is a wrapper around torch.nn.ModuleList. The need for a wrapper comes from providing a unified API to reset properties.
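A hedged chaining sketch: since the class wraps torch.nn.ModuleList, already-constructed modules are assumed to be passed in (despite the net_classes name); the chosen nets and sizes are only illustrative:

>>> import torch
>>> from ai_traineree.networks.bodies import FcNet, ScaleNet
>>> from ai_traineree.networks.heads import NetChainer
>>> chain = NetChainer(net_classes=[
...     ScaleNet(scale=1.0 / 255),                                  # scale raw inputs, e.g. pixel values
...     FcNet(in_features=(8,), out_features=(4,), device="cpu"),   # then map features to outputs
... ])
>>> out = chain(torch.rand(1, 8))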

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_noise()
reset_parameters()
training: bool
class ai_traineree.networks.heads.RainbowNet(in_features: Sequence[int], out_features: Sequence[int], **kwargs)

The Rainbow network combines dueling and categorical networks.

__init__(in_features: Sequence[int], out_features: Sequence[int], **kwargs)
Parameters

  • in_features (tuple of ints) – Shape of the input.

  • out_features (tuple of ints) – Shape of the expected output.

Keyword Arguments
  • hidden_layers (tuple of ints) – Shape of fully connected networks. Default: (200, 200).

  • num_atoms (int) – Number of atoms used in estimating distribution. Default: 21.

  • v_min (float) – Value distribution minimum (left most) value. Default -10.

  • v_max (float) – Value distribution maximum (right most) value. Default 10.

  • noisy (bool) – Whether to use Noisy version of FC networks.

  • pre_network_fn (func) – A shared network that is used before value and advantage networks.

  • device (None, str or torch.device) – Device where to cast the network. Can be assigned with strings, or directly passing torch.device type. If None then it tries to use CUDA then CPU. Default: None.
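A minimal construction sketch with arbitrary sizes; the act() calls assume a batched observation tensor:

>>> import torch
>>> from ai_traineree.networks.heads import RainbowNet
>>> net = RainbowNet(in_features=(8,), out_features=(4,), num_atoms=51, v_min=-10, v_max=10, device="cpu")
>>> probs = net.act(torch.rand(1, 8))                     # distribution over atoms for each output
>>> log_probs = net.act(torch.rand(1, 8), log_prob=True)  # log probabilities, see the note on act()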

act(x, log_prob=False)
Parameters

log_prob (bool) – Whether to return log(prob), computed with PyTorch’s dedicated function. According to the PyTorch docs this is quicker and more numerically stable than taking prob.log().

dist_projection(rewards: torch.Tensor, masks: torch.Tensor, discount: float, prob_next: torch.Tensor) torch.Tensor
Parameters
  • rewards – Tensor containing rewards that are used as offsets for each distribution.

  • masks – Tensor indicating whether the iteration is terminal. Usually masks = 1 - dones.

  • discount – Discounting value for added Q distributional estimate. Typically gamma or gamma^(n_steps).

  • prob_next – Probability estimates based on transitioned (next) states.

forward(x, log_prob=False)
Parameters

log_prob (bool) – Whether to return log(prob), computed with PyTorch’s dedicated function. According to the PyTorch docs this is quicker and more numerically stable than taking prob.log().

reset_noise()
training: bool