Policies

class aitraineree.policies.BetaPolicy(size: int, bound_space: DataSpace, out_scale: float = 1, **kwargs)

Policy based on the Beta distribution. The Dirichlet distribution is the multivariate generalization of the univariate Beta distribution.

Uses the torch.distributions.Beta or torch.distributions.Dirichlet distribution, depending on the input size.

https://pytorch.org/docs/stable/distributions.html#beta
https://pytorch.org/docs/stable/distributions.html#dirichlet

__init__(size: int, bound_space: DataSpace, out_scale: float = 1, **kwargs)

Parameters:

size – Observation’s dimensionality upon sampling.
bounds – Clamp range for the Beta distribution's inputs, applied to both alpha and beta. Both concentrations are expected to be larger than 1.
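
The sketch below shows, in dependency-free Python, what clamping the concentrations and scoring a sample could look like. It is not the library's implementation: `clamp`, `beta_log_prob`, `to_range`, the raw outputs and the action range are all hypothetical, and the real class relies on torch.distributions.Beta. Keeping both concentrations at or above 1 keeps the density finite at the edges of [0, 1], which is one plausible reason for the constraint.

```python
import math
import random


def clamp(value: float, low: float, high: float) -> float:
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def beta_log_prob(x: float, alpha: float, beta: float) -> float:
    """Log-density of Beta(alpha, beta) at a point x strictly inside (0, 1)."""
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x) - log_norm


def to_range(sample: float, low: float, high: float) -> float:
    """Hypothetical affine map from the Beta support [0, 1] onto an action range [low, high]."""
    return low + (high - low) * sample


# Hypothetical raw network outputs for a single action dimension.
raw_alpha, raw_beta = 0.3, 4.0

# Clamp both concentrations to be at least 1, mirroring the `bounds` description.
bounds = (1.0, float("inf"))
alpha = clamp(raw_alpha, *bounds)
beta = clamp(raw_beta, *bounds)

sample = random.betavariate(alpha, beta)  # lies in [0, 1]
action = to_range(sample, -2.0, 2.0)      # hypothetical action range
```

Here `raw_alpha` is lifted to 1.0 by the clamp, while `raw_beta` already satisfies the bound and passes through unchanged.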

forward(x, deterministic: bool = False) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

log_prob(samples)

param_dim: int = 2

class aitraineree.policies.DeterministicPolicy(action_size)

forward(x)

param_dim: int = 1

class aitraineree.policies.DirichletPolicy(size: int, *, alpha_min: float = 0.05)

forward(x, deterministic: bool = False) → Tensor

log_prob(samples) → Tensor

param_dim: int = 2
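
As an illustration of the distribution this policy wraps, the sketch below clamps hypothetical concentrations at `alpha_min`, samples by normalizing independent Gamma draws (a standard way to sample a Dirichlet) and evaluates the log-density. The function names and inputs are hypothetical; the library itself uses torch.distributions.Dirichlet.

```python
import math
import random


def dirichlet_sample(alphas):
    """Draw one Dirichlet sample by normalising independent Gamma(alpha_i, 1) draws."""
    gammas = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def dirichlet_log_prob(x, alphas):
    """Log-density of Dirichlet(alphas) at a point x strictly inside the probability simplex."""
    log_norm = sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))
    return sum((a - 1.0) * math.log(xi) for a, xi in zip(alphas, x)) - log_norm


alpha_min = 0.05                           # default from the DirichletPolicy signature
raw = [0.01, 1.5, 3.0]                     # hypothetical network outputs
alphas = [max(a, alpha_min) for a in raw]  # keep every concentration >= alpha_min
sample = dirichlet_sample(alphas)          # non-negative entries summing to 1
```

With two components, `dirichlet_log_prob([x, 1 - x], [a, b])` equals the Beta(a, b) log-density at x, which is the Beta/Dirichlet relationship mentioned under BetaPolicy.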

class aitraineree.policies.GaussianPolicy(in_features: Sequence[int], out_features: Sequence[int], out_scale: float = 1, **kwargs)

Univariate Gaussian (Normal) distribution. Has two heads: one for the location estimate and one for the standard deviation.
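
A minimal sketch of the two-head idea, assuming the location head's output is used as the mean directly and a softplus keeps the standard-deviation head positive (the softplus is an assumption, not taken from the library). With `deterministic=True` the hypothetical `gaussian_forward` returns the location; otherwise it draws a reparameterized sample `loc + std * eps`.

```python
import math
import random


def softplus(v: float) -> float:
    """Smooth map from any real number to a positive one."""
    return math.log1p(math.exp(v))


def gaussian_log_prob(x: float, loc: float, std: float) -> float:
    """Log-density of Normal(loc, std**2) at x."""
    z = (x - loc) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def gaussian_forward(loc: float, std: float, deterministic: bool = False) -> float:
    """Return the location estimate directly, or a reparameterised sample loc + std * eps."""
    if deterministic:
        return loc
    eps = random.gauss(0.0, 1.0)
    return loc + std * eps


# Hypothetical raw outputs of the location head and the standard-deviation head.
loc_head, std_head = 0.2, -1.0
std = softplus(std_head)  # assumption: softplus keeps the standard deviation positive
action = gaussian_forward(loc_head, std)
```

Writing the sample as `loc + std * eps` keeps it differentiable with respect to both heads, which is why reparameterized sampling is common in policy networks.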

__init__(in_features: Sequence[int], out_features: Sequence[int], out_scale: float = 1, **kwargs)

Parameters:

size – Observation’s dimensionality upon sampling.

forward(x, deterministic: bool = False) → Tensor

log_prob(samples) → Tensor | None

class aitraineree.policies.MultivariateGaussianPolicy(size: int, std_init: float = 1.0, std_min: float = 0.001, std_max: float = 2.0, device=None, **kwargs)

Multivariate Gaussian (Normal) Policy.

In contrast to MultivariateGaussianPolicySimple, it assumes that the distribution’s parameters are estimated by the network rather than optimized directly by the optimizer. Both the location and the covariance are expected as inputs to the policy.
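
Assuming, as the `diag_idx` helper suggests, that the covariance is diagonal and built from per-sample standard deviations supplied as inputs, the pieces can be sketched in plain Python. `diag_indices` is a hypothetical stand-in for `diag_idx`, `build_covariance` places variances on each matrix diagonal, and `act` returns the locations unchanged, as described for `act(x)`. For a diagonal covariance, the multivariate log-density is the sum of univariate Normal log-densities.

```python
import math


def diag_indices(batch_size: int, size: int):
    """Hypothetical stand-in for diag_idx: (batch, row, col) triples of every diagonal entry."""
    return [(b, i, i) for b in range(batch_size) for i in range(size)]


def build_covariance(stds):
    """Turn a batch of std vectors into a batch of diagonal covariance matrices."""
    batch_size, size = len(stds), len(stds[0])
    cov = [[[0.0] * size for _ in range(size)] for _ in range(batch_size)]
    for b, i, j in diag_indices(batch_size, size):
        cov[b][i][j] = stds[b][i] ** 2  # variance on the diagonal
    return cov


def diag_mvn_log_prob(x, loc, std):
    """Log-density of a diagonal-covariance Gaussian: a sum of univariate Normal terms."""
    return sum(
        -0.5 * ((xi - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2.0 * math.pi)
        for xi, mi, si in zip(x, loc, std)
    )


def act(loc):
    """Deterministic pass: ignore the covariance and return the location."""
    return list(loc)
```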

__init__(size: int, std_init: float = 1.0, std_min: float = 0.001, std_max: float = 2.0, device=None, **kwargs)

Parameters:

size – Observation’s dimensionality upon sampling.
batch_size – Expected batch size.
device – Device on which to allocate memory (CPU or CUDA).

act(x) → Tensor: Deterministic pass. Ignores covariance and returns locations directly.

static diag_idx(batch_size: int, size: int, device)

forward(x, deterministic=False) → Distribution

Returns the distribution.

log_prob(samples)

param_dim: int = 2

class aitraineree.policies.MultivariateGaussianPolicySimple(size: int, std_init: float = 0.5, std_min: float = 0.0001, std_max: float = 2.0, device=None, **kwargs)

Multivariate Gaussian (Normal) Policy.

Compared to MultivariateGaussianPolicy, this class is simpler because it assumes that the covariance is diagonal, independent of the sample, and treated as a trainable parameter.

__init__(size: int, std_init: float = 0.5, std_min: float = 0.0001, std_max: float = 2.0, device=None, **kwargs)

Parameters:

size (int) – Size of the observation.
std_init – Initial value for the covariance’s diagonal. All entries start with the same value. Default: 0.5.
std_min – Minimum value for standard deviation. Default: 0.0001.
std_max – Maximum value for standard deviation. Default: 2.
device – Device on which to allocate memory (CPU or CUDA).

act(x)

static diag_idx(batch_size: int, size: int, device)

forward(x, deterministic: bool = False) → Distribution

Samples from the distribution.

Parameters:

x (tensor) – Location (mu) of the distribution.
deterministic (bool) – Whether to use the estimates directly instead of sampling from the distribution. Default: False, i.e. it samples from the distribution.
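
A hedged sketch of the behaviour described above: one standard-deviation vector, initialised to `std_init` and clamped to `[std_min, std_max]`, is shared by every sample, and `deterministic=True` returns the location unchanged. `SimpleDiagGaussianSketch` is hypothetical; the real class keeps the std as a trainable torch parameter and returns a Distribution rather than a sample.

```python
import random


class SimpleDiagGaussianSketch:
    """Hypothetical stand-in: one std vector shared by every sample in the batch."""

    def __init__(self, size: int, std_init: float = 0.5, std_min: float = 0.0001, std_max: float = 2.0):
        # In the real class this is a trainable parameter; here it is a plain list.
        self.std = [std_init] * size
        self.std_min = std_min
        self.std_max = std_max

    def clamped_std(self):
        """Keep every standard deviation inside [std_min, std_max]."""
        return [min(max(s, self.std_min), self.std_max) for s in self.std]

    def forward(self, x, deterministic: bool = False):
        """Return the location x itself, or one sample drawn around it (not a Distribution)."""
        if deterministic:
            return list(x)
        return [mu + s * random.gauss(0.0, 1.0) for mu, s in zip(x, self.clamped_std())]


policy = SimpleDiagGaussianSketch(size=3)
batch = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]      # two hypothetical locations
samples = [policy.forward(mu) for mu in batch]  # the same std is used for both rows
```

Because the std does not depend on `x`, the network only has to output the location, which matches `param_dim: int = 1` for this class.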

log_prob(samples) → Tensor

param_dim: int = 1

class aitraineree.policies.PolicyType(*args: Any, **kwargs: Any)

param_dim: int
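
`param_dim` is not described in this reference. One plausible reading, consistent with BetaPolicy (alpha and beta, value 2), DeterministicPolicy (a single output, value 1) and MultivariateGaussianPolicySimple (location only, with a trainable std, value 1), is that it counts how many values the network outputs per action dimension. The sketch below assumes that reading; `head_width`, `split_head` and the chunk ordering are hypothetical and not part of the library.

```python
PARAM_DIMS = {  # param_dim values listed in this reference
    "BetaPolicy": 2,
    "DeterministicPolicy": 1,
    "DirichletPolicy": 2,
    "MultivariateGaussianPolicy": 2,
    "MultivariateGaussianPolicySimple": 1,
}


def head_width(action_size: int, param_dim: int) -> int:
    """Number of values a network head would output (assumed to be action_size * param_dim)."""
    return action_size * param_dim


def split_head(output, action_size: int, param_dim: int):
    """Split a flat output into param_dim chunks of length action_size (chunk order is assumed)."""
    if len(output) != head_width(action_size, param_dim):
        raise ValueError("output length must equal action_size * param_dim")
    return [output[k * action_size:(k + 1) * action_size] for k in range(param_dim)]
```

For example, under this assumption a BetaPolicy over 3 action dimensions would need a head of width 6, split into one chunk of alphas and one chunk of betas.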