Policies

class ai_traineree.policies.BetaPolicy(size: int, bounds: Tuple[float, float] = (1, inf))

Policy based on the Beta (1D) distribution or its multivariate generalization, the Dirichlet distribution.

Uses the torch.distributions.Beta or torch.distributions.Dirichlet distribution depending on the input size.

https://pytorch.org/docs/stable/distributions.html#beta
https://pytorch.org/docs/stable/distributions.html#dirichlet

__init__(size: int, bounds: Tuple[float, float] = (1, inf))
Parameters
  • size – Observation’s dimensionality upon sampling.

  • bounds – Clamp range applied to the Beta distribution’s inputs, both alpha and beta. Both concentration parameters are expected to be larger than 1 (see the sketch below).
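
A minimal sketch, assuming the policy clamps raw concentration outputs into bounds and then builds the matching torch distribution (the tensors here are illustrative, not the class’s actual code):

    import torch
    from torch.distributions import Beta, Dirichlet

    raw = torch.tensor([0.3, 2.5])       # hypothetical raw concentration outputs
    conc = raw.clamp(min=1.0)            # bounds = (1, inf): clamp from below only

    dist_1d = Beta(conc[0], conc[1])     # 1D case: alpha and beta concentrations
    x = dist_1d.sample()                 # value in (0, 1)

    dist_nd = Dirichlet(torch.tensor([1.2, 3.0, 2.1]))  # multivariate case
    p = dist_nd.sample()                 # lies on the simplex, sums to 1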

forward(x) torch.distributions.distribution.Distribution

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
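
A generic illustration of that convention (nn.Linear stands in for any policy module):

    import torch
    import torch.nn as nn

    policy = nn.Linear(4, 2)     # stand-in for any nn.Module-based policy
    x = torch.rand(4)

    y = policy(x)                # preferred: the call runs registered hooks
    # y = policy.forward(x)      # same computation, but silently skips hooks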

static log_prob(dist, samples)
param_dim: int = 2
class ai_traineree.policies.DeterministicPolicy(action_size)
forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

param_dim: int = 1
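
The class body is not documented here; below is a hypothetical sketch of what a deterministic policy of this shape typically looks like (the in_features argument, the linear head, and the tanh squashing are all assumptions):

    import torch
    import torch.nn as nn

    class DeterministicSketch(nn.Module):
        """Hypothetical stand-in: maps features straight to actions, no sampling."""

        def __init__(self, action_size: int, in_features: int = 8):
            super().__init__()
            self.head = nn.Linear(in_features, action_size)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # One deterministic value per action dimension (param_dim == 1).
            return torch.tanh(self.head(x))
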
class ai_traineree.policies.DirichletPolicy(*, alpha_min: float = 0.05)
forward(x) torch.distributions.distribution.Distribution

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

log_prob(dist: torch.distributions.dirichlet.Dirichlet, samples) torch.Tensor
param_dim: int = 1
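
A sketch of the presumed role of alpha_min: flooring the concentration vector before constructing the Dirichlet (the raw values are illustrative):

    import torch
    from torch.distributions import Dirichlet

    alpha_min = 0.05
    raw = torch.tensor([0.01, 1.5, 0.7])   # hypothetical network output
    alpha = raw.clamp(min=alpha_min)       # enforce the alpha_min floor

    dist = Dirichlet(alpha)
    sample = dist.sample()                 # simplex vector, sums to 1
    logp = dist.log_prob(sample)
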
class ai_traineree.policies.GaussianPolicy(in_features: Sequence[int], out_features: Sequence[int], out_scale: float = 1, **kwargs)

Univariate Gaussian (Normal) distribution. Has two heads: one for the location estimate and one for the standard deviation.
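
A minimal sketch of the two-headed idea (a hypothetical module, not this class’s actual implementation):

    import torch
    import torch.nn as nn
    from torch.distributions import Normal

    class TwoHeadGaussianSketch(nn.Module):
        def __init__(self, in_size: int, out_size: int):
            super().__init__()
            self.mu_head = nn.Linear(in_size, out_size)       # location head
            self.log_std_head = nn.Linear(in_size, out_size)  # spread head

        def forward(self, x: torch.Tensor, deterministic: bool = False) -> torch.Tensor:
            mu = self.mu_head(x)
            if deterministic:
                return mu                     # skip sampling, return the mean
            std = self.log_std_head(x).exp()  # exponentiate to keep std positive
            return Normal(mu, std).rsample()  # reparameterized sample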

__init__(in_features: Sequence[int], out_features: Sequence[int], out_scale: float = 1, **kwargs)
Parameters
  • in_features (Sequence[int]) – Dimensions of the input features.

  • out_features (Sequence[int]) – Dimensions of the output features.

  • out_scale – Scale factor applied to the output. Default: 1.

forward(x, deterministic: bool = False) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

log_prob(samples) Optional[torch.Tensor]
param_dim: int
class ai_traineree.policies.MultivariateGaussianPolicy(size: int, std_init: float = 1.0, std_min: float = 0.001, std_max: float = 2.0, device=None, **kwargs)

Multivariate Gaussian (Normal) Policy.

In contrast to MultivariateGaussianPolicySimple, it assumes that the distribution’s characteristics are estimated by the network rather than optimized directly by the optimizer. Both the location and the covariance are expected as inputs to the policy.
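
A sketch of building such a distribution from an estimated location and a diagonal covariance (the clamp to [std_min, std_max] is an assumption based on the constructor arguments):

    import torch
    from torch.distributions import MultivariateNormal

    size = 3
    loc = torch.zeros(size)                            # network-estimated location
    std = torch.full((size,), 1.0).clamp(0.001, 2.0)   # std kept in [std_min, std_max]
    cov = torch.diag(std ** 2)                         # diagonal covariance matrix

    dist = MultivariateNormal(loc, covariance_matrix=cov)
    action = dist.sample()
    logp = dist.log_prob(action)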

__init__(size: int, std_init: float = 1.0, std_min: float = 0.001, std_max: float = 2.0, device=None, **kwargs)
Parameters
  • size – Observation’s dimensionality upon sampling.

  • std_init – Initial value for the standard deviation. Default: 1.0.

  • std_min – Minimum value for the standard deviation. Default: 0.001.

  • std_max – Maximum value for the standard deviation. Default: 2.0.

  • batch_size – Expected size of the batch.

  • device – Device where to allocate memory. CPU or CUDA.

act(x) torch.Tensor

Deterministic pass. Ignores covariance and returns locations directly.

static diag_idx(batch_size: int, size: int, device)
forward(x, deterministic=False) torch.distributions.distribution.Distribution

Returns a distribution built from the input’s location and covariance estimates.

log_prob(samples)
param_dim: int = 2
class ai_traineree.policies.MultivariateGaussianPolicySimple(size: int, std_init: float = 0.5, std_min: float = 0.0001, std_max: float = 2.0, device=None, **kwargs)

Multivariate Gaussian (Normal) Policy.

The simplicity of this class, compared to MultivariateGaussianPolicy, lies in the assumption that the covariance is diagonal, sample independent, and treated as a trainable parameter.
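
A hypothetical sketch of that setup: the location comes from the input while a learned diagonal defines the covariance (parameter names mirror the constructor; the module itself is illustrative):

    import torch
    import torch.nn as nn
    from torch.distributions import MultivariateNormal

    class DiagCovSketch(nn.Module):
        def __init__(self, size: int, std_init: float = 0.5,
                     std_min: float = 0.0001, std_max: float = 2.0):
            super().__init__()
            self.std = nn.Parameter(torch.full((size,), std_init))  # trainable diagonal std
            self.std_min, self.std_max = std_min, std_max

        def forward(self, mu: torch.Tensor) -> MultivariateNormal:
            std = self.std.clamp(self.std_min, self.std_max)  # keep std within bounds
            return MultivariateNormal(mu, covariance_matrix=torch.diag(std ** 2))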

__init__(size: int, std_init: float = 0.5, std_min: float = 0.0001, std_max: float = 2.0, device=None, **kwargs)
Parameters
  • size (int) – Size of the observation.

  • std_init – Initial value for the covariance’s diagonal. All entries start with the same value. Default: 0.5.

  • std_min – Minimum value for standard deviation. Default: 0.0001.

  • std_max – Maximum value for the standard deviation. Default: 2.0.

  • device – Device where to allocate memory. CPU or CUDA.

act(x)
static diag_idx(batch_size: int, size: int, device)
forward(x, deterministic: bool = False) torch.distributions.distribution.Distribution

Samples from distribution.

Parameters
  • x (tensor) – Location (mu) used for the distribution.

  • deterministic (bool) – Whether to sample from the distribution or return the location estimate directly. Default: False, i.e. sample from the distribution.

log_prob(samples) torch.Tensor
param_dim: int = 1
class ai_traineree.policies.PolicyType
param_dim: int