Policies

class ai_traineree.policies.BetaPolicy(size: int, bounds: Tuple[float, float] = (1, inf))

Policy based on the Beta (1D) distribution or its multivariate generalization, the Dirichlet distribution.

Uses the torch.distributions.Beta or torch.distributions.Dirichlet distribution depending on the input size.

https://pytorch.org/docs/stable/distributions.html#beta
https://pytorch.org/docs/stable/distributions.html#dirichlet

__init__(size: int, bounds: Tuple[float, float] = (1, inf))
Parameters
  • size – Observation’s dimensionality upon sampling.

  • bounds – Clamp range applied to the Beta distribution’s inputs, both alpha and beta. Both concentration parameters are expected to be larger than 1 (see the sketch below).
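
A minimal sketch, assuming the policy clamps raw concentration outputs into bounds and then builds the matching torch distribution (the tensors here are illustrative, not the class’s actual code):

    import torch
    from torch.distributions import Beta, Dirichlet

    raw = torch.tensor([0.3, 2.5])       # hypothetical raw concentration outputs
    conc = raw.clamp(min=1.0)            # bounds = (1, inf): clamp from below only

    dist_1d = Beta(conc[0], conc[1])     # 1D case: alpha and beta concentrations
    x = dist_1d.sample()                 # value in (0, 1)

    dist_nd = Dirichlet(torch.tensor([1.2, 3.0, 2.1]))  # multivariate case
    p = dist_nd.sample()                 # lies on the simplex, sums to 1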

forward(x) torch.distributions.distribution.Distribution

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
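
A generic illustration of that convention (nn.Linear stands in for any policy module):

    import torch
    import torch.nn as nn

    policy = nn.Linear(4, 2)     # stand-in for any nn.Module-based policy
    x = torch.rand(4)

    y = policy(x)                # preferred: the call runs registered hooks
    # y = policy.forward(x)      # same computation, but silently skips hooks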

static log_prob(dist, samples)
param_dim: int = 2
class ai_traineree.policies.DeterministicPolicy(action_size)
forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

param_dim: int = 1
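
The class body is not documented here; below is a hypothetical sketch of what a deterministic policy of this shape typically looks like (the in_features argument, the linear head, and the tanh squashing are all assumptions):

    import torch
    import torch.nn as nn

    class DeterministicSketch(nn.Module):
        """Hypothetical stand-in: maps features straight to actions, no sampling."""

        def __init__(self, action_size: int, in_features: int = 8):
            super().__init__()
            self.head = nn.Linear(in_features, action_size)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # One deterministic value per action dimension (param_dim == 1).
            return torch.tanh(self.head(x))
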
class ai_traineree.policies.DirichletPolicy(*, alpha_min: float = 0.05)
forward(x) torch.distributions.distribution.Distribution

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

log_prob(dist: torch.distributions.dirichlet.Dirichlet, samples) torch.Tensor
param_dim: int = 1
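
A sketch of the presumed role of alpha_min: flooring the concentration vector before constructing the Dirichlet (the raw values are illustrative):

    import torch
    from torch.distributions import Dirichlet

    alpha_min = 0.05
    raw = torch.tensor([0.01, 1.5, 0.7])   # hypothetical network output
    alpha = raw.clamp(min=alpha_min)       # enforce the alpha_min floor

    dist = Dirichlet(alpha)
    sample = dist.sample()                 # simplex vector, sums to 1
    logp = dist.log_prob(sample)
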
class ai_traineree.policies.GaussianPolicy(in_features: Sequence[int], out_features: Sequence[int], out_scale: float = 1, **kwargs)

Univariate Gaussian (Normal) distribution. Has two heads: one for the location estimate and one for the standard deviation.
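
A minimal sketch of the two-headed idea (a hypothetical module, not this class’s actual implementation):

    import torch
    import torch.nn as nn
    from torch.distributions import Normal

    class TwoHeadGaussianSketch(nn.Module):
        def __init__(self, in_size: int, out_size: int):
            super().__init__()
            self.mu_head = nn.Linear(in_size, out_size)       # location head
            self.log_std_head = nn.Linear(in_size, out_size)  # spread head

        def forward(self, x: torch.Tensor, deterministic: bool = False) -> torch.Tensor:
            mu = self.mu_head(x)
            if deterministic:
                return mu                     # skip sampling, return the mean
            std = self.log_std_head(x).exp()  # exponentiate to keep std positive
            return Normal(mu, std).rsample()  # reparameterized sample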

__init__(in_features: Sequence[int], out_features: Sequence[int], out_scale: float = 1, **kwargs)
Parameters
  • in_features (Sequence[int]) – Dimensions of the input features.

  • out_features (Sequence[int]) – Dimensions of the output features.

  • out_scale – Scale factor applied to the output. Default: 1.

forward(x, deterministic: bool = False) torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

log_prob(samples) Optional[torch.Tensor]
param_dim: int
class ai_traineree.policies.MultivariateGaussianPolicy(size: int, std_init: float = 1.0, std_min: float = 0.001, std_max: float = 2.0, device=None, **kwargs)

Multivariate Gaussian (Normal) Policy.

In contrast to MultivariateGaussianPolicySimple, it assumes that the distribution’s characteristics are estimated by the network rather than optimized directly by the optimizer. Both the location and the covariance are expected as inputs to the policy.
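
A sketch of building such a distribution from an estimated location and a diagonal covariance (the clamp to [std_min, std_max] is an assumption based on the constructor arguments):

    import torch
    from torch.distributions import MultivariateNormal

    size = 3
    loc = torch.zeros(size)                            # network-estimated location
    std = torch.full((size,), 1.0).clamp(0.001, 2.0)   # std kept in [std_min, std_max]
    cov = torch.diag(std ** 2)                         # diagonal covariance matrix

    dist = MultivariateNormal(loc, covariance_matrix=cov)
    action = dist.sample()
    logp = dist.log_prob(action)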

__init__(size: int, std_init: float = 1.0, std_min: float = 0.001, std_max: float = 2.0, device=None, **kwargs)
Parameters
  • size – Observation’s dimensionality upon sampling.

  • std_init – Initial value for the standard deviation. Default: 1.0.

  • std_min – Minimum value for the standard deviation. Default: 0.001.

  • std_max – Maximum value for the standard deviation. Default: 2.0.

  • batch_size – Expected size of the batch.

  • device – Device where to allocate memory. CPU or CUDA.

act(x) torch.Tensor

Deterministic pass. Ignores covariance and returns locations directly.

static diag_idx(batch_size: int, size: int, device)
forward(x, deterministic=False) torch.distributions.distribution.Distribution

Returns a distribution built from the input’s location and covariance estimates.

log_prob(samples)
param_dim: int = 2
class ai_traineree.policies.MultivariateGaussianPolicySimple(size: int, std_init: float = 0.5, std_min: float = 0.0001, std_max: float = 2.0, device=None, **kwargs)

Multivariate Gaussian (Normal) Policy.

The simplicity of this class, compared to MultivariateGaussianPolicy, lies in the assumption that the covariance is diagonal, sample independent, and treated as a trainable parameter.
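
A hypothetical sketch of that setup: the location comes from the input while a learned diagonal defines the covariance (parameter names mirror the constructor; the module itself is illustrative):

    import torch
    import torch.nn as nn
    from torch.distributions import MultivariateNormal

    class DiagCovSketch(nn.Module):
        def __init__(self, size: int, std_init: float = 0.5,
                     std_min: float = 0.0001, std_max: float = 2.0):
            super().__init__()
            self.std = nn.Parameter(torch.full((size,), std_init))  # trainable diagonal std
            self.std_min, self.std_max = std_min, std_max

        def forward(self, mu: torch.Tensor) -> MultivariateNormal:
            std = self.std.clamp(self.std_min, self.std_max)  # keep std within bounds
            return MultivariateNormal(mu, covariance_matrix=torch.diag(std ** 2))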

__init__(size: int, std_init: float = 0.5, std_min: float = 0.0001, std_max: float = 2.0, device=None, **kwargs)
Parameters
  • size (int) – Size of the observation.

  • std_init – Initial value for the covariance’s diagonal. All entries start with the same value. Default: 0.5.

  • std_min – Minimum value for standard deviation. Default: 0.0001.

  • std_max – Maximum value for the standard deviation. Default: 2.0.

  • device – Device where to allocate memory. CPU or CUDA.

act(x)
static diag_idx(batch_size: int, size: int, device)
forward(x, deterministic: bool = False) torch.distributions.distribution.Distribution

Samples from distribution.

Parameters
  • x (tensor) – Location (mu) used for the distribution.

  • deterministic (bool) – Whether to sample from the distribution or return the location estimate directly. Default: False, i.e. sample from the distribution.

log_prob(samples) torch.Tensor
param_dim: int = 1
class ai_traineree.policies.PolicyType
param_dim: int