Policies
- class aitraineree.policies.BetaPolicy(size: int, bound_space: DataSpace, out_scale: float = 1, **kwargs)
Policy based on the Beta (1D) distribution or its multivariate generalization, the Dirichlet distribution.
Uses the torch.distributions.Beta or torch.distributions.Dirichlet distribution, depending on the input size.
https://pytorch.org/docs/stable/distributions.html#beta https://pytorch.org/docs/stable/distributions.html#dirichlet
- __init__(size: int, bound_space: DataSpace, out_scale: float = 1, **kwargs)
- Parameters:
size – Observation’s dimensionality upon sampling.
bound_space – Clamp bounds applied to the Beta distribution's inputs, both alpha and beta. Both concentrations are expected to be larger than 1.
- forward(x, deterministic: bool = False) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- log_prob(samples)
- param_dim: int = 2
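The switch between the two distributions can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the helper make_distribution, its low and high arguments, and the layout of params are assumptions made for the example. Only torch.distributions.Beta and torch.distributions.Dirichlet come from the description above.

```python
import torch
from torch.distributions import Beta, Dirichlet


def make_distribution(params: torch.Tensor, size: int, low: float = 1.0, high: float = 100.0):
    """Hypothetical helper: Beta for size == 1, Dirichlet otherwise."""
    # Clamp every concentration into [low, high] (assumed bounds).
    params = torch.clamp(params, min=low, max=high)
    if size == 1:
        # Assumed layout: last axis holds (alpha, beta).
        alpha, beta = params[..., 0], params[..., 1]
        return Beta(alpha, beta)
    return Dirichlet(params)


dist_1d = make_distribution(torch.tensor([[0.2, 3.0]]), size=1)
dist_3d = make_distribution(torch.tensor([[2.0, 5.0, 0.5]]), size=3)
sample = dist_3d.sample()  # one point on the 3-dimensional simplex
```

Clamping with a lower bound of 1 keeps both Beta concentrations in the range where the density is unimodal, which matches the expectation stated for the bounds parameter.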
- class aitraineree.policies.DeterministicPolicy(action_size)
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- param_dim: int = 1
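The note on forward applies to every policy in this module, since they are torch.nn.Module subclasses. A small sketch of the difference, using a hypothetical stand-in class TinyPolicy rather than any class documented here:

```python
import torch
from torch import nn


class TinyPolicy(nn.Module):
    """Hypothetical stand-in for a deterministic policy: returns its input."""

    def forward(self, x):
        return x


policy = TinyPolicy()
calls = []
# The hook fires only when the pass goes through Module.__call__.
policy.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.zeros(1, 3)
policy(x)          # runs the registered hook
policy.forward(x)  # bypasses the hook
```

After both calls, calls holds a single entry, because only the first call went through the hook machinery.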
- class aitraineree.policies.DirichletPolicy(size: int, *, alpha_min: float = 0.05)
- forward(x, deterministic: bool = False) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- log_prob(samples) → Tensor
- param_dim: int = 2
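The class carries no description, so the following is only a guess at the role of alpha_min: a lower clamp on the concentration parameters before the Dirichlet distribution is built. The tensor raw and the evaluation point are made up for the example.

```python
import torch
from torch.distributions import Dirichlet

alpha_min = 0.05  # default from the signature above

# Hypothetical network output; the first entry is not a valid concentration.
raw = torch.tensor([[-1.0, 1.5, 3.0]])
alpha = torch.clamp(raw, min=alpha_min)  # assumed use of alpha_min

dist = Dirichlet(alpha)
point = torch.tensor([[0.2, 0.3, 0.5]])  # a point on the simplex
log_p = dist.log_prob(point)             # one log-density per batch row
```

Without the clamp, Dirichlet would reject the non-positive concentration.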
- class aitraineree.policies.GaussianPolicy(in_features: Sequence[int], out_features: Sequence[int], out_scale: float = 1, **kwargs)
Univariate Gaussian (Normal) distribution. Has two heads: one for the location estimate and one for the standard deviation.
- __init__(in_features: Sequence[int], out_features: Sequence[int], out_scale: float = 1, **kwargs)
- Parameters:
size – Observation’s dimensionality upon sampling.
- forward(x, deterministic: bool = False) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- log_prob(samples) → Tensor | None
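A two-headed Gaussian policy can be sketched as below. The class TwoHeadGaussian is hypothetical. The integer feature sizes, the tanh squashing, the softplus on the standard deviation and the way out_scale is applied are all assumptions, and the real class takes sequences for in_features and out_features.

```python
import torch
from torch import nn
from torch.distributions import Normal


class TwoHeadGaussian(nn.Module):
    """Hypothetical sketch: separate linear heads for location and std."""

    def __init__(self, in_features: int, out_features: int, out_scale: float = 1.0):
        super().__init__()
        self.out_scale = out_scale
        self.loc_head = nn.Linear(in_features, out_features)
        self.std_head = nn.Linear(in_features, out_features)
        self.dist = None  # set by the first stochastic forward pass

    def forward(self, x, deterministic: bool = False):
        loc = torch.tanh(self.loc_head(x)) * self.out_scale
        if deterministic:
            return loc
        # softplus keeps the standard deviation strictly positive.
        std = nn.functional.softplus(self.std_head(x)) + 1e-6
        self.dist = Normal(loc, std)
        return self.dist.rsample()

    def log_prob(self, samples):
        # None until a distribution has been built, mirroring Tensor | None.
        if self.dist is None:
            return None
        return self.dist.log_prob(samples).sum(dim=-1)


policy = TwoHeadGaussian(in_features=4, out_features=2)
before = policy.log_prob(torch.zeros(3, 2))
actions = policy(torch.zeros(3, 4))
after = policy.log_prob(actions)
```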
- class aitraineree.policies.MultivariateGaussianPolicy(size: int, std_init: float = 1.0, std_min: float = 0.001, std_max: float = 2.0, device=None, **kwargs)
Multivariate Gaussian (Normal) Policy.
In contrast to MultivariateGaussianPolicySimple, it assumes that the distribution's characteristics are estimated by the network rather than optimized directly by the optimizer. Both the location and the covariance are assumed to be inputs to the policy.
- __init__(size: int, std_init: float = 1.0, std_min: float = 0.001, std_max: float = 2.0, device=None, **kwargs)
- Parameters:
size – Observation’s dimensionality upon sampling.
batch_size – Expected size of batch.
device – Device on which to allocate memory: CPU or CUDA.
- act(x) → Tensor
Deterministic pass. Ignores covariance and returns locations directly.
- static diag_idx(batch_size: int, size: int, device)
- forward(x, deterministic=False) → Distribution
Returns the distribution.
- log_prob(samples)
- param_dim: int = 2
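The idea that both location and covariance arrive as inputs can be illustrated with plain torch.distributions. The input layout (batch, 2, size), the absolute value applied before clamping and the diagonal covariance are assumptions made for this sketch. The actual class may arrange its inputs differently.

```python
import torch
from torch.distributions import MultivariateNormal

size = 3
std_min, std_max = 0.001, 2.0  # defaults from the signature above

# Hypothetical network output: axis 1 holds the location and the raw std.
x = torch.randn(4, 2, size)
loc = x[:, 0]
std = torch.clamp(x[:, 1].abs(), std_min, std_max)

# Diagonal covariance built from the per-sample standard deviations.
cov = torch.diag_embed(std ** 2)
dist = MultivariateNormal(loc, covariance_matrix=cov)

action = dist.sample()         # stochastic pass
log_p = dist.log_prob(action)  # one value per batch row
greedy = loc                   # deterministic pass ignores the covariance
```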
- class aitraineree.policies.MultivariateGaussianPolicySimple(size: int, std_init: float = 0.5, std_min: float = 0.0001, std_max: float = 2.0, device=None, **kwargs)
Multivariate Gaussian (Normal) Policy.
Compared to MultivariateGaussianPolicy, this class is simpler because it assumes that the covariance is diagonal, independent of the sample, and treated as a trainable parameter.
- __init__(size: int, std_init: float = 0.5, std_min: float = 0.0001, std_max: float = 2.0, device=None, **kwargs)
- Parameters:
size (int) – Size of the observation.
std_init – Initial value for covariance’s diagonal. All values start the same. Default: 0.5.
std_min – Minimum value for standard deviation. Default: 0.0001.
std_max – Maximum value for standard deviation. Default: 2.
device – Device on which to allocate memory: CPU or CUDA.
- act(x)
- static diag_idx(batch_size: int, size: int, device)
- forward(x, deterministic: bool = False) → Distribution
Samples from the distribution.
- Parameters:
x (tensor) – Used as the location (mu) of the distribution.
deterministic (bool) – Whether to use the estimates directly instead of sampling from the distribution. Default: False, i.e. it samples from the distribution.
- log_prob(samples) → Tensor
- param_dim: int = 1
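A sketch of the trainable diagonal covariance, under stated assumptions: the class SimpleMVN is hypothetical, it stores the standard deviation as an nn.Parameter, and its forward returns a sample rather than a distribution object.

```python
import torch
from torch import nn
from torch.distributions import MultivariateNormal


class SimpleMVN(nn.Module):
    """Hypothetical sketch: diagonal std shared by all samples, trained directly."""

    def __init__(self, size: int, std_init: float = 0.5,
                 std_min: float = 1e-4, std_max: float = 2.0):
        super().__init__()
        self.std_min, self.std_max = std_min, std_max
        # Every diagonal entry starts at std_init.
        self.std = nn.Parameter(torch.full((size,), std_init))

    def forward(self, x, deterministic: bool = False):
        if deterministic:
            return x  # use the location estimate as-is
        std = torch.clamp(self.std, self.std_min, self.std_max)
        # The same diagonal scale is broadcast over the whole batch.
        dist = MultivariateNormal(x, scale_tril=torch.diag(std))
        return dist.rsample()


policy = SimpleMVN(size=3)
x = torch.zeros(5, 3)
greedy = policy(x, deterministic=True)
sampled = policy(x)
```

Because std is a parameter of the module, an optimizer built from policy.parameters() updates it alongside any other weights, and the network only needs to supply the location.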