Buffers

Basics

This class is the abstraction for all buffers. In short, each buffer should support adding new samples and sampling from the buffer. Additional classes are required for saving or resuming the whole state. All buffers internally store data as Experience objects, but on sampling these are converted into torch Tensors or numpy arrays.

class ai_traineree.types.experience.Experience(**kwargs)

Basic data unit to hold information.

It typically represents one whole cycle of observation - action - reward. This is the data type used to store experiences in experience buffers.
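
For illustration, a minimal sketch of constructing one Experience. The field names below (state, action, reward, next_state, done) are assumptions following the observation - action - reward cycle; check the class for the exact keys it supports.

    from ai_traineree.types.experience import Experience

    # One transition; keyword arguments become the experience's data fields.
    # Field names here are illustrative, not a definitive schema.
    exp = Experience(state=[0.0, 1.0], action=1, reward=-0.5, next_state=[1.0, 1.0], done=False)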

class ai_traineree.buffers.replay.BufferBase

Abstract class that defines the buffer interface.

add(**kwargs)

Add samples to the buffer.

dump_buffer(serialize: bool = False) → List[Dict]

Return the whole buffer, e.g. for storing.

load_buffer(buffer: List[ai_traineree.types.experience.Experience]) → None

Loads provided data into the buffer.

sample(*args, **kwargs) → Optional[List[ai_traineree.types.experience.Experience]]

Sample the buffer for a set of experiences.
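
Any concrete buffer also supports persistence through dump_buffer and load_buffer. A sketch of that round trip, assuming serialize=True yields JSON-serializable dicts and that Experience(**d) reconstructs a dumped entry (both worth verifying against the implementation):

    import json

    from ai_traineree.buffers.replay import ReplayBuffer
    from ai_traineree.types.experience import Experience

    buffer = ReplayBuffer(batch_size=4)
    buffer.add(state=[0.0], action=0, reward=1.0, done=False)

    # Store: assumes serialize=True yields JSON-serializable dicts.
    with open("buffer.json", "w") as f:
        json.dump(list(buffer.dump_buffer(serialize=True)), f)

    # Resume: load_buffer expects Experience objects, so rebuild them first
    # (assumes Experience(**d) round-trips a dumped dict).
    with open("buffer.json") as f:
        restored = [Experience(**d) for d in json.load(f)]
    fresh = ReplayBuffer(batch_size=4)
    fresh.load_buffer(restored)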

Replay Buffer

The most basic buffer. Supports uniform sampling.

class ai_traineree.buffers.replay.ReplayBuffer(batch_size: int, buffer_size=1000000, **kwargs)
__eq__(o: object) → bool

Return self==value.

__hash__ = None
__init__(batch_size: int, buffer_size=1000000, **kwargs)
Parameters
  • compress_state – bool (default: False). Whether to manage memory used by states. Useful when states are “large”. Improves memory usage but carries a significant performance penalty.

  • seed – int (default: None) Set seed for the random number generator.

add(**kwargs)

Add samples to the buffer.

clear()

Removes all data from the buffer.

dump_buffer(serialize: bool = False) → Iterator[Dict[str, List]]

Return the whole buffer, e.g. for storing.

load_buffer(buffer: List[ai_traineree.types.experience.Experience])

Loads provided data into the buffer.

sample(keys: Optional[Sequence[str]] = None) → Dict[str, List]
Parameters

keys – A list of keys that limits the returned values. If nothing is provided, all keys are returned.

Returns

Returns all values for the requested keys.
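
A minimal usage sketch, assuming the illustrative experience keys from above:

    from ai_traineree.buffers.replay import ReplayBuffer

    buffer = ReplayBuffer(batch_size=4, buffer_size=1000, seed=42)

    # Keyword arguments passed to add() become fields of the stored Experience.
    for step in range(20):
        buffer.add(state=[float(step)], action=0, reward=1.0, next_state=[float(step + 1)], done=False)

    # Uniform sampling; restrict the returned dict to the keys of interest.
    batch = buffer.sample(keys=["state", "action", "reward"])
    states, actions, rewards = batch["state"], batch["action"], batch["reward"]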

Prioritized Experience Replay (PER)

class ai_traineree.buffers.per.PERBuffer(batch_size: int, buffer_size: int = 1000000, alpha=0.5, device=None, **kwargs)

Prioritized Experience Replay

A buffer that holds previously seen sets of transitions, or memories. “Prioritized” means that each transition has a value (priority) which determines the probability of sampling that transition: the larger the priority, the higher the chance of sampling the associated sample. These priorities are often tied to the error computed when learning from the associated sample, in which case sampling from the buffer will more often return the samples that are currently troublesome.

Based on “Prioritized Experience Replay” (2016) by T. Schaul, J. Quan, I. Antonoglou, D. Silver. https://arxiv.org/pdf/1511.05952.pdf
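
To make the prioritization concrete, a small numeric sketch of the scheme from the paper (the concept, not the buffer’s internal code): transition i with priority p_i is drawn with probability proportional to p_i^alpha, and importance-sampling weights (N * P(i))^(-beta) correct the resulting bias.

    import numpy as np

    priorities = np.array([0.1, 1.0, 2.0, 4.0])  # e.g. absolute TD errors
    alpha, beta = 0.5, 0.5

    # Sampling probabilities: P(i) ~ p_i^alpha (alpha=0 recovers uniform sampling).
    probs = priorities**alpha / np.sum(priorities**alpha)

    # Importance-sampling weights, normalized by the max for stability.
    weights = (len(priorities) * probs) ** (-beta)
    weights /= weights.max()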

__eq__(o: object) → bool

Return self==value.

__hash__ = None
__init__(batch_size: int, buffer_size: int = 1000000, alpha=0.5, device=None, **kwargs)
Parameters
  • batch_size (int) – Number of samples to return on sampling.

  • buffer_size (int) – Maximum number of samples to store. Default: 10^6.

  • alpha (float) – Optional (default: 0.5). Power factor for priorities, making the sampling probability proportional to priority^alpha.

  • compress_state (bool) – Optional (default: False). Whether to manage memory used by states. Useful when states are “large”. Improves memory usage but carries a significant performance penalty.

  • seed (int) – Optional (default None). Set seed for the random number generator.

add(*, priority: float = 0, **kwargs)

Add samples to the buffer.

dump_buffer(serialize: bool = False) → Iterator[Dict[str, List]]

Return the whole buffer, e.g. for storing.

load_buffer(buffer: List[ai_traineree.types.experience.Experience])

Loads provided data into the buffer.

priority_update(indices: Sequence[int], priorities: List) → None

Updates priorities for elements at the provided indices.

reset_alpha(alpha: float)

Resets the alpha weight (p^alpha).

sample(beta: float = 0.5) → Optional[Dict[str, List]]

Sample the buffer for a set of experiences.
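
A sketch of the typical learning loop: sample with some beta, learn, then refresh priorities with the new errors. The “index” key is an assumption about what sample() returns, so verify it against the actual output; the TD errors here are a stand-in for values computed by the learner.

    from ai_traineree.buffers.per import PERBuffer

    per = PERBuffer(batch_size=4, buffer_size=1000)
    for step in range(20):
        per.add(priority=1.0, state=[float(step)], action=0, reward=1.0, done=False)

    batch = per.sample(beta=0.4)
    if batch is not None:
        indices = batch["index"]  # assumed key for the sampled positions; verify
        td_errors = [0.1] * len(indices)  # stand-in for learner-computed TD errors
        per.priority_update(indices, [abs(e) + 1e-6 for e in td_errors])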

Rollout Buffer

class ai_traineree.buffers.rollout.RolloutBuffer(batch_size: int, buffer_size=1000000, **kwargs)
__eq__(o: object) → bool

Return self==value.

__hash__ = None
__init__(batch_size: int, buffer_size=1000000, **kwargs)

A buffer that keeps and returns data in order. Commonly used with on-policy methods such as PPO.

Parameters
  • batch_size (int) – Maximum number of samples to return in each batch.

  • buffer_size (int) – Number of samples to store in the buffer.

Keyword Arguments

compress_state (bool) – Default: False. Whether to manage memory used by states. Useful when states are “large” and frequently visited. A typical use case is dealing with images.

add(**kwargs)

Add samples to the buffer.

dump_buffer(serialize: bool = False) → Iterator[Dict[str, List]]

Return the whole buffer, e.g. for storing.

load_buffer(buffer: List[ai_traineree.types.experience.Experience])

Loads provided data into the buffer.

sample(batch_size: Optional[int] = None) → Iterator[Dict[str, list]]

Samples the whole buffer, iterating over all gathered data. Note that sampling doesn’t clear the buffer.

Returns

A generator that iterates over all rolled-out samples.
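
A sketch of iterating ordered rollout data, as in a PPO update; the experience keys remain the illustrative ones used above.

    from ai_traineree.buffers.rollout import RolloutBuffer

    rollout = RolloutBuffer(batch_size=4, buffer_size=16)
    for step in range(16):
        rollout.add(state=[float(step)], action=0, reward=1.0, done=(step == 15))

    # sample() yields consecutive batches covering the whole buffer, in order;
    # the buffer is not cleared, so the data can be iterated for multiple epochs.
    for batch in rollout.sample():
        states = batch["state"]  # up to batch_size ordered samples per batch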