# Agents and Networks

## Agents

### smart_control.reinforcement_learning.agents.sac_agent

Reinforcement learning: Soft Actor-Critic (SAC) agent.

#### create_sac_agent
```python
create_sac_agent(
    time_step_spec: TimeStep,
    action_spec: NestedTensorSpec,
    actor_fc_layers: Sequence[int] = (256, 256),
    actor_network: Optional[Network] = None,
    critic_obs_fc_layers: Sequence[int] = (256, 128),
    critic_action_fc_layers: Sequence[int] = (256, 128),
    critic_joint_fc_layers: Sequence[int] = (256, 128),
    critic_network: Optional[Network] = None,
    actor_learning_rate: float = 0.0003,
    critic_learning_rate: float = 0.0003,
    alpha_learning_rate: float = 0.0003,
    gamma: float = 0.99,
    target_update_tau: float = 0.005,
    target_update_period: int = 1,
    reward_scale_factor: float = 1.0,
    gradient_clipping: Optional[float] = None,
    debug_summaries: bool = False,
    summarize_grads_and_vars: bool = False,
    train_step_counter: Optional[Variable] = None,
) -> tf_agent.TFAgent
```
Creates a SAC Agent.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `time_step_spec` | `TimeStep` | A `TimeStep` spec of the expected time steps. | *required* |
| `action_spec` | `NestedTensorSpec` | A nest of `BoundedTensorSpec` representing the actions. | *required* |
| `actor_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the actor network. | `(256, 256)` |
| `actor_network` | `Optional[Network]` | Optional custom actor network to use. | `None` |
| `critic_obs_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the critic observation network. | `(256, 128)` |
| `critic_action_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the critic action network. | `(256, 128)` |
| `critic_joint_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the joint part of the critic network. | `(256, 128)` |
| `critic_network` | `Optional[Network]` | Optional custom critic network to use. | `None` |
| `actor_learning_rate` | `float` | Actor network learning rate. | `0.0003` |
| `critic_learning_rate` | `float` | Critic network learning rate. | `0.0003` |
| `alpha_learning_rate` | `float` | Learning rate for alpha, the entropy regularization coefficient. | `0.0003` |
| `gamma` | `float` | Discount factor for future rewards. | `0.99` |
| `target_update_tau` | `float` | Factor for the soft update of the target networks. | `0.005` |
| `target_update_period` | `int` | Period, in train steps, between soft updates of the target networks. | `1` |
| `reward_scale_factor` | `float` | Multiplicative scale for the reward. | `1.0` |
| `gradient_clipping` | `Optional[float]` | Norm length to clip gradients. | `None` |
| `debug_summaries` | `bool` | Whether to emit debug summaries. | `False` |
| `summarize_grads_and_vars` | `bool` | Whether to summarize gradients and variables. | `False` |
| `train_step_counter` | `Optional[Variable]` | An optional counter to increment every time the train op is run. Defaults to the `global_step`. | `None` |
Returns:

| Type | Description |
| --- | --- |
| `TFAgent` | A `TFAgent` instance implementing the SAC agent. |
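A minimal usage sketch follows. The environment is an assumption chosen purely for illustration; any TF-Agents environment with a bounded continuous action spec works the same way:

```python
import tensorflow as tf
from tf_agents.environments import suite_gym, tf_py_environment

from smart_control.reinforcement_learning.agents import sac_agent

# Illustrative environment: Pendulum has a continuous, bounded action space,
# which SAC requires. Any comparable environment can be substituted.
env = tf_py_environment.TFPyEnvironment(suite_gym.load('Pendulum-v1'))

agent = sac_agent.create_sac_agent(
    time_step_spec=env.time_step_spec(),
    action_spec=env.action_spec(),
    actor_fc_layers=(256, 256),
    gamma=0.99,
    train_step_counter=tf.Variable(0, dtype=tf.int64),
)
agent.initialize()
```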
## Networks

### smart_control.reinforcement_learning.agents.networks.sac_networks

Network architectures for the SAC agent.

This module provides functions to create actor and critic networks for SAC agents.

#### create_fc_network
Creates a fully connected network.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `layer_units` | `Sequence[int]` | A sequence of layer units. | *required* |
Returns:

| Type | Description |
| --- | --- |
| `Model` | A sequential model of dense layers. |
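For intuition, a functionally similar network can be sketched as below. The activation function is an assumption; the documentation only specifies a stack of dense layers:

```python
import tensorflow as tf

def fc_network_sketch(layer_units):
    """Roughly equivalent stack: one Dense layer per entry in layer_units."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation='relu')  # activation assumed
        for units in layer_units
    ])

net = fc_network_sketch((256, 128))  # two dense layers: 256 units, then 128
```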
#### create_identity_layer
Creates an identity layer.
Returns:

| Type | Description |
| --- | --- |
| `Layer` | A `Lambda` layer that returns its input. |
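Given the description, the returned layer behaves like this one-liner. Presumably it serves as a placeholder branch (for example, when an observation or action sub-network needs no preprocessing):

```python
import tensorflow as tf

# A Lambda layer that passes its input through unchanged.
identity = tf.keras.layers.Lambda(lambda x: x)
```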
#### create_sequential_actor_network

```python
create_sequential_actor_network(
    actor_fc_layers: Sequence[int], action_tensor_spec: NestedTensorSpec
) -> sequential.Sequential
```

Creates a sequential actor network for SAC.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `actor_fc_layers` | `Sequence[int]` | Units for actor network fully connected layers. | *required* |
| `action_tensor_spec` | `NestedTensorSpec` | The action tensor spec. | *required* |
Returns:

| Type | Description |
| --- | --- |
| `Sequential` | A sequential actor network. |
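A usage sketch, reusing the `env` wrapper from the first example; the import path follows the module name above:

```python
from smart_control.reinforcement_learning.agents.networks import sac_networks

actor_net = sac_networks.create_sequential_actor_network(
    actor_fc_layers=(256, 256),
    action_tensor_spec=env.action_spec(),
)
```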
#### create_sequential_critic_network

```python
create_sequential_critic_network(
    obs_fc_layer_units: Sequence[int],
    action_fc_layer_units: Sequence[int],
    joint_fc_layer_units: Sequence[int],
) -> sequential.Sequential
```

Creates a sequential critic network for SAC.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `obs_fc_layer_units` | `Sequence[int]` | Units for observation network layers. | *required* |
| `action_fc_layer_units` | `Sequence[int]` | Units for action network layers. | *required* |
| `joint_fc_layer_units` | `Sequence[int]` | Units for joint network layers. | *required* |
Returns:

| Type | Description |
| --- | --- |
| `Sequential` | A sequential critic network. |
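A matching sketch for the critic, continuing the examples above. Custom networks built this way can then be passed to `create_sac_agent` through its `actor_network` and `critic_network` parameters:

```python
critic_net = sac_networks.create_sequential_critic_network(
    obs_fc_layer_units=(256, 128),
    action_fc_layer_units=(256, 128),
    joint_fc_layer_units=(256, 128),
)

# Wire the custom networks into the agent factory from the first example.
agent = sac_agent.create_sac_agent(
    time_step_spec=env.time_step_spec(),
    action_spec=env.action_spec(),
    actor_network=actor_net,
    critic_network=critic_net,
)
agent.initialize()
```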