
Agents and Networks#

Agents#

smart_control.reinforcement_learning.agents.sac_agent #

Reinforcement learning - Soft Actor Critic (SAC) agent.

create_sac_agent #

```python
create_sac_agent(
    time_step_spec: TimeStep,
    action_spec: NestedTensorSpec,
    actor_fc_layers: Sequence[int] = (256, 256),
    actor_network: Optional[Network] = None,
    critic_obs_fc_layers: Sequence[int] = (256, 128),
    critic_action_fc_layers: Sequence[int] = (256, 128),
    critic_joint_fc_layers: Sequence[int] = (256, 128),
    critic_network: Optional[Network] = None,
    actor_learning_rate: float = 0.0003,
    critic_learning_rate: float = 0.0003,
    alpha_learning_rate: float = 0.0003,
    gamma: float = 0.99,
    target_update_tau: float = 0.005,
    target_update_period: int = 1,
    reward_scale_factor: float = 1.0,
    gradient_clipping: Optional[float] = None,
    debug_summaries: bool = False,
    summarize_grads_and_vars: bool = False,
    train_step_counter: Optional[Variable] = None,
) -> tf_agent.TFAgent
```

Creates a SAC Agent.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `time_step_spec` | `TimeStep` | A `TimeStep` spec of the expected time_steps. | required |
| `action_spec` | `NestedTensorSpec` | A nest of `BoundedTensorSpec` representing the actions. | required |
| `actor_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the actor network. | `(256, 256)` |
| `actor_network` | `Optional[Network]` | Optional custom actor network to use. | `None` |
| `critic_obs_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the critic observation network. | `(256, 128)` |
| `critic_action_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the critic action network. | `(256, 128)` |
| `critic_joint_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the joint part of the critic network. | `(256, 128)` |
| `critic_network` | `Optional[Network]` | Optional custom critic network to use. | `None` |
| `actor_learning_rate` | `float` | Actor network learning rate. | `0.0003` |
| `critic_learning_rate` | `float` | Critic network learning rate. | `0.0003` |
| `alpha_learning_rate` | `float` | Alpha (entropy regularization) learning rate. | `0.0003` |
| `gamma` | `float` | Discount factor for future rewards. | `0.99` |
| `target_update_tau` | `float` | Factor for soft update of the target networks. | `0.005` |
| `target_update_period` | `int` | Period for soft update of the target networks. | `1` |
| `reward_scale_factor` | `float` | Multiplicative scale for the reward. | `1.0` |
| `gradient_clipping` | `Optional[float]` | Norm length to clip gradients. | `None` |
| `debug_summaries` | `bool` | Whether to emit debug summaries. | `False` |
| `summarize_grads_and_vars` | `bool` | Whether to summarize gradients and variables. | `False` |
| `train_step_counter` | `Optional[Variable]` | An optional counter to increment every time the train op is run. Defaults to the global_step. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `TFAgent` | A `TFAgent` instance configured as a SAC agent. |
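The target-update and discount hyperparameters can be illustrated in isolation. The following pure-Python sketch (not part of `smart_control`) shows the soft (Polyak) update that `target_update_tau` controls, applied every `target_update_period` train steps, and the discounting that `gamma` applies to future rewards:

```python
def soft_update(target_weights, online_weights, tau=0.005):
    """Soft (Polyak) update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for w, t in zip(online_weights, target_weights)]

def discounted_return(rewards, gamma=0.99):
    """Sum of gamma^t * r_t over a trajectory of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Exaggerated tau for illustration: each update moves the target
# halfway toward the online weights.
target = soft_update([1.0, 1.0], [0.0, 2.0], tau=0.5)
# target is now [0.5, 1.5]

ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
# ret is 1 + 0.5 + 0.25 = 1.75
```

With the default `tau = 0.005` the target networks track the online networks slowly, which stabilizes the critic's bootstrapped targets.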

Networks#

smart_control.reinforcement_learning.agents.networks.sac_networks #

Network architectures for SAC agent.

This module provides functions to create actor and critic networks for SAC agents.

create_fc_network #

```python
create_fc_network(layer_units: Sequence[int]) -> tf.keras.Model
```

Creates a fully connected network.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `layer_units` | `Sequence[int]` | A sequence of layer units. | required |

Returns:

| Type | Description |
| --- | --- |
| `Model` | A sequential model of dense layers. |
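The docstring leaves the layer details open; a plausible sketch of such a factory, assuming ReLU activations (the actual activations and initializers in `sac_networks` may differ), is:

```python
import tensorflow as tf

def create_fc_network(layer_units):
    # One Dense layer per entry in layer_units; ReLU is an assumption here.
    return tf.keras.Sequential(
        [tf.keras.layers.Dense(units, activation="relu") for units in layer_units]
    )

net = create_fc_network([256, 128])
out = net(tf.zeros([4, 32]))  # batch of 4 observations with 32 features
# out has shape (4, 128): the last entry in layer_units
```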

create_identity_layer #

```python
create_identity_layer() -> tf.keras.layers.Layer
```

Creates an identity layer.

Returns:

| Type | Description |
| --- | --- |
| `Layer` | A `Lambda` layer that returns its input. |
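Based on the return description, the function amounts to something like the following sketch; such a layer is handy as a no-op stand-in when one of the fc-layer lists is empty:

```python
import tensorflow as tf

def create_identity_layer():
    # A Lambda layer that passes its input through unchanged.
    return tf.keras.layers.Lambda(lambda x: x)

layer = create_identity_layer()
x = tf.constant([[1.0, 2.0]])
y = layer(x)
# y equals x
```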

create_sequential_actor_network #

```python
create_sequential_actor_network(
    actor_fc_layers: Sequence[int], action_tensor_spec: NestedTensorSpec
) -> sequential.Sequential
```

Create a sequential actor network for SAC.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `actor_fc_layers` | `Sequence[int]` | Units for actor network fully connected layers. | required |
| `action_tensor_spec` | `NestedTensorSpec` | The action tensor spec. | required |

Returns:

| Type | Description |
| --- | --- |
| `Sequential` | A sequential actor network. |
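The returned TF-Agents `sequential.Sequential` ends in a stochastic projection derived from `action_tensor_spec`; a common SAC choice is a tanh-squashed Normal. As a rough, self-contained sketch of the same idea in plain Keras (`obs_dim`, `num_actions`, and the ReLU trunk are illustrative assumptions, not the library's actual architecture):

```python
import tensorflow as tf

def create_actor_network(obs_dim, actor_fc_layers, num_actions):
    # Simplified SAC actor sketch: a Dense trunk followed by mean and
    # log-std heads; actions are sampled from tanh(Normal(mean, std)).
    obs = tf.keras.Input(shape=(obs_dim,))
    h = obs
    for units in actor_fc_layers:
        h = tf.keras.layers.Dense(units, activation="relu")(h)
    mean = tf.keras.layers.Dense(num_actions)(h)
    log_std = tf.keras.layers.Dense(num_actions)(h)
    return tf.keras.Model(obs, [mean, log_std])

actor = create_actor_network(obs_dim=32, actor_fc_layers=(256, 256), num_actions=4)
mean, log_std = actor(tf.zeros([1, 32]))
# Sample an action and squash it into (-1, 1) with tanh.
action = tf.tanh(mean + tf.exp(log_std) * tf.random.normal(tf.shape(mean)))
```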

create_sequential_critic_network #

```python
create_sequential_critic_network(
    obs_fc_layer_units: Sequence[int],
    action_fc_layer_units: Sequence[int],
    joint_fc_layer_units: Sequence[int],
) -> sequential.Sequential
```

Create a sequential critic network for SAC.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `obs_fc_layer_units` | `Sequence[int]` | Units for observation network layers. | required |
| `action_fc_layer_units` | `Sequence[int]` | Units for action network layers. | required |
| `joint_fc_layer_units` | `Sequence[int]` | Units for joint network layers. | required |

Returns:

| Type | Description |
| --- | --- |
| `Sequential` | A sequential critic network. |
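The three layer lists correspond to a two-tower critic: observations and actions are encoded separately, concatenated, and passed through joint layers to a scalar Q-value. A self-contained Keras sketch of that shape (`obs_dim` and `act_dim` are illustrative assumptions; the real function returns a TF-Agents `sequential.Sequential`):

```python
import tensorflow as tf

def create_critic_network(obs_dim, act_dim,
                          obs_fc_layer_units=(256, 128),
                          action_fc_layer_units=(256, 128),
                          joint_fc_layer_units=(256, 128)):
    # Encode observations and actions in separate towers, then merge.
    obs = tf.keras.Input(shape=(obs_dim,))
    act = tf.keras.Input(shape=(act_dim,))
    h_obs, h_act = obs, act
    for units in obs_fc_layer_units:
        h_obs = tf.keras.layers.Dense(units, activation="relu")(h_obs)
    for units in action_fc_layer_units:
        h_act = tf.keras.layers.Dense(units, activation="relu")(h_act)
    h = tf.keras.layers.Concatenate()([h_obs, h_act])
    for units in joint_fc_layer_units:
        h = tf.keras.layers.Dense(units, activation="relu")(h)
    q_value = tf.keras.layers.Dense(1)(h)  # scalar Q(s, a)
    return tf.keras.Model([obs, act], q_value)

critic = create_critic_network(obs_dim=32, act_dim=4)
q = critic([tf.zeros([8, 32]), tf.zeros([8, 4])])
# q has shape (8, 1): one Q-value per (observation, action) pair
```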