# Agents and Networks

## Agents

### smart_control.reinforcement_learning.agents.sac_agent

Reinforcement learning: Soft Actor-Critic (SAC) agent.

#### create_sac_agent
```python
create_sac_agent(
    time_step_spec: TimeStep,
    action_spec: NestedTensorSpec,
    actor_fc_layers: Sequence[int] = (256, 256),
    actor_network: Optional[Network] = None,
    critic_obs_fc_layers: Sequence[int] = (256, 128),
    critic_action_fc_layers: Sequence[int] = (256, 128),
    critic_joint_fc_layers: Sequence[int] = (256, 128),
    critic_network: Optional[Network] = None,
    actor_learning_rate: float = 0.0003,
    critic_learning_rate: float = 0.0003,
    alpha_learning_rate: float = 0.0003,
    gamma: float = 0.99,
    target_update_tau: float = 0.005,
    target_update_period: int = 1,
    reward_scale_factor: float = 1.0,
    gradient_clipping: Optional[float] = None,
    debug_summaries: bool = False,
    summarize_grads_and_vars: bool = False,
    train_step_counter: Optional[Variable] = None,
) -> tf_agent.TFAgent
```
Creates a SAC Agent.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `time_step_spec` | `TimeStep` | A `TimeStep` spec of the expected time steps. | *required* |
| `action_spec` | `NestedTensorSpec` | A nest of `BoundedTensorSpec` representing the actions. | *required* |
| `actor_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the actor network. | `(256, 256)` |
| `actor_network` | `Optional[Network]` | Optional custom actor network to use. | `None` |
| `critic_obs_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the critic observation network. | `(256, 128)` |
| `critic_action_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the critic action network. | `(256, 128)` |
| `critic_joint_fc_layers` | `Sequence[int]` | Iterable of fully connected layer units for the joint part of the critic network. | `(256, 128)` |
| `critic_network` | `Optional[Network]` | Optional custom critic network to use. | `None` |
| `actor_learning_rate` | `float` | Actor network learning rate. | `0.0003` |
| `critic_learning_rate` | `float` | Critic network learning rate. | `0.0003` |
| `alpha_learning_rate` | `float` | Learning rate for alpha, the entropy regularization coefficient. | `0.0003` |
| `gamma` | `float` | Discount factor for future rewards. | `0.99` |
| `target_update_tau` | `float` | Factor for the soft update of the target networks. | `0.005` |
| `target_update_period` | `int` | Period, in train steps, between soft updates of the target networks. | `1` |
| `reward_scale_factor` | `float` | Multiplicative scale for the reward. | `1.0` |
| `gradient_clipping` | `Optional[float]` | Norm length to clip gradients. | `None` |
| `debug_summaries` | `bool` | Whether to emit debug summaries. | `False` |
| `summarize_grads_and_vars` | `bool` | Whether to summarize gradients and variables. | `False` |
| `train_step_counter` | `Optional[Variable]` | An optional counter to increment every time the train op is run. Defaults to the `global_step`. | `None` |
Returns:

| Type | Description |
| --- | --- |
| `TFAgent` | A `TFAgent` instance implementing the SAC agent. |
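A minimal usage sketch follows. The environment is an assumption chosen purely for illustration; any TF-Agents environment with a bounded continuous action spec works the same way:

```python
import tensorflow as tf
from tf_agents.environments import suite_gym, tf_py_environment

from smart_control.reinforcement_learning.agents import sac_agent

# Illustrative environment: Pendulum has a continuous, bounded action space,
# which SAC requires. Any comparable environment can be substituted.
env = tf_py_environment.TFPyEnvironment(suite_gym.load('Pendulum-v1'))

agent = sac_agent.create_sac_agent(
    time_step_spec=env.time_step_spec(),
    action_spec=env.action_spec(),
    actor_fc_layers=(256, 256),
    gamma=0.99,
    train_step_counter=tf.Variable(0, dtype=tf.int64),
)
agent.initialize()
```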
## Networks

### smart_control.reinforcement_learning.agents.networks.sac_networks

Network architectures for the SAC agent.

This module provides functions to create actor and critic networks for SAC agents.

#### create_fc_network
Creates a fully connected network.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `layer_units` | `Sequence[int]` | A sequence of layer units. | *required* |
Returns:

| Type | Description |
| --- | --- |
| `Model` | A sequential model of dense layers. |
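For intuition, a functionally similar network can be sketched as below. The activation function is an assumption; the documentation only specifies a stack of dense layers:

```python
import tensorflow as tf

def fc_network_sketch(layer_units):
    """Roughly equivalent stack: one Dense layer per entry in layer_units."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation='relu')  # activation assumed
        for units in layer_units
    ])

net = fc_network_sketch((256, 128))  # two dense layers: 256 units, then 128
```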
#### create_identity_layer
Creates an identity layer.
Returns:

| Type | Description |
| --- | --- |
| `Layer` | A `Lambda` layer that returns its input. |
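Given the description, the returned layer behaves like this one-liner. Presumably it serves as a placeholder branch (for example, when an observation or action sub-network needs no preprocessing):

```python
import tensorflow as tf

# A Lambda layer that passes its input through unchanged.
identity = tf.keras.layers.Lambda(lambda x: x)
```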
#### create_sequential_actor_network

```python
create_sequential_actor_network(
    actor_fc_layers: Sequence[int], action_tensor_spec: NestedTensorSpec
) -> sequential.Sequential
```

Creates a sequential actor network for SAC.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `actor_fc_layers` | `Sequence[int]` | Units for actor network fully connected layers. | *required* |
| `action_tensor_spec` | `NestedTensorSpec` | The action tensor spec. | *required* |
Returns:

| Type | Description |
| --- | --- |
| `Sequential` | A sequential actor network. |
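A usage sketch, reusing the `env` wrapper from the first example; the import path follows the module name above:

```python
from smart_control.reinforcement_learning.agents.networks import sac_networks

actor_net = sac_networks.create_sequential_actor_network(
    actor_fc_layers=(256, 256),
    action_tensor_spec=env.action_spec(),
)
```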
#### create_sequential_critic_network

```python
create_sequential_critic_network(
    obs_fc_layer_units: Sequence[int],
    action_fc_layer_units: Sequence[int],
    joint_fc_layer_units: Sequence[int],
) -> sequential.Sequential
```

Creates a sequential critic network for SAC.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `obs_fc_layer_units` | `Sequence[int]` | Units for observation network layers. | *required* |
| `action_fc_layer_units` | `Sequence[int]` | Units for action network layers. | *required* |
| `joint_fc_layer_units` | `Sequence[int]` | Units for joint network layers. | *required* |
Returns:

| Type | Description |
| --- | --- |
| `Sequential` | A sequential critic network. |
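A matching sketch for the critic, continuing the examples above. Custom networks built this way can then be passed to `create_sac_agent` through its `actor_network` and `critic_network` parameters:

```python
critic_net = sac_networks.create_sequential_critic_network(
    obs_fc_layer_units=(256, 128),
    action_fc_layer_units=(256, 128),
    joint_fc_layer_units=(256, 128),
)

# Wire the custom networks into the agent factory from the first example.
agent = sac_agent.create_sac_agent(
    time_step_spec=env.time_step_spec(),
    action_spec=env.action_spec(),
    actor_network=actor_net,
    critic_network=critic_net,
)
agent.initialize()
```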