Reward Functions #

smart_control.reward.base_setpoint_energy_carbon_reward #

Base Reward Function for Smart Buildings.

BaseSetpointEnergyCarbonRewardFunction #

BaseSetpointEnergyCarbonRewardFunction(
    max_productivity_personhour_usd: float,
    productivity_midpoint_delta: float,
    productivity_decay_stiffness: float,
)

Bases: BaseRewardFunction

Reward function based on productivity, energy cost and carbon emission.

Attributes:

Name	Type	Description
`max_productivity_personhour_usd`		max productivity for average occupancy in $
`productivity_midpoint_delta`		temperature difference from the setpoint at which productivity falls to half
`productivity_decay_stiffness`		midpoint slope of the decay curve

compute_reward #

compute_reward(
    reward_info: RewardInfo,
) -> smart_control_reward_pb2.RewardResponse

Returns the real-valued reward for the current state of the building.

smart_control.reward.electricity_energy_cost #

Energy carbon and cost model for electricity.

ElectricityEnergyCost #

ElectricityEnergyCost(
    weekday_energy_prices: Sequence[float] = WEEKDAY_PRICE_BY_HOUR,
    weekend_energy_prices: Sequence[float] = WEEKEND_PRICE_BY_HOUR,
    carbon_emission_rates: Sequence[float] = CARBON_EMISSION_BY_HOUR,
)

Bases: BaseEnergyCost

Energy cost and carbon emission model for reward function.

carbon #

carbon(start_time: Timestamp, end_time: Timestamp, energy_rate: float) -> float

Returns the carbon produced in this time step.

Parameters:

Name	Type	Description	Default
`start_time`	`Timestamp`	start of window	required
`end_time`	`Timestamp`	end of window	required
`energy_rate`	`float`	power applied in W; a negative value means energy is drawn away (i.e., cooling), and a positive value means heating.	required

Returns:

Type	Description
`float`	carbon emitted [kg] for the energy consumed over the interval.

cost #

cost(start_time: Timestamp, end_time: Timestamp, energy_rate: float) -> float

Returns the cost of energy from this time step.

Parameters:

Name	Type	Description	Default
`start_time`	`Timestamp`	start of window	required
`end_time`	`Timestamp`	end of window	required
`energy_rate`	`float`	power applied in W; a negative value means energy is drawn away (i.e., cooling), and a positive value means heating.	required

Returns:

Type	Description
`float`	cost in USD for the energy consumed over the interval.
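
The relationship between `energy_rate`, the time window, and an hourly price schedule can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's implementation: the function name `sketch_electricity_cost` and the price table are hypothetical, prices are assumed to be in USD per kWh, the window is assumed to fall within a single hour, plain `datetime` objects stand in for `Timestamp`, and the magnitude of `energy_rate` is used on the assumption that both heating and cooling consume electricity.

```python
from datetime import datetime

# Hypothetical hourly price table in USD per kWh (24 entries, one per hour).
# The values are made up for illustration only.
HYPOTHETICAL_PRICE_BY_HOUR = [0.10] * 7 + [0.20] * 12 + [0.10] * 5

JOULES_PER_KWH = 3.6e6  # 1 kWh = 1,000 W x 3,600 s


def sketch_electricity_cost(start_time, end_time, energy_rate_w,
                            price_by_hour=HYPOTHETICAL_PRICE_BY_HOUR):
    """Returns a cost in USD, assuming the window lies within one hour."""
    seconds = (end_time - start_time).total_seconds()
    # Assumption: cooling (negative rate) also draws electricity, so use magnitude.
    energy_kwh = abs(energy_rate_w) * seconds / JOULES_PER_KWH
    return energy_kwh * price_by_hour[start_time.hour]


# 12 kW of cooling for 5 minutes is 1 kWh, priced at the 09:00 rate.
cost_usd = sketch_electricity_cost(
    datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 5), -12000.0
)
```

A carbon estimate would follow the same pattern, with an hourly emission-rate table in place of the price table.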

smart_control.reward.natural_gas_energy_cost #

Energy carbon and cost model for natural gas.

NaturalGasEnergyCost #

NaturalGasEnergyCost(
    gas_price_by_month: Sequence[float] = GAS_PRICE_BY_MONTH_SOURCE,
)

Bases: BaseEnergyCost

Energy cost and carbon emission model for reward function.

Attributes:

Name	Type	Description
`gas_price_per_month`		Cost/energy consumed [$/1000 cubic feet] by month.

carbon #

carbon(start_time: Timestamp, end_time: Timestamp, energy_rate: float) -> float

Returns the carbon produced in this time step.

Parameters:

Name	Type	Description	Default
`start_time`	`Timestamp`	start of window	required
`end_time`	`Timestamp`	end of window	required
`energy_rate`	`float`	thermal power in W applied during the window.	required

Returns:

Type	Description
`float`	mass of carbon emitted in kg.

Raises ValueError if the energy is negative since natural gas can only be applied for heating.

cost #

cost(start_time: Timestamp, end_time: Timestamp, energy_rate: float) -> float

Returns the cost of energy from this time step.

Parameters:

Name	Type	Description	Default
`start_time`	`Timestamp`	start of window	required
`end_time`	`Timestamp`	end of window	required
`energy_rate`	`float`	thermal power in W applied during the window.	required

Returns:

Type	Description
`float`	cost of energy consumed in window in USD.

Raises ValueError if the energy is negative since natural gas can only be applied for heating.
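
As a rough illustration of how a thermal power in W could be turned into a gas cost priced per 1000 cubic feet, consider the sketch below. It is not the library's implementation: the function name `sketch_natural_gas_cost` is hypothetical, the window is passed as a duration in seconds for simplicity, and the conversion constants are approximate (in particular, the heat content of about 1037 Btu per cubic foot is an assumed average and varies by gas supply).

```python
JOULES_PER_BTU = 1055.06     # approximate
BTU_PER_CUBIC_FOOT = 1037.0  # assumed average heat content of natural gas


def sketch_natural_gas_cost(duration_seconds, energy_rate_w,
                            price_per_1000_cubic_feet):
    """Returns a cost in USD for heating at energy_rate_w over the window."""
    if energy_rate_w < 0:
        # Mirrors the documented behavior: gas cannot be used for cooling.
        raise ValueError("Natural gas can only be applied for heating.")
    energy_btu = energy_rate_w * duration_seconds / JOULES_PER_BTU
    cubic_feet = energy_btu / BTU_PER_CUBIC_FOOT
    return cubic_feet / 1000.0 * price_per_1000_cubic_feet


# 50 kW of heating for one hour at a hypothetical price of $10 per 1000 cubic feet.
gas_cost_usd = sketch_natural_gas_cost(3600.0, 50_000.0, 10.0)
```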

smart_control.reward.setpoint_energy_carbon_reward #

Reward Function for Smart Buildings.

The reward function provides a feedback signal to the reinforcement learning agent that indicates the benefit of the action taken. During training, the agent learns an action policy to maximize the cumulative, or long-term reward.

For this pilot there are three principal factors that contribute to the reward function:

* Setpoint: Maintaining the zone temperatures within heating and cooling setpoints results in a positive reward, and any temperature outside of setpoints may also result in a negative reward (i.e., penalty).
* Cost: The cost of electricity and natural gas is a negative reward (cost). Then by minimizing negative rewards/maximizing positive reward, the agent will reduce overall energy cost. To compute the cost, both energy consumption and the energy cost schedules are required.
* Carbon: By receiving negative reward for consuming natural gas, the agent will learn to shift energy use to renewable sources. This factor requires an energy-to-carbon conversion formula/table.

The three factors can be scaled and combined into a single reward function

r = s(setpoint) - u x f(cost) - w x g(carbon)

where:

* r is the incremental reward at this step
* s(setpoint) is the reward for maintaining setpoint
* f(cost) is the cost of consuming electrical and natural gas energy
* g(carbon) is the cost of emitting carbon
* u, w are weighting factors for cost and carbon, depending on the policy.
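
The combination above can be written as a one-line function. This is a minimal sketch of the formula only; the function name `sketch_combined_reward` and the numeric inputs are hypothetical, and the three terms are assumed to be expressed in the same unit (for example USD) before weighting.

```python
def sketch_combined_reward(setpoint_reward, energy_cost, carbon_cost, u, w):
    """Computes r = s(setpoint) - u * f(cost) - w * g(carbon)."""
    return setpoint_reward - u * energy_cost - w * carbon_cost


# Hypothetical step: $10 of productivity, $2 of energy, $1 of carbon cost.
r = sketch_combined_reward(
    setpoint_reward=10.0, energy_cost=2.0, carbon_cost=1.0, u=1.0, w=0.5
)
```

Raising `u` or `w` makes the agent trade comfort for lower energy cost or lower emissions, respectively.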

The fundamental metric unit of energy is the Joule (J), and the rate at which energy is applied over time is power, measured in J/sec or Watts (W). In practice, however, energy is expressed in a variety of traditional units. For example, electrical energy is commonly measured in kilowatt-hours (kWh), i.e., 1,000 W sustained for one hour, whereas natural gas energy is measured in British thermal units (Btu) or cubic feet. Unit conversions are therefore necessary.
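
The conversions involved are simple scalings. The helpers below are a hedged sketch with hypothetical names (`watts_to_kwh`, `watts_to_btu`); the Btu factor is an approximate value, and none of this is taken from the library's code.

```python
JOULES_PER_KWH = 3.6e6    # 1,000 W x 3,600 s
JOULES_PER_BTU = 1055.06  # approximate


def watts_to_kwh(power_w, seconds):
    """Energy in kWh delivered by a constant power over a duration."""
    return power_w * seconds / JOULES_PER_KWH


def watts_to_btu(power_w, seconds):
    """Energy in Btu delivered by a constant power over a duration."""
    return power_w * seconds / JOULES_PER_BTU


one_kwh = watts_to_kwh(1000.0, 3600.0)      # 1,000 W for one hour
btu_per_kwh = watts_to_btu(1000.0, 3600.0)  # the same energy expressed in Btu
```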

Notes on Setpoint Reward: Setpoint reward is the incremental reward (beneficial feedback) for maintaining comfort conditions inside the zone.

We postulate that productivity is adversely affected when the zone air temperature is outside the deadband. Just outside the deadband, individual productivity decreases only slightly, and it continues to decrease smoothly and monotonically the farther the zone air temperature moves from the deadband.

Cumulative productivity is the maximum potential reward, and is parameterized by how many persons occupy the zones, and the average hourly per-person productivity.

Two other parameters are added to describe how productivity decays outside the deadband.

* productivity_midpoint_delta: The difference in temperature beyond the setpoint at which productivity decays to 50%.
* productivity_decay_stiffness: Parameter that controls the slope of the decay; the higher the value, the steeper the slope.

The function for setpoint reward is based on a piecewise logistic function. Maximum/full productivity occurs when the zone is occupied and inside its deadband. Productivity decays smoothly on a logistic curve outside the deadband.
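
One possible form of such a curve is sketched below. It is an assumption-laden illustration, not the library's formula: the function name `sketch_productivity_fraction` and its argument names are hypothetical, temperatures are taken in any consistent unit, and a plain logistic centered on the midpoint is used. With this form the fraction just outside the deadband is slightly below 1 unless the product of stiffness and midpoint is large.

```python
import math


def sketch_productivity_fraction(zone_temp, heating_setpoint, cooling_setpoint,
                                 midpoint_delta, decay_stiffness):
    """Fraction of full productivity (0 to 1) for an occupied zone."""
    if heating_setpoint <= zone_temp <= cooling_setpoint:
        return 1.0  # inside the deadband: full productivity
    # Distance beyond the nearest setpoint.
    if zone_temp < heating_setpoint:
        delta = heating_setpoint - zone_temp
    else:
        delta = zone_temp - cooling_setpoint
    # Logistic decay that reaches 0.5 when delta equals midpoint_delta.
    return 1.0 / (1.0 + math.exp(decay_stiffness * (delta - midpoint_delta)))


# Deadband of 20 to 24 degrees, half productivity 2 degrees beyond a setpoint.
inside = sketch_productivity_fraction(22.0, 20.0, 24.0, 2.0, 4.3)
at_midpoint = sketch_productivity_fraction(26.0, 20.0, 24.0, 2.0, 4.3)
far_outside = sketch_productivity_fraction(28.0, 20.0, 24.0, 2.0, 4.3)
```

Multiplying this fraction by the occupancy and the per-person hourly productivity would give a setpoint reward in the spirit of the description above.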

SetpointEnergyCarbonRewardFunction #

SetpointEnergyCarbonRewardFunction(
    max_productivity_personhour_usd: float,
    productivity_midpoint_delta: float,
    productivity_decay_stiffness: float,
    electricity_energy_cost: BaseEnergyCost,
    natural_gas_energy_cost: BaseEnergyCost,
    energy_cost_weight: float,
    carbon_cost_weight: float,
    carbon_cost_factor: float,
    reward_normalizer_shift: float = 0.0,
    reward_normalizer_scale: float = 1.0,
)

Bases: BaseSetpointEnergyCarbonRewardFunction

Reward function based on productivity, energy cost and carbon emission.

Attributes:

Name	Type	Description
`max_productivity_personhour_usd`		average occupant hourly productivity in $
`productivity_midpoint_delta`		temperature difference from the setpoint at which productivity falls to half
`productivity_decay_stiffness`		midpoint slope of the decay curve
`electricity_energy_cost`		cost and carbon model for electricity
`natural_gas_energy_cost`		cost and carbon model for natural gas
`energy_cost_weight`		u-coefficient described above
`carbon_cost_weight`		w-coefficient described above
`carbon_cost_factor`		cost value in $ per kg carbon emitted
`reward_normalizer_shift`		shift the reward by subtracting this value from it
`reward_normalizer_scale`		divide the shifted reward by this value
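
Read together, the two normalizer attributes describe a shift followed by a scale. A minimal sketch of that reading, with a hypothetical function name and made-up numbers, is:

```python
def sketch_normalize_reward(raw_reward, shift=0.0, scale=1.0):
    """Subtracts the shift from the reward, then divides by the scale."""
    return (raw_reward - shift) / scale


# With the defaults (shift=0.0, scale=1.0) the reward is unchanged.
unchanged = sketch_normalize_reward(7.5)
normalized = sketch_normalize_reward(7.5, shift=2.5, scale=5.0)
```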

compute_reward #

compute_reward(
    reward_info: RewardInfo,
) -> smart_control_reward_pb2.RewardResponse

Returns the real-valued reward for the current state of the building.

smart_control.reward.setpoint_energy_carbon_regret #

Reward (Regret) Function for Smart Buildings.

The reward function provides a feedback signal to the reinforcement learning agent that indicates the benefit of the action taken. During training, the agent learns an action policy to maximize the cumulative, or long-term reward.

For this pilot there are three principal factors that contribute to the reward function:

* Setpoint: Maintaining the zone temperatures within heating and cooling setpoints results in a positive reward, and any temperature outside of setpoints may also result in a negative reward (i.e., penalty).
* Cost: The cost of electricity and natural gas is a negative reward (cost). Then by minimizing negative rewards/maximizing positive reward, the agent will reduce overall energy cost. To compute the cost, both energy consumption and the energy cost schedules are required.
* Carbon: By receiving negative reward for consuming natural gas, the agent will learn to shift energy use to renewable sources. This factor requires an energy-to-carbon conversion formula/table.

The three factors can be scaled and combined into a single regret function

r_i = [u x (s(setpoint) - s_max)/s_max - v x f(cost)/f_max - w x g(carbon)/g_max] / [u + v + w]

r_i lies in [-1, 0]

where:

* r_i is the incremental reward at step i
* s(setpoint) is the reward for maintaining temperature inside the setpoint
* s_max = occupancy x productivity, the maximum possible reward
* f(cost) is the cost of consuming electrical and natural gas energy
* f_max is the maximum momentary cost, which occurs at maximum energy use
* g(carbon) is the cost of emitting carbon
* g_max is the corresponding maximum carbon cost
* u, v, w are weighting factors for the policy.
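
The regret formula and its range can be checked with a short sketch. The function name `sketch_regret` and the numbers are hypothetical; the code only transcribes the formula for r_i given above, with u, v, and w weighting the setpoint, cost, and carbon terms respectively.

```python
def sketch_regret(s, s_max, f, f_max, g, g_max, u, v, w):
    """Computes the normalized regret r_i from the formula above."""
    setpoint_term = u * (s - s_max) / s_max
    cost_term = v * f / f_max
    carbon_term = w * g / g_max
    return (setpoint_term - cost_term - carbon_term) / (u + v + w)


# Best case: full productivity with no energy cost or carbon cost.
best = sketch_regret(s=100.0, s_max=100.0, f=0.0, f_max=5.0,
                     g=0.0, g_max=2.0, u=1.0, v=2.0, w=3.0)

# Worst case: no productivity with maximum energy cost and carbon cost.
worst = sketch_regret(s=0.0, s_max=100.0, f=5.0, f_max=5.0,
                      g=2.0, g_max=2.0, u=1.0, v=2.0, w=3.0)
```

The best case evaluates to 0 and the worst case to -1, which matches the stated range as long as s, f, and g stay within their respective maxima.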

The fundamental metric unit of energy is the Joule (J), and the rate at which energy is applied over time is power, measured in J/sec or Watts (W). In practice, however, energy is expressed in a variety of traditional units. For example, electrical energy is commonly measured in kilowatt-hours (kWh), i.e., 1,000 W sustained for one hour, whereas natural gas energy is measured in British thermal units (Btu) or cubic feet. Unit conversions are therefore necessary.

Notes on Setpoint Reward: Setpoint reward is the incremental reward (beneficial feedback) for maintaining comfort conditions inside the zone.

We postulate that productivity is adversely affected when the zone air temperature is outside the deadband. Just outside the deadband, individual productivity decreases only slightly, and it continues to decrease smoothly and monotonically the farther the zone air temperature moves from the deadband.

Cumulative productivity is the maximum potential reward, and is parameterized by how many persons occupy the zones, and the average hourly per-person productivity.

Two other parameters are added to describe how productivity decays outside the deadband.

* productivity_midpoint_delta: The difference in temperature beyond the setpoint at which productivity decays to 50%.
* productivity_decay_stiffness: Parameter that controls the slope of the decay; the higher the value, the steeper the slope.

The function for setpoint reward is based on a piecewise logistic function. Maximum/full productivity occurs when the zone is occupied and inside its deadband. Productivity decays smoothly on a logistic curve outside the deadband.

SetpointEnergyCarbonRegretFunction #

SetpointEnergyCarbonRegretFunction(
    max_productivity_personhour_usd: float,
    min_productivity_personhour_usd: float,
    max_electricity_rate: float,
    max_natural_gas_rate: float,
    productivity_midpoint_delta: float,
    productivity_decay_stiffness: float,
    electricity_energy_cost: BaseEnergyCost,
    natural_gas_energy_cost: BaseEnergyCost,
    productivity_weight: float,
    energy_cost_weight: float,
    carbon_emission_weight: float,
)

Bases: BaseSetpointEnergyCarbonRewardFunction

Reward function based on productivity, energy cost and carbon emission.

Attributes:

Name	Type	Description
`max_productivity_personhour_usd`		max occupant hourly productivity in $
`min_productivity_personhour_usd`		min occupant hourly productivity in $
`productivity_midpoint_delta`		temperature difference from the setpoint at which productivity falls to half
`productivity_decay_stiffness`		midpoint slope of the decay curve
`electricity_energy_cost`		cost and carbon model for electricity
`natural_gas_energy_cost`		cost and carbon model for natural gas
`energy_cost_weight`		v-coefficient described above
`carbon_emission_weight`		w-coefficient described above
`carbon_cost_factor`		cost value in $ per kg carbon emitted

compute_reward #

compute_reward(
    reward_info: RewardInfo,
) -> smart_control_reward_pb2.RewardResponse

Returns the real-valued reward for the current state of the building.