Reward Functions
smart_control.reward.base_setpoint_energy_carbon_reward
Base Reward Function for Smart Buildings.
BaseSetpointEnergyCarbonRewardFunction
BaseSetpointEnergyCarbonRewardFunction(
max_productivity_personhour_usd: float,
productivity_midpoint_delta: float,
productivity_decay_stiffness: float,
)
Bases: BaseRewardFunction
Reward function based on productivity, energy cost and carbon emission.
Attributes:
Name | Type | Description |
---|---|---|
max_productivity_personhour_usd | float | max productivity for average occupancy in $ |
productivity_midpoint_delta | float | temperature difference from setpoint at which productivity falls to half |
productivity_decay_stiffness | float | midpoint slope of the decay curve |
compute_reward
Returns the real-valued reward for the current state of the building.
smart_control.reward.electricity_energy_cost
Energy carbon and cost model for electricity.
ElectricityEnergyCost
ElectricityEnergyCost(
weekday_energy_prices: Sequence[float] = WEEKDAY_PRICE_BY_HOUR,
weekend_energy_prices: Sequence[float] = WEEKEND_PRICE_BY_HOUR,
carbon_emission_rates: Sequence[float] = CARBON_EMISSION_BY_HOUR,
)
Bases: BaseEnergyCost
Energy cost and carbon emission model for reward function.
carbon
Returns the carbon produced in this time step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start_time | Timestamp | start of window | required |
end_time | Timestamp | end of window | required |
energy_rate | float | power applied in W; a negative energy_rate means energy is drawn away (i.e., cooling), a positive energy_rate means heating. | required |
Returns:
Type | Description |
---|---|
float | carbon emitted [kg] for the energy consumed over the interval. |
cost
Returns the cost of energy from this time step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start_time | Timestamp | start of window | required |
end_time | Timestamp | end of window | required |
energy_rate | float | power applied in W; a negative energy_rate means energy is drawn away (i.e., cooling), a positive energy_rate means heating. | required |
Returns:
Type | Description |
---|---|
float | cost in USD for the energy consumed over the interval. |
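As a rough illustration of the `cost` contract above, the following sketch converts a power level held over a window into kWh and prices it by hour of day. It is not the library's implementation: the function name, the example schedules, the $/kWh price unit, the use of the absolute value for cooling, and the choice to price the whole window at its starting hour are all assumptions made for the example. It uses `datetime` in place of a pandas `Timestamp` to stay self-contained.

```python
from datetime import datetime, timedelta

JOULES_PER_KWH = 3.6e6  # 1 kWh = 1,000 W x 3,600 s

# Hypothetical flat schedules, one entry per hour of day, assumed to be in $/kWh.
EXAMPLE_WEEKDAY_PRICES = [0.10] * 24
EXAMPLE_WEEKEND_PRICES = [0.05] * 24


def sketch_electricity_cost(start_time, end_time, energy_rate):
    """Approximate cost in USD of holding `energy_rate` W over the window."""
    seconds = (end_time - start_time).total_seconds()
    # Assumption: cooling (negative rate) still consumes electricity,
    # so the magnitude of the rate is what gets billed.
    energy_kwh = abs(energy_rate) * seconds / JOULES_PER_KWH
    # Assumption: the whole window is priced at the hour in which it starts.
    is_weekend = start_time.weekday() >= 5
    prices = EXAMPLE_WEEKEND_PRICES if is_weekend else EXAMPLE_WEEKDAY_PRICES
    return energy_kwh * prices[start_time.hour]


monday_9am = datetime(2024, 1, 1, 9, 0)  # 2024-01-01 is a Monday
cost = sketch_electricity_cost(
    monday_9am, monday_9am + timedelta(hours=1), -2000.0
)
print(round(cost, 4))
```

Here 2,000 W of cooling for one hour is 2 kWh, which the assumed weekday price turns into a cost of about $0.20.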
smart_control.reward.natural_gas_energy_cost
Energy carbon and cost model for natural gas.
NaturalGasEnergyCost
Bases: BaseEnergyCost
Energy cost and carbon emission model for reward function.
Attributes:
Name | Type | Description |
---|---|---|
gas_price_per_month | | Cost/energy consumed [$/1000 cubic feet] by month. |
carbon
Returns the carbon produced in this time step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start_time | Timestamp | start of window | required |
end_time | Timestamp | end of window | required |
energy_rate | float | thermal power in W applied during the window. | required |
Returns:
Type | Description |
---|---|
float | carbon mass emitted in kg. |
Raises ValueError if the energy is negative since natural gas can only be applied for heating.
cost
Returns the cost of energy from this time step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start_time | Timestamp | start of window | required |
end_time | Timestamp | end of window | required |
energy_rate | float | thermal power in W applied during the window. | required |
Returns:
Type | Description |
---|---|
float | cost of energy consumed in window in USD. |
Raises ValueError if the energy is negative since natural gas can only be applied for heating.
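The behaviour described above (thermal power in W, a price in $/1000 cubic feet, and a `ValueError` for negative rates) can be sketched as follows. This is only an illustration, not the library's code: the function name and signature are hypothetical, and the heat content of natural gas per cubic foot is an assumed round figure that varies in practice.

```python
JOULES_PER_BTU = 1055.06
# Assumption: approximate heat content of natural gas; varies by supplier.
BTU_PER_CUBIC_FOOT = 1037.0


def sketch_gas_cost(energy_rate, seconds, price_per_1000_cubic_feet):
    """Approximate cost in USD of `energy_rate` W of gas heating for `seconds`."""
    if energy_rate < 0:
        # Natural gas can only heat, so a negative rate is rejected.
        raise ValueError("Natural gas can only be applied for heating.")
    energy_btu = energy_rate * seconds / JOULES_PER_BTU
    cubic_feet = energy_btu / BTU_PER_CUBIC_FOOT
    return cubic_feet / 1000.0 * price_per_1000_cubic_feet


# 10 kW of heating for one hour at a hypothetical price of $10 per 1000 cubic feet.
print(sketch_gas_cost(10_000.0, 3600.0, 10.0))
```

A per-month price table such as `gas_price_per_month` would supply the last argument, selected by the month of the window's start time.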
smart_control.reward.setpoint_energy_carbon_reward
Reward Function for Smart Buildings.
The reward function provides a feedback signal to the reinforcement learning agent that indicates the benefit of the action taken. During training, the agent learns an action policy to maximize the cumulative, or long-term reward.
For this pilot there are three principal factors that contribute to the reward function:
* Setpoint: Maintaining the zone temperatures within heating and cooling setpoints results in a positive reward, and any temperature outside of setpoints may also result in a negative reward (i.e., penalty).
* Cost: The cost of electricity and natural gas is a negative reward (cost). Then by minimizing negative rewards/maximizing positive reward, the agent will reduce overall energy cost. To compute the cost, both energy consumption and the energy cost schedules are required.
* Carbon: By receiving negative reward for consuming natural gas, the agent will learn to shift energy use to renewable sources. This factor requires an energy-to-carbon conversion formula/table.
The three factors can be scaled and combined into a single reward function:
r = s(setpoint) - u x f(cost) - w x g(carbon)
where:
* r is the incremental reward at this step,
* s(setpoint) is the reward for maintaining setpoint,
* f(cost) is the cost of consuming electrical and natural gas energy,
* g(carbon) is the cost of emitting carbon, and
* u, w are weighting factors for cost and carbon depending on the policy.
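The weighted combination can be written out directly. The sketch below only restates the formula; the function name and the numeric inputs are hypothetical, and it assumes all three terms are already expressed in the same currency unit.

```python
def sketch_reward(setpoint_reward, energy_cost, carbon_cost, u, w):
    """r = s(setpoint) - u x f(cost) - w x g(carbon)."""
    return setpoint_reward - u * energy_cost - w * carbon_cost


# Hypothetical values: $100 of productivity, $20 of energy, $5 of carbon cost.
r = sketch_reward(100.0, 20.0, 5.0, u=1.0, w=2.0)
print(r)  # 100 - 1*20 - 2*5 = 70.0
```

Raising `w` relative to `u` makes the agent pay more attention to carbon than to energy cost for the same consumption.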
The fundamental metric unit of energy is the joule (J), and the rate at which energy is applied (energy rate) is power, measured in J/s or watts (W). In practice, however, energy is expressed in a variety of traditional units. For example, electrical energy is measured in kilowatt-hours (kWh), where 1 kWh is 1,000 W applied for one hour, while natural gas energy is measured in British thermal units (Btu) or cubic feet. Unit conversions are therefore necessary.
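A minimal sketch of these conversions, assuming a constant power level over the interval. The helper names are made up for illustration; the constants are the standard definitions of the kWh and the (International Table) Btu.

```python
JOULES_PER_KWH = 3.6e6    # 1,000 W x 3,600 s
JOULES_PER_BTU = 1055.06  # approximate


def watts_to_kwh(power_w, seconds):
    """Energy in kWh delivered by `power_w` watts held for `seconds`."""
    return power_w * seconds / JOULES_PER_KWH


def watts_to_btu(power_w, seconds):
    """Energy in Btu delivered by `power_w` watts held for `seconds`."""
    return power_w * seconds / JOULES_PER_BTU


print(watts_to_kwh(1000.0, 3600.0))  # 1,000 W for one hour is 1 kWh
print(round(watts_to_btu(1000.0, 3600.0)))
```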
Notes on Setpoint Reward: Setpoint reward is the incremental reward (beneficial feedback) for maintaining comfort conditions inside the zone.
We postulate that productivity is adversely affected when the zone air temperature is outside the deadband. Just outside the deadband, individual productivity decreases only slightly; it then decreases smoothly and monotonically as the zone air temperature moves farther from the deadband.
Cumulative productivity is the maximum potential reward, and is parameterized by how many persons occupy the zones, and the average hourly per-person productivity.
Two other parameters are added to describe how productivity decays outside the deadband.
* productivity_midpoint_delta_temp: The difference in temperature beyond the setpoint, at which productivity decays to 50%.
* decay_stiffness: Parameter that controls the slope of the decay, the higher the value, the steeper the slope.
The function for setpoint reward is based on a piecewise logistic function. Maximum/full productivity occurs when the zone is occupied and inside its deadband. Productivity decays smoothly on a logistic curve outside the deadband.
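One way to picture the decay is the sketch below, which returns the fraction of full productivity for a given zone temperature. It is an assumed functional form chosen to match the description (full productivity inside the deadband, 50% at the midpoint delta, steeper with higher stiffness); the library's exact formula may differ, and the function and argument names are hypothetical.

```python
import math


def sketch_productivity_fraction(
    zone_temp, heating_setpoint, cooling_setpoint, midpoint_delta, decay_stiffness
):
    """Fraction of full productivity, in (0, 1], for an occupied zone."""
    if heating_setpoint <= zone_temp <= cooling_setpoint:
        return 1.0  # inside the deadband: full productivity
    # Distance beyond the nearest setpoint.
    if zone_temp < heating_setpoint:
        delta = heating_setpoint - zone_temp
    else:
        delta = zone_temp - cooling_setpoint
    # Logistic decay: 0.5 at delta == midpoint_delta, steeper with stiffness.
    return 1.0 / (1.0 + math.exp(decay_stiffness * (delta - midpoint_delta)))


# Hypothetical deadband of 20 to 24 with a 2-degree midpoint delta.
for temp in (22.0, 25.0, 26.0, 28.0):
    print(temp, round(sketch_productivity_fraction(temp, 20.0, 24.0, 2.0, 1.5), 3))
```

Multiplying this fraction by the occupancy and the per-person hourly productivity would give the setpoint reward for the zone. Note that this simple form is slightly below 1 immediately outside the deadband, so it has a small step at the setpoint.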
SetpointEnergyCarbonRewardFunction
SetpointEnergyCarbonRewardFunction(
max_productivity_personhour_usd: float,
productivity_midpoint_delta: float,
productivity_decay_stiffness: float,
electricity_energy_cost: BaseEnergyCost,
natural_gas_energy_cost: BaseEnergyCost,
energy_cost_weight: float,
carbon_cost_weight: float,
carbon_cost_factor: float,
reward_normalizer_shift: float = 0.0,
reward_normalizer_scale: float = 1.0,
)
Bases: BaseSetpointEnergyCarbonRewardFunction
Reward function based on productivity, energy cost and carbon emission.
Attributes:
Name | Type | Description |
---|---|---|
max_productivity_personhour_usd | float | average occupant hourly productivity in $ |
productivity_midpoint_delta | float | temperature difference from setpoint at which productivity falls to half |
productivity_decay_stiffness | float | midpoint slope of the decay curve |
electricity_energy_cost | BaseEnergyCost | cost and carbon model for electricity |
natural_gas_energy_cost | BaseEnergyCost | cost and carbon model for natural gas |
energy_cost_weight | float | u-coefficient described above |
carbon_cost_weight | float | w-coefficient described above |
carbon_cost_factor | float | cost value in $ per kg carbon emitted |
reward_normalizer_shift | float | shift the reward by subtracting this value from it |
reward_normalizer_scale | float | divide the shifted reward by this value |
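Read together, the two normalizer attributes describe a shift followed by a scale. A minimal sketch of that reading, with a hypothetical function name and example values:

```python
def sketch_normalize(reward, shift=0.0, scale=1.0):
    """Subtract the shift from the reward, then divide by the scale."""
    return (reward - shift) / scale


print(sketch_normalize(70.0))              # defaults leave the reward unchanged
print(sketch_normalize(70.0, 20.0, 50.0))  # (70 - 20) / 50 = 1.0
```

With the default values of 0.0 and 1.0 shown in the constructor signature, the reward passes through unchanged.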
compute_reward
Returns the real-valued reward for the current state of the building.
smart_control.reward.setpoint_energy_carbon_regret
Reward (Regret) Function for Smart Buildings.
The reward function provides a feedback signal to the reinforcement learning agent that indicates the benefit of the action taken. During training, the agent learns an action policy to maximize the cumulative, or long-term reward.
For this pilot there are three principal factors that contribute to the reward function:
* Setpoint: Maintaining the zone temperatures within heating and cooling setpoints results in a positive reward, and any temperature outside of setpoints may also result in a negative reward (i.e., penalty).
* Cost: The cost of electricity and natural gas is a negative reward (cost). Then by minimizing negative rewards/maximizing positive reward, the agent will reduce overall energy cost. To compute the cost, both energy consumption and the energy cost schedules are required.
* Carbon: By receiving negative reward for consuming natural gas, the agent will learn to shift energy use to renewable sources. This factor requires an energy-to-carbon conversion formula/table.
The three factors can be scaled and combined into a single regret function:
r_i = [u x (s(setpoint) - s_max)/s_max - v x f(cost)/f_max - w x g(carbon)/g_max] / [u + v + w]
with r_i in [-1, 0], where:
* r_i is the incremental reward at step i,
* s(setpoint) is the reward for maintaining temperature inside setpoint,
* s_max = occupancy x productivity is the maximum possible reward,
* f(cost) is the cost of consuming electrical and natural gas energy,
* f_max is the maximum momentary cost, which occurs at max energy use,
* g(carbon) is the cost of emitting carbon, and g_max is its maximum, and
* u, v, w are weighting factors for the policy.
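The normalized regret can be checked numerically with a direct transcription of the formula. The function name and inputs below are hypothetical, and the sketch assumes 0 <= s <= s_max, 0 <= f <= f_max and 0 <= g <= g_max, which is what keeps the result inside [-1, 0].

```python
def sketch_regret(s, s_max, f, f_max, g, g_max, u, v, w):
    """Weighted, normalized regret; 0 is the best case and -1 the worst."""
    productivity_term = u * (s - s_max) / s_max
    cost_term = v * f / f_max
    carbon_term = w * g / g_max
    return (productivity_term - cost_term - carbon_term) / (u + v + w)


# Best case: full productivity, no energy cost, no carbon.
print(sketch_regret(100.0, 100.0, 0.0, 50.0, 0.0, 10.0, 1.0, 1.0, 1.0))
# Worst case: no productivity, maximum cost and maximum carbon.
print(sketch_regret(0.0, 100.0, 50.0, 50.0, 10.0, 10.0, 1.0, 1.0, 1.0))
```

Dividing by u + v + w means only the ratios between the weights matter, not their absolute size.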
The fundamental metric unit of energy is the joule (J), and the rate at which energy is applied (energy rate) is power, measured in J/s or watts (W). In practice, however, energy is expressed in a variety of traditional units. For example, electrical energy is measured in kilowatt-hours (kWh), where 1 kWh is 1,000 W applied for one hour, while natural gas energy is measured in British thermal units (Btu) or cubic feet. Unit conversions are therefore necessary.
Notes on Setpoint Reward: Setpoint reward is the incremental reward (beneficial feedback) for maintaining comfort conditions inside the zone.
We postulate that productivity is adversely affected when the zone air temperature is outside the deadband. Just outside the deadband, individual productivity decreases only slightly; it then decreases smoothly and monotonically as the zone air temperature moves farther from the deadband.
Cumulative productivity is the maximum potential reward, and is parameterized by how many persons occupy the zones, and the average hourly per-person productivity.
Two other parameters are added to describe how productivity decays outside the deadband.
* productivity_midpoint_delta_temp: The difference in temperature beyond the setpoint, at which productivity decays to 50%.
* decay_stiffness: Parameter that controls the slope of the decay, the higher the value, the steeper the slope.
The function for setpoint reward is based on a piecewise logistic function. Maximum/full productivity occurs when the zone is occupied and inside its deadband. Productivity decays smoothly on a logistic curve outside the deadband.
SetpointEnergyCarbonRegretFunction
SetpointEnergyCarbonRegretFunction(
max_productivity_personhour_usd: float,
min_productivity_personhour_usd: float,
max_electricity_rate: float,
max_natural_gas_rate: float,
productivity_midpoint_delta: float,
productivity_decay_stiffness: float,
electricity_energy_cost: BaseEnergyCost,
natural_gas_energy_cost: BaseEnergyCost,
productivity_weight: float,
energy_cost_weight: float,
carbon_emission_weight: float,
)
Bases: BaseSetpointEnergyCarbonRewardFunction
Reward function based on productivity, energy cost and carbon emission.
Attributes:
Name | Type | Description |
---|---|---|
max_productivity_personhour_usd | float | max occupant hourly productivity in $ |
min_productivity_personhour_usd | float | min occupant hourly productivity in $ |
productivity_midpoint_delta | float | temperature difference from setpoint at which productivity falls to half |
productivity_decay_stiffness | float | midpoint slope of the decay curve |
electricity_energy_cost | BaseEnergyCost | cost and carbon model for electricity |
natural_gas_energy_cost | BaseEnergyCost | cost and carbon model for natural gas |
energy_cost_weight | float | v-coefficient described above |
carbon_emission_weight | float | w-coefficient described above |
carbon_cost_factor | | cost value in $ per kg carbon emitted |
compute_reward
Returns the real-valued reward for the current state of the building.