
Reward Functions#

smart_control.reward.base_setpoint_energy_carbon_reward #

Base Reward Function for Smart Buildings.

BaseSetpointEnergyCarbonRewardFunction #

BaseSetpointEnergyCarbonRewardFunction(
    max_productivity_personhour_usd: float,
    productivity_midpoint_delta: float,
    productivity_decay_stiffness: float,
)

Bases: BaseRewardFunction

Reward function based on productivity, energy cost and carbon emission.

Attributes:

Name Type Description
max_productivity_personhour_usd

max productivity for average occupancy in $

productivity_midpoint_delta

temperature difference from the setpoint at which productivity falls to half of its maximum

productivity_decay_stiffness

midpoint slope of the decay curve

compute_reward #

compute_reward(
    reward_info: RewardInfo,
) -> smart_control_reward_pb2.RewardResponse

Returns the real-valued reward for the current state of the building.

smart_control.reward.electricity_energy_cost #

Energy carbon and cost model for electricity.

ElectricityEnergyCost #

ElectricityEnergyCost(
    weekday_energy_prices: Sequence[float] = WEEKDAY_PRICE_BY_HOUR,
    weekend_energy_prices: Sequence[float] = WEEKEND_PRICE_BY_HOUR,
    carbon_emission_rates: Sequence[float] = CARBON_EMISSION_BY_HOUR,
)

Bases: BaseEnergyCost

Energy cost and carbon emission model for reward function.

carbon #

carbon(start_time: Timestamp, end_time: Timestamp, energy_rate: float) -> float

Returns the carbon produced in this time step.

Parameters:

Name Type Description Default
start_time Timestamp

start of window

required
end_time Timestamp

end of window

required
energy_rate float

power applied in W; a negative value means energy is drawn away (i.e., cooling), and a positive value means heating.

required

Returns:

Type Description
float

carbon emitted [kg] for the energy consumed over the interval.

cost #

cost(start_time: Timestamp, end_time: Timestamp, energy_rate: float) -> float

Returns the cost of energy from this time step.

Parameters:

Name Type Description Default
start_time Timestamp

start of window

required
end_time Timestamp

end of window

required
energy_rate float

power applied in W; a negative value means energy is drawn away (i.e., cooling), and a positive value means heating.

required

Returns:

Type Description
float

cost in USD for the energy consumed over the interval.
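As a rough illustration of how a `cost` method like this could turn a power reading into dollars, the sketch below multiplies the energy used in the window by an hourly price. The function name `electricity_cost_usd`, the $/kWh price unit, the choice to look up the price at the hour of `start_time`, and the use of the absolute value of `energy_rate` (so that heating and cooling both count as consumption) are all assumptions made for illustration; none of them is taken from the library.

```python
from datetime import datetime, timedelta
from typing import Sequence

JOULES_PER_KWH = 3.6e6  # 1,000 W applied for 3,600 s


def electricity_cost_usd(
    start_time: datetime,
    end_time: datetime,
    energy_rate: float,
    price_usd_per_kwh_by_hour: Sequence[float],
) -> float:
    """Hypothetical sketch: cost of electricity used between two timestamps."""
    seconds = (end_time - start_time).total_seconds()
    # Assumption: cooling (negative power) consumes electricity just like heating.
    energy_kwh = abs(energy_rate) * seconds / JOULES_PER_KWH
    # Assumption: the price in effect at the start of the window applies throughout.
    return energy_kwh * price_usd_per_kwh_by_hour[start_time.hour]


# Usage: 2 kW of cooling for one hour at a flat, made-up price of $0.10/kWh.
flat_prices = [0.10] * 24
start = datetime(2024, 1, 1, 8, 0)
example_cost = electricity_cost_usd(
    start, start + timedelta(hours=1), -2000.0, flat_prices
)
```

A real schedule would typically distinguish weekday and weekend prices, as the two constructor arguments suggest.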

smart_control.reward.natural_gas_energy_cost #

Energy carbon and cost model for natural gas.

NaturalGasEnergyCost #

NaturalGasEnergyCost(
    gas_price_by_month: Sequence[float] = GAS_PRICE_BY_MONTH_SOURCE,
)

Bases: BaseEnergyCost

Energy cost and carbon emission model for reward function.

Attributes:

Name Type Description
gas_price_per_month

Cost/energy consumed [$/1000 cubic feet] by month.

carbon #

carbon(start_time: Timestamp, end_time: Timestamp, energy_rate: float) -> float

Returns the carbon produced in this time step.

Parameters:

Name Type Description Default
start_time Timestamp

start of window

required
end_time Timestamp

end of window

required
energy_rate float

thermal power in W applied during the window.

required

Returns:

Type Description
float

carbon mass emitted in kg.

Raises ValueError if energy_rate is negative, since natural gas can only be used for heating.

cost #

cost(start_time: Timestamp, end_time: Timestamp, energy_rate: float) -> float

Returns the cost of energy from this time step.

Parameters:

Name Type Description Default
start_time Timestamp

start of window

required
end_time Timestamp

end of window

required
energy_rate float

thermal power in W applied during the window.

required

Returns:

Type Description
float

cost of energy consumed in window in USD.

Raises ValueError if energy_rate is negative, since natural gas can only be used for heating.
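The sketch below shows one way the thermal power in W could be converted to a gas volume and priced per 1,000 cubic feet, including the negative-power check. The function name `natural_gas_cost_usd` is hypothetical, and the heat content of roughly 1,037 Btu per cubic foot is an approximate figure assumed here; the library may use different constants and a different lookup.

```python
from datetime import datetime, timedelta
from typing import Sequence

JOULES_PER_BTU = 1055.06      # approximate conversion factor
BTU_PER_CUBIC_FOOT = 1037.0   # assumed approximate heat content of natural gas


def natural_gas_cost_usd(
    start_time: datetime,
    end_time: datetime,
    energy_rate: float,
    price_usd_per_1000_cubic_feet_by_month: Sequence[float],
) -> float:
    """Hypothetical sketch: cost of the gas burned between two timestamps."""
    if energy_rate < 0:
        raise ValueError("Natural gas can only be used for heating.")
    seconds = (end_time - start_time).total_seconds()
    joules = energy_rate * seconds
    cubic_feet = joules / JOULES_PER_BTU / BTU_PER_CUBIC_FOOT
    # Assumption: the price for the month of start_time applies to the window.
    price = price_usd_per_1000_cubic_feet_by_month[start_time.month - 1]
    return cubic_feet / 1000.0 * price


# Usage: a made-up flat price of $10 per 1,000 cubic feet.
flat_gas_prices = [10.0] * 12
t0 = datetime(2024, 2, 1, 6, 0)
gas_cost = natural_gas_cost_usd(
    t0, t0 + timedelta(hours=1), 10_000.0, flat_gas_prices
)
```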

smart_control.reward.setpoint_energy_carbon_reward #

Reward Function for Smart Buildings.

The reward function provides a feedback signal to the reinforcement learning agent that indicates the benefit of the action taken. During training, the agent learns an action policy to maximize the cumulative, or long-term reward.

For this pilot there are three principal factors that contribute to the reward function:

* Setpoint: Maintaining the zone temperatures within heating and cooling setpoints results in a positive reward, and any temperature outside of the setpoints may result in a negative reward (i.e., a penalty).
* Cost: The cost of electricity and natural gas is a negative reward. By minimizing negative rewards and maximizing positive rewards, the agent will reduce overall energy cost. To compute the cost, both the energy consumption and the energy cost schedules are required.
* Carbon: By receiving a negative reward for consuming natural gas, the agent will learn to shift energy use to renewable sources. This factor requires an energy-to-carbon conversion formula or table.

The three factors can be scaled and combined into a single reward function

r = s(setpoint) - u x f(cost) - w x g(carbon)

where:

* r is the incremental reward at this step,
* s(setpoint) is the reward for maintaining the setpoint,
* f(cost) is the cost of consuming electrical and natural gas energy,
* g(carbon) is the cost of emitting carbon, and
* u, w are weighting factors for cost and carbon, depending on the policy.
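The combination can be sketched in a few lines. The function name and argument names below are hypothetical, and converting emitted carbon to a dollar cost by multiplying by a $-per-kg factor is an assumption based on the `carbon_cost_factor` attribute of `SetpointEnergyCarbonRewardFunction`; this is not the library's implementation.

```python
def combined_reward(
    setpoint_reward_usd: float,
    energy_cost_usd: float,
    carbon_kg: float,
    energy_cost_weight: float,   # u in the formula
    carbon_cost_weight: float,   # w in the formula
    carbon_cost_factor: float,   # assumed $ per kg of carbon emitted
) -> float:
    """Hypothetical sketch of r = s(setpoint) - u x f(cost) - w x g(carbon)."""
    carbon_cost_usd = carbon_kg * carbon_cost_factor
    return (
        setpoint_reward_usd
        - energy_cost_weight * energy_cost_usd
        - carbon_cost_weight * carbon_cost_usd
    )


# Usage with made-up numbers: 100 - 1.0 * 20 - 2.0 * (5 * 3.0)
example_reward = combined_reward(100.0, 20.0, 5.0, 1.0, 2.0, 3.0)
```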

The fundamental metric unit of energy is the joule (J), and the rate at which energy is applied (the energy rate) is power, measured in J/s or watts (W). In practice, however, energy is expressed in a variety of traditional units. For example, electrical energy is measured in kilowatt-hours (kWh), where 1 kWh is 1,000 W applied for one hour, whereas natural gas energy is measured in British thermal units (Btu) or cubic feet. Unit conversions are therefore necessary.
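A minimal sketch of the conversions involved is shown below. The helper names are hypothetical, and the Btu factor is an approximate value assumed for illustration.

```python
JOULES_PER_KWH = 3.6e6    # 1,000 W x 3,600 s
JOULES_PER_BTU = 1055.06  # approximate conversion factor (assumption)


def energy_joules(power_w: float, seconds: float) -> float:
    """Energy delivered by a constant power over an interval."""
    return power_w * seconds


def joules_to_kwh(joules: float) -> float:
    return joules / JOULES_PER_KWH


def joules_to_btu(joules: float) -> float:
    return joules / JOULES_PER_BTU


# Usage: 1,000 W applied for one hour is one kilowatt-hour.
one_kwh = joules_to_kwh(energy_joules(1000.0, 3600.0))
```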

Notes on Setpoint Reward: Setpoint reward is the incremental reward (beneficial feedback) for maintaining comfort conditions inside the zone.

We postulate that productivity is adversely affected when the zone air temperature is outside the deadband. Near the deadband, individual productivity decreases only slightly, and it continues to decrease smoothly and monotonically the farther the zone air temperature moves from the deadband.

Cumulative productivity is the maximum potential reward, and is parameterized by how many persons occupy the zones, and the average hourly per-person productivity.

Two other parameters are added to describe how productivity decays outside the deadband.

* productivity_midpoint_delta_temp: The difference in temperature beyond the setpoint at which productivity decays to 50%.
* decay_stiffness: Parameter that controls the slope of the decay; the higher the value, the steeper the slope.

The function for setpoint reward is based on a piecewise logistic function. Maximum (full) productivity occurs when the zone is occupied and inside its deadband. Productivity decays smoothly along a logistic curve outside the deadband.
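One possible shape for such a curve is sketched below. The function name, argument names, and the exact logistic form are assumptions chosen to match the description (full productivity inside the deadband, 50% at the midpoint delta, steeper decay for larger stiffness); the library's actual formula may differ.

```python
import math


def productivity_fraction(
    zone_temp: float,
    heating_setpoint: float,
    cooling_setpoint: float,
    midpoint_delta: float,
    decay_stiffness: float,
) -> float:
    """Hypothetical sketch: fraction of full productivity at a zone temperature."""
    if heating_setpoint <= zone_temp <= cooling_setpoint:
        return 1.0  # inside the deadband: full productivity
    # Distance beyond the nearest setpoint.
    if zone_temp < heating_setpoint:
        delta = heating_setpoint - zone_temp
    else:
        delta = zone_temp - cooling_setpoint
    # Logistic decay that equals 0.5 when delta == midpoint_delta.
    # The exponent is clamped to avoid overflow in math.exp.
    exponent = min(decay_stiffness * (delta - midpoint_delta), 700.0)
    return 1.0 / (1.0 + math.exp(exponent))


# Usage with made-up values: deadband 293 K to 297 K, midpoint 2 K, stiffness 4.3.
inside = productivity_fraction(294.0, 293.0, 297.0, 2.0, 4.3)
at_midpoint = productivity_fraction(299.0, 293.0, 297.0, 2.0, 4.3)
```

In this simple form the value just outside the deadband is slightly below 1, so the curve is only approximately continuous at the deadband edge; the gap shrinks as the product of stiffness and midpoint delta grows. Multiplying the fraction by occupancy and the per-person hourly productivity would give the setpoint reward in dollars.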

SetpointEnergyCarbonRewardFunction #

SetpointEnergyCarbonRewardFunction(
    max_productivity_personhour_usd: float,
    productivity_midpoint_delta: float,
    productivity_decay_stiffness: float,
    electricity_energy_cost: BaseEnergyCost,
    natural_gas_energy_cost: BaseEnergyCost,
    energy_cost_weight: float,
    carbon_cost_weight: float,
    carbon_cost_factor: float,
    reward_normalizer_shift: float = 0.0,
    reward_normalizer_scale: float = 1.0,
)

Bases: BaseSetpointEnergyCarbonRewardFunction

Reward function based on productivity, energy cost and carbon emission.

Attributes:

Name Type Description
max_productivity_personhour_usd

average occupant hourly productivity in $

productivity_midpoint_delta

temperature difference from the setpoint at which productivity falls to half of its maximum

productivity_decay_stiffness

midpoint slope of the decay curve

electricity_energy_cost

cost and carbon model for electricity

natural_gas_energy_cost

cost and carbon model for natural gas

energy_cost_weight

u-coefficient described above

carbon_cost_weight

w-coefficient described above

carbon_cost_factor

cost value in $ per kg carbon emitted

reward_normalizer_shift

shift the reward by subtracting this value from it

reward_normalizer_scale

divide the shifted reward by this value
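Read together, the two normalizer attributes describe a shift followed by a scale. A minimal sketch, assuming that order and using a hypothetical helper name:

```python
def normalize_reward(
    raw_reward: float,
    reward_normalizer_shift: float = 0.0,
    reward_normalizer_scale: float = 1.0,
) -> float:
    """Hypothetical sketch: subtract the shift, then divide by the scale."""
    if reward_normalizer_scale == 0.0:
        raise ValueError("reward_normalizer_scale must be non-zero.")
    return (raw_reward - reward_normalizer_shift) / reward_normalizer_scale


# Usage with made-up numbers: (10 - 2) / 4
normalized = normalize_reward(10.0, 2.0, 4.0)
```

With the default values (shift 0.0, scale 1.0) the reward passes through unchanged.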

compute_reward #

compute_reward(
    reward_info: RewardInfo,
) -> smart_control_reward_pb2.RewardResponse

Returns the real-valued reward for the current state of the building.

smart_control.reward.setpoint_energy_carbon_regret #

Reward (Regret) Function for Smart Buildings.

The reward function provides a feedback signal to the reinforcement learning agent that indicates the benefit of the action taken. During training, the agent learns an action policy to maximize the cumulative, or long-term reward.

For this pilot there are three principal factors that contribute to the reward function:

* Setpoint: Maintaining the zone temperatures within heating and cooling setpoints results in a positive reward, and any temperature outside of the setpoints may result in a negative reward (i.e., a penalty).
* Cost: The cost of electricity and natural gas is a negative reward. By minimizing negative rewards and maximizing positive rewards, the agent will reduce overall energy cost. To compute the cost, both the energy consumption and the energy cost schedules are required.
* Carbon: By receiving a negative reward for consuming natural gas, the agent will learn to shift energy use to renewable sources. This factor requires an energy-to-carbon conversion formula or table.

The three factors can be scaled and combined into a single regret function

r_i = [u x (s(setpoint) - s_max)/s_max - v x f(cost)/f_max - w x g(carbon)/g_max] / [u + v + w]

r_i -> [-1, 0]

where:

* r_i is the incremental reward at step i,
* s(setpoint) is the reward for maintaining the temperature inside the setpoints,
* s_max = occupancy x productivity, the maximum possible reward,
* f(cost) is the cost of consuming electrical and natural gas energy,
* f_max is the maximum momentary cost, which occurs at maximum energy use,
* g(carbon) is the cost of emitting carbon,
* g_max is the normalizing maximum for g(carbon), and
* u, v, w are weighting factors for the policy.
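The regret formula can be written out directly. The function and argument names below are hypothetical; the arithmetic follows the expression for r_i term by term.

```python
def incremental_regret(
    s: float, s_max: float,   # setpoint reward and its maximum
    f: float, f_max: float,   # energy cost and its maximum
    g: float, g_max: float,   # carbon cost and its maximum
    u: float, v: float, w: float,  # weights for productivity, cost, carbon
) -> float:
    """Hypothetical sketch of the normalized regret r_i."""
    weighted = (
        u * (s - s_max) / s_max
        - v * f / f_max
        - w * g / g_max
    )
    return weighted / (u + v + w)


# Best case: full productivity with no energy use or emissions.
best = incremental_regret(100.0, 100.0, 0.0, 5.0, 0.0, 3.0, 2.0, 1.0, 1.0)
# Worst case: zero productivity at maximum cost and maximum emissions.
worst = incremental_regret(0.0, 100.0, 5.0, 5.0, 3.0, 3.0, 2.0, 1.0, 1.0)
```

The two usage cases land on the ends of the stated range: the best case gives 0 and the worst case gives -1, provided s stays within [0, s_max], f within [0, f_max], and g within [0, g_max].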

The fundamental metric unit of energy is the joule (J), and the rate at which energy is applied (the energy rate) is power, measured in J/s or watts (W). In practice, however, energy is expressed in a variety of traditional units. For example, electrical energy is measured in kilowatt-hours (kWh), where 1 kWh is 1,000 W applied for one hour, whereas natural gas energy is measured in British thermal units (Btu) or cubic feet. Unit conversions are therefore necessary.

Notes on Setpoint Reward: Setpoint reward is the incremental reward (beneficial feedback) for maintaining comfort conditions inside the zone.

We postulate that productivity is adversely affected when the zone air temperature is outside the deadband. Near the deadband, individual productivity decreases only slightly, and it continues to decrease smoothly and monotonically the farther the zone air temperature moves from the deadband.

Cumulative productivity is the maximum potential reward, and is parameterized by how many persons occupy the zones, and the average hourly per-person productivity.

Two other parameters are added to describe how productivity decays outside the deadband.

* productivity_midpoint_delta_temp: The difference in temperature beyond the setpoint at which productivity decays to 50%.
* decay_stiffness: Parameter that controls the slope of the decay; the higher the value, the steeper the slope.

The function for setpoint reward is based on a piecewise logistic function. Maximum (full) productivity occurs when the zone is occupied and inside its deadband. Productivity decays smoothly along a logistic curve outside the deadband.

SetpointEnergyCarbonRegretFunction #

SetpointEnergyCarbonRegretFunction(
    max_productivity_personhour_usd: float,
    min_productivity_personhour_usd: float,
    max_electricity_rate: float,
    max_natural_gas_rate: float,
    productivity_midpoint_delta: float,
    productivity_decay_stiffness: float,
    electricity_energy_cost: BaseEnergyCost,
    natural_gas_energy_cost: BaseEnergyCost,
    productivity_weight: float,
    energy_cost_weight: float,
    carbon_emission_weight: float,
)

Bases: BaseSetpointEnergyCarbonRewardFunction

Reward function based on productivity, energy cost and carbon emission.

Attributes:

Name Type Description
max_productivity_personhour_usd

max occupant hourly productivity in $

min_productivity_personhour_usd

min occupant hourly productivity in $

productivity_midpoint_delta

temperature difference from the setpoint at which productivity falls to half of its maximum

productivity_decay_stiffness

midpoint slope of the decay curve

electricity_energy_cost

cost and carbon model for electricity

natural_gas_energy_cost

cost and carbon model for natural gas

energy_cost_weight

v-coefficient described above

carbon_emission_weight

w-coefficient described above

carbon_cost_factor

cost value in $ per kg carbon emitted

compute_reward #

compute_reward(
    reward_info: RewardInfo,
) -> smart_control_reward_pb2.RewardResponse

Returns the real-valued reward for the current state of the building.