Investigating whether the inclusion of humid heat metrics improves estimates of AC penetration rates: a case study of Southern California

Global cooling capacity is expected to triple by 2050, as rising temperatures and humidity levels intensify the heat stress that populations experience. Although air conditioning (AC) is a key adaptation tool for reducing exposure to extreme heat, we currently have a limited understanding of patterns of AC ownership. Developing high resolution estimates of AC ownership is critical for identifying communities vulnerable to extreme heat and for informing future electricity system investments, as increases in cooling demand will exacerbate strain placed on aging power systems. In this study, we utilize a segmented linear regression model to identify AC ownership across Southern California by investigating the relationship between daily household electricity usage and a variety of humid heat metrics (HHMs) for ~160 000 homes. We hypothesize that AC penetration rate estimates, i.e. the percentage of homes in a defined area that have AC, can be improved by considering indices that incorporate humidity as well as temperature. We run the model for each household with each unique heat metric for the years 2015 and 2016 and compare differences in AC ownership estimates at the census tract level. In total, 81% of the households were identified as having AC by at least one heat metric while 69% of the homes were determined to have AC with a consensus across all five of the heat metrics. Regression results also showed that the r² values for the dry bulb temperature (DBT) (0.39) regression were either comparable to or higher than the r² values for HHMs (0.15–0.40). Our results suggest that using a combination of heat metrics can increase confidence in AC penetration rate estimates, but using DBT alone produces similar estimates to other HHMs, which are often more difficult to access, individually. Future work should investigate these results in regions with high humidity.


Introduction
Global heat stress projections show significant growth in both exposure to and frequency of dangerous heat conditions through the 21st century [1,2]. As temperatures and humidity rise, widespread access to air conditioning (AC) will be crucial to mitigate the health risks posed by exposure to extreme heat events [3–5]. However, growth in AC adoption and use has major implications for the world's energy systems and, depending on the pace of decarbonization efforts, greenhouse gas emissions. By 2050, it is estimated that AC will be the second largest source of global electricity demand, due in large part to the huge growth in cooling units expected in developing countries, many of which are in the hottest regions of the world [6]. Increasing cooling demand will exacerbate the intensity and frequency of peak demand events, putting even more strain on aging electricity systems [7,8]; when these electricity systems fail, power outages interrupt vital services and increase heat exposure, putting public health at risk [9–13].
Although the relationship between electricity demand and temperature has been well established [14–16], there are aspects of the thermal environment beyond air temperature that influence human comfort levels [1,10,17,18] and, therefore, energy-consuming behaviors. Heat stress, a physiological response to extreme humid heat conditions that limit the body's ability to regulate temperature, depends on a combination of both temperature and humidity, among other factors [1,10,19–23]. Despite studies showing that cooling demand is affected by humidity as well as temperature [24–27], most research to date uses temperature-derived variables as the only climate indicators in predicting electricity demand [7,28–31]. Because of the established link between humid heat and energy demand, it is likely that humidity levels affect both AC ownership and patterns of adoption. However, the connection between AC ownership and humidity has not been explored in the literature.
As the power sector transitions to a grid that more heavily relies on variable sources of generation and demand side management, grid planners will benefit from more accurate estimates of AC ownership at local scales to manage future cooling loads [32] and potentially leverage these loads for demand side management strategies [33,34]. Developing a thorough understanding of spatial and temporal trends in cooling behavior would also help identify areas with high growth potential for AC adoption, as well as communities that lack access to AC and are most vulnerable to extreme heat. Developing estimates and projections of residential AC ownership is difficult because detailed information on homeowner appliances and energy behavior is rarely publicly available. Most reported AC penetration rates are from appliance saturation surveys or residential energy consumption surveys that are carried out by federal or state governments [35–37]. These studies are time intensive, expensive, and generally limited in spatial scale to larger geographic regions (e.g. climate zones or groups of states). Some studies have used the survey data to build predictive models of AC ownership [38–42]. For example, researchers used responses from the American Housing Survey and American Community Survey to estimate the probability of AC ownership in census tracts across 115 US metropolitan areas and found patterns of inequality in AC access [42]. However, the empirical model constructed in the study is based on nationwide trends that might not hold true in certain regions; specifically, the model did not perform as well in relatively cool climates (e.g. the Northeast, Northwest, Midwest, Colorado, and coastal California). The coarse resolution of survey data limits the ability of these models to develop highly resolved estimates of AC ownership.
As smart meters have become increasingly common, their electricity data records have been used in a variety of building energy studies to investigate historical energy behavior and demand with a much higher level of detail than previously possible [43–46]. Chen et al developed a methodology to determine whether or not a home has AC using household level smart meter electricity records and local weather data, and then characterized AC penetration rates at the census tract level across Southern California [47,48]. This study was novel in generating highly resolved estimates of AC ownership across a large geographic region with widely varying microclimates, building stock, and socioeconomics. The methodology was also used to identify populations that might be especially vulnerable to extreme heat events due to the confluence of low rates of AC penetration and high poverty levels [49]. These studies resulted in highly resolved estimates of AC ownership across a large geographical area but were limited by their focus on dry bulb temperature (DBT) alone to characterize climate-energy interactions.
More recently, studies that model electricity demand have included humidity-related indices and found that humidity is a critical element in estimating both cooling and overall demand [24,31,50–53]. In one study, models were developed using monthly, state-level electricity data from the United States and various climate indicators to project residential electricity demand under climate change scenarios. The results showed that projections based solely on DBT can underestimate electricity demand by as much as 10%-15% [24]. A second study used electric load data from EIA and hourly meteorological data for four electricity regions of the southeastern United States, and found that apparent temperature (AT), which captures both humidity and temperature, was better for modeling historical electricity demand than DBT alone [52]. The projected demand using AT was also higher for all four regions than when using DBT. These studies are significant because they show that humid conditions will alter electricity demand for space cooling, but they focus only on growth in demand from current units, ignoring potential installations of new AC.
While previous studies have assessed the demand for cooling using DBT, to our knowledge, no study has used humidity-related indices to identify patterns of AC ownership. We believe this relationship warrants investigation, as the literature has shown that humidity impacts both human perception of heat and overall demand for cooling. In this study, we compute a variety of humid heat metrics (HHMs) from local weather station data that encompass both temperature and humidity and build on the methodology developed by Chen et al to test our hypothesis that estimates of AC penetration rates, i.e. the percentage of homes in a defined area that have AC, can be improved by considering humidity as well as temperature. Southern California is a particularly interesting case study to develop high resolution estimates of AC penetration rates because the building stock, socioeconomics, and microclimates, which greatly impact the likelihood of a household having AC, all vary significantly across relatively small spatial extents [54]. Further, in California, 75% of people have AC, which is roughly 16 percentage points lower than the national average [55]. Therefore, it is especially prudent to uncover patterns and trends in AC ownership in California to foresee where growth in electricity demand might occur and locate communities that are most at risk during extreme heat events.

Electricity records
Southern California Edison (SCE), an investor-owned utility, provided hourly residential electricity data from the years 2015 and 2016 for roughly 200 000 households (including single family homes and apartment units within multifamily buildings) within their service area. The customers were randomly selected so that the dataset is statistically representative of Greater Los Angeles's 4.5 million residential households at the 99% confidence level. SCE also supplied the street level address of each customer, which allowed for a highly detailed geospatial analysis. All electricity data were stored on USC's Center for High-Performance Computing with a highly secure HPC Secure Data Account, to remain in line with the security and confidentiality requirements of SCE.
Steps were taken to screen outliers in the data that might distort the relationship between household electricity and the study's heat metrics. Households with less than half a year of electricity records were removed from the dataset, as well as homes that had less than 20 kWh of annual electricity demand, the amount of electricity an average home in California consumes each day [56]. We omit these homes because including unoccupied homes could distort estimations of AC penetration rates. Homes with solar panels were removed from the dataset because electricity demand met by solar panel generation is not measured by the smart meters. Thus, the gap between measured and actual demand would obscure the relationship between the home's electricity consumption and outdoor weather conditions. The data provided by SCE do not identify customers with residential solar, so a method developed by Chen et al was employed to detect these homes based on their hourly electricity consumption [48]. Only a small fraction of the homes were identified as having solar panels (1%-2%), so their omission should introduce no significant bias during the period studied. As solar penetration increases over time, this assumption would need to be reevaluated in future studies. After the screening steps, 158 114 households remained in the dataset.
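The two record-based screening rules above can be sketched as follows. This is a minimal illustration, not the study's code: the function name, the input layout (a mapping from a household identifier to its list of daily kWh values), the 183-day reading of "half a year", and the annualization of partial records are all assumptions made for the example. The solar-panel detection step of Chen et al is not reproduced here.

```python
def screen_households(daily_kwh_by_home, min_days=183, min_annual_kwh=20.0):
    """Keep homes with enough records and non-trivial annual demand.

    daily_kwh_by_home: dict of household id -> list of daily kWh totals
    (hypothetical layout). Thresholds mirror the text; 183 days is an
    assumed reading of "half a year".
    """
    kept = {}
    for home_id, daily_kwh in daily_kwh_by_home.items():
        # Rule 1: drop homes with less than half a year of records.
        if len(daily_kwh) < min_days:
            continue
        # Rule 2: drop homes whose annualized demand is below the threshold
        # (scaling partial records to 365 days is an assumption).
        annualized = sum(daily_kwh) / len(daily_kwh) * 365
        if annualized < min_annual_kwh:
            continue
        kept[home_id] = daily_kwh
    return kept
```

Passing a home with a full year of typical consumption, one with only 100 days of records, and one with near-zero consumption would retain only the first.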

Weather data and heat metrics
Local weather data were collected at an hourly resolution for the years 2015-2016 from three different sources of land-based weather stations: the California Irrigation Management Information System (CIMIS), the National Oceanic and Atmospheric Administration's National Center for Environmental Information (NCEI), and the Environmental Protection Agency Air Quality System (EPA AQS) [57–59]. In total, data from 102 stations were used. Each of the sources contains data from land-based weather stations across the Southern California region that are automated and quality controlled. Hourly ambient DBT, relative humidity (RH), and wind speed were measured by all three sources.
Dew point (DP) temperature was also available from CIMIS and NCEI stations, and NCEI stations measured wet bulb temperature (WBT). Using the DBT and RH, DP and WBT were calculated for the stations that did not record their values. Effective temperature (ET), AT, and Steadman's model of heat index (HI) were computed using the weather data retrieved from the weather stations described above. These HHMs were selected because they are commonly discussed in literature regarding human perception of heat and heat related public health risk and incorporate humidity in their calculations. There is also stronger consensus within the heat literature on their definition and how to calculate them, while many other heat metrics are not as well defined. Table 1 defines both the measured and calculated heat metrics used in this study. The formulas and packages used to compute the calculated metrics are outlined in the SI.
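As an illustration of how DP and WBT can be derived from DBT and RH, the sketch below uses two widely used approximations: a Magnus-type formula for dew point and the empirical fit of Stull (2011) for wet bulb temperature. These are assumptions chosen for the example; the formulas actually used in this study are given in the SI and may differ. The function names are hypothetical, temperatures are in °C, RH is in percent, and the wet bulb fit is intended for near sea-level pressure and RH of roughly 5%-99%.

```python
from math import atan, log, sqrt


def dew_point_c(dbt_c, rh_pct, a=17.625, b=243.04):
    """Magnus-type approximation of dew point (deg C) from DBT (deg C) and RH (%)."""
    gamma = log(rh_pct / 100.0) + a * dbt_c / (b + dbt_c)
    return b * gamma / (a - gamma)


def wet_bulb_c(dbt_c, rh_pct):
    """Stull (2011) empirical fit for wet bulb temperature (deg C).

    Assumed here for illustration only; valid near standard sea-level pressure.
    """
    return (
        dbt_c * atan(0.151977 * sqrt(rh_pct + 8.313659))
        + atan(dbt_c + rh_pct)
        - atan(rh_pct - 1.676331)
        + 0.00391838 * rh_pct ** 1.5 * atan(0.023101 * rh_pct)
        - 4.686035
    )
```

Both functions reproduce the qualitative behavior described in table 1: at 100% RH the dew point equals the air temperature, and at lower humidity the wet bulb temperature falls between the dew point and the dry bulb temperature.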

Statistical model
The segmented linear regression developed by Chen et al is implemented in this study to model the relationship between residential electricity use and each of the heat metrics [47]. In that model, a household's daily aggregated electricity demand was regressed against daily average DBT to determine whether the household had AC during the period of study. To test which heat metric best estimates AC ownership, we therefore aggregate hourly electricity demand to daily electricity demand for each of the households and regress it against the daily average value of each respective heat metric. Figure 1 shows the segmented linear regressions between daily aggregated electricity use and each of the six heat metrics for an example household in the study region.
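The aggregation step can be sketched as below, assuming hourly records arrive as (timestamp, value) pairs. The function names and input layout are illustrative assumptions rather than the study's implementation.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(hourly_kwh):
    """Sum hourly electricity readings (datetime, kWh) into daily totals."""
    totals = defaultdict(float)
    for timestamp, kwh in hourly_kwh:
        totals[timestamp.date()] += kwh
    return dict(totals)


def daily_means(hourly_metric):
    """Average hourly heat-metric readings (datetime, value) for each day."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in hourly_metric:
        sums[timestamp.date()] += value
        counts[timestamp.date()] += 1
    return {day: sums[day] / counts[day] for day in sums}
```

Joining the two outputs on the date key gives the (daily heat metric, daily kWh) pairs that enter the regression for one household and one heat metric.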
Metric: Definition
Dry bulb temperature: The ambient temperature measured by a thermometer, referred to as air temperature [60].
Wet bulb temperature: The temperature of adiabatic saturation measured by a thermometer covered with a wet cloth. At 100% relative humidity (a), the wet-bulb temperature is equal to the air temperature. At lower humidity, the wet-bulb temperature is lower than the dry-bulb temperature [60].
Dew point temperature: The temperature that air needs to be cooled to in order to achieve 100% relative humidity (a). The higher the relative humidity, the closer the dew point is to the actual air temperature [60].
Heat index: Human perceived equivalent temperature when considering air temperature and relative humidity (a) [60].
Apparent temperature: Temperature equivalent perceived by humans (feels like) caused by combined effects of air temperature, relative humidity (a), and wind speed [61]. According to the National Digital Forecast Database, the apparent temperature is equal to the dry bulb temperature between 50 and 80 °F, the heat index above 80 °F, and the wind chill below 50 °F.
Effective temperature: The temperature of saturated air that would incur the same level of discomfort for humans as the measured dry bulb temperature and relative humidity (a). Thus, the equation for effective temperature includes terms for both the dry bulb temperature and relative humidity [63].
(a) Relative humidity: the amount of water vapor present in air expressed as a percentage of the amount needed for saturation at the same temperature [60].

A distance cutoff was implemented so that any household more than 20 miles away from a weather station was removed from the analysis (refer to SI S1 for distribution of distance from household to weather station). This distance was selected to try to keep as many homes as possible in the dataset without matching homes to weather stations that would not accurately represent the local conditions of the home.
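A minimal sketch of matching a home to a weather station under the 20 mile cutoff is given below. It assumes great-circle (haversine) distance, coordinates in decimal degrees, and a set of station identifiers that reported data on the day in question, so that a station with missing data is skipped in favor of the next closest one within the cutoff. The function names, data layout, and the use of haversine distance are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def match_station(home, stations, available, max_miles=20.0):
    """Return the id of the closest station with data within max_miles, else None.

    home: (lat, lon); stations: dict id -> (lat, lon);
    available: set of station ids that reported data for the day.
    """
    ranked = sorted(
        (haversine_miles(home[0], home[1], lat, lon), station_id)
        for station_id, (lat, lon) in stations.items()
    )
    for distance, station_id in ranked:
        if distance > max_miles:
            break  # remaining stations are even farther away
        if station_id in available:
            return station_id
    return None
```

A home for which this returns None on every day would be excluded from the analysis under the cutoff described above.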
On days where weather station data were missing, households were matched with weather data from the next closest weather station (so long as the station was within 20 miles of the home). The segmented linear regression depicts two key pieces of information. The first is the stationary point temperature (SPT), which is the inflection point on the plot and is regarded as the outdoor temperature at which a household is expected to turn on their AC if they have it in their home. The second is the electricity-temperature sensitivity (E-T sensitivity), the slope of the line to the right of the SPT. The slope is the sensitivity of a household's electricity consumption to the ambient temperature and is affected by occupant and household characteristics that are not explicitly explored in this study due to data limitations (e.g. cooling preferences, occupancy rates, insulation, AC efficiency). In this study, multiple measures of heat are used, and the temperature refers to the heat metric used in a given regression (e.g. WBT, ET). The r² values, which measure the goodness of fit of the segmented linear regression model, are recorded for the values to the right of the SPT to compare the correlations between electricity and temperature across the heat metrics.
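The quantities described above (SPT, the slopes on either side of it, and the r² to the right of the SPT) can be illustrated with a simple segmented fit. The sketch below is not the implementation of Chen et al: it assumes two independent least-squares lines on either side of a candidate split, a brute-force search for the split with the smallest total squared error, and an SPT reported as the midpoint between the two observations bracketing the split. The function names and the `min_points` guard are hypothetical.

```python
from statistics import mean


def ols(xs, ys):
    """Least-squares line y = intercept + slope * x; returns (slope, intercept, sse)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def fit_segmented(temps, kwh, min_points=5):
    """Two-segment fit of daily kWh against a daily heat metric.

    Returns a dict with the SPT, both slopes, and r^2 of the right segment,
    or None if there are too few points to split.
    """
    pairs = sorted(zip(temps, kwh))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(min_points, len(xs) - min_points + 1):
        if xs[i] == xs[i - 1]:
            continue  # do not split between identical temperatures
        left = ols(xs[:i], ys[:i])
        right = ols(xs[i:], ys[i:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, i, left, right)
    if best is None:
        return None
    _, i, left, right = best
    right_mean = mean(ys[i:])
    sst = sum((y - right_mean) ** 2 for y in ys[i:])
    return {
        "spt": (xs[i - 1] + xs[i]) / 2,  # assumed convention for the breakpoint
        "slope_left": left[0],
        "slope_right": right[0],  # E-T sensitivity
        "r2_right": 1 - right[2] / sst if sst else 0.0,
    }
```

For a home whose consumption is flat below about 22 °C and rises linearly above it, the fit recovers a breakpoint near 22 °C, a near-zero left slope, and a positive right slope.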
A household is determined to have AC if two conditions in the segmented linear regression are met. The first condition is that the slope to the right of the SPT (referred to as slope-right) is greater than zero, because it is presumed that a household with AC would have electricity consumption that positively correlates with increasing ambient temperatures. The second condition is that the absolute value of the slope-right is greater than the absolute value of the slope to the left of SPT (referred to as slope-left). A majority of homes in California are heated with natural gas, meaning the slope-left should typically be near-zero for these homes [64]. Thus, a household with an absolute slope-right value smaller than the absolute slope-left value likely does not have AC, as the household's electricity demand at temperatures above the SPT is only nominally dependent on the temperature. This condition is set to rule out homes that have near-zero E-T sensitivities caused by noise or slightly higher electricity consumption of appliances on warmer days. If a household does not meet these criteria, it is assumed that the household did not use an AC during the period of study. Examples of households that do and do not meet these criteria are shown in the SI.
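The two conditions translate directly into a small predicate. The function name is an assumption; the logic follows the criteria stated above.

```python
def has_ac(slope_left, slope_right):
    """Apply the two AC criteria to the slopes of a segmented regression.

    Condition 1: slope-right is greater than zero.
    Condition 2: |slope-right| is greater than |slope-left|.
    """
    return slope_right > 0 and abs(slope_right) > abs(slope_left)
```

For example, a home with slope-left 0.01 and slope-right 0.12 satisfies both conditions, whereas a home with slope-left -0.20 and slope-right 0.05 fails the second condition and is treated as not having used AC.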
The segmented linear regression is run for each of the households in the study, across each of the heat metrics defined in table 1. After running the regression for each individual household, an AC penetration rate is computed for each census tract by dividing the number of homes identified as having AC within a census tract by the total number of homes available in our dataset in that census tract. Differences in the computed AC penetration rates, E-T sensitivity, and SPT when separate heat metrics are used are discussed below.
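The tract-level calculation can be sketched as follows, assuming each household is represented by a (census tract id, AC flag) pair for a given heat metric. The function name and input layout are illustrative.

```python
from collections import defaultdict


def penetration_rates(homes):
    """Percentage of homes with AC in each census tract.

    homes: iterable of (tract_id, has_ac) pairs for one heat metric.
    """
    total = defaultdict(int)
    with_ac = defaultdict(int)
    for tract_id, flagged in homes:
        total[tract_id] += 1
        with_ac[tract_id] += 1 if flagged else 0
    return {tract: 100.0 * with_ac[tract] / total[tract] for tract in total}
```

Running this once per heat metric and subtracting the DBT result from each HHM result, tract by tract, gives the differences that are compared across metrics.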

Spatial analysis
Maps were created to illustrate the geospatial variations in AC ownership across the study region and differences in estimated AC penetration rate for each of the heat metrics used. The results of the household regression for each of the six heat metrics were aggregated to the census tract level to protect the privacy of the customer data. Then, estimates of AC penetration rates were depicted using choropleth maps and census tract boundary shapefiles from the US Census Bureau [65]. The climate zones as defined by the California Energy Commission were also depicted to generate a better understanding of how AC ownership differs across the microclimates of the region [66].

Differences in estimated AC penetration rates
The AC penetration rates from each HHM were compared against the AC penetration rates found using DBT.Areas shown in red have lower rates of AC penetration (when the given heat metric is used instead of DBT) and tend to be in inland and desert areas, which are hotter and drier; areas shown in blue have higher estimates and are typically coastal, which tend to be cooler and more humid.The estimates for AT, ET, and HI closely align with DBT, while more significant differences are observed in the maps for WBT and DP.
A summary of the study region's average regression results for each heat metric is shown in table 2. The estimated AC penetration rate ranges from 73% (DBT) to 83% (DP). In general, there is agreement between the AC penetration rates estimated by the HHMs and DBT. However, the regional estimates of AC penetration rates with AT, ET, and HI are closer to the estimates produced with DBT than those with WBT or DP are, a trend also depicted in the choropleth maps in figure 2. The regional average E-T sensitivity values computed by the regression models range from 0.08 kW °C⁻¹ for DP to 0.15 kW °C⁻¹ for ET across the six heat metrics evaluated (see table 2).
Regional average r² values for each heat metric are also given in table 2. The model is fit to maximize the r² value for all data points, but the reported r² values only consider the set of data points to the right of the SPT in the segmented regression model, as we are most interested in how a home responds to temperature at the critical point at which a cooling system is turned on. The r² values for the six heat metrics range from 0.15 to 0.40. DP represents the lower boundary of this range, and HI and AT both have an r² of 0.40. In general, these results show that heat metrics that include humidity have an r² value that is either lower than or similar to that of the regression model analyzing DBT alone.
These results contradicted our initial hypothesis that HHMs would be significantly better suited for identifying whether a home has AC. We expected that household cooling demand would be best correlated with heat indices that account for humidity, based on the understanding that a person's comfort level is affected by both temperature and humidity. The weak correlation between WBT and demand could be explained by the findings of Vecellio et al [67], which show that WBT does not appropriately capture the nonlinear function of temperature and humidity that matches human physiology. Additionally, the other three HHMs performed no better than DBT. The results make sense from an engineering perspective, given that the regional climate zones analyzed in this study generally do not experience consistently high humidity. In an AC unit, the temperature and moisture content of outdoor air are reduced upon interaction with the AC's cooling coils, which are kept below the air's DP [68]. While there is an energy penalty associated with dehumidifying the air (i.e. the latent load), the total energy load is dominated by the sensible load (i.e. the energy required to reduce the air temperature) except in extremely humid climates [69]. Hence, it is likely that the low humidity levels in Southern California do not cause an observable signal in the overall electricity demand of a household (see SI for distribution of RH and DBT across study region). Consequently, our results may be region specific; a city that is both hot and humid likely demonstrates a stronger link between humidity metrics and overall demand. However, people living in more humid climates are also likely to be more tolerant of higher humidity levels than those living in drier regions due to regional acclimatization [70], which might dilute an observable relationship between cooling load and humidity. Conducting studies in regions with diverse climates would provide insight into the interactions between humid heat, human behavior and acclimatization, and electricity demand, but the lack of availability of household level electricity data is a limiting factor.
While the differences in r² results from the regression models are not definitive enough to state which of the metrics should be used to determine AC ownership, evaluating AC penetration with multiple metrics can provide higher confidence in the estimations. In figure 3, the homes are grouped by the number of heat metrics that identified the household as having AC, and the breakdown of which heat metrics identified the households as having AC within each grouping is shown in the bar chart. In figure 3(a), 69% of households were determined to have AC using the segmented linear regression methodology with all five of the heat metrics (note that DP is excluded because preliminary results showed it was a poor predictor of AC ownership). Figure 3 offers insight into our confidence in the total AC penetration rate across the region of study, which is highest for the set of homes identified as having AC based on agreement between 5 metrics (69%) and slightly lower as we add the additional homes identified with at least 4 metrics (+3% of homes) or 3 metrics (+2%), raising the overall AC penetration rate estimates to 72% and 74%, respectively. We have low confidence for regional AC penetration rate estimates in the range of 76%-81%, which includes all homes identified with at least one metric.
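The consensus grouping can be sketched as a cumulative count of homes flagged by at least n metrics. The function name and the input layout (one list of per-metric booleans per home) are assumptions, and the numbers in the usage note are toy values rather than study results.

```python
from collections import Counter


def consensus_shares(flags_by_home, n_metrics):
    """Percentage of homes identified as having AC by at least n metrics.

    flags_by_home: list with one entry per home, each a list of booleans
    (one per heat metric). Returns a dict n -> percent, for n = n_metrics..1.
    """
    counts = Counter(sum(flags) for flags in flags_by_home)
    n_homes = len(flags_by_home)
    shares = {}
    running = 0
    for n in range(n_metrics, 0, -1):
        running += counts[n]  # add homes flagged by exactly n metrics
        shares[n] = 100.0 * running / n_homes
    return shares
```

With ten toy homes, of which six are flagged by all five metrics, one by four, one by three, and two by none, the cumulative shares are 60%, 70%, and 80% for at least five, four, and three metrics. The same accumulation applied to the study's homes yields the 69%, 72%, and 74% sequence reported above.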
These results align with regional estimates (table of estimates given in SI) reported in [36,42,71–74], suggesting that we can have high confidence in the 69% of homes that were identified as having AC by all heat metrics. Although the most recent California Residential Appliance Saturation Survey estimates that 86% of customers in SCE territory have AC, the estimate is based on 2019 survey data rather than 2015 and 2016. Additionally, the survey includes any household that reported owning an AC, regardless of how often they use it, and our method might not capture households that use their AC infrequently (e.g. a vacation home with low average occupancy throughout the year). Similarly, the study by Romitti et al [42] reports a higher average AC penetration rate, 81%, for the Los Angeles-Long Beach-Anaheim area but also uses more recent survey data and would capture all AC ownership, regardless of use.

Conclusion
Highly resolved estimates of AC ownership are essential to prepare for future cooling demand and identify communities that will be most at risk from future extreme heat events. However, determining AC penetration rates at fine scales is difficult due to the lack of availability of household level data and limited understanding of how AC use behavior responds to varying heat metrics. This study improved upon existing methods of predicting AC penetration rates by incorporating a variety of humidity and temperature related heat metrics with a robust dataset of electricity records for ∼160 000 homes in Southern California.
In total, 81% of the households were identified as having AC by at least one heat metric (when excluding DP), while 69% of the homes were determined to have AC with a consensus across all five of the heat metrics. These results are aligned with the results from other studies of the region (SI S4). A limiting factor of any method used to estimate AC penetration rates is that there is no ground truth of residential AC ownership to validate against, particularly across small spatial extents, which is important for understanding heat vulnerability across different socio-economic groups. Hence, our method is advantageous because it provides insight into our relative certainty in estimating whether a home uses AC based on five analyses of electricity usage and a respective heat metric. Accordingly, while this analysis suggests that between 69% and 81% of households in SCE territory have AC, we have higher confidence that the true range is 69%-74% of homes for the years analyzed. The computed regional AC penetration rates range from 73% for DBT to 83% for DP. Maps of AC penetration rates show that there are geospatial variations in the prediction of AC ownership. For DP and WBT, where regional estimates diverged from DBT more significantly, the drier, hotter regions were estimated to have lower AC ownership than when DBT was used. The opposite was true in the milder, more humid coastal regions (i.e. the calculated AC penetration rate was higher for DP and WBT than for DBT). The regional average r² values vary from 0.15 to 0.40, and the highest values are from HI and AT. WBT performed worse than DBT (0.28 vs 0.39), suggesting that the demand for cooling is more dependent on air temperature than humidity. While this contradicts our initial hypothesis, it is consistent with thermodynamic principles, and results might be different in areas of very extreme humidity where the latent load of AC units is much more pronounced.
While it is difficult to draw a conclusion as to which heat metric most accurately predicts AC ownership from the results of this study, using DBT alone possesses several advantages and performed similarly to or better than other metrics within this study region. DBT is a well understood metric of heat, and DBT data can be easily retrieved from a variety of historical weather sources, unlike other heat metrics. Additionally, regional meteorological models and climate models can predict DBT with more accuracy than humidity and heat metrics that include humidity [75–78]. We chose Southern California as our study region because it is one of the only regions where researchers can gain access to smart meter data at a large scale (through a formal process outlined by California's Public Utilities Commission) [79] across diverse climate zones, and it is expected to have relatively large increases in AC adoption in the coming years when compared to other regions of the United States that already have high AC penetration rates. While we acknowledge that the outcome of this study may be regionally specific, the outlined methodology can serve as a framework that should be repeated in more humid climates as smart meter data become available to confirm this conclusion. Furthermore, repeating this study with higher resolution temperature and heat metrics would be desirable to ensure that the distance to weather station, which can be as much as 20 miles in this analysis, does not skew results.

Figure 1. An example set of segmented linear regressions for one home in La Crescenta, CA that was identified as having AC with all six heat metrics; the heat metric evaluated is shown on each x-axis.

Figure 2. Choropleth maps depicting the difference between census tract level AC penetration rates estimated with each HHM and DBT. The difference is found by subtracting the AC penetration rates computed with DBT from the AC penetration rates computed using each of the HHMs: (a) WBT, (b) AT, (c) ET, (d) HI, and (e) DP. Generally, the AC penetration rate computed with an HHM is lower (red) in desert regions and higher (blue) in coastal regions than when DBT is used.

Figure 3. (a): Percentage of homes identified as having an AC with all five heat metrics (i.e. consensus across all metrics). (b)-(e): The additional homes identified as having AC with a consensus of n metrics. (f): Summary of the percentage of homes identified as having AC by n heat metrics. The transition from dark to light blue implies diminishing confidence in the homes identified as having AC (e.g. we have more confidence in the homes identified with 5 metrics, represented with dark blue, than the homes identified with 1 metric, represented with light blue).

Table 1. Description of heat metrics used in this study.

Table 2. Summary of the study region's averaged regression results for each heat metric.