Predicting and understanding residential water use with interpretable machine learning

Predicting residential water use is critical to efficiently manage urban water resource systems. Simultaneously, understanding the factors driving residential water use is required to plan for future urban change and achieve effective water resource management. Current approaches examining residential water use identify the drivers of household water use through parametric or non-parametric statistical approaches. Parametric approaches have high predictive errors and lack the ability to accurately capture interactions between features but allow for easy interpretation. Non-parametric approaches have lower predictive errors and can capture non-linear feature interactions but do not allow for easy interpretation. We use non-parametric statistical models of household water use and recent advances in interpretable machine learning to understand the drivers of residential water use. Specifically, we use post-hoc interpretability methods to examine how drivers of water use interact, focusing on environmental, demographic, physical housing, and utility policy factors. We find all four categories of factors are important for estimating water use with environmental and utility policy factors playing the largest role. Additionally, we identify non-linear interactions between many variables within and across these classes. We show this approach provides both high predictive accuracy and identification of complex water use factors, offering important insight for urban water management.


Introduction
Residential households are the primary consumers of public supply water in North America [1,2]. Effective urban water management therefore requires water providers to anticipate residential use at short- and long-term time scales [3] to maintain a reliable, low-cost supply. In the short term, water utilities use high-quality predictions of future residential use to efficiently operate water distribution systems [4]. In the long term, utilities and water resource planners need to understand the drivers of residential water use to effectively enact efficiency programs and develop long-term water supply plans that account for urban growth and change. These drivers also inform state-level policymakers and regulators as they prepare for future drought conditions and design water conservation policy [5]. Both high-quality predictions of residential water use and an understanding of the key factors that drive residential water use at the household scale are vital to improving the efficiency and reliability of urban water management.
Current approaches to making high-quality predictions of residential water use and quantifying drivers of water use are largely separate. Recent work either (1) uses parametric statistical approaches to identify the drivers of residential water use behavior and their relative importance or (2) uses non-parametric statistical machine learning (ML) approaches to make high-accuracy predictions. In the former category, parametric statistical approaches are used to identify drivers of water use from panel surveys and water bills [6]. These approaches have identified environmental factors such as temperature and precipitation [7]; demographic factors such as population characteristics, household sizes, and incomes [8,9]; physical housing characteristics such as lot size and household size [10]; as well as curtailment and water rates [11] as important drivers of residential water use. Econometric methods have also been used to specifically identify the drivers of residential water use through parametric explanatory statistical approaches [12,13]. However, these methods are characterized by relatively high predictive errors and lack the ability to accurately capture the interactions between the features without a high degree of case-specific domain knowledge [1]. In almost all these cases, the approaches used are focused on providing easily interpretable statistical evidence of the effect of hypothesized water use drivers. The models chosen are high-bias and low-variance and are limited in their ability to capture non-linearity between factors. This reduces their ability to make high-quality model predictions and to identify the full range of drivers of water use that are not identified a priori.
Conversely, existing work that has aimed to make high-quality predictions of residential water use has not focused on identifying the drivers of use. Researchers have used non-linear AI/ML statistical predictive models to (1) integrate data sources across many sectors and make predictions of water use at temporal resolutions ranging from minutes to years in the future [1,11,14-20] or (2) disaggregate water use within households to identify specific end uses of water at a high resolution [21]. However, this work is largely focused on improving the predictive ability of models. While the models used account for the interactions between water use drivers, their increased complexity compared to parametric approaches means they serve largely as black boxes and do not allow for easy interpretation. This limits their utility for analysts or policymakers who need to study the importance of, and interactions between, the features contributing to residential water use.
Simultaneously capturing non-linear interactions between drivers of water use and retaining high interpretability in modeling techniques is important for two reasons. First, the non-linearity and interactions among factors associated with water use, which lead to poor-quality predictions from parametric models, also limit the quality and usefulness of the explanations. Second, in complex human-natural systems, new dynamics can emerge in response to system changes. This means policy interventions, climate shifts, or demographic change may induce changes in water use patterns that are fundamentally different from historical observations [12,22-24]. Capturing the drivers of water use through interpretable modeling aids in making future predictions as populations, climate, and built environments change simultaneously.
Here, we train non-parametric statistical models of household water use and leverage recent advances in interpretable AI/ML to understand the drivers of residential water use and make high-accuracy predictions. Broadly, interpretability refers to the ability to understand and confirm the outputs of AI systems [25]. It is used in AI applications where predictions must be understood before they can be used, such as determining bail amounts in the justice system [26], predicting disease progression in healthcare settings [25], or determining credit scores [27].
Here we use post-hoc interpretability, referring to the application of interpretation methods after model training, to examine how drivers of water use interact [28]. We focus on understanding which factors (environmental, demographic, physical housing, and utility policy) drive and accurately predict water use and what types of interactions exist between these factors. We do this by building census block group-level predictions of average household water use utilizing non-parametric statistical predictive models in a case study of Santa Cruz, California, using a proprietary data set of approximately a decade of household-level billing data.
We find all four categories (environmental, demographic, physical housing, and curtailment characteristics) are important for estimating water use, with environmental and curtailment features having the greatest association with water use. Among environmental drivers, increases in temperature and decreases in precipitation are associated with increased water use. Physical housing factors such as increased square footage and increased number of rooms are associated with more water use. Utility water curtailment decreases water use. Among demographic characteristics, increases in the number of single-family residential homes in a block group and decreases in renters are associated with increased water use. Finally, we find interactions between variables. Notably, increasing temperature is associated with increasing water use in block groups with an above-average fraction of single-family residential homes, above-average income, and above-average years built. This work demonstrates how non-parametric predictive models coupled with interpretability provide insight into drivers of water use coupled with high-accuracy water use predictions by capturing non-linear interactions between variables.

Case study
Our case study takes place in the city of Santa Cruz, located on the central coast of California. Santa Cruz's water service area serves approximately 95 000 people [29]. Compared to the state, the city of Santa Cruz has a slightly above-average median household income ($78k compared to a state average of $75k) and a larger percentage of the population below the poverty level (21% compared to a state average of 13%). In the service region, approximately three-quarters of homes are single-family, and approximately half of the housing stock is rented. The region is characterized by a warm-summer Mediterranean climate, receiving an average of 795 mm yr⁻¹ of precipitation, with the bulk of the seasonal rainfall occurring between November and March [29]. Mean monthly temperatures range between 52 °F and 65 °F, and reference evapotranspiration averages 990 mm yr⁻¹, with monthly lows of 30 mm in December and January and a high of 129 mm in June. Santa Cruz has a unique water system with almost entirely locally sourced surface water and small (less than 5% of supplies) groundwater reserves [29]. Between 2010 and 2013, residential water use in the service region averaged 59-61 gallons per capita per day (gpcd), dropping to a range of 43-57 gpcd for the years 2015-2020. Santa Cruz has curtailed demand during drought periods in the past [30], making demand estimation critical to its planning.

Data
We select a priori a set of environmental, demographic, physical housing, and utility variables, guided by existing work identifying drivers of residential water use [1,31,32]. All variables are aggregated to the block group level at a monthly time scale. A detailed summary of all variables and sources included is shown in appendix table 1. We use monthly billing records from the City of Santa Cruz from 2010-2020 to calculate average monthly Census block group-level residential water use, hereafter referred to as average monthly water use. The billing data contains approximately 11 000 residential accounts. This data was provided under a proprietary agreement for use in this study and is structured such that each bill is associated with an account that contains a service address, monthly water use, and monthly water bills.
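The aggregation from account-level bills to block group-level monthly averages can be sketched as follows. This is a minimal illustration only: the record layout and the field names `block_group`, `month`, and `use_ccf` are assumptions made for the example, not the structure of the proprietary billing data.

```python
from collections import defaultdict

def block_group_monthly_average(bills):
    """Average household water use per (block group, month).

    `bills` is an iterable of dicts with hypothetical keys
    'block_group', 'month' (e.g. '2015-07'), and 'use_ccf'.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for bill in bills:
        key = (bill["block_group"], bill["month"])
        totals[key] += bill["use_ccf"]
        counts[key] += 1
    # Mean use across all accounts billed in that block group and month
    return {key: totals[key] / counts[key] for key in totals}

# Invented example records, not real billing data
bills = [
    {"block_group": "A", "month": "2015-07", "use_ccf": 6.0},
    {"block_group": "A", "month": "2015-07", "use_ccf": 8.0},
    {"block_group": "B", "month": "2015-07", "use_ccf": 4.0},
]
averages = block_group_monthly_average(bills)
```

Each resulting (block group, month) pair corresponds to one observation of average monthly water use.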

Environmental
We include temperature and precipitation values from the National Land Data Assimilation System (NLDAS) [33]. We use data from a land surface model for physical consistency across variables, and NLDAS temperature and precipitation are well validated against gauge data. Actual evapotranspiration (AET) values are not available in NLDAS and are therefore collected from the Global Land Data Assimilation System (GLDAS) [34]. Given the small spatial area of our case study, we assume spatially constant monthly environmental variables.

Physical housing
We collect housing characteristics for each service address from county tax assessor data and aggregate these to the block group level [35]. These include the number of bedrooms in the dwelling unit, the tax-assessed value of the home, the number of units on the land parcel, the effective year built (either the year built or the year of the most recent large renovation), the living area of the home, and the size of the lot. Additionally, we use the billing records to determine if the account is associated with a single or multi-family home.

Census data
We also collect block group-level demographic data from the census [36]. We use block group-level median household income, gender ratio, percentage of the block group above the age of 65, percentage of households with children, percentage of block group population identifying as Hispanic or Latino, block group percentage of owner-occupied housing, block group percentage of the workforce primarily working from home, as well as measures of poverty and educational attainment.

Curtailment
Finally, we include a variable indicating the level of curtailment mandated at any given time. Curtailment is the process of requiring, asking, or incentivizing households to reduce their water consumption in response to drought. In Santa Cruz, smaller levels of curtailment are voluntary but are accompanied by increased drought surcharges. Higher levels of curtailment are mandatory and come with increased surcharges. The levels of curtailment and surcharges are pre-approved prior to droughts. Only two curtailment requests were made during our study period, one for 5% of demand and the other for 25%. We therefore include curtailment as a categorical variable indicating whether 0%, 5%, or 25% curtailment is enacted. We do this for two reasons: first, the low cardinality of the variable, and second, the change from a voluntary to a mandatory water restriction between the 5% and 25% curtailment. We note that the inclusion of this variable allows us to quantify the efficacy of curtailment as it was implemented but may not generalize to higher curtailment levels or to levels between those observed in our dataset. Additionally, Santa Cruz has a history of using curtailment for demand management [37,38], which likely limits the ability to generalize curtailment findings to other water systems that do not have a history of curtailment.

Predictive models
We use parametric and non-parametric statistical models to relate average monthly water use (y) to our set of independent variables (X). We test linear regression, ridge regression, lasso regression, gradient boosted trees, and random forest methods in line with previous work [1]. Each statistical method aims to estimate a relationship between X and y by minimizing a loss function to provide the best fit. We select among these methods by performing a cross-validation procedure in which we train each model on a partial dataset and retain the remaining observations as a validation dataset [39]. A full discussion of the training methods is included in appendix section A.2.

Variable importance
We use the random forest model to calculate the importance of each variable and use the importance to select a subset of independent variables to include in a final model. We calculate feature importance as the mean decrease in node impurity. Here, the decrease in node impurity is the decrease in the residual sum of squares gained by including the variable in a decision tree, averaged across all decision trees in the forest [40,41]. We visually inspect the variables for their contribution to the predictive quality and select the 14 most important for inclusion in the final reduced-variable model, noting a large drop in model quality after the 14th variable (appendix figure 8). We train a random forest model on the selected 14 variables and discuss that model for the remainder of the paper.
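To make the impurity-based importance measure concrete, the sketch below computes the decrease in residual sum of squares for a single split on a single variable. It is an illustration under simplifying assumptions, not the random forest implementation used in the study: a forest-level importance would average such decreases over every split on the variable and over every tree. The temperature and water use values are invented.

```python
def rss(values):
    """Residual sum of squares of `values` around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def impurity_decrease(x, y, threshold):
    """RSS reduction from splitting a node at x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return rss(y) - rss(left) - rss(right)

# Invented toy data: water use steps up once temperature exceeds 16
temperature = [10, 12, 14, 18, 20, 22]
water_use = [5.0, 5.0, 5.0, 7.0, 7.0, 7.0]

gain = impurity_decrease(temperature, water_use, threshold=16)
poor_gain = impurity_decrease(temperature, water_use, threshold=10)
```

A split that separates the two regimes removes all of the within-node variation, while a poorly placed threshold removes much less, which is why variables that enable such splits receive higher importance.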

Interpretation
To investigate the associations of variables with our response and the interaction between them, we use three model interpretation techniques: partial dependence plots (PD plots), individual conditional expectation plots (ICE plots), and accumulated local effects plots (ALE plots). Descriptions of the interpretation techniques are given in appendix section A.3.
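As a rough illustration of how two of these techniques relate, the sketch below computes ICE curves and a PD curve for a generic prediction function. The function `toy_model` and the feature names `temp_c` and `single_family` are hypothetical stand-ins for a fitted model and its inputs; ALE plots, which require binning and local differences, are not shown.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps over `grid`
    while every other variable in the row keeps its observed value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves

def partial_dependence(predict, rows, feature, grid):
    """PD curve: the pointwise average of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(column) / len(curves) for column in zip(*curves)]

def toy_model(row):
    """Hypothetical stand-in for a fitted model (not the paper's model)."""
    slope = 0.2 if row["single_family"] > 0.5 else 0.0
    return 5.0 + slope * max(row["temp_c"] - 10, 0)

rows = [
    {"temp_c": 12.0, "single_family": 0.8},
    {"temp_c": 15.0, "single_family": 0.3},
]
grid = [10, 15, 20]
pd_curve = partial_dependence(toy_model, rows, "temp_c", grid)
```

Because the PD curve is an average, it can hide the fact that only some rows respond to the variable, which is the motivation for inspecting the individual ICE curves as well.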

Descriptive statistics
Figure 1 shows the average monthly water use in winter (a) and summer (b) months in CCF/month. The average is 5.69 CCF/month across all months and block groups, with a median of 5.30 CCF/month, a standard deviation of 1.2 CCF/month, and a slight right skew. This is consistent with other large-scale billing analyses [1]. The environmental, demographic, and physical housing variables have a mix of distributions, with some being relatively uniform across their range and others having a relatively tight distribution around the mean.

Predictive accuracy
A plot of actual and fitted data for validation samples is shown in figure 2, with observations colored by whether the block group is in the upper or lower half of average monthly water use during baseline, non-drought periods. Across all cross-validation folds, we find an average out-of-sample adjusted R² of 0.63 for our model. For comparison, the best existing parametric approaches developed separate models for base and seasonal use and found in-sample R² values of 0.28 and 0.65, respectively, while focusing only on single-family residential homes [12]. Our results match closely with existing non-parametric approaches, which found out-of-sample adjusted R² values of 0.65-0.69 when using random forest and gradient boosting regression techniques [1].

Variable importance
Variable importance calculated on our reduced-variable model indicates that all four categories of variables (demographics, environmental, physical housing, and utility policy) contribute to predicting average monthly water use. Figure 3 shows the variable importance of the 14 most important variables, colored by their category, where a larger value in the figure indicates greater importance toward the overall prediction. We find temperature is the most important variable, followed by demand curtailment. The remaining environmental variables, precipitation and AET, are both included as important but less so than temperature. Only one demographic variable, the block group fraction of households that rent their dwelling, is found to be important. The remaining important variables are all descriptors of physical housing characteristics, which we believe are related to household size (size, bedrooms, units, rooms) and the value or condition of the home (lot size, home value, fireplaces). These results match previous end-use analyses or survey-based studies of residential consumption, which have found that the number of occupants is an important characteristic of indoor water use [2,42].

Partial dependence plots
In addition to ranking variables using an importance metric, we construct partial dependence plots for selected important variables to examine how changes in each of these variables lead to estimated changes in average monthly water use. These are shown in figure 4. For each panel, the PD plot shows how changes in one given variable impact the average monthly water use while controlling for the impact of all other independent variables. In addition, we create ALE plots for the same variables and include them in appendix figure 7. In all cases the trends shown in the PD plots match those seen in the ALE plots, indicating there are no issues in interpreting the PD plots due to correlated independent variables. Housing variables also have strong relationships with average monthly water use. We see a positive relationship with block group percentage of single-family homes, with water use increasing gradually between 40% and 80% single-family households and then increasing sharply. Household area has a similar pattern: water use is consistent until household area exceeds 1700 square feet, above which water use increases steeply. Finally, the block group average number of fireplaces has a generally positive association with water use. While fireplaces do not use water directly, they are likely associated both with older homes, and therefore inefficient plumbing, and with newer high-value homes, and therefore wealthier occupants.
The most important demographic variable, the fraction of homes in a census block group primarily occupied by renters, is associated with a step change in average monthly water use. Block groups with less than 50% renters have higher water use, and block groups with more than 60% renters have substantially less water use. Previous parametric studies have found conflicting associations between renting and water use, with studies hypothesizing that renting is associated with multi-family homes and therefore less outdoor water use [43] and, conversely, that renters often do not pay their own bills and therefore lack price incentives to conserve [44].
The policy variable, curtailment, is not included as a partial dependence plot, as the variable only takes on three values (no curtailment, 5% curtailment, and 25% curtailment). However, the PD methodology does allow us to quantify the marginal effect of curtailment. We find that, relative to no curtailment, the marginal effect of the inclusion of the 5% curtailment variable is a predicted 2.6% reduction in average monthly water use, while the marginal effect of the 25% variable is a predicted 24.8% reduction in water use. A water use reduction of approximately 25% when curtailment is implemented may appear intuitive, but existing work looking at multiple utilities has found significant variation in water use change when curtailment is implemented [45]. Curtailment adherence has been found to be impacted by regional drought media coverage [11], housing and demographics [20], and demand hardening [46]. Our results indicate that curtailment was implemented very effectively despite Santa Cruz's low baseline water use.
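One way to read the marginal effect of a categorical variable from the PD methodology is to set the variable to each level for every observation, average the predictions, and compare the averages. The sketch below assumes this reading; the function `toy_model`, its curtailment factors, and the feature names are invented for illustration and do not reproduce the fitted model or its reported effects.

```python
def categorical_marginal_effect(predict, rows, feature, baseline, level):
    """Percent change in the mean prediction when `feature` is set to
    `level` for every row, relative to setting it to `baseline`."""
    def mean_prediction(value):
        preds = [predict({**row, feature: value}) for row in rows]
        return sum(preds) / len(preds)

    base = mean_prediction(baseline)
    return 100.0 * (mean_prediction(level) - base) / base

def toy_model(row):
    """Hypothetical stand-in model with invented curtailment factors."""
    factor = {"0%": 1.0, "5%": 0.97, "25%": 0.75}[row["curtailment"]]
    return factor * (4.0 + 0.1 * row["temp_c"])

rows = [
    {"temp_c": 12.0, "curtailment": "0%"},
    {"temp_c": 18.0, "curtailment": "5%"},
]
effect_25 = categorical_marginal_effect(
    toy_model, rows, "curtailment", baseline="0%", level="25%"
)
```

Because every row is evaluated at both levels, the comparison holds the other variables (here, temperature) at their observed values rather than comparing drought months with non-drought months directly.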

Individual conditional expectation plots
For each important factor, we also quantify changes in average monthly water use for block groups with different characteristics. To do this we use ICE plots. We generate ICE plots for each variable, where a third variable is used to subset the block groups presented. For example, in figure 5(a), the red line shows the average relationship between temperature and average water use for block groups with an above-average fraction of single-family homes, while the green line shows the same relationship for those with a below-average fraction of single-family homes.
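The subsetting step described above can be sketched as follows: ICE curves are computed per block group, split by whether a third variable is above or below its mean, and averaged within each subset. The names `toy_model`, `temp_c`, and `single_family` are hypothetical, and using the mean as the cutoff is an assumption about what "above-average" means here.

```python
def grouped_ice_means(predict, rows, feature, grid, subset_by):
    """Average ICE curves separately for rows above ('high') and at or
    below ('low') the mean of the subsetting variable."""
    cutoff = sum(row[subset_by] for row in rows) / len(rows)
    groups = {"high": [], "low": []}
    for row in rows:
        curve = [predict({**row, feature: value}) for value in grid]
        label = "high" if row[subset_by] > cutoff else "low"
        groups[label].append(curve)
    return {
        label: [sum(column) / len(curves) for column in zip(*curves)]
        for label, curves in groups.items()
        if curves
    }

def toy_model(row):
    """Hypothetical model in which only single-family-heavy block
    groups respond to temperature."""
    slope = 0.2 if row["single_family"] > 0.5 else 0.0
    return 5.0 + slope * max(row["temp_c"] - 10, 0)

rows = [
    {"temp_c": 12.0, "single_family": 0.9},
    {"temp_c": 14.0, "single_family": 0.8},
    {"temp_c": 16.0, "single_family": 0.2},
    {"temp_c": 18.0, "single_family": 0.3},
]
curves = grouped_ice_means(toy_model, rows, "temp_c", [10, 20], "single_family")
```

Diverging "high" and "low" curves indicate an interaction between the plotted variable and the subsetting variable; overlapping curves indicate none.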
We find heterogeneity in two-way variable interactions across the important predictors of average monthly water use. A common pattern in our results is that increasing temperature is associated with increased water use in block groups with an above-average fraction of single-family residential homes, a below-average fraction of renters, an above-average number of fireplaces, above-average income, and above-average years built. Block groups without these characteristics show very little association between temperature and water use. Notably, we do not find this heterogeneity with respect to housing size. Figure 5(a) shows the relationship between temperature and water use, subset by an above- or below-average fraction of single-family homes. The remaining variables follow a similar trend and are shown in appendix A.6, figures 9-16. This indicates that the block group's average household water use is at least partially the result of an interaction between environmental and physical housing factors.
Additionally, we find nonlinearities present in the ICE plots. Results indicate the increase in average monthly water use associated with the block group's average number of fireplaces is only present for certain block groups, namely those with high fractions of renters, low household area, low median household income (MHI), and low year built (older). Other block groups have linear relationships. Figure 5(b) shows an example of a nonlinearity present in the association between water use and the block group average number of fireplaces for block groups with a below-average fraction of renters. This indicates that block group average water use is the result of multiple physical housing factors interacting.
Finally, we find other variables that have relationships with water use that do not change when subset. The relationships between water use and block group percentage of single-family homes, block group percentage of renters, and block group average household size show no heterogeneity when subset. Figures 5(c)-(e) show examples in which there is no heterogeneity, a trend demonstrated by all subsetting variables (appendix A.6).

Discussion
In this work, we train non-parametric statistical models of residential water use and use interpretable AI/ML to understand the drivers of residential water use. We identify which factors (environmental, demographic, physical housing, and utility policy) drive water use prediction and what interactions exist between these factors. This work demonstrates the benefits of using predictive modeling coupled with interpretable ML tools as a method for quantifying residential water use. The ability of ML models to capture nonlinear interactions between large numbers of features without extensive domain knowledge of their interaction improves the model's predictive ability. Coupling this with post-hoc interpretable ML techniques provides a high level of inference about the nature of the interactions. We find many non-linear interactions among features contributing to predicting water use, which linear, parametric approaches would not be able to capture.
Figure caption: Individual conditional expectation of temperature on water use. ICE plots show the relationship between a variable (x) and average water use (y) for census block groups subset by a third variable (colored lines). Red lines labeled high show the average response of block groups with an above-average value of the subsetting variable, while green lines labeled low show the response of block groups with a below-average value of the subsetting variable. Differences between red and green lines show that there is heterogeneity across block groups in the relationship between important variables and water use.
Our results highlight the benefits of using predictive modeling coupled with interpretable ML tools as methods for quantifying residential water use. We find all variable categories (environmental, demographic, built environment, and utility) are important, paralleling a growing body of work on residential water use drivers [12]. For all important variables, our model identifies non-linear associations with block group-level water use. Temperature has a slight non-linear threshold, while the block group percentage of single-family homes, fraction of renters, and household area all have very sharp increases similar to step functions. ICE plots allow us to observe heterogeneity in the association of a variable with water use across households. Conditioning ICE plots on other variables allows us to view the two-way interaction between variables. We find interactions between environmental and physical housing categories of variables as well as within different physical housing variables. In all of these cases, it would be difficult to capture these non-linear relationships and heterogeneity with parametric, linear approaches without deep domain-specific knowledge of the relationships between these variables.
Our results also reveal important insights for our case study in Santa Cruz. We find that increased temperature is associated with increased water use for block groups with an above-average fraction of single-family homes, number of fireplaces, income, and build year, as well as those with a below-average fraction of renters. One hypothesis for this is irrigation. Previous detailed household end-use studies have found irrigation to be a latent factor contributing to household water use [2,13,42,47]. We find some evidence for this in Santa Cruz, namely through the increased water use associated with block groups with a high percentage of single-family homes, a low fraction of renters, and above-average income. This is bolstered by the conditional ICE plots, which show that block groups with a high percentage of single-family homes, a low fraction of renters, and above-average income increase water use as temperature increases, an indication this water may be used for outdoor purposes. Additionally, when we condition on household area or lot size, we do not see substantial changes in the relationship between temperature and water use for block groups with above- or below-average lot size or household area. This is largely contrary to previous findings that larger houses have a greater association between temperature and water use. We hypothesize that if irrigation is a latent factor contributing to water use in Santa Cruz, irrigation is more likely in single-family homes that are owner-occupied and in areas with above-average incomes.
Our approach can be applied in other cities to inform water planning and policy as well. For a water utility aiming to manage or reduce its urban water demand, understanding which factors are driving water use is critical. For example, our finding that water use increases with temperature but only until about 17 °C indicates that, in Santa Cruz, the increasing frequency of warm months is more likely to be an important climate change impact on water use compared to higher extreme temperatures. In other climates, this relationship is likely different, and our approach can be used to identify which aspects of climate change are most likely to influence water demand. Additionally, our analysis can be used as a form of policy evaluation to assess the impact of previous curtailment implementation. Given that curtailment happens during droughts, when high temperatures and low precipitation drive up water use, measuring curtailed use relative to a pre-drought baseline may not capture the full impact of curtailment policy. Finally, the interaction of demographic and environmental factors can guide water efficiency and conservation efforts, helping planners target locally appropriate efficiency programs for communities with higher-than-average indoor and outdoor use.
The key limitations in our analysis are data-related. We analyze billing records for only approximately a decade, during which many drivers of water demand remained relatively static (such as demographics and population). Additionally, while we utilized household-level billing data, demographic data is only available at a census block group level, meaning we were aggregating over many households (between 600 and 3000) to create each data point. Aggregating water use to a block group level may mask extremes such as very high or low water use, limiting the ability to draw conclusions about these types of households in our analysis. Additionally, some of the important variables we identified in predicting water use may be driven by unidentified latent variables and must be interpreted with caution. For example, fireplaces do not use water directly and instead likely reflect unmeasured patterns in housing stock. There is also an opportunity to consider additional variables in future work. We base our selection of variables upon previous work, but other demographic variables or environmental variables (such as drought indices) may have relationships with household water use. Finally, while we do describe larger drivers of water use, these insights are specific to this case study and should be generalized with caution. The water billing data used in this analysis, while proprietary, is comparable to water billing datasets from most utilities that use digital billing systems. We note the size, format, and sensitive nature of billing records may require substantial processing and data management efforts. In this work, we performed extensive de-identification to remove personally identifiable information from the dataset and created a database of records to aid in the analysis.

... as a categorical variable. A potential issue for a variable like this, particularly as it changes in time, is that curtailment may serve to capture residual temporal changes in water use not explained by any other variables. To test this, we trained a model which included the year as a variable and still found curtailment was the second most important.

A.2. Model selection

A.2.1. Bias variance
In this section we provide an overview of predictive modeling and the bias-variance tradeoff and describe our training process and the models used. When fitting predictive models, we partition our data into training and testing samples, sometimes referred to as training and validation samples. We minimize loss on the training data and then evaluate error on the separate testing data. Model selection is made on the basis of the test sample mean squared error ($\mathrm{MSE}_{\mathrm{Test}}$), which is defined as:
$$\mathrm{MSE}_{\mathrm{Test}} = \mathrm{E}\left[\left(y_{\mathrm{Test}} - \hat{f}(x_{\mathrm{Test}})\right)^{2}\right]$$
where $y_{\mathrm{Test}}$ are the block group-level residential water use values in the test dataset, $x_{\mathrm{Test}}$ are the independent variables in the test dataset, and $\hat{f}$ is a trained statistical model. $\mathrm{MSE}_{\mathrm{Test}}$ can be decomposed into three terms, variance ($\mathrm{Var}(\hat{f}(x_{\mathrm{Test}}))$), squared bias ($[\mathrm{Bias}(\hat{f}(x_{\mathrm{Test}}))]^{2}$), and irreducible error ($\mathrm{Var}(\epsilon)$):
$$\mathrm{E}\left[\left(y_{\mathrm{Test}} - \hat{f}(x_{\mathrm{Test}})\right)^{2}\right] = \mathrm{Var}\left(\hat{f}(x_{\mathrm{Test}})\right) + \left[\mathrm{Bias}\left(\hat{f}(x_{\mathrm{Test}})\right)\right]^{2} + \mathrm{Var}(\epsilon)$$
Notionally, variance refers to the amount by which estimates of y would change if we utilized a different training data set, while bias is the model error induced by using models to approximate a real-world process. Our aim is to select a statistical model that balances bias and variance in this specific problem context. For example, an ordinary least squares regression model will be a high-bias, low-variance model. That is, a linear relationship will likely not be a good estimate of the relationship between our independent variables and response. However, the prediction error is unlikely to change if other training and test samples were used. More complex models will likely have decreased bias at the trade-off of increased variance. Our goal is to determine the level of model complexity that provides the lowest combined bias and variance [1,39,48,49].
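The variance and bias terms can be illustrated with a small simulation. The sketch below is an assumption-laden toy, unrelated to the water use data: it repeatedly fits a deliberately high-bias model (a constant equal to the training mean) to data generated from y = x² plus Gaussian noise, then summarises the predictions at one test point. All parameter values are invented.

```python
import random
import statistics

def simulate_bias_variance(n_sets=2000, n_points=20, noise_sd=1.0,
                           x0=0.9, seed=0):
    """Fit a mean-only model to many training sets drawn from
    y = x**2 + noise and summarise its predictions at x0."""
    rng = random.Random(seed)
    truth = x0 ** 2
    predictions = []
    for _ in range(n_sets):
        xs = [rng.random() for _ in range(n_points)]
        ys = [x ** 2 + rng.gauss(0.0, noise_sd) for x in xs]
        predictions.append(statistics.fmean(ys))  # constant prediction
    bias = statistics.fmean(predictions) - truth
    variance = statistics.pvariance(predictions)
    # Mean squared distance between predictions and the true function
    reducible = statistics.fmean([(p - truth) ** 2 for p in predictions])
    return bias, variance, reducible

bias, variance, reducible = simulate_bias_variance()
```

The reducible error returned here equals variance plus squared bias; adding the noise variance (the irreducible error) gives the expected test error at that point. A more flexible model would shrink the bias term while typically increasing the variance term.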

A.2.2. Training
We perform cross-validation to select the statistical model with the lowest MSE when making predictions. Here we use stratified cross-validation as our data vary both spatially and temporally. Random cross-validation would likely place different months of data from the same block group in both the training and validation samples. Because most independent variables remain the same across months within a block group, this would artificially inflate model quality through data leakage [50]. To address this, we split the data into 48 folds so that block groups are not repeated in training and validation data. This is the smallest number of divisions that prevents repetition of block groups across training and test sets while providing roughly equal-sized cross-validation folds. After preprocessing, we have 67 block groups remaining in our dataset. As some block groups have fewer observations than others, we combine block groups with fewer observations until we have roughly equal-sized cross-validation folds, which results in 48 folds. The folds are calculated by the groupKFold function of the R package caret [41], which is designed to split training and testing samples such that no group is contained in both sets.
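The grouping idea can be sketched in a few lines of Python. This is a simple greedy assignment written for illustration, not the algorithm used by caret; the function names and the toy block group labels are hypothetical.

```python
from collections import Counter

def grouped_folds(groups, n_folds):
    """Assign every group to exactly one fold, greedily balancing fold sizes."""
    counts = Counter(groups)
    fold_sizes = [0] * n_folds
    fold_of_group = {}
    # Place the largest groups first; each goes to the currently smallest fold,
    # so small groups end up combined with one another.
    for group, size in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        target = min(range(n_folds), key=lambda f: fold_sizes[f])
        fold_of_group[group] = target
        fold_sizes[target] += size
    return [fold_of_group[g] for g in groups]

def train_validation_split(fold_ids, held_out):
    """Return row indices for training and validation when one fold is held out."""
    train = [i for i, f in enumerate(fold_ids) if f != held_out]
    valid = [i for i, f in enumerate(fold_ids) if f == held_out]
    return train, valid

# Toy data: one label per monthly observation (hypothetical block groups).
groups = ["bg1"] * 6 + ["bg2"] * 6 + ["bg3"] * 3 + ["bg4"] * 3 + ["bg5"] * 2
fold_ids = grouped_folds(groups, n_folds=3)
fold_sizes = [fold_ids.count(f) for f in range(3)]

for held_out in range(3):
    train, valid = train_validation_split(fold_ids, held_out)
    train_groups = {groups[i] for i in train}
    valid_groups = {groups[i] for i in valid}
    # No block group may appear on both sides of the split.
    assert train_groups.isdisjoint(valid_groups)
```

Because all months from a block group land in the same fold, a model never sees a validation block group during training, which is the leakage the grouped split is meant to prevent.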
Given that partitioning must be performed at the block group level, and there are relatively few block groups, we only use training and validation datasets as opposed to training, validation, and test datasets. This is recommended when working with small sample sizes [51], and has been done before in similar applications [21,31,52,53]. The cross-validation process is sufficient for model selection and to determine that our chosen model is not overfitting. Additionally, our focus is on using predictive models to assess the relationships in our dataset. As such, we prioritize increasing the size of our training data by not including a test set. For each fold, all models are trained on the block group-level data, and error metrics are calculated. Adjusted R-squared, MAPE, and MAE are shown in appendix figure 6.
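For reference, the three error metrics can be computed as in the sketch below. The formulas are the standard textbook definitions and are assumed here (with MAPE expressed in percent); the text does not state the exact variants used, and the numbers in the example are made up.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no zero actual values)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def adjusted_r2(actual, predicted, n_predictors):
    """R-squared penalized for the number of predictors."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

# Made-up held-out values for one fold, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(mae(actual, predicted), mape(actual, predicted), adjusted_r2(actual, predicted, 1))
```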

A.2.3. Model descriptions
We test parametric and non-parametric models to develop relationships between our chosen independent variables and water use, and describe the models used at a high level below. Our choice of models to test is based on the results of Lee and Derrible [1], who tested 12 statistical models for their use in predicting household water use and found that non-parametric methods outperformed parametric methods in almost all cases. We test a series of models implemented in R using default training parameters for each. The first model we test is ordinary least squares regression (OLS), which aims to explain the response variable as a linear combination of the variables. This is implemented in R using the lm function. We also test the parametric shrinkage methods ridge [54] and LASSO regression [55]. These are extensions of linear regression which use regularization to account for collinearity among independent variables. OLS regression follows the assumptions of linearity, normal residuals, homoskedasticity, and no multicollinearity, while ridge and LASSO regression both require the first three. In our case, we were able to rule out these models on the basis of their predictive accuracy.
We also test two non-parametric methods, XGBoost and Random Forest. XGBoost is a tree-based model that fits decision trees sequentially, with each subsequent tree fit to the residuals of the prior trees, until a model quality threshold or iteration threshold is reached [56]. Random Forest models are also tree-based and build an ensemble of decision trees on randomly partitioned subsections of data and then use the average prediction across all trees to estimate the response for a given data point [40]. Neither non-parametric model makes any statistical assumptions about the data. Based on the results of the cross-validation discussed in the previous section, we find Random Forest and XGBoost have very similar performance across all metrics, so here we continue using Random Forest as it is computationally more efficient. Finally, we perform a series of tuning steps to further refine the Random Forest model. We test the number of trees in increments of 50 between 250 and 1000, as well as the number of variables randomly sampled as candidates at each split at 5 levels between 10 and 20. We perform a grid search of all combinations and evaluate the out-of-sample performance. We find that 650 trees and 20 variables provide the best out-of-sample performance and use those values to train the model used throughout the rest of the work.
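The structure of such a grid search is sketched below. The `evaluate` callback stands in for training a Random Forest and returning its out-of-sample error, and `toy_error` is an arbitrary placeholder that does not reflect the error surface or results of this study. The five candidate values for the number of variables per split are also an assumption, since only the range is stated.

```python
from itertools import product

def grid_search(evaluate, tree_counts, vars_per_split):
    """Return (error, n_trees, n_vars) for the combination with the lowest error."""
    best = None
    for n_trees, n_vars in product(tree_counts, vars_per_split):
        error = evaluate(n_trees, n_vars)
        if best is None or error < best[0]:
            best = (error, n_trees, n_vars)
    return best

# Number of trees from 250 to 1000 in steps of 50, as described in the text.
tree_counts = list(range(250, 1001, 50))
# Five levels between 10 and 20; the exact values are assumed for illustration.
vars_per_split = [10, 12, 15, 18, 20]

def toy_error(n_trees, n_vars):
    # Placeholder for cross-validated error; NOT the study's real error surface.
    return abs(n_trees - 500) / 100 + abs(n_vars - 15)

best = grid_search(toy_error, tree_counts, vars_per_split)
print(len(tree_counts) * len(vars_per_split), "combinations; best:", best)
```

In practice `evaluate` would wrap the cross-validation loop, so each of the 80 combinations is scored on held-out folds rather than on training data.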

A.3. ALE plots
A.3.1. Partial dependence plots
Partial dependence (PD) plots are a model-agnostic method for conducting inference on statistical models [57]. The plots aim to show the marginal effect of a variable on the predicted outcome of the model and can show whether that relationship is linear, monotone, u-shaped, etc. To calculate the partial dependence function for the jth variable ($\hat{f}_j$), we divide the training data into $x_j$, the independent variable of interest, and $x_{j'}$, all other independent variables. We estimate $\hat{f}_j$ for a given value of $x_j$ as

$$\hat{f}_j(x_j) = \frac{1}{n}\sum_{i=1}^{n} \hat{f}\left(x_j, x^{(i)}_{j'}\right)$$

where $n$ is the number of observations in the dataset, $\hat{f}$ is the predictive model, and $x^{(i)}_{j'}$ are the values of the variables which we are not interested in. We then sample from the marginal distribution of $x_j$, evaluate $\hat{f}_j$ at each sample point, and plot $\hat{f}_j(x_j)$. Notionally, a point on a PD plot, $(x_j, \hat{f}_j(x_j))$, shows the average value of the prediction for a given $x_j$ across all $x_{j'}$.
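The averaging step can be written out directly. The sketch below is a minimal illustration with a hypothetical two-feature `toy_model` standing in for the trained model; it is not the implementation used in this work.

```python
def partial_dependence(predict, rows, j, grid):
    """Average prediction over all rows with feature j forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)   # copy so the original observation is untouched
            modified[j] = value    # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_model(row):
    # Hypothetical model with an interaction between feature 0 and feature 1.
    return 2 * row[0] + row[0] * row[1]

rows = [[1, 0], [2, 1], [3, 2]]   # feature 1 has mean 1
grid = [0, 1, 2]
pd_curve = partial_dependence(toy_model, rows, j=0, grid=grid)
print(pd_curve)  # each point averages over the observed values of feature 1
```

For this toy model the curve has slope 3, that is, the direct effect of 2 plus the mean of feature 1, which shows how the interaction is folded into a single average line.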

A.3.2. Individual conditional expectation
In some instances there may be heterogeneous changes in $\hat{f}(x_j, x_{j'})$ across values of $x_{j'}$ for a given $x_j$, which are masked by averaging the responses across the values of $x_{j'}$. To examine the heterogeneity of $\hat{f}(x_j, x_{j'})$ across values of $x_{j'}$, we can plot the individual predictions for each observation in a technique called an individual conditional expectation (ICE) plot [58]. This results in a series of lines (one for each observation), in which the value of $x_j$ is changed but all values of $x_{j'}$ remain the same. Averaging these lines at each value of $x_j$ returns the partial dependence plot. Additionally, we can average the ICE lines within groups defined by another variable in $x_{j'}$ to return partial dependence plots grouped by that variable, as a way to understand the impact of changing two features on the prediction.
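A minimal sketch of ICE curves and their grouped averages follows. The `toy_model`, the data, and the choice to split groups at the mean of the second feature are assumptions made for illustration only.

```python
def ice_curves(predict, rows, j, grid):
    """One curve per observation: vary feature j, keep all other features fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[j] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves

def average_curves(curves):
    """Pointwise mean of several curves evaluated on the same grid."""
    return [sum(points) / len(points) for points in zip(*curves)]

def toy_model(row):
    # Hypothetical model in which the effect of feature 0 depends on feature 1.
    return 2 * row[0] + row[0] * row[1]

rows = [[1, 0], [2, 1], [3, 2], [4, 3]]
grid = [0, 1, 2]
curves = ice_curves(toy_model, rows, j=0, grid=grid)

# Averaging every ICE curve recovers the partial dependence curve.
pd_curve = average_curves(curves)

# Averaging within groups of a second variable (split at its mean, an assumed rule).
mean_x1 = sum(r[1] for r in rows) / len(rows)
low = average_curves([c for c, r in zip(curves, rows) if r[1] < mean_x1])
high = average_curves([c for c, r in zip(curves, rows) if r[1] >= mean_x1])
print(pd_curve, low, high)
```

The low and high group curves have different slopes here, which is the kind of heterogeneity that a single averaged PD curve would hide.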

A.3.3. Accumulated local effects
Partial dependence functions have limitations imposed by an assumption of independence between $x_j$ and all features in $x_{j'}$. The issue that arises is that, for a given observation $x$ in which we hold $x_{j'}$ constant and vary $x_j$ across its marginal distribution, we may be evaluating our model's predictions on points that lie outside the input space of our observations. If we are creating a PD plot for household area, one observation could be an 800 square foot household with 1 bedroom and 1 bathroom. By varying household area across its distribution, we will eventually evaluate the model's prediction for a data point that has 2800 square feet but still 1 bedroom and 1 bathroom, an unrealistic observation in our dataset.
This can be ameliorated using accumulated local effects (ALE) plots [59]. ALE plots serve a similar function to PD plots but use a different methodology. ALE plots cluster values of $x_j$ and calculate the marginal change in prediction when moving from one value of $x_j$ within the cluster to another. By only replacing $x_j$ with values close to it, ALE plots eliminate the issue of unrealistic data points.
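The local-difference idea can be sketched as follows for a single numeric feature. This is a simplified first-order ALE estimate written for illustration (quantile-based bins, differences between bin edges, accumulation, then centering); the `toy_model` and data are hypothetical, and this is not the implementation used in this work.

```python
def ale_curve(predict, rows, j, n_bins):
    """Simplified first-order ALE for feature j; returns (bin edges, centered ALE values)."""
    values = sorted(row[j] for row in rows)
    # Bin edges taken at evenly spaced quantiles of the observed feature values.
    edges = sorted({values[round(k * (len(values) - 1) / n_bins)] for k in range(n_bins + 1)})
    local_effects, counts = [], []
    for b, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Observations inside the bin; the first bin also includes its lower edge.
        members = [r for r in rows if lo < r[j] <= hi or (b == 0 and r[j] == lo)]
        diffs = []
        for r in members:
            upper, lower = list(r), list(r)
            upper[j], lower[j] = hi, lo   # move feature j only to the nearby bin edges
            diffs.append(predict(upper) - predict(lower))
        local_effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
        counts.append(len(members))
    # Accumulate the local effects across bins.
    ale = [0.0]
    for effect in local_effects:
        ale.append(ale[-1] + effect)
    # Center so that the count-weighted average effect is zero.
    center = sum(c * (ale[b] + ale[b + 1]) / 2 for b, c in enumerate(counts)) / sum(counts)
    return edges, [a - center for a in ale]

def toy_model(row):
    # Hypothetical additive model; feature 0 has a true effect of 2 per unit.
    return 2 * row[0] + row[1]

# Perfectly correlated features, the situation in which PD plots extrapolate.
rows = [[i, i] for i in range(11)]
edges, ale = ale_curve(toy_model, rows, j=0, n_bins=5)
slopes = [(ale[k + 1] - ale[k]) / (edges[k + 1] - edges[k]) for k in range(len(edges) - 1)]
print(edges, slopes)
```

Each observation is only ever moved to the edges of its own bin, so the model is never evaluated at a combination such as a very large area with a single bedroom, and the recovered slope of 2 reflects feature 0 alone despite its correlation with feature 1.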

Figure 1. (a), (b) Average monthly water use for winter (Dec, Jan, Feb) and summer (June, July, Aug) months. (c) Scaled distribution of key predictors of average monthly water use with 5th and 95th percentile values labeled on the left and right, respectively.

Figure 2. Model fit. Out-of-sample actual and predicted average monthly water use values for the final, reduced-variable predictive model, colored by whether the block group has low or high use during a baseline, non-drought period. The black line has a slope of 1, representing a perfect fit. Out-of-sample values are taken from a randomly selected cross-validation fold.

Figure 3. Variable importance. Relative importance of the 14 most important variables as determined by their contribution to reducing prediction error. Colors show variable category. Values shown are the average improvement in out-of-sample model error when the variable is included, where a larger value indicates a larger reduction in error.

Figure 4. Partial dependence plots. PDPs for the 6 most important numerical variables. For each panel, the y axis shows the average change in water use prediction for a given change in x, holding all other variables constant. Values on the x axis are the 5th through 95th percentile of data. We find environmental, housing, and demographic variables all have strong, non-linear associations with water use.

Figure 5. Individual conditional expectation of temperature on water use. ICE plots show the relationship between a variable (x) and average water use (y) for census block groups subset by a third variable (colored lines). Red lines labeled high show the average response of block groups with an above-average value of the subsetting variable, while green lines labeled low show the average response of block groups with a below-average value of the subsetting variable. Differences between red and green lines show that there is heterogeneity across block groups in the relationship between important variables and water use.

Figure 6. Model performance. Out-of-sample MAE, MAPE, and adjusted R² for all models tested. For MAE and MAPE, lower values indicate better out-of-sample performance. For R², higher values indicate better out-of-sample performance.

Figure 7. Accumulated local effects. ALE plots for the variables shown in figure 3. ALE plots show the marginal change in water use for a marginal change in x sampled throughout the distribution of x. Similar to PD plots, positive slopes show a positive relationship between x and y. Here, we compare ALE plots against the PD plots in figure 4 and find similar trends using both methods, indicating that our results are not impacted by the limitations of PD plots described in section A.3.3.

Figure 8. Variable importance. Cumulative variable importance with the red line showing the cutoff chosen in this work.

Figure 10. ICE plots of block group percentage of single-family homes.

Figure 11. ICE plots of block group fraction of renters.

Figure 12. ICE plots of block group average household area.

Figure 14. ICE plots of the block group average number of fireplaces.

Figure 15. ICE plots of block group average MHI.

Figure 16. ICE plots of block group average year built.