Use of Beta Regression to investigate the link between home air infiltration rate and self-reported health

The UK has introduced ambitious legislation for reaching net zero greenhouse gas (GHG) emissions by 2050. Improving the energy efficiency of homes is a key priority in achieving this target, and solutions include minimising unwanted heat losses and decarbonising heating and cooling. Making a dwelling more airtight and applying insulation can lower energy demand by reducing unwanted heat loss through fabric and openings. However, a sufficient supply of outdoor air is required to dilute indoor airborne pollutants. This research investigates the relationship between dwelling air infiltration and self-reported health at neighbourhood population level for Greater London. This paper links data from a variety of sources, including Energy Performance Certificates (EPCs), the Greater London Authority's Lower Super Output Area (LSOA) Atlas and the Access to Healthy Assets and Hazards (AHAH) database at LSOA level. Beta regression has been performed to assess the influence of air infiltration rate on self-reported health, whilst controlling for other socioeconomic factors. All factors have been ranked in order of their association with self-reported health. Findings indicate that air infiltration rate has a positive association with the percentage of people reporting themselves to be in “good or very good” health.


Introduction
In 2019, the UK government announced that the country will target net zero carbon emissions by 2050 [1]. In 2016, approximately 500,000 GWh of energy was consumed by the UK domestic sector, accounting for 30% of total consumption [2]. Space heating is responsible for roughly 60% of home energy consumption. Given that most dwellings that will exist in 2050 have already been built, improving home energy efficiency is a key priority. The UK Government's efforts to promote home energy efficiency began towards the end of the 20th century [3]. The Decent Homes scheme aimed to provide minimum standards of housing conditions for those housed in the public sector [4]. The Warm Front scheme helped the poorest UK households to install heating measures and insulate their homes [5]. The Green Deal assisted occupiers in paying for energy efficient home improvements ranging from the building envelope to renewable energy [6].
The influence of the built environment on human health and wellbeing is not a new topic. In the UK, the first initiatives date back to the 19th century, when Edwin Chadwick, in his report 'The Sanitary Conditions of the Labouring Population', revealed that diseases could be eradicated by improving the ventilation of London homes [7]. In the 20th century, Ebenezer Howard envisioned the garden city concept, which aimed to provide citizens with a healthy living environment [8].
The direct impact of housing on human health has been acknowledged in many studies, in which poor housing increased the risk of depression, anxiety, and respiratory and cardiovascular diseases [9,10]. Cold, damp and mould are among the highest risk factors for occupant health. Cold is strongly associated with winter deaths; however, better heating and higher Standard Assessment Procedure (SAP) ratings can help alleviate this issue [11,12]. Emissions from indoor combustion, volatile organic compounds (VOCs) and toxic metals have also been found to damage occupant health [13][14][15]. Summer overheating within homes also affects occupant thermal comfort [16]. Ventilation plays a major role in mitigating these issues by helping to regulate the indoor environment, thereby promoting human health and wellbeing.
This research extends recent work by the authors investigating the link between home energy efficiency and self-reported health in Greater London [17]. Here we link to additional neighbourhood data in the form of the Access to Healthy Assets and Hazards (AHAH) version 2 to draw on wider environmental, social and demographic factors, as opposed to using the index of multiple deprivation as in Symonds et al. [17]. This paper focuses on the link between air infiltration and self-reported health, and ranks this variable by its association with self-reported health at Lower Super Output Area (LSOA) level relative to other environmental, social and demographic factors.

Methods
This study comprises secondary data analysis using data derived from the LSOA Atlas, Access to Healthy Assets and Hazards (AHAH) v2, strategic noise mapping and Energy Performance Certificate (EPC) databases. Data were filtered, sorted and combined at Lower Super Output Area (LSOA) level for the London area (N=4,835). Variance inflation factors were calculated to identify and limit multicollinearity between independent variables in the regression. Statistical analyses were then carried out to investigate the association between self-reported health and median air infiltration rate, together with other confounding factors, at LSOA level. Figure 1 illustrates the methodology schematically.

Data and sources
Data were obtained primarily from four large datasets: the LSOA Atlas, AHAH v2, EPCs and the 2011 census [18][19][20][21], with additional data drawn from other sources such as strategic noise mapping and the Greater London Authority [22,23]. Table 1 summarises each variable included in this study, the publication year and the data source. Self-reported health data were taken from [20]. Residents were asked to rate their general state of health on a five-point scale: 'very good', 'good', 'fair', 'bad' or 'very bad'. In this study, subjective health is defined as the proportion of respondents who rated their personal health status as 'good' or 'very good'. Subjective health is therefore used as a proxy for health and wellbeing among residents within the local neighbourhood (LSOA). It should also be noted that self-reported 'good' or 'very good' health may not reflect the true clinical health status of individuals.
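As a minimal sketch of the subjective health metric defined above, the share of 'good' or 'very good' responses can be computed from the five response counts for one LSOA. The function name and the counts below are hypothetical and chosen only for illustration; they are not taken from the census data.

```python
def good_health_share(counts):
    """Share of respondents rating their health as 'very good' or 'good'."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no respondents in this area")
    return (counts["very good"] + counts["good"]) / total


# Hypothetical response counts for a single LSOA (illustrative numbers only).
example_lsoa = {
    "very good": 800,
    "good": 550,
    "fair": 180,
    "bad": 55,
    "very bad": 15,
}
share = good_health_share(example_lsoa)
print(f"Share in good or very good health: {share:.1%}")
```

Because the result is a proportion bounded between 0 and 1, it is the kind of response variable for which a beta-distributed model is a natural choice.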

The main independent variable considered is the median air infiltration rate of homes within LSOAs, expressed in air changes per hour (ach). Data relating to the energy efficiency characteristics of homes were extracted from the EPC database. The air infiltration rate at individual dwelling level is calculated based on the SAP methodology [21] and includes uncontrolled ventilation relating to:
• Chimneys, flues, fans and passive vents
• The number of storeys in the dwelling
• Wall construction type (steel, timber or masonry)
• Unsealed suspended floors or draught lobbies
• Unsealed windows and doors
Demographic and economic factors, employment rates and households without car access were extracted from the 2011 census [20]. Female fraction and the median age of each LSOA were derived from ONS data [20]. Owned outright housing describes the percentage of households where the owner-occupier owns a house outright. Furthermore, estimated median annual household income was provided by the GLA [23]. In England, the General Certificate of Secondary Education (GCSE) is the qualification that marks the completion of compulsory education [25]. The average GCSE point scores were taken from the Department for Education via neighbourhood statistics [20].
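The aggregation of dwelling-level air infiltration rates to an LSOA-level median, as described at the start of this subsection, might look roughly like the sketch below. The record layout (an LSOA code paired with an infiltration rate in ach), the function name and all values are assumptions made for illustration; the actual EPC field names and processing steps are not given in this paper.

```python
from collections import defaultdict
from statistics import median


def median_infiltration_by_lsoa(dwellings):
    """Group dwelling infiltration rates (ach) by LSOA and take the median."""
    grouped = defaultdict(list)
    for lsoa_code, ach in dwellings:
        if ach is not None:  # skip dwellings without an infiltration estimate
            grouped[lsoa_code].append(ach)
    return {code: median(values) for code, values in grouped.items()}


# Hypothetical dwelling records: (LSOA code, air infiltration rate in ach).
dwellings = [
    ("E01000001", 0.45),
    ("E01000001", 0.70),
    ("E01000001", 0.60),
    ("E01000002", 0.80),
    ("E01000002", None),
    ("E01000002", 0.50),
]
lsoa_medians = median_infiltration_by_lsoa(dwellings)
for code, value in sorted(lsoa_medians.items()):
    print(code, round(value, 2))
```

The median is less sensitive than the mean to a few very leaky or very airtight dwellings within an LSOA.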

Statistical analysis
Python and R were used for data processing, analysis and visualization, including calculating variance inflation factors, performing beta regression analysis, and calculating interquartile odds ratios and p-values [26,27].

Variance inflation factors
Variance inflation factors (VIFs) measure the amount of multicollinearity in a set of multiple regression variables [28]. In this study, VIFs were used to identify the existence and severity of multicollinearity between independent and confounding variables. Variables with VIF > 5 were subsequently excluded from the regression analysis.
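To illustrate the VIF screening step, the sketch below computes each VIF as the corresponding diagonal element of the inverse correlation matrix, which is equivalent to 1 / (1 - R²) from regressing that variable on the others. This is a standard-library illustration only, not the code used in the study; the covariate names and values are invented, and in practice a statistics library would normally be used instead.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        scaled.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in scaled]
            for ci in scaled]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    size = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Invented LSOA-level covariates; the third is nearly a copy of the first.
covariates = {
    "income": [30.0, 42.0, 35.0, 58.0, 47.0, 39.0],
    "employment": [61.0, 70.0, 64.0, 66.0, 75.0, 68.0],
    "income_proxy": [31.0, 41.5, 35.5, 57.0, 47.5, 38.0],
}
vifs = dict(zip(covariates, variance_inflation_factors(list(covariates.values()))))
flagged = [name for name, vif in vifs.items() if vif > 5]  # threshold from the text
print(vifs, flagged)
```

When two variables are flagged because they duplicate each other, a common approach is to remove one and recompute the VIFs rather than dropping both at once.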

Beta regression
Beta regression was used to examine the relationship between self-reported health and the independent variables (covariates). Proposed by Ferrari and Cribari-Neto, the beta regression model is designed for continuous variates whose values are assumed to lie in the standard unit interval. The method assumes that the dependent variable is beta-distributed, with its mean related linearly to the independent variables via coefficients and a link function [26]. Beta regression is therefore appropriate for this analysis, as the self-reported health metric is constrained to 0-100% (or 0-1). In R, the betareg package was used to fit the model by maximum likelihood with a logit link function [29]. The dependent variable was thereby mapped from the original 0-1 range to the range of real values, with a beta regression coefficient $\beta_k$ identified for each $k$th covariate.

Odds ratios
Odds ratios (ORs) have been used to quantify the strength of the association between covariates and self-reported health. ORs represent the odds for outcome A to occur given that exposure B exists, compared to the odds for outcome A to occur in the absence of exposure B [30]. The OR may be calculated for each $k$th covariate as:

$$\mathrm{OR}_k = e^{c\,\beta_k},$$

where $c$ represents one unit increase in the covariate [31]. In this analysis we calculate the interquartile OR, which is the ratio of the odds for each covariate at its 75th vs the 25th centile:

$$\mathrm{OR}_k^{\mathrm{IQ}} = e^{\beta_k\,(x_{75} - x_{25})}.$$

Here $\beta_k$ represents the fitted beta regression coefficient and $(x_{75} - x_{25})$ is the absolute difference between the 25th and 75th centile for each covariate. This helps compare associations between different covariates, since they all have different units, distributions and scales.

P-values
P-values are defined as the probability, under the null hypothesis, of observing a value of the test statistic as extreme as or more extreme than the value actually observed [32]. A small p-value (less than 0.05) means the observed outcome is possible but not very likely under the null hypothesis. A p-value of less than 0.05 is said to be statistically significant at the 5% level (95% confidence level).

Results
Table 2 summarises the characteristics of all variables in this study at Greater London LSOA level. On average, 84% of people within an LSOA reported 'good or very good' health. For London dwellings, the median air infiltration rate is around 0.6 ach.

Beta regression
Regression results, expressed as interquartile ORs for air infiltration and the other covariates with respect to self-reported health, are presented in Figure 2. Results indicate that air infiltration has a positive and statistically significant association with self-reported health. It had the 8th highest interquartile OR amongst the variables which had positive associations.
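The interquartile OR used for this ranking can be sketched as follows: for each covariate, the fitted coefficient is multiplied by the difference between the 75th and 25th centiles and then exponentiated, and covariates are sorted by the result. The coefficients and covariate values below are invented for illustration and are not the fitted values from this study; the quartile method (inclusive interpolation) is also an assumption.

```python
import math
from statistics import quantiles


def interquartile_odds_ratio(beta, values):
    """Odds ratio for moving a covariate from its 25th to its 75th centile."""
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    return math.exp(beta * (q75 - q25))


def rank_covariates(betas, observations):
    """Return (name, interquartile OR) pairs, largest OR first."""
    ratios = {
        name: interquartile_odds_ratio(beta, observations[name])
        for name, beta in betas.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Invented logit-scale coefficients and LSOA-level covariate values.
betas = {"air_infiltration_ach": 0.20, "median_age": 0.01}
observations = {
    "air_infiltration_ach": [0.40, 0.50, 0.60, 0.70, 0.80],
    "median_age": [28, 31, 34, 37, 40],
}
ranking = rank_covariates(betas, observations)
for name, ratio in ranking:
    print(f"{name}: interquartile OR = {ratio:.3f}")
```

In this made-up example the covariate with the smaller per-unit coefficient ends up with the larger interquartile OR because its interquartile range is much wider, which is why scaling by the interquartile range makes covariates with different units comparable.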

Discussion
This research aims to analyze the relationship between home air infiltration rate and self-reported health in Greater London. The results suggest that a clear link exists between air infiltration rate and self-reported health at population level. Among the statistically significant positive associations, air infiltration ranks below some of the key socio-demographic factors, such as household income, employment rate and age, in terms of its interquartile OR, but above the education variable (average GCSE point score) and all environmental variables, including distance to green and blue spaces, road and rail noise and the air quality index.
Increasing the air tightness of dwellings is widely accepted as a means of cutting down carbon emissions from the buildings sector, by reducing heat losses through bulk air movement. However, the statistically significant positive association between air infiltration and self-reported health observed in this study suggests higher air infiltration rates may positively contribute to subjective health. This could be due to higher uncontrolled ventilation allowing indoor moisture, pollutants, VOCs and odors to be diluted with fresh outdoor air [13][14][15][16].
The observed relationship between air infiltration rate and self-reported health is consistent with previous studies. Hamilton et al.'s modeling study revealed that compensatory ventilation is required in airtight dwellings to help reduce indoor pollutants, and hence improve health [12]. It is therefore critical to maintain appropriate levels of background ventilation whilst improving air tightness in order to promote occupant health and wellbeing.

Limitations
The study has several limitations. The first relates to the variables included in the analysis. This study uses secondary data for regression analysis, where collinearity between variables is an important issue. Some predictors were removed due to high collinearity with other variables. Although only variables with a VIF of less than 5 were included, some degree of collinearity between covariates still exists.
Another limitation concerns the estimation of air infiltration rates using EPC data based on SAP calculations. In addition to the large uncertainties in the SAP methodology [33], the air infiltration rate describes only unintended ventilation; intended ventilation (e.g. window opening and extract fans) is excluded. Variation in the collection dates of the data used is a third limiting factor. Air quality data were collected in 2008, whereas health service, green space and blue space locations were gathered in 2017. The data collection period spans roughly ten years, which inevitably leads to inconsistencies. Also, since the self-reported health survey was carried out in spring, seasonal variations in mood, which may affect subjective wellbeing, are not accounted for [34].
The fourth limitation relates to the studied area and generalisability. This research only considers homes within Greater London, which has a unique housing stock and climate. Findings may not be applicable to housing stocks in different climate zones that rely more on mechanical ventilation. Future research could examine whether these associations are applicable to other cities and climates.
A final limitation is that results are aggregated at LSOA level (1,000-3,000 people), which only allows inference at neighbourhood level. Data at individual level would be preferable, but it is much more difficult to match health records and personal data at this resolution.

Conclusion
This study investigated the relationship between dwelling infiltration rate and self-reported health at population (LSOA) level for Greater London. Whilst annual household income and housing tenure appear to be the main indicators of subjective wellbeing, results show a positive and statistically significant relationship between air infiltration and self-reported health. These findings imply the importance of appropriate compensatory ventilation in homes which have undergone air tightening. It is thus crucial for policy developers and engineers to consider indoor environmental impacts while retrofitting the existing building stock.