The variability of volatile organic compounds in the indoor air of clinical environments

The development of clinical breath-analysis is confounded by the variability of background volatile organic compounds (VOCs). Reliable interpretation of clinical breath-analysis at individual and cohort levels requires characterisation of clinical-VOC levels and exposures. Active-sampling with thermal-desorption/gas chromatography-mass spectrometry recorded and evaluated VOC concentrations in 245 samples of indoor air from three sites in a large National Health Service (NHS) provider trust in the UK over 27 months. Data deconvolution, alignment and clustering isolated 7344 features attributable to VOC and described the variability (composition and concentration) of respirable clinical VOC. 328 VOC were observed in more than 5% of the samples and 68 VOC appeared in more than 30% of samples. Common VOC were associated with exogenous and endogenous sources and 17 VOC were identified as seasonal differentiators. The presence of metabolites from the anaesthetic sevoflurane, and putative-disease biomarkers in room air, indicated that exhaled VOC were a source of background-pollution in clinical breath-testing activity. With the exception of solvents, and waxes associated with personal protective equipment (PPE), exhaled VOC concentrations above 3 µg m−3 are unlikely to arise from room air contamination, and in the absence of extensive survey-data, this level could be applied as a threshold for inclusion in studies, removing a potential environmental confounding-factor in developing breath-based diagnostics.


Introduction
Breathomics is being developed to stratify patient phenotypes and to monitor metabolic and disease mechanisms [1][2][3], and biomarker discovery with breathomics is being applied to conditions that include respiratory disease, cancer, infections and pulmonary illnesses [4][5][6]. Further, breath analysis for assessing occupational exposure to volatile organic compounds (VOC) is well established [7][8][9]. A challenge in the development of clinical breath-testing is accounting for the heterogeneity of patient responses to variable backgrounds of environmental VOC. Failure to adequately address this factor may confound breathomic biomarker discovery and breath-testing activity [10,11].
Variability in breath biochemistry derives from: environmental contaminants; genetics; diet and lifestyle; diurnal changes in metabolism; endocrine cycles; medication; emotional/psychological states; and disease progression, see figure 1. Environmental VOC exposure from inhalation, trans-dermal absorption or ingestion may result in elevated exhaled concentrations of VOC, and/or metabolic/catabolic products not originally in the environment [12]. Further, endogenous VOC and disease markers may also be obscured. Consequently, the VOC exposome generates a risk of false attribution, leading to breath-testing outcomes that are difficult to reproduce or translate into clinical practice.
Studies of VOC in hospitals, homes, and workplace settings over a prolonged period have shown that exhaled breath and environmental air often contain the same VOC. Common VOC have been observed in matched samples from ventilators, blood, breath and the room-air of clinics [13,14]. Widely reported VOC biomarkers for respiratory diseases, such as cyclohexanone for chronic obstructive pulmonary disease (COPD) [15], have also been detected in indoor air at concentrations (1.13 µg m−3) [16] close to those in exhaled breath (0.4-10 µg m−3) [17].
Spatial and temporal variability of VOC in clinical settings has been observed with acetone, ethanol and propanol concentrations found to vary significantly while other VOC did not. Within the same study exhaled concentrations of acetone, ethanol, acetic acid, ammonia, isoprene and hydrogen cyanide were found to be higher in the breath of ten clinical staff than in their surrounding environment, and propanol (a disinfectant) was at higher environmental concentrations [18].
The sources and patterns of VOC in the indoor air of dwellings have been studied extensively with 2246 samples monitoring a panel of 61 VOC, enabling seasonal effects to be attributed to 8 proposed VOC patterns and sources [16].
It is also helpful to note that the exogenous VOC toluene and benzene may also arise from degradation of the adsorbent Tenax TA (used in thermal desorption tubes), highlighting the point that analytical systems also create a trace VOC profile that may vary in response to changes in the environmental background (acidity, basicity, ozone and humidity), and that Tenax and multibed thermal desorption sampling tubes should not be assumed to be inert [19].
Exogenous VOC may affect breath marker discovery by: raising exhaled breath concentrations (false-positive); raising the concentration threshold for inclusion of an exhaled VOC as coming from endogenous origin, causing endogenous compounds to be excluded from a study (false-negative); giving rise to catabolite signals with endogenous features (false-positive); and/or acting as a contrast agent that leads to correct identification of a process, mechanism, or breath-biochemical derangement, but in a non-reproducible manner.
This study addressed the environmental VOC breathomics issue with room-air quality-control samples taken from three sites in a large NHS provider trust in the UK (University Hospitals of Leicester), obtained in support of a real-world prospective clinical study of 277 acutely breathless hospitalised patients admitted with one or more of these conditions: asthma; acute COPD; pneumonia; or heart failure. In addition, 55 healthy age-matched controls were also sampled. The aim was to characterise the variability of composition and concentration of clinical-VOC, and establish thresholds, or reference-levels, to inform occupational-exposure and breathomics workflows [20]. The resultant sample-set captures real-world operational clinical-VOC exposure across 27 months and reveals the variability and extent of exogenous and endogenous VOC present in clinical room air at trace levels.

Materials and methods
This study and sampling campaign was part of a larger prospective, real-world, observational study, carried out in a tertiary cardio-respiratory centre in Leicester, United Kingdom, and the study design, and methodology have been described in detail previously [20].
Two hundred and forty five room-air samples were collected, of which 225 samples were collected from Respiratory Medicine and Clinical Decision Units at Glenfield Hospital, Leicester, UK and 20 samples were collected from Paediatric Respiratory Medicine, Leicester Royal Infirmary Hospital, Leicester, UK, from November 2016 to February 2019. These samples were taken during normal clinical operations in the presence of patients participating in respiratory research, and research clinical staff. On occasions, other patients and staff were present, particularly if the samples were collected from a ward-bay.
VOC artefacts and contamination were reduced/eliminated from the materials and components used within this study with appropriate combinations of solvent cleaning, vacuum polishing, and temperature conditioning. All equipment was sealed in aluminium packaging ready for use before shipping to the clinic. Research and clinical staff involved in the study were trained to follow specifically designed sampling standard operating protocols, and were proficiency tested in their use [21]. Sampling protocol checklists were used to verify compliance with the standard operating protocols.
Indoor air was sampled (1000 cm3) using an Escort ELF pump (Part No. 497702, MSA), at a flow rate of 500 cm3 min−1 for 120 s onto a Tenax®/Carbotrap 1TD hydrophobic adsorbent tube (Part No. C2-AXXX-5032, Markes International Ltd, Llantrisant, UK, see figure S1 available online at stacks.iop.org/JBR/16/016005/mmedia). Samples were sealed and immediately stored at ca. 4 °C, before shipping to Loughborough Centre for Analytical Science within approximately three days. Room air samples were dry-purged as soon as possible upon receipt with 120 cm3 of purified nitrogen at a flow rate of 60 cm3 min−1. Toluene-D8 (69 pg) and trichloromethane-d (280 pg) internal standards were loaded during the dry purge onto the sample tube with a six-port valve attached to a permeation tube-based test atmosphere generator (constructed in-house). Dry-purged and internal-standard spiked samples were then sealed and stored at 4 °C prior to analysis.
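The sampling arithmetic above, and the way a peak might be expressed as an internal-standard equivalent concentration, can be sketched in Python. The function names are hypothetical, and the equivalent-concentration helper assumes a unit relative response between an analyte and the trichloromethane-d standard, which is an assumption made here for illustration rather than a stated part of the method.

```python
# Sketch only: unit conversions for the sampling step described above.
# Function names are hypothetical. The equivalent-concentration helper
# assumes a unit relative response between analyte and internal standard.

def sampled_volume_cm3(flow_cm3_per_min: float, duration_s: float) -> float:
    """Volume of air drawn through the adsorbent tube."""
    return flow_cm3_per_min * duration_s / 60.0


def is_equivalent_ng_per_m3(area_analyte: float, area_is: float,
                            is_mass_pg: float, volume_cm3: float) -> float:
    """Concentration expressed as an internal-standard equivalent (ng m^-3)."""
    mass_pg = (area_analyte / area_is) * is_mass_pg  # pg of IS-equivalent
    volume_m3 = volume_cm3 * 1e-6                    # 1 m^3 = 1e6 cm^3
    return (mass_pg / 1000.0) / volume_m3            # pg -> ng, per m^3


volume = sampled_volume_cm3(500.0, 120.0)   # 500 cm^3 min^-1 for 120 s
purge_minutes = 120.0 / 60.0                # 120 cm^3 at 60 cm^3 min^-1
# A peak with the same area as the 280 pg trichloromethane-d standard:
conc = is_equivalent_ng_per_m3(1.0, 1.0, 280.0, volume)
```

Under these assumptions a peak equal in area to the 280 pg standard corresponds to roughly 280 ng m−3 in a 1000 cm3 sample.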

TD-GC-MS operating conditions
Samples were analysed by thermal-desorption/gas-chromatography/mass-spectrometry (TD/GC/qMS). A Unity-2 thermal desorption unit (Markes International, Cardiff, UK) was interfaced to a GC (Agilent, 7890 A) coupled to a quadrupole mass spectrometer (Agilent MS 5977 A). The VOC collected during sampling were recovered and concentrated into a hydrophobic cold trap yielding a 10 000-fold enrichment. Separation by gas chromatography used a low-bleed cross-bond diphenyl dimethyl polysiloxane stationary phase (Rtx-5 MS Cap. Column 60 m; 0.25 mm, ID; 0.25 µm). The instrumentation parameters are summarised in table S1.

Statistical process control
The approach to quality control has been described previously [22]. Instrumentation performance was continuously monitored by analysing 0.2 µl of a reference mixture containing 20 standards (table S2) daily before analysis (every 4 samples). Instrument performance was evaluated by monitoring the Z-scores of retention time, peak area, height, width, and symmetry for the 20 standards in the reference mixture. Analysis was undertaken when instruments were within Z = ±3 for more than 80% of the 100 quality control parameters [22,23]. The two internal standards were also monitored to track the combined stability of the TD-GC-MS analysis and dry purging system; see figure S2 for comparative examples of data. Instrument condition was verified by running system blank desorption profiles before and after every analysis. The blank desorption profiles were also assessed statistically [22], figure S3. These procedures simultaneously screened for possible adsorbent tube conditioning artefacts, adsorbent degradation artefacts, cold-trap and transfer line artefacts, and chromatographic integrity. Further, all adsorbent tubes were weighed on a five-figure balance after conditioning and discarded if any significant weight change was observed (indicative of loss of adsorbent, ingress of contaminant or environmental adducts). A formalised data quality and quality control monitoring clinical bioinformatic system was developed to capture all measurement meta-data in a consistent and traceable manner [24].
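The acceptance rule described above (more than 80% of the 100 quality control parameters within Z = ±3) can be illustrated with a minimal Python sketch. The data layout and function names are hypothetical, and the reference statistics are assumed here to come from a history of earlier reference-mixture runs; the procedure actually used is the one described in [22].

```python
from statistics import mean, stdev

# Hypothetical layout: history[name] holds past values of one QC parameter
# (for example the retention time of one standard); today[name] holds the
# value from the current reference-mixture run.

def z_score(value, past):
    """Z-score of a new value against the historical mean and std deviation."""
    return (value - mean(past)) / stdev(past)


def instrument_in_control(today, history, z_limit=3.0, min_fraction=0.80):
    """True when more than min_fraction of parameters lie within +/- z_limit."""
    within = sum(
        1 for name, value in today.items()
        if abs(z_score(value, history[name])) <= z_limit
    )
    return within / len(today) > min_fraction


# Two illustrative parameters only; the study monitored 100.
history = {
    "toluene_rt_min": [10.0, 10.1, 9.9, 10.0],
    "toluene_area": [5.0, 5.2, 4.8, 5.0],
}
ok = instrument_in_control({"toluene_rt_min": 10.05, "toluene_area": 5.1}, history)
```

In practice each of the 20 standards would contribute five entries (retention time, peak area, height, width and symmetry) to both dictionaries.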

Data processing
The TD-GC-MS data were deconvolved and an average of 120 VOC features per sample (AnalyzerPro Spectral Works, UK) were extracted. The deconvolution method was optimised to minimise over-deconvolution (AnalyzerPro software method parameters were: minimum peak area value = 200, S/N = 3, width of peak = 0.01 and smoothing factor = 3). The extracted features were aligned using linear retention indexing (AnalyzerPro Spectral Works, UK) [25], and clustered using the VOCCluster algorithm [26] that assigned a unique identifier in the form of (ERI-m/z1-m/z2-m/z3-m/z4-m/z5) to each VOC isolated and grouped from the 245 samples; ERI indicated the linear retention index for the VOC environmental-feature and m/z1 … m/zn were the nominal masses of the compound's ion fragments, in decreasing order of abundance, needed to uniquely define the deconvolved VOC features within the data-cube. The resultant sorted and grouped features were consolidated into an environmental-VOC data matrix that contained the extracted peak areas for each of the features isolated from each room-air sample.
Ubiquitous siloxanes arising from analytical artefacts were also removed from the Environmental-VOC-Matrix, and seven samples were discarded as they did not meet the quality control measures.
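The identifier format described above can be illustrated with a short Python sketch. This is not the VOCCluster implementation [26]; the function name, the rounding of the retention index and the tie-breaking rule are assumptions made for illustration, and the spectrum values are invented.

```python
def feature_id(retention_index: float, spectrum: dict, n_ions: int = 5) -> str:
    """Build an 'ERI-m/z1-...-m/z5' style label for one deconvolved feature.

    spectrum maps nominal m/z (int) to abundance (float).
    """
    # Rank fragments by decreasing abundance; ties broken by lower m/z.
    ranked = sorted(spectrum.items(), key=lambda item: (-item[1], item[0]))
    top_masses = [str(mz) for mz, _ in ranked[:n_ions]]
    return "-".join([str(round(retention_index))] + top_masses)


# Illustrative values only, not taken from the study data.
example_spectrum = {91: 100.0, 92: 60.0, 65: 12.0, 39: 10.0, 63: 8.0, 51: 6.0}
label = feature_id(763.4, example_spectrum)
```

A label of this kind lets features with matching retention index and dominant fragments be grouped across samples before the peak areas are placed in the data matrix.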

Multivariate analysis for seasonality effects
Once compounds that occurred in less than 30% of samples had been excluded, the matrix features were log10 transformed and Pareto scaled before multivariate analysis was used to determine if there were any seasonal effects in the study [27,28]. The data were classified into two groups: September-February, n = 99 samples; and March-August, n = 139 samples. Orthogonal partial least squares-discriminant analysis was initially used to identify seasonally invariant VOC, and these were removed from the analysis leaving 44 compounds for unsupervised principal components analysis. This approach followed a multivariate statistical processing workflow using SIMCA-P+ software with integrated 7-fold cross-validation to protect against overfitting (Version 16.1, Umetrics, UK) [27]. Compounds were putatively assigned a level-2 identification in accordance with the Metabolomics Standards Initiative [29], based on retention-index and NIST mass-spectral library matches.
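The pre-processing applied before multivariate analysis can be sketched as follows. Pareto scaling is taken here in its usual sense (mean-centring followed by division by the square root of the standard deviation). The function name is hypothetical, and the sketch assumes strictly positive peak areas; how non-detects were handled before the log transform is not stated in the text.

```python
import math
from statistics import mean, stdev

def log10_pareto(column):
    """log10-transform one feature column, then Pareto scale it.

    Pareto scaling: mean-centre, then divide by the square root of the
    sample standard deviation. Assumes strictly positive peak areas and
    at least two distinct values.
    """
    logged = [math.log10(x) for x in column]
    centre = mean(logged)
    scale = math.sqrt(stdev(logged))
    return [(v - centre) / scale for v in logged]


# Peak areas for one VOC across three samples (illustrative values).
scaled = log10_pareto([10.0, 100.0, 1000.0])
```

Compared with unit-variance scaling, Pareto scaling shrinks large features less aggressively, so abundant compounds retain somewhat more weight in the model.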

Results
Two hundred and forty five clinical room air samples yielded 7344 VOC features, with 328 compounds that occurred more than 12 times (5%) that were selected for further assessment, see figure 2. Sixty eight compounds appeared in more than 30% of the total samples. Thirty nine of these have been assigned a putative identity (class 2 identification level [29]) and a further 11 compounds were classified as hydrocarbons (Class 3 identification level [29]), see table 1. These data were used for multivariate analysis. (The identities of 18 of these compounds have yet to be elucidated.)

Seasonality
Seasonal variations in concentration were identified from multivariate analysis, with 17 compounds found to be present at higher concentrations in samples obtained between September and February, figure S4 and table S3. For 12 of the 17 compounds the differences in concentration (normalised to the internal standard) were statistically significant.
Cyclohexanone (1.5-fold increase, t-statistic = 1.79, one-tailed critical t-value = 1.66, p = 0.038 with 62 degrees of freedom) has been previously reported as a marker for COPD (sensitivity = 60% and specificity = 91%) [30]. Benzaldehyde (2.2-fold increase, t-statistic = 2.27, one-tailed critical t-value = 1.66, p = 0.012 with 100 degrees of freedom) is involved in fatty-acid and tryptophan metabolism as well as glycolysis/gluconeogenesis [31]. Exhaled benzaldehyde has also been reported: in breath samples from participants with severe pulmonary arterial hypertension, as well as healthy individuals [32]; possibly generated from bacterial degradation of common amino acids such as phenylalanine, tryptophan or tyrosine [32]; lung cancer cell lines (n = 6) discriminating from healthy control cell lines (n = 1) [33]; and has also been proposed to originate from exposure to tobacco smoke, radiation or air pollution with peroxidative properties capable of damaging enzymes and DNA [34]. Phenol (2.1-fold increase, t-statistic = 1.84, one-tailed critical t-value = 1.66, p = 0.034 with 95 degrees of freedom) is associated with petroleum products as well as tobacco smoke [35]. Phenol is also associated with oesophageal or gastric adenocarcinoma, and has been observed to be significantly higher in cancer patients (n = 81) compared to healthy individuals (n = 129) (P < 0.05) [36], figure 3.
Ethanol (2-fold increase, t-statistic = 2.63, one-tailed critical t-value = 1.66, p = 0.0048 with 113 degrees of freedom) and ethanal (2-fold increase but not statistically significant) may be attributed to ethanol consumption and metabolism, with the observed increases in abundance due to higher seasonal alcohol consumption. (Note that the ethanal mass spectrum did not fall completely in the mass spectrometric scan range and verification with derivatisation was not undertaken.) Cleaning products/disinfectants such as 2-propanol (2-fold increase, t-statistic = 2.18, one-tailed critical t-value = 1.66, p = 0.016 with 85 degrees of freedom) were also higher in winter.
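The one-tailed comparisons reported above follow the usual pattern of computing a t-statistic and comparing it with a critical value. The text does not state which form of the t-test was used, so the Python sketch below assumes a pooled-variance (Student's) two-sample test; the function names and the example data are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Student's two-sample t-statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


def higher_in_a(group_a, group_b, critical_t):
    """One-tailed decision: is the mean of group_a greater than that of group_b?"""
    t_stat, _ = pooled_t(group_a, group_b)
    return t_stat > critical_t


# Illustrative normalised intensities, not study data.
winter = [2.0, 3.0, 4.0]
summer = [1.0, 2.0, 3.0]
t_stat, df = pooled_t(winter, summer)
```

With the reported values, cyclohexanone (t = 1.79) and ethanol (t = 2.63) both exceed the quoted one-tailed critical value of 1.66, which is the comparison `higher_in_a` encodes.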
Variation between different clinical settings was discernible with 3 compounds unique to the paediatric setting that were related to the paediatric anaesthetic sevoflurane [37,38]. Sevoflurane and its two metabolites 1,1,1,3,3,3-hexafluoro-2-propanone, and 1,1,1,3,3,3-hexafluoro-2-propanol were putatively identified based on NIST-mass-spectral library matches, see figure S5. Sevoflurane and its metabolite 1,1,1,3,3,3-hexafluoro-2-propanol have been measured previously in human breath (n = 6), and used to build a three-compartment pharmacokinetics model to study environmental contaminants and breath data [39]. The presence of these VOC may well have arisen from a ventilation circuit shared with a surgical theatre on a different floor of the building.
Figure caption (partial). Values with their intensity normalised to 280 pg of C(2H)Cl3. P represents the data from the children's clinic. The rest of the data were obtained from the adults' wards with seasons of each year (spring, summer, autumn and winter) indicated by the grey shaded brackets found underneath each year. Note that 7016 VOC isolated from less than 5% of samples are not included.
As well as cyclohexanone and benzaldehyde other disease markers were found in the room air of clinics and figure 4 also includes the distributions of the inflammatory biomarkers nonanal, hexanal and decanal across the sampling campaign.
Another commonly reported breath biomarker, octanal, was also present, and provides a useful case-study into the variability of the concentrations encountered and the frequency of occurrence. Octanal was isolated from 101 room air samples, a frequency of occurrence of 42%. The highest concentration observed was estimated to be 96 ng m−3 (expressed as a C(2H)Cl3 equivalent), with a minimum observed concentration of 1.04 ng m−3 and a median observed octanal concentration of 24.32 ng m−3. Tests for normality and log-normality (Shapiro-Wilk) indicated a non-normal distribution and assessment of the time series data did not reveal any seasonal pattern, see figure 5. Widening such an assessment to all of the 328 most frequently observed VOC revealed that most were present at concentrations below a threshold of 3 µg m−3 (expressed as a C(2H)Cl3 equivalent) with 14 compounds present at higher concentrations in the range 10-100 µg m−3, see figures 6 and S6. The compounds present at the highest concentrations appear to be associated with solvent and disinfection formulations (2-propanol for example), or higher molecular weight waxes associated with the use of nitrile protective gloves. The more volatile the contaminant the more frequently it was observed.

Discussion
This study describes the complexity of the contamination profiles and concentration distributions of VOC in the indoor air of hospitals and highlights seasonal and clinical variations. Ninety five percent of the compounds observed occurred in less than 5% of samples, creating a highly variable and non-reproducible VOC profile. Such variability may be attributed to the constantly changing demographic of the occupants of a busy clinical facility, combined with the range of therapies and therapeutics being administered. A less variable constituency of the background contained a mixture of ubiquitous exogenous VOC. Further, evidence was noted that exhaled VOC may also be considered as a VOC source in a clinical environment, and that seasonal factors were also present.

Exhaled VOC
Changes in the VOC composition of room air from exhaled breath have been described with on-line proton transfer mass spectrometric studies reporting the presence of low molecular weight exhaled VOC, with estimated exemplar office air concentrations for acetone, ethanol, and isoprene of 52, 32, and 13 µg m−3 respectively [40], while studies with cinema audiences have monitored, and coded, VOC levels in room air to the emotional states of the scenes being viewed [41]. Other VOC associated with skin volatiles and the gastric tract were also reported as constituents of the cinema auditorium's air. At a significantly larger scale, changes in VOC levels in stadium air have been reported at sports events with isoprene, acetone and ethanol increasing to 8.5, 9.7 and 580 µg m−3 respectively during a football match [42]. Such studies suggested that the observation of exhaled volatiles in clinical room air was to be anticipated, and that the 10 000-fold sample enrichment obtained through two stage thermal desorption, combined with temperature programming, would enable concentrations lower than 1 µg m−3 to be monitored. Exhaled VOC at trace levels were found in the room air of the clinics studied, most notably the two metabolites of sevoflurane, for it is difficult to conceive of an alternative source. The aldehydes in figure 4, noted as oxidative stress biomarkers associated with respiratory disease, have also been associated with aging and outgassing from linoleum flooring [16], and decanal is also an oxidative product of skin lipids. It seems plausible to propose that the observed airborne concentrations of aldehydes were derived from a combination of constant background emissions due to outgassing from flooring and building materials, overlaid with concentration transients from exhaled breath combined with changes in room occupancy and associated ventilation.
Table 1. Fifty VOC that have been putatively identified that appeared in more than 30% of the samples. Columns: VOC no.; ERI code generated by VOCCluster [26]; putative compound identification.

Seasonal VOC
Previous studies on VOC domestic air contaminants [16] have considered seasonal factors in indoor contamination profiles and levels. Factors such as seasonal changes in ventilation and indoor-based activities were identified. Further, the effect of solar-radiation on building materials and outgassing was highlighted. The VOC in table S2 that differentiate between the summer and winter months may be attributed to sources that include: personal care, cleaning and disinfection formulations (propan-2-ol, (1R,2S,5R)-2-isopropyl-5-methylcyclohexanol [dimenthol], (6S)-2,6-dimethyl-7-octen-2-ol and 2-butoxyethanol); outgassing from plasticiser/polymer components (2-ethyl-1-hexanol and 2,5-cyclohexadiene-1,4-dione-2,5-diphenyl, phenol, 1-methylethylbenzene [isopropylbenzene] and cyclohexanone); pollutants associated with fugitive emissions from vehicles and their exhausts (toluene, ethylbenzene, benzene, 1,4-dimethylbenzene and 1,2-dimethylbenzene, 1-methylethylbenzene); and exhaled volatiles (cyclohexanone, ethanol and ethanal). Such seasonal changes are consistent with changes in ventilation, noted previously [16], and increased usage of cleaning products associated with higher bed occupancy rates during the winter season. Increased concentrations of …
Breath-biochemistry research and biomarker discovery and monitoring studies manage room air contamination compounds in different ways. Some studies invoke an 'alveolar gradient' concept [43] and others remove compounds from the data-processing workflow where the environmental background is greater than or equal to 5% of the exhaled concentration. An alveolar gradient approach is problematic because: the data presented in this paper indicate that important disease markers are routinely also present in room air; and in a clinical environment a participant's exposure to a specific VOC is unknown, as is the subsequent rate of the compound's elimination prior to the breath sample being taken.
Another approach that excludes breath data if it is detected in the environment [44] is also unsatisfactory for the first reason given above.
Alveolar gradient and environmental exclusion approaches are unlikely to resolve the confounding factor of variable environmental backgrounds, participant exposures, and participants' catabolisms. Further, such approaches also create the possibility of uncontrolled, and variable, site and time dependent factors determining whether or not participant data is admitted into a study. Figure 5 shows a range of inclusion concentrations (20 times the environmental background that might be applied over the 245 breath data sets acquired) for octanal with a range from ca. 20 ng m−3 to 2 µg m−3 (expressed as a C(2H)Cl3 equivalent). Adopting a concentration gradient approach means that the inclusion criteria for breath data vary between participants in an apparently arbitrary manner with little or no evidence to relate the study's threshold concentration (for that particular sample) to exposure or washout.
With the exception of a few solvents (formulation compounds) and some mould release agents, associated with disposable PPE, most volatiles, when present, were below a threshold, see figure 6. In the case of octanal this was 82 ng m−3 (expressed as a C(2H)Cl3 equivalent), and an exhaled breath concentration above this level was highly unlikely to be due to environmental exposure. A study of the 328 common pollutants indicates that a default level of 3 µg m−3 (expressed as a C(2H)Cl3 equivalent) is a threshold concentration above which most clinical room air VOC do not rise. In the absence of reliable site-specific environmental survey data, applying such a threshold is a more reliable, reproducible and systematic approach than adopting the variable levels acquired by spot sampling. Infrequent VOC contamination transients would be identified as outliers and rare events in the exhaled breath profiles and be excluded from the data pipeline in the event they occurred.
Figure caption (partial). The figure's whiskers indicate the 20-fold threshold widely applied for the inclusion/exclusion threshold for admitting a VOC into a breath biomarker discovery and validation data set. The non-reproducibility and variability of such an approach may be discerned.
Figure 6. The minimum, median (red diamonds) and upper limit of observed concentrations across the 328 most frequently observed (>5%) VOC over the duration of the study. The octanal entry is denoted by the solid red circle. The concentrations of these compounds spanned a range of 2-3 orders of magnitude and were typically below 3 µg m−3 (expressed as a C(2H)Cl3 equivalent) with the exception of volatile solvents (RI less than 690) and heavier waxes associated with PPE production (RI greater than 2180).
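The contrast between the two inclusion rules discussed above can be made concrete with a short Python sketch. The function and constant names are hypothetical; the 20-fold factor corresponds to an environmental background of 5% of the exhaled concentration, and the octanal values are those reported in the Results.

```python
FIXED_THRESHOLD_NG_M3 = 3000.0  # 3 ug m^-3 default level proposed in the text
GRADIENT_FACTOR = 20.0          # background equal to 5% of exhaled concentration

def gradient_threshold(background_ng_m3: float) -> float:
    """Minimum exhaled concentration admitted under the 20-fold rule."""
    return GRADIENT_FACTOR * background_ng_m3


def admit_fixed(exhaled_ng_m3: float,
                threshold: float = FIXED_THRESHOLD_NG_M3) -> bool:
    """Admit a breath VOC only if it exceeds the fixed room-air threshold."""
    return exhaled_ng_m3 > threshold


# Octanal room-air concentrations reported in the Results (ng m^-3).
octanal_background = {"min": 1.04, "median": 24.32, "max": 96.0}
thresholds = {label: gradient_threshold(level)
              for label, level in octanal_background.items()}
for label, value in thresholds.items():
    print(label, round(value, 1))
```

The 20-fold rule gives inclusion levels for octanal that run from about 21 ng m−3 to about 1.9 µg m−3 depending on which room-air sample is paired with the breath sample, which is the ca. 20 ng m−3 to 2 µg m−3 spread noted above. A fixed threshold applies the same criterion to every participant.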
Seasonal factors mean that studies and clinical breath testing need to account for possible bias. More robust discovery designs will randomise recruitment of participants over the seasons of the year and avoid overweighting those months with higher occupancy and higher clinical VOC levels. Such attention to scheduling might also be usefully applied to sampling sessions in clinic to identify other possible synchronisation of environmental exposure with sampling activities; cleaning, cooking and medication for example.
VOC that are frequently present within a facility at elevated concentrations may introduce a 'contrast agent' into breath studies, in that differentiation between classes of participants may be based on differences in how the contaminant is metabolised, arising from catabolic changes due to disease or treatment. Such a situation may result in fortuitous discovery, or non-reproducible results. It will be helpful to note and report background contaminants routinely detected at elevated concentration (i.e. above 3 µg m−3) and whether they may be modelled as part of breath data or not.
Finally, adopting a concentration threshold approach does not remove the necessity for obtaining reliable environmental background data as part of breath testing operations; such samples and analysis should still be an integral part of a quality assurance and control protocol.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.