Screening for volatile biomarkers of colorectal cancer by analyzing breath and fecal samples using thermal desorption combined with GC-MS (TD-GC-MS)

Breath and fecal VOCs, among others, represent a new and encouraging approach to the differential diagnosis of CRC. The purpose of our research was to identify VOCs present in the exhaled air and feces of 20 HVs and 15 CRC patients. Emission microchambers were used to collect the gas phase released from feces. Sorption tubes were used to enrich analytes from both breath and fecal samples. TD combined with GC-MS was used for separation and identification. A combination of statistical methods was used to evaluate the ability of VOCs to distinguish the control group from CRC patients. Heptanoic acid, acetone, 2,6,10-trimethyldodecane, n-hexane, skatole, and dimethyl trisulfide were observed in elevated amounts in the patient group. The performance of the diagnostic models on the tested data set was above 90%. This study is the first attempt to document the use of TD-GC-MS to analyze both breath and fecal samples in the search for volatile biomarkers of CRC. A full evaluation of the results described herein requires further studies involving a larger number of samples. Moreover, it is particularly important to understand the metabolic pathways of substances postulated as tumor biomarkers.


Introduction
CRC is the third most commonly diagnosed cancer worldwide, and in Europe it is the second most common cause of cancer deaths [1,2]. With the emergence of metabolomics methods, metabolite profiles associated with CRC are being investigated as a screening tool for early diagnosis [3,4]. Typical methods used to diagnose CRC include: (1) per rectum examination, (2) contrast infusion, (3) colonoscopy/rectoscopy, (4) virtual colonoscopy, (5) CEA blood test, (6) FOB test, and (7) fecal calprotectin test [5][6][7]. Colonoscopy provides the highest sensitivity and specificity for the diagnosis of CRC. During the examination, biological material can be collected with biopsy forceps and then evaluated histopathologically. It is also possible to remove small polyps [7]. However, colonoscopy is an invasive diagnostic method. All metabolic processes in a living organism result in the production of endogenous VOCs [8][9][10]. Cancer cells show altered metabolism, often involving increased glucose and glutamine consumption, increased glycolysis, changes in the use of metabolic enzyme isoforms, and increased secretion of lactate [11,12]. At present, the origin of many of the VOCs observed in various cancers has not been established precisely. Exploring the metabolic pathways that result in the generation or elimination of these VOCs will provide better insight into the biochemical transformations that occur in tumors. For example, one of the key sources of aldehydes is their production as secondary oxidation products. The production of C3-C10 aldehydes is associated with the reduction of hydroperoxides by cytochrome P450 (CYP450) during the lipid oxidation of omega-3 and omega-6 PUFAs, such as linoleic acid and arachidonic acid. However, aldehydes can also come from dietary sources, metabolized alcohols and smoking [13].
The formation of ketones is similarly closely associated with a higher fatty acid oxidation rate, which has been observed in several cancers. The substrate of ketogenesis, acetyl-coenzyme A (acetyl-CoA), is formed as the main product of β-oxidation of long-chain fatty acids in the mitochondria. In turn, β-oxidation of branched fatty acids leads to the formation of heavier ketones. Nevertheless, it should be remembered that ketones in biological matrices can also come from exogenous sources, such as food and air pollution [14,15]. Alkanes, on the other hand, arise mainly during the peroxidation of the PUFA lipids that make up biological membranes, resulting in phospholipid degradation and ultimately cell destruction [16]. All metabolic abnormalities in cancer cells result in a modified profile of volatile chemicals that can be excreted via breath or feces, among other routes [13,17]. Furthermore, changes in signal transduction, gene regulation and cellular proliferation can be tracked through the VOC profile of cells. In addition, the emission of CRC cell metabolites into the colon can result in major changes in the microbiome [18]. VOC profiles in CRC may also result from inflammation-induced dysbiosis. Both factors translate into changes in the profile of volatiles produced by intestinal bacteria. On the other hand, there is evidence that changes in the composition and homeostasis of the microbiome may contribute to CRC [9]. The importance of these individual factors and their interactions illustrates the complexity of the biology of VOCs emitted into the gastrointestinal tract. Either way, changes in the overall VOC profile, or individual VOCs produced at much higher or lower levels than normal, can serve as biomarkers for cancer detection.
The presence of VOCs in biological samples at trace levels makes breath and fecal analysis challenging. For this reason, sampling and preconcentration techniques such as SPME [19][20][21] and TD [22,23] are needed at the sample preparation stage. Following sampling of VOCs, separation, determination and identification of analytes are most often carried out by GC-MS [21,23,24], SIFT-MS [25][26][27][28] or PTR-MS [29][30][31]. The electronic nose is currently an emerging method for VOC detection [32][33][34]. Nevertheless, chromatographic techniques are still preferred for VOC analysis because they provide detailed information about the composition of the samples tested.
In this paper we present a method for the noninvasive search for volatile CRC biomarkers in breath and fecal samples. The basis of the study was an untargeted analysis of breath and fecal samples from patients diagnosed with CRC using conventional methods. For the breath sampling step, we used Tedlar® bags and then transferred the samples with a pump to sorption tubes. For fecal samples, we used emission microchambers to release and collect the volatile fraction. The release of volatiles from the sorption tubes was performed using a thermal desorber directly connected to the GC-MS system, which provided separation and identification. Statistical analysis, including the non-parametric Mann-Whitney U test with FDR correction for multiple comparisons, DFA and FA, made it possible to determine a few characteristic chemical patterns for the groups of patients with a cancer diagnosis and healthy subjects. The potential biomarkers selected in this way were used to create an efficient predictive model based on ANNs.

Materials and chemicals
1-Bromo-4-fluorobenzene as an analytical standard, methanol with purity ⩾99.99%, and Tedlar® bags for breath sampling were purchased from Sigma Aldrich (Steinheim, Germany). Sterile containers for fecal sampling were bought from a local pharmacy. Disposable aluminum dishes (20 ml) to be placed in the emission microchambers were made in-house. VOCs were trapped in steel tubes filled with a Tenax TA/Carbograph 5 TD (Bio Monitoring) sorption bed manufactured by Markes International, Bridgend, UK. Nitrogen, helium and argon (purity of 99.999%) were purchased from Air Liquide, Poland.
All statistical operations were carried out using IBM SPSS Statistics, package version 28. Where appropriate, Microsoft PowerPoint 2019 was used to mark and combine figures.

Instrumentation
To liberate VOCs from fecal samples, an emission chamber system was used: the Micro-Chamber/Thermal Extractor™ µ-CTE™ 120 (Markes International, Bridgend, UK). Detailed information about the design and metrological characteristics of the µ-CTE™ 120 system has been presented previously [35]. VOC collection was carried out at a nitrogen flow rate of 50 ml min⁻¹. The operating temperature of the emission chambers was 31 °C, and the sampling time was 20 min.
A two-stage thermal desorber (Markes TD-100; Markes International, Bridgend, UK), used for sample injection, was connected directly to the chromatographic column via a heated transfer line equipped with a deactivated capillary (2 m × 0.25 mm). TD conditions were as follows: desorption of analytes from the sorption tube at 280 °C for 5 min at a flow rate of 50 ml min⁻¹; desorption from the cold trap at 300 °C for 3 min at a flow rate of 2 ml min⁻¹; trap heating rate of 100 °C min⁻¹; and transfer line temperature of 200 °C.
The analysis was carried out using an Agilent 7820A gas chromatograph coupled to an Agilent 5977B MSD mass spectrometer (Agilent Technologies, USA). The gas chromatograph was equipped with a DB-5 MS capillary column (30 m × 0.25 mm × 1 µm). Helium was used as the carrier gas in constant flow mode, with a flow rate of 0.5 ml min⁻¹. The column temperature was programmed as follows: an initial temperature of 40 °C was held for 5 min, then the temperature was increased at 10 °C min⁻¹ to 310 °C and held at this value for 4 min. The mass spectrometer was operated in EI mode (70 eV). The temperature of the ion source was set to 230 °C and that of the transfer line to 250 °C; the sampling rate was set to 2.9 scans s⁻¹, and the mass range was m/z 50-550.
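As a quick consistency check on the oven program described above, the total chromatographic run time follows from the two hold times and the single ramp. The sketch below is illustrative only; the function name is hypothetical and not part of the original method.

```python
def oven_run_time_min(initial_c, initial_hold_min, ramp_c_per_min,
                      final_c, final_hold_min):
    """Total run time (min) of a single-ramp GC oven program."""
    ramp_time = (final_c - initial_c) / ramp_c_per_min
    return initial_hold_min + ramp_time + final_hold_min

# 40 °C held 5 min, 10 °C/min up to 310 °C, held 4 min
run_time = oven_run_time_min(40, 5, 10, 310, 4)
print(run_time)  # 36.0
```

Under these settings each chromatographic run takes 36 min, excluding desorption and oven cool-down.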

Sample collection
Prior to the study, approval was obtained from the Ethical Committee of the Nicolaus Copernicus University in Torun. Patients were recruited from the Department of General, Gastroenterological and Oncological Surgery at the Regional Hospital in Torun (Decision Number KB 49/2018). A summary of the data on the study subjects is shown in table 1.
To ensure the anonymity of the study and protect personal data, each patient was assigned a unique code. Healthy persons were recruited through a program carried out as part of the routine diagnostics of subsequent patients with suspected colon/rectal pathology or other non-specific abdominal ailments requiring endoscopy. Persons in whom colonoscopy revealed no pathology of the large intestine mucosa, either in the endoscopic image or in any biopsy taken, were selected to participate in the study as the control group, i.e. HVs (bowel pathology free). All subjects were fasting. Samples from CRC patients were taken after diagnosis, following colonoscopy. All samples were collected at the hospital. Patients were qualified for the program as part of routine diagnostics. Breath samples were collected from test subjects into clean Tedlar® bags, according to earlier studies [36,37], and processed on the same day to reduce potential analyte losses. Stool samples were collected after defecation in sterile disposable containers dedicated to fecal analyses. Each time, the patient, alone or with the help of staff, collected a stool sample of about 3-5 ml with a sterile spatula. The samples were then placed in a sterile container, labeled, frozen at −20 °C, and transported to the laboratory in a frozen state, covered with cooling packs. Afterwards, samples were kept at −80 °C until analysis.
Breath samples were transferred from Tedlar® bags into sorption tubes containing a Tenax TA/Carbograph 5 TD bed (Bio Monitoring) using a pump at a flow rate of 150 ml min⁻¹ for 5 min. After the procedure was completed, the sorption tubes were immediately sealed to reduce the influence of laboratory air.
For the fecal study, a previously optimized protocol was used [35]. Briefly, 250 mg of sample was weighed each time on aluminum dishes. The samples were then placed in emission microchambers (Markes International, Bridgend, UK), at the outlet of which sorption tubes with Tenax TA/Carbograph 5 TD bed (Bio Monitoring) were mounted to collect the volatile fraction of feces.
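For orientation, the gas volume passed through each sorption tube follows from the flow rates and times stated above (breath transfer at 150 ml min⁻¹ for 5 min; microchamber purge at 50 ml min⁻¹ for 20 min). This is a minimal sketch that assumes constant flow; the helper name is hypothetical.

```python
def sampled_volume_ml(flow_ml_per_min, duration_min):
    """Gas volume drawn through a sorption tube, assuming constant flow."""
    return flow_ml_per_min * duration_min

breath_volume = sampled_volume_ml(150, 5)   # transfer from the Tedlar bag
fecal_volume = sampled_volume_ml(50, 20)    # nitrogen purge of the microchamber
print(breath_volume, fecal_volume)  # 750 1000
```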

Quality control/quality assurance
The Tedlar® bags were rinsed three times with argon, filled halfway with gas and stored at 55 °C to desorb any interfering compounds. Prior to breath sampling, each bag was evacuated. To determine the blank value, hospital air was collected (field blank) and subjected to instrumental analysis under the same conditions as the breath samples. The results were corrected for the values obtained from the analysis of the blank samples.
Regarding the emission chambers, it is important for reliability that they are made of low-emission materials and do not react with the target analytes. Therefore, the device was made of stainless steel to reduce the impact of wall memory on the final results. Prior to testing, the interior of the emission chambers was heated at 110 °C under a flow of purge gas (nitrogen, 50 ml min⁻¹) to ensure that the interior of the chambers was sufficiently clean, to eliminate the wall memory effect, and to determine the effect of purge gas purity on the results. In addition, steel tubes with a Tenax TA/Carbograph 5 TD sorbent bed were conditioned in a TD unit at 300 °C for 1 h at a helium flow rate of 100 ml min⁻¹ before being installed at the outlet of the chambers. To determine the quantitative value of the blank, a steel tube filled with a Tenax TA/Carbograph 5 TD sorbent bed was installed at the outlet of an empty emission chamber in each measurement run. The operating conditions for the emission microchambers were identical to those used for the analysis of fecal samples. After a defined period of operation, the sorbent tube was removed from the chamber outlet, and the analytes were released and identified by TD-GC-MS under the same conditions as for the fecal samples. The peak areas of VOCs emitted from the fecal samples were corrected each time by the estimated blank values.
Every morning before the analyses began, the GC-MS instrument was subjected to a cleaning method, which consisted of gradual heating of the column to 300 °C and purging with gas (total duration 30 min). If the background level was not acceptable, the cleaning method was repeated. The background level emitted from the clean sorption tubes was also checked.
An IS, BFB (1-bromo-4-fluorobenzene), at a concentration of 100 ng ml⁻¹ in methanol was used. It was applied to each sorption tube at a level of 0.1 ng before sampling. When the VOCs in the samples were tested, the ratio of the IS peak area to the peak area of each detected VOC was used to compare the obtained profiles. This eliminated possible errors due to a lack of repeatability between sorption tubes.

Statistics and data analysis approaches
The resulting chromatograms were subjected to manual integration. Compound identification consisted of comparing the mass spectra of the peaks on the chromatogram with data in the standard NIST 17 mass spectra library. The accepted level of matching was at least 75%. In addition, it was checked whether a chemical compound identified in this way had a similar retention time during each run.
Data analysis was carried out according to the procedure shown in the flowchart presented in figure 1.
The first step was a preliminary evaluation of the obtained data. Compounds whose presence was recorded only once or twice in each population were excluded from the resulting database, which reduced the number of artifacts. The data were then expressed as ratios of the peak area of each compound to that of the IS, and data reduction was performed by subtracting compounds present in blank samples. For missing values, we used the 'median approach', which replaces missing elements with the median of the non-missing elements of the corresponding variable.
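The preprocessing steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' SPSS workflow: the function names are hypothetical, missing peaks are represented by `None`, the ratio is taken as compound area divided by IS area, and the exclusion rule is read as "keep a compound if it was detected at least three times in at least one group". Blank subtraction is omitted for brevity.

```python
from statistics import median

def to_is_ratio(areas, is_areas):
    """Express peak areas relative to the internal standard, sample by sample."""
    return [None if a is None else a / s for a, s in zip(areas, is_areas)]

def n_detected(values):
    """Number of samples in which the compound was detected."""
    return sum(v is not None for v in values)

def keep_compound(hv_values, crc_values, min_detections=3):
    """Assumed reading of the exclusion rule: drop compounds seen
    only once or twice (or never) in both groups."""
    return (n_detected(hv_values) >= min_detections
            or n_detected(crc_values) >= min_detections)

def impute_median(values):
    """Replace missing entries with the median of the non-missing ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

# Hypothetical peak areas for one compound across four samples
ratios = to_is_ratio([200.0, None, 600.0, 400.0], [100.0, 100.0, 200.0, 100.0])
print(ratios)                 # [2.0, None, 3.0, 4.0]
print(impute_median(ratios))  # [2.0, 3.0, 3.0, 4.0]
```

Imputing with the median rather than the mean limits the influence of the strongly right-skewed peak-area distributions typical of this kind of data.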
A Pearson correlation was conducted in order to check the correlation and the level of significance between studied samples and VOC profiles obtained. The whole VOCs profile of each sample was used in this approach. The correlation matrix was designed based on the HCA and heat map approach, obtained using the method average linkage between groups within the interval Pearson distance.
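To make the distance used for clustering concrete, the sketch below computes the Pearson correlation between two VOC profiles and converts it to a dissimilarity. The conversion d = 1 − r is a common convention and is an assumption here; the exact definition applied by SPSS may differ. Function names and the example profiles are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def pearson_distance(x, y):
    """Assumed dissimilarity: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    return 1.0 - pearson_r(x, y)

# Hypothetical IS-normalized profiles of three compounds in two samples
sample_a = [1.0, 2.0, 3.0]
sample_b = [2.0, 4.0, 6.0]
print(round(pearson_distance(sample_a, sample_b), 6))  # 0.0
```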
The normality of the data distribution was then checked using the Shapiro-Wilk test. In the next step, significant differences between the study populations of healthy and diseased subjects were identified using the Mann-Whitney U test. Variables that passed the Mann-Whitney U test were subjected to FDR correction for multiple comparisons. This was necessary because, when many variables are tested at the same time, some statistically significant p-values (probability levels) will appear by chance owing to the natural variability of the process. The p-values must therefore be adjusted to control the probability of a false conclusion. For this purpose, the Benjamini-Hochberg procedure was used. The number of variables was further reduced by means of DFA using the forward stepwise method with a desired tolerance of 0.1 and F-input and F-output values of 1 and 0, respectively. Significant variables obtained from DFA were used to perform FA. This method replaces the original variables with so-called factors associated with individual variables, and makes it possible to describe the effect of individual associations on the assignment of fecal and breath samples to the predefined groups: control and diseased subjects. PCA with Varimax rotation of the loadings was used to calculate the factors.
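The Benjamini-Hochberg adjustment mentioned above can be illustrated with a short, dependency-free sketch. It takes raw p-values (assumed here to come from the per-compound Mann-Whitney U tests) and returns adjusted values in the original order. The function name and the example p-values are hypothetical and are not study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]          # hypothetical raw p-values
adj = benjamini_hochberg(raw)
print([round(p, 3) for p in adj])        # [0.02, 0.04, 0.04, 0.02]
significant = [p < 0.05 for p in adj]    # variables kept after FDR correction
```

Each raw p-value is scaled by m/rank, so the smallest p-values receive the strongest correction, and the running minimum ensures that a variable with a smaller raw p-value never ends up with a larger adjusted one.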
The reduced data were used to create a predictive model based on an ANN to provide a multinomial classification of fecal and breath profiles as healthy or CRC. In practice, such a statistical model could be used to accurately screen for CRC cases based on the analysis of VOCs in feces and breath [38]. The model's performance was checked by ten-fold cross-validation, using 80% of the data as the training set and 20% as the test set. The selection of variables for the ANN considered two criteria: that the compound presented itself as a discriminating feature, and that it could be regarded as a product of metabolic reactions. A multilayer perceptron ANN with one hidden layer was used to create the diagnostic models. The activation function of the hidden layer was sigmoid, and softmax was the output layer activation function. The calculated probabilities were used as input to build ROC curves for each predicted class.
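The network architecture described above (one sigmoid hidden layer followed by a softmax output) corresponds to the forward pass sketched below. All weights, the number of hidden neurons and the input values are hypothetical placeholders chosen for illustration; the trained weights of the study's models are not reproduced here, and training itself is not shown.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def softmax(logits):
    shift = max(logits)                      # subtract max for numerical stability
    exps = [exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """Weighted sum plus bias for each neuron in a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def mlp_predict(x, w_hidden, b_hidden, w_out, b_out):
    """Class probabilities from a one-hidden-layer perceptron."""
    hidden = [sigmoid(z) for z in dense(x, w_hidden, b_hidden)]
    return softmax(dense(hidden, w_out, b_out))

# Hypothetical example: 3 input VOC ratios, 2 hidden neurons, 2 classes (HV, CRC)
w_hidden = [[0.8, -0.4, 0.3], [-0.5, 0.9, 0.1]]
b_hidden = [0.0, 0.1]
w_out = [[1.2, -0.7], [-1.2, 0.7]]
b_out = [0.0, 0.0]
probs = mlp_predict([0.5, 1.0, 2.0], w_hidden, b_hidden, w_out, b_out)
print(probs)  # two class probabilities that sum to 1
```

The softmax output gives one probability per class, which is what allows a ROC curve to be drawn for each predicted class by sweeping a threshold over that probability.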

VOC profiles for breath and fecal samples of HVs and CRC patients
Based on the analyses, a data matrix was compiled containing the names of the VOCs identified in each sample, their CAS numbers, retention times and peak areas calculated relative to the IS peak area. This matrix was the starting point for the statistical analysis. After artifact data reduction, the matrix included 147 compounds detected in the volatile fraction of fecal samples and 93 in breath samples (in total, for samples taken from healthy subjects and CRC patients). A list of identified VOCs can be found in supplementary data tables S1 and S2. Examples of chromatograms (supplementary data, figures S1 and S2) indicate the complexity and multifactorial nature of the breath and fecal samples studied, which is mainly due to the individual variability of the subjects. For this reason, it was decided to provide insight into the overall profiles, reflecting the contributions of particular functional groups to the volatilome of feces and breath.
It can be seen that the percentages of acids, alcohols, aldehydes, ether esters, hydrocarbons, ketones, VNCs, VSCs and other volatile compounds were similar in all samples tested. However, the detailed composition differed; for example, fecal samples were characterized by a greater variety of terpenes in the volatile fraction. In addition, fecal samples were characterized by a higher proportion of nitrogenous compounds, as products of protein decomposition, and of alcohols, as products of fermentation. In breath, on the other hand, a higher proportion of ketones was observed. In HCA, large individual variation in the VOC profiles of fecal samples is seen in both the control group and CRC patients, as evidenced by the large number of clusters on the dendrogram. This suggests that many factors influence the VOC profiles in fecal samples of individual subjects. The strongest correlations are positive and are associated with the presence of selected fatty acids, hydrocarbons and alcohols (figure 2). The differences in breath VOC profiles between control samples and samples from patients are smaller than those of fecal samples (figure 3). Nevertheless, there is still a high degree of individual variability. The strongest correlations are positive and are associated with the presence of organic acids, hydrocarbons and VNCs.

Assessment of discriminating variables
The Shapiro-Wilk test, applied to the variables from both fecal and breath samples, revealed distributions with high right (positive) skewness. Most of the variables did not pass the normality test (p < 0.05), which in the case of the Shapiro-Wilk test indicates a distribution deviating from the Gaussian curve. This dictated the next step of the analysis: the Mann-Whitney U test to identify differentiating features.
The Mann-Whitney U test with FDR correction showed that 34 of the 147 variables describing fecal samples had a significant (p < 0.05) U value after correction, indicating a significant difference between HVs and CRC patients. These include analytes such as acetone, organic acids, VSCs, cresols, indole and some terpenes. A significant number of VOCs were increased in fecal samples from CRC patients. Only a few compounds showed a decrease in samples taken from diseased persons. Regarding the breath samples, the Mann-Whitney U test showed that 11 of the 97 detected compounds were able to differentiate the samples into two groups: healthy subjects and CRC patients. Most of these compounds were present only in the breath samples of patients diagnosed with CRC. Only a few compounds were observed to be reduced, elevated or absent compared to control samples. Details of the results of the Mann-Whitney U test with FDR correction can be found in supplementary data tables S3 and S4.
Classification by DFA using the stepwise method on the variables selected by the Mann-Whitney U test with FDR correction resulted in a further reduction to 9 variables for fecal samples and 6 for breath samples. The chemical compounds marked with stars in figure 4 have significant F-values (p < 0.05). The discriminatory ability is expressed by the partial Wilks' Lambda.
FA combined with PCA, performed on the variables selected by the DFA, indicated that for both fecal and breath samples two factors with an eigenvalue higher than 1 were obtained. For fecal samples, the first factor was mainly loaded by skatole, 2-ethyl-1-hexanol, and dimethyl trisulfide, while the second one was loaded by dibutyl phthalate, n-hexane, and acetone. For breath samples, the first factor was loaded by 2,6,10-trimethyldodecane, 2-ethyl-1-hexanol, and acetone, while the second one was loaded by heptanoic acid and sevoflurane (figure 5).

Predictive machine learning model
Some of the VOCs present in human breath and feces that were selected on the basis of statistical analysis are of endogenous origin and are formed by biochemical processes. The presence of other compounds may be related to exogenous sources, involving the introduction of these compounds into the body from the environment during respiration (e.g. dibutyl phthalate and 2-ethyl-1-hexanol) [8,17]. Sevoflurane, on the other hand, is an anesthetic drug and can therefore be detected in the breath of patients undergoing certain medical procedures [39]. When selecting variables for the ANNs, it is important to consider VOCs whose origin is related to metabolic reactions. Therefore, compounds from exogenous sources (sevoflurane, dibutyl phthalate, and 2-ethyl-1-hexanol) were not included in the model. Details on the potential origin of the discriminating compounds are shown in figure 6. Meanwhile, table 3 shows the trends of the discriminant variables (potential CRC biomarkers) for the breath and fecal samples tested.
An ANN system was developed to provide multinomial classification of fecal and breath profiles as belonging to HVs or CRC patients. Figure 7 shows the generated ANN graphs for the classifier variables for fecal and breath samples, where the rectangles refer to the neurons that constitute the input and output layers, and the circles refer to the hidden layer. The numbers near the edges represent the connection weights that are responsible for signal propagation in the network. As indicated by the ROC curves, the ANN classifier for feces, with CRC patients treated as the positive class, identified healthy subjects and CRC patients with 100% accuracy (sensitivity and specificity of the model both at 100%). For breath samples, the ANN classifier based on the discriminating variables (CRC patients treated as the positive class) identified CRC patients with 94.1% sensitivity and 100% specificity.
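For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity are obtained from predicted and true class labels when CRC is treated as the positive class. The labels in the example are hypothetical and do not reproduce the study's data; the function name is likewise illustrative.

```python
def sensitivity_specificity(y_true, y_pred, positive="CRC"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: one CRC sample misclassified as healthy
y_true = ["CRC", "CRC", "CRC", "CRC", "HV", "HV", "HV", "HV"]
y_pred = ["CRC", "CRC", "CRC", "HV", "HV", "HV", "HV", "HV"]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75 1.0
```

A specificity of 100% with a sensitivity below 100%, as reported for the breath model, therefore means that no healthy subject was flagged as CRC while at least one CRC case was missed.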
Additionally, to compare the VOCs profiles found in exhaled breath and fecal samples, TD-GC-MS analysis of tumor tissues was conducted. An example chromatogram can be found in the supplementary data figure S5. Among the chemical compounds identified in the volatile fraction of CRC tissues, acetone was found by statistical analysis as a potential biomarker of CRC in breath and fecal samples, and n-hexane in fecal samples. This still leaves open the question of whether changes in the concentrations of certain VOCs in exhaled breath and feces are induced by vital tumor activity. The biochemical pathways of many VOCs are not yet fully understood, so living tumor tissue may affect metabolism, causing changes in the concentrations of some VOCs observed in breath and feces.
Another problematic issue is that VOCs in breath and feces include a large number of different chemical compounds, with significant overlap in health and disease, dependent also on the individual status of the human body. That in turn further contributes to the variability of results reported in different studies. Consequently, different volatile biomarkers are identified in the same disease. In addition, inconsistencies in data are observed due to the use of different sampling protocols, instrumental methods and statistical approaches, highlighting the need for standardization of such studies. This fact is reflected in the comparison of our results with studies conducted in other research centers, summarized in table 4.
For now, this makes it impossible to use VOC analysis in breath and feces in clinical practice aimed at diagnosing CRC. Nevertheless, despite difficulties in the interpretation of the results, the studies presented indicate that there is plenty of evidence to suggest that VOCs emitted from feces and present in the breath can be used to provide future perspectives in the context of personalized medicine.

Conclusions and future remarks
The proposed protocol proved to be suitable for the investigation of potential volatile biomarkers of CRC in breath and fecal samples. Simultaneous determination of VOCs in two biological matrices, breath and feces, may enhance the diagnostic potential of volatilome assessment in CRC. The developed ANN model provided suitable performance, and CRC cases were predicted with good accuracy. This study has found compounds that are positively or negatively associated with the presence of CRC: acetone, heptanoic acid and 2,6,10-trimethyldodecane in breath samples, and n-hexane, acetone, dimethyl trisulfide and skatole in fecal samples. However, as of today, neither fecal nor breath samples provide clinically useful volatile indicators of disease in routine testing. More attention and commitment need to be devoted to investigating and confirming the detailed origin (endogenous/exogenous, pathological/therapeutic, etc) and physicochemical properties of VOCs and their probable clinical associations with colorectal pathology. Larger groups of patients must also be included, owing to the complexity of the many factors affecting the fecal and respiratory volatilome. The research and analysis presented here are of a pilot, exploratory nature and are a starting point for further research.
For future clinical purposes, the focus should be on the ability to differentiate cancer stages. In addition, quantification is needed, i.e. determining how much of a given compound represents a cut-off value relative to the control group. The present work indicates only the upward or downward trends of the selected potential VOCs differentiating the control group and CRC patients. In addition, the disease-specific mechanisms responsible for distinguishing a given colorectal pathology from interfering conditions should be further investigated.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).