Cross-validation of the peppermint benchmarking experiment across three analytical platforms

The Peppermint Experiment is a breath analysis benchmarking initiative that seeks to address the lack of inter-comparability of outcomes across independent breath biomarker studies. In this experiment, the washout profiles of volatile terpene constituents of encapsulated peppermint oil (mainly α-pinene, β-pinene, limonene and 1,8-cineole) in exhaled breath are characterized through a series of measurements at defined sampling intervals up to 6 h after ingestion of the capsule. In the present work, the Peppermint Experiment was carried out on a cohort of volunteers (n = 11) who provided breath samples in three sittings on different days (i.e. triplicates per volunteer) for concurrent analysis by three different analytical platforms. These platforms were proton transfer reaction-time-of-flight-mass spectrometry (PTR-TOFMS) interfaced with a buffered end-tidal (BET) breath sampler, gas chromatography-ion mobility spectrometry (GC-IMS) in conjunction with a compatible handheld direct breath sampler, and thermal desorption comprehensive two-dimensional gas chromatography-time-of-flight-mass spectrometry (TD-GC×GC-TOFMS) with a Respiration Collector for in-vitro Analysis (ReCIVA) system for trapping breath volatiles onto adsorbent tubes. Regression analysis yielded mean washout times across the cohort of 448 min (PTR-TOFMS and GC-IMS) and 372 min (TD-GC×GC-TOFMS), which are in good alignment with published benchmark values. Large variations in washout profiles were observed at the individual level, both between (inter-individual) and within (intra-individual) participants, indicating high variability in the degree of absorption, distribution, metabolism and excretion of volatile terpenes in the body within individuals and across the cohort.
The comparably low inter-instrument variability indicates that differences in benchmark values from independent studies reported in the literature are driven by biological variability rather than different performances between sampling methods or analytical platforms.


Introduction
The persistent challenges faced in breath research are well known and have been widely reported [1][2][3]. One particular hurdle is associated with the disparate use of a wide range of breath sampling techniques and analytical platforms for the detection of exhaled volatiles, which poses challenges in comparing research outcomes and data from independent studies [4]. Prevalent analytical technologies include gas chromatography coupled to mass spectrometry (GC-MS), direct injection techniques, such as proton transfer reaction-mass spectrometry (PTR-MS), and ion mobility spectrometry (IMS) [5], amongst others. To address the issues associated with sampling and analytical diversity within the field of breath research, a benchmarking experiment utilizing a standardized protocol was proposed as a means to compare different procedures and/or techniques [6]. This standardized approach, called the Peppermint Experiment, aims to compare and consolidate datasets from a wide range of breath sampling and analysis approaches in order to establish a set of reference benchmark values. The experimental protocol has been reported in detail in the literature [6], but in brief, the experiment involves measuring the breath washout profiles of the main volatile terpene constituents of peppermint oil over time at defined intervals after the ingestion of the encapsulated oil. Outcomes of the Peppermint Experiment have been reported for several different techniques ranging from GC-MS [7], PTR-MS and selected ion flow tube-mass spectrometry (SIFT-MS) [8], to gas chromatography-IMS (GC-IMS) [9] and secondary electrospray ionization-mass spectrometry (SESI-MS) [10,11].
The present study adopted the Peppermint Experiment to explore the degree of variability in terpene washout times from breath samples measured concurrently by three different analytical platforms. Breath samples were analysed using PTR-time-of-flight-MS (PTR-TOFMS), GC-IMS and comprehensive two-dimensional GC-TOFMS (GC×GC-TOFMS), with sampling carried out using systems compatible with the individual techniques, namely a buffered end-tidal (BET) sampler, a handheld direct breath sampling interface, and a Respiration Collector for in-vitro Analysis (ReCIVA), respectively. Breath samples were provided by a cohort of 11 volunteers in three sittings on different days (triplicate analysis per volunteer) following the defined experimental protocol of the Peppermint Experiment. Breath samples collected concurrently from each participant at the designated sampling times of the protocol were analysed by each analytical platform.

Ethics and cohort
This work was undertaken in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of Friedrich-Alexander Universität Erlangen-Nürnberg (Ethics No. 21-414-S). Informed written consent was obtained from each participant. A total of 11 participants were recruited for this study, all of whom were adults with no medical symptoms and no history of smoking. The demographics and metadata of the cohort are reported in table 1.
Volunteers were supplied with an unflavoured toothpaste (Proxident AB, Falun, Sweden) to replace their normal commercial toothpaste product for use on the morning of the test. This minimized the risk of introducing potential confounders from conventional toothpastes, which are commonly peppermint-flavoured and thus contain similar volatile constituents to the peppermint oil. Although the volatile composition of this unflavoured toothpaste was not explicitly analysed, according to the manufacturer it is formulated without added aroma. Furthermore, an initial sensorial screening of the toothpaste prior to its use indicated an absence of perceivable odour, thus the presence of peppermint-related confounders at concentrations that would risk confounding the data can be ruled out. In addition to the use of this peppermint-free toothpaste, participants were asked to avoid peppermint-containing and dairy or caffeinated products during the sampling period, but were otherwise permitted to eat and drink unrestricted throughout the day of the experiments, although half of the volunteers agreed to either not consume any food until the 285 min sampling time or fast for the entire duration of each session. Participants remained in the respective fasting or non-fasting groups throughout all sessions (replicate measurements). No further restrictions (e.g. implementation of a standardized diet or limitations to food intake) were prescribed. Information about food and drink intake was collected via the participant questionnaires. This study was conducted at the height of the COVID-19 pandemic, thus stringent precautions were observed to ensure the perceived safety of staff and participants. These included the use of disposable items, instrument cleaning protocols between participant measurements, and a daily questionnaire concerning symptoms of infection.
Nitrile gloves and FFP-2 facemasks were worn by the study personnel when handling sampling interfaces and whenever participants were present.
The peppermint oil capsule product used in the original protocol [6] has since been discontinued, thus an alternative product was acquired; several capsules of the discontinued product were available from an earlier batch for comparative screening purposes (see next section), but not in sufficient quantities for use in the cohort trials (requiring 33 individual capsules).

Volatile constituents of peppermint oil
The use of an alternative peppermint oil capsule product to the type used in previously published studies in the Peppermint Experiment series dictated a need to compare the volatile constituents of both oils. This was achieved using a static headspace sampling approach. The outer casing of a single capsule was punctured with a syringe and the peppermint oil within was extracted and transferred to a 20 ml glass vial (Sigma Aldrich, Taufkirchen, Germany), which was immediately sealed using a screw-top cap fitted with a polytetrafluoroethylene-lined silicone septum. The vial was heated to 39 °C (slightly higher than body temperature) and held at this temperature for 30 min to allow for equilibration between the oil and headspace gas phases. Using a gas-tight syringe (Hamilton 1710, 100 µl, Hamilton Company, Gräfelfing, Germany), 100 µl headspace gas was then extracted from the vial and injected onto a dual-bed adsorbent tube (Biomonitoring, mixture of Tenax TA and Carbograph 5TD, total mass 200 mg; Markes International Ltd, Llantrisant, UK). The terpenes in the extracted headspace gas were analysed by GC×GC-TOFMS using the same routine as for the breath samples, as described in section 2.5. Each peppermint oil was analysed in this manner in triplicate. Background samples using empty vials were collected and analysed for comparison. Reference standards were used to verify the assignment of the features derived from the GC×GC-TOFMS analyses.

Experimental procedure
A detailed description of the Peppermint Experiment protocol, including a rationale of the experimental design, is provided in an introductory paper on this initiative [6], thus only a brief description is given here. Each participant was asked to provide an initial (baseline) breath sample and then swallow a single peppermint oil capsule (containing 200 mg oil), ingested with 150 ml still water (local tap water). Subsequent breath samples were collected and analysed at 60, 90, 165, 285 and 360 min after capsule ingestion. Each participant carried out the trial on three separate occasions (i.e. sittings on three different days) in order for the intra-individual variability to be explored in addition to the inter-individual differences (i.e. within-cohort variability). Samples of ambient air from the room of the experiments were collected at the end of each session day. Figure 1 depicts the study design, indicating the breath collection intervals, the sequence of sampling and analysis for each analytical platform, and an overview of how the inter-instrumental, inter-individual and intra-individual datasets were derived.

Breath sampling and analysis
Each participant was instructed to provide consecutive breath samples across the three analytical platforms, with each breathing manoeuvre timed to ensure a short and reproducible sequence within and across the sessions. These were carried out as follows: 1. three consecutive ∼10 s tidal volume exhalations into a BET sampler that was directly interfaced to the PTR-TOFMS instrument for online analysis (total sampling time: <1 min, including pauses between exhalations) [8,12]; 2. a single 12 s deep exhalation into the GC-IMS breath sampling interface [9]; and 3. continuous breathing into the ReCIVA sampling device [7] for the collection of 500 ml end-tidal breath onto adsorbent tubes, for subsequent analysis by two-dimensional gas chromatography (TD-GC×GC-TOFMS; sampling time: ca. 6-8 min).
Details of sampling methods, analytical platforms and their operating conditions, as well as the analysis procedures, are as follows.
PTR-TOFMS. A BET sampler was interfaced to a PTR-TOF 8000 instrument (both IONICON Analytik GmbH, Innsbruck, Austria) for direct analysis of the end-tidal phases of exhalations. A breath sample was supplied by the participant by exhaling slowly for approximately 10 s through a one-way disposable mouthpiece. The BET sampler and its transfer line to the analyser were heated to 70 °C, with sampling into the instrument at a flow rate of 70 ml min⁻¹. The PTR-TOFMS was operated with drift tube (reaction chamber) conditions of 2.2 mbar, 60 °C, and 555 V (extraction voltage 34 V), establishing an E/N of 122 Td (1 Td = 10⁻²¹ V m²). The continuous analyses were performed with a cycle time of 1 s across a mass spectral range of m/z 20-210. Due to the nature of sampling via the BET system, ambient air is automatically analysed in the intervals between exhalations and can thus be accounted for.
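The relationship between the stated drift tube conditions and the reduced electric field can be checked with a short calculation. The following is a minimal sketch, assuming ideal gas behaviour and that the full 555 V is applied uniformly along the drift tube; the function names are illustrative, and the drift length it returns is inferred from the reported values rather than taken from the instrument documentation.

```python
K_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)
TOWNSEND = 1e-21     # 1 Td expressed in V m^2


def number_density(pressure_mbar, temp_c):
    """Gas number density N in m^-3 from the ideal gas law, N = p / (k_B T)."""
    pressure_pa = pressure_mbar * 100.0
    temp_k = temp_c + 273.15
    return pressure_pa / (K_B * temp_k)


def implied_drift_length_m(voltage_v, pressure_mbar, temp_c, e_over_n_td):
    """Drift length in m that would reproduce a given E/N.

    Assumes a uniform field E = V / L along the drift tube (assumption).
    """
    field_v_per_m = e_over_n_td * TOWNSEND * number_density(pressure_mbar, temp_c)
    return voltage_v / field_v_per_m


if __name__ == "__main__":
    # Operating conditions as reported in the text
    n = number_density(2.2, 60.0)
    length = implied_drift_length_m(555.0, 2.2, 60.0, 122.0)
    print(f"N = {n:.3e} m^-3, implied drift length = {length * 100:.1f} cm")
```

Under these assumptions the reported voltage, pressure, temperature and E/N are mutually consistent for a drift length of just under 10 cm.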
GC-IMS. A compatible handheld interface for direct breath sampling and immediate analysis was connected to a BreathSpec GC-IMS (both G.A.S. Dortmund mbH, Dortmund, Germany). A breath sample was provided by a participant by deep exhalation through a disposable one-way mouthpiece/breath reservoir tube. The measurement sequence of the GC-IMS instrument was initiated 2 s into the exhalation in order to exclude deadspace gas from the analysis. The transfer line from the mouthpiece to the GC-IMS was heated to 60 °C. GC pre-separation of volatiles in the breath samples was achieved through use of an MXT-200 capillary column of 30 m length and 0.53 mm internal diameter, with a 0.50 µm trifluoropropylmethylpolysiloxane stationary phase layer. The GC column and IMS were supplied with purified air and nitrogen gas (purity 5.0; Linde GmbH, Pullach, Germany), respectively. The column was operated at 45 °C, whereas the six-port valve, the GC-to-IMS transfer line, and the IMS detector itself were all held at 60 °C. The IMS drift gas flow was set to 150 ml min⁻¹. The six-port valve was set to the loading position A, then switched to the injection position B to start the analysis after the 10 s sampling time. Blank samples were measured by injecting 1 ml water vapour into the system in the same manner as performed for the calibration (see supporting information (SI)).
GC×GC-TOFMS. End-tidal breath was collected onto a single adsorbent tube per sampling interval using a ReCIVA sampling device (Owlstone Medical Ltd, Cambridge, UK); the other three tube ports of the ReCIVA were fitted with solid metal rods to limit the sampling to just one tube (see below). Instead of using a conventional silicone mask to interface with the participant, a hydrothermally-treated 3D-printed mouthpiece adapter was used in conjunction with a disposable pulmonary function (sterility) filter; the suitability of this configuration with the ReCIVA device compared with the silicone mask has been explored and reported in two recent publications [13,14]. The capnography-controlled end-tidal breath sampling of the ReCIVA was configured to sample 500 ml total breath volume onto the adsorbent tube at a rate of 200 ml min⁻¹. Preliminary investigations did not identify correlations between tube placement and signal responses, and previous studies observed no statistically significant differences amongst banks of the ReCIVA system [15], thereby supporting the decision to use only a single tube per breath collection interval (further details are provided in the SI). The ReCIVA was supplied with clean air via a dedicated clean air supply pump (CASPER; Owlstone Medical) at a flow rate of 40 l min⁻¹. Background (ambient) air samples were collected at the end of each sampling session. Additional system blanks were collected via connection of the ReCIVA to the CASPER system. After sample collection, the adsorbent tubes were loaded onto a TD unit in combination with an autosampler (Unity-XR and Ultra-XR, respectively; both Markes International) coupled to a GC×GC-TOFMS instrument (Agilent 8890 gas chromatograph; Agilent Technologies, Palo Alto, CA, USA; flow modulator featuring a 22 µl sample loop, SepSolve Analytical Ltd, Peterborough, UK; BenchTOF-Select mass spectrometer; Markes International).
A 20 m × 0.18 mm ID MEGA wax column with film thickness 0.18 µm (¹D; MEGA S.r.l., Legnano, Italy) and a DB-35 ms column with dimensions 4.6 m × 0.25 mm with film thickness 0.15 µm (²D; Agilent Technologies) were used for chromatographic separations. The detector was operated in electron ionization mode at 70 eV, with a scan range of m/z 40-350 at 50 Hz. Regular quality control samples using terpene standards were run between sample sequences to check and accommodate for potential inter-batch variations. Detailed information about tube conditioning and storage, as well as the TD-GC×GC-TOFMS data acquisition, is provided in the SI.

Data processing and washout calculations
The analytical data were processed according to the respective conventional approaches, as follows.
PTR-TOFMS. The mass spectra obtained every second by PTR-TOFMS were integrated and subsequently processed using the PTR-MS Viewer software (v3.4; IONICON Analytik). Raw data were collected as signal intensities in counts per second (cps), which were normalized to the reagent ion (H₃O⁺) signal (ncps) and converted to concentration based on the compound-specific sensitivity values determined by calibrations (see SI). Mean concentrations of the volatile terpenes were calculated from the plateaus of the three end-tidal exhalations for each sampling interval, which were subsequently used to characterize their washout kinetics. The terpenes are reported as collective sums due to overlapping ion signals and the resulting inability to separate the individual volatiles, as previously reported [8].
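The conversion chain described for the PTR-TOFMS data (cps to ncps to concentration, followed by averaging over the three exhalation plateaus) can be sketched as below. This is an illustrative outline only: the function names, the normalization to 10⁶ reagent ion counts (a common convention, assumed here) and all numerical values are hypothetical, and the actual processing was carried out in the vendor software with calibration-derived sensitivities (see SI).

```python
from statistics import mean


def normalize_cps(signal_cps, reagent_cps, reference_cps=1e6):
    """Normalize an ion signal to the reagent ion signal (ncps).

    reference_cps = 1e6 is an assumed convention, not a value from the study.
    """
    return signal_cps / reagent_cps * reference_cps


def to_ppbv(signal_ncps, sensitivity_ncps_per_ppbv):
    """Convert a normalized signal to a mixing ratio using a calibration sensitivity."""
    return signal_ncps / sensitivity_ncps_per_ppbv


def interval_concentration(plateau_cps, reagent_cps, sensitivity_ncps_per_ppbv):
    """Mean concentration over the end-tidal plateaus of one sampling interval."""
    return mean(
        to_ppbv(normalize_cps(s, reagent_cps), sensitivity_ncps_per_ppbv)
        for s in plateau_cps
    )


if __name__ == "__main__":
    # Hypothetical plateau signals of three exhalations (cps), reagent ion
    # signal (cps) and sensitivity (ncps per ppbv); none are measured values.
    plateaus = [480.0, 500.0, 520.0]
    print(interval_concentration(plateaus, reagent_cps=2.0e6,
                                 sensitivity_ncps_per_ppbv=25.0))
```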
GC-IMS. The GC-IMS data were processed to identify, extract and integrate the GC-IMS peak volumes for features of interest using the VOCal software (v0.1.3; G.A.S. Dortmund). The reactant ion peak with known mobility was used as an internal ion mobility standard to compensate for run-to-run instrument variability in drift time measurements. This was achieved using a normalization function according to conventional methods described in the literature [16,17]. The identities of the detected terpenes were confirmed by comparing the retention and drift times with the literature and reference standards.
GC×GC-TOFMS. The GC×GC-TOFMS data were processed using the ChromSpace software (v2.1.3; Markes International). Details on the data analysis procedures are provided in the SI. Mass spectra and retention indices were compared with the literature and reference standards in order to confirm the identities of the detected compounds.
Washout times. The linear regression models used to determine the washout times for comparison with the literature benchmark values were calculated as described in the protocol paper [6]. The log-fold changes in concentrations (c) in relation to pre-ingestion concentrations (c₀), i.e. log(c/c₀), were plotted against time in log-minutes for each data series for the sum of terpenes (α-pinene, β-pinene and limonene; plus 1,8-cineole for the PTR-TOFMS datasets). These models were used to calculate the washout values for each dataset of all participants and replicates for a single platform by taking the lower 95% confidence interval (CI) of the time for the respective signal to return to its pre-ingestion level, i.e. the intercept of the signal on the time axis of the log-log plot (see later; figure 3). Specifically, the lower 95% CI indicates the shortest time after capsule ingestion at which the individual peppermint compounds can no longer be detected in breath, i.e. when the washout results in exhaled concentrations dropping below the pre-ingestion baseline level of the respective system. Overall washout values were calculated in the same way by combining all datasets that met the inclusion criteria. The inclusion criterion for datasets of individual participants was a washout profile that contained at least three data points, i.e. a washout peak at either t = 60, 90 or 165 min. Reasons for an individual participant not meeting this criterion were a delayed or late washout with respect to the sampling protocol, with a corresponding absence of follow-up samples beyond the protocol (i.e. after t = 360 min). All data visualizations were made using Origin 2022b (OriginLab Corporation, Northampton, MA, USA).
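The washout calculation outlined above can be illustrated with a short sketch. It fits log₁₀(c/c₀) = β₀ + β₁·log₁₀(t) by ordinary least squares to the points from the washout maximum onwards and reports the time-axis intercept together with a lower 95% bound. Several choices here are assumptions rather than details confirmed by the protocol paper [6]: base-10 logarithms, the delta-method approximation for the uncertainty of the intercept, the normal critical value used by default, and the function name washout_time. The example fold-changes are invented for illustration.

```python
import math
from statistics import NormalDist


def washout_time(times_min, fold_changes, crit=None):
    """Return (time-axis intercept, lower bound) of the washout time in minutes.

    Fits log10(c/c0) = b0 + b1 * log10(t) by ordinary least squares.
    """
    n = len(times_min)
    if n < 3 or n != len(fold_changes):
        raise ValueError("need at least three paired data points")
    if crit is None:
        # Normal approximation to the two-sided 95% critical value (assumption);
        # for small n, pass a Student-t quantile with n - 2 degrees of freedom.
        crit = NormalDist().inv_cdf(0.975)

    x = [math.log10(t) for t in times_min]
    y = [math.log10(f) for f in fold_changes]
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    if slope >= 0:
        raise ValueError("profile is not decreasing; no washout intercept")

    # Residual variance of the fit
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)

    # log10 of the time at which the fitted line crosses log10(c/c0) = 0
    x0 = -intercept / slope
    # Delta-method standard error of x0 = -b0 / b1
    se_x0 = math.sqrt(s2 * (1.0 / n + (x0 - x_mean) ** 2 / sxx)) / abs(slope)

    return 10 ** x0, 10 ** (x0 - crit * se_x0)


if __name__ == "__main__":
    # Hypothetical fold-changes (c/c0) from a washout maximum at 90 min onwards
    t = [90, 165, 285, 360]
    fc = [30.0, 8.5, 3.1, 1.9]
    crossing, lower = washout_time(t, fc)
    print(f"intercept: {crossing:.0f} min, lower 95% bound: {lower:.0f} min")
```

The lower bound is computed on the log-time scale and then back-transformed, so it is asymmetric about the intercept when expressed in minutes.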

Volatile constituents of peppermint oil capsules
The volatile constituents of the new peppermint oil capsule used in this study were dominated by α-pinene and β-pinene (relative abundances: 28.8% and 24.5%, respectively), followed by limonene and menthone (20.1% and 18.6%), with minor contributions from menthol and 1,8-cineole (6.8% and 1.2%), as determined by peak abundances from the TD-GC×GC-TOFMS analysis and in relation to reference standards; table 2 indicates the retention times of the individual compounds in each dimension. The peppermint oil contained similar major volatile terpene constituents to those reported previously, including the original capsule used in the earlier studies [18][19][20][21], albeit at varying abundances. The relative abundances of α-pinene, β-pinene, limonene, menthone, menthol and 1,8-cineole in the peppermint oil of those capsules were 6.4%, 6.6%, 12.9%, 28.1%, 7.9%, and 38.1%, respectively, i.e. 1,8-cineole was the dominant compound, followed by menthone. Full details of the capsule comparison are provided in table S1. This altered terpene profile has slight ramifications for the benchmarking experiments, as only α-pinene, β-pinene and limonene were observed consistently across all breath samples in the present study using the new capsule, compared with menthone,  [7]. These additional compounds are not considered in the present paper but might be of interest in future studies on the pharmacokinetics of volatile peppermint oil constituents and their metabolites. For the PTR-TOFMS data, ion signals at m/z 81.070 and m/z 137.132 were assigned to a combination of the monoterpenes (limonene, α- and β-pinene) and a presumed minor contribution from 1,8-cineole (see table 2). A preliminary study on peppermint oil terpene constituents by Malásková et al reported similar findings, albeit relating to the original encapsulated oil product, which support the signal assignments used in the present work [21].
Analyses by GC-IMS revealed that β-pinene and α-pinene concentrations in the reference standards were sufficiently high to produce monomer (linear M₁ and cyclic M₂) and dimer (D) ions, whereas limonene and 1,8-cineole signals were obtained only in the monomer region. By comparison, only limonene, β-pinene and α-pinene monomer signals were observed in exhaled breath across all measurements. Consequently, in this work monomer responses were used for quantitation, although the related ion assignments should be considered tentative.

Terpene washout
The washout profiles of the sum of terpenes within and across the cohort, i.e. the individual replicates and group means, respectively, are presented in figure 2 for the individual analytical platforms, as well as for the consolidated mean data for all three platforms. The plots comprise 21 washout profiles for PTR-TOFMS and GC×GC-TOFMS, and 19 profiles for GC-IMS (two datasets from the latter were excluded due to sampling issues). In general, typical washout profiles exhibited a maximum in the relative fold-change in terpene concentrations at either t = 60, 90 or 165 min post-ingestion. This washout profile follows a power relationship that can be transformed into a linear form, as described previously [6].
In the present study, on account of the large intra-individual variations observed across replicate measurements of each participant, each replicate is considered as an independent dataset. The datasets of four volunteers were excluded due to atypical washout patterns that did not fulfil the criteria for further processing, i.e. profiles that could not be characterized by a linear regression. Specifically, atypical washouts were those with delayed peaks (after 165 min post-ingestion), as well as those exhibiting concentration fluctuations relating to dietary confounders. Individual cases of deviating washout profiles coincided across all platforms, thereby indicating that these observations related to physiological effects rather than fluctuations in the performance of the individual instruments. Consequently, in such cases data from all three platforms were excluded from further processing. Use of participant replicates as independent datasets, and exclusion of the data of four participants, yielded a data series of n = 21. Peak concentrations of the terpenes coincided across all three analytical platforms in 17 of 21 datasets (81%), albeit with different fold-changes. In the remaining four datasets, the discrepancies between the platforms in detecting the washout maxima were due to similar concentrations at the 60 and 90 min data points. Baseline concentrations in breath were close to or above the extrapolated limits of detection for each platform, with concentrations estimated to be on average 5.45, 8.17 and 5.94 ppb for PTR-TOFMS, GC-IMS and GC×GC-TOFMS measurements, respectively.
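The dataset bookkeeping behind these figures follows directly from the study design (11 participants, three sittings each, four participants excluded) and can be reproduced in a few lines. The function name below is illustrative only.

```python
def retained_datasets(participants, sittings, excluded_participants):
    """Number of replicate datasets left after excluding whole participants."""
    return (participants - excluded_participants) * sittings


if __name__ == "__main__":
    total_sessions = 11 * 3                      # capsules required for the cohort
    n_datasets = retained_datasets(11, 3, 4)     # replicates treated as independent
    coinciding_percent = 100 * 17 / n_datasets   # peaks coinciding on all platforms
    print(total_sessions, n_datasets, round(coinciding_percent))
```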
The peak washout concentrations of terpenes in the PTR-TOFMS datasets were most frequently

Washout times in comparison to benchmark values
The mean washout times for the sum of terpenes were calculated for the individual analytical platform datasets (i.e. PTR-TOFMS, GC-IMS, or GC×GC-TOFMS) as well as for the consolidated datasets across all platforms using linear regressions of the log-log plots, as shown in figure 3.
In the Peppermint Experiment, the lower 95% CI of the x-axis intercept of the washout function is used as the metric to determine washout values. The washout values derived from the mean datasets for each analytical platform are summarized in table 3. Data from participants with a maximum exhaled terpene concentration at either 60 min, 90 min or 165 min were used to generate the model. The mean washout time derived from the PTR-TOFMS datasets was 448 min post-ingestion, with an x-axis intercept of 531 min. The mean GC-IMS data returned an identical washout time of 448 min, with an average mean washout value of 502 min. The data from GC×GC-TOFMS yielded a slightly shorter washout time of 372 min, with an average mean washout at 423 min. These washout values are well aligned with the benchmark values reported in the literature, even allowing for a slight offset in the sampling times resulting from the analytical procedure. Specifically, the slower ReCIVA sampling procedure (6-8 min) relative to those for PTR-TOFMS and GC-IMS sampling (ca. 1 min and 12 s, respectively) could have had an impact on the washout times derived from the GC×GC-TOFMS datasets. First, this method was last in the series (see figure 1), but the data were nevertheless allotted to the defined sampling times for ease of comparison, thus sampling via the ReCIVA for GC×GC-TOFMS analysis was slightly delayed relative to those assigned times. Secondly, and more importantly, potential quantitative changes in exhaled terpenes over the longer sampling period for GC×GC-TOFMS (ca. 6-8 min) could result in slightly different concentrations in these data relative to the preceding direct sampling approaches of PTR-TOFMS and GC-IMS. This might especially impact the concentrations determined at the later sampling times when the abundances of terpenes in breath are continuously decreasing (i.e. higher in the first minute of sampling compared to at the end, 6-8 min later), thus yielding overall reduced composite concentrations and an apparent faster washout for the GC×GC-TOFMS datasets. Despite these considerations, the sampling order (PTR-TOFMS, then GC-IMS, then GC×GC-TOFMS) was deliberately kept consistent (and not randomized) throughout the study in order to maintain the close proximity of sampling between the three approaches. In the present protocol, the briefest sampling was achieved by GC-IMS (12 s), which bridged the interval between the preceding sampling via PTR-TOFMS (ca. 1 min) and subsequent sampling via ReCIVA for GC×GC-TOFMS analysis (ca. 6-8 min), thereby offering the best alignment of breath datasets between each independent approach.
The study by Henderson et al that used PTR-MS [8] reported a mean terpenes washout benchmark of
Table 3. Mean washout times obtained from plotting average logarithmic washout curves of terpene peak signal fold-change versus time to washout for each dataset.
It should be noted that all studies hitherto published on the Peppermint Experiment utilized a different peppermint oil product to the one used in the present work, which deviated in its terpene constituents compared with the type used here (as reported in section 3.1). Moreover, in this study the data are presented as the sum of all terpenes, whereas some of the Peppermint Experiment studies reported data on the individual terpenes, with washout profiles observed to vary depending on their individual absorption, distribution, metabolism and excretion (ADME) [8,10]. Consequently, deviations in the time required to return to baseline values between the present work and the published literature were expected.

Biological variability
The washout profiles of individual replicates as well as across the cohort indicated a high degree of variability, both intra- and inter-individual, as is evident in figure 2. At the individual level, no consistent behaviour was observed: in some cases, the maximum concentration of exhaled terpenes ranged from 60 to 165 min over the three different sessions (biological replicates), whereas in other cases these maxima coincided at the same washout time for each replicate. Similar observations of intra-participant variations were reported in previous studies [11,22]. This intra-individual variability dictated similar observations for inter-individual differences, as can be derived from the high relative standard deviations (RSDs) of the consolidated data at each sampling time point (see table S4).
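For reference, the RSD at a given sampling time is the sample standard deviation of the replicate values divided by their mean, expressed as a percentage. A minimal sketch is given below; the fold-change values are invented for illustration and do not correspond to table S4.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("RSD is undefined for a mean of zero")
    return 100.0 * stdev(values) / m


if __name__ == "__main__":
    # Hypothetical fold-changes (c/c0) across datasets at each sampling time (min)
    fold_changes_by_time = {
        60: [12.0, 35.0, 8.0, 50.0],
        165: [9.0, 6.0, 14.0, 4.0],
        360: [1.5, 2.5, 1.2, 3.0],
    }
    for t, values in fold_changes_by_time.items():
        print(f"t = {t} min: RSD = {rsd_percent(values):.0f}%")
```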
The observed intra- and inter-individual variability in washouts is associated with the pharmacokinetics of the terpene constituents of the peppermint oil supplement. The peppermint oil capsules are administered orally, thus their bioavailability, in combination with other factors, such as physiology and environment, will influence the metabolic processes of the volatile oil constituents, and accordingly also their washout behaviours [23]. Although these factors were not specifically investigated in this study, the data suggest that they contribute to the observed variability in terpene washout profiles. The abstinence from food intake by some participants in the present trials reveals trends in relation to satiety and washout times, whereby exhaled terpene concentrations returned to baseline levels quicker in participants who fasted (participants 1, 3, 6 and 7) compared to those who consumed food during the study (2, 4 and 5). This could indicate that consumption of food (i.e. degree of satiety) influences ADME of the peppermint terpenes, although a larger sample size would be required to draw statistically meaningful conclusions for this observed phenomenon.

Instrumental variability
A primary goal of this study was to explore the impact of using different sampling and analytical methods on the washout profiles. The outcomes highlight how changes in the analytical configuration may be objectively evaluated through the Peppermint Experiment. A linear regression of the washout profiles was selected as a benchmarking method to capture information about all aspects of sampling and analytical methodologies, as discussed in the protocol paper [6]. In this study, the slope of the regression curve (β₁) reveals information about instrumental variation (presented in table S5), as the cohort across all three platforms was identical. Notably, as only a relatively small number of participants and datasets are included in the present study, each dataset can have a large effect on the overall washout times. Correspondingly, it is important to note that the washout times reported herein may act as a basis for the future addition of datasets and a subsequent recalculation to derive new benchmark values, as is conceptually foreseen with the Peppermint Experiment initiative.
Comparing the respective β₁ of the regression curves, it becomes evident that the BET-PTR-TOFMS and ReCIVA-GC×GC-TOFMS methods yielded similar sensitivities; by comparison, the sensitivity of the GC-IMS method was slightly lower, which reflects the limits of detection determined for each method (table S6). Furthermore, exhaled peppermint terpenes measured by PTR-TOFMS returned to baseline levels slower than for the other two methods, reflecting the lower limit of detection for this system. The intra-participant coefficient of determination (R²) values for the terpene washout model were typically greater than 0.90 (data not shown), with the aggregated data across each analytical platform yielding R² values of 0.99 for both PTR-TOFMS and GC-IMS, and 0.98 for GC×GC-TOFMS (see table S5). This indicates a high degree of reproducible sampling and analytical precision. The incorporation of intra- and inter-participant variability into the design allows the different platforms to be tested against a range of participant types such that the outcome is not unduly affected by the results from any one individual.
Overall, inter-individual differences, indicated by high RSDs (listed in table S4), were in a similar range between the platforms. Thus, biological variability was large in comparison to the influence of sampling and analytical procedures, as previously indicated by Henderson et al [8].
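As a brief illustration of the RSD metric referred to above, the following sketch computes a relative standard deviation (sample standard deviation divided by the mean, in percent). The helper name `rsd_percent` is hypothetical, and the per-participant values are invented placeholders, not data from table S4. Only the three platform means (448, 448 and 372 min) are taken from this study; treating their spread as a measure of inter-instrument variability is an assumption of this sketch, not the authors' stated procedure.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation in percent (sample SD / mean * 100)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Mean washout times per platform as reported in this study (min):
# PTR-TOFMS, GC-IMS and GC×GC-TOFMS.
platform_means = [448.0, 448.0, 372.0]
inter_platform_rsd = rsd_percent(platform_means)

# Hypothetical per-participant washout times on one platform (min);
# placeholders for illustration only, not values from table S4.
participant_times = [250.0, 400.0, 520.0, 610.0]
inter_individual_rsd = rsd_percent(participant_times)

print(f"inter-platform RSD:   {inter_platform_rsd:.1f} %")
print(f"inter-individual RSD: {inter_individual_rsd:.1f} %")
```

Comparing the two figures in this way mirrors the argument in the text: if the spread between participants is clearly larger than the spread between platforms, biological variability dominates.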

Study limitations and suggested improvements
The Peppermint Experiment was originally proposed as a benchmarking concept to facilitate comparisons in breath sampling and analysis procedures across different laboratories and thereby provide a metric for quality assurance in breath datasets. Since its initial introduction, the experiment has been performed by numerous independent research laboratories, whereby several challenges as well as shortcomings in the protocol have become apparent. In particular, larger sample sizes consisting of different cohorts would allow the extensive variations in inter- and intra-individual washout profiles to be explored in greater detail. The complex composition of the peppermint oil capsule, which contains several volatile terpenes, results in a broad biological variability in their metabolism: on the one hand, ADME varies widely between individuals, and on the other hand, individual terpenes follow different biological pathways in the body. These pharmacokinetic differences were indicated in this study by the different times to peak within replicate measurements performed by the same individual and between participants. However, the degree of variation needs to be further investigated in future work in order to better characterize influencing factors and gain knowledge on how these can be minimized or accounted for.
The present study expanded upon the original Peppermint Experiment by introducing triplicate sampling sessions per participant and using three analytical platforms for each sampling point for comparative analysis of breath samples. The ensuing data reveal a number of aspects of the experiment that could be optimized to improve its suitability as a benchmarking procedure. First, imposing stricter dietary restrictions, for instance, by enforcing a standardized diet prior to the measurement day and/or prohibiting food intake during the study, would limit the number of datasets that do not fulfil the inclusion criteria. The latter is especially important; in the present study there were several cases in which the washout profiles of individual target compounds exhibited additional fluctuations relating to food/beverage intake by the participant during the trial. The additional presence of these target compounds from dietary sources masked the typical washout kinetics from the peppermint oil alone, thus requiring that these datasets be excluded from further processing. Compliance with prior restrictions, e.g. on food/beverage consumption or teeth brushing before the study commences, could similarly be assessed more strictly to avoid related confounders. Second, the use of a larger cohort than the standard n = 10 would lend greater statistical power to the ensuing washout values. Third, although not directly derivable from the present data, introducing shorter sampling intervals (more frequent sampling times), especially in the timeframe where the most significant increase is expected (between 0 and 165 min post-ingestion), would be beneficial in order to determine the true maximum exhaled concentration of terpenes, as discussed in a previous study [22]. At the same time, extending the trial beyond 360 min post-ingestion, e.g. to 480 min, would accommodate delayed washout of specific terpenes in some individuals and allow corresponding datasets to be included in calculations of group washout values. Any potential extension of the sampling period beyond 6 h, however, must be balanced with the additional burden on participants, especially if fasting conditions are to be observed. Finally, future studies could randomize the sampling order across the three platforms to explore any potential bias imposed by the sampling sequence on the respective datasets and ensuing washout values.
As a final point of consideration, differences in the terpene compositions of different peppermint oil capsule products, both in terms of the compounds and their concentrations, will have an impact on the washout kinetics. Given that products vary between manufacturers and countries, as was observed here between the new peppermint oil capsule and the discontinued product used in previous studies, this limits the degree of comparability between studies using different products. Although washout profiles should generally follow similar kinetics, dose-response effects could affect washout times. This presents a challenge in inter-comparability between independent trials that should be addressed in dedicated future studies of the Peppermint Experiment.

Conclusion
Despite the attractive notion of breath-based diagnostics, the development and implementation of viable and effective tests has been limited to date. A major challenge in breath research is aligning outcomes of studies performed using different sampling approaches and/or analytical platforms, which makes direct comparisons difficult. The Peppermint Experiment was proposed to provide a standardized procedure and numerical metric by which method performances may be assessed in relation to community benchmark values. In this work, the Peppermint Experiment was carried out using three different analytical platforms to determine the inter-comparability of data outputs from common breath samples, reflecting current efforts in the field of breath research. Following ingestion of a peppermint oil capsule, the exhaled concentrations of the terpene constituents reached their peak between 60 and 165 min post-ingestion. Despite the differences between sampling and analysis techniques, all platforms returned comparable precision (R² of 0.98-0.99) and similar sensitivities (−1.29 ± 0.07, −1.03 ± 0.04 and −1.36 ± 0.08 for PTR-TOFMS, GC-IMS and GC×GC-TOFMS, respectively). Mean washout values derived from the PTR-TOFMS, GC-IMS and GC×GC-TOFMS datasets were 448 min, 448 min and 372 min, respectively.
A major observation in this work was the high degree of inter- and intra-individual variability in washout profiles, which was large compared to inter-instrumental variations. This suggests that biological variability, specifically the pharmacokinetics, plays a dominant role in the ensuing washout values, due to differences in how the peppermint oil and its constituents are metabolized and excreted from the body via breath.
The present study highlights the need for further investigations and developments in the use of the peppermint protocol as a benchmarking tool in order to verify these preliminary benchmark values, with additional work needed to fully describe the effect of the biological variation observed and the influence of individual factors, such as food intake, diet, and physiology. Overall, this study indicates that breath volatile concentrations measured by different analytical platforms are highly comparable when adequate sampling methods and quality assurance procedures are in place. The observed large intra- and inter-individual variations underline the challenges in breath research relating to biological variability and support the concept of adopting longitudinal breath analyses for prospective breath tests, whereby individuals act as their own control to monitor perturbations in breath volatile profiles over time.

Data availability statement
The data cannot be made publicly available upon publication because they are not available in a format that is sufficiently accessible or reusable by other researchers. The data that support the findings of this study are available upon reasonable request from the authors.