Applied machine learning for stroke differentiation by electrical impedance tomography with realistic numerical models

Electrical impedance tomography (EIT) may have potential to overcome existing limitations in stroke differentiation, enabling low-cost, rapid, and mobile data collection. Combining bioimpedance measurement technologies such as EIT with machine learning classifiers to support decision-making can avoid the reconstruction challenges commonly faced due to the nonlinear and ill-posed nature of EIT imaging. Therefore, in this work, we advance this field through a study integrating realistic head models with clinically relevant test scenarios, and a robust architecture consisting of nested cross-validation and principal component analysis. Specifically, realistic head models are designed which incorporate the highly conductive layers of cerebrospinal fluid in the subarachnoid space and ventricles. In total, 135 unique models are created to represent a large patient population, with normal, haemorrhagic, and ischemic brains. Simulated EIT voltage data generated from these models are used to assess the classification performance of support vector machines. Parameters explored include driving frequency, signal-to-noise ratio, kernel function, and composition of binary classes. Classifier accuracies at 60 dB signal-to-noise ratio, reported as mean ± standard deviation, are (79.92% ± 10.82%) for lesion differentiation, (74.78% ± 3.79%) for lesion detection, (77.49% ± 15.90%) for bleed detection, and (60.31% ± 3.98%) for ischemia detection (after ruling out bleed). The results for each method were obtained with statistics from 3 independent runs with 17,280 observations, polynomial kernel functions, and feature reduction of 76% by PCA (from 208 to 50 features). While the results of this study show promise for stroke differentiation using EIT data, our findings indicate that the achievable accuracy is highly dependent on the classification scenario, and application-specific classifiers may be necessary to achieve acceptable accuracy.


Introduction
RELIABLE stroke type differentiation is the first step in planning treatment for stroke. Faster differentiation leads to faster treatment, which leads to better patient outcomes [1].
Stroke occurs when blood flow in the brain is interrupted. Stroke is divided into two types based on the two fundamental causes of blood flow interruption. The more prevalent type, comprising ∼85% of cases, is ischemic stroke [2]. In acute ischemic stroke (AIS), a blood vessel (usually an artery) is obstructed by thrombosis or embolism [3], resulting in tissue ischemia. In contrast, acute hemorrhagic stroke (AHS) is caused by a ruptured vessel, usually due to aneurysm or arteriovenous malformation [4].
The result of the rupture is reduced local blood pressure, increased intracranial pressure, and pooling of blood in the brain [4].
The standard of treatment for AIS is thrombolytic therapy with recombinant tissue plasminogen activator (tPA) [5,6]. However, of the eligible AIS patients who arrive for medical attention within 2 h of symptom onset, 25% do not receive tPA within the directed treatment window [7]. While the treatment window has been extended to 4.5 h from symptom onset [5], the most important exclusion for tPA treatment is delay in arrival for medical attention [8], with as high as 69% of AIS patients in one population arriving too late to meet eligibility [9].
Crucially, tPA is contraindicated in AHS patients and can worsen outcomes [8]. Therefore, it is critical that bleeding in the brain is rapidly and confidently identified or excluded. Clinical presentation is not enough to determine the type of stroke, so imaging is needed [3,10], and at present there is no method for rapidly detecting stroke and differentiating stroke type. Magnetic resonance imaging (MRI) and x-ray computed tomography (CT) are slow and expensive [11], and CT suffers from a delay (3+ hours) in the appearance of ischemic stroke markers [12]. Some early-stage technologies have been investigated to date for rapid stroke triage in an ambulance. For example, mobile CT [13], transcranial ultrasound [14], and microwave-based imaging [15] have all been studied. However, limited development and differentiation capability coupled with high costs have inhibited the adoption of these platforms. To this end, electrical impedance tomography (EIT) is an emergent imaging modality that has demonstrated potential in this area [16]. It relies on the difference in electrical conductivity between tissues of different types. For stroke in particular, the higher conductivity of blood relative to healthy brain tissue may enable the detection of AHS, and the reduced conductivity caused by blood flow restriction may support detection of AIS.
EIT hardware is compact, low complexity, and low cost, particularly compared to the present leading stroke imaging modalities. These advantages enable portability and accessibility, and thereby stroke detection and differentiation in non-hospital environments such as ambulances. Furthermore, EIT measurements are quick (on the order of minutes) and could be used for regular or constant monitoring due to the absence of ionizing radiation. While EIT has some limitations (like low image resolution and low sensitivity to the phenomena of interest) [17], recent advances in machine learning (ML) methods can support data interpretation without the need for an image and aid decision-making by uncovering subtle underlying patterns [18]. Although ML techniques are not the only strategy for stroke differentiation using absolute EIT data, the severely ill-posed nature of the problem makes human interpretation alone unnecessarily difficult, a burden that ML support has been shown to ease [19,20]. As ML classifiers can leverage subtle patterns in measurement data that may be imperceptible to human interpretation alone, ML techniques may offer the best possibility of improved screening accuracy.
We identified 22 studies related to EIT-based stroke detection, 8 of which investigated some form of stroke differentiation using an ML classifier [21-28]. Of these works, none attempted classification using models with more than 4 layers and more than 4 lesion locations, or studied the impact of including ventricles in the model. Only one considered more than one case of binary class composition [24]. At the time of this work, [28] was the most advanced study to date on stroke classification using raw EIT data.
To better assess the feasibility of EIT data for stroke detection and differentiation, we provide an advanced numerical study with attention dedicated to parameters not previously considered in numerical assessment of EIT for stroke. The present study: (i) provides the most realistic numerical phantoms to date used in EIT-based classification studies by incorporating the ventricles along with the cerebrospinal fluid (CSF); (ii) utilizes a more comprehensive set of 16 stroke locations, including very small (<5 ml) volumes and deep brain sites; (iii) quantitatively compares four imaging frequencies, five noise levels, and ML considerations including kernel function and PCA; and (iv) uniquely considers all four clinically relevant methods of binary classification. These advances are important because the realism and representative nature of numerical models is vital to assessing their utility in real-life scenarios. Furthermore, understanding how the composition of the binary classes for ML classification affects accuracy, sensitivity, and specificity supports identifying the appropriate use case for EIT measurements in the stroke patient pathway.
This paper is organized as follows. Section 2 describes the design of each layer of the 3D models in detail. In section 3, methods for the production and processing of the simulated EIT raw data are presented, followed by explanation of the binary classification architectures. Section 4 follows with analysis of the classification performance across a number of variables. Finally, a summary and concluding remarks are found in section 5.

Model design
Models play a crucial role in feasibility and performance assessment for imaging methods and classification algorithms. In general, anatomical accuracy and diverse, well-bounded variation are important criteria in human model development to better approximate performance in a real-life setting. Work in [42] demonstrated that model complexity and the precision of representation of the fine morphology within the head can have dramatic effects on EIT measurements. For example, nonlinear effects on the sensitivity with respect to lesion location are observed between models with differing layer complexity.
We note that some prior EIT studies have produced complex head models (e.g., with more than 5 layers [29,30,32] or up to 12 lesion locations [41]); however, these did not attempt stroke classification. Additionally, aside from the early pilot studies [29] and [30], the ventricles have been largely overlooked in head models for EIT despite their complex structure and substantial conductivity relative to other intracranial tissues. Therefore, in this work, we aim to introduce head models that are realistic and diverse enough to enable classifier generalizability. To help fulfill these criteria, 3D model layers derived from structural MRI scans were employed for anatomical accuracy.
The head and brain stereolithography (STL) files from [43] and [44], respectively, were adapted using computer-aided design software as detailed in this section. To accurately capture the variation in tissue morphology in the head, these model layers were modified based on statistical studies from the literature. Finite element models (FEM) were generated using the software package Electrical Impedance Tomography and Diffuse Optical Tomography Reconstruction Software (EIDORS) [45] aided by Gmsh [46] and Netgen [47]. The layer permutation and FEM generation process is based on the workflow initially provided in [28], which is built on EIDORS with support from [48] and [49]. Significant development of this framework was executed to satisfy the needs of this study.

Anatomical structure and model variants
The anatomical structure of the human head is affected by some intrinsic factors, namely sex and age. For example, brain volume in women is comprised of a higher percentage of gray matter and a lower percentage of CSF than in men [50]. While total volume and relative size of some features are the most prominent sex-based characteristics, higher healthy ventricle volume variance is reported in males [51,52]. Furthermore, intracranial anatomy is notably impacted by aging, largely as a decrease in gray and white matter volume [52,53]. Brain anatomy is highly symmetric in general [51], though individual healthy anatomy can vary substantially [51].
Although it is infeasible to replicate the full breadth of human anatomical variation, this variation is modeled using appropriate mean values and ranges for all tissues. To maximize generalization of the model, three model variants of different sizes were generated for each layer to comprise each layer group (nine for the outermost). One combination of a single layer from each group comprises one model set or permutation. A visual representation of these definitions is depicted in figure 1. For all layer groups, the smallest variant was designed to approximate the volume one standard deviation below the female average; the largest, one standard deviation above the male average; and the baseline, either the overall mean or a volume bisecting that of the small and large. The scale between layer variants within each group varies by tissue type, and each is addressed in the following subsections.
Ultimately, there are 135 healthy permutations that may represent individual patients with different anatomies. There are 2,160 lesion variants, including those modeling bleeds and clots of different sizes and at different locations.
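These counts follow directly from the layer groups described in the following subsections (9 head variants, 3 paired brain/CSF variants, 5 ventricle variants, and 16 lesion sites per stroke type); a minimal sketch of the bookkeeping:

```python
# Model-permutation bookkeeping implied by the layer groups in this section.
head_variants = 9        # 3 sizes + 6 single-dimension scalings
brain_csf_variants = 3   # baseline and ±5%; CSF layer paired 1:1 with brain
ventricle_variants = 5   # 3 sizes + 2 rotated baselines
lesion_sites = 16

healthy_permutations = head_variants * brain_csf_variants * ventricle_variants
lesion_permutations = healthy_permutations * lesion_sites

print(healthy_permutations)  # 135
print(lesion_permutations)   # 2160
```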
The following subsections present measured anatomical properties cited in the literature for each tissue type, followed by the approach used in this study to generate models which reflect these properties. All model layers were scaled to 59.60% by volume (∼83.87% in each dimension) to reduce computational complexity in the FEMs for faster simulation, as in [35]. Computation time is thus reduced as the number of tetrahedra (between 270,000 and 298,000 depending on permutation) is determined by setting the maximum size of the elements in EIDORS.

Scalp and skull aggregate
The decision to aggregate the scalp and skull layers was made to concentrate model complexity and variance on internal structures, as it has been shown numerically that internal tissue conductivities can be accurately retrieved despite the independent impacts of the skull and the scalp [54]. Furthermore, a preliminary analysis that compared simulated voltages generated from models utilizing an aggregate layer with models using discrete layers for scalp and skull demonstrated that the primary impact is an attenuation factor. This simplification has also been applied in other studies [27,28,38]. As these layers are close to the measuring electrodes, employing designated layers for scalp and skull should be considered in future studies to further increase the approximation to reality.
The head STL file (originally adapted from a polygon mesh [43]) was modified to generate the baseline head layer with approximately average dimensions. From a sample of >2,200 subjects, the length and width of human heads are (19.52 ± 0.92 cm) and (15.14 ± 0.69 cm) respectively, with only very small differences in mean when considering sex or race [55]. Two head layers of different sizes were generated from the baseline head model: one larger and one smaller, as outlined above. Finally, six additional head layers of different shapes were generated by scaling the baseline model in single dimensions as in [28], resulting in 9 total head variants.

Brains
Adult mean total brain volume is approximately (1260 ± 115 ml) in males and (1130 ± 99 ml) in females, excluding the volume of the CSF and ventricles [51]. These mean values are relatively well established, but as with most anatomical estimates, there is a range of reported values.
The baseline brain STL was produced by reducing the mesh of a 3D file generated from structural MRI [44], as in [28]. This baseline model was scaled up 5% and down 5% in all dimensions to generate two variants. The realistic volume of the baseline brain is 1170 ml including the ventricles.

Cerebrospinal fluid and ventricles
The primary consideration behind including cerebrospinal fluid (CSF) in the model is its experimentally known and relatively high conductivity. Because this highly conductive material surrounds the brain, some of the injected current is shunted around the brain and does not contribute to the measurement of tissue conductivity within the brain. Other studies have stressed the value of including CSF in models where tissue conductivity impacts measurements [35,42,54,56].
The thickness of the CSF in the subarachnoid space (SAS) varies widely due to the folded surface of the brain. Based on structural MRI from 23 healthy subjects (mean age: 25.0 ± 2.8 years), CSF thickness within one standard deviation (SD) of the sample mean is estimated to range from 1.1 mm to 8.4 mm, with an overall mean thickness of 4.2 mm for the portion of the forehead analyzed in [58].
To model CSF in the SAS for each brain variant, the respective brain model was scaled up to produce a shell with a volume of ∼100 ml at full scale. The mean thickness of the full-scale baseline CSF layer is 3.4 mm (model depicted in figure 2). This approximation neglects the inhomogeneous distribution of CSF in the SAS. Each of the three CSF variants is paired with its respective brain variant for all simulations to ensure a realistic CSF volume for all permutations. This layer therefore does not increase the overall number of model permutations.
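As an illustration, assuming a uniform isotropic scale-up of the 1170 ml full-scale baseline brain, the scale factor required to enclose a ~100 ml shell can be computed directly (a sketch only; the actual CAD-built shell is not perfectly uniform):

```python
# Isotropic scale factor s such that s^3 * V - V = 100 ml for the
# 1170 ml full-scale baseline brain (uniform-shell assumption).
brain_ml = 1170.0
shell_ml = 100.0
scale = ((brain_ml + shell_ml) / brain_ml) ** (1.0 / 3.0)
print(round(scale, 4))  # ~1.0277, i.e. a ~2.8% scale-up per dimension
```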
The baseline ventricles were modeled from the same MRI-derived brain model [44]. The cerebral aqueduct and interthalamic adhesion are neglected due to the narrow dimensions of these features. Two size variants were produced by scaling to ±20% in all dimensions. The approximate total realistic volumes are 9.8 ml, 19.2 ml, and 33.1 ml, showing agreement with mean and SD estimates from [51]. For another degree of diversity, the baseline ventricles were rotated in the coronal plane clockwise and counterclockwise to simulate ventricular positional asymmetry. The degree of asymmetry is exaggerated from typical variation [51] to explore the possible effect in later studies. The ventricles layer group is depicted in figure 3.

Lesions
Clinically informed factors for lesion shape, location, and volume were considered when generating the lesion models, as discussed in [59]. Similar studies typically employ 6 or fewer locations [23-28], which limits the scope and generality of the classifier. Furthermore, it is common to explore only shallow or large-volume lesions, which does not thoroughly represent realistic stroke cases that a detection and monitoring device would likely encounter.
Therefore, 16 discrete lesions were designed with a wide distribution of locations including shallow and deep brain sites [3]. An emphasis was placed on small lesions to represent more difficult detection scenarios.
Lesion volumes up to 30 ml were included as the rate of recovery in patients with lesions of this size is still high [60]. Of the 16 variants, three are 30 ml, seven are 10 ml, and six are 5 ml in volume, as depicted in figure 4. As lesions are typically spherical or ellipsoid in shape due to the physiological mechanisms of stroke [61], all lesions are spherical. As stated, there are 2,160 lesion scenarios considered for each of the two stroke types.
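Since all lesions are spherical, each lesion's radius follows from its volume via r = (3V/4π)^(1/3); a quick check of the physical sizes involved (1 ml = 1 cm³):

```python
import math

def sphere_radius_cm(volume_ml):
    """Radius in cm of a sphere with the given volume in ml (1 ml = 1 cm^3)."""
    return (3.0 * volume_ml / (4.0 * math.pi)) ** (1.0 / 3.0)

# The three lesion volumes used in this study
for v_ml in (5, 10, 30):
    print(f"{v_ml} ml -> radius {sphere_radius_cm(v_ml):.2f} cm")
```

Even the largest (30 ml) lesion has a radius under 2 cm, which illustrates why the small 5 ml sites represent demanding detection scenarios.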

Layer properties
Frequencies in the range of 5 Hz-2000 Hz were selected for the study in keeping with the clinical stroke work from UCL [62]. Tissue conductivity in this frequency range is based on the IT'IS Database [63], and the assigned properties are plotted in figure 5. Assumed error in tissue conductivity does not have a major impact on performance [24]. Therefore, tissue conductivity variance was not a factor explored in this study.

Scalp and skull aggregate
The conductivity of the skull and scalp aggregate was approximated by the average proportional thickness of each layer. The inner cancellous bone (diploë in cranial bone) is more conductive than the surrounding cortical bone [64]. Average overall cranial thickness is (6.29 ± 1.41 mm), neglecting the distribution of the cranial regions [65], and the average cortical thickness is (3.28 ± 0.479 mm) [66]. The thickness of cancellous bone represents approximately 30%-55% of the skull [64,66], neglecting skull thickness variations and sexual dimorphism.
Average total scalp thickness in adults is 3-4 mm, with one study finding mean and variance of (3.71 ± 0.438 mm) [67]. Therefore, the total thickness of the skull and scalp is approximately (10.00 ± 1.05 mm). Based on these thickness ratios and the reported conductivities from [63], a thickness-weighted estimate of the aggregate conductivity for the skull and scalp alone is between 0.028 and 0.031 S m−1 from 5 to 2000 Hz. For the models used in this study, a conductivity of 0.05 S m−1 was selected for the outermost layer in consideration of the multitude of other tissues that comprise the head and face, including blood, muscle, fat, and tendon, all of which have higher conductivity than the calculated range reported above. This decision is further supported by [28], in which an even higher value of 0.1 S m−1 is used for the outermost aggregate layer.
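The thickness-weighted averaging described above can be sketched as follows. The thicknesses reflect the mean values quoted in this subsection; the per-layer conductivities are illustrative placeholders only (not the IT'IS values used in the study), so the computed number is not the study's estimate.

```python
# Thickness-weighted aggregate conductivity for the scalp/skull layer.
# Thicknesses (mm) follow the means quoted above; conductivities (S/m)
# are PLACEHOLDERS for illustration, not the IT'IS database values.
layers = {
    "scalp":      (3.71, 0.20),    # placeholder conductivity
    "cortical":   (3.28, 0.005),   # placeholder conductivity
    "cancellous": (3.01, 0.02),    # 6.29 mm skull minus cortical; placeholder
}
total_thickness = sum(t for t, _ in layers.values())        # 10.00 mm
sigma_aggregate = sum(t * s for t, s in layers.values()) / total_thickness
```

The aggregate value always falls between the smallest and largest per-layer conductivities, consistent with the study's choice of 0.05 S m−1 sitting above the skull-scalp-only range once other, more conductive head tissues are considered.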

Brains
Measured dielectric properties of human brain tissue are taken from [63,68]. Gray and white matter have slightly different conductivities in the frequency range of interest, and white matter is significantly more anisotropic than gray matter [68]. Therefore, the average conductivity of the brain depends on the composition of the brain, which is dependent on factors such as age and sex [50,69].
As stated, because assumed error in tissue conductivity does not have a major impact on performance [24], and to maximize generality of the model, the complexities of differences in brain tissue conductivity were neglected. The brain was treated as an aggregate of gray and white matter, and the frequency-dependent conductivity values for aggregate brain tissue from [63] were used.

CSF and ventricles
The conductivity of human cerebrospinal fluid at body temperature is well documented at 1.79 S m−1 [56,70]. It is approximately constant at low frequencies, and as an ionic liquid, its conductivity is substantially higher than that of other biological tissues. The same conductivity was assigned to the SAS CSF and all four ventricles.

Hemorrhagic
Blood has a constant conductivity of 0.7 S m−1 in the frequency range of interest [63]. In clinical settings, hemorrhagic stroke volume is measured as the volume of extravasated blood. However, the spherical volumes in this study are defined only by the conductivity of the region, and the blood in the hemorrhagic lesion site must share physical space with the preexisting brain tissue. Therefore, the volume was treated as approximately 50% blood and 50% brain tissue, and the conductivity in the lesion volume as the average of these tissues. The conductivity for hemorrhagic lesions was therefore approximated as 0.35 S m−1 at all frequencies. Furthermore, because the conductivity of blood is high relative to healthy brain tissue, achieving good classification performance with this reduced conductivity contrast further supports the hypothesis that EIT and machine learning can reliably detect and monitor for stroke in realistic scenarios.

Ischemic
Ischemia is the restriction of blood supply, so a rough estimate of the low-frequency conductivity of ischemic brain tissue follows directly from the spectral conductivity of white and gray matter. Ignoring the complex tissue composition in the mid-brain, the approximate ratio of gray to white matter is 1.4-1.5 [69]. Accepted approximations of the vascular volume for each are (5.2 ± 1.4%) and (2.7 ± 0.6%) respectively [71], resulting in a rough total mean brain vascularity of (4.2 ± 1.1%).
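The weighted-mean vascularity quoted above can be reproduced directly from the cited ratios (using 1.45 as the midpoint of the 1.4-1.5 gray-to-white range):

```python
# Gray/white-weighted mean brain vascularity from the cited figures:
# gray:white volume ratio ~1.45, vascular fractions 5.2% (gray), 2.7% (white).
gray_to_white = 1.45
vasc_gray_pct, vasc_white_pct = 5.2, 2.7
vascularity_pct = (gray_to_white * vasc_gray_pct + vasc_white_pct) / (gray_to_white + 1.0)
print(round(vascularity_pct, 1))  # 4.2
```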
This vascularity is approximated as a 5% reduction in brain conductivity at each frequency. The estimate is based on blood's conductivity being significantly higher than that of brain tissue (>5× at all frequencies) and the assumption that all vasculature in the affected volume is devoid of blood during an AIS event. While there is limited in-vivo measurement of ischemic tissue, particularly in humans and at low frequency, animal studies have shown an in-vivo conductivity decrease of 10%-14% at frequencies below 100 Hz [72].
As tissue ischemia takes time to develop (from 3 to 24 h in most cases [12], up to and beyond 48 h [73]), a 5% conductivity decrease is a more reasonable estimate for clot-affected brain tissue in the early stages of stroke when EIT-based imaging would be used for stroke detection.

Methodology
In EIT measurements, a number of electrodes are placed around the imaging object in a specified configuration, typically in one or multiple rings or according to electroencephalography (EEG) standards. A low-frequency sinusoidal current is injected between a pair of electrodes while voltages are measured between two other electrodes. Each pair of measurement electrodes is referred to as a channel. The pair of measurement electrodes is then cycled to another pair, which repeats until all channels have been measured. The injection pair is then changed, and the measurements repeated. The set of distinct injection and measurement pairs constitutes a protocol, and the complete set of voltage measurements comprises a frame of data, where the number of measurements is equal to the number of injection pairs multiplied by the number of channels. The voltage measurements at the injection pair are typically excluded from the frame [62].
In adherence to the most advanced experimental work to date on this topic [62], our study follows the same methods to generate simulated EIT data. Thirty-two electrodes are arranged corresponding to EEG standards as in [62], with 18 locations from the original 10-20 system, 12 from the 10-10 extension, and 2 from the 10-5 extension. Current injection follows the same protocol and amplitudes as [62] at select frequencies, with amplitudes of 45 μA at 5 Hz and 20 Hz, 90 μA at 200 Hz, and 280 μA at 2 kHz. Each simulated frame consists of 930 independent measurements at one frequency after excluding measurements on injection channels.
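To make the frame-size arithmetic concrete, the sketch below counts measurements for a simple adjacent-drive, adjacent-measure ring protocol, excluding any measurement pair that shares an electrode with the injection pair. This toy protocol yields N(N−3) measurements (928 for N = 32); the EEG-based protocol of [62] used here differs, which is why the frames above contain 930 independent measurements rather than 928.

```python
def adjacent_pairs(n):
    """Adjacent electrode pairs around a ring of n electrodes."""
    return [(i, (i + 1) % n) for i in range(n)]

def count_measurements(n_electrodes):
    """Count measurements for an adjacent-drive, adjacent-measure protocol,
    excluding measurement pairs that share an electrode with the injection."""
    injections = adjacent_pairs(n_electrodes)
    channels = adjacent_pairs(n_electrodes)
    return sum(
        1
        for inj in injections
        for ch in channels
        if not set(inj) & set(ch)   # drop pairs touching an injection electrode
    )

print(count_measurements(32))  # 928 == 32 * (32 - 3)
```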

Numerical model
The STL files described in the previous section were converted to finite element models (FEM) of between 270,000 and 298,000 tetrahedra each, depending on model permutation. This was performed using EIDORS [45] aided by Gmsh [46] and Netgen [47] and code adapted from the framework introduced in [28]. After meshing the outermost layer [48], the inner layers are inserted serially, removing the nodes of the outer layer within the boundaries of each inner layer [49]. EIDORS was then used to generate simulated EIT voltage data from the resulting FEMs.
Specifically, three sets of FEMs were generated at each frequency: one set with healthy models, one set with hemorrhagic lesions (bleed), and one set with ischemic lesions (clot). The healthy set comprises the 135 model permutations representing unique patients without lesions, while the two stroke sets each consist of 2160 permutations (the same patients with each of the 16 lesions). Conductivities were assigned to the elements of each layer as plotted in figure 5. The EIT electrodes were meshed into the outer layer of the model with a locally refined mesh. An example of a FEM from the healthy set is depicted in figure 6.

Data production
While the model variants account for patient-to-patient anatomical variability, measurement variation is necessary to represent various sources of measurement-to-measurement noise, such as electrode placement, which can severely affect performance [24], or subject movement [16]. This was achieved by adding random Gaussian noise to the simulated voltage data at various signal-to-noise ratios (SNR).
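A minimal sketch of this noise model, assuming (as is common) that the noise power is set relative to the mean signal power of the frame; the study's exact implementation may differ:

```python
import numpy as np

def add_noise(frame, snr_db, rng=None):
    """Add zero-mean Gaussian noise to a voltage frame at a target SNR (dB).

    Noise power is referenced to the mean signal power of the frame
    (an assumption; the paper does not specify the reference power).
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(frame ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return frame + rng.normal(0.0, np.sqrt(noise_power), size=frame.shape)

# Sanity check: empirical SNR on a long synthetic frame
frame = np.ones(10_000)
noisy = add_noise(frame, 20.0, np.random.default_rng(0))
est_snr_db = 10.0 * np.log10(np.mean(frame ** 2) / np.mean((noisy - frame) ** 2))
```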
The voltages were calculated using EIDORS tools to forward solve the complete electrode model (CEM), a set of boundary conditions widely accepted as sufficiently comprehensive for EIT [74]. This model includes the impact of discrete electrodes, the current shunting effect from the electrodes themselves, and the effective contact impedance z_l, which is assigned a value of 0.01 Ω as in [48]. For each model permutation, 250 frames of simulated EIT data were generated. This data generation was repeated for four frequencies and six SNRs, resulting in 250 frames per model permutation at frequencies of 5 Hz, 20 Hz, 200 Hz, and 2000 Hz; and SNRs of 10 dB, 20 dB, 30 dB, 40 dB, 60 dB, and 80 dB. These parameters were selected in alignment with the frequencies and SNR ranges reported in studies focused on clinical measurements [62,75]. The goal of this large dataset was to evaluate classification performance as a function of EIT excitation frequency and measurement SNR. It should be noted that there are many other factors to consider that may affect EIT measurements beyond those shown in the data presented in this study. For example, it is well established that electrode position errors can dramatically alter EIT measurements [16,76], thereby confounding even a well-tuned classifier. Practical methods of addressing this issue proposed thus far include algorithmic approaches [33,37,39], image guidance [77], or fixing electrodes relative to one another in a rigid structure to minimize interelectrode positioning errors. EIT in general is reportedly robust to tissue conductivity spectra [26] and highly robust to errors in electrode contact impedance [33,76].

Machine learning model
The applied machine learning (ML) model classifies the simulated EIT frames based on the condition presented by the respective model (healthy, bleed, or clot), where each frame is an observation. The frames are labeled according to patient type, and the voltage measurements serve as features for the classifier. Supervised learning models are trained with labeled data, and performance is evaluated on unseen measurements. In cross-validation (CV), the training and evaluation steps are repeated multiple times with a new subset of the measurements serving as the unseen data in each iteration. The model with the best average performance is then taken as the final model, which reduces overfitting and results in an optimized model. The applied classifier is based on a support vector machine (SVM) trained with nested CV as depicted in the flow chart in figure 7. One full sequence of this nested structure with a particular model subset is referred to as a run of the simulation.
For a single run, a random subset of the 250 noisy frames is selected for each model permutation and patient type (i.e., normal, bleed, and clot). The subset consists of 2 or 4 frames per unique model permutation per lesion type depending on the classification method (as explained in section 3.4), such that the total number of observations among both classes is 17,280. The classes are balanced for all architectures to avoid bias toward any one patient type. In cases where one class consists of two patient types, the number of observations from each is also equal. As the full set of unique permutations with lesions has 16 times as many frames as the healthy set, the number of randomly selected healthy frames for each permutation is 32 or 64. The randomly selected subsets serve as the observations for this run and are then split into training and test folds.
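Our reading of this balancing scheme, in numbers (frame counts per permutation as stated above):

```python
# Balanced-observation bookkeeping for one run (our reading of the text).
healthy_perms, lesion_perms, lesion_sites = 135, 2160, 16

# Lesion-vs-lesion (e.g., bleed vs clot): 4 frames per lesion permutation
# per class; 2 frames apply when a class combines both lesion types.
frames_per_lesion_perm = 4
per_class = lesion_perms * frames_per_lesion_perm
print(per_class * 2)  # 17280 total observations across both classes

# Healthy-vs-lesion: the healthy class draws from only 135 permutations,
# so each healthy permutation must contribute 4 * 16 = 64 frames to balance.
print(per_class // healthy_perms)  # 64
```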
In this work, we present a novel machine learning architecture that aims to mimic the clinical application of this classifier by composing the isolated test folds strictly of unseen patient-representative models and lesion locations. Previous studies have taken care to differentiate between the training and testing sets. However, they either specifically leave out single characteristics with the intention of testing their effect [27,28], or they make the testing set more varied than the training set [22]. Neither of these approaches mimics the real-life application, in which the ML classifier is trained on a varied dataset and then tested on a patient with novel characteristics. As a result, it is difficult to translate the results of these works to real-life performance.
The classifier consists of two CV loops, an inner and an outer CV loop. For each run, the random subset of frames (17,280 observations from 2,160 model permutations) is split into ten folds, each with a unique training set and testing set made up of 90% and 10% of the original data set, respectively. Principal component analysis (PCA) is performed on the training data to reduce the number of features for the classifier from 208 to 10. The transformative coefficients from the training data are then stored and later applied to the test-set data. This ensures there is no knowledge of the test-set data when performing PCA, avoiding data contamination. Performance metrics for the final classifier are obtained from testing the excluded test sets, and this procedure is repeated for all ten of the unique training-testing data pairs. The inner CV loop uses k-fold CV with nonstratified folds and k = 10 to optimize the SVM hyperparameters using a Bayesian optimization procedure with only the training data. The outer CV loop uses leave-p-out CV (LpOCV) to evaluate SVM performance on the isolated test fold. Experimentation with the number of LpOCV folds showed that an 8-fold LpOCV process provided sufficient samples to analyze fold-based performance statistics while maintaining a uniform number of holdout lesions in all test folds.
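The key leakage-avoidance step, fitting PCA on the training split only and re-using the stored coefficients on the test split, can be sketched with plain NumPy; the SVM training and Bayesian hyperparameter search are omitted here:

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Learn the PCA mean and components from the training split only."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Project data using coefficients learned from the training split."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 208))   # stand-in training fold
X_test = rng.normal(size=(20, 208))     # stand-in isolated test fold

mean, comps = fit_pca(X_train, 10)      # no knowledge of the test set
Z_train = apply_pca(X_train, mean, comps)
Z_test = apply_pca(X_test, mean, comps) # the stored transform is re-used
print(Z_train.shape, Z_test.shape)      # (100, 10) (20, 10)
```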
A cautious selection process for the test folds was developed to ensure no training-example leakage between the training split and the isolated test fold. In general, each LpOCV fold consisted of: (i) all frames from 8 of the 135 model permutations of all types (including their lesion variants); (ii) all lesion frames that contain either of 2 selected lesions (out of the 16 lesions); and (iii) an equal number of healthy frames (from the same permutation) such that the resulting classes are balanced. For each run, the model permutations and lesions in each test fold are selected randomly without repeats. In doing so, we are able to mimic the testing of this classifier on new, unseen patients for each outer fold.
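The selection rules above can be sketched in pure Python, using hypothetical frame records tagged only with model and lesion identifiers; the class-balancing step (iii) is omitted here:

```python
import random

# Hypothetical frame records (model_id, lesion_id); lesion_id None = healthy frame.
N_MODELS, N_LESIONS = 135, 16
frames = [(m, l) for m in range(N_MODELS) for l in [None] + list(range(N_LESIONS))]

rng = random.Random(0)
holdout_models = set(rng.sample(range(N_MODELS), 8))    # (i) 8 isolated permutations
holdout_lesions = set(rng.sample(range(N_LESIONS), 2))  # (ii) 2 unseen lesions

# A frame enters the test fold if it belongs to a holdout model
# or contains a holdout lesion; everything else may be trained on.
test_fold = [(m, l) for m, l in frames
             if m in holdout_models or l in holdout_lesions]
test_set = set(test_fold)
train_split = [f for f in frames if f not in test_set]
```

By construction, no training frame shares a holdout model or a holdout lesion with the test fold.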
Randomized data selection and composition of the splits could result in biased evaluation results. To limit the impact of this, three independent runs were performed for all parameter configurations with uniquely random selections and splits. Most results are therefore statistics from evaluation on 24 isolated test folds (3 runs of 8 LpOCV folds).
SVMs were selected for this work for their reportedly high performance in stroke classification [24, 78]. Moreover, SVMs are guaranteed to converge to the global minimum regardless of initial conditions due to their implementation of quadratic programming [79]. SVMs are preferred in instances where the feature space is not sampled thoroughly, as the SVM decision boundary maximizes the distance between the closest observations of opposing classes, reducing the risk of misclassifying new observations [80]. PCA was applied to reduce the dimensionality of the data (930 features per observation) while preserving maximum differentiability of the classes. A series of grid searches established the lowest number of principal components (PCs) for improved classification accuracy. To accommodate the high computational cost, these simulations were performed on the Stampede 2 supercomputer at the Texas Advanced Computing Center [81]. Average computation time was most dependent on SNR, kernel function, and binary classification method (as described in the following subsection).

Classification architectures
While the general goal in stroke triage is to assess and diagnose, the highest-priority objectives can depend on the clinical scenario. For example, ruling stroke in or out altogether (i.e., lesion detection) may advise clinicians to consider trauma; while, for a patient with clear signs of stroke, it may be higher impact to determine the type of stroke (i.e., lesion differentiation) or strictly establish the presence or absence of ICH (i.e., bleed detection). Therefore, it is important to consider all the possible clinical scenarios and to analyze whether different measurement parameters are better suited to certain scenarios than to others.
To evaluate such scenarios, four binary classification methods were implemented with SVMs, and the composition of the two classes in each of these methods is depicted in figure 8. Each method is given a descriptive name and a corresponding two-letter reference label to aid recognition of the method in the following figures and discussion.

Analysis of performance
Classifier performance is reported and discussed for the parameters analyzed in this work.These parameters include the binary classification methods, SVM kernel function, PCA and the number of PCs retained, SNR, and driving frequency.
For all results presented, the statistics are calculated from the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The generalized classification accuracy is calculated as

Accuracy = (TP + TN) / (TP + TN + FP + FN)

and is included for all parameters presented. As a result of the symmetric classes and the fact that all samples are strictly assigned a class, the accuracy values provided are equivalent to the F1 score. Also reported are the sensitivity (recall, or true positive rate) and the specificity (true negative rate). It should be noted that the distinction between sensitivity and specificity in terms of patient type depends on the classification method applied, as this determines the composition of each class as described in section 4.1. Sensitivity and specificity are calculated as

Sensitivity = TP / (TP + FN),  Specificity = TN / (TN + FP).
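As a minimal illustration of these three formulas (the confusion counts below are invented, not from the study):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall/TPR), and specificity (TNR) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# 200 observations total: 80 TP, 70 TN, 30 FP, 20 FN.
acc, sens, spec = metrics(tp=80, tn=70, fp=30, fn=20)
```

For these counts the function returns an accuracy of 0.75, a sensitivity of 0.80, and a specificity of 0.70.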

Classification method
The performance of the four classification methods is compared directly in figure 9 with all other parameters fixed. Statistics were calculated from the performance for each of the eight LpOCV folds across three independent simulation runs, each with a different composition of isolated models per fold. Therefore, each box-and-whisker illustrates the distribution in performance of 24 samples.
The behavior of the classifier among the four methods can be explained in part by the relative conductivity difference between diseased tissue in the hemorrhagic and ischemic cases. Indeed, clot detection (HC) is the most confounding for the classifier due to the subtle difference between the healthy brain and ischemic lesions. For all methods with hemorrhagic patients, the classifier had the highest predictive values for the class that contained the bleed subjects. In particular, the lesion detection method (HL) saw the highest proportion of correct predictions overall for the class containing a mix of both lesion types. The proportion of correct predictions for the healthy class is comparatively low due to a significant proportion of false negatives (lesions misclassified as healthy). This indicates a low sensitivity to lesion detection, but a high specificity (low FP rate).
In the case of lesion differentiation (BC), predictions were less skewed to one class. Surprisingly, correct identification of clot was more reliable than bleed, with only two samples having a high FN rate. These two folds occurred in separate runs and had unique holdout lesion pairs. For one fold, the holdout pair was two lesions (5 and 10 ml) on opposite sides of the brain in the temporal lobes; for the other, it was the two lesions approximately in the prefrontal cortex (one 5 ml and one 30 ml).
Finally, the overall accuracy is highest in the case of bleed detection (OB). It is apparent that, as expected, the identification of the higher-conductivity bleed samples is improved by grouping the healthy and clot cases into the same class.
A preliminary look into the misclassified lesions showed that assessment is not straightforward, with no universal trend. The most-missed lesion variant depends on the classification method, the composition of the holdout folds, and the type of stroke that lesion variant represents in a given simulation. For example, it would make sense for the deepest lesion, surrounded by the ventricles at the center of the brain, to most confound the classifier. Nevertheless, this lesion is not always the most difficult to detect.

Kernel selection
For an SVM, the kernel function projects the feature data onto a space of higher dimension, thereby improving the separability of the classes when a suitable kernel function is selected [82]. As the most realistic approach to kernel selection is trial and error [82], three common kernel functions were compared: linear, polynomial, and radial basis function (RBF). Statistics from three independent runs were compiled and are plotted in figure 10.
For all classification methods, the linear kernel provided suboptimal separation of the classes, although, for some individual runs, performance was deceptively high, which supports the multi-run analysis used in this study to obtain the most generalized case. Note the opposing trends of the sensitivity and specificity in figure 10 with respect to kernel, suggesting performance improvements may be possible with an ensemble classifier using more than one kernel.
For the methods with a strict healthy class (HC and HL), the RBF kernel suffered from overfitting. In particular, specificity with the RBF kernel was very poor, with a median below 50% for both methods, while the respective Clot and Lesion classes saw high sensitivity. In these methods, the polynomial kernel provided the most balanced performance. In contrast, it is not abundantly clear whether the RBF or polynomial kernel is superior in the BC or OB methods. On one hand, grouping is overall tighter for the RBF kernel, with higher sensitivity to the bleed class in both cases; on the other hand, median classification accuracy is highest with the polynomial kernel. While the RBF kernel has its merits, polynomial was selected as the optimal kernel for the most general case except where otherwise mentioned.
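A trial-and-error kernel comparison of this kind can be sketched with scikit-learn on toy data (a synthetic nonlinear boundary, not EIT data; the fold count and seed are arbitrary):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))                        # toy 2-D features
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # circular (nonlinear) boundary

# Mean 5-fold CV accuracy for each candidate kernel.
scores = {k: cross_val_score(SVC(kernel=k), X, y, cv=5).mean()
          for k in ("linear", "poly", "rbf")}
```

On a boundary like this, the nonlinear RBF kernel clearly beats the linear one; which nonlinear kernel wins in practice depends on the data, which is why the study compares them empirically.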

Principal component analysis (PCA)
Dimensionality reduction through PCA significantly improves run times and computational load. Furthermore, by distilling the most highly separable components from the noise-based elements, classification accuracy can be improved. To identify the optimal number of PCs to use, grid searches were performed with other parameters fixed to directly compare performance across a range of PCs.
We performed grid searches with each kernel function to ensure the optimal kernel was selected regardless of PCA. Individual grid searches were also performed for each classification method to evaluate the optimal number of PCs (n) for each method. The results from fine grid searches showed similar trends for all methods, so the results for only one method are reported: the grid-search results with a polynomial kernel for the bleed detection (OB) method are presented. The initial search used a coarse grid that spanned the full range of available PCs, from 1 to 930. As the goal of the initial search was to form a basis for the following fine grid search, only one run is depicted in figure 11. Classification accuracy seems to peak at n = 100, with the most viable options ranging from 50 to 200.
A follow-up search employed a fine grid between 40 and 200 in steps of 10. To combat the potential bias stemming from test-fold composition, three runs were performed, each with randomly selected holdout models in the test folds. Statistics for the fine grid searches for the bleed detection case (OB) are depicted in figure 12. The range of statistics presented indicates that a single optimal n within this range of PCs is not evident, and there is a range of viable options. By the nature of PCA, the variance explained by each additional component is diminishing. So, the smallest viable option of n = 50 PCs is selected. As stated, the trend of minimal performance improvement past n = 50 is upheld by the other binary methods, so this selection is maintained for them as well.
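The diminishing-returns argument can be checked directly from the cumulative explained variance; the following sketch uses random stand-in data of the same width, not the study's measurements:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 208))   # stand-in for the 208-feature observations

cum = np.cumsum(PCA().fit(X).explained_variance_ratio_)
# Components are sorted by explained variance, so each additional PC
# contributes less: the first 50 PCs explain more than the next 50.
gain_first_50 = cum[49]
gain_next_50 = cum[99] - cum[49]
```

Plotting `cum` against the component index gives the usual "knee" curve used to justify a cutoff such as n = 50.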

Accuracy versus noise
Noise in the measurement data reduces classification accuracy as feature separation between classes is blurred. Depicted in purple in figure 13, the accuracy for the lesion differentiation (BC) method in particular increases monotonically with SNR, reaching maximum accuracy at 80 dB with a mean and SD of (83.59 ± 10.19%). The expected trend is followed by all other methods at low SNR. However, for the bleed detection (OB) case (red in figure 13), accuracy begins to suffer from overfitting at SNR above 60 dB, as evidenced by the extension of the lower whiskers below the 50% mark. At 80 dB, the SNR with the highest OB classification accuracy, the mean and SD is (78.79 ± 15.59%).
The same tendency is demonstrated in the 80 dB results for lesion detection (HL) and clot detection (HC), green and blue in figure 13, respectively. This overfitting may be caused by bias from the use of more than one measurement simulated from each permutation, where the diversity in the data introduced by the added noise improves the generality of the classifier, as discussed in [83]. The highest accuracy occurs at 60 dB for these methods, at (74.78 ± 3.79%) for HL and (60.31 ± 3.98%) for HC.
Results from [24, 84] suggest that an SNR of 80 dB or higher is required for lesion detection, with an SNR of 60 dB feasible with the application of certain algorithms [28]. EIT hardware systems can reportedly achieve and exceed 80 dB in benchtop experimental studies [75], while scalp measurements collected from stroke patients and healthy volunteers in [62] are reported as 43 dB nominally, ranging from 44.1 to 45.5 dB [75]. 40 dB is therefore considered a realistically attainable SNR in clinical settings. The results obtained in this study support work toward achieving higher SNRs in clinical settings, as well as developing novel data-processing techniques to bolster stroke detection performance at lower SNRs.
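A common way to impose a target SNR on a simulated voltage frame is additive white Gaussian noise scaled to the signal power. The study's exact noise model is not restated here, so the following is only a sketch with an illustrative test signal:

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal/P_noise) = snr_db."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(scale=np.sqrt(p_noise), size=signal.shape)

rng = np.random.default_rng(3)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 2048))  # stand-in voltage frame
noisy = add_awgn(clean, snr_db=60.0, rng=rng)

# Recover the realized SNR from the injected noise as a sanity check.
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

The measured SNR lands within a fraction of a decibel of the 60 dB target for frames of this length.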

Accuracy versus frequency
Frequency-dependent conductivities of the brain tissues imply two main expectations: the frequency at which the greatest difference in conductivity between lesion types occurs is likely to produce the most differentiable features, and classification utilizing multiple frequencies may provide better results than any single frequency alone. The frequency-dependent conductivity ratios of diseased tissue to healthy tissue for the two stroke types are contrasted in figure 14. These conductivity values were calculated using data from [63], as delineated above in section 2.2.
However, the results, as seen in figure 15, do not show a clear trend of accuracy improving with frequency. A possible explanation for this is that the higher driving-current amplitude at higher frequencies partially compensates for the lower tissue conductivity differences at these frequencies. Frequency was not found to significantly impact overall classification accuracy for any method at a confidence level of 0.95. The maximum difference in means between the highest- and lowest-performing frequencies in the bleed detection (OB) case was 6.1%.
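A per-frequency significance check of this kind could be run as a one-way ANOVA over per-fold accuracies. The sketch below uses synthetic fold accuracies drawn from a single distribution (i.e., no true frequency effect), not the study's results, and the test choice itself is an assumption since the paper does not name its statistical test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Synthetic per-fold accuracies: 24 folds (3 runs x 8 LpOCV folds) at each
# of three frequencies, all drawn from the same distribution.
acc_by_freq = [rng.normal(loc=0.75, scale=0.04, size=24) for _ in range(3)]

f_stat, p_value = stats.f_oneway(*acc_by_freq)
significant = bool(p_value < 0.05)  # reject equality of means at the 0.95 level?
```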
The class-specific sensitivity values with the highest performance occur at different frequencies for all classification methods. For example, in the lesion detection (HL) case: the highest classification accuracy occurs at 20 Hz with a mean and SD of (75.55 ± 4.65%); the highest specificity to the healthy class occurs at 2 kHz with a mean and SD of (91.29 ± 9.13%); and the highest sensitivity to the lesion class occurs at 5 Hz with a mean and SD of (63.08 ± 9.51%). This is not simply due to the random composition of each training and test fold, as the results for each independent run follow the same trends. In other words, the frequencies at which the highest accuracy, healthy specificity, and lesion sensitivity occur are the same for all three individual runs. This supports the use of multi-frequency classifiers and suggests that a multistage classification technique can achieve improved performance.

Conclusion
The potential for stroke detection using EIT is well established. This work presents the most advanced study to date by providing classification results using a richer numerical model and a robust, dynamic ML framework. Interchangeable model layers generate patient-representative phantoms strengthened by incorporated ventricles, a previously omitted layer. The variability in the location of stroke is portrayed more faithfully with a diverse set of lesion variants, including deep and low-volume cases. A rigorous partitioning methodology ensures holdout sets contain specific isolated patients and unseen lesions to replicate clinical conditions. This, coupled with optimization by nested leave-p-out and k-fold cross-validation, provides high generality, minimizing performance bias.
The results of this investigation suggest that the preliminary nature of previous studies presents an incomplete picture of the difficulty a practical EIT-based stroke detection system will face. A compelling system must be validated with a meaningfully diverse population of patients and lesions, and understanding what constitutes these criteria is fundamental.
A benefit of using SVMs is their relatively straightforward decision-making process. Factors that confound the classifier can therefore be observed and used to inform more comprehensive model sets. Deeper analysis with other ML algorithms would help to resolve the best architectures for this application. Furthermore, unique configurations are needed depending on the classification method employed (e.g., bleed detection, lesion differentiation, etc). Hybrid or cascade techniques are also expected to improve performance.
An impediment to research progress in this field is the markedly limited set of experimental EIT measurements on stroke patients. Producing this crucial data is laborious and not easily sanctioned, but its impact on future development would be instrumental. For now, supplementing training sets with simulated or phantom measurements remains a realistic and valuable strategy.

Figure 1. An example of a model permutation consisting of the baseline layers for the head, CSF, brain, and ventricles to represent a single healthy patient. The inclusion of one of the lesions produces a stroke patient variant.

Figure 3. Ventricles layer group consisting of five variants, depicted from the front and underside. The baseline model (blue) is depicted in all columns to compare with modified variants. (A) Volume-scaled models (cyan and magenta). (B) Clockwise tilted (yellow). (C) Counterclockwise tilted (red).

Figure 5. Conductivity of model tissue layers for all excitation frequencies (log-log scale).

Figure 6. FEM for a healthy patient with baseline layers. CSF is blue, brain is transparent, ventricles are red. Dimensions are in scaled meters.

Figure 7. Machine learning flowchart. The outer loop steps through the LpOCV folds, while the inner loop steps through a grid search of the number of PCs used. PCA: principal component analysis, PCs: principal components, SVM: support vector machine, BC: box constraint, KS: kernel scale, TP: true positive, TN: true negative, FP: false positive, FN: false negative.

Figure 8. Binary classification methods considered in this study. The class compositions for each method are illustrated, and each method is paired with its two-letter reference label. Classes are balanced for all methods, as illustrated by the relative size of the boxes.

Figure 9. Classification method performance compared. Statistics from three independent runs for each method. Fixed parameters: 5 Hz, 40 dB SNR, polynomial kernel, 50 PCs.

Figure 10. Classification performance of SVM based on the kernel function used. Statistics from three independent runs for each kernel. Fixed parameters: lesion detection (HL), 5 Hz, 60 dB SNR, 50 PCs.

Figure 11. Results from the coarse grid search for the bleed detection (OB) scenario. Standard deviation values are depicted as clouds on each side of the mean value. Statistics are based on performances for each k-fold in a single run. n = 50 is marked with a vertical dashed black line.

Figure 12. Results from the fine grid search for the bleed detection (OB) scenario. Statistics are based on three independent runs.

Figure 13. Prediction accuracy versus SNR for each classification method, with statistics drawn from three independent runs. Frequency 5 Hz, 50 PCs, polynomial kernel.

Figure 14. Ratio of conductivity values used in this study for each lesion relative to brain tissue at each frequency (log-log scale).

Figure 15. Prediction accuracy versus frequency for each classification method, with statistics drawn from three independent runs. SNR 60 dB, 50 PCs, polynomial kernel.