Predicting CPAP failure after less invasive surfactant administration (LISA) in preterm infants by machine learning model on vital parameter data: a pilot study

Objective. Less invasive surfactant administration (LISA) has been introduced to preterm infants with respiratory distress syndrome on continuous positive airway pressure (CPAP) support in order to avoid intubation and mechanical ventilation. However, after this LISA procedure, a significant part of infants fails CPAP treatment (CPAP-F) and requires intubation in the first 72 h of life, which is associated with worse complication free survival chances. The aim of this study was to predict CPAP-F after LISA, based on machine learning (ML) analysis of high resolution vital parameter monitoring data surrounding the LISA procedure. Approach. Patients with a gestational age (GA) <32 weeks receiving LISA were included. Vital parameter data was obtained from a data warehouse. Physiological features (HR, RR, peripheral oxygen saturation (SpO2) and body temperature) were calculated in eight 0.5 h windows throughout a period 1.5 h before to 2.5 h after LISA. First, physiological data was analyzed to investigate differences between the CPAP-F and CPAP-Success (CPAP-S) groups. Next, the performance of two types of ML models (logistic regression: LR, support vector machine: SVM) for the prediction of CPAP-F were evaluated. Main results. Of 51 included patients, 18 (35%) had CPAP-F. Univariate analysis showed lower SpO2, temperature and heart rate variability (HRV) before and after the LISA procedure. The best performing ML model showed an area under the curve of 0.90 and 0.93 for LR and SVM respectively in the 0.5 h window directly after LISA, with GA, HRV, respiration rate and SpO2 as most important features. Excluding GA decreased performance in both models. Significance. In this pilot study we were able to predict CPAP-F with a ML model of patient monitor signals, with best performance in the first 0.5 h after LISA. 
Using ML to predict CPAP-F from vital signals provides insight into (possibly modifiable) factors that are associated with LISA failure and can help guide personalized clinical decisions in early respiratory management.


Introduction
Less invasive surfactant administration (LISA) has recently been introduced to treat preterm infants with respiratory distress syndrome (RDS). With this method, exogenous surfactant is administered through a small catheter placed in the trachea while the infant remains on continuous positive airway pressure (CPAP) support (Verder et al 1992, Kribs et al 2007, Herting et al 2019). In this way mechanical ventilation and its damaging effects are avoided (Aldana-Aguirre et al 2017). However, despite this treatment, a substantial proportion of infants fail CPAP and need to be intubated within the first 72 h of life. CPAP failure (CPAP-F) is associated with adverse short-term outcomes (Dargaville et al 2016). Accurate prediction of CPAP-F after LISA is clinically important because it can identify which modifiable factors are associated with failure, guide future individual decision making around LISA, and improve short-term outcomes.
While studies have investigated risk factors for CPAP-F in general (Fuchs et al 2011, Dargaville et al 2013), only a few have investigated the risk of CPAP-F after LISA, incorporating several clinical parameters (Janssen et al 2019, Balazs et al 2022). These studies found gestational age (GA), high C-reactive protein (CRP), a lower surfactant dose and lack of antenatal corticosteroid treatment to be associated with CPAP-F following LISA.
In our neonatal intensive care unit (NICU), high resolution vital parameter data of preterm infants is acquired by clinical monitoring and stored in a data warehouse. This provides the opportunity to assess the effects of the LISA procedure on the physiologic parameters and to use this information for the development of an algorithm to predict CPAP-F after LISA. Machine learning (ML) algorithms have been used to predict several clinical outcomes in the NICU (McAdams et al 2022). Next to univariate (Gulczyńska et al 2019) and multivariate analysis (Roberts et al 2020) to predict intubation in preterm infants, ML algorithms analyzing clinical and low resolution vital parameter data were able to predict the need for intubation in the intensive care setting (Siu et al 2020, Im et al 2022, Kanbar et al 2023). However, until now, no study has focused on predicting CPAP-F after the newly developed LISA method while incorporating ML analysis of high resolution physiological data.
The aim of this feasibility study was to predict CPAP-F (in terms of intubation in the first 72 h of life) after a LISA procedure, based on high temporal resolution vital parameter data derived from standard monitoring. First, the physiological signals surrounding a LISA procedure were analyzed to find characteristic features that differ between CPAP-F and CPAP success (CPAP-S). Next, an ML model was developed to predict whether a LISA treatment was successful in terms of avoiding subsequent CPAP-F. This study will improve our understanding of the clinical (and possibly modifiable) circumstances under which infants may fail CPAP after LISA. In addition, if more accurate models are developed, prediction of CPAP failure can guide clinical decisions around the LISA procedure (a repeat LISA procedure or intubation for administering surfactant).

Population
This was a retrospective study covering the period between January 2016 and June 2019 and involving 110 patients born before a gestational age of 32 weeks. The study population was presented in de Kort et al (2020). Patients were included if they had RDS and were treated with LISA within the first 72 h of postnatal life. Infants were treated with LISA according to national guidelines (FiO2 > 0.3 at CPAP 6 cm H2O). For all patients, clinicians kept records in the electronic medical record (EMR). A waiver was provided by the Medical Ethical Committee (Máxima Medical Centre, Veldhoven, the Netherlands) according to the Dutch Law of Research with Humans. No additional parental consent was required.
Retrospectively, from EMR data, all patients were labelled as CPAP Success (CPAP-S: one LISA, no intubation within 72 h after birth) or CPAP Failure (CPAP-F: one or multiple LISAs, intubation within 72 h after birth). Next, data availability in the data warehouse was checked. Patients were excluded if the dataset was incomplete around the time of LISA (n = 45), if a physician was unable to label the data (n = 9), or if they received multiple LISAs but did not require intubation (n = 5). Finally, 51 patients remained in the study population for further analysis, of whom 33 patients (64.7%) were labelled CPAP-S and 18 patients (35.3%) were labelled CPAP-F. Patient characteristics of the included infants can be found in table 1.
All patients admitted to the NICU were monitored continuously using a Philips IntelliVue MX800 patient monitor (Philips Medical Systems, Böblingen, Germany) according to clinical standard. All data was automatically stored in a data warehouse (PIIC-iX, Data Warehouse Connect; Philips Medical Systems, Andover, MA). Data was then retrieved from the data warehouse and pseudonymized such that no signal could be related back to a patient without the key file. Two categories of data were obtained for analyses: parameter data, which was obtained after processing by the patient monitor, and waveform data, which came directly from the sensors. Parameter data included heart rate (HR), respiration rate (RR), peripheral oxygen

Annotation of LISA and intubation moments
To synchronize the EMR and vital sign data, the exact starting points of the LISA procedure and of intubation needed to be labeled. One physician annotated the procedures using a graphical user interface (GUI), as shown in figure 1. First, the physician inspected all four signals (SpO2, HR, RR and temperature) over a 3 h period, as seen in box A. Additionally, the physician was given two overview signals (RR and SpO2) over the 72 h period after birth, as seen in box B. Finally, the physician had buttons to move the view, annotate the labels for the LISA(s) and intubation (if applicable for the patient), make a comment and save the results, as seen in box C.
To determine the exact LISA and intubation moments, the physician used the EMR recording combined with the monitoring data (heart rate changes associated with atropine premedication and laryngoscopy, deep desaturation followed by a sharp increase in SpO2). For the intubation, respiration rate (converging to a constant value after starting mechanical ventilation) was also used. The times denoted in the EMR are not the exact treatment times; however, they are relevant markers from which to start looking for the exact treatment times.

Signal processing
The signal processing pipeline is shown in figure 2. For every patient three sources were used to extract features:
1. From the EMR, the GA, BW and procedure times were extracted. The procedure times were used for the previously described annotation.
2. From the monitor parameter data, the RR, HR, SpO2 and temperature were extracted, and an average and standard deviation of these parameters were calculated per minute. Additional calculations included the interquartile range of the HR, the slope of the saturation, defined as the change in SpO2 value between the current time point and the preceding minute, and the percentage of desaturations within a minute. The parameter data was automatically pre-processed by the patient monitor before it was stored in the data warehouse.
3. From the waveform data, the ECG and CI were extracted. These required additional processing, which is described below.
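The per-minute parameter features listed above can be illustrated with a minimal Python sketch. The function names, the use of the sample standard deviation and the 88% desaturation threshold are assumptions made for this example only; the text does not specify how a desaturation was defined.

```python
from statistics import mean, stdev

def per_minute_stats(samples_by_minute):
    """Mean and sample SD for each minute.

    samples_by_minute: list of lists, one list of SpO2 samples (%) per minute.
    Each minute needs at least two samples for the SD to be defined.
    """
    return [(mean(m), stdev(m)) for m in samples_by_minute]

def spo2_slope(minute_means):
    """Change in mean SpO2 between each minute and the preceding minute."""
    return [curr - prev for prev, curr in zip(minute_means, minute_means[1:])]

def desaturation_percentage(samples, threshold=88.0):
    """Percentage of samples below a (hypothetical) desaturation threshold."""
    if not samples:
        raise ValueError("no samples in this minute")
    return 100.0 * sum(s < threshold for s in samples) / len(samples)

# Made-up SpO2 samples for three consecutive minutes (not study data)
minutes = [[95, 97], [90, 92], [85, 87]]
means = [m for m, _ in per_minute_stats(minutes)]
print(spo2_slope(means))
print(desaturation_percentage(minutes[2]))
```

The slope list is one element shorter than the list of minute means because the first minute has no preceding minute.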
A list of all the features used can be found in table 2. For the ECG signal, a peak detection algorithm was used to obtain the time between consecutive R peaks and compute the RR-intervals. The RR-intervals were used to calculate several heart rate variability (HRV) features in the time domain (e.g. root mean square of successive differences (RMSSD) (Kanbar et al 2023)), nonlinear features (e.g. sample entropy (SampEn_HR)) and frequency-domain features (i.e. low frequency (LF_HR)) (Chiera et al 2020), as shown in table 2. These HRV features were selected based on previous studies by our group (Joshi et al 2020, Varisco et al 2022a). These features were considered to capture aspects of the cardiovascular system and the development of the nervous system (Chiera et al 2020). Additionally, we extracted the signal instability index (SII) from the ECG. The SII value is low when there are no disturbances in the ECG and high when the signal is unstable, a condition that has been associated with infant motion (Joshi et al 2020). For the SII we calculated the average, standard deviation, interquartile range and skewness, which provide information on the magnitude of and changes in infant motion. Motion can be powerful in predicting illness in these infants (Cabrera-Quiros et al 2021) but is often difficult to interpret.
The CI signal was first pre-processed together with the ECG to filter out the heart signal using Lee's algorithm, which filters the cardiac artefacts from the input respiration signal and normalizes the resulting signal (Lee et al 2012, Varisco et al 2022b). On the CI signal we used a peak detection algorithm to find the peaks and subsequently calculate the breathing rate, for which we computed the average and standard deviation. Additionally, we calculated a ribcage respiratory effort (RRE) by following the steps described by Redmond and Heneghan (2006). First the mean of the CI signal was subtracted, and then the signal was passed through a tenth-order Butterworth filter with a cutoff at 0.8 Hz, removing high frequency noise. Turning points were detected in the CI signal and the difference between subsequent peaks and troughs was calculated. The median peak-to-trough amplitude over the full duration of the signal was then determined and the signal was normalized by dividing it by this value. From the RRE we used time-domain features, namely the average, standard deviation, interquartile range and skewness. We also calculated frequency-domain features, which included the low-frequency and high-frequency power as well as the ratio between these two.
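As an illustration of one of the time-domain HRV features, RMSSD is the square root of the mean of the squared differences between successive RR-intervals. The sketch below shows this calculation in plain Python; the function names and the example R-peak times are hypothetical and the peak detection step itself is not shown.

```python
import math

def rr_intervals_from_peaks(r_peak_times_s):
    """Convert R-peak timestamps (seconds) into RR-intervals (milliseconds)."""
    return [(t1 - t0) * 1000.0
            for t0, t1 in zip(r_peak_times_s, r_peak_times_s[1:])]

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR-intervals."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("at least two RR-intervals are required")
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Made-up R-peak times for a short ECG segment (not study data)
peaks = [0.00, 0.40, 0.81, 1.20, 1.62]
print(rmssd(rr_intervals_from_peaks(peaks)))
```

A higher RMSSD indicates larger beat-to-beat changes in the RR-interval, which is why it is often used as a compact summary of short-term HRV.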
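The amplitude normalization used for the respiratory effort signal can be sketched as follows. This is a simplified illustration under stated assumptions: the Butterworth low-pass filtering step is omitted because it would require a signal processing library, flat plateaus are not treated as turning points, and the function names are hypothetical.

```python
from statistics import mean, median

def turning_points(signal):
    """Indices where the signal changes direction (local peaks and troughs)."""
    idx = []
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]
        right = signal[i + 1] - signal[i]
        if left * right < 0:  # sign change of the slope
            idx.append(i)
    return idx

def normalize_by_median_amplitude(signal):
    """Subtract the mean, then divide by the median peak-to-trough amplitude."""
    offset = mean(signal)
    centered = [x - offset for x in signal]
    tp = turning_points(centered)
    # Difference between each turning point and the next one (peak to trough
    # or trough to peak)
    amplitudes = [abs(centered[b] - centered[a]) for a, b in zip(tp, tp[1:])]
    if not amplitudes:
        raise ValueError("signal has fewer than two turning points")
    scale = median(amplitudes)
    return [x / scale for x in centered]

# Made-up breathing-like signal (not study data)
breaths = [0, 3, 0, -3, 0, 3, 0, -3, 0]
print(normalize_by_median_amplitude(breaths))
```

After this normalization a typical breath has a peak-to-trough amplitude of 1, so that amplitude-based features become comparable between recordings with different signal gain.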

Observational study and statistical analyses
Features were extracted every minute for 0.5 h time frames. For feature analyses we used one-minute segments of the high resolution data to get a view of the clinical situation. Next, we calculated the mean and standard deviation in 30 s timeframes and these epochs, not shifted, were used for the ML models. Patients were included in a time frame if data was present for at least 80% of the time. Additionally, patient data was excluded half an hour before a second procedure, such as a second LISA or an intubation, to exclude characteristics of a second procedure. This exclusion resulted in a different number of patients per time frame. For the observational study, features were displayed between 1.5 h before and 2.5 h after the first LISA per group. Features used for the ML experiment were averaged over 0.5 h time frames per group.

Figure 2. Signal processing pipeline. For each patient, data was extracted from the EMR (GA, BW, procedure times), from the monitor data (RR, HR, saturation and temperature) and from the waveform data (ECG and CI signals). Peak detection was performed on the ECG signal to determine peak-to-peak intervals (RR-intervals); from this signal HR and HRV features were calculated. Additionally, the ECG signal was used to determine the signal instability index features. Peak detection was also performed on the CI signal to determine breathing rate and ribcage respiratory effort. All EMR features, monitor features, resulting waveform features and annotated moments were used in the observational study and averaged over 1 min when applicable. After the observational study was performed, the data was converted to a format that could be used for the machine learning experiments.
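The time frame logic described in this section (eight 0.5 h windows from 1.5 h before to 2.5 h after LISA, the 80% data availability criterion and the exclusion of data before a second procedure) can be sketched in Python as shown below. This is an interpretation rather than the pipeline used in the study: it assumes one feature value per minute with None marking missing data, and it assumes that all data from 30 min before a second procedure onwards is discarded. All function names are hypothetical.

```python
def window_bounds(start_min=-90, end_min=150, width_min=30):
    """Start and end (minutes relative to LISA) of each 0.5 h time frame."""
    return [(s, s + width_min) for s in range(start_min, end_min, width_min)]

def mask_from_second_procedure(minutes, values, second_procedure_min,
                               margin_min=30):
    """Mark values as missing from margin_min before a second procedure."""
    if second_procedure_min is None:
        return list(values)
    cutoff = second_procedure_min - margin_min
    return [v if t < cutoff else None for t, v in zip(minutes, values)]

def window_feature(minutes, values, bounds, min_fraction=0.8):
    """Mean of a per-minute feature in one window, or None if <80% present."""
    start, end = bounds
    expected = end - start  # one value per minute is expected
    present = [v for t, v in zip(minutes, values)
               if start <= t < end and v is not None]
    if expected <= 0 or len(present) / expected < min_fraction:
        return None
    return sum(present) / len(present)

# Made-up example: constant SpO2 and a second procedure 100 min after LISA
minutes = list(range(-90, 150))
spo2 = [95.0] * len(minutes)
masked = mask_from_second_procedure(minutes, spo2, second_procedure_min=100)
for bounds in window_bounds():
    print(bounds, window_feature(minutes, masked, bounds))
```

Because windows without sufficient data return None, the number of patients contributing to each time frame can differ, as noted in the text.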
For each time frame the mean and standard deviation were calculated for all patients available in that time frame for the CPAP-S and CPAP-F groups. First, we used a Kolmogorov-Smirnov test to test whether the distribution was normal. If not, we used a Mann-Whitney U test to test for significant differences between the two groups. For categorical variables Fisher's exact test was used to determine a p-value. A value of p < 0.05 was considered significant.
Two types of ML models were evaluated for the prediction of CPAP-F: (1) logistic regression (LR) and (2) support vector machine (SVM). LR was chosen because it is relatively simple to relate feature importance back to the physiological signals used. SVM was used because it has a different mathematical basis, in which it searches for the optimal hyperplane in the multidimensional feature space to separate the two groups.
Features were imported into the algorithm with nested cross-validation and split into outer training and outer test data using k-fold splitting (k = 5). The outer training data was split by means of k-fold splitting (k = 5) into inner training data and inner validation data for hyperparameter tuning. Based on the performance in a grid search the optimal set of hyperparameters was selected, these being the regularization C and the elastic net l1 ratio for the LR, and the regularization C, kernel type and gamma for the SVM. Several criteria were applied in sequence to select an optimal set of hyperparameters. First, the 20% of hyperparameter sets with the largest area under the curve (AUC) on the inner validation data were selected. Second, the set with the minimum absolute difference between the AUC of the inner training and inner validation data was selected to find the model with the least overfitting. Third, if multiple hyperparameter sets performed similarly, the one with the maximum AUC on the inner validation data was selected. Fourth, if multiple hyperparameter sets still had the same performance, the first set was chosen, as all models would perform similarly and the choice was arbitrary. The optimal set of hyperparameters was then used to train an ML model on all outer training data, which was then tested on the outer test data. The full process was repeated over all the outer folds and the performance of every fold was combined to evaluate the performance of the machine learning model. To evaluate performance in both the inner and outer cross-validation we used a receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) (Huang and Ling 2005). The AUC was calculated for every outer fold; the five outer folds were averaged and the standard deviation over these five folds was calculated.
Additionally, for the LR we extracted the feature relevance to investigate which features contribute most to the prediction model. This is useful information for physicians as they can relate the physiological signals back to clinical practice.
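The four selection criteria above can be expressed as a single ordered comparison. The following Python sketch is one possible reading of that rule, not the code used in the study: the data layout (a list of dicts with 'params', 'train_auc' and 'val_auc'), the function name and the rounding used to decide when two sets perform "similarly" are all assumptions.

```python
import math

def select_hyperparameters(candidates, top_fraction=0.2, decimals=3):
    """Pick one hyperparameter set from grid search results.

    candidates: list of dicts with keys 'params', 'train_auc', 'val_auc',
    given in grid order (hypothetical layout).
    """
    if not candidates:
        raise ValueError("no candidates supplied")
    n_keep = max(1, math.ceil(top_fraction * len(candidates)))
    indexed = list(enumerate(candidates))
    # Criterion 1: keep the top fraction by inner validation AUC
    ranked = sorted(indexed, key=lambda ic: -ic[1]["val_auc"])
    shortlist = ranked[:n_keep]

    def sort_key(ic):
        index, cand = ic
        gap = round(abs(cand["train_auc"] - cand["val_auc"]), decimals)
        # Criterion 2: smallest train/validation gap (least overfitting)
        # Criterion 3: highest validation AUC among similar gaps
        # Criterion 4: first set in grid order if still tied
        return (gap, -cand["val_auc"], index)

    return min(shortlist, key=sort_key)[1]["params"]

# Made-up grid search results (not study data)
grid = [
    {"params": {"C": 0.01}, "train_auc": 0.99, "val_auc": 0.90},
    {"params": {"C": 0.1}, "train_auc": 0.92, "val_auc": 0.88},
    {"params": {"C": 1}, "train_auc": 0.80, "val_auc": 0.79},
    {"params": {"C": 10}, "train_auc": 0.85, "val_auc": 0.70},
    {"params": {"C": 100}, "train_auc": 0.90, "val_auc": 0.60},
]
print(select_hyperparameters(grid, top_fraction=0.4))
```

In this made-up grid the set with the highest validation AUC is not chosen, because a set with a slightly lower validation AUC shows a much smaller gap between training and validation performance.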

Results
Observational study and statistical analyses
We investigated 51 infants who received LISA, of whom 33 (64.7%) had subsequent CPAP-S and 18 (35.3%) had CPAP-F. Infants who failed CPAP after LISA had, as expected, lower GA and BW (table 1).
Three physiological signals (SpO2, temperature and HRV) showed the largest statistical differences when comparing the two groups of infants. Figure 3(A) shows SpO2 and temperature in the 4 h period around a LISA procedure, in combination with the median GA for every group and the number of patients included per group and per time frame. For SpO2, seven out of the eight time frames showed significant differences between the CPAP-S and CPAP-F groups. The CPAP-F group had a significantly lower body temperature than the CPAP-S group for all time frames. Temperature decreased around the first LISA as the incubator was opened for the procedure. Additionally, a decrease in both temperature signals can be seen due to patient handling, for example diaper changes. No filtering was performed to remove these moments as they were expected to be similar for both groups.
Figure 3(B) shows the RMSSD, which gives a general impression of the HRV of both the CPAP-S and CPAP-F patient groups. HRV was found to be significantly lower in the CPAP-F group at every time frame. Both groups showed oscillations in HRV and a rise around the first LISA procedure. In post hoc analysis in our population, HRV features did not change with GA in either group.

Machine learning
Figure 4 shows the ROC curves computed by means of the LR model (4A) and SVM model (4B) considering each 0.5 h time frame separately. ROC curves were created using a scikit-learn function and the true and predicted classes. For every outer fold an AUC was calculated, and these were averaged over the number of outer folds. The standard deviation of the AUC over the outer folds was also calculated. Both the average and standard deviation are displayed in the legend for each 0.5 h time frame. For LR the highest AUC (0.90) was found in time frames T1 and T5. For the SVM the highest AUC (0.93) was obtained in time frame T1, which is the time frame directly following the LISA procedure. Confusion matrices were computed at a set TPR of 0.8, which from a clinical perspective would indicate an acceptable sensitivity. The confusion matrices for T1 show that the most common error in the prediction is that failure is predicted when the true class is a successful LISA. SVM shows fewer false failures.
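Fixing the true positive rate (TPR) and then reading off the confusion matrix can be illustrated with the short Python sketch below. It assumes that class 1 denotes CPAP-F, that the model outputs a continuous score, and that the operating point is the highest score threshold that reaches the target TPR; these choices and the function names are assumptions, not a description of the study code.

```python
def threshold_for_tpr(y_true, scores, target_tpr=0.8):
    """Highest threshold at which sensitivity reaches target_tpr."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("no positive (CPAP-F) cases available")
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= thr)
        if tp / n_pos >= target_tpr:
            return thr
    raise ValueError("target_tpr cannot be reached")

def confusion_counts(y_true, scores, thr):
    """Confusion matrix counts when scores >= thr are predicted as failure."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, s in zip(y_true, scores):
        predicted_failure = s >= thr
        if y == 1:
            counts["TP" if predicted_failure else "FN"] += 1
        else:
            counts["FP" if predicted_failure else "TN"] += 1
    return counts

# Made-up labels and scores (not study data)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.65, 0.4, 0.2, 0.1, 0.05]
thr = threshold_for_tpr(y_true, scores)
print(thr, confusion_counts(y_true, scores, thr))
```

In this layout a false positive (FP) corresponds to a predicted failure in an infant whose LISA was in fact successful, which is the error type discussed in the text.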
Figure 5 shows the 20 highest ranking features for the LR model computed for time frame T1. As in a previous study from our group, features are shown using median log odds ratios (Varisco et al 2022a), where a negative value indicates a prediction toward CPAP-S and a positive value indicates a prediction toward CPAP-F. From the figure it can be seen that GA and BW made an important contribution to the prediction. Additionally, several HR, HRV, SpO2 and respiration-based features were found to be among the top 20 most relevant features.
Figure 6 shows the final experiment, for which we excluded the GA and BW from the features that were used for our ML models. Compared to the models shown in figure 4, models that did not include GA and BW as input features showed only a slightly lower AUC. As with our previous ML models, time frame T5 was still found to be associated with the highest AUC.
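In a logistic regression model each coefficient is the log odds ratio associated with a one-unit increase in the corresponding feature, so a ranking such as the one in figure 5 can be obtained by summarizing coefficients across folds. The sketch below assumes that the median is taken over the outer folds and that features are ranked by absolute value; both are assumptions, and the feature names and coefficient values are made up.

```python
import math
from statistics import median

def median_log_odds(coefs_per_fold, feature_names):
    """Median LR coefficient (log odds ratio) per feature across folds."""
    return {name: median(fold[j] for fold in coefs_per_fold)
            for j, name in enumerate(feature_names)}

def top_features(median_lo, n=20):
    """Features ranked by the absolute value of their median log odds ratio."""
    ranked = sorted(median_lo.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]

# Made-up coefficients for three folds and three features (not study data)
names = ["GA", "RMSSD", "SpO2_mean"]
coefs = [[-1.2, 0.5, 0.1], [-0.8, 0.7, -0.1], [-1.0, 0.6, 0.0]]
for name, log_or in top_features(median_log_odds(coefs, names), n=2):
    # exp() converts a log odds ratio into an odds ratio
    print(name, log_or, math.exp(log_or))
```

With z-score scaled features, a negative value means that a higher feature value shifts the prediction toward CPAP-S and a positive value shifts it toward CPAP-F, in line with the sign convention described for figure 5.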

Discussion
In this pilot study we used ML on features calculated from high resolution vital parameter data to predict CPAP-F after LISA in preterm infants. We showed that physiological signals surrounding a LISA procedure differ significantly between infants who fail and infants who succeed on subsequent CPAP treatment. This data can be used to predict CPAP-F in an ML model, as shown by the high AUC values obtained in this study.
In the observational part of this study we showed that many physiological parameters differed significantly between the CPAP-F and CPAP-S groups before and after the LISA procedure. CPAP-F patients had significantly lower saturation during almost all time windows. This lower SpO2 in CPAP-F patients is related to worse oxygenation, despite the efforts of the NICU staff to keep SpO2 within the target range of 88%-95% by adjusting FiO2. We hypothesize that decreased oxygenation reflects RDS severity and an inadequate response to surfactant treatment. Regrettably, FiO2 data was not logged and therefore a saturation index could not be calculated. In addition, body temperature was lower in the infants who failed CPAP after LISA. Hypothermia is a common issue in preterm infants because of an increased surface-to-body ratio, decreased subcutaneous fat and altered skin blood flow (reduced vasomotor response). The more immature and smaller the infant, the higher the risk of hypothermia (Suri et al 2012, Lyu et al 2015, Mank et al 2016, Laptook et al 2018). Hypothermia is associated with surfactant inactivation and may therefore be a direct cause of and risk factor for CPAP-F after LISA (Suri et al 2012). However, hypothermia could also reflect the lower GA and BW of these infants, who are known to be at higher risk for CPAP-F in general. We nevertheless consider body temperature an important and modifiable risk factor, as better temperature management by the NICU staff may result in less CPAP-F. Additionally, in these infants HRV is a measure of stress and adaptability (Chiera et al 2020) and the HRV response to LISA may predict subsequent failure. For both groups a procedure (LISA) is a stressful period and thus HRV increases. However, after the LISA procedure, the CPAP-S group seems to show more adaptive potential, as a higher HRV compared to the CPAP-F group is observed. Furthermore, significant differences in HRV between the CPAP-F and CPAP-S groups were already observed before the LISA procedure. HR, HRV and respiration rate all vary with GA (Lavanga et al 2021, Iyer et al 2023); as these features are important for the prediction, their contribution may reflect a dependency on GA. However, in post hoc analysis of our current data set, none of these features showed a significant correlation with GA. We therefore conclude that the contribution of the different parameters is likely due to other aspects (such as RDS severity) and not only GA.
The relevance of predicting intubation or failure of non-invasive respiratory support has been shown before in multiple studies in both adults (Siu et al 2020) and neonates (Gulczyńska et al 2019, Roberts et al 2020). The studies of Roberts et al and Gulczyńska et al focused on predicting intubation using demographic and categorical data (Gulczyńska et al 2019, Roberts et al 2020). Our study, in contrast, includes multiple features from physiological data that can be monitored at the bedside. Our ML experiments were performed using LR and SVM algorithms, with both models showing similar results. Both LR and SVM predicted best in time frame T1, with LR reporting an AUC of 0.90 ± 0.09 and SVM an AUC of 0.93 ± 0.07. Interestingly, both models were able to predict CPAP-F even before a LISA procedure, although their performance increased immediately after the procedure. From this it can be hypothesized that a risk profile for CPAP-F is already present in the vital parameter data before the LISA procedure. Above all, time frame T1, which reflects the response to the LISA procedure in the vital parameters, has the best performance in predicting CPAP-F after LISA.
By extracting the feature relevance, we observed that GA, BW, SpO2, RR and HRV were the highest ranking features and performed best in separating the two groups of infants. Even when GA and BW were removed from the input features of the ML model, leaving only the vital parameter data for the prediction, the performance of the ML model remained adequate. This may facilitate implementation in bedside patient monitoring systems, since GA and BW are usually not registered in the monitor.
Our study has several limitations. First, GA and BW are known risk factors for CPAP-F and differed between groups. As some features used for prediction depend on GA and BW, there may be a larger dependency on GA and BW that we need to investigate further, but in the analysis of this small cohort no significant correlation of the features with GA and BW was observed. Second, time periods could not be compared with each other because the number of patients depended on the availability of the signals, which was different for every period. In the future the number of patients per period should be equal for each group in order to allow for a fairer comparison. Third, the standard deviation of the AUC between folds is still quite high and provides an indication of overfitting in our small patient group due to the use of a model with many features. Our study shows the feasibility of using an ML algorithm based on physiological parameters to predict CPAP-F after LISA in a limited number of preterm infants. This is in accordance with earlier studies showing the feasibility of ML algorithms on physiological parameters to predict CPAP failure after extubation of preterm infants (Siu et al 2020, Im et al 2022). However, before becoming clinically useful, the findings of our study need to be validated in a larger and preferably external patient cohort in order to reduce the influence of individual patient and center characteristics (Siu et al 2020, Im et al 2022).

Figure 6. ROC curves and confusion matrices for the ML experiments without demographic features. For both figures all features from the physiological data were included and the demographic features (GA and BW) were excluded. The LISA procedure was performed at T0; negative time frames are prior to the procedure and positive time frames are after the procedure. The curves were created from the true and predicted classes using a standard Python script. For all outer folds, an area under the ROC curve (AUC) was calculated, which was averaged over all outer folds. The standard deviation over the outer folds was also calculated, and both are presented in the legend following the time period. (A) Shows the results for the logistic regression (LR) and (B) shows the results for the support vector machine (SVM); every experiment is denoted with a different color in the legend, which also states the algorithm used, the time frame, and the average and standard deviation of the AUC. The diagonal blue dotted line shows the results of a random classifier being correct 50% of the time. The two lower figures show confusion matrices for time frame T1 at a TPR of 0.8: (C) for the LR and (D) for the SVM.

Conclusion
In very preterm infants receiving LISA, we found differences in high resolution vital parameter data (saturation, temperature, HRV) between the groups that subsequently succeeded or failed on CPAP. These differences were already present before the procedure and became more distinct directly after the procedure; of these parameters, temperature is a modifiable factor. We showed that an ML approach to analyzing vital signals is feasible for predicting CPAP-F, with the highest accuracy already in the first 0.5 h after LISA. This is clinically relevant as this approach reflects a direct response to a LISA procedure and may help to determine, further improve and personalize surfactant and respiratory strategy.

Figure 1. Annotation program for defining the Less Invasive Surfactant Administration (LISA) and intubation moments. Box A shows four signals over a three-hour period: saturation, heart rate (HR), respiration rate (RR) and temperature. Box B shows two overview signals over the full 72 h: RR and saturation. Box C shows per patient the pseudo code, a Save and Exit button, a comment button, the labels that needed to be annotated and options to move the view of the three-hour period signals. For the saturation and RR, a mean and standard deviation were shown, while for the temperature and HR showing only the mean was sufficient. In all signals the EMR-based time points of the LISA and intubation moment(s) are shown in red and the time points annotated by the physician are shown in green. Note that the EMR-denoted times in red are not the exact treatment times; however, these times are relevant markers from which to start looking for the exact treatment times.

Figure 3. (A) Shows the saturation and temperature, averaged over one minute, per patient group over a period from 1.5 h before to 2.5 h after the Less Invasive Surfactant Administration (LISA) procedure. The graph is split into 8 time frames of half an hour. On the left vertical axis in black and using a solid line the saturation data is displayed for the two patient groups, CPAP-S (Success) in green and CPAP-F (Failure) in red. On the right vertical axis in blue and using a dotted line temperature is displayed for the two patient groups, Success in green and Failure in red. Below the monitoring data the median gestational age (GA) per time frame is displayed. Asterisks denote when the two groups differ significantly from one another within a half hour time frame. Finally, two timelines show the number of included patients in the half hour time frames per group, green for the Success group and red for the Failure group. (B) Shows the RMSSD, which gives a general impression of the heart rate variability (HRV) signal during the study period. First the signal for the two groups over the study period is shown, split into half hour time frames, followed by the median GA and the number of patients included per time frame for each group. Asterisks indicate when groups are significantly different from one another. For both figures a Mann-Whitney U test was used to determine significance, and differences were significant when p < 0.05.

Figure 4. The two upper panels show the resulting receiver operating characteristic (ROC) curves for the machine learning experiments performed over half hour time frames using z-score feature scaling. The LISA procedure was performed at T0; negative time frames are prior to the procedure and positive time frames are after the procedure. The curves were created from the true and predicted classes using a standard Python script. For each outer fold an area under the ROC curve (AUC) was calculated; the mean and standard deviation over all outer folds are presented in the legend, following the time period. (A) Shows the results for the logistic regression (LR), with every experiment denoted by a different color in the legend. The legend also shows the algorithm used for the prediction (LR), the time frame used for that experiment and the mean and standard deviation of the AUC per experiment. Additionally, the diagonal blue dotted line shows the result of a random classifier, which is correct 50% of the time. (B) Shows the results for the support vector machine (SVM), with every experiment denoted by a different color in the legend. The legend also shows the algorithm used for the experiment (SVM), the time frame and the mean and standard deviation of the AUC. Again, the diagonal blue dotted line shows the result of a random classifier. The two lower panels show confusion matrices for time frame T1 at a true positive rate (TPR) of 0.8. (C) Shows the confusion matrix for the LR and (D) shows the confusion matrix for the SVM.
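The confusion matrices at a fixed TPR can be read as choosing one operating point on the ROC curve. The sketch below shows one way to do this, assuming the highest score threshold that reaches the target TPR is used; the source does not state how the threshold was selected. The function name `confusion_at_tpr` and the example labels and scores are hypothetical.

```python
def confusion_at_tpr(y_true, scores, target_tpr=0.8):
    """Return (threshold, (tn, fp, fn, tp)) at the first threshold reaching target_tpr.

    y_true: list of 0/1 labels (1 = CPAP-F in this illustration).
    scores: list of predicted scores, higher meaning more likely positive.
    Assumption: the highest threshold with TPR >= target_tpr is selected.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("no positive samples in y_true")
    # Walk thresholds from strict to lenient until the target TPR is reached.
    for thr in sorted(set(scores), reverse=True):
        pred = [s >= thr for s in scores]
        tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p)
        if tp / n_pos >= target_tpr:
            fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p)
            fn = n_pos - tp
            tn = len(y_true) - n_pos - fp
            return thr, (tn, fp, fn, tp)
    raise ValueError("target_tpr must be at most 1.0")


# Made-up labels and scores, for illustration only.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.65, 0.4, 0.2, 0.1, 0.05]
threshold, (tn, fp, fn, tp) = confusion_at_tpr(labels, model_scores)
```

Fixing the TPR makes the two models comparable on the remaining cells of the matrix, in particular the number of false positives each needs to reach the same sensitivity.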

Figure 5. Shows the feature relevance for the 20 highest ranking features used in a logistic regression experiment in half hour time frame T1. On the vertical axis the name of the feature is displayed and on the horizontal axis the median log odds ratios are shown. A negative ratio indicates a prediction toward CPAP-Success and a positive ratio indicates a prediction toward CPAP-Failure.

Figure 6. The two upper panels show the resulting receiver operating characteristic (ROC) curves for the machine learning experiments performed over half hour time frames using z-score feature scaling. For both panels all features from the physiological data were included and the demographic features (GA and BW) were excluded. The LISA procedure was performed at T0; negative time frames are prior to the procedure and positive time frames are after the procedure. The curves were created from the true and predicted classes using a standard Python script. For each outer fold an area under the ROC curve (AUC) was calculated; the mean and standard deviation over all outer folds are presented in the legend, following the time period. (A) Shows the results for the logistic regression (LR), with every experiment denoted by a different color in the legend. The legend also shows the algorithm used for the prediction (LR), the time frame used for that experiment and the mean and standard deviation of the AUC per experiment. Additionally, the diagonal blue dotted line shows the result of a random classifier, which is correct 50% of the time. (B) Shows the results for the support vector machine (SVM), with every experiment denoted by a different color in the legend. The legend also shows the algorithm used for the experiment (SVM), the time frame and the mean and standard deviation of the AUC. Again, the diagonal blue dotted line shows the result of a random classifier. The two lower panels show confusion matrices for time frame T1 at a true positive rate (TPR) of 0.8. (C) Shows the confusion matrix for the LR and (D) shows the confusion matrix for the SVM.

Table 1. Characteristics of the patient groups with CPAP success (CPAP-S) and CPAP failure (CPAP-F) after less invasive surfactant administration (LISA). Medians and interquartile ranges (IQR) are shown. Asterisks indicate p < 0.05.

Table 2. Name and description of the features used in the machine learning model. The letter for the origin denotes whether a feature was extracted from a waveform (W), monitor data (P) or the electronic medical record (E).

Machine learning

For the ML experiments we used the features shown in table 2. Physiological signals tend to differ in magnitude, which makes ML more difficult. We therefore used z-score scaling, which was applied to the entire dataset before it entered the model (Ahsan et al 2021). To perform the experiments, we used two feature-based machine learning algorithms, since both are proven methods in the medical setting (Mcadams et al 2022, Handelman et al 2018, Mangold et al 2021):