MLP-RL-CRD: diagnosis of cardiovascular risk in athletes using a reinforcement learning-based multilayer perceptron

Objective. Pre-participation medical screening of athletes is necessary to pinpoint individuals susceptible to cardiovascular events. Approach. The article presents a reinforcement learning (RL)-based multilayer perceptron, termed MLP-RL-CRD, designed to detect cardiovascular risk among athletes. The model was trained on a published dataset that included the anthropometric measurements (such as height and weight) and biomedical metrics (covering blood pressure and pulse rate) of 26 002 athletes. To address the data imbalance, a novel RL-based technique was adopted. The problem was framed as a series of sequential decisions in which an agent classified a received instance and received a reward at each step. To resolve the sensitivity of conventional gradient-based learning methods to initialization, a mutual learning-based artificial bee colony (ML-ABC) was proposed. Main results. The model outcomes were validated against positive (P) and negative (N) ECG findings that had been labeled by experts to signify individuals ‘at risk’ and ‘not at risk,’ respectively. The MLP-RL-CRD approach achieved superior outcomes (F-measure 87.4%; geometric mean 89.6%) compared with other deep models and traditional machine learning techniques. Optimal values for crucial parameters, including the reward function, were identified for the model based on experiments on the study dataset. Ablation studies, which omitted elements of the suggested model, confirmed the independent, positive, stepwise influence of these components on the performance of the model. Significance. This study introduces a novel, effective method for early cardiovascular risk detection in athletes, merging reinforcement learning and multilayer perceptrons, advancing medical screening and predictive healthcare. The results could have far-reaching implications for athlete health management and the broader field of predictive healthcare analytics.


Introduction
Among competitive athletes, intensive physical exertion can amplify underlying silent cardiovascular conditions, occasionally resulting in rare, life-threatening events, such as sudden cardiac death (Maron et al 2007, Wronka et al 2013, Hao et al 2022). This occurrence, especially in young and ostensibly healthy athletes, has burgeoned into a significant concern for sports medicine specialists and the lay public alike (Liu et al 2023a, Shan et al 2023). Pre-participation medical screening for athletes, which encompasses clinical history-taking and physical examination, variably includes electrocardiography (ECG) depending on jurisdictional recommendations (Mont et al 2017, Curry et al 2018, Yu et al 2022). ECGs can highlight abnormalities in cardiac rhythm, structure, and function, and may indicate systemic diseases (Li et al 2021, Tian et al 2022, Zhou et al 2022). Importantly, certain ECG patterns can signal conditions such as arrhythmogenic right ventricular cardiomyopathy, hypertrophic cardiomyopathy, or long QT syndrome, all of which are recognized to elevate the risk of sudden cardiac death (Zipes and Wellens 1998). A normal or negative (N) ECG typically clears an athlete for competitive training, albeit with a risk of overlooking certain cardiovascular abnormalities (false negative, FN, ECG) (Zhuang et al 2022a, 2022b). Conversely, a suspicious or positive (P) ECG incites further, often costly, testing. Athletes are barred from active training until, and only if, subsequent confirmatory tests return normal results, negating the initial false positive (FP) ECG findings (Lu et al 2023). The accuracy of pre-participation ECG screening among athletes is palpably crucial, as FN and FP ECGs pose risks of unforeseen cardiovascular events and unjustified anxiety, respectively (Zhu et al 2021, Wang et al 2022).
Because of adaptive changes brought about by long-term intensive training, which can manifest as changes in heart rate, rhythm, and morphology, it is challenging to differentiate between pathological and physiological patterns on the ECG (Sokunbi et al 2021, Zareiamand et al 2023). Manual ECG interpretation by physicians is the standard of care but can be costly and time-intensive, and may not be easily accessible. Artificial intelligence-enabled classification models trained on large datasets are increasingly being implemented in medical diagnostic systems to facilitate decision-making (Hannun et al 2019). These models reduce human biases and can often yield more reproducible results (Siontis et al 2021, Shen et al 2023).
Machine learning models typically employ fixed strategies for extracting features from input parameters, resulting in poor generalization ability, increased processing time, and low precision (Moravvej et al 2021b, Dang et al 2023, Han et al 2023, Sun et al 2023). Because of its stratified architecture, deep learning proficiently captures sophisticated features to yield better classification performance, and is increasingly being used in numerous applications (Wang et al 2020, Soltani et al 2023, Zeng et al 2020, Moravvej et al 2022b). The multilayer perceptron (MLP) is an estimator that was initially developed for the nonlinear XOR problem and has subsequently been effectively employed to resolve combinatorial optimization issues (Moravvej et al 2021a, Hong et al 2023), finding applications in information processing, pattern recognition, image processing, classification, linear and nonlinear optimization, and real data prediction (Duraković et al 2011, Moravvej et al 2022a, 2023). The MLP functions as a universal approximator in which input signals propagate forward. The processing node, analogous to a biological neuron, is the fundamental component of the artificial neural network. Every processing node acquires a collection of input values, computes their weighted sum, and then passes this sum through an activation function that determines the nodeʼs output value. In an MLP, nodes comprise fully interconnected layers, with the exception that nodes within the same layer are not interconnected (Sartakhti et al 2021).
The merits of using MLP are manifold, particularly its adeptness in navigating nonlinear problems and its aptitude for learning from encountered errors, refining prediction and classification accuracy in ensuing iterations. One of the pivotal advantages of the MLP arises from its inherently structured layers, which empower it to discern and make judicious decisions by meticulously evaluating diverse features. This can be exceptionally helpful in managing the intricate and variable data encountered in healthcare applications, where accurate and reliable model predictions are imperative for informed decision-making (Zhang et al). The adeptness of the MLP at function approximation enables it to generalize from data during training, making it possible to produce reasonable predictions or outputs from input data it has never seen before. This generalization capability, in combination with its capacity to handle non-linearities, amplifies its suitability for a plethora of applications, notably in scenarios dealing with high-dimensional data or instances where the relationship between variables is not explicitly known or straightforward. MLPs also allow the adjustment of weights during the training process through backpropagation, systematically minimizing the error between the predicted and actual outputs, which lends itself to continually refining the model. The network can be designed with multiple hidden layers and many neurons, providing ample flexibility and enabling the MLP to learn complex patterns and representations from the input data (Zhang et al).
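The node behaviour described above (weighted sum plus bias, passed through an activation function, layer by layer) can be sketched as follows. This is a minimal illustrative forward pass only, not the authors' implementation; the 2-3-1 architecture and the sigmoid activation are assumptions for the example.

```python
import math
import random

def mlp_forward(x, layers):
    """Propagate an input vector through fully connected layers.

    Each node computes the weighted sum of its inputs plus a bias and
    passes it through a sigmoid activation, mirroring the node
    behaviour described in the text. `layers` is a list of
    (weights, biases) pairs, one pair per layer.
    """
    a = x
    for weights, biases in layers:
        a = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, a)) + b)))
             for row, b in zip(weights, biases)]
    return a

# Hypothetical 2-3-1 network with small random weights (illustration only).
rng = random.Random(0)
layers = [
    ([[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)], [0.0] * 3),
    ([[rng.uniform(-1, 1) for _ in range(3)]], [0.0]),
]
out = mlp_forward([0.5, -0.2], layers)
```

A binary risk prediction would then threshold `out[0]` at 0.5.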
Data imbalance in the training data can markedly undermine the efficacy of classification models (Soleimani et al 2023). This is especially pertinent for pre-participation ECG screening, where P cases are infrequent and outweighed by N observations (Panhuyzen-Goedkoop et al 2020, Marani et al 2023). Standard countermeasures against class imbalance comprise data- and algorithm-level methods. The former include under-sampling, over-sampling, or an amalgamation of both techniques (Moravvej et al 2021a). With the latter, the algorithm assigns more weight to the minority class. Deep learning techniques such as deep reinforcement learning (RL) can also address data imbalance (Moravvej et al 2021c, Danaei et al 2022, Moravvej et al 2022c). RL removes noisy data and identifies superior features using a reward function that differentiates between classes, either by penalizing errors on the minority class more harshly or by granting it larger rewards.
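As a concrete instance of the data-level countermeasures mentioned above, the simplest form of over-sampling duplicates minority-class samples at random until the classes are balanced. The sketch below is a generic illustration under that assumption, not part of the proposed method.

```python
import random

def random_oversample(data, seed=0):
    """Data-level remedy for class imbalance: duplicate randomly chosen
    minority-class samples until both classes are equally represented.
    `data` is a list of (features, label) pairs with labels 0/1."""
    rng = random.Random(seed)
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# Toy dataset: 5 negative samples, 2 positive samples.
data = [([i], 0) for i in range(5)] + [([i], 1) for i in range(2)]
balanced = random_oversample(data)
```

After balancing, both classes contribute five samples each.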
Population-based training is a method that optimizes a deep neural network by selecting the best solution from among a population of created models (Vakilian et al 2021a, 2021b). Compared with traditional training methods, it is less likely to get stuck in local optima. Jaderberg et al (2017) applied population-based training to state-of-the-art models of deep RL, machine translation, and generative adversarial networks, and showed consistent improvements in accuracy, training time, and stability compared with stochastic gradient descent. In Mousavirad et al (2021) and Karaboga et al (2007), effective training of the weights of neural networks was achieved using a differential evolution-based strategy and the artificial bee colony (ABC) algorithm, respectively. We previously published a mutual learning-based artificial bee colony (ML-ABC) method, in which we improved upon the ABC algorithm by using mutual learning between two chosen position parameters instead of determining the candidate food source based solely on the one with the highest fitness.
In this study, we propose a new classification model based on MLP, RL, and ML-ABC, named MLP-RL-CRD, for pre-participation ECG screening in athletes, which we trained on an imbalanced dataset. In our MLP-RL-CRD model, the challenge of diagnosing cardiovascular risk was posed as a classification problem. Every row of clinical parameters in the dataset was input into the MLP model; the model then output a binary prediction of whether the subject is at risk (P) or not at risk (N) of cardiovascular events, analogous to the outcomes of P and N ECG findings, respectively, as annotated by experts. Specifically, the classification was defined as an RL guessing game modeled using a Markov decision process. Here, the environmental state was a row of the dataset, and the agent was a deep MLP comprising several fully connected layers. Before the start of the game, we employed ML-ABC for the initialization of weights, with the intent to identify a propitious region within the search space and initiate the backpropagation algorithm in a potentially more fruitful area. For this, we considered the fitness knowledge accrued via mutual learning among the current and neighboring food sources, aiming to generate a food source of augmented value. Thereafter, the agent classified the patient as either at risk (P) or not at risk (N) and received a reward. A correct decision would garner a positive reward, and a wrong decision a negative reward. To account for the dataset imbalance, the reward for the minority class was assigned a larger absolute value. We trained our MLP-RL-CRD model on a 26 002-athlete dataset comprising clinical data and expert-labelled P and N ECGs, and demonstrated that the MLP-RL-CRD method exhibits better performance (with an F-measure of 87.4% and a geometric mean of 89.6%) than other deep models and conventional machine learning approaches. Ablation studies were performed to assess the relative contributions of the RL and pre-training strategies. Finally, to explore ways to improve model performance, various alternative model component options, e.g. evolutionary algorithms and loss functions, were tested and compared in a series of experiments on the same study dataset.
The rest of this paper is organized as follows. The MLP-RL-CRD model is explained in section 2, and the results are presented and discussed in section 3. Section 4 outlines the conclusion and proposes directions for future work.

Proposed method
According to figure 1, the MLP-RL-CRD model incorporates MLP, ML-ABC, and RL. The rationale behind their selection is delineated as follows:
• MLP: It excels in navigating nonlinear problems and enhancing prediction and classification accuracy by learning from errors across successive iterations. Its intrinsic layered structure enables the MLP to judiciously evaluate and differentiate between diverse features, which is vital in managing the complex and variable data encountered in healthcare applications.
• RL: Considering the significant issue of data imbalance in ECG screening, with positive (P) cases being less frequent, RL is used to regulate the learning process. It does so by specifically incentivizing the accurate classification of the minority class, ensuring that rare instances are effectively recognized and handled.
• ML-ABC: It helps to identify a promising starting point in the search space for the backpropagation algorithm in the model. This is pivotal for optimizing the training process and enhancing the predictive performance of the model. Incorporating mutual learning between position parameters enables the consistent generation of a food source with elevated value, leveraging information sharing and augmenting the search strategy.
In the following, we provide the background to the ABC algorithm and ML-ABC, as well as present our model architecture and explain the training process.
Preliminaries

Artificial bee colony algorithm
ABC (Karaboga and Basturk 2007) is an optimization technique that simulates honey bee foraging behavior. The four elements of the ABC algorithm are employed bees, onlookers, scouts, and food sources. Employed bees search for a new food source close to the initialized target space before returning to the hive and informing the onlookers. Onlookers select a food source probabilistically according to this information and are recruited to the nectar source. When a source becomes unusable, the employed bee associated with it turns into a scout and starts randomly seeking an alternative food source.
A new candidate position was established using the location of the employed bee according to equation (1):

v_ij = s_ij + φ_ij (s_ij − s_kj),    (1)

where i denotes the ith solution and each solution s_i possesses a size of D, with D the number of parameters awaiting optimization; k denotes a random solution (k ≠ i); j is a randomly selected dimension; and φ_ij represents a number randomly selected from the interval [−1, 1]. By modifying a single element of s_i, the potentially innovative solution v_i is conceived. If the nectar quality at the new position surpassed that of the preceding location, the bee would keep the new position, forsaking the old one; otherwise, the bee would maintain its prior position.
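The candidate generation step of equation (1) can be sketched as below. This is a minimal illustration of the standard ABC update, assuming the perturbation factor is drawn uniformly from [−1, 1]; it is not the authors' code.

```python
import random

def abc_candidate(s_i, s_k, rng):
    """Generate a candidate food source v_i from solution s_i by
    perturbing one randomly chosen dimension j relative to a random
    neighbour s_k, as in equation (1): v_ij = s_ij + phi*(s_ij - s_kj)."""
    v = list(s_i)
    j = rng.randrange(len(s_i))        # only one dimension is modified
    phi = rng.uniform(-1.0, 1.0)       # random factor for dimension j
    v[j] = s_i[j] + phi * (s_i[j] - s_k[j])
    return v

rng = random.Random(1)
v = abc_candidate([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], rng)
```

Greedy selection would then keep `v` only if its fitness exceeds that of the original solution.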

Mutual learning-based ABC
In a method optimized across D dimensions, one dimension is altered in value through a random selection process, and an enhanced solution is identified during each iterative cycle based on its fitness value. Equation (1) indicates that the newly proposed solution, v_ij, relies on only two variables, s_ij and s_kj, a condition that makes the candidate food source v_ij both unpredictable and subject to variability. Our approach incorporated fitness knowledge gained via reciprocal learning between the present and neighboring food sources, with the intention of generating a food source possessing an elevated value, stemming from the necessity within the ABC algorithm for food sources with superior fitness values. The candidate is generated according to equation (2):

v_ij = s_ij + φ_ij (s_ij − s_kj)  if Fit_i ≥ Fit_k,
v_ij = s_ij + φ_ij (s_kj − s_ij)  otherwise,    (2)

where Fit_i and Fit_k represent the fitness values of the current and neighboring food sources, respectively. The symbol φ_ij indicates a random number uniformly distributed within the interval [0, F], where F > 0 functions as the mutual learning factor. Through a comparison between the existing and proximate food sources, the fitness values of emerging solutions transition towards more optimal food sources: should the prevailing food source prove more suitable, the candidate solution navigates towards it; otherwise, it gravitates towards the adjoining source. The parameter F plays a pivotal role in mitigating perturbations among the food positions; ensuring that F is a positive number guarantees a superior resultant solution. As F ascends from zero to a particular positive value, the perturbation on the respective point diminishes, suggesting that the alternate food sourceʼs fitness value approximates the higher fitness value. An excessively elevated value of F, however, diminishes the potency of both exploration and exploitation.
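One plausible reading of the mutual learning update in equation (2) is sketched below: the perturbed dimension moves toward whichever of the two food sources has the higher fitness, with the step scaled by a factor drawn from [0, F]. The exact piecewise form is an assumption reconstructed from the surrounding description.

```python
import random

def ml_abc_candidate(s_i, s_k, fit_i, fit_k, F, rng):
    """Sketch of the ML-ABC candidate generation (equation (2)):
    the chosen dimension j moves toward the fitter of the current
    source s_i and the neighbouring source s_k."""
    v = list(s_i)
    j = rng.randrange(len(s_i))
    phi = rng.uniform(0.0, F)          # mutual learning factor in [0, F]
    if fit_i >= fit_k:
        v[j] = s_i[j] + phi * (s_i[j] - s_k[j])   # current source is fitter
    else:
        v[j] = s_i[j] + phi * (s_k[j] - s_i[j])   # neighbour is fitter
    return v

rng = random.Random(2)
# Neighbour at 2.0 is fitter, so the candidate moves toward it.
v = ml_abc_candidate([0.0], [2.0], 0.1, 0.9, 1.0, rng)
```

With F = 1, the perturbed coordinate lands between the two sources when moving toward the fitter neighbour.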

Model architecture

Pre-training
In this phase, the MLP weights were established by employing ML-ABC. As depicted in figure 2, we organized all weight and bias terms into a vector, constituting a potential solution in the ML-ABC algorithm.

To compute the quality of a potential solution, we establish our fitness function as the mean squared error

Fit = (1/N) Σ_{i=1}^{N} (y_i − ỹ_i)²,

where N represents the total count of training instances, and y_i and ỹ_i signify the ith target and the corresponding predicted output of the model, respectively.
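The two ingredients of this phase, flattening the network parameters into a candidate vector and scoring it, can be sketched as follows. This is an illustration only; the mean-squared-error form of the fitness is an assumption consistent with the definition above.

```python
def flatten_weights(layers):
    """Concatenate all weight and bias terms into a single vector --
    the representation a candidate solution takes in ML-ABC (cf. figure 2).
    `layers` is a list of (weights, biases) pairs."""
    vec = []
    for weights, biases in layers:
        for row in weights:
            vec.extend(row)
        vec.extend(biases)
    return vec

def mse_fitness(targets, predictions):
    """Mean squared error between targets y_i and model outputs;
    lower values indicate fitter candidate weight vectors."""
    n = len(targets)
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / n

vec = flatten_weights([([[1, 2], [3, 4]], [5, 6])])
fit = mse_fitness([1, 0], [1, 1])
```

ML-ABC would search over such vectors, reshaping the best one back into layer weights before backpropagation begins.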

Deep Q-network training
We employed an RL-based algorithm to tackle the imbalance issue arising from the disparate class sizes within our research dataset. In this framework, every sample in the training dataset constituted a state within the environment, while the network functioned as the agent, executing a series of classifications across all samples. When the agent predicted the class label of a sample, it was engaging in an action: the sample observed at the tth timestep was identified as the state s_t, and the classification enacted was a_t. Subsequently, the environment reciprocated by providing a reward, r_t, to guide the agentʼs actions. Reward values were orchestrated such that classifying a sample from the majority class would yield a lower absolute value than classifying one from the minority class. The reward function is expressed as

r_t = +1 if s_t ∈ D_P and a_t is correct; −1 if s_t ∈ D_P and a_t is incorrect; +λ if s_t ∈ D_N and a_t is correct; −λ if s_t ∈ D_N and a_t is incorrect,

where D_P and D_N are the minority and majority classes, respectively. Correct/incorrect classification of a sample from the majority class would thus yield a reward of +λ/−λ, where 0 < λ < 1. The agentʼs aim in deep Q-learning was action selection such that the sum of discounted future rewards (R_t) was maximized:

R_t = Σ_{t'=t}^{T} γ^{t'−t} r_{t'},

where γ is the discount factor, r_{t'} is the immediate reward at time step t', and T is the last time step of the episode. Using γ, more importance was given to rewards in the near future (closer to the current time step t) than to those in the distant future. Each episode ended when all the samples had been classified correctly, or when at least one sample from the minority class was misclassified. The expected return of taking action a in state s at time step t, and following policy π afterward, is computed as

Q^π(s, a) = E[R_t | s_t = s, a_t = a, π],

where Q^π(s, a) is called the action-value function. At each state s, the optimal action is the one that maximizes the action-value function:

Q*(s, a) = max_π Q^π(s, a),
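The imbalance-aware reward and the discounted return defined above can be sketched directly. The default λ value below is a hypothetical placeholder, and labeling the minority class as 1 is an assumption for the example.

```python
def reward(label, action, lam=0.4, minority_label=1):
    """Imbalance-aware reward described in the text: +/-1 for
    minority-class samples, +/-lambda (0 < lambda < 1) for
    majority-class samples."""
    correct = (action == label)
    if label == minority_label:
        return 1.0 if correct else -1.0
    return lam if correct else -lam

def discounted_return(rewards, gamma=0.9):
    """R_t = sum over t' >= t of gamma^(t' - t) * r_t', evaluated at t = 0."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

For example, misclassifying a majority-class sample costs only −λ, while misclassifying a minority-class sample costs the full −1 (and ends the episode).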
where maximization is taken over all policies. The recursive form of equation (5) can be written using the Bellman equation, with which the optimal action-value function can be estimated iteratively:

Q*(s, a) = E[r + γ max_{a'} Q*(s', a') | s, a].

During training, upon observing state s, the policy network outputs action a. After executing this action, the environment returns a reward r, and the next state becomes s'. The tuple (s, a, r, s') is then saved into the replay memory M. Minibatches B of these tuples are drawn randomly from the replay memory and used to update the network parameters via gradient descent. The update is based on the following loss function:

L_i(θ_i) = E_{(s,a,r,s') ∼ B} [(y − Q(s, a; θ_i))²],

where θ_i denotes the network parameters at the ith training iteration and y is the estimated target for the Q function. The desired target y is equal to the immediate reward for the state–action pair, plus the discounted maximum future Q value:

y = r + γ max_{a'} Q̂(s', a'),

where Q̂ is the target network. For terminal states, y is equal to r. At the ith iteration, the gradient of the loss function is calculated as follows:

∇_{θ_i} L_i(θ_i) = E_{(s,a,r,s') ∼ B} [−2 (y − Q(s, a; θ_i)) ∇_{θ_i} Q(s, a; θ_i)].

The network weights are updated using the gradient of the loss function computed in equation (6):

θ_{i+1} = θ_i − α ∇_{θ_i} L_i(θ_i),

where α is the learning rate. The pseudocode of our proposed method is shown in algorithm 1.
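The target computation for a replay minibatch, including the terminal-state special case, can be sketched as below. The `q_hat` callable stands in for the target network Q̂ and is an assumption for the example.

```python
def q_targets(batch, q_hat, gamma=0.9):
    """Compute DQN targets y for a minibatch of transitions
    (s, a, r, s_next, terminal): y = r for terminal states,
    otherwise y = r + gamma * max_a' Q_hat(s_next, a')."""
    ys = []
    for s, a, r, s_next, terminal in batch:
        if terminal:
            ys.append(r)
        else:
            ys.append(r + gamma * max(q_hat(s_next)))
    return ys

# Toy target network returning fixed Q values for two actions.
q_hat = lambda s: [0.0, 1.0]
batch = [(0, 0, 1.0, 1, True), (0, 1, 0.4, 1, False)]
ys = q_targets(batch, q_hat, gamma=0.5)
```

A gradient descent step on (y − Q(s, a; θ))² would then follow for each transition in the minibatch.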
Algorithm 1. MLP-RL-CRD training.
1: Initialize network weights θ_0 by pre-training the network on the training data using the ML-ABC algorithm
2: Initialize replay memory M
3: Initialize action-value function Q with weights θ_0
4: Initialize target action-value function Q̂ with weights θ̂ = θ_0
5: for each episode do
6:   for each time step t do
7:     With probability ε select a random action a_t; otherwise select a_t = argmax_a Q(s_t, a; θ)
8:     Execute a_t (classify the sample) and receive reward r_t
9:     Observe the next state s_{t+1}
10:    Store the transition (s_t, a_t, r_t, s_{t+1}) in M
11:    Sample a random minibatch of transitions (s, a, r, s') from M
12:    Set y = r for terminal transitions; otherwise set y = r + γ max_{a'} Q̂(s', a')
13:    Make a gradient descent step on (y − Q(s, a; θ))² with respect to the network weights θ
14:  end for
15: end for

Experimental results

Dataset
The assembled database contained the pre-participation medical records of 26 002 athletes, capturing a diverse array of individual health metrics. These parameters, encompassing height, weight, resting pulse rate, as well as diastolic and systolic blood pressure, were meticulously documented alongside ECG results to form a comprehensive dataset. Moreover, a critical evaluation was conducted based on the P and N ECG findings, which were instrumental in categorizing the subjects: through this evaluation, subjects were discerned to be at risk (6507 samples) or not at risk (633 samples) (Barbieri et al 2020). We allocated 70% of the samples (10 200 samples) for training and reserved the rest for validation.

Metrics
In the assessment of the classification efficacy of the devised model, we employed five fundamental performance metrics, specifically accuracy, recall, precision, F-measure, and G-means, each serving a pivotal role in evaluating distinct aspects of model performance (Moravvej et al 2022c). These metrics are defined in terms of TP, TN, FN, and FP, which denote true positive, true negative, false negative, and false positive outcomes, respectively. The F-measure and G-means are prevalent metrics for assessing imbalanced classification scenarios (Danaei et al 2022).
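The standard definitions of these five metrics, in terms of the TP/TN/FP/FN counts above, can be computed as follows. This is a generic sketch of the usual formulas (with G-means taken as the geometric mean of recall and specificity), not code from the study.

```python
import math

def imbalance_metrics(tp, tn, fp, fn):
    """Standard definitions of the five metrics used in this study."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)                       # sensitivity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    g_means = math.sqrt(recall * specificity)     # geometric mean of class-wise rates
    return accuracy, recall, precision, f_measure, g_means

acc, rec, prec, f1, g = imbalance_metrics(8, 90, 2, 0)
```

On this toy confusion matrix, accuracy is high despite imbalance, which is why F-measure and G-means are the preferred summary statistics here.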

Model performance
The proposed model was trained on a 64-bit Windows operating system with 64 GB of RAM and a 64 GB GPU. The best model was obtained after 262 epochs. The total training time for the dataset was one and a half hours.
The proposed model was compared against conventional machine learning techniques, including the decision tree (Myles et al 2004). In addition, two ablation variants of the proposed model, i.e. base model + random weights (which possessed a base architecture similar to our model but used random weights for initialization) and base model + random weights + RL (which additionally used RL for classification), were compared against the full model. Standard performance metrics were evaluated (table 1). The MLP-RL-CRD model attained the best results, reducing the F-measure and geometric mean errors by 59% and 34%, respectively, compared with the next best performing model, the decision tree. Compared with base model + random weights and base model + random weights + RL, MLP-RL-CRD decreased the error rates by about 70%, underscoring the important contributions of the ML-ABC and RL components to model performance.
Table 2 displays the results of the MLP-RL-CRD model when applied separately to the male and female groups. A detailed analysis of the results shows that the performance metrics, including accuracy, recall, precision, F-measure, and G-means, demonstrate remarkable consistency across both groups. For the male group, the model achieved an accuracy of 0.885, while for the female group, the accuracy was slightly lower at 0.882. The recall rates, at 0.900 for males and 0.897 for females, suggest that the model is equally effective in identifying true positive cases in both groups. The precision scores, being nearly identical for both groups (0.875 for males and 0.876 for females), indicate the modelʼs consistent capacity to classify cases across sexes. The F-measure, which balances precision and recall, is 0.875 for males and 0.876 for females, further emphasizing the modelʼs balanced performance. Similarly, the G-means, which accounts for performance in both the positive and negative classes, closely aligns between the two groups, being 0.900 for males and 0.898 for females. Intriguingly, when the MLP-RL-CRD model is applied to both groups together, the results are not markedly different from those obtained for the individual groups. This consistency across the individual and combined groups implies that the model does not show a bias toward any specific sex, maintaining a high level of accuracy and reliability regardless of the sex of the individuals in the dataset.

Impact of other metaheuristics
In the next experiment, we compared our improved ABC algorithm with a number of optimization algorithms that used different metaheuristics to obtain the initial model parameters, while keeping the other model components constant: standard artificial bee colony, differential evolution (Price 2013), firefly algorithm (Yang 2010b), bat algorithm (Yang 2010a), cuckoo optimization algorithm (Yang and Deb 2009), and grey wolf optimization (Mirjalili et al 2014). The obtained results are reported in table 3. As can be seen, the proposed ABC reduced the error by about 18% compared with standard ABC, which in turn outperformed the other algorithms.

Impact of the reward function
We assigned rewards of ±1 for correct/incorrect classifications of the minority class, and rewards of ±λ for correct/incorrect classifications of the majority class. The appropriate value of λ depends on the ratio of majority to minority instances: the ideal value for λ is expected to diminish as this ratio grows. To assess the effect of λ, the performance of the MLP-RL-CRD model was evaluated with λ taking values from the set {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}, while keeping the minority-class reward fixed (figure 3). When λ = 0, the influence of the majority class was nullified; when λ = 1, the relative contributions of the majority and minority classes were made equivalent. The efficacy of the model peaked at λ = 0.4 (ascending from 0 to 0.4 and descending from 0.4 to 1) across all evaluation metrics. Although it is necessary to diminish the influence of the majority class by reducing λ, an excessively low value might impair the overall performance of the model.

Impact of parameter F on the model
As previously described, F acts as the mutual learning factor, and φ_ij in equation (2) can range from 0 to F. To assess the performance of the algorithm in the proposed method, F was assigned values from the following collection: 0.5, 1, 1.5, 2, 2.5, 3.5, 4, 4.5, and 5. The outcomes of the different configurations are illustrated in figure 4. As is clear, the algorithmʼs effectiveness is enhanced as F increases from 0.5 to 3.5, and diminishes as F increases from 3.5 to 5. This observation underscores that either too low or too high a value of F can adversely affect the performance of the algorithm.

Exploring the number of MLP layers
Increasing the number of layers in the MLP yields a more complex model but increases the risk of over-fitting, whereas a network with an insufficient number of layers would lack the flexibility to represent the salient features in the training data. To study the effect of the number of layers in the MLP of the proposed approach, six values (1, 2, 4, 8, 10, and 12) were tested. The achieved results are depicted in table 4. As seen, the error exhibited a descending trend as the number of layers increased from 1 to 4, and an ascending trend from 4 to 12. Therefore, the best value for the number of layers in the MLP was 4.

Impact of loss function
Class imbalance can also be managed using traditional techniques, such as data augmentation and loss function manipulation. Among these strategies, the loss function potentially plays the more crucial role, as it can accentuate the significance of the minority class. We applied five functions, namely weighted cross-entropy (Özdemir and Sönmez 2020), balanced cross-entropy (Huang et al 2020), dice loss (Li et al 2019), Tversky loss (Salehi et al 2017), and focal loss (Lin et al 2017), to examine the efficiency of loss functions within the suggested model. The balanced cross-entropy and weighted cross-entropy losses adjust the relative importance attributed to positive and negative instances. The focal loss diminishes the influence of easy instances, allowing the model to focus more intensely on hard samples. The results are given in table 5. Overall, the reductions of error with focal loss compared with Tversky loss were about 19% and 15% for accuracy and F-measure, respectively. However, the focal loss function performed 60% worse than RL.

Discussion
The paper presented an innovative RL-based multilayer perceptron model for cardiovascular risk detection among athletes, contributing significantly to the domain of preventive healthcare in sports. The studyʼs key strength lies in its unique application of RL to address the data imbalance issue. The technique transformed the problem into a sequential decision-making process in which the agent classified instances, receiving a reward at each step. Notably, this provided a different way to handle data imbalance, shifting the modelʼs attention towards the minority class by offering a higher reward. Introducing the ML-ABC into training further strengthened the studyʼs novelty. Conventional gradient-based learning methods are often sensitive to initialization, a shortcoming addressed in this approach. The candidate food source in the ML-ABC process evolved from being determined merely by individual fitness to incorporating a mutual learning factor. This change helped optimize the initial weights, which could be a potential game-changer in training machine learning models. Experiments were meticulously conducted to derive optimal values for key parameters, including the reward function. The conducted ablation studies independently corroborated the positive incremental impact of the model components, bolstering the studyʼs reliability.
The dataset primarily used for this model represents a distinct demographic group: athletes. While this is appropriate for the modelʼs intended use, it may inadvertently constrain its predictive accuracy when applied to different population groups or clinical contexts. This limitation arises from the unique characteristics of each population, which can substantially affect health outcomes, including the risk of cardiovascular events. Diversity in the dataset, in terms of geographical locations, ethnic backgrounds, age groups, fitness levels, and lifestyle factors, is crucial to create a genuinely robust model. Each of these elements can subtly impact cardiovascular health, implying that models trained on more heterogeneous datasets are likely to display more reliable performance across diverse scenarios. To enhance the modelʼs accuracy and generalizability, future research should aim to train and validate the model using multiple datasets. Ideally, these datasets should be derived from varied regions or populations, capturing a wide spectrum of demographic and biomedical variables. For instance, these datasets could be sourced from non-athlete populations, older age demographics, or individuals with pre-existing health conditions, among others. Incorporating data from a variety of regions and populations will allow us to test the modelʼs performance across a broad array of conditions and settings. This approach will help identify potential biases or limitations within the model, enabling timely refinements. Using multiple datasets for model validation will provide an opportunity to measure its performance across various data strata. This evaluation will yield additional insights into the modelʼs predictive capacity and reliability in different contexts.
Indeed, the research was primarily confined to using anthropometric, demographic, and biomedical data as predictive factors for cardiovascular risks. This provides a limited view of the many complex elements that influence cardiovascular health. Many other potentially significant factors, such as genetic markers, lifestyle habits (diet, physical activity, stress levels, and smoking), and detailed medical histories, could also play critical roles in determining an individual's cardiovascular risk profile. Genetic markers, for example, have been shown to have a significant influence on cardiovascular health. Certain genetic mutations or polymorphisms could predispose individuals to higher risk, and their inclusion in the model could enhance its predictive capacity. Lifestyle factors like diet and exercise are well-known influencers of heart health, and data on these could provide valuable insights into an individual's day-to-day risk management. A comprehensive medical history, including past cardiovascular events, medication usage, and other comorbidities, could also provide valuable context for determining future cardiovascular risks. Including these diverse data types could not only improve the model's predictive accuracy but also enhance its clinical utility by providing a more holistic view of each individual's health status and risk profile. This could allow for more personalized and effective interventions, ultimately leading to better health outcomes. Future research should therefore assess the feasibility and effectiveness of integrating these additional data types into the model. This would involve overcoming challenges related to data collection and privacy, managing increased model complexity, and ensuring that the additional data improves performance without overfitting. If successful, integrating such diverse data types could significantly augment the predictive power and clinical relevance of the MLP-RL-CRD model, ultimately paving the way for more precise and personalized strategies for preventing and managing cardiovascular risks among athletes.
The research conducted in this study is underpinned by certain assumptions about the reward function, specifically regarding the relative significance of the majority and minority classes. These assumptions affect how the model prioritizes and responds to each class of data; thus, the real-world performance of the model could be influenced by them, potentially limiting its effectiveness or accuracy in certain situations. The real-world implications of these classifications might also be interpreted differently in light of additional data or expert medical guidance. For instance, while it makes sense to prioritize the minority class (individuals at risk of cardiovascular events) in a medical context to ensure their timely detection, this may not always be the case. The significance of each class could vary depending on factors such as overall population health, the prevalence of cardiovascular diseases in the community, and the health system's capacity to handle potential cases. It is important to note that these assumptions could also introduce bias into the model, impacting its ability to predict cardiovascular risks. If the model is too heavily focused on the minority class, it might over-predict risk in certain individuals, leading to false positives; conversely, if it overvalues the majority class, false negatives could result, potentially missing individuals who are at risk. Therefore, these assumptions may need to be fine-tuned on the basis of additional data gathered from diverse sources and populations. Incorporating expert medical advice would also be crucial to ensuring that the reward function accurately reflects the complexities and nuances of cardiovascular risk detection. A more flexible reward function, adjustable in light of these factors, might enhance the model's practicality and applicability in diverse real-world scenarios. Future studies should consider these aspects for further refinement and improvement of the model.
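The false-positive/false-negative trade-off discussed above is exactly what the study's two headline metrics capture. A minimal sketch of how they are computed from confusion-matrix counts (the counts in any example call are placeholders, not the study's results):

```python
import math

# F-measure balances precision (few false positives) against recall
# (few false negatives); the geometric mean balances sensitivity on the
# minority "at risk" class against specificity on the majority class.
def f_measure(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def g_mean(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return math.sqrt(sensitivity * specificity)
```

A reward function that over-weights the minority class raises sensitivity at the cost of specificity, and the geometric mean penalizes either imbalance, which is why both metrics are reported together.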
In future research, a pivotal area of focus will be extending our RL-based multilayer perceptron model to a broader range of ECG analysis applications. The field has witnessed significant advancements, particularly in machine learning and artificial intelligence, which have opened new pathways for the diagnosis and prediction of various cardiac conditions (Liu et al 2023b, Lueken et al 2023). Subsequent versions of our model will aim to assimilate these advancements, evaluating their potential to augment both the accuracy and efficiency of cardiovascular risk assessments. This expansion may include integrating methods for identifying arrhythmias, ischemic heart disease, and other cardiac irregularities that pose diagnostic challenges, especially in athletic health. Further, incorporating real-time data analysis capabilities into our model marks an essential progression. The burgeoning availability of wearable technologies and mobile health devices presents an opportunity to adapt our model for continuous ECG monitoring. Such a development would facilitate not only the prompt detection of potential cardiac events but also offer a window into the sustained cardiovascular health of individuals, with particular emphasis on athletes who experience considerable physical exertion. This advancement would require algorithms capable of processing substantial data volumes efficiently while sustaining precision in risk prediction. Moreover, advanced deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), could be explored to enhance the model's ability to learn complex patterns in ECG data (Jin et al 2024). These techniques have shown promise in other areas of medical imaging and might offer more nuanced interpretations of ECG signals than traditional methods. Additionally, incorporating transfer learning approaches could allow the model to leverage pre-trained networks, significantly reducing the time and data required for training while still achieving high levels of accuracy.

Conclusion
The article describes a multilayer perceptron-based model that uses reinforcement learning to identify athletes at risk for cardiovascular disease. The problem was constructed as a series of sequential decisions in which an agent received a reward at each step for classifying a received instance. To address class imbalance, the majority class was assigned a smaller reward than the minority class. To obviate the drawbacks of standard gradient-based learning techniques such as backpropagation, which include sensitivity to initialization, we proposed using a mutual learning-based ABC to determine the initial weights: the candidate food source is generated around the fitter of two paired solutions, as identified through a mutual learning parameter. The MLP-RL-CRD technique shows enhanced efficacy, exhibiting an F-measure of 87.4% and a geometric mean of 89.6%, surpassing the performance of other methods. Experiments on the study dataset were used to optimize crucial model parameters, such as the reward function. Ablation studies that excluded components of the full proposed model confirmed the independent positive incremental impact of those components on model performance.
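The mutual-learning candidate step can be illustrated with a short sketch. This is a reconstruction under stated assumptions, not the paper's exact update rule: the pairing of solutions, the perturbation `phi`, and the factor `F` follow the usual ABC candidate form, with `F` playing the role of the mutual learning factor examined in figure 4, and all names are hypothetical.

```python
import random

# Illustrative ML-ABC candidate step: the new food source is generated
# around whichever of two paired solutions has the better fitness,
# perturbed along their difference and scaled by the mutual-learning
# factor F.
def ml_abc_candidate(x_i, x_k, fitness, F=0.7):
    better, worse = (x_i, x_k) if fitness(x_i) >= fitness(x_k) else (x_k, x_i)
    phi = random.uniform(-1.0, 1.0)   # fresh random perturbation per candidate
    return [b + F * phi * (b - w) for b, w in zip(better, worse)]
```

With F = 0 the candidate collapses onto the fitter of the two solutions; larger F permits wider exploration around it, which is how the initial-weight search avoids being driven by individual fitness alone.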
Our model successfully discriminated between athletes at risk and not at risk for cardiovascular events on the basis of clinical parameters, using expert-assessed ECG findings (minority P and majority N classes) as ground truth. It can thus be feasibly incorporated into athlete screening programs as a cost-effective adjunctive tool for high-throughput triage of pre-participation ECGs. In future work, the proposed algorithm can be trained and tested on more diverse populations, including non-athletic and older populations.

Figure 2. Encoding strategy in our proposed algorithm.

Figure 3. Performance indicators of the MLP-RL-CRD model plotted against the value of λ in the reward function.

Figure 4. Performance indicators of the MLP-RL-CRD model plotted against the value of F in the proposed model.

Table 2. Performance metrics of the MLP-RL-CRD model across male and female groups.

Table 1. Results of various classification algorithms.

Table 5. Results of different loss functions.

Table 4. Performance metrics for different numbers of layers in the MLP.