Enhanced classification of left ventricular hypertrophy in cardiac patients using extended Siamese CNN

Objective. Left ventricular hypertrophy (LVH) is the thickening of the wall of the left ventricle of the heart. The objective of this study is to develop a novel approach for the accurate assessment of LVH severity, addressing the limitations of traditional manual grading systems. Approach. We propose the Multi-purpose Siamese Weighted Euclidean Distance Model (MSWED), which utilizes convolutional Siamese neural networks and zero-shot/few-shot learning techniques. Unlike traditional methods, our model introduces a cutoff distance-based approach for zero-shot learning, enhancing accuracy. We also incorporate a weighted Euclidean distance targeting informative regions within echocardiograms. Main results. We collected comprehensive datasets labeled by experienced echocardiographers, covering the Normal heart and various levels of LVH severity. Our model outperforms existing techniques, demonstrating significant precision enhancement, with improvements of up to 13% for zero-shot and few-shot learning approaches. Significance. Accurate assessment of LVH severity is crucial for clinical prognosis and treatment decisions. Our proposed MSWED model offers a more reliable and efficient solution compared to traditional grading systems, reducing subjectivity and errors while providing enhanced precision in severity classification.


Introduction
The assessment of disease severity and changes over time in medical images is of utmost importance, particularly in conditions such as left ventricular hypertrophy (LVH). LVH severity or grading plays a vital role in understanding disease progression and response to treatment. Echocardiography is the preferred diagnostic test for assessing LVH, surpassing the sensitivity of ECG (Lo et al 2022, Thanaraj et al 2023). It not only aids in diagnosing LVH but also identifies other abnormalities such as left ventricular dysfunction (both systolic and diastolic) and valvular heart disease (Karim 2022). By measuring the left ventricular end-diastolic diameter, posterior wall thickness, and interventricular septum thickness, an echocardiography test provides the essential parameters for determining the left ventricular mass index (LVMI). According to the American Society of Echocardiography and the European Association of Cardiovascular Imaging, LVH is defined as an elevated LVMI greater than 95 g m⁻² in women and greater than 115 g m⁻² in men (Bornstein et al 2023). Table 1 outlines guidelines for grading LVH based on left ventricular (LV) septal wall thickness. It categorizes LVH severity for both women and men, providing specific measurement ranges for Mild, Moderate, and Severe LVH (Devereux and Reichek 1977, Bornstein et al 2023).
LVH Severity    Women            Men
Mild LVH        1.0 cm–1.2 cm    1.1 cm–1.3 cm
Moderate LVH    1.3 cm–1.5 cm    1.4 cm–1.5 cm
Severe LVH      >1.6 cm          >1.7 cm

There are several challenges faced by echocardiographers while grading LVH. Assessing the severity of LVH is of paramount clinical significance, impacting both prognosis and treatment decisions, given that LVH severity grades are strongly associated with the risk of cardiovascular events. However, it is worth noting that this assessment can be particularly challenging, as it involves manual measurements and intricate procedures, adding complexity to the evaluation process (Alkema et al 2016). The interpretation of these classes by domain experts often varies, leading to inconsistencies in clinical management. Moreover, the burgeoning issue of cardiologist burnout due to excessive workloads poses a significant risk of misdiagnosis, amplifying patient safety concerns (Quinn et al 2017, Panagioti et al 2018). Overwhelmed by demanding schedules and increased patient loads, cardiologists may face heightened challenges in maintaining diagnostic accuracy, potentially leading to errors in cardiovascular assessments.
Simultaneously, the obstacles faced by deep learning methods, such as the requirement for extensive and diverse data for effective training and the difficulty in assembling datasets for conditions like Severe LVH, further accentuate the need for comprehensive solutions in addressing both the human and technological aspects of cardiovascular care. All these factors highlight the need for standardized and automated approaches to enhance the accuracy and reproducibility of LVH grading, ensuring more consistent and reliable patient care.

To address the challenges mentioned above, we introduce a Siamese-based approach that leverages both few-shot and zero-shot learning techniques. These approaches have demonstrated promising outcomes in scenarios with limited data availability (Farhad et al 2023). By employing the Siamese architecture, our model can learn from a small set of examples and even generalize to recognize previously unseen cases. One key advantage of this approach is that it relies on calculating the Euclidean distance between feature representations. This distance metric provides valuable insights into the grading of LVH. By measuring the similarity between input data and learned prototypes, our model can make accurate predictions about the severity of LVH, even when confronted with scarce data, thereby offering a potential solution to the data scarcity challenge.
Our research work makes several contributions. Firstly, we have curated two datasets that include echocardiogram images representing both normal hearts (without LVH) and varying degrees of LVH severity, ranging from Mild to Moderate and Severe cases. Secondly, as far as our knowledge extends, we are the first to attempt LVH classification based on severity using echocardiography images. We have utilized a multi-purpose Siamese-based approach to classify the cardiac condition (i.e. Normal, Mild LVH, Moderate LVH, or Severe LVH) with limited labeled data by leveraging the power of both few-shot and zero-shot learning simultaneously. Thirdly, as traditional zero-shot learning relies on textual features, we introduce a novel generalized zero-shot learning approach for LVH grading. This approach proves particularly advantageous in domains where the textual vectors typically used for image description are unavailable. Lastly, we have introduced a weighted Euclidean distance technique that specifically targets the most informative regions within the images. This approach directs our model's focus to areas that contain crucial information for LVH grading, thereby enhancing the accuracy of our system.

Literature review
There are various studies that measure the size of the left ventricle, classify the LV, and predict the cause of LVH. To the best of our knowledge, there is no previous attempt to classify LVH based on its severity using computer vision methods. We have divided this section into three subsections, as given below. Madani et al (2018) addressed the detection of LVH from echocardiography by employing UNet segmentation. They conducted a study to find the optimal trade-off between accuracy and time, considering image resolution. Additionally, they utilized a generative adversarial network to generate echo images with similar characteristics. Their dataset consisted of 462 LVH images and 1807 normal cardiac images, achieving a classification accuracy of 91.21% on the test images.

Techniques applied for classification of LVH
In a different approach, Deng et al (2017) explored data-mining techniques to propose a novel method for selecting ECG diagnostic criteria. Their dataset comprised 549 records, and their focus was on extracting features such as the peak amplitudes of the Q, R, and S waves. Jian et al (2020) presented a CNN-based approach for LVH diagnosis, which integrated a CNN with edge-detection algorithms and non-local mean filtering. They conducted experiments on a dataset containing 600 images, and the relative error between their measurements and those obtained from the hospital was below 15%. Another study (Letz 2023) investigates the reliability of automated electrocardiogram measurements in assessing LVH compared to manual annotations in patients with end-stage kidney disease (ESKD). The study analyzes three ECG-based LVH parameters and compares them between automatic and manual measurements in 301 ESKD patients undergoing hemodialysis. Results show minimal differences between automatic and manual measurements, indicating that automated algorithms can be as reliable as manual measurements for LVH parameter assessment and risk prediction in this patient population. Leclerc (2019) introduced the CAMUS dataset, a publicly available collection of echocardiography data from 500 subjects. Their study focused on LV endocardium and myocardium segmentation, exploring deep-learning and non-deep-learning techniques. Their findings indicated that the UNet-2 architecture achieved the highest accuracy. Other researchers (Madani et al 2018, Kim et al 2021, Chen et al 2023) have also utilized the CAMUS dataset for LV segmentation. In another study (Ali et al 2022), three state-of-the-art convolutional neural network architectures (SegNet, Fully Convolutional Network, and Mask Region-Based Convolutional Neural Network) were developed and evaluated for LV segmentation. Among the models, Mask R-CNN demonstrated superior performance in accuracy, precision, recall, specificity, Jaccard index, and dice similarity coefficients.

Techniques applied for predicting causes of LVH
The study by Duffy et al (2022) aimed to assess the accuracy of a deep learning workflow in quantifying ventricular hypertrophy and predicting the cause of increased LV wall thickness. The cohort study included data from multiple medical centers and used echocardiogram videos for training and testing the deep learning algorithm. Another study was conducted by Yu et al (2022), in which they collected 1610 transthoracic echocardiograms, including patients with hypertensive heart disease (HHD), hypertrophic cardiomyopathy (HCM), cardiac amyloidosis (CA), and controls. They developed a framework using ResNet and Unet++ for view classification, LVH detection, and etiology identification. The models achieved high performance, with the view classification network achieving an AUC of 1.0, the LVH detection network achieving an AUC of 0.98 and an accuracy of 92.4%, and the etiology classification network achieving AUCs ranging from 0.88 to 0.94 and an accuracy of 75.7%. In another study, Hwang (2022) gathered data from 930 subjects, including patients with HHD, HCM, anomalous left coronary artery (ALCA), and normal subjects. A total of 4650 echocardiograms were collected from 5 standard echocardiographic views. The achieved AUC values were 0.96 for HHD, 0.98 for HCM, and 0.99 for ALCA.

Application of Siamese networks in various cardiac modalities
Several studies have investigated the utilization of Siamese networks across diverse cardiac modalities. In their work, Ivanciu et al (2021) implemented a Siamese network tailored for ECG-based biometric systems. Their dataset comprised 90 subjects, and they achieved an accuracy of 90%. Another approach was presented by Patanè and Kwiatkowska (2019), who proposed a model for excitement recognition from ECG signals using a combination of a CNN, a recurrent NN, and a Siamese network. Their evaluation on a publicly available dataset, which included nine subjects, yielded an area under the ROC curve of 0.8. Focusing on the analysis of the left ventricular myocardium, Tang et al (2019) introduced a Siamese network with high-level shared parameters. Their dataset incorporated MRI samples from 33 patients, resulting in an attained Dice score of 87.78. In a different application, Dezaki et al (2021) employed one-shot learning for cardiac view synchronization in echocardiography. Using 1996 4CH and 2CH pairs, they reported mean absolute errors of 4.0 and 3.1 for end-systolic and end-diastolic cardiac phases, respectively.
Unlike all of the research works above, our proposed cutoff distance approach and weighted Euclidean distance for zero-shot and few-shot learning provide a significant advantage in accurately classifying and grading LVH, particularly in cases where labeled data is scarce.

Methodology
In this section, we present an overview of the datasets used, describe the proposed model, and discuss the techniques applied in conducting this research study. The overall architecture of the proposed methodology is given in figure 1.

Datasets
The investigation focused on male patients, and data were collected from two hospitals situated in Hyderabad, Sindh, Pakistan. The datasets comprise echocardiograms obtained from patients aged between 45 and 65. These echocardiograms represent various cardiac conditions, including Normal, Mild, Moderate, and Severe LVH. Characterizing the male population of Sindh involves considering various factors such as body type, health conditions, and working habits. Sindh, as a province of Pakistan, is marked by a diverse population, and its male demographic reflects a range of body types influenced by genetic, cultural, and lifestyle factors. While individual variations exist, it is common to observe a mix of body types, from lean to more robust, owing to a combination of genetic predispositions and dietary practices.
The health condition of the male population in Sindh is intricately linked to diverse factors, including lifestyle choices, access to healthcare, and the medical conditions prevalent in both urban and economically disadvantaged areas. Working habits among Sindh's male population exhibit variations influenced by urbanization, occupation, and economic factors. In urban areas, where one of the hospitals is located, professionals may often be engaged in sedentary work, potentially impacting physical activity levels and contributing to specific health challenges. Conversely, in economically disadvantaged and rural areas, where the other hospital serves the local community, working habits may involve physically demanding occupations, thereby influencing body type and overall health profiles. The distinct socio-economic environments of these locations contribute to unique health characteristics within the male population.
The echocardiography data used in this study were gathered between March and December 2022. At the first hospital, the echocardiography data analysis revealed the following findings. In the 2 Chamber view, there were 22 images classified as Mild LVH and 31 images classified as Normal. Moving to the 4 Chamber view, there were 14 images classified as Mild LVH, 50 images classified as Normal, and 2 images classified as Moderate LVH. At the second hospital, the echocardiography data analysis revealed the following findings. In the 2 Chamber view, there were 9 images classified as Moderate LVH, 28 images classified as Mild LVH, and 1 image classified as Severe LVH. Similarly, in the 4 Chamber and PLAX views, there were 9 images classified as Moderate LVH, 28 images classified as Mild LVH, and 1 image classified as Severe LVH. Examples of echocardiography frames and the three views are shown in figure 1(a). All procedures were performed in compliance with relevant laws and institutional guidelines and were approved by the appropriate institutional committee. Informed consent was obtained for experimentation with human subjects, and the privacy rights of human subjects were observed.

Proposed approach
The proposed approach is divided into the following steps.

Division of dataset images into two parts
In the context of LVH severity classification, the rationale behind our approach lies in the insight gained through discussions with echocardiographers. These collaborative discussions led us to the understanding that a specific region within echocardiograms contains crucial information related to the LVH condition (Celebi et al 2010). To capitalize on this finding, we devised a strategy to enhance the focus on relevant areas within the images.
In our methodology, we partition each image into two distinct sub-images, as illustrated in figure 1(b). The initial step involves identifying the lowest coordinates corresponding to the LV in all images. Subsequently, by discerning the highest coordinate among these identified lowest points, we establish what we refer to as the maximum point. This maximum point serves as a critical reference for the subsequent division of all images. The significance of this reference lies in its ability to encompass the lowest LV point in every image, thereby ensuring that our approach includes the maximum relevant information in the analysis. This meticulous division strategy is designed to enhance the accuracy and comprehensiveness of LVH severity classification by systematically incorporating the critical features associated with the condition. Let (x_i, y_i) represent the lowest point of the LV in the ith image I_i. We follow these steps:

• Identify the maximum y-coordinate among the lowest LV points across all images:

y_max = max_i y_i. (1)

• The upper portion of the ith image is defined as the region above the dividing point:

I_i^U = {(x, y) ∈ I_i : y ⩽ y_max}. (2)

• The lower portion of the ith image is defined as the region below the dividing point:

I_i^L = {(x, y) ∈ I_i : y > y_max}. (3)

By following these steps, we divide the images into upper and lower portions.
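As a minimal sketch, the division procedure above can be expressed as follows; the function and variable names are illustrative and not taken from the paper's implementation, and image rows are assumed to be indexed top-to-bottom, so the "upper" portion is the block of rows up to the dividing row:

```python
import numpy as np

def split_images(images, lowest_lv_points):
    """Split each image at the dividing row derived from the lowest LV points.

    images: list of 2-D numpy arrays (grayscale echocardiogram frames)
    lowest_lv_points: list of (x, y) tuples, the lowest LV point per image
    """
    # The maximum y-coordinate among the lowest LV points across all images
    # serves as the common dividing point for every image.
    y_max = max(y for _, y in lowest_lv_points)
    uppers, lowers = [], []
    for img in images:
        uppers.append(img[:y_max, :])  # rows up to the dividing point (upper portion)
        lowers.append(img[y_max:, :])  # rows below the dividing point (lower portion)
    return uppers, lowers
```

Because every image is cut at the same row, the upper segments all contain the complete LV region, matching the rationale given above.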
In the context of the complete dataset, D_U represents the collection of images corresponding to the upper segments, while D_L represents the collection of images corresponding to the lower segments in the training data. Similarly, d_U and d_L represent the collections of images corresponding to the upper and lower segments in the testing data, respectively. The complete training dataset (T) and test dataset (V) can be represented as follows:

T = D_U ∪ D_L, (4)

V = d_U ∪ d_L. (5)

Training of the proposed MSWED
Following the division of the image into two halves, each half-image undergoes processing through a dedicated feature extraction component, illustrated in figures 1(c) and (d). The operational principle of Siamese neural networks involves taking these divided image halves as inputs and passing them through identical feature extraction components. The input to the component is an image of dimension 80 × 80. The feature extraction component consists of two convolutional layers, each followed by a max-pooling layer with pool size (2, 2). We use rectified linear unit activation, and the kernel size is (3, 3). With 512 nodes in the output layer, the feature vector for each image comprises 512 features, and we employed the contrastive loss to train the model. If both images of a pair are from the 4CH view, or both from the 2CH view, the model considers them a similar pair. On the other hand, a pair where one image is from 4CH and the other from 2CH is taken as a dissimilar pair. The architectural choices for the feature extraction component in our model, including the number of layers, filter size, number of output nodes, and other parameters, were informed by a series of random grid experiments. Furthermore, these specifications align with successful configurations utilized in prior research (Farhad et al 2023).
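A minimal sketch of the shared feature-extraction branch and the contrastive loss, assuming PyTorch; the text specifies the input size, layer types, kernel and pool sizes, and the 512-feature output, but not the number of filters per layer, so the channel counts (16 and 32) here are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):
    """Shared Siamese branch: two 3x3 conv layers with ReLU, each followed by
    2x2 max-pooling, flattened into a 512-feature vector for an 80x80 input."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3)   # 16 filters: assumption
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3)  # 32 filters: assumption
        self.pool = nn.MaxPool2d(2, 2)
        # Spatial sizes: 80 -> conv -> 78 -> pool -> 39 -> conv -> 37 -> pool -> 18
        self.fc = nn.Linear(32 * 18 * 18, 512)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        return self.fc(torch.flatten(x, 1))

def contrastive_loss(distance, label, margin=1.0):
    """Contrastive loss on the pairwise distance: label 0 for similar pairs,
    1 for dissimilar pairs; dissimilar pairs are pushed beyond the margin."""
    return torch.mean((1 - label) * distance ** 2
                      + label * torch.clamp(margin - distance, min=0) ** 2)
```

With this convention, a pair where both images are 4CH (or both 2CH) would be given label 0, and a mixed 4CH/2CH pair label 1.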
The first convolutional layer in the feature extraction component is responsible for detecting simple patterns and edges within the divided image halves. It acts as the initial feature extractor, identifying basic image features. Subsequently, the output from the first convolutional layer is passed to the second convolutional layer, which operates at a higher level of abstraction. Here, more complex patterns and LVH-specific features are detected. These convolutional layers collectively contribute to the extraction of increasingly meaningful representations of the input image, which are essential for accurate LVH classification. Lastly, we incorporate a flatten layer, which reshapes the output from the convolutional and pooling layers into a one-dimensional feature vector.

This step was imperative because it allowed us to obtain feature vectors from the images, a crucial aspect of our LVH classification approach. These feature vectors serve as a condensed representation of the input images, capturing essential information for LVH assessment. The significance of these feature vectors lies in their role in calculating the Euclidean distance between two training images. This distance metric serves as the foundation for our zero-shot learning approach, enabling us to discern the dissimilarity or similarity between different LVH patterns effectively.
For the Multi-purpose Siamese Weighted Euclidean Distance Model (MSWED), we trained two identical Siamese networks: one for the D_U dataset and another for the D_L dataset. The MSWED is trained using image pairs from the respective datasets. To train MSWED, the images are first processed by a feature extraction component. As explained earlier, the feature extraction component converts an input image into a one-dimensional feature vector. There are 512 nodes in the output layer; therefore, the feature vector for each image has 512 features. After extracting the features from both images of a training pair (either from D_U or D_L), the Euclidean distance is calculated.
Let x_u, x′_u be a pair of images in D_U. Also, let M be the trained model, whose output is the feature vector, i.e. the 512 values from the output nodes.
Assuming model M U is trained on D U and M L is trained on D L , we can define the Euclidean distance between a pair of images as follows.
Definition 1. Euclidean distance d(x_i^U, x_j^U): Let M_U be the network trained on training data D_U, and x_i^U, x_j^U be a pair of images in D_U. Therefore, P = M_U(x_i^U) = {p_1, ..., p_n} is the feature vector (i.e. the values of the n output nodes) corresponding to x_i^U, and Q = M_U(x_j^U) = {q_1, ..., q_n} is the feature vector corresponding to x_j^U. The Euclidean distance is calculated as follows:

d(x_i^U, x_j^U) = √( Σ_{k=1}^{n} (p_k − q_k)² ). (6)

Definition 2. Weighted Euclidean distance W(x_i, x_j): Let x_i be the complete image corresponding to the union of the upper segment x_i^U and lower segment x_i^L, i.e. x_i = x_i^U ∪ x_i^L. Similarly, let x_j be another image, whose upper and lower segments are x_j^U and x_j^L, respectively, i.e. x_j = x_j^U ∪ x_j^L.
The weighted Euclidean distance (see figure 1(e)) is calculated as follows:

W(x_i, x_j) = w1 · d(x_i^U, x_j^U) + w2 · d(x_i^L, x_j^L), (7)

where w1 and w2 are weights assigned to the upper and lower segments, respectively. The utilization of the weighted Euclidean distance in our proposed MSWED model is motivated by the need to address specific challenges posed by echocardiogram images. The division of images into upper and lower segments is a deliberate choice aimed at mitigating noise interference and capturing only the most informative regions for LVH classification.
The lower segment of the image, often prone to noise and containing less pertinent information, could adversely affect the classification accuracy (van Everdingen et al 2016). However, disregarding this segment entirely is not feasible, as it still holds critical information. To strike a balance, we introduce the weighted Euclidean distance, acknowledging the potential noise in the lower segment while ensuring its consideration in the classification process.

The weighted Euclidean distance accommodates the distinct characteristics of the upper and lower segments by assigning weights (w1 and w2) to each segment. In our model, the weights are empirically determined as w1 = 3 for the upper segment and w2 = 0.2 for the lower segment (explained further in section 4). This balance allows the model to give more importance to the upper segment while still incorporating the lower segment's contribution, thereby enhancing the robustness of the classification process.
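Assuming the weighted distance combines the two per-segment Euclidean distances as a weighted sum (one plausible reading of the definition above), it can be sketched as:

```python
import numpy as np

def euclidean(p, q):
    """Euclidean distance between two feature vectors, e.g. the 512-value
    outputs of the trained branch networks."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

def weighted_distance(up_i, low_i, up_j, low_j, w1=3.0, w2=0.2):
    """Weighted Euclidean distance between two images, each represented by the
    feature vectors of its upper and lower segments. The defaults w1=3 and
    w2=0.2 are the empirically chosen weights reported in the text."""
    return w1 * euclidean(up_i, up_j) + w2 * euclidean(low_i, low_j)
```

The upper-segment distance thus dominates the score, while the noisier lower segment still contributes a small share.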

Testing of the proposed MSWED
We have named our proposed model a multi-purpose Siamese model because it can perform few-shot and zero-shot classification at the same time. Few-shot learning involves using a small number of training samples to classify the test image, while zero-shot learning deals with classifying classes unseen during training, traditionally using textual information alongside the available image features. Our model is trained using Normal and Mild images. When a test image is provided to the model, it checks whether the image surpasses a certain cutoff distance. This cutoff distance acts as a threshold, indicating whether the model should search for the nearest neighbor in the training data (few-shot learning) or whether the image belongs to an unseen class (zero-shot learning), as shown in figure 1(f).
The cutoff distance (C_o) is a threshold derived from the distances between pairs of Normal and Mild images in the training data. To compute C_o, we calculated the mean (M_T) and the standard deviation (SD_T) of the pairwise weighted Euclidean distances between Mild training instances and Normal training instances. In accordance with the empirical rule, which states that approximately 99.7% of the data falls within three standard deviations of the mean in a Normal distribution, we use (8)-(10) to determine the C_o distance:

M_T = (1/N) Σ_{(i,j)} W(x_i, x_j), (8)

SD_T = √( (1/N) Σ_{(i,j)} (W(x_i, x_j) − M_T)² ), (9)

C_o = M_T + 3 SD_T, (10)

where the sums run over the N Mild-Normal training pairs (x_i, x_j). In these equations, M_T represents the mean of the pairwise weighted Euclidean distances W(x_i, x_j) among the Mild and Normal training instances, while SD_T represents the standard deviation. Finally, C_o is derived by adding three times the standard deviation (3SD_T) to the mean (M_T).
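The cutoff computation can be sketched as below; whether the population or sample standard deviation was used is not stated in the text, so the population form here is an assumption:

```python
import statistics

def cutoff_distance(pairwise_distances):
    """C_o = M_T + 3 * SD_T, where M_T and SD_T are the mean and standard
    deviation of the pairwise weighted Euclidean distances between Mild and
    Normal training instances (three-sigma empirical rule)."""
    m_t = statistics.mean(pairwise_distances)
    sd_t = statistics.pstdev(pairwise_distances)  # population SD (assumption)
    return m_t + 3 * sd_t
```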

Algorithms
Algorithm 1, called Train-MSWED, trains the models M_U and M_L using a dataset T, hyperparameters P_m^U and P_m^L, and initial models M_U and M_L. It iteratively selects random image pairs from the upper and lower segments of the datasets D_U and D_L, calculates the contrastive loss for each negative and positive pair, and updates the models. The process continues for a specified number of epochs until the models are trained. After training, the weighted Euclidean distance between Mild and Normal training images is calculated. The second algorithm, Test-MSWED, takes the trained models M_U and M_L, and test data V, as input. It iterates through each test image and determines whether it belongs to a seen or unseen class based on the weighted Euclidean distance W(x_i, x_j). If the distance exceeds the threshold, it classifies the image as an 'Unseen Class' (i.e. Moderate). Otherwise, it finds the nearest neighbors in the training set T and predicts the class label accordingly.

Experiments and results
In this section, we outline the experimental setup and analyze the results achieved by MSWED and other competing approaches.

Experimental setup
To assess the effectiveness of MSWED, we compared its performance with that of a transfer learning model, VGG16 (Simonyan and Zisserman 2014). Additionally, we compared the results of our proposed model with those of the CNN model used as the feature extraction component in MSWED. Furthermore, we evaluated our work against recent studies conducted by Yu et al (2022) and Koch et al (2015): Yu et al (2022) performed LVH classification using ResNet, and Koch et al (2015) proposed a one-shot learning model for alphabet classification. The datasets were divided into a 60:40 ratio for these experiments. Precision, recall, and F1 score are used as the evaluation metrics for all experiments. The model was trained for 30 epochs with a batch size of 10.

Results
In this subsection, we present the results of the MSWED model and other competing approaches.

Few-shot learning
As mentioned earlier, the MSWED model is capable of performing both few-shot and zero-shot learning simultaneously. We divided the Normal and Mild classes from the 2CH and 4CH data in a 60:40 ratio. For the PLAX view, we trained and tested the model using Mild and Moderate images because we were unable to obtain Normal data in the PLAX view from either hospital. We split the dataset randomly five times to create more representative training and testing sets. After every random split, we calculated the recall and precision on the test data. We repeated this process five times and then calculated the mean recall and precision, as shown in tables 2-4. Across the three echocardiographic views (2CH, 4CH, and PLAX), tables 2-4 show the ranges of precision and recall achieved by the different models. In the 2CH view, for example, the VGG16 model has a recall of 0.69 with a precision of 0.62, Yu et al's model has a recall of 0.7 with a precision of 0.69, and the proposed MSWED achieves a remarkable recall of 0.82 with a precision of 0.88.
Similarly, in the 4CH view, the recall and precision values for VGG16, Yu et al's model, the CNN model, the Siamese network by Koch et al (2015), and the proposed MSWED are in the ranges of 0.6-0.83 and 0.52-0.89, respectively. The proposed MSWED consistently surpasses the other models, indicating its superior performance in achieving higher recall and precision simultaneously.

In the PLAX view, the recall and precision values range from 0.65 to 0.86 and 0.58 to 0.84, respectively. Once again, the proposed MSWED outperforms the competing models, showcasing its effectiveness in terms of both recall and precision.

Furthermore, the F1 score, a metric that balances precision and recall, is also provided in the tables. The F1 score for the proposed MSWED is consistently high across all echocardiographic views, indicating a robust balance between precision and recall. Overall, the results highlight that the MSWED model consistently achieves superior performance compared to other models, making it a promising choice for few-shot learning in cardiac image analysis.

Zero-shot learning
In the context of zero-shot learning, we compared a test image (categorized as Normal, Mild, or Moderate) to the training images (categorized as Normal or Mild). This comparison involved calculating the weighted Euclidean distance between two randomly selected images from the training dataset (one from each class) and the test image. Subsequently, the weighted Euclidean distance was compared against the predefined threshold C_o, computed using the formula specified in (10). If the weighted Euclidean distance falls below the C_o threshold, we proceed to search for the nearest neighbor among the two randomly selected training images. However, if the weighted Euclidean distance exceeds the C_o threshold, it indicates that the test image belongs to the unseen class, which in this scenario is the Moderate class.
Tables 5 and 6 present the values of various metrics, including SD_T, M_T, and C_o, for the 2CH and 4CH training and testing data. The values of SD_T (the standard deviation of the pairwise distances) range from 0.037 to 0.048 across the different splits. M_T (the mean of the pairwise distances) varies from 0.79 to 0.84, while C_o ranges from 0.89 to 0.95. Tables 5 and 6 also include the mean values of the weighted Euclidean distance between the different categories of test images and the Normal images. The mean weighted Euclidean distance values for each category of test images vary across the splits. It is worth noting that the mean weighted Euclidean distance between Moderate test images and Normal training images is above the C_o value for every split. This finding indicates that MSWED is capable of performing not only few-shot learning but also zero-shot learning.
Tables 7 and 8 present the performance comparison of competing approaches for zero-shot learning. The approaches are evaluated on two different test datasets: 2CH and 4CH. We compared our model with Yu et al (2022) for the LVH and Normal classification and Koch et al (2015) for zero-shot learning. The performance metrics measured are recall, precision, and F1 score. The feature extraction component of Yu et al (2022) achieved comparable results for both the 2CH and 4CH data. On the other hand, the MSWED approach outperforms the others on the 2CH test data, with a higher recall of 0.87 and a smaller standard deviation of 0.02; it also achieves a higher precision of 0.89 with a standard deviation of 0.02. Similarly, for the 4CH test data, the MSWED approach performs better, obtaining a higher recall of 0.84 with a larger standard deviation of 0.06 and a precision of 0.82 with a standard deviation of 0.05. Overall, the MSWED approach consistently demonstrates superior performance compared to the other approaches, achieving higher recall, precision, and F1 values on both the 2CH and 4CH test datasets. By varying the weight assigned to the upper part of the image and observing the corresponding recall and precision values, we can analyze the performance of the model. Table 9 shows that as the weight assigned to the upper part of the image increases, both recall and precision values also increase. This suggests that giving more weight to the upper portion of the image improves the model's ability to correctly identify relevant information or features, resulting in higher recall and precision.
The table also includes a scenario where the weight assigned to the lower part is zero (last row). In this case, both recall and precision decrease compared to the scenario where the weight for the lower part is 0.2. This could be due to the exclusion of information from the lower part, which might contain valuable details for the classification task. It is worth mentioning that the model's performance was evaluated by trying multiple weight assignments and observing the corresponding recall and precision values. The table serves as an illustration of the model's performance for the specific weight assignments depicted.
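The weighted Euclidean distance described above can be sketched as follows, with separate embeddings for the upper and lower image parts combined under configurable weights. The default weights of 0.8 and 0.2 mirror the kind of assignments explored in table 9; the function name and interface are illustrative, not the paper's implementation.

```python
import numpy as np

def weighted_euclidean(upper_a, upper_b, lower_a, lower_b,
                       w_upper=0.8, w_lower=0.2):
    """Weighted Euclidean distance over upper- and lower-part embeddings.

    upper_a/upper_b: embeddings of the upper segments of images A and B.
    lower_a/lower_b: embeddings of the lower segments of images A and B.
    The weights emphasize the more informative (upper) region."""
    d_upper = np.linalg.norm(np.asarray(upper_a, float) - np.asarray(upper_b, float))
    d_lower = np.linalg.norm(np.asarray(lower_a, float) - np.asarray(lower_b, float))
    return w_upper * d_upper + w_lower * d_lower
```

Setting `w_lower=0.0` reproduces the last-row scenario in table 9, where the lower part is excluded entirely.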

Ensemble model with multi-view majority voting for enhanced accuracy
To improve the accuracy of our proposed approach, we experimented with an ensemble model comprising six Siamese models. Our approach involved training the models with a dataset split in a 60:40 ratio for training and testing, respectively. We trained six models: three for the upper portion of Mild and Moderate images in the 2CH, 4CH, and PLAX views, and three for the lower portion of the images in the 2CH, 4CH, and PLAX views. All other aspects of our methodology align with those described in section 3.2.2.
For model testing, we evaluated each model based on the view of the test image. For instance, if a test image displayed a 2CH view, we compared it against both 2CH Mild and 2CH Moderate images. We then calculated the weighted Euclidean distance. If the resulting distance exceeded a threshold value, denoted C_o, the image was classified as depicting Severe LVH; otherwise, it was assigned to the class of its nearest neighbor. As we have three echocardiography views for each patient, we tested the models using all three views, and the final classification was determined by majority voting. In essence, by employing multiple views and majority voting, we capitalize on diverse angles and information sources, making the diagnosis more robust and less susceptible to errors. This approach increases the overall accuracy and reliability of our diagnostic system, as evidenced by our improved results.
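A minimal sketch of the per-view classification rule and the majority vote might look like the following; the dictionary-based interface and the assumption that the nearest-neighbor distance is the one compared against C_o are ours, not taken from the paper's code.

```python
from collections import Counter

def classify_view(distances_to_refs, c_o):
    """Classify one echocardiographic view.

    distances_to_refs: dict mapping a reference class ('Mild', 'Moderate')
    to the weighted Euclidean distance from the test image to that class.
    If even the nearest class is farther than c_o, the case is treated as
    the unseen Severe class (zero-shot rule)."""
    nearest_class, nearest_dist = min(distances_to_refs.items(),
                                      key=lambda kv: kv[1])
    return "Severe" if nearest_dist > c_o else nearest_class

def majority_vote(view_predictions):
    """Final label over the per-view predictions (2CH, 4CH, PLAX)."""
    return Counter(view_predictions).most_common(1)[0][0]
```

For example, if the 2CH view mislabels a case as Moderate but the 4CH and PLAX views both return Severe, the vote still yields Severe, mirroring the behavior reported for table 12.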
Figure 2 shows the testing of the model, which is trained on three echocardiographic views.
In table 10, we illustrate the effectiveness of the ensemble model with multi-view majority voting in diagnosing Mild LVH (left ventricular hypertrophy) using echocardiography. Table 12 illustrates the performance of the ensemble model with multi-view majority voting in a scenario where the model is trained on a dataset containing both Mild and Moderate images. This task simulates a zero-shot learning challenge in which the model must classify a Severe case, which it has not been explicitly trained on. In specific data splits, individual views, such as 2CH and PLAX, produced false classifications, mislabeling cases as Moderate. However, the power of majority voting within the ensemble model became evident, as it consistently yielded the correct result, classifying these cases as Severe.
In tables 10-12, we have presented the results for individual patients when the models are trained on random data splits. To yield more reliable outcomes, we randomly split the data five times and evaluated the model on each split. Table 13 displays the mean recall and precision for three cardiac views: 2CH, 4CH, and PLAX, along with the ensemble model's results. These mean values reveal that the ensemble model achieves an average recall of 0.9 and an average precision of 0.88, indicating strong performance in accurately identifying both Mild and Moderate cases. These findings suggest the potential of the ensemble model with multi-view majority voting as an effective diagnostic tool for cardiac conditions.

Analysis of distance function and computational complexity
In our recent publication (Farhad et al 2023), we utilized an enhanced Siamese model to conduct LVH classification. We carried out an in-depth analysis of distance functions, the outcomes of which informed our choice of Euclidean distance as the preferred distance measure. Our investigation further extended to comparing the time and parameters utilized by various models. Notably, our proposed Siamese model demonstrated superior performance, with the shortest training time and the fewest parameters. Despite our thorough examination, the Chebyshev distance was inadvertently omitted from our initial analysis. We have rectified this oversight by incorporating it into our findings, as detailed in table 14.
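For reference, the distance functions compared in this kind of analysis can be computed as below. Apart from Euclidean and Chebyshev, which the text names explicitly, the particular set of alternatives shown here is an assumption for illustration.

```python
import numpy as np

def pairwise_distances(a, b):
    """Common distance measures between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return {
        "euclidean": float(np.linalg.norm(a - b)),    # L2 norm
        "manhattan": float(np.sum(np.abs(a - b))),    # L1 norm
        "chebyshev": float(np.max(np.abs(a - b))),    # L-infinity norm
        "cosine": float(1.0 - np.dot(a, b)
                        / (np.linalg.norm(a) * np.linalg.norm(b))),
    }
```

The Chebyshev distance depends only on the single largest coordinate difference, which makes it more sensitive to an outlying feature than the Euclidean distance that aggregates all coordinates.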
Moreover, regarding computational complexity, transfer learning models tend to have higher complexity due to their many layers and parameters (Achille et al 2021). In contrast, our Siamese approach utilizes only two layers to extract features, resulting in lower computational complexity.

Extending the application of Siamese neural networks in disease severity assessment
We employed a Siamese neural network architecture to classify cardiac conditions ranging from Normal to Moderate LVH. However, the potential of the Siamese architecture extends beyond this specific application: it can be used to assess the severity of diseases and to track changes along a continuous spectrum. By measuring the Euclidean distance between the final fully connected layers of the paired subnetworks, meaningful measures of disease severity can be obtained relative to Normal cases or other time points. This approach requires only image-level annotations and binary comparison labels for training the network. The current practice of categorizing diseases into discrete bins relies on a human-engineered approach, while the underlying nature of diseases often follows a continuous spectrum. Our Siamese neural network approach, utilizing measures derived from the output Euclidean distance, provides a continuous disease severity grading scale for LVH. The Siamese neural network possesses an intriguing characteristic: it can be trained using only binary comparison labels, yet it implicitly learns the degree of difference in disease severity between the paired input images. This feature has the potential to alleviate the annotation burden placed on clinical experts during the labeling of training data, since annotating a binary difference between two images is typically easier than categorizing the severity of disease in a single image.
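The training signal described here, binary comparison labels driving a contrastive objective on the embedding distance, can be sketched in a few lines. The single tanh linear layer stands in for the convolutional subnetwork, and the margin value is an assumption; this is a conceptual sketch, not the paper's implementation.

```python
import numpy as np

def embed(x, w):
    # Shared subnetwork: one tanh linear layer stands in for the CNN
    # feature extractor; both inputs pass through the same weights w.
    return np.tanh(np.asarray(x, dtype=float) @ w)

def contrastive_loss(x1, x2, same_label, w, margin=1.0):
    """Contrastive loss on a binary comparison label
    (True = same severity, False = different severity).

    Same-severity pairs are pulled together (loss grows with distance);
    different-severity pairs are pushed apart up to the margin. Trained
    this way, the output distance behaves as a continuous severity measure."""
    d = np.linalg.norm(embed(x1, w) - embed(x2, w))
    if same_label:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

Minimizing this loss over many labeled pairs is what lets the network order unseen cases along a severity continuum, even though no per-image severity grade was ever provided.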
During testing, we examined the performance of the model using a Severe LVH image. Severe LVH images are relatively rare, making them challenging to obtain; however, we managed to acquire one such image for evaluation. By comparing this Severe LVH image with random Normal, Mild, and Moderate LVH images, we were able to confirm the above observation, as shown in table 15. The Euclidean distance between the Severe LVH image and the test image decreased as the severity of LVH increased from Normal to Moderate. In conclusion, the Siamese neural network, trained with binary comparison labels, demonstrates the potential to implicitly learn and differentiate the degree of disease severity. Table 15 also illustrates the strength of our weighted Euclidean distance: in some instances, the lower part shows larger differences due to noise than the upper part, which does not accurately reflect the true distinction between the images.

Discussion
This work represents pioneering research in LVH grading using computer vision techniques. While previous studies have concentrated on left ventricle segmentation, none have delved into LVH grading, primarily for the following reasons. Complexity versus accuracy: while simple methods such as measuring wall thickness or left ventricular mass per body surface area can provide an initial assessment of LVH severity, they may lack the precision needed for accurate classification, especially in cases where subtle differences matter. Our proposed Multi-purpose Siamese Weighted Euclidean Distance Model (MSWED) leverages advanced techniques, such as convolutional Siamese neural networks and a weighted Euclidean distance, to precisely classify the severity of LVH. By incorporating these techniques, our model achieves higher accuracy and reliability than traditional methods.
Subjectivity and error in manual grading: manual grading systems often entail subjectivity and the potential for error, as they frequently rely on the interpretation of individual experts (Sanghvi 2016, Bartolomé et al 2020). By introducing an automated approach like MSWED, we mitigate the risk of human error and ensure consistency in severity classification across different healthcare settings. This automation not only saves time but also enhances the reliability of LVH severity assessment.
Advancements in zero-shot and few-shot learning: our model introduces novel approaches to zero-shot and few-shot learning, specifically tailored for LVH severity classification. Unlike traditional text-vector-based zero-shot learning, our MSWED model employs a cutoff distance-based approach, enhancing accuracy and reliability. These advancements are crucial for handling cases where limited labeled data are available, thus improving the generalizability and applicability of the severity grading system.
Segmentation-based approaches: MSWED also overcomes the challenges associated with segmentation-based approaches. Traditional segmentation methods often require extensive training time and large amounts of annotated data to achieve satisfactory results. Our proposed MSWED model bypasses the need for segmentation by directly analyzing echocardiograms as a whole, significantly reducing the computational burden and training time. By leveraging convolutional Siamese neural networks, the model learns to extract meaningful features directly from the raw images, eliminating the need for laborious manual segmentation. This not only streamlines the process but also ensures scalability and efficiency, making it suitable for real-world clinical applications where time and resources are limited. Our approach therefore strikes a balance between complexity and practicality, offering a sophisticated solution without sacrificing efficiency or ease of implementation.

Inter-observer and Intra-observer reliability
To assess the quality of the classification results, we present tables 16 and 17, which report inter-observer and intra-observer reliability. Inter-observer variability, with κ values ranging from 0.6 to 0.7, indicated moderate agreement between two experts. This underscores the challenge of achieving consistent manual annotations from clinical experts on real-world data and emphasizes the need for interactive volumetric feedback to improve consistency. The higher agreement observed between the proposed MSWED model and the experts underscores the model's robustness in delivering accurate classification results. Intra-observer results surpassed inter-observer scores, demonstrating the consistency of manual classifications by experienced echocardiographers, even with challenging data. Notably, the MSWED model exhibited the highest intra-observer reliability score, affirming its ability to replicate the expertise of the echocardiographers.

Ablation study
Table 18 presents the results of our ablation study.

Conclusion
In conclusion, our research has made significant contributions to the field of LVH grading. Our first contribution is the creation of two datasets comprising echocardiograms that span a wide range of LVH severity. Secondly, we have introduced a novel approach by leveraging a Siamese neural network for grading LVH; this method enables accurate assessment of LVH severity even when labeled data are limited. Furthermore, we have developed a zero-shot learning approach, which proves valuable in domains where textual vectors for image description are unavailable. Additionally, we have incorporated a weighted Euclidean distance to focus on the most informative regions within LVH images for grading purposes. Together, these contributions advance the field of LVH assessment by providing comprehensive datasets, introducing innovative approaches, addressing data limitations, and enhancing the accuracy and reliability of LVH grading.

Future work
Our future endeavors will focus on expanding the geographical scope of our research beyond Hyderabad, Sindh, to encompass diverse populations from various countries. By collecting data from different regions, we aim to develop LVH classification models that are more universally applicable. This broader dataset will account for demographic variations, healthcare practices, and regional health disparities, thereby enhancing the generalizability and robustness of our classification system. Additionally, we recognize the importance of explainability and interpretability in AI-driven healthcare systems, particularly in ensuring accountability and transparency in decision-making processes. Legal requirements for explainability, especially in regions like the European Union, are increasingly prominent. Regulations such as the General Data Protection Regulation and the Medical Device Regulation impose strict standards for accountability and transparency in AI applications in healthcare. The work by Stöger et al (2021) provides valuable insights into the legal perspective of medical AI in the European Union, emphasizing the importance of explainability for compliance with regulatory frameworks and the establishment of liability. In our future research, we will emphasize the significance of explainability and interpretability in our proposed AI model for LVH severity grading, striving to provide transparency into the decision-making process and contributing to the broader goal of building trustworthy and legally compliant AI systems in healthcare.

Figure 1 .
Figure 1. (a) Echocardiography images showing the 4CH, 2CH and PLAX views of the heart, respectively; (b) division of the images into upper and lower parts; (c) the proposed model for the feature extraction component; (d) training of MSWED; (e) calculation of C_o; (f) testing of MSWED.

Algorithm 1 .
TrainMSWED(T, P_U^m, P_L^m).
Require: T: training data; P_U^m: hyperparameters for the upper-segment model; P_L^m: hyperparameters for the lower-segment model
Ensure: M_U: trained upper-segment model; M_L: trained lower-segment model
1: Q ← []
2: Qp ← [], qp ← [] ▷ for all positive pairs in D_U and D_L
3: Qn ← [], qn ← [] ▷ for all negative pairs in D_U and D_L
4: q ← []
5: Lc1 = 0, Lc2 = 0
6: Pavg ← CalculateAverageLVLowestPoint(T) ▷ average y-coordinate of the lowest LV points across all images
7: D_U ← GetUpperSegments(T, Pavg) ▷ extract upper segments
8: D_L ← GetLowerSegments(T, Pavg) ▷ extract lower segments
9: maxPair ← maximum number of pairs processed in each epoch
10: while epoch ⩽ maxepoch do
11:   for i = 1 to T − 1 do
12:     … ▷ build the current batch (steps 12-29 not recoverable from the source)
30:   M_L.fit(q, Lc2, W_L^m, P_L^m) ▷ train model M_L with the current batch
31: return M_U, M_L

Algorithm 2. Test-MSWED(M_U, M_L, V, Pavg).
Require: M_U: trained model for upper segments; M_L: trained model for lower segments; V: test data
Ensure: Predictions: class predictions for the test data
1: Predictions ← [] ▷ array of predicted classes for the test images
2: …
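The positive/negative pair construction in steps 2-3 of Algorithm 1 can be sketched as follows for a single segment set (the same routine would be applied separately to the upper and lower segment sets D_U and D_L); the function name and interface are illustrative.

```python
from itertools import combinations

def build_pairs(images, labels):
    """Build contrastive training pairs for one segment set.

    Positive pairs share a severity label (e.g. Mild-Mild); negative
    pairs have differing labels (e.g. Mild-Moderate). Returns the two
    lists of (image, image) tuples."""
    positives, negatives = [], []
    for i, j in combinations(range(len(images)), 2):
        pair = (images[i], images[j])
        (positives if labels[i] == labels[j] else negatives).append(pair)
    return positives, negatives
```

In practice the number of pairs grows quadratically with the dataset size, which is why Algorithm 1 caps the pairs processed per epoch with a maxPair parameter.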

Figure 2 .
Figure 2. Testing of the ensemble model.

Table 2 .
Performance of competing models using Few-shot learning-2CH.

Table 3 .
Performance of competing models using Few-shot learning-4CH.

Table 4 .
Performance of competing models using Few-shot learning-PLAX.

Table 5 .
Values of SD_T, M_T, C_o and µ_T(dist_i,j) for 2CH training and testing data.

Table 6 .
Values of SD_T, M_T, C_o and µ_T(dist_i,j) for 4CH training and testing data.

Table 7 .
Performance of competing approaches using Zero-shot learning-test data (2CH).

Table 8 .
Performance of competing approaches using Zero-shot learning-test data (4CH).

Table 9 .
Performance of MSWED for different weight assigned to upper and lower parts of the images.

Table 10 .
Performance of ensemble model with multi-view majority voting for Mild LVH.

Table 11 .
Performance of ensemble model with multi-view majority voting for Moderate LVH.

Table 12 .
Performance of ensemble model with multi-view majority voting for a Severe Case.

Table 13 .
Performance of ensemble model with multi-view majority voting for Mild and Moderate data.

The ensemble model's collective classification achieves a significantly improved accuracy of 91.6%, highlighting the advantage of combining diverse views through majority voting. In table 11, we extend this approach to classify Moderate LVH based on the same three views; majority voting ensures a robust, consensus-based diagnosis. Here the ensemble model demonstrates its effectiveness with an overall accuracy of 100%, emphasizing its utility in accurate LVH severity classification across echocardiography perspectives.

Table 14 .
Performance analysis of the distance functions utilized in MSWED.

Table 15 .
Euclidean distance of Severe LVH image with Mild, Moderate and Normal images.
Table 18 compares the performance of three approaches: MSWED without image division and weighted Euclidean distance, a CNN model, and the proposed MSWED. For the CNN model, we utilized the architecture shown in figure 1(c). The assessment is carried out across the 2CH, 4CH, and PLAX test data views, with recall and precision as the evaluation metrics. The MSWED without image division and weighted Euclidean distance and the CNN model exhibit recall values ranging from 0.58 to 0.73 and precision values ranging from 0.55 to 0.74 across the different test views. In contrast, the proposed MSWED, incorporating image division, achieves higher performance, with recall values from 0.82 to 0.86 and precision values from 0.84 to 0.89. This comparison demonstrates that the proposed MSWED consistently outperforms the variant without image division and weighted Euclidean distance in both recall and precision, indicating the advantage of the proposed methodology for improved performance in medical diagnostics.