Interactive segmentation of medical images using deep learning

Medical image segmentation algorithms based on deep learning have achieved good results in recent years, but they require large amounts of labeled data. When performing pixel-level labeling of medical images, annotating a single target can require marking tens or even hundreds of points along its edge, which costs considerable time and labor. To reduce the labeling cost, we use a click-based interactive segmentation method to generate high-quality segmentation labels. However, current interactive segmentation algorithms fuse only the user's click information with the image features as the input of the backbone network (so-called early fusion). With early fusion, the interaction information, which is still very sparse at that stage, is easily diluted. Furthermore, these algorithms do not take the boundary problem into account, which degrades model performance. We therefore propose an early and late fusion strategy to prevent the interaction information from being diluted prematurely and to make better use of it. At the same time, we propose a decoupled head structure that extracts image boundary information and, combined with a boundary loss function, establishes a boundary constraint term, so that the network pays more attention to boundary information and performance is further improved. Finally, we conduct experiments on three medical datasets (Chaos, VerSe and Uterine Myoma MRI) to verify the effectiveness of our network. The experimental results show that our network improves considerably over the baseline, with NoC@80 (the number of interactive clicks needed to exceed an IoU threshold of 80%) improving by 0.1, 0.1 and 0.2 on the three datasets. In particular, we achieve a NoC@80 of 1.69 on Chaos. According to our statistics, manual annotation takes 25 min per case (Uterine Myoma MRI). Annotating a medical image with our method requires only 2 or 3 clicks, which saves more than 50% of the cost.


Introduction
In recent years, deep learning has delivered significant achievements in image recognition (Zheng et al 2017, Pu et al 2019, Tian 2020), object detection (Ren et al 2015, Redmon et al 2016, Zhang et al 2022) and image segmentation (Ronneberger et al 2015, Isensee et al 2021, Zhang et al 2021). While convolutional neural networks can learn autonomously, they usually require substantial volumes of annotated image data. As the amount of data grows, the cost of human annotation rises considerably, particularly for pixel-level segmentation tasks.
Interactive segmentation is therefore an appealing and effective approach that enables human annotators to extract objects of interest quickly. In interactive segmentation, the user first marks parts of the background and foreground in the image through interactive input; the algorithm then automatically computes the segmentation that best satisfies the user's input as a constraint. Interactive segmentation algorithms can enhance the results of automatic segmentation (Xu et al 2016) and serve as the basis for interactive annotation tools, allowing artificial intelligence to speed up segmentation and labeling while reducing labeling costs. Interactive segmentation methods take the user-provided interaction data into account, streamlining the annotation process. In general, interactive inputs can take various forms, such as doodles (Bai and Wu 2014), clicks (Sofiiuk et al 2022) or bounding boxes (Xu et al 2017), as illustrated in figure 1.
In recent years, with the development of deep learning, interactive segmentation algorithms based on deep learning (Bai and Wu 2014, Xu et al 2017, Sofiiuk et al 2022) have surpassed traditional methods (Boykov and Jolly 2001, Rother et al 2004, Grady 2006). Nevertheless, when interactive segmentation algorithms are used for object annotation, they must both produce high-quality segmentation labels and maintain efficient inference speed. Notably, an increasing number of researchers have been investigating interactive segmentation algorithms based on click interactions (Lin et al 2020, Sofiiuk et al 2022), and FocalClick (Chen et al 2022) highlights the importance of clicks in interactive segmentation. Most existing works use all interaction points indiscriminately to generate the final prediction, yet not all interaction points contribute equally to segmentation quality. A further common issue is that interaction and image features are integrated only ahead of the backbone network, where this sparse spatial and semantic information tends to be weakened within the earlier layers. While conducting segmentation, we also observed that current interactive segmentation algorithms struggle to distinguish the boundaries between distinct objects, which leads to longer interactive clicking time and subpar segmentation performance. Typically, the image boundary is the transitional region between objects or between objects and the background, so its precise segmentation is critical to achieving accurate results. The target boundary is illustrated in figure 2.
Our contributions are summarized as follows:
• Embedding the features of user clicks and image features into both early and late fusion layers, thereby enhancing the utilization of interactive information.
• Combining boundary and backbone features, and decoupling the supervision of the whole object and its boundary, to improve the segmentation ability of the network.
• Combining boundary loss, cross-entropy loss and normalized focal loss to optimize the network; experiments show the superiority of our work on three medical datasets.

Related works
The interactive segmentation algorithm is an image segmentation method that relies on user interaction. In contrast to automatic segmentation algorithms, interactive segmentation algorithms enable users to engage in the segmentation process and enhance the quality of segmentation results by contributing additional information or feedback, thereby obtaining more precise segmentation outcomes.

Interactive segmentation methods
Early interactive segmentation algorithms were primarily formulated as optimization problems (Adams and Bischof 1994, Mortensen and Barrett 1998, Boykov and Jolly 2001, Boykov and Funka-Lea 2006, Grady 2006, Fan and Lee 2015). With the advancement of data-driven artificial intelligence, interactive segmentation algorithms based on deep learning have gained prominence. Xu et al (2016) were the first to combine interactive segmentation with deep learning: the interactive information provided by users, namely foreground and background points, is transformed into distance maps, which are then combined with the original image and fed into a Fully Convolutional Network (Long et al 2015) to produce the segmentation result. Deep learning-based interactive segmentation algorithms have since seen continuous refinement. Jang and Kim (2019) introduced the Backpropagating Refinement Scheme (BRS), based on click interaction, which integrates a backpropagation mechanism into the network's inference stage and computes a loss combining the interaction points with the corresponding points generated in the mask. This optimization uses the interaction points as hard constraints, reducing the number of user clicks needed.

Network architecture
The network structure of an interactive segmentation algorithm is very similar to that of an automatic segmentation algorithm, with one key difference: in addition to the input image, the interactive segmentation algorithm must encode the user interaction information, fuse the two, and feed the fused information into the backbone network for feature extraction. Figure 3 shows a schematic diagram of the network structure of our method.
We construct an interactive segmentation network for 2D medical images. Based on the HRNet (High-Resolution Net) + OCR (Object-Contextual Representations) semantic segmentation architecture (Wang et al 2020, Yuan et al 2020), we add early and late fusion strategies to the network to alleviate the dilution of interactive information, and we add an additional boundary head and a corresponding boundary loss through a decoupled head, so that segmentation errors at the boundary can be corrected by learning the correspondence between predicted boundary pixels and the ground-truth boundary. The segmentation ability of the model is improved without reducing the inference speed of the network.

Clicks encoding
In click-based interactive segmentation there are positive and negative clicks, represented by their coordinates in the image. To feed them into a convolutional neural network, they must be encoded in a spatial form. Benenson et al (2019) conducted a detailed ablation study on click encoding and found that small-radius disks outperformed other encodings in terms of model performance. In this paper, a disk map with a radius of 5 pixels is used to encode the positive and negative clicks. As shown in figure 4, during model training positive sample points (on the target object) and negative sample points (on the background) are randomly generated from the original mask, thereby simulating human clicks.
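The following is a minimal sketch of how such a disk-map encoding could be produced, assuming clicks are given as (row, col) pixel coordinates; the function name and two-channel layout are illustrative, not taken from the paper's code.

```python
import numpy as np

def encode_clicks(shape, positive_clicks, negative_clicks, radius=5):
    """Return a 2-channel map: channel 0 for positive clicks, channel 1 for negative clicks."""
    h, w = shape
    maps = np.zeros((2, h, w), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    for channel, clicks in enumerate((positive_clicks, negative_clicks)):
        for (cy, cx) in clicks:
            # mark a filled disk of the given radius around each click
            disk = (ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2
            maps[channel][disk] = 1.0
    return maps

# example: one positive and one negative click on a 320x480 image
click_maps = encode_clicks((320, 480), positive_clicks=[(100, 200)], negative_clicks=[(50, 60)])
```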

Early-late fusion
Interactive segmentation methods based on deep learning usually adopt an early fusion strategy: the interactive information is fused with the image information, and the fused result is fed into the backbone network for feature extraction. Previous interactive segmentation networks (Lin et al 2020, Sofiiuk et al 2022) fuse the interactive information with the image before the backbone network, the so-called early fusion. With early fusion alone, the interaction information, which at this point is much sparser than the image features, is easily diluted, so the network may not make full use of it and may fail to respond promptly to user clicks.
To prevent the interaction information from being diluted prematurely, we propose an early and late fusion strategy that makes better use of the interaction features and image features, instead of fusing them only once before the backbone network. The main idea is that the first fusion takes place at the beginning of the network, as shown in figure 5, and the second fusion at the first stage of the backbone block. In the early fusion, the image features and the interactive features (the encoded user clicks) are combined by element-wise addition, so the dimension of the feature map does not change but each channel carries more information. In the late fusion, the image features and interactive features are concatenated, which retains more channels and location information, allowing subsequent layers to use both shallow and deep features, which is advantageous for segmentation.
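A minimal sketch of the two fusion steps is given below, assuming the click encoding is first projected to the image-feature width and that the backbone's first stage preserves the channel count; module and parameter names are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyLateFusion(nn.Module):
    """Fuse encoded clicks with image features twice: addition early, concatenation late."""
    def __init__(self, img_channels=64, click_channels=2):
        super().__init__()
        # project the 2-channel click encoding to the image feature width for early fusion
        self.click_proj = nn.Sequential(
            nn.Conv2d(click_channels, img_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(img_channels),
            nn.ReLU(inplace=True))
        # 1x1 conv restores the channel count after the late (concat) fusion
        self.late_reduce = nn.Conv2d(img_channels * 2, img_channels, kernel_size=1)

    def forward(self, img_feat, click_map, backbone_stage1):
        click_feat = self.click_proj(click_map)
        x = img_feat + click_feat                      # early fusion: element-wise addition
        x = backbone_stage1(x)                         # first stage of the backbone
        # resize the click features to the stage-1 resolution before concatenation
        click_feat = F.interpolate(click_feat, size=x.shape[-2:], mode='bilinear',
                                   align_corners=False)
        x = torch.cat([x, click_feat], dim=1)          # late fusion: channel concatenation
        return self.late_reduce(x)
```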

Decoupled head
Song et al (2020) pointed out that spatial misalignment exists between the classification and localization tasks in object detection because the two tasks attend to different regions. To address this, YOLOX (Ge et al 2021) proposes a decoupled-head approach that reduces the interference between the two tasks by designing separate head branches for classification and localization, improving detection accuracy by 3%. As shown in figure 6, medical datasets often contain amorphous or irregularly shaped boundaries owing to differences between patients and diseases, which increases the difficulty of segmentation. To alleviate these problems, this paper draws on the decoupled-head idea of YOLOX (Ge et al 2021) and designs a semantic segmentation model in which features are divided into overall features and edge features to improve the performance of the network. Under decoupled supervision, the resulting overall features and residual boundary features are further refined by supervising the pixels of the corresponding part (overall or boundary), as shown in figure 7. This method combines different loss functions by decoupling the supervision of the whole object and the object boundary. The boundary head cooperates with the boundary loss to encourage alignment between the predicted boundary and the ground truth boundary (GTB), so that the segmentation model can better distinguish the boundaries of different targets and significantly improve the segmentation of small and slender objects (such as the uterine cavity).
The acquisition of the boundary map is based on the ideas of Song et al (2020). First, the predicted boundary (PDB) is detected from the class probability map P ∈ R^(C×H×W) output by the current network, where C is the number of classes (in interactive segmentation there is one instance per image, so C = 2, object and background) and H × W is the image resolution. Specifically, we compute the boundary map B, which marks the locations of the PDB, using the Kullback-Leibler (KL) divergence. For a pixel i in B, its value B_i is calculated as
B_i = 1[ max_{j ∈ N_2(i)} KL(P_i || P_j) > ε ],   (1)
where 1 indicates the presence of the PDB and P_i is the C-dimensional vector extracted from the probability map at pixel i (the prediction map has C channels, so each pixel can be converted to a vector of length C). N_2(i) denotes the two-pixel neighborhood of pixel i; specifically, the offsets of the neighbors relative to pixel i are (1, 0) and (0, 1). We employ an adaptive threshold ε to ensure that the number of boundary pixels is less than 1% of the total pixels in the input image, 1% being approximately the proportion of boundary pixels in an image. The GTB map is also determined using equation (1).
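A sketch of this boundary detection is given below: a pixel is marked as boundary when the KL divergence between its class distribution and that of its right or lower neighbor is large, and an adaptive threshold keeps boundary pixels to roughly 1% of the image. This is an illustrative re-implementation under those stated assumptions, not the authors' code.

```python
import torch

def boundary_map(prob, keep_ratio=0.01, eps=1e-8):
    """prob: (C, H, W) class probability map; returns an (H, W) binary boundary map."""
    c, h, w = prob.shape
    log_p = torch.log(prob + eps)
    # KL(P_i || P_j) against the neighbor below (offset (1, 0)) and to the right (offset (0, 1))
    kl_down = (prob[:, :-1, :] * (log_p[:, :-1, :] - log_p[:, 1:, :])).sum(dim=0)
    kl_right = (prob[:, :, :-1] * (log_p[:, :, :-1] - log_p[:, :, 1:])).sum(dim=0)
    score = prob.new_zeros(h, w)
    score[:-1, :] = torch.maximum(score[:-1, :], kl_down)
    score[:, :-1] = torch.maximum(score[:, :-1], kl_right)
    # adaptive threshold: keep roughly keep_ratio of the pixels as boundary
    k = max(1, int(keep_ratio * h * w))
    threshold = torch.topk(score.flatten(), k).values[-1]
    return (score >= threshold).float()
```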

Loss function
The loss function evaluates the difference between the model's predictions and the ground truth, guiding the network training in the right direction; in general, a better loss function yields better model performance. As shown in equation (2), our loss consists of three components: normalized focal loss (NFL) (Sofiiuk et al 2022), binary cross entropy (BCE) and active boundary loss (ABL) (Wang et al 2022),
L = w_a·L_NFL + w_b·L_BCE + w_c·L_ABL,   (2)
where w_a, w_b and w_c are the weights of the NFL, BCE and ABL losses respectively; in this paper they are set to 1, 0.4 and 0.2. NFL computes the loss between the ground truth and the prediction at each pixel, ABL progressively encourages alignment between the predicted boundary and the GTB, and BCE serves as an auxiliary loss that constrains the backbone output.
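A minimal sketch of this weighted combination is shown below; nfl_fn and abl_fn are placeholders for NFL and ABL implementations (for example, the sketches that follow), and the signature is illustrative rather than the paper's actual code.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, boundary_logits, target, boundary_target,
                  nfl_fn, abl_fn, w_nfl=1.0, w_bce=0.4, w_abl=0.2):
    """Weighted sum of NFL, BCE and ABL, mirroring equation (2) with weights 1, 0.4 and 0.2."""
    loss_nfl = nfl_fn(torch.sigmoid(logits), target)
    loss_bce = F.binary_cross_entropy_with_logits(logits, target)   # auxiliary constraint
    loss_abl = abl_fn(boundary_logits, boundary_target)
    return w_nfl * loss_nfl + w_bce * loss_bce + w_abl * loss_abl
```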
The NFL is given in equation (3):
NFL(i, j) = -(1/P)·(1 - p_(i,j))^γ·log p_(i,j), with P = Σ_(i,j) (1 - p_(i,j))^γ,   (3)
where γ is a hyperparameter of the NFL and p_(i,j) is the confidence of point (i, j), defined as
p_(i,j) = ŷ_(i,j) if y_(i,j) = 1, and 1 - ŷ_(i,j) otherwise,   (4)
where ŷ_(i,j) is the predicted probability and y_(i,j) is the corresponding ground truth at point (i, j).
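The short sketch below illustrates equations (3) and (4), assuming the normalization term of Sofiiuk et al (2022): the usual focal term is divided by the sum of focal weights over the image. The default γ = 2 is an assumption for illustration only.

```python
import torch

def normalized_focal_loss(pred_prob, target, gamma=2.0, eps=1e-8):
    """pred_prob, target: (H, W) tensors of foreground probability and binary ground truth."""
    p = torch.where(target > 0.5, pred_prob, 1.0 - pred_prob)  # confidence of the true class, eq. (4)
    weight = (1.0 - p) ** gamma
    norm = weight.sum().clamp_min(eps)                          # normalization term P
    return -(weight / norm * torch.log(p + eps)).sum()
```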
ABL measures the distance between the predicted boundary and the true boundary and is calculated as in equation (5):
L_ABL = (1/N_b)·Σ_(i ∈ PDB) Λ(M_i)·CE_i, with Λ(M_i) = min(M_i, θ)/θ,   (5)
where N_b is the number of pixels on the PDB, θ is a hyperparameter set to 20 in our paper, and M is the distance transform of the GTB (the GTB pixels are set to 0, and every other pixel is assigned the Manhattan distance to its nearest 0 pixel). The closest distance to the GTB is thus used as a weight at pixel i to penalize its deviation from the GTB. If M_i is 0, the pixel already lies on the GTB and is discarded in the ABL. CE_i refers to the cross-entropy term at pixel i.
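The sketch below illustrates one reading of the distance-based weighting: the Manhattan distance transform of the ground-truth boundary gives M, PDB pixels are weighted by min(M_i, θ)/θ with θ = 20, and pixels already on the GTB are dropped. The directional cross-entropy term of ABL itself is omitted for brevity, and the exact weighting form is an assumption based on the cited ABL formulation.

```python
import numpy as np
from scipy.ndimage import distance_transform_cdt

def abl_weights(gt_boundary, pdb_mask, theta=20):
    """gt_boundary, pdb_mask: (H, W) binary arrays; returns per-pixel ABL weights on the PDB."""
    # Manhattan ("taxicab") distance from each pixel to the nearest GTB pixel
    m = distance_transform_cdt(1 - gt_boundary, metric='taxicab')
    weights = np.minimum(m, theta) / theta
    weights[gt_boundary == 1] = 0.0          # M_i = 0: pixel already on the GTB, discard it
    return weights * pdb_mask
```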

Datasets and data preprocessing
Medical data come from a wide range of sources, including different medical institutions, medical devices and medical research fields, which may adopt different methods and techniques for data collection; these differences can lead to different resolutions, noise levels and errors between datasets. Moreover, medical data come in many types, including image data such as MRI and CT, which leads to high heterogeneity and complexity across datasets and may ultimately affect algorithm performance. We therefore combined three medical datasets (the uterine myoma MRI dataset, the Chaos dataset (Kavur et al 2021) and the VerSe dataset (Sekuboyina et al 2021)) to evaluate the performance of the designed interactive segmentation algorithm on medical images. The uterine myoma MRI dataset was constructed by ourselves and approved by the scientific research ethics committee of Beijing Shijitan Hospital, Capital Medical University (code: sjtkyll-lx-2022(1)).
The uterine myoma MRI dataset contains a total of 157 cases and consists of T2WI sagittal sections, as shown in figure 8. Imaging differs substantially between patients, and the movement and extrusion of the organ itself produce targets of different shapes and sizes, as shown in figure 6. The Chaos dataset (Kavur et al 2021) is an abdominal CT and MR imaging dataset. As shown in figure 9, it contains 40 cases and mainly provides MRI images and labeled images of four organs: the spleen, liver, left kidney and right kidney. The VerSe dataset (Sekuboyina et al 2021) consists of multi-slice spiral CT images from different clinical centers together with the corresponding spinal annotations; original images and annotations are shown in figure 10.
Data preprocessing. During model training, to unify the format of the medical datasets, the annotation files and original images were converted to jpg/png, and images containing only background were removed. Finally, each of the three datasets was randomly split in the ratio 8:1:1 into a training set, a validation set and a test set, which were then used for model training and testing, as shown in table 1.
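A small sketch of the 8:1:1 split described above is shown below, assuming one list of image/mask file pairs per dataset; the function name and fixed seed are illustrative assumptions.

```python
import random

def split_dataset(samples, seed=0):
    """Randomly split a list of (image, mask) pairs into 8:1:1 train/val/test subsets."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (samples[:n_train],                    # training set
            samples[n_train:n_train + n_val],     # validation set
            samples[n_train + n_val:])            # test set
```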

Train settings
For data augmentation, we employed random cropping with the crop size scaled by a factor in the range 0.75-1.25, random flipping, random rotations of 90 degrees, and random translation and scaling. Finally, we resized the images to 320 × 480 by padding or random cropping. In the experiments, the three losses, NFL, BCE and ABL, are used to train the network with coefficients of 1, 0.4 and 0.2 respectively. ResNet18 pre-trained on ImageNet (Deng et al 2009) is used as the backbone. We set the batch size to 32, set the initial learning rate to 0.0005, and apply the Adam optimizer with β1 = 0.9 and β2 = 0.999. All models are trained for 240 epochs, and all experiments are implemented with the PyTorch (Paszke et al 2019) framework and run on a single NVIDIA 3090 GPU.
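The optimization setup described above can be sketched as follows; model, train_loader and criterion (the weighted NFL + BCE + ABL combination from the Loss function section) are assumed to be defined elsewhere, and the loader layout is illustrative.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.999))
for epoch in range(240):
    for images, click_maps, masks, boundary_masks in train_loader:   # batch size 32
        optimizer.zero_grad()
        logits, boundary_logits = model(images, click_maps)
        loss = criterion(logits, boundary_logits, masks, boundary_masks)
        loss.backward()
        optimizer.step()
```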

Evaluation metrics
We use Intersection over Union (IoU), the Dice coefficient and NoC@k to evaluate the performance of the algorithms, reporting NoC@80, NoC@85 and NoC@90. NoC@k is the number of clicks required for the IoU between the predicted mask and the ground-truth mask to reach k%; the smaller the value, the better the model. IoU is the ratio of the intersection to the union of the predicted mask and the ground-truth mask. The Dice coefficient measures the similarity between the algorithm's segmentation and the ground-truth annotation and is commonly used as a performance metric for medical image segmentation models. Additionally, we compare the models by computing their IoU and Dice scores after one, three and five clicks. When evaluating these metrics, all models use annotation clicks simulated from the ground-truth masks.
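The sketch below illustrates how IoU, Dice and NoC@k can be computed; the click-simulation and prediction routines are placeholders, and the cap of 20 clicks per image is an assumption for illustration.

```python
import numpy as np

def iou(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

def dice(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2 * inter / total if total > 0 else 1.0

def noc_at_k(predict_fn, simulate_click_fn, image, gt_mask, k=0.80, max_clicks=20):
    """Number of simulated clicks needed for the predicted mask to reach IoU >= k."""
    clicks, pred = [], np.zeros_like(gt_mask)
    for n in range(1, max_clicks + 1):
        clicks.append(simulate_click_fn(pred, gt_mask))   # next click from the error region
        pred = predict_fn(image, clicks)
        if iou(pred, gt_mask) >= k:
            return n
    return max_clicks
```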
Running time analysis. Following the seconds per click (SPC) metric mentioned in Xu et al (2016), we measure the average runtime per click to compare the different algorithms.

Result and discussion
To evaluate the performance of the proposed method, experiments were conducted on three medical datasets, and the metrics NoC@80, NoC@85 and NoC@90 (the smaller the value, the better the model) were used to compare against other click-based interactive segmentation algorithms, including RITM-H18 (Sofiiuk et al 2022), RITM-H32 (Sofiiuk et al 2022), f-BRS-B (Sofiiuk et al 2020) and FocalClick (Chen et al 2022). All experimental results are evaluated on the test set of each dataset, and the results are averaged over the slices under each metric.
As shown in tables 2 and 3, we conducted comparative experiments on the Uterine Myoma MRI dataset with IoU (Dice) as the evaluation criterion. The results show that our approach improves NoC@80 by 0.2 (0.1) compared with the current state-of-the-art method, indicating that our method requires fewer interactions to reach the same level of performance. As shown in table 3, we also examined the impact of different numbers of clicks (1, 3 and 5) on performance: our method performs best, and performance continues to improve as the number of clicks increases. As shown in figure 11, we compared annotations produced by the proposed algorithm with manual annotations and found a high degree of consistency between them, further validating the accuracy and reliability of our method. As shown in figure 12, we also annotated images containing noise. There may be no obvious difference between our method and the baseline when labeling simple images, but when dealing with complex structures or noisy images the proposed method shows clear advantages. This can be attributed to the boundary loss introduced as a soft constraint, which enhances the boundary information of the image and thus improves the modeling of complex structures and noise, leading to more accurate labeling. As shown in tables 4 and 5, we compared the proposed method with current mainstream networks on the Chaos dataset. Compared with RITM-H18 and RITM-H32, using NoC@80 under IoU (Dice) as the criterion, our method achieves improvements of 0.14 (0.19) and 0.3 (0.32) respectively, showing that the proposed method performs better on this dataset.
As shown in tables 6 and 7, comparative experiments on the VerSe dataset show that, compared with the baseline RITM-H18, the proposed method improves NoC@80 under IoU (Dice) by 0.1 (0.08). When we examined the impact of different numbers of clicks (1, 3 and 5) on performance, our model again achieved the best results compared with the other models.
In summary, we conducted experiments on three datasets and observed that our method performs differently on different datasets. This difference can be attributed to the quality of the datasets and to how difficult the different structures are to distinguish. Nevertheless, our method shows consistent improvements over the other methods on all three datasets, which illustrates the broader applicability of our approach.

Ablation study
We conducted ablation experiments on the uterine MRI dataset to verify the effectiveness of each module: the early-late fusion strategy and the Boundary-Head with boundary loss, denoted ELF and B-H respectively for brevity. We designed the four sets of experiments shown in tables 8 and 9, where each component is selectively turned on (✓) or off (×).
In table 8, we perform ablation experiments with HRNet-18 as the backbone network. Using the early-late fusion strategy improves NoC@80 by 0.06, and combining the decoupled head module (boundary head with boundary loss) improves NoC@80 by 0.28, which demonstrates the effectiveness of both the early-late fusion strategy and the Boundary-Head module. Finally, the combination of the two modules achieves 0.59 NoC@80, 0.54 better than the base model. In table 9, we perform the same ablation with HRNet-32 as the backbone. The early-late fusion strategy improves NoC@80 by 0.25, and the decoupled head module with boundary head and boundary loss improves NoC@80 by 0.52. Finally, the combination of the two modules achieves 4.64 NoC@80, 0.61 better than the base model.

Comparison of different weights of loss function
Section 3.2 (Loss function) describes how network training is optimized by combining three loss functions: NFL, BCE and ABL. To explore the influence of these loss functions on network performance, we assigned them different weights in our experiments.
Specifically, we adjusted the weight parameters of the three loss functions and carried out multiple sets of comparison experiments, keeping all other hyperparameters unchanged and adjusting only the loss weights to ensure that the results are reliable and comparable. We used the uterine myoma MRI dataset to verify the model performance under different weight combinations; the results are shown in table 10. For the NFL and BCE loss functions, we set the weights to 1 and 0.4 following the parameters already tuned by Sofiiuk et al (2022). For the ABL loss function, we conducted several sets of comparative experiments with weights of 0.1, 0.2, 0.3, 0.4 and 0.5.
The experimental results show that the model performs best when the weights of the three loss functions are set to 1, 0.4 and 0.2 respectively. Therefore, these weights were used to train the model in all experiments in this paper.

Conclusions
In this work, we combine the uterine myoma MRI dataset with an interactive segmentation network. The click-based interaction method allows artificial intelligence to accelerate segmentation and labeling and reduces the cost of labeling medical data. We propose an early and late fusion strategy to alleviate the premature dilution of the sparse interactive features, and on this basis add a boundary extraction head and a boundary loss so that the network can better extract the boundary details of the image, thereby improving its segmentation ability. Experiments on two other medical datasets (VerSe, Chaos) confirm the effectiveness of our algorithm, and the performance of the proposed interactive segmentation algorithm is greatly improved on all three medical datasets. Finally, when the model is applied to annotate a real dataset, its performance may not meet user requirements, for example because of poor data quality; in such cases the annotation results need to be improved by fine-tuning from a non-algorithmic perspective to meet the actual needs of users.

Figure 1 .
Figure 1. Different means of interaction: (a) doodles, (b) clicks, and (c) bounding boxes.

Figure 2 .
Figure 2. Target boundary visualization. The boundary is generated from the mask. (a) the original image, (b) the image corresponding to the grayscale annotation file, and (c) the extracted boundary of the corresponding target.

Figure 3 .
Figure 3. The network structure of our method. Click encodings and images are used as input; an early-late fusion strategy improves the model's utilization of interactive information, and boundary supervision further improves performance. BN denotes Batch Normalization, ReLU the ReLU activation function, and OCR the OCR network structure (Yuan et al 2020). The Cls, Edge and Aux blocks denote the Segmentation Head, Boundary Prediction Head and additional auxiliary head branches of the network.

Figure 4 .
Figure 4. Click visualization. (a) the mask data; (b) positive and negative sample points overlaid on the original image according to the mask information, where green marks positive sample points and blue marks negative sample points.

Figure 5 .
Figure 5. Image and encoded click fusion block.

Figure 6 .
Figure 6. Classes of different morphologies in the dataset. The red area represents the uterine wall, the green area the uterine cavity, the blue area uterine myoma, and the yellow area the nabothian cyst.

Figure 7 .
Figure 7. Decoupled head. The two branches represent the main branch and the boundary branch respectively; cls_head denotes the Segmentation Head of the network and edge_head denotes the Boundary Prediction Head.

Figure 8 .
Figure 8. T2WI sagittal image of the uterus. The left image is the original image and the right image is the labeled image with mask, in which the red area represents the uterine wall, the green area the uterine cavity, the blue area uterine myoma, and the yellow area the nabothian cyst.

Figure 9 .
Figure 9. Chaos dataset. The left image is the original image and the right image is the labeled image, where red represents the liver, green the right kidney, blue the left kidney, and yellow the spleen.

Figure 10 .
Figure 10. VerSe dataset. The left image is the original image and the right image is the annotated image, with different colors representing different vertebrae.

Figure 11 .
Figure 11. Annotated visualizations using our interactive segmentation method. (A)-(D) show the ground truth overlaid on the original images, and (a)-(d) show the labeling of the uterine wall, uterine cavity, myoma and nabothian cyst using our method. The green dots represent artificial clicks on foreground points made when annotating with the model.

Figure 12 .
Figure 12. Annotated visualizations using our interactive segmentation method. The first row shows the ground-truth labels, the second row shows annotations produced with the baseline (RITM), and the third row shows annotations produced with our method. The green dots represent artificial clicks on foreground points and the red dots artificial clicks on background points made when annotating with the model.

Table 1 .
The sizes of the training, validation and test sets for the three medical datasets.

Table 2 .
Comparison with previous works under the standard protocol (IoU).

Table 3 .
Comparison with previous works under the standard protocol (Dice).

Table 4 .
Results obtained using IOU as the evaluation metric on the Chaos dataset.

Table 5 .
Results obtained using Dice as the evaluation metric on the Chaos dataset.

Table 6 .
Results obtained using IOU as the evaluation metric on the VerSe dataset.

Table 7 .
Results obtained using Dice as the evaluation metric on the VerSe dataset.

Table 8 .
Ablation study of the impact of each component with HRNet-18.

Table 9 .
Ablation study of the impact of each component with HRNet-32.

Table 10 .
Experiments with different weights for the ABL loss function.