AI-Enabled Ensemble Deep Learning Method for Automated Sensing and Quantification of DNA Damage in Comet Assay

The comet assay is a widely used technique to assess and quantify DNA damage in individual cells. Recently, researchers have applied various deep learning techniques to automate the analysis of comet assays. Image analysis using deep learning allows combining multiple image parameters and performing computation at the pixel level to provide quantifiable information about the comets. Current deep learning analysis algorithms use a single neural network as the standard method, which requires many comet images and is prone to high variance in predictions. Here, we propose a new ensemble model consisting of a collection of deep learning networks with different configurations and different initial random weights, trained on the same dataset, to calculate one weighted prediction for DNA damage quantification. To develop this model, we curated a trainable comet assay image dataset consisting of 1309 images with 9204 extracted features, such as cell head and tail length and area. With the proposed method we achieved significantly higher accuracy (R2 = 89.3%), compared to 74% with the standard single neural network, as reported by M. D. Zeiler and R. Fergus (European Conference on Computer Vision, pp. 818–833, 2014). Furthermore, deep regression with the proposed architecture produced much more reliable and accurate results than the conventional method.

Although there are several different ways to assess DNA damage, the comet assay has proven to be the most quantitative and economical. Comet assays use electrophoresis, a technique in which DNA moves in the presence of an electric field, to measure the extent of DNA damage within cells. In cells with only minor DNA damage, the movement of the DNA strands is minimal. However, in cells with large amounts of DNA damage, DNA moves out of the center of the cell and creates a tail, resembling a comet shape.1 In comet assay images, the size and intensity of the tail and head indicate the amount of DNA material in them, and the extent of DNA damage is assessed by the ratio of DNA in the tail to DNA in the head.2 Several image analysis tools, such as OpenComet, HiComet, CometQ and Comet Score, are available for comet assay analysis.3-8 All these tools have predefined methods of segmenting and assessing the comet scores; however, they all rely on manually annotated image features with generic machine learning techniques such as support vector machines. Manually annotating and tuning image features are laborious and time-intensive tasks that limit the efficiency of these methods when the images are noisy and contain multiple aspect ratios. In contrast, neural networks are capable of ingesting raw images as input and automatically learning to detect and score comets in an end-to-end process. Neural networks are becoming increasingly capable and efficient at image processing and can often outperform manually created features, while being easily trained through transfer learning with large, publicly available image datasets. However, despite the classification accuracy provided by neural networks, there is still significant room for improvement in their prediction applications.
For example, one study9 used this approach to quantify DNA damage in comet assays with a single network (VGG-19) and a small dataset of around 200 comets for training and testing, resulting in a simple and efficient architecture. This encouraged us to create a bigger network with a larger dataset, which resulted in a more accurate (from an 81% R-squared score at a 0.001 learning rate and 200 epochs to an 89.34% R-squared score) and more robust architecture with lower loss. Another study10 simplified the analysis by creating 4 categories based on the level of DNA damage (healthy, poorly defective, defective and very defective), rather than calculating the exact amount of damage in 120 test samples. Binning cells in this way makes it possible to create a very accurate classification model but fails to achieve a precise quantification score. A third study11 addressed issues involving limited datasets in DNA damage quantification by creating a dataset consisting of 1307 images with 8271 manually annotated comets, and used the Mask R-CNN network for training and testing. Mask R-CNN is a commonly used deep learning network that produces a set of bounding boxes from an input image.12 Mask R-CNN is a modification of R-CNN (Region-based Convolutional Neural Network), primarily used for object detection, in which an image is partitioned into multiple segments that are then used for recognition and detection.13 With a significantly larger dataset and the Mask R-CNN model, the authors were able to set up a pipeline for comet image detection, segmentation and quantification with a mean average precision (mAP) of 0.61. Their model was primarily created to show how deep learning could automate the segmentation of comets in an image and quantify DNA damage using a single model, paving the way for other researchers to create performance models for specific tasks that would achieve more accurate quantification of DNA damage.
More recent papers include a web-based tool14 that used a Faster R-CNN model with transfer learning on a Buccal Mucosa dataset and data augmentation (expanding 365 comets to 1095 comets for training). Its classification accuracy was only 66.67%, as the authors focused on demonstrating how transfer learning could be combined with the Faster R-CNN model. Another implementation15 used the same Buccal Mucosa dataset for a classification task, combining a CNN with an Extreme Learning Machine to create a 5-class classification model with a 96.96% accuracy score; their model showed that a classification approach can build a very strong model for DNA damage segregation.
Transfer learning16 is an optimized way of repurposing a model trained on an original dataset and task for a new dataset, which allows rapid progress and improved performance when modeling the target task. Another advantage of transfer learning is that an efficient machine learning model can be built with relatively little training data, because the model is already pre-trained. During the transfer learning process, we first train a base network on a commonly available dataset and task, and then exploit the knowledge gained from the previous task to improve generalization, either retraining the learned features or transferring them to a primary target network to be trained on a custom dataset and task. This procedure tends to work if the learned features are general, meaning they are suitable for both the initial and target tasks, instead of specific to the base task, e.g., calculating the pixel intensity of the cell. In the case of deep learning, this is achieved by freezing the initial network layers after training on the first task and then fine-tuning the rest of the layers to learn the target task. Popular deep learning architectures like VGG and GoogLeNet are trained on publicly available datasets like COCO or ImageNet,17 and the trained weights are saved so that they can be used for transfer learning.
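The freeze-and-fine-tune idea can be illustrated with a minimal numerical sketch (plain NumPy rather than a deep learning framework; all sizes and names are illustrative assumptions, not the networks used in this work): a pre-trained first layer acts as a frozen feature extractor, and only the final layer is refit on the small target dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend W1 was learned on a large source dataset: it is now frozen.
W1 = rng.normal(size=(10, 6))

def features(X):
    # Frozen feature extractor: the pre-trained first layer with ReLU.
    return np.maximum(X @ W1, 0.0)

# Small target dataset whose labels depend on the same frozen features.
X_tgt = rng.normal(size=(40, 10))
w_true = rng.normal(size=6)
y_tgt = features(X_tgt) @ w_true

# Fine-tune ONLY the head (last layer) by least squares; W1 never changes.
H = features(X_tgt)
w_head, *_ = np.linalg.lstsq(H, y_tgt, rcond=None)
pred = H @ w_head  # matches y_tgt almost exactly on this toy task
```

Because the frozen features are suitable for the target task, fitting only the small head recovers the target mapping from very little data, which is the practical benefit of transfer learning described above.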
Ensemble modeling18 is a technique in which multiple diverse models are grouped together to predict an outcome, using either multiple modeling algorithms or multiple different datasets. The ensemble model averages the predictions from each model to generate one final prediction. This is particularly helpful for smoothing out the errors of individual models: since each model is susceptible to different kinds of errors, the prediction error decreases with the ensemble approach. Model aggregation can be further improved by weighing the contribution of each model before the final prediction, which is achieved by training a new model to learn from the mistakes of the individual sub-models. This approach is called stacked ensemble generalization. Although the ensemble model has multiple base models at the initial stage, it acts and performs as a single model. Here, we made several attempts to improve existing comet assay analysis models and data pipelines to optimize DNA damage quantification. We used transfer learning based on multiple networks pre-trained on ImageNet data, which enabled us to train the initial models on large object detection and classification datasets and then apply the learned weights to our comet assay dataset. Furthermore, we performed stacked ensemble generalization to create a more robust analysis tool that achieves higher accuracy than any of the initial base models. This approach not only makes better predictions but also greatly reduces the dispersion of the predictions.
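As a hedged illustration of stacked generalization (not the exact models or data used in this work), scikit-learn's StackingRegressor trains a meta-learner on the out-of-fold predictions of diverse base regressors, i.e., it learns how to weight and correct each sub-model:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for comet features/damage scores.
X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Diverse base models, each prone to different kinds of errors.
base = [("ridge", Ridge()),
        ("knn", KNeighborsRegressor()),
        ("tree", DecisionTreeRegressor(random_state=0))]

# The meta-learner (final_estimator) is fit on cross-validated predictions
# of the base models, producing one weighted final prediction.
stack = StackingRegressor(estimators=base, final_estimator=Ridge())
stack.fit(X_tr, y_tr)
r2 = r2_score(y_te, stack.predict(X_te))
```

The same principle, with deep networks as the base models and XGBRegressor as the meta-learner, underlies the architecture developed here.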

Materials and Methods
Image acquisition.-The comet dataset (Fig. 1) for the ensemble model was created using the Trevigen comet assay kit (4250-050-ESK, R&D Systems). The dataset was created in two batches: one batch of neurons differentiated from human pluripotent stem cells was processed using the alkaline comet assay protocol, and a second batch was processed using the neutral comet assay protocol. The slides were imaged by fluorescence microscopy on a Nikon Ti-E inverted microscope with a Plan Apo 10x/0.45 DIC slider objective. A total of 1309 images with one cell each were created from these experiments. Of these, 1047 images (80%) were used for training the model, 131 images (10%) for validation and 131 images (10%) for testing of the model.
Annotation of comets.-The ground truth was generated by semi-automated analysis. In short, comets were re-oriented so that all tails stretched downward from the heads. Then, binary thresholding was performed to separate meaningful objects from the background. Noise and comets that were out of bounds were removed, while the rest of the detected comet objects were cropped into 273 × 143-pixel images (with 0.6465 μm per pixel). Any crop containing only tails, overlapping comets, or artifacts was filtered out using criteria such as size, circularity, and intensity. During segmentation, comet heads and tails were identified using differential thresholding. To quantify DNA damage, various comet head and tail features were then calculated, including area, length and DNA content (Table I). All values were saved into a CSV file for further quantitative analysis.
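The threshold-and-filter step can be sketched as follows (a simplified NumPy/SciPy illustration on synthetic data, not the actual annotation code; the threshold and minimum-area values are made-up):

```python
import numpy as np
from scipy import ndimage

def extract_objects(img, thresh, min_area=5):
    """Binary-threshold a grayscale image and return bounding-box crops of
    connected foreground objects with at least min_area pixels."""
    mask = img > thresh                     # separate objects from background
    labels, n = ndimage.label(mask)         # connected-component labelling
    crops = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        if (labels[sl] == i).sum() >= min_area:  # size filter drops noise specks
            crops.append(img[sl])
    return crops

# Synthetic frame: two bright blobs on a dark background plus one noise pixel.
frame = np.zeros((40, 40))
frame[5:12, 5:12] = 200     # mock comet 1
frame[25:33, 20:30] = 180   # mock comet 2
frame[0, 39] = 255          # single-pixel noise, removed by the size filter
objs = extract_objects(frame, thresh=100)
print(len(objs))  # 2
```

In the real pipeline, additional criteria such as circularity and intensity would be checked per crop before the head/tail features are measured.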
Data preprocessing.-The dataset was preprocessed to ensure that clean images were provided as the model's input. We verified image size and converted images to grayscale. Grayscale images are more computationally efficient because the neural networks only have to process two-dimensional data instead of three-dimensional. All the images were one uniform size (215 × 417 pixels) for training and testing purposes, and resizing was done, if necessary, without losing any pixel information. Since neural networks usually require a significantly larger amount of data, data augmentation techniques were applied to increase the size of the training set. This involved position augmentation consisting of image rotation, zoom-in, zoom-out, flipping and translation operations, such as moving the main subject around the white space. These operations were applied to the images without changing the overall aspect ratio or altering the content of the images, keeping the ratio of head size to tail size consistent.
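A minimal sketch of such position augmentation (NumPy only; the zoom operations are omitted and the translation range is an illustrative assumption) shows how each transform preserves the image shape and total content, and hence the head-to-tail ratio:

```python
import numpy as np

def augment(img, rng):
    """Position augmentation: flips, 180-degree rotation, and a small
    translation that moves the subject around the surrounding space."""
    out = [img,
           np.fliplr(img),       # horizontal flip
           np.flipud(img),       # vertical flip
           np.rot90(img, 2)]     # 180-degree rotation keeps the aspect ratio
    # Random translation by a few pixels in each direction.
    dy, dx = rng.integers(-3, 4, size=2)
    out.append(np.roll(img, (dy, dx), axis=(0, 1)))
    return out

rng = np.random.default_rng(1)
img = np.zeros((8, 6))
img[2:5, 1:4] = 1.0  # mock comet
aug = augment(img, rng)
# Every augmented image keeps the same shape and the same total pixel mass.
assert all(a.shape == img.shape for a in aug)
assert all(np.isclose(a.sum(), img.sum()) for a in aug)
```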
DNA damage quantification performance.-The output of the ensemble architecture was a predicted number indicating the percentage of damage. The terms involved in this supervised regression model, including the three main metrics, are described in Table II. The ensemble architecture (Fig. 2) comprised a custom-made model stacked together with 3 pre-trained models. Once the images were preprocessed, we applied a bagging procedure, a technique that creates multiple versions of a dataset to generate an aggregated predictor, ensuring the model is trained with reduced bias and variance. The bagged models are effective because each submodel is fit on a slightly different training dataset, which in turn allows each submodel to learn differently and make slightly different predictions. The bagging algorithm was applied to all the models separately.
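The bagging step, i.e., giving each submodel its own bootstrap resample of the training set, can be sketched as follows (a hedged NumPy illustration; the function name and bag count are assumptions):

```python
import numpy as np

def bootstrap_datasets(X, y, n_bags, rng):
    """Bagging: each sub-model receives a bootstrap resample (sampling with
    replacement) of the training set, so each learns slightly differently."""
    n = len(X)
    bags = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)  # n indices drawn with replacement
        bags.append((X[idx], y[idx]))
    return bags

rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(10, 2)
y = X.sum(axis=1)
bags = bootstrap_datasets(X, y, n_bags=4, rng=rng)
# Each bag has the original size but a different mix of rows.
assert len(bags) == 4 and all(len(bx) == 10 for bx, _ in bags)
```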
The models used for this architecture were the custom model, VGG-19 and Xception.
Custom model.-With convolutional networks becoming increasingly optimized and better at classification in the computer vision field, several attempts were made to create a new model to improve accuracy on the curated dataset. We developed a significantly more accurate ConvNet architecture by creating layers specifically optimized and tuned for our use case, which not only achieved outstanding performance on our dataset but also produced comparable performance on publicly available datasets.
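A hedged Keras sketch of such a use-case-tuned ConvNet follows the layer pattern described in this work: 3 × 3 convolutions with ReLU, each followed by 2 × 2 max pooling, then a 128-unit dense layer and a single regression output. The number of blocks and the filter counts per block are illustrative assumptions, not values from this work:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_custom_convnet(input_shape=(278, 143, 3)):
    """Sketch of the described ConvNet: 3x3 conv + ReLU blocks, each followed
    by 2x2 max pooling, then Dense(128) and a single regression output.
    Filter counts per block are assumptions."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1),  # predicted percentage of DNA damage
    ])
    model.compile(optimizer=keras.optimizers.Adam(1e-4), loss="mse")
    return model
```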
During training (Fig. 3), the input images were resized from 215 × 417 pixels to a (278, 143) RGB image to decrease the starting number of parameters. The image was passed through a series of convolutional layers, which used filters with a very small receptive field: 3 × 3, with ReLU as the activation function. Each convolutional layer was followed by a max pooling layer with a 2 × 2 pixel window. The stack of convolutional layers was followed by two dense layers, the first a 128-dimensional hidden layer and the second a single-unit layer providing the final prediction score.19,20

VGG-19.-VGG-19 is a trained convolutional neural network created by the Visual Geometry Group at the University of Oxford.21 The notation 19 stands for the nineteen layers with trainable weights: sixteen convolutional layers and three fully connected layers. The model was originally trained for the ImageNet challenge as a 1000-class classification model. As shown in Fig. 4, the network takes a (224, 224, 3) RGB image as input, to which conv3-64, that is, 64 (3,3) square filters, is applied. To make the network compatible with our use case, the images were resized from (215, 417, 3) to (224, 224, 3) pixels. All the convolutional layers use (3,3) filters. There are 5 sets of convolutional layers: the first set of 2 layers has 64 filters, the next 2 convolutional layers have 128 filters, the next 4 convolutional layers have 256 filters, and the last 2 sets have 4 convolutional layers each, with 512 filters per layer. There are max pooling layers between the sets of convolutional layers, with 2 × 2 filters and a stride of 2 pixels. The output of the last pooling layer is flattened and fed to a fully connected layer with 4096 neurons. Its output goes to another fully connected layer with 4096 neurons, whose output is fed into a final fully connected layer with 1000 neurons. All these layers are ReLU activated.
Finally, there is a softmax layer, which uses cross-entropy loss. For our use case, all the layers except the softmax layer were frozen, and two additional fully connected dense layers with 256 and 128 neurons, respectively, were added, along with a dense layer for the final prediction.
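This adaptation, a frozen VGG-19 backbone with a new Dense(256)/Dense(128)/Dense(1) regression head, can be sketched in Keras as follows (weights=None here so the snippet does not download ImageNet weights; the actual model uses the pre-trained ImageNet weights):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_vgg19_regressor(input_shape=(224, 224, 3)):
    """VGG-19 backbone with the classifier replaced for regression:
    backbone frozen, then Dense(256) -> Dense(128) -> Dense(1)."""
    base = keras.applications.VGG19(include_top=False, weights=None,
                                    input_shape=input_shape)
    base.trainable = False  # freeze the pre-trained convolutional layers
    model = keras.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(1),  # DNA damage prediction
    ])
    return model
```

The Xception branch of the ensemble follows the same pattern with a (299, 299, 3) input and a 256/128/1 head.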
Xception model.-The Xception model (Fig. 5) is an extension of the Inception model, which started as a module of GoogLeNet. It replaces the standard Inception modules with depth-wise separable convolutions. It also has one of the smallest serialized weight files, which makes it considerably faster. As with the VGG model above, the entire model trained on the ImageNet dataset was frozen, the images were resized from (215, 417, 3) to (299, 299, 3) pixels, and three additional fully connected dense layers with 256, 128 and 1 neurons were added for the final prediction.22

Results and Evaluation
Training and testing parameters.-The VGG-19 and Xception models were both pre-trained on the ImageNet dataset and further trained on our dataset through transfer learning. Bayesian optimization23 was used to find the best possible parameters for the algorithms; Bayesian optimization relies on a converging iterative process in which the choice of hyperparameters depends on the previous attempts. Based on the optimization, the activation function used throughout the layers was ReLU, with MSE as the loss function. We also observed that the Adam optimization algorithm24 worked best as the optimizer, with a learning rate of 0.0001. Because this was a regression model, we used the R-squared score as the performance metric, rather than accuracy. For the ensemble model, we used XGBRegressor with a learning rate of 0.01, 500 estimators, a maximum depth of 3 and squared error as the loss function. In the process of predicting the extent of DNA damage, we obtained feature correlation results and incremental accuracy based on the R-squared score.
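For reference, the R-squared score used throughout is the coefficient of determination; a minimal NumPy implementation:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)              # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)       # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([10.0, 20.0, 30.0, 40.0])
print(r_squared(y, y))                     # 1.0 for a perfect fit
print(r_squared(y, np.full(4, y.mean())))  # 0.0 for always predicting the mean
```

Unlike classification accuracy, this score rewards predictions that are numerically close to the true percentage of damage, which is why it suits the regression setting here.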
Feature importance.-One of the challenges in fine-tuning the model is identifying the features that contribute the most to the result. This process, feature selection, is done using data and feature correlation. Data correlation helps in understanding the relationships and associations between different variables. As shown in Table III, comet area is highly correlated with comet DNA content (0.92), tail area (0.99) and tail DNA content (0.97), whereas body length and tail length are highly correlated (0.99), and so on. Feature selection led to the removal of perfectly correlated features and reduced the dimension of the dataset while preserving all the necessary information about the comet.
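This kind of correlation-based pruning can be sketched with pandas (the synthetic feature values and the 0.95 cut-off are illustrative assumptions, not the exact procedure or thresholds used here):

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.95):
    """Drop one feature from every pair whose absolute Pearson correlation
    exceeds the threshold, keeping the first occurrence."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(0)
area = rng.uniform(50, 500, size=100)
df = pd.DataFrame({
    "comet_area": area,
    "tail_area": area * 0.8 + rng.normal(0, 1, 100),  # near-perfect correlate
    "head_intensity": rng.uniform(0, 255, 100),       # independent feature
})
pruned = drop_correlated(df)
print(list(pruned.columns))  # ['comet_area', 'head_intensity']
```

Dropping tail_area here mirrors the removal of perfectly correlated comet features described above: the retained columns carry the same information in fewer dimensions.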
Ensemble performance.-After training and testing, we compared the performance of the individual models to a range of combinations of several models. As seen in Fig. 6, our deep ConvNet model outperformed high-performing models such as VGG-19 and Xception with respect to DNA damage quantification. Table IV shows that our model outperforms all the existing predictive quantification architectures. The combination of multiple models improved performance due to increased learning capability and a reduced frequency of errors. As a result, stacking the models together improved the R-squared accuracy score by 3.5%, 4.6% and 7.65% compared to the deep ConvNet, VGG-19 and Xception models individually.

Conclusions
Well-organized and annotated datasets play a critical role in developing robust and efficient deep learning frameworks. As there were no publicly available datasets to train the models, we constructed a fully annotated dataset containing the percentage of DNA damage for 1309 comet assay images, with 9204 extracted features, to test and train quantification models.
Here, we have developed an ensemble architecture which comprises a combination of the large classification models VGG-19 and Xception and a custom ConvNet model. VGG-19 and Xception are among the most popular architectures developed specifically for object localization and recognition, and they predict the extent of DNA damage exceptionally well in comet assay images. The custom ConvNet model was built after several tests and trials and converged efficiently for our dataset. Combining all three models into a stronger ensemble model achieved a high accuracy of comet assay quantification (>89% R-squared score). We also demonstrate that an ensemble architecture creates an opportunity to capture the different strengths of various models to build one stronger model. This can facilitate future research on finding other neural network models that generalize and quantify images in different ways, and on their effect when added to a similar ensemble network. In the future, we aim to improve this architecture by encompassing better models and implementing an image augmentation process to create a much larger dataset, to develop more accurate AI/ML algorithms for image analysis.

Model                                                             Accuracy
OpenComet7                                                        11.5%
Faster R-CNN13                                                    61%
Faster R-CNN with transfer learning (5-class classification)14    66.67%
VGG-199                                                           81%
CNN with Extreme Learning Machine (5-class classification)15      96.96%
Ensemble (custom ConvNet, VGG-19, Xception)                       89.34%

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.