Development of a cell counting system based on machine learning

Endothelial cells of the aorta are an excellent model system for studying inflammatory responses, angiogenesis, atherosclerosis, blood clotting, vascular contraction, and vasodilation. Cell counting is necessary when culturing human aortic endothelial cells (HAECs) in order to determine cell concentration and quantity and to assess cell viability and proliferation. Here, a machine learning-based image recognition system was developed to create a cell counter. The system's hardware comprises a digital microscope, a Raspberry Pi, and an operating screen; its processing pipeline combines image capture, image processing, machine learning, and computer recognition, organized into a training stage and a testing stage. In the training stage, five HAEC images were selected as training samples, while the remaining HAEC images were used as testing samples. Positive and negative samples were labelled with LabelImg and used to generate training images for the classifier program, which was trained with the Adaboost and local binary pattern (LBP) models built into OpenCV to create an HAEC classifier. In practical tests the system achieved a recognition rate of 95% for HAECs and 98% for colon cells, demonstrating that this technology can serve as a cell counting tool and can replace expensive and potentially inaccurate commercial cell counting software, making cell counting a more practical technique.


Introduction
In traditional cell culture processes, cell growth rates are typically determined through manual cell counting. Automated cell counters that use nuclear fluorescence staining have been developed to accurately detect various types of cells, including live cells, dead cells, cell clusters, and low-activity cells [1]. However, traditional cell counting software typically uses contour segmentation or watershed segmentation algorithms to identify 2D cells attached to the bottom of culture dishes, and may have difficulty counting cells that overlap or aggregate due to environmental factors [2]. Artificial neural networks have been applied in biology and can recognize, infer, and determine cell counts by learning from cell images [3]. They can quickly and accurately segment images, infer patterns from part of an image, and label cells, so they provide a good model for cell detection and morphological measurement [4]. With machine learning cell counters, accurate cell counts can be obtained, reducing counting errors and yielding high-quality experimental data [5]. This paper aims to construct a system that accurately and efficiently counts cells and measures cell growth rates by using optical microscope images and computer vision technologies to mark the growth status of cells, facilitating drug research; this can improve the speed at which research data are updated and reduce labor costs. To teach the computer to recognize what a cell is, we provide it with correct cell images (positive samples) and non-cell images (negative samples): we mark our target (a cell) as a positive sample, and any unmarked area becomes a negative sample [6]. We rotate the positive samples through several angles to enlarge the positive sample set, and then train a "cell" detector classifier using these positive and negative samples [7]. Finally, we use this classifier to recognize the testing samples, which are images not used in training.

System architecture planning
This study divides the detection of human aortic endothelial cells (HAECs) into two stages, training and testing, as shown in Figure 1. Both the training samples and the testing samples are obtained by the same procedure: the cells are cultured, and several HAEC images are captured on an optical microscope platform. Each HAEC image contains more than one thousand HAECs, as shown in Figure 2 [8]. Some of the HAEC images are selected as training samples, and the remaining images are used as testing samples. Within a training sample, regions containing a cell are selected as positive samples, and regions without a cell are selected as negative samples. LabelImg is used to label the positive (cell) and negative (non-cell) samples, and an XML file is generated to record the coordinates of the selected cells [9]. The XML file is then combined with the image processing program to generate positive and negative images for training. The data augmentation process rotates the originally selected positive sample images to obtain new positive samples, thereby increasing their number. The training program uses the Adaboost and LBP models built into OpenCV to train a "human aortic endothelial cell" classifier, so that after local binarization the pixels belonging to a cell can be marked on the image and separated from the pixels of the culture medium [10]. The classification program then uses the trained classifier to test the testing samples and label the positions of the cells in them.
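The rotation-based augmentation step can be sketched as follows. This is a minimal illustration using NumPy 90-degree rotations; the exact rotation angles used in the original pipeline are not specified beyond "rotation":

```python
import numpy as np

def augment_rotations(patch):
    """Return a positive-sample patch together with its 90-, 180-,
    and 270-degree rotations, quadrupling the positive sample count."""
    return [np.rot90(patch, k) for k in range(4)]

# Example: one small positive sample yields four training patches.
patch = np.arange(9).reshape(3, 3)
augmented = augment_rotations(patch)
print(len(augmented))  # 4
```

Rotations of 90-degree multiples are lossless (no interpolation), which keeps the augmented patches pixel-exact copies of the original texture.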

Sample labelling
Object detection in images requires a large known dataset that includes photos together with each object's location and name. Typically, such data are prepared by manual labelling in the initial stages. In this study, the image annotation tool LabelImg was used to label the samples and create the positive and negative sample datasets. LabelImg is a small tool for labelling an object's location and name in a photo. It is written in Python and uses Qt for its graphical interface. Annotations are saved as XML files in the PASCAL VOC format (used by ImageNet); YOLO and CreateML formats are also supported [11]. The operation screen is shown in Figure 3.
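As an illustration of the annotation format, the sketch below parses a minimal PASCAL VOC XML annotation of the kind LabelImg produces. The file content, the file name, and the "cell" label are hypothetical examples, not taken from the study's actual data:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal PASCAL VOC annotation for one labelled cell.
SAMPLE_XML = """<annotation>
  <filename>haec_01.png</filename>
  <object>
    <name>cell</name>
    <bndbox><xmin>12</xmin><ymin>8</ymin><xmax>30</xmax><ymax>26</ymax></bndbox>
  </object>
</annotation>"""

def read_boxes(xml_text):
    """Extract (label, (xmin, ymin, xmax, ymax)) pairs from a VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), box))
    return boxes

print(read_boxes(SAMPLE_XML))  # [('cell', (12, 8, 30, 26))]
```

In the pipeline described above, boxes recovered this way would be used to crop positive (cell) patches from the source image, while regions outside all boxes supply negative samples.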

Local binary pattern
Because images are affected by lighting intensity and rotation, local binary patterns (LBP) have been proposed to reduce the impact of these factors on the samples. LBP is a method for describing the local texture features of an image, and it has rotational invariance and grayscale invariance: changes in illumination do not affect the processing, the intensity of each cell captured by the camera need not be tightly constrained, and a cell can still be detected even if it rotates or deforms during growth. Figure 4 shows the processing flow of LBP, which we use as a running example below.
Step 1. Convert the sample image to grayscale: LBP operates on grayscale values, so the captured image (Figure 4(a)) is first converted to a grayscale image, as shown in Figure 4(b).
Step 2. Split the sample image into 3x3 segments: split the cell image into cells of size 3x3 pixels, as shown in Figure 4(c). Other sizes can also be chosen in practical applications.
Step 3. Compare with adjacent points: for each 3x3 segment, compare the grayscale value of the center pixel with the grayscale values of its eight adjacent pixels, as shown in Figure 4(d). If the value of an adjacent pixel is greater than the value of the center pixel, that position is marked as 1; otherwise it is marked as 0, as shown in Figure 4(e).
Step 4. Calculate the LBP mask value: after the comparison we obtain the table on the right of the figure. Reading the marks clockwise from the upper left corner gives the binary value 11111011, which is 251 in decimal, as shown in Figure 4(f). The grayscale value of the segment is therefore redefined as 251; this value is called the LBP mask, and it reflects the texture information of the area.
Step 5. Obtain the LBP image: each output pixel is obtained from a 3x3 segment and its LBP mask, so the method above converts the pixels of a grayscale image into an LBP feature map, as shown in Figure 4(g). The map is 1/9 the size of Figure 4(a), which helps to improve processing speed. Because each LBP value is obtained by comparison with the adjacent pixels, a change in the grayscale of the entire image leaves the map unchanged, so the LBP feature map is robust to illumination changes.
Step 6. Calculate the LBP histogram: in applications of the local binary pattern, the LBP feature map itself is generally not used as the feature vector for classification; instead, its statistical histogram is used. The LBP values of each image region are converted into a histogram, with the X-axis representing pixel intensity and the Y-axis representing the number of pixels, as shown in Figure 4(h). This is more convenient for machine learning operations. Each image region has its own histogram, and when these histograms are concatenated, the histogram for the entire image is obtained.
As the steps above show, the LBP mask is determined by comparison with the surrounding points. If the brightness of the entire image changes, that is, if the grayscale values of all pixels increase or decrease by a fixed amount, the result of the LBP calculation is still 11111011, so the output is unaffected. This means the light intensity of each captured cell need not be restricted, and cells that rotate or deform during growth can still be detected. The LBP algorithm therefore resists the impact of illumination changes.
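The LBP steps above can be sketched in a few lines of NumPy. This minimal implementation follows the paper's conventions (a greater-than comparison, clockwise neighbour order starting at the top-left, non-overlapping 3x3 cells); the patch values in the example are hypothetical, chosen so that the code reproduces the 11111011 → 251 computation and the illumination-shift invariance:

```python
import numpy as np

# Clockwise neighbour order starting at the top-left corner.
_NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def lbp_code(patch):
    """LBP mask of a 3x3 patch: neighbour > centre -> bit 1, read clockwise."""
    centre = patch[1, 1]
    bits = "".join("1" if patch[r, c] > centre else "0" for r, c in _NEIGHBOURS)
    return int(bits, 2)

def lbp_map(gray):
    """Non-overlapping 3x3 cells -> feature map with 1/9 the pixel count."""
    h, w = gray.shape
    return np.array([[lbp_code(gray[i:i + 3, j:j + 3])
                      for j in range(0, w - w % 3, 3)]
                     for i in range(0, h - h % 3, 3)])

def lbp_histogram(codes):
    """256-bin statistical histogram used as the classification feature vector."""
    return np.bincount(codes.ravel(), minlength=256)

# Hypothetical 3x3 patch whose mask works out to binary 11111011 = 251.
patch = np.array([[120, 110, 115],
                  [140, 100, 130],
                  [101,  90, 105]])
print(lbp_code(patch))       # 251
print(lbp_code(patch + 10))  # 251: a uniform brightness shift changes nothing
```

The second print demonstrates the grayscale invariance argued above: adding a constant to every pixel leaves all the greater-than comparisons, and hence the code, unchanged.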

Adaptive boosting classifier
Adaptive boosting (Adaboost) was proposed by Yoav Freund and Robert Schapire in 1995. Its adaptivity lies in the fact that samples misclassified by the previous base classifier are given greater weight, and the re-weighted samples are used to train the next base classifier. A new weak classifier is added in each round until the error rate is sufficiently small or a predefined maximum number of iterations is reached. Let X denote the sample space and Y the set of sample class labels; the problem here is binary classification, so Y = {-1, +1}. Let S = {(Xi, yi) | i = 1, 2, ..., m} be the training set, where Xi ∈ X and yi ∈ Y, let Wi be the weight distribution over the training data, and let α be the weight of a weak classifier. The entire Adaboost iteration algorithm is shown in Figure 5 and consists of three steps:
Step 1. Initialize the weight distribution: each training sample is initially assigned the same weight, 1/m.
Step 2. Train the weak classifiers: if a sample point has been accurately classified, its weight is reduced when constructing the next training set; conversely, if a sample point is misclassified, its weight is increased. The re-weighted sample set is then used to train the next classifier, and the training process iterates in this way.
Step 3. Combine the trained weak classifiers into a strong classifier: after each weak classifier is trained, classifiers with smaller classification error rates are given larger weights, so that they play a larger role in the final classification function, while classifiers with larger error rates are given smaller weights. In other words, weak classifiers with low error rates carry more weight in the final classifier, and those with high error rates carry less.
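The three steps can be sketched as a minimal NumPy implementation that uses single-feature decision stumps as the weak classifiers. OpenCV's cascade training is considerably more elaborate; this only illustrates the weight-update and combination logic described above:

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """X: (m, d) feature matrix; y: labels in {-1, +1}."""
    m, d = X.shape
    w = np.full(m, 1.0 / m)                  # Step 1: uniform initial weights
    ensemble = []
    for _ in range(n_rounds):
        # Step 2: choose the decision stump with the lowest weighted error.
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak classifier
        # Misclassified samples get larger weights for the next round.
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    """Step 3: sign of the weighted vote of the weak classifiers."""
    score = np.zeros(len(X))
    for alpha, j, thr, pol in ensemble:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)

# Toy usage: a 1-D separable problem.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
print(predict(train_adaboost(X, y, n_rounds=3), X))  # [-1 -1  1  1]
```

Note how α grows as the weighted error shrinks, which is exactly the "low error rate, larger weight in the final classifier" behaviour stated above.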

Recognition ability
For detection we mainly use the cascade classifier's detectMultiScale() function, which has four main parameters: scaleFactor, minNeighbors, minSize, and maxSize. minSize and maxSize define the range of acceptable cell sizes; we set them to (5, 5) and (30, 30) respectively. scaleFactor is the scaling step used when searching the image, which allows cells of different sizes to be detected; we set it to 1.06. minNeighbors is the number of times a region must be detected before it is confirmed as a target object; we set it to 30.
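A sketch of the detection call with the parameter values above; the parameter names follow OpenCV's `CascadeClassifier.detectMultiScale` API, and `cell_cascade.xml` is a hypothetical file name for the trained classifier:

```python
def detect_cells(gray_image, cascade):
    """Run the trained cascade with the parameter values used in this study."""
    return cascade.detectMultiScale(
        gray_image,
        scaleFactor=1.06,    # search-window scaling step between pyramid levels
        minNeighbors=30,     # detections required before a cell is confirmed
        minSize=(5, 5),      # smallest accepted cell window
        maxSize=(30, 30),    # largest accepted cell window
    )

# Typical usage (requires OpenCV and a trained cascade file):
#   import cv2
#   cascade = cv2.CascadeClassifier("cell_cascade.xml")
#   gray = cv2.cvtColor(cv2.imread("haec.png"), cv2.COLOR_BGR2GRAY)
#   boxes = detect_cells(gray, cascade)  # list of (x, y, w, h) rectangles
```

A high minNeighbors such as 30 suppresses spurious single detections, at the cost of occasionally rejecting weakly detected cells.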
Figure 6 shows the results of testing the first image. A total of 702 cells were counted manually in the image, and 665 cells were recognized by the computer. The recognition rate analysis is shown in Table 1. A true positive means the region selected by the computer contains a correct cell; a false positive means the selected region is not a cell, or that multiple cells fall within the same window; a false negative means a correct cell was not selected.

Table 1. Recognition rate analysis.
True positive rate: 94.7%
False positive rate (including windows that select multiple cells): 2.5%
False negative rate (correct cells not selected): 2.8%

Figure 7 shows the result of testing human colon cells [12]: a total of 77 cells (excluding incomplete cells) were counted manually in the image, and 76 cells were recognized by the computer. The recognition rate analysis is shown in Table 2, which reports a false negative rate (correct cells not selected) of 1.4%.

Some other methods use erosion, distance transformation, binarization, and related techniques to compute the area of the target object and to distinguish and mark its position. However, these methods have difficulty identifying cells in dense populations. The present method uses a classifier to identify target cells and achieves satisfactory results for both dense and sparse cells, confirming its recognition ability.

Physical structure
The physical structure is shown in Figure 8. We set up the system on a Raspberry Pi connected to a USB microscope. The captured images are processed by the program, and the number of cells and their labelling status are displayed for researchers to record and observe. The system takes about 30 seconds to complete the recognition of one image. For more convenient operation, each region recognized as a cell is marked with a red frame, and the number of red frames is displayed so the user can confirm the count. The operator can capture the current image and customize the path and file name for saving it, which is convenient in experiments and teaching.
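The red-frame marking can be sketched as follows. This minimal NumPy version draws one-pixel frames directly into an RGB array and returns the count shown to the user; the actual system presumably uses OpenCV drawing calls such as `cv2.rectangle`:

```python
import numpy as np

def mark_cells(image, boxes, colour=(255, 0, 0)):
    """Draw a red frame around each detected cell and return the marked
    image plus the cell count.  image: H x W x 3 RGB array;
    boxes: iterable of (x, y, w, h) detection rectangles."""
    out = image.copy()
    for x, y, w, h in boxes:
        out[y,         x:x + w]     = colour  # top edge
        out[y + h - 1, x:x + w]     = colour  # bottom edge
        out[y:y + h,   x]           = colour  # left edge
        out[y:y + h,   x + w - 1]   = colour  # right edge
    return out, len(boxes)
```

In the running system, the boxes would come from the cascade detector, and the returned count is what the operator confirms on the screen.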

Conclusion
This paper discusses the application of machine learning technology to cell counting. The technical architecture integrates Adaboost and LBP in an image recognition system, techniques that have empirically worked well for image recognition. The study performed cell counts on both sparse cells, using human colon cells as the example, and dense cells, using HAECs as the example. In the analysis results, the recognition rate exceeds 94%, and the recognition rate for sparse cells is higher because their shapes are easier to recognize. Cells located at the edge of the image may be missed because their shapes are incomplete, but even when cells are missed, users can easily identify them with the naked eye, reducing the burden on the user. The technology was realized on a physical machine, which demonstrates the practicality of the method. The system can be applied in both real-time and offline analysis: in offline analysis, the microscope first takes a picture, which is then opened and recognized by the program; in real-time analysis, the machine reads the microscope images, recognizes them, and displays the results as they arrive. Both approaches will have practical significance in biotechnology.

Figure 3. LabelImg operation screen.

Table 2. Recognition rate analysis.