Development of a segmentation algorithm for determining planktonic objects from microscopic images

Plankton are free-floating organisms that live, grow, and move along with the ocean currents. These free-floating organisms play important roles as primary producers, as a link in energy transfer, and as regulators of biogeochemical cycles. Indonesia, with almost 60% of its territory covered by the ocean, harbours a wide variety of planktonic species. However, one of the issues in typical planktonic studies is the lack of a fast and accurate method for identifying and classifying plankton types. Thus, computer vision methods on microscopic images are proposed to address the problem. The classification follows two main steps: detecting the plankton location, followed by plankton differentiation. A segmentation algorithm is required to limit the determination area. The present study describes segmentation methods for fifteen plankton types. A U-Net based architecture was implemented to separate the plankton texture from other objects. The segmentation result was also compared with manual assessment to compute the performance parameters. The accuracy, 0.970 ± 0.025, gives the highest value, whereas the smallest value is found for the precision parameter, 0.761 ± 0.156.


Introduction
Plankton are free-floating aquatic organisms whose movement depends on water circulation. They function as the vital foundation of life in the ecosystem, for example as drivers of primary productivity, links in the trophic pathways of food webs, and regulators of the biogeochemical cycles in the water column [1,2]. Plankton communities also play a great role in regulating the climate by controlling the global carbon cycle, contributing to cloud formation through the release of dimethylsulfoniopropionate into the atmosphere, and even modifying the ocean's reflection of solar rays [3]. In this Anthropocene era, climate change has caused a shift in global plankton distribution, changes in community structures, and a reduction in diversity, which could cause the loss of diversity and abundance of higher trophic organisms that rely on plankton as their food source [3]. Furthermore, blooms of phytoplankton cells (harmful algal blooms) or jellyfish blooms due to climate change have been regarded as one among many plagues or calamities of the seas, known to cause devastating damage to marine ecosystems and human life [4]. In general, plankton can be classified into two major groups: (i) phytoplankton, which consists of unicellular microalgae, and (ii) zooplankton, which consists of small animals that live their whole life cycle as planktonic organisms (holoplankton) and larval stages of marine animals (meroplankton) [5]. Planktonic organisms can also be classified by size into six different groups: megaplankton (>20 cm), macroplankton (2-20 cm), mesoplankton (0.2-20 mm), microplankton (20-200 µm), nanoplankton (2-20 µm), and picoplankton (0.2-2 µm) [1]. This study focused on marine phytoplankton in Indonesian waters that belong to the microplankton size group.
The phytoplankton group consists of a broad range of unicellular algae species, with an estimated number of species between 30,000 and 1 million worldwide [6]. Among these, the diatoms (Phylum: Bacillariophyta) and dinoflagellates (Phylum: Miozoa) are two major groups of phytoplankton that play an important role in primary production and biogeochemical cycles in marine ecosystems [5]. Of the estimated more than 100,000 diatom species in the world, about 8,000 have been described and named, while about 2,200 species of dinoflagellates have been discovered and described [6,7]. As a note, AlgaeBase, one of the largest online taxonomic libraries for microalgae, contains descriptions of 32,260 species from 54 classes and 15 phyla, most of them marine diatoms [6].
As a tropical country, Indonesia is known as one of the world's marine biodiversity hotspots, which should include many unknown and undocumented phytoplankton species. However, there are no official records estimating the number of phytoplankton species in Indonesia. Even so, studies in some Indonesian coastal and oceanic ecosystems, such as Kepulauan Seribu (Jakarta), Lembeh Strait (North Sulawesi), and Makassar Strait, did report between 150 and over 400 species of phytoplankton in the ecosystem [8][9][10]. In all those studies, phytoplankton identification was done manually by humans under various microscopy techniques. However, manual identification is a time-consuming task with several shortcomings, including human memory limits, psychological condition, physical condition, and taxonomic skill or experience, all of which can lead to identification error or bias [11]. In addition, high morphological variation within several genera or species can also increase the error rate in manual plankton identification [11]. Thus, more efficient, fast, and reliable methods to identify phytoplankton are needed, especially because environmental pressures from global change have caused a gradual decline in marine phytoplankton communities all over the world [12]. At the same time, the combination of global change and increasing anthropogenic impacts leads to more frequent cases of harmful algal blooms in many coastal areas of the world [13], including Indonesia. The development of a rapid and reliable way to identify phytoplankton species has therefore become a necessity to counter the issues caused by either the blooms or the species shifts in phytoplankton communities.
The objective of this research is to address the problems and limitations of manual phytoplankton identification. A computer vision technique is proposed to segment the phytoplankton region. The segmentation algorithm automatically determines phytoplankton objects in microscopic images. This paper is organised as follows. Section 2 presents previous work related to the paper. Section 3 discusses the methodologies, including the data preparation, training, and testing of the proposed deep learning method. Section 4 explains the testing results of the proposed method on the dataset of plankton images. The conclusion of the research is given in Section 5.

Previous Work
Several methods have been implemented to segment the plankton region from acquired images. Segmentation is an initial step before applying a classifier model. The classification system should define exactly the location and border region of the plankton; it can only provide an accurate result if the input image shows a single clear plankton object. Binary segmentation was used to prepare a plankton dataset [14]. The plankton region was segmented by applying Otsu thresholding [15]. The proposed method determines the main axis of the plankton object using principal component analysis. The object is then rotated so that this axis lies in a vertical position, after which the algorithm defines and crops the plankton's region of interest (ROI). The cropped plankton images were collected for developing a plankton classifier system [12]. Image segmentation software was used to separate the plankton objects [16]. The software applied an unsupervised classification approach, namely k-harmonic means clustering. This method can successfully localise the plankton's ROI in the original image. The segmented object was forwarded to a species classifier based on a convolutional neural network architecture. Plankton texture properties were extracted using four techniques: intensity gradient, pixel orientation, local binary pattern, and local ternary pattern [17]. These techniques were combined with a convolutional neural network to improve the pattern quality, which contributed to increasing the classifier performance. A real-time and adaptive method was developed for determining the size spectrum of the plankton [18]. The plankton data were collected using a high-speed, high-resolution camera. The algorithm selects and extracts the sharpest frame from the recorded video.
The plankton's ROI was defined by applying edge detection and morphological methods. The segmented plankton was marked to determine its bounding rectangle coordinates, and the plankton cropping area was created according to these segmentation coordinates.
A dynamic downscaling model was proposed to find the plankton's ROI in images with varied content [19]. The proposed method aims to separate the plankton object from highly complex backgrounds. A sharpness descriptor was used as an indicator to localise the plankton areas. The descriptor was computed based on the distribution of ROI gradients and pixel grayscale values. A deep learning model was offered to detect plankton species [20]. The model was developed based on a combination of YOLOv3 and DenseNet. Plankton images were also generated by applying the CycleGAN model; this image creation aims to balance the data availability. Digital inline holography images were used to distinguish plankton into ten classes [21]. The hologram images are enhanced and segmented before being classified using a deep learning algorithm. The segmentation aims to provide the plankton's ROI in the form of a bounding box. The enhancement process uses a Gaussian blur filter to reduce background noise. Adaptive thresholding and morphological methods were then applied to complete the segmentation of the plankton objects [21]. An object segmentation method based on a combination of neural ordinary differential equations and the U-Net architecture has been proposed to segment cell regions [22]. The method can segment a blood smear image into three parts: background, white blood cells, and red blood cells. The combination of neural ordinary differential equations and the U-Net architecture has been shown to improve segmentation accuracy compared with other known segmentation methods. Therefore, this study considers utilising those segmentation methods to develop a segmentation algorithm for phytoplankton, since there is a size similarity between blood cells and phytoplankton cells.

Dataset Collection
This study used phytoplankton image samples from the Plankton Image Database (PID), which is managed by the Plankton Laboratory, Research Center for Oceanography, National Research and Innovation Agency (RCO-BRIN). The PID consists of over 6,000 images in several different digital formats, originating from samples collected in several areas of Indonesian waters. The phytoplankton in the dataset belong to the microplankton category, with sizes of 20 to 200 µm. The plankton images used in the paper are displayed in Figure 1. The number of images per collected plankton species is listed in Table 1. However, the image availability per species in the dataset is not uniform: some planktonic organisms were represented by more than one image, whereas others were represented by only a single image.

Figure 1. Fifteen plankton species. The labels are ordered according to the index numbers in Table 1.

Figure 2. Manual segmentation process and the generated binary image.

Manual Annotation
The plankton areas were manually segmented using a web-based application, namely Supervisely [23]. The application menu provides facilities for creating polygon areas that cover the plankton region. An annotator places vertex points along the plankton edges, and the application automatically connects the defined points to create a closed polygon. Figure 2 displays the process of manual segmentation using Supervisely. A curved edge needs more vertex points than a straight edge, so the annotator should be more careful when annotating the wavy edges of the plankton. A total of 30 images were successfully annotated. The annotated points were used to generate a binary image of the segmented plankton region. The training stage refers to this binary image as the ground truth for improving the segmentation model.

Preparation of Training Data
The annotated images were used to generate many new pictures through geometric transformation. This process is known as data augmentation [24], and it allows a large input dataset to be obtained even though only a few samples are available. Several transformation methods were applied, such as translation, scaling, and rotation. The augmented images are deformations of the original images, so the total number of samples can be multiplied by two, three, or more, resulting in more augmented images than original images.
Here, the dataset was expanded twelvefold: the sample number increased from 30 to 360 images, meaning each image was used to create 12 modified images. Figure 3 depicts the augmented images in both colour and binary formats; the transformations displayed in this figure include translation and rotation. The augmentation process also applies to the result of the manual segmentation, so each augmentation produces two outputs: the transformed original image and its transformed masked region. The plankton dataset was then equally divided into four subgroups, called D0, D1, D2, and D3. In this study, zero was used as the initial subgroup index to match the zero-based indexing of the code. Random selection was performed to distribute the images among the four subgroups, realised through a permutation of the initial indices.

U-Net Architecture
The U-Net architecture was used to segment the plankton and non-plankton regions. This architecture was first introduced in [25]. The U-Net architecture is decomposed into two main parts, namely the contracting and expanding paths. Both paths consist of many stacked layers, as implemented in common convolutional neural networks. The contracting path receives the input image and reduces its original dimension; the expanding path then restores the contracted size. Together, these paths capture context and enable precise localisation. The U-Net is also a powerful tool because the model can be developed from only a few labelled input images. This segmentation helps to localise the plankton area; it is assumed that the image does not contain overlapping plankton objects. Figure 4 describes the U-Net architecture used in the paper. The full architecture is divided into four parts so that the diagram can be presented in this paper. All input images for the U-Net architecture were resized to 256 × 256 pixels. The contraction and expansion paths were each composed of four blocks of convolutional neural network layers. A single block consists of four layers, ordered as follows: convolutional, dropout, convolutional, and max-pooling. The ReLU (Rectified Linear Unit) activation function is applied to the output of the convolutional layers [26]. The model was compiled using the Adam optimiser [27]. Binary cross-entropy was selected as the loss function since the output gives only two classes, plankton and non-plankton regions. The training process divides the training dataset into two parts, training and validation, with a proportion of 80:20. The batch size was set at 16 and the maximum number of iterations was defined at 50.
The iteration process is stopped early when there is no significant improvement over ten consecutive iterations. The code was written in Python with several additional libraries, such as NumPy, OpenCV, and TensorFlow, and was run on the high-performance computing (HPC) facilities at the National Research and Innovation Agency (BRIN).
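The architecture described above can be sketched in Keras. The block order (convolutional, dropout, convolutional, max-pooling), the four blocks per path, the 256 × 256 input, ReLU activations, Adam optimiser, binary cross-entropy loss, and the ten-iteration early-stopping criterion follow the text; the filter counts and dropout rate are assumptions, since the paper does not specify them:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters, drop=0.1):
    # One block from the text: conv -> dropout -> conv (pooling added outside).
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Dropout(drop)(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(size=256, base=16):
    inputs = layers.Input((size, size, 3))
    skips, x = [], inputs
    for i in range(4):                       # contracting path: four blocks
        x = conv_block(x, base * 2 ** i)
        skips.append(x)
        x = layers.MaxPooling2D()(x)
    x = conv_block(x, base * 16)             # bottleneck
    for i in reversed(range(4)):             # expanding path: four blocks
        x = layers.Conv2DTranspose(base * 2 ** i, 2, strides=2,
                                   padding="same")(x)
        x = layers.concatenate([x, skips[i]])
        x = conv_block(x, base * 2 ** i)
    # Sigmoid output: one channel for plankton vs non-plankton regions.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Stop when validation loss shows no improvement for ten consecutive epochs.
early_stop = tf.keras.callbacks.EarlyStopping(patience=10,
                                              restore_best_weights=True)
```

Training would then call `model.fit(..., validation_split=0.2, batch_size=16, epochs=50, callbacks=[early_stop])`, matching the 80:20 split, batch size, and iteration limit stated above.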

Training and Testing Stages
As described in the previous sub-section, the training and testing stages divided the augmented dataset into four subsets. The first iteration of the k-Fold validation uses subsets D0, D1, and D2 to train the segmentation model [28]. The last subset, D3, was used as testing data after training was completed. This testing stage evaluates the trained model in terms of performance parameters. As in the first iteration, the second iteration also uses three subsets for training and one for testing; however, it does not use D3 as the testing data. The testing subset was shifted to D0, so D1, D2, and D3 were used as the training dataset. A similar arrangement was applied in the third and fourth iterations. Figure 5 displays the complete iteration scheme of the k-Fold validation. The performance parameters, accuracy, precision, recall, and F1 score, were computed in each iteration step; their equations are written in Equations 1, 2, 3, and 4, respectively. The average value of the k-Fold validation was determined by accumulating the performance values of each iteration.
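Equations 1-4 correspond to the standard pixel-wise definitions of accuracy, precision, recall, and F1 score. A sketch of the per-fold metric computation and the four-fold subset rotation, assuming binary prediction and ground-truth masks:

```python
import numpy as np

def confusion_counts(pred, truth):
    """Pixel-wise TP/FP/FN/TN between predicted and ground-truth binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    return tp, fp, fn, tn

def metrics(pred, truth):
    # Standard definitions (Eqs. 1-4); assumes at least one predicted
    # positive and one ground-truth positive pixel.
    tp, fp, fn, tn = confusion_counts(pred, truth)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Four-fold rotation: each fold tests on one subset, trains on the other three.
subsets = ["D0", "D1", "D2", "D3"]
folds = [([s for s in subsets if s != test], test)
         for test in ("D3", "D0", "D1", "D2")]  # test order from the text
```

Averaging the per-fold metric values over the four folds gives the summary figures reported in the results section.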

Results and Discussion
Testing results of the k-Fold validation are presented in Table 2. The 3rd fold provides the highest accuracy among the folds, with a maximum of 0.974 ± 0.019. Meanwhile, the 4th fold gives the best precision and F1 score values, 0.802 ± 0.132 and 0.843 ± 0.095, respectively. The best recall is achieved in the 1st fold, with a value of 0.944 ± 0.058. Only the 2nd fold does not provide the highest value for any of the performance parameters. An averaging function over the k-Fold results was used to determine the mean value of each performance parameter. According to the averages, the accuracy and recall, 0.970 and 0.908, are both larger than 0.90. The F1 score, 0.815, is greater than 0.80 but still below 0.90. The smallest value was found for the precision parameter, at only 0.761. The trained model proved stable across the four folds of the dataset: there was no significant variation in the performance results of the k-Fold validation. The segmentation algorithm was shown to be effective in dealing with the geometric transformations of the test dataset. The average precision value was lower than the average recall value. The high recall indicates that the segmentation algorithm detects almost all of the existing plankton regions, while the lower precision reflects that the plankton covers only a small fraction of the image area; precision could be higher if the plankton dominated the image area. Performance can also be observed separately for each plankton species; the results per species are listed in Table 3. Here, the F1 score is used as the parameter to evaluate segmentation performance. Four species reach an F1 score greater than 0.90.
These species are 4-Ceratium tripos (0.904), 10-Protoperidinium faltipes (0.901), 12-Skeletonema costatum (0.913), and 15-Thalassiosira sp. (0.920). Figure 6 depicts a near-perfect segmentation of Thalassiosira sp. Figure 6.a is the original image in the RGB colour format. The algorithm can distinguish the plankton region from the non-plankton regions; there is no significant difference between the binary image from manual segmentation (Figure 6.b) and the one detected by the proposed algorithm (Figure 6.c). Figure 6.d is an overlay of the detected plankton region on the original image. The non-plankton region is covered by a black area and will not be considered in the classification process. Meanwhile, the segmentation algorithm could not perform at the expected level on four plankton species, whose F1 scores were lower than 0.75. The species with low performance values were 1-Bacteriastrum furcatum (0.722), 2-Ceratium fusus (0.706), 7-Chaetoceros diversus (0.744), and 8-Chaetoceros laciniosus (0.574). An example segmentation result is displayed in Figure 7, where the proposed algorithm was applied to a microscopic image of C. laciniosus. The segmentation result (Figure 7.c) does not completely match the reference image from manual segmentation (Figure 7.b). A significant amount of spurious segmentation coverage was found in the area surrounding the plankton's setae. The setae are thin, hair-like cell structures that grow on the body surface of the plankton. The plankton region defined by the algorithm extends beyond the reference region; the algorithm still considers this extension part of the plankton body, which causes inaccurate segmentation and lessens the algorithm's performance. The error in detecting the plankton region can occur due to the high similarity between the setae and their surrounding area.
In the example shown in Figure 7.a, another plankton can be seen at the top of the observed plankton, so two plankton specimens are partially overlapping each other. As a result, the other plankton was interpreted as part of the viewed object.

Conclusion
The paper has proposed a computer vision technique to segment plankton and non-plankton regions. The algorithm was developed based on the U-Net architecture and trained using an annotated dataset. Evaluation by k-fold validation shows that the trained algorithm can provide high accuracy and F1 score values. The segmentation algorithm analysed microscopic images of several plankton species, such as Bacteriastrum furcatum, Ceratium fusus, Ceratium trichoceros, Ceratium tripos, Chaetoceros affinis, Chaetoceros auxospore, Chaetoceros diversus, Chaetoceros laciniosus, Guinardia delicatula, Protoperidinium faltipes, Pseudo-nitzschia sp., Skeletonema costatum, Thalassionema nitzschioides, and Thalassiosira sp. Data augmentation was applied to increase the training set size for the proposed segmentation model. The segmentation results were verified against manual segmentation to compute the performance parameters. According to the average results of the four-fold validation, the accuracy, precision, recall, and F1 score are 0.970 ± 0.025, 0.761 ± 0.156, 0.908 ± 0.104, and 0.815 ± 0.117, respectively. The best segmentation of the plankton region was found for the species Thalassiosira sp., as indicated by the highest F1 score of 0.920. Segmentation errors can occur when the plankton region is too thin or overlaps with another plankton. This research has promising prospects for further development and wide use in morphological phytoplankton identification. A more in-depth analysis system needs to be developed, especially for identifying phytoplankton species that have similarities in shape or size, or that are in overlapping positions.