Blood flow characterization in nailfold capillary using optical flow-assisted two-stream network and spatial-temporal image

The blood flow velocity in the nailfold capillaries is an important indicator of the status of microcirculation. Conventional manual processing is both laborious and prone to human error. A feasible way to solve this problem is to use machine learning to assist image processing and diagnosis. Inspired by the Two-Stream Convolutional Networks, this study proposes an optical flow-assisted two-stream network to segment nailfold blood vessels. Firstly, we use U-Net as the spatial stream and dense optical flow as the temporal stream. The results show that the optical flow information effectively improves the integrity of blood vessel segmentation: the overall accuracy is 94.01%, the Dice score is 0.8099, the IoU score is 0.6806, and the VOE score is 0.3194. Secondly, the flow velocity in the segmented blood vessels is determined by constructing spatial-temporal (ST) images. The evaluated blood flow velocity is consistent with typical blood flow speeds reported in the literature. This study thus proposes a novel two-stream network for blood vessel segmentation of nailfold capillary images and, combined with ST images and a line detection method, provides an effective workflow for measuring the blood flow velocity of nailfold capillaries.


Introduction
Multiple diseases such as diabetes [1], rheumatism [2], peripheral artery disease [3], and systemic sclerosis [4] can bring about changes in nailfold microcirculation. Nailfold capillary morphology in diabetic patients is significantly abnormal compared to healthy individuals, showing, for example, capillary dilatation, avascular zones, and tortuous capillaries [5]. The loop shape, flow state, flow velocity, clarity, and periloop state of the microcirculation in patients with rheumatism differ markedly from those of healthy people. There are significant abnormalities in capillary morphology in patients with hepatitis B and hepatitis C [6]. The mean capillary loop density in psoriasis is significantly lower than in healthy individuals [7]. Decreased capillary density, telangiectasia, and the appearance of giant capillaries are characteristic of patients with systemic sclerosis [8]. Therefore, observing abnormal changes in nailfold microcirculation indicators can not only play a preventive role but also improve the possibility of a cure.
However, in most cases, microcirculation indicators still rely on manual observation and judgment by professional doctors, which is time-consuming and labor-intensive. Studying automated and intelligent diagnosis by machine learning therefore has great application value. In recent years, deep learning has been widely used in medical research [9], such as brain tumor segmentation [10], liver tumor segmentation [11], and retinal blood vessel segmentation [12]. Research on nailfold microcirculation blood vessel segmentation is likewise important and meaningful. The main parameters observed in microcirculation are vascular morphology, flow, density, curvature, etc. To achieve automated and intelligent microcirculation diagnosis, we must first segment the blood vessels and then extract the corresponding features. Chen et al [13] and Lin et al [14] averaged the registered continuous images to obtain a vascular Mask. Wen et al [15] treated vascular detection as a supervised classification problem, using random forests to predict each pixel to segment blood vessels. Berks et al [16] used a skeleton algorithm and then fed the result into a neural network for feature classification.
In this study, we proposed an optical flow-assisted two-stream network and an ST-based blood velocity evaluation algorithm. This article is organized as follows. In section 2, the Two-Stream Convolutional Networks, the U-Net, the CBAM, and the ST images are introduced. Section 3 introduces the nailfold capillary dataset constructed in this study. In section 4, the proposed two-stream network and blood flow speed evaluation method are introduced. The results are shown in section 5. Section 6 summarizes this research work.

Related works
2.1. Two-stream convolutional networks
Two-Stream Convolutional Networks [17] were proposed by Simonyan et al to better solve action recognition. The architecture has two convolutional branches, fusing spatial-stream features with temporal-stream features to obtain the prediction. Simonyan et al showed experimentally that a convolutional network trained on multi-frame dense optical flow performs well even with limited training data. This inspired us to use the optical flow between consecutive images to assist vascular segmentation.

2.2. U-Net
U-Net [18] is a convolutional network proposed for biomedical image segmentation. The U-Net structure is roughly shown in figure 1; it is composed of an encoder and a decoder. The encoder in the left half uses four down-sampling stages to extract image features, and the decoder uses four up-sampling stages to restore the extracted features.

Down-sampling continuously learns image context to obtain deep features, and up-sampling restores the deep information to the original image size. To avoid the feature loss caused by down-sampling when restoring the image, the channels of the feature maps are concatenated at each parallel level.

2.3. CBAM
Woo et al [19] proposed CBAM (Convolutional Block Attention Module). CBAM combines a channel attention module with a spatial attention module so that each feature is more closely related in channel and space, extracting target features more effectively. The CBAM structure is shown in figure 2.

2.4. Spatial-temporal image
The main idea of the spatial-temporal (ST) image [20] is to transform the blood flow velocity in a continuous video into the measurement of a trajectory slope on a planar two-dimensional spatial-temporal image. The continuous image sequence is mapped to the x-axis, and the length of the 'straightened' vascular skeleton line is mapped to the y-axis, as shown in figure 3. Taking the green channel value of each point on the skeleton line, as it varies over time, as the value of the coordinate point and mapping it to 0-255 yields a grayscale image: the ST image.
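A minimal sketch of this construction is given below (assuming the frames are already registered and the skeleton is given as a list of pixel coordinates; the function name is ours, and the paper's per-point cross-section averaging is omitted for brevity):

```python
import numpy as np

def build_st_image(frames, skeleton_coords):
    """Build a spatial-temporal (ST) image: one column per frame (time),
    one row per point along the straightened vessel skeleton (space).
    frames: list of H x W x 3 RGB frames; skeleton_coords: (row, col) points."""
    st = np.empty((len(skeleton_coords), len(frames)), dtype=np.float64)
    for t, frame in enumerate(frames):
        for i, (r, c) in enumerate(skeleton_coords):
            st[i, t] = frame[r, c, 1]  # green channel component
    st -= st.min()                     # map the value range to 0-255 grayscale
    if st.max() > 0:
        st *= 255.0 / st.max()
    return np.round(st).astype(np.uint8)
```

Plotting this array as a grayscale image shows moving vascular contents as oblique streaks whose slope encodes the flow velocity.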

Dataset
The nailfold capillary dataset used in this study was self-constructed; an XW880 microcirculation microscope was used to collect the nailfold capillary images. We recruited 22 volunteers in the laboratory, randomly selected different fingers of each volunteer, and recorded 10 s (30 fps) of the same field of view of a single finger. In total, about 30,000 nailfold capillary images were collected. After selection, 539 nailfold capillary images were chosen and randomly divided at a ratio of about 9:1 into a training dataset and a test dataset.
In addition, we prepared 51 nailfold capillary images as a validation dataset to validate the practical application ability of the trained model. The image of the validation dataset has no intersection with the training dataset and the test dataset to ensure that the experiment is closer to the real situation.
The method used in this study is supervised learning, so the original image and its label must be given for network training. The Label refers to the manually labeled vascular Mask. Figure 4(a) is the captured nailfold microcirculation image, and figure 4(b) is the manually labeled image. The Labels of the dataset used in this study were manually annotated by experts with medical backgrounds (also authors of this study). Using drawing tools, we placed the original image as the background layer, created a new transparent layer to depict the region of interest (the vascular area), and exported this layer as the ground truth.

Method
The hardware information of the experiment is as follows. The GPU is an NVIDIA GeForce RTX 3090 with 24 GB of graphics memory, the CPU is an Intel(R) Xeon(R) Platinum 8157 CPU @ 2.30 GHz, and 12 CPU cores are available. The software environment is: Python 3.8, CUDA 11.3, cuDNN 8, PyTorch 1.10, Ubuntu 18.04, VNC, NVCC 11.3. The dimension of the input images is 640 * 480. We use Binary Cross Entropy Loss (BCE Loss) as the loss function, Adam (Adaptive Moment Estimation) as the optimizer, and a learning rate of 1e-5.

Vessel segmentation
Firstly, we made some modifications to the original U-Net, as shown in figure 5. In the Down Sampling of the U-Net, we use LeakyReLU instead of ReLU and add a BatchNorm layer and a Dropout layer between the convolution layer and the activation function to prevent overfitting. BatchNorm and LeakyReLU activation are also added after the maximum pooling layer. Similarly, we use LeakyReLU instead of ReLU during the Up Sampling and add a BatchNorm layer and a Dropout layer between the convolution layer and the activation function. The result of the last 1 * 1 convolution is output after Sigmoid activation. The final output has only one channel, and the value of each pixel approaches 0 or 1 to distinguish foreground from background.
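One down-sampling block of this modified U-Net could be sketched in PyTorch as follows (the dropout rate and the exact ordering of BatchNorm/Dropout around each convolution are our assumptions; the paper only states which layers were added):

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """One down-sampling stage of the modified U-Net: max pooling with
    BatchNorm + LeakyReLU after it, then two convolutions, each followed
    by BatchNorm, Dropout, and LeakyReLU (ReLU replaced throughout)."""
    def __init__(self, in_ch, out_ch, p_drop=0.2):
        super().__init__()
        self.block = nn.Sequential(
            nn.MaxPool2d(2),                       # halve width and height
            nn.BatchNorm2d(in_ch),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.Dropout2d(p_drop),                  # between conv and activation
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.Dropout2d(p_drop),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```

Each such block halves the spatial resolution and (with out_ch = 2 * in_ch) doubles the channel count, matching the size progression described below.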
Inspired by the Two-Stream Convolutional Networks, this study adds optical flow features to assist the spatial features in segmenting blood vessels, adopting a multi-modality network structure similar to the Two-Stream Convolutional Networks. We choose U-Net as the SpatialNet: the original RGB image is preprocessed to obtain the green channel component as input, and the U-Net outputs a feature map with spatial features. In the TemporalNet, we use OpenCV to obtain the optical flow features, which pass through a three-layer convolution to output a feature map with temporal features. In the FusionNet, the two feature maps are fused to produce the final output, as shown in figure 6.
The specific structure of the SpatialNet (the modified U-Net) is shown in figure 5. The TemporalNet is a three-layer convolutional structure: the first two layers use 3 * 3 convolution kernels to increase the number of channels, and the third layer uses a 1 * 1 kernel to compress the number of channels. The FusionNet uses a small structure similar to U-Net.
The specific process is shown in figure 6. The SpatialNet reads the RGB image of the microcirculation blood vessels, extracts the green channel component as the network input (1 * 480 * 640), and changes the number of channels to 64 after a convolution layer. After each Down Sampling, the width and height are halved and the number of channels is doubled. The size is 1024 * 30 * 40 at the fourth Down Sampling, and the size is unchanged after two convolutions in the fifth layer. In the Up Sampling, the width and height are doubled and the number of channels is halved, then concatenated with the Down Sampling result of the corresponding layer (1024 * 60 * 80); the subsequent convolution reduces the number of channels by half. The Up Sampling is repeated four times to restore the size to 64 * 480 * 640. Finally, the channels are reduced to 1 by a 1 * 1 convolution and activated by Sigmoid. In the TemporalNet, the current frame and its two preceding frames (Frame n and Frame n−1, Frame n−1 and Frame n−2) are processed with OpenCV to obtain optical flows, and the four optical flow maps of horizontal and vertical components are extracted as input. After one convolution, the number of channels is increased to 64; the next convolution keeps the number of channels unchanged, and the last convolution compresses the number of channels to 1. In the FusionNet, the outputs of the SpatialNet and TemporalNet are concatenated in the channel dimension as input (2 * 480 * 640). After one Down Sampling and one Up Sampling, the final output size is 1 * 480 * 640. In the output feature map, each pixel tends to 0 or 1, representing background and foreground, respectively.
We discussed the effects of different fusion methods on blood vessel segmentation. In figure 7, TwoStreamNet2, TwoStreamNet3, TwoStreamNet4, TwoStreamNet5, and TwoStreamNet6 are the model names of the different fusion methods; the specific fusion structures are shown in figure 7. The difference between fusion methods 2 and 6 is the number of CBAM modules added. The difference between fusion methods 4 and 5 is the feature maps concatenated during up-sampling: the former concatenates the convolution feature map, and the latter concatenates the feature map passed through the CBAM attention mechanism. The difference between fusion method 2 and the proposed method is the number of up-sampling and down-sampling stages: the former uses two and the latter uses one.
Fusion method 3 is an attempt made in this study; its complexity is higher than the other fusion methods. The basic idea is as follows: the output feature maps of the two streams yield four feature maps of the same size after convolution and CBAM. The convolution feature map of one stream and the convolution-plus-CBAM feature map of the other stream are concatenated in the channel dimension, giving two feature maps in turn. Then, four feature maps of the same size are obtained by down-sampling and convolution, followed by CBAM. Similarly, two feature maps are obtained by cross-concatenating channels, and each is passed through CBAM to obtain two feature maps of the same size. The four feature maps are concatenated and fused by a 1 * 1 convolution.

ST image and line detection
As shown in figure 3, the horizontal direction is the continuous image sequence, and the vertical direction is the straightened blood vessel. Each blood vessel in the continuous image sequence yields one ST image. The trajectory of the vascular contents can be regarded as a straight line; by calculating the angle between this line and the horizontal direction, the velocity of the vascular contents, i.e. the blood flow velocity, can be estimated. The specific process is shown in figure 8(a).
1. The vascular skeleton was extracted from the vascular Mask identified in the previous step. The skeleton extraction is based on skimage.morphology. However, simply applying skeleton extraction produces 'bone spurs', which affect the subsequent sampling of pixel values along the skeleton. Thus, this study designs an algorithm to remove the 'bone spurs'. The main idea is to find all endpoints, calculate the distance from each endpoint to its branch point, and set a threshold: branches shorter than the threshold are considered 'bone spurs' and removed.
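This pruning idea can be sketched as follows (the paper gives no implementation details, so the 8-connected walk, the threshold semantics, and the function names are our assumptions):

```python
import numpy as np

def neighbors(skel, r, c):
    """8-connected skeleton neighbors of pixel (r, c); assumes a
    1-pixel border of zeros around the skeleton array."""
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and skel[r + dr, c + dc]]

def remove_spurs(skel, threshold):
    """Prune 'bone spurs': walk from each endpoint (pixel with one
    neighbor) toward the nearest branch point (>= 3 neighbors); if the
    walk reaches a branch point within `threshold` steps, delete the
    walked pixels (keeping the branch point itself)."""
    skel = skel.copy()
    endpoints = [(r, c) for r, c in zip(*np.nonzero(skel))
                 if len(neighbors(skel, r, c)) == 1]
    for r, c in endpoints:
        path, prev = [(r, c)], None
        while len(path) <= threshold:
            nbrs = [p for p in neighbors(skel, *path[-1]) if p != prev]
            if len(nbrs) != 1:        # reached a branch point or a dead end
                break
            prev = path[-1]
            path.append(nbrs[0])
        else:
            continue                  # branch longer than threshold: keep it
        if len(neighbors(skel, *path[-1])) >= 3:   # spur ends at a branch
            for p in path[:-1]:
                skel[p] = False
    return skel
```

In practice the threshold is chosen a few pixels long, small enough not to clip the genuine ends of the vessel skeleton.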
2. The finger inevitably jitters slightly during shooting, which appears in the continuous image sequence as dislocation of the blood vessel position. Therefore, the continuous image sequence must be registered before obtaining the green channel component. We use OpenCV for image registration. The registration process is as follows. The first frame Frame 1 is used as the query image, and the subsequent frames (Frame 2, ..., Frame n) are used as train images. The RGB images are converted to grayscale. We use the SIFT (Scale-Invariant Feature Transform) [21] algorithm to detect features in the query image and the train image; this returns two sets of feature points that can be used for registration. We choose FLANN as the matcher with the KDTREE algorithm, feed it the two sets of feature points returned by SIFT, and obtain DMatch objects containing the distance between the best-matching feature points, the index in the query image, the index in the train image, etc. Then we use the cv2.findHomography() method to obtain the homography matrix and use it to register the train image by the corresponding projective transformation. To keep the image and the vascular Mask consistent, the same transformation is applied to the vascular Mask. Figure 9 shows the registration process for two different image sequences.
3. Each pixel of the vascular skeleton is traversed, and the green channel component of each pixel on its cross-section is obtained and averaged. This step straightens curved blood vessels so that the same operation can be applied to subsequent sequence images. As shown in figure 8(b), the main idea of the algorithm is to take the coordinates p1 and p2 three pixels away on either side of any point p on the skeleton and compute the negative reciprocal of the slope of the line through these two points, i.e. the slope of the cross-section normal to p1p2 at p. From this slope and the coordinates of p, we move one pixel at a time along the cross-section on both sides until a non-vascular position is encountered (marked by the blue line in figure 8(b)). By traversing each pixel of the cross-section on the original image, the green channel components and their mean value are obtained. In addition, we average all sequence images of the same batch to obtain a mean vascular image, which is used in subsequent processing.

The line detection method was used to detect the trajectory of the vascular content on the ST image.
This study uses OpenCV's probabilistic Hough transform method (HoughLinesP) for line detection; its core idea is the Hough transform [22]. After obtaining the trajectory, the tangent of its angle (θ) with the horizontal direction (pixels per frame) is calculated, multiplied by the frame rate (Fr, 30 fps), and combined with the standard scale (Ss, μm/pixel) to obtain the blood flow velocity: V = tan(θ) × Fr × Ss (μm/s). Figures 10(a)(b) are the standard scale and its appearance under the microscope, respectively. One small division represents 10 μm. The standard scale is calculated from the actual length of the scale and the corresponding number of pixels. In this study, Ss = 1.1851 μm/pixel.
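The conversion from a detected trajectory to a velocity can be sketched as below, taking one line segment in the (x1, y1, x2, y2) form returned by cv2.HoughLinesP (the helper name and argument layout are ours):

```python
def flow_velocity(x1, y1, x2, y2, frame_rate=30.0, scale_um_per_px=1.1851):
    """Convert one detected ST-image trajectory segment to blood flow
    velocity. In the ST image the x-axis is time (frames) and the y-axis
    is position along the straightened vessel (pixels), so the slope
    tan(theta) is in pixels per frame; multiplying by the frame rate
    (Fr) and the standard scale (Ss) gives micrometres per second."""
    if x2 == x1:
        raise ValueError("vertical trajectory: velocity undefined")
    slope = abs(y2 - y1) / abs(x2 - x1)          # tan(theta), px / frame
    return slope * frame_rate * scale_um_per_px  # V = tan(theta) * Fr * Ss
```

For example, a 45-degree trajectory (slope 1) at 30 fps with Ss = 1.1851 μm/pixel gives about 35.6 μm/s.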

Evaluation metrics
We used Dice (Dice similarity coefficient), IoU (Intersection over Union), VOE (Volumetric Overlap Error), and Accuracy to evaluate the proposed method. The Dice coefficient in equation (2) and the IoU coefficient in equation (3) both measure the similarity between two samples; the closer to 1, the closer the model prediction is to the ground truth. VOE in equation (4) measures the overlap error of two samples; lower is better. Accuracy in equation (5) is the proportion of correctly predicted pixels among all pixels. Vpred denotes the prediction result and Vgt the ground truth; TP and TN are correctly predicted foreground and background pixels, and FP and FN are incorrectly predicted foreground and background pixels.
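Equations (2)-(5) are referenced but not reproduced in this excerpt; the standard definitions of these metrics, consistent with the reported scores (note VOE = 1 − IoU, and indeed 0.6806 + 0.3194 = 1), can be implemented as follows (the function name is ours):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Dice, IoU, VOE, and pixel Accuracy for binary masks
    (assumes the masks are non-empty so the denominators are nonzero)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # correctly predicted foreground
    tn = np.sum(~pred & ~gt)    # correctly predicted background
    fp = np.sum(pred & ~gt)     # wrongly predicted foreground
    fn = np.sum(~pred & gt)     # wrongly predicted background
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    voe = 1 - iou
    acc = (tp + tn) / pred.size
    return dice, iou, voe, acc
```

For a perfect prediction Dice and IoU are 1 and VOE is 0.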

Application validation
To verify the feasibility of our method in real life, we collected nailfold capillary image sequences of the same volunteer, the same finger, and the same field of view. We sampled once per hour from 10 AM to 2 PM, five time periods in total; the specific sampling times, roughly at three-quarters past each hour, are shown in table 1. Due to differences in sampling speed, the actual interval is about one hour. Each time period was recorded for 10 s (30 fps).
We use our proposed model to predict the blood vessels in the image sequences and obtain the blood vessel Masks. The skeleton algorithm is then used to draw the ST images and detect the trajectories. Finally, the velocities are aggregated and drawn as a box plot to observe the trend of the individual's blood flow velocity.

Blood vessel segmentation using optical flowassisted two-stream network
The results are shown in table 2. Compared with U-Net, our method improves the overall accuracy by 1.18%, the Dice score by more than 0.06, the IoU by more than 0.08, and the VOE by more than 0.08. Figures 11(a)-(d) show the microcirculation images, the Label, our prediction results, and the U-Net prediction results. Firstly, with the help of optical flow, our method segments blood vessels more completely than U-Net, as shown in the red dotted boxes in figure 11. Secondly, optical flow assists in identifying overexposed or out-of-focus blood vessels and blood vessel tips at the image edge. In figures 11(a1)(a2), the green dashed boxes mark overexposed and out-of-focus blood vessels, respectively; the blue solid box in figure 11(a2) marks a blood vessel tip at the edge. Using optical flow to assist blood vessel segmentation performs better in these situations.
In addition, we discussed whether adding an attention mechanism can improve blood vessel segmentation. The results show that the addition of CBAM does not effectively improve segmentation, although it still improves on U-Net. Our analysis is that the optical flow already attends to details such as marginal vessel tips and overexposed or defocused vessels, so adding CBAM brings no further gain. We also explored the effects of different fusion methods on segmentation ability. TwoStreamNet2, TwoStreamNet3, TwoStreamNet4, TwoStreamNet5, and TwoStreamNet6 in table 2 correspond to the structures in figure 7. The results show that:
1. Comparing fusion methods 2 and 6, adding the CBAM attention mechanism only in the down-sampling stage is better than adding it in all stages.
2. Comparing fusion methods 4 and 5, not adding CBAM performs better than adding it.
3. Comparing fusion method 2 with the proposed method, one down-sampling and up-sampling stage performs better than two.
4. Fusion method 3 is less effective than the other methods in this study.
Figures 12(a)-(d) show the process and results of skeleton extraction from a single blood vessel: the Mask, the inverted Mask, the skeletonized result, and the vascular skeleton after 'bone spur' removal. To show the effect of the skeleton extraction algorithm, we overlay the skeleton on the vascular Mask in figure 12(e). The rose-red thin lines perpendicular to the skeleton in figure 12(f) are the cross-sections of the skeleton-line pixels; only some cross-sections are shown. Figure 13 shows an unregistered image (left) and a registered image (right); the edges of the registered image are black-filled.
The detected lines are consistent with the direction of motion of the vascular contents, but they are also affected by noise. The trajectory indicated by the arrow in figure 14(a2) is an outlier caused by noise.

Blood flow velocity and capillary morphology
We calculate the blood flow velocity for each trajectory detected in each ST image and perform statistics after removing outliers, as shown in figure 15. Figures 15(a)-(d) correspond to figures 14(a2)-(d2), respectively. The horizontal axis represents the blood flow velocity, and the vertical axis represents the count. From the statistics we can see the distribution of blood flow velocity; the velocity near the mode is taken as the blood flow velocity of the vessel.
To verify the practical applicability of our algorithm, we collected continuous 10 s (300-frame) sequences of the same blood vessel of a volunteer in five time periods. Figures 16(a)-(e) show the same blood vessel in the five time periods. Figure 17 shows the blood flow velocities obtained by processing this blood vessel in the five time periods with the method proposed in this study. The abscissa represents the five time periods, and the ordinate represents the blood flow velocity. The position of the median (the horizontal line in each box) shows a gradual downward trend over time, consistent with the subjective observations during sampling.
Table 3 shows the blood flow velocity and the diameters of the venules and arterioles in the five time periods. The blood flow velocity was faster at 10 AM and 11 AM and began to slow down after 12 PM. Combined with the vascular morphology, the inner diameter of the blood vessels tended to expand.

Discussion
This study constructed a nailfold capillary dataset. Inspired by the Two-Stream Convolutional Networks, we select U-Net as the SpatialNet, replace the ReLU activation function with LeakyReLU, and add the Dropout layer to avoid overfitting. The TemporalNet uses OpenCV to obtain the optical flow of the continuous image and goes through three convolutions. Finally, the feature maps of two branches are fused in the FusionNet.
Experiments and discussions are carried out on the dataset constructed in this study. It is concluded that optical flow is helpful for blood vessel segmentation, which is mainly manifested in the fact that optical flow can improve the integrity of blood vessel segmentation. It can assist in identifying some blood vessels that are overexposed or not on the focal plane or blood vessel tips that are not easily found on the edge. We draw the ST image and use the line detection method to extract the trajectory of the vascular content and calculate its angle with the horizontal direction, combined with the standard scale to obtain the blood flow velocity (μm/s). To test the feasibility of the proposed method in real life, we sampled, processed, and counted the nailfold capillaries of volunteers in five consecutive time points, calculated the blood flow velocity of each time point, and analyzed the morphological changes of blood vessels.
Some deficiencies in this study remain unresolved. When drawing the ST image, some noise is still present; this noise can lead to misjudgments in trajectory detection and thus to errors in the final result. Our subsequent work will focus on improving the accuracy of blood vessel segmentation and the analysis of microcirculation characteristics.
This study provides an effective method for vascular segmentation of microcirculation images and an idea for calculating blood flow velocity by combining ST image and line detection method. This lays a foundation for the automation and intelligence of microcirculation diagnosis using nailfold capillary image sequences.

Acknowledgments
Thanks to the 22 volunteers who contributed to the dataset in this study. We thank the following funds that support this study: Research Fund for Enrolled

Data availability statement
The data cannot be made publicly available upon publication because they contain sensitive personal information. The data that support the findings of this study are available upon reasonable request from the authors.

Declarations
Ethics approval and consent to participate
Participants involved in this research were informed of the research procedures and signed an informed consent form before participation. The experimental procedures involving human subjects described in this paper were approved by Zhejiang Ocean University.

Consent for publication
All authors have given consent for publication.