Adaptive Learning Image Tracking Algorithm Based on Characteristic Fusion

To address model-free real-time image tracking, this paper proposes an image tracking algorithm based on characteristic fusion that adaptively learns the historical characteristics of the target (CFAL). In this algorithm, the color feature is redesigned using the idea of HOG to achieve feature consistency, and the particle filter is combined with reality constraints to improve matching efficiency. Finally, the algorithm is implemented and tested on real, animated, and video scenes. The experimental results show that adaptive feature learning and weight adjustment improve the tracking effect, and that CFAL can track the target stably and reliably under attitude changes and partial occlusion.


Introduction
Image tracking has long been a central problem in computer vision [1,2]: given the target position in the first frame, find the target in the subsequent frames. The lack of prior knowledge, occlusion or disappearance of the target, motion blur, drastic changes of the target and the environment, high-speed target motion, and camera shake all affect target tracking; these factors make image tracking a difficult problem in the CV field. In this paper, an adaptive learning target tracking algorithm based on feature fusion, called CFAL, is proposed. The algorithm mainly includes feature extraction, target matching, and adaptive learning. For a set of image sequences, multi-dimensional image features are extracted and fused [3,4]. Based on the feature matching degree [5] and practical constraints, the region with the highest matching degree is taken as the target, so that the tracking target is identified in the next frame. The identified target image is then used as new information for adaptive learning, improving the generalization of the target features.

Histogram of color features: Lab spatial color distribution
The Lab [6] color space has perceptual uniformity and Euclidean distance invariance. Therefore, extracting the color histogram of the image from the color channels of Lab, as the color feature of the image, not only conforms to the true perception of colors, but also reduces the color representation from the three dimensions of RGB to the two dimensions of a and b.
* These authors are joint corresponding authors (Prof. Jing Xu, Dr. Lei Liu).
2.1.1. Production of the histogram of color distribution. For each block in the image, traverse the pixels in the block and assign each pixel to a bin [7]. Finally, the number of pixels in each bin is counted and normalized to generate a color feature histogram, which reflects the color distribution of the block.
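The paper does not give implementation details for the binning; the following is a minimal sketch of the per-block histogram, assuming an 8×8 binning of the a and b channels over [−128, 127] (the bin count and channel range are our assumptions, not stated in the text):

```python
def block_color_histogram(block_ab, bins=8, lo=-128.0, hi=127.0):
    """Normalized 2-D histogram over the (a, b) channels of one block.

    block_ab: iterable of (a, b) pixel values in Lab space.
    Returns a flat list of bins*bins frequencies summing to 1.
    """
    hist = [0] * (bins * bins)
    n = 0
    for a, b in block_ab:
        # Map each channel value to a bin index, clamped to the valid range.
        ia = min(bins - 1, max(0, int((a - lo) / (hi - lo) * bins)))
        ib = min(bins - 1, max(0, int((b - lo) / (hi - lo) * bins)))
        hist[ia * bins + ib] += 1
        n += 1
    return [c / n for c in hist] if n else hist

# A uniform gray block: every pixel falls into the same (a, b) bin.
h = block_color_histogram([(0.0, 0.0)] * 16)
```

Because the histogram is normalized, blocks of different pixel counts remain directly comparable as distributions.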

Process of color feature extraction.
We divide the matching window into cells; four cells constitute a block. Scanning from left to right and from top to bottom, the color histogram of each block is extracted cell by cell, and the histograms are concatenated. Finally, we obtain a feature vector group describing the color of the matching region.
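The block scan above can be sketched as follows. For brevity this sketch uses a single-channel histogram per block; the square cell size, the one-cell stride (giving HOG-style overlapping blocks), and the bin count are our assumptions, since the text does not fix them:

```python
def window_color_feature(window, cell=4, bins=8):
    """Scan the window block by block (2x2 cells, one-cell stride) and
    concatenate each block's normalized channel histogram."""
    def hist(vals):
        h = [0] * bins
        for v in vals:
            h[min(bins - 1, max(0, int((v + 128.0) / 256.0 * bins)))] += 1
        s = sum(h)
        return [c / s for c in h]

    size = 2 * cell  # a block spans 2x2 cells
    feat = []
    # Left to right, top to bottom, as described in the text.
    for y in range(0, len(window) - size + 1, cell):
        for x in range(0, len(window[0]) - size + 1, cell):
            vals = [window[y + dy][x + dx]
                    for dy in range(size) for dx in range(size)]
            feat.append(hist(vals))
    return feat

# A 12x12 window with cell=4 yields 2x2 = 4 overlapping block positions.
f = window_color_feature([[0.0] * 12 for _ in range(12)])
```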

Histogram of edge features: directional gradients
Edge feature extraction uses the classical histogram of oriented gradients (HOG) [8,9,10] method; the details are not expanded here. The key point is to unify the form of the edge features and the color features to facilitate feature fusion.

Feature matching method
2.3.1. Block matching. Each block yields a set of normalized feature histograms as its feature distribution. The concept of KL divergence [11] is introduced to evaluate the similarity of feature distributions between two blocks. KL divergence describes the difference between two distributions: for D_KL(p || q), the higher the similarity, the smaller the value. In order to make the similarity positively correlated with the final output, a Gaussian mapping function is applied. The matching degree between two blocks is defined as:

r_t(k, v) = exp(−D_KL(p_t(k, v) || q_t(k, v))^2 / (2σ^2))   (1)

where k represents the k-th block, v represents the feature type, and t represents the sampling time point. Considering that edge pixels are more susceptible to occlusion or interference from background pixels, image features at the center position are more stable. Therefore, we define a block weight γ(i, j), in which blocks located at the center of the image have higher weight and blocks at the edge have lower weight. The single-feature matching degree is defined as:

R_t(v) = Σ_k γ(i, j) · r_t(k, v)   (2)
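The block-matching score can be sketched directly from the definitions above. This is a minimal sketch: the Gaussian mapping of the KL divergence and the σ parameter follow equation (1), while the small eps guard against empty bins is our addition:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions (normalized histograms).
    eps avoids log(0) on empty bins (implementation detail, not in the paper)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def block_match(p, q, sigma=1.0):
    """Gaussian mapping of the KL divergence: identical distributions give
    1.0, and the matching degree decays toward 0 as they diverge."""
    d = kl_divergence(p, q)
    return math.exp(-d * d / (2.0 * sigma * sigma))

p = [0.25, 0.25, 0.25, 0.25]
m_same = block_match(p, p)                      # identical histograms
m_diff = block_match(p, [0.7, 0.1, 0.1, 0.1])   # diverging histograms
```

As required, the mapping makes similarity positively correlated with the output: the more similar the histograms, the larger the matching degree.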

Adaptive learning feature fusion
The matching degrees of the color feature and the edge feature can be obtained for a pair of matching windows (X_t, X_{t+1}). The two features have different discriminating abilities in different scenes. Therefore, it is necessary to dynamically adjust the feature weights for the next frame and realize adaptive feature fusion based on the matching degrees of the current frame [12]. The adaptive fusion weight function is defined as follows:

β_{t+1}(v) = (1 − τ) β_t(v) + τ · R_t(v) / (R_t(v) + R_t(w))   (3)

where τ is an adaptive learning rate used to control the speed of adjusting the weights. The region matching degree can then be expressed as:

R_t = β_t(v) R_t(v) + β_t(w) R_t(w)   (4)

where v represents the color feature and w the edge feature.
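A minimal sketch of the adaptive fusion step, following equations (3) and (4) as reconstructed above (the exact update form is our reading of the garbled original, so treat it as an assumption):

```python
def update_fusion_weights(beta_color, r_color, r_edge, tau=0.2):
    """Move the color-feature weight toward the share of matching degree the
    color feature earned on the current frame; the edge weight is 1 - beta.
    tau controls how fast the weights adapt (equation (3))."""
    target = r_color / (r_color + r_edge)
    beta_color = (1.0 - tau) * beta_color + tau * target
    return beta_color, 1.0 - beta_color

def region_match(r_color, r_edge, beta_color):
    """Fused region matching degree: a linear combination of the two
    single-feature matching degrees (equation (4))."""
    return beta_color * r_color + (1.0 - beta_color) * r_edge

# Color matched much better than edge this frame, so its weight grows.
b, _ = update_fusion_weights(0.5, r_color=0.9, r_edge=0.3)
fused = region_match(0.9, 0.3, b)
```

With tau = 0 the weights are frozen; larger tau lets a scene change (e.g. lighting that degrades color) shift weight to the edge feature within a few frames.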

Feature matching process [13]
① Take the target window X_t at time t as the sample and extract its features.
② Generate k particle coordinates.
③ Generate the corresponding matching window X_{t+1}^i for every particle.
④ Extract color and edge features from the matching window X_{t+1}^i, match them against the sample features, and obtain the matching degree R_{t+1}^i.
⑤ Select the particle with the highest matching degree as the target position for the next frame.
The process is shown in figure 1. Without sufficient data sets provided in advance, we can only increase the amount of information by learning. When the target window at time t is X_t, characterized as W_t, the change of the sample feature in the following process is shown in figure 2.
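The steps above can be sketched as one tracking iteration. This is a sketch, not the paper's implementation: the Gaussian particle scatter, and the `extract` and `match` callables standing in for the feature extraction and matching-degree computation, are our assumptions:

```python
import random

def track_step(frame, sample_feature, target_xy, extract, match,
               n_particles=100, spread=8.0, rng=random):
    """One frame of the matching loop: scatter particles around the last
    target position, score each candidate window against the sample
    features, and keep the best-matching particle."""
    best_xy, best_r = target_xy, float("-inf")
    for _ in range(n_particles):
        # Step 2: generate a particle coordinate near the previous target.
        x = target_xy[0] + rng.gauss(0.0, spread)
        y = target_xy[1] + rng.gauss(0.0, spread)
        # Steps 3-4: build the candidate window's features and match them.
        r = match(sample_feature, extract(frame, (x, y)))
        # Step 5: keep the particle with the highest matching degree.
        if r > best_r:
            best_xy, best_r = (x, y), r
    return best_xy, best_r

# Toy check: a synthetic matching degree that peaks at (10, 10).
extract = lambda frame, xy: xy
match = lambda sample, cand: -((cand[0] - 10.0) ** 2 + (cand[1] - 10.0) ** 2)
xy, r = track_step(None, None, (0.0, 0.0), extract, match,
                   n_particles=500, spread=6.0, rng=random.Random(0))
```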

Adaptive learning
Since the color and edge features are linearly superimposed, as we have assumed, the features are updated by:

W_{t+1} = (1 − α) W_t + α W'_{t+1}   (5)

where α is the feature learning rate and describes the speed of feature learning, and W'_{t+1} is the feature extracted from the newly matched target window.
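The update in equation (5), applied bin by bin to a feature histogram, is a short exponential moving average (the bin-wise application is our reading, consistent with the linear superposition assumed above):

```python
def learn_feature(old_hist, new_hist, alpha=0.1):
    """Exponential update of the sample feature toward the newly matched
    window's feature: alpha = 0 freezes the template, alpha = 1 replaces it
    entirely with the latest observation."""
    return [(1.0 - alpha) * o + alpha * n for o, n in zip(old_hist, new_hist)]

# Template drifts a quarter of the way toward the new observation.
w = learn_feature([1.0, 0.0], [0.0, 1.0], alpha=0.25)
```

This form makes the trade-off in section 5.3.2 explicit: larger α adapts faster to morphological change, but also absorbs background features more readily.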

Process of adaptive learning image tracking algorithm based on characteristic fusion (CFAL)
① Obtain the initial features of the target from the first frame. ② For each subsequent frame: a. match the candidate windows and select the target according to equation (4); b. adjust the fusion weights according to equation (3); c. learn the characteristics according to equation (5). In order to achieve better results, we used several tricks inspired by intuition.

Weight the particle generation range based on the target moving speed.
To predict the target position, the target speed estimated from historical information is utilized. Based on the magnitude and direction of the velocity, the particle generation range is weighted. The specific weighting method is shown in figure 3.
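Since figure 3 is not reproduced here, the following is one plausible sketch of velocity-weighted particle generation: shift the sampling center along the estimated velocity and widen the spread along the motion direction. The specific shift and widening rules are our assumptions, not the paper's exact scheme:

```python
import random

def biased_particles(center, velocity, n=100, spread=6.0, gain=1.0, rng=random):
    """Generate particle coordinates biased by the estimated velocity: the
    sampling center moves ahead of the target and the spread grows along
    the motion axis, so more particles land where the target is headed."""
    cx = center[0] + gain * velocity[0]
    cy = center[1] + gain * velocity[1]
    sx = spread + abs(gain * velocity[0])
    sy = spread + abs(gain * velocity[1])
    return [(rng.gauss(cx, sx), rng.gauss(cy, sy)) for _ in range(n)]

# A target moving right at 12 px/frame: samples cluster ahead of it.
pts = biased_particles((0.0, 0.0), (12.0, 0.0), n=1000, rng=random.Random(1))
```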

Suppress the influence of background on color characteristics.
Considering that part of the background lies inside the window, in order to suppress its influence on color feature extraction, the surrounding area S outside the matching window is examined and the color feature histogram of S is extracted. The schematic diagram is shown in figure 4.
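The paper does not state how the histogram of S is used; one plausible sketch (our assumption) down-weights window bins that are also prominent in S and renormalizes, so that colors shared with the background contribute less to matching:

```python
def suppress_background(window_hist, surround_hist, eps=1e-6):
    """Scale each window bin by how target-specific it is relative to the
    surrounding area S, then renormalize. Bins dominated by the background
    shrink; bins unique to the target keep their weight."""
    w = [h * (h / (h + s + eps)) for h, s in zip(window_hist, surround_hist)]
    total = sum(w)
    return [x / total for x in w] if total else w

# Bin 0 is shared with the background, bin 1 is target-only:
# after suppression, bin 1 dominates the normalized histogram.
h = suppress_background([0.5, 0.5], [0.5, 0.0])
```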

Morphological changes.
In order to test the robustness of CFAL to morphological changes of the target, we selected a real scene (case 1) to track shoes and an animation scene (case 2) to track a rotating target. The tracking results are shown in figures 6, 7, 8 and 9 (the image sequence is obtained by sampling frames, not a complete sequence; only the tracking process is shown). The actual results show that a target whose shape has changed greatly can still be tracked. The more drastic the change of attitude, the lower the tracking accuracy, but a stable tracking effect can still be maintained.
Figure 9. Rotating moving target.

Consider occlusions.
In order to test the tracking effect of CFAL when the target is partially occluded, two groups of image sequences are selected for tracking: one group shows a mobile robot passing through an obstacle (case 3); the other is a segment selected from a documentary (case 4), tracking one animal from a group of wild animals. The experimental results are shown in figures 10, 11, 12 and 13 (the image sequence is obtained by frame extraction, not a complete sequence; only the tracking process is shown). Judging from the tracking effect, the algorithm has a certain resistance to partial occlusion. If the parameters are adjusted properly, the moving trend of the target can be tracked, but the size of the matching window fluctuates.
Figure 10. Partially occluded target. Figure 11. Partially occluded target.

Effect of Each Parameter on the Tracking Effect
5.3.1. Weight learning rate τ. In order to study the influence of single parameters on the tracking effect, the selected image sequence (case 1) is tracked with different parameter values, and the harmonic mean of the precision and recall of the last 50 frames is calculated. As shown in table 1 and figure 14, increasing τ within a certain range improves the overall tracking performance, but too large a τ causes both P and R to decrease. From the principle analysis, adaptively adjusting the feature weights can weaken the negative influence of environment features and obtain a better tracking effect. Therefore, the tracking effect can be improved by setting an appropriate feature weight learning rate.

5.3.2. Feature learning rate α. From the experimental results shown in table 2 and figure 15, when α = 0, P is higher but R is lower. Properly increasing the learning rate increases R, thereby improving the overall tracking level; when the learning rate is increased further, the overall tracking level decreases. Figure 14. Different τ. Figure 15. Different α.
In principle, tracking that relies only on the originally given characteristics cannot adapt to real-time changes of the target. Increasing the learning rate gives stronger adaptability to morphological changes and occlusion, but on the other hand it also causes features unrelated to the target to be learned, which increases the accumulated error and can cause tracking failure. On the whole, setting an appropriate learning rate achieves a better tracking effect.

Conclusion
The original intention of CFAL is to realize adaptive learning of historical features by carefully designing the feature structure, so that a specified target can be tracked effectively without labels or large sample sets, and so that the algorithm has strong universality: for any image sequence with a given target, effective tracking can be achieved. The experimental results show that, with appropriate parameter settings, CFAL can track the specified target effectively under partial occlusion and morphological change. On the other hand, they also reveal a deficiency of the algorithm: different tracking scenarios require different parameter settings, so its universality is limited. Through experimental analysis, the influence of each parameter on the final tracking effect is understood to some extent. If the underlying laws can be found through a large number of experiments, adaptive adjustment of the parameters can be introduced to realize a more generalized algorithm.