OPSNet: Point Cloud Registration Based on Overlapping Predictive Segmentation

Registration is a critical task in point cloud processing: it aligns data acquired at different times or from different viewpoints so that they can be matched accurately. Deep learning methods have made important progress on point cloud registration. However, most existing approaches do not handle the non-overlapping parts of point clouds, which leads to poor performance in low-overlap and noisy scenarios. We propose a registration model called OPSNet, which estimates the alignment transformation and predicts the overlapping region through an iterative process. OPSNet consists of three modules: global feature extraction, overlapping region prediction segmentation, and alignment registration. By using a segmentation algorithm to handle the non-overlapping parts of the data, OPSNet reduces the adverse effects of non-overlapping regions on point cloud registration. The model learns feature representations and performs iterative optimization to achieve precise point cloud alignment. We conduct comprehensive experiments on common point cloud registration datasets and compare OPSNet with several classical point cloud registration methods. The experimental results show that OPSNet achieves lower rotation and translation errors than the compared methods. Additionally, we evaluate registration performance under different overlap ratios and find that OPSNet achieves better registration results even in low-overlap scenarios.


Introduction
Registration is a critical task in the field of point clouds, aiming to align two or more point cloud datasets to obtain more accurate and globally consistent 3D models. It finds extensive applications in fields such as robot navigation, 3D reconstruction, and virtual reality. However, the complex structure of point cloud data makes high-precision registration challenging. Point clouds may contain noise, local occlusions, and non-overlapping points, all of which complicate the registration process.
Traditional methods, such as iterative optimization and matching techniques, have limitations in handling complex geometric structures and high levels of noise. Deep learning-based approaches have emerged as a promising solution, allowing for more accurate and automated registration by learning global point cloud features. However, most existing deep learning methods do not effectively handle non-overlapping regions in point clouds, resulting in instability in noisy and low-overlap scenarios.
We propose an iterative end-to-end network called OPSNet that predicts and filters out the non-overlapping regions of the data from the perspective of point cloud segmentation. In this framework, we first extract deep features using a KPConv extraction network. Then, the feature alignment network estimates the transformation matrix for each iteration, and the segmentation network predicts the overlapping areas of the data. By filtering out the non-overlapping parts, our network maintains reliable accuracy even in low-overlap and noisy scenarios. As shown in Figure 1, we identify the overlapping part in a low-overlap scene and use the filtered overlapping area for registration. Experimental results demonstrate that OPSNet outperforms previous algorithms in normal, low-overlap, and noisy environments. We make the following contributions:
• We propose OPSNet, a registration network based on global feature extraction that integrates segmentation into registration. It predicts and filters the non-overlapping parts of the point cloud to mitigate their negative impact on registration, and it improves both the overlap prediction segmentation and the pose transformation estimate in each iteration.
• We conduct experiments in which we adjust the overlap rate of the dataset samples to evaluate our network under different conditions, including noise and low overlap. Our network exhibits superior performance in these challenging scenarios.
• By incorporating segmentation into registration, OPSNet addresses the limitations of traditional approaches and achieves more robust and accurate point cloud alignment.

Feature based
Feature matching registration matches and aligns corresponding points by extracting local feature descriptors from point clouds. SHOT and FPFH are classic feature-based methods that extract local geometric features, but they may not perform well on complex geometric structures or highly noisy point clouds. Provably Approximated ICP introduces a novel alignment algorithm that guarantees a constant-factor approximation for the point cloud alignment problem, providing the first provable approximation of the global optimum. By adaptively adding point-to-plane penalization based on surface flatness, LSG-CPD incorporates local surface geometry information to improve accuracy and robustness. In recent years, some studies have used deep learning to extract features for registration. 3DMatch and D3Feat combine deep learning and local geometric feature extraction to align point clouds with significant geometric differences, but they still have limitations in noisy or low-overlap scenarios. PPFNet combines rotational invariance and position information. FoldingNet converts point clouds into voxel representations and achieves registration by studying the voxel representations and transformation matrices of point clouds. PPF-FoldNet combines the advantages of PPFNet and FoldingNet, achieving more robust and accurate registration by analyzing point-to-point features and voxel representations of the data [1].

End-to-end
End-to-end networks use deep learning to directly learn feature representations and registration transformations, enabling automated and accurate point cloud registration. PointNetLK [4], PRNet [7], and PCRNet [6] are all iterative nearest-neighbor search algorithms based on PointNet, which learn feature representations and perform iterative registration for point clouds. DCP [2] uses an encoder-decoder structure to extract features and perform registration transformations. RPM-Net is a network based on dynamic programming, which learns feature representations and uses dynamic programming algorithms for point cloud registration. These approaches are sensitive to the sampling density and noise of the collected data, and they do not address the adverse effects of non-overlapping regions, which can lead to inaccurate or unstable registration results.

Learning-based point cloud segmentation
Building on PointNet, many algorithms use deep networks to segment point cloud data effectively. Among them, PointNet++ introduces multi-scale feature extraction on top of PointNet [5], enabling hierarchical segmentation that captures local structural information more effectively. KPConv [3] is a segmentation method that uses variable convolutional kernels and the local geometric structure of point clouds, enabling efficient and accurate segmentation, especially for large-scale data.

Method
Our network follows an iterative process whose ultimate goal is to estimate the optimal alignment transformation p and to predict the overlapping region (ox, oy), as shown in Figure 2. In each iteration, our network has three main parts: global feature extraction, overlapping area prediction segmentation, and alignment registration. First, the source is transformed into point cloud Xi using the transformation matrix from the previous iteration. Then, both Xi and Y are input to the feature extraction module.

Feature extraction
We employ the kernel point convolution (KPConv) structure [3], configured with shared parameters for both point clouds, to acquire per-point features f. Specifically, for each input point i, the KPConv layer identifies the k nearest kernel points and their corresponding weight matrices Wk. Based on the distances between input point i and the kernel points, as well as their feature information, weighted values hik are computed. These weights determine the influence of each kernel point. By aggregating the weighted features from the kernel points, the KPConv layer generates refined per-point features. As in convolutional neural networks, multiple layers of kernels can be stacked to capture local features at different scales. In OPSNet, we use four layers. Subsequently, the features f are passed through the prediction filtering step, which uses the previous overlap prediction o^(i-1), and a max-pooling layer to obtain the initial global features of the point clouds, as shown in Figure 3. Then, the transformer structure used in DCP produces a global feature F that integrates the two point clouds [2].
In the equation, Fx^0 corresponds to the initial global feature of the source after the overlap filtering and max-pooling operations. "Max" denotes the max-pooling operation. ox^(i-1) represents the predicted overlap region vector from the previous iteration. fx corresponds to the per-point features, and fxi represents the feature vector of the i-th point. Wk corresponds to the weight matrix, and hik corresponds to the weighted value between point i and kernel point k. Fx represents the final generated global feature. The symbol φ denotes the transformation performed by the transformer module. The operations for the target Y mirror those for the source point cloud.
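The overlap-filtered max pooling described above can be illustrated with a minimal sketch. It assumes that the previous overlap prediction is a binary mask and that masked-out points are simply excluded from the maximum. The function name `masked_max_pool`, the fallback for an empty mask, and the toy feature sizes are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def masked_max_pool(features: Sequence[Sequence[float]],
                    overlap_mask: Sequence[int]) -> List[float]:
    """Max-pool per-point features over the points flagged as overlapping.

    features:     N rows of D-dimensional per-point features (f_x).
    overlap_mask: N binary labels from the previous iteration (o_x^(i-1)).
    Assumption: points with mask 0 are excluded from the maximum.
    """
    if len(features) != len(overlap_mask):
        raise ValueError("features and overlap_mask must have the same length")
    kept = [row for row, keep in zip(features, overlap_mask) if keep]
    if not kept:
        # Assumed fallback: with no predicted overlap, pool over all points.
        kept = list(features)
    # The column-wise maximum yields one D-dimensional global feature.
    return [max(column) for column in zip(*kept)]


# Toy example: 4 points with 3-dimensional features.
f_x = [[0.1, 0.9, 0.3],
       [0.7, 0.2, 0.4],
       [5.0, 5.0, 5.0],   # large response from a non-overlapping point
       [0.6, 0.8, 0.1]]
o_prev = [1, 1, 0, 1]
F_x0 = masked_max_pool(f_x, o_prev)
print(F_x0)  # [0.7, 0.9, 0.4]
```

Without the mask, the third point would dominate every channel of the pooled feature, which is the effect the filtering step is meant to suppress.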

Overlapping prediction
Traditional end-to-end point cloud registration methods typically obtain global features and use MLPs or fully connected layers for alignment without considering the non-overlapping area. Taking inspiration from point cloud segmentation networks such as PointNet [5], we incorporate point cloud segmentation into the registration task. Specifically, we combine the per-point features from the feature extraction stage with the concatenated global features. The per-point features, which have a dimensionality of 1024, are fused with the global features, which have a dimensionality of 2048, resulting in per-point features with a dimensionality of 3072, as illustrated in Figure 4. The resulting feature matrix is then passed through multi-layer perceptrons (MLPs) to obtain the final overlapping segmentation prediction vector o. The MLPs take an N × D matrix as input and output an N × 1 vector that represents the overlap segmentation prediction. In our approach, we set D to 3072, and the MLPs consist of layers with dimensions 3072, 2048, 1024, 512, 256, and 2. In the prediction formula, Ox represents the overlapping prediction vector of the source, and Oy is defined analogously for the target. fx represents the per-point features, and Fx and Fy represent the global features of X and Y, respectively. The symbol M represents the MLP operation, and ⊕ represents concatenation.
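The dimension bookkeeping in this section can be checked with a short sketch. It assumes that the 2048-dimensional global feature is the concatenation of two 1024-dimensional global features (Fx and Fy) and that this vector is repeated for every point. The helper names `fuse_features` and `layer_shapes` are hypothetical, and the learned weights are omitted.

```python
from typing import List, Sequence, Tuple

# Layer widths of the overlap-prediction MLPs as stated in the text.
MLP_DIMS = [3072, 2048, 1024, 512, 256, 2]


def fuse_features(per_point: Sequence[Sequence[float]],
                  global_x: Sequence[float],
                  global_y: Sequence[float]) -> List[List[float]]:
    """Concatenate each per-point feature with both global features."""
    shared = list(global_x) + list(global_y)   # assumed F_x (+) F_y
    return [list(row) + shared for row in per_point]


def layer_shapes(dims: Sequence[int]) -> List[Tuple[int, int]]:
    """Return the (input, output) sizes of consecutive fully connected layers."""
    return list(zip(dims[:-1], dims[1:]))


N = 5                                    # toy number of points
f_x = [[0.0] * 1024 for _ in range(N)]   # per-point features
F_x = [1.0] * 1024                       # global feature of the source
F_y = [2.0] * 1024                       # global feature of the target

fused = fuse_features(f_x, F_x, F_y)
print(len(fused), len(fused[0]))         # 5 3072
print(layer_shapes(MLP_DIMS))
```

The final layer has width 2 while the text describes an N × 1 prediction vector; one plausible reading is that the two outputs are per-point class scores that are reduced to a single overlap label, but the text does not state this explicitly.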

Alignment registration
By concatenating the global features of X and Y, we obtain a feature vector that describes both point clouds. MLPs are then applied to this concatenated feature vector to estimate the seven-dimensional pose vector p for the i-th iteration, which consists of the rotation quaternion q and the translation vector t.
Here, M0 represents the MLP layers for alignment registration in the overlapping region. In this study, the layer dimensions of these MLPs are set to 2048, 1024, 512, 256, and 7. After m iterations, the process converges to the optimal pose transformation p^m.
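To make the seven-dimensional pose vector concrete, the following sketch applies p = (q, t) to a set of points. It assumes the quaternion is ordered (w, x, y, z) and is normalized before use; the text does not specify either convention. The function names are illustrative.

```python
import math
from typing import List, Sequence


def quat_to_matrix(q: Sequence[float]) -> List[List[float]]:
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-norm quaternion")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def apply_pose(points: Sequence[Sequence[float]],
               pose: Sequence[float]) -> List[List[float]]:
    """Apply p = (q, t) to every point: x' = R(q) x + t."""
    if len(pose) != 7:
        raise ValueError("pose must have 7 entries: 4 for q and 3 for t")
    rot = quat_to_matrix(pose[:4])
    t = pose[4:7]
    return [
        [sum(rot[r][c] * pt[c] for c in range(3)) + t[r] for r in range(3)]
        for pt in points
    ]


# A 90-degree rotation about the z-axis followed by a unit shift along z.
half = math.radians(90.0) / 2.0
pose = [math.cos(half), 0.0, 0.0, math.sin(half), 0.0, 0.0, 1.0]
moved = apply_pose([[1.0, 0.0, 0.0]], pose)
print([round(v, 6) for v in moved[0]])   # approximately [0, 1, 1]
```

In an iterative scheme of the kind described in the Method section, the transformed source would be fed back into feature extraction for the next iteration.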

Loss function
Our network performs two tasks, overlapping region prediction segmentation and point cloud alignment registration, and we train it with one loss function for each. Overlapping region prediction segmentation loss: We follow common binary segmentation networks and account for the balance between positive and negative samples [3][5]. We adopt a cross-entropy loss in which α represents the overlap ratio, og denotes the ground-truth label (1 if the corresponding point exists in the other point cloud, 0 otherwise), and op is the predicted probability, obtained through a sigmoid function so that its value falls between 0 and 1.
Transformation matrix loss: The transformation prediction module is trained directly so that the predicted pose is close to the ground truth. In the transformation matrix loss, g denotes the ground-truth value, and the third term is a regularization term on the parameters of OPSNet that reduces network complexity. The overall loss function is L = L1 + L2. The loss is computed in each iteration until convergence is reached.
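One common way to let the overlap ratio balance positive and negative samples is a class-weighted binary cross-entropy. The sketch below weights overlapping points by (1 - α) and non-overlapping points by α, with α taken as the fraction of positive labels. This weighting is an assumption chosen for illustration; the exact form of the paper's loss cannot be recovered from the text.

```python
import math
from typing import Sequence


def overlap_loss(pred_probs: Sequence[float],
                 labels: Sequence[int],
                 eps: float = 1e-7) -> float:
    """Class-weighted binary cross-entropy for overlap prediction.

    pred_probs: sigmoid outputs o_p in (0, 1), one per point.
    labels:     ground-truth overlap labels o_g (1 = overlapping).
    Assumed weighting: positives by (1 - alpha), negatives by alpha,
    where alpha is the overlap ratio of the ground-truth labels.
    """
    if len(pred_probs) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels)
    alpha = sum(labels) / n
    total = 0.0
    for p, g in zip(pred_probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        total -= (1.0 - alpha) * g * math.log(p)
        total -= alpha * (1 - g) * math.log(1.0 - p)
    return total / n


labels = [1, 1, 1, 0]
good = overlap_loss([0.9, 0.8, 0.95, 0.1], labels)
bad = overlap_loss([0.2, 0.3, 0.1, 0.9], labels)
print(good < bad)   # True
```

With this weighting, the rarer class receives the larger weight, so a cloud pair with little overlap does not let the negative class dominate the gradient.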
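A loss with two pose terms and a third regularization term can be sketched as follows. It assumes squared errors on the quaternion and translation parts of the pose vector and an L2 penalty on the network parameters scaled by a hypothetical `weight_decay` factor. Neither the norm nor the weighting is given in the text, so this is an illustration rather than the paper's definition.

```python
from typing import Sequence


def transformation_loss(pose_pred: Sequence[float],
                        pose_true: Sequence[float],
                        params: Sequence[float],
                        weight_decay: float = 1e-4) -> float:
    """Assumed form: ||q - q_g||^2 + ||t - t_g||^2 + lambda * ||theta||^2."""
    if len(pose_pred) != 7 or len(pose_true) != 7:
        raise ValueError("poses must have 7 entries: 4 for q and 3 for t")
    rot_term = sum((a - b) ** 2 for a, b in zip(pose_pred[:4], pose_true[:4]))
    trans_term = sum((a - b) ** 2 for a, b in zip(pose_pred[4:], pose_true[4:]))
    reg_term = weight_decay * sum(w * w for w in params)
    return rot_term + trans_term + reg_term


def total_loss(l1: float, l2: float) -> float:
    """Overall loss L = L1 + L2, as stated in the text."""
    return l1 + l2


pred = [1.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0]
true = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
l2 = transformation_loss(pred, true, params=[1.0, 2.0], weight_decay=0.1)
print(round(l2, 6))   # 0.51
```

Note that q and -q describe the same rotation, so a direct squared difference on quaternions is only meaningful if the sign convention of the ground truth is fixed; how the paper handles this is not stated.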

Experiments
We compare OPSNet (ours) with ICP, PointNetLK, PCRNet, and DCP [2][4][6]. We also include OPSNet without the overlapping region prediction module, which we refer to as OPSNET-N. For ICP, we use the implementation in the Intel Open3D library. For PointNetLK, PCRNet, and DCP, we use the code provided by Vinit Sarode in the learning3d library. OPSNet is trained using the Adam optimizer with a batch size of 64 samples.

Dataset and Evaluation
We use ModelNet40, which contains more than 10,000 synthetic CAD models from 40 different categories. The data are preprocessed by applying random rotations and translations and by adding Gaussian noise sampled from N(0, 0.01²) and clipped to [-0.05, 0.05]. Finally, the points are shuffled. We partition the dataset by category: the first half of the categories is used for training, and the remaining categories are used exclusively for testing. To evaluate performance, we use the root mean square error (RMSE) and the mean absolute error (MAE) as metrics for the accuracy of rotation angles and translations. All angles in our reported results are given in degrees.
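The noise model and the two error metrics can be written down in a few lines. The sketch below assumes that the noise is added independently to each coordinate and that RMSE and MAE are computed over flat lists of predicted and ground-truth values (for example, per-axis rotation angles in degrees). The function names are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence


def jitter(points: Sequence[Sequence[float]],
           sigma: float = 0.01,
           clip: float = 0.05,
           rng: Optional[random.Random] = None) -> List[List[float]]:
    """Add clipped Gaussian noise N(0, sigma^2) to every coordinate."""
    rng = rng or random.Random()
    return [
        [c + max(-clip, min(clip, rng.gauss(0.0, sigma))) for c in pt]
        for pt in points
    ]


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


cloud = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
noisy = jitter(cloud, rng=random.Random(0))   # seeded for reproducibility

angles_pred = [1.0, 2.0, 3.0]   # toy values, in degrees
angles_true = [1.0, 2.0, 5.0]
print(rmse(angles_pred, angles_true), mae(angles_pred, angles_true))
```

RMSE penalizes the single 2-degree error more strongly than MAE does, which is why reporting both gives a fuller picture of the error distribution.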

Comparative Experiments
In our first experiment, we set the overlap rate to normal (over 95%) and trained and tested our model on the preprocessed ModelNet40 data. Table 1 shows the experimental results. Under normal overlap conditions, our method performs slightly better than DCP and significantly better than PCRNet and PointNetLK, while ICP performs the worst. Additionally, because the data contain Gaussian noise, OPSNet's overlap filtering can remove some of the noise, resulting in better performance than OPSNET-N, which lacks the overlap filtering module. Furthermore, we conducted experiments under different overlap ratios. We used the same preprocessed ModelNet40 dataset and followed the data preparation approach of PRNet to generate point cloud pairs with varying overlap ratios [7], with the overlapping regions labeled. The other settings remained mostly unchanged. As shown in Figure 5, our method consistently outperformed the compared algorithms across different overlap ratios. Moreover, our method also outperformed OPSNET-N, the variant without the overlapping prediction module. We present the point cloud registration results for overlap ratios of 50% and 70% in Figure 6.
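Generating partially overlapping pairs with overlap labels can be sketched as follows. The sketch assumes that both clouds are cropped from the same set of points by keeping the points nearest to a different anchor for each cloud, and that a point is labeled as overlapping when it survives in both crops. The anchors, the crop size, and the function names are illustrative assumptions; the exact procedure used in the experiments is not described in the text.

```python
from typing import List, Sequence, Tuple


def crop_nearest(points: Sequence[Sequence[float]],
                 anchor: Sequence[float],
                 keep: int) -> List[int]:
    """Return the sorted indices of the `keep` points closest to `anchor`."""
    def sq_dist(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(points[i], anchor))
    order = sorted(range(len(points)), key=sq_dist)
    return sorted(order[:keep])


def overlap_labels(idx_x: Sequence[int],
                   idx_y: Sequence[int]) -> Tuple[List[int], List[int], float]:
    """Label each kept point 1 if it is present in both crops, else 0."""
    shared = set(idx_x) & set(idx_y)
    labels_x = [int(i in shared) for i in idx_x]
    labels_y = [int(i in shared) for i in idx_y]
    ratio = len(shared) / len(idx_x) if idx_x else 0.0
    return labels_x, labels_y, ratio


# Toy cloud: 10 points on a line; crop 7 points from each end.
cloud = [[float(i), 0.0, 0.0] for i in range(10)]
idx_x = crop_nearest(cloud, anchor=(-1.0, 0.0, 0.0), keep=7)   # indices 0..6
idx_y = crop_nearest(cloud, anchor=(10.0, 0.0, 0.0), keep=7)   # indices 3..9
labels_x, labels_y, ratio = overlap_labels(idx_x, idx_y)
print(labels_x, labels_y, round(ratio, 3))
```

Moving the two anchors closer together or farther apart changes the number of shared points, which is one simple way to sweep the overlap ratio.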

Conclusion
In summary, we proposed a registration model called OPSNet, which incorporates overlapping region segmentation prediction. By using a segmentation algorithm to filter out the non-overlapping parts of the point clouds, we reduced their negative impact on the registration process. In our experiments, we compared OPSNet with several common methods, including ICP, PointNetLK, PCRNet, and DCP. The experimental results showed that OPSNet outperformed the other methods. Furthermore, we evaluated registration performance under different overlap ratios and found that OPSNet achieved favorable results even in low-overlap scenarios.

Figure 1. We propose OPSNet based on overlapped region prediction segmentation. (a) The blue region represents the source, (b) the green region represents the target, (c) shows the registration result, and (d) depicts the overlapped region prediction segmentation of the data (highlighted in red).

Figure 2. The framework of OPSNet. In the figure, i represents the iteration number, f represents per-point features, F represents global features, o represents the overlapping prediction vector, and p represents the transformation estimation.

Figure 3. Initial Global Feature Generation Module: In this module, we use four layers of kernels to successively extract finer-grained per-point features. When the features pass through the overlapping prediction matrix (N × 1), the non-overlapping region (white matrix region) is effectively masked out in the global features by the combined effect of the overlap filtering vector and the max-pooling operation.

Figure 4. The Combination and Fusion Process of Global and Local Point Cloud Features.

Figure 5. Point cloud registration performance under different overlap ratios. Rotation performance is shown on the left, and translation performance is shown on the right.

Table 1. Testing for normal overlap.