Quadratic hyper-surface kernel-free large margin distribution machine-based regression and its least-square form

ε-Support vector regression (ε-SVR) is a powerful machine learning approach that focuses on minimizing the margin, which represents the tolerance range between predicted and actual values. However, recent theoretical studies have highlighted that simply minimizing structural risk does not necessarily result in a good margin distribution. Instead, it has been shown that the distribution of margins plays a more crucial role in achieving better generalization performance. Furthermore, the kernel-free technique offers a significant advantage: compared with the kernel trick, it reduces the overall running time and simplifies the parameter selection process. Building on existing kernel-free regression methods, we present two efficient and robust approaches named quadratic hyper-surface kernel-free large margin distribution machine-based regression (QLDMR) and quadratic hyper-surface kernel-free least squares large margin distribution machine-based regression (QLSLDMR). QLDMR optimizes the margin distribution by considering both the ε-insensitive loss and the quadratic loss, similar to the large-margin distribution machine-based regression (LDMR). QLSLDMR reduces the computational cost of QLDMR by transforming the inequality constraints into equality constraints, inspired by least squares support vector regression (LSSVR). Both models combine the spirit of optimal margin distribution with the kernel-free technique and, after simplification, are convex, so they can be solved by classical methods. Experimental results demonstrate the superiority of the optimal margin distribution combined with the kernel-free technique in terms of robustness, generalization, and efficiency.


Introduction
The regression problem aims to predict numeric outputs on which an order relation is defined [1,2]. ϵ-SVR is a regression method whose fundamental principle [3] is to minimize the margin by adjusting the hyperplane while allowing a certain level of tolerated error. Currently, ϵ-SVR is widely used in several engineering and technology fields, such as estimation of daily suspended sediment [4,5], prediction of water temperature [6], estimation of lithium-ion battery state of health [7], stock market forecasting [8,9], and prediction of traffic flow [10].
LSSVR is a variant of ϵ-SVR in which Suykens and Vandewalle transformed the inequality constraints into equality constraints and utilized the L2 norm of the slack variables [11]. This change brings several advantages and has led to its rapid adoption in various areas. For instance, Van Gestel et al applied LSSVR to discriminant analysis [12] due to its high prediction accuracy on small-scale datasets, and Guo et al implemented it in reliability analysis [13] owing to its higher computational efficiency compared to multiple linear regression and artificial neural networks [14].
The kernel trick provides a way to approximate nonlinear surfaces in higher-dimensional spaces [15]. However, the kernel trick has certain limitations. Firstly, there are no general rules for automatically selecting the most appropriate kernel for a given dataset [16]. Additionally, the performance of models utilizing the kernel trick depends heavily on the selection of kernel parameters [17]. In practice, parameters are often chosen by intuition to produce the smallest cross-validation error rate [18]. Furthermore, when the kernel matrix becomes singular, certain methods that employ the kernel trick require computing the inverse of a perturbed kernel matrix [19] or decomposing the kernel matrix to solve their dual problems [20]. These processes often require additional computational effort and yield approximate solutions. Ye et al [21] introduced QLSSVR as a variant of traditional LSSVR, utilizing a soft quadratic surface as the decision function to fit data nonlinearly without relying on kernel methods. Subsequently, Ye et al [22] proposed SQSVR, an application of the kernel-free technique to ϵ-SVR. Experimental validation has indicated that both SQSVR and QLSSVR can be regarded as alternatives to the kernel trick [21,22], exhibiting superior performance in most scenarios.
Margin theory served as the foundation for support vector machines and was used to prove their generalization capability [23]. Yoav Freund et al proposed an ensemble algorithm called AdaBoost, which used margin theory to explain its resistance to overfitting [24]. Shortly afterward, Breiman emphasized the significance of the minimum margin and introduced the improved Arc-gv algorithm [25], which aimed to maximize the minimum margin but did not perform as expected. Reyzin and Schapire observed that the Arc-gv algorithm produced a large minimum margin, but its margin distribution was quite poor [26]. Because these theoretical studies focused only on boosting algorithms, the potential impact of the margin distribution on support vector machines had not been thoroughly leveraged. Zhang and Zhou then proposed the large margin distribution machine (LDM) [27]. The idea of optimizing the distribution of margins rather than maximizing the minimum margin has since been used in a variety of methods, such as large-margin distribution machine-based regression (LDMR) [28], least squares large margin distribution machine-based regression (LSLDMR) [29], and the optimal margin distribution machine [30]. In the aforementioned methods, the inclusion of statistical measures introduces additional parameters. Furthermore, the choice of kernel parameters in kernel methods adds a computational burden. Notably, Luo et al [31] and Zhou et al [32] have also investigated the combination of kernel-free techniques with optimal margin distribution, reporting favorable performance.
Inspired by the above studies, we propose two methods with optimal margin distribution based on the kernel-free technique, named QLDMR and QLSLDMR. Since the optimization problem is derived from the LDM, QLDMR takes the margin distribution into account rather than just minimizing the geometric margin. Thus, it exhibits superior generalization ability compared to SQSVR, while also demonstrating reduced sensitivity to outliers. As with SQSVR, the solution process of QLDMR applies Lagrange multipliers to transform the primal problem and solves a convex quadratic optimization problem to obtain the regressor. Additionally, the solution of QLSLDMR is conveniently obtained by solving a set of linear equations, in contrast to the convex quadratic programming problems inherent in both SQSVR and QLDMR. Moreover, our methods exhibit significant advantages in computational time and resource utilization when dealing with large-scale datasets compared to standard ϵ-SVR or LSSVR. Besides, to further verify the improvements of integrating the margin distribution with the kernel-free technique in terms of generalization, robustness, and efficiency, extensive numerical experiments are carried out on both artificial and benchmark datasets.
The remainder of this paper is organized into four parts. A review of the background on a series of regression models is provided in section 2. Section 3 presents the proposed methods and the simplified forms necessary for their solution. Experimental results on artificial and benchmark datasets are presented in section 4, and the conclusions are summarized in the final section of the paper.

Background
The regression problem entails identifying an appropriate function to predict the target values corresponding to input samples. Assume that there are training samples S = (X, Y), where X = {x_1, x_2, ..., x_n} with x_i ∈ R^m collects the input samples and Y = {y_1, y_2, ..., y_n} with y_i ∈ R collects the corresponding target values.

ϵ-Support vector regression
The optimization problem of ϵ-SVR is to obtain a regressor f(x) = w^T x + b that closely approximates the true target y_i for all training data with a maximum deviation of ϵ, while remaining as flat as possible [3]. The 'soft margin' loss function in ϵ-SVR permits a certain level of error by introducing slack variables ξ_i^+, ξ_i^- to handle samples that do not lie in the ϵ-tube [3]. The role of the penalty parameter C is to control the trade-off between model complexity and the degree of fitting. Thus, the formulation stated by Vapnik [2] is obtained as

min_{w, b, ξ^+, ξ^-}   (1/2)||w||_2^2 + C Σ_{i=1}^{n} (ξ_i^+ + ξ_i^-)
s.t.   y_i − w^T x_i − b ≤ ϵ + ξ_i^+,
       w^T x_i + b − y_i ≤ ϵ + ξ_i^-,
       ξ_i^+, ξ_i^- ≥ 0,   i = 1, 2, ..., n.        (1)

Vapnik addressed the optimization problem in ϵ-SVR by employing the Lagrange function [33], which integrates the objective function and the constraints through the introduction of Lagrange multipliers α_i, α_i^*, β_i, β_i^*. This approach allows us to find the saddle point of the Lagrange function, which corresponds to the optimal solution for both the primal and dual variables [34]. By applying the Karush-Kuhn-Tucker (KKT) conditions, optimization problem (1) can be expressed in the following dual form:

max_{α, α^*}   −(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} (α_i − α_i^*)(α_j − α_j^*)⟨x_i, x_j⟩ − ϵ Σ_{i=1}^{n} (α_i + α_i^*) + Σ_{i=1}^{n} y_i (α_i − α_i^*)
s.t.   Σ_{i=1}^{n} (α_i − α_i^*) = 0,
       0 ≤ α_i, α_i^* ≤ C,   i = 1, 2, ..., n,        (2)

where ⟨·, ·⟩ denotes the dot product in X.
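For concreteness, a minimal sketch of primal problem (1) in Python with the cvxpy modelling library follows. This is only an illustration (the experiments in section 4 were run in MATLAB), and the function name and default parameter values are ours.

# A minimal sketch of the eps-SVR primal (1) using cvxpy; an illustration,
# not the MATLAB implementation used in the experiments.
import numpy as np
import cvxpy as cp

def eps_svr_primal(X, y, C=1.0, eps=0.1):
    """Solve the linear eps-SVR primal (1) and return (w, b)."""
    n, m = X.shape
    w = cp.Variable(m)
    b = cp.Variable()
    xi_p = cp.Variable(n, nonneg=True)   # slack above the eps-tube
    xi_m = cp.Variable(n, nonneg=True)   # slack below the eps-tube
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi_p + xi_m))
    constraints = [y - X @ w - b <= eps + xi_p,
                   X @ w + b - y <= eps + xi_m]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value

# usage: w, b = eps_svr_primal(X_train, y_train, C=10, eps=0.01)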

Least square support vector regression
Compared to ϵ-SVR, LSSVR replaces the inequality constraints with equality constraints and uses the L2 loss function [12]. In order to obtain a linear regressor f(x) = w^T x + b, LSSVR solves the following optimization problem:

min_{w, b, ξ}   (1/2)||w||_2^2 + (C/2) Σ_{i=1}^{n} ξ_i^2
s.t.   y_i = w^T x_i + b + ξ_i,   i = 1, 2, ..., n.        (3)

After introducing the Lagrangian multipliers α_i and eliminating w and ξ through the optimality conditions, primal problem (3) reduces to the linear system

[ 0    e^T       ] [ b ]     [ 0 ]
[ e    Ω + I/C   ] [ α ]  =  [ Y ],        (4)

where e = (1, ..., 1)^T, Ω_{ij} = ⟨x_i, x_j⟩, and I is the n × n identity matrix. Because the solution procedure of LSSVR converts from solving a convex quadratic program to solving the linear equations (4), LSSVR has a faster computational time than ϵ-SVR [13].
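As a sketch, assuming the standard linear-kernel system (4) (the helper names are ours, not from [11]), the model can be fitted with a single linear solve:

# Linear LSSVR via the KKT system (4); a standard construction, not the
# authors' MATLAB code.
import numpy as np

def lssvr_fit(X, y, C=1.0):
    """Return (alpha, b) by solving the (n+1) x (n+1) linear system (4)."""
    n = X.shape[0]
    Omega = X @ X.T                       # linear kernel: Omega_ij = <x_i, x_j>
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                # (alpha, b)

def lssvr_predict(X_train, alpha, b, X_test):
    return X_test @ X_train.T @ alpha + b   # f(x) = sum_i alpha_i <x_i, x> + b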

ϵ-Kernel-free soft quadratic surface support vector regression
The regressor of ϵ-kernel-free soft quadratic surface support vector regression (SQSVR) is a soft quadratic surface g(x) = (1/2)x^T W x + B^T x + c, where W ∈ R^{m×m} is symmetric, B ∈ R^m, and c ∈ R. The target value y_i corresponding to the input sample x_i is inferred by using g(x). The optimization problem can be shown as

min_{W, B, c, ξ^+, ξ^-}   (1/2) Σ_{i=1}^{n} ||W x_i + B||_2^2 + C Σ_{i=1}^{n} (ξ_i^+ + ξ_i^-)
s.t.   y_i − (1/2)x_i^T W x_i − B^T x_i − c ≤ ϵ + ξ_i^+,
       (1/2)x_i^T W x_i + B^T x_i + c − y_i ≤ ϵ + ξ_i^-,
       ξ_i^+, ξ_i^- ≥ 0,   W = W^T,   i = 1, 2, ..., n.        (5)

To simplify objective function (5) and make it more manageable, the following constructions are used [22]:
(1) Let ŵ = vec(W) = [w_11, w_12, ..., w_1m, w_22, ..., w_mm]^T ∈ R^{(m^2+m)/2}, the column vector formed by the upper-triangular elements of W.
(2) Construct matrices M_i ∈ R^{m×((m^2+m)/2)}, i = 1, 2, ..., n, filled with the components x_i^k of x_i (where x_i^k refers to the element in the kth dimension of x_i) so that W x_i = M_i ŵ. With z = [ŵ^T, B^T, c]^T, the first term in the objective function of optimization problem (5) can be transformed into a quadratic form z^T(·)z. Ultimately, optimization problem (5) is reformulated in terms of z and solved through the convex quadratic programming problem (7), in which the quadratic terms are collected in a matrix G built from the M_i and x_i.
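A small numerical sketch of this vectorization follows. The row-major ordering of the upper-triangular entries in ŵ is our assumption (reference [22] fixes its own ordering), and the check simply verifies the identity W x_i = M_i ŵ.

# Kernel-free vectorization: build w_hat = vec(W) (upper triangle) and M_i
# with W x_i = M_i w_hat. The ordering of the upper-triangular entries is an
# assumption made for this sketch.
import numpy as np

def upper_tri_indices(m):
    return [(p, q) for p in range(m) for q in range(m) if p <= q]

def vec_upper(W):
    return np.array([W[p, q] for p, q in upper_tri_indices(W.shape[0])])

def build_M(x):
    """M @ vec_upper(W) == W @ x for any symmetric W."""
    m = x.shape[0]
    idx = upper_tri_indices(m)
    M = np.zeros((m, len(idx)))
    for col, (p, q) in enumerate(idx):
        M[p, col] += x[q]          # w_pq contributes x_q to row p
        if p != q:
            M[q, col] += x[p]      # and x_p to row q (symmetry of W)
    return M

# quick check of the identity
m = 4
A = np.random.randn(m, m); W = (A + A.T) / 2       # random symmetric W
x = np.random.randn(m)
assert np.allclose(build_M(x) @ vec_upper(W), W @ x)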

Quadratic hyper-surface kernel-free least squares support vector regression
Quadratic hyper-surface kernel-free least squares support vector regression (QLSSVR) is formulated as a quadratic programming problem whose constraints contain only equalities [21]. Meanwhile, the application of the kernel-free technique eliminates the need to select the most suitable kernel function. QLSSVR can be presented as the least squares counterpart of the quadratic-surface model: the ϵ-insensitive inequality constraints are replaced by the equality constraints y_i = (1/2)x_i^T W x_i + B^T x_i + c + ξ_i, i = 1, 2, ..., n, and the slack variables enter the objective through the L2 loss, which gives optimization problem (8). The simplification of optimization problem (8) [21] proceeds as follows, slightly differently from that of SQSVR.
(1) Construct ŵ = vec(W) in the same way as mentioned in section 2.3.
(2) Build M_i using the same method described in section 2.3.

Large-margin distribution machine-based regression
The LDMR model is derived from the LDM based on the insight of Bi and Bennett [35], who proposed treating the regression problem as the classification of sets located on different sides of a hyperplane. Consequently, it inherits from the LDM the spirit of obtaining a better margin distribution by considering margin statistics [27]. Moreover, by effectively balancing the ϵ-insensitive loss and the L2 loss through the parameters d_1, d_2, and C, the model achieves improved generalization ability; the resulting primal problem, denoted (11), is detailed in [28]. To make it easier to derive the solution of the primal problem, let v = [w^T, b]^T. Then ||w||_2^2 can be rewritten as v^T I_0 v, where I_0 = diag(I, 0) ∈ R^{(m+1)×(m+1)} and I is the identity matrix of size m × m.
Optimization problem (11) can be converted accordingly. Introducing the Lagrange multiplier vectors α, α^*, the dual problem, a convex quadratic program in α and α^*, is obtained; its explicit form is given in [28].

Least squares large margin distribution machine-based regression
The main disadvantage of LDMR is its high computational complexity, which stems from the solution process and the introduction of new parameters. Inspired by the least squares method, LSLDMR changes the inequality constraints to equality constraints to reduce the computational burden. The formulation is detailed in [29].
The process of simplification is the same as for LDMR: defining v = [w^T, b]^T and rewriting the problem accordingly, then applying the Lagrange multipliers, the solution for v is found by solving a system of linear equations, whose explicit expression is given in [29].

Quadratic hyper-surface kernel-free large margin distribution machine-based regression and its least-square form

Quadratic hyper-surface kernel-free large margin distribution machine-based regression
Incorporating the idea of optimal margin distribution into SQSVR, we propose a novel QLDMR approach.
During the solving process of SQSVR, the construction of G differs from H in QLSSVR because the inverse of G must be computed in the convex quadratic programming problem (7). In contrast, our proposed method avoids solving for G^{-1}. Therefore, we prefer the transformation process of QLSSVR over that of SQSVR when formulating the optimization problem. The advantage is that the W, B, and c of the regressor g(x) can be obtained directly through z. The resulting optimization problem, denoted (17), combines the three terms described below, where d_1, d_2, C, and ϵ are user-defined positive parameters.
The optimization problem of the QLDMR model involves minimizing three terms. The first term, (1/2)z^T H z, is the transformed form of the flatness term; its objective is to make the regressor flat enough to prevent overfitting. The second term, ||Y − U^T z||_2^2, represents the total variance of the deviations; its objective is to reduce the dispersion of the training samples around the regressor. The last term penalizes samples that fall outside the ϵ-tube, which yields better sparsity. SQSVR, which serves as a generalization of the kernel-free technique for ϵ-SVR, treats only the training samples located on or outside the ϵ-tube as support vectors. This is the same as ϵ-SVR and ignores the overall information of the data. In QLSSVR, the final regressor is constructed with the involvement of all training samples; however, that method may be affected by noise or outliers, leading to poor performance. In contrast, our model trades off between the ϵ-insensitive loss and the quadratic loss by adjusting d_1, d_2, and C to obtain better generalization ability.
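For illustration, the three terms can be evaluated for any candidate z as in the sketch below. The weights d_1, d_2, and C attached to each term follow (17), so the weighting is deliberately left out and the function only returns the raw terms; H and U are assumed to be built as in section 2.4.

# Illustrative evaluation of the three QLDMR terms for a candidate z; the
# trade-off weights of (17) are not applied here.
import numpy as np

def qldmr_terms(z, H, U, Y, eps):
    """Return (flatness, total_variance, eps_insensitive) for a candidate z."""
    residual = Y - U.T @ z                       # deviations y_i - g(x_i)
    flatness = 0.5 * z @ H @ z                   # 1/2 z^T H z
    total_variance = np.sum(residual ** 2)       # ||Y - U^T z||_2^2
    eps_loss = np.sum(np.maximum(0.0, np.abs(residual) - eps))  # outside eps-tube
    return flatness, total_variance, eps_loss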
Our model can still be derived from the LDM using the conclusions of Bi et al [28,35]. The LDM optimizes the distribution of margins rather than focusing on the individual minimum margin [27]. This characteristic is reflected in QLDMR: fewer data points exhibit a large deviation from the regressor and more points exhibit a small deviation, resulting in a more compact distribution.
To address the primal optimization problem (17), it is necessary to construct its related dual problem. This involves introducing the Lagrangian multiplier vectors α, α^*, β, β^* ∈ R^n and forming the Lagrange function (18). Applying the Karush-Kuhn-Tucker (KKT) optimality conditions, among which (19a) expresses z in terms of the multipliers, yields the dual problem (20), a convex quadratic program in α and α^*. Once the optimal values of α and α^* are determined, z can be calculated from (19a). Then, the estimated regressor is g(x) = (1/2)x^T W^* x + B^{*T} x + c^*.

Quadratic hyper-surface kernel-free least squares large margin distribution machine-based regression
Considering the drawback of QLDMR in terms of high computational time, we introduce its least squares form, referred to as QLSLDMR. This method substitutes the L2 norm of the slack variable for the L1 norm and uses an equality constraint in the optimization problem; the formulation of QLSLDMR is denoted (21). Using the Lagrangian multiplier vector α ∈ R^n, the Lagrangian function (22) of (21) is constructed. Taking the partial derivatives of the Lagrange function (22) with respect to the variables z, η, α and setting them to zero yields conditions (23a)-(23c), and combining these formulas gives the value of z. The solution process of QLSLDMR therefore involves solving a linear system instead of a convex quadratic program, which significantly reduces the computation time. Subsequently, the estimated regressor is stated as g(x) = (1/2)x^T W^* x + B^{*T} x + c^*.
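To see why the least squares form leads to a linear system, consider a stripped-down sketch that keeps only the quadratic pieces of (21), min_z (1/2)z^T H z + (d/2)||Y − U^T z||_2^2; setting the gradient to zero gives (H + d U U^T) z = d U Y. This simplification is for intuition only (the actual system obtained from (23a)-(23c) carries the remaining terms of (21)), but it already shows the computational pattern:

# Quadratic-only sketch: a single linear solve replaces iterative QP.
import numpy as np

def solve_quadratic_part(H, U, Y, d=1.0):
    """Solve (H + d * U U^T) z = d * U Y, the quadratic-only sketch of (21)."""
    A = H + d * (U @ U.T)
    rhs = d * (U @ Y)
    return np.linalg.solve(A, rhs)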

Experimental results
We assess the validity and superiority of the proposed QLDMR and QLSLDMR through comparative experiments on artificial datasets and 14 benchmark datasets in this section. The models involved in the experiments also include ϵ-SVR [2,3], LSSVR [11], SQSVR [22], and QLSSVR [21]. The experiments are conducted using MATLAB 9.10 on a Windows 10 laptop equipped with an Intel i7 processor (2.60 GHz) and 16 GB of RAM.
The methods proposed in this paper involve multiple parameters, and the selection of these parameters significantly affects the performance of the models. Therefore, it is crucial to carefully choose appropriate parameters to achieve the optimal model performance. We utilize the grid search method [36] to find the parameters that yield the best performance on the test set. For ϵ-SVR, LSSVR, SQSVR, QLSSVR, QLDMR, and QLSLDMR, the regularization parameters C and d_1 are selected from {10^-5, 10^-4, ..., 10^5}. The value of ϵ is chosen from {0.001, 0.01, 0.1} for QLSLDMR; for ϵ-SVR, SQSVR, and QLDMR the value of ϵ is selected from {0.001, 0.05, 0.01, 0.1, 0.5, 1, 1.5, 2}. The value of d_2 is set to 1. The evaluation criteria are defined as follows, where y_i denotes a test sample, ỹ_i the corresponding predicted value, ȳ the mean of the test samples, and n_t the number of test samples:
i. RMSE = sqrt( (1/n_t) Σ_i (y_i − ỹ_i)^2 ), the root mean squared error;
ii. MAE = (1/n_t) Σ_i |y_i − ỹ_i|, the mean absolute error;
iii. SSE/SST = Σ_i (y_i − ỹ_i)^2 / Σ_i (y_i − ȳ)^2, where SST represents the total variability of the observed values around the mean of the observed data [37]; typically, a low SSE/SST ratio indicates a strong agreement between predicted and actual values;
iv. SSR/SST = Σ_i (ỹ_i − ȳ)^2 / Σ_i (y_i − ȳ)^2, which measures the amount of variation in the dependent variable that is explained by the regression model [37]; generally, a small SSE/SST is accompanied by a large SSR/SST.
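For reference, the four criteria can be computed as follows; the standard definitions above are assumed and the helper is ours, not part of the paper's MATLAB code.

# Evaluation criteria used in the experiments (standard definitions assumed).
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    y_bar = y_true.mean()
    sse = np.sum((y_true - y_pred) ** 2)   # residual sum of squares
    ssr = np.sum((y_pred - y_bar) ** 2)    # explained sum of squares
    sst = np.sum((y_true - y_bar) ** 2)    # total sum of squares
    return {"RMSE": np.sqrt(sse / len(y_true)),
            "MAE": np.mean(np.abs(y_true - y_pred)),
            "SSE/SST": sse / sst,
            "SSR/SST": ssr / sst}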

Artificial datasets
To visually assess the performance of these regression methods, we generate artificial datasets by evaluating a fixed nonlinear target function on the inputs and adding noise θ_i, where θ_i represents Gaussian noise with mean µ = 0 and variance σ = 0.1. Sample sizes n ∈ {200, 400, 600, 800, 1000} and input dimensions m ∈ {1, 2, 4, 8, 16, 24, 32} are combined to comprise 35 independent artificial datasets.
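The exact generating function is not reproduced here, so the following sketch substitutes a hypothetical quadratic target purely to illustrate how one such dataset is assembled; the function make_artificial and its target are our own placeholders, not the paper's.

# Hypothetical example only: a simple quadratic surface stands in for the
# paper's generating function.
import numpy as np

def make_artificial(n, m, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, m))
    # hypothetical target: a quadratic surface, in the spirit of the fitted models
    y_clean = 0.5 * np.sum(X ** 2, axis=1) + np.sum(X, axis=1) + 1.0
    theta = rng.normal(0.0, np.sqrt(sigma), size=n)   # Gaussian noise, variance 0.1
    return X, y_clean + theta

# e.g. one of the 35 datasets: X, y = make_artificial(n=200, m=2)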

Comparing the predictive accuracy
In order to compare the improvement that the optimal margin distribution brings to the kernel-free regression methods, we initially apply them to three-dimensional datasets with n = 200 and m = 2, generated using the previously defined function. To avoid biased estimation, the mean value over ten runs is adopted for each evaluation criterion. Specifically, we randomly partition each dataset into two parts, allocating 80% of the samples to the training set, while the remaining samples form the test set. Table 1 clearly demonstrates that incorporating the optimal margin distribution notably enhances the regression accuracy of both SQSVR and QLSSVR. This improvement is manifested in the substantial reduction in RMSE and the considerable increase in SSR/SST for both QLDMR and QLSLDMR, suggesting a decrease in prediction error while capturing more information from the datasets. These results support the assertion that the optimal margin distribution effectively modifies the decision boundary, rendering it more flexible and adaptable to complex data patterns. Consequently, this leads to a more accurate prediction model capable of better capturing the underlying data trends and relationships.
Additionally, the data obtained by the above methods are plotted in three-dimensional coordinates to illustrate that the kernel-free methods can fit high-dimensional datasets. Figure 1 clearly illustrates the capability of methods utilizing the kernel-free technique to fit the regression function in a non-linear manner, eliminating the need for the kernel trick. This is a significant advantage, as it simplifies the computational process and reduces the potential for overfitting, a common issue with methods that rely on the kernel trick [38]. Moreover, the kernel-free methods have been shown to be more versatile, as they are not constrained by the choice of kernel function and can adapt more readily to various types of data patterns. These attributes highlight the potential of kernel-free techniques in enhancing the accuracy and efficiency of non-linear regression models.

Exploring the effect of dataset scale
The previous experiments show that the optimization problems of SQSVR and ϵ-SVR, as well as their least-squares variants, are not of the same scale, which inevitably leads to a difference in computational cost. The most intuitive manifestation of this is the varying degree of change in running time as the size of the dataset changes. In order to compare the CPU running times on datasets of different sizes, we use artificial datasets of different sizes and dimensions, where n ∈ {200, 400, 600, 800, 1000} and m ∈ {4, 8, 16, 24, 32}. Ten independent groups of datasets are generated by randomly selecting 80% of the original dataset as the training set and the rest as the test set. The average value over the ten experiments is used as the final evaluation criterion to avoid biased estimation.
Table 2 shows the total CPU running times of the different methods for different sample sizes n and dimensions m. Among ϵ-SVR, SQSVR, and QLDMR, the minimum time is marked with '*' as a superscript. Similarly, the minimum value among LSSVR, QLSSVR, and QLSLDMR is emphasized with '+'. Among the methods that solve a convex quadratic programming problem through the dual, various factors influence the time complexity [39], including sparsity, problem structure, convergence behavior, and other relevant aspects. The experimental results indicate that SQSVR and QLDMR have faster CPU running times than ϵ-SVR. In addition, the running time of QLSLDMR is comparable to that of QLSSVR, and both are superior to LSSVR on most datasets. For these three models, the regressor is obtained by solving a system of linear equations, and their time complexities are governed by the size of that system: it grows with the sample size n for LSSVR, while it depends only on the dimension m for QLSSVR and QLSLDMR.
In terms of time complexity, the values for QLSSVR and QLSLDMR are smaller than that of LSSVR on datasets where the dimension m is much smaller than the sample size n, which corresponds to faster running times. The equal time complexity of QLSSVR and QLSLDMR suggests that their execution times grow similarly with the input size.
To visualize the impact of the data scale on the running times of LSSVR, QLSSVR, and QLSLDMR, three-dimensional histograms are created using the data from table 2. The sample sizes n are plotted on the x-axis, the dimensions m on the y-axis, and the CPU running times on the z-axis. From figure 2, the CPU running times of QLSSVR and QLSLDMR exhibit slow growth as the sample size n increases while the dimension is kept fixed. Although the kernel-free techniques are more sensitive to the dimension m when the sample size n is fixed, the CPU running times of QLSSVR and QLSLDMR are significantly faster than that of LSSVR for large sample sizes n.
These results show that the proposed methods are suitable for datasets where the sample size n is much larger than the dimension m and perform well in most cases.

Comparing the robustness
To verify the robustness advantage brought by the introduction of the optimal margin distribution, we add noise to the previously generated artificial datasets with θ_i ~ N(0, 0.1), where the sample size is n = 200 and the dimension is m = 1. Gaussian noise A_i ~ N(0, 1) is added to y_i, and the contaminated datasets are divided into five groups according to the proportion of samples contaminated with A_i, with ratios of 0%, 10%, 20%, 30%, and 40%. Similar to the previous approach, ten independent groups of datasets are generated for each noise ratio, and the average RMSE over these ten experiments is used as the final evaluation criterion.
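A sketch of this contamination step follows; the sampling details are our assumptions, since the paper specifies only that Gaussian A_i ~ N(0, 1) is added to a given proportion of the targets.

# Add N(0, 1) noise to a given fraction of the targets.
import numpy as np

def contaminate(y, ratio, seed=0):
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    k = int(round(ratio * len(y)))                   # number of contaminated samples
    idx = rng.choice(len(y), size=k, replace=False)  # which samples receive A_i
    y_noisy[idx] += rng.normal(0.0, 1.0, size=k)     # A_i ~ N(0, 1)
    return y_noisy

# e.g. datasets with 0%, 10%, ..., 40% noise:
# noisy_sets = [contaminate(y, r) for r in (0.0, 0.1, 0.2, 0.3, 0.4)]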
Table 4 shows that our methods consistently outperform the other models under different noise ratios. This is because methods based on the maximum margin theory are extremely sensitive to noise [40], whereas our models are much more robust since both of them consider the overall margin distribution. It is worth noting that LSSVR and QLSSVR exhibit weaker resistance to noise due to the substitution of the 2-norm for the 1-norm of the slack variable. However, QLSLDMR demonstrates strong robustness, as evidenced by the results in figure 3: when the noise ratio is raised from 0% to 40%, QLSLDMR still fits the tendency of the true curve well.

Benchmark datasets
To further validate the superiority of QLDMR and QLSLDMR, we test them on fourteen benchmark datasets: Prediction of Average Localization Error in WSNs (ALE), Servo, Computer Hardware (Computer), Yacht Hydrodynamics (Yacht), Auto MPG, Consumo, Real Estate Valuation (Estate), Boston Housing (Housing), the 'Heating Load' target of Energy Efficiency (Heating), the 'Cooling Load' target of Energy Efficiency (Cooling), and QSAR Fish Toxicity (QSAR) from UCI; Body Fat Prediction (Body Fat) and Concrete Compressive Strength (Concrete) from StatLib; and Diabetes from a web page. Each dataset is randomly split into training and test samples, whose sizes are displayed in the form '(training, test)' in table 5. The performance of each model on each dataset is ranked and denoted as '#rank' in table 5. All the regression methods are run ten times, the same as before.
The results presented in table 5 demonstrate a significant improvement of the kernel-free technique achieved through the combination with the optimal margin distribution. QLDMR consistently achieves a smaller RMSE than SQSVR on all datasets, and a larger SSR/SST on 13 datasets. QLSLDMR outperforms QLSSVR in terms of RMSE on 12 datasets and SSR/SST on 9 datasets, highlighting the superior predictive ability and goodness of fit of our proposed methods. In terms of MAE, QLDMR achieves the best rank on seven datasets, while QLSLDMR achieves the best rank on five datasets. These results suggest that our two proposed methods have similar predictive performance, and both outperform the remaining four methods. The RMSE ranks suggest that SQSVR performs similarly to ϵ-SVR, while QLSSVR significantly outperforms LSSVR. Additionally, consistent with previous experiments, the kernel-free techniques exhibit faster running times compared to the original models. Furthermore, QLSLDMR demonstrates running time comparable to QLSSVR.
To further evaluate the models, we use nonparametric approaches to model performance evaluation, namely the Friedman test and the Nemenyi test [41-44]. The procedure of the Friedman test, along with the corresponding post-hoc test, is outlined below.
(1) The ranks of the performance evaluation metrics of each method are averaged over the datasets. The Friedman test determines whether these methods all perform equally. Let γ_i be the average rank of the ith algorithm. The Friedman statistic can be calculated by the following formula:

χ²_F = (12N / (k(k+1))) ( Σ_i γ_i² − k(k+1)²/4 ),        (25)

which is distributed according to χ² with k − 1 degrees of freedom, where k is the number of methods and N is the number of datasets. However, the original Friedman statistic (25) is so conservative that a new statistic is commonly used instead:

τ_F = (N − 1) χ²_F / ( N(k − 1) − χ²_F ),        (26)

where τ_F is distributed according to the F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom.
(2) If the assumption that all methods perform equally is rejected, we can continue with a post-hoc test, which implements the Nemenyi test to distinguish the methods further. The Nemenyi test calculates the critical range of the difference between average ranks,

CD = q_α sqrt( k(k+1) / (6N) ),        (27)

where q_α is the critical value of the Tukey distribution. If the difference between the average ranks of two methods exceeds the critical range CD, the assumption that the two methods have the same performance is rejected with the corresponding degree of confidence.
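These quantities can be computed directly from the average-rank table; a small sketch follows (q_α must be looked up from the Studentized range table, e.g. 2.850 for six methods at α = 0.05).

# Friedman statistic (25), its F-form (26), and the Nemenyi CD (27).
import numpy as np

def friedman_nemenyi(avg_ranks, N, q_alpha=2.850):
    """avg_ranks: average rank of each of the k methods over N datasets."""
    avg_ranks = np.asarray(avg_ranks, dtype=float)
    k = len(avg_ranks)
    chi2_f = 12.0 * N / (k * (k + 1)) * (np.sum(avg_ranks ** 2) - k * (k + 1) ** 2 / 4.0)
    tau_f = (N - 1) * chi2_f / (N * (k - 1) - chi2_f)   # compare with F(k-1, (k-1)(N-1))
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))     # Nemenyi critical difference
    return chi2_f, tau_f, cd

# e.g. six methods on fourteen datasets, as in table 6:
# chi2_f, tau_f, cd = friedman_nemenyi(rank_row, N=14)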
To determine whether there is a significant difference in the performance on RMSE, MAE, SSE/SST, and SSR/SST between ϵ-SVR, LSSVR, SQSVR, QLSSVR, QLDMR, and QLSLDMR, the Friedman test is conducted. This statistical test requires calculating the average rank of each evaluation criterion across the benchmark datasets; the results are reported in table 6.
Subsequently, the Friedman test is employed to evaluate whether the performance of the six methods on RMSE is equivalent; the computed statistic is τ_F = 27.4444. When α = 0.05, F_α(5, 65) = 2.356 < 27.4444, so we reject the assumption that all methods perform equally. This indicates a significant disparity in the RMSE performance across these six methods.
For further comparison of individual methods, the Nemenyi test is employed. At a significance level of α = 0.05, the critical difference is CD = 2.850 × sqrt(6 × 7 / (6 × 14)) ≈ 2.0153. From the average rank differences, it is evident that QLDMR outperforms ϵ-SVR, LSSVR, and SQSVR in RMSE: the rank differences of 3.3571, 2.7143, and 3.2857 all exceed the CD value. Correspondingly, the performance of QLSLDMR surpasses that of ϵ-SVR with a rank difference of 3.25, LSSVR with a rank difference of 2.6612, and SQSVR with a rank difference of 3.1786. In terms of the RMSE index, both QLDMR and QLSLDMR outperform most of the compared methods. Therefore, we can conclude that the proposed QLDMR and QLSLDMR exhibit better performance in comparison to the other four methods.
Figure 4 illustrates the frequency histograms of the margin (the difference between predicted and true target values) for ϵ-SVR, LSSVR, SQSVR, QLSSVR, QLDMR, and QLSLDMR on the training set of the Heating dataset. The histograms clearly demonstrate that our proposed methods exhibit a higher proportion of samples with smaller margins and a lower proportion of samples with larger margins, indicating a more favorable margin distribution. This improvement can be attributed to the introduction of the optimal margin distribution, which prioritizes the distribution of margins rather than solely focusing on obtaining a larger minimum margin.
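Margin histograms of this kind can be reproduced from any fitted regressor; a small sketch (assuming a vector of predictions g(x_i) for the training samples is available) is given below.

# Margin frequency histogram in the spirit of figure 4: margins are the
# differences between predicted and true targets on the training set.
import matplotlib.pyplot as plt

def plot_margin_hist(y_true, y_pred, label):
    margins = y_pred - y_true                      # margin of each training sample
    plt.hist(margins, bins=30, alpha=0.6, label=label)
    plt.xlabel("margin (predicted - true)")
    plt.ylabel("frequency")
    plt.legend()

# e.g. plot_margin_hist(y_train, g_of_x_train, "QLDMR"); plt.show()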
Figure 5 provides a box plot that illustrates the mean and standard deviation (std) of test errors from ten runs on eight of the fourteen benchmark datasets.The mean values for QLDMR and QLSLDMR are the closest to zero, with their medians consistently around or equal to zero.In contrast, the other methods show larger deviations across the majority of datasets.Furthermore, our methods display the most compact std distribution and the lowest std median, indicating a more stable and reliable performance on these datasets.

Conclusion
This paper introduces and illustrates two methods named QLDMR and QLSLDMR. Both are inspired by the concept of optimal margin distribution in order to enhance performance. On the basis of the kernel-free technique, QLDMR maintains the characteristic of the LDMR method of considering margin distributions to improve generalization and robustness. Moreover, QLSLDMR transforms the inequality constraints of QLDMR into an equality constraint, thereby cutting down the computational cost. In this research, the detailed implementation of these methods and the calculation of their relative time complexities are fully demonstrated and discussed. The experimental results on artificial and benchmark datasets make clear that QLDMR and QLSLDMR are superior to the original kernel-free methods in terms of generalization capacity and robustness. Additionally, when the dimension of the training samples is much smaller than the number of samples, empirical evidence shows that the kernel-free methods take less time than the original methods. In conclusion, the proposed methods, combining optimal margin distribution and the kernel-free technique, significantly improve robustness, generalization, and efficiency. Thus, it is believed that this research makes progress and paves the way for further exploration in developing models that effectively handle nonlinear regression without the kernel trick.
More effort needs to be devoted to enhancing the robustness of this method and its applicability to diverse and noisy real-world data. Therefore, in future research, given the prevalence of various types of noise in real-world datasets, devising efficient solution algorithms to cope with non-Gaussian noise would be the next step to enrich the experimental results in this field (Contreras-Reyes et al 2014). In this sense, the proposed methods could be extended in the context of non-linear regression models.

Algorithm 3.1. QLDMR.
Input: Training dataset X, positive trade-off parameters C, d_1, and d_2, positive ϵ-tube parameter ϵ.
Output: Decision function g(x) = (1/2)x^T W^* x + B^{*T} x + c^*.
1: Create the matrices H and U following the steps mentioned in section 2.4.
2: Solve the convex quadratic programming problem (20) and obtain the values of α and α^*.
3: Compute z^* = [ŵ^{*T}, B^{*T}, c^*]^T using (19a) and build W^* from the elements of ŵ^*.
4: Construct the decision function g(x).

Figure 1. Quadratic surfaces fitted by the kernel-free methods.

Figure 2. Three-dimensional histograms of CPU running times for different sample sizes n and dimensions m for LSSVR, QLSSVR, and QLSLDMR on artificial datasets.

Figure 5. Mean and standard deviation of test errors of the methods on benchmark datasets.

Table 1. Performance comparison of kernel-free methods on RMSE, MAE, SSE/SST, and SSR/SST for three-dimensional datasets. Note: Bold highlights the best evaluation indicators in this experiment.

Table 2. Comparison of CPU running times under different sample sizes n and dimensions m (CPU time: seconds).

Table 4. Comparison of RMSE on contaminated artificial datasets with varying ratios of noise.

Table 5. Results comparisons of RMSE and #rank on benchmark datasets. Note: Bold highlights the best evaluation indicators in this experiment.