Optimization of Robotic Arm Grasping through Fractional-Order Deep Deterministic Policy Gradient Algorithm

With the rapid development of robotics technology, robotic arm grasping has gained significant attention in the fields of automation and artificial intelligence. In this study, we propose a fractional-order deep deterministic policy gradient (DDPG) algorithm for optimizing robotic arm grasping tasks. Traditional machine learning algorithms face challenges in handling continuous action spaces, while the DDPG algorithm effectively addresses this issue. In this research, we first review the background and challenges of robotic arm grasping and provide an overview of the application of traditional reinforcement learning algorithms in grasping tasks. Subsequently, we introduce the principles and fundamental ideas of the DDPG algorithm in detail, discussing its potential for optimizing robotic arm grasping. To further enhance the performance of robotic arm grasping, we propose an improved approach based on fractional-order control. Fractional-order control exhibits unique advantages in environmental dynamics modeling and grasp posture optimization, enhancing the robustness and adaptability of robotic arm grasping. Through a series of experiments, we validate the effectiveness and superiority of the fractional-order DDPG algorithm in robotic arm grasping tasks. Our algorithm achieves significant improvements in grasping success rate and stability compared to traditional methods. The experimental results demonstrate that the fractional-order DDPG algorithm is better equipped to handle control challenges in continuous action spaces and optimize the performance of robotic arm grasping tasks.


Introduction
In recent years, with the rapid advancement of robotics technology, robotic arm grasping has played a crucial role in automation, industrial production, and service domains. The objective of robotic arm grasping is to enable robots to accurately and efficiently capture and manipulate various objects, thereby accomplishing complex operations and tasks [1]. However, this task remains challenging due to the involvement of continuous action spaces and variable environmental conditions.
To overcome these challenges, the Deep Deterministic Policy Gradient (DDPG) algorithm has emerged in recent years. The DDPG algorithm combines deterministic policies and deep neural networks, enabling direct learning of policies in continuous action spaces [2]. By utilizing experience replay and target networks, the DDPG algorithm enhances training stability and convergence, making it suitable for complex robotic arm grasping tasks. However, traditional DDPG algorithms still face certain challenges in robotic arm grasping.
For instance, robotic arm grasping tasks demand precise control over grasping posture and force, which may pose limitations for traditional DDPG algorithms [3]. To further enhance the performance and robustness of robotic arm grasping tasks, this research proposes an improved approach based on fractional-order control. Fractional-order control exhibits the characteristics of non-integer derivatives and integrals, enabling better adaptation to environmental dynamics modeling and control requirements [4], [5]. Introducing the concept of fractional-order control can enhance the precision, robustness, and adaptability of robotic arm grasping tasks. In this study, we provide a detailed explanation of the principles and design of the fractional-order DDPG algorithm, propose an improved approach based on fractional-order control, and validate its effectiveness and performance in robotic arm grasping tasks through a series of experiments.

Related Work
The field of robotic arm grasping has attracted significant attention from scholars and engineers, resulting in an active area of research. Over the past few decades, numerous methods and algorithms have been proposed to address the challenges in robotic arm grasping tasks. The relevant approaches in this study can be mainly classified into two categories: traditional reinforcement learning methods and deep learning-based methods.
Traditional reinforcement learning methods have found certain applications in robotic arm grasping tasks. For instance, Q-learning and DQN algorithms have been widely used to solve grasping problems in discrete action spaces [6]. These methods learn the optimal policy by establishing state-action value functions and iteratively updating them. However, traditional reinforcement learning methods face challenges when dealing with continuous action spaces: discretization of the action space is required, leading to the problem of dimensionality explosion and increased computational complexity.
With the rise of deep learning, an increasing number of researchers have started applying deep neural networks to robotic arm grasping tasks. Deep neural networks possess strong fitting and expressive capabilities, enabling them to learn complex grasping strategies. For example, deep reinforcement learning methods such as the Deep Deterministic Policy Gradient (DDPG) algorithm have made significant progress in robotic arm grasping [7]. The DDPG algorithm combines deterministic policies with deep neural networks, allowing direct learning of policies in continuous action spaces and overcoming some limitations of traditional methods.
Fractional-order control is a control method based on non-integer derivatives and integrals, which has received extensive research attention in the field of robotics. Fractional-order control exhibits excellent performance and robustness and is well-suited for modeling and controlling the dynamics of the environment [8]. In the context of robotic arm grasping tasks, researchers have begun exploring the integration of fractional-order control into the optimization of control policies [9] to improve grasping performance.
In summary, traditional reinforcement learning methods and deep learning-based methods have provided important research foundations for robotic arm grasping tasks. However, traditional methods face limitations when dealing with continuous action spaces, and deep learning-based methods still face challenges in grasping pose and force control. Therefore, this study aims to combine fractional-order control with the DDPG algorithm to further optimize the performance and robustness of robotic arm grasping tasks.

Method
This paper presents a method for robotic arm grasping based on the Fractional-Order Deep Deterministic Policy Gradient (FO-DDPG) algorithm. The proposed method addresses the low sampling efficiency inherent in reinforcement learning algorithms by improving the reward function of the DDPG algorithm. To further enhance the performance and robustness of the robotic arm grasping task, fractional-order control is introduced. To validate the effectiveness of the improved DDPG algorithm, a three-dimensional simulation platform is employed to test the capability of the end effector to grasp a target object. Finally, a comparative analysis of the training results before and after the improvement is conducted through simulation experiments.

Improvement of reward function
As the environment becomes more complex, simple reward settings are no longer sufficient to meet the demands of the agent. The design of the reward function plays a crucial role in the learning speed, operational efficiency, and control effectiveness of reinforcement learning algorithms. In the case of a three-dimensional robotic arm, the reward function is commonly defined based on the distance between the gripper's end effector and the target object. However, this simplistic reward setting may not capture the intricacies of the task. To address this issue, we propose an enhanced algorithm with an improved network framework, as depicted in Figure 1. This new approach aims to refine the reward function and achieve better performance in complex environments.

In the DDPG framework, the target value for each sampled transition is computed as

$$y_i = r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big),$$

where $Q'$ is the target Critic network, $\mu'$ is the target Actor network, and $\gamma$ is the discount factor.
The parameters of the Critic network are updated to minimize the following loss function.
$$L = \frac{1}{N}\sum_{i=1}^{N}\big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^2,$$

where $N$ is the number of batch samples.

Update of the Actor network: the estimated action value of the Critic network is used as the optimization objective, and the policy gradient of the Actor network is computed as

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{i=1}^{N} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}.$$

The parameters of the Actor network are then updated by gradient ascent along this direction:

$$\theta^{\mu} \leftarrow \theta^{\mu} + \alpha\, \nabla_{\theta^{\mu}} J,$$

where $\alpha$ is the learning rate.
The improved reward function is a composite of multiple reward terms, aiming to enhance the control capability of the trained algorithm model.

The composite reward is defined as

$$r = r_1 + r_2.$$

Here $r_1$ is the distance reward function, taken as the negative Euclidean distance between the gripper's end effector at $(x, y, z)$ and the target object, whose center coordinates are $(x_0, y_0, z_0)$:

$$r_1 = -\sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}.$$

The environment provides this single-step reward to the algorithm for each action executed by the robotic arm, guiding it to bring the gripper's end effector closer to the target and grasp it.

$r_2$ is a sparse reward, obtained only when the algorithm determines that the current training episode has ended. If the target object is successfully grasped within the specified number of steps, a reward of 100 is given; otherwise, a fixed penalty is assigned.
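A minimal sketch of this composite reward is given below, assuming the negative-Euclidean-distance form of $r_1$ above; the value of the failure penalty is an assumption, since the text does not specify it.

```python
import numpy as np

# Assumed failure penalty; the text does not give the exact value.
FAIL_REWARD = -1.0

def composite_reward(effector_pos, target_pos, done, grasped):
    """Sketch of the composite reward r = r1 + r2 described above."""
    # r1: dense single-step distance reward -- the closer the gripper's
    # end effector is to the target center, the less negative the reward.
    r1 = -np.linalg.norm(np.asarray(effector_pos) - np.asarray(target_pos))

    # r2: sparse terminal reward, granted only when the episode ends.
    r2 = 0.0
    if done:
        r2 = 100.0 if grasped else FAIL_REWARD
    return r1 + r2
```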

Introducing Fractional-Order Control
Because the traditional DDPG algorithm uses a discount factor to compute the discounted value of future rewards, this method incorporates fractional-order control theory, adjusting the fractional-order exponent to balance exploration and exploitation. A fractional-order deterministic policy is employed to select actions for the robotic arm, and a fractional-order discount factor is used to calculate the fractional-order return, thereby addressing the decay of long-term rewards more effectively. The fractional-order return allows rewards accumulated over a longer horizon to be taken into account. In recent years, Caputo-type fractional calculus, characterized by its simple definition and suitability for engineering control problems [10], has been widely applied in engineering. The Caputo-type fractional derivative is defined in Equation (9):

$${}^{C}D_t^{\alpha} f(t) = \frac{1}{\Gamma(n - \alpha)} \int_0^t \frac{f^{(n)}(\tau)}{(t - \tau)^{\alpha - n + 1}}\, d\tau, \tag{9}$$

where $n$ is the smallest integer greater than $\alpha$ and $\Gamma(\cdot)$ represents the Gamma function.
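The exact fractional-order return used by FO-DDPG is not reproduced in this text, so the following sketch should be read as one plausible realization rather than the paper's formula: it replaces the exponential weights $\gamma^k$ with generalized binomial (Gamma-function) weights of the kind that arise in fractional-order integration. The function name and the choice $\alpha = 0.5$ are illustrative assumptions.

```python
import math

def fractional_return(rewards, alpha=0.5):
    """One plausible fractional-order return (not the paper's exact formula).

    The weights w_k = Gamma(k + alpha) / (Gamma(alpha) * Gamma(k + 1))
    decay like k**(alpha - 1) for 0 < alpha < 1 -- a power law rather
    than the exponential decay of gamma**k.
    """
    g = 0.0
    for k, r in enumerate(rewards):
        # Evaluate the weight in log space to stay numerically stable
        # for long episodes (math.gamma overflows near k = 170).
        log_w = math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
        g += math.exp(log_w) * r
    return g
```

Asymptotically, a power-law weight exceeds any exponential discount $\gamma^k$ with $\gamma < 1$, so sufficiently distant rewards retain more influence on the return, matching the stated goal of accounting for rewards accumulated over a longer period.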

Constructing the Fractional-Order Deep Deterministic Policy Gradient (FO-DDPG) network architecture
Design and implementation of the network architecture for the Actor and Critic networks are carried out. The Actor network is responsible for learning the grasping policy, while the Critic network is used to estimate the action value. Fractional calculus methods are employed to optimize the value function network and policy network, enabling the capture of long-term dependencies within the state space.
Following the updating rules of the Fractional-Order Deep Deterministic Policy Gradient (FO-DDPG) algorithm, iterative training is performed. In each iteration, the grasping action is selected from the Actor network based on the observed values, and the value of the action is estimated using the Critic network. The target value function and loss function are computed, and the parameters of the Actor and Critic networks are updated using gradient descent.
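As a sketch of this architecture, the PyTorch modules below define one possible Actor and Critic pair together with the soft target-network update. The hidden-layer sizes, activations, and the Polyak coefficient `tau` are assumptions, as the text does not specify them.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state observation to a deterministic arm action.
    Hidden sizes and activations are illustrative assumptions."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bounded actions
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Estimates the action value Q(s, a) from a state-action pair."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def soft_update(target, source, tau=0.005):
    """Polyak-average the target network toward the online network;
    tau is an assumed value, not taken from the text."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)
```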

Experiments
The learning rate for both the Actor and Critic networks is set to $10^{-3}$. The reward discount factor is set to 0.99. The replay memory buffer has a capacity of 10,000 data samples. During the parameter updates of the training network, a batch size of 32 is used.
During the training phase, the improved algorithm interacts with the simulation environment for a total of 61,000 steps, constituting one training cycle. There are two criteria for ending a single training episode: (1) successful grasping of the object using the FO-DDPG algorithm, and (2) reaching the maximum number of steps per episode, which is set to 200 actions. Before subsequent training or testing, the robotic arm's pose is reset to the initial state, and the position of the target object is refreshed.
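For reference, the hyperparameters reported above can be gathered into a single configuration; all values below are taken directly from the text.

```python
# Hyperparameters reported in this section.
CONFIG = {
    "actor_lr": 1e-3,          # Actor network learning rate
    "critic_lr": 1e-3,         # Critic network learning rate
    "gamma": 0.99,             # reward discount factor
    "buffer_size": 10_000,     # replay memory capacity (samples)
    "batch_size": 32,          # minibatch size for parameter updates
    "total_steps": 61_000,     # environment steps in one training cycle
    "max_episode_steps": 200,  # action limit per training episode
}
```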
The performance evaluation of the robotic arm grasping includes two main aspects: (1) Average reward value: this measures the average reward obtained by the robotic arm during the testing process and tracks how the average reward changes over the test episodes; (2) Grasping success rate: in each round, if the ROS robotic arm successfully grasps the object within 200 steps, it is counted as a successful grasping operation, and the success rate is calculated as the number of successful grasps out of 200 test rounds. The testing results of the DDPG algorithm before and after improvement are shown in Figures 2 and 3. An episode whose reward value is close to 100 in these figures represents a test episode in which the robotic arm successfully grasped the object. Comparing Figure 2 and Figure 3, it can be observed that the improved algorithm significantly increases the number of successful grasps, with a more uniform distribution: the number of successful grasps in the 200 test episodes increased from 22 to 80, and the average reward value increased from -19.263 to 16.479.
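A sketch of how these two metrics can be computed over the 200 test episodes is shown below; the Gym-style `env` interface, the `actor.act` method, and the `info["grasped"]` flag are assumptions about the simulation platform, not its actual API.

```python
def evaluate(env, actor, episodes=200, max_steps=200):
    """Return (average episode reward, grasping success rate)."""
    successes, total_reward = 0, 0.0
    for _ in range(episodes):
        state, episode_reward, grasped = env.reset(), 0.0, False
        for _ in range(max_steps):
            state, reward, done, info = env.step(actor.act(state))
            episode_reward += reward
            if done:
                grasped = info.get("grasped", False)  # assumed flag
                break
        successes += int(grasped)
        total_reward += episode_reward
    return total_reward / episodes, successes / episodes
```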
The comparison of average rewards before and after the improvement of the DDPG algorithm is shown in Figure 4. As the number of test episodes increases, the average reward of the improved algorithm remains higher than that of the original version, indicating a significant improvement in the model's control effectiveness.

Conclusion
The proposed fractional-order Deep Deterministic Policy Gradient (FO-DDPG) algorithm significantly improves the performance and robustness of robotic arm grasping tasks. Compared to the traditional deterministic policy gradient algorithm, the introduction of fractional-order control allows the robotic arm to better adapt to uncertain and complex grasping environments and effectively enhances its force control and pose adjustment capabilities. Through fractional-order differentiation and integration operations, the robotic arm can precisely control the grasping force and position, thereby achieving more accurate and stable grasping. Experimental results demonstrate significant advantages of our method over traditional approaches in terms of grasp success rate, grasp stability, and efficiency. In future research, we will explore the application of the fractional-order deep deterministic policy gradient algorithm to other robotic arm tasks and further optimize its efficiency and performance.


Figure 2. The variation of rewards with the number of test episodes for the original algorithm.

Figure 3. The variation of rewards with the number of test episodes for the improved algorithm.

Figure 4. Comparison of Average Reward Variation.