Efficient discovery of multiple minimum action pathways using Gaussian process

We present a new efficient transition pathway search method based on the least action principle and the Gaussian process regression method. Most pathway search methods developed so far rely on string representations, which approximate a transition pathway by a series of slowly varying system replicas. Such string methods are computationally expensive in general because they require many replicas to obtain smooth pathways. Here, we present an approach employing the Gaussian process regression method, which infers the shape of a potential energy surface with a few observed data and Gaussian-shaped kernel functions. We demonstrate a drastic elevation of computing efficiency of the method about five orders of magnitude than existing methods. Further, to demonstrate its real-world capabilities, we apply our method to find multiple conformational transition pathways of alanine dipeptide using a quantum mechanical potential. Owing to the improved efficiency of our method, Gaussian process action optimization (GPAO), we obtain the multiple transition pathways of alanine dipeptide and calculate their transition probabilities successfully with density-functional theory (DFT) accuracy. In addition, GPAO successfully finds the isomerization pathways of small molecules and the rearrangement of atoms on a metallic surface.

ab initio accuracy. In addition, GPAO successfully finds the isomerization pathways of small molecules and the rearrangement of atoms on a metallic surface.

Introduction
Finding multiple transition pathways of phase transitions, chemical reactions, and conformational transitions remains challenging in physics and chemistry. With conventional molecular dynamics (MD) or Monte-Carlo (MC) approaches, i.e., initial-value formulation, it takes a considerably long time to simulate an evolution over energy barriers starting from an initial state. To overcome such limitations, numerous single-end methods have been suggested [1][2][3][4][5][6][7][8][9][10][11][12][12][13][14][15][16]. Methods include using Hessian to climb up the energy barrier [1,2], using the eigenvalues of a Hessian matrix [3,4], repeating the process of climbing up and relaxing to locate multiple local minima [5], and using the knowledge, which atoms are more likely to form or to break a bond, to guide the transition [12-14, 16], were suggested. The referred methods find the transition state efficiently. However, these methods do not guarantee finding a transition pathway reaching a targeted product state.
Another class of approaches, biased sampling methods, were introduced to sample a broad range of configurational space [8][9][10][11]. By applying additional potential on the region already sampled, it forces a system to explore over energy barriers [8,9]. They efficiently estimate the probabilities of sampled local minima or observable properties of an ensemble. However, they cannot provide the probability estimates of multiple transition pathways and associated kinetic information.
To overcome the limitations of single-end methods, various double-end, or fixed-boundary-value, sampling approaches have been developed [17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35][36]. The fixed-boundary-value formulation guarantees to find virtual trajectories connecting two end states if a calculation converges successfully, while the initial-value formulation does not. Berkowitz et al. suggested variational formula for the system with frictional resistance [17], Elber and Karplus suggested the Gaussian chain approach that minimizes the line integration of a potential along a trajectory [19]. Czerminski and Elber added a repulsion constraint to the Gaussian chain approach to prevent replicas from aggregation [20]. To accelerate the convergence, the nudged elastic band (NEB) method was suggested [24]. NEB applies an artificial force to the tangential component of a trajectory chain to remove the corner-cutting problem. These methods find the minimum energy pathway (MEP) with the lowest energy barrier along the transition trajectory. However, the concept of a minimum energy pathway is not well-defined when multiple energy barriers exist between end states on a highly rugged energy landscape.
Contrary to MEP-based approaches, pathway sampling methods based on the least action principles were proposed [17, 21-23, 25, 29-31, 34-38]. The variational Verlet algorithm [21] and the Fourier component relaxation method that minimize the classical action of the trajectory expressed as Fourier (sine) components [22] were suggested. Olender and Elber suggested using Onsager Machlup action to investigate the long-time molecular trajectory [23].
Passerone and Parrinello [25] developed an action-derived molecular dynamics (ADMD) method, which finds low-action pathways via local optimization of a modified classical action with an energy conservation restraint term to keep the system's total energy constant throughout a reaction. Lee and coworker [29] developed a kinetic controlled ADMD method introducing a restraining term to the modified action that satisfies the equipartition theorem. However, these classical-action-based methods have an inherent limitation: the classical principle of least action is an extremal, not a minimum, principle. In other words, the classical action can be either minimized or maximized, which makes its computational outcome ambiguous.
As a way to overcome this limitation of classical-action-based methods, pathway sampling approaches based on the Onsager-Machlup (OM) action [39 -41] have been suggested. the probability p(x B x A ; t) for a state x A to become another state x B within a given time interval t, can be described as follows [30,[39][40][41]: where S OM is the OM action and D is the notation for integrating all the possible pathways connecting two end states. The probability of a transition from the state x A to x B is calculated by summing up the probability of all pathways s e −S OM (x(τ ))/k B T [39, 40,42,43]. For most cases, a state transition usually incorporates multiple pathways, such as multiple protein folding pathways. The most dominant pathway is determined by the path with the lowest OM action, corresponding to the maximum probability of a transition [30, 31, 35-37, 41, 42, 44]. Lee and coworkers utilized an efficient global optimization method to identify the most dominant transition pathway corresponding to the global minimum of OM action and sample multiple less prevalent pathways [36].
A major bottleneck of the existing pathway search methods is their computational cost. Calculating the action of a pathway requires the energy and gradient of every replica of a pathway. A small step size between replicas is necessary to generate smooth pathways. Thus, there is a trade-off between trajectories' accuracy and computational efficiency.
To alleviate this computational cost issue, Jónsson and coworkers [45,46] suggested the one-image evaluation (OIE) method by combining the Gaussian process (GP) [47,48] with NEB [24]. The OIE method finds the MEP by building a Gaussian potential energy surface (PES) approximating the actual PES iteratively. They reduced the number of function calls reduced by two orders of magnitude on an artificial 2D PES [46].
Another major limitation of existing pathway search methods, such as NEB, is that they are local optimization methods whose final results heavily depend on initial guesses on pathways. Generally, conventional pathway search methods start from the linear interpolation between end states and perform local optimization of the initial pathway. As a result, these approaches can find the correct minimum energy or action pathways only when a given PES is simple and smooth. However, many complex systems have highly rugged PESs, which do not guarantee to locate the most dominant pathway near the linear interpolation between two states. An extensive search on a pathway space is essential to avoid this problem.
To overcome these limitations of the pathway search methods, we introduce a novel transition-pathway-sampling method, Gaussian process action optimization (GPAO), by combining the GP algorithm with a global action optimization approach. Inspired by the OIE method, we only calculate the most uncertain images measured by the variance obtained with GP. Then the energy and forces of the images are used to construct the Gaussian PES. Afterward, the intermediate images' potential energies, forces, and Hessian are estimated using an updated Gaussian PES. Pathways are optimized to have the minimum action using the predicted potential, forces, and Hessian. Our method conducts direct OM action optimization with a total-energyconservation restraint without using the original system's Hessian.
The most significant advantage of GPAO is a dramatic reduction of the number of force calls by four-to-five orders of magnitude than the conventional action optimization method while preserving their accuracy. The significant reduction of the computational cost by GPAO allows us to find multiple transition pathways of a molecule-here, we choose alanine dipeptidewith a quantum-mechanical (QM) potential. Additionally, we investigated the isomerization pathways of small organic molecules and the rearrangement pathways of atoms on a metallic surface. Throughout these examples, it is clearly identified that GPAO successfully identifies multiple physically accessible trajectories between the two end-states using QM potential energy. To the best of our knowledge, this work is the first report, which identifies multiple conformational transition pathways of alanine dipeptide with an ab initio potential.
The process of how GPAO minimizes the MEP's uncertainty and approximates the MB potential is illustrated in Figure 1. At iteration 1, the pathway is optimized with the three data points: the initial, the final, and a random intermediate state between the two. Due to the lack of data points, potential energy values estimated along the trajectory are highly uncertain. The uncertainty range (light blue intervals) of the estimation at iteration 1 is much wider than iteration 9 ( Figure 1b). The main idea is to calculate the energy and forces of the point that minimizes the uncertainty of a trajectory most. Thus, at iteration 2, GPAO collects data at the most uncertain region of the previous iteration (marked with a red cross in Figure 1b). After the update, the OM action of a pathway is minimized on the updated Gaussian PES, whose hyperparameters are optimized with new data points. After hyperparameter optimization, at iteration 3, the energy and forces of the most uncertain points in the pathway are added to the database (red crosses in Figure 1b), and the process is iterated. At iteration 9, the pathway is considered as converged since the uncertainty criteria, Σ (max) < 0.05, are satisfied. In total, GPAO requires 11 force calls.
The targeted total energy of the system, E t , one of the hyperparameters    The map and graph at iterations 1, 2, 3, and 9 are plotted from left to right. The red line at the Gaussian PES indicates trajectory, the black crosses are positions where data is acquired, and the white arrows are the force data of the corresponding coordinates. On the uncertainty map, we marked additional red crosses corresponding to the latest position where data are acquired in addition to the white crosses marked at identical locations on all Gaussian PESs. In the energy graph, the blue line shows potential energy, light blue means 95% confidence range of potential energy, the red dot indicates kinetic energy, the orange dashed line is the total energy and the green diamond line expresses target energy.
of the method, is adjusted during iterations based on the following equation: is the target energy of iteration i − 1, and µ (max) is the estimated maximum potential energy of iteration i−1. Figure 1(c) shows how GPAO approximates E t , iteratively. At iteration 1, E t is set to −1.47, the minimum energy between the initial and the final image. GPAO sets µ (max) = −0.10 from the trajectory optimized with three data points. Using this information, at iteration 2, E t is raised to −0.81, which is the half of E t + µ (max) of iteration 1. Thus, GPAO estimates µ (max) to be −0.20 with the four data points. Similarly, at iteration 3, E t is set to the half of E t + µ (max) of iteration 2, −0.368.
Using the target energy obtained from GPAO, we also obtain pathways using DAO with three actions: classical action with total energy conservation restraint S cls (blue dashed line in Figure 2), OM action S OM (orange dash-dotted line in Figure 2), and OM action with total energy conservation restraint Θ OM (solid green line in Figure 2). The trajectories obtained with direct optimization of three actions are denoted as C cls Θ , C OM S , and C OM Θ , and the trajectories obtained with GPAO are denoted as GP-C cls Θ , GP-C OM S , and GP-C OM Θ .
The most remarkable result is that GPAO requires significantly fewer force calls than DAO (Table 1). GP-C cls Θ requires 12 force calls, while C cls Θ requires around 1,200,000 force calls. Similarly, GP-C OM S and GP-C OM Θ require 14 and 12 force calls respectively, while C cls Θ and C cls Θ require 600,000 and 900,000 force calls respectively. Overall, GPAO reduced the number of force calls in four to five orders of magnitude.
The GPAO results are highly similar to those obtained with DAO with the significantly reduced numbers of force calls. Overall, the Frétchet distances between the pathways obtained with GPAO and DAO are less than 0.08% of the pathways' total distances. The distance between C cls Θ and GP-C cls Θ is 0.006. Similarly, the distances between C OM S and C OM Θ and their counterpart obtained with GPAO are 0.082 and 0.042, respectively. GP-C cls Θ deviates from C cls Θ less than a Fréchet distance of 0.01. All four GP-assisted pathways show a deviation of less than a Fréchet distance of 0.1 than the DAO results. These results clearly demonstrate that approximating PES with GP is accurate enough to locate the transition pathways with low action on a given PES.
The convergence of GPAO depends on the number of data points near the MEP, which should be large enough to approximate the PES accurately. Overall, GPAO requires similar numbers of force calls regardless of the selection of the objective function (Table 1). For example, the results of the GP-assisted NEB method, GP-C NEB , showed similar numbers of force calls to reproduce the result of the conventional action optimization method. Indeed, the number of data points required to find an accurate MEP on the MB potential is also between 10 and 20. These results imply that the type of local relaxation method does not significantly affect the efficiency of GPAO. In contrast to GPAO, the calculation cost of DAO is strictly proportional to the number of images in a pathway.
After confirming that GPAO successfully optimizes a given action of a pathway, we compare the characteristic of C cls Θ , C OM S , and C OM Θ ( Figure 2a). The results demonstrate that Θ OM successfully conserves the system's total energy throughout the pathway (Figure 2b). The magnitude of total energy fluctuations of C OM Θ is reduced by 98.7% than that of C OM S . The action value of C OM Θ increases only about 19%, and the Fréchet distance to C OM S is less than 0.067 (Table 1). Also, 95% of the intermediate images of C OM Θ are located within a distance of 0.013 from C OM S . This result demonstrates that the total energy conservation term affects little to the shape of the pathway while conserving the total energy of the system. The results indicate that enforcing the energy conservation restraint on C OM S gives only little effect on finding low V (max) and S OM . Figure 2b shows the total energy profiles of three trajectories and their energy gaps, the energy differences between the maximum and the minimum total energies of pathways. The energy gaps of C cls Θ , C OM S and C OM Θ are 0.09, 3.85 and 0.05, respectively. The large spikes of the total energy are observed between the time steps 0.8 ∼ 1.1, where the trajectory climbs up the energy barriers of the PES. The main reason for this huge spike of the total energy is the second term of the S OM equation (Eq. (21)). The first term in Eq. (21) prefers to minimize atoms' forces, and the third term keeps the velocities between images low. The second term in Eq. (21) favors following a concave potential energy surface along its ongoing direction. More specifically, if a  (Table 1). Our results demonstrate that unphysical increases of atomic velocities to lower the second term can be suppressed effectively with the total energy conservation restraint without changing trajectories. Furthermore, two pathways, C OM S and C OM Θ , overlap significantly each other.

Isomerization pathways of small organic molecules
To further validate the advantages of GPAO in application to real molecular systems, we benchmark the computational cost and characteristics of pathways obtained with DAO and GPAO using density functional theory (DFT) on the isomerization of small molecules: formaldehyde, formic acid, and propyne ( Figure 3). We perform GPAO first, and the resulting target total energy is used for DAO calculations. Additionally, the final trajectory obtained from a GPAO calculation is used as the initial trajectory for DAO to reduce computational cost. In this way, the difference between trajectories approximated with GPAO and true ground state obtained with DAO is compared. Atomic configuration of initial states (IS), transition state (TS), and final state (FS) of each molecule are drawn in Figure 3 with the arrows pointing to corresponding reaction coordinates. GPAO begins with the five data, two initial and final states and three points between them. During the process, the potential profile of the trajectory quickly converges as the data accumulate throughout iterations. In Figure. 3, the potential energy profiles of GPAO (solid blue line) and their variance (light blue shade) slowly converge to those of DAOs (solid orange line). In the rightmost graph in Figure 3, Gaussian PES constructed with this process are compared with true DFT data sampled along the DAOs trajectory (orange dot). In all three cases, using only, comparably a few data points sampled with the GPAOs process (black cross), root mean square error (RMSE) is less than 0.06 eV, or 0.01 eV/atom.
A comparison of the number of force calls, energy barrier, and actions obtained with GPAO and DAO is summarized in Table 2. The number of force calls for DAO of formaldehyde, formic acid, propyne are 3.52 × 10 5 , 3.70 × 10 5 , and 4.73 × 10 5 , respectively. Meanwhile, the numbers of force calls conducting GPAO counterpart are 18, 30, and 49, respectively. Thus, the overall computational cost of GPAO is less than that of DAO by two to three orders of magnitude, while the error of OM action and energy barrier is suppressed under 5%. All the example shows consistent improvement in computational efficiency with comparable accuracy of GPAO.
One of the major challenges in performing DAO with DFT is an enormous computational cost for Hessian calculations. The Hessian matrix can be obtained analytically for simple potentials such as the MB potential. However, obtaining a Hessian matrix is not straightforward for a DFT potential in At third, both the results obtained with GPAO and DAO are drawn together with three arrows pointing to each reaction coordinates of their atomic configurations, initial state (IS), the transition state (TS), and the final state (FS) of isomerization reaction. On the rightmost graph, a comparison between DFT data and its prediction by Gaussian PES constructed with GPAO is drawn with RMSE. DFT data used for comparison is gathered from the trajectory obtained from DAO. Black crosses are the data gathered by GPAO. Hydrogen, carbon, and oxygen are colored white, grey, and red.
general. Thus, a finite difference method is used in this study. To obtain a Hessian matrix using the central difference method, D additional force calls are needed, where D is the degree of freedom. For example, for the Hessian calculations of formic acid and propyne, 12 and 18 additional force calls are required.
The Cartesian coordinates of atoms are used as the input for the Gaussian PES kernel. To satisfy the rotational and translational invariance of the GP process, the position of an arbitrary atom of a system is fixed. When an atom is not fixed, GPAO keeps searching the same geometry but slightly translated position, making the convergence of GPAO difficult.

Surface reactions
In the third class of examples, the diffusion and rearrangement of atoms on a metallic surface are studied. The potential energies of metallic systems are calculated using effective medium theory (EMT) in the Atomic Simulation Environment (ASE) package [50]. Using the EMT potential, we compare the characteristics of DAO and GPAO results on two examples: 'Au hopping' and 'Pt rearrangement' [46]. In the 'Au hopping' example, a gold atom jumps to its neighboring site on the aluminum fcc (100) surface. To visualize the diffusive pathway, the atomic configurations at the key states are plotted in Figure 4. From left to right, the atomic configurations of the initial state (IS), transition state (TS), and the final state (FS) watching from the top and the side are illustrated. For both IS and FS, the gold atom is propped in the middle of four aluminum atoms. In TS, the gold atom is on the saddle point supported by two aluminum atoms. Two aluminum atoms right under the gold atom sink slightly, while the other two aluminum atoms, far from the gold atom, slightly protrude along the z-axis. This shift lowers the energy barrier to 0.4 eV. The GPAO trajectory successfully reconstructs the shift of potential energy barrier, leading to an almost identical trajectory to the DAO trajectory.  Figure 4: A comparison of low action pathways obtained with GPAO and DAO of a gold atom hopping on the aluminum fcc (100) surface. A side view and a top view of the surface reaction and its potential energy profiles are plotted. The potential energy profile of GPAO (solid blue line) and its variance (light blue shade) at the iteration 1, 3, and 10 (Iter: 1, 3, and 10) are plotted from left to right. At iteration 10, DAOs potential energy profile (solid orange line) are drawn together with GPAOs one, with three additional arrows coming from the initial state (IS), transition state (TS), and final state (FS) to the corresponding reaction coordinate (Reaction coord). Finally, DFT data gained from GPAO (black cross) are compared with DFT data of DAOs trajectory (orange dot) at the rightmost graph. The gold atom is colored in yellow, and aluminum is colored in brown.
Below the atomic configuration, energy profiles at iterations 1, 3, and 10 (Iter: 1, 3, and 10 in Figure 4) are plotted. While optimizing action, only the gold atom and its neighboring four aluminum atoms are allowed to move and the other atoms are fixed reducing the degree of freedom D = 15. At iteration 1, GPAO optimizes a trajectory on a Gaussian PES, which is constructed with five initial data points. The initial data are IS, FS, and additional three random points between the two end states. The energy profile at this stage highly deviates from that of DAO, and it has high uncertainty.
As the calculation proceeds, the GPAO trajectory and energy profile converge to those of DAO. At iteration 10, the trajectory converges to the DAO trajectory with a covariance less than 0.05 eV. As a result, using only 13 data points, GPAO accurately predicted the saddle points. The EMT data and predicted potential with GP of the images of the DAO trajectory are compared. For 15 images, the RMSE of potential prediction is 70 meV, or 5 (meV/atom). Compare to the common neural network potential, the number of data required to find the energy barrier is relatively small. This implies that our iterative searching scheme is highly effective for investigating surface hopping problems.
In the 'Pt rearrangement' example, seven platinum atoms packed as a hexagonal shape on the fcc (111) surface are rearranged to a parallelogram shape. Among 71 atoms, seven atoms on the surface are set to move freely, setting the degree of the freedom as D = 21. To lower the calculation cost for the Gaussian kernel, surface platinum atoms under the seven hexagon atoms are fixed. In Figure 5, from left to right, IS, TS, and FS of 'Pt rearrangements' are drawn emphasizing moving atoms with a blue edge. In IS and FS, seven platinum atoms on the surface are located at symmetry sites. In TS, the center atom is misplaced from its symmetric sites, while maintaining a two-fold symmetry. This lowers the energy barrier to 1.0 eV.
Below the atomic configurations in Figure 5, the energy profiles of GPAO at iterations 1, 9, and 186 (Iter:1, 9, and 186) are illustrated. At iteration 1, GPAO optimizes the trajectory from five data points: IS, FS, and three linearly interpolated points between the two. The energy barrier at this stage is nearly 2 eV, and the confidence for this prediction is low as expected. As low energy configurations accumulate, the trajectory converges to the DAO trajectory. The black cross mark stacked at the RMSE graph is these attempts to correctly locate the configuration near-optimal pathways. At iteration 186, GPAO converges and successfully locates the MEP. A comparison between the predicted energies on the Gaussian PES and EMT data sampled from the DAO trajectory is plotted on the right side of Figure 5. The RMSE between two calculations is 95 meV, or 5 meV/atom. These results show that GPAO predictions are in great agreement with the EMT results.  To validate the advantage of the GPAO method, the results of both DAO and GPAO are summarized in Table 3

Multiple conformational transition pathways of alanine dipeptide
Encouraged by the significant gain of computational efficiency by GPAO in five orders of magnitude, we investigate multiple conformational transition paths of alanine dipeptide through ab initio calculations. Alanine dipeptide has been widely used as a test system for computational methods in biophysics [36]. It has two rotatable dihedral angles, the φ and ψ angles and two stable conformations, C7 eq and C7 ax ( Figure 6). The PES's shape shows that multiple conformational transition pathways from one conformation to the other are possible ( Figure 6). We combine GPAO with the Action-CSA method that finds multiple transition pathways through the global optimization of action on a trajectory space and call it GP-CSA. The multiple transition pathways of alanine dipeptide are sampled using GP-CSA calculations with three different total transition times, t total = N dt = 3, 6, and 9 ps. Each pathway converged locally using GPAO. Further details on the converging process of alanine dipeptide are provided in Figure ??. The potential energies and forces of alanine dipeptide are calculated using the ab initio based density-functional-theory package VASP [51,52]. The constrained MD method of VASP is used to optimize the conformations of C7 ax and C7 eq with fixed φ and ψ angles. Since the structures of alanine dipeptide are almost exclusively defined by φ and ψ angles, its potential can be expressed as follows: V x (n) = V φ (n) , ψ (n) . This reduction on coordinates x (n) = φ (n) , ψ (n) , v (n) = φ (n+1) − φ (n) , ψ (n+1) − ψ (n) /dt is extended on the formulation of the modified OM action for alanine dipeptide. Alanine Dipeptide PES (eV) (a) Our Θ OM for alanine dipeptide is where ∇V indicates the poten- tial difference with respect to φ and ψ, I ψ (φ, ψ) indicate the moments of inertia associated with φ and ψ angles of n'th image, respectively. We use the following periodic kernel to describe the PES of alanine Table 4: Action values and energy barriers of multiple conformational transition pathways of alaninit dipeptide. The OM value and maximum potential energy of conformational transition pathways of alanine dipeptide are obtained with GP-CSA under three different transition time scales. The pathways from A to E are shown in Figure 7.
where x = (φ, ψ) is a two-dimensional reaction coordinate (D = 2), d is a dimensional index, and l, σ f are isotropic hyperparameters. Among the pathways obtained with GP-CSA, distinctive pathways with low action values are plotted in Figure 7. Path A in Figure 7 has the lowest S OM value for all transition times, suggesting that Path A is the most dominant pathway between C7 eq and C7 ax conformations. This result is consistent with previous studies conducted with molecular mechanics potentials [36,53]. The maximum potential energy of a pathway estimated with GP, µ (max) is near -128.48 eV (Table. 4). For all three different t total , GP-CSA consistently locates the correct saddle points.
The S OM value of Path D with a t total of 3 ps is 11.282 eV, which is high enough to make the path improbable. However, the action value decreases drastically to 5.551 eV and 4.887 eV when t total elongates to 6 ps and 9 ps. As t total increases, Path D becomes longer, allowing the pathway to detour the high potential region and to seek a lower valley of the PES (see Path D in Figure 7b and c). This tendency is also observed in the µ (max) values in Table 4. µ (max) of Path D at 3 ps is 0.2 eV higher than the other transition times.
It is noticeable that the relatively longer pathways, i.e, Path E, F, and G, are not observed in simulations with t total = 3 ps. As t total becomes longer, GPAO samples additional pathways, which require longer t total to occur. S OM values of Path E obtained with t total = 6 ps and 9 ps are 8.547 eV and 6.745 eV, and µ (max) values are −129.237 eV and −129.316 eV, respectively. The maximum potential energy of Path E of t total = 6 ps is 0.1 eV higher than that of Path E with t total = 9 ps. This difference arises because Path E with t total = 6 ps did not pass the true saddle point due to a short transition time. However, Path E with t total = 9 ps correctly passed the saddle point (see E in the Figure 7c).
With t total = 6 ps, two additional pathways, Path F and Path G are found, which are similar to the most dominant pathway, Path A. These two pathways are longer pathways with higher S OM values, 4.56 eV and 5.293 eV, than Path A. These results clearly indicate that our GPAO method finds multiple transition pathways of the conformational change of alanine dipeptide accurately with a QM potential, which has not been reported previously.
GPAO also improves the Hessian's accuracy by accumulating data points, which allows one to utilize potential functions whose Hessians are hard to calculate for path sampling. The conventional way of obtaining the Hessian of a DFT potential is to utilize a finite difference calculation. That is, to acquire a Hessian at an arbitrary point φ (n) , ψ (n) , one needs to compute ∇V (φ (n) , ψ (n) ), ∇V (φ (n) + ε, ψ (n) ) and ∇V (φ (n) , ψ (n) + ε). These two additional calculations triple the computation cost to evaluate Θ OM preventing the use of finite difference calculation from being applied to relatively large systems. Thus, a fast evaluation of a potential using GP makes using Hessian for practical problems feasible.

Conclusion
In this study, we demonstrate that the GP algorithm dramatically enhances the efficiency of searching minimum action pathways by approximating a PES accurately with much-reduced numbers of energy evaluations with diverse examples: the MB potential, isomerization of small molecules, surface reactions, and conformational transition of alanine dipeptide. For most cases, the transition pathways obtained with GP and DAO show little discrepancies. The errors of energy barriers are also marginal. These results indicate that transition pathway search using the modified OM action with a total energy conservation restraint on an approximate PES is an efficient and accurate approach. In addition, compared to the direct optimization of the OM action itself, large fluctuations of kinetic energies are effectively removed without affecting the OM action of a pathway.
The gradients of OM action require the Hessian calculation of a potential function, which is computationally expensive to obtain in general. Thus, combining GP with action optimization gives not only superior computational efficiency also accurate approximations to the gradients of the OM action without actual Hessian calculations. This advantage is especially significant for the case where the evaluation of potential energies and their Hessians are challenging to obtain, i.e., the Hessian of DFT-based potentials is often incorrect. On the contrary, GPAO accumulates gradient information at multiple data points and leads to the accurate estimation of the Hessian from a Gaussian PES. Since the Hessian of a Gaussian PES can be estimated accurately, the OM action of a pathway can be readily obtained with DFT-based potentials with enhanced computational efficiency by five orders of magnitude. Additionally, we also showed that our GPAO method automatically finds suitable hyperparameters, such as target energy. This study also demonstrates that combining GP with Action-CSA [36] enables the sampling of multiple pathways. Due to GPAO's efficiency, multiple conformational transition pathways between the two stable conformations of alanine dipeptide are successfully obtained using a DFT-based potential. This is the first work that sampled multiple low action pathways of alanine dipeptide with a DFT potential to the best of our knowledge.
Tackling higher-dimensional problems in an efficient way is still a daunting challenge in machine learning fields and other related fields. This paper proves that our GPAO method properly works for relatively small-dimensional problems, whose degree of freedom is less than 30. However, the method does not guarantee its efficiency on problems with a large degree of freedom yet. Fortunately, many methods have been suggested to solve problems in a high dimensional space using various types of descriptors [54][55][56][57]. Furthermore, finding reaction pathways of the systems with many degrees of freedom can be applied to various problems such as finding drug binding pathways, battery, and protein folding. Thus, we believe that GPAO will open up new possibilities for such problems.

Gaussian Process
Significant computational improvements of GPAO over the conventional least action methods are attributed to quantifying the variance of specific images in a pathway by using Bayesian inference. For example, let us suppose a pathway consisting of N images and M evaluated data points. In our GP model, the probability distribution of a given string of images, (2) , · · · , x (n) , · · · , x (N ) , whose energies and forces are represented by Y * = [V * , ∇V * ], is given by a multivariate Gaussian distribution: where X = x (1) , x (2) , · · · , x (m) , · · · , x (M ) are the evaluated data points and Y = [V , ∇V ] correspond to their corresponding potential energies and forces. µ * and Σ * are the mean and the variance of the Gaussian distribution, respectively: where m and (K) mm = k x (m) , x (m ) are the mean and the kernel matrix of the multivariate Gaussian distribution, p(Y) = N (m (X) , K (X, X)). m * and (K * * ) nn = k x (n) , x (n ) are the mean and the kernel matrix of N (m (X * ) , K (X * , X * )). (K * ) mn = k x (m) , x (n) is the kernel matrix connecting coordinates between the evaluated data points and the images in a pathway. σ n is a noise parameter and I is the identity matrix. A detailed description of GP is provided in Supporting Information [58].

Classical and Symmetric Onsager-Machlup action with total energy conservation
For a pathway consisting of N images and A atoms for each image, the discretized representation of the classical action with energy conservation restraint, Θ cls [25], is where µ E is the coefficient of the total energy conservation restraint term, E t is the target total energy of the system. The classical action is defined as Total energy at the time step n is defined as follows: If a system propagates according to the Langevin dynamics, the dynamics of the diffusive system can be described by the OM action [39][40][41]. Miller III and Predescu suggested the symmetric OM action where it is numerically stable up to second order [41]. For a large system with a highly viscous bathtub, the time evolution of the system is described bẏ whereẋ is time derivative of the coordinates, γ is friction coefficient, anḋ W is time derivative of the Wiener process or Brownian motion. Transition probability of the distribution of system is where I is an indicator function of configuration space. From the Markovian properties of diffusive dynamics, the Canonical ensemble in the form of reweighted propagator is, where t = ∆t/n. G(x k , x k+1 , t) follows Bloch equation, where V eff is effective potential written as, Solution for the Eq. 14 is Feynman-Kac equation, x ,x is symbol for average of all Brownian motion with variance σ 2 = 2Dt. To alleviate the second order derivation term Miller III and Predescu suggested dividing t 0 V (W t )dt into 23 two finite difference of first order terms [41]. That is, We get discrete propagator for diffusive dynamics without containing Hessian, Transition probability that states A at time 0 changes into state B at time t is, (20) More detailed derivation can be found in Ref [41]. The discretized formula of the OM action is given by Here, ∆V is the energy difference between the initial and the final step, and γ is the damping constant. In this work, we use the modified OM action 24 with the total energy conservation restraint:

Gaussian Process Action Optimize: GPAO
In this work, we also implement a routine that automatically finds suitable target energy of a pathway during pathway sampling. Target energy is a parameter that is inherently hard to know prior to the sampling of an actual pathway because the proper target energy is closely related to the height of the energy barrier of a pathway. Our GPAO method handles two tasks simultaneously. First, it minimizes the uncertainty of a Gaussian PES near the sampled pathways in an iterative way. Second, the method determines the total energy (E t ) of the MEP automatically.
In this study, we assume that the E t of a system is close to the maximum potential energy along with the MEP, V (max) . To gradually approximate the E t to V (max) , we set E t as µ is the estimated maximum potential energy of MEP on the Gaussian PES.

Computational details for the Müller-Brown potential
Both GPAO and DAO use the same number of intermediate images of 300. To ensure the continuity of a pathway, we use 300 images for a pathway for both methods. Thus, the energy and forces of 300 images must be calculated for a single action evaluation. Total transition time is set to 3 ps, damping constant, γ, is 1, and energy conservation restraint constant, µ E = 1. For GPAO, we used the squared exponential kernel for the Gaussian kernel: where l and σ f are the isotropic real hyperparameters. We use the multivariate Gaussian distribution with zero mean for GPAO calculations, m = m * = 0. For DAO, target energy is set to −0.368, estimated from the GPAO calculations.

Parameters for isomerization of organic molecules
The DFT calculations are performed with the VASP package [51,52]. The PBE exchange functional, energy cutoff of 520 eV, and two-body van der Waals interaction are used. For DAO and GPAO calculations, OM action with energy restraint action Θ OM is used, γ = 1.0, µ E = 1.0. Total transition time is set to 0.1 ps and the 150 intermediate images are used, N = 150, but only 100 sine components, N k = 100, are used for action optimization. For GPAO, we use the standard kernel with isotropic hyperparameters, σ f , l. The ranges of hyperparameters are set to 0.001 < σ f < 1000, 0.01 < l 2 < 10 12 , 0.0001 < σ e n < 0.01, and 0.00001 < σ f n < 0.001. In this case, using zero-mean, as we used in the MB potential, will poorly fit the PES since the target energy is aggregated near −20 eV to −35 eV. For this reason, we use 'maximum constant mean', where the maximum potential energy of the MEP of the previous iteration, µ (max) is assumed as a mean of Gaussian distribution. For DAO, target energy has used the value attained by the GPAO counterpart and the results of the GPAO calculations are used as the initial trajectories of the DAO calculations.

Parameters for surface reactions
The calculator we used for this example is the embedded medium theory (EMT) calculator implemented under the Atomic Simulation Environment package (ASE). Parameters of OM action with energy restraint, γ = 1, µ E = 1, and E t are set by the value obtain by GPAO. Both GPAO and DAO methods conducted straight line conditions, total transition time is set to 10 ps, a number of the intermediate images, N , is 150, and a number of sine components used in action optimization are 50, N k = 50. For DAO, Hessian is calculated using the central finite difference method. For GPAO, the standard kernel with the 'max mean' function, where 'max mean' is the mean function that returns the maximum potential energy of the previous iteration. The range of hyperparameters are set as 0.001 < σ f < 1000, 0.01 < l 2 < 10, 10 −6 < σ e n < 0.01, and 10 −7 < σ f n < 0.001.

Code and Data availability
The source code of GPAO can be found in the GitHub repository (https://github.com/schinavro/taps). GPAO consists of multiple tools in the open-source package TAPS. TAPS incorporates tools needed for datadriven pathway search methods such as a database or atomic calculator. For atomic calculations, we use the atomic simulation environment (ASE) [50] package, which can easily link with atomic calculators such as VASP [51,52], Quantumespresso [59], OpenMX [60], and SIESTA [61].
[9] Alessandro Supporting Information: Efficient discovery of S1. Discretized formula for the derivatives of modified action at the time step n Discretized formula for the classical action and energy restraint terms are Discretized formula for the derivatives of OM action is where ∂∇V x (n) /∂x (n) is the Hessian.

S2. Details on Gaussian Process Regression
For an atomic system, the vector representation for the input coordinates of data is are the x, y, z coordinates of the a-th atom of the m-th image. For an arbitrary coordinate system with D dimensions, the representation vector for the system is simply where x The representation for the entire dataset of GP or pathways can be expressed in a matrix form by combining all the data into one where each datum vector is concatenated laterally.
With the M number of evaluated data points and the N number of images, representation for data points and pathway X and X * have the size (D × M ), (D × N ), respectively: The representation for the output Y has the (D + 1) M dimensional vector with the following sequence: where V (m) is the potential energy of the m-th image x (m) and ∂V (m) /∂x related. In the case of the MB potential example, we use the standard (squared exponential) kernel. For the alanine dipeptide example, which has a periodic potential energy surface (PES), we use a periodic kernel. Our method also incorporates gradient information to exploit both potential energies and the gradient of system energies to construct a more accurate Gaussian PES. The elements of a simple kernel matrix K can be written as where (m, n) ∈ (M × N ) are the indices of the matrix. ∂ n K, ∂ m K and ∂ mn K are the tensors derived from the derivatives of the simple kernel matrix with their shapes: , respectively. The components for these tensors are expressed as follows: An extended kernel matrix K ext can be constructed via reshaping its tensors into a matrix form. We reshape the tensors ∂ n K, ∂ m K and ∂ mn K of the form (M × D × N ), The explicit expression can be written as follows: ∂x (2) where the shape of the matrix k (X, In total, the shape of the matrix K ext is (D(M + 1) × D(N + 1)).
Bayesian inference comes handy if one wants to estimate the energies of unknown pathway X * when the observed energy Y of a structure X is given. Specifically, the probability of X * to have Y * , for given X and Y can be calculated by integrating throughout possible posterior. Since the integration of Gaussian is also Gaussian, we obtain a Gaussian function with posterior predictive mean µ * and covariance Σ * as follows: where µ * and Σ * are defined as follows: where K y = k (X, X) , K * * = k (X * , X * ). σ n I is the error matrix having zero for all offdiagonal terms with the size (M (D + 1) × N (D + 1)). For simplicity, we denote X (m) ≡ x (m) and (K) ij = (k (X, X )) ij = k X (i) , X (j) = k x (i) , x (j) . The kernel matrix can be simply represented as K = k (X * , X). m y and m * are mean vectors at each site X and X * , respectively. We set the elements of the mean vector corresponding to potential data to the average of the gathered data. We set the elements of the mean vector corresponding to force to zero. The predictive Hessian H * of Gaussian PES is where K H = ∂ nn K ∂ mnn K is a Hessian kernel, having the second and third derivatives of Most kernels used in GP are associated with hyperparameters. Hyperparameters are selected to maximize the log-likelihood function: However, adding gradient information often breaks the positive definiteness of a kernel matrix. If it happens, the determinant is not guaranteed to be positive, which may make ln |K y + σ n I| impossible. In cases when we get a negative determinant, we generate pseudo-kernel and output vector, k (X + dX, X + dX) , Y + dY where all the force information is replaced with the interpolated potential value near dX. With this positive-definite guaranteed kernel, we can find hyperparameters that minimize the log-likelihood function, Eq. (S18).

S6
Total & Potential (eV)  6. If the new one has lower action than its counterpart, replace it with the new one.
7. Reduce the cutoff distance. If the cutoff distance is lower than a certain threshold, stop the iteration. Otherwise, go back to step 1.
With two randomly selected pathways, we choose a random image along the pathway.

S10
With the selected position, we slice both trajectories at the selected image and merge them.
After a new pathway is generated, a random mutation/perturbation is performed with a probability of 30%. For the pathway's mutation, the pathway is modified by adding another random number at the sine components of a pathway. After the local minimization of the pathway using the GPAO method, we search the nearest neighbor by measuring the Fréchet distance between the new pathway and the current candidates. The nearest neighbor is considered similar, readily accessible to each other, if the Fréchet distance is shorter than the current cutoff distance d cut . The initial cutoff distance is set to half of the average distance between the initial pathways. After an iteration, we reduce d cut by 2% and repeat the whole procedure from step 1 until d cut becomes below 0.05.  1.00e-02 < $l^2$ < 1.00e+01 1.00e-06 < $\sigma_n^e$ < 1.00e-02 1.00e-07 < $\sigma_n^f$ < 1.00e-03