Physics-informed neural networks for an optimal counterdiabatic quantum computation

A novel methodology that leverages physics-informed neural networks (PINNs) to optimize quantum circuits in systems with N_Q qubits by addressing the counterdiabatic (CD) protocol is introduced. The primary purpose is to employ physics-inspired deep learning techniques to accurately model the time evolution of various physical observables within quantum systems. To achieve this, we integrate the essential physical information into an underlying neural network to effectively tackle the problem. Specifically, we impose that the solution satisfy the principle of least action, together with the hermiticity condition on all physical observables, among other constraints, ensuring the acquisition of appropriate CD terms grounded in the underlying physics. This approach provides a reliable alternative to previous methodologies that rely on classical numerical approximations, eliminating their inherent constraints. The proposed method offers a versatile framework for optimizing the physical observables relevant to the problem, such as the scheduling function, the gauge potential, and the temporal evolution of the energy levels, among others. The methodology has been successfully applied to a 2-qubit system representing the H_2 molecule in the STO-3G basis, demonstrating the derivation of a desirable decomposition of the non-adiabatic terms as a linear combination of Pauli operators. This attribute confers significant advantages for practical implementation within quantum computing algorithms.


Introduction
Quantum computing has emerged as a dynamic and vibrant domain of research within the scientific community, primarily driven by notable achievements and advancements in applied quantum machine learning [1][2][3][4], quantum simulation [5,6], and the optimization of circuits and systems [7,8]. Optimization problems have garnered particular attention, given their pervasive presence in various domains, including medicine [9], economics [10], logistics [11], and numerous others [12][13][14]. Classical approaches to solving these problems from an industrial perspective often face challenges in terms of efficiency and speed, thereby motivating the exploration of quantum computing as a promising alternative. The escalating interest in these methods can be attributed primarily to recent experimental advancements, and this surge in interest is particularly noticeable from an industrial and commercial perspective. Consequently, there is growing anticipation that quantum computing could eventually yield significant advantages over conventional computers, achieving a state known as "quantum supremacy" [15]. This potential advantage and progress have, in turn, spurred developments in various scientific domains, wherein contemporary quantum computers serve as proofs of concept. However, it is essential to underscore that the applicability of these quantum algorithms warrants extensive research and investigation, particularly in the current state of quantum computing, commonly referred to as the Noisy Intermediate-Scale Quantum (NISQ) era [16], whose defining characteristic is the utilization of quantum processors with capacities of up to 1000 qubits.
Hybrid classical-quantum algorithms leverage NISQ devices while offloading a portion of their computational workload onto classical devices, offering considerable potential for practical applications in the field of quantum computing. A prominent example worth highlighting is the Variational Quantum Eigensolver (VQE) [17]. The primary objective of VQE is to determine the lowest-energy quantum state through hybrid optimization, utilizing a designated Hamiltonian operator in conjunction with variational quantum circuits. The realm of quantum machine learning also falls under the purview of these algorithms, seeking to adapt classical algorithms to their quantum counterparts to enhance and expedite computations by capitalizing on the principles of quantum superposition, entanglement, and interference. Within this domain, one can identify supervised classification algorithms such as binary classification and those based on Grover's search algorithm [18]. Notably, Grover's algorithm has demonstrated quadratic acceleration in solving problems such as k-medians [19] or k-nearest neighbors [20,21]. On the other hand, an alternative methodology that has significantly progressed in the literature of this field and has laid the foundation for numerous studies is the Quantum Approximate Optimization Algorithm (QAOA) [22,23]. This approach presents a valuable alternative for tackling combinatorial optimization problems using shallow quantum circuits through classical optimization of the associated parameters. In recent literature, substantial endeavors have been dedicated to employing these methodologies to solve the ground-state challenges of physical systems [24][25][26], reflecting the ongoing efforts to enhance and adapt these techniques for broader applications in quantum optimization.
In recent years, significant attention and interest have been directed towards the development of adiabatic quantum optimization (AQO) methodologies for confronting optimization problems [27,28], with direct practical implementations in the branches of physics and chemistry [29][30][31]. These algorithms begin by initializing a quantum system in the ground state of a known Hamiltonian. The system's Hamiltonian is then slowly evolved into one that represents the problem to be solved, with its ground state encoding the solution. Leveraging the adiabatic theorem [32,33], it becomes feasible to ensure that the quantum system remains in its instantaneous state of lowest energy, provided the evolution of the Hamiltonian is carried out in a sufficiently slow and gradual manner and within a sufficiently extended period of time. Nevertheless, implementing slow adiabatic evolution at the experimental level is typically not feasible, necessitating the development of methodologies that accelerate these processes. In pursuit of this objective, recent scientific literature puts forth various approaches based on Shortcuts To Adiabaticity (STA) [34,35]. These methodologies encompass diverse techniques, including fast-forward methods [36,37], invariant-based inverse engineering [38,39], and counterdiabatic (CD) protocols [40,41]. Despite the noticeable progress in the first two methodologies, this study primarily centers around the CD protocols. These techniques are specifically designed to speed up the adiabatic evolution process from an initial Hamiltonian to a final Hamiltonian. This is achieved by incorporating non-adiabatic terms following Equation (1), which effectively nullify the transition amplitudes between any energy eigenstates of the original Hamiltonian [42]. Consequently, the quantum system undergoes an accelerated adiabatic evolution in practical applications. The resulting Hamiltonian, including the CD term, is given by

H(t) := H_AD(t) + H_CD(t). (1)
The operator designated as H_AD(t) in Equation (1) will be tailored to facilitate the preparation of its ground energy state at the initial time of the evolution. Nevertheless, the main challenge of the CD protocols lies in the accurate and plausible determination of the operator encompassing the non-adiabatic terms of the process, denoted here as H_CD(t). In general, the computation and acquisition of this operator are exceedingly intricate tasks, particularly when dealing with many-body quantum systems [43]. As a customary approach in the related literature, a time-dependent external parameterization, denoted as λ(t), is introduced, on which the operators depend [44] (explained in detail in Section 2.2). Efforts have been directed towards a method for the approximate determination of this operator, leading to significant progress, such as the development of the Nested Commutator (NC) methodology [45]. This advancement has led to recent contributions, exemplified by [23,46]. Within this framework, the computation of these terms is simplified into a series of commutators involving H_AD(t) and its derivatives with respect to the mentioned external parameterization λ(t). As a result, the non-adiabatic terms of these protocols are obtained approximately through an expansion in orders, where the complexity of obtaining them rises with the number of particles in the system and the order considered in the expansion. Nevertheless, these methodologies may exhibit problem-dependent characteristics, as their escalating complexity in non-trivial physical scenarios might necessitate the adoption of alternative perspectives to approach the issue at hand.
In a distinct domain, within the rapid progression of the computer science field, the realm of Deep Learning (DL) has achieved prominence as a highly potent instrument for constructing diverse models, owing to its direct applicability in domains such as image processing [47], natural language generation and processing [48], and time series prediction and classification [49], among a plethora of other possibilities. Present-day technologies encompass Recurrent Neural Networks (RNNs) [50], Long Short-Term Memory architectures (LSTMs) [51], the well-known transformers [52], and sparse and submanifold convolutional networks [53] utilized for images with limited information load, among other advanced techniques. The Physics-Informed Neural Networks (PINNs) methodology has emerged as a highly intriguing DL application within the realm of physical sciences since its first appearances in the literature [54,55]. This approach aims to address specific problems by employing neural networks as powerful universal approximators for systems of Partial Differential Equations (PDEs) that govern the physics describing the evolution. Due to their remarkable potential as numerical solvers, PINNs have garnered considerable attention and established themselves as a viable alternative to conventional numerical solving algorithms. Extensive efforts have been undertaken in diverse branches of physics to apply this methodology and its adaptations. These fields encompass classical hydrodynamics [56], relativistic hydrodynamics [57], electrodynamics [58], chemistry [59], and many others. PINNs demonstrate their utility wherever differential equations are employed to describe the underlying physics of a given scenario. Their ability to unravel complex physical phenomena and offer numerical solutions has positioned them as promising tools for tackling intricate problems across various scientific domains. Consequently, the motivation to employ this methodology for addressing the challenge of CD protocols in quantum systems arises organically. The investigation of potential applications of PINNs in quantum systems and circuits becomes a natural course of study.
In this paper, we present an innovative approach for designing the counterdiabatic terms in CD protocols. Our proposed method relies on the utilization of PINNs, thereby representing a direct application of DL. We introduce a PINN-based methodology without supplementary alterations, enabling it to effectively resolve the underlying physics of the problem. Through the neural network, we obtain both the counterdiabatic terms and the temporal parameterization λ(t), as expounded in Section 2.2. Additionally, we directly explore the experimental feasibility by decomposing the non-adiabatic operator into tensor products of the set of Pauli and identity matrices. This approach offers a comprehensive and direct means to address the experimental applicability of the method.
The rest of the paper is structured as follows: Section 2 provides an introduction to the foundational theoretical framework concerning the operation of baseline PINNs. It further presents a comprehensive exposition of the theoretical framework under consideration, encompassing CD protocols and the specific problem that serves as the motivation for our research. Additionally, a thorough literature review of prior work in this domain is presented. Section 3 delves into a meticulous presentation of the adapted PINN methodology, particularized to address our specific case, while taking into account all pertinent physical factors that the network must conform to. In Section 4, we present notable outcomes obtained through our methodology and juxtapose them with previous findings in the field. Furthermore, we conduct comparisons and explore the scalability of the proposed methodology. Finally, Section 5 serves as a concluding segment, summarizing the principal conclusions drawn from our research and offering insights into potential avenues for future investigations.
2 General Concepts

Physics-Informed Neural Networks
The fundamental approach employed in PINN methodologies [55] involves leveraging neural networks as powerful tools for approximating functions and solving physical problems governed by systems of partial differential equations (PDEs). PINNs derive their name from the fact that they incorporate physical knowledge through the introduction of inductive biases. These biases are manifested in various aspects of the methodology, including the design of the underlying neural network architecture, the formulation of appropriate cost functions (losses), and other characteristics that aim to ensure or enhance the convergence of the neural model. The underlying algorithm of these networks leverages the automatic differentiation capabilities found in contemporary frameworks [60] to construct differential equations based on the output variables obtained from the network. These variables are essential for computing the specific problem at hand. After performing these calculations, a minimization process is employed, guided by a designated loss function, to update the trainable parameters of the architecture. Consequently, this adjustment aligns the network with the requirements of the physical framework.
Taking a broad perspective while maintaining generality, let us denote by U := U(t, x) a collection of physical variables that serve as the output of the neural network. These variables, along with their derivatives, are the components of a system of PDEs defined within a domain of interest Ω over a specific time interval [0, T]. Consequently, it is possible to write the following definition, where D corresponds to the spatial dimension of the problem and Ω ⊂ R^D. The operator F defined in Equation (2) can be conceptualized as the comprehensive collection of physical constraints inherent to the system, which must be fulfilled in order to satisfy the underlying PDEs. It is worth noting that, in addition to these constraints, supplementary limitations can be established, such as the initial conditions that dictate the evolution of the system, or potential boundary conditions that may influence the behavior of the system at the spatial boundaries of the domain. Beyond the aforementioned conditions, other factors can be taken into consideration, such as imposing additional constraints at the final time (final conditions) or at specific points of significant physical relevance within the spatiotemporal framework. Furthermore, if actual experimental measurements are available for a subset of the domain, they can also be incorporated. Consequently, each of these physical conditions represents a segment of a priori knowledge regarding the physical scenario that can be integrated into the cost function as a separate term (referred to as "soft enforcement"), as denoted by L_i in Equation (5), where L_F corresponds to the metric pertaining to the underlying system of PDEs, while the collection (ω_F, ω_i, ...)
represents the weights assigned to each term within the mixture. The neural architecture employed in this methodology yields a set of essential physical variables, denoted as U(t, x; Θ) := U_Θ(t, x), where Θ encompasses all the trainable parameters of the network that are updated during the training process. Consequently, the output aims to closely align with the corresponding real-world values. The constraints expressed in (3) and (4) can be transformed into cost functions by employing a suitable difference metric, such as the Mean Squared Error (MSE) or similar approaches. The determination of these cost functions, together with L_F, can be outlined as follows.
Here, the set (N_IC, N_B, N_F) represents the number of sampled points in each respective domain considered. Additionally, R_k denotes the fulfillment of a specific physical constraint imposed at the level of the PDE. Through the fundamental methodology employed in PINN models, it becomes feasible to smoothly enforce the imposed constraints by adjusting the PDEs associated with the given problem. Nonetheless, alternative approaches, such as the "hard enforcement" method [61], propose compelling the neural network to satisfy predetermined constraints from the onset of training by modifying the output of the network. This technique necessitates incorporating several constraints directly, but it entails establishing a specific dependence on the problem at hand. Other researchers achieve a certain level of independence concerning the set of weights (ω_R, ω_i, ...) in (5) through the application of the Augmented Lagrangian method [62]. This technique involves updating the corresponding multipliers during training in accordance with the degree of violation of each respective condition.
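As a minimal illustration of the soft-enforcement scheme described above, the sketch below composes a total loss from weighted MSE terms, in the spirit of Equations (5)-(8). The residual arrays, function names, and weight values are hypothetical placeholders for illustration, not the ones used in this work.

```python
import numpy as np

def mse(residual):
    # Mean squared error over a batch of residual values, as in the
    # per-condition losses of Equations (6)-(8).
    return float(np.mean(residual ** 2))

def total_loss(r_pde, r_ic, r_bc, w_pde=1.0, w_ic=1.0, w_bc=1.0):
    # Soft enforcement: every physical condition enters as a weighted term,
    # mirroring L = w_F * L_F + sum_i w_i * L_i in Equation (5).
    return w_pde * mse(r_pde) + w_ic * mse(r_ic) + w_bc * mse(r_bc)

# Hypothetical residuals sampled at N_F collocation, N_IC initial
# and N_B boundary points.
rng = np.random.default_rng(0)
r_pde = rng.normal(scale=0.1, size=1000)  # F(U_Theta) at collocation points
r_ic = rng.normal(scale=0.01, size=64)    # mismatch with the initial condition
r_bc = rng.normal(scale=0.01, size=64)    # mismatch with the boundary condition
loss = total_loss(r_pde, r_ic, r_bc, w_pde=1.0, w_ic=10.0, w_bc=10.0)
```

In a full PINN the residuals would come from automatic differentiation of the network output; here they are sampled arrays so the weighting logic stands alone.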

Physical background
Combinatorial and optimization problems are of great interest in many industrial and research fields [23,46]. Adiabatic Quantum Computing (AQC) algorithms are used to solve this kind of problem, and they are expected to outperform classical computers in the current NISQ era [63,64]. In this paradigm, we can prepare a Hamiltonian, the driving or mixing Hamiltonian, in an initial ground state and evolve it towards a desired final operator, whose ground state encodes the solution to the underlying problem or may even be the solution itself [43,46]. This initial or driving Hamiltonian operator should be easy to prepare and evolve. We shall commence by establishing the Hamiltonian operator associated with the adiabatic transition of the system, H_AD(t), which is characterized by its energy eigenvalues, E_n(t), and corresponding eigenstates, |n(t)⟩, as determined by Equation (9). A time-dependent Hamiltonian, such as the one defined in (9), could generally lead to modifications in the quantum states it governs. Nevertheless, when these changes are sufficiently minute, analytically tractable, and under controlled conditions, the adiabatic theorem [32] ensures that the system, assuming non-degeneracy in energy, will remain close to its initial energy state throughout the temporal evolution. Considering the aforementioned perspective, it is consistently feasible to formulate a Hamiltonian operator, denoted as H(t), that exhibits a direct correlation with H_AD(t) and accurately reproduces the temporal progression of its eigenstates. In other words, it possesses transition amplitudes between energy levels that are precisely zero [42]. This operator will be designed to satisfy the Schrödinger equation in Equation (10), where |ψ_n(t)⟩ can be defined in terms of the corresponding eigenstate of the operator in (9), |n(t)⟩. These states represent each of the n states driven by H_AD(t) in a particular system. Through rigorous derivation and computation, one can obtain a plausible analytical
representation for the operator H(t). For a detailed derivation, the interested reader is encouraged to refer to the works of [42,65,66]. This operator, defined in (10), effectively governs the evolution of energy states within the framework of H_AD(t), ensuring the absence of transitions between them.
The Hamiltonian operator H_CD(t), corresponding to the second term in (10), can be interpreted as the operator responsible for capturing the counterdiabatic effects during the system evolution. These effects facilitate the acceleration of the underlying system dynamics by introducing additional accessible degrees of freedom, allowing the same results as the entire adiabatic evolution of the system to be reached while eliminating any possible transition between eigenstates [64,67], as previously stated. These frameworks, well known as counterdiabatic protocols, effectively expedite adiabatic processes by introducing novel terms that precisely offset any excitations that may arise during the system acceleration, and they have been recently developed as part of the STA methods. With this in mind, it is feasible to extend the theoretical deduction by incorporating a set of time-dependent external parameters, denoted as λ(t), upon which all our operators will depend [44]. By doing so, when retrieving Equation (10), it follows that the temporal derivatives of the |n⟩ states can be written as in Equation (11). In general, this parameterization could encompass a collection of values λ := (λ_1, ..., λ_N). However, to align with the latest literature in the field, we will confine ourselves to a single scalar function. Consequently, λ(t), which usually receives the name of scheduling function in the literature, carries out the counterdiabatic driving through its temporal derivative. Indeed, it is evident that in the limit of zero velocity, dλ/dt → 0, the Hamiltonian in Equation (12) simplifies to the adiabatic operator, as anticipated during its construction. When extrapolating this understanding to contemporary digitized versions of algorithms within the realm of quantum circuits [43], it is customary to initiate the process with a Hamiltonian whose ground energy state can be readily prepared in practical settings, denoted as H_initial. Under
adiabatic conditions, it is imperative for this operator to undergo a gradual transformation over time, eventually converging to the target Hamiltonian corresponding to the specific problem under investigation, written as H_problem.
In Equation (13), the scheduling function can once again be identified. Within the time interval t ∈ [t_min, t_max], it is necessary for λ(t) to fulfill the conditions λ(t_min) = 0 and λ(t_max) = 1 based on its functional definition, which enables the interpolation of the process from the initial Hamiltonian to the desired endpoint of the process.
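To make the interpolation concrete, the sketch below assumes the standard linear-interpolation form H_AD(t) = (1 − λ(t)) H_initial + λ(t) H_problem and a hypothetical sin² scheduling function that satisfies the boundary conditions λ(t_min) = 0, λ(t_max) = 1; in the proposed methodology, λ(t) is instead learned by the network.

```python
import numpy as np

def scheduling(t, T):
    # A hypothetical smooth schedule with lambda(0) = 0 and lambda(T) = 1.
    # This is an assumed closed form for illustration only.
    return np.sin(np.pi * t / (2.0 * T)) ** 2

def h_adiabatic(t, T, h_initial, h_problem):
    # Assumed interpolation: H_AD(t) = (1 - lambda) H_initial + lambda H_problem,
    # in the spirit of Equation (13).
    lam = scheduling(t, T)
    return (1.0 - lam) * h_initial + lam * h_problem

# Hypothetical single-qubit example: interpolate from sigma_x to sigma_z.
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
T = 1.0
```

At t = 0 the operator reduces to the initial Hamiltonian and at t = T to the problem Hamiltonian, which is exactly the interpolation property the boundary conditions on λ(t) guarantee.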
On the other hand, the A_CD(t) operator (also written in the literature as A_λ) is defined as the Adiabatic Gauge Potential.
Obviously, this operator should fulfill the hermiticity condition, i.e., A_CD = A_CD^†, in order to be interpreted as a physical observable. It is also easy to show that the potential A_CD(t) satisfies the condition (15), which is equivalent to minimizing the action S defined in Equation (14) for the Hilbert-Schmidt operator G_λ [44,68]. Consequently, Equation (15) can be understood as the Euler-Lagrange equation resulting from the minimization of the physical action.
This term should establish a connection between the aforementioned external parameter λ(t) and the instantaneous eigenstates. Finding these exact CD terms is not easy without spectral information of the system, and they are usually approximated using the NC approach [23,43,46] or through Variational Circuits [63]. This leads to a set of possible local CD terms, and different techniques have been developed to determine which of them are the most adequate for the particular problem. Nonetheless, even though these methods are of great interest and may constitute the state of the art, they are still approximations. Consequently, there could be other relevant terms and aspects that are not considered within these methodologies.

Quantum circuit design for the H_2 ground state problem
The main numerical application of this paper is to find the ground state of the H_2 molecule in the STO-3G basis for different bond distances, where this STO-3G basis corresponds to a minimal set that uses three Gaussian functions to approximate the atomic orbitals [69]. The problem can be described with the 2-qubit Full Configuration Interaction (FCI) Hamiltonian in Equation (16), where the values of the coefficients vary with the bond distance. This Hamiltonian has been obtained using the well-known Qiskit module [70]. In a different domain, it is well known that the Pauli matrices comprise a set of Hermitian and unitary 2 × 2 complex matrices, namely {σ_X, σ_Y, σ_Z}, which are highly suitable for representing quantum operators and physical observables. These matrices, together with the identity matrix I (also called σ_0), form an orthogonal basis of the Hilbert space of 2 × 2 Hermitian matrices. Similarly, when dealing with a system of N_Q qubits, the space of 2^{N_Q} × 2^{N_Q} Hermitian matrices can be decomposed by means of tensor products involving the aforementioned set (the interested reader is referred to [71] Chapter 2 and [72] Chapter 3). This procedure, referred to as the Pauli basis expansion, enables us to achieve a comprehensive representation of any Hermitian operator on a system comprising a certain number of qubits. From the perspective of quantum circuits, this approach offers a structured and concise depiction of quantum operations and physical observables, thereby facilitating the analysis and manipulation of quantum systems.
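The Pauli basis expansion mentioned above can be sketched directly: the coefficient of each tensor-product basis element P is c_P = Tr(P H)/2^{N_Q}, and summing c_P P recovers the operator. The helper names below are illustrative, not part of any library.

```python
import numpy as np
from itertools import product
from functools import reduce

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_decompose(h):
    # Expand a Hermitian 2^N x 2^N matrix in the tensor-product Pauli basis:
    # c_P = Tr(P H) / 2^N, so that H = sum_P c_P P.
    n = int(np.log2(h.shape[0]))
    coeffs = {}
    for labels in product("IXYZ", repeat=n):
        p = reduce(np.kron, (PAULIS[l] for l in labels))
        c = np.trace(p @ h) / 2 ** n
        if abs(c) > 1e-12:  # keep only numerically significant terms
            coeffs["".join(labels)] = c
    return coeffs

def pauli_reconstruct(coeffs, n):
    # Inverse map: rebuild the operator from its Pauli coefficients.
    h = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for labels, c in coeffs.items():
        h += c * reduce(np.kron, (PAULIS[l] for l in labels))
    return h
```

For a Hermitian input all coefficients come out real, which is the property that later allows the CD terms to be reported as a real coefficient vector over the Pauli basis.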
H_FCI = c_1 σ_Z^(1) ⊗ I^(2) + c_2 σ_Z^(1) ⊗ σ_Z^(2) + c_3 σ_X^(1) ⊗ σ_X^(2) + c_4 I^(1) ⊗ σ_Z^(2) + c_5 I^(1) ⊗ I^(2). (16)

This corresponds to H_problem in Equation (13). In this context, the numeric superscripts enclosed in parentheses indicate the specific qubit on which the written operator acts, thereby resulting in distinct matrices associated with each one. The symbol ⊗ denotes the Kronecker product. Furthermore, the coefficients c_1 and c_4 are written separately with the same operator since they correspond to different atoms in the molecule, even though they have the same interaction. The FCI Hamiltonian in Equation (16) can also be written in matrix form, Equation (17).
As detailed before, we start from a driving Hamiltonian that should be easy to prepare. For this particular problem, the Hartree-Fock (HF) approximation Hamiltonian, Equation (18), is used, with its matrix form written in Equation (19). It is easy to see that both the FCI and the HF Hamiltonians are real-valued, since they are composed only of the identity matrix, I, and the Pauli matrices σ_X and σ_Z.
The HF method approximates the ground state of the system using a single Slater determinant. The contributions of both the initial and final Hamiltonian operators in the adiabatic evolution can be determined using advanced numerical algorithms [70]. As stated before, the primary objective is to solve the adiabatic temporal evolution while considering non-adiabatic terms. The goal is to accelerate this evolution without allowing unexpected transitions between energy states. By incorporating the adiabatic operator defined in Equation (13) into the total Hamiltonian operator described in Equation (12), and considering that the initial and final Hamiltonians are represented as shown in Equations (16)-(19), with H_initial := H_HF and H_final := H_FCI = H_problem, we can establish the PDE given in Equation (20).
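As a hedged sketch of this construction, the snippet below assembles 2-qubit Hamiltonians from Kronecker products of I, σ_X and σ_Z and checks the real-valuedness noted above. All coefficient values, and the diagonal structure assumed for the HF operator, are placeholders for illustration; the physical values depend on the bond distance and would come from electronic-structure calculations such as Qiskit's drivers.

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

# Hypothetical FCI coefficients (placeholders, not physical values).
c = [0.3, 0.2, 0.1, -0.3, -1.0]
h_fci = (c[0] * np.kron(sz, I2) + c[1] * np.kron(sz, sz)
         + c[2] * np.kron(sx, sx) + c[3] * np.kron(I2, sz)
         + c[4] * np.kron(I2, I2))

# Hypothetical Hartree-Fock driving Hamiltonian, sketched here as
# diagonal in the computational basis (an assumption for illustration).
d = [0.4, -0.4, -1.0]
h_hf = d[0] * np.kron(sz, I2) + d[1] * np.kron(I2, sz) + d[2] * np.kron(I2, I2)
```

Because only I, σ_X and σ_Z appear, both matrices are real and symmetric, which is the property exploited in the text.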

Related work
The primary objective of CD driving processes is to attain a gauge potential, A_CD(t), by expediting the adiabatic process through a temporal velocity parameter, specifically the time derivative of the scheduling function, dλ/dt. The aforementioned operator possesses the property that the product (dλ/dt) A_CD in Equation (20) fully mitigates the occurrence of non-adiabatic transitions that would otherwise manifest in a specific eigenstate |n(t)⟩ under the total H(t), which has undergone evolution from the initial state |n(t = 0)⟩ of the control Hamiltonian, H_AD(t) [73]. Therefore, the aim of the related methodologies is to ensure precise compensation of the adiabatic terms in the evolutionary dynamics within the moving frame [40]. This potential, comprising the non-adiabatic compensations, represents an operator that is typically intricate to derive. While an exact numerical construction is possible for certain systems with a small number of particles, resolving it for systems with numerous bodies (a substantial number of qubits) becomes exceedingly challenging, if not impossible, due to the requirement of diagonalizing the resulting Hamiltonian observable across the entire Hilbert space. Furthermore, A_CD often entails the emergence of highly intricate and non-local couplings, thus limiting its implementation to specific cases [74][75][76]. Part of the research focused on robustly obtaining the potential A_CD has led to works such as [77], where fast-forward methods are presented through which it is possible to obtain an objective wave function in a shorter time. This investigation is conducted within the field of microscopic quantum mechanics, aiming to delve into the macroscopic domain primarily governed by the Schrödinger equation. These inquiries are expounded upon in references such as [36,78]. Despite these notable progressions, there presently exists no viable approach to extend these studies to the context of many-body systems. Consequently, these methodologies are not applicable to complex problem sets. On the other hand, several recent studies, such as [63], have suggested alternative approaches for addressing the aforementioned issues. The authors suggest employing Variational Circuits (VC) in conjunction with classical optimizers to optimize the choice of CD terms, treating them as trainable parameters; this approach was tested on the specific case of an Ising model with nearest-neighbor interactions and compared with some state-of-the-art QAOA methodologies [23,79,80]. Some assumptions are made regarding the form of the function λ(t), thereby predefining a temporal parameterization, which may exhibit certain dependencies on the problem under investigation.
Returning to the latest approaches to derive A_CD(t) as described in the existing literature, based on its definition in Equation (12) and assuming that the parameterization is completely determined by the function λ(t), it is straightforward to arrive at Equation (21) by differentiating Equation (9) [42]. However, it can be readily observed from Equation (21) that determining the operator A_CD(t) can become highly intricate and computationally expensive, especially when dealing with many-body problems. This potential necessitates the exact diagonalization of the operator H_AD(t) over time; additionally, the energy differences E_m − E_n can introduce complications, leading to divergent and mathematically ill-defined matrix elements. In an attempt to address these challenges, the authors in [45] suggest employing the so-called NC method for approximating the representation of A_CD(t).
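For small systems, the spectral construction behind Equation (21) can be carried out explicitly. The sketch below implements the textbook matrix-element formula ⟨m|A_CD|n⟩ ∝ iħ⟨m|∂_λH|n⟩/(E_n − E_m) for m ≠ n (up to sign and unit conventions, which vary in the literature), and makes the two pathologies discussed above visible: it requires a full eigendecomposition, and vanishing gaps E_n − E_m make the matrix elements ill-defined.

```python
import numpy as np

def gauge_potential(h, dh_dlam, hbar=1.0):
    # Exact adiabatic gauge potential via full diagonalization (the step that
    # becomes intractable for many-body systems).
    energies, states = np.linalg.eigh(h)
    dh_eig = states.conj().T @ dh_dlam @ states  # dH/dlambda in the eigenbasis
    a = np.zeros_like(dh_eig)
    for m in range(len(energies)):
        for n in range(len(energies)):
            gap = energies[n] - energies[m]
            # Off-diagonal elements only; (near-)degenerate gaps are ill-defined,
            # reflecting the divergences mentioned in the text.
            if m != n and abs(gap) > 1e-12:
                a[m, n] = 1j * hbar * dh_eig[m, n] / gap
    return states @ a @ states.conj().T  # rotate back to the original basis
```

For Hermitian H and ∂_λH the resulting operator is Hermitian, consistent with the requirement A_CD = A_CD^† stated earlier.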
Equation (22) presents a methodology for obtaining an approximate numerical representation of the operator A_CD. In this expression, l represents the order of the expansion, and the set of coefficients {α_k}_{k=1}^{l} can be determined by minimizing the action in Equation (14), specifically tailored to the l-th order. For further details and a comprehensive demonstration, interested readers are referred to [45], where it is shown that the exact potential A_CD is recovered in the limit as l tends to infinity. Notwithstanding its utilization as an approximation limited to an expansion order l, this ansatz has proven to be a methodological framework for attaining cutting-edge outcomes in this field of research [23,43,46,81,82]. In the absence of an analytical solution to the problem of the CD protocol, the most compelling alternative at our disposal is to employ techniques rooted in DL, which possess the capacity to comprehend and acquire knowledge of the fundamental physics governing the problem.
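A minimal sketch of the NC ansatz follows, assuming the odd-nested-commutator form A^(l)_λ = i Σ_{k=1}^{l} α_k [H, [H, …, [H, ∂_λH]…]] with 2k − 1 commutators at order k. The coefficient values passed in are hypothetical; in practice they result from minimizing the action of Equation (14) for the chosen order l.

```python
import numpy as np

def nested_commutator_ansatz(h, dh_dlam, alphas):
    # NC ansatz in the spirit of Equation (22):
    # A^(l) = i * sum_k alpha_k * O_{2k-1}, where O_1 = [H, dH/dlambda]
    # and O_{2k+1} = [H, [H, O_{2k-1}]].
    comm = lambda a, b: a @ b - b @ a
    o = comm(h, dh_dlam)          # one commutator (k = 1)
    a_cd = 1j * alphas[0] * o
    for alpha in alphas[1:]:
        o = comm(h, comm(h, o))   # two additional commutators per order
        a_cd = a_cd + 1j * alpha * o
    return a_cd
```

Each odd-nested commutator of Hermitian operators is anti-Hermitian, so multiplying by i yields a Hermitian ansatz at every truncation order, without ever diagonalizing H; only matrix products are needed, which is what makes the method scalable compared to the exact spectral formula.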

PINN framework
Our approach involves employing a Physics-Informed Neural Network (PINN) whose neural network structure comprises multiple fully connected dense layers, with the total number of layers denoted as K. This count includes both the input and output layers. Each layer consists of a variable number of neurons, represented by N_k, and is characterized by a common output activation function denoted as σ_k. Once specified, this activation function remains consistent across all neurons within a given layer. Let W^(k) ∈ R^{N_k × N_{k−1}} be the weight matrix connecting the (k − 1)-th layer to the k-th layer, and b^(k) ∈ R^{N_k} be the bias vector for the k-th layer. Consequently, the output of the k-th layer, denoted as U_Θ^(k), can be expressed as the application of the activation function σ_k to the weighted sum of the output of the previous layer, followed by the addition of the biases, as denoted in Equation (23). In this manner, contingent upon the specific problem under consideration, additional network parameters beyond weights and biases may be taken into account. Nevertheless, if only these two sets are considered, the trainable variables are limited to those typically found in a conventional network, denoted as Θ := {W^(k), b^(k)}_{1≤k≤K}. Regarding the activation functions, particularly those employed at the output layer, they need to be tailored to the requirements of the specific physical problem at hand. For instance, certain physical variables may be limited to a defined range of values. An example of such a case is the scheduling function as described in (20), which is constrained to the interval [0, 1] by definition, or the velocity of a specific fluid within relativistic scenarios, which is subject to an upper limit that cannot surpass the speed of light [57]. In the context of our specific problem, the sole independent variable is time, thereby allowing us to exclude spatial dimensions as inputs to the neural model. Consequently, by
considering (23) and the time interval t ∈ [0, T], the ensemble of output variables of the architecture can be expressed in the following manner:

U_Θ(t) = (U^(K) ∘ U^(K−1) ∘ ⋯ ∘ U^(1))(t),    (24)

where ∘ represents the composition operator. Equation (24) showcases the mathematical representation of the approximation proposed by the underlying neural network in our methodology, which aims to closely resemble the actual physical solution once the training process is complete. Recent research in the field asserts that PINNs enhance their performance by incorporating dynamic activation functions that vary during training and are distinct for each neuron [83]. However, our focus lies primarily on establishing the physical inductive biases of the network. Consequently, each layer k is associated with an activation function σ_k that acts uniformly on the output tensor of that particular layer.
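As a concrete illustration of Equations (23) and (24), the following is a minimal NumPy sketch of the layer-wise forward pass. The paper builds its network in PyTorch; the layer sizes, initialization, and activation choices below are illustrative assumptions (a sigmoid output is used only to exemplify a range-constrained variable such as λ ∈ [0, 1]).

```python
import numpy as np

def mlp_forward(t, weights, biases, activations):
    """Forward pass of Eq. (23): U^(k) = sigma_k(W^(k) U^(k-1) + b^(k)),
    composed over all layers as in Eq. (24)."""
    u = t
    for W, b, sigma in zip(weights, biases, activations):
        u = sigma(W @ u + b)
    return u

rng = np.random.default_rng(0)
sizes = [1, 30, 30, 3]                         # 1 input (time) -> 3 outputs
weights = [rng.normal(0, np.sqrt(2.0 / (m + n)), (n, m))   # Glorot-style init
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((n, 1)) for n in sizes[1:]]
# Sigmoid on the output layer keeps values in (0, 1), as needed e.g. for lambda(t).
activations = [np.tanh, np.tanh, lambda x: 1.0 / (1.0 + np.exp(-x))]

t = np.linspace(0.0, 1.0, 5).reshape(1, -1)    # batch of input times
out = mlp_forward(t, weights, biases, activations)
print(out.shape)  # (3, 5): three output variables per input time
```

The composition in Eq. (24) appears here simply as the loop over layers, each applying its own σ_k.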

Inductive biases and optimization
Our approach involves employing a methodology based on PINNs, which allows us to incorporate strong inductive biases and a priori knowledge into the neural network, ensuring that the underlying physics governing the problem is adequately satisfied. To achieve this objective, the output of the underlying network in our methodology consists of a set of variables denoted as U_Θ(t) := (λ, A_CD, C)_Θ. Here, λ ∈ R denotes the scheduling function, A_CD represents the counterdiabatic terms of the evolution, and C ∈ R^{4^{N_Q}} stands for the set of coefficients in which these counterdiabatic terms can be linearly decomposed, attending to all possible tensors arising from Kronecker products of the identity and Pauli matrices, {I, σ_X, σ_Y, σ_Z}, according to the number of qubits considered (see [84], Chapter 2.1). In general, A_CD is an operator composed of complex terms and will be of size 2^{N_Q} × 2^{N_Q}, N_Q being the number of qubits under consideration. Moreover, although the dependency may not always be explicitly stated, all variables emerging from the PINN depend on the input time; hence, the network yields solutions for each of the considered inference times.
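The Kronecker-product basis described above can be sketched as follows. This builds all 4^{N_Q} tensor products of {I, σ_X, σ_Y, σ_Z} and decomposes a Hermitian operator into real coefficients; the function names are illustrative, not the paper's implementation.

```python
import itertools
import numpy as np

# Single-qubit basis: identity and the three Pauli matrices.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_basis(n_qubits):
    """All 4**n_qubits Kronecker products of {I, X, Y, Z}."""
    basis = {}
    for labels in itertools.product("IXYZ", repeat=n_qubits):
        M = np.array([[1.0 + 0j]])
        for l in labels:
            M = np.kron(M, PAULIS[l])
        basis["".join(labels)] = M
    return basis

def decompose(A, basis):
    """Coefficients c_P = Tr(P A) / 2**n of A = sum_P c_P P (real for Hermitian A)."""
    dim = A.shape[0]
    return {label: np.trace(P @ A) / dim for label, P in basis.items()}

# Example: a random 2-qubit Hermitian operator, matching the H2 system size.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M + M.conj().T) / 2                       # make it Hermitian
basis = pauli_basis(2)
coeffs = decompose(A, basis)                   # 4**2 = 16 coefficients
A_rec = sum(c * basis[l] for l, c in coeffs.items())
print(np.allclose(A, A_rec))  # True: the decomposition is exact
```

Because the basis is orthogonal under the trace inner product, the decomposition is exact and the coefficients of any Hermitian operator come out real (up to floating-point noise).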
The underlying neural network will be optimized using the so-called "soft enforcement" technique, as introduced in Equations (6), (7), and (8). We adapt this methodology to our specific scenario of Hamiltonian dynamics, where the global cost function can be decomposed into multiple terms. Specifically, we address the initial and final time conditions for an input time interval t ∈ [t_min, t_max], as described in Equations (26) and (27), respectively.
In the definitions above, (ω_IC,1, ω_IC,2) and (ω_FC,1, ω_FC,2) are the mixture weights in the calculation of the total loss function, whose values depend on the applied knowledge as well as on the problem being treated, while N_IC and N_FC represent the number of sample points at the initial and final instants, respectively. Regarding the scheduling function of the problem, λ(t), this function delineates the progression of physical states after the introduction of the counterdiabatic terms stipulated in Equation (20). At the initial time instant we impose λ(t_min) = 0; similarly, at the terminal time instant we prescribe λ(t_max) = 1, as per its formal definition [46]. These conditions correspond to the physical constraints that must inherently be satisfied in our scenario. At the initial moment, t = t_min, we enforce the scheduling function to be zero while ensuring that the resulting Hamiltonian operator (20) equals the one obtained through the Hartree-Fock method (see [85], Chapter 2.2), i.e., H(t_min) = H_HF. Additionally, by incorporating counterdiabatic terms, our intention is to accelerate the adiabatic transition and reduce the computational complexity of the underlying circuits [23,43]. This, however, does not impede our knowledge of the final Hamiltonian operator, H_problem, which can be computed via advanced numerical methods in the chemistry field [70]. These conditions, combined with the requirement that the scheduling function equal one at the conclusion of the evolution, collectively constitute the final conditions mandated by the methodology.
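Equations (26) and (27) are not reproduced in this excerpt; the following is a plausible sketch of the boundary losses under the assumption of MSE-style penalties with the stated mixture weights (the weight values and function names are hypothetical).

```python
import numpy as np

def boundary_losses(lam_0, lam_T, H_0, H_T, H_HF, H_problem,
                    w_ic=(1.0, 1.0), w_fc=(1.0, 1.0)):
    """MSE penalties enforcing lambda(t_min)=0, lambda(t_max)=1 and the
    Hamiltonian endpoint conditions H(t_min)=H_HF, H(t_max)=H_problem."""
    L_ic = (w_ic[0] * np.mean(lam_0 ** 2)
            + w_ic[1] * np.mean(np.abs(H_0 - H_HF) ** 2))
    L_fc = (w_fc[0] * np.mean((lam_T - 1.0) ** 2)
            + w_fc[1] * np.mean(np.abs(H_T - H_problem) ** 2))
    return L_ic, L_fc

# Toy check: predictions that already satisfy the conditions give zero loss.
H_HF = np.diag([1.0, -1.0]).astype(complex)         # placeholder operators
H_problem = np.diag([0.5, -0.5]).astype(complex)
L_ic, L_fc = boundary_losses(np.zeros(8), np.ones(8), H_HF, H_problem,
                             H_HF, H_problem)
print(L_ic, L_fc)  # 0.0 0.0
```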
After establishing the initial and final inductive biases explicitly, it becomes crucial to elucidate the physics governing the intermediate time periods within the interval. With all the aforementioned components in place, we possess the necessary elements to construct the complete Hamiltonian operator for the given problem, as delineated in Equation (20). For the sake of coherence, we reiterate the expression here to avoid any disruption in the logical progression.
As is well known, Hamiltonian operators in their general form comprise complex numbers and must adhere to the property of Hermiticity. By satisfying this requirement, the operators can be appropriately interpreted as physical observables, enabling the extraction of purely real energy values. Consequently, our operators should possess Hermitian properties. Furthermore, it is imperative to impose the condition that the neural network yields the solution for the gauge potential (counterdiabatic terms) that achieves the greatest reduction of the physical action in the given scenario, i.e., the solution that minimizes the action. The minimization of the physical action is a crucial requirement for ensuring that the methodology yields an optimal operator A_CD(t) under specific conditions, encompassing local temporal development, robustness, availability, and experimental accessibility, among other factors. Research studies such as [64] and related literature demonstrate the role of the action in recovering the Euler-Lagrange Equation (15). Consequently, demanding that the neural network minimize the action is entirely equivalent to defining the term associated with the physical loss described in (30). Moreover, it is well established that the temporal rate of change of the scheduling function, dλ/dt := λ̇, represents the velocity at which the non-adiabatic components drive the evolution of the system in the presence of the total Hamiltonian operator. Consequently, when the derivative approaches zero, i.e., λ̇ = 0, the conventional adiabatic Hamiltonian is recovered. However, it is undesirable for the counterdiabatic terms to greatly surpass the adiabatic counterparts, as their purpose is to expedite the process without exerting dominant influence. Hence, the time derivative of λ must remain small, yet not entirely nullified. This particular information is communicated to the PINN through Equation (31).
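Equations (30) and (31) are not reproduced in this excerpt. As a hedged sketch, the action term can be written following the standard variational formulation of adiabatic gauge potentials (cf. the literature the paper cites as [64]), with an assumed quadratic penalty standing in for the adiabaticity term:

```python
import numpy as np

def action_loss(dH_dlam, A, H):
    """Variational action for the adiabatic gauge potential:
    S = Tr[G G^dagger] with G = dH/dlambda + i [A, H].
    Minimizing S over A selects the counterdiabatic terms.
    (Assumed form; the paper's Eq. (30) is not shown in this excerpt.)"""
    G = dH_dlam + 1j * (A @ H - H @ A)
    return float(np.real(np.trace(G @ G.conj().T)))

def adiabaticity_loss(lam_dot):
    """Keep the adiabatic speed d(lambda)/dt small but non-zero; a quadratic
    penalty is one plausible choice for Eq. (31)."""
    return float(np.mean(lam_dot ** 2))

# Toy 1-qubit example with a trial Hermitian gauge potential.
sz = np.diag([1.0, -1.0]).astype(complex)
A = 0.1 * np.array([[0, -1j], [1j, 0]])        # 0.1 * sigma_Y, Hermitian
print(action_loss(sz, A, 0.5 * sz) >= 0.0)     # True: the action is non-negative
```

Since G is Hermitian whenever A and H are, the action is a non-negative real scalar, which is what makes it usable directly as a loss term.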
As mentioned in Section 2.3, the set {σ_0, σ_X, σ_Y, σ_Z} ⊂ M_{2×2}(C) forms an orthogonal basis of the Hilbert space of 2 × 2 Hermitian matrices. Consequently, it is both logical and well founded to compute the operator A_CD as a composite of tensor products of the previously introduced Pauli matrices for a given system of qubits. By doing so, it is possible to construct a linear combination of tensor products of these matrices, yielding a set of coefficients C ∈ R^{4^{N_Q}}, each element of which represents the relative magnitude of the corresponding term of the operator. The decomposition expressed in Equation (32) therefore enables efficient simulations and facilitates a more accessible analysis of physical systems through quantum circuit models.
In this study, we employ A′_CD(t) to represent the non-adiabatic terms. This notation distinguishes the expansion, which takes the form of a linear combination, from the operator that serves as an output of the PINN. To achieve this objective, an additional term is introduced into the loss function of the neural network, which directly affects the set of coefficients C(t), as shown in Equation (33). Consequently, these terms are dynamically adjusted during the training process in order to construct the decomposition of the gauge operator. By employing this procedure and adhering to the prescribed requirement, our methodology can yield these scalar quantities not only at the initial and final moments but throughout the entire temporal interval, since these scalars are, in general, functions of time.
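The coupling term of Equation (33) can be sketched as a squared discrepancy between the PINN output A_CD and its expansion A′_CD; the Frobenius-norm form below is an assumption consistent with the MSE-style losses used elsewhere in the paper.

```python
import numpy as np

def coupling_loss(A_cd, coeffs, basis):
    """Squared Frobenius discrepancy between the PINN output A_CD and its
    Pauli expansion A'_CD = sum_i c_i P_i (a plausible form of Eq. (33))."""
    A_expanded = sum(c * P for c, P in zip(coeffs, basis))
    return float(np.sum(np.abs(A_cd - A_expanded) ** 2))

# 1-qubit illustration with the {I, X, Y, Z} basis.
basis = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
coeffs = np.array([0.0, 0.3, -0.2, 0.1])       # real coefficients C(t) at one time
A_cd = sum(c * P for c, P in zip(coeffs, basis))
print(coupling_loss(A_cd, coeffs, basis))      # 0.0 for an exact expansion
```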
Once all the requisite components have been specified according to Equations (30), (31), and (32), it becomes feasible to establish the loss term of our PINN that is solely linked to the underlying differential equations,

L_F = ω_Action L_Action + ω_Adiabaticity L_Adiabaticity + ω_Coupling L_Coupling.    (34)

In this context, the vector (ω_Action, ω_Adiabaticity, ω_Coupling) denotes the weights employed when constructing the resultant term. Thus, by considering Equation (5) and acknowledging the pre-established mixture weights within our loss terms, the formulation of the final loss function becomes straightforward:

L = L_F + L_IC + L_FC.    (35)

This incorporates the loss associated with the PDEs outlined in Equation (34), as well as the temporal constraints at the initial and final time steps indicated in (26) and (27). Consequently, the defined loss function serves as the physical metric that guides the optimization process, dictating the objective that the neural network must minimize.
In addition, the network has not been explicitly required to satisfy the necessary condition of operator hermiticity, even though this could be achieved by including an additional term in Equation (34) that minimizes the difference A′_CD − A′†_CD. Nonetheless, such a restriction is not obligatory and would be redundant for the PINN. If the coefficients C ∈ R^{4^{N_Q}} are defined as real, i.e., C = C*, then it is evident from the decomposition of A′_CD in Equation (32) that A′_CD − A′†_CD = 0 is recovered naturally, without any additional requirements. Therefore, the physical condition expressed in Equation (33) is more than sufficient for the neural network to ensure the hermiticity of the operator A_CD and, following [23], the hermiticity of the complete Hamiltonian operator is also ensured.
After considering all the relevant factors, a comprehensive description of our methodology can be provided. To this end, we use a visual aid in the form of a diagram (Figure 1) and a step-by-step algorithm outlined in Algorithm 1. The initial step involves identifying the independent variable(s) that will influence our outcomes. In our particular case, we only have the temporal variable t, defined within the interval [t_min, t_max], commonly specified as [t_min, t_max] = [0, 1]. Consequently, we need to select an appropriate sampling method. Various methods are available, including uniform sampling with equidistant points, random sampling, Latin hypercube sampling [86,87], and Sobol sampling [88]. The choice of method is somewhat arbitrary, with greater impact when considering fewer time points, since its significance diminishes as the sample size approaches infinity. In particular, Sobol sampling has exhibited significant advantages in prior inquiries within the relevant literature [89]. In our investigation we have employed this approach, which entails generating pseudo-random samples in powers of two, resulting in a more homogeneous distribution of points within the space, with reduced overlap compared to completely random sampling; nonetheless, it is important to acknowledge the inherent randomness involved. After generating the time domain, it serves as the input to the network, which is constructed as a sequence of K fully connected dense layers, encompassing both the input and output layers. The number of layers and the number of neurons in each layer (N_k) are hyperparameters predetermined prior to training. While these parameters can be adjusted, a relatively conventional configuration, such as 6 or 7 layers with 30 to 40 neurons each, is anticipated to suffice for characterizing the counterdiabatic driving problem. Nevertheless, the complexity of individual problems may necessitate varying numbers of layers and/or neurons.
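Sobol sampling of the time domain in powers of two can be done with SciPy's quasi-Monte Carlo module; the seed and scrambling choice below are illustrative.

```python
import numpy as np
from scipy.stats import qmc

# Sobol sequences are generated in powers of two to preserve their
# balance properties; d=1 because time is the only independent variable.
sampler = qmc.Sobol(d=1, scramble=True, seed=0)
t = sampler.random_base2(m=13).ravel()            # 2**13 points in [0, 1)

t_min, t_max = 0.0, 1.0
t = t_min + (t_max - t_min) * t                   # rescale to [t_min, t_max]
print(t.size, t.min() >= 0.0, t.max() < 1.0)      # 8192 True True
```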

Figure 1: The methodology follows a general procedure where time, represented by the variable t, is the only independent physical variable considered, incorporated into our network as a tensor. The output of the methodology comprises three elements: the scalar scheduling function λ(t); the operator representing the counterdiabatic terms of the process, A_CD(t), with each of its components treated as an independent output; and the coefficients C(t), the general decomposition coefficients of the operator A_CD expressed as a linear combination of tensors resulting from all possible Kronecker products formed between the identity matrix and the Pauli operators. Subsequently, the computations required to construct the total Hamiltonian H(t) are carried out. Various physical constraints, including inductive biases, are imposed during these computations: adherence to the principle of minimum action, satisfaction of the initial and final conditions, and the hermiticity of the physical operators, among other specifications. At each step of the training process, the total loss L is calculated by aggregating the contributions from all imposed conditions. This calculation continues until a specific training period is reached or a predetermined error threshold is achieved.
Algorithm 1: PINN algorithm for the counterdiabatic driving of a system of N_Q qubits.
Input: Physical domain: t ∈ [t_min, t_max].
Output: Set of variables produced by the PINN comprising the scheduling function, the operator controlling the counterdiabatic terms, and the coefficients of its associated Pauli decomposition, denoted as U_Θ(t) := (λ, A_CD, C)_Θ.
1. Specify the training domain: generate the temporal interval t using a Sobol sequence [88].
2. Construct the underlying neural network as a densely connected architecture of K layers, with a specific number of neurons per layer, and initialize the trainable parameters Θ = {W^(k), b^(k)}_{1≤k≤K} using the Glorot initialization method [90].
3. By automatic differentiation, compute the derivative dλ/dt from the output and construct the operator H(t) in accordance with expression (20).
4. Construct the initial and final losses in time using the MSE metric, L_IC (26) and L_FC (27), respectively.
5. Derive the full set of physical loss terms associated with the differential equations, incorporating the Principle of Least Action, adherence to adiabatic conditions, the decomposition in terms of Pauli tensor products and, as a direct consequence, the hermiticity of the physical operators; then formulate the aggregate physical loss L.
6. Update the network variables {W^(k), b^(k)}_{1≤k≤K} by backpropagation, minimizing the total loss with a chosen optimizer: Θ* := arg min_Θ L(Θ).
7. Repeat steps 3-6 for the required number of epochs until convergence is as desired or a maximum number of iterations is reached.
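The loop of steps 3-6 can be illustrated with a deliberately small stand-in: instead of a PINN with backpropagation, we fit a two-parameter scheduling function λ(t) = a·t + b·t² against a final condition λ(1) = 1 and a mild adiabatic-speed target, using finite-difference gradient descent. All weights and parameters here are hypothetical.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 64)

def loss(p):
    """Weighted sum of a final-condition term and an adiabatic-speed term,
    mimicking the structure of L = L_F + L_IC + L_FC (lambda(0)=0 is built in)."""
    a, b = p
    lam = a * t + b * t ** 2
    lam_dot = a + 2 * b * t
    L_fc = (lam[-1] - 1.0) ** 2             # final condition lambda(1)=1
    L_ad = np.mean((lam_dot - 1.0) ** 2)    # keep the adiabatic speed moderate
    return 10.0 * L_fc + L_ad               # hypothetical mixture weights

p = np.array([0.2, 0.5])                    # initial parameters
lr, eps = 0.02, 1e-6
for _ in range(2000):                       # "repeat steps 3-6"
    grad = np.array([(loss(p + eps * e) - loss(p - eps * e)) / (2 * eps)
                     for e in np.eye(2)])
    p -= lr * grad                          # gradient-descent update of Theta

a, b = p
print(round(a + b, 3))  # lambda(1) = a + b, expected close to 1
```

The structure mirrors the algorithm: evaluate the model on the sampled domain, assemble the weighted loss, and update the parameters until convergence.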
With the establishment of the input domain and the construction of the network, we are now capable of extracting the output and interpreting it as the tensor of physical variables U_Θ(t) = (λ, A_CD, C)_Θ, with λ ∈ R. Subsequently, the derivative dλ/dt can be straightforwardly computed using automatic differentiation, yielding the Hamiltonian operator H(t) as described in (20). Furthermore, the initial and final temporal constraints, defined in Equations (26) and (27), are necessary. These constraints are accompanied by the calculation of the loss associated with the underlying PDEs stated in Equation (34), which incorporates the physical constraints outlined in Equations (30), (31), and (33). These physical restrictions represent the fulfillment of the Principle of Least Action, the agreement with adiabaticity, and the decomposition of the operator A_CD(t), as previously explained. By combining all these components, the final loss metric L given in Equation (35) is computed, and the set of trainable variables Θ is updated via backpropagation, minimizing the loss through an optimization process.

Numerical experiments and results
In this section, various numerical results obtained through our DL-based methodology, as outlined in Section 3, are presented. This approach has been applied to the H_2 molecule problem in the STO-3G basis, considering different bond distances between the particles and utilizing a 2-qubit representation. The initial and final Hamiltonian operators used for the evolution, as denoted by the PDE in (20), are listed in Table 1 and were configured following the guidelines in [70]. Since our numerical example involves a 2-qubit representation, these operators have a matrix size of 4 × 4. Furthermore, these operators are real-valued by definition, i.e., H_HF, H_problem ∈ M_{4×4}(R). In a general context, and unless explicitly stated otherwise, all training runs comprised a total of 500,000 epochs (iterations) dedicated to updating the Θ parameters of the PINN. The training procedure utilized the PyTorch module [91], employing the Adam optimizer [92] with a fixed learning rate of 10^{-5}, chosen to be sufficiently small to ensure convergence without excessive numerical noise. Throughout all examples, the sampling method used has been Sobol sampling [88], as stated in Section 3. The number of sampling points at the initial and final time instants, N_IC and N_FC respectively, has remained constant at 2^13, while the number of points in the interior of the interval, linked to the underlying PDE and represented as N_F, was varied and examined in the experiments. Moreover, unless otherwise specified, the neural architecture used for all examples consists of six hidden layers with 30 neurons each.
Table 1: Numerical configurations for the initial Hamiltonian operator, H_HF, and the final Hamiltonian operator, H_problem, obtained using the quantum computing library Qiskit [70]. These configurations correspond to the evolution of the H_2 molecule in the STO-3G basis and are represented by their respective matrix descriptions given in Equations (19) and (17), respectively.

To illustrate initial general results, Figure 2 presents the physical loss functions employed for optimizing the neural network, as defined in Section 3. It is essential to emphasize that each training process exhibits, in general, distinctive loss curves, influenced by factors such as the number of qubits used for representation and the specific molecular system being studied, including the considered bond distances, among other features. However, the figure showcases the cost function for the specific scenario under investigation: the H_2 molecule, represented by a 2-qubit system in the STO-3G basis, with specifications outlined in Section 2.3. The left subfigure displays the three constituents comprising the total loss L (35), namely L_F, L_IC, and L_FC, corresponding to the residual conditions of the PDE, the initial conditions in time, and the final conditions, respectively. The latter two diminish rapidly, converging to magnitudes on the order of 10^{-4} or even lower, especially for L_FC, whose error is so small that any variation induces substantial noise on a logarithmic scale. Conversely, the residual loss L_F (34) encompasses three internal terms, generating discernible tension among them, which impedes its reduction to such small orders of magnitude. This becomes more evident in the right subfigure, which provides a detailed breakdown of these terms. The loss term responsible for minimizing the Principle of Least Action or, equivalently, the Euler-Lagrange equations (30) attains the lowest value, thus
ensuring a highly satisfactory minimization of the action and, consequently, the attainment of a gauge potential A_CD(t) that adheres closely to its prescribed guidelines. Additionally, the graphical representation includes a loss term denoted L_Hermiticity, which assesses the quadratic discrepancy between A_CD(t) and its adjoint operator, defined in a similar way to the other terms. However, this factor is not incorporated in the total loss; instead, it serves as a representation allowing us to confirm that minimizing the loss L_Coupling (33) through the decomposition of the gauge potential directly enforces its hermiticity, given the strictly real-valued nature of the coefficients C(t) involved in the decomposition (32). For all the examples shown in this article, the mixture weights of the individual loss terms have been preset as follows, with ω_IC and ω_FC the weights associated with conditions (26) and (27). These weights play a crucial role in determining the relative emphasis that the neural network places on each term, essentially representing their respective priorities. For our specific objectives, ensuring prompt and robust satisfaction of the initial and final conditions in time is of crucial importance; consequently, we set ω_IC and ω_FC to significantly larger values than ω_Action, ω_Ad, and ω_Coupling. Following the same reasoning, the minimization of the action is also essential in our methodology, as it leads us to identify the most appropriate operator A_CD(t) for the system [64], which we express through its linear combination (32). Consequently, these two weights are considerably higher, yet an order of magnitude lower than the first two. In particular, ω_Coupling is set slightly higher, as this condition inherently encompasses the constraint on the hermiticity of the operator. Lastly, the condition of adiabaticity is assigned the lowest weight, reflecting our intent to recover the limit of the adiabatic theory without permitting it to dominate the overall metric. The left subfigure illustrates the dynamic evolution of the components contributing to the total loss L, as defined in Equation (35). On the right, each individual constituent of the loss L_F is presented, corresponding to the different physical aspects under consideration. Note that the loss term L_Hermiticity is included in the plot although it is not used during training: it quantifies the discrepancy between A_CD and its adjoint operator and is provided solely for visualization, tracking its reduction relative to L_Coupling due to their mathematical equivalence.
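The paper's actual numerical weight values are not reproduced in this excerpt; the configuration below is a hypothetical instance that merely respects the stated ordering (boundary conditions dominant, coupling slightly above action, adiabaticity lowest).

```python
# Hypothetical mixture weights illustrating the stated priority ordering;
# the values themselves are placeholders, not the paper's settings.
weights = {
    "w_IC": 1e2,        # initial conditions (26) -- highest priority
    "w_FC": 1e2,        # final conditions (27)   -- highest priority
    "w_Coupling": 2e1,  # Pauli-decomposition coupling (33), also enforces hermiticity
    "w_Action": 1e1,    # least-action term (30)
    "w_Ad": 1e0,        # adiabaticity term (31)  -- lowest priority
}

assert weights["w_IC"] == weights["w_FC"]
assert weights["w_FC"] > weights["w_Coupling"] > weights["w_Action"] > weights["w_Ad"]
print("ordering consistent with the text")
```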
We can continue our analysis of the right-hand subfigure by focusing on the curve corresponding to the loss for the recovery of the adiabatic theory, L_Adiabaticity (31), represented as a solid black curve. This loss exhibits a marginal decline, commencing from a value close to zero at the onset of training. Subsequently, it rises to approximately 10^2 before descending to around 10^0, eventually maintaining a constant order of magnitude for the remainder of the evolution. The physical reason behind this phenomenon is clear: while the theory necessitates a decrease in the adiabatic speed dλ/dt, the PINN does not find it advantageous to reduce it below the values observed in our results. This is primarily attributed to the presence of multiple terms within the loss function, not solely related to adiabaticity recovery, which creates inherent tension in defining the physical problem as a whole. Consequently, the solution predicted by the methodology does not significantly benefit from further reductions in the adiabatic speed. The fast saturation of the adiabatic recovery is the main cause of the considerably elevated value of L_F observed in the left subfigure, underscoring the importance of isolating and analyzing each term separately. A zoom has therefore been applied to L_Adiabaticity between 30,000 and 50,000 iterations, a range during which the λ function transitions from a sigmoidal to a linear behavior, thereby assuming a temporal derivative of 1, i.e., dλ/dt = 1. A more comprehensive discussion of this topic is presented in the following section. Unless otherwise specified, all the results shown in this section have been obtained for the 2-qubit system representing the H_2 molecule in the STO-3G basis, following the guidelines of Section 2.3.

Scheduling function, λ(t)
The output of our methodology is threefold. Firstly, it enables the retrieval of the scheduling function λ as a time-dependent variable. Secondly, it facilitates the extraction of the matrix components (i, j) of the gauge potential A_CD(t) over time. Thirdly, it yields the components C(t) of its decomposition through time. This is made possible by a cost metric defined on the underlying PDE comprising three distinct terms, as stated in Equation (34); each of these terms compels the neural network to update one of the outputs while adhering to the imposed physical constraints. In particular, the term L_Adiabaticity plays a crucial role in enforcing the adherence of the function λ(t) to the physical evolution, thereby recovering, in the most accurate approximation, the adiabatic theory that has been empirically demonstrated to hold true. Among all the terms in the total cost function, the neural network pays the least attention to the requirement of recovering the fully adiabatic theory, L_Adiabaticity. As illustrated in Figure 2, our PINN ceases to actively optimize this particular term after approximately 50,000 iterations. Consequently, the scheduling function λ(t) converges to an optimal solution wherein its time derivative, representing the adiabatic velocity, maintains an approximately constant value of 1 throughout the temporal evolution. As a consequence, the optimal form of the λ function predicted for the system is the one that adheres to the condition λ(t) = t for the evolution, as governed by the counterdiabatic differential Equation (20). This is exemplified in Figure 3, where we show the values of λ(t) and its temporal derivative in the left and right subfigures, respectively, for different training steps, in both cases concerning the 2-qubit system representing the H_2 molecule. As observed, even after 15,000 and 25,000 epochs, the function maintains a sigmoidal shape, a widely employed representation in the existing literature [23,46]. This sigmoidal form is considered appropriate from a theoretical perspective because its derivative takes the form of a "bell curve", allowing the counterdiabatic terms A_CD to present higher values at intermediate stages of the time evolution while effectively being turned off at the initial and final instants. It should be noted, however, that this sigmoidal shape for λ emerges predominantly during the early phases of training, thereby helping to drive the convergence of the methodology towards a more optimal solution. From a theoretical point of view, our ultimate goal is to recover the fully adiabatic theory while incorporating the non-zero presence of counterdiabatic terms. Consequently, our neural network converges towards a function λ that precisely matches the temporal variable t. This outcome signifies the restoration of the original formulation of counterdiabatic driving, as explained in [42], thereby undoing the temporal parameterization of physical operators through a specific set of parameters λ(t), which, in our case, corresponds to a single scalar parameterization. Through this procedure, our objective is to begin with a differential equation containing non-zero counterdiabatic terms and then recover the adiabatic theory in the limiting case, obtaining all the necessary results in accordance with the theory of adiabaticity. In Figure 4, we present the profiles of λ(t) and its time derivative for the counterdiabatic (CD) protocol from Equation (20) on the left, while on the right we depict the same results without the presence of counterdiabatic terms, i.e., working directly with the adiabatic theory (13). For both cases we obtain λ(t) = t, except for some numerical noise at the edges of the time interval, which is directly related to the nature of the automatic differentiation process [60,93,94]. Notably, during the initial stages of training, the neural network capitalizes on the sigmoid-shaped λ(t) while simultaneously adjusting other physical conditions, which aids the network in achieving a more optimal solution in recovering the adiabatic theory.

Temporal evolution of the H(t) operator and the energy levels
From the three outputs obtained from the network (see the diagram in Figure 1), all the necessary information can be derived. As an initial step, considering the gauge potential A_CD(t) that minimizes the physical action of the system, the total Hamiltonian operator H(t) can be computed. The time evolution of all its components is illustrated in Figure 5. Notably, H(t) is of size 2^{N_Q} × 2^{N_Q}, where N_Q represents the number of qubits, implying that for a 2-particle system both H(t) and the remaining operators have a matrix size of 4 × 4. In both depictions, the inherent hermiticity of the observable is evident: the real part of the components exhibits symmetry with respect to the diagonal, while the imaginary part displays complete antisymmetry, leading to diagonal elements as close to zero as possible (on the order of 10^{-3} to 10^{-4}). This condition can be further reinforced by increasing its relative weight, ω_Coupling. In the mentioned figure, the real and imaginary parts of the distinct components of the operator are depicted within the same graph, distinguished by black and blue colors, respectively, and with individual scales. Given the fulfillment of hermiticity for H(t), we can now extract its instantaneous eigenvalues, thereby obtaining the energy levels of the 2-qubit physical system representing the H_2 molecule under investigation. The operator H(t), as defined in Equation (20), is specifically designed to yield the eigenstates |n(t)⟩ of H_AD(t) exactly over time, ensuring that no transitions between energy levels E_n(t) are possible [42]. This property holds for all eigenstates of H_AD(t), allowing us to interpret the set of states |n(t)⟩ as "moving eigenstates" of the total operator H(t). Consequently, we can extract the energy levels corresponding to H(t) throughout the entire time evolution and compare them to the energy levels obtained under the assumption of a completely adiabatic transition, i.e., considering dλ/dt = 0. Figures 6 and 7 depict the real and imaginary components, respectively, for the two scenarios considered: the CD protocol in the top row, and a totally adiabatic transition in the bottom row. These investigations were conducted for the H_2 molecule. To broaden the scope of our study, we examined different bond distances between particles, d ∈ {1.0, 1.5, 2.0, 2.5} Å. The numerical values of the corresponding initial (H_HF) and final (H_problem) Hamiltonian operators are given in Table 1. Notably, Figure 7 reveals that the imaginary part of the obtained eigenvalues (energies) is on the order of 10^{-3}, indicating their proximity to zero; these values therefore have negligible influence on observable analyses. This outcome arises directly from the hermiticity of the physical observable and, by adjusting the weight ω_Coupling, the enforcement of this property can be further strengthened. Moreover, in view of the formulation of the Hamiltonian operator in the entirely adiabatic scenario, H_AD (13), and the initial and final operators detailed in Table 1, the scalar nature of the function λ(t) ensures the absence of complex numbers. As a result, the imaginary component of the energy levels in the completely adiabatic case is strictly zero, as evidenced in the bottom row of Figure 7.
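The extraction of instantaneous energy levels from a Hermitian H(t) can be sketched as follows. The linear interpolation between endpoint operators stands in for the adiabatic limit; the 4 × 4 matrices are random placeholders, not the Table 1 values.

```python
import numpy as np

def instantaneous_energies(H_t):
    """Instantaneous energy levels E_n(t) for a time series of Hermitian
    Hamiltonians; eigvalsh returns real eigenvalues, sorted ascending."""
    return np.array([np.linalg.eigvalsh(H) for H in H_t])

# Toy example: interpolate between two real 4x4 Hermitian operators, as in an
# adiabatic-limit sweep parameterized by lambda (placeholder matrices only).
rng = np.random.default_rng(2)
H_HF = rng.normal(size=(4, 4)); H_HF = (H_HF + H_HF.T) / 2
H_problem = rng.normal(size=(4, 4)); H_problem = (H_problem + H_problem.T) / 2

lam = np.linspace(0.0, 1.0, 50)
H_t = [(1 - l) * H_HF + l * H_problem for l in lam]
E = instantaneous_energies(H_t)
print(E.shape)  # (50, 4): four energy levels at each of 50 time steps
```

Using `eigvalsh` (rather than `eigvals`) exploits hermiticity directly, guaranteeing real, sorted eigenvalues E_0(t) ≤ E_1(t) ≤ E_2(t) ≤ E_3(t) at every time step.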
Regarding the real component of the energies, which is of primary interest, we observe that in the case of a fully adiabatic transition these energies exhibit nearly linear temporal evolution, with diminishing separation as the particle bonding increases. The close proximity of energy levels leads to an unfavorable outcome: transitions between them become more likely. This challenge is addressed by the CD protocol, under which the energy levels tend to be notably more separated throughout the entire evolution domain. The phenomenon is most noticeable for the ground state energy, E_0, and the first excited level, E_1. Notably, when the bond distance d is set to 2.0 Å or 2.5 Å, these two energy levels remain substantially distant, especially at the initial stages of the evolution of the system. This is highly desirable from an experimental perspective, as it minimizes the probability of transitions between energy levels, especially between the ground and the first excited state, given that the system is initially prepared in the E_0 level.

Figure 7: Time-dependent variation of the imaginary component of the eigenvalues E_n(t) investigated for the molecular system H2, represented by a 2-qubit configuration in the STO-3G basis. The computational analysis encompasses two scenarios: one employing the CD protocol (top row), and the other considering a purely adiabatic transition (bottom row). In the former case, the imaginary components exhibit magnitudes of the order of 10^-3, whereas in the latter case they are precisely zero, as dictated by the definition of the underlying PDE (13).

A CD (t) operator and its decomposition
Our methodology enables us to obtain the components of the gauge potential, A_CD(t), directly. Consequently, we can visualize the temporal evolution of both its real and imaginary components. Analogously to the presentation of H(t) in Figure 5 above, we display the temporal evolution of the operator A_CD in Figure 8 of this section, differentiating the real and imaginary components on two distinct scales. Observing the plot, it becomes evident that, as a physical observable from an experimental standpoint, the real part of the operator, Re(A_CD), is completely symmetric, while its imaginary part, Im(A_CD), is fully antisymmetric. This property follows directly from the natural hermiticity of the operator, which, in turn, is imposed indirectly by the condition L_Coupling (33) together with C(t) ∈ R^(4^N_Q).
While exploring the components of A_CD(t) holds theoretical significance, our primary focus from an experimental point of view lies in obtaining the set of coefficients C(t) over time. These coefficients play a crucial role: they allow the gauge potential to be expressed as a linear combination over all possible combinations and interactions between the qubits of the system, thereby enabling a better implementation in a real quantum circuit [23,46]. The theoretical formulation of this decomposition is given by Equation (32), wherein the potential is expressed using the required Kronecker products. This formulation takes into account all possible combinations for the 2-qubit system under investigation. Given the relatively small size of our example system, it remains feasible to consider all possible combinations of the Pauli matrices, as their number scales as 4^N_Q, where N_Q is the number of qubits. Nevertheless, in certain experimental scenarios it may be unnecessary to explore all combinations and interactions. In such cases, specific specializations and additional requirements can be applied, guided by the methodology we present.
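The decomposition in Equation (32) amounts to projecting the operator onto the orthogonal basis of Pauli strings, with coefficients C_P = Tr(P A) / 2^N_Q. The following self-contained sketch (our own illustration, not the paper's implementation) performs this projection for an arbitrary number of qubits:

```python
import numpy as np
from itertools import product

# single-qubit Pauli basis {I, X, Y, Z}; labels name the coefficients, e.g. "XY"
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_decompose(A, n_qubits):
    """Project a (2**n x 2**n) operator A onto the 4**n Pauli strings.

    Since Tr(P Q) = 2**n * delta_PQ for Pauli strings P, Q, the
    coefficients are C_P = Tr(P A) / 2**n, and A = sum_P C_P * P.
    """
    dim = 2 ** n_qubits
    coeffs = {}
    for labels in product("IXYZ", repeat=n_qubits):
        P = np.array([[1.0 + 0j]])
        for l in labels:
            P = np.kron(P, PAULIS[l])   # build the Kronecker-product string
        coeffs["".join(labels)] = np.trace(P @ A) / dim
    return coeffs

# verify on a random 2-qubit example: decompose, then reconstruct
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
C = pauli_decompose(M, 2)
R = sum(c * np.kron(PAULIS[k[0]], PAULIS[k[1]]) for k, c in C.items())
assert np.allclose(R, M)  # the 16 strings form a complete operator basis
```

When A is Hermitian, every Tr(P A) is real, which is consistent with the constraint C(t) ∈ R^(4^N_Q) used in our loss formulation. Restricting the loop to a chosen subset of labels implements the experimentally motivated truncation mentioned above.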
In Figure 9, we present the temporal evolution of the coefficients derived from the decomposition for the H2 system in the STO-3G basis, as a function of the bond distance d between the particles. The upper row illustrates the evolution itself, while the lower row displays a bar chart of the specific coefficients arranged in descending order based on their average values throughout the observed time interval.
This visualization enables us to identify the most significant contributions, in terms of absolute value, for implementing this system in a real quantum circuit. Notably, the two coefficients that exhibit the highest values are C_XY and its symmetric counterpart C_YX. These findings align with previous literature employing the NC methodology [23,46]. Our approach naturally reveals these prominent contributions, providing an explicit understanding of their respective orders of magnitude. Additionally, there are less prominent contributions, such as C_ZZ, C_XX, and C_II, which warrant consideration as well. The determination of these coefficients is universally applicable, i.e., it emerges as an outcome of implementing our methodology for any system, irrespective of the number of qubits considered. Nevertheless, as noted in the preceding section, certain instances may arise in experimental scenarios where it becomes impractical to account for all possible contributions in the decomposition of A_CD(t). In such circumstances, it would be interesting to adapt the method and restrict the output C(t) of the neural network to a subset of the entire set, which becomes particularly relevant when dealing with an increasing number of qubits. Consequently, a trade-off between purely theoretical outcomes, exemplified herein, and those of particular experimental significance may always exist.

Scalability
So far, we have conducted a study on a 2-qubit system representing the H2 molecule in the STO-3G basis. However, it is straightforward to modify the basis or adjust the number of qubits used in our methodology. This process primarily involves examining the matrix dimensions of the respective network outputs. Specifically, the matrices H, A_CD, H_HF, and H_problem are elements of the matrix space M_(2^N_Q × 2^N_Q)(C), while C(t) belongs to R^(4^N_Q). Consequently, the number of components of the variables obtained grows with the number of qubits, but it is important to note that the neural architecture for all calculations has consistently comprised six internal layers, each containing 30 neurons. This architectural choice is widely documented in the literature and has produced state-of-the-art outcomes across various domains of research [56,57,95].
In general, the scale and arrangement of the neural architecture of the PINN depend on the specific problem being addressed, i.e., they are more or less directly related to the difficulty of the underlying PDEs. In our case, we are dealing with a single differential equation, which essentially defines H(t) in Equation (20) and involves only one derivative. Consequently, our chosen neural network architecture efficiently addresses the problem of the CD protocol, as increasing the number of trainable weights usually does not translate into better results. Moreover, such an increase could even negatively impact computation time and model convergence during backpropagation [96]. Apart from the H2 molecule, others can also be considered in the STO-3G basis, such as lithium hydride, LiH, which can be represented using 4 qubits. The initial and final Hamiltonian operators of the process can again be computed using [70].
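For concreteness, the fixed architecture used throughout (six hidden layers of 30 neurons each, with time t as the sole input) can be sketched as a plain NumPy forward pass. The tanh activation and the 16-component output head (e.g. the coefficients C(t) for 2 qubits) are illustrative assumptions, since those details are not restated in this section:

```python
import numpy as np

def init_mlp(sizes, rng):
    """Xavier-initialized weights and biases for a fully connected network."""
    return [(rng.normal(0, np.sqrt(2.0 / (m + n)), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, t):
    """Forward pass mapping sampled times (n_points, 1) to raw outputs."""
    h = t
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)  # tanh: a common PINN activation (assumed here)
    W, b = params[-1]
    return h @ W + b            # linear output layer

# six hidden layers of 30 neurons, as used for every calculation in the paper;
# the output width 16 is an illustrative choice for the 2-qubit case
rng = np.random.default_rng(2)
params = init_mlp([1] + [30] * 6 + [16], rng)
t = np.linspace(0.0, 1.0, 128)[:, None]   # N_F = 128 sampled time points
out = forward(params, t)
```

Counting parameters for this shape gives roughly 5,200 trainable weights, which illustrates why enlarging the network yields little benefit for a problem governed by a single first-order differential equation.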
Discussing scalability within these methodologies holds substantial importance, as it elucidates the extent to which an approach is experimentally applicable. Undoubtedly, our DL-based methodology provides a wealth of information, as we have shown throughout the paper. However, when addressing the scalability of the method, we must examine two key factors: the number of qubits and the number of points sampled within the time interval (N_F), in conjunction with the graphics processing unit (GPU) or hardware employed as a whole. All computations employed an NVIDIA A16 card with a memory capacity of 16 GB. Consequently, we present in Figure 10 (top row) a comprehensive analysis of the final physical loss (or final total residual), which marks the conclusion of network training, as a function of the number of points N_F within the interval (t_min, t_max). This analysis encompasses diverse bond distances for the H2 molecule (left) and a single value of d for the LiH molecule (right), solely to facilitate comparisons. The physical loss depicted in the top row of the figure has been normalized with respect to the minimum value observed for that magnitude across all training sessions considered in this section, encompassing both the H2 and LiH simulations. This normalization, denoted L_min, allows us to represent the loss in a manner independent of specific training instances. An analysis of the results reveals that the overall loss for the LiH case is marginally higher, although the values are within the same order of magnitude as the H2 case. It is worth noting that both simulations employed an identical neural architecture comprising 6 internal layers, each containing 30 neurons. Furthermore, a common training regimen of 500,000 iterations (epochs) was applied to all cases, so a longer training duration would likely result in reduced final physical errors. The consistency in architecture and training duration enables us to draw meaningful comparisons between the two simulations and to evaluate their respective final performances fairly.
On the other hand, to effectively quantify the computational time involved in the various training sessions and facilitate a meaningful comparison, it is imperative to consider external factors that could impact the simulations. These include the specific GPU utilized and the available free memory in the CPU (as some calculations are delegated to the CPU), among others. In the lower section of the figure, we present a comparative analysis between the two molecules concerning the time consumed (ΔT), also normalized with respect to the minimum value obtained for this metric, denoted T_min. The results demonstrate that, for the H2 molecule, increasing the number of data points N_F from 2^7 to 2^14 multiplies the compute time consumed by approximately 3.5. However, it is crucial to highlight that with only N_F = 2^11 points the final physical error L obtained is already minimal. This observation indicates that augmenting the sampled time domain does not necessarily enhance the performance of the model. In other words, with this number of data points in the training domain, the PINN exhibits significant capabilities in making inferences, extrapolating information to new temporal instances, and interpolating. Consequently, the adoption of 2^11 data points implies a multiplicative factor of slightly more than 1.5 relative to the minimum elapsed computation time, and it can be regarded as a favorable trade-off between training time and performance.
Concerning the training for the LiH molecule, which involves 4 qubits, a significant increase in the required time is observed compared to the case of H2 with 2 qubits. This discrepancy arises because the latter requires a decomposition of the gauge potential A_CD(t) comprising 16 possible combinations, each represented by a 4 × 4 matrix, whereas for lithium hydride there are 256 possible combinations in total, each represented by a 16 × 16 matrix. Consequently, both hardware memory and training time experience a considerable surge from one case study to the other. It is important to note, however, that this increase in resources is essential for extracting all possible information from the problem: the scheduling function, all components of the gauge potential at each time instant, as well as the instantaneous values of each coefficient of the decomposition. In practical situations, theoretical analysis often focuses on a subset of all possible interactions of the qubits constituting the system. By reducing the number of interactions from 256 to a more manageable figure, the problem becomes more amenable to study. Under these circumstances, the primary contributors to memory and computing time consumption are L_Coupling (33) and L_Least Action (30). The former involves the simultaneous manipulation of numerous matrices, while the latter requires performing two commutators. Moreover, both terms span the N_F defined points in time, which further adds to the computational complexity.
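The 16-versus-256 scaling above follows directly from the dimensions involved; a small bookkeeping sketch (our own illustration, with a single-time-instant memory estimate assuming complex double precision) makes the jump explicit:

```python
# Resource bookkeeping for the full Pauli decomposition of A_CD(t):
# operators are 2**N_Q x 2**N_Q and the basis contains 4**N_Q Pauli strings.
def decomposition_size(n_qubits):
    dim = 2 ** n_qubits        # matrix dimension of H, A_CD, ...
    n_strings = 4 ** n_qubits  # number of Pauli strings / coefficients C(t)
    # storing the full basis at one time instant as complex128 (16 bytes/entry)
    basis_bytes = n_strings * dim * dim * 16
    return dim, n_strings, basis_bytes

for nq in (2, 4):
    dim, n, b = decomposition_size(nq)
    print(f"N_Q={nq}: {n} strings of size {dim}x{dim} ({b / 1024:.0f} KiB)")
```

For H2 (N_Q = 2) this gives 16 strings of size 4 × 4; for LiH (N_Q = 4), 256 strings of size 16 × 16, a 256-fold growth in basis storage per time instant before multiplying by the N_F sampled points.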

Discussion and conclusions
In this study we have shown that deep learning methodologies such as Physics-Informed Neural Networks (PINNs) can be used to tackle the problem of counterdiabatic (CD) driving for analog quantum computation, deviating from conventional numerical methodologies [45]. Our proposed method is a generalized, problem-independent methodology that offers a comprehensive solution, encompassing the determination of counterdiabatic terms and the temporal parametrization, and empirically demonstrates that the adiabatic theory holds. The suggested approach provides a unified and effective way to resolve the underlying physics of the problem while also yielding the theoretical Pauli matrix decomposition needed for experimental implementation. Furthermore, the CD approach achieves a greater separation between the ground state, E_0, and the first excited level, E_1, throughout a significant portion of the temporal progression. This is a desirable property from an experimental perspective, as mentioned in Section 4.2.
Several questions emerge from these findings. Firstly, an exploration of the computational capacity regarding the maximum qubit count achievable through PINN-based simulation is desirable, i.e., scalability. Secondly, there is an opportunity to enhance our methodology by integrating recent PINN advancements, including the incorporation of causality as an additional constraint and enabling the network to autonomously learn activation functions. Lastly, restricting the theoretical Pauli matrix decomposition to encompass hardware-specific limitations introduces the prospect of improving the operational performance of analog quantum computers within experimental contexts.
Currently, all methodologies formulated within this context have been exclusively applied to systems comprising two and four qubits, as discussed in Section 4.4. Indeed, our study reveals that the training duration of a PINN for a 4-qubit system is approximately 15 times greater than that required for the 2-qubit counterpart. However, it is imperative to acknowledge that, empirically, the possible permutations of gates and qubit interconnections are usually restricted. Hence, despite the potential exponential increase in training time for an N-qubit system, the imposed experimental limitations render it feasible to effectively train the methodology for a substantial number of qubits.
Aside from reducing the problem, it is also possible to improve the overall performance of our proposal by forcing it to respect the causality principle [97], further restricting the time evolution of our Hamiltonian operators. This approach is sufficient to achieve substantial improvements in physical accuracy; therefore, it is feasible to reduce the number of temporal points used during training without degrading the performance of the network. Moreover, implementing dynamically trainable activation functions may help improve performance and convergence time [83].
Furthermore, the physical losses presented in Figure 2 are around 10^-4, underscoring the need to understand the precision attainable by a base PINN in the context of CD protocols. Enhancing the presented methodology could entail the imposition of temporal causal prerequisites and the optimization of the neural architecture. The mixture coefficients (36) within the loss function are a predetermined selection based on inductive physical biases, specifying greater weights for the initial and final loss components to steer the progression of the system. The other constituents associated with physical conditions are hyperparameters that can be selected through iterative experimentation. These coefficients are amenable to alteration during the training process, potentially through techniques such as the Augmented Lagrangian approach [62], which adapts them according to the deviation from each physical condition. Consequently, the presented approach offers opportunities for improving the results and mitigating physical losses, enhancing, where feasible, the robustness of the model from a physical perspective.
In conclusion, our work has shown that PINN-based approaches are a promising methodology that can be effectively used to optimize CD protocols for adiabatic quantum computing. However, despite the substantial advancements achieved within this context, ample opportunities for enhancement remain. In particular, it would be worth studying the impact of these results on digitized-counterdiabatic quantum computing (DCQC) algorithms [98,99]. The aforementioned questions stand as open paths for our future research, which aims to evolve and elaborate upon them.

Figure 2 :
Figure 2: Analysis of the evolution of the loss function during the training process. On the left we illustrate the dynamic changes of the components contributing to the total loss, L, as defined in Equation (35). On the right, each individual constituent of the loss, L_F, is presented, corresponding to the different physical aspects under consideration. Note that the loss term L_Hermiticity is included in the plot although it remains undefined and unused throughout the training: it quantifies the discrepancy between A_CD and its adjoint operator and is provided solely for visualization purposes, tracking its reduction alongside L_Coupling due to their mathematical equivalence.

Figure 3 :
Figure 3: Analysis of the scheduling function λ and its temporal derivative at distinct iterations (epochs) of the training process. During the initial stages of neural optimization, λ exhibits characteristics resembling a sigmoidal function; however, as training advances, it converges towards a linear behavior (dλ/dt ≈ 1).

Figure 4 :
Figure 4: Evolution over time of the scheduling function λ(t) (solid black) along with its derivative (dashed blue), as predicted by our methodology for the H2 molecule in the STO-3G basis set using the CD protocol (left). On the right, the same is depicted for a fully adiabatic evolution according to expression (13).

Figure 5 :
Figure 5: Evolution of the real and imaginary components of the Hamiltonian operator H(t) for the H2 molecule, examined using a bond distance of d = 1.0 Å. Black and blue colors denote the real and imaginary parts, respectively, using dashed lines for the latter and a different scale for each. The findings reveal substantial fluctuations in the values across various time scales. Notably, the natural symmetry and antisymmetry of the real and imaginary components, respectively, arise from the hermiticity of the operator.

Figure 6 :
Figure 6: Temporal evolution of the real component of the instantaneous energy levels, namely the eigenvalues E_n(t), describing the molecule H2 within a 2-qubit system in the STO-3G basis. These computations are conducted for diverse values of the interparticle bond distance, d. A comparative analysis of the energy levels is presented, showing the results obtained from the CD protocol (20) in the top row, juxtaposed against the levels obtained from the same methodology but with a fully adiabatic transition (13) in the bottom row. Notably, the energy levels tend to exhibit greater separation under the CD protocol, a phenomenon that becomes particularly pronounced at d = 2.0 Å and d = 2.5 Å.

Figure 8 :
Figure 8: Time evolution of the real and imaginary components of the gauge potential A_CD(t) for the H2 molecule using the STO-3G basis and a bond distance of d = 1.0 Å, represented respectively in black and blue using two different scales. The values exhibit variations across different orders of magnitude. Notably, the symmetry and antisymmetry of these components arise naturally over time due to the hermiticity of the operator.

Figure 9 :
Figure 9: In the upper row, we present the temporal evolutions of the coefficients C(t) resulting from the decomposition of the operator A_CD(t) (32) applied to the H2 molecule using a 2-qubit configuration in the STO-3G basis. In the lower row, a bar chart illustrates the average values of each coefficient, arranged in descending order of magnitude. Both the coefficient C_XY and its symmetric counterpart C_YX exert the most substantial influence throughout the entire process, followed by C_II, C_ZZ, and C_XX.

Figure 10 :
Figure 10: Graphical investigation of the scalability of our methodology for the H2 molecule in the STO-3G basis. Various bond distances of the hydrogen atoms are considered, represented by distinct colors as indicated in the legend on the right. The top of the graph illustrates the total physical loss L (35) after completing the training process, plotted against the number of points N_F considered within the domain t ∈ (t_min, t_max). The bottom of the graph shows the time required to complete the entire training process, normalized to the minimum time. These experiments were conducted on an NVIDIA A16 GPU.