High-dimensional multi-fidelity Bayesian optimization for quantum control

We present the first multi-fidelity Bayesian optimization (BO) approach for solving inverse problems in the quantum control of prototypical quantum systems. Our approach automatically constructs time-dependent control fields that enable transitions between initial and desired final quantum states. Most importantly, our BO approach gives impressive performance in constructing time-dependent control fields, even for cases that are difficult to converge with existing gradient-based approaches. We provide detailed descriptions of our machine learning methods as well as performance metrics for a variety of machine learning algorithms. Taken together, our results demonstrate that BO is a promising approach to efficiently and autonomously design control fields in general quantum dynamical systems.


Introduction
Inverse problems pose some of the most intriguing yet difficult questions in the physical sciences as they start with an observed effect and require us to calculate its cause. This is distinctly different from the more common forward (or initial value) problem, which starts with the cause and subsequently calculates the effects. In the context of chemical/material systems, quantum optimal control [1] is one of the best-known examples of an inverse problem; it seeks to drive the system from a known initial state to a desired target state using a tailored external electromagnetic pulse, E(t). Constructing E(t) is central to providing critical initial conditions for time-resolved experiments on light-harvesting complexes [2][3][4][5][6], quantum information processing [7][8][9], and laser cooling [10][11][12][13]. Controlling these phenomena could have ground-breaking implications in chemistry and materials science since it would allow manipulation of these systems at a time-resolved quantum level of detail.
Although several theoretical approaches and algorithms [14][15][16][17][18][19][20][21] have been used to construct optimal quantum control fields, most of them are iterative and require intricate numerical techniques to achieve convergence. The nonlinear nature of these dynamic optimization problems results in a large number of floating point operations [16,22], rendering them computationally intractable for high-dimensional quantum systems. To address these computational bottlenecks, we previously applied the first 'conventional' machine learning approaches, including convolutional neural networks [23] and reinforcement learning [24], to solve quantum control problems of continuous systems (i.e. systems characterized by a continuous Hilbert space as opposed to finite-dimensional spin-1/2 systems). While these previous machine learning approaches showed some success, the training and validation processes in these approaches were quite computationally intensive.
To address these computational issues, we present the first high-dimensional Bayesian optimization (BO) machine learning approach for efficiently solving time-dependent quantum control problems in reduced-dimensional chemical systems. While Bayesian statistics have been used in this context before [25][26][27], those studies did not perform BO, a particular approach to optimization. One notable exception [28] optimizes time-varying control; however, it focuses on how to adapt the noise model and considers controls with five degrees of freedom (compared with the 22 in this work). These dynamical time-dependent systems pose a unique challenge for machine learning techniques, and we investigate a variety of approaches for predicting optimal control fields, E(t), in these systems. We extend the existing sparse axis-aligned subspace BO (SaasBO) method to allow for multiple computational fidelities to speed up computation. This novel method, multi-fidelity SaasBO (MFSaasBO), performs better than other computational approaches.
We first briefly outline the basic concepts of quantum control and the data inputs/outputs used by the machine learning approaches in our work. Next, we describe the details of our BO approach and compare its performance against conventional gradient ascent methods typically used in quantum control algorithms. Finally, we conclude with a brief discussion and perspective look at potential future applications and extensions of our machine learning approach.

Brief overview of quantum optimal control
Since the primary objective of this study is to harness BO techniques to control dynamic chemical systems, we only present a concise introduction to quantum optimal control. Interested readers are directed to relevant topical reviews [29][30][31][32] and our prior work [23,33] for further details. For chemical systems, the quantum optimal control formalism commences with the time-dependent Schrödinger equation for describing the temporal dynamics of nuclei, which, in atomic units, is given by

i ∂ψ(x, t)/∂t = [−(1/(2m)) ∂²/∂x² + V(x) − µ(x)E(t)] ψ(x, t).   (1)

In the equation above, x signifies the reduced coordinate along a chosen reaction path [34][35][36][37], m is the effective mass associated with the molecular motion along this path [38,39], V(x) is the Born-Oppenheimer electronic energy of the molecule, µ(x) is the dipole moment function, E(t) is the time-dependent external electric field, and ψ(x, t) represents the probability amplitude for nuclear motion along the reduced coordinate path. Both V(x) and µ(x) can be obtained from a standard quantum chemistry calculation of a relaxed potential energy scan [40,41]. With x and V(x) properly chosen or computed, equation (1) is an initial value problem that poses the question: 'Given an electric field E(t), how does an initial state, ψ_0(x, t = 0), evolve after some final time T has elapsed?' However, as described in the introduction, quantum optimal control instead seeks the answer to the inverse question: 'To achieve a desired final state ψ_{N−1}(x, t = T) at time T (after N − 1 propagation steps), what functional form should E(t) take?' More precisely, quantum control seeks the functional shape of an external electric field, E(t), that maximizes the functional P[ψ_{N−1}, E] given by

P[ψ_{N−1}, E] = |⟨ψ_f | ψ_{N−1}⟩|²,   (2)

where ψ_f is a desired final target wavefunction provided by the user, and ψ_{N−1} is obtained after applying N − 1 consecutive propagation steps of the time-dependent Schrödinger equation (i.e. equation (1)). In short, equation (2) mathematically measures the similarity of the final target and the propagated wavefunction. Providing accurate and efficient answers to this inverse question is the ultimate goal of the BO approaches described in this work.

Bayesian optimization
BO [42,43] is an optimization process that finds the location x ∈ R^d corresponding to the global maximum (or minimum) of a function f : R^d → R. It is predicated on the assumption that f is a computationally expensive black-box function. BO expedites the optimization of f by using all prior evaluations of f to select the next point x at which to evaluate f. BO tracks a posterior distribution over the function f conditioned on the evaluated points thus far, and it uses this distribution over the space of functions to select the next point to query, with the goal of constraining the position of the maximum of the function as quickly as possible.
The process of BO starts by sampling the function f on a small subset of the input domain {x_1, x_2, ..., x_t}. These points are used to construct the conditional probability distribution P(f | f(x_1), f(x_2), ..., f(x_t)), also known as the surrogate model. This model function is much cheaper to compute than f and can estimate f(x) (and the uncertainty about the value of f(x)) for unobserved values of x.
This surrogate model drives the choice of the next value of x to query; f (x) is computed for this new x, the surrogate is updated to include the new information, and the optimization process continues.
To select the next sample point x given the surrogate function, BO relies on an acquisition function that maps x to a heuristic value of the benefit of selecting x as the next point to evaluate. The general form of the BO procedure is as follows:
a. Choose a surrogate model for modeling the true function f and define its prior.
b. Given a set of data or observations D_t = {x_i, y_i}_{i=1,...,t−1}, where x_i is the input chosen to be queried on the ith iteration and y_i = f(x_i) is the function evaluated at x_i, obtain the surrogate posterior P(f | D_t).
c. Use an acquisition function α(x), which is a function of the posterior, to find the next point x_t = arg max_x α(x).
d. Evaluate y_t = f(x_t), add {x_t, y_t} to D_t, and repeat the procedure from step b until the given budget (number of iterations or total computation time) elapses.
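The loop above can be sketched end-to-end on a toy one-dimensional problem (a minimal illustration, not the paper's implementation: the objective, kernel length scale, and grid-based maximization of the acquisition function are all illustrative choices):

```python
import math
import numpy as np

def f(x):
    """Toy expensive black-box function (1D); its maximum is at x = 0.5."""
    return -(x - 0.5) ** 2

def kernel(a, b):
    """Squared-exponential kernel on scalar inputs (illustrative length scale)."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / 0.05)

def posterior(xs, ys, xq):
    """Step b: GP posterior mean/std at query points xq, conditioned on D_t."""
    K = kernel(xs, xs) + 1e-8 * np.eye(len(xs))
    Ks = kernel(xq, xs)
    mu = Ks @ np.linalg.solve(K, ys)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sd, y_best):
    """Step c: the EI acquisition function alpha(x)."""
    z = (mu - y_best) / sd
    Phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - y_best) * Phi + sd * phi

# Step a: the surrogate (a zero-mean GP) is fixed above; sample an initial design.
rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, 3)
ys = f(xs)
grid = np.linspace(0.0, 1.0, 201)  # acquisition maximized on a dense grid
for t in range(10):                # steps b-d, repeated for a fixed budget
    mu, sd = posterior(xs, ys, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sd, ys.max()))]
    xs = np.append(xs, x_next)     # step d: evaluate f and grow D_t
    ys = np.append(ys, f(x_next))

best_x = xs[np.argmax(ys)]
```

After the budget is spent, the best queried point is returned; on this toy quadratic the loop homes in on the maximizer near x = 0.5.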

Gaussian processes
BO usually uses a Gaussian process (GP) [44] as its surrogate model. A GP can be defined as an infinite collection of random variables, indexed by x, such that the joint distribution of any finite subset of the variables is a multivariate Gaussian. The surrogate model assumption of a GP is denoted f ∼ GP(µ, k), where the parameters of the process are the mean function µ(x) and the kernel/covariance function k(x, x′).
Unless we have some knowledge about the properties of our function f, the mean function µ is typically assumed to be zero.The kernel encodes information about the prior distribution's smoothness: how similar in value (correlated) two function values f (x) and f(x ′ ) are based on the distance between x and x ′ .
A myriad of kernel functions are possible. The squared-exponential or radial-basis function (RBF) kernel is most commonly chosen because it is infinitely differentiable and decays as a function of distance [45]:

k(x, x′) = σ² exp(−Σ_d ρ_d (x_d − x′_d)²),   (3)

where σ² is the variance of f(x) at any point x and ρ_d is the inverse squared length scale associated with dimension d. Given any set of points {x_1, x_2, ..., x_t}, the corresponding function values {f(x_1), f(x_2), ..., f(x_t)} are jointly Gaussian:

(f(x_1), ..., f(x_t)) ∼ N(m, Σ), with m_i = µ(x_i) and Σ_ij = k(x_i, x_j),   (4)

where N(m, Σ) denotes a multivariate normal distribution with mean vector m and covariance matrix Σ.
To model the predictive distribution P(f(x_t) | f(x_1), ..., f(x_{t−1})), we formulate the joint distribution as in equation (4) and condition it on all values except f(x_t), resulting in a normal posterior distribution over f(x_t) via standard multivariate normal distribution formulas. We denote the mean and standard deviation of this posterior by µ_t(x) and σ_t(x), respectively.
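This conditioning can be sketched directly with the standard multivariate-normal formulas (a minimal illustration; the function names and fixed hyperparameters are our own):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2=1.0, rho=None):
    """k(x, x') = sigma^2 exp(-sum_d rho_d (x_d - x'_d)^2), with rho_d the
    inverse squared length scale of dimension d (the RBF kernel above)."""
    if rho is None:
        rho = np.ones(X1.shape[1])
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 * rho).sum(-1)
    return sigma2 * np.exp(-d2)

def gp_posterior(X_train, y_train, X_query, sigma2=1.0, rho=None, jitter=1e-9):
    """Posterior mean and std of f at X_query, obtained by conditioning the
    joint Gaussian over all points on noiseless observations (X_train, y_train)."""
    K = rbf_kernel(X_train, X_train, sigma2, rho) + jitter * np.eye(len(X_train))
    Ks = rbf_kernel(X_query, X_train, sigma2, rho)
    Kss = rbf_kernel(X_query, X_query, sigma2, rho)
    mu = Ks @ np.linalg.solve(K, y_train)           # posterior mean mu_t(x)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)       # posterior covariance
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # posterior std sigma_t(x)
    return mu, sd
```

At observed points the posterior mean reproduces the data and the posterior standard deviation collapses to (numerically) zero; far from all data it reverts to the zero prior mean with the full prior variance σ².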

Acquisition function
In BO, the acquisition function is used to estimate the benefit of next evaluating f at point x. Common acquisition functions include expected improvement (EI) [46], probability of improvement (PI) [47], and upper confidence bound [48]. PI chooses the next query point with the highest probability of being larger than the best point found so far, which often results in under-exploration and missing the global optimum. EI was proposed as an alternative to PI that chooses the point with the highest expected improvement over the best point found so far. EI performs poorly when sampling from low-fidelity evaluations as it does not consider correlation between different fidelities [49]. The knowledge-gradient (KG) [50] acquisition function is similar to EI, but allows for the possibility that the new maximum of the surrogate model may not be at the newly chosen point. If the evaluation of f is noiseless, this is not the case and EI and KG are the same.
As the function f in our case is deterministic, we will mostly focus on utilizing the EI method. However, we revisit KG in section 2.6. The goal of EI is to choose the next query point where the expected improvement over the current best point found so far is the highest. Let y⁺ = max_i y_i be the best value found so far. The analytical expression of EI for a GP surrogate model is [51]

EI_t(x) = E[max(f(x) − y⁺, 0)],

where the expectation is with respect to the surrogate model's posterior distribution over the value of f(x), conditioned on all of the observed data so far. If the surrogate is a GP, the posterior of f(x) is a normal distribution and the analytic form for the EI can be written as a function of the mean and the standard deviation of the posterior distribution of f(x):

EI_t(x) = g(y⁺, µ_t(x), σ_t(x)) ≡ (µ_t(x) − y⁺) Φ(Z) + σ_t(x) ϕ(Z), with Z = (µ_t(x) − y⁺)/σ_t(x),

where Φ(•) and ϕ(•) are the CDF and PDF of the unit normal distribution. When µ_t(x) − y⁺ (the difference between the expected value at x and the best value found so far) or the uncertainty of the value at the point x, σ_t(x), is high, EI will be high.
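The closed-form expression can be wrapped in a small helper, here named g to match the notation used later in the text (a sketch; the guard for σ = 0 handles the deterministic limit):

```python
import math

def g(y_best, mu, sigma):
    """Expected improvement over y_best for a Gaussian posterior N(mu, sigma^2):
    (mu - y_best) * Phi(Z) + sigma * phi(Z), with Z = (mu - y_best) / sigma."""
    if sigma <= 0.0:
        return max(mu - y_best, 0.0)  # no uncertainty: improvement is deterministic
    z = (mu - y_best) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (mu - y_best) * Phi + sigma * phi
```

For example, g(0.0, 0.0, 1.0) evaluates to ϕ(0) ≈ 0.3989, and the value grows with either the predicted improvement µ_t(x) − y⁺ or the uncertainty σ_t(x), as stated above.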

High-dimensional Bayesian optimization
BO becomes challenging as the input dimension grows. Learning the probabilistic surrogate model (a GP in our case) becomes exponentially complex as the number of parameters increases [52]. Many methods have been proposed to address the problems of BO in high-dimensional spaces. Most of them optimize the objective function in a low-dimensional space by mapping the high-dimensional space to a low-dimensional one [52][53][54][55][56].
Others assume additive structures in the objective function [57][58][59][60][61]. Another approach uses a cylindrical transformation [62] for scaling to higher dimensions. Li et al [63] use a dropout strategy that randomly selects some dimensions and fills in the dropped-out dimensions using different methods, such as random values. Other techniques include a geometry-aware BO method based on Riemannian manifolds [64]. A trust-region-based method has also been proposed that confines the search to promising regions [65].
We found that SaasBO [66] outperforms other methods for our quantum control problem. It places a prior distribution over the parameters of a GP surrogate. In particular, it places a half-Cauchy distribution over each ρ_d (see equation (3)). This encourages the inverse (squared) length scales to be near zero, thus encouraging the surrogate to model f as depending only on a few dimensions (those with inverse length scales that are non-zero). However, half-Cauchy distributions have heavy tails, and thus with adequate evidence, the posterior distributions over ρ_d will be directed to larger values, allowing BO to explore more complex functions. Because SaasBO places a distribution over the parameters of the GP, the surrogate is no longer a GP, but rather an infinite mixture of (or distribution over) GPs. The probability distribution of the function f conditioned on D is

P(f(x) | D) = ∫ P(f(x) | µ, k, D) P(µ, k | D) dµ dk,

where P(f(x) | µ, k, D) is the posterior GP distribution over f(x) given mean function µ, kernel function k, and data D (as in section 2.3). This integral is approximated as an average of L samples drawn from P(µ, k | D) (the posterior distribution over the GP parameters):

P(f(x) | D) ≈ (1/L) Σ_{l=1}^{L} P(f(x) | µ_l, k_l, D),

where µ_l and k_l are the mean and kernel functions from the lth sample of GP parameters. The samples are drawn from the posterior distribution over the parameters of the GP using a variant of Markov chain Monte Carlo, the no-U-turn sampler (NUTS) [67].
Thus the approximation of the posterior over f is a mixture of GPs. For an acquisition function like EI, its evaluation is the average over the samples of the same formula for a GP:

EI_t(x) ≈ (1/L) Σ_{l=1}^{L} [(µ_t^l(x) − y⁺) Φ(Z_l) + σ_t^l(x) ϕ(Z_l)], with Z_l = (µ_t^l(x) − y⁺)/σ_t^l(x),   (9)

where µ_t^l(x) and σ_t^l(x) are the posterior mean and standard deviation of f(x) for the GP using the lth sample from NUTS.
The query point x_t is obtained by maximizing the EI function using L-BFGS-B [68].
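The effect of the heavy-tailed half-Cauchy prior on the inverse squared length scales can be seen by direct sampling (the scale τ = 0.1 is purely illustrative; SaasBO treats the global scale as a hyperparameter with its own prior):

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.1                            # illustrative global shrinkage scale
# A half-Cauchy(tau) sample is the absolute value of a Cauchy(0, tau) sample.
rho = np.abs(tau * rng.standard_cauchy(100_000))  # inverse squared length scales

frac_small = np.mean(rho < tau)      # half of the prior mass lies below tau ...
frac_large = np.mean(rho > 10.0)     # ... but the heavy tail reaches large values
```

Under this prior most dimensions get a near-zero ρ_d and are effectively flat in the surrogate, while the tail still permits the data to push individual ρ_d to large values (here frac_small ≈ 0.5 while frac_large is small but non-zero).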

Multi-fidelity Bayesian optimization
In some optimization processes, the function f can be evaluated at different fidelities. In our problem, the fidelity corresponds to the temporal resolution at which the system is simulated (not the value of the functional P), as is typical in the BO literature. Function evaluations at lower fidelities reduce computation time, but the results are less accurate. The goal of multi-fidelity BO is to use low-fidelity evaluations in combination with full-fidelity evaluations to find the maximum of the full-fidelity function f. Note that while low-fidelity evaluations may more quickly help to find the maximum, only full-fidelity evaluations are relevant for determining the success of the optimization. Therefore any multi-fidelity BO method must carefully consider whether to make a low- or high-fidelity query at each iteration of the algorithm. We let z ∈ {0, 1, ..., N} denote the fidelity, with z = 0 being the high 'true' fidelity. In this work we focus on the N = 1 low-fidelity approximation, but the technique can be generalized. We let f(x, z) be the result of evaluating the function at x with fidelity z. At iteration t in the algorithm, multi-fidelity BO selects x_t (the function input) and z_t (the function fidelity) and receives y_t = f(x_t, z_t). To complete the multi-fidelity setup, a known cost function λ : Z → R is required, which denotes the (often computational) cost of evaluating the function f at fidelity z.
Different multi-fidelity procedures have been proposed, such as an auto-regressive approach that correlates the lower-fidelity and higher-fidelity functions by applying linear regression between different levels of fidelity [69]. Kandasamy et al [70] use a separate GP for each fidelity level to explore the input space at lower fidelity. Song et al [71] suggest a two-phase model with combined GPs across multiple fidelity levels and emphasize the mutual information acquired from different fidelities. To better understand the correlations between different fidelities, a deep neural network based model [72] has also been proposed that uses a neural network to represent each fidelity. Multi-fidelity methods are particularly attractive for high-dimensional problem spaces, where they reduce the increased computational cost associated with needing more function evaluations.

Computational approach
We have added a multi-fidelity approach to SaasBO [66] to gain both the benefits of SaasBO on high-dimensional search problems and the computational savings of multi-fidelity evaluations.
For our quantum control problem, we can simulate the evolution of the quantum system with different temporal resolutions.For example, we select a resolution of one-third the target resolution as our low-fidelity approximation for our experimental results.
To add multi-fidelity capability to SaasBO, we use the SaasBO surrogate on the function f(x, z); that is, we augment x with an extra dimension, the fidelity, and build a surrogate model on this (d + 1)-dimensional function.
Our acquisition function is the EI in the high-fidelity function's maximum. We call this improvement EI_0(x, z) to indicate that it is the EI in the high fidelity (z = 0) from querying point x with fidelity z:

EI_0(x, z) = E[max(µ_{t+1}(x′, 0) − y⁺, 0)],   (10)

where y⁺ = max_{i : z_i = 0} y_i is the maximum value found so far during high-fidelity evaluations, µ_{t+1} is the surrogate mean after incorporating the new observation at (x, z), x′ is the location of the updated surrogate's high-fidelity maximum, and the expectation is over the surrogate's predictive distribution of the new observation. We divide by the cost of the associated fidelity to score a particular (x, z) choice based on the rate at which it improves:

α(x, z) = EI_0(x, z) / λ(z).

For the high-fidelity model, EI_0(x, 0) is the same as EI as in equation (9) because x′ (in equation (10)) is equal to x (at least for a GP or SaasBO surrogate model). However, for low-fidelity queries, EI_0(x, 1) differs from EI. It should not be the improvement at the point (x, z = 1), because our goal is to find the maximum for z = 0. Rather, EI_0(x, 1) is more similar to the KG acquisition function, which asks how the expected maximum of f changes, even if the new maximum might not be at the query point x. A complete calculation of equation (10) would require a search over x′ inside the expectation, which is too computationally expensive. Therefore, we make the assumption that if adding the queried low-fidelity function evaluation at point x causes the surrogate model to find an improved high-fidelity maximum, that maximum will also be at the point x. Thus, we assume that x′ = x in equation (10).
For a GP surrogate, the joint distribution over f(x, 0) and f(x, 1) conditioned on the previous observations is a normal distribution that can be found by formulating the joint distribution over all of the queried points, (x, 0), and (x, 1) (see equation (4)) and then conditioning on the calculated values at the queried points. We denote the result as

(f(x, 0), f(x, 1)) | D_t ∼ N( (µ_t(x, 0), µ_t(x, 1)), [[σ_t²(x, 0), c_t(x, 0; x, 1)], [c_t(x, 0; x, 1), σ_t²(x, 1)]] ),

where, conditioned on the data available at the tth iteration, µ_t(x, z) is the conditional mean of f(x, z) under the surrogate model, σ_t²(x, z) is the conditional variance of f(x, z), and c_t(x, z; x′, z′) is the conditional covariance between f(x, z) and f(x′, z′).
Thus, by properties of the normal distribution, if we were to additionally observe f(x, 1), the updated conditional mean of f(x, 0) would be µ_t(x, 0) + c_t(x, 0; x, 1)(f(x, 1) − µ_t(x, 1))/σ_t²(x, 1). In expectation over the unobserved value of f(x, 1), µ_t(x, 0) will not change, and the variance of this updated mean is c_t²(x, 0; x, 1)/σ_t²(x, 1). Therefore, the EI at (x, 0) from observing (x, 1) is g(y⁺, µ_t(x, 0), c_t(x, 0; x, 1)/σ_t(x, 1)), and thus our entire formula for the EI at z = 0 from observing (x, z) is

EI_0(x, z) = g(y⁺, µ_t(x, 0), σ_t(x, 0)) for z = 0, and EI_0(x, z) = g(y⁺, µ_t(x, 0), c_t(x, 0; x, 1)/σ_t(x, 1)) for z = 1.   (14)

Figure 1 graphically illustrates the calculation. When considering a high-fidelity point, EI_0 is the EI from the distribution shown in blue. When considering a low-fidelity point, it is the expected improvement from the distribution shown in orange. These EIs are divided by the query cost of the associated fidelity to determine the value of the acquisition function at this x-z pair. Equation (14) is the EI in the high-fidelity model for a GP surrogate. To use the SaasBO surrogate, we must take the expectation of this formula over settings of the GP parameters. Just as in equation (9), using MCMC sampling to produce L posterior samples over possible GPs, the final EI for multi-fidelity SaasBO is

EI_0(x, z) ≈ (1/L) Σ_{l=1}^{L} EI_0^l(x, z),   (15)

where EI_0^l(x, z) is equation (14) evaluated with the posterior mean, variance, and covariance of the lth sampled GP. Therefore the final algorithm of the multi-fidelity version of SaasBO is as follows, given a budget of T iterations:
(i) Select an initial set of m random points to query: {(x_1, z_1), (x_2, z_2), ..., (x_m, z_m)}, with some in the high fidelity (z = 0) and some in the low fidelity (z = 1).
(ii) Evaluate these initial m points:
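For a single GP sample, the per-fidelity EI together with the cost division can be sketched as follows (a minimal sketch with illustrative names; g is the closed-form EI used in the text):

```python
import math

def g(y_best, mu, sigma):
    """Closed-form EI over y_best for a Gaussian with mean mu and std sigma."""
    if sigma <= 0.0:
        return max(mu - y_best, 0.0)
    z = (mu - y_best) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - y_best) * Phi + sigma * phi

def acquisition(y_best, mu0, sd0, sd1, c01, z, cost):
    """EI in the high fidelity from querying fidelity z, per unit query cost.
    z = 0: ordinary EI with the high-fidelity posterior std sd0.
    z = 1: EI of the updated high-fidelity mean, whose std is c01 / sd1."""
    sigma = sd0 if z == 0 else c01 / sd1
    return g(y_best, mu0, sigma) / cost[z]
```

Because c_t(x, 0; x, 1) ⩽ σ_t(x, 0)σ_t(x, 1), the low-fidelity EI never exceeds the high-fidelity EI before the cost division; a sufficiently cheap low fidelity can still win afterwards.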

Computational experiments
We applied these methods to the optimization of a time-varying electric field to control a series of one-dimensional quantum systems [24,33] discussed previously in section 2.1. The simulation propagates an initial wavefunction ψ_0(x) to a propagated state ψ_{N−1}(x) after N − 1 iterations of a split-operator propagator after discretizing both space and time on a uniform grid. Our goal is to maximize P (given by equation (2)), which measures the overlap of our propagated state ψ_{N−1}(x) with a provided target final state, ψ_f(x). To optimize the electric field, E(t), we let E(t) be a superposition of a fixed set of frequencies with variable amplitudes and phases:

E(t) = Σ_{i=1}^{11} a_i cos(2π f_i t + ϕ_i),   (16)

where {f_i}_{i=1}^{11} are fixed and we optimize over 22 parameters: {a_i}_{i=1}^{11} and {ϕ_i}_{i=1}^{11}. The set of frequencies is centered on the theoretical ideal frequency associated with the energetic difference between the start and target states. The 22 amplitudes and phases together form the input x of the objective function. Multi-fidelity is incorporated by changing the system's temporal resolution (10 000 time-steps for low fidelity and 30 000 for high fidelity) for our proposed optimization method. All other methods use the high-fidelity temporal resolution.
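The field parameterization can be sketched as follows (the frequency values and comb spacing here are illustrative assumptions; only the 'fixed frequencies, variable amplitudes and phases' structure and the grid sizes are taken from the text):

```python
import numpy as np

def control_field(a, phi, freqs, t):
    """E(t) = sum_i a_i cos(2*pi*f_i*t + phi_i): fixed frequencies f_i,
    variable amplitudes a_i and phases phi_i (the 22 BO parameters)."""
    a, phi, freqs = np.asarray(a), np.asarray(phi), np.asarray(freqs)
    return (a[:, None] * np.cos(2 * np.pi * freqs[:, None] * t[None, :]
                                + phi[:, None])).sum(axis=0)

# Hypothetical comb of 11 frequencies centered on a resonance frequency f0.
f0 = 0.05
freqs = f0 + 0.005 * np.arange(-5, 6)
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 22)            # one 22-dimensional BO query point
a, phi = x[:11], x[11:]
t_hi = np.linspace(0.0, 1000.0, 30_000)   # high-fidelity time grid
t_lo = np.linspace(0.0, 1000.0, 10_000)   # low-fidelity grid: one-third resolution
E_hi = control_field(a, phi, freqs, t_hi)
E_lo = control_field(a, phi, freqs, t_lo)
```

The same 22-dimensional point x yields both the high- and low-fidelity fields; only the time discretization of the subsequent propagation changes.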
We tested our methods on a large set of optimization problems, each with a unique background potential. This set of problems was separated into three levels of difficulty, depending on the maximum transition probability P obtained by a gradient-based NIC-CAGE algorithm [33] (modified to use a split-operator propagator) after 60 iterations. An optimization problem was considered to be 'easy' if P > 0.99, 'medium' if 0.01 < P < 0.99, and 'hard' if P did not exceed 0.01 in 60 iterations. The full problem set consisted of 7000 easy problems, 2497 medium problems, and 1305 hard problems, consistent with those found in [24]. Note that NIC-CAGE optimizes a general electric field (in comparison with our experiment, which considers a fixed set of frequencies). However, this classification is helpful to identify generally easier or harder problems and stratify the results accordingly.
The BO methods (BO, SaasBO, and MFSaasBO) and the Nelder-Mead (NM) method do not require gradients and therefore require less computational time per evaluation. NM reflects, stretches, or shrinks a simplex of candidate solutions by evaluating the function at the vertices of the simplex. This direct-search method works well in practice, but convergence is not guaranteed.
The gradient-based methods all performed very similarly, so in the results below we show only the best-performing method: conjugate gradient (CG). CG overcomes the drawback of steepest descent [77] by moving in a conjugate direction for faster convergence. This method is quadratically convergent. It selects the next search direction by linearly combining the gradient and the previous direction in such a way that the new direction is conjugate to the previous search direction.
All eight methods were tested with a budget of 50 iterations.After every evaluation of the forward-propagated initial state, we recorded the overall compute time, compute time spent on simulation, and P, the maximum transition probability achieved over the duration of the control pulse.We let P * be the best value of P found up to any particular iteration.We report the final P * after all 50 iterations, P * as a function of total computation time, and P * as a function of the total simulation time.

Results and discussion
The main objective of our work was to reduce the compute time required to find some time-dependent control-field, E(t), that maximizes the probability P that a transition from state ψ 0 to ψ f occurs.To test the capability of our BO routine, we explored two different scenarios: the transition from the ground to the first excited state (ψ 0 → ψ 1 ), and the transition from the ground to the second excited state (ψ 0 → ψ 2 ).
Figure 2 shows the fraction of the tested problems that were optimized to a given quality P* after 50 iterations. For instance, for ψ_0 → ψ_1 transitions classified as 'easy' examples, approximately 50% of the problems were optimized by the CG method to a value of P ⩾ 0.9. From figure 2 it is clear that the proposed method achieves the highest accuracy in the majority of the examples, whereas the performance of standard BO is poor. The performances of CG and NM are quite similar to one another, but both are worse than SaasBO and MFSaasBO. The number of examples for which the CG and NM methods reach P* ⩾ 0.99 is smaller in comparison to SaasBO and MFSaasBO. The performance of all methods in the ψ_0 → ψ_1 case is significantly worse for the hard problems. It is notable that for all cases, MFSaasBO outperforms SaasBO in the fraction of examples that reached P* > 0.99.
Figure 3 shows the average (across different potential functions) best value found as a function of the total amount of computation time spent in simulation. For both the ψ_0 → ψ_1 and ψ_0 → ψ_2 cases, and for all easy, medium, and hard problem sets, MFSaasBO distinctly beats all other methods, requiring less simulation time. Although each iteration costs one function evaluation for every method, MFSaasBO has the advantage that a low-fidelity evaluation takes less time. It is interesting to note that the SaasBO, BO, and NM methods take almost identical simulation times, as they all perform the same number of high-fidelity evaluations. CG takes more simulation time because it is a gradient method and, therefore, requires additional time for the gradient calculations. While CG does slightly better than NM in figure 2, in terms of simulation time NM is better. Both of these methods quickly get stuck in local minima. NIC-CAGE was able to do better using a gradient method for the 'easy' problems because it optimizes in the full space of (discretized) electric field functions. For these experiments, we are restricted to electric fields of the form of equation (16). For 'hard' problems, neither NIC-CAGE nor gradient methods in this reduced function space are able to achieve transition probabilities better than 0.01, whereas our method is able to optimize most of these problems to at least a level of 0.1 (and some to 0.2).
Figure 4 shows the average quality of the found solution as a function of the total computation time (both simulation time and time spent selecting the next query point). This shows that SaasBO and MFSaasBO take significantly more total time than the other methods, although these two methods achieve the best accuracy by a good margin. This is due to the additional time necessary to decide which point to query next. The simulations for our one-dimensional quantum system are relatively fast. With higher-dimensional quantum simulations (such as time-dependent density functional theory calculations on material systems), we expect the simulation time to dominate all other aspects of the optimization (which do not scale with the complexity of the simulation), and high-dimensional multi-fidelity BO methods will not only produce better optima, but also produce them faster. Figures 5 and 6 show examples of the electric fields found (top three plots in both figures) and the corresponding double-bell potential function. We have used the same set of examples for both transition cases (ψ_0 → ψ_1 and ψ_0 → ψ_2). From both figures it is clear that our known initial state (blue) is completely distinct from our desired target state (orange) and, except for one case (ψ_0 → ψ_2, hard set), our final propagated wavefunction, i.e. the final state (green), closely matches the target state (orange).

Conclusions
We have presented a high-dimensional multi-fidelity BO approach for successful quantum control of time-dependent systems. This approach is particularly efficient and well-suited for the quantum control problem as it considers all previously evaluated points (instead of just the single previous one or its gradient) when deciding which point to query next. By employing a recent high-dimensional BO surrogate model (SaasBO) and combining it with our multi-fidelity acquisition function, we are able to drastically increase the quality of the found solutions for a fixed simulation computational budget.
BO methods require more computational cost between function evaluations than other optimization methods.However, as the complexity of the simulations necessary for the evaluations grows, the extra computational burden of BO relative to gradient methods becomes negligible.For these reasons, we anticipate that our BO approach would be even more efficient for higher-dimensional quantum systems (or systems with many-body quantum interactions), which would intrinsically require more function evaluations.As such, we advocate the use of BO in optimal quantum control and other time-dependent inverse problems that have a heavy simulation cost.

Figure 1 .
Figure 1. Example surrogate model and acquisition function calculations. Five points have been queried, shown as black circles, two in the high fidelity and three in the low fidelity. The surrogate mean is shown as a solid black line; the standard deviation of the model is indicated by the gray shaded region. Two possible next query points are shown: x_a associated with the high fidelity and x_b associated with the low fidelity. The solid blue curve is the marginal distribution over f(x_a) at the high fidelity. The expected improvement (EI) acquisition function scores this possible next point as the expected amount by which a sample from this distribution exceeds y⁺ (the best high-fidelity point found thus far), or 0 if it does not exceed y⁺. The dashed green curves are the (correlated) marginal distributions at the high and low fidelities for point x_b. Because observing the function in the low fidelity at x_b will not collapse the surrogate model at x_b in the high fidelity, we do not use these dashed green distributions to calculate the expected improvement. Rather, we use the solid orange distribution, which is the distribution of the mean of the resulting surrogate model in the high fidelity if we condition on the value at x_b in the low fidelity. The expected improvement is analogous to that for x_a, but using the orange distribution.

Figure 2 .
Figure 2. Fraction of examples which achieved a best transition probability of P * or greater (top: left to right, easy, medium, and hard examples respectively for ψ0 → ψ1) (bottom: left to right, easy, medium, and hard examples respectively for ψ0 → ψ2).

Figure 3 .
Figure 3. Optimization quality versus simulation time (top: left to right, easy, medium, and hard examples respectively for ψ0 → ψ1) (bottom: left to right, easy, medium, and hard examples respectively for ψ0 → ψ2).

Figure 4 .
Figure 4. Optimization quality versus total computation time (top: left to right, easy, medium, and hard examples respectively for ψ0 → ψ1) (bottom: left to right, easy, medium, and hard examples respectively for ψ0 → ψ2).
y_i = f(x_i, z_i) for i = 1, ..., m, and let D_{m+1} = {(x_1, z_1, y_1), (x_2, z_2, y_2), ..., (x_m, z_m, y_m)}.
(iii) For t from m + 1 to T:
a. Sample L GP parameters from the SaasGP surrogate posterior distribution conditioned on D_t.
b. Find the best point so far: y⁺ = max_{i : z_i = 0} y_i.
c. Maximize the acquisition function: (x_t, z_t) = arg max_{x,z}