Optimized multifidelity machine learning for quantum chemistry

Machine learning (ML) provides access to fast and accurate quantum chemistry (QC) calculations for various properties of interest, such as excitation energies. High prediction accuracy with an ML model, however, often demands a large and costly training set. Various solutions and procedures have been presented to reduce this cost, including Δ-ML, hierarchical-ML, and multifidelity machine learning (MFML). MFML combines Δ-ML-like sub-models built on various fidelities according to a fixed scheme derived from the sparse grid combination technique. In this work, we implement an optimization procedure that combines the multifidelity sub-models in a flexible scheme, resulting in optimized MFML (o-MFML), which provides superior prediction capabilities. This hyperparameter optimization is carried out on a holdout validation set of the property of interest. This work benchmarks the o-MFML method in predicting atomization energies on the QM7b dataset and, again, in predicting excitation energies for three molecules of growing size. The results indicate that o-MFML is a strong methodological improvement over MFML and provides lower prediction errors. Even in cases of poor data distributions and a lack of clear hierarchies among the fidelities, which were previously identified as issues for multifidelity methods, o-MFML is advantageous for the prediction of quantum chemical properties.


Introduction
Fast and accurate calculations of chemical properties have become increasingly accessible to the quantum chemistry (QC) community in recent years with the accelerated development of machine learning (ML) for QC [1][2][3][4]. Various supervised and unsupervised learning approaches have seen widespread application in the field of QC. These applications include material design and discovery 3,[5][6][7][8][9][10][11][12], excitation energies 2,[13][14][15][16], potential energy surfaces [17][18][19][20][21][22][23], and even the prediction of chemical reactions 24 and ML molecular dynamics for the simulation of infrared spectra 25. Conventionally costly QC calculations are gradually being replaced with ML models or hybrids of ML and QC, resulting in a drastic reduction of the compute cost associated with chemical design and discovery. The core principle of these ML techniques is to learn an implicit mapping from the molecular geometry to a property of interest, such as excitation energies, potential energy surfaces, or atomization energies. The property is usually targeted at a level of theory that is relevant to the area of application.
The general ML-QC pipeline for such applications begins with generating the raw data: the Cartesian geometries of the molecules of interest and the QC property to be predicted, calculated at a level of theory (MP2, CCSD, etc.) that is deemed accurate for the application. The Cartesian coordinates are then transformed into an input feature format, called a representation or molecular descriptor, that the ML model can map to the property of interest. In the recent past, much work has been dedicated to the development of such representations. These include molecule-wise descriptors such as inverse distance representations and their extensions, for example the Coulomb Matrix (CM) [26][27][28][29][30][31] and Bag of Bonds [32][33][34], and atom-wise descriptors such as Smooth Overlap of Atomic Positions (SOAP) 34,35, SLATM 36, permutationally invariant polynomials (PIP) 17, the PaiNN representation 37, and the Faber-Christensen-Huang-Lilienfeld (FCHL) representation 1,29,30,38. Significant research has also been performed on other types of representations, such as SMILES strings 39,40, graph-based representations 41, and representations that are either generated by neural network (NN) models such as the Deep Tensor NN 26,42,43 or constructed ad hoc 44,45. Once machine-interpretable features are generated, any of various ML methods, such as kernel ridge regression (KRR), Gaussian process regression (GPR), or NN models such as ANI 46,47, SchNet 26,43, and PhysNet 48, can be used to map the input features to their respective QC properties.
Within such frameworks, it is commonly observed that the more training samples are used, the more accurate the prediction. However, generating this training data is costly, since high-accuracy conventional QC calculations are expensive. Thus, the compute cost associated with discovery in QC is shifted from the conventional QC calculations to the generation of the training data for these ML models. While any of the aforementioned ML methods is a promising candidate for replacing time-consuming conventional calculations, the cost of the training data for these models has only recently been investigated 13,20,49,50.
An ad hoc optimization procedure for the ∆-ML method, termed hierarchical-ML (h-ML), has been implemented for the reconstruction of the ground-state potential energy surface (PES) of CH3Cl 20. Based on the CPU compute time of single-point calculations, the training samples to be used at the various fidelities are selected by minimizing an objective function. This reduces the number of QC calculations needed to generate the multifidelity data set for a user-defined target error.
Recently, a systematic generalization of the ∆-ML method called multifidelity machine learning (MFML) 49 was applied to the first excitation energies of molecules 13. The MFML method exploits the varying levels of accuracy of conventional QC methods, which form a hierarchy of methods for properties such as excitation energies. MFML reduces the number of expensive training samples needed by training on the differences between successive fidelities, from a baseline fidelity up to the target fidelity. The MFML model is built by iteratively adding models trained on the differences between the excitation energies calculated at successive fidelities. In MFML, the number of training samples is halved at each successively more costly fidelity 13, which inherently reduces the number of costly training samples. For each fidelity, and for a given training set size at this fidelity, a sub-model is trained 49. This is performed recursively from a baseline fidelity (cheaper and less accurate) up to the target fidelity (more expensive and more accurate). The various sub-models are then combined to give the final MFML model. This combination was performed based on the sparse-grid combination technique 58,[59][60][61][62][63], as discussed in Refs. 13,49. This work furthers the methodological research in MFML by introducing a novel method of optimally combining the various sub-models built on the different fidelities. The novel approach is inspired by Refs. 64,65, where an optimized sparse-grid combination technique is introduced and discussed for the solution of partial differential equations. In contrast to that work, we apply it to ML for QC, where the optimal combination of the sub-models is determined with respect to a validation set of the property of interest rather than from intrinsic approximation properties of the given problem. This results in a multifidelity model that predicts the property at the target fidelity with improved accuracy (Section 3). The optimized MFML (o-MFML) thus provides an optimal linear combination of the sub-models. This work benchmarks this novel method on the QM7b dataset with the prediction of atomization energies at the CCSD level of theory with the ccpvdz basis set 29,49. Further benchmarking is carried out on the first excitation energy dataset from Ref. 13. The results indicate that o-MFML is indeed superior to the conventional MFML.
The manuscript is structured as follows. Section 2 presents the data used in this study (Section 2.1), the key methodology, and the novel o-MFML technique. Section 3 then compares MFML and o-MFML on the two datasets: Section 3.1 discusses the benchmark on the QM7b dataset 29, while Section 3.2 discusses the corresponding results for the excitation energy predictions. The models are assessed by their mean absolute error (MAE) and their learning curves.

Methods
This section describes the methodology used to obtain the results: the datasets, the MFML definitions, the optimization method, and the evaluation metric for the various ML models.

Dataset
The effectiveness of the optimized MFML method is benchmarked on the QM7b dataset 29, which consists of 7211 molecules with up to seven heavy atoms. The atomization energies of these molecules were calculated in kcal/mol as described in Ref. 49. For this study, effective averaged atomization energies are considered. These are given as

$$E_{\mathrm{eff}} = E - \sum_i n_i \cdot e_i,$$

where $n_i$ is the number of atoms of element $i$ in the molecule and $e_i$ is the effective atomic energy of element $i$. The latter is obtained by a linear fit of $E = \sum_i n_i \cdot e_i$ over all molecules in the QM7b dataset. Without loss of generality, the $E_{\mathrm{eff}}$ values are simply referred to as atomization energies herein. Further, only the MP2 66-68 and CCSD 69-71 levels of theory were considered.
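The linear fit for the effective atomic energies can be sketched as follows. This is a minimal illustration, not the original fitting code: the element counts and energies are hypothetical toy values constructed to be exactly additive (real atomization energies are not, and the non-additive remainder is what $E_{\mathrm{eff}}$ retains), and the least-squares problem is solved through its normal equations with a small pure-Python solver.

```python
"""Sketch: fit effective atomic energies e_i and compute E_eff (hypothetical toy data)."""


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_atomic_energies(counts, energies, elements):
    """Least-squares fit of E ~ sum_i n_i * e_i via the normal equations (A^T A) e = A^T E."""
    a = [[c.get(el, 0) for el in elements] for c in counts]
    k = len(elements)
    ata = [[sum(row[i] * row[j] for row in a) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * e for row, e in zip(a, energies)) for i in range(k)]
    return dict(zip(elements, solve(ata, atb)))


def effective_energy(count, energy, e_atom):
    """E_eff = E - sum_i n_i * e_i for a single molecule."""
    return energy - sum(n * e_atom[el] for el, n in count.items())


# Hypothetical toy molecules (element counts) with exactly additive energies.
counts = [
    {"C": 1, "H": 4},          # CH4
    {"C": 2, "H": 6},          # C2H6
    {"H": 2, "O": 1},          # H2O
    {"C": 1, "O": 2},          # CO2
    {"C": 1, "H": 4, "O": 1},  # CH3OH
]
energies = [-18.0, -32.0, -12.0, -26.0, -26.0]

e_atom = fit_atomic_energies(counts, energies, ["H", "C", "O"])
e_eff = [effective_energy(c, e, e_atom) for c, e in zip(counts, energies)]
print(e_atom)
print(e_eff)
```

With these exactly additive toy energies, the fit recovers the per-element energies used to build them and every $E_{\mathrm{eff}}$ is (numerically) zero; for real data, a QR- or SVD-based least-squares solver would be the more robust choice.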
The fidelity structure was formed by evaluating these with three basis sets of increasing size, namely STO-3G, 6-31G, and ccpvdz. While the original use of this dataset in Ref. 49 considers a 3-dimensional multifidelity structure, in this work it is flattened into a 2-dimensional multifidelity structure. The fidelities in the assumed hierarchy were thus ordered as MP2-STO3G, MP2-631G, MP2-ccpvdz, CCSD-STO3G, CCSD-631G, and CCSD-ccpvdz, with CCSD-ccpvdz as the target fidelity. A total of $1.5 \cdot 2^{12} = 6144$ molecules were randomly chosen as the training set.
The data for the excitation energy calculations is taken from Ref. 13. It consists of DFTB- and MD-based trajectories of benzene, naphthalene, and anthracene. For each molecule, a 15 ps trajectory was generated after energy minimization and equilibration. This trajectory was then sampled every 1 fs, giving 15,000 frames, which were used for training and evaluation. For training, the first $N_{\mathrm{train}} = 1.5 \cdot 2^{13} = 12288$ frames were used, with excitation energies calculated at five fidelities (basis sets): def2-TZVP, def2-SVP, 6-31G, 3-21G, and STO-3G. The sampling and calculations are identical to those discussed in Ref. 13.

Multifidelity Machine Learning
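Consider an ordered hierarchy of fidelities indexed as $f = 1, 2, \ldots, F$, where the cost of calculation (and usually, therefore, the accuracy) increases with the index. The training set for data at fidelity $f$ can then be defined as

$$\mathcal{T}^{(f)} = \left\{\left(X_i^{(f)}, y_i^{(f)}\right)\right\}_{i=1}^{N_{\mathrm{train}}^{(f)}},$$

with the set of molecular descriptors $\mathcal{X}^{(f)} = \{X_i^{(f)}\}_{i=1}^{N_{\mathrm{train}}^{(f)}}$. Based on previous work in this field, as detailed in Refs. 13,49, the current state of the multifidelity method recommends the nestedness $\mathcal{X}^{(F)} \subseteq \ldots \subseteq \mathcal{X}^{(2)} \subseteq \mathcal{X}^{(1)}$ of the training data. This is enforced for both datasets used in this work. That is, if a molecular conformation is picked for which the quantum chemistry property is calculated at the highest fidelity, then the property is also calculated for this conformation at the next lower fidelity, and so on. As Ref. 13 shows, a multifidelity machine learning (MFML) model with kernel ridge regression (KRR) as the ML model of choice can be built iteratively for an ordered hierarchy of fidelities as

$$P_{\mathrm{MFML}}^{(F;f_b)}(X_q) = P_{\mathrm{KRR}}^{(f_b)}(X_q) + \sum_{f=f_b}^{F-1} P_{\mathrm{KRR}}^{(f,f+1)}(X_q), \tag{2}$$

where $F$ is the target fidelity, $f_b \in \{1, 2, \ldots, F-1\}$ is a baseline fidelity, and $X_q$ is the representation of a query molecule. The term inside the summation is calculated as

$$P_{\mathrm{KRR}}^{(f,f+1)}(X_q) = \sum_{i=1}^{N_{\mathrm{train}}^{(f+1)}} \alpha_i^{(f,f+1)}\, k\left(X_q, X_i^{(f+1)}\right),$$

where $k$ is the kernel function. The KRR coefficients $\boldsymbol{\alpha}^{(f,f+1)}$ are calculated by solving the linear system of equations

$$\left(\mathbf{K}^{(f+1)} + \lambda \mathbf{I}\right)\boldsymbol{\alpha}^{(f,f+1)} = \Delta \mathbf{y}^{(f,f+1)},$$

where $\mathbf{K}^{(f+1)}_{ij} = k\left(X_i^{(f+1)}, X_j^{(f+1)}\right)$ is the kernel matrix and $\lambda$ is the regularization strength. Here, $\Delta \mathbf{y}^{(f,f+1)} = \mathbf{y}^{(f+1)} - \mathbf{y}^{(f,f+1)}$, where $\mathbf{y}^{(f+1)}$ is the vector of energies in the training set $\mathcal{T}^{(f+1)}$ and $\mathbf{y}^{(f,f+1)}$ is the vector of energies in the training set $\mathcal{T}^{(f)}$ restricted to those conformations also found at fidelity level $f+1$. This definition of MFML can thus be seen as one that works on the differences between the data. As an example, for a target fidelity $F = 5$ with a baseline fidelity $f_b = 1$, the MFML model built with Eq. (2) is explicitly written as

$$P_{\mathrm{MFML}}^{(5;1)}(X_q) = P_{\mathrm{KRR}}^{(1)}(X_q) + P_{\mathrm{KRR}}^{(1,2)}(X_q) + P_{\mathrm{KRR}}^{(2,3)}(X_q) + P_{\mathrm{KRR}}^{(3,4)}(X_q) + P_{\mathrm{KRR}}^{(4,5)}(X_q).$$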
The number of training samples used at each fidelity is scaled by a factor of 2, following previous work on MFML 13,49. Thus, if the number of training samples at the target fidelity is set to $N_{\mathrm{train}}^{(F)}$, the next lower fidelity uses $2 \cdot N_{\mathrm{train}}^{(F)}$ training samples, and so on.
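The MFML construction of Eq. (2), with nested training sets that double in size at each cheaper fidelity, can be sketched on a toy problem. Everything here is an illustrative assumption rather than the paper's setup: the inputs are scalars instead of molecular representations, the five "fidelities" are a hypothetical target function $\sin(3x)$ plus a bias that shrinks towards the target, and a Laplacian kernel is used as a simple, well-conditioned stand-in for the kernels used in this work.

```python
"""Sketch of the MFML model of Eq. (2) with KRR on a hypothetical 1-D toy problem."""
import math
import random


def kernel(a, b, sigma=0.5):
    """Laplacian kernel on scalar inputs (illustrative stand-in, not the paper's kernel)."""
    return math.exp(-abs(a - b) / sigma)


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_krr(xs, ys, lam=1e-8):
    """Solve (K + lam*I) alpha = y and return the predictor X_q -> sum_i alpha_i k(X_q, X_i)."""
    k = [[kernel(a, b) + (lam if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(k, ys)
    return lambda xq: sum(al * kernel(xq, xi) for al, xi in zip(alpha, xs))


def y(f, x, F=5):
    """Hypothetical fidelity f: the target sin(3x) plus a bias that vanishes at f = F."""
    return math.sin(3 * x) + 0.2 * (F - f) * x


def build_mfml(pool, F, f_b, n_target):
    """P_MFML = P_KRR^(f_b) + sum_{f=f_b}^{F-1} P_KRR^(f,f+1), with N^(f) = n_target * 2**(F - f)."""
    sizes = {f: n_target * 2 ** (F - f) for f in range(1, F + 1)}
    xs = {f: pool[: sizes[f]] for f in range(1, F + 1)}  # prefixes give X^(F) ⊆ ... ⊆ X^(1)
    base = train_krr(xs[f_b], [y(f_b, x) for x in xs[f_b]])
    deltas = []
    for f in range(f_b, F):
        # Delta model on the conformations of fidelity f + 1: Delta y = y^(f+1) - y^(f).
        dy = [y(f + 1, x) - y(f, x) for x in xs[f + 1]]
        deltas.append(train_krr(xs[f + 1], dy))
    return xs, lambda xq: base(xq) + sum(d(xq) for d in deltas)


F, n_target = 5, 4
pool = [3.0 * i / 63 for i in range(n_target * 2 ** (F - 1))]  # 64 grid points on [0, 3]
random.Random(0).shuffle(pool)
xs, mfml = build_mfml(pool, F=F, f_b=1, n_target=n_target)
single = train_krr(xs[F], [y(F, x) for x in xs[F]])  # single-fidelity reference on 4 samples

test_x = [3.0 * i / 200 + 0.0075 for i in range(200)]


def mae(model):
    """Mean absolute error of a model against the target fidelity on the toy test grid."""
    return sum(abs(model(x) - y(F, x)) for x in test_x) / len(test_x)


print(f"MFML MAE: {mae(mfml):.3f}, single-fidelity KRR MAE: {mae(single):.3f}")
```

Because the training sets are nested, the base model and all difference models are evaluated on conformations they were trained on at the target training points, so the sum telescopes and the MFML model reproduces the target values there.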
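Ref. 49 has mathematically shown that, for nested training data, this form of MFML is equivalent to taking the difference of models built on two adjacent fidelity levels. That is,

$$P_{\mathrm{KRR}}^{(f,f+1)}(X_q) = P_{\mathrm{KRR}}^{(f+1)}(X_q) - P_{\mathrm{KRR}}^{(f)}(X_q)$$

for $f = 1, \ldots, F-1$, where $P_{\mathrm{KRR}}^{(f+1)}$ is trained on $\mathcal{T}^{(f+1)}$ and $P_{\mathrm{KRR}}^{(f)}$ is trained on $\{(X_i^{(f+1)}, y_i^{(f)})\}_{i=1}^{N_{\mathrm{train}}^{(f+1)}}$, that is, on fidelity-$f$ energies with conformations restricted to those found in the training set used for fidelity $f+1$. This is further numerically verified in the supplementary material (Section S3.1) for the first excitation energy data. Models of the type $P_{\mathrm{KRR}}^{(f+1)}$ and $P_{\mathrm{KRR}}^{(f)}$ are herein referred to as sub-models of MFML. A sub-model of MFML is built for a specific choice of training set; in the current work, this means selecting a fidelity $f$ and the number of training samples at this fidelity, $N_{\mathrm{train}}^{(f)}$. The conceptual development of such a combination of sub-models has previously been implemented by Zaspel et al. for the prediction of atomization energies in the QM7b dataset 49. This formulation of sub-models represents a 2-dimensional multifidelity structure, spanned by the fidelity and the number of training samples. In such a structure, it is assumed that increasing the fidelity results in a more accurate (and therefore costlier) QC calculation. This in turn translates into a more accurate (and costlier to train) sub-model.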
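The equivalence between a difference model and the difference of two sub-models can be checked numerically in a few lines. The reason it holds is that, for a fixed kernel matrix, KRR predictions are linear in the training targets, so it requires both sub-models to share the conformations $X^{(f+1)}$ and the kernel hyperparameters. The toy inputs, the two hypothetical fidelities, and the Laplacian kernel below are illustrative assumptions.

```python
"""Sketch: a Delta-KRR model equals the difference of two KRR sub-models on the same conformations."""
import math
import random


def kernel(a, b, sigma=0.5):
    """Laplacian kernel on scalar inputs (illustrative stand-in)."""
    return math.exp(-abs(a - b) / sigma)


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_krr(xs, ys, lam=1e-8):
    """KRR predictor built from (K + lam*I) alpha = y."""
    k = [[kernel(a, b) + (lam if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(k, ys)
    return lambda xq: sum(al * kernel(xq, xi) for al, xi in zip(alpha, xs))


rng = random.Random(1)
x_up = sorted(rng.uniform(0.0, 3.0) for _ in range(16))  # conformations X^(f+1)
y_low = [math.sin(3 * x) + 0.5 * x for x in x_up]        # hypothetical fidelity f
y_high = [math.sin(3 * x) for x in x_up]                 # hypothetical fidelity f + 1

delta = train_krr(x_up, [h - l for h, l in zip(y_high, y_low)])  # P^(f,f+1)
p_high = train_krr(x_up, y_high)                                 # P^(f+1)
p_low = train_krr(x_up, y_low)                                   # P^(f), restricted to X^(f+1)

queries = [0.1 * i for i in range(31)]
max_gap = max(abs(delta(q) - (p_high(q) - p_low(q))) for q in queries)
print(f"max |P^(f,f+1) - (P^(f+1) - P^(f))| = {max_gap:.2e}")
```

The remaining gap is pure floating-point round-off; if the two sub-models were trained on different conformations, the identity would no longer hold, which is one way to see why nestedness matters.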
In principle, there is no limit on the number of dimensions of MFML as long as a clear hierarchy can be established in each dimension 49. For the specific case of the 2-D structure, one can identify a sub-model with an ordered pair, or index, $s = (f, \eta_f)$, where $f$ is the fidelity and the number of training samples chosen from this fidelity is $N_{\mathrm{train}}^{(f)} = 2^{\eta_f}$. A standard KRR model (see Section S1) built for the index $s$ is then denoted as $P_{\mathrm{KRR}}^{(s)}$. With this development, the MFML method can be written as a linear combination of the various sub-models. To this end, some notation is introduced. The set of indexes of all available sub-models is denoted by $S$. A standard KRR model for a query molecule represented as $X_q$ is built as $P_{\mathrm{KRR}}^{(s)}(X_q)$ for $s \in S$. Further, define the set of indexes of sub-models used for an MFML model with target fidelity $F$, $N_{\mathrm{train}}^{(F)} = 2^{\eta_F}$, and baseline $f_b$ as

$$S^{(F,\eta_F;f_b)} = \left\{(f, \eta_F + F - f) : f = f_b, \ldots, F\right\} \cup \left\{(f, \eta_F + F - f - 1) : f = f_b, \ldots, F-1\right\},$$

where $S^{(F,\eta_F;f_b)} \subseteq S$. The motivation is to combine sub-models such that only a few expensive training samples are required, which, when combined with cheaper training samples, yield a high-accuracy, low-cost model for the target fidelity. This is achieved by a linear combination of the sub-models with $s \in S^{(F,\eta_F;f_b)}$, denoted by

$$P_{\mathrm{MFML}}^{(F,\eta_F;f_b)}(X_q) = \sum_{s \in S^{(F,\eta_F;f_b)}} \beta_s\, P_{\mathrm{KRR}}^{(s)}(X_q),$$

where $\beta_s$ are the coefficients of the linear combination. These coefficients can be interpreted as a measure of how much each sub-model contributes to the final MFML model. Based on work in MFML for atomization energies 49 and excitation energies 13, the coefficients are set such that each sub-model contributes with equal magnitude to the final MFML model. For a model of the form $P_{\mathrm{MFML}}^{(F,\eta_F;f_b)}$, the $\beta_s$ are set in conventional MFML as

$$\beta_s = \begin{cases} +1, & \text{if } s = (f, \eta_F + F - f),\\ -1, & \text{if } s = (f, \eta_F + F - f - 1),\end{cases} \tag{7}$$

where the terms are as discussed previously.
A hypothetical 2-dimensional multifidelity structure is shown in Figure 1 with the dimension of fidelity on the y-axis and the dimension of the number of samples on the x-axis.
One can now identify various sub-models in this hypothetical structure. For example, $P^{(s)}$ with $s = (\text{6-31G}, 5)$ represents a sub-model built at the 6-31G fidelity with $2^5 = 32$ training samples. In this scheme, the cost of the training data of the sub-models (and therefore their accuracy with respect to the target fidelity) increases with either $f$ or $\eta_f$. That is, $s$ is more accurate (and more expensive) than a sub-model built with $s' = (\text{3-21G}, 5)$. At the same time, a sub-model built with $s'' = (\text{6-31G}, 6)$ is more accurate (and more expensive) than $s$. To illustrate the conventional MFML model built from the various sub-models, consider an MFML model for target fidelity $F = 4$ with $2^2$ training samples at this fidelity (that is, $\eta_F = 2$) and a baseline fidelity $f_b = 1$. The set of MFML sub-model indexes is then given as

$$S^{(4,2;1)} = \{(4,2), (3,3), (2,4), (1,5), (3,2), (2,3), (1,4)\}.$$
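The index set and the conventional coefficients can be sketched as follows. The rule used here, +1 for the sub-models $(f, \eta_F + F - f)$ and -1 for the sub-models $(f, \eta_F + F - f - 1)$, is inferred from the sum in Eq. (2) and from the Figure 1 example, in which $s' = (2, 3)$ receives the coefficient -1; the function names are hypothetical.

```python
"""Sketch: index set S^(F, eta_F; f_b) and conventional MFML coefficients."""


def mfml_coefficients(F, eta_F, f_b):
    """Map each sub-model index s = (f, eta_f) to its conventional coefficient beta_s."""
    beta = {}
    for f in range(f_b, F + 1):
        # Sub-model on fidelity f with 2**(eta_F + F - f) samples.
        beta[(f, eta_F + F - f)] = 1.0
    for f in range(f_b, F):
        # Fidelity f restricted to the (half as many) samples of fidelity f + 1.
        beta[(f, eta_F + F - f - 1)] = -1.0
    return beta


def combine(beta, predictions):
    """P_MFML(X_q) = sum_s beta_s * P_KRR^(s)(X_q), given per-sub-model predictions for X_q."""
    return sum(b * predictions[s] for s, b in beta.items())


beta = mfml_coefficients(F=4, eta_F=2, f_b=1)
for s in sorted(beta, reverse=True):
    print(s, beta[s])
```

Since the coefficients sum to 1, a constant prediction shared by all sub-models is passed through unchanged; o-MFML replaces these fixed values of ±1 with optimized ones.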

Optimized MFML
Having written the MFML model in terms of the individual multifidelity sub-models, one can consider formulations of the coefficients that differ from Eq. (7). This can be seen as a hyperparameter optimization of the $\beta_s$ that returns a multifidelity model with improved accuracy at the target fidelity. For this optimization, the validation set is defined as

$$V_{\mathrm{val}}^{F} = \left\{\left(X_j^{\mathrm{val}}, y_j^{(F),\mathrm{val}}\right)\right\}_{j=1}^{N_{\mathrm{val}}}.$$

To evaluate the accuracy of the model, a test set $T_{\mathrm{test}}^{F} = \{(X_j^{\mathrm{test}}, y_j^{(F),\mathrm{test}})\}_{j=1}^{N_{\mathrm{test}}}$ is defined such that $V_{\mathrm{val}}^{F} \cap T_{\mathrm{test}}^{F} = \emptyset$, where $\emptyset$ denotes the empty set. Splitting the data into validation and test sets is a common approach in ML, wherein the optimization or hyperparameter tuning is performed on the former and the error of the final model is reported on the latter.
It is to be noted that the test set is never used in any stage of the training process.
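One can explicitly define an optimized MFML (o-MFML) model for a target fidelity $F$, with $N_{\mathrm{train}}^{(F)} = 2^{\eta_F}$ training samples at the target fidelity and a baseline fidelity $f_b$, as

$$P_{\text{o-MFML}}^{(F,\eta_F;f_b)}(X_q) = \sum_{s \in S'} \beta_s^{\mathrm{opt}}\, P_{\mathrm{KRR}}^{(s)}(X_q), \qquad S' = S^{(F,\eta_F;f_b)}, \tag{8}$$

where $\beta_s^{\mathrm{opt}}$ are the optimized coefficients and $X_q$ is the representation of a query molecule. In general, one is interested in solving the optimization task

$$\boldsymbol{\beta}^{\mathrm{opt}} = \underset{\boldsymbol{\beta}}{\arg\min} \left( \sum_{j=1}^{N_{\mathrm{val}}} \left| \sum_{s \in S'} \beta_s\, P_{\mathrm{KRR}}^{(s)}\left(X_j^{\mathrm{val}}\right) - y_j^{(F),\mathrm{val}} \right|^p \right)^{1/p}, \tag{9}$$

where one minimizes a $p$-norm on the validation set $V_{\mathrm{val}}^F$. This is equivalent to solving

$$\boldsymbol{\beta}^{\mathrm{opt}} = \underset{\boldsymbol{\beta}}{\arg\min}\, \left\| \mathbf{P}\boldsymbol{\beta} - \mathbf{y}^{\mathrm{ref}} \right\|_p,$$

where $\mathbf{P}_{js} = P_{\mathrm{KRR}}^{(s)}(X_j^{\mathrm{val}})$ for $s \in S'$ as defined in Eq. (8), and $\mathbf{y}^{\mathrm{ref}}$ is the vector of reference energies from $V_{\mathrm{val}}^F$. This work uses the ordinary least squares (OLS) procedure to solve Eq. (9) with $p = 2$. In the results, the OLS-optimized MFML model is reported as $P_{\text{o-MFML}}$.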
However, it must be noted that any method that can solve the minimization problem in Eq. ( 9) can be used to optimize the coefficients.
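The OLS step for $p = 2$ can be sketched as follows: each row of the matrix holds the predictions of all sub-models for one validation molecule, and the coefficients minimize the squared residual against the reference energies. The validation matrix here is random hypothetical data whose reference values are an exact combination of the columns (real sub-model predictions never fit this perfectly), and the normal equations are used only for brevity; a QR- or SVD-based solver such as NumPy's `linalg.lstsq` is numerically preferable in practice.

```python
"""Sketch: optimize o-MFML coefficients by ordinary least squares on a validation set (p = 2)."""
import random


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_coefficients(p, y_ref):
    """beta_opt = argmin ||P beta - y_ref||_2 via the normal equations (P^T P) beta = P^T y_ref."""
    n_cols = len(p[0])
    ptp = [[sum(row[i] * row[j] for row in p) for j in range(n_cols)] for i in range(n_cols)]
    pty = [sum(row[i] * y for row, y in zip(p, y_ref)) for i in range(n_cols)]
    return solve(ptp, pty)


def mae(p, beta, y_ref):
    """Validation MAE of the combined model sum_s beta_s * P^(s)."""
    preds = [sum(b * v for b, v in zip(beta, row)) for row in p]
    return sum(abs(a - y) for a, y in zip(preds, y_ref)) / len(y_ref)


# Hypothetical validation data: row j holds the sub-model predictions P_KRR^(s)(X_j^val).
rng = random.Random(0)
n_val, true_beta = 50, [1.0, -0.5, 0.8, 0.2]
P = [[rng.uniform(-1.0, 1.0) for _ in true_beta] for _ in range(n_val)]
y_ref = [sum(b * v for b, v in zip(true_beta, row)) for row in P]

beta_opt = ols_coefficients(P, y_ref)
beta_conv = [1.0, -1.0, 1.0, -1.0]  # fixed +/-1 coefficients for comparison
print("optimized:", [round(b, 3) for b in beta_opt])
print(f"validation MAE: fixed {mae(P, beta_conv, y_ref):.3f}, optimized {mae(P, beta_opt, y_ref):.2e}")
```

Only the validation set enters the fit; the test set remains untouched until the final evaluation.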
Thus, the complete process of building an o-MFML model can be written as follows:
1. Identify the set of sub-models for a given MFML model, $S^{(F,\eta_F;f_b)}$.
2. Build the KRR sub-models $P_{\mathrm{KRR}}^{(s)}$ for all $s \in S^{(F,\eta_F;f_b)}$.
3. Optimize the coefficients $\beta_s$ on $V_{\mathrm{val}}^F$ using an optimizer of choice.

4. Evaluate the final model $P_{\text{o-MFML}}^{(F,\eta_F;f_b)}$ on $T_{\mathrm{test}}^F$ with an error metric of choice (Section 2.4).

Model Evaluation
Throughout this investigation, all prediction errors are reported on a test set, $T_{\mathrm{test}}^F = \{(X_j^{\mathrm{test}}, y_j^{(F),\mathrm{test}})\}_{j=1}^{N_{\mathrm{test}}}$, which consists of evaluation representations and their corresponding reference values of the property of interest (for example, the excitation energy) calculated at the target fidelity $F$ (for example, TZVP). These errors are reported as mean absolute errors (MAEs), defined by a discrete $L_1$ norm

$$\mathrm{MAE} = \frac{1}{N_{\mathrm{test}}} \sum_{j=1}^{N_{\mathrm{test}}} \left| P_{\mathrm{ML}}\left(X_j^{\mathrm{test}}\right) - y_j^{(F),\mathrm{test}} \right|.$$

The model $P_{\mathrm{ML}}$ can be either the standard KRR model or one of the MFML models discussed in this work. For the prediction of atomization energies on the QM7b dataset, 367 of the 1067 molecules that remained after separating the training data were randomly sampled and used as the validation set, along with their atomization energies calculated at the CCSD-ccpvdz fidelity. The remaining 700 molecules and their atomization energies at the target fidelity were used as the test set. For the excitation energy dataset, the 2712 samples of each molecule with the target fidelity of TZVP were randomly split into 712 and 2000 samples for the validation and test sets, respectively. The random sampling was performed using the Scikit-learn package 72.
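The data bookkeeping behind these splits and the MAE metric can be sketched as follows. The paper performed the random sampling with Scikit-learn; this sketch instead uses Python's standard `random` module as a stand-in, with arbitrary seeds, and only reproduces the set sizes stated above.

```python
"""Sketch: random validation/test split of the held-out data and the MAE metric."""
import random


def split_holdout(indices, n_val, seed=0):
    """Shuffle the held-out indices and split them into disjoint validation and test sets."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_val], shuffled[n_val:]


def mae(pred, ref):
    """Mean absolute error: the discrete L1 norm of the residuals divided by their number."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


# QM7b: 7211 molecules, of which 1.5 * 2**12 = 6144 are randomly chosen for training.
n_qm7b, n_train_qm7b = 7211, int(1.5 * 2**12)
molecules = list(range(n_qm7b))
random.Random(1).shuffle(molecules)
held_out_qm7b = molecules[n_train_qm7b:]  # 1067 molecules
val_qm7b, test_qm7b = split_holdout(held_out_qm7b, n_val=367)

# Excitation energies: the first 1.5 * 2**13 = 12288 of 15000 frames are used for training.
held_out_frames = range(int(1.5 * 2**13), 15000)  # 2712 frames
val_exc, test_exc = split_holdout(held_out_frames, n_val=712)

print(len(val_qm7b), len(test_qm7b), len(val_exc), len(test_exc))  # -> 367 700 712 2000
```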

Results
To establish the effectiveness of the optimized MFML (o-MFML) method, a study was carried out on two datasets for the prediction of two different properties. In particular, this work reports the prediction of atomization energies for the QM7b dataset as calculated in Ref. 49, and the prediction of the first excitation energies for the data used in Ref. 13. Kernel generation and KRR training are carried out with the QML package 73.

Atomization Energy Prediction on QM7b
As a preliminary analysis, Figure 2 shows scatter plots of the atomization energies of the training-set molecules against the 1-norm of their SLATM representations 36. The SLATM representation serves as a proxy for the chemical space, so these plots show how the energies are laid out across it. Comparing the distributions across basis sets, that is, row-wise, one observes that increasing the basis set size results in a clearer separation of the atomization energies across the proxy chemical space, and the higher-energy clusters become more distinct. A similar comparison for increasing level of theory shows visible differences only for the ccpvdz basis set. Here, the CCSD level of theory further separates clusters of molecules in comparison to the MP2 level of theory, especially for those with atomization energies in the region of -100 kcal/mol. As the fidelity approaches the target fidelity of CCSD-ccpvdz, the scatter plot of the energies over the chemical space increasingly resembles that of the target fidelity. The smallest basis set, STO3G, does not show any atomization energies higher than 100 kcal/mol for either the MP2 or the CCSD level of theory. Overall, each increase in fidelity results in a clearer, more distinct categorization of the molecules in the QM7b dataset, which was previously discussed in Ref. 49 with respect to the 1-norm of the Coulomb matrices. The STO3G basis set fails to provide any information on the separation of the clusters of molecules.
The scatter plots of the fidelities with this basis set show a strong clustering around the 0 kcal/mol mark. For the larger basis sets, the higher atomization energies instead show two distinct clusters: a large one around the 0 kcal/mol mark and another around the 150 kcal/mol mark. As identified in Ref. 49, these correspond to the largest molecules of the QM7b dataset. Since this information is missing for the STO3G basis set, one anticipates that using the MP2-STO3G and CCSD-STO3G fidelities in conventional MFML would provide little to no benefit in predicting the atomization energies at the target fidelity of CCSD-ccpvdz, where the clustering is even more distinct.
The resulting learning curves of the multifidelity analysis on the QM7b data are shown in Figure 3. All the sub-models for the MFML and o-MFML methods were built with KRR. As anticipated in the preliminary analysis, adding the MP2-STO3G fidelity does not provide any perceivable benefit to the MFML model (left-hand side of Figure 3). The model built on the CCSD-STO3G baseline, however, does show improvement.
The learning curves for the o-MFML models are presented on the right-hand side of Figure 3. First, one observes that even for small training set sizes, the o-MFML models do not show any pre-asymptotic fluctuations: their MAE always decreases with increasing training set size. This contrasts with the conventional MFML method, which shows a pre-asymptotic region in which the MAE of the model built with $f_b$ = MP2-STO3G fluctuates before settling down. For the o-MFML models, each added cheaper fidelity instead lowers the offset of the learning curve consistently, even for very small training set sizes. Since the same sub-models are used for both MFML and o-MFML, the increased prediction accuracy stems solely from optimizing their combination. Second, adding the MP2 level of theory, even only with the largest basis set, results in a significant decrease in the prediction error of the model.
Adding the MP2-STO3G fidelity further improves the predictions of the o-MFML models, resulting in a lower prediction error. For $N_{\mathrm{train}}^{\text{CCSD-ccpvdz}} = 128$ and the MP2-STO3G baseline, the MFML method results in an MAE of 2.73 kcal/mol, while the o-MFML method yields an MAE of 1.4 kcal/mol. The scatter plot in Figure 4 shows that the MFML model over-estimates the atomization energies; this over-estimation begins as early as about 40 kcal/mol and becomes more evident further up the energy range. The o-MFML model, on the other hand, predicts these higher atomization energies more accurately, bringing the distribution closer to the identity mapping.

Coefficient Study
As discussed in Section 2.3, the o-MFML method optimally combines the various sub-models, resulting in a superior multifidelity method. The coefficients are optimized on the validation set with the OLS method. To further understand the o-MFML method, Figure 5 shows the optimized coefficients for $N_{\mathrm{train}}^{\text{CCSD-ccpvdz}} = 2^7 = 128$ and varying baseline fidelities, together with the default MFML coefficients for the case of the MP2-STO3G baseline. The optimized coefficients differ significantly from the default MFML coefficients for almost all sub-models. This shows that the conventional MFML method does not combine the different fidelities optimally.
In particular, one observes in the central plot of the second row that the values of $\beta_s^{\mathrm{opt}}$ for the CCSD-631G fidelity are small in comparison with those of the other fidelities. This could indicate that the optimization identified this fidelity as less useful. To verify this, two models were built separately. The first was the usual complete model with all six fidelities, with $N_{\mathrm{train}}^{\text{CCSD-ccpvdz}} = 2^7$ and the number of training samples at the other fidelities scaled by 2. The second model was built without the CCSD-631G fidelity, but with the number of training samples at the remaining fidelities kept identical to those of the first model, that is, $(2^7, 2^9, 2^{10}, 2^{11}, 2^{12})$. For both models, the o-MFML was generated and the MAE evaluated. The original model resulted in an MAE of 1.421 kcal/mol, while the second model resulted in an MAE of 1.431 kcal/mol, a difference of 0.72%. This strongly indicates the robustness of the o-MFML method and shows how it can serve as a tool to detect whether a particular fidelity benefits the overall multifidelity structure. More details on the effectiveness of the coefficient analysis are reported in Section 3.3.1 of the supplementary material.

Excitation Energy Prediction
The dataset for excitation energies consists of MD- and DFTB-based trajectories of benzene, naphthalene, and anthracene 13. A total of 5 fidelities were calculated and ordered as discussed in Section 2.1. In brief, the target fidelity is TZVP and the cheapest fidelity is STO-3G. All the sub-models used in both the MFML and o-MFML methods are built with KRR using the Matérn kernel of first order with the $L_2$ norm and a regularization strength of $10^{-9}$. The kernel widths for each molecule were chosen as reported in Ref. 13. Unsorted Coulomb matrices are used as representations in all cases. Preliminary analyses of this dataset were previously discussed, and two problematic data structures were identified 13. For MD-based naphthalene, there was no clear multifidelity structure. For DFTB-based anthracene, the high spread of the STO-3G energies with respect to the target fidelity of TZVP was identified as problematic. Based on these analyses, it was shown that the MFML method would not provide favorable results for these two cases.
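The representation and kernel described above can be sketched as follows. Two assumptions are made: "first order" is taken to mean the $\nu = 3/2$ member of the Matérn family, $k(d) = (1 + \sqrt{3}\,d/\sigma)\exp(-\sqrt{3}\,d/\sigma)$ with $d$ the $L_2$ distance, and the unsorted Coulomb matrix is flattened to its upper triangle. The H2 geometries and the kernel width are hypothetical; the actual widths are those of Ref. 13.

```python
"""Sketch: unsorted Coulomb matrix and a first-order Matern kernel (assumed nu = 3/2) for KRR."""
import math


def coulomb_matrix(charges, coords):
    """Unsorted Coulomb matrix, flattened to its upper triangle (including the diagonal)."""
    n = len(charges)
    features = []
    for i in range(n):
        for j in range(i, n):
            if i == j:
                features.append(0.5 * charges[i] ** 2.4)
            else:
                features.append(charges[i] * charges[j] / math.dist(coords[i], coords[j]))
    return features


def matern_first_order(x1, x2, sigma):
    """k(d) = (1 + sqrt(3) d / sigma) * exp(-sqrt(3) d / sigma), with d the L2 distance."""
    r = math.sqrt(3.0) * math.dist(x1, x2) / sigma
    return (1.0 + r) * math.exp(-r)


def kernel_matrix(reps, sigma, lam=1e-9):
    """K + lam * I, the left-hand side of the KRR system (K + lam I) alpha = y."""
    return [[matern_first_order(a, b, sigma) + (lam if i == j else 0.0)
             for j, b in enumerate(reps)] for i, a in enumerate(reps)]


# Hypothetical H2 geometries (coordinates in arbitrary but consistent length units).
geometries = [[(0.0, 0.0, 0.0), (0.0, 0.0, r)] for r in (0.70, 0.74, 0.80)]
reps = [coulomb_matrix([1, 1], g) for g in geometries]
K = kernel_matrix(reps, sigma=1.0)
for row in K:
    print([round(v, 4) for v in row])
```

The regularization of $10^{-9}$ enters only on the diagonal; with such a small value, the KRR fit is close to interpolating the training data.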
The learning curves of the conventional MFML method for the MD-based trajectories of benzene, naphthalene, and anthracene are shown in the top row of Figure 6, while the bottom row shows the learning curves of the novel o-MFML method. The baseline fidelities of the multifidelity models are given in the legend.
Of particular interest here is the case of naphthalene. The MFML results reflect the issue of the wide spread of the scatter, as previously identified in Ref. 13. With the o-MFML method, however, the models built with the 3-21G and 6-31G baselines still yield consistently lower errors, whereas in the conventional MFML method these models do not provide much improvement. The o-MFML method thus provides a robust multifidelity method even if the data distribution of the quantum chemistry methods is not as anticipated for MFML. For benzene and anthracene, the improvement in the MAEs is small, which could indicate that the original MFML combination was already close to optimal for these cases. Next, consider the case of DFTB-based anthracene (Figure 7). For the MFML method, adding the STO-3G fidelity decreases the performance of the model, as discussed in Ref. 13, where the authors argue that the wide spread of the STO-3G fidelity with respect to the target fidelity of TZVP limits the improvement achievable with the conventional MFML method.
Figure 1: A hypothetical structure of sub-models for 4 fidelities is depicted here. Each sub-model can be identified with an index pair s = (f, η_f) representing the fidelity f with N_train^(f) = 2^(η_f) training samples. Thus, the circled sub-model can be denoted as s′ = (2, 3). Within this formulation, the MFML model is built by combining the sub-models as shown with the dotted black line. The contribution of sub-model s′ is given by the coefficient β_s′. In conventional MFML, this coefficient would be -1.

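Learning curves are a well-known metric in the field of KRR-based ML methods. They depict the change in the prediction error of a model with increasing training set size. In all results reported in this work, the learning curves are averaged over 10 runs of random shuffling of the MFML training set, while ensuring the nestedness of the training samples. For each of the 10 runs, the procedure is as follows:
1. Randomly select $N_{\mathrm{train}}^{F} = 2^{\eta_F}$ training samples from $\mathcal{T}^{(F)}$. Define this as a new sampled training set, $G^{(F)} \subseteq \mathcal{T}^{(F)}$. Train the sub-model $P_{\mathrm{KRR}}^{(F,\eta_F)}$ on the training data from $G^{(F)}$.
3. For the conformations $X_i \in G^{(F)}$, select $y_i^{(F-1)}$, that is, the energies at fidelity $F-1$ for the conformations that are also found in $G^{(F)}$.
4. At the next lower fidelity, $f = F-1$, build the sampled training set $G^{(F-1)} \subseteq \mathcal{T}^{(F-1)}$.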
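The nested random sampling used for each learning-curve run can be sketched as follows. This sketch only draws index sets; it assumes that every training conformation is available at every fidelity (as for the 12288 training frames of the excitation energy data), and the function name and seeds are hypothetical. The sub-models $(f, \eta_f - 1)$ needed for MFML would reuse $G^{(f+1)}$ with fidelity-$f$ energies.

```python
"""Sketch: one run of nested random sampling for an MFML learning curve."""
import random


def nested_samples(pool, F, eta_F, seed):
    """Return {f: G^(f)} with G^(F) ⊆ ... ⊆ G^(1) and |G^(f)| = 2**(eta_F + F - f)."""
    rng = random.Random(seed)
    remaining = list(pool)
    rng.shuffle(remaining)
    chosen, samples = [], {}
    for f in range(F, 0, -1):  # target fidelity first, then successively cheaper fidelities
        extra = 2 ** (eta_F + F - f) - len(chosen)  # new conformations needed at fidelity f
        chosen = chosen + remaining[:extra]         # keep every conformation from fidelity f + 1
        remaining = remaining[extra:]
        samples[f] = list(chosen)
    return samples


# Assumption: all 12288 training frames carry energies at every fidelity.
runs = [nested_samples(range(12288), F=5, eta_F=7, seed=r) for r in range(10)]
print({f: len(g) for f, g in runs[0].items()})  # -> {5: 128, 4: 256, 3: 512, 2: 1024, 1: 2048}
```

Averaging the test MAE over such runs, for each $\eta_F$, gives one point per training set size on the learning curve.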

Figure 2: Scatter plot of the various fidelities from the training data with respect to the 1-norm of the corresponding SLATM representation. The SLATM representation serves as a proxy for the chemical space; these scatter plots thus represent the spread of the atomization energies across the chemical space. The first row corresponds to the MP2 level of theory for increasing basis set sizes, and the second row displays the scatter plots for the CCSD level of theory.

Figure 3: Various learning curves for the prediction of atomization energies of molecules in the QM7b dataset. The left-hand side plot corresponds to learning curves built with the conventional MFML method, that is, $P_{\mathrm{MFML}}^{(F;f_b)}$. The right-hand side plot corresponds to the o-MFML models optimized with OLS, referred to as $P_{\text{o-MFML}}^{(F;f_b)}$. In both cases, each curve corresponds to a model where the target fidelity, $F$, is CCSD-ccpvdz. The various baseline fidelities $f_b$ are as shown in the figure legend. The learning curve for the conventional KRR model (KRR-reference) is also shown for reference.

Figure 4: A comparison of learning curves of the MFML (dashed lines) and o-MFML (solid lines) models for varying baseline fidelities, f b .A scatter plot comparing the predictions vs. CCSD-ccpvdz reference for the two models is also presented for f b = MP2-STO3G.

Figure 5: Values of the o-MFML coefficients for $N_{\mathrm{train}}^{\text{CCSD-ccpvdz}} = 2^7 = 128$. For readability, in most cases the coefficients have been rounded to the second decimal place. The final values of the coefficients are shown for varying baseline fidelities. For reference, the default coefficients used in MFML are shown for the MP2-STO-3G baseline.

Figure 6: Learning curves for MFML (top row) and o-MFML (bottom row) models for MD-based trajectories of various molecules for the prediction of excitation energies. The baseline fidelities used are given in the legend. The KRR reference (black curve) for single-fidelity training on TZVP is provided in each case. The axes are scaled identically for the MFML and o-MFML methods but differ between molecules.

Figure 7: Learning curves for MFML (top row) and o-MFML (bottom row) for DFTB-based trajectories of various molecules. The MAE is reported for the prediction of first excitation energies. The single-fidelity (TZVP) KRR learning curve (black line) for prediction on the same test set as the other models is provided for reference. The axes are scaled identically for each molecule across the MFML and o-MFML models for easy comparison.

With the o-MFML method, the optimization of the coefficients results in a model that performs much better. The learning curve indicates that the o-MFML model for the STO-3G baseline is now comparable to the model built with the 3-21G baseline. The o-MFML method thus results in a better model in spite of the poor distribution of the STO-3G energies with respect to the target fidelity. Further results and analyses of o-MFML for the prediction of excitation energies are discussed in Section S3.3.

Conclusion
This work has numerically established the improvement of conventional MFML achieved by optimally combining the various multifidelity sub-models. For the prediction of atomization energies of molecules from the QM7b dataset, and for the prediction of excitation energies of three molecules of growing size, o-MFML has been shown to consistently improve the prediction capabilities of the multifidelity method. The use of o-MFML was shown to be especially beneficial in cases where the hierarchy or distribution of the cheaper fidelities is not optimal. The learning curves indicate that o-MFML results in low errors for the prediction of both atomization energies and excitation energies. This novel method opens up further research avenues for multifidelity methods in QC. One such area is the use of the optimal coefficients to determine the optimal number of training samples at each fidelity. Combined with an in-depth analysis of the scaling of the number of training samples between fidelities, this could provide a better picture of the multifidelity structure and its use for QC properties. Overall, this work presents a cost-efficient