Importance profiles. Visualization of atomic basis set requirements

Recent developments in fully numerical methods promise interesting opportunities for new, compact atomic orbital (AO) basis sets that maximize the overlap with fully numerical reference wave functions, following the pioneering work of Richardson and coworkers from the early 1960s. Motivated by this technique, we suggest a way to visualize the importance of AO basis functions employing fully numerical wave functions computed at the complete basis set (CBS) limit: the importance of a normalized AO basis function $|\alpha\rangle$ centered on some nucleus can be visualized by projecting $|\alpha\rangle$ on the set of numerically represented occupied orbitals $|\psi_{i}\rangle$ as $I_{0}(\alpha)=\sum_{i}\langle\alpha|\psi_{i}\rangle\langle\psi_{i}|\alpha\rangle$. Choosing $\alpha$ to be a continuous parameter describing the orbital basis, such as the exponent of a Gaussian-type orbital (GTO) or Slater-type orbital (STO) basis function, one is then able to visualize the importance of various functions. The proposed visualization $I_{0}(\alpha)$ has the important property $0\leq I_{0}(\alpha)\leq1$, which allows unambiguous interpretation. We also propose a straightforward generalization of the importance profile for polyatomic applications, $I(\alpha)$, in which the importance of a test function $|\alpha\rangle$ is measured as the increase in projection from the atomic minimal basis. We exemplify the methods with importance profiles computed for atoms from the first three rows, and for a set of chemically diverse diatomic molecules. We find that the importance profile offers a way to visualize the atomic basis set requirements for a given system in an a priori manner, provided that a fully numerical reference wave function is available.


I. INTRODUCTION
Most quantum chemical calculations reported in the literature employ the linear combination of atomic orbitals (LCAO) approach, 1,2 in which the spin-$\sigma$ molecular orbitals (MOs) $\psi_{i\sigma}(\mathbf{r})$ are expanded in terms of atomic orbital (AO) basis functions $\chi_\mu(\mathbf{r})$ as
$$\psi_{i\sigma}(\mathbf{r}) = \sum_\mu C^{\sigma}_{\mu i}\, \chi_\mu(\mathbf{r}). \tag{1}$$
The AO $\chi_\mu(\mathbf{r})$ centered on $\mathbf{R}_\mu$ is defined as a product of a radial function $R_{nl}(r)$ with a spherical harmonic $Y_{lm}$ as
$$\chi_\mu(\mathbf{r}) = R_{nl}(r_\mu)\, Y_{lm}(\hat{\mathbf{r}}_\mu), \qquad \mathbf{r}_\mu = \mathbf{r} - \mathbf{R}_\mu. \tag{2}$$
Typically, LCAO approaches employ real-valued spherical harmonics; however, in the case of linear molecules (which also trivially includes the cases of atoms and diatomic molecules), complex spherical harmonics $Y_l^m$ that afford the analytical solution with respect to the angle $\phi$ around the bond axis 1 may also be employed in eq. (2).
The accuracy of LCAO calculations is controlled by the AO basis set, that is, the radial functions $R_{nl}(r)$ in eq. (2). AO basis sets are typically optimized to reproduce total energy differences around the chemical equilibrium in order to facilitate cost-efficient evaluation of reaction energies, for instance. 3,4 Indeed, the reason for the popularity of LCAO calculations is that they often offer reliable estimates of molecular properties, for instance, because basis set truncation errors (the differences in energy between the value predicted by the AO basis and the complete basis set (CBS) limit) tend to be systematic across various geometries and electronic states. 1 Unfortunately, while making a basis set is straightforward, making a good basis set is terribly difficult because of the conflicting requirements that define such a basis set. On the one hand, the basis set should be as small as possible, because the larger the atomic basis set is, the more costly it is to use in polyatomic calculations. On the other hand, the basis set should also be transferable: it should be similarly accurate across a wide variety of systems. It is easy to make a basis set more transferable by adding more functions; however, this is in opposition to the first criterion. Although it is possible to formulate approaches to generate sequences of basis sets that approach the CBS limit from first principles 5 (fully numerical basis sets being an extreme example thereof), the issue is that such benchmark quality basis sets are considered much too large for routine calculations.

a) Electronic mail: susi.lehtola@alumni.helsinki.fi
The tradeoff between the two aforementioned criteria is not always simple. It historically led to the development of the pioneering Pople-type x-yzG basis sets such as 3-21G, 6 6-31G, 7 and 6-311G, 8 and the zoo of their polarized counterparts. These basis sets have become obsolete with the introduction of new families of basis sets, which afford an optimal balance of cost and accuracy. We would especially like to point out here the problematic nature of the 6-311G family: it is merely of valence double-ζ quality instead of the intended valence triple-ζ quality, 9 and can also lead to peculiar chemistries. 10 Unfortunately, this is still not widely appreciated, as demonstrated by a recent benchmark study. 11

Modern basis set families have been designed to afford systematic convergence towards the CBS limit. The cost-accuracy tradeoff is solved by the introduction of basis sets of prefixed size, ranging from split-valence polarization or polarized double-ζ quality to polarized triple-ζ and higher basis sets; this also greatly simplifies choosing the basis set, as one only needs to pick a suitable rung, that is, the cardinal number of the basis set. Examples of modern basis sets include the correlation consistent family, 12 the TURBOMOLE default basis sets, 13 and the polarization consistent family. 14

We note in passing that such standard energy-optimized basis sets are often suboptimal for modeling properties other than (differences in) the total energy. Specially optimized basis sets that yield faster convergence to the CBS limit have been reported for various properties in the literature, such as magnetic properties 15-18 and electron momentum densities. 19,20

Extreme environments are an even more challenging case for standard basis sets, as has been recently demonstrated for the case of strong magnetic fields. 21,22 The magnetic fields that can be found in the atmospheres of white dwarfs and neutron stars are strong enough to result in qualitative changes in the electronic structures of atoms and molecules. As a result, the basis set requirements for calculations at finite magnetic fields are more stringent than for those at zero field, and novel types of basis sets are required. 21,22

Having discussed various general challenges in the development of AO basis sets, we can now comment on the practical aspects of basis set development. Typically, the optimization of basis sets begins by choosing a level of theory and a training set of atoms and molecules. Sequences of basis sets with various numbers of functions are then optimized with a given training set of systems in order to determine the optimal composition of the basis set, which is usually chosen with the notion of correlation 12 or polarization 14 consistency; Shaw and Hill 23 have recently reported a Python package for performing such optimizations.
This procedure implies the need to carry out a large number of electronic structure calculations with varying basis sets. If the training set is changed by the addition or removal of some atoms or molecules, the electronic structure calculations need to be repeated in full, because the optimal basis set changes when the training set changes.
The question we now pose is: can we find a way to avoid having to carry out such a large number of electronic structure calculations when modifying the training database? The answer is yes: the maximal overlap method of Richardson and coworkers 24,25 offers a shortcut for basis set optimization, because computing projections onto a precomputed CBS limit wave function is much cheaper than carrying out full, repeated electronic structure calculations of total energies in the AO basis set under optimization. Thereby, optimization of the projection is much faster than self-consistent energy optimization. The attractiveness of the maximal overlap idea is obvious from the number of times it has been described in the literature: projection techniques have become established for forming compact basis sets. For instance, as discussed by Francisco, Seijo, and Pueyo, 26 the later works by Kalman 27 and Adamowicz 28 both describe approaches analogous to that of Richardson et al.
Fully numerical calculations on diatomic molecules were already possible four decades ago, 1 and Adamowicz and McCullough 29 already applied an overlap maximization procedure to fully numerical references in order to produce accurate AO basis sets. For completeness, we mention in this context also the work of Kobus, Moncrieff, and Wilson on studying the deficiencies in Gaussian basis calculations with fully numerical diatomic calculations. 30,31 More recently, projections to fully numerical atomic wave functions have been used by Van Lenthe and Baerends 32 to fit Slater-type orbital basis sets.4][35][36][37][38][39] Although these techniques are likewise derivative of the maximal overlap technique of Richardson and coworkers 24,25 and its many later applications, this connection does not appear to have been made before in the literature.
3][44][45] All-electron calculations have recently become feasible with plane waves, as well, through the use of a regularized nuclear Coulomb potential. 46,47 These developments merit renewed attention to the maximal overlap method: a database of fully numerical all-electron wave functions for a set of chemically diverse systems would offer an excellent starting point for fitting novel, systematic and error-balanced AO basis sets with machine learning techniques, for instance.
As a key step to building such a database, we point out that the projections involved in the computation of the overlaps can also be used as a visual tool. We will introduce importance profiles $I(\alpha)$ that measure the projection of an AO test function with parameter $\alpha$ (here the exponent of a Gaussian or Slater type orbital) onto the fully numerical orbitals of a given system. These importance profiles thereby reveal the electronic structure of the system, and can also be used to study what kind of AO basis functions should be used to model the studied system.
The driving idea behind this work was to find out whether the importance profile would offer an unambiguous way to characterize the electronic structure of various molecules: would it be possible to see the differences in polarization effects and optimal polarization exponents for atoms in different molecules, for example H in H$_2$ vs H in HF? Quantifying the dissimilarity of the basis set requirements of various molecules would be of great help in constructing the training and test sets of molecules that could be used to optimize new, compact and efficient AO basis sets following the maximal overlap technique of Richardson et al. 24,25

The layout of this work is the following. Next, in section II, the basis of the proposed visualization method is outlined: the employed radial functions are discussed in section II A, the closely related completeness profiles of Chong 48 are briefly reviewed in section II B, and the importance profiles are introduced in section II C. Details of the implementation of the importance profiles are presented in section III. Applications of the method on a set of chemically diverse diatomic molecules are presented in section IV. We describe the atomic calculations used to build the minimal NAO basis set in section IV A. The molecular calculations are discussed in section IV B. Section IV C delves into the questions of non-orthogonality and exactness of the NAO basis set with atomic calculations in the diatomic numerical basis. The performance of STOs and GTOs is compared in section IV D. Deficiencies in the minimal NAO basis are analyzed in section IV E. The article concludes with a brief summary and discussion in section V.

II. METHOD

A. Radial Functions
As recently reviewed in ref. 1, various radial functions can be used in the AOs of eq. (2). Gaussian-type orbitals (GTOs) have a radial part defined by
$$R^{\mathrm{GTO}}_{nl}(r) \propto r^{l} e^{-\alpha r^{2}}. \tag{3}$$
GTOs are the pre-eminently employed basis set in quantum chemistry. Although eq. (3) shows the primitive form, GTOs are typically used in contracted form. 3,4 However, the analysis of the projection onto the primitive GTOs of eq. (3) offers a good starting point for the construction of contracted GTO basis sets, as well. Slater-type orbitals (STOs), whose radial part is given by
$$R^{\mathrm{STO}}_{nl}(r) \propto r^{n-1} e^{-\zeta r}, \tag{4}$$
are a less commonly used option, as molecular integrals are more difficult to evaluate in this basis set.2][53][54] But, whether this actually holds in general systems is debatable: as the asymptotic behavior far from the nucleus depends on the energy of the highest occupied orbital, which is system dependent, it is not obvious that a basis set that has the right asymptotic form for fixed values of $\zeta$ yields more accurate results than a GTO basis set, for instance, because the asymptotic decay of the STO basis functions will not match that of a general polyatomic system.

We note that numerical atomic orbitals (NAOs) are yet another option; see refs. 1 and 55 for reviews. NAOs are extremely powerful in principle, as they can afford the exact solution to the non-interacting atom: not only do NAOs have the right asymptotic behavior close to the nucleus and far away from it like STOs, NAOs are also exact everywhere in between in the case of the non-interacting atom. Although the methods discussed herein are also applicable to NAOs, the complication of NAOs is that the form of $R_{nl}(r)$ is not restricted to a simple analytic form with a single adjustable parameter like $\alpha$ in eq. (3) or $\zeta$ in eq. (4); thus, for simplicity, we will not discuss NAOs and will explicitly focus on GTOs and STOs in this work. Indeed, many NAO codes such as GPAW 56 and FHI-aims 57 employ GTOs or STOs as polarization functions.
We also note the same drawback for NAOs as for GTOs and STOs, in that NAOs are not exact for polyatomic systems. However, their flexibility means that NAO basis sets can be more accurate than GTO or STO basis sets. 42,58,59
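The primitive radial forms discussed in this section can be sketched numerically as follows. This is a minimal illustration, assuming the standard normalization $\int_0^\infty r^2 R(r)^2\,dr = 1$; the function names `gto_radial`, `sto_radial`, and `radial_norm`, the grid, and the example exponents are hypothetical choices made for this sketch and do not come from the HelFEM implementation.

```python
import math
import numpy as np

def gto_radial(r, alpha, l=0):
    """Normalized primitive GTO radial function N r^l exp(-alpha r^2)."""
    # N^2 = 2 (2 alpha)^(l + 3/2) / Gamma(l + 3/2)
    norm = math.sqrt(2.0 * (2.0 * alpha) ** (l + 1.5) / math.gamma(l + 1.5))
    return norm * r**l * np.exp(-alpha * r**2)

def sto_radial(r, zeta, n=1):
    """Normalized STO radial function N r^(n-1) exp(-zeta r)."""
    # N^2 = (2 zeta)^(2n + 1) / (2n)!
    norm = math.sqrt((2.0 * zeta) ** (2 * n + 1) / math.factorial(2 * n))
    return norm * r ** (n - 1) * np.exp(-zeta * r)

def radial_norm(R, r):
    """Trapezoidal estimate of int_0^inf r^2 R(r)^2 dr on the grid r."""
    y = r**2 * R**2
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(r)))

# Illustrative grid and exponents (assumptions of this sketch)
r = np.linspace(0.0, 40.0, 200001)
gto_norm = radial_norm(gto_radial(r, alpha=0.5, l=2), r)
sto_norm = radial_norm(sto_radial(r, zeta=1.3, n=2), r)
print(gto_norm, sto_norm)
```

Both norms should come out very close to one, which is the property assumed for the test functions in the projections discussed below.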

B. Completeness Profile
The completeness profile 48 is a way to visualize the completeness of AO basis sets, which can be quantified by how well the basis set satisfies the resolution of the identity
$$\hat{1} \approx \sum_{\mu\nu} |\mu\rangle (\mathbf{S}^{-1})_{\mu\nu} \langle\nu|, \tag{5}$$
which is inherent in the LCAO expansion of eq. (1). The overlap matrix, whose inverse is employed in eq. (5), has elements
$$S_{\mu\nu} = \langle\mu|\nu\rangle.$$
Studying how well the basis set can represent a normalized ($\langle\alpha|\alpha\rangle = 1$) primitive test function $|\alpha\rangle$ parametrized by $\alpha$, which is typically an exponent, one obtains the completeness profile
$$Y(\alpha) = \sum_{\mu\nu} \langle\alpha|\mu\rangle (\mathbf{S}^{-1})_{\mu\nu} \langle\nu|\alpha\rangle. \tag{6}$$
1][62][63][64][65][66][67][68][69][70][71] A two-electron completeness profile for the assessment of suitability for electron correlation effects has also been suggested. 72
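As a minimal sketch of the completeness profile, the following example evaluates $Y(\alpha)=\sum_{\mu\nu}\langle\alpha|\mu\rangle(\mathbf{S}^{-1})_{\mu\nu}\langle\nu|\alpha\rangle$ for a one-center basis of normalized s-type primitive GTOs, for which the overlap has the closed form $[2\sqrt{ab}/(a+b)]^{3/2}$. The even-tempered exponents and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def s_gto_overlap(a, b):
    """Overlap of two normalized s-type primitive GTOs on the same center."""
    return (2.0 * np.sqrt(a * b) / (a + b)) ** 1.5

def completeness_profile(basis_exponents, test_exponents):
    """Y(alpha) = sum_{mu nu} <alpha|mu> (S^-1)_{mu nu} <nu|alpha>."""
    basis = np.asarray(basis_exponents, dtype=float)
    test = np.asarray(test_exponents, dtype=float)
    S = s_gto_overlap(basis[:, None], basis[None, :])  # basis overlap, (n, n)
    T = s_gto_overlap(test[:, None], basis[None, :])   # <alpha|mu>, (m, n)
    # Solve S X = T^T instead of forming the inverse explicitly
    X = np.linalg.solve(S, T.T)                        # (n, m)
    return np.einsum("am,ma->a", T, X)

# Hypothetical even-tempered s basis: 0.1 * 3^k for k = 0..5
basis_exponents = 0.1 * 3.0 ** np.arange(6)
test_exponents = np.logspace(-3, 4, 50)
Y = completeness_profile(basis_exponents, test_exponents)
for alpha, y in zip(test_exponents[::7], Y[::7]):
    print(f"alpha = {alpha:10.4e}   Y = {y:.6f}")
```

The profile is close to one in the exponent range spanned by the basis and decays to zero for test functions that are much more diffuse or much tighter than any basis function.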

C. Importance Profile
The completeness profile of eq. (6) can be straightforwardly applied to calculations with real-space basis sets with eqs. (5) and (6) as a way to visualize the flexibility of the real-space basis set. A flexible real-space basis set is able to represent AO basis functions for a wide range of exponents $\alpha$, which is demonstrated by $Y(\alpha) \approx 1$.
Alternative metrics can also be fashioned. A projection of the test function onto the occupied orbitals computed at the CBS limit in the real-space basis yields
$$I_0(\alpha) = \sum_i \langle\alpha|\psi_i\rangle \langle\psi_i|\alpha\rangle, \tag{7}$$
where the sum runs over the occupied orbitals $i$. As the metric in eq. (7) measures the weight of the test function $|\alpha\rangle$ in the electronic structure, we will call $I_0(\alpha)$ the free-atom importance profile. Like the completeness profile, this importance profile satisfies $0 \leq I_0(\alpha) \leq 1$.
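The free-atom importance profile $I_0(\alpha)=\sum_i|\langle\alpha|\psi_i\rangle|^2$ can be illustrated with a toy case that does not require a fully numerical calculation: the exact hydrogen 1s orbital, $R(r)=2e^{-r}$, scanned with normalized s-type GTO test functions. The sketch below is only an illustration under these assumptions; the radial grid, the trapezoidal quadrature, and the function names are hypothetical and unrelated to the finite element machinery used in this work.

```python
import math
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def free_atom_importance(alphas, r, occupied_radial):
    """I_0(alpha) = sum_i <alpha|psi_i>^2 for s-type GTO test functions.

    occupied_radial holds the normalized radial functions of the occupied
    s orbitals tabulated on the grid r; the angular integral is trivial here.
    """
    profile = []
    for alpha in alphas:
        norm = math.sqrt(2.0 * (2.0 * alpha) ** 1.5 / math.gamma(1.5))
        g = norm * np.exp(-alpha * r**2)
        profile.append(sum(trapezoid(r**2 * g * R, r) ** 2
                           for R in occupied_radial))
    return np.array(profile)

r = np.linspace(0.0, 60.0, 60001)
hydrogen_1s = 2.0 * np.exp(-r)          # exact 1s radial function, zeta = 1
alphas = np.logspace(-2, 2, 81)
I0 = free_atom_importance(alphas, r, [hydrogen_1s])
best = alphas[np.argmax(I0)]
print(f"maximum I_0 = {I0.max():.4f} at alpha = {best:.4f}")
```

In this toy case the profile has a single broad maximum below one, since no single Gaussian can reproduce the exponential 1s function exactly.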
The importance profile has an important connection to the maximal overlap method. Inserting the resolution of the identity in the AO basis, eq. (5), into the occupied-space projection we obtain the occupied-orbital AO projection
$$P = \sum_i \sum_{\mu\nu} \langle\psi_i|\mu\rangle (\mathbf{S}^{-1})_{\mu\nu} \langle\nu|\psi_i\rangle, \tag{9}$$
which is the quantity that is maximized in the maximal overlap method of Richardson et al., 24,25 also used by a variety of other authors in the literature (see Introduction for discussion), by optimizing the parameters in the AO basis set.
It is easy to see that in the case of an orthonormal AO basis set, $S_{\mu\nu} = \delta_{\mu\nu}$, the sum of the basis functions' importances $\sum_\mu I_0(\alpha_\mu)$ equals the overlap $P$ of the occupied orbitals and the basis functions of eq. (9). $I_0(\alpha)$ therefore carries information on the overlap of the basis function with parameter $\alpha$, and can be used to inspect basis function requirements in atomic systems. Basis functions parametrized by $\alpha$ that have large $I_0(\alpha)$ should likely be included in the maximal overlap AO basis due to the connection of eqs. (9) and (10).
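The sum rule for orthonormal basis sets can be checked with a small linear-algebra sketch in which "orbitals" and "basis functions" are simply vectors in a finite-dimensional space. The dimensions, the random vectors, and the function names are hypothetical; the example only illustrates the algebra and is not a model of any real system.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_occ, n_bas = 50, 3, 8

# Orthonormal "occupied orbitals" and an orthonormal "AO basis" as columns
occ, _ = np.linalg.qr(rng.standard_normal((dim, n_occ)))
bas, _ = np.linalg.qr(rng.standard_normal((dim, n_bas)))

def occupied_projection(basis, orbitals):
    """P = sum_i sum_{mu nu} <psi_i|mu> (S^-1)_{mu nu} <nu|psi_i>."""
    S = basis.T @ basis
    C = basis.T @ orbitals            # C[mu, i] = <mu|psi_i>
    return float(np.trace(C.T @ np.linalg.solve(S, C)))

def importance(vec, orbitals):
    """I_0 of a single test vector, normalized before projecting."""
    vec = vec / np.linalg.norm(vec)
    return float(np.sum((orbitals.T @ vec) ** 2))

total = sum(importance(bas[:, k], occ) for k in range(n_bas))
P = occupied_projection(bas, occ)

# A non-orthogonal basis spanning the same space: P is unchanged, but the
# sum of the individual importances is in general no longer equal to P.
mix = np.eye(n_bas) + 0.3 * np.triu(np.ones((n_bas, n_bas)), k=1)
skewed = bas @ mix
P_skewed = occupied_projection(skewed, occ)
total_skewed = sum(importance(skewed[:, k], occ) for k in range(n_bas))
print(total, P, total_skewed, P_skewed)
```

The comparison of `total_skewed` with `P_skewed` illustrates why non-orthogonality has to be kept in mind when individual importances are read as contributions to the total overlap.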
When applied to polyatomic systems, the importance profile given by $I_0$ also carries information on the non-orthogonality of atomic basis functions on different centers, which is the major headache in the design and development of atomic basis sets. A complete basis set can in principle be spanned by functions centered on a single atom, and this is the physical interpretation of $I_0(\alpha)$. Although $I_0(\alpha)$ can still be used to illustrate the non-orthonormality of polyatomic AO basis sets, it does not afford good chemical insight.
In order to isolate the effects of chemistry (the aim being to dig out the changes in the electronic structure from the free atoms), we therefore need to build in the baseline of non-interacting atoms. This is easily achieved by formulating an analogue of $I_0(\alpha)$ computed in the presence of a minimal NAO basis on each atom in the system.][75][76][77][78][79] A straightforward generalization of eq. (7) is afforded by the difference in overlap between that afforded by a minimal NAO basis padded with the function $|\alpha\rangle$ and that of the baseline of simply the minimal NAO basis on the atoms. The importance profile, which measures the importance of the test function $|\alpha\rangle$ in the presence of a minimal NAO basis, can therefore be computed as
$$I(\alpha) = P\big[\mathrm{NAO} \cup \{|\alpha\rangle\}\big] - P\big[\mathrm{NAO}\big], \tag{10}$$
where $P$ is the occupied-orbital AO projection of eq. (9), evaluated in the basis set indicated in the brackets. Note that in the absence of a minimal basis, eq. (10) reduces to eq. (7). We note that minimal NAO basis sets are commonly used with GTO or STO polarization functions, e.g. in the FHI-aims program. 57
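The difference of projections described above can be sketched with a one-center toy model. Here the "occupied orbital" is assumed to be a 1s-type function with $\zeta=1.2$ (mimicking an atom that has contracted in a molecule), the "minimal basis" is the free-atom 1s function with $\zeta=1$, and the test functions are normalized s-type GTOs. All of these choices, as well as the grid and the function names, are hypothetical and serve only to illustrate how $I(\alpha)$ is assembled from two evaluations of the occupied-orbital projection $P$.

```python
import math
import numpy as np

r = np.linspace(0.0, 60.0, 60001)

def braket(f, g):
    """Radial overlap int_0^inf r^2 f(r) g(r) dr by the trapezoidal rule."""
    y = r**2 * f * g
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(r)))

def sto_1s(zeta):
    """Normalized 1s Slater function tabulated on the grid."""
    return 2.0 * zeta**1.5 * np.exp(-zeta * r)

def gto_s(alpha):
    """Normalized s-type primitive Gaussian tabulated on the grid."""
    norm = math.sqrt(2.0 * (2.0 * alpha) ** 1.5 / math.gamma(1.5))
    return norm * np.exp(-alpha * r**2)

def projection(basis, orbitals):
    """P = sum_i sum_{mu nu} <psi_i|mu> (S^-1)_{mu nu} <nu|psi_i>."""
    S = np.array([[braket(a, b) for b in basis] for a in basis])
    C = np.array([[braket(a, psi) for psi in orbitals] for a in basis])
    return float(np.trace(C.T @ np.linalg.solve(S, C)))

def importance_profile(alphas, minimal_basis, orbitals):
    """I(alpha) = P[minimal + test function] - P[minimal]."""
    baseline = projection(minimal_basis, orbitals)
    padded = [projection(minimal_basis + [gto_s(a)], orbitals) for a in alphas]
    return np.array(padded) - baseline

occupied = [sto_1s(1.2)]   # toy "molecular" orbital (assumption)
minimal = [sto_1s(1.0)]    # toy free-atom minimal basis (assumption)
alphas = np.logspace(-2, 2, 41)
baseline = projection(minimal, occupied)
profile = importance_profile(alphas, minimal, occupied)
print(f"baseline P = {baseline:.5f}, max I = {profile.max():.5f}")
```

Because the padded basis always contains the minimal basis, the profile is non-negative, and the baseline plus the profile can never exceed the number of occupied orbitals (one in this toy model).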

III. IMPLEMENTATION
We have recently described finite element implementations for all-electron Hartree-Fock and Kohn-Sham density functional theory 40,41 for atoms 47,80-83 as well as diatomic molecules 84 in the HelFEM program. 85 The projection code supports both GTOs and STOs, and either can be used to probe the completeness of (i) the finite element basis set or (ii) the occupied orbital space.
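Computing the importance profile of eq. (7) requires the calculation of integrals by quadrature. The volume element in the prolate spheroidal coordinate system is given by 84
$$dV = R_h^3 \sinh\mu \sin\nu \left(\cosh^2\mu - \cos^2\nu\right) d\mu\, d\nu\, d\phi. \tag{12}$$
The integral over $\phi$ in the AO projection $\langle\psi_i|\alpha\rangle$ of eq. (7) can now be done analytically, as can be seen from eq. (12). For an orbital and a test function with the angular dependence $e^{im'\phi}$ and $e^{im\phi}$, respectively, this integral yields
$$\int_0^{2\pi} e^{-im'\phi} e^{im\phi}\, d\phi = 2\pi\delta_{mm'}.$$
The integrals over the $\mu$ and $\nu$ dimensions, in turn, are evaluated by quadrature with the methodology discussed in ref. 84. A similar technique was also used to implement the superposition of atomic potentials initial guess described in ref. 99 in HelFEM.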
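The quadrature over $\mu$ and $\nu$ can be illustrated by a simple sanity check of the volume element. The sketch below assumes the common prolate spheroidal convention with volume element $R_h^3\sinh\mu\sin\nu(\cosh^2\mu-\cos^2\nu)\,d\mu\,d\nu\,d\phi$ and integrates it over $0\le\mu\le\mu_{\max}$ with Gauss-Legendre quadrature; the result should equal the volume $\tfrac{4}{3}\pi a b^2$ of the spheroid with semi-axes $a=R_h\cosh\mu_{\max}$ and $b=R_h\sinh\mu_{\max}$. The function name and the parameter values are hypothetical, and the quadrature is a generic one, not the scheme of ref. 84.

```python
import numpy as np

def spheroid_volume_by_quadrature(R_h, mu_max, n_quad=20):
    """Integrate the prolate spheroidal volume element over mu in [0, mu_max]."""
    x, w = np.polynomial.legendre.leggauss(n_quad)
    # Map Gauss-Legendre nodes from [-1, 1] to [0, mu_max] and [0, pi]
    mu, w_mu = 0.5 * mu_max * (x + 1.0), 0.5 * mu_max * w
    nu, w_nu = 0.5 * np.pi * (x + 1.0), 0.5 * np.pi * w
    M, N = np.meshgrid(mu, nu, indexing="ij")
    jacobian = R_h**3 * np.sinh(M) * np.sin(N) * (np.cosh(M) ** 2 - np.cos(N) ** 2)
    # The integrand has no phi dependence, so the phi integral gives 2 pi
    return 2.0 * np.pi * float(w_mu @ jacobian @ w_nu)

R_h, mu_max = 0.7, 1.5
numerical = spheroid_volume_by_quadrature(R_h, mu_max)
a, b = R_h * np.cosh(mu_max), R_h * np.sinh(mu_max)
exact = 4.0 / 3.0 * np.pi * a * b**2
print(numerical, exact)
```

The same tensor-product structure carries over to projections: the integrand is simply multiplied by the orbital and the test function evaluated at the quadrature points.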
Importance profiles are computed for both nuclei in the system. The necessary relations between the $(\mu, \nu, \phi)$ coordinates of the fully numerical calculation and the $(r, \theta, \phi)$ coordinates needed to evaluate the AOs at the two nuclei A and B at $(0, 0, -R_h)$ and $(0, 0, R_h)$ are 100
$$r = R_h\left(\cosh\mu \pm \cos\nu\right), \qquad \cos\theta = \frac{\cosh\mu\cos\nu \pm 1}{\cosh\mu \pm \cos\nu},$$
where $R_h = R/2$ is one half of the bond length $R$, and the upper and lower signs are chosen for A and B, respectively.
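The coordinate relations can be verified numerically against a Cartesian reference. The sketch assumes the convention $x=R_h\sinh\mu\sin\nu\cos\phi$, $y=R_h\sinh\mu\sin\nu\sin\phi$, $z=R_h\cosh\mu\cos\nu$, with nucleus A at $z=-R_h$ and B at $z=+R_h$; the function names and the sample point are hypothetical.

```python
import numpy as np

def prolate_to_cartesian(mu, nu, phi, R_h):
    """Cartesian coordinates of a point given in prolate spheroidal coordinates."""
    rho = R_h * np.sinh(mu) * np.sin(nu)
    return rho * np.cos(phi), rho * np.sin(phi), R_h * np.cosh(mu) * np.cos(nu)

def spherical_about_nucleus(mu, nu, R_h, nucleus):
    """Return (r, cos_theta) relative to nucleus 'A' (z = -R_h) or 'B' (z = +R_h)."""
    if nucleus not in ("A", "B"):
        raise ValueError("nucleus must be 'A' or 'B'")
    sign = 1.0 if nucleus == "A" else -1.0
    r = R_h * (np.cosh(mu) + sign * np.cos(nu))
    cos_theta = (np.cosh(mu) * np.cos(nu) + sign) / (np.cosh(mu) + sign * np.cos(nu))
    return r, cos_theta

# Compare with the distance and polar angle computed from Cartesian coordinates
mu, nu, phi, R_h = 0.8, 1.1, 0.3, 0.7
x, y, z = prolate_to_cartesian(mu, nu, phi, R_h)
for nucleus, z0 in (("A", -R_h), ("B", R_h)):
    r, cos_theta = spherical_about_nucleus(mu, nu, R_h, nucleus)
    r_ref = np.sqrt(x**2 + y**2 + (z - z0) ** 2)
    print(nucleus, r - r_ref, cos_theta - (z - z0) / r_ref)
```

The azimuthal angle $\phi$ is shared by both coordinate systems, which is why only $r$ and $\cos\theta$ need to be converted.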
We also looked into mid-bond projections, for which
$$r = R_h\sqrt{\sinh^2\mu + \cos^2\nu}, \qquad \cos\theta = \frac{\cosh\mu\cos\nu}{\sqrt{\sinh^2\mu + \cos^2\nu}}.$$
However, as the completeness profiles for the mid-bond projections suggested that the employed numerical basis set is not sufficiently complete to afford a thorough analysis of the importance of mid-bond functions, we do not discuss mid-bond projections in this work.

IV. RESULTS
We study a set of chemically diverse diatomic molecules from the database of Weigend and Ahlrichs. 101 Weigend and Ahlrichs 101 employed their database, which also contains larger molecules, to test the TURBOMOLE default basis sets. The diatomics for the first three rows suffice for the present study; the studied systems and the computed CBS limit energies are shown in table I. To assess the role of the non-orthogonality of atomic orbitals in polyatomic molecules as well as the exactness of the NAO basis, the list of systems studied also includes the H atom at the H$_2$ geometry, the Li atom at the LiH geometry, and the F atom at the HF geometry.
The numerical basis sets were determined with the proxy method of ref. 84 with the threshold $\epsilon = 10^{-10}$, which has been shown to lead to $\mu E_h$ level precision in total Hartree-Fock energies in ref. 84. All calculations employed the Perdew-Burke-Ernzerhof (PBE) generalized-gradient approximation (GGA) functional 102,103 as implemented in the Libxc library of density functional approximations. 104 The calculations were started from a superposition of atomic potentials (SAP), 99 which we have found to offer a reasonable and easy to implement starting point for fully numerical calculations.

A. Atomic calculations
We begin with the analysis of the atomic calculations, which generate the minimal basis for the molecular calculations. The NAOs were determined with five radial elements of 15-node Lagrange interpolating polynomials for the PBE ground state configurations given in table II. The atomic calculations used a practical infinity $r_\infty = 40a_0$, which is sufficient to guarantee convergence of the total energy to sub-$\mu E_h$ precision to the free-atom limit. Note that typical applications of NAOs in the literature employ confinement potentials to enhance the locality of the atomic basis, 56,57,86-98 which has not been done in this work as we are interested in reproducing the free-atom limit. The completeness profile of this universal atomic basis is shown in fig. 1, demonstrating that the finite element basis can reliably describe also GTOs and STOs with various exponents.
The importance profiles for GTOs and STOs for the Br atom are shown in fig. 2. Analogous plots for the other atoms are included in the Supporting Information. In addition to the PBE functional used for this calculation, we also ran calculations for the atomic configurations of table II using the Perdew-Wang (PW92) local density approximation (LDA) 105-107 and the Tao-Perdew-Staroverov-Scuseria 108,109 (TPSS) meta-GGA functional. The resulting importance profiles, which are shown in the Supporting Information, were found to be similar to those obtained with the PBE functional.
The atomic importance plots could be used to determine GTO and STO basis sets that are suitable for describing the non-interacting atom. The comparison of figs. 2a and 2b reveals that individual STOs do have a higher overlap with the exact numerical wave function for the spin-restricted atom than individual GTOs do, suggesting that they indeed are a better basis for electronic structure calculations. But, as we will see in the next section, few differences can be seen in the performance of STOs vs GTOs for capturing polarization effects.
What is also noteworthy here is that the importance profiles are wider in the GTO basis than in the STO basis. This might suggest that it would be easier to span the AO basis set in GTOs than in STOs, since the STO importance profile is so much more peaked. However, a fair assessment also requires taking into account the non-orthogonality of the STO and GTO basis functions. Taking the one-center case for simplicity, the overlap of two normalized s-type (1s) STOs is
$$S^{\mathrm{STO}}(\zeta_1, \zeta_2) = \left(\frac{2\sqrt{\zeta_1\zeta_2}}{\zeta_1 + \zeta_2}\right)^{3},$$
while that of s-type GTOs is
$$S^{\mathrm{GTO}}(\alpha_1, \alpha_2) = \left(\frac{2\sqrt{\alpha_1\alpha_2}}{\alpha_1 + \alpha_2}\right)^{3/2}.$$
These functions are shown in fig. 3. The overlap of STOs with different exponents decays much more quickly than that of GTOs, which also explains why the STO importance profiles are more peaked. This also means that a one-center expansion in terms of STOs can use exponents that are more tightly spaced than a corresponding Gaussian-basis one.

Note that since GTOs and STOs are not exact for the non-interacting atom, these basis sets are more prone to basis set superposition errors than NAO basis sets: when the atomic basis is not exact, the description of the orbitals on the atom can be improved by "borrowing" the basis functions on other nuclei. 110,111 However, as we will demonstrate later in this work, typical NAO basis sets are also not exact for non-interacting atoms, because the NAO basis sets are derived from spin-restricted calculations.
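The different decay of the one-center overlaps can be made concrete with the closed-form expressions for normalized 1s STOs, $[2\sqrt{\zeta_1\zeta_2}/(\zeta_1+\zeta_2)]^{3}$, and normalized s-type GTOs, $[2\sqrt{\alpha_1\alpha_2}/(\alpha_1+\alpha_2)]^{3/2}$. The sketch below compares the two as a function of the exponent ratio; comparing at equal exponent ratio is an assumption of this illustration, and the function names are hypothetical.

```python
import numpy as np

def sto_1s_overlap(z1, z2):
    """Overlap of two normalized 1s STOs on the same center."""
    return (2.0 * np.sqrt(z1 * z2) / (z1 + z2)) ** 3

def gto_s_overlap(a1, a2):
    """Overlap of two normalized s-type primitive GTOs on the same center."""
    return (2.0 * np.sqrt(a1 * a2) / (a1 + a2)) ** 1.5

# Overlap as a function of the exponent ratio q = exponent_2 / exponent_1
ratios = np.array([1.0, 2.0, 4.0, 10.0, 100.0])
sto = sto_1s_overlap(1.0, ratios)
gto = gto_s_overlap(1.0, ratios)
for q, s, g in zip(ratios, sto, gto):
    print(f"ratio {q:6.1f}: STO overlap {s:.4f}, GTO overlap {g:.4f}")
```

At a given exponent ratio the STO overlap is the square of the GTO overlap, so it falls off faster as the exponents move apart.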

B. Molecular calculations
Having discussed the results of the free-atom importance profiles, we can move on to discussing the importance profiles of the polyatomic calculations where the minimal NAO basis is explicitly included in the calculation of the importance profile.
No atomic basis is exact in polyatomic calculations, because atomic symmetry is lifted in a polyatomic environment. Depending on the system, an arbitrarily large number of higher and higher polarization shells may be necessary. For example, reproducing the PBE atomization energy of SF$_6$ to 0.1 kcal/mol precision requires three polarization shells, that is, up to g functions. 5 State-of-the-art GTO basis sets are optimized such that the error made for free atoms is similar to the error made in neglecting higher polarization shells in polyatomic calculations. 14,112 The purpose of this section is to analyze polarization effects.
Due to the large number of computed importance profiles (618), we will only discuss the results in detail for the hydrogen atoms in CuH, H$_2$, HBr, HCl, HF, KH, LiH, and NaH. The full set of molecular importance profiles can be found in the Supporting Information. We begin the analysis with GTO projections, as this type of basis set is most commonly used in quantum chemistry; these importance profiles are shown in fig. 4.
Starting from the top row of fig. 4, it is clear that there is a great deal of similarity between the chemistry of H in HF, HCl, and HBr.We observe that the σ orbitals have a large projection onto s functions on the hydrogen atom with a wide range of exponents, indicating that the hydrogen atom requires breathing functions due to losing electrons to the halogen atom.The importance profile for s functions experiences a sharp dip around α ≈ 0.28, possibly since the scanning function with such an exponent has a large overlap with the NAO minimal basis.
The importance profiles for higher-angular-momentum functions for H in HCl and for H in HBr are especially similar, and show that the importance of the polarization shells decreases with increasing angular momentum, as expected. H in HF, in contrast, shows somewhat unexpected behavior, the shells arising in decreasing importance as s, f, d, g, p, h, and so on.
H in H$_2$ shows a similar spectrum for breathing functions as observed for the hydrogen halides above. For the case of H$_2$, both the breathing and polarization curves show a double-peaked structure, as the importance curves for all angular momenta again exhibit a clear drop at a critical value of the exponent $\alpha$, which can again be tentatively understood by a maximal overlap with the NAO basis: when the test function can be accurately expanded in the minimal NAO basis, including it yields little additional flexibility to the wave function. For H$_2$, the importance of the polarization shells decreases with increasing angular momentum, as expected.
The hydrogen atoms in LiH, in NaH, and in KH likewise show great similarities in the form of the s and p function importances.For the first two molecules, the s breathing function on hydrogen has smaller importance than including a p function on hydrogen, while in the lattermost the order of the s and p functions is switched.There appears to be no systematic information for the higher polarization shells in the plots, but this can tentatively be explained by the non-orthogonality of the atomic orbital basis functions and/or the small minimal basis used in this work for the alkali atoms: while the minimal basis used in this work is 2s for Li, 3s1p for Na, and 4s2p for K, most minimal basis sets in quantum chemistry include a further p function on alkali atoms to account for the low-lying np excited state which is accessible to chemical bonding.The lack of these functions in the minimal basis of the alkali atom can show up as larger importance for higher polarization functions on hydrogen in these molecules, and could explain the apparent lack of systematic behavior in the series LiH→NaH→KH.

C. Spin-polarized atomic calculations in the diatomic symmetry
To assess the importance of the role of the minimal basis and the non-orthogonality of the atomic orbital basis functions, we performed calculations on the hydrogen atom with the diatomic numerical basis set, in analogy to the counterpoise correction of Jansen and Ros 110 and Boys and Bernardi. 111 This analysis leads to the results shown in fig. 5.
We begin with the calculation of the hydrogen atom in the H$_2$ geometry. This gives us the importance profiles shown in figs. 5a and 5b for the atom itself and for the ghost atom in the molecule. Since the exact ground state of H is well known to be a 1s function, fig. 5a shows that the NAO does not give the exact solution: after all, this calculation is spin-polarized, while the NAO was determined for a spin-restricted calculation. Comparing figs. 5a and 5b shows that the importances of the diffuse functions with small $\alpha$ are similar, regardless of the atom on which the functions are placed, while major differences between the two centers are observed for tight functions with large $\alpha$. Diffuse functions are indeed infamous for leading to large interatomic overlaps and causing ill-conditioning in AO basis sets. However, such ill-conditioning can nowadays be handled with the help of pivoted Cholesky decompositions to ensure stable and accurate electronic structure calculations even in the presence of pathological linear dependencies. 113,114

The polarization functions appear in fig. 5b as a consequence of the inexactness of the NAO basis on hydrogen discussed above, and the ghost atom being off-center from the system. Because the minimal NAO basis was determined with spin-restricted orbitals, it is unable to describe the spin-polarized Li atom. A similar story also applies to fig. 5d; however, the case of fluorine is more complicated: the hole on the 2p shell leads to symmetry breaking, and one actually needs to include many polarization shells to reproduce the energy of the fully numerical calculation in an atomic basis set calculation.

D. Comparison of STO and GTO projections
We now move on to the analysis of STO projections. Plots analogous to fig. 4  However, the logarithmic y axis scale can be misleading, and we continued with an in-depth numerical analysis of the full database of results, which contains 2432 GTO and STO importance profiles: profiles for $l \in [0, \dots, 7]$ for the occupied σ, π, and δ orbitals of the two centers in the 47 systems in table I.
The largest difference in the database was found to arise for the d orbitals on Sc in ScO.This orbital has clear atomic character, as indicated by the GTO and STO plots in fig. 7. A d orbital on Sc was not included in the minimal basis listed in table II, because occupying the d orbital raises the spin-restricted PBE total energy by an s → d excitation energy of 1.54 eV; however, the orbital clearly takes part in chemical bonding.
Analyzing the maximal value of the importance profile for each system, center, and angular momentum, the remaining differences between GTOs and STOs were found to be small, ranging between $5.4 \times 10^{-3}$ in favor of GTOs for a p polarization function on Ni for describing the σ orbitals in NiS, to $6.67 \times 10^{-3}$ in favor of STOs for a p function on Be to describe the π orbital polarization in BeS. Overall, GTOs appear to win in this comparison, because the sum of the maximal values of the GTO profiles is greater than that of the STO profiles. However, these results do not constitute definitive proof, for which variational optimization of the maximal overlap wave functions (which is left to future work) is required.

E. Minimal basis analysis
To continue the discussion on ScO above, we now ask if there are other cases where the minimal NAO basis fails badly, and a more educated choice should be made. The number of electrons that cannot be described in the minimal NAO basis was given in table I. The analysis of these data shows that for most systems, the NAO minimal basis is quite accurate: in all but 8 systems, the alpha and beta electron manifolds can be described by the minimal basis to less than 0.1 electrons. The outliers are MgF (ΔN_α = 0.171), BeS (ΔN_α = 0.233), ScO (ΔN_α = 0.526), MnO (ΔN_α = 0.878), FeO (ΔN_α = 0.675), MnS (ΔN_α = 0.632), NiO (ΔN_α = 0.347), and NiS (ΔN_α = 0.256). Examining the importance profiles for these systems leads to the following conclusions. MgF requires an additional p function on Mg and an s function on F to describe the polarization of the σ orbitals and the strong movement of charge from Mg to F. BeS requires a p function on Be to describe the polarization of the π orbitals, and a d function on S. A d function with atomic character is missing on Sc, as was already discussed in section IV D.
The discussions for FeO, MnS, NiO, and NiS are similar to the case of MnO. MnO badly needs further s functions on Mn to describe the change in its atomic size due to its losing electrons to oxygen. Adding an s and a p function on Sc appears to have around the same importance as further s and p functions on oxygen. Next, one should add a d function on oxygen, followed by another d function on Sc and an f function on O.
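The electron counts discussed above can be illustrated with a small sketch. It assumes that ΔN is the part of the occupied-orbital norm lying outside the span of the (generally non-orthogonal) minimal basis, evaluated as N − Σ_i c_iᵀ S⁻¹ c_i with c_i the overlaps of orbital i with the basis functions and S the basis overlap matrix. Functions are represented as coefficient vectors in an orthonormal auxiliary space, which stands in for the numerical grid; all names and the toy vectors are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - tail) / M[k][k]
    return x

def undescribed_electrons(basis, orbitals):
    """Norm of singly occupied orbitals outside the span of the basis."""
    S = [[dot(u, v) for v in basis] for u in basis]  # basis overlap matrix
    missing = 0.0
    for psi in orbitals:
        c = [dot(chi, psi) for chi in basis]         # orbital-basis overlaps
        missing += dot(psi, psi) - dot(c, solve(S, c))
    return missing

# Toy example: two non-orthogonal basis functions in a 3D auxiliary space.
r2, r3 = math.sqrt(2.0), math.sqrt(3.0)
basis = [[1.0, 0.0, 0.0], [1.0 / r2, 1.0 / r2, 0.0]]
psi = [1.0 / r3, 1.0 / r3, 1.0 / r3]
delta_n = undescribed_electrons(basis, [psi])
print(delta_n)
```

In the toy example, one third of the orbital lies along the direction that the two basis functions cannot reach, so one third of an electron is left undescribed.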

V. SUMMARY AND DISCUSSION
We have suggested the importance profile as a way to extract information on atomic basis set requirements from fully numerical calculations. First, we discussed the free-atom importance profile, which is obtained as a projection of the wave function at the complete basis set (CBS) limit onto individual atomic orbital (AO) basis functions. Next, as calculations on polyatomic systems typically include at least a minimal basis set on each atom, we generalized the importance profile to this case as the difference between the overlap onto the CBS limit wave function obtained with the minimal basis padded by the studied test function and the overlap obtained with the minimal basis alone, with the aim of isolating the effects of breathing and polarization functions in polyatomic systems.
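For the hydrogen atom the CBS-limit orbital is known analytically, so both kinds of profile can be sketched in a few lines. The following Python example is a minimal illustration and not the implementation used in this work: it assumes a single occupied 1s orbital, a one-function minimal basis, a simple trapezoid quadrature on a logarithmic radial grid, and hypothetical names such as free_atom_importance and importance. The ζ = 1.2 minimal function is deliberately inexact and was chosen only for illustration.

```python
import math

def radial_overlap(f, g, x_min=-12.0, x_max=math.log(60.0), h=0.02):
    """Return 4*pi * int f(r) g(r) r^2 dr via the trapezoid rule in x = ln(r)."""
    n = int(round((x_max - x_min) / h))
    total = 0.0
    for k in range(n + 1):
        r = math.exp(x_min + k * h)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(r) * g(r) * r ** 3  # r^2 dr = r^3 dx
    return 4.0 * math.pi * h * total

def s_gto(alpha):
    """Normalized s-type Gaussian with exponent alpha."""
    norm = (2.0 * alpha / math.pi) ** 0.75
    return lambda r: norm * math.exp(-alpha * r * r)

def s_sto(zeta):
    """Normalized 1s-type Slater function with exponent zeta."""
    norm = math.sqrt(zeta ** 3 / math.pi)
    return lambda r: norm * math.exp(-zeta * r)

reference = s_sto(1.0)  # exact hydrogen 1s orbital in atomic units

def free_atom_importance(probe):
    """I_0: squared overlap of the probe with the single occupied orbital."""
    return radial_overlap(probe, reference) ** 2

def importance(probe, minimal):
    """I: gain in projection when the probe is added to a one-function basis.

    Assumes the probe and the minimal function are not linearly dependent.
    """
    c_m = radial_overlap(minimal, reference)
    c_t = radial_overlap(probe, reference)
    s = radial_overlap(probe, minimal)
    # Projection onto two normalized non-orthogonal functions: c^T S^-1 c
    both = (c_m ** 2 + c_t ** 2 - 2.0 * s * c_m * c_t) / (1.0 - s ** 2)
    return both - c_m ** 2

exponents = [10.0 ** (-2.0 + 0.05 * k) for k in range(81)]  # 0.01 to 100
gto_I0 = [free_atom_importance(s_gto(a)) for a in exponents]
sto_I0 = [free_atom_importance(s_sto(z)) for z in exponents]
inexact = s_sto(1.2)  # hypothetical, deliberately inexact minimal function
gto_I = [importance(s_gto(a), inexact) for a in exponents]
print(max(gto_I0), max(sto_I0), max(gto_I))
```

Under these assumptions, the STO probe reaches I_0 = 1 at ζ = 1, a single GTO probe peaks at roughly 0.96 near α ≈ 0.3, and importance(...) vanishes for every probe when the minimal function is itself the exact orbital.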
Employing a minimal numerical atomic orbital (NAO) basis set on each atom, we computed importance profiles for a variety of atoms in a large database of chemically diverse diatomic molecules. Due to the significant amount of data, importance profiles were only discussed for hydrogen, while the full set of data is available in the supporting information. Although only isotropic Gaussian-type orbitals (GTOs) and Slater-type orbitals (STOs) were considered in this work, the approach is independent of the form of the AO basis, and could also be applied to other types of AO basis sets.
The above definition of the importance profile inherently depends on the chosen minimal basis. Following established practice in the NAO literature, 56,57,86-98 the minimal NAO basis was derived from spin-restricted atomic calculations, and the configuration leading to the lowest total energy was used for each atom. However, the definition of the minimal basis is not always obvious.
For instance, as discussed in section IV B, many GTO basis sets include a p function on alkali and alkaline earth metals, as it is needed to describe the low-lying np atomic excited state. Similarly, in this study, we noticed that in the case of ScO, a d function should clearly be included in the minimal basis of Sc, although it corresponds to an excited state according to the spin-restricted atomic calculations. Further cases were analyzed in section IV E, where it became clear that many transition metals need breathing functions to describe the strong changes in their electronic configuration in chemical bonding.
The spin-restricted NAO basis is not exact for noninteracting atoms, as was discussed for the cases of H, Li, and F in section IV C. The danger of such inexactness is that it invites basis set superposition errors: if the minimal NAO basis is not exact, the atom's description can be improved by borrowing basis functions from other nuclei, leading to exaggerated binding energies. 110,111 Importance profiles can also be computed with respect to a larger "minimal" atomic basis: employing a NAO basis that is sufficiently flexible to also describe the spin-polarized atom and its low-lying excited states would lower the estimated importances of further functions on the atoms. Because breathing functions are essential for describing chemistry, a double numerical basis which is also able to describe atomic cations exactly 115 could also be employed. We especially believe that state-averaging 116-118 is an underexplored avenue in the development of NAO basis sets, 1 and hope to follow up with such work in the future.
The main motivation of this work was the development of new AO basis sets starting from fully numerical methods. We believe that systematic databases [...] representing different chemistries are entered in the database. An apples-to-apples comparison of GTO, STO, and NAO basis sets would be especially interesting: how do these three families compare in the maximal projection for a fixed database of fully numerical reference wave functions? We hope to follow up this study in that direction in the future.

Figure 1: Completeness profile for the atomic radial finite element basis, probed by a GTO or an STO.

Figure 2: Importance profiles for the atomic minimal basis radial functions of the Br atom, probed by a GTO or an STO.

Figure 3: One-center overlap of s-type GTO and STO functions.

Figure 4: Various molecular σ orbitals' GTO projections on the hydrogen atom. Note the logarithmic scale on both axes.

Figure 5: Effects of the non-orthogonality of atomic orbitals on various centers on the importance profile. Note the logarithmic scale on both axes.

Figure 5 also shows two more calculations with ghost hydrogen atoms: for Li computed with the numerical basis for LiH, and for F computed with the numerical basis for HF, which are shown in figs. 5c and 5d, respectively. Like fig. 5b, fig. 5c is a measure of the goodness of the minimal NAO basis for Li. Because the minimal NAO basis was determined with spin-restricted orbitals, it is unable to describe the spin-polarized Li atom. A similar story also applies to fig. 5d; however, the case of fluorine is more complicated: the hole on the 2p shell leads to symmetry breaking, and one actually needs to include many polarization shells to reproduce the energy of the fully numerical calculation in an atomic basis set calculation.
One-by-one visual comparisons of figs. 4a-4h with their STO counterparts in figs. 6a-6h reveal remarkably few differences, other than the smaller scale of the x axis in the STO plots, which was discussed in section IV A in the context of atomic calculations.

Figure 7: GTO and STO projections for the σ and π orbitals onto the d orbital of the Sc atom in ScO.

Table I: Systems included in the present study, including the employed spin multiplicity M, bond length R, as well as the PBE total energy E for the wave functions determined in C∞v (heteroatomics) or D∞h (homoatomics) symmetry that were used in the analysis. The last two columns show the numbers of alpha and beta electrons, ΔN_α and ΔN_β, which are not described by the minimal NAO basis.