Transforming characterization data into information in the case of perovskite solar cells

In many emerging solar cell technologies, it is a significant challenge to extract the electronic properties of materials and interfaces inside a working device from experimental data. Approaches frequently used in mature technologies such as crystalline silicon are often inapplicable, as they require many material parameters to be known a priori, which is rarely the case for novel materials. Starting from this challenge for material and device characterization, this perspective discusses the different strategies for data interpretation that have been developed or are being developed for the specific case of halide perovskite solar cells. The focus of this work is to distinguish between experimental data and the strategies used to extract useful information from these data. This information can then be used to make informed decisions about process and material innovations.


Introduction
Halide perovskites are a material class that shows promising performance [1][2][3][4][5][6] when employed in photovoltaic or optoelectronic devices. Peculiar features of the material class are the long charge carrier lifetimes [7][8][9] and consequently high photoluminescence yields [10,11] that can be measured in polycrystalline thin films processed from solution using relatively cheap and simple techniques. The material class also has substantial downsides, such as poor stability, especially under exposure to heat, oxygen and water [12]. Furthermore, the material performs best if the toxic element lead is used as the B-site cation in the ABX3-type perovskite lattice [3]. Thus, while the material class is promising for various applications, research and development efforts are needed to further improve performance at higher stability and/or lower toxicity. In addition, for applications where different band gaps are needed, such as multijunction solar cells [13,14] or light emitting diodes with different wavelengths, a wide variety of stoichiometries within the material class have to be developed to some technological maturity. This task of transforming laboratory results into a viable technology requires a highly multidisciplinary approach that combines physics, chemistry, material science and various engineering disciplines. The community also needs to acquire a thorough understanding of how the properties of materials and interfaces affect the functionality of the finished device. Thus, characterization of materials and devices must be performed and analyzed in a way that supports decision making within an optimization process. In halide perovskite solar cells, the key efficiency-limiting processes take place at interfaces [15][16][17] between the perovskite absorber and adjacent layers that are often called electron and hole transport layers.
These interfaces improve the selectivity of the contacts by allowing only carriers with a certain polarity to pass on towards the contacts while keeping recombination of the other polarity of carriers as low as possible. Furthermore, the charge transport layers have to be sufficiently conductive [18] to not cause a (typically non-linear) series resistance that often reduces both the fill factor and the short-circuit current of the devices.
Thus, the most important discipline of sample characterization in perovskite photovoltaics is the characterization of layer stacks or devices that include interfaces identical to the ones used in the final solar cell [8,16,19]. However, this requirement causes an unwanted complication, namely that every conceivable experiment done on devices or layer stacks will depend on a large number of parameters of the different layers and their interfaces [20]. In nearly every situation, the data obtained from a given experiment will not by itself uniquely identify any individual parameter unless additional experiments are done or assumptions are made about the values of certain parameters based on previous knowledge. This situation is very different from that of e.g. crystalline silicon solar cells. Let us take, for instance, the problem of determining a charge carrier lifetime. In silicon photovoltaics this is often done using a method called QSSPC [21,22], which gives lifetimes as a function of carrier density from a single experiment of tens of milliseconds duration. In order to obtain lifetimes and analyze the data, many parameters have to be known, such as electron or hole mobilities, doping densities and consequently the resistances of the wafer, as well as Auger coefficients to disentangle unavoidable Auger recombination from avoidable surface and bulk recombination via defects. All these parameters are, however, known [22,23] and tabulated for monocrystalline silicon, which substantially simplifies the analysis. Even then it can be difficult to disentangle bulk from surface recombination [24]. Nevertheless, the characterization of e.g. passivation layers on silicon wafers benefits greatly from the fact that most of the relevant material parameters of silicon are well known, so that a single measurement can provide fairly direct information, e.g. a relevant figure of merit for the quality of the surface passivation.
If we attempt to determine the lifetime of a perovskite-ETL bilayer system, the situation would be substantially more complicated [25,26]. Lead-halide perovskites have extremely low doping densities [27,28], which implies that both electrons and holes matter for recombination which is different to the situation in doped silicon wafers. Furthermore, the electrostatic potential is not fixed by the doping but changes during the transient due to the movement of electrons, holes and often also ions [29][30][31][32]. Thus, the question arises how to analyze data originating from one or several experiments done on layers, layer stacks and devices with the aim of determining key electronic parameters such as mobilities, lifetimes, interface recombination velocities and band alignments. In the following, we will discuss the different options that are currently available and discuss possible opportunities for future development.

Introducing the typical experiments
In the field of halide perovskites, a large number of experimental techniques are applied to the fabricated samples. Typically, the workflow on new materials starts with structural characterization of the materials, which may involve methods such as x-ray diffraction, x-ray photoelectron spectroscopy, scanning electron microscopy and others that look into grain sizes, strain, chemical composition, and crystal structure. This is then often followed by optical techniques that study the optical properties, i.e. essentially the complex refractive index of the material. Such techniques are the classical transmission-reflection method, ellipsometry and photothermal deflection spectroscopy [33]. Finally, there are techniques that measure electronic properties such as mobilities, recombination coefficients and lifetimes, properties of the interfaces, doping and defect densities and eventually the device performance. This perspective deals primarily with this latter type of techniques that aim to determine those parameters that directly affect the performance of the device. The experimental techniques used for this purpose are based on either electrical and/or optical excitation combined with electrical and/or optical detection of signals. This can then be done in steady state, as a function of time after e.g. an electrical or optical pulse and finally as a function of frequency of a periodic optical or electrical excitation. Figure 1 gives an overview over the different techniques that exist. The methods summarized in figure 1 do not have a well-defined and well-established name within the scientific community dealing with perovskite solar cells. For the purpose of this perspective, we will refer to this set of techniques as optoelectronic characterization techniques noting that the set also encompasses purely electrical techniques (e.g. impedance) and purely optical ones (e.g. photoluminescence). 
The common denominator is the aim to determine electronic parameters relevant for the optoelectronic or more specifically photovoltaic functionality of the final device.

Mathematical description of the physics of optoelectronic characterization
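The mathematical description of optoelectronic characterization methods is based on the solution of three coupled differential equations [34], namely the Poisson equation
$$\frac{\partial^2 \varphi}{\partial x^2} = -\frac{\rho}{\varepsilon}, \qquad (1)$$
the continuity equation for electrons
$$\frac{\mathrm{d}n}{\mathrm{d}t} = G_\mathrm{ext} + G_\mathrm{int} - R + D_n \frac{\partial^2 n}{\partial x^2} + \mu_n \frac{\partial (nF)}{\partial x}, \qquad (2)$$
and the continuity equation for holes
$$\frac{\mathrm{d}p}{\mathrm{d}t} = G_\mathrm{ext} + G_\mathrm{int} - R + D_p \frac{\partial^2 p}{\partial x^2} - \mu_p \frac{\partial (pF)}{\partial x}. \qquad (3)$$
Here, x is the spatial coordinate, t the time, φ is the electrostatic potential, ρ is the space charge, ε the permittivity, n and p are the electron and hole concentrations, R is the recombination rate, G ext is the generation rate of electron-hole pairs due to external illumination, while G int is the internal generation rate due to the absorption of photons generated by radiative recombination within the device itself (i.e. due to photon recycling). The two last terms in equations (2) and (3) originate from the diffusion and drift currents for electrons and holes, whereby D n,p = µ n,p kT/q is the diffusion coefficient, µ n,p is the mobility, kT/q is the thermal voltage and F = −∂φ/∂x is the electric field.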
In addition, depending on the time constants and frequencies of the measurement, the influence of moving ionic species may have to be considered [32] which implies that one or two additional transport equations are needed. Equations (1)-(3) can be solved numerically and a variety of software tools are available to the research community that vary in their capabilities and cost. Frequently used free software includes for instance SCAPS [35], driftfusion [36], SIMsalabim [37], IonMonger [38] or OghmaNano [39,40], whereas commercial software includes tools such as Setfos [41], ASA [42] or Sentaurus TCAD.
In order to characterize perovskite solar cells, key parameters that are contained in equations (1)-(3) have to be measured. These parameters include the recombination rate R and its dependence on the carrier concentration, the mobility µ as it controls transport and all terms that contribute to the space charge such as the density of dopants, charged defects, ions, free carriers and the built-in voltage [18] which controls the difference of the electrostatic potential over the absorber and transport layers.
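As a small worked illustration of the relation D = µ kT/q that links the mobility to the diffusion coefficient in equations (2) and (3), the following sketch evaluates the thermal voltage and the diffusion coefficient. The mobility value is an arbitrary illustrative assumption and is not taken from this work.

```python
K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def thermal_voltage(temperature_k=300.0):
    """Thermal voltage kT/q in volts."""
    return K_B * temperature_k / Q_E


def diffusion_coefficient(mobility_cm2_per_vs, temperature_k=300.0):
    """Einstein relation D = mu * kT/q, returned in cm^2/s."""
    return mobility_cm2_per_vs * thermal_voltage(temperature_k)


# Assumed, illustrative mobility of 10 cm^2/(V s); not a value from the text.
mobility = 10.0
print(f"kT/q = {thermal_voltage():.4f} V")
print(f"D    = {diffusion_coefficient(mobility):.4f} cm^2/s")
```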
The challenge for every approach to data analysis is that the solution of three coupled partial differential equations in time and space requires numerical simulation methods. The forward problem, i.e. going from the material and interface parameters to the outcome of an experiment (e.g. a current-voltage curve), is typically straightforward and can be solved by a variety of simulation tools. The inverse problem of going from the experimental data to the material parameters is challenging, as it involves running many forward simulations until a parameter set is found that represents the data.
Figure 2. The forward problem is to compute the outcome of an experiment from the material and interface parameters by solving equations (1)-(3) with appropriate boundary conditions. The inverse problem is to determine the material and interface parameters consistent with one or several experiments. As the equations cannot be easily inverted, a highly multidimensional parameter space has to be sampled to arrive at a result, e.g. by fitting or Bayesian inference. The inverse problem is often time consuming and difficult to solve due to the multitude of unknown parameters.

Analytical approximations
One strategy to overcome the difficulties involved in the inverse problem shown in figure 2 is to simplify the problem. In the field of doped semiconductors, the typical simplification is that for most measurement techniques only minority carriers matter. This means that only the continuity equation of one carrier type has to be solved [43][44][45][46]. The second simplification is that the electrostatic potential depends mostly on the fixed charge of the dopants, and hence the Poisson equation can often be neglected or approximated. Once only one differential equation is left, analytical solutions are often available, even in time- and frequency-domain situations, which allows data to be analyzed using analytical equations.
In halide perovskites, the doping density is typically too low for this approach to be applicable. Nevertheless, the existence of analytical equations from the field of doped semiconductors might tempt researchers to apply these equations to halide perovskites as well. This regularly leads to a substantial amount of misinterpreted data in the literature. A typical example is the application of Mott-Schottky analysis, which was developed for doped semiconductors, to halide perovskites or other relatively intrinsic semiconductors [28,47].
Even if analytical equations are designed for intrinsic semiconductors or even insulators, their applicability to halide perovskites can be limited and has to be critically examined. An important example of this class of approaches is the field of single carrier devices. Here, the determination of trap densities using the so-called trap-filled limit [48,49] in single carrier devices is extremely popular in the community, although it can rarely if ever be used to correctly determine defect densities in thin films [50]. This is due to the competition of electrode charge with the volume charge of defects in thin films of thickness d, which leads to a detection threshold for volume charge that depends on the ratio ε/d² [9,47,51]. As shown in figure 3, this threshold is hardly ever overcome in halide perovskite thin film devices, which implies that the measurement as it is currently done in the community mostly measures the thickness and permittivity of the samples but not the defect density [9]. Thus, a new approach to analytical equations is required that considers the specific properties of halide perovskites rather than attempting to apply equations derived for e.g. p-n junction solar cells. There are various concepts from the world of doped semiconductors that cannot easily be applied to perovskite solar cells. These include, but are not limited to, (i) the depletion approximation required for many capacitance-based analysis techniques [28,47,52], (ii) the superposition principle for the current-voltage curve [53][54][55], (iii) the theory of charge collection in the neutral zone of a p-n junction based on the diffusion length [43,56], and (iv) the concept of a constant charge carrier lifetime [8,57].
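To illustrate the ε/d² scaling mentioned above, the following sketch inverts the textbook trap-filled-limit relation V_TFL = q N_t d²/(2ε) for the trap density. This is only a hedged illustration: the relation is the standard expression associated with the trap-filled limit, not the detection-threshold formula of [9,47,51], and the relative permittivity, the thicknesses and the voltage are assumed values chosen for illustration.

```python
EPS_0 = 8.8541878128e-12   # vacuum permittivity in F/m
Q_E = 1.602176634e-19      # elementary charge in C


def trap_density_from_vtfl(v_tfl, thickness_m, eps_r):
    """Invert V_TFL = q*N_t*d^2 / (2*eps_r*eps_0); returns N_t in cm^-3."""
    n_t_per_m3 = 2.0 * eps_r * EPS_0 * v_tfl / (Q_E * thickness_m ** 2)
    return n_t_per_m3 * 1e-6  # convert m^-3 to cm^-3


# Assumed, illustrative inputs (not taken from the text)
eps_r = 30.0          # relative permittivity
v_tfl = 0.1           # apparent trap-filled-limit voltage in V
d_film = 500e-9       # thin film thickness in m
d_crystal = 1e-3      # single crystal thickness in m

n_film = trap_density_from_vtfl(v_tfl, d_film, eps_r)
n_crystal = trap_density_from_vtfl(v_tfl, d_crystal, eps_r)
print(f"thin film:      N_t = {n_film:.2e} cm^-3")
print(f"single crystal: N_t = {n_crystal:.2e} cm^-3")
```

For the same apparent voltage, the inferred density scales with ε/d², so the value obtained for a thin film is larger than that for a thick crystal by the square of the thickness ratio. This is consistent with the statement that such an analysis mostly reflects the thickness and permittivity of thin samples.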
However, halide perovskites also possess several quite specific properties that are helpful for developing new analytical equations. The first property is the ionic screening of the electric field in the absorber. As shown in many experiments [30,32,53,58,59], the electric field in the perovskite absorber layer is typically fairly low or non-existent, despite the fact that a built-in voltage has to exist to allow device functionality at significant forward bias [18]. This situation is similar to that in a doped semiconductor, except that the perovskite layer is in high level injection, i.e. the doping density is very low and the free carrier density of both electrons and holes is much higher than the doping density [27]. The second property is that the mobility of halide perovskites can be considered reasonably high relative to the low film thicknesses and relative to many of the contact layers used [60]. This is true despite the fact that the mobility cannot be considered high compared to other inorganic solar cell materials [61,62]. Thus, the main sources of resistive effects are most likely the electron and hole transport layers as well as the lateral transport of electrons through the transparent conductive oxides. The final property is that the permittivity of the perovskite is high compared to many, but not all, of the charge transport layers used. In particular, the organic charge transport layers used in pin-type device stacks are undoped and have a permittivity that is a factor of 10 lower than that of the perovskite, which implies that the drop of the built-in voltage most likely happens across the charge transport layers [18,53,63]. This amplifies the effect of charge screening by mobile ions.
A promising approach for analytical approximations is therefore to neglect the differences in carrier concentrations within the perovskite but to include gradients in electric field and Fermi level in the contact layers [18]. This approach can be used, for instance, to derive an expression, equation (4), for the loss in short-circuit current as a function of the transport through the contact layers and the charge carrier lifetime $\tau_R$ in the perovskite layer [64,65]. In equation (4), q is the elementary charge, d the thickness, G the average generation rate, $n_0$ the equilibrium carrier concentration, and $V_\mathrm{ext}$ the external voltage. A peculiar aspect of equation (4) is that it implies that the recombination loss at short circuit does not depend on the diffusion length but on the length given by the product of the charge carrier lifetime $\tau_R$ and the extraction velocity $S_\mathrm{exc}$. The key reason why this result differs from that of e.g. a crystalline silicon solar cell is that the boundary condition at the edge of the absorber layer is very different. In crystalline silicon solar cells, as well as in most textbooks on p-n junction photovoltaics, the boundary condition is often assumed to be $\Delta n = 0$ [43, 66-69], i.e. infinitely fast extraction of minority carriers.
If one further assumes that the perovskite cell is in high level injection, i.e. n = p at all relevant times and frequencies, the situation can be described using relatively simple differential equations. One example is the simulation of transient photoluminescence experiments. If we assume a deep trap and no significant detrapping, we can use equation (5) in a film and equation (6) in a finished solar cell [70]. Here, $\tau^\mathrm{eff}_\mathrm{SRH}$ is an effective Shockley-Read-Hall (SRH) lifetime of the film (including its surfaces), $k_\mathrm{rad}$ is the radiative recombination coefficient, $n_Q = 2 C_\mathrm{area} kT/(q^2 d)$, and $C_\mathrm{area}$ is the electrode capacitance per area of the solar cell.
Figure 4(a) illustrates the relation between the decay time $\tau_\mathrm{film}$ and the Fermi level splitting $\Delta E_\mathrm{F} = kT \ln(np/n_\mathrm{i}^2)$, whereby $n_\mathrm{i}$ is the intrinsic carrier concentration while n and p are the electron and hole concentrations. At low Fermi level splittings (i.e. low carrier densities), the decay time saturates at $\tau_\mathrm{film} = \tau^\mathrm{eff}_\mathrm{SRH}$, while at high Fermi level splittings the decay time becomes a function of the Fermi level splitting, $\tau_\mathrm{film} = 1/(k_\mathrm{rad} n) = \exp(-\Delta E_\mathrm{F}/2kT)/(k_\mathrm{rad} n_\mathrm{i})$. Figure 4(b) shows the situation in a complete solar cell as described by equation (6). Here, the decay time does not saturate at low values of n. Instead, the term $n_Q/n$ in the numerator of equation (6) predicts an increasing decay time for low values of n. This can lead to very long decay times in cells that, unlike the decay times in films, are unrelated to recombination lifetimes. Instead, these decay times are due to a slow discharge of the electrode charge by reinjection into the perovskite and subsequent recombination [8,71]. Therefore, the decay times in cells normally approach $\tau_\mathrm{cell} = \tau^\mathrm{eff}_\mathrm{SRH} n_Q/n$ and hence increase exponentially towards lower values of n or $\Delta E_\mathrm{F}$. Figure 4(c) shows experimental data on a perovskite film (grey) and a solar cell (red). The film behaves approximately as expected for radiative recombination, $\tau_\mathrm{film} = 1/(k_\mathrm{rad} n) = \exp(-\Delta E_\mathrm{F}/2kT)/(k_\mathrm{rad} n_\mathrm{i})$, while the cell has an intermediate plateau at around 2 µs, followed at low $\Delta E_\mathrm{F}$ by a decay controlled by capacitive discharge ($\tau_\mathrm{cell} = \tau^\mathrm{eff}_\mathrm{SRH} n_Q/n$). As can be seen in figure 4(c), the experimental data are well approximated by the analytical equations (5) and (6), which allows a relatively facile interpretation of the data within the framework of an analytical model. The alternative of using numerical simulations to interpret transient data on complete solar cells would involve a huge number of variables and would be computationally expensive.
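The limiting behaviours described in this paragraph can be checked numerically with a short sketch. The two decay-time expressions below are an assumption: they are one simple form that reproduces the limits quoted in the text (saturation at the effective SRH lifetime, 1/(k_rad n) at high densities, and τ n_Q/n for the cell at low densities) and are not claimed to be identical to equations (5) and (6). All numerical inputs are illustrative.

```python
K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def n_q_from_capacitance(c_area_f_per_cm2, thickness_cm, temperature_k=300.0):
    """n_Q = 2*C_area*kT / (q^2 * d), in cm^-3 for C_area in F/cm^2 and d in cm."""
    return 2.0 * c_area_f_per_cm2 * K_B * temperature_k / (Q_E ** 2 * thickness_cm)


def tau_film(n, tau_srh, k_rad):
    """Assumed film decay time: SRH and radiative channels in parallel."""
    return 1.0 / (k_rad * n + 1.0 / tau_srh)


def tau_cell(n, tau_srh, k_rad, n_q):
    """Assumed cell decay time: film decay time slowed by electrode discharge."""
    return (1.0 + n_q / n) * tau_film(n, tau_srh, k_rad)


# Assumed, illustrative parameters (not taken from the text)
tau_srh = 1e-6                                  # effective SRH lifetime in s
k_rad = 1e-10                                   # radiative coefficient in cm^3/s
n_q = n_q_from_capacitance(50e-9, 500e-7)       # 50 nF/cm^2, 500 nm absorber

for n in (1e10, 1e13, 1e16, 1e18):
    print(f"n = {n:.0e} cm^-3: tau_film = {tau_film(n, tau_srh, k_rad):.2e} s, "
          f"tau_cell = {tau_cell(n, tau_srh, k_rad, n_q):.2e} s")
```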
Figure 5. The experimental data are shown as symbols, while the simulation data are shown as lines. The simulations were adapted to the experiment by modifying a range of input parameters until good agreement with the data was achieved. Panel (b) shows the associated decay times. Here, the good agreement between the symbols and the lines shows that the fits reproduce both the decay and its logarithmic derivative (decay time) well. Simulations were performed with the software Sentaurus TCAD. TOPO refers to tri-n-octylphosphine oxide, a molecule used to passivate surfaces of perovskite films [7]. Reproduced from [8]. CC BY 4.0.

Fitting
Fitting can be used to solve the inverse problem shown in figure 2. Fitting involves taking an analytical or numerical model and changing its parameters until the result of the model agrees reasonably well with the experimental data. This often requires a large number of parameter variations. Thus, it is beneficial to apply fitting to a simple model, such as an analytical model or a relatively simple numerical model. It is also highly beneficial to reduce the number of parameters before fitting, because in a high-dimensional parameter space it may be difficult to find a solution with reasonable computational effort. Furthermore, even if a solution is found, it is unclear whether this is the best solution and how many similarly good solutions might exist. Thus, fitting can often only verify that there is a solution within a given model that provides good visual agreement with the experimental data, and that the parameter set used for the best fit is a possible hypothesis for the material parameters and interface properties of the sample under investigation. Despite its shortcomings, fitting has been successfully used to describe complex experiments, such as transient photoluminescence on full devices, quite accurately. Figure 5 shows an example of the application of fitting to transient photoluminescence data (same sample series as figure 4). Here, the symbols represent experimental data, whereby panel (a) shows the normalized photoluminescence ϕ TPL (t), while panel (b) shows the decay time calculated from the logarithmic derivative of ϕ TPL (t) [8]. The lines were fitted to the data using numerical simulations performed with the Sentaurus TCAD software. Here, the challenge relative to figure 4 is that the model is a numerical one that contains a much higher number of parameters than equations (5) and (6). While the fits agree reasonably well, it is difficult to have a high degree of confidence in the parameters that produce the best fit.
For instance, the best fit is achieved for an interface recombination velocity of 14 cm s⁻¹ at the perovskite-PCBM interface. This is combined with a conduction band offset at the same interface, which has a magnitude of 70 meV. The obvious question is now whether other combinations of parameters would lead to similarly good agreement between the simulation and experiment. Furthermore, it would be interesting to quantify the correlations between parameters or determine which combination of experiments would be able to pinpoint a certain parameter. So for instance, one might argue that a single measurement might not discriminate strongly between band offsets and surface recombination velocities [26]. However, a measurement at several different temperatures might be able to discriminate between the parameters [72].
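The basic idea of fitting, i.e. repeated evaluation of a forward model until it matches the data, can be sketched with a deliberately simple example. The forward model below is an assumed rate equation for a film in high level injection, dn/dt = −n/τ − k n², for which an analytical solution exists. It stands in for the far more expensive numerical device simulations discussed in the text, and the synthetic "data", the grid and all parameter values are illustrative assumptions.

```python
import math


def carrier_density(t, n0, tau_srh, k_rad):
    """Analytic solution of dn/dt = -n/tau_srh - k_rad*n**2 with n(0) = n0."""
    decay = math.exp(-t / tau_srh)
    return n0 * decay / (1.0 + k_rad * n0 * tau_srh * (1.0 - decay))


def misfit(data, model):
    """Sum of squared differences of the logarithms (suits decays over decades)."""
    return sum((math.log(d) - math.log(m)) ** 2 for d, m in zip(data, model))


def grid_fit(times, data, n0, tau_grid, k_grid):
    """Evaluate the forward model on every grid point and keep the best one."""
    best = None
    for tau in tau_grid:
        for k in k_grid:
            model = [carrier_density(t, n0, tau, k) for t in times]
            err = misfit(data, model)
            if best is None or err < best[0]:
                best = (err, tau, k)
    return best


# Synthetic, noise-free "experiment" generated with assumed parameters
n0 = 1e17                                        # initial density in cm^-3
true_tau, true_k = 1e-6, 1e-10                   # s and cm^3/s
times = [i * 50e-9 for i in range(1, 101)]       # 50 ns to 5 us
data = [carrier_density(t, n0, true_tau, true_k) for t in times]

tau_grid = [0.5e-6, 1e-6, 2e-6, 4e-6]
k_grid = [1e-11, 1e-10, 1e-9]
best_err, best_tau, best_k = grid_fit(times, data, n0, tau_grid, k_grid)
print(f"best fit: tau = {best_tau:.1e} s, k_rad = {best_k:.1e} cm^3/s")
```

Even in this two-parameter toy case, the number of forward evaluations is the product of the grid sizes, which hints at why fitting becomes costly once the forward model is a numerical simulation with many parameters. The sketch also returns only a single best point and says nothing about how many other parameter sets would fit similarly well.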

Bayesian inference
The specific shortcomings of fitting are (i) the high number of function evaluations needed, (ii) the repetition of this high number of simulations for every new sample, and (iii) the inability to quantify the confidence in the parameter set determined by the best fit. Bayesian inference offers solutions to all three shortcomings. It is a rather general principle that has so far been applied only a few times in the context of photovoltaic devices [72][73][74], with only first steps towards applications in halide perovskites [75]. While there are a variety of ways to implement Bayesian inference, we present one approach here that is based on the use of surrogate models and Monte Carlo algorithms [73].
First, a surrogate model is created that addresses issues (i) and (ii). The surrogate model is a neural network that is trained with numerical simulation data that was created using a sensible range of material parameters [73]. The numerical simulations take a long time to perform, but for a set of similar samples one only has to perform these simulations once. The simulations are then stored in a database and used to train the neural network. The trained neural network can then mimic the response of the numerical simulator for the range of material parameters of the training data with a certain accuracy. Subsequently, an algorithm must sample the multidimensional parameter space to which the neural network (surrogate model) has access, to search for a sensible material parameter set to describe the experimental data. The typically used algorithm is the Markov Chain Monte Carlo (MCMC) algorithm, which consists of a large number of walkers that sample the parameter space [73]. Each time a walker tests a particular set of parameters, the neural network generates the response.
Bayesian inference in combination with MCMC can require anywhere between 5000 and 10 000 MCMC iterations to attain convergence when the material parameters are numerous. Furthermore, the number of iterations increases with the number of parameters that we attempt to estimate and with the complexity of the system. Typically, approximately 200-500 walkers are used in MCMC to scan the parameter space. Hence, at every iteration of the MCMC, 200-500 unique points must be evaluated. If we assume that it takes approximately 8000 MCMC iterations to attain convergence and that there are 300 walkers, then 300 × 8000 unique combinations of material parameters are evaluated when MCMC is run on a single experimental dataset. Given that numerical solvers are usually slow and sometimes cannot easily be run in parallel, at approximately 0.5 s per simulation it will take about 0.5 × 300 × 8000 s ∼ 300 h to perform Bayesian inference on a single experimental dataset on a single-core computer. However, using a neural network surrogate model, the same process can be completed within 1 h.
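The arithmetic behind this estimate can be reproduced in a few lines. The inputs are the numbers quoted above (0.5 s per forward simulation as implied by the product, 300 walkers, 8000 iterations, 1 h for the surrogate-based run); the average time per surrogate evaluation is simply derived from them.

```python
seconds_per_simulation = 0.5     # implied by the estimate 0.5 x 300 x 8000 s
n_walkers = 300
n_iterations = 8000

# Every walker evaluates one parameter combination per MCMC iteration
n_evaluations = n_walkers * n_iterations
hours_numerical = seconds_per_simulation * n_evaluations / 3600.0

# Average time budget per surrogate evaluation if the full run takes 1 h
surrogate_ms = 3600.0 / n_evaluations * 1000.0

print(f"forward evaluations:       {n_evaluations}")
print(f"numerical solver:          {hours_numerical:.0f} h")
print(f"surrogate budget per call: {surrogate_ms:.2f} ms")
```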
For each set of parameters, the probability that this set of parameters correctly describes the data is calculated using Bayes' theorem
$$P(H|E) = \frac{L(E|H)\,P(H)}{P(E)},$$
where E is the evidence (experimental data), H is the hypothesis (random combination of material parameters) and P(H|E) is the posterior probability. P(H) is the prior probability density that introduces our prior knowledge of the distribution of parameters. In cases where we do not have any prior knowledge of where our parameters might be, we assume a uniform prior. The term L(E|H) is called the likelihood, which quantifies how well a given set of parameters reproduces the experimental data. P(E) is the marginal likelihood and is the same for all hypotheses. Figure 6 summarizes the workflow for parameter estimation from light-intensity-dependent JV curves of the perovskite solar cell stack, as shown by the solid lines in figure 6(c). To solve the inverse problem, we first need to simulate a large set of training data using a traditional photovoltaic simulator to train a neural network to act as a surrogate model. Once the surrogate model is trained, we start the MCMC algorithm, upon convergence of which we obtain the posterior probability distribution as well as the joint probability maps shown in figure 6(d). These are plotted for (i) the conduction band offset ∆E C , (ii) the surface recombination velocity S pero/ETL at the interface of the perovskite absorber and ETL, (iii) the valence band offset ∆E V , and (iv) the mobility of the HTL µ HTL . First, we find that the posterior of all four parameters is Gaussian. Secondly, from the oblique shape of the joint probability distribution, we see that the surface recombination velocity S pero/ETL and the conduction band offset ∆E C are inversely correlated, i.e. if we vary the two parameters inversely, we will obtain the same system response. This is because the bandgap at the interface decreases when the conduction-band offset increases.
The decrease in the bandgap leads to a higher carrier concentration at the interface, meaning that we can compensate for a lower surface recombination velocity owing to the higher availability of carriers. In addition to informing us about the physics of devices, oblique joint probability distributions also entail the fact that the correlated parameters cannot be uniquely identified from the given experimental data. We also observe a similar oblique relationship between the valence band offset ∆E V and mobility of the HTL µ HTL . The device response does not change when we vary the two parameters inversely because the low mobility is compensated for by the high value of ∆E V as it eases hole extraction.
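How an oblique joint probability distribution arises can be illustrated with a minimal Metropolis sampler. The sketch below is a simplification under stated assumptions: it uses a single walker instead of the many walkers described above, and the function `surrogate` is a hypothetical stand-in for a trained neural network whose response depends only on the sum of two abstract parameters a and b. These parameters do not represent actual device parameters.

```python
import math
import random


def surrogate(params):
    """Hypothetical stand-in for a surrogate model: responds only to a + b."""
    a, b = params
    return a + b


def log_posterior(params, observed, sigma, lower=0.0, upper=2.0):
    """Uniform prior on [lower, upper]^2 plus a Gaussian log-likelihood."""
    if any(p < lower or p > upper for p in params):
        return -math.inf
    residual = observed - surrogate(params)
    return -0.5 * (residual / sigma) ** 2


def metropolis(observed, sigma, n_steps=20000, step=0.1, seed=0):
    """Single-walker Metropolis sampling of the two-parameter posterior."""
    rng = random.Random(seed)
    current = [1.0, 1.0]
    current_lp = log_posterior(current, observed, sigma)
    chain = []
    for _ in range(n_steps):
        proposal = [p + rng.gauss(0.0, step) for p in current]
        proposal_lp = log_posterior(proposal, observed, sigma)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(tuple(current))
    return chain


def correlation(chain):
    """Pearson correlation coefficient between the two sampled parameters."""
    n = len(chain)
    mean_a = sum(a for a, _ in chain) / n
    mean_b = sum(b for _, b in chain) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in chain) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in chain) / n
    var_b = sum((b - mean_b) ** 2 for _, b in chain) / n
    return cov / math.sqrt(var_a * var_b)


chain = metropolis(observed=2.0, sigma=0.05)
print(f"correlation(a, b) = {correlation(chain):.2f}")
```

Because the toy response depends only on a + b, the samples accumulate along a ridge on which an increase in one parameter is compensated by a decrease in the other. The strongly negative correlation coefficient is the numerical signature of such an oblique joint distribution, and it indicates that the data constrain only the combination of the two parameters and not each one individually.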
The application of Bayesian inference to optoelectronic characterization data is one of many applications of machine learning (ML) algorithms to halide perovskite research. The applications of ML for halide perovskite photovoltaics include e.g. studying the phase stability via photoluminescence [76] and the effect of humidity on film stability [77], finding stable perovskite compositions [78] and suitable capping layers [79], and optimizing processes [80]. However, at least so far, relatively little work has been dedicated to applying ML algorithms to the specific question of analyzing experiments that would normally have to be analyzed by fitting drift-diffusion solvers to data. A notable example is the application of random-forest-based classification to light-intensity-dependent current-voltage curves by Le Corre et al [20]. A key difference between the work of Le Corre et al and the overarching challenge described in this paper is that [20] performs classification rather than regression. Classification can already be highly useful, for instance in situations where distinguishing between two or more possible causes of an efficiency loss is decisive in guiding process and material improvements. Regression is, however, typically more complicated, as it requires assigning quantitative values to parameters, which would allow (by use of a forward model as shown in figure 2(a)) predicting the behavior of the device.
Furthermore, there is important work in the context of organic photovoltaics, e.g. from the group of Mackenzie [81]. There, instead of training a neural network to perform a device simulation (material parameters in, measurement result out), the network is trained in the opposite direction (measurement results in, material parameters out). To our knowledge, no comparison of the two approaches has been made so far. The open question to us is how differently the two approaches would deal with situations where one set of measurement results fits equally well to many different sets of material parameters. In theory, Bayesian inference predicts multimodal distributions of parameters or ridge cases where several combinations of two or more parameters lead to equivalent output metrics [72].

Characterization for process optimization
A specific aspect of the field of emerging solution-processable photovoltaics is that device fabrication is fast and the cost in terms of equipment depreciation, consumables, and researcher time is lower than that for silicon photovoltaics. Therefore, a typical lab working on emerging photovoltaics will produce a high number of devices per unit time, and the main strategy for process optimization will be to use the current-voltage curve as the major and possibly only feedback loop. The current optimization cycle is illustrated schematically in figure 7(a). While measuring the efficiency might be enough to eventually arrive at an optimized efficiency, it means that insights into and understanding of efficiency-limiting processes will often only matter for a posteriori analysis of the optimized process, but not for finding an optimum. For detailed characterization to contribute to the optimization process, device characterization and data analysis would have to become fast enough to support the workflow of device optimization. This question of how to accelerate characterization to support the device optimization cycle will certainly remain relevant in the context of autonomous experimentation [82][83][84][85][86][87], as illustrated in figure 7(b). Autonomous experimentation will initially imply that simpler tasks such as spin coating, sample annealing and all kinds of pick-and-place tasks will be performed by a robot. In more sophisticated platforms, vacuum processes and sample characterization will also be performed by robots [88]. The advantage of autonomous experimentation will be the improved reproducibility and the higher throughput relative to traditional human-centered device optimization. Once the full cycle of device making and characterization is performed automatically, machine learning methods will be needed to automatically plan the next experiment based on the characterization results of previous measurements [89].
State-of-the-art autonomous experimentation systems for solution-processable photovoltaics, such as AMANDA in Erlangen [88], already include characterization but focus on transmission and reflection as well as current-voltage curves and stability tests. Sophisticated optoelectronic characterization techniques as shown in figure 1 are not currently included. To support the device optimization cycle, optoelectronic characterization would have to be performed rapidly so as not to slow down the planning of the next experiments in the cycle. Furthermore, it would have to improve the decision-making process of the experimental planning algorithm relative to a black-box optimization that would just optimize for power conversion efficiency.
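To make the notion of a black-box optimization loop concrete, the following is a minimal Python sketch of such a cycle, in which only the measured efficiency feeds back into the planning of the next experiment. Everything in it is an illustrative assumption rather than the method used by AMANDA or any other platform: the function `fabricate_and_measure` is a hypothetical stand-in for device fabrication and current-voltage measurement, the single process variable (annealing temperature), the toy efficiency response and all numerical values are invented, and the planner (a Gaussian process with an upper-confidence-bound rule) is just one common choice.

```python
import numpy as np

SIGNAL_VAR = 50.0    # assumed prior variance of the efficiency (%^2)
LENGTH_SCALE = 15.0  # assumed correlation length in temperature (deg C)
NOISE_VAR = 0.04     # assumed measurement noise variance (%^2)


def fabricate_and_measure(temp_c, rng):
    """Hypothetical stand-in for making a device and measuring its efficiency (%)."""
    true_eff = 20.0 * np.exp(-((temp_c - 105.0) / 25.0) ** 2)  # invented toy response
    return true_eff + rng.normal(0.0, 0.2)  # add measurement noise (std 0.2 %)


def rbf_kernel(a, b):
    """Squared-exponential covariance between two 1D arrays of temperatures."""
    diff = a[:, None] - b[None, :]
    return SIGNAL_VAR * np.exp(-0.5 * (diff / LENGTH_SCALE) ** 2)


def gp_posterior(x_train, y_train, x_query):
    """Gaussian-process posterior mean and standard deviation at x_query."""
    offset = y_train.mean()  # constant mean function estimated from the data
    k_tt = rbf_kernel(x_train, x_train) + NOISE_VAR * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = offset + k_qt @ np.linalg.solve(k_tt, y_train - offset)
    reduction = np.einsum("ij,ji->i", k_qt, np.linalg.solve(k_tt, k_qt.T))
    std = np.sqrt(np.clip(SIGNAL_VAR - reduction, 0.0, None))
    return mean, std


def run_optimization_loop(n_rounds=12, seed=0):
    """Closed loop: measure, update the model, pick the next temperature."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(60.0, 160.0, 101)  # allowed annealing temperatures
    x = np.array([60.0, 160.0])                 # two initial experiments
    y = np.array([fabricate_and_measure(t, rng) for t in x])
    for _ in range(n_rounds):
        mean, std = gp_posterior(x, y, candidates)
        next_temp = candidates[np.argmax(mean + 2.0 * std)]  # upper confidence bound
        x = np.append(x, next_temp)
        y = np.append(y, fabricate_and_measure(next_temp, rng))
    best = int(np.argmax(y))
    return float(x[best]), float(y[best])


if __name__ == "__main__":
    best_temp, best_eff = run_optimization_loop()
    print(f"best annealing temperature: {best_temp:.0f} C, efficiency: {best_eff:.1f} %")
```

In this sketch the planner never learns why a device performs well or poorly. Additional characterization would enter such a loop as extra observables or inferred material parameters on which the planner could base its decisions.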
To achieve the goal of accelerating device characterization and data analysis, several innovations will be helpful in the near future:
• Hardware to automatically characterize devices (including automated sample loading) has recently become available⁵, and will allow a range of optoelectronic measurements to be performed quickly and automatically.
• Methods that provide substantially more and different information than mere current-voltage curves and that can be performed quickly. Examples of such methods include voltage- and intensity-dependent steady-state photoluminescence [53,90,91].
• Fast data analysis methods, such as those described above, that are based on either analytical equations or surrogate models and that use Bayesian inference to quickly transform the experimental raw data into useful information.
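As an illustration of the last point, the following minimal Python sketch applies grid-based Bayesian inference to a fast analytical forward model. It is a toy example under stated assumptions and not the procedure used in this work: the forward model is the textbook ideal-diode relation between open-circuit voltage and illumination intensity, the two inferred parameters are an ideality factor and a saturation current density, the data are synthetic, and all names and numerical values (`forward_model`, `grid_posterior`, `infer_demo`, the noise level, the grid ranges) are hypothetical. In practice, the analytical expression could be replaced by a surrogate model trained on device simulations.

```python
import numpy as np

KT_Q = 0.02585  # thermal voltage kT/q at about 300 K, in volts


def forward_model(suns, n_id, log10_j0, j_sc_1sun=22.0):
    """Ideal-diode V_oc (V) = n_id * kT/q * ln(J_ph / J_0); currents in mA/cm^2."""
    return n_id * KT_Q * (np.log(suns * j_sc_1sun) - np.log(10.0) * log10_j0)


def grid_posterior(suns, v_oc_meas, sigma_v, n_grid, log10_j0_grid):
    """Posterior on a 2D parameter grid for Gaussian noise and a flat prior."""
    n_mesh, j0_mesh = np.meshgrid(n_grid, log10_j0_grid, indexing="ij")
    pred = forward_model(suns[None, None, :], n_mesh[..., None], j0_mesh[..., None])
    log_like = -0.5 * np.sum(((pred - v_oc_meas) / sigma_v) ** 2, axis=-1)
    post = np.exp(log_like - log_like.max())  # subtract the maximum for stability
    return post / post.sum()                  # normalize over the grid


def infer_demo(seed=0):
    """Generate synthetic data and return posterior means of both parameters."""
    rng = np.random.default_rng(seed)
    suns = np.logspace(-2, 0, 7)  # seven intensities between 0.01 and 1 sun
    sigma_v = 0.005               # assumed voltage noise of 5 mV
    v_true = forward_model(suns, n_id=1.5, log10_j0=-10.0)  # invented ground truth
    v_meas = v_true + rng.normal(0.0, sigma_v, size=suns.shape)

    n_grid = np.linspace(1.0, 2.0, 401)
    log10_j0_grid = np.linspace(-13.0, -7.0, 601)
    post = grid_posterior(suns, v_meas, sigma_v, n_grid, log10_j0_grid)

    n_mean = float(np.sum(n_grid * post.sum(axis=1)))          # marginal mean of n_id
    j0_mean = float(np.sum(log10_j0_grid * post.sum(axis=0)))  # marginal mean of log10(J_0)
    return n_mean, j0_mean


if __name__ == "__main__":
    n_mean, j0_mean = infer_demo()
    print(f"posterior mean n_id = {n_mean:.2f}, log10(J_0) = {j0_mean:.2f}")
```

The brute-force grid is only practical for a handful of parameters. It is used here because it exposes the full posterior, including the strong correlation between the two parameters, rather than a single best-fit value.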

Conclusions
The premise of this work is that the interpretation of experimental raw data obtained on halide perovskite devices is significantly more challenging than that for mature photovoltaic technologies such as silicon. This is not a specific aspect of halide perovskites but rather a general challenge in the field of emerging photovoltaics. The number of unknowns is typically much higher than the number of equations obtained by measuring the samples. Furthermore, we often have systems of nonlinear differential equations that explain the behavior of our device, which usually do not have analytical solutions. Thus, data interpretation is frequently performed using simplifications inspired by approximations developed for traditional semiconductors and solar cells. Often, these are inapplicable to halide perovskites or other thin-film devices, thereby leading to misinterpretation of the data. We propose two strategies to improve the current situation. The first approach is to find approximations specifically designed for halide perovskites. The second approach focuses on massively accelerating device simulations to invert the problem of finding material parameters consistent with experimental evidence. We show how this can be done in principle using an approach called Bayesian inference. In the future, we hope that improving and accelerating data acquisition and interpretation could lead to a situation in which a variety of experimental evidence can be included within a manual or automated device optimization process. Here, the characterization of samples, interpretation, and design of further experiments can be integrated, thereby accelerating the development of functional devices in general, and emerging solar cell technologies in particular.

Data availability statement
No new data were created or analysed in this study.