Periodical evaluation of photovoltaic modules and diode parameter extraction method using multiple linear regression models

The stability and performance of photovoltaic (PV) modules can be assessed by outdoor testing, where external conditions such as illumination and module temperature are measured at regular time intervals along with the jV-curve of the module. However, the fluctuation and seasonal variation of external conditions can make it difficult to trace changes in PV-module properties, such as degradation, at fixed reference conditions (e.g. standard test conditions). This contribution demonstrates the use of multiple linear regressions (MLR) to overcome these difficulties. The data gathered over long periods is condensed into a small set of predictors that reproduce the jV parameters at the infrequently encountered conditions required for comparison. Furthermore, the parameters of a physical device model are calculated directly from the MLR predictors. We validate the procedure in two ways: by applying the MLR method to simulated data and replicating the original input parameters, and by comparing monthly parameter averages between the MLR method and a known parameter extraction method.


Introduction
The current-voltage curve (I-V curve) reveals the most significant properties of photovoltaic (PV) devices. From this curve, the I-V parameters describing the device such as open circuit voltage (V oc ), short circuit current (I sc ) and the maximum power point (P mpp ) are extracted, which are used to obtain the performance with respect to the input power. 1,2) Moreover, this curve can be fitted to device models such as the one diode model (ODM) to extract parameters, which help describe properties of the devices such as the saturation current density ( j 0 ) and ideality factor (n). This fitting of course involves iterations, which might become impractical for large amounts of data, due to the required computing power.
Multiple models have been developed, mainly to predict the maximum power point (P mpp ) or the maximum power output of PV devices. [3][4][5] These models have been developed either from analytical and numerical device models such as the ODM or heuristically by data mining and observation. Statistical tools that are also employed in machine learning, such as multiple linear regressions (MLR), can be used to predict PV cell or module parameters. These models have various applications, among which energy forecasting stands out; there, the model is essentially a black box whose coefficients are derived by fitting datasets and do not necessarily relate to a distinct physical model of the device. Such is the case for our own MLR model, which has been presented in a previous publication. 6) However, we have previously shown that the MLR equation for the open circuit voltage can be employed to obtain physical properties from the predictors extracted after the fit, without analyzing single I-V curves, 7) which enables the condensation of a large amount of information from arbitrary timeframes into a single "snapshot" of the status of the PV device.
In this work, we validate the use of an MLR model for V oc to obtain the diode parameters. The well-known "I sc -V oc method" for extraction of the saturation current ( j 0 ) and ideality factor (n) was used as a reference for the extraction of the same parameters using the MLR predictors. The extraction of the activation energy (E a ) is validated by comparison to the extracted values of E a via simple linear V oc extrapolation to zero Kelvin and via quantum efficiency measurements.

Theory and methodologies
One diode model
Equation (1), derived from the ODM, [8][9][10] describes the dependence of the open circuit voltage on the short circuit current density (J sc ), the ideality factor (n), the temperature (T) and the saturation current density ( j 0 ), where "k" is Boltzmann's constant and "q" is the elementary charge.
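For orientation, the open-circuit relation of the ODM is commonly written as below when parasitic resistances are neglected. This is the textbook form using the symbols defined above; it is given as an assumption about the shape of Eq. (1), not as a reproduction of the published equation.

```latex
V_{oc} = \frac{n k T}{q}\,\ln\!\left(\frac{J_{sc}}{j_0} + 1\right)
\;\approx\; \frac{n k T}{q}\,\ln\!\left(\frac{J_{sc}}{j_0}\right)
\qquad \text{for } J_{sc} \gg j_0
```

In the approximate form, V oc is linear in ln(J sc ), with a slope of nkT/q and a zero crossing at J sc = j 0 .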

Diode parameter extraction
The ideality factor and saturation current can be extracted from PV data in multiple ways other than by fitting the jV curve. [11][12][13] However, the V oc -I sc method was chosen for its simplicity and its connection to the ODM. 14) Considering Eq. (1), the ideality factor and the saturation current density can be extracted by plotting V oc as a function of the natural logarithm of J sc . Using Eq. (2), the ideality factor can be calculated from the slope of this plot. The saturation current can also be extracted from the same plot, using the intercept at which V oc equals zero.
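The slope and intercept extraction described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, assuming the approximate ODM relation V oc = (nkT/q) ln(J sc /j 0 ) at a single temperature and V oc expressed per cell; the function name `extract_n_j0` and all numerical values are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23     # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # elementary charge in C

def extract_n_j0(j_sc, v_oc_cell, temp_k):
    """Return (n, j0) from a straight-line fit of V_oc against ln(J_sc).

    j0 is returned in the same unit as j_sc. v_oc_cell is the voltage per
    cell; for a module, divide the module V_oc by the number of cells first.
    """
    slope, intercept = np.polyfit(np.log(j_sc), v_oc_cell, 1)
    n = slope * Q_E / (K_B * temp_k)   # slope = n k T / q
    j0 = np.exp(-intercept / slope)    # J_sc at which the fitted line gives V_oc = 0
    return n, j0

# Synthetic data (assumed values, for illustration only)
temp_k = 298.15
n_true, j0_true = 1.5, 1e-7            # j0 in mA/cm^2
j_sc = np.linspace(5.0, 35.0, 50)      # mA/cm^2
v_oc = n_true * K_B * temp_k / Q_E * np.log(j_sc / j0_true)

n_fit, j0_fit = extract_n_j0(j_sc, v_oc, temp_k)
print(n_fit, j0_fit)
```

With outdoor data, the points entering the fit would first have to be restricted to a narrow temperature window, since the slope depends on T.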

MLR model
This work applies an MLR model, which has been presented in a previous contribution. 15) The model is based on equations representing the I-V parameters (I sc , V oc , P mpp , and I mpp ) as functions of irradiance (x 1 ) and temperature (x 2 ). Each of these equations [Eqs. (5) to (8)] is defined with four coefficients or predictors, which are obtained for each PV device by fitting the "training" dataset. In this work, the mean squared error was used to guide the fitting algorithm. The remaining I-V parameters (V mpp , FF) can be calculated from these equations using the known relations V mpp = P mpp /I mpp and FF = P mpp /(V oc · I sc ). A variation of Eq. (6) was used to increase the correlation coefficient R 2 . The influence of R s increases at larger irradiances, which limits the logarithmic behavior that otherwise originates from V oc . For this scenario, it is better to employ a variation of Eq. (6) without the logarithmic term, as seen in Eq. (10). In this work, both MLR equations for P mpp were fitted; however, only the equation yielding the better correlation to the data was employed for the prediction. This selection was made for each fit, i.e. for each interval bin.
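A least-squares fit of a four-predictor regression can be sketched as follows. The functional form used here, with the basis 1, ln(x 1 ), x 2 and x 2 · ln(x 1 ), is an assumption chosen for illustration because it mirrors the temperature and irradiance dependence of V oc in the ODM; it is not taken from Eqs. (5) to (8). The function names and the synthetic coefficients are likewise hypothetical.

```python
import numpy as np

def design_matrix(x1, x2):
    """Assumed four-predictor basis: 1, ln(x1), x2, x2*ln(x1)."""
    ln_x1 = np.log(x1)
    return np.column_stack([np.ones_like(ln_x1), ln_x1, x2, x2 * ln_x1])

def fit_mlr(x1, x2, y):
    """Ordinary least squares, i.e. minimization of the mean squared error."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(x1, x2), y, rcond=None)
    return coeffs

def predict_mlr(coeffs, x1, x2):
    """Evaluate the fitted regression at arbitrary irradiance and temperature."""
    x1 = np.atleast_1d(x1).astype(float)
    x2 = np.atleast_1d(x2).astype(float)
    return design_matrix(x1, x2) @ coeffs

# Synthetic "training" data (assumed values): x1 = P / P_N, x2 in deg C
rng = np.random.default_rng(0)
x1 = rng.uniform(100.0, 1100.0, 500)
x2 = rng.uniform(5.0, 65.0, 500)
true_coeffs = np.array([0.45, 0.03, -2.0e-3, 1.0e-4])
y = design_matrix(x1, x2) @ true_coeffs + rng.normal(0.0, 1.0e-3, 500)

coeffs = fit_mlr(x1, x2, y)
y_stc = predict_mlr(coeffs, 1000.0, 25.0)[0]   # prediction at 1000 W m^-2 and 25 deg C
print(coeffs, y_stc)
```

Once the four coefficients are stored for a period, any combination of irradiance and temperature can be evaluated without returning to the raw data.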

Experimental
Different modules were installed and measured outdoors on a flat rooftop in Berlin (52°25′53.4″N 13°31′27.6″E), tilted at an angle of 35° and facing south. Two different technologies (CIGS and Si) were used. The CIGS module was fabricated in-house using a sequential rapid thermal process 16) with a bandgap of about 1.0 eV and a size of 30 × 30 cm 2 , whereas the c-Si module, of similar size, was bought from a panel distributor. The main results are shown for the CIGS module, as it presented more interesting features. Nevertheless, results on the c-Si module are provided in the Appendix (Fig. A·1). The CIGS module was monolithically interconnected using a picosecond laser (1064 nm) for the P1 back contact (molybdenum) isolation pattern and stylus scribing for the back-front contact interconnection pattern (P2; the top contact consists of ZnO:Al) and the active area definition pattern (P3). After edge deletion and manual bonding of contact ribbons, the modules were encapsulated using a cover glass (3 mm thick) and a polyolefin elastomer encapsulant. The results in this work include more than 60 000 data points measured in the years 2020 and 2021. These measurements were performed using an MPP tracker and I-V measurement system provided by the University of Ljubljana (i.e. LPVO-MS2x16), similarly to a previous contribution. 6) The system performs an I-V scan every 2 min; I sc and V oc are extracted via extrapolation of the I-V curves, and the MPP through spline interpolation. Irradiance (Si-01TC sensor) and module temperature (DS18B20 sensor glued to the module backside glass) are recorded simultaneously with the I-V scan. While not being scanned, the module is kept at MPP. The specified sensor uncertainties are ±5 W m −2 ± 2.5% (for a vertical light beam at 1000 W m −2 ) and 0.5% (between −10 and +85°C), respectively. A mixture of SQL, Python and Excel was used for the processing of the data.
To facilitate the processing of data and to keep a constant number of days in each cohort during the time series evaluation, each month represents a bin of thirty consecutive days. This means that, even though each graph mainly includes the data obtained in a specific month, it might include a few days from an adjacent month.
The Japan Society of Applied Physics by IOP Publishing Ltd
In line with the standard IEC61853-1, 17) power matrices have been used for the power rating of different PV module technologies using indoor as well as outdoor data. [18][19][20][21][22] However, the matrices presented here show outdoor-measured values of V oc for different irradiances (1000, 800, 600, 400, 200 W m −2 ) and module temperatures (25°C, 10°C, 40°C, 55°C, 65°C), where each value represents the mean of all the data contained in a bin for the given combination of irradiance and temperature. Each matrix contains 25 bins of data. The bins were obtained by filtering the data around the aforementioned external conditions with a tolerance of ±2%. The mean value of each bin is represented in a color map. For the MLR-estimated values, each "bin" is the value calculated using Eq. (5) for each of the 25 combinations presented in the matrix; no filtering is needed and therefore no tolerance is specified. The matrices in Fig. 1 show the lack of data at many matrix points (shown in white) in the month of April, whereas the MLR yields a full matrix, as anticipated. For the data gathered and evaluated in this work, every month had insufficient data in some of the bins to construct the whole power matrix; these bins can be filled by employing the MLR. Figure 2 is an example of a monthly data set (04.06.2020 to 03.07.2020) filtered by temperature to improve visualization. The empty bins at 10°C and 65°C (black and purple, respectively), which have no data, can be filled with the MLR.
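The construction of the measured 5 × 5 matrix can be sketched as below. The sketch assumes that the ±2% tolerance is taken relative to each target value and that it is applied to the temperature in °C; the text does not specify either point. The function name `binned_matrix` and the sample values are hypothetical.

```python
import numpy as np

IRRADIANCES = [1000, 800, 600, 400, 200]   # W m^-2, matrix columns
TEMPERATURES = [10, 25, 40, 55, 65]        # deg C, matrix rows

def binned_matrix(irr, temp, values, tol=0.02):
    """Mean of `values` in each (temperature, irradiance) bin; NaN where a bin is empty."""
    matrix = np.full((len(TEMPERATURES), len(IRRADIANCES)), np.nan)
    for i, t0 in enumerate(TEMPERATURES):
        for j, g0 in enumerate(IRRADIANCES):
            # Assumption: tolerance is relative to the target irradiance and temperature
            mask = (np.abs(irr - g0) <= tol * g0) & (np.abs(temp - t0) <= tol * t0)
            if mask.any():
                matrix[i, j] = values[mask].mean()
    return matrix

# Tiny illustrative data set (assumed values)
irr = np.array([1000.0, 1010.0, 400.0, 405.0, 700.0])
temp = np.array([25.0, 25.2, 40.0, 40.5, 30.0])
v_oc = np.array([0.60, 0.62, 0.55, 0.57, 0.50])

m = binned_matrix(irr, temp, v_oc)
print(m)
```

Bins that remain NaN correspond to the white matrix points; an MLR prediction can be evaluated at exactly those target conditions instead.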
Thus, the MLR can be used as a tool to observe the behavior of the I-V parameters when the data is insufficient for adequate statistical correlation. Consequently, the MLR can be used to close the gaps in the data, which provides a broader overview of device performance. The IEC methodology prescribes a combination of interpolation or extrapolation procedures depending on the targeted external conditions, 17,23) which can be avoided using the MLR, as it only requires one fitting procedure per data frame.
To evaluate the accuracy of the time series evaluation, the mean absolute percentage error (MAPE) was calculated for each monthly bin of outdoor data and for each of the I-V parameters of a CIGS module. Figure 3 shows the monthly MAPE of the MLR-estimated values (Y MLR ) with respect to the measured average values (Y meas ) over one year for the CIGS module; it was below 5% (black solid line) for all parameters. For the most part, the errors in our calculation are within the tolerances of the irradiance and temperature measurements. Stronger deviations can be seen only in the months of January and March, due to higher uncertainty in the measurements, possibly related to the location and surroundings of the outdoor installation, where issues such as snow, soiling, reflections or shading are possible. Therefore, the MLR has been verified for the evaluation of PV devices over time, with an error similar to the uncertainty of the irradiance and temperature measurements.
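The MAPE between measured bin averages and MLR estimates can be computed as in the following sketch. Skipping empty bins (NaN) is an assumption made here so that the error is evaluated only where measured averages exist; the function name and the sample values are hypothetical.

```python
import numpy as np

def mape(y_meas, y_mlr):
    """Mean absolute percentage error of y_mlr with respect to y_meas, in percent."""
    y_meas = np.asarray(y_meas, dtype=float)
    y_mlr = np.asarray(y_mlr, dtype=float)
    valid = ~np.isnan(y_meas) & (y_meas != 0)   # skip empty bins and avoid division by zero
    return 100.0 * np.mean(np.abs((y_mlr[valid] - y_meas[valid]) / y_meas[valid]))

# Illustrative values: the second bin has no measured data
y_meas = [0.60, np.nan, 0.50]
y_mlr = [0.61, 0.58, 0.49]
error_percent = mape(y_meas, y_mlr)
print(error_percent)
```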
Thus, the evaluation of the I-V parameters (V oc , I sc , P mpp , V mpp , I mpp and FF) can be done for different irradiance and temperature conditions over different intervals of time with excellent accuracy. For the employed data sets, an error below 5% was found, mostly below 2%. This can be especially useful to evaluate the degradation of the I-V parameters at lower and higher irradiances. Figure 4 shows the time series of monthly average values of I sc , V oc , P mpp and FF measured outdoors at 400 W m −2 and 25°C (in black) and the MLR estimates of the same I-V parameters (in red) at the same conditions for the CIGS module. The irradiance and temperature constraints were chosen such that measured data is likely to be found under these conditions throughout the whole year. For the measured values, the filter tolerance was the same as in Fig. 1. It can be seen in Fig. 4 that the MLR-estimated values follow the absolute values of the measured averages very closely, and therefore also the general trends of all I-V parameters, which opens the possibility of evaluating the behavior of the I-V parameters over time. It can be seen that for the months of December 2021 and January and June of 2021 no data around the evaluation constraint was observed. However, given the results presented in the previous section, the MLR closes these gaps, allowing a more appropriate description of the electrical changes of the modules at specific irradiance and temperature conditions, even when specific data is not available. Consequently, the MLR helps close the gaps of missing data to obtain a more continuous, time-dependent evaluation of PV devices.
Moreover, using the MLR approach it is possible to perceive that the changes in P mpp are seemingly related to small changes in V oc and a strong degradation of the FF. However, as both FF and V oc depend on the recombination mechanism in the absorber as well as on parasitic resistances, a model that can also extract diode parameters is useful.

Diode parameter analysis
As we have shown in a previous contribution, 7) our MLR model shows a high correlation between the estimated values of V oc and values synthesized using the ODM. In this work, we provide a methodology for the extraction of diode parameters using the MLR model and show the possibility of a time series evaluation of said parameters using one year of empirical data measured outdoors on a CIGS module. For the validation, we use two approaches: fitting data synthesized using the ODM and replicating the input diode parameters, and using empirical data measured outdoors and comparing the results to well-established methods for diode parameter extraction.

Monthly evaluation of j 0 and n.
The values of j 0 and "n" were calculated after fitting the monthly data to the MLR Eq. (5), from which the four predictors per period were obtained. The ideality factor was calculated using Eq. (11). For j 0 , Eq. (12) was employed, where j N represents a reference value obtained from the predictors of Eq. (6) when x 1 equals 1. Further information on the origin of Eqs. (11) and (12) is provided in the Appendix.
The first validation of the methodology was performed using analytically synthesized datasets from a PV device model (ODM) and reproducing the original input parameters. To reproduce the input parameters, the formulas provided were applied to the coefficients extracted from fitting V oc matrices generated with the ODM for various irradiances and temperatures (as in Fig. 1). Three scenarios were compared: ideal (n = 1.0, R s = 0 Ωcm 2 ; R p = 1E6 Ωcm 2 ), semi-ideal (n = 1.0, R s = 1 Ωcm 2 ; R p = 1300 Ωcm 2 ) and slightly shunted (n = 1.7, R s = 2.4 Ωcm 2 ; R p = 580 Ωcm 2 ). Figure 5 depicts the results from this comparison, showing very high correlation. Statistically, as we have shown in a previous contribution, 7) worsening R s and R p generates lower correlations in the MLR. For this comparison, the MAPE (as in Fig. 3) between the input and the worst-case scenario was below 5% for the ideality factor.
For j 0 the error was about 50%; however, the estimates are of the same order of magnitude as the inputs, which is a very good approximation for this parameter. The full comparison between input and MLR-estimated diode parameters for the three scenarios is presented in the Appendix.
The second validation was done using outdoor-measured data from actual PV modules. For the extraction of the diode parameters (i.e. j 0 and n), the well-established "V oc -I sc method" was employed. 14) In addition, in order to increase the accuracy of said method, an algorithm that optimizes the statistical correlation (R 2 ) by reducing the range of irradiance in the data bin was developed and implemented (an example is provided in the Appendix in Fig. A·2). Figure 6 shows the comparison of the two extraction methods for the ideality factor and the saturation current density, where the "V oc -I sc method" and the "MLR model method" (at 25°C) are represented with black and red markers, respectively. It can be seen that, for both parameters, the two methodologies correlate well in their absolute values as well as in their trends. These values are close to the values expected for CIGS, 24,25) which verifies the possibility of extracting the ideality factor and the saturation current density.
Interestingly, the degradation shown in Fig. 4 (seemingly after January) cannot be seen in Fig. 6, which suggests that the degradation observed after the month of February (Fig. 4) is probably not due to degradation of the diode. However, over the whole year, a very slight upward trend (worsening) of both the ideality factor and saturation current density can be seen, which might suggest a slight diode degradation.

Monthly evaluation of the activation energy.
In addition to the ideality factor and the saturation current density, the activation energy was also extracted from the MLR coefficients after fitting and compared to a known methodology, the V oc extrapolation to zero Kelvin ("Ext"). 26,27) Even with known methodologies, using outdoor data to extract physical properties is not straightforward. To be able to extract E a using the extrapolation approach, the data was filtered to irradiances between 400 and 600 W m −2 , the module temperature was converted to Kelvin and V oc was divided by the number of cells to obtain the V oc per cell.
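The three preparation steps and the extrapolation itself can be sketched as follows, using a straight-line fit of V oc per cell against temperature in Kelvin and reading off the intercept at T = 0 K. The function name, the number of cells and the synthetic data are hypothetical.

```python
import numpy as np

def activation_energy_ev(temp_k, v_oc_module, n_cells):
    """Intercept at T = 0 K of a linear V_oc(T) fit.

    The intercept is a voltage per cell in V, numerically equal to E_a in eV.
    """
    v_oc_cell = np.asarray(v_oc_module, dtype=float) / n_cells
    slope, intercept = np.polyfit(temp_k, v_oc_cell, 1)
    return intercept

# Synthetic monthly data (assumed values): two irradiance levels per temperature
n_cells = 40
temp_k = np.repeat(np.linspace(283.15, 338.15, 12), 2)
irr = np.tile([300.0, 500.0], 12)
v_oc_module = n_cells * (1.0 - 1.5e-3 * temp_k)
v_oc_module[irr < 400.0] -= 0.02 * n_cells   # lower V_oc at low irradiance

# Keep only irradiances between 400 and 600 W m^-2 before extrapolating
mask = (irr >= 400.0) & (irr <= 600.0)
e_a = activation_energy_ev(temp_k[mask], v_oc_module[mask], n_cells)
print(e_a)
```

Because the measured temperatures lie far from 0 K, small errors in the fitted slope translate into large errors in the intercept, which is one reason why careful filtering matters for this method.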
For the calculation of the activation energy from our MLR model, Eq. (13) was used. However, in order to get accurate results for modules, the measured V oc in the data was also transformed into V oc per cell. Furthermore, similar to the "extrapolation" approach, temperature was expressed in Kelvin before the fitting process. The derivation of E a from Eq. (5) is provided in the Appendix. Figure 7 presents the comparison of the two methodologies. The black boxes represent the extrapolation approach, whereas the red boxes represent the MLR approach. Additionally, as the activation energy is expected to be close to the bandgap (depending on the recombination mechanism [26][27][28] ), the value of the optical bandgap, extracted from the derivative of the EQE measured on similarly processed samples (i.e. E g ≈ 1.0 eV), is represented with a solid black horizontal line. It can be seen that both methodologies provide similar values and deviations, which validates the use of the model for approximating the activation energy. In addition, it is worth noting that the MLR can be used even when insufficient data is obtained outdoors (e.g. Nov-2020), although this might cause larger deviations in the calculation. Furthermore, Fig. 7 presents the accuracy of the correlation of the MLR fit to the V oc data, shown with blue markers, where an accuracy of about 95% was found. Thus, with the MLR model it could be possible to evaluate changes in E a in order to detect the formation of barriers or possible changes in recombination mechanisms, although such changes are unlikely to happen in CIGS. However, in spite of the good correlation coefficient (R 2 ) for V oc , the errors in determining E a are significant, indicating that more careful filtering could reduce the uncertainty of the E a estimate. In the calculated data, the monthly variations of E a show no significant changes around February 2021, in contrast to what was shown in the I-V parameter evaluation.
This, together with the lack of changes in the other diode parameters, suggests that the degradation around said period was not related to a degradation of the absorber material but to a different component of the device.

Discussion
The possible reasons behind the apparent degradation of the fill factor in the time series evaluation of I-V parameters (Fig. 4) are unknown. However, using the MLR model in addition to known methodologies, the degradation of the absorber material was ruled out. Here, the capability of the MLR to predict values at different conditions is used to indirectly signal the possibility of either R s or R p degradation.
Using each of the temporal bins of the aforementioned time series, the values of FF were calculated using the MLR estimation in order to compare the behavior of the FF as a function of irradiance in the evaluated CIGS module. The result is shown in Fig. 8, where the degradation is evident for all irradiances when comparing the curve from September 2020 (black curve) with that from September 2021 (soft red curve). This was already shown in our I-V parameter analysis for the 400 W m −2 case. Here, however, it provides indirect evidence of parasitic resistance degradation, as the trends of I-V parameters at higher or lower irradiances may indicate a stronger influence of the series or parallel resistance, as shown in the literature. [29][30][31] For instance, in Fig. 8 there appears to be a slight improvement in the parasitic resistances from September 2020 to January 2021, probably due to the metastability of the CIGS module. However, after January 2021, not only does the absolute value of FF decrease, but the slope of the curve towards larger irradiances also becomes more pronounced, which indicates a stronger R s influence. In the lower irradiance regime, where R p is more significant, there also seems to be an improvement towards winter 2020 and a worsening towards summer.
To support these conjectures, Fig. 9 shows the calculated apparent series (R oc ) and apparent parallel (R sh ) resistances. Figure 9(a) depicts a slight worsening towards September 2021 for the presented irradiances, which correlates with our previous observations shown in Fig. 8. However, Fig. 9(b) shows, in general, a slight worsening after the first three months and an improvement after that, which does not correlate with our observations in the FF. Nevertheless, this can be explained by the fact that R s can also influence the low irradiance performance whenever R s is significantly high, possibly due to the increasing influence of bias-dependent current collection. 32,33) The influence of R p alone is better appreciated by looking at V oc as a function of irradiance, where it was found that the trend of V oc for lower irradiances did not change significantly after one year (provided in the Appendix, Fig. A·3).

Conclusion
In this work, it was shown that the periodical evaluation of PV modules using MLR models for the analysis of large outdoor-measured databases is possible. This was verified by comparing MLR-estimated values of different I-V parameters (e.g. V oc ) at different external conditions (irradiance and temperature) against the measured values, the latter being filtered with a tolerance of 2% for both conditions. Additionally, it was shown that data that is otherwise unavailable in outdoor conditions can be systematically generated using the MLR to obtain a broader understanding of the periodical development of PV devices at different external conditions, which was used to close data gaps in our periodical evaluation of I-V parameters.
Formulas derived from the MLR model of V oc for the extraction of diode parameters were validated in two different ways. First, the MLR method was applied to data sets synthesized using the ODM, and the original input parameters were replicated with said formulas, with an error below 5% for the ideality factor and an estimate of j 0 in the same order of magnitude. Second, large data sets from outdoor-installed CIGS modules were used to correlate known methodologies with the newly developed MLR methodology, resulting in a discrepancy of about 3.5% for the ideality factor with respect to the established methodologies, but with the advantage of reduced complexity in the processing of large data sets.
In addition, a monthly evaluation of the extracted ideality factor, saturation current and activation energy was shown and compared to the evaluation of the I-V parameters. By coupling both evaluations, it was possible to rule out the degradation of the absorber. We have shown that in our I-V parameter evaluation a recovery of the FF towards winter 2020 was present, whereas a degradation towards summer was observed. However, our results on the extracted diode parameters show that the degradation was not due to degradation of the absorber itself but was especially linked to R s , which was supported by the measurements of the apparent series resistance.

The irradiance predictor is defined as the ratio x 1 = P/P N , where "P" is the illumination and "P N " is a reference value, which renders the ratio "x 1 " dimensionless (e.g. P N = 1 W m −2 ). In addition, to make a connection to the J sc (V oc ) plot, it is assumed that J sc depends linearly on the illumination, J sc = j N · x 1 .

Acknowledgments
Here, j N is the current density at the reference value of irradiance. In the J sc (V oc ) plot we extrapolate to V oc = 0 and the intercept with the y-axis is j 0 .
Now we take the MLR equation for V oc , also set it to 0, and solve for ln(x 1 ). That means we cannot calculate j 0 from just the MLR equation for the voltage; we also need the MLR equation for the current density, in which we set x 1 = 1. To connect the two equations we assume that J sc is proportional to x 1 . Solving for "n" we get Eq. (A·12). It follows that E a in the MLR model is given by Eq. (A·18). Note that in the MLR, the activation energy depends on irradiance (x 1 ), whereas in Rau's model it does not. Therefore, the parameter Aß is an indication of how well the device conforms to the ODM.
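The chain of steps above can be illustrated numerically. The sketch below assumes a four-term regression of the form V oc = a + b·ln(x 1 ) + c·T + d·T·ln(x 1 ), with T in Kelvin and V oc per cell; this form is an assumption made for illustration and may differ from Eq. (5), so the expressions in the code are not Eqs. (A·12) and (A·18) themselves. All function names and numerical values are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23     # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # elementary charge in C

def diode_params_from_mlr(coeffs, j_n, temp_k):
    """Ideality factor and j0 from the assumed V_oc regression at temperature temp_k."""
    a, b, c, d = coeffs
    slope = b + d * temp_k                   # dV_oc / d ln(x1) = n k T / q
    n = Q_E * slope / (K_B * temp_k)
    ln_x1_zero = -(a + c * temp_k) / slope   # ln(x1) at which the regression gives V_oc = 0
    j0 = j_n * np.exp(ln_x1_zero)            # uses J_sc = j_N * x1
    return n, j0

def activation_energy_from_mlr(coeffs, x1):
    """V_oc per cell extrapolated to T = 0 K, numerically equal to E_a in eV."""
    a, b, _, _ = coeffs
    return a + b * np.log(x1)

# Coefficients generated from an ideal diode with j0 = j00 * exp(-E_a / (n k T))
n_true, e_a_true, j00, j_n = 1.5, 1.0, 1.0e5, 0.035
d = n_true * K_B / Q_E
coeffs = (e_a_true, 0.0, d * np.log(j_n / j00), d)

temp_k = 298.15
n_mlr, j0_mlr = diode_params_from_mlr(coeffs, j_n, temp_k)
e_a_mlr = activation_energy_from_mlr(coeffs, 500.0)
print(n_mlr, j0_mlr, e_a_mlr)
```

For the ideal diode used here the coefficient b is zero, so the extracted activation energy does not depend on x 1 ; a non-zero b in a fit to real data would introduce the irradiance dependence noted above.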
Supplementary. Extraction of diode characteristics, Silicon module.
The MLR and the V oc -I sc method show very good correlation of ideality factor also in c-Si modules.
Supplementary. Extraction of diode characteristics using the V oc -I sc method.
Based on the V oc -I sc method, 14) we extract the ideality factor from the slope and j 0 from the intercept of a V oc -ln(J sc ) plot. To optimize the extraction, an algorithm was developed in Python in which the irradiance is filtered to reduce the noise and increase homoscedasticity, thus increasing the correlation coefficient R 2 .
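One possible way to implement such a filter is to scan candidate lower irradiance limits and keep the window whose V oc -ln(J sc ) fit has the highest R 2 . The sketch below is a hypothetical illustration of that idea, not the algorithm used in this work; the function names, the candidate limits, the minimum number of points and the synthetic data are assumptions.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of a straight-line fit of y against x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

def best_irradiance_window(irr, j_sc, v_oc, lower_bounds, upper, min_points=20):
    """Return (lower_limit, R2) of the irradiance window with the highest R2."""
    best = (None, -np.inf)
    for lower in lower_bounds:
        mask = (irr >= lower) & (irr <= upper)
        if mask.sum() < min_points:          # skip windows with too few points
            continue
        score = r_squared(np.log(j_sc[mask]), v_oc[mask])
        if score > best[1]:
            best = (lower, score)
    return best

# Synthetic data (assumed values): V_oc falls below the diode line under 200 W m^-2
irr = np.linspace(50.0, 1000.0, 96)
j_sc = 0.035 * irr
v_oc = 0.0385 * np.log(j_sc / 1e-7)
v_oc = v_oc - np.where(irr < 200.0, 0.05 * (200.0 - irr) / 150.0, 0.0)

r2_full = r_squared(np.log(j_sc), v_oc)
best_lower, best_r2 = best_irradiance_window(irr, j_sc, v_oc, [50.0, 200.0, 400.0], 1000.0)
print(r2_full, best_lower, best_r2)
```

A minimum number of points is enforced because R 2 on very narrow windows becomes unreliable: narrowing the range also reduces the total variance that the fit has to explain.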
The V oc as a function of irradiances can indirectly show the influence of the R p by observing the trend around lower irradiances. In this case, even though there is a change in the absolute value of the V oc at lower irradiances, no strong change can be seen in the trend after one year. A slight change can be seen in the trend but only from September 2020 to January 2021 which would partially explain the improvement seen in FF.
Supplementary. Comparison between input and calculated diode parameters from synthetic data from the ODM.
• Ideal case