A Bayesian statistical method for large-scale MEMS-based sensor calibration: a case study on 100 digital accelerometers

Low-cost sensors, and in particular micro-electro-mechanical systems (MEMS) devices, are widely used in many applications, including consumer electronics, healthcare, automotive, and industrial automation. Their large-scale production (typically on the order of millions of devices per week in a single factory) would require calibrating a huge number of devices, which would be costly and time-consuming. A solution can be found in statistical methods that (at least partially) substitute for the typical calibration procedures. In this work, we propose a Bayesian method to statistically calibrate large batches of sensors using probabilistic models and prior knowledge. The method involves experimentally calibrating only a small sample of sensors, then inferring the number of reliable sensors in the entire batch and assigning an appropriate uncertainty to all the sensors. Therefore, it can be considered a statistical calibration of the batch. The Bayesian nature of this approach allows the number of experimental calibrations to be reduced by incorporating prior knowledge from the previous calibration of a ‘benchmark’ batch, which is performed ‘once and for all’ and is representative of the whole production process. The method is applied and validated through the calibration of 100 digital MEMS accelerometers. Validation results showed acceptable agreement between experimental bootstrap-based and theoretical values, with relative differences within ±7%.


Introduction
According to the International Vocabulary of Metrology (VIM) [1], the sensitivity of a measuring instrument/transducer is defined as the quotient of the change in an indication of the measuring system/transducer and the corresponding change in a value of a quantity being measured; it is univocally attributed to each instrument on the basis of specific standard calibration procedures. Each transducer is individually calibrated against primary or secondary standards, and a single calibration certificate guaranteeing its metrological traceability is issued one-to-one. This process allows each specific transducer to be linked to the relevant SI units through a proper metrological traceability chain. Nevertheless, while this process is (and must remain) unavoidable to uniquely provide traceability to measuring instruments, it cannot be applied to new-generation sensors (e.g. those based on micro-electro-mechanical systems (MEMS) technology), owing to their large-scale production rate (a single manufacturer can produce millions of sensors per week [2]). This poses new challenges for metrology research. Nowadays, low-cost and low-power new-generation sensors are widely employed in many everyday technologies, such as smart systems and sensor networks for health, safety, automotive, buildings, energy, industry, environmental control, structural monitoring, and many more, in the framework of the evolving digitalization [3,4].
To ensure the safety and functionality of operations managed by these systems, it is crucial that the data provided by the supporting sensors be both reliable and sufficiently accurate. Consequently, there is a need to establish comprehensive calibration methods on a large scale.
As suggested in the literature, this could be achieved by either in-line calibration systems or statistical methods, but neither has yet been fully implemented in practice [5][6][7].
Ideally, in-line calibration systems would offer the advantage of calibrating all MEMS sensors during the production process, using traceable methods defined by the manufacturer. However, they come with certain drawbacks, including the difficulty of ensuring standard calibration procedures and impartiality.
Conversely, statistical methods offer the advantage of drastically reducing the number of experimental calibrations, which allows them to be carried out by an accredited calibration laboratory, ensuring adherence to standard calibration procedures. Nevertheless, they also have drawbacks: not all MEMS sensors are experimentally calibrated, which may lead to higher uncertainties and lower reliability. The possibility of exploiting these methods is also indicated in the 2021-2031 strategy document of the Consultative Committee for Acoustics, Ultrasound and Vibration of the Bureau International des Poids et Mesures [8]. It states that «to maintain an acceptable reliability factor» of these sensors, «industry has moved from testing and calibrating every device towards statistical sampling to reduce manufacturing costs». Such approaches (aimed at providing «statistically acceptable levels of performance and reliability» of sensors from manufacturers [8]) involve a probabilistic determination of key metrological attributes, which are traditionally defined by experimental quantitative values resulting from a standard calibration. Therefore, the feasibility and suitability of these new approaches need to be carefully investigated from a metrological perspective. In particular, this paper elaborates one such approach, discussing its potential but also its limitations, from both a computational and a practical point of view.
A Bayesian approach [9] is explored here. The method first involves the experimental calibration of a small subset of sensors from a larger batch and then, by means of a statistical calibration process, the estimation of the total number of reliable sensors in the entire batch and the assignment of an appropriate uncertainty to all batch sensors. The significant benefit of this statistical method is its ability to incorporate prior knowledge gained from the previous calibration of a 'benchmark' batch, which is experimentally performed 'once and for all'. The benchmark batch is chosen to be representative of the entire production process, whose main characteristics are assumed to remain stable for as long as the statistical calibration of the produced batches is in place. In case of evidence or suspicion of significant changes in the production process, a new benchmark batch must be calibrated.
After performing the experimental calibration of the benchmark batch, for all subsequent batches of that production process, the statistical method requires the experimental calibration of only a few sensors within each batch.In this way, the number of in-the-lab calibrations is reduced, enabling the statistical calibration of large batches of sensors at affordable expenses in terms of time and costs.
This process involves merging quantitative and probabilistic information, such as the calibration uncertainties, the precision of the provided indications, and the repeatability and reproducibility of results, as well as predefined admissible tolerance limits, based on suitable hypotheses about prior information and likelihood functions, in order to provide the metrological attributes in terms of probability distributions.
As a case study, the proposed statistical procedure is applied to 100 nominally identical digital MEMS accelerometers and it is validated through a bootstrap technique.

Ontology and taxonomy
Before delving into the methodology, it is important to clarify some of the terminology used, its relationship with fundamental and general metrological concepts, and some necessary lexical extensions. Although, in general, the terms and definitions used in this work are based on the VIM [1] (indicated in the following within single quotation marks), some exceptions must be pointed out to avoid misunderstandings. Indeed, terms and definitions currently used in engineering and electronic applications (indicated within double quotation marks) often do not follow the VIM, and it is therefore necessary to address those definitions correctly.
Commonly, the term 'sensor' is attributed to any device that produces output signals by sensing a physical phenomenon, as a reaction to changes in that physical quantity. However, an acceleration sensor, for example, is neither strictly an accelerometer (i.e. the 'measuring instrument') nor the element directly affected by the phenomenon carrying the quantity to be measured (i.e. the 'sensor'). It is properly a 'measuring chain', i.e. the set of one or more measuring instruments and other devices constituting a single path of the signal from the sensor to the output element [1]. However, following this definition, it is preferable to regard the common term 'sensor' more properly as 'sensing system', with a weaker semantic degree than 'measuring chain'. Indeed, an acceleration sensor is a component of an electronic circuit, namely a chip embedding a vibration sensor with a transducer (generally based on MEMS technology), an amplifier, and an analogue-to-digital converter. For the sake of simplicity only, we hereinafter use the term 'sensor', albeit intended as 'sensing system'.
In applied metrology, the 'sensitivity' can be attributed to a measuring chain, as a calibration result, as the quotient of the change in an indication of the measuring system and the corresponding change in the value of the reference standard quantity being measured, which is the stimulus/input to the measuring system. According to the IEEE 2700-2017 standard [10], the 'sensitivity' of a 'sensor' is appropriately indicated in terms of 'scale factor', as reported in the application datasheets provided by the manufacturers. However, the 'scale factor' is determined by 'adjustment' procedures rather than by 'calibration'; the 'adjustment' procedures, applied at the manufacturer level, are not traceable to a reference standard, and the whole uncertainty budget is generally unknown or disregarded. Hence, in this work, when proper calibration methods against a reference measurement standard are implemented, we refer to the 'sensitivity' of a sensor, as properly defined.
Given the impossibility, with currently available technologies, of providing a sensitivity value for each individual sensor of very large batches, an approach based on Bayesian statistics is proposed here for large-scale calibration, able to provide a sufficiently representative sensitivity value for a large set of (supposedly) identical sensors. In this context, it is therefore necessary to extend the concept of sensitivity, traditionally associated with each individual sensor on the basis of its experimental calibration. Through the proposed statistical approach, sensitivity is assessed by means of a statistical calibration performed by combining the sensitivities of a benchmark batch, representative of the entire production, with the experimental calibration results of a small portion of the sensors of an unknown batch. Finally, an expression is proposed for the uncertainty to be associated with the sensors of the unknown batch, deduced from the same probabilistic assumptions on which the statistical model is based.

Use of a benchmark batch
The first step of the proposed method consists of the usual experimental calibration of a benchmark batch of $N$ sensors: each sensor is provided with a measurement of its sensitivity $S_i$ ($i = 1, \ldots, N$) and the associated expanded uncertainty $U(S_i)$ at a confidence level of 95 %. We call it 'benchmark' not in the sense of a batch of the best products, but in the sense of a batch faithfully representing the typical results of a production process, including possible defects.
To model the distribution of the benchmark batch sensitivities including their calibration uncertainties, a mixture distribution [11] can be applied. This distribution is derived from a set of $N$ variables, each following a normal distribution that models the knowledge about the sensitivity of each sensor (i.e. with mean and standard deviation equal to the measured sensitivity value and the associated standard uncertainty, respectively). The mixture distribution is numerically obtained by generating data from the multivariate normal distribution of those $N$ variables (allowing for possible correlation among them). The parameters of the multivariate normal distribution are a vector of mean values equal to the $N$ sensitivities of the benchmark batch and an $N \times N$ covariance matrix with the squared standard uncertainties on its diagonal. A Monte Carlo simulation [12,13] is employed that randomly generates $10^5$ vectors of $N$ values each from the multivariate normal distribution. The simulated values are combined into a $10^5 \times N$ matrix, whose columns represent the (marginal) probability density functions of the individual sensors. By combining these density functions with equal weights, i.e. by merging all the $10^5 \times N$ simulated sensitivity values, the mixture distribution representing the potential sensitivity values of the entire benchmark batch is obtained. It is intended as a representation of the distribution of the sensitivity values typical of the production process from which the benchmark batch is taken.
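As a minimal sketch of the Monte Carlo construction described above, the Python function below pools draws from one normal distribution per benchmark sensor. It assumes uncorrelated calibrations (a diagonal covariance matrix, so each sensor can be sampled independently) and a coverage factor of 2 to convert the 95 % expanded uncertainties $U(S_i)$ into standard uncertainties; the function name `mixture_samples` and the three-sensor example are hypothetical, not data from this work.

```python
import random
import statistics


def mixture_samples(sens, u_expanded, draws=100_000, coverage_factor=2.0, seed=1):
    """Pool Monte Carlo draws of the benchmark sensitivities into one sample.

    Each outer iteration produces one row of the (draws x N) matrix, i.e. one
    vector from the multivariate normal with means `sens` and a DIAGONAL
    covariance (assumption: no correlation between calibrations). Merging all
    rows with equal weight yields samples from the mixture distribution.
    """
    rng = random.Random(seed)
    # expanded uncertainty -> standard uncertainty (assumed coverage factor)
    u_std = [U / coverage_factor for U in u_expanded]
    pooled = []
    for _ in range(draws):
        for mean, sd in zip(sens, u_std):
            pooled.append(rng.gauss(mean, sd))
    return pooled


if __name__ == "__main__":
    # Hypothetical benchmark batch of three sensors (sensitivity units).
    sens = [835.0, 840.0, 845.0]
    U = [8.4, 8.4, 8.4]
    mix = mixture_samples(sens, U, draws=10_000)
    print(len(mix), statistics.fmean(mix), statistics.pvariance(mix))
```

With the default `draws=100_000` the function reproduces the $10^5 \times N$ pooled values described above; for $N = 50$ this means 5 × 10⁶ draws, so a vectorized array library would be preferable in practice. Modelling correlated calibrations would require sampling from the full covariance matrix instead.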
The characterization of the benchmark batch in terms of a distribution allows an interval of 'good' sensitivity values to be defined according to a desired coverage probability. To this end, by setting a tolerable probability $p$ of finding out-of-tolerance sensor sensitivities in the benchmark batch, the mixture distribution is used to find the lower and upper sensitivity limits, $S_{\mathrm{low}}(p)$ and $S_{\mathrm{up}}(p)$, as the limits of an interval encompassing the $(1 - p)$ fraction of acceptable sensors. Sensors whose sensitivity falls outside the $[S_{\mathrm{low}}(p), S_{\mathrm{up}}(p)]$ interval are considered out-of-tolerance. In this work, the coverage interval is determined as a probabilistically symmetric interval, i.e. leaving out a fraction $p/2$ of the values on each side. The number of out-of-tolerance sensors in the benchmark batch is then $C_{\mathrm{bench}} = pN$.
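The symmetric coverage interval can be sketched as empirical quantiles of a pooled mixture sample at $p/2$ and $1 - p/2$. The linear-interpolation quantile rule and the names `quantile` and `tolerance_limits` are illustrative assumptions; other quantile definitions give slightly different limits for small samples. Here `pooled` stands for any sample drawn from the benchmark mixture distribution.

```python
def quantile(sorted_xs, q):
    """Empirical quantile of a sorted list, interpolating between order statistics."""
    pos = q * (len(sorted_xs) - 1)
    i = int(pos)
    frac = pos - i
    if i + 1 < len(sorted_xs):
        return sorted_xs[i] + frac * (sorted_xs[i + 1] - sorted_xs[i])
    return sorted_xs[i]


def tolerance_limits(pooled, p, n_bench):
    """Symmetric coverage interval leaving a fraction p/2 of the mixture on each side.

    Returns (S_low(p), S_up(p), C_bench), with C_bench = p * N the expected
    number of out-of-tolerance sensors in a benchmark batch of n_bench sensors.
    """
    xs = sorted(pooled)
    s_low = quantile(xs, p / 2)
    s_up = quantile(xs, 1 - p / 2)
    c_bench = p * n_bench
    return s_low, s_up, c_bench
```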
For future unknown batches of the same kind of sensors, only a subset of $n < N$ devices needs to be experimentally calibrated, hence reducing calibration time and costs. For these batches too, the requirement for acceptable sensors is related to the $[S_{\mathrm{low}}(p), S_{\mathrm{up}}(p)]$ interval. The sensitivities $S_j$ ($j = 1, \ldots, n$) of the $n$ calibrated devices, together with their associated expanded uncertainties $U(S_j)$, are checked to determine whether they lie within or outside the tolerance limits, i.e. whether the condition $S_j - U(S_j) > S_{\mathrm{low}}(p) \wedge S_j + U(S_j) < S_{\mathrm{up}}(p)$ is satisfied. The number of out-of-tolerance sensors among the $n$ calibrated sensors is denoted by $k$.
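The acceptance check on the $n$ calibrated sensors translates directly into code. The hypothetical helpers below count $k$, treating a sensor as out-of-tolerance unless its whole interval $[S_j - U(S_j), S_j + U(S_j)]$ lies strictly inside $[S_{\mathrm{low}}(p), S_{\mathrm{up}}(p)]$.

```python
def is_within_tolerance(s, u_expanded, s_low, s_up):
    """Tolerance condition: S_j - U(S_j) > S_low(p) and S_j + U(S_j) < S_up(p)."""
    return (s - u_expanded > s_low) and (s + u_expanded < s_up)


def count_out_of_tolerance(sens, u_expanded, s_low, s_up):
    """Number k of calibrated sensors that fail the tolerance condition."""
    return sum(
        not is_within_tolerance(s, U, s_low, s_up)
        for s, U in zip(sens, u_expanded)
    )
```

For example, with limits [830, 850] and $U = 4$, sensitivities of 840, 846 and 833 give $k = 2$, since the last two intervals reach or cross a limit.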

Bayesian statistical calibration
The statistical calibration of the unknown batch of $N$ sensors (actually, of the $N - n$ still uncalibrated sensors in the batch) is based on the following Bayesian model [14]. Assuming that the expected number of out-of-tolerance sensors in the unknown batch is equal to $C_{\mathrm{bench}}$, a binomial prior distribution $f_{\mathrm{prior}}(C; N, p)$, with mean value $C_{\mathrm{bench}} = pN$, is used to model the actual number $C$ of out-of-tolerance sensors in the unknown batch:

$$f_{\mathrm{prior}}(C; N, p) = \binom{N}{C} p^{C} (1-p)^{N-C}, \qquad C = 0, 1, \ldots, N. \tag{1}$$

This prior probability mass function (assuming that the experimental calibration and the manufacturing process do not change from batch to batch) models the state of knowledge on the number of out-of-tolerance items in a typical batch of $N$ items from that production. The likelihood function $f_{\mathrm{like}}(k; n, C, N)$ is a hypergeometric distribution of the $k$ out-of-tolerance sensors in the sample of size $n$, where $N$ is the size of the unknown batch and $C$ is the number of defective sensors in the batch:

$$f_{\mathrm{like}}(k; n, C, N) = \frac{\binom{C}{k}\binom{N-C}{n-k}}{\binom{N}{n}}. \tag{2}$$

The hypergeometric distribution is a discrete probability distribution that describes the probability of $k$ successes (in this case, defective sensors) in $n$ draws without replacement. Multiplying the prior and the likelihood yields an unnormalized posterior $f_{\mathrm{post,un}}(C; k, n, N, p)$, a function of the number $C$ of out-of-tolerance sensors in the unknown batch. Normalizing this distribution, i.e. dividing it by the sum of $f_{\mathrm{post,un}}(C; k, n, N, p)$ for $C$ ranging from 0 to $N$, leads to the probability mass function

$$f_{\mathrm{post,norm}}(C; k, n, N, p) = \frac{f_{\mathrm{prior}}(C; N, p)\, f_{\mathrm{like}}(k; n, C, N)}{\sum_{C'=0}^{N} f_{\mathrm{prior}}(C'; N, p)\, f_{\mathrm{like}}(k; n, C', N)}, \tag{3}$$

which is the posterior distribution of the number $C$ of out-of-tolerance sensors in the unknown batch, given that $k$ out-of-tolerance devices are found in the small calibrated sub-batch. Detailed analytical calculations are reported in the appendix. As an example, the probability mass function $f_{\mathrm{post,norm}}(C; 0, 3, 100, 0.05)$ (i.e. at $k = 0$, $n = 3$, $N = 100$ and $p = 0.05$) is depicted as a function of $C$ in figure 1. It represents the probability that $C$ sensors in the unknown batch of $N = 100$ are out-of-tolerance, given that $k = 0$ of the $n = 3$ calibrated sensors are defective, when the probability of out-of-tolerance sensors in the benchmark batch is $p = 0.05$.
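Equations (1) to (3) can be evaluated exactly with integer binomial coefficients. The sketch below (Python standard library only; the function names are hypothetical) computes the normalized posterior for the case of figure 1. As a consistency check, for $k = 0$ the posterior coincides with a binomial distribution $\mathrm{Bin}(N - n, p)$, because $\binom{N}{C}\binom{N-C}{n} = \binom{N}{n}\binom{N-n}{C}$.

```python
from math import comb


def prior_pmf(c, N, p):
    """Binomial prior, equation (1)."""
    return comb(N, c) * p**c * (1 - p) ** (N - c)


def likelihood(k, n, c, N):
    """Hypergeometric likelihood, equation (2); math.comb returns 0 when k > c."""
    return comb(c, k) * comb(N - c, n - k) / comb(N, n)


def posterior_pmf(k, n, N, p):
    """Normalized posterior over C = 0..N, equation (3)."""
    unnorm = [prior_pmf(c, N, p) * likelihood(k, n, c, N) for c in range(N + 1)]
    total = sum(unnorm)
    return [w / total for w in unnorm]


if __name__ == "__main__":
    post = posterior_pmf(k=0, n=3, N=100, p=0.05)  # the case of figure 1
    # most probable number C of out-of-tolerance sensors in the unknown batch
    print(max(range(len(post)), key=post.__getitem__))
```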

Proposed criterion for the unknown batch performance assessment and uncertainty evaluation
As required by [8], in order to provide the batches with «statistically acceptable levels of performance and reliability», the posterior cumulative distribution function can be used to define appropriate metrics for batch reliability. In this work, the focus is only on cases where $k = 0$ defective sensors are found among the $n$ calibrated in the sub-batch. In the proposed approach, when $k > 0$ the unknown batch is discarded and no reliability assessment is needed. Corresponding metrics could in principle also be defined for $k > 0$, but attention here is on the more stringent requirement of no out-of-tolerance calibrated sensors.
For this purpose, it is proposed here to define the reliability of the unknown batch as the probability $P_{\mathrm{reliab,unk}}$ that the number of out-of-tolerance sensors in the whole batch is not larger than the number of out-of-tolerance sensors $C_{\mathrm{bench}}$ in the benchmark batch. This probability is given by the posterior cumulative distribution function:

$$P_{\mathrm{reliab,unk}} = P_{\mathrm{post,norm}}(C \le C_{\mathrm{bench}}; 0, n, N, p) = \sum_{C=0}^{C_{\mathrm{bench}}} f_{\mathrm{post,norm}}(C; 0, n, N, p). \tag{4}$$

From equation (4), in the limit case $n = N$, i.e. when all sensors in the unknown batch are calibrated and none is out-of-tolerance, $P_{\mathrm{reliab,unk}} = P_{\mathrm{post,norm}}(C \le C_{\mathrm{bench}}; 0, N, N, p) = 1$, which is the current requirement in the typical traceability chain when each and every device is experimentally calibrated.
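A sketch of equation (4) under the same model: the posterior of equation (3) is built from the binomial prior and the hypergeometric likelihood (the constant $\binom{N}{n}$ cancels in the normalization and is omitted), and its cumulative value at $C_{\mathrm{bench}} = pN$ gives $P_{\mathrm{reliab,unk}}$. The function names are hypothetical, and the small tolerance on the comparison is an implementation choice that guards against values of $pN$ falling just below an integer because of floating-point rounding.

```python
from math import comb


def posterior_pmf(k, n, N, p):
    """Equation (3): binomial prior x hypergeometric likelihood, normalized over C = 0..N.

    The factor 1 / comb(N, n) of the likelihood does not depend on C and cancels.
    """
    unnorm = [
        comb(N, c) * p**c * (1 - p) ** (N - c) * comb(c, k) * comb(N - c, n - k)
        for c in range(N + 1)
    ]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def reliability(n, N, p):
    """Equation (4): P(C <= C_bench | k = 0), with C_bench = p * N."""
    c_bench = p * N
    post = posterior_pmf(0, n, N, p)
    # tolerance guards against p * N landing just below an integer
    return sum(w for c, w in enumerate(post) if c <= c_bench + 1e-9)
```

For instance, `reliability(3, 100, 0.05)` gives the reliability for the case of figure 1, while `reliability(100, 100, p)` returns 1, the limit case discussed above.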
This metric, defined for each accepted unknown batch (i.e. when $k = 0$), provides the fraction $P_{\mathrm{reliab,unk}}$ of future unknown batches that will have no more than $C_{\mathrm{bench}}$ out-of-tolerance sensors. In other terms, it is the probability that the sensitivity of at least $(N - n - C_{\mathrm{bench}})$ uncalibrated sensors of the unknown batch lies between $S_{\mathrm{low}}(p)$ and $S_{\mathrm{up}}(p)$. This is additional information that characterizes the reliability of the batches that will be released. Based on this metric, a sensor producer can state that $100\,P_{\mathrm{reliab,unk}}$ % of the released batches will have a sufficiently large number of acceptable items, that is, at least $N(1 - p)$.
The $n$ calibrated sensors obtain their sensitivity values and associated uncertainties from the usual in-the-lab calibration process. Although the remaining $(N - n)$ sensors are not experimentally calibrated, it is nonetheless necessary to provide them with an estimate of their sensitivity, $S_{\mathrm{unk}}(p)$, and an associated uncertainty, $u(S_{\mathrm{unk}}(p))$. This can be done based on the information derived from the mixture distribution of the benchmark batch (always under the hypothesis that the unknown batches are of the same kind as the benchmark one). In this sense, it is proposed to assign to the $(N - n)$ uncalibrated sensors of the unknown batch a statistically averaged sensitivity, equal to the mean value of the benchmark batch which, when the mixture distribution is nearly symmetric, coincides with the midpoint of the tolerance interval:

$$S_{\mathrm{unk}}(p) = \frac{S_{\mathrm{low}}(p) + S_{\mathrm{up}}(p)}{2}. \tag{5}$$

The associated squared standard uncertainty $u^2(S_{\mathrm{unk}}(p))$ is evaluated as the weighted mean of the variance associated with a sensitivity lying within $[S_{\mathrm{low}}(p), S_{\mathrm{up}}(p)]$ and that associated with a sensitivity outside that interval, the weights being the fractions of acceptable and out-of-tolerance sensors among the $N - n$ uncalibrated sensors of the batch, respectively. This weighted mean is then evaluated for each possible value of $C$, multiplied by the posterior probability of finding exactly $C$ out-of-tolerance sensors in the unknown batch, and summed over all possible values of $C$. For an accepted batch ($k = 0$), the final expression for the uncertainty is

$$u^2(S_{\mathrm{unk}}(p)) = \sum_{C=0}^{N} f_{\mathrm{post,norm}}(C; 0, n, N, p) \left[ \left(1 - \frac{C}{N-n}\right) \frac{\left(S_{\mathrm{up}}(p) - S_{\mathrm{low}}(p)\right)^2}{12} + \frac{C}{N-n}\, \frac{\left(S_{\max} - S_{\min}\right)^2}{8} \right]. \tag{6}$$

When the unknown sensor sensitivity lies between the limits, the variance of a rectangular distribution on that interval is used. Conversely, when the sensitivity is outside the interval, the variance of a U-shaped distribution defined on the larger interval $[S_{\min}, S_{\max}]$ is taken, where $S_{\min}$ and $S_{\max}$ are the minimum and maximum values of the mixture distribution of the benchmark batch.
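Substituting (3) into (6) with $k = 0$, and after some algebra (for $k = 0$ the posterior mean of $C$ is $(N - n)p$, so the expected out-of-tolerance fraction among the uncalibrated sensors is $p$), the standard uncertainty $u(S_{\mathrm{unk}}(p))$ becomes

$$u(S_{\mathrm{unk}}(p)) = \sqrt{(1-p)\,\frac{\left(S_{\mathrm{up}}(p) - S_{\mathrm{low}}(p)\right)^2}{12} + p\,\frac{\left(S_{\max} - S_{\min}\right)^2}{8}}. \tag{7}$$

Equation (7) is independent of $n$ and $N$, depending only on $p$ and on the characteristics of the benchmark batch mixture distribution, which determines the lower and upper bounds and the minimum and maximum values. This uncertainty evaluation may be a precautionary and conservative practice; it does, however, address the need to provide an uncertainty for the sensitivity values of the statistically calibrated sensors. Examples of uncertainty calculations with actual experimental data are reported in section 3.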
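The uncertainty assignment can be checked numerically. The sketch below implements equation (5) as the midpoint of the tolerance interval, equation (6) for $k = 0$ as the posterior-weighted sum over $C$ (out-of-tolerance fractions taken over the $N - n$ uncalibrated sensors, rectangular variance $(S_{\mathrm{up}} - S_{\mathrm{low}})^2/12$, U-shaped variance $(S_{\max} - S_{\min})^2/8$), and equation (7) in the closed form $u^2 = (1-p)(S_{\mathrm{up}} - S_{\mathrm{low}})^2/12 + p(S_{\max} - S_{\min})^2/8$. These forms follow the description above and should be treated as assumptions; the agreement of the two functions for any $n$ and $N$ illustrates why equation (7) does not depend on them. Function names are hypothetical, and all numerical values apart from $S_{\min} = 819.6$ are placeholders.

```python
from math import comb, sqrt


def s_unk(s_low, s_up):
    """Equation (5): midpoint of the tolerance interval (near-symmetric mixture)."""
    return 0.5 * (s_low + s_up)


def u_unk_sum(s_low, s_up, s_min, s_max, n, N, p):
    """Equation (6) with k = 0, summed explicitly over the posterior of C."""
    var_in = (s_up - s_low) ** 2 / 12   # rectangular distribution on [S_low, S_up]
    var_out = (s_max - s_min) ** 2 / 8  # U-shaped (arcsine) distribution on [S_min, S_max]
    # unnormalized posterior for k = 0: binomial prior x comb(N - C, n)
    unnorm = [comb(N, c) * p**c * (1 - p) ** (N - c) * comb(N - c, n) for c in range(N + 1)]
    total = sum(unnorm)
    m = N - n  # number of uncalibrated sensors
    u2 = sum(
        (w / total) * ((1 - c / m) * var_in + (c / m) * var_out)
        for c, w in enumerate(unnorm)
    )
    return sqrt(u2)


def u_unk_closed(s_low, s_up, s_min, s_max, p):
    """Equation (7): closed form, independent of n and N."""
    return sqrt((1 - p) * (s_up - s_low) ** 2 / 12 + p * (s_max - s_min) ** 2 / 8)


if __name__ == "__main__":
    # Hypothetical limits; only S_min = 819.6 is taken from the text.
    s_low, s_up, s_min, s_max, p = 835.0, 846.0, 819.6, 862.0, 0.08
    print(s_unk(s_low, s_up))
    print(u_unk_sum(s_low, s_up, s_min, s_max, 5, 50, p))
    print(u_unk_closed(s_low, s_up, s_min, s_max, p))
```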

Discussion
To study the behaviour of the proposed metric, $P_{\mathrm{reliab,unk}}$ values are plotted in figures 2-4 as a function of the ratio $n/N$, for different $N$ and different probabilities $p$ of finding out-of-tolerance sensors in the benchmark batch. As expected, $P_{\mathrm{reliab,unk}}$ increases with increasing $n/N$. When the ratio $n/N$ tends to zero, $P_{\mathrm{reliab,unk}}$ tends to about 50 %, i.e. pure randomness, since there is no evidence that the batch quality satisfies the requirement of a limited number of out-of-tolerance sensitivities, i.e. $C \le C_{\mathrm{bench}}$. Indeed, even if no out-of-tolerance sensor is found in the sample ($k = 0$), such a small sample is not representative of the whole batch and cannot convey further information with respect to the prior probability (a binomial distribution with median equal to $C_{\mathrm{bench}}$, so that $P_{\mathrm{prior}}(C \le C_{\mathrm{bench}}) \approx 0.5$). Note that when $k = n = 0$, equations (1) and (3) coincide, i.e. the posterior mass function reduces to its prior.
Moreover, it is important to note that, for constant $n/N$ ratios, $P_{\mathrm{reliab,unk}}$ tends to rise as $N$ increases, particularly for higher values of $p$. However, there are instances where it increases as $N$ decreases, particularly for lower values of $p$. The general increase of $P_{\mathrm{reliab,unk}}$ with larger $N$, more frequent when $p$ is high, suggests that, to statistically calibrate a whole population of items with higher reliability, it would be preferable to split the population into a few large batches rather than into many small ones, as the former yield greater reliability for the same effort (i.e. the same overall number of experimental calibrations). This, however, would imply the preliminary experimental calibration of a benchmark batch of larger size $N$, which may still demand substantial experimental effort. In addition, the feasibility of calibrating a large benchmark batch depends heavily on several factors, such as the sensor type, measurement technique, calibration conditions, and calibration system. To further alleviate the workload, one might also consider skipping the measurement of the benchmark batch and simply relying on the producer's information. However, this would require complete trust in the manufacturer's ability to furnish the distribution of calibration values for the benchmark batch sensors. Although this could be a feasible option, it raises concerns regarding the required impartiality within the metrology chain, which may not be guaranteed.
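For $k = 0$ the posterior has a simple closed form: since $\binom{N}{C}\binom{N-C}{n} = \binom{N}{n}\binom{N-n}{C}$, the product of equations (1) and (2) is proportional to $\binom{N-n}{C} p^C (1-p)^{N-n-C}$, i.e. a binomial distribution $\mathrm{Bin}(N-n, p)$. Equation (4) is then a binomial cumulative probability, which makes the trends discussed above easy to explore. The sketch below (hypothetical function name, illustrative grid of $N$, $p$ and $n/N$) prints $P_{\mathrm{reliab,unk}}$ at fixed $n/N$ ratios; it is not intended to reproduce figures 2-4 exactly.

```python
from math import comb


def reliability(n, N, p):
    """Equation (4) for k = 0, using the closed-form posterior Bin(N - n, p)."""
    m = N - n  # uncalibrated sensors
    c_bench = p * N
    return sum(
        comb(m, c) * p**c * (1 - p) ** (m - c)
        for c in range(m + 1)
        if c <= c_bench + 1e-9  # tolerance for floating-point p * N
    )


if __name__ == "__main__":
    ratios = (0.0, 0.05, 0.1, 0.2)
    print("n/N:", ratios)
    for p in (0.02, 0.08, 0.16):
        for N in (50, 100, 200):
            row = [reliability(round(r * N), N, p) for r in ratios]
            print(f"p={p:.2f} N={N:3d}", " ".join(f"{v:.3f}" for v in row))
```

At $n = 0$ the function returns the prior probability $P(C \le pN)$, which is slightly above 0.5 (about 0.62 for $N = 100$, $p = 0.05$), and at $n = N$ it returns 1.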
To reduce the number of required calibrations, especially for the benchmark batch, it is important to strike a balance by selecting batch sizes that are not excessively large. A good trade-off between high reliability and experimental workload is to choose low values of $p$, so that the batch size $N$ can be decreased while still calibrating only a very small sub-sample of the unknown batch. In fact, at decreasing $p$, it can be shown that lower $N$ values entail higher reliability at decreasing $n/N$ values (figure 4). This is a favourable solution, since it guarantees at the same time a lower calibration workload and acceptable levels of reliability. The downside of a small $p$, however, is that a wide acceptance interval $[S_{\mathrm{low}}(p), S_{\mathrm{up}}(p)]$ is allowed, with the consequence that the end-user of those sensors should be aware of (and accept) a large variability among the accepted sensors.

Application and validation of the method
In order to assess the effectiveness of this Bayesian statistical approach and to validate it, 100 nominally identical digital MEMS accelerometers (STM, model LSM6DSR) connected to an external microcontroller (STM, model STEVAL MKIGIBV2) (figure 5) are calibrated at INRiM, one at a time according to prescribed procedures [15][16][17][18][19].
Calibration is performed at a single frequency of 10 Hz. A vibrating table (specifically, the PCB Precision Air Bearing Calibration Shaker) generates a reference acceleration with nearly constant amplitude (10 m s$^{-2}$) along the vertical axis for a duration of 10 s. This reference acceleration is detected by a single-axis reference transducer (PCB model 080A199/482A23), integrated within the shaker's stroke. Its output is collected by an acquisition board (NI 4431) in the PC, and LabVIEW ® software processes these data, sampled at 50 kHz, to provide the reference value in m s$^{-2}$.
The digital MEMS accelerometer is fixed to the shaker along the vertical axis using ultra-thin double-sided adhesive tape of the type typically employed in practical applications. The external microcontroller records the digital output of the MEMS sensor at a maximum sampling rate of 6.66 kHz and saves the data as binary files. The MEMS outputs are given in decimal 16-bit-signed units (hereinafter abbreviated as D$_{16\text{-bit-signed}}$), where the digit unit is a signed 16-bit sequence converted into a decimal number. These files are subsequently processed with MATLAB ® software. First, a first-order Butterworth band-pass filter with a centre frequency matching the frequency of interest and a fractional bandwidth of 10 % is applied to the digital time signals. Second, the root mean square of the filtered signals is computed; together, these two steps eliminate gravitational offsets and the influence of background vibrations. Results are depicted in figures 6 and 7. Values range between 832.9 D$_{16\text{-bit-signed}}$/(m s$^{-2}$) and 850.3 D$_{16\text{-bit-signed}}$/(m s$^{-2}$). Calibration standard uncertainties are of the order of 4.2 D$_{16\text{-bit-signed}}$/(m s$^{-2}$), which corresponds to about 0.50 % in relative terms.
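The two-step processing (band-pass filtering, then RMS) and the resulting sensitivity can be sketched in Python on synthetic signals. Several elements are assumptions rather than the authors' MATLAB code: the filter is a bilinear-transform band-pass biquad with $Q = 1/0.10$ and unit gain at 10 Hz (close to, but not identical to, a first-order Butterworth band-pass designed with MATLAB's `butter`); the sensitivity is taken as the ratio of the RMS of the filtered MEMS output to the RMS of the filtered reference acceleration; the first 3 s are discarded to skip the filter transient; and the simulated signals (true sensitivity 840, noise level, 37 Hz disturbance) are hypothetical.

```python
import math
import random


def bandpass(x, fs, f0, frac_bw=0.10):
    """Band-pass biquad centred on f0 with Q = 1 / frac_bw and unit gain at f0.

    Assumption: standard bilinear-transform section (first-order low-pass
    prototype mapped to band-pass); MATLAB's butter() sets the band edges
    slightly differently.
    """
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) * frac_bw / 2.0  # sin(w0) / (2 Q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0      # b1 = 0: zero gain at DC
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def sensitivity(mems, fs_mems, ref, fs_ref, f0=10.0, skip_s=3.0):
    """Ratio of the RMS of band-passed MEMS digits to band-passed reference acceleration."""
    ym = bandpass(mems, fs_mems, f0)[int(skip_s * fs_mems):]
    yr = bandpass(ref, fs_ref, f0)[int(skip_s * fs_ref):]
    return rms(ym) / rms(yr)


def simulate(s_true=840.0, amp=10.0, f0=10.0, dur=10.0,
             fs_ref=50_000, fs_mems=6_660, seed=0):
    """Hypothetical signals: sinusoidal reference and quantized MEMS output."""
    rng = random.Random(seed)
    ref = [amp * math.sin(2 * math.pi * f0 * i / fs_ref)
           for i in range(int(dur * fs_ref))]
    mems = []
    for i in range(int(dur * fs_mems)):
        t = i / fs_mems
        # 37 Hz out-of-band disturbance seen only by the MEMS (hypothetical)
        acc = amp * math.sin(2 * math.pi * f0 * t) + 0.5 * math.sin(2 * math.pi * 37.0 * t)
        # gravity offset (9.81 m/s^2), white noise, rounding to integer digits
        mems.append(round(s_true * (acc + 9.81) + rng.gauss(0.0, 20.0)))
    return mems, ref


if __name__ == "__main__":
    mems, ref = simulate()
    print(sensitivity(mems, 6_660, ref, 50_000))
```

Because the band-pass section has zero gain at DC, the 9.81 m s$^{-2}$ gravity offset in the MEMS output is removed before the RMS is computed, which is why the filter precedes the RMS step.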
Once all MEMS have been calibrated to obtain a ground truth, the Bayesian method is validated using a bootstrap technique [20,21]. All analyses are performed using R statistical software. The procedure involves generating multiple samples from the original dataset by randomly sampling observations with replacement. After generating the bootstrap samples, the statistical metric of interest, i.e. the reliability of the unknown batch $P_{\mathrm{reliab,unk}}$, is evaluated experimentally. By replicating the bootstrap process, an accurate estimate of the experimental variability of the proposed metric is obtained and compared with the expected theoretical result according to equation (4).
For each bootstrap cycle, the proposed Bayesian statistical method is implemented as follows. From the 100 experimentally calibrated MEMS sensors, two groups of 50 MEMS each are randomly drawn, the first simulating the benchmark batch and the second simulating the unknown batch. A size of 50 MEMS is chosen so as to have a sufficiently large number of values on which to perform the statistical control. For the benchmark batch, the distribution of sensitivities is modelled through a mixture distribution that also takes into account the calibration uncertainties. The mixture distribution is obtained using a Monte Carlo simulation, as described in section 2 and, in more detail, in [12]. Once the benchmark batch is characterized and the desired probability $p$ of out-of-tolerance sensors is set, the lower and upper sensitivity tolerance limits, $S_{\mathrm{low}}(p)$ and $S_{\mathrm{up}}(p)$, are determined. At this point, the overall number $C$ of sensors with out-of-tolerance sensitivities in the unknown batch, and the number $k$ of such sensors in a sub-sample of $n$ randomly drawn sensors, are counted. If there are $k = 0$ out-of-tolerance sensors in the sub-sample and $C \le C_{\mathrm{bench}} = pN$ out-of-tolerance sensors in the whole unknown batch, a success is counted. Iterating this process $r = 100$ times, the number of successes in $r$ attempts is counted.
The ratio between the number of successes and the number of cases with $k = 0$ represents an estimate, based on experimental results, of the reliability metric defined in equation (4), i.e. $P_{\mathrm{reliab,unk,exp}}$. A brief scheme of a single cycle, to be iterated $r$ times, is shown in figure 8.
Replicating this entire process $t = 1000$ times, the experimental histogram of the $t$ reliability values $P_{\mathrm{reliab,unk,exp}}$ is obtained and compared with the theoretical probability $P_{\mathrm{reliab,unk,theor}}$ provided by equation (4).
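A compact sketch of one bootstrap cycle and of the theoretical value from equation (4) follows. It is a simplified re-implementation, not the authors' R code: the benchmark/unknown split is redrawn in each of the $r$ iterations, the mixture uses only `draws` Monte Carlo samples per sensor for speed, a sensor of the unknown batch is counted as out-of-tolerance when its measured sensitivity lies outside $[S_{\mathrm{low}}(p), S_{\mathrm{up}}(p)]$ (an assumption, since the counting rule used in the validation is not detailed above), and the 100 sensitivities are synthetic. Under these assumptions the experimental estimate is not guaranteed to reproduce the agreement reported in table 1.

```python
import random
from math import comb


def quantile(xs, q):
    """Empirical quantile of a sorted list, with linear interpolation."""
    pos = q * (len(xs) - 1)
    i = int(pos)
    if i + 1 < len(xs):
        return xs[i] + (pos - i) * (xs[i + 1] - xs[i])
    return xs[i]


def reliability_theor(n, N, p):
    """Equation (4): for k = 0 the posterior is Binomial(N - n, p)."""
    m = N - n
    return sum(
        comb(m, c) * p**c * (1 - p) ** (m - c)
        for c in range(m + 1)
        if c <= p * N + 1e-9
    )


def reliability_exp(sens, u_exp, n, p, r, rng, draws=200, cov_k=2.0):
    """One bootstrap cycle: successes / (iterations with k = 0) over r iterations."""
    half = len(sens) // 2
    successes = zero_k = 0
    for _ in range(r):
        idx = list(range(len(sens)))
        rng.shuffle(idx)
        bench, unk = idx[:half], idx[half:]
        # Benchmark mixture: independent normals with standard uncertainty U / cov_k.
        mix = sorted(rng.gauss(sens[i], u_exp[i] / cov_k) for _ in range(draws) for i in bench)
        s_low, s_up = quantile(mix, p / 2), quantile(mix, 1 - p / 2)
        # Assumption: out-of-tolerance = measured sensitivity outside the limits.
        out = {i for i in unk if not s_low <= sens[i] <= s_up}
        k = sum(i in out for i in rng.sample(unk, n))
        if k == 0:
            zero_k += 1
            if len(out) <= p * len(unk) + 1e-9:  # C <= C_bench = pN
                successes += 1
    return successes / zero_k if zero_k else float("nan")


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for 100 calibrated sensitivities (not the measured data).
    sens = [rng.gauss(841.0, 4.0) for _ in range(100)]
    u_exp = [8.4] * len(sens)
    n, p, t = 5, 0.08, 5
    runs = [reliability_exp(sens, u_exp, n, p, r=50, rng=rng, draws=100) for _ in range(t)]
    print("experimental:", sum(runs) / t, "theoretical:", reliability_theor(n, 50, p))
```

Averaging `reliability_exp` over $t$ cycles gives the mean value of $P_{\mathrm{reliab,unk,exp}}$ to be compared with `reliability_theor`; the values $r = 100$, $t = 1000$ and $10^5$ draws of the original procedure are reduced in the demonstration to keep the run time short.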
The validation is performed under different boundary conditions, with $n$ ranging from 5 to 20 ($n$ = 5, 10, 20) and $p$ ranging from 0.02 to 0.16 ($p$ = 0.02, 0.08, 0.16), in order to simulate different scenarios by a factorial design. Boundary conditions and results are summarized in table 1. The $P_{\mathrm{reliab,unk,exp}}$ values are the mean values over the $t = 1000$ repetitions. In the last column, the relative differences between experimental and theoretical probabilities are computed. As an example, the histograms of the $P_{\mathrm{reliab,unk,exp}}$ values obtained for three boundary conditions are depicted in figures 9-11. They show that the expected theoretical values fall within the histogram bin with the highest occurrence. In addition, the relative differences reported in table 1 lie within ±7 %. This value can be used to compare the suitability of the method with other methods that may be developed in the future. Overall, the agreement with the expected values appears acceptable. This evidence provides a first validation of the proposed method.
The standard uncertainty $u(S_{\mathrm{unk}}(p))$ associated with the unknown MEMS according to equation (7) is also calculated for the previous boundary conditions, with the addition of two more conditions, $p = 0.32$ and $p = 0.64$, to increase the number of cases. As the benchmark batch mixture distribution, the one from the first bootstrap cycle is used (figure 12). Minimum and maximum values are $S_{\min}$ = 819.6 D$_{16\text{-bit-signed}}$/(m s$^{-2}$) and $S_{\max}$ = … (table 2).
The sensitivities of the unknown MEMS sensors, calculated according to equation (5), lie between 840.5 D$_{16\text{-bit-signed}}$/(m s$^{-2}$) and 841.0 D$_{16\text{-bit-signed}}$/(m s$^{-2}$), indicating that the benchmark batch mixture distribution is rather symmetric across the different values of $p$. Standard uncertainties lie between 6.9 D$_{16\text{-bit-signed}}$/(m s$^{-2}$) and 11.8 D$_{16\text{-bit-signed}}$/(m s$^{-2}$), which in relative terms correspond to about 0.8 % to 1.4 %, i.e. roughly 1.1 %. This is about twice the calibration standard uncertainty (about 0.5 %, reported above), and it is the price to pay for avoiding a huge number of experimental calibrations. However, for many practical applications where very low uncertainties are not required, this represents a good compromise that addresses the need for traceability with less effort than the traditional metrological chain. It is also worth noting that, under the considered boundary conditions, the minimum uncertainty is found for $p = 0.08$. However, this result does not hold universally: it depends on the characteristics of the mixture distribution of the benchmark batch, which determines the lower and upper bounds. Depending on the needs, therefore, an appropriate trade-off between batch size $N$, number of sensors to be calibrated $n$, reliability $P_{\mathrm{reliab,unk}}$, and uncertainty of the statistically calibrated MEMS $u(S_{\mathrm{unk}}(p))$ should be decided case by case.

Conclusions
In this paper, a Bayesian approach for the statistical calibration of large batches of MEMS sensors is proposed and described. The need for this work arises from the impossibility of calibrating each MEMS sensor individually, as the traceability procedures for traditional measuring instruments require.
It is important to underline that the proposed method is intended to extend traceability to this type of sensor without affecting the current metrological chain (one-to-one calibration) used for traditional measurement transducers. This aspect is crucial, since the statistical methodology is in no way intended to replace the current rigorous metrological chain, nor to be used in applications that require high-accuracy, low-uncertainty measurements. In fact, MEMS-based sensors are not intended to replace traditional instrumentation but to complement it in the detection of physical phenomena, typically on a large scale through extended sensor networks [22]. In practical applications, an extended sensor network that contains statistically calibrated MEMS sensors together with traditional higher-quality measurement instruments in a few nodes is certainly safer and more trustworthy (and traceable) than a network containing sensors of unknown or undefined sensitivity.
Given this premise, the proposed idea is to experimentally calibrate only a small subset of sensors from an unknown batch and verify that their sensitivities lie within previously defined limits, which depend on the calibration of a benchmark batch representative of the whole production process and on a desired probability (1 − p) of acceptability. This drastically reduces the calibration effort. Given the number N of sensors in the unknown batch, the number n of sampled sensors and the number k of out-of-tolerance sensors found in the subsample, it is possible to evaluate the probability of having C out-of-tolerance sensors in the unknown batch. Taking as the desirable condition that the number C of out-of-tolerance sensors in the unknown batch does not exceed the number C_bench of out-of-tolerance sensors in the benchmark batch, the batch reliability P_reliab,unk, i.e. the probability that this condition is met, can be defined as a suitable metric. It provides additional information on the reliability of the batches that will be released. Of course, calibrating fewer than N sensors implies a reliability below 100 %, whereas 100 % is what the current metrological chain requires. This may affect subsequent measurements made with these sensors: in practical terms, there is a non-zero probability that the sensitivity of some statistically calibrated sensors falls outside the specified limits. However, emerging machine learning and deep learning methods [23] might offer a way to mitigate this issue by assessing the consistency of the physical signals detected by sensors within the same network.
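The batch reliability defined above can be evaluated numerically from the binomial prior and hypergeometric likelihood described in the appendix. The sketch below is one possible implementation under stated assumptions, not the authors' code: it evaluates the posterior in log space (via `math.lgamma`) so that large batches do not overflow, and it takes C_bench as an input. The function names are hypothetical, and setting C_bench = N·p in the usage part is an assumption made for illustration only.

```python
from math import exp, lgamma, log


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b); -inf if b is outside 0..a."""
    if b < 0 or b > a:
        return float("-inf")
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def posterior(k, n, N, p):
    """Normalized posterior f_post,norm(C; k, n, N, p) for C = 0..N.

    Binomial prior on C times the hypergeometric likelihood of finding k
    out-of-tolerance sensors in a sample of n, evaluated in log space.
    """
    if not (0 <= k <= n <= N) or not (0.0 < p < 1.0):
        raise ValueError("require 0 <= k <= n <= N and 0 < p < 1")
    log_weights = []
    for C in range(N + 1):
        log_prior = log_comb(N, C) + C * log(p) + (N - C) * log(1 - p)
        log_like = log_comb(C, k) + log_comb(N - C, n - k) - log_comb(N, n)
        log_weights.append(log_prior + log_like)
    peak = max(log_weights)  # shift by the maximum before exponentiating
    weights = [exp(w - peak) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def batch_reliability(k, n, N, p, c_bench):
    """P_reliab,unk: posterior probability that C does not exceed c_bench."""
    return sum(posterior(k, n, N, p)[: c_bench + 1])


if __name__ == "__main__":
    # Hypothetical scenario: N = 50, p = 0.1, C_bench taken as N * p = 5,
    # no out-of-tolerance sensor found among the n calibrated ones (k = 0)
    for n in (5, 10, 20):
        print(f"n = {n:2d}: P_reliab,unk = {batch_reliability(0, n, 50, 0.1, 5):.3f}")
```

Sweeping n for fixed N and p, as in the usage part, reproduces the kind of P_reliab,unk versus n/N curves discussed for figures 2-4.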
Another challenge lies in the calibration of the benchmark batch, which can still demand a substantial experimental effort, depending on the desired reliability. In general, achieving the highest attainable reliability calls for larger batch sizes, which in turn increases the calibration effort for the benchmark batch and may result in a considerable volume of measurements. Nevertheless, this outcome depends strongly on factors such as the sensor type, measurement technique, calibration conditions and calibration system. To minimize the number of necessary calibrations, a favourable compromise between high reliability and reduced experimental workload is to select lower values of p, which allows a smaller batch size N while still drawing very small samples from the unknown batches. This strategy may offer a good balance, reducing the calibration workload while maintaining acceptable levels of reliability.
In addition to the batch reliability metric, this work also indicates how to assign to the (N − n) statistically calibrated sensors in the unknown batch a (statistical) sensitivity value and an estimate of the associated uncertainty, both depending on the chosen value of p and on the characteristics of the benchmark batch mixture distribution. The proposed method is applied and validated with a repeated bootstrap technique through the calibration of 100 nominally identical digital MEMS accelerometers. The results showed acceptable agreement between the experimental bootstrap-based and theoretical values, with relative differences within ±7 %, supporting the validity of the method.
Future work should extend this methodology to larger batches of sensors to determine whether the level of agreement observed in the validation can be improved.

Appendix. Mathematical formulation
Assuming that the experimental calibration and the manufacturing process do not change from batch to batch, so that the expected number of out-of-tolerance sensors in the unknown batch equals that in the benchmark batch, C_bench, the state of knowledge on the number C of out-of-tolerance sensors in a typical batch from that production can be represented by a binomial prior distribution:

$$f_{\mathrm{prior}}(C; N, p) = \binom{N}{C}\, p^{C} (1-p)^{N-C}, \qquad C = 0, 1, \dots, N,$$

where N is the number of sensors in a batch and p is the probability of out-of-tolerance sensors in the benchmark batch.
According to Bayes' theorem, the prior distribution is multiplied by a likelihood function f_like(k; n, C, N), defined as the hypergeometric probability of finding k out-of-tolerance (defective) sensors among the n sensors drawn and calibrated from the unknown batch of N sensors, C of which are out of tolerance:

$$f_{\mathrm{like}}(k; n, C, N) = \frac{\binom{C}{k}\binom{N-C}{n-k}}{\binom{N}{n}}.$$

This product yields the un-normalized posterior f_post,un(C; k, n, N, p) for C out-of-tolerance sensors in the unknown batch:

$$f_{\mathrm{post,un}}(C; k, n, N, p) = f_{\mathrm{prior}}(C; N, p)\, f_{\mathrm{like}}(k; n, C, N).$$

The probability mass function f_post,norm of the number C of out-of-tolerance sensors in the unknown batch, given k out-of-tolerance sensors in the small calibrated sub-batch, is obtained by normalization:

$$f_{\mathrm{post,norm}}(C; k, n, N, p) = \frac{f_{\mathrm{post,un}}(C; k, n, N, p)}{\sum_{C'=0}^{N} f_{\mathrm{post,un}}(C'; k, n, N, p)}.$$
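Because the prior is binomial, f_prior(C; N, p) = binom(N, C) p^C (1 − p)^(N−C), and the likelihood is hypergeometric, the normalization can be carried out in closed form. The following derivation is our addition rather than part of the original formulation; it uses only the identity binom(N, C) binom(C, k) binom(N − C, n − k) = binom(N, n) binom(n, k) binom(N − n, C − k), and it offers a convenient check on numerical implementations. The last line assumes that P_reliab,unk is the posterior probability that C ≤ C_bench.

```latex
\begin{aligned}
f_{\mathrm{post,un}}(C;k,n,N,p)
  &= \binom{N}{C} p^{C}(1-p)^{N-C}\,
     \frac{\binom{C}{k}\binom{N-C}{n-k}}{\binom{N}{n}} \\
  &= \binom{n}{k} p^{k}(1-p)^{n-k}\,
     \binom{N-n}{C-k} p^{C-k}(1-p)^{N-n-(C-k)}, \\[4pt]
f_{\mathrm{post,norm}}(C;k,n,N,p)
  &= \binom{N-n}{C-k} p^{C-k}(1-p)^{N-n-(C-k)},
     \qquad C = k, \dots, k+N-n, \\[4pt]
P_{\mathrm{reliab,unk}}
  &= \sum_{C=0}^{C_{\mathrm{bench}}} f_{\mathrm{post,norm}}(C;k,n,N,p)
   = \sum_{j=0}^{C_{\mathrm{bench}}-k} \binom{N-n}{j} p^{j}(1-p)^{N-n-j}
     \quad (k \le C_{\mathrm{bench}}).
\end{aligned}
```

The first factor in the second line does not depend on C and cancels in the normalization. In words, C − k, the number of out-of-tolerance sensors among the N − n uncalibrated ones, follows a binomial distribution with parameters N − n and p.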

Figure 2. P_reliab,unk as a function of n/N for different values of N, with a fixed probability p = 0.1 of out-of-tolerance sensors in the benchmark batch. Connecting lines are for representation purposes only.

Figure 3. P_reliab,unk as a function of n/N for different values of N, with a fixed probability p = 0.01 of out-of-tolerance sensors in the benchmark batch. Connecting lines are for representation purposes only.

Figure 4. P_reliab,unk as a function of n/N for different values of N, with a fixed probability p = 0.001 of out-of-tolerance sensors in the benchmark batch. Connecting lines are for representation purposes only.

Figure 6. Sensitivities along the z-axis of the 100 MEMS sensors at 10 Hz, with the relevant calibration uncertainties (at 95 % confidence level).

Figure 7. Distribution of the amplitude sensitivities along the z-axis of the 100 MEMS sensors at 10 Hz.

Figure 8. A single cycle of the bootstrap technique, iterated r times ('success' is a counter variable). 'OoT' stands for 'out-of-tolerance'.

Figure 9. Histogram of P_reliab,unk,exp from t = 1000 repetitions of the bootstrap process with n = 5 and p = 0.02.

Figure 10. Histogram of P_reliab,unk,exp from t = 1000 repetitions of the bootstrap process with n = 20 and p = 0.08.

Figure 11. Histogram of P_reliab,unk,exp from t = 1000 repetitions of the bootstrap process with n = 10 and p = 0.16.

Figure 12. Benchmark batch mixture distribution of the 10⁵ × N simulated sensitivities from the first bootstrap cycle (N = 50).

Table 1. Results of the repeated bootstrap process with different boundary conditions.