GEMIMEG-II — How metrology can go digital ...

The GEMIMEG-II project is intended to pave the way for digitalization in metrology. The central element of this digitalization initiative is the digital calibration certificate (DCC). It contains all calibration information in fully digital form. This means that it is machine readable and machine understandable without human interaction, which makes it securely machine interpretable and machine actionable along the entire chain of truly digital workflows and information technology (IT) environments in Industry 4.0. Therefore, the DCC is created automatically in the calibration process in a standardized form based on a digital document schema. This systematic schema allows all data in the DCC to be transferred, processed, and interpreted safely and automatically in all subsequent IT-based processes. This paper reflects the project status of GEMIMEG-II in its final phase and shares some insights into the concepts developed and the solutions implemented, as the results will be demonstrated in five Realbeds. Furthermore, the concept of quality of sensing and quality of data is introduced as implemented in the GEMIMEG-II project to convey supplementary information on the measurement, environmental and/or surrounding modalities, and data quality. Finally, a brief outlook is given on next steps and actions planned in the project related to other digitalization initiatives for the fab of the future.


What does digitalization mean?
To digitalize given and existing processes often turns out to be much more than simply converting the output of a process into a digital form. In the calibration domain, the output documentation is sometimes already provided as a portable document format (pdf) dataset. Even though this pdf document is standardized through ISO 32000 [1] in different varieties and levels of detail, it is, simply speaking, a portable digital representation of a paper document. This is already extremely valuable, as it can be exchanged between different types of computing systems while the content and graphics of the document remain unchanged. Thus, the very broad aim and scope of the pdf specification and representation is to maintain the author's design and original content for any documentation and/or for document exchange, storage, and archiving. The content of a pdf document consists of the textual body enriched with all associated formatting information necessary to replicate the document's appearance on different computing systems or visual outputs such as printouts or displays. There is no formal semantics or ontology structuring the content of the document to make it machine readable or machine executable, such as a formal schema for an unambiguous representation of the document content. Nevertheless, such a structured dataset can be embedded into a pdf document, and the content can also be protected by signing the pdf document to prove authorship and the document's originality and authenticity.
Full digitalization requires a clear, systematic, and stringent semantics for the entire content of a document. Such a formal system is necessary to structure domain-specific information unambiguously so that the document's content becomes machine readable and finally also machine executable. As a prerequisite, it is necessary to define and specify all technical terms precisely and without any ambiguity. Such a semantic system for a given domain is called an ontology. The ontology provides relations and/or hierarchies between the semantic elements. The quality of the semantic system and its related ontology is essential to transfer and convey all information of a document between a data source and a data user in a secure way and to avoid any misunderstanding or misinterpretation. This means that, apart from a digital representation of a document as a structured data file, as in the example of the pdf, there also needs to be a semantic description of all the formal and technical terms used in order to define and specify precisely the content of each element in the document. Therefore, the digital calibration certificate (DCC) [2,3] needs such a complete and efficient semantic description with a related ontology for all the terms and information contained in the digital document. From a globalization perspective, it is advantageous to have one common semantic system for the DCC which supports all needs and demands of metrologists around the world. If multiple semantic systems or ontologies existed, additional translator software would be needed as middleware to convert a DCC from one ontology with its semantics into another ontology with its related semantics. To avoid this additional and error-prone effort related to translators, one common, internationally applicable DCC semantic system to which all stakeholders can contribute is beneficial for Industry 4.0 applications.
The semantic structure underlying the digital calibration documents is essential to enable machines to read, understand, interpret, and act on the data. It is a mandatory prerequisite for collaborative or autonomous interoperability on the system and data level. Consequently, all users of this semantic structure can and have to select the correct structural element to assign or read the respective values, units, and data properly. A user or subscriber to the information of such a dataset must rely on the assumption that the expected and correct information was assigned to the respective structural element according to its definition. Hence, the quality of the vocabulary and the precision of the related definitions are the key elements for users and programmers to assign and retrieve information properly. Typically, when humans exchange data, they can clarify upcoming questions directly. Machines will take all values as granted for the respective structural element they are assigned to and simply run their code with the values as read from the file. In that respect, it is also important that the digitalization of the calibration domain reflects local or regional aspects of how information is represented in a calibration certificate. Typical differences concern the metrological units, ranging from the International System of Units (SI) [4] to various imperial units, the assignment and representation of measurement tolerances and uncertainties, and sometimes also the vocabulary or technical terms. Therefore, a truly digital calibration system has to be built upon a common semantics and ontology on the one hand, but on the other hand needs to be open or flexible enough to also support regional, local, or even application-specific aspects. All terms related to such special aspects also need to be added to the semantic structure to secure international applicability. Therefore, an international and open approach moderated by a group of trusted maintainers is preferred and suggested to create a common digital DCC ecosystem. It might be the most effective and resource-efficient way to create a versatile international digital metrology ecosystem.
In the daily practice of mutual, long-term, and trusted collaboration between a tool owner and a calibration service provider, highly customized practices or 'internal' technical terms might evolve. Digitalization offers the chance to adapt or replace such individual practices and terms with transparent processes based on the good practice and terminology of the community. In principle, even specific additions can be made to the DCC in a separate namespace. The closer a DCC sticks to good and preferred practice, the easier automated and digital processing of data and content becomes, and the easier it is to secure process resilience when new calibration service providers are integrated. Customizations typically necessitate subsequent software adaptations or specific middleware on the side of the system owner and/or the calibrator.

The GEMIMEG-II project
The GEMIMEG-II project is a German nationally funded project to pave the way for digitalization in metrology. It is intended to prepare a first foundation and proof of concept of the DCC application. Even though it is nationally funded, the project is open to sharing and discussing its concepts and findings with the international community at conferences and in the peer group of 30+ associated international partners. The acronym reflects the project aspiration by combining GEMIni for the digital twin of the MEtrology equipment for Global application. This digitalization initiative focusses on the DCC and its fully automated application in modern industrial information technology (IT) infrastructures. The project consortium consists of the national metrology institute Physikalisch-Technische Bundesanstalt, Germany (PTB), different industry partners, and multiple research institutes. The core of the user story of the project is an automated calibration workflow documented in a DCC which is then transferred safely and without human intervention to the customer of the calibration. At the customer site, the DCC is read, processed, and interpreted automatically by machines in the full chain of workflows in typical IT and operational technology (OT) environments in Industry 4.0, updating all relevant information in the plant management system (ERP, enterprise resource planning) and all calibration-related information in production. This paper reflects the recent project status of GEMIMEG-II in its final phase and shares some insights into the concepts developed and solutions implemented. Five Realbeds will be implemented in the project to showcase and prove the applicability of the DCC in different technical fields. Realbed 1 is the Digital Competence Center for wind power (d-CCW at PTB, Germany [6,7]), a new calibration system for huge torque moments of up to 5 MNm and, at a later stage, 20 MNm. Calibration results of this complex system will be reported directly in DCC format. Realbed 2 uses a highly digital and automated 'Factory of the Future' scenario to mimic a modern Industry 4.0 environment with IT, OT, and internet of things (IoT) devices. Realbed 3 focusses on the 'Process and Pharma Industry'. A single company in this industry typically has a very high number of 10,000-100,000+ recurrent calibrations for process-related equipment every year. This Realbed analyses the benefits of fully digital process chains in a highly controlled and regulated environment. Realbed 4, 'Autonomous Driving', focusses on future mobility aspects with numerous sensors requiring dynamic and recurrent (minutes-to-hours timescale) calibrations for safe autonomous functions. Realbed 5 is a 'Legal Simulation Study' to challenge the evidentiary value of the DCC in simulated court proceedings with real advocates and judges for representative cases related to the other Realbeds. This paper will further introduce the concept of quality of sensing (QoS) and quality of data (QoD) as used and implemented in the GEMIMEG-II project to convey supplementary information on the measurement and data quality, as the metrologist would do in today's practice. Figure 1 shows the conceptual diagram of the GEMIMEG-II project for a generic calibration and data processing workflow in a typical quality infrastructure together with related communication technologies.
The quality information quality of X (QoX: sensing (S), data (D), and information (I)) is intended to convey relevant information on measurement circumstances and/or indicators for the trustworthiness of a measurement or datum which influence neither the measured value nor the measurement uncertainty. QoX can be application specific or user defined, in contrast to measurement uncertainties, which are specified by the Guide to the Expression of Uncertainty in Measurement (GUM) [8] and reported in the DCC as an integral part of the measurement result for each measured value, just as in today's calibration certificates.
The following sections are arranged according to the different technical fields of the GEMIMEG-II research agenda, starting from more general topics and followed by more detailed information in later sections. Finally, a brief outlook is given on next steps and actions planned in the project with regard to other digitalization initiatives.

The digital document schema
The digitalization of the metrology domain requires multiple digital documents. It has turned out to be advantageous to derive all these documents from the same basis, called the digital document schema DX. This common DX concept is expected to bring significant benefits with respect to the interoperability of the different documents of the metrology domain. A generic DX concept view chart is shown in figure 2.
Since digitalization is evolving in many different places and domains, interoperability is essential for a smooth combination and integration of different software modules. Therefore, the bottom part of the chart in figure 2 shows a selection of underlying norms and standards which can be used or should be followed while developing the DX document schema for the metrology domain. This list cannot be complete, since digitalization is a fast-developing field in multiple domains and applications, with many interrelated developments in constant flux. Nevertheless, it is worthwhile to check these standards in order to optimize compatibility with other digital solutions and to reduce tedious rework on either side.
The DX schema as sketched in the central part of figure 2 is a semantic schema containing all terms needed to digitalize the metrology domain. The schema is structured so that information can be retrieved more easily from the different clusters. From the DX schema, different document types are derived, like the digital calibration request (DCR), the DCC, or the digital calibration answer (DCA). Further document types can be derived in the same way, as indicated on the right side of the view chart. This common structure ensures that all derived documents are built upon the same terms and semantics, which simplifies file handling, information input, and information retrieval.
Furthermore, a common ontology and semantics for the data structure in the different documents allows for synergy and scaling effects when developing functionality for subsequent middleware and application software modules. Most of these software functions can be reused for multiple applications with different DX-based documents. This benefits both sides of the process chain, the calibration or certificate provider and the customer or user of this documentation. In addition, it might also help to pave the way towards more automated auditing processes.
Deriving separate documents from a common underlying structure has the big advantage that, on the one hand, no single document is overloaded with features not needed for its specific function and, on the other hand, all semantics stay consistent over the entire DX schema. A clearly defined structure of the DX schema immediately ensures that all documents derived from it are machine readable and machine understandable, and thus that the information contained can be machine executable.
Another great benefit of the single semantic structure in the DX schema is that the schema can be developed in one language only, preferably as an English XSD schema, to make it fully international. These semantic terms can also serve as reference keys in the XSD schema for the respective fields. Based on the English semantic terms and thus the reference keys, auxiliary files with appropriate translations of all these terms can be provided as a language pack for each language, prepared e.g. by national or regional groups of native speakers. This common structure enables the full technical content of DX-based files to be translated into a specific language easily, automatically, properly, and unambiguously. These language packs also serve as the basis when a human readable output (HRO) is generated for a given DCC XML file. This greatly eases international applicability and collaboration, much as the vocabulary in metrology [9] already does. Only individual or dedicated comments and additional free-text information cannot be translated automatically in this approach, whereas all content that is relevant for machines and for auditing the respective processes can.
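The following minimal sketch illustrates this reference-key idea under the assumption of hypothetical key names (not the official DCC vocabulary): the language-neutral schema keys stay fixed, while a per-language pack maps each key to a display term for the human readable output.

```python
# Minimal sketch with illustrative key names; the official DCC schema defines its own vocabulary.
LANGUAGE_PACKS = {
    "de": {"calibrationDate": "Kalibrierdatum", "measurementResult": "Messergebnis"},
    "fr": {"calibrationDate": "Date d'étalonnage", "measurementResult": "Résultat de mesure"},
}

def render_label(reference_key: str, language: str) -> str:
    """Return the translated label for a schema reference key, falling back to the key itself."""
    return LANGUAGE_PACKS.get(language, {}).get(reference_key, reference_key)

print(render_label("calibrationDate", "de"))  # -> "Kalibrierdatum"
print(render_label("calibrationDate", "xx"))  # -> "calibrationDate" (no pack available)
```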
The top line of the chart shows the optional HRO generator for the respective document type. The style of the HRO might be document dependent and might also need to follow formal requirements for the respective document type; for example, the calibration certificate must follow the regulations of ISO 17025 [10] and the certificate of conformance those of ISO 17065 [11].

The digital calibration documents and their application in a calibration workflow
Digitalizing the calibration domain in metrology means much more than creating a DCC. In order to support the digitalization effort in industry, especially for Industry 4.0, it is of particular importance to take the whole calibration workflow into account. The calibration request is needed to initiate the calibration, while the calibration certificate and the calibration answer contain its result.
For each measurement device, such as a sensor, a measuring system, or a calibration artifact, the calibration requested by the owner of the equipment from the calibration service provider has to be specified precisely. This is of fundamental importance to enable fully automated calibration routines, either on the premises of the equipment owner or remotely at the location of the calibration service provider. The digital document to exchange the calibration requirement information is the DCR. It serves as the technical requirement specification of the calibration. The commercial part related to the calibration is left to the procurement systems of the companies, either as individual orders or in frame contracts. The DCR conveys a complete and detailed technical specification of the respective calibration requested in fully digital form to the calibration service provider. Figure 3 shows a schematic diagram of the information blocks contained in the DCR.
The IT system of the calibration service provider extracts the information from the DCR to perform the calibration as requested by the customer. The administrative data already entered by the tool owner is used without any modification and enriched with supplementary administrative information about the calibration service provider and/or the equipment used in the calibration to generate the full set of administrative data in the DCC. Then, the calibration measurements are performed and evaluated. The final calibration results are put into the data section of the DCC. Further information can be added to the DCC, such as QoS or QoD information. The DCA is a supplementary document. It is intended to convey additional information from the calibration service provider to the equipment owner which cannot be part of the DCC for legal or formal reasons, or which the equipment owner does not want to have in the DCC dataset, e.g. when the DCC is part of a product delivery from an equipment manufacturer to an end customer.
The right side of figure 3 gives some supplementary information on the content blocks. Furthermore, it suggests that the DCC may contain information about its issuer, e.g. whether it is a national metrology institute, a calibration laboratory, a factory calibration, or an acceptance test from a service technician. This hierarchy level helps to distinguish different documents automatically in order to identify the most relevant DCC easily if there are multiple calibration protocols within the validity period as defined by the equipment owner.

Sensor calibration and related aspects
In technical applications, there are different kinds of sensors. Physical sensors are based on a physical sensing principle, e.g. for temperature, voltage, current, resistance, or capacitance. More advanced sensors combine the inputs of multiple physical sensors into a common measurement result. A new multi-modal sensor value is computed based on physics principles relating the sensor measurand to the input values of the contributing sensors. An example is a humidity sensor to measure the (absolute/relative) humidity of air or a specific gas.
Model-based sensors are another group of sensors, employing a generic model to generate the sensor output value based on one or multiple input values from physical sensors. The sensor model is based on physics principles and/or a model of the measuring system. The respective model describes the functional relation between one or multiple input variables of input sensors and the generic output value of this model-based sensor. For the sensor function, it is not important whether the computation of the output values is performed on the sensor device itself or in a separate computing environment on an edge device or in the cloud. Since the model is characterized by an explicit functional relation between input and output values, this sensor type is transparent to the user in its function. In that respect, explicit sensor models can be considered 'white box' models. Preferably, this functional relation is based on physics principles in an explicit way. Model-based sensors are typically calibrated for a range of input values for each of the input sensors. Typically, this sensor type generates valid output values for any combination of input values from the different sensors as long as the range of input values was covered in the calibration for the respective input sensor.
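As a minimal sketch of such a 'white box' model-based sensor, the following example derives absolute humidity from two physical input sensors (air temperature and relative humidity) via an explicit functional relation; the Magnus approximation used here, with common textbook coefficients, is an illustration only and not a calibrated sensor model.

```python
import math

def absolute_humidity_g_per_m3(temperature_c: float, relative_humidity_pct: float) -> float:
    # Saturation vapour pressure over water in hPa (Magnus approximation, illustrative coefficients).
    e_sat = 6.112 * math.exp(17.62 * temperature_c / (243.12 + temperature_c))
    e = relative_humidity_pct / 100.0 * e_sat      # actual vapour pressure in hPa
    return 216.7 * e / (temperature_c + 273.15)    # ideal-gas conversion to g/m^3

print(round(absolute_humidity_g_per_m3(20.0, 50.0), 2))  # roughly 8.6 g/m^3
```

Because the relation is explicit, its behaviour can be inspected and differentiated with respect to the input values, which is exactly what distinguishes it from the learning-based models discussed next.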
Another, quite modern type of model-based sensor relies on models generated by computer learning, machine learning, or artificial intelligence. These learning-based models are trained on a dataset with a given distribution of values of the respective input variables. Thus, it is of critical importance that the training data covers the expected range of all input variables in an appropriate way. The quality of the training dataset is decisive for the quality and accuracy of the learning-based model. The sensor model is typically not available in explicit or functional form and is thus not transparent to the user. It is contained implicitly in the functional structure of the computer code produced in the training process. In that sense, these learning-based sensors can be considered a kind of 'black box' model. This is a major difference with respect to the sensors described before, which can be described as white box sensors since the sensor function can be stated explicitly by formulas, making them fully and truly transparent in their function or functional principle. Typically, the functional principle of physical and model-based sensors is differentiable with respect to changes of the input variables, and thus these sensors might still produce quite reasonable output even when at least one of the sensor input values is (slightly) out of the calibrated range. In contrast, learning-based sensor models can behave very differently when input values occur in combinations not covered in the training phase. Due to their highly non-linear nature, the output of such learning-based sensors can change drastically and unpredictably, even when only one of the input variables slightly leaves the range of values covered in the training. Typically, such learning-based models cannot extrapolate, sometimes not even for small differences in input values.
Sensor signals typically contain some noise in addition to the value measured. Therefore, appropriate technologies for noise reduction or suppression are applied either to the analogue signal or to the digital values. Typical functions include averaging over a time span or over a number of consecutive measurement values, and sliding-window approaches with weighting factors that reduce the impact of samples with increasing time distance from the actual measurement. When these functions are specified with all their functional parameters, their output is fully deterministic and explainable and in that sense completely transparent. Sometimes, sensor system suppliers do not explicitly disclose these functions to their customers but only specify minimum response times.
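A minimal sketch of such a weighted sliding-window filter is given below; the window length and weighting factors are illustrative parameters that would in practice be specified by the sensor system supplier.

```python
from collections import deque

class WeightedSlidingAverage:
    """Fixed-length sliding window; older samples contribute less to the reported value."""

    def __init__(self, weights):
        self.weights = list(weights)               # weights[0] applies to the newest sample
        self.samples = deque(maxlen=len(weights))

    def update(self, value: float) -> float:
        self.samples.appendleft(value)
        used = list(self.samples)
        w = self.weights[:len(used)]
        return sum(wi * xi for wi, xi in zip(w, used)) / sum(w)

filt = WeightedSlidingAverage(weights=[0.5, 0.3, 0.2])
for raw in [10.2, 9.8, 10.1, 10.4]:
    print(round(filt.update(raw), 3))
```

With the weights and window length stated explicitly, the output remains fully deterministic and reproducible, in line with the transparency argument above.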
Having these differences between the sensor types in mind, the question is whether such learning-based sensors can really be calibrated in the classical sense. There is an ongoing discussion in the expert community related to formal and/or technical aspects. Calibration is defined as the comparison of a system with a standard together with the determination of a measurement uncertainty. This is typically not possible with a learning-based system. Therefore, the term calibration cannot be used in all cases.
Nevertheless, there is a clear need to qualify the output of learning-based sensors by an independent third party. A pragmatic approach to that problem could be that such sensors get qualified for a given range of the respective input parameters by a qualified third party. The result of such a qualification process is documented by this third party in a way comparable to a DCC, but not in a DCC. As a suggestion, the qualification result can be documented with a digital qualification certificate (DQC). Accordingly, the request for qualification by the owner of such a learning-based sensor can also formally be conveyed to the third party in a technical specification document. Hence, this document can be named digital qualification request (DQR). This DQR document would specify the input variable ranges for all variables, i.e. the parameter space in a multi-dimensional fashion where each of the input variables represents one dimension of the parameter space. It also needs to be specified which data is used for qualification, since the data quality and content may strongly influence the outcome of the qualification. The availability of standardized and qualified datasets for qualification might not always be feasible or practical. To outline the present status of this early conceptual idea, the view chart of figure 2 is refined for this aspect of learning-based sensors and/or black box sensor models and shown in figure 4 with the added DQx document types. In essence, this conceptual idea of the DQx documents shows how flexibly the DX document schema can serve even new applications.
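The multi-dimensional parameter space that such a DQR could specify can be represented very simply, e.g. as one interval per input variable; the variable names and ranges in the sketch below are purely illustrative assumptions.

```python
# Illustrative qualified region: one (min, max) interval per input variable of the learning-based sensor.
QUALIFIED_RANGES = {
    "temperature_C": (-10.0, 60.0),
    "pressure_kPa": (80.0, 110.0),
    "flow_l_per_min": (0.5, 20.0),
}

def inside_qualified_space(inputs: dict) -> bool:
    """Check whether a combination of input values lies inside the qualified parameter space."""
    return all(lo <= inputs[name] <= hi for name, (lo, hi) in QUALIFIED_RANGES.items())

print(inside_qualified_space({"temperature_C": 25.0, "pressure_kPa": 101.3, "flow_l_per_min": 3.0}))  # True
print(inside_qualified_space({"temperature_C": 75.0, "pressure_kPa": 101.3, "flow_l_per_min": 3.0}))  # False
```

Such a check matters particularly for learning-based sensors because, as argued above, their output outside the trained region can change drastically and unpredictably.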
At present, it might be too early to list a concise set of stringent criteria for when a sensor model is categorized as white box, black box, or some 'grey' tone in between. Future developments might advance or modify this suggested concept for handling the topic appropriately, and even learning methods might lead to fairly transparent functional models of a sensor over time.

The transfer of DCx documents
In a chain of a fully digital workflow, the DCA, DCC, and DCR need to be transferred securely between the equipment owner and the calibration service provider. Different ways are imaginable, from a point-to-point connection between the two institutions on the one side to a general exchange portal or platform on the other.
Depending on the number of calibrations or the security level required by the calibrator or system owner, one or the other solution might be selected, or any setting in between. As an open solution, as envisioned in the GEMIMEG-II project, a platform or multiple platforms consisting of shared file systems or repositories might be preferred. In this approach, multiple physical systems can even be combined into a common virtual system. A platform can serve small or large calibration service providers, since they do not have to run the infrastructure on their own with respect to all security and resilience requirements. From a user perspective, a very limited number of platforms might be favored to keep the effort for document exchange and handling small. A concept of a virtual platform can help to virtually merge different physical platforms into one logical platform as a single entry point to share or retrieve a given DCx document. The DCR is issued by the system owner and transferred via the platform, or directly, to the selected calibration service provider. The DCR might be accompanied by formal purchase order information. It also might specify how and where the final DCC and optionally a DCA document have to be transferred to the tool owner. A generic flow chart of the calibration lifecycle status of a calibrated piece of equipment is shown in figure 5 for a typical process implementation: the typical status of the calibration test equipment inventory (TEI) and the related process, with a DCR plus a procurement document to order and initiate a calibration; after the calibration, the DCC as its output is returned to the system owner together with the sensor system.
DCCs and DCAs are created in the calibration process. They are stored on a repository or platform accessible to the equipment owner. If the DCx documents are stored on a platform accessible by multiple companies, a specific user should only be able to access documents belonging to his inventory. This is a fundamental prerequisite to protect business-related information contained in a calibration document, business relationships, or even the business volume of a specific calibration service provider or manufacturer. Therefore, the system owner will need some information, such as the file name, the exchange platform with credentials, and/or potentially an encryption key to download and read the DCC and DCA files. What exactly is needed is governed by the respective exchange platform. If the system owner has a data section on an exchange platform, the calibration service provider might be entitled to put the respective files directly into this customer section to ease the exchange process.
From a general viewpoint, if the DCC or the DCC platform itself is protected against unauthorized access, one needs to qualify digitally with credentials to be eligible to access, download, open, and read a respective DCC. A high level of security is reached if a user has to prove ownership of the respective system and, in addition, has to possess an encryption secret to read the content of the respective DCC.
Technologies like two-step or multi-factor authentication would help to ensure that only the actual system owner is able to open the latest calibration documents, which can be important when systems are sold over time.

Revocation of a DCC document
Typically, the exchange of the various DX-based documents as described in section 3.2 can be done in a request-response type of communication. Sometimes, however, the calibration service provider might be compelled to revoke a certificate document, like a DCC, issued before. Such a revocation can be considered an event-driven push notification to inform the actual user of the respective certificate in a timely fashion, since the application of the information from this certificate can be safety critical. In the case of a DCC, the calibrator does not necessarily know the actual user or owner of the device. Therefore, the DCC file name is appended to an open and public DCC revocation list or, in general, a digital certificate revocation list. This basic concept is already applied successfully for digital certificates, e.g. in public key infrastructures (PKIs).
The DCC revocation list could either be a general list containing all revocations, or a list per exchange platform or calibration service provider. For distributed lists, the user has to check on every platform relevant to his equipment whether one of his DCCs has been revoked. In the case of direct contact between calibration service provider and system owner, the revocation could also be communicated directly. Nevertheless, a public revocation list opens the opportunity for fully digital processes with more transparency, more security, and faster response times for calibration documents. The greatest benefit arises in more complex cases, e.g. when the calibration of an item is revoked but the item has meanwhile been sold and thus changed owner. In effect, digital document revocation concepts help to make the calibration infrastructure more resilient in its operation.
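A minimal sketch of how an equipment owner could check the local test equipment inventory against such a published list is shown below; the identifiers and the idea of a plain list of revoked DCC names are illustrative assumptions, not a defined format.

```python
def load_revocations(lines) -> set:
    """Parse a published revocation list given as an iterable of lines (one revoked DCC identifier per line)."""
    return {line.strip() for line in lines if line.strip()}

def revoked_in_inventory(inventory_dcc_ids, revoked_ids) -> list:
    """Return the DCCs from the local inventory that appear on the revocation list."""
    return [dcc_id for dcc_id in inventory_dcc_ids if dcc_id in revoked_ids]

revoked = load_revocations(["DCC-2023-000123", "DCC-2022-004711"])   # e.g. fetched from a platform
print(revoked_in_inventory(["DCC-2023-000123", "DCC-2023-000999"], revoked))  # ['DCC-2023-000123']
```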
A revocation of a DCC or any other DX-based document can only be done by the issuer or the issuing organization of the respective document. The system owner is responsible for the utilization period of the respective system. If there was an incident involving the calibrated item in its application phase, the system operator or user can set the system status to unclear, which means that at least a recheck of this item has to be done by a knowledgeable person who can decide whether the item can still be used or needs to be repaired and/or recalibrated. The related process parts are also indicated in figure 5 on the left and right side, respectively.

Supplementing and supporting digitalization initiatives and technologies
When digitalizing the calibration domain in metrology, many prerequisites in digitalization are necessary to complete this demanding task. The most relevant ones for the domain of calibration and the calibration certificates are listed in the following subsections.

Unique product identifier
The equipment to be calibrated needs to have a unique identifier. In today's practice, a simple sticker is sufficient to identify the equipment manually as part of the metrological equipment inventory at company level. In a digital environment, it is preferable if this unique identifier is computer readable, ideally via standard network technology like (wireless) local area networks. Having unique identifiers for industrial inventory is a general requirement. Therefore, existing approaches can be used, such as the digital nameplate [12][13][14] or the identification link [15]. For the purpose of calibration and safe traceability of a measuring system or calibration artifact, it is mandatory that the digital equipment identifier is unique and machine readable. Nevertheless, for factory automation and even brown field automation there also needs to be a fall-back solution for equipment without a digital interface, e.g. a metal pressurized gas cylinder with calibration gas mixtures for a gas chromatograph. In this case, labels with unique identifiers can be attached to the mechanical hardware to enable proper machine readability. These unique identifiers could contain the relevant information directly or via a link to a network resource where all relevant information can be found, including a DCC for the respective configuration or mixture of gases as in the example.
In principle, the unique product identifiers as suggested in the various digitalization initiatives have the form: [unique ID manufacturer]_[material number of system from manufacturer]_[serial number of system].
The unique ID of the manufacturer can be a real unique ID from a kind of public register or the internet domain of the manufacturer. Alternatively, the legal entity identifier (LEI) [16] could also serve as a unique ID, since internet domains can change their owner over time. It is described in ISO standard 17442 [17,18]. For digitalization purposes, the verifiable LEI (vLEI) [19] was introduced. The vLEI is a digitally trustworthy version of the 20-digit LEI code which can be verified automatically, without the need for human intervention. Conceptually, the other two parts are unique, since every manufacturer has a system of unambiguous material and serial numbers for the systems produced.
In general, it is preferable if the DX document schema can handle different types of unique identifiers through the combination of a definition of the identifier type used and the unique value of this identifier for the specific sensor or unit. This openness helps international applicability, since different systems are already implemented for different applications or in companies. In principle, one system could have several types of unique identifiers related to it, like the model number and serial number from the manufacturer, a digital ID of the communication interface, or a unique test equipment identifier from the system owner's test equipment inventory list.
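The following sketch shows the identifier pattern quoted above together with a small record that allows several identifier types per system, as discussed; all concrete values and type labels are examples.

```python
from dataclasses import dataclass

def build_product_id(manufacturer_id: str, material_number: str, serial_number: str) -> str:
    """Compose [unique ID manufacturer]_[material number]_[serial number]."""
    return f"{manufacturer_id}_{material_number}_{serial_number}"

@dataclass
class EquipmentIdentifier:
    id_type: str   # e.g. "manufacturer-id", "LEI", "owner-TEI-number" (illustrative type labels)
    value: str

identifiers = [
    EquipmentIdentifier("manufacturer-id", build_product_id("example-sensors.com", "MAT-4711", "SN-0815")),
    EquipmentIdentifier("owner-TEI-number", "TEI-2023-0042"),
]
print([i.value for i in identifiers])
```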

Complete system configuration documentation
For any modern system consisting of hardware and/or software modules, it is very important to have precise documentation of all components employed. On the one hand there is the hardware configuration, which might become more flexible or context dependent in its operation in a modern IoT environment, when single sensors or sensor systems are combined to provide data to a more advanced sensor type or model-based sensor. On the other hand there is the software of a sensor or sensor system, which might consist of the operating system, firmware, and application software including the graphical user interface or human machine interface. When calibrating a sensor or sensor system, it is preferable to document all software modules used on the sensor or sensor system in the calibration by their full version or revision number and the potential parameter settings or files used to initialize and operate each of the software modules. The reference to the parameters used can be explicit, by listing all parameters used, or by referencing a version-controlled standardized parameter file. This decision might be taken case by case. Preferably, the information on the respective software version and parameter files can be read out automatically from the sensor or sensor system and documented in the DCC or DCx.
Software-based sensors and learning-based sensors likewise need to be documented properly by software release version, configuration file used, training version or training cycle number plus training dataset identifier, and the like, whatever is relevant and appropriate to identify the exact configuration. To avoid any ambiguity or unclarity, it is important to document the full system configuration in the DCC.
For hardware-based systems, there is a huge ongoing effort on the asset administration shell (AAS) [20] to provide a full digital twin of a hardware system. It also includes its precise configuration in the respective product lifecycle management or ERP system of the hardware system owner. In the future, the AAS will provide appropriate submodels for the respective sensor types, as they are already under development in different initiatives for various sensor types. In this context, there might also be solutions prepared to document purely software-based sensor systems properly. An AAS submodel for the DCC is currently under development.

Digital unit representation
A measurement result consists of the measurement value and the metrological unit related to this value. The international system of units is the Système International d'Unités [4], the SI system. There is a fully digital representation of the SI called the D-SI [21][22][23]. It is based on the seven metrological base units. All other units can be derived from these base units. Within this framework, further units can be represented as well, including imperial units. They can simply be defined as derived units with a unit name, optionally a short name of the unit, and a unit symbol. Such derived units refer to a combination of the base units raised to the respective powers together with a scaling factor. The D-SI is suitable to represent all units used in calibrations worldwide, including all kinds of imperial units. The D-SI also includes a concept to append decimal multipliers to all units in a systematic fashion. For unit representation in the DCC, the D-SI was chosen since it can serve all requirements of a versatile international system.
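The derived-unit idea, a named unit expressed as base units raised to powers plus a scaling factor, can be sketched as follows; this illustrates only the concept and does not reproduce the actual D-SI XML syntax.

```python
from dataclasses import dataclass, field

@dataclass
class DerivedUnit:
    name: str
    symbol: str
    scale: float                                          # factor relative to the coherent SI unit
    base_exponents: dict = field(default_factory=dict)    # e.g. {"metre": 1} or {"metre": 1, "second": -2}

# The inch as an example of an imperial unit expressed via an SI base unit and a scaling factor.
inch = DerivedUnit(name="inch", symbol="in", scale=0.0254, base_exponents={"metre": 1})

def to_si(value: float, unit: DerivedUnit) -> float:
    """Convert a value given in the derived unit to the corresponding coherent SI value."""
    return value * unit.scale

print(to_si(10.0, inch))  # 0.254 (metres)
```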

The digital trust chains
Documents in the digital world need to be easily verifiable as unchanged and authentic. Typically, this is achieved by a digital signature of the issuer over the respective (portion of the) document. The identity of the signer is bound to a public key (PK) by an X.509 PK certificate issued by a trusted third party called a certificate authority (CA).
ISO 17025 [10] contains no hard requirement on a digital signature for a DCC document, the necessary trust framework, or the signature format that should be applied. In this regard, there is some imbalance, and in consequence we expect that calibration service providers on different levels of the calibration pyramid will use PK certificates issued under various policies from different trust anchors (root CAs).
This can make the step of validating the integrity and authenticity of a document like a DCC even more complex. Therefore, it is beneficial to have a common technical format for digital signatures of a DCC which includes the information necessary for signature validation. Hence, we propose to use an enveloped XML advanced electronic signature (XAdES) [24], according to the electronic IDentification, Authentication, and trust Services (eIDAS) [25] nomenclature and standardized by the European Telecommunications Standards Institute [26,27], on a DCC document as a good practice. This would ease the trusted utilization of a DCC with standard methods for good international cooperation. Furthermore, it can avoid any imbalance between issuers on different levels of the calibration pyramid. Since there is still a lot of development ongoing for electronic signatures and their mutual international acceptance, we might see evolving requirements and solutions coming up over time.
In the GEMIMEG-II project enveloped XAdES will be used in the Realbeds in order to show technical feasibility. The signatures will be based on a public key infrastructure (PKI) framework with a respective root certificate from a root CA. In the project, the CA is from Deutsche Telekom Security.
Since it is very likely that no single root CA or even trust framework (like eIDAS) will be accepted worldwide, XAdES allows multiple signatures to be applied to one document. In that case, the different signatures can be placed side by side (parallel signatures), or the first signature of the original issuer is the root and further signers apply countersignatures that confirm the first digital signature. Currently, this concept is being developed with these two options and will be detailed further in a future good-practice suggestion.
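A conceptual sketch of the basic sign-and-verify step is shown below using the Python 'cryptography' package; note that this only signs the raw bytes of a DCC file for illustration, whereas a production DCC would carry an enveloped XAdES signature embedded in the XML together with a certificate chain up to a trusted root CA, which this minimal example does not implement.

```python
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

dcc_bytes = open("example_dcc.xml", "rb").read()   # hypothetical DCC file name

# Key generation would normally happen once; the matching public key is bound to the
# issuer's identity by an X.509 certificate from a CA (not shown here).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
signature = private_key.sign(dcc_bytes, padding.PKCS1v15(), hashes.SHA256())

# Verification raises InvalidSignature if the document was altered after signing.
private_key.public_key().verify(signature, dcc_bytes, padding.PKCS1v15(), hashes.SHA256())
print("signature valid")
```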

Secure device enrolment
Sensor system devices have to be enrolled in the domain of the operator of the respective production system network prior to their utilization. Typically, IT-based devices have been assigned an initial identity by their manufacturer. During enrollment, a new identity from the operator's domain is assigned to the device. This new identity enables the device to authenticate itself to other devices and services in the operator's domain.
There are several methods that can be used to automatically perform the enrollment process after the device has been integrated into the operator's network. The Enrollment over Secure Transport (EST) protocol, standardized by the Internet Engineering Task Force (IETF) in a Request for Comments (RFC) [28] and updated by RFC 8951 [29], is a comparatively simple procedure that can be used to assign a new identity to a device: the device establishes a secure transport layer security (TLS) connection with mutual authentication to the EST server, which is referred to as the registrar in the figure. The device then generates a new key pair and a certificate signing request, which is verified by the registrar and a connected CA. The CA finally issues a certificate for the device, which is forwarded to the device. EST is performed via the HTTP protocol and specifies additional endpoints that can be used, for example, to query additional CA certificates.
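The key-pair and certificate-signing-request step of this sequence can be sketched with the Python 'cryptography' package as follows; the common name is a made-up device identity, and the transport of the request to the registrar over the mutually authenticated TLS/EST connection is not shown.

```python
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes

# New key pair generated on the device for its identity in the operator's domain.
device_key = ec.generate_private_key(ec.SECP256R1())

# Certificate signing request carrying the (illustrative) device name; in EST this request
# is sent to the registrar, verified, and answered with a certificate issued by the CA.
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "sensor-0815")]))
    .sign(device_key, hashes.SHA256())
)
print(csr.subject.rfc4514_string())  # CN=sensor-0815
```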
One disadvantage of EST is that both the EST server/registrar and the device must already 'know' each other in order to establish a mutual trust relationship. The Bootstrapping Remote Secure Key Infrastructure (BRSKI) standard (RFC 8995) [30] provides an extension that allows the device to establish a trust relationship with an unknown registrar. In the case of BRSKI, the device trusts only its vendor, from whom its initial identity was issued.
The device initially does not trust the registrar but accepts the TLS connection anyway. Then the device sends a so-called voucher request to the registrar. This is a digitally signed JSON structure containing information about the client. The registrar then sends its own voucher request to the manufacturer. This request also contains the original voucher request of the device. The manufacturer verifies the voucher request and the registrar's identity.
After the manufacturer has determined that the device is in the correct domain, it issues the voucher, which is digitally signed by the manufacturer and contains the registrar's certificate.
The client on the device can now verify the voucher with the manufacturer's PK certificate and is thereby able to verify the TLS connection with the registrar. Then, the EST protocol can be performed over the now trusted connection to obtain a new identity. The whole process chain of the combined BRSKI and EST enrollment process is shown in figure 6.
BRSKI is an extension that allows devices to be shipped with a uniform initial configuration, regardless of the domain in which the devices are deployed later. This BRSKI method reduces the effort for the manufacturer of the devices but requires the operation of a corresponding service that later issues a voucher for a device during the enrollment process, enabling safe and robust enrollment.

GEMIMEG-interface
When a sensor or sensor system is securely enrolled in the domain of the measuring system owner or registrar, a standardized communication interface helps to exchange the DX- or DCC-related information.
This communication interface is a prerequisite to establish a digital collaboration to exchange DCCs or other data between different institutions or legal entities. In the GEMIMEG-II project, we follow the concept of a standardized, lean interface between the internet on the one side and the respective IT system of an institution or a respective IoT device on the other side. When this interface is available at the two institutions or on their respective devices, the connection between the GEMIMEG-compatible systems can be established. Alternatively, the interface can also be used to exchange documents with the DCC server platform, i.e. to upload or download DCCs from a repository as shown in figure 7. A generic view of the effect in a distributed system of different calibration service providers, sensor and sensor system manufacturers, and integrators is shown in figure 8; the exchange of DCCs and related documents is shown via a common DCC platform in the center. The concept and solution for connecting sensors to another IoT system as developed in the project should be considered a good practice recommended by the project consortium. This good practice is not mandatory, so that all users can also implement their own solution according to their specific requirements, e.g. compatibility constraints that make the solution retrofittable to an already installed base of products. Ultimately, however, the systems need to handle the GEMIMEG-II protocol as developed in the project.
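The following sketch indicates what uploading and downloading a DCC via such a platform could look like from a client's perspective; the base URL, endpoint paths, authentication scheme, and response fields are assumptions for illustration only, since the actual GEMIMEG-II interface specification is still to be published.

```python
import requests

PLATFORM = "https://dcc-platform.example.org"       # hypothetical platform address
HEADERS = {"Authorization": "Bearer <credential>"}   # placeholder credential

def upload_dcc(path: str) -> str:
    """Upload a signed DCC XML file and return the platform-assigned document id (assumed field)."""
    with open(path, "rb") as f:
        r = requests.post(f"{PLATFORM}/dcc", data=f.read(),
                          headers={**HEADERS, "Content-Type": "application/xml"})
    r.raise_for_status()
    return r.json()["id"]

def download_dcc(dcc_id: str, target: str) -> None:
    """Retrieve a DCC by its id and store it locally for further processing."""
    r = requests.get(f"{PLATFORM}/dcc/{dcc_id}", headers=HEADERS)
    r.raise_for_status()
    with open(target, "wb") as f:
        f.write(r.content)
```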
The GEMIMEG-II interface specification document is under development and will be tested and showcased in four Realbeds. Publication is planned for the end of the project in 2023, together with more detailed information on the good practices suggested and tested.

PyDCC tool set
PyDCC is a toolset built in Python to use and extract the information contained in a DCC file. It is developed as open-source software. The PyDCC software toolset [31] with its GitHub repository will be made publicly available at the latest by the end of the GEMIMEG-II project. PyDCC is aligned with the latest release version of the DCC schema and as such needs to be adapted to upcoming releases of the DCC schema. All GEMIMEG-II partners contribute to it, and Siemens has the role of maintainer. PyDCC will have a basic functionality which can be extended over time.
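To illustrate the kind of task such a toolset performs, the snippet below reads elements out of a DCC XML file with the Python standard library; the namespace URI, element name, and file name are assumptions for the sketch and do not represent the PyDCC API itself.

```python
import xml.etree.ElementTree as ET

NS = {"dcc": "https://ptb.de/dcc"}   # assumed namespace of the DCC schema

def iter_measurement_results(dcc_path: str):
    """Yield measurement result elements found in a DCC XML file (element name assumed)."""
    root = ET.parse(dcc_path).getroot()
    yield from root.iterfind(".//dcc:measurementResult", NS)

for result in iter_measurement_results("example_dcc.xml"):   # hypothetical file
    print(result.tag)
```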
In section 4, different tools and methods, and the current state of the concepts, were presented which are used in GEMIMEG-II to digitalize the calibration workflow. This entire set of digital tools and functionalities enables machines to unambiguously identify a sensor system, to document its full software and firmware configuration, to have a versatile system to represent all units of measurement, to exchange and understand calibration documents, to have a full digital trust chain, to safely discover and onboard new sensor systems with appropriate methods, to have an interface to exchange calibration files, and to have a toolset to extract and utilize the calibration information. All these functions are prerequisites necessary to implement a fully digital calibration process chain for modern industrial IT and OT environments. The set of functionalities provides a solid basis for piloting the DCC functionality in GEMIMEG-II and for further enhancements.

Quality of sensing, of data, or of information
In a digital world, data scientists have to rely on the data they receive. Every single datum appears valid and equally trustworthy as all the others. There is no awareness of the level of trust which can be assigned to a specific data value. This contrasts with the previous procedures in the analogue world, where the metrologist consciously checked whether everything around a measurement or measurement setup was under normal operating conditions or somehow suspicious and worth examining. The quality of a data value can be of critical importance in a fully digital chain of subsequent and dependent data operations. Therefore, a subproject of the GEMIMEG-II project is focused on data quality aspects, and first publications are available with general considerations and for different metrological applications [5,32,33].
The data quality information can be differentiated according to the domains where new data is generated. During sensing, it is the quality of sensing (QoS). In the data-driven domain, for data operations like evaluation, fusion, model-based sensing, learning-based sensing, etc, it becomes the quality of data (QoD). In the information domain, when information is inferred from the data produced in the sensing or data domain, it becomes the quality of information (QoI). In the future it might be advantageous to distinguish further domains where appropriate, but the number of domains should not be inflated. The domain-specific data quality indicators can be summarized as QoX, where X represents sensing, data, and information. The QoX are already shown in the conceptual diagram of figure 1 in the respective domain. As also indicated in this view chart, even the QoX should be fully traceable in parallel to the data measured. Separating and distinguishing the QoX based on the domains where they are determined might seem a bit cumbersome at first glance, but it offers the unique and important advantage that a data user in a subsequent step of data processing directly knows where a QoX value was generated. Furthermore, QoS supports the concept of making sensor values as agnostic as possible to the individual physical sensor used. Typically, this can only be done by sensing domain experts who are also in charge of providing the respective QoS. Hardware-agnostic data are very valuable for a robust data processing chain fed with the data. Thus, they allow for resilient operation of the whole system, even when a single sensor needs to be exchanged for some reason.
Conceptually, the QoX indicators should be handled comparably to the sensor systems as described in section 4.1. A QoX needs to be defined with a unique name for the quality indicator and a definition of what sensor system parameter is characterized by the respective QoX, e.g. internal parameters concerning operational or functional aspects of the sensor system or external parameters concerning environmental or surrounding modalities. Preferably, all QoX values are unitless scalars normalized to a common range, e.g. [0, 1], with higher values indicating better quality. This preference directly implies a scale with a positive direction, so that an improvement of a QoX corresponds to higher values. When all QoX values have the same range of allowed or expected values, it becomes much easier to automatically use and process these QoX data further when the related data also gets processed. Table 1 lists some examples or application fields for potential QoX indicators.
Finally, there can be a definition or recommendation of how to interpret the indicator, e.g. in which parameter range the respective QoX is good, acceptable, bad, or for information only, e.g. at low trust levels. In some cases, e.g. in a trend analysis, it is much better for data evaluation to have a measurement value associated with a low QoX than to have no measurement value at all because of the low quality. When all the information for a QoX is defined, it can be a very versatile and powerful tool when the trustworthiness of data or of a specific measurement is under consideration. Since multiple QoX can be defined for a single measurement, one single QoX gives further insight only for the respective context it was intended and defined for.
The last line of the table gives an example where one measured value of a battery voltage at a known load current may be used to derive capacity information based on a related battery model as data, and the remaining battery life as information. Both the data and the information will have quality levels conveyed by respective QoD and QoI indicators.
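A minimal sketch of how such a QoX indicator record could be represented follows the description above: a unique name, the domain it was determined in, a value on the normalized [0, 1] scale, and an application-defined interpretation threshold; the names and the threshold are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class QoXIndicator:
    name: str                       # unique indicator name, e.g. "supply_voltage_stability"
    domain: str                     # "QoS", "QoD" or "QoI"
    value: float                    # normalized to [0, 1], higher means better
    acceptable_above: float = 0.5   # application-specific interpretation threshold

    def interpretation(self) -> str:
        return "acceptable" if self.value >= self.acceptable_above else "for information only"

q = QoXIndicator(name="supply_voltage_stability", domain="QoS", value=0.92)
print(q.interpretation())   # acceptable
```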

How to compute and combine QoX values?
The data created in a measurement will be processed in subsequent processing steps. A parallel and adequate handling of the QoX along with the data is desirable with regard to how the related data quality propagates through the processing chain. When at least one of the data inputs to a respective processing step is critical in its quality, the QoX for the data output should indicate that the output data could be erroneous. On the other hand, sometimes, and especially in processes and process control, it is preferable to have at least some data with a low or even very low QoX value rather than no data at all. Therefore, a system to compute an output QoX value based on one or multiple QoX values associated with the input data is desirable. Since the QoX are preferably unitless scalars, combining different QoX means assessing the quality of a given data value from different parameters or perspectives, just like a metrologist or data scientist would do when qualifying a result. Combining different QoX into one common or aggregating QoX is not to be confused with combining values of different units.
In descriptive statistics, multiple methods are already established to aggregate numbers into one indicator. At least some of them can be adopted and refined to describe the quality or characteristics of data in a dataset. Even better, they can be employed according to our needs to propagate QoX values properly. The mean value is a very prominent example. There are different ways to define a mean value. These mean values average, to some extent, the input values to create a single output value representative of the dataset or ensemble of values. Depending on the computation defined for a given mean, some means are more sensitive than others to even a single outlier with a low QoX value. This means that even one input value of questionable quality will result in a reduced QoX value for the output. This in turn notifies a user of the data to be careful when using it if the resulting QoX is comparatively low, e.g. in comparison with an application-specific threshold value.
Some examples of different mean values are the Pythagorean means: the arithmetic mean, the geometric mean, and the harmonic mean. Their generalized parametric representation is the Hoelder mean:

$$\bar{x}_{\mathrm{arith}} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{x}_{\mathrm{geom}} = \Big(\prod_{i=1}^{n} x_i\Big)^{1/n}, \qquad \bar{x}_{\mathrm{harm}} = \frac{n}{\sum_{i=1}^{n} 1/x_i}, \qquad \bar{x}_{p} = \Big(\frac{1}{n}\sum_{i=1}^{n} x_i^{\,p}\Big)^{1/p}.$$

Based on the generalized mean inequality, there is the relation: arithmetic mean ⩾ geometric mean ⩾ harmonic mean. When using some of these computation rules for means, like for the geometric mean, it might be important that none of the QoX has the exact value of zero. Otherwise, due to the product over all QoX, the respective mean will become zero too, regardless of the values of all the other QoX. To stabilize the calculation and to avoid that a single outlier or less important QoX corrupts the information of all other QoX in the combined value, it is suggested, as a pragmatic approach, to replace an individual QoX below a given threshold value by the threshold value itself. The user or data scientist has to be cautious when applying this coarse data manipulation to a given mean calculation or only to selected QoX parameters within a mean calculation. In our tests, suitable threshold values were selected appropriately for the respective application. Practical values for the threshold could be 0.05, 0.01, or 0.001 for QoX in the range [0, 1]. It is important to note that the resulting combined QoX mean still reflects the poor data quality sufficiently clearly for the respective application. For some applications, like process control, a low value might still be better than a zero value.
When replacing the variable x with the QoX values, the formulas read

$$\overline{QoX}_{\mathrm{arith}} = \frac{1}{n}\sum_{i=1}^{n} QoX_i, \qquad \overline{QoX}_{\mathrm{geom}} = \Big(\prod_{i=1}^{n} QoX_i\Big)^{1/n}, \qquad \overline{QoX}_{\mathrm{harm}} = \frac{n}{\sum_{i=1}^{n} 1/QoX_i}.$$

These formulas imply that all QoX contribute to the output QoX in the same way. This need not be the case in a more generalized setting, since the data related to a QoX might enter the computing process with different powers and thus have a different impact on the resulting data. To reproduce this effect for the QoX, specific weight factors $w_i$ can be introduced into the formulas for each QoX value:

$$\overline{QoX}_{\mathrm{arith}} = \frac{\sum_{i} w_i\, QoX_i}{\sum_{i} w_i}, \qquad \overline{QoX}_{\mathrm{geom}} = \Big(\prod_{i} QoX_i^{\,w_i}\Big)^{1/\sum_{i} w_i}, \qquad \overline{QoX}_{\mathrm{harm}} = \frac{\sum_{i} w_i}{\sum_{i} w_i / QoX_i}.$$

These weight factors can be set individually for each QoX according to any specific requirement of a respective task or application. This concept is already known in principle from statistics, where the weight factors typically count the number of identical input values in the calculation and are thus integer values. In our case, the concept of weight factors can be expanded to (positive) real numbers in order to reflect the relative importance of a respective data value or input source relative to all the other data values or input sources used in the respective application. The suggestion to limit the range of potential QoX values to [0, 1] is not a limitation, since the weight factors can be used to balance the respective contributions.
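A short sketch of these weighted means applied to QoX values in [0, 1] is given below, including the pragmatic floor that keeps a single near-zero QoX from annihilating the geometric mean; the choice of threshold and weights is application specific.

```python
import math

def combine_qox(qox_values, weights=None, method="geometric", floor=0.01):
    """Aggregate QoX values in [0, 1] with a weighted Pythagorean mean and a lower clamp."""
    w = weights or [1.0] * len(qox_values)
    q = [max(v, floor) for v in qox_values]        # clamp near-zero outliers to the threshold value
    wsum = sum(w)
    if method == "arithmetic":
        return sum(wi * qi for wi, qi in zip(w, q)) / wsum
    if method == "geometric":
        return math.exp(sum(wi * math.log(qi) for wi, qi in zip(w, q)) / wsum)
    if method == "harmonic":
        return wsum / sum(wi / qi for wi, qi in zip(w, q))
    raise ValueError(f"unknown method: {method}")

print(round(combine_qox([0.9, 0.8, 0.0]), 3))                     # low combined QoX, but not zero
print(round(combine_qox([0.9, 0.8, 0.7], weights=[2, 1, 1]), 3))  # weighted geometric mean
```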

Outlook and suggested next steps
The digitalization of the calibration domain in metrology depends on and relates to other digitalization initiatives, which in turn can cross-fertilize each other to gain speed and momentum. The five Realbeds in the project serve as test environments for the concepts developed in the GEMIMEG-II project. They represent important domains of the manufacturing and processing industry, autonomous driving, and a new piece of calibration equipment for huge torque moments of up to 20 MNm. Since the project is already in its final year, the focus is shifting from conceptual work to piloting implementations. A final report will be given at the end of the project, including public showcases.
To further broaden the basis of the significant digitalization effort of GEMIMEG-II, and in order to make this effort sustainable, there should be a maintainer of the software solutions created. To further promote internationality and international acceptance of this digitalization initiative, it would be favorable if a neutral metrological organization like the Bureau International des Poids et Mesures and/or the Comité International des Poids et Mesures (CIPM) could take over the role of promoter and facilitator of the entire digitalization initiative of the metrology and calibration domain. Thus, a group of national metrology institutes (NMIs) from member states of the CIPM could be mandated officially to jointly continue the development of the DCC or DX together with the related software topics like the D-SI, the related language packs for the semantic schemas, PyDCC, and potential future topics like the DQx documents. This mandate could be temporary and transferred over time between NMIs to secure international support and ownership. These efforts might also be open to industry, Regional Metrology Organizations, and organizations in legal metrology as additional important drivers of digitalization in this ecosystem. Inviting and integrating all stakeholders of this ecosystem to contribute conceptually and technically could strengthen this digitalization initiative significantly in order to build a common digital document system. Differentiation between issuer types or roles in a document is possible via a content field for the issuer or issuer type, as shown in the side notes of figure 3.
On the other hand, it is also important to keep the DCC compatible with other digitalization initiatives like the unique product identifier, the digital product passport, the digital product nameplate, and the AAS. In 2023, an AAS sub-schema for calibration will be developed.

Data availability statement
No new data were created or analysed in this study.

Acknowledgments
The content presented was created by a great team effort of the entire GEMIMEG-II project team. I am deeply grateful to all team members from all the contributing organisations for their passion and dedication to digitalizing the calibration domain in metrology and making the DCC real and a good practice. It is a real pleasure to work with all of them and to coordinate this lighthouse project.
The GEMIMEG-II project is funded by the German Federal Ministry for Economic Affairs and Climate Action based on a decision by the German Bundestag under Grant No. 01 MT20001A.