Thickness measurements of graphene oxide flakes using atomic force microscopy: results of an international interlaboratory comparison

Flake thickness is one of the defining properties of graphene-related 2D materials (GR2Ms), and therefore requires reliable, accurate and reproducible measurements with well-understood uncertainties. This is needed regardless of the production method or manufacturer, because all GR2M products should be globally comparable. An international interlaboratory comparison on thickness measurements of graphene oxide flakes using atomic force microscopy has been completed in Technical Working Area 41 of the Versailles Project on Advanced Materials and Standards. Twelve laboratories participated in the comparison project, led by NIM, China, to improve the equivalence of thickness measurements for two-dimensional flakes. The measurement methods, the uncertainty evaluation, and a comparison and analysis of the results are reported in this manuscript. The data and results of this project will be directly used to support the development of an ISO standard.

With the increasing suitability of GR2Ms for industrial applications, there are numerous industrial variations that may erroneously be labelled as 'graphene'. The crucial question from users and producers is 'What is this material?' [12]. ISO/TS 80004-13 'Graphene and Related 2D Materials' [13,14] defines the terms 'graphene', 'few-layer graphene' and 'graphite'; graphene is a single layer of sp²-hybridized carbon, whereas few-layer graphene comprises three to ten layers of graphene. Therefore, the number of layers, or the thickness, is a critical parameter for distinguishing the form of material present. Both parameters are widely characterized by transmission electron microscopy (TEM) and atomic force microscopy (AFM). AFM and TEM provide two different measures of the specimen cross section: TEM provides the number of layers, while AFM measures the specimen height. AFM is frequently used because an AFM instrument is less expensive to purchase and operate than a TEM instrument, and TEM sample preparation requires a high level of expertise. AFM provides three-dimensional measurements, with a lateral resolution of 0.1 nm and a vertical resolution of 0.01 nm achievable [15][16][17][18]. In 2004, when Andre K Geim et al isolated graphene for the first time, AFM was employed as one of the measurement methods to prove the two-dimensional monolayer structure of graphene [19]. However, due to the large specific surface area of the graphene material, the interaction with the substrate and the convolution effect of the AFM probe, the thickness of single-layer graphene reported in the literature ranges from 0.4 to 1.7 nm [20][21][22][23][24][25][26]. The theoretical thickness of graphene (0.34 nm) is also not applicable to other chemical forms of this 2D material. Reproducible and comparable measurement data are therefore needed, but such data are affected by AFM calibration, sample preparation, measurement conditions and data analysis.
ISO15329 [27] describes AFM calibration, including the selection of certified reference materials (step-height measurement standards), the calibration of the scan axes (basic calibration: adjustment and measurements) and data analysis (the line-by-line 3W method, the area-based 3W method and the histogram method). This document gives users general guidelines for AFM calibration that can be applied over a wide range of thicknesses, but detailed calibration conditions for small thicknesses, especially below 1 nm, still need to be specified. For example, ISO15329 describes the advantage of the histogram method over the 3W methods in terms of its sensitivity to surface contamination, but it does not give the operating conditions, which are key factors for comparable measurement data. The same applies to ISO/TS 21356-1 'Nanotechnologies - Structural characterization of graphene - Part 1: Graphene from powders and dispersions', which guides the structural characterization of graphene from powders and dispersions using TEM, SEM, AFM and Raman spectroscopy [28]. In this document, thickness measurement using AFM is described only in general terms: although a measurement protocol is given in the informative Annex B, the data analysis covers only the flake size and not the thickness. As a result, a measurement process that includes data analysis is needed to obtain comparable results.
To obtain well-understood, reproducible and comparable measurements, international interlaboratory comparisons are required. An international comparison on the thickness measurement of graphene oxide flakes using AFM, undertaken under the auspices of the Versailles Project on Advanced Materials and Standards (VAMAS), is reported here. VAMAS is an international pre-normative organization, one of the aims of which is to organize interlaboratory studies. VAMAS is organized into a number of technical working areas, including TWA 41 'Graphene and Related 2D materials', within which this study resides. National Metrology Institutes (NMIs), Designated Institutes (DIs), industry and universities from around the world participated in this VAMAS study.
Pre-normative VAMAS-led interlaboratory comparisons regularly lead to international ISO standards or reference materials in nanotechnologies and two-dimensional materials [29]. This study on graphene oxide directly supports the publication of 'ISO/PWI 23879 Nanotechnologies-Structural Characterization of Graphene oxide flakes thickness and lateral size measurement by SEM and AFM' which is under development in ISO/TC 229/JWG2 (Nanotechnologies, Measurement and Characterization).
This paper reports the results of the international intercomparison and describes a method for measuring the thickness of graphene oxide flakes using AFM, including the uncertainty evaluation.

Sample preparation
A commercial GO dispersion was purchased from Graphenea (Graphenea, San Sebastián, Spain). These GO flakes were mixed with ethanol at a concentration of 0.5 mg ml⁻¹. The mixture was agitated by vortex oscillation to form a dispersion of GO flakes. The agitation is a necessary prerequisite for the subsequent deposition of the GO flakes and enables well-separated GO flakes to be obtained on a substrate surface. The GO/ethanol dispersions were diluted a further 10 times with ethanol and then, to prepare each sample, approximately 10 μl of the dispersion was deposited onto a ∼1 cm × 1 cm fresh mica substrate at room temperature using a pipette. The deposited samples were dried overnight in petri dishes within a clean room, and the dried samples were then stored in Fluoroware (Entegris, USA) containers and sealed under vacuum in a plastic bag before being sent to the participants.

Method for imaging surface topography
The participants were asked to image a total of 10 flakes using AFM and to measure the thickness of each flake. Participants were sent a detailed protocol including imaging and analysis methods, which is included in the SI.
Participants were asked to calibrate the Z-axis of the AFM scanner by imaging an appropriate step-height standard, such as a certified reference material (CRM). Participants were advised to image in intermittent-contact mode using typical intermittent-contact mode cantilevers with a force constant of ∼40 N m⁻¹, a resonant frequency of 300 kHz and a probe apex size of 8-12 nm. The scan parameters were optimized, and each participant then scanned a 20 μm × 20 μm area, large enough to contain multiple flakes, in order to locate individual GO flakes for detailed analysis. This was followed by higher-resolution imaging of ten individual flakes using smaller scan sizes of 5 μm × 5 μm or 2 μm × 2 μm at a higher pixel density of 512 pixels × 512 pixels. The lateral distance over which a height profile is plotted depends on the size of the GO flake. Generally, the length of substrate included in the profile is chosen to be close to, or a little longer than, the lateral extent of the GO flake, after which the peak centers x_c1 and x_c2 are determined.

Participant method for data analysis
Participants were asked to analyze their AFM images using a set procedure, briefly summarized here. Firstly, the images were flattened to remove tilt or bow in the image; this was undertaken by excluding the flakes observed and applying a flattening algorithm that only considers the substrate. Only flakes which had a clear outline and were not overlapping were measured. For each flake, at least three horizontal profiles of the GO flake were recorded, and the Z-axis data from these three profiles were converted into a height distribution histogram. Figure 1 shows an example analysis based on generating a height profile (figure 1(b)) along a horizontal white line labelled in the topographic image (figure 1(a)), which in this case extends across the whole width of the image. The height data of a GO flake and substrate (figure 1(b)) can be converted into a height distribution histogram with a bin size of 0.02 nm using image analysis software, as shown in figure 1(c). The two peaks observed in the histogram, one for the flake height and one for the substrate, were then each fitted to a Gaussian function to obtain the maximum-frequency height values (peak center, x_c) [30]. The two Gaussian curves fitted to the data in figure 1(c) are described by equation (1), where y is the value of the ordinate, in counts per bin size; A is the area of the peak, in nm counts per bin size; w is a scaling parameter describing the width of the peak, in nm; x is the value of the abscissa (height in this case), in nm; and x_c is the abscissa value of the peak center, in nm. The difference between the peak center of the GO flake peak (x_c2) and the peak center of the substrate peak (x_c1) is the thickness of the GO flake.
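The histogram analysis described above can be sketched in Python as follows. This is a minimal illustration and not the participants' software: the area-normalized Gaussian form with w equal to twice the standard deviation is an assumed parametrization of A, w and x_c, the function names `gauss`, `two_gauss` and `flake_thickness` are hypothetical, and the height data are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, A, w, xc):
    # Area-normalized Gaussian; w = 2*sigma is an ASSUMED convention.
    return A / (w * np.sqrt(np.pi / 2)) * np.exp(-2 * (x - xc) ** 2 / w ** 2)

def two_gauss(x, A1, w1, xc1, A2, w2, xc2):
    # Sum of the substrate peak (index 1) and the flake peak (index 2).
    return gauss(x, A1, w1, xc1) + gauss(x, A2, w2, xc2)

def flake_thickness(heights_nm, bin_nm=0.02):
    """Return x_c2 - x_c1 from a two-Gaussian fit to the height histogram."""
    h = np.asarray(heights_nm, dtype=float)
    edges = np.arange(h.min(), h.max() + 2 * bin_nm, bin_nm)
    counts, edges = np.histogram(h, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Initial guesses: split the heights at the midpoint of their range.
    mid = 0.5 * (h.min() + h.max())
    lo, hi = h[h < mid], h[h >= mid]
    p0 = [lo.size * bin_nm, 2 * lo.std(), lo.mean(),
          hi.size * bin_nm, 2 * hi.std(), hi.mean()]
    popt, _ = curve_fit(two_gauss, centers, counts, p0=p0)
    return popt[5] - popt[2]

# Synthetic profile data (hypothetical): substrate near 0 nm, flake near 0.95 nm.
rng = np.random.default_rng(0)
profile = np.concatenate([rng.normal(0.0, 0.05, 600),
                          rng.normal(0.95, 0.06, 400)])
thickness = flake_thickness(profile)
print(f"fitted thickness: {thickness:.2f} nm")
```

Fitting both peaks simultaneously, rather than one at a time, keeps the sketch short; fitting each peak separately, as the text describes, should give the same peak centers when the peaks are well separated.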

Comparison of the data analysis methods
As a compromise between the workload and analyzing sufficient data, three height profiles per flake were initially chosen, which were analyzed manually by each participant. The average thickness and standard deviation calculated from 30 thickness measurements (three height profiles per flake for a total of 10 flakes) are shown in figure 2 (Three height profiles).
To validate and compare this analysis, two other analysis routines were studied. Firstly, a self-compiled program was written to automatically analyze all height profiles. This analyzed each line along the scanning direction in the AFM image and, if the profile line contained enough pixel data points for both the GO flake and the substrate, the GO thickness for this profile line was calculated using the histogram method as discussed above. The average thickness and standard deviation calculated from all height profiles per flake for a total of 10 flakes are shown in figure 2 (All height profiles).
A second self-compiled program used the height distribution from all pixel data points in an AFM image encompassing both a GO flake and the substrate (figure 2, Whole area).
Within this interlaboratory comparison (ILC), all participants returned both their raw data and their individually analyzed results using the protocol provided. This meant that the alternative data analysis processes using the self-compiled programs could be used to analyze all the raw images provided by the 12 participants. Figure 2 shows that the three different data analysis routines have a slight impact on the average value of the measured GO flake height, but that these differences are typically within the standard deviation of the measurements. Therefore, the three height profiles method is a reasonable analysis routine for the ILC.
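The line-by-line selection logic of such an automated routine might look like the sketch below. It is not the self-compiled program used in the study: the function name `row_thicknesses`, the `min_points` cut-off and the global midpoint threshold are assumptions, and the per-line estimator is a simple stand-in (difference of the medians of the two height populations) rather than the histogram fit.

```python
import numpy as np

def row_thicknesses(image_nm, min_points=20):
    """Per-line flake thickness for every scan line with enough data.

    Stand-in estimator: median(flake pixels) - median(substrate pixels).
    """
    img = np.asarray(image_nm, dtype=float)
    # ASSUMED: one global threshold midway between low and high heights.
    threshold = 0.5 * (np.percentile(img, 1) + np.percentile(img, 99))
    results = []
    for row in img:
        flake = row[row >= threshold]
        substrate = row[row < threshold]
        # Skip lines without enough pixels on both flake and substrate.
        if flake.size < min_points or substrate.size < min_points:
            continue
        results.append(np.median(flake) - np.median(substrate))
    return np.array(results)

# Synthetic flattened image (hypothetical): 1.0 nm square flake on a flat substrate.
rng = np.random.default_rng(1)
image = rng.normal(0.0, 0.05, size=(64, 64))
image[16:48, 16:48] += 1.0

t = row_thicknesses(image)
mean_t, std_t = t.mean(), t.std(ddof=1)
print(f"{t.size} lines used, thickness = {mean_t:.2f} ± {std_t:.2f} nm")
```

Only the scan lines that cross the flake contribute, which mirrors the requirement that a profile line contains enough data points for both the flake and the substrate.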

Effect of using different data analysis programs
The effect of using different data analysis packages to measure the height of GO flakes was investigated. For this, one AFM was used to image five GO flakes using intermittent-contact mode at a drive amplitude of 100 mV. Ten height profiles of each flake were measured. These were analyzed using Origin (OriginLab Corp, Northampton, USA), Igor Pro (WaveMetrics, Inc., Portland, USA) and a self-compiled program written in Python (Python Software Foundation, Virginia, USA) by participant 3. Using each of the three programs, the average thickness of each GO flake and the associated uncertainty were calculated, and the results are shown in table 1. As seen from table 1, the programs were in excellent agreement with each other, with differences between them of less than four parts in ten thousand, which can be treated as negligible compared to other sources of uncertainty. This demonstrates that different data analysis programs have a negligible effect on the GO thickness calculated using this data analysis process.

Example uncertainty evaluation
For all data from each participant, three components were considered as part of the evaluation of the uncertainty budget: (i) the instrument Z-axis displacement measurement uncertainty acquired from the calibration of the Z piezoelectric scanner (s_ci), (ii) the measurement repeatability, which corresponds to the standard deviation of the thickness measurements of three profiles for each flake measured (s_ri), and (iii) the sample uniformity, which corresponds to the standard deviation of the thickness of twenty different flakes (s_Hi).
Figure 2. Participants' results for GO thickness using different data analysis methods: three height profiles (*), all height profiles and whole area. Note: * data obtained from participants; ** raw data could not be analyzed with available algorithm software.
Here, the uncertainty evaluation from the Participant 3 laboratory is shown as an example. The instrument measurement uncertainty is evaluated based on the AFM calibration. The AFM calibration of this laboratory was performed using a strontium titanate CRM consisting of ∼0.4 nm unit-cell SrTiO3 (001) surface steps [31] (GBW(E) 136709, National Institute of Metrology, China, https://ncrm.org.cn/Web/Ordering/MaterialDetail?autoID=22380). The step height was certified using x-ray diffraction traceable to the SI system of measurements (here the certified value is 0.39 nm ± 0.29 nm (k = 2)), so the sub-uncertainty from the AFM calibration is s_ci = U_CRM/k = 0.29/2 = 0.145 nm. Note that for 2D materials with height variations of only 1 nm or less, this uncertainty due to the AFM calibration is relatively large.
Based on the protocol, the average thickness of each of the GO flakes in the sample was calculated from three measurements (line profiles), so the sub-uncertainty for measurement repeatability corresponds to the standard deviation of the three repeat measurements, s_ri = 0.07 nm. The sub-uncertainty from the sample homogeneity corresponds to the standard deviation of the average thickness values of twenty flakes, s_Hi = 0.10 nm, which is included in the SI.
As a result, the combined uncertainty u_i from the Participant 3 laboratory was calculated from these three components using equation (2).
Table 2 shows the interlaboratory comparison results from all 12 participants who submitted measurement results. This includes the average GO flake thickness (T_i) and the uncertainties from (a) the measurement repeatability (s_ri), (b) the sample homogeneity (s_Hi) and (c) the AFM calibration (s_ci) for all the participants. The measurement uncertainty from the measurement repeatability (s_ri) is the maximum standard deviation of three repeat measurement values on the same flake, across all flakes measured. The measurement uncertainty from sample homogeneity (s_Hi) is the standard deviation of the average thickness values for ten flakes. The uncertainty from the AFM calibration is described here. All laboratories recorded that their AFM was calibrated. The s_ci from participant L1 was calculated from a metrological AFM. The s_ci values for participants L2 to L4, L7 and L9 to L11 were evaluated after calibration with reference materials. Participant L8 used the theoretical value of a monatomic step on a HOPG crystal surface (0.335 nm) to calibrate the Z-axis. The remaining participants, L5, L6 and L12, reported that their AFMs were calibrated, but provided no information on the uncertainty or the method used. For participants L5, L6 and L12, s_ci was therefore taken as the maximum sub-uncertainty from all of the participants, although this may overestimate the uncertainty for these participants. However, table 2 shows that the instrument Z-axis uncertainty is typically the largest source of uncertainty. As shown in table 2, the standard uncertainty (u_i) was calculated based on equation (2).
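As a worked illustration of combining the three components quoted for Participant 3, the sketch below assumes that the components are independent and combined as a root sum of squares, which is the usual approach for uncorrelated standard uncertainties. The function name `combined_uncertainty` is hypothetical, and the resulting value is an illustration under that assumption rather than a figure quoted from table 2.

```python
import math

def combined_uncertainty(s_c, s_r, s_H):
    # ASSUMED: independent components combined in quadrature.
    return math.sqrt(s_c ** 2 + s_r ** 2 + s_H ** 2)

# Participant 3 components from the text, in nm.
s_c3 = 0.29 / 2   # AFM calibration, U_CRM / k with k = 2
s_r3 = 0.07       # measurement repeatability
s_H3 = 0.10       # sample homogeneity

u_3 = combined_uncertainty(s_c3, s_r3, s_H3)
print(f"u_3 = {u_3:.2f} nm")  # prints "u_3 = 0.19 nm"
```

Under this assumption the calibration term contributes more than half of the combined variance, consistent with the remark that the Z-axis uncertainty is typically the largest source.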
The comparison reference value (CRV), designated as T_ref, and its standard uncertainty (u_ref) were estimated as the uncertainty-weighted mean [32] using equations (4) and (5) and the data from the 12 participants, where n is the number of participants (n = 12). The results from table 2 are replotted in figure 4, which shows the reported thicknesses and associated combined uncertainties (u_i) of the GO flakes as measured by the 12 participants, with a solid horizontal line representing the CRV. As shown in figure 4, the thicknesses and associated uncertainty values reported by the participants all overlap with the CRV and its uncertainty, except for Participant 8. As shown in table 2, the sub-uncertainty from the sample homogeneity for Participant 8 is smaller than that of some of the participants, and the sub-uncertainty from the measurement repeatability is similar to that of most participants, indicating that the sample homogeneity and measurement process are not the key factors for this outlier. However, the sub-uncertainty from the AFM calibration is smaller than that of most participants. Moreover, Participant 8 used the theoretical value of a monatomic step on a HOPG crystal surface (0.335 nm) to calibrate the Z-axis rather than a certified reference material. The AFM calibration may therefore be the main reason why the Participant 8 result lies outside the CRV interval. Furthermore, it is worth mentioning that the relative weights of the three uncertainty sources (AFM calibration, sample homogeneity and measurement repeatability) differ between laboratories, indicating that AFM calibration and a standardized measurement process are key for valid and comparable measurement results in such small-scale investigations.
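An uncertainty-weighted mean of this kind is commonly computed with inverse-variance weights, as in the sketch below. The weighting 1/u_i² and the function name `weighted_reference_value` are assumptions, and the two input results are hypothetical values chosen for easy arithmetic, not data from table 2.

```python
import math

def weighted_reference_value(values, uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty."""
    # ASSUMED: weights w_i = 1 / u_i**2 and uncorrelated results.
    weights = [1.0 / u ** 2 for u in uncertainties]
    t_ref = sum(w * t for w, t in zip(weights, values)) / sum(weights)
    u_ref = 1.0 / math.sqrt(sum(weights))
    return t_ref, u_ref

# Hypothetical results from two laboratories, in nm.
t_ref, u_ref = weighted_reference_value([0.90, 1.00], [0.10, 0.20])
print(f"T_ref = {t_ref:.3f} nm, u_ref = {u_ref:.3f} nm")
```

With these inputs the reference value is pulled towards the laboratory with the smaller uncertainty, and u_ref is smaller than either individual uncertainty.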

Evaluation of results
The equivalence statements were calculated for each of the laboratories following the ISO standard 13528 [32]. The standard uncertainty associated with the value of the degree of equivalence, u(d_i), was calculated from the combination of the standard uncertainties of the individual data (u_i) and the standard uncertainty of the CRV (u_ref), as shown in table 3. A coverage factor of k = 2 was applied in the calculation of the expanded uncertainty U(d_i) using equations (9) and (10). The results of the evaluation are shown in table 3 and figure 5. It can be deduced that the degree of equivalence is excellent except for the outlier of Participant 8, indicating that the measurement method is qualified, which is consistent with figure 4. It should also be noted that the determined uncertainty for each participant is typically large with respect to the CRV and its associated uncertainty but, as discussed previously, this may be due to the uncertainty in the AFM calibration being relatively large compared to the thickness of a single layer of GO.
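A degree-of-equivalence check of this type can be sketched as follows. The definitions used here are assumptions for illustration: d_i is taken as T_i minus T_ref, u(d_i) as the quadrature sum of u_i and u_ref (neglecting any correlation between a result and the reference value), and a result is treated as consistent when |d_i| does not exceed U(d_i) = k·u(d_i). The function name and all input numbers are hypothetical.

```python
import math

def degree_of_equivalence(t_i, u_i, t_ref, u_ref, k=2.0):
    """Return (d_i, U(d_i), consistent) for one laboratory."""
    d_i = t_i - t_ref
    # ASSUMED: quadrature combination, correlation with T_ref neglected.
    u_d = math.sqrt(u_i ** 2 + u_ref ** 2)
    U_d = k * u_d
    return d_i, U_d, abs(d_i) <= U_d

# Hypothetical laboratories compared against a hypothetical CRV, in nm.
t_ref, u_ref = 0.90, 0.05
lab_a = degree_of_equivalence(0.95, 0.19, t_ref, u_ref)
lab_b = degree_of_equivalence(1.40, 0.10, t_ref, u_ref)
for name, (d, U, ok) in (("A", lab_a), ("B", lab_b)):
    print(f"lab {name}: d = {d:+.2f} nm, U(d) = {U:.2f} nm, consistent = {ok}")
```

In this illustration laboratory A agrees with the reference value within its expanded uncertainty, while laboratory B would be flagged as an outlier.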

Conclusions and outlook
In conclusion, an international interlaboratory comparison of the measurement of graphene oxide flake thickness using AFM has been undertaken by 12 participants. In the study of the measurement conditions, the manual analysis method (via a histogram) with three height profiles was found to be reasonable for the ILC, because the standard deviations from the different data analysis processes are similar and this method is less time-intensive. Furthermore, different data analysis software has little influence on the variation of the results, with differences of less than four parts in ten thousand. This is negligible compared to the variance of the measurements themselves.
The results of the 12 participants showed an average flake thickness of 0.93 nm with a standard deviation of 0.08 nm. One possible reason for the outlying result of Participant 8 is that the AFM was not calibrated using a certified reference material. The uncertainty evaluation shows that AFM calibration and a standardized measurement protocol are key factors for valid and comparable measurement results at this scale. The degree of equivalence results are consistent, and thus the measurement method is qualified. The results of this project will be directly used as data to support the development of an ISO standard on the measurement of graphene oxide flakes using SEM and AFM (ISO DTS 23879).