Geant4 silver anniversary: 25 years enabling scientific production

This paper summarizes Geant4 contribution to scientific research over the past 25 years through a scientometric analysis of the results with which it has been associated. The scientometric data collected from scholarly literature and databases are evaluated with methods pertaining to econometrics and ecology to quantify relevant traits, diversity and disparity in their scientific and geographic distributions, and to identify statistically significant trends. The analysis reviews the contribution of Geant4 to the field — experimental particle physics — that originally motivated its development and highlights its role in other research domains including nuclear physics and engineering, astrophysics and space science, biomedical physics, archaeology and the cultural heritage.


Introduction
Geant4 [1][2][3] turned 25 years old in 2023. Its main reference [1], published in 2003, had received more than 16000 citations in the Web of Science™ [4] by September 2023, making it the most cited publication in Particle and Fields Physics, Nuclear Physics, Nuclear Science and Technology, and Instruments and Instrumentation journals at the time of writing this paper. The citations come from a wide variety of research areas, which demonstrates the broad scope of scientific results that Geant4 has enabled.
This paper briefly summarizes some relevant characteristics of the research production associated with Geant4 through the analysis of the citations of its main reference [1] in the scholarly literature. Along with some basic traits of these publications pertaining to descriptive statistics, it reports a scientometric investigation of Geant4 citations by means of statistical methods derived from econometrics and quantitative ecology, which highlight characteristics such as fairness and diversity, and their trends, in the scientific production enabled by Geant4.

Brief chronicle of Geant4 birth
Geant4 is an object oriented toolkit for the simulation of the passage of particles through matter. Its development was initially motivated by the requirements of physics experiments at high energy hadron colliders under construction in the last decade of the 20th century. Due to intrinsic limitations [5], GEANT 3 could not address their needs of functionality, extensibility, flexibility and long term maintainability.
Early considerations about developing an object oriented Monte Carlo system for particle transport targeted to this experimental domain date back to 1993 [6,7]. A letter of intent [8] and a subsequent formal research proposal [5] originating from these preliminary explorations were submitted to the CERN Detector R&D Committee, which led to the approval of the RD44 project in 1994 to develop Geant4. RD44 had the mandate of creating a detector simulation toolkit for the next generation of high energy physics experiments, namely the experiments at the LHC (Large Hadron Collider), which were under construction at the time of its endorsement in the CERN research program.
RD44 produced the first α-version of Geant4 [9] in April 1997, whose functionality was comparable to that of GEANT 3, and the first β-version in July 1998. Geant4 was first released on 15 December 1998 [10]; since then, new versions of the code have been issued once or twice a year.
Although originally addressed to high energy physics experiments, Geant4 encompasses functionality relevant to other physics domains. The adoption of the object oriented technology and the sound software design conceived in the RD44 project are the key factors of Geant4 multidisciplinary capabilities.

Scientometric analysis
The scientometric investigation concerns the publications citing Geant4 main reference [1]; these papers are representative of the scientific production that Geant4 has enabled. For comparison with Geant4, similar assessments are performed for the main references of other highly cited physics software systems and for the publication of the discovery of the Higgs boson at the LHC [11,12], which constitutes a landmark in the physics context that motivated the development of Geant4.
The scientometric research is based on data collected from the Web of Science™ (WoS) [4] and has access to the WoS publication records since 1990. The survey encompasses all main types of scholarly literature indexed in the WoS: regular articles, reviews, letters etc. Conference papers are included in the analysis only if tagged as articles in the WoS, i.e. papers generally published in scholarly journals. This constraint ensures access to all the associated scientometric data required in the course of the analysis.
The first stage of the analysis highlights the main characteristics of the literature citing Geant4 through a set of distributions pertaining to descriptive statistics. Further elaboration of the scientometric data exploits statistical methods pertaining to econometrics and ecology to investigate the fairness of the geographical apportionment and the diversity of the scientific production associated with Geant4 and the other target references. These assessments are complemented by statistical inference methods that identify trends in the evolution of the observables; the Mann-Kendall test [13,14] and the Cox-Stuart test [15] are applied for this purpose. The significance level for the rejection of the null hypothesis is set at 0.01 for both tests, unless otherwise specified.
Additionally, the scientometric analysis examines the role of Geant4 in developments of industrial interest through an assessment of patents associated with it.
The analysis uses the R software system [16].
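As an illustration of the trend analysis mentioned above, the following minimal sketch implements a one-sided Mann-Kendall test for a growing trend. It is not the code used in this paper, whose analysis is performed in R; it is a Python re-implementation written for illustration only. The function name mann_kendall_increasing and the yearly counts are hypothetical, and the p-value relies on the usual normal approximation with tie correction, which is generally considered adequate only for series of about ten or more points.

```python
import math
from statistics import NormalDist


def mann_kendall_increasing(series):
    """One-sided Mann-Kendall test for an increasing trend.

    Returns (S, z, p), where S is the Mann-Kendall statistic, z the
    continuity-corrected normal score and p the one-sided p-value.
    Hypothetical helper written for illustration only.
    """
    n = len(series)
    # S counts concordant minus discordant pairs (later value vs earlier value).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis, corrected for tied values.
    tie_counts = {}
    for value in series:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Continuity correction; S == 0 maps to z == 0 (also avoids 0/0).
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p_value = 1.0 - NormalDist().cdf(z)
    return s, z, p_value


# Synthetic yearly counts, NOT data from this paper.
yearly_counts = [12, 15, 14, 20, 23, 22, 30, 31, 35, 40, 38, 45]
s, z, p = mann_kendall_increasing(yearly_counts)
print(f"S = {s}, z = {z:.2f}, reject H0 at 0.01: {p < 0.01}")
```

The null hypothesis of no trend is rejected in favour of a growing trend when the returned p-value is below the chosen significance level, 0.01 in this paper.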

General features of Geant4 citations
The scientometric research reported in this paper is based on the analysis of the citations of reference [1] registered in the Web of Science. This paper is the first publication about Geant4 in a refereed journal; previous information about it only appeared in institutional documents and brief conference communications, which lack coverage in the WoS for extensive scientometric analysis.
The citations of Geant4 reference [1] amounted to 16023 on 14 September 2023. The WoS identifies it as the most cited paper among more than 1.4 million publications classified in the categories the hosting journal belongs to: Particle and Fields Physics (encompassing 436176 publications in total), Nuclear Physics (335328 publications), Nuclear Science and Technology (380632 publications), and Instruments and Instrumentation (696376 publications).
Geant4 stands out among Monte Carlo particle transport codes commonly used in experimental particle and nuclear physics with respect to the number of citations received by the associated reference papers. The distribution of the citations related to these codes (EGSnrc [17], FLUKA [18], ITS [19], MARS [20], MCNP6 [21], OpenMC [22], PENELOPE [23], PHITS [24], Serpent [25], TRIPOLI-4® [26]) is illustrated in figure 1. Caution should be exercised in drawing quantitative conclusions from this plot for various reasons: the references of Geant4 and of other Monte Carlo codes are not cited in a large number of papers that mention these codes and use them to produce the scientific results they document [27]; additionally, Monte Carlo transport codes that are not associated with a journal publication, and are consequently not indexed in the WoS, are not represented in figure 1.
The number of citations of [1], the number of countries and the number of affiliations in the citing publications have been steadily growing, as is illustrated in figure 2.
The longevity of Geant4 citations is remarkable in the context of scientific software: for instance, one can observe in figure 3 a rapid decline of the citations of the SHELX [28] software system, whose reference paper, published in 2008, has collected more than 76000 citations. The citations of other currently popular codes exhibiting a growing trend, e.g. Quantum Espresso [29], span a shorter time range than the Geant4 lifetime. Note that the fast decrease of the citations of the observation of the Higgs boson after its discovery is consistent with a previous scientometric assessment [30] of the relatively short mean life of the citations of particle physics discoveries. This drop could be related to the citation pattern concerning foundational discoveries mentioned in [31], which quickly become so familiar that they do not need a citation.
A distinctive characteristic of Geant4 with respect to other highly cited physics software systems is its ability to satisfy the requirements of scientific projects of different sizes, from those pursued by small research teams to large-scale collaborations enrolling thousands of members, as is illustrated in figure 4. It is interesting to note in figure 4c that the citations of the observation of the Higgs boson, which is the product of huge experimental collaborations, mainly derive from small teams of authors (presumably theoretical physicists) and only marginally from large collaborations.

Geographical distribution
The citations of Geant4 reference [1] derive from 114 countries. Among them, there are large differences in population, wealth and scientific research facilities. The scientometric analysis investigated the degree of inequality in the geographical origin of the authors of the citing papers by means of econometric methods.
Inequality measures [32] quantify the degree of non-uniformity of the distribution of a characteristic within a data set. They are widely used, especially in evaluating the distribution of income.
The Gini index [33] is the most widely used measure of inequality in economics; it indicates the extent to which a distribution deviates from a situation of perfect equality. It is calculated as
$$G = 2 \int_0^1 \left[ x - L(x) \right] dx$$
where $L(x)$ is the Lorenz curve, which represents the cumulative share of total values of the analyzed variable (e.g., income) against the cumulative proportion $x$ of elements of the population being analyzed. The diagonal line shown in figure 5 corresponds to perfect equality in graphical representations of the Lorenz curve. A Gini index larger than 0.5 is generally perceived in economics as a measure of unfair income distribution. Other common measures of inequality are the Pietra index [34], also related to the Lorenz curve, defined as
$$P = \max_{0 \le x \le 1} \left[ x - L(x) \right]$$
and the Atkinson index [35].
Figure 5 shows a graphical representation of the Gini and Pietra indices associated with the distribution of countries in the citations of Geant4 and SHELX in 2021. In each panel, the coloured area is the integral in the definition of the Gini index.
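The relation between the Lorenz curve and the Gini and Pietra indices can be made concrete with a short sketch. The code below is an illustrative Python implementation, not the R code used for this paper: it builds the empirical Lorenz curve of a sample, integrates it with the trapezoidal rule to obtain the Gini index, and takes the largest vertical distance from the diagonal as the Pietra index. The function names and the per-country counts are hypothetical.

```python
def lorenz_curve(values):
    """Empirical Lorenz curve of non-negative values, as two lists (x, L).

    x[k] is the cumulative proportion of elements and L[k] the cumulative
    share of the total, after sorting in increasing order. Both lists start
    at the origin (0, 0) and end at (1, 1).
    """
    ordered = sorted(values)
    total = sum(ordered)
    if not ordered or total <= 0:
        raise ValueError("values must be non-empty with a positive sum")
    n = len(ordered)
    x_points, l_points = [0.0], [0.0]
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        x_points.append(k / n)
        l_points.append(running / total)
    return x_points, l_points


def gini_index(values):
    """Gini index: twice the area between the diagonal and the Lorenz curve."""
    x, l = lorenz_curve(values)
    # Trapezoidal area under the piecewise-linear Lorenz curve.
    area_under = sum(
        (x[k] - x[k - 1]) * (l[k] + l[k - 1]) / 2.0 for k in range(1, len(x))
    )
    return 1.0 - 2.0 * area_under


def pietra_index(values):
    """Pietra index: largest vertical distance between diagonal and Lorenz curve."""
    x, l = lorenz_curve(values)
    # For a piecewise-linear curve the maximum is attained at a vertex.
    return max(xi - li for xi, li in zip(x, l))


# Synthetic numbers of citing papers per country, NOT data from this paper.
papers_per_country = [1, 2, 2, 5, 8, 13, 40, 120]
print(f"Gini   = {gini_index(papers_per_country):.3f}")
print(f"Pietra = {pietra_index(papers_per_country):.3f}")
```

Both indices vanish when all countries contribute equally and approach 1 when a single country accounts for nearly all the citing papers in a large sample.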

Diversity of research areas
The object oriented design of Geant4 and its rich functionality enable a wide variety of applications in multidisciplinary domains. The publications that cite Geant4 represent more than 60 research areas, defined according to the criteria adopted in the WoS. Among them, there are domains where Monte Carlo particle transport codes have been traditionally used for a long time, such as Physics, Nuclear Science and Technology, Astronomy and Astrophysics, Instruments and Instrumentation, Radiology, Nuclear Medicine and Medical Imaging; one can also identify less conventional application areas, such as Geology, Archaeology, Food Science, Meteorology and many more.
We investigated the diversity in the research areas associated with Geant4 citations, drawing concepts and methods from the domain of ecology. Biodiversity measures the richness and the complexity of a community, taking into account the number of species it hosts and their abundance; it is related to the concept of entropy in information theory.
Hill indices, also known as Hill numbers [36], have recently achieved a wide consensus as measures of diversity [37]: they combine the concepts of richness, evenness and dominance, and their metrological properties allow the comparison of different systems. They are defined as:
$$^{q}D = \left( \sum_{i=1}^{S} p_i^{q} \right)^{1/(1-q)}$$
where $S$ is the number of species and $p_i$ is the proportional abundance of species $i$ in the sample. The parameter $q$ is defined as the order of the index and is related to the sensitivity to rare species in the corresponding measure of diversity. Hill numbers can be interpreted as effective numbers of species, i.e. as the number of equally abundant species necessary to produce the observed value of diversity.
Several traditional diversity indices [38], such as species richness, Shannon entropy [39] and Simpson index [40], can be derived from Hill numbers: the Hill index of order 0 corresponds to the number of species $S$; the Hill index of order 1 is defined as the limit
$$^{1}D = \lim_{q \to 1} {}^{q}D = \exp\left( -\sum_{i=1}^{S} p_i \ln p_i \right)$$
and corresponds to the exponential of the Shannon diversity index; the Hill index of order 2 is the reciprocal of Simpson index.
A sample of the analysis of diversity of research areas in the scientometric data is illustrated in figure 7. The plot shows the time profile of the Hill index of order 1 calculated over the sets of research areas associated with the citations of Geant4, SHELX and Quantum Espresso reference papers, and of the discovery of the Higgs boson. Geant4 citations exhibit the largest diversity. Both Mann-Kendall and Cox-Stuart trend tests reject the null hypothesis in favour of the alternative hypothesis of a growing trend in the data distribution related to Geant4; their response is not univocal about the data distributions related to SHELX and Quantum Espresso.
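The interpretation of Hill numbers as effective numbers of species can be checked with a small numerical sketch. The Python function below is an illustrative implementation, not the code used for this paper; the name hill_number and the sample abundances are hypothetical. In the scientometric context, the "species" are the research areas and the abundances are the numbers of citing papers assigned to each area.

```python
import math


def hill_number(abundances, q):
    """Hill number of order q for a list of non-negative abundances.

    Order 0 gives the richness, order 1 the exponential of the Shannon
    entropy (defined as a limit), order 2 the reciprocal of the Simpson
    index. Hypothetical helper written for illustration only.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must have a positive sum")
    # Proportional abundances; absent species (zero counts) are ignored.
    proportions = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        shannon = -sum(p * math.log(p) for p in proportions)
        return math.exp(shannon)
    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))


# Synthetic counts of citing papers per research area, NOT data from this paper.
even_areas = [25, 25, 25, 25]
uneven_areas = [97, 1, 1, 1]
for label, counts in (("even", even_areas), ("uneven", uneven_areas)):
    values = [hill_number(counts, q) for q in (0, 1, 2)]
    print(label, [round(v, 3) for v in values])
```

For equally abundant areas all orders return the number of areas, while for a sample dominated by one area the indices of order 1 and 2 drop towards 1, reflecting the decreasing weight given to rare areas as the order grows.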

Patents
Geant4 has enabled many application developments of industrial and commercial relevance. Several hundred patents are associated with it; the data are shown in figure 8, which reports the number of patents related to Geant4 issued by the United States Patent and Trademark Office (USPTO) between 2002 and 2022. The increasing relevance of Geant4 as a supporting document for patents, which one can qualitatively observe in figure 8, is confirmed by the outcome of the Mann-Kendall and Cox-Stuart trend tests: both tests reject the null hypothesis of absence of any trend in the period subject to investigation with 0.01 significance, in favour of the alternative hypothesis of a growing trend.
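The Cox-Stuart test applied to yearly counts such as those of figure 8 can be sketched as a sign test on paired observations. The Python code below is an illustrative implementation under common conventions (the middle point is dropped for an odd number of observations, and tied pairs are discarded); it is not the code used for this paper, and the function name and the yearly counts are hypothetical.

```python
from math import comb


def cox_stuart_increasing(series):
    """One-sided Cox-Stuart test for an increasing trend.

    Each value in the first half of the series is paired with the value
    half a series later; under the null hypothesis of no trend, the number
    of positive differences follows a Binomial(m, 0.5) distribution, where
    m is the number of non-tied pairs.
    Returns (number of positive differences, m, one-sided p-value).
    """
    n = len(series)
    half = n // 2
    offset = n - half  # equals half for even n; skips the middle point for odd n
    differences = [series[i + offset] - series[i] for i in range(half)]
    differences = [d for d in differences if d != 0]  # discard ties
    m = len(differences)
    positives = sum(1 for d in differences if d > 0)
    # Exact binomial tail probability P(X >= positives) with X ~ Binomial(m, 0.5).
    p_value = sum(comb(m, j) for j in range(positives, m + 1)) / 2 ** m
    return positives, m, p_value


# Synthetic yearly counts over 21 years, NOT the patent data of this paper.
yearly_counts = [1, 0, 2, 3, 2, 5, 6, 4, 8, 9, 11,
                 10, 14, 15, 13, 18, 20, 22, 21, 25, 27]
positives, m, p = cox_stuart_increasing(yearly_counts)
print(f"{positives} positive differences out of {m}, reject H0 at 0.01: {p < 0.01}")
```

Since the smallest attainable p-value is 2^-m, at least seven non-tied pairs are needed for this test to reject the null hypothesis at the 0.01 level; a series of 21 yearly values provides up to ten pairs.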

Conclusion
Our data show that Geant4 has a multidisciplinary nature, providing functionality for the simulation of experimental scenarios in many different scientific fields. Its foundation lies in the object oriented design defined by RD44, which allows the user to understand, extend and customise the toolkit.
Geant4 use has been continuously growing for the past 25 years in an increasing number of institutes and countries. At the time of its silver anniversary, its main reference has established a record of citations both in fundamental physics and in technological research domains. The scientometric analysis summarized in this paper has highlighted the diversity of research areas where Geant4 has enabled the production of scientific results and the fair geographical distribution of its use, as well as its contribution to a number of patents.

Figure 1: Citations of the reference papers associated with Monte Carlo particle transport codes, as reported in the Web of Science.
The growth visible in figure 2 is objectively confirmed by the Mann-Kendall and Cox-Stuart trend tests. These tests reject the null hypothesis of absence of any trend in the data in favour of the alternative hypothesis of a growing trend.

Figure 2: From left to right: number of publications citing Geant4 main reference paper, number of countries and number of affiliations appearing in the citing publications, as a function of the publication year.

Figure 3: From left to right: number of publications citing the reference papers of the SHELX and Quantum Espresso codes, number of citations of the Higgs boson observation by the ATLAS experiment, as a function of the publication year.

Figure 4: From left to right: size of the authors' list in the publications citing Geant4 and SHELX reference papers, and the Higgs boson observation by the ATLAS experiment; the number of authors is binned according to the colour codes documented in the legend.
In the panels of figure 5, the dotted line represents the Pietra index. Together, the coloured area and the dotted line qualitatively indicate lower inequality in the geographical origin of the citations of Geant4 with respect to SHELX. The measured geographical inequality is shown for the two codes in figure 6 as a function of time. One can observe the similar time profiles characterizing the Gini, Pietra and Atkinson indices, despite the different values corresponding to their specific definitions of inequality, and the lower geographical inequality exhibited by Geant4 citations with respect to those of SHELX over the whole period. The drop of the three indices related to Geant4 in the years between 2007 and 2011 was influenced by the wider adoption of Geant4 for detector simulation by international collaborations.

Figure 5: Graphical illustration of inequality in the geographical origin of the citations of Geant4 and SHELX in 2021: the coloured area between the diagonal line and the Lorenz curve (solid red line) represents half the Gini index, while the dotted red line represents the Pietra index. The larger the coloured area, and the longer the dotted segment, the larger the inequality in the countries represented in the authors' list of the citing papers.

Figure 6: Inequality in the distribution of the countries represented in the authors' list of the publications citing Geant4 and SHELX as a function of the publication year: Gini, Pietra and Atkinson indices.

Figure 7: Hill index of order 1 as a function of time, representing the diversity of research areas in the citations of the reference papers of Geant4 and of other physics software systems, and in the citations of the observation of the Higgs boson at the LHC.

Figure 8: Patents associated with Geant4 issued by the United States Patent and Trademark Office