Mathematical diversity of parts for a continuous distribution

The current paper is part of a series exploring how to link diversity measures (e.g., the Gini-Simpson index, Shannon entropy, Hill numbers) to a distribution's original shape and to compare parts of a distribution, in terms of diversity, with the whole. This linkage is crucial to understanding the exact relationship between the density of an original probability distribution, denoted by p(x), and the diversity D in non-uniform distributions, both within parts of a distribution and in the whole. Empirically, our results are an important advance because they allow various parts of a distribution to be compared, noting that systems found in contemporary data often have unequal distributions that possess multiple diversity types and have unknown and changing frequencies at different scales (e.g., income, economic complexity ratings, rankings). To date, we have proven our results for discrete distributions. Our focus here is continuous distributions. In both instances, we do so by linking case-based entropy, a diversity approach we developed, to a probability distribution's shape. This allows us to demonstrate that the original probability distribution g_1, the case-based entropy curve g_2, and the slope of diversity curve g_3 (the plot of c_(a,x) versus c_(a,x)·ln A_(a,x)) are in one-to-one (injective) correspondence. Put simply, a change in the probability distribution g_1 leads to changes in the curves g_2 and g_3. Consequently, any permutation of the initial probability distribution that produces a different shape will produce different graphs g_2 and g_3. By demonstrating the injective property of our method for continuous distributions, we introduce a unique technique to gauge the degree of uniformity as measured by D/c. Furthermore, we present a distinct method to calculate D/c for different shapes of the original continuous distribution, enabling comparison of various distributions and their components.


Introduction
As we have explained elsewhere (Rajaram and Castellani 2020, Rajaram et al 2023), probability distributions are often the first quantitative characteristics of many systems and datasets, which, as Sornette and others have articulated (Newman 2010, Sornette 2009), makes them useful ways to explore diversity, as measurements on a wide range of systems and datasets are well approximated by their shape, particularly as the sample size increases. Given their value, we have developed a program of research exploring diversity within probability distributions. Specifically, we have sought new ways to link diversity measures (e.g., the Gini-Simpson index, Shannon entropy, Hill numbers) to a distribution's original shape and to compare parts of a distribution, in terms of diversity, with the whole (Rajaram and Castellani 2020, Rajaram et al 2023). As we have shown across this research, this linkage is crucial to understanding the exact relationship between the density of an original probability distribution, denoted by p(x), and the diversity D in non-uniform distributions, both within parts of a distribution and in the whole, something the current field has yet to sufficiently address (Chao and Jost 2015, Hsieh et al 2016, Jost 2006, 2018, Leinster and Cobbold 2012, Pavoine et al 2016). This linkage is also empirically useful across the natural and social sciences, given that, in terms of probability distributions, most real-world systems have unequal distributions and consist of multiple diversity types with unknown and changing frequencies at different levels of scale (e.g., income diversity, economic complexity indices, rankings). As part of our program of research, we have proven our results for discrete distributions. Our focus for this paper is continuous distributions. In both instances, our strategy for establishing our diversity linkage is our engagement with the literature on Hill numbers (Jost 2018, Gaggiotti et al 2018, Jost 2006, MacArthur 1965, Hill 1973, Peet 1974).

Research strategy
As we have explained in a series of papers, Hill numbers are defined by a parameter q that gives preference to types with either lower or higher frequencies (Rajaram and Castellani 2020, Rajaram et al 2023), depending on whether 0 < q < 1 or q > 1, respectively. Choosing q = 1 means that each type is weighted in proportion to its relative frequency in the resulting diversity ¹D. We also have that ¹D = e^H, where H is the Shannon entropy of the distribution (Leinster 2021). In terms of advancing our understanding of diversity within distributions, Hill numbers hold a special place because they provide an all-encompassing structure that captures the various dimensions of diversity (MacArthur 1965, Hill 1973, Peet 1974), unifying the principles of richness, evenness, and dominance into a single numeric index. In doing so, Hill numbers enable the assessment and classification of diverse systems across the natural and social sciences, including diversity in ecosystems, where they are most widely used (Alberdi and Gilbert 2019, Gaggiotti et al 2018).
Still, the limitation of Hill numbers is that the precise relationship between the probability of each type within a distribution and the Hill number itself remains undeveloped. Moreover, the original concept of diversity, as proposed by Hill and Jost, is actually insensitive to permutations. This means a shuffling of the original probabilities in g_1 will not change the diversity of the entire distribution.
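To make the permutation insensitivity concrete, here is a minimal sketch in Python (the helper name `hill_q1` is ours, not from the cited papers) that computes ¹D = e^H for a discrete distribution:

```python
import math

def hill_q1(p):
    """First-order Hill number: 1D = exp(H), with H the Shannon entropy (natural log)."""
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return math.exp(h)

p = [0.5, 0.25, 0.15, 0.10]
d = hill_q1(p)                                   # diversity of the original distribution
d_shuffled = hill_q1([0.15, 0.10, 0.5, 0.25])    # same probabilities, rearranged
d_uniform = hill_q1([0.25] * 4)                  # uniform over K = 4 types gives exactly 4
```

Shuffling the probabilities leaves ¹D unchanged, which is precisely the limitation this series of papers addresses.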
Hence the purpose of our program of research. In (Rajaram and Castellani 2016) we introduced our new measure, case-based entropy C_c, a modification of the Shannon-Wiener entropy measure H. As a next step, in (Rajaram and Castellani 2020) we proved a result relating the probability of each type p_i to the total diversity ¹D_K for a discrete probability distribution with K types. In a more recent paper (Rajaram et al 2023) we extended the results by explicitly proving a one-to-one relationship between the original probability distribution g_1, the case-based entropy curve g_2, and the slope of diversity curve g_3. We also showed that the ratio of the diversity of a part P to its cumulative probability, denoted by D_P/c_P, is a measure of the degree of uniformity of the part P. Lastly, we also showed that the original probability distribution can be explicitly reconstructed by looking at the slopes of secants in the slope of diversity curve g_3.
In the current paper, we show that analogous results hold true for continuous distributions with finite entropy (differential entropy, to be more exact). We show that the case-based entropy curve g_2 and the curve of c_(a,x) versus c_(a,x)·ln A_(a,x), which we call the slope of diversity curve g_3, are one-to-one (or injective), i.e., a different probability distribution g_1 gives a different curve for g_2 and g_3. This means that the graphs g_2 and g_3 are determined uniquely by the original probability distribution. A proof of the injectivity establishes the uniqueness of the degree of uniformity of parts as measured by D_P/c_P. It also creates a unique way to compute D_P/c_P for arbitrary probability distributions. We also show that the original density p(x) can be reconstructed by looking at the slopes of tangents in the slope of diversity curve. We note once again that analogous results have been proven for discrete distributions in (Rajaram et al 2023). Hence, this paper is an extension of those results to continuous distributions, for which they have not been proven before.
We consider a general continuous probability distribution with finite entropy, with a random variable X with support (a,b) (with a = −∞ and b = +∞ allowed) and probability density given by p(x). We ask the following question: Is it possible to determine a connection (direct or indirect) between the probability density p(x) and the case-based entropy curve (C_c versus c)? More to the point, does a connection exist between the shape of the case-based entropy curve (C_c versus c) and the probability density p(x)? How can we use the slope of diversity curve g_3 to compute the degree of uniformity of a given part P, and furthermore, how can we reconstruct the original probability distribution g_1 from the slope of diversity curve g_3?

Understanding diversity
As a measure, diversity is used to evaluate the richness and evenness of probability distributions (Jost 2006, MacArthur 1965, Hill 1973, Peet 1974). Richness refers to the number of types in a distribution; evenness refers to how equally likely the types are to occur, as highlighted in various studies. As we have explained elsewhere (Rajaram and Castellani 2020, Rajaram et al 2023), this concept of diversity is rooted in the understanding that if all K types in a discrete probability distribution have the same probability of occurrence, then the diversity should be equivalent to the number of types K. On the other hand, any departure from uniformity in the probabilities will invariably lead to a decrease in the value of diversity. For a continuous distribution, the diversity ¹D_(a,b) is defined as the length of the support of an equivalent uniform distribution that yields the same value of Shannon entropy H. The differential Shannon entropy for continuous distributions with a density p(x) is defined as

H_(a,b) = −∫_a^b p(x) ln p(x) dx.

Remark 2.1. To avoid mathematical pathologies, we will only consider probability densities p(x) that have a finite value for the Shannon entropy H_(a,b). As previously demonstrated by others (Jost 2006, MacArthur 1965, Hill 1973, Peet 1974), definition 2.1 implies that the total diversity ¹D_(a,b) is given by

¹D_(a,b) = e^{H_(a,b)}.
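As a numerical illustration (a sketch, not part of the formal development): the differential entropy and diversity of an exponential density can be approximated by quadrature, truncating the infinite support where the tail is negligible.

```python
import math

def differential_entropy(p, a, b, n=100000):
    """Approximate H_(a,b) = -integral of p(x) ln p(x) dx with the midpoint rule.
    An infinite support must be truncated where the tail is negligible."""
    dx = (b - a) / n
    h = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        px = p(x)
        if px > 0.0:
            h -= px * math.log(px) * dx
    return h

lam = 2.0
H = differential_entropy(lambda x: lam * math.exp(-lam * x), 0.0, 25.0)
D = math.exp(H)   # total diversity: 1D_(a,b) = e^H
# Closed form for the exponential: H = 1 - ln(lam), i.e. D = e / lam.
```

The recovered D is the length of the uniform distribution with the same entropy, matching the definition above.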

An example of biodiversity
In (Jost 2006) a comparison of species of butterfly in two communities was carried out to illustrate the purpose of using diversity instead of entropy to study the similarities in the communities. A case was made that the Hill number ¹D is a better index of diversity than Shannon entropy. Data from the canopy and understory communities of fruit-feeding butterflies was used to illustrate the point of the multiplication principle. Instead of repeating the same example, let us consider two communities of birds. Assume that the first community has 8 species of birds, each species having 50 birds, and that the second community has 10 species of birds, each of which also has 50 birds. Assume, furthermore, that the species in the two communities are distinct. The diversity of the first community is intuitively 8 and that of the second community is 10. When we pool the two communities, the diversity of the pooled community should be 18, since we will then have 18 distinct species that are uniformly distributed. This is exactly what happens if we use the diversity ¹D instead of Shannon entropy. If the original distributions are not uniform, ¹D is still the right diversity index to use, where now each species is counted in a manner proportional to its relative abundance in the pooled community. We extend this notion in this paper by proving results for general continuous distributions where different parts are being pooled with different relative abundances. We also definitively show that the notion of diversity ¹D for continuous distributions and its corresponding case-based entropy and slope of diversity curves are one-to-one, and that the slope of diversity curve can be used to measure the degree of uniformity of a continuous distribution. This establishes, for the first time, important results for continuous distributions, which need a separate consideration due to the intricacies involved in proving results using the probability density.
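The bird example can be verified directly; a short sketch (Python, helper name ours):

```python
import math

def hill_q1(p):
    """First-order Hill number 1D = exp(Shannon entropy)."""
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return math.exp(h)

# Community 1: 8 species with 50 birds each; community 2: 10 distinct species with 50 each.
counts1 = [50] * 8
counts2 = [50] * 10
d1 = hill_q1([c / sum(counts1) for c in counts1])
d2 = hill_q1([c / sum(counts2) for c in counts2])
pooled = counts1 + counts2                    # 18 distinct species, 900 birds in total
d_pooled = hill_q1([c / sum(pooled) for c in pooled])
```

The pooled community's diversity is 18, exactly as intuition demands, whereas pooled Shannon entropies do not combine this way.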
In this paper, we have four objectives: 1. Just as we showed in (Rajaram et al 2023) for discrete distributions, we show a similar way to compute the ratio D_P/c_P for arbitrary parts P from the graph of the slope of diversity curve (c_(a,x) versus c_(a,x)·ln A_(a,x), or g_3) for continuous distributions. This will be an important step towards calculating the extent of uniformity of parts of a continuous distribution.
2. We prove that the slope of the secant S_{x_1,x_2} of the slope of diversity curve can be used to compute the degree of uniformity of an arbitrary part P = (x_1, x_2) of the original continuous distribution, denoted by D_P/c_P.

3. We show that the original continuous distribution g_1 can be reconstructed using the slope of the tangent of the slope of diversity curve g_3.
4. Finally, we show that the natural map between the original continuous distribution g_1, the case-based entropy curve g_2, and the slope of diversity curve g_3 is one-to-one or injective, thereby establishing that two different original distributions g_1 will always lead to different curves g_2 and g_3. This bridges the gap in connecting the Hill numbers to the shape of the original continuous distribution.
In essence, this paper is an extension of (Rajaram et al 2023) to continuous distributions. The paper is organized as follows: In section 3 we prove the results in the first two objectives. In section 4, we prove the third objective above. In section 5, we prove the fourth objective. In section 6 we demonstrate our results for the example of the continuous exponential distribution. In section 7, we end the paper with some observations on our findings. We begin by recalling two 'parts-to-whole' formulae for discrete distributions, which we proved in (Rajaram and Castellani 2020).

Computing the ratio D_P/c_P
Theorem 3.1. Given a discrete probability distribution similar to table 1, the diversity ^qD_K of the distribution for a system or dataset (be it complex or otherwise), the diversities ^qD_{P_i} of its disjoint parts P_i, and their respective cumulative probabilities c_{P_i} are associated as follows:

¹D_K = ∏_i (¹D_{P_i} / c_{P_i})^{c_{P_i}}, (3)

^qD_K = [ Σ_i c_{P_i} (^qD_{P_i} / c_{P_i})^{1−q} ]^{1/(1−q)}, q ≠ 1. (4)

We note that equations (3) and (4) are simply the weighted geometric and arithmetic means (of order 1 − q), respectively, of the ratios ^qD_{P_i}/c_{P_i}, with the cumulative probabilities c_{P_i} as weights. We also note that Σ_i c_{P_i} = 1. The following corollary can be easily proved using the same technique as in the proof of theorem 3.1 in (Rajaram and Castellani 2020).
Corollary 3.1. Given a discrete probability distribution similar to table 1, let the part P = ⋃_i P_i be a disjoint union of sub-parts P_i. Then the diversity ^qD_P of the part, the diversities ^qD_{P_i} of the disjoint sub-parts, and their respective cumulative probabilities c_{P_i} are related as follows (for q = 1):

¹D_P / c_P = ∏_i (¹D_{P_i} / c_{P_i})^{c_{P_i}/c_P}. (5)

Remark 3.1. We remark that, in general, there is no monotonic relationship between the diversity of continuous and discrete distributions. For example, consider the uniform distribution in the discrete case, where p_i = 1/N for i = 1, …, N, and its counterpart in the continuous case, where p(x) = 1/(b − a) on the interval (a,b). The diversity of the discrete uniform distribution is N and that of the continuous one is simply b − a. One can adjust N or (b − a) to make the diversity of the discrete uniform distribution equal to, less than, or larger than the diversity of the continuous uniform distribution. In general, due to the wide variation in the shapes of distributions, there is no universal comparison that can be made between all continuous and all discrete distributions. However, given that the development of continuous distributions requires a separate mathematical treatment due to the intricacies involved in using a probability density, the proofs of the results are different and need to be written separately. For example, to reconstruct the original probability density from the slope of diversity curve in theorem 4.1, we have to use the slope of the tangent instead of the secant. Hence, the material in this paper for continuous distributions requires a separate treatment from discrete distributions.
Table 1. General dataset with complexity types x_i, each having a probability p_i and a frequency f_i.
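The q = 1 case of theorem 3.1, the total diversity as a weighted geometric mean of the part ratios ¹D_{P_i}/c_{P_i}, can be checked numerically for any discrete distribution; a sketch with an arbitrarily chosen partition (helper name ours):

```python
import math

def hill_q1(p):
    """First-order Hill number 1D = exp(Shannon entropy)."""
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return math.exp(h)

p = [0.10, 0.20, 0.30, 0.05, 0.15, 0.20]
parts = [p[0:2], p[2:4], p[4:6]]             # disjoint parts covering the whole

d_total = hill_q1(p)

prod = 1.0
for part in parts:
    c = sum(part)                            # cumulative probability of the part
    d_part = hill_q1([q / c for q in part])  # diversity of the renormalized part
    prod *= (d_part / c) ** c                # weighted geometric mean of the ratios

# prod agrees with d_total: the whole's diversity is the weighted
# geometric mean of the parts' degrees of uniformity D_P / c_P.
```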
We now state and prove the main theorem for continuous distributions. This is the first time that it has been proven for continuous distributions. We note that we will only consider the case q = 1, and hence omit the left superscript in ¹D, simply denoting the diversity by D from now on. For a part P ⊆ (a,b), let c_P = ∫_P p(x) dx be the cumulative probability of P; c_{P_i} is defined in a similar manner. Also, p(x)/c_{P_i} is the normalized probability density for the part P_i (with the same definition for P), and D_{P_i} = e^{H_{P_i}} is the diversity computed from this normalized density. Then, for a partition (a,b) = ⋃_i P_i into disjoint parts (theorem 3.2),

D_(a,b) = ∏_i (D_{P_i} / c_{P_i})^{c_{P_i}}.

Proof. Since ln D_(a,b) = H_(a,b) = −Σ_i ∫_{P_i} p(x) ln p(x) dx, and for each part −∫_{P_i} p(x) ln p(x) dx = c_{P_i} (H_{P_i} − ln c_{P_i}) = c_{P_i} ln(D_{P_i}/c_{P_i}), summing over i and exponentiating proves the theorem.

We make some definitions to establish notation for our next theorem.

Definition 3.1. The graph of C_c on the y-axis against the cumulative probability c on the x-axis is called the case-based entropy curve. We denote these curves by g_2.

Definition 3.2. Let A_(a,x) = D_(a,x) / (c_(a,x) · D_(a,b)). The graph of c_(a,x) versus c_(a,x) · ln A_(a,x) is called the slope of diversity curve. This is denoted by g_3.
We now state and prove a theorem that relates the slope of the secant S_{x_1,x_2} to the degree of uniformity. Taking the natural logarithm of both sides of equation (5), we get

c_P ln(D_P / c_P) = Σ_i c_{P_i} ln(D_{P_i} / c_{P_i}).

Let P = (a, x_2), P_1 = (a, x_1), and P_2 = (x_1, x_2) with a < x_1 < x_2. Then we have

c_P ln(D_P / c_P) = c_{P_1} ln(D_{P_1} / c_{P_1}) + c_{P_2} ln(D_{P_2} / c_{P_2}).

Subtracting c_{P_2} ln D_(a,b) from both sides and rearranging, we get

ln( D_{P_2} / (c_{P_2} · D_(a,b)) ) = [ c_(a,x_2) ln A_(a,x_2) − c_(a,x_1) ln A_(a,x_1) ] / ( c_(a,x_2) − c_(a,x_1) ).

Notice that the right-hand side of this equation is the slope of the secant line S_{x_1,x_2} for the graph of c_(a,x) versus c_(a,x) ln A_(a,x), as defined in definition 3.2. By the same development as in the discrete case, let S_{x_1,x_2} be the slope of the secant line joining the points (c_(a,x_1), c_(a,x_1) ln A_(a,x_1)) and (c_(a,x_2), c_(a,x_2) ln A_(a,x_2)). Then we have D_{P_2}/c_{P_2} = D_(a,b) · e^{S_{x_1,x_2}}, so the degree of uniformity of an arbitrary part can be read off from the secants of the slope of diversity curve. This is the main importance of the two results in this section.
4. Reconstruction of the original probability distribution using the slope of tangent s_x of the slope of diversity curve g_3

So far, all of the results from the discrete case have carried over. In this section, we show that the slope of the tangent in the slope of diversity curve allows us to reconstruct the original density p(x). We note that every point on the slope of diversity curve is of the form (c_(a,x), c_(a,x) ln A_(a,x)).
Definition 4.1. Given the slope of diversity curve, we define s_x as the slope of the tangent of this curve at a general point (c_(a,x), c_(a,x) ln A_(a,x)).

Differentiating c_(a,x) ln A_(a,x) with respect to c_(a,x) (recall that c_(a,x) is the x-axis of the slope of diversity curve), using the chain rule, dividing, and employing the fundamental theorem of calculus, we have

s_x = ln( 1 / (p(x) · D_(a,b)) ), so that p(x) = e^{−s_x} / D_(a,b).

This means that the slope of the tangent of the slope of diversity curve at c_(a,x) explicitly determines the value of p(x) at x. Figure 1 illustrates the last two theorems.
Remark 4.1. We note that the result in theorem 4.1 is the continuous analogue of a similar result proven in (Rajaram et al 2023), which states that the original discrete probability can be reconstructed using the slope of the secant of the slope of diversity curve. The secant in the discrete version becomes the tangent in the continuous version in theorem 4.1. Furthermore, theorem 4.1 explicitly relates the slope of tangent s_x of the slope of diversity curve g_3 back to the original continuous probability distribution g_1. This is the main importance of theorem 4.1.
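Theorem 4.1 can be illustrated numerically. The sketch below assumes, per the definitions used in this paper, that the slope of diversity curve can be computed as y(x) = −∫_0^x p(t) ln p(t) dt − c_(0,x) ln ¹D; it differentiates the curve numerically for an exponential density and recovers p(x) from the tangent slope:

```python
import math

lam = 1.5
H_tot = 1.0 - math.log(lam)          # differential entropy of Exp(lam); ln(1D) = H_tot

def c_of(x):
    """Cumulative probability c_(0,x) for the exponential density."""
    return 1.0 - math.exp(-lam * x)

def y_of(x, n=20000):
    """Slope of diversity curve: y = -integral of p ln p on (0,x) minus c * ln(1D)."""
    dx = x / n
    acc = 0.0
    for i in range(n):
        t = (i + 0.5) * dx
        pt = lam * math.exp(-lam * t)
        acc -= pt * math.log(pt) * dx
    return acc - c_of(x) * H_tot

x, h = 0.7, 1e-3
s_x = (y_of(x + h) - y_of(x - h)) / (c_of(x + h) - c_of(x - h))   # tangent slope
p_rec = math.exp(-s_x - H_tot)       # theorem 4.1: p(x) = e^(-s_x) / 1D
p_true = lam * math.exp(-lam * x)
```

The recovered density matches the original pointwise, which is the content of the reconstruction theorem.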
Remark 4.2. In (Rajaram et al 2017) a direct comparison of the Boltzmann, Fermi-Dirac and Bose-Einstein distributions was made using the case-based entropy idea. The celebrated Boltzmann distribution in one dimension is given as follows:

p(E) = (1 / (k_B T)) e^{−E/(k_B T)}, (15)

where k_B is the Boltzmann constant and T is the absolute temperature. Since this is an exponential distribution with rate 1/(k_B T), the slope of tangent satisfies s_x = E/(k_B T) − 1, i.e., E = k_B T (s_x + 1). This shows an interesting relationship between the slope of tangent s_x and the energy E. Also, equation (15) resembles the general relationship between the Hamiltonian of the canonical ensemble and the probability of states. In general, for various choices of ensembles in statistical mechanics, it would be interesting to see if the slope of tangent s_x for the distribution of states can be related to the Hamiltonian. We will try to explore this in future papers.
5. Injectiveness of the graphs g_1, g_2, and g_3

In this section, we prove that there is a unique injective correspondence between the original density g_1, the case-based entropy curve g_2, and the slope of diversity curve g_3. This means that the shape of the original continuous distribution uniquely determines the shapes of both the case-based entropy and the slope of diversity curves.
Theorem 5.1. Let p(x) be a probability density function (pdf) on (a, b) with finite entropy and with a = −∞ and b = +∞ permitted. Also let 𝒢_1 be the set of graphs of the original probability densities p(x), 𝒢_2 be the set of graphs of the corresponding case-based entropy curves, and 𝒢_3 be the set of graphs of the corresponding slope of diversity curves, with g_1, g_2 and g_3 denoting elements (graphs) in 𝒢_1, 𝒢_2 and 𝒢_3 respectively. In addition, let T_{j→k} be the map from 𝒢_j to 𝒢_k, where j, k = 1, 2, 3. Then each map T_{j→k} is injective (or one-to-one).
Remark 5.1. We note that the map T_{j→k} is taken to be the map between g_j ∈ 𝒢_j and g_k ∈ 𝒢_k with points taken as they appear from left to right.
This shows that the map  T 1 2 is injective.

Next, consider the map T_{2→3}. Since the case-based entropy curve g_2 determines the value of c_(a,x) ln A_(a,x) at every c_(a,x), distinct curves g_2 produce distinct curves g_3. This shows that the map T_{2→3} is injective.

Lastly, consider the map T_{3→1}. By theorem 4.1, the original density p(x) is recovered pointwise from the slope of the tangent of g_3, so distinct curves g_3 correspond to distinct densities g_1. This shows that the map T_{3→1} is injective.
Remark 5.2. Theorem 5.1 says that, just as in the discrete case, there is a one-to-one correspondence between the original density g_1, the case-based entropy curve g_2, and the slope of diversity curve g_3. This means that the shape of g_1 uniquely determines the shapes of both the g_2 and g_3 curves.

Exponential distribution
In this section, we compute the slope of diversity curve for the general exponential distribution and show that we can reconstruct the original distribution from the slope of diversity curve (and hence, equivalently, from the case-based entropy curve). Suppose that p(x) = λe^{−λx}, x ∈ (0, ∞).
The cumulative probability is

c_(0,x) = ∫_0^x λe^{−λt} dt = 1 − e^{−λx}, x > 0.

Evaluating the entropy integral by parts,

−∫_0^x λe^{−λt} ln(λe^{−λt}) dt = −c_(0,x) ln λ − λx e^{−λx} + c_(0,x).

Subtracting c_(0,x) ln ¹D_(0,∞), where ln ¹D_(0,∞) = 1 − ln λ, we obtain

c_(0,x) ln A_(0,x) = −λx e^{−λx} = (1 − c_(0,x)) ln(1 − c_(0,x)).

In other words,

A_(0,x) = exp( (1 − c_(0,x)) ln(1 − c_(0,x)) / c_(0,x) ).

Taking the logarithm of both sides and differentiating with respect to c_(0,x), we find s_x = λx − 1, so that p(x) = e^{−s_x}/¹D_(0,∞) = λe^{−λx}, recovering the original density. In other words, this example illustrates our theoretical work. Also, from figure 2, and since s_x = λx − 1 is an increasing function of x (since λ > 0), it is clear that for disjoint parts (x_1, x_2) and (x_3, x_4) with x_3 > x_2, the part (x_3, x_4) is more uniformly distributed than (x_1, x_2).
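A numerical cross-check of this example, computing the curve by quadrature as y(x) = −∫_0^x p ln p dt − c·ln ¹D_(0,∞) (a sketch under that assumed form):

```python
import math

lam = 3.0
H_tot = 1.0 - math.log(lam)          # differential entropy of Exp(lam)

def c_of(x):
    return 1.0 - math.exp(-lam * x)

def curve_y(x, n=20000):
    """y = -integral of p ln p on (0,x) minus c_(0,x) * ln(1D), by midpoint rule."""
    dx = x / n
    acc = 0.0
    for i in range(n):
        t = (i + 0.5) * dx
        pt = lam * math.exp(-lam * t)
        acc -= pt * math.log(pt) * dx
    return acc - c_of(x) * H_tot

# The curve collapses to (1 - c) ln(1 - c), independent of lam:
for x in (0.2, 0.5, 1.0):
    c = c_of(x)
    assert abs(curve_y(x) - (1.0 - c) * math.log(1.0 - c)) < 1e-6

# Tangent slope at x is lam * x - 1 (so the minimum sits at c = 1 - 1/e):
x, h = 0.4, 1e-4
s = (curve_y(x + h) - curve_y(x - h)) / (c_of(x + h) - c_of(x - h))
```

Note that the curve (1 − c) ln(1 − c) does not depend on λ, so all exponential distributions share the same slope of diversity curve.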

Power law
In (Castellani and Rajaram 2016), an empirical comparison of the diversity of power law distributions obtained from real data for various systems was carried out using case-based entropy. Here, we consider the power law from a theoretical standpoint. We consider the power law distribution as below:

Definition 6.1. A continuous random variable X is said to follow a power law distribution if its probability density function p_z(x) satisfies

p_z(x) = ((α − 1)/x_min) (x/x_min)^{−α}, x ≥ x_min, α > 1.

Theorem 6.1. Given a power law distribution as in definition 6.1, its entropy is given by

H = ln( x_min / (α − 1) ) + α/(α − 1).

Writing C = (α − 1) x_min^{α−1} for the normalizing constant, we have p_z(x) = C x^{−α}, so that H = −ln C + α E[ln X]. Since E[ln X] = ln x_min + 1/(α − 1), substituting and simplifying proves the theorem.

Remark 6.1. Recall that, by theorem 4.1, s_x = ln(1/(p_z(x) · ¹D)) with ln ¹D = H. We therefore have

s_x = α ln(x/x_min) − α/(α − 1).

Hence, the slope of tangent s_x can also be back-calculated from the probability density p_z(x).
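The entropy formula in theorem 6.1 can be verified by numerical integration; a sketch (substituting u = ln(x/x_min) to tame the heavy tail):

```python
import math

def powerlaw_entropy_numeric(alpha, xmin, u_max=60.0, n=200000):
    """H = -integral of p(x) ln p(x) dx for p(x) = ((alpha-1)/xmin) (x/xmin)^(-alpha),
    computed with the substitution u = ln(x/xmin), dx = x du."""
    du = u_max / n
    h = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        x = xmin * math.exp(u)
        px = ((alpha - 1.0) / xmin) * math.exp(-alpha * u)
        h -= px * math.log(px) * x * du
    return h

alpha, xmin = 2.5, 1.0
H_num = powerlaw_entropy_numeric(alpha, xmin)
H_closed = math.log(xmin / (alpha - 1.0)) + alpha / (alpha - 1.0)
```

The log-substitution is needed because a uniform grid in x would truncate the heavy tail far too early.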

Conclusion
Accurately quantifying the degree of uniformity of a probability distribution or its parts is a fundamental problem, important for its potential applications to the study of inequality of resources. While the Hill numbers ^qD provide a good starting point for such a quantification, they have several limitations. First, the Hill numbers are insensitive to permutations and hence give the same value for a rearrangement of the original distribution. This is problematic, since the shape of the distribution is one of its most important characteristics, identifying the regions of abundance or scarcity. Second, the Hill numbers, in their traditional sense, do not lend themselves easily to comparing the degree of uniformity (or inequality) of parts of a distribution.
In this paper, we have shown that the mathematical diversity of a probability distribution is a tool that allows us to quantify the degree of uniformity of a distribution or its parts for continuous distributions. In theorem 3.1 we established an explicit relationship between the degree of uniformity of a partition P = ⋃_i P_i and its sub-parts P_i for continuous probability distributions. We also established an explicit way to compute the degree of uniformity of an arbitrary part P = (x_1, x_2) of a continuous distribution using the slope of the secant S_{x_1,x_2} of the slope of diversity curve g_3 in theorem 3.3. We were able to completely reconstruct the original probability distribution using the slope of the tangent s_x of the slope of diversity curve g_3 in theorem 4.1. Finally, we were also able to show that there exists a one-to-one correspondence between the original continuous probability distribution g_1, the case-based entropy curve g_2, and the slope of diversity curve g_3 in theorem 5.1. These results are the continuous counterparts of the results proved in (Rajaram et al 2023).
The main application of our work is in identifying regions of a given probability distribution that have the same degree of uniformity (we call these Shannon Equivalent Equiprobable, or SEE, parts) in a large dataset, based on our idea of mathematical diversity derived from information theory (or Shannon entropy, to be more specific). Once such regions are identified, researchers have a starting point to further investigate such subsections of the original data to identify internal mechanisms or principles that led to such an equal degree of uniformity. One could start by looking at a single variable (which perhaps is an important characteristic of the dataset) and, after identifying the SEE parts, delve into the distribution of other variables in those parts to meaningfully explain the SEE behavior. Conversely, given two or more parts, we can compute and compare the degrees of uniformity of the given parts and say which part is more or less uniformly distributed compared to the others.
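Comparing the degree of uniformity of two parts, as described above, can be sketched for a simple discrete example (helper name ours):

```python
import math

def degree_of_uniformity(part_probs):
    """D_P / c_P: diversity of the renormalized part divided by its cumulative probability."""
    c = sum(part_probs)
    h = -sum((q / c) * math.log(q / c) for q in part_probs if q > 0)
    return math.exp(h) / c

# A skewed distribution over six types, split into a head part and a tail part:
p = [0.40, 0.30, 0.10, 0.08, 0.07, 0.05]
head, tail = p[:3], p[3:]

u_head = degree_of_uniformity(head)   # D_P/c_P for the head part
u_tail = degree_of_uniformity(tail)   # D_P/c_P for the tail part
# Whichever ratio is larger identifies the part that is more uniformly
# distributed in the sense used in this paper.
```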
Another application could be to derive a much better measure of equality (or inequality) and uniformity (or non-uniformity) of a part or an entire distribution. For example, in the case of an income distribution, the slope of diversity curve g_3 can be used (by simply drawing secants of equal slope) to identify SEE parts of the distribution. We can compare the slopes of secants of parts to identify and also quantify the degree of uniformity of the distribution of wealth. This is much more information than the Gini coefficient, which (a) is an overall number and (b) is insensitive to the shape of the distribution. So, in a sense, our technique will potentially prove more useful for analyzing and quantifying inequality in probability distributions by not only characterizing such inequality for entire distributions, but also systematically dividing the distribution into SEE parts that have the same degree of uniformity of income.
In terms of where our program of research goes next, our goal is to advance these results to investigate distributions such as the power law, which is well known to model the tails of many real-world distributions. To that end, a computational toolbox that quickly draws the three curves g_1, g_2 and g_3, along with the ability to draw the secants for g_3 and automatically filter out the SEE parts of the original distributional data for further investigation, will prove very useful. We will endeavor to create such a toolbox. Lastly, we intend to apply our work to income distributions specifically, to show that we can identify SEE parts of the distribution that systematically divide the original data into parts that are SEE equivalent (and not just study the rich and poor parts). We strongly believe that this will lead to better policy formulation for the betterment of society towards equity in the distribution of resources, using our information-theoretic approach to diversity.
Definition 2.1 (Shannon diversity corresponding to q = 1 for Hill numbers). Given a continuous random variable X with support (a,b) (with a = −∞ and b = +∞ allowed) and probability density p(x), the diversity of the entire distribution is ¹D_(a,b) = e^{H_(a,b)}.

Theorem 3.2. Let p(x) be a probability density function (pdf) on (a, b) with finite entropy and with a = −∞ and b = +∞ permitted, and let (a,b) = ⋃_i P_i be a disjoint partition. Then D_(a,b) = ∏_i (D_{P_i}/c_{P_i})^{c_{P_i}}; applying the same steps to a part P = ⋃_i P_i in place of (a,b) yields equation (5).

Theorem 3.3. Let p(x) be a probability density function (pdf) on (a, b) with finite entropy and with a = −∞ and b = +∞ permitted. Then, for a part P = (x_1, x_2), the slope of the secant S_{x_1,x_2} of the slope of diversity curve satisfies D_P/c_P = D_(a,b) · e^{S_{x_1,x_2}}.

Corollary 3.1 expresses the degree of uniformity of a part P of a continuous distribution as the weighted geometric mean of the degrees of uniformity of its sub-parts P_i, with the cumulative probabilities c_{P_i} as the weights. Theorem 3.3 means that, when comparing the slopes of secants of the slope of diversity curve, we are also comparing the degrees of uniformity of the corresponding parts.

Theorem 4.1. Let p(x) be a probability density function (pdf) on (a, b) with finite entropy and with a = −∞ and b = +∞ permitted. Let s_x represent the slope of the tangent at a general point on the slope of diversity curve. Then, using the definition of A_(a,x) and taking the natural logarithm of equation (21), we have p(x) = e^{−s_x} / D_(a,b).

Figure 1. Slope of tangent of the slope of diversity curve.

Figure 2. Slope of diversity curve for the exponential distribution.
Slope of diversity curve for a power law distribution

In this section, we calculate an explicit formula for the slope of diversity curve of the power law distribution.

Theorem 6.2. Given a power law distribution as in definition 6.1, the slope of diversity curve, which plots c_(x_min,x) on the x-axis versus c_(x_min,x) ln A_(x_min,x) on the y-axis, has the following explicit formula:

c_(x_min,x) ln A_(x_min,x) = (α/(α − 1)) (1 − c_(x_min,x)) ln(1 − c_(x_min,x)).

Also, the slope of the tangent s_x of the slope of diversity curve at (c_(x_min,x), c_(x_min,x) ln A_(x_min,x)) is

s_x = −(α/(α − 1)) (1 + ln(1 − c_(x_min,x))) = α ln(x/x_min) − α/(α − 1).

Setting s_x = 0, we obtain a minimum. In fact, no matter what α is, the minimum occurs at c_(x_min,x) = 1 − e^{−1} ≈ 0.632, i.e., at x = x_min e^{1/(α−1)}. Differentiating the explicit formula and setting y′ = 0, we again find that the minimum occurs at the same point.
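The α-independence of this minimum can be checked numerically by locating the argmin of the curve for several exponents; a sketch, computing the curve by quadrature as y = −∫ p ln p dx − c·H (the form used throughout):

```python
import math

def curve_points(alpha, xmin=1.0, n=4000, u_max=12.0):
    """Sample (c, y) along the slope of diversity curve of a power law,
    with y(x) = -integral of p ln p minus c(x) * H, using u = ln(x/xmin)."""
    H = math.log(xmin / (alpha - 1.0)) + alpha / (alpha - 1.0)   # total entropy
    du = u_max / n
    acc, pts = 0.0, []
    for i in range(n):
        u = (i + 0.5) * du
        x = xmin * math.exp(u)
        px = ((alpha - 1.0) / xmin) * math.exp(-alpha * u)
        acc -= px * math.log(px) * x * du        # running -integral of p ln p
        c = 1.0 - math.exp(-(alpha - 1.0) * u)   # cumulative probability c_(xmin,x)
        pts.append((c, acc - c * H))
    return pts

# The argmin in c stays near 1 - 1/e ~ 0.632 regardless of alpha:
minima = []
for alpha in (1.8, 2.5, 4.0):
    pts = curve_points(alpha)
    minima.append(min(pts, key=lambda t: t[1])[0])
```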

Figure 3. Slope of diversity for a power law distribution showing a minimum at c_(x_min,x) = 0.632.

We next define the degree of uniformity of a part P = (x_1, x_2), and of its sub-parts P_i, to be the average case-based entropy per unit cumulative frequency for the part P and the sub-part P_i respectively.