A study on Raga characterization in Indian classical music in the light of MB and BE distribution

Raga characterization is an important aspect of learning Indian classical music. However, the methods usually followed are mostly qualitative. In this study, we attempt to quantify these abstract characteristics using measurable parameters. To study the congregation of musical information quantitatively, we introduce methods based on two well-known concepts of statistical physics, namely the Maxwell-Boltzmann (MB) and Bose-Einstein (BE) distributions. These distributions are applied to the chosen acoustic signals to obtain new parameters (equivalent to 'temperature' in physical systems) that can distinguish between the features of different ragas containing the same notes. The music clips chosen were the 'Alap' parts of three ragas (Marwa, Puriya and Sohini) sung by a legendary classical music maestro. All three ragas use the same set of notes: Sa, komal Re, shuddh Ga, tivra Ma, shuddh Dha and shuddh Ni. To apply MB statistics to music, we assume that notes with different occurrence frequencies sit at different energy levels, whose distribution follows the MB pattern. For BE statistics, we study the rank-frequency distribution of the note-duration combinations of the different ragas. The resulting analysis yields a number of parameters that help to categorize the individual characteristics of the ragas. These methods are novel in music research and may prove useful in music and speech as quantitative parameters for style identification.


Introduction and background
Raga, in spirit, is the structural unit that holds together the huge body of Indian classical music. It is, among other things, a combination of musical notes, following certain rules, that expresses one or more emotions. Ragas have a well-defined structure consisting of a series of four or five (or more) musical notes upon which the melody is constructed. However, the way the notes are approached and used in musical phrases is more fundamental in defining a specific Raga rendition than the notes themselves. This leaves ample scope for improvisation beyond the structured framework, a feature that gives Hindustani classical music a distinctive character in world music. The goal of the performer is to convey the musical structure as well as the mood, so that the emotion of the Raga reaches the audience. Although a number of definitions have been attributed to a Raga, it is basically a multifarious tonal module. For every Raga, the tonic 'Sa' is the most important note, as the Raga can be identified uniquely only after 'Sa' has been properly established. A few basic elements must be present in every Raga, such as: at least 5 (and at most 7) notes or swaras; ascending and descending patterns (arohana/avarohana); Vadi-samvadi, i.e., specific dominating and complementing pairs of notes; gamakas, or oscillations around a note; and characteristic (raga-specific) phrases.
The vocal presentation of a Raga in the Hindustani style can be divided into two categories: Khayal and Dhrupad. Khayal is further divided into two parts: the Alap or opening section (accompanied by a tanpura drone), where the Raga is introduced and developed using its characteristic notes, and the Bandish or binding section (accompanied by instruments such as Tabla, Pakhawaj, Harmonium etc.), which consists of fixed melodic compositions unique to each Raga, with lyrics, Raga-specific notes and beats (taal). Dhrupad is a comparatively simpler form with a higher emphasis on Meends (gliding from one note to the next). In Raga performances, existing phrases are stretched or compressed, and the same may happen to motives within the phrases; further motives may be prefixed, infixed or suffixed. Phrases may be broken up or attached to others, and motives or phrases may be sequenced through different registers [1]. Unlike a symphony or a concerto in the Western musical tradition, a Raga is unpredictable and ever-blooming: the way a performer interprets a Raga in each performance is unique and is the very essence of improvisation in Hindustani classical music [2]. Existing approaches to Raga classification and identification are rooted in the emerging field of MIR (Music Information Retrieval). The fundamental frequency F0 and its ratio to the tonic frequency are generally used to identify the notes. Apart from this, pitch-related features such as pitch class distributions, together with template matching techniques, are also used. Some studies have also used probability density functions (PDFs) of pitch contours for Raga classification. For an extensive review, see [3]. Different feature-based approaches have been applied to both North and South Indian classical music, with varying degrees of success.
The characteristics present in any Raga structure convey a vast amount of musical information to listeners and analysts alike. Viewed as a whole, this congregation appears too complex. But, like any other humanly perceivable information, it is made up of repetitive patterns or sequences of some common basic elements. One central problem in analysing these sequences is how to categorize their information content effectively, based on the common elements found in their origins. Variations of musical compositions, whether in rendition style or in the emotions they convey, can only be recognized by trained and experienced listeners, and this kind of categorization is mostly non-quantifiable. Hence, to categorize such musical information, one needs to address the specific problems of identifying sequential patterns and applying this knowledge quantitatively in subsequent comparisons. Following these arguments, in this work we attempt to quantify and categorize the musical information in samples of Hindustani classical music using tools from statistical physics. The use of statistical physics in linguistics and the social sciences is a longstanding practice [4][5][6]. The basis for applying statistical methods to structured, high-dimensional data is a remarkable empirical law, primarily used in literature research, known as Zipf's law, formulated by the linguist George Zipf in 1949 [7]. It states that if we assign rank j = 1 to the most frequent word of a language, j = 2 to the second most frequent, and so on, then the frequency of occurrence f(j) of a given word varies with its rank j as:

$$ f(j) \propto j^{-\alpha} \qquad (1) $$

Here, α is an exponent to be determined from the rank vs. frequency distribution. Zipf's law is remarkable because it applies to diverse systems, including economic, linguistic, urban and other social ones [8][9][10].
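A minimal Python sketch of how the exponent α in eq. (1) might be estimated: rank the tokens by frequency, then fit a straight line in log-log coordinates. Ordinary least squares on the logarithms is an assumption made here for simplicity (other estimators exist), and the function names are hypothetical, not taken from the study.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Frequencies sorted in decreasing order; list index 0 holds rank j = 1."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs):
    """Estimate alpha in f(j) ~ j**(-alpha) by ordinary least squares on
    log f(j) versus log j. Needs at least two ranks with positive frequency."""
    xs = [math.log(j) for j in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the log-log slope equals -alpha


# Usage with a synthetic, exactly Zipfian frequency list (illustrative data only).
synthetic = [1000.0 * j ** -1.2 for j in range(1, 51)]
print(round(zipf_exponent(synthetic), 6))  # 1.2
```

On real rank-frequency data the plateaus at low frequency pull this simple fit around, so the estimate should be read as indicative only.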
It is also hard to overlook the similarity of Zipf's law to the statistical distributions describing the energy of a system of particles in equilibrium. According to statistical mechanics, when a system of particles is in equilibrium at a constant temperature T, it can be found in any one of N permissible states. The probability p_i that it is found in a given state i with energy E_i is:

$$ p_i \propto \exp(-\beta E_i) \qquad (2) $$

Here, β = 1/(kT), where k is the Boltzmann constant (1.38 × 10^-23 J/K) and T is the absolute temperature, the 'measure' of the interaction of the system with its environment. The approach we follow in this study is based on the analogy between the rank-frequency distributions (using Zipf's law) of the note-duration combinations of a music sample and the statistical distributions of physics (both the Maxwell-Boltzmann distribution and the Bose-Einstein distribution in the grand canonical formulation). The distribution of the occurrences of notes (combined with their durations) is characterized by a set of parameters, one of which is the equivalent of temperature in physical systems. This 'temperature' parameter is quite familiar in linguistic research [11][12], where it has been used to characterize the underlying dynamics of various languages, to address authorship disputes, to track changes in the complexity of vocabulary, and more. On previous occasions, we have applied this technique to artist classification [13] and to the categorization of artists' improvisational patterns [14]. Here, we apply this statistical approach to the note-duration combinations present in different Raga samples to find out whether it enables us to categorize them according to their note distribution and movement patterns.
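The similarity can be made explicit with a one-line identity (an illustration added here, not a derivation taken from the cited works): a power law is an exponential in the logarithm of the rank, so Zipf's law (1) takes the Boltzmann form (2) if one formally sets E_j = ln j and β = α.

$$ f(j) \propto j^{-\alpha} = \exp(-\alpha \ln j) \equiv \exp(-\beta E_j), \qquad E_j = \ln j, \quad \beta = \alpha $$

In this reading, a larger Zipf exponent α corresponds to a lower effective 'temperature', kT = 1/α.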

Maxwell-Boltzmann distribution
Maxwell-Boltzmann (MB) statistics describes how a given amount of energy is distributed among identical but distinguishable particles. For a system consisting of a huge number of non-interacting, distinguishable particles, MB statistics predicts that the probability of finding a particle with a specific energy decreases exponentially with increasing energy.
The distribution function has the following form:

$$ f(E_i) = A \exp\left(-\frac{E_i}{kT}\right) \qquad (3) $$

where f(E_i) is the probability of a particle having energy E_i, A is the normalisation constant, E_i is the energy of the i-th state, k is the Boltzmann constant, and T is the absolute temperature.
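A short Python sketch of eq. (3) for a discrete set of energy levels, assuming (for illustration) that A is fixed by requiring the probabilities to sum to one. It shows the two properties the text relies on: the probability ratio between two levels depends only on their energy difference, and a higher temperature flattens the distribution. The function name is hypothetical.

```python
import math


def mb_probabilities(energies, kT=1.0):
    """Eq. (3): f(E_i) = A * exp(-E_i / kT), with A chosen so the values sum to 1."""
    weights = [math.exp(-e / kT) for e in energies]
    A = 1.0 / sum(weights)  # the normalisation constant
    return [A * w for w in weights]


cold = mb_probabilities([0.0, 1.0, 2.0], kT=1.0)
hot = mb_probabilities([0.0, 1.0, 2.0], kT=10.0)
# A higher temperature flattens the distribution: the top level gains probability.
print([round(p, 3) for p in cold], [round(p, 3) for p in hot])
```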

Bose-Einstein distribution
Bose-Einstein (BE) statistics describes an ensemble of identical and indistinguishable particles occupying discrete energy states. The distribution function has the form:

$$ f(E_i) = \frac{1}{A \exp\left(E_i / kT\right) - 1} \qquad (4) $$

where f(E_i) is the probability of a particle having energy E_i, 1/A denotes the degeneracy, i.e., how many particles occupy a particular energy state E_i, E_i is the energy of the i-th state, k is the Boltzmann constant, and T is the absolute temperature. The -1 term in the denominator reflects the fact that, unlike in MB statistics, the particles are indistinguishable. The BE distribution applies to a particular kind of particle with integer spin, known as a boson. Bosons do not obey the Pauli exclusion principle, so an unlimited number of them can occupy the same energy state (particles that obey the Pauli exclusion principle are called fermions). This unusual property makes it possible to apply the BE distribution to various systems beyond the subatomic world. Figure 1 shows the MB and BE distributions on a population vs. energy graph.
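A small Python sketch comparing eqs. (3) and (4), setting A = 1 in both purely for comparison (an assumption of this illustration). The BE occupancy exceeds the MB value at low energy and merges with it at high energy, since their ratio is 1/(1 - exp(-E/kT)) when A = 1. Function names are hypothetical.

```python
import math


def mb_occupancy(E, A=1.0, kT=1.0):
    """Eq. (3): f(E) = A * exp(-E / kT)."""
    return A * math.exp(-E / kT)


def be_occupancy(E, A=1.0, kT=1.0):
    """Eq. (4): f(E) = 1 / (A * exp(E / kT) - 1), defined only when A * exp(E / kT) > 1."""
    denominator = A * math.exp(E / kT) - 1.0
    if denominator <= 0.0:
        raise ValueError("BE occupancy diverges: need A * exp(E / kT) > 1")
    return 1.0 / denominator


# Ratio BE / MB: large at low energy, tending to 1 at high energy.
for E in (0.1, 1.0, 5.0, 20.0):
    print(E, round(be_occupancy(E) / mb_occupancy(E), 4))
```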

General methodology
To analyse the ragas, we first need to accumulate a 'musical corpus' of the notes used in the raga renditions, similar to a literary corpus [11] (in our case, the compilation of the notes and their respective durations used in the three ragas). From the pitch profile of the music sample, an experienced classical musician determines the frequency of 'Sa'. The frequencies of the remaining notes are then found from their respective frequency ratios with 'Sa'. Next, the occurrence and duration of all the notes are determined by analysing the pitch profile of the music clip with the WaveSurfer software, using an analysis window of 10 milliseconds. In this way we find the number of occurrences of each note and their respective durations. For example, suppose Usa50, indicating an occurrence of the upper-octave 'Sa' lasting 50 milliseconds, occurs 10 times during a piece. Similarly, Lre30 (lower-octave 'Re', 30 milliseconds) appears 24 times, ga40 (middle octave) 61 times, and so on. The probabilities of occurrence of the note-duration combinations are then plotted against their respective 'energies', obtained from equation (2) by taking k = T = 1. This leads to Figure 2. As Figure 2 shows, the probability vs. energy graph is perfectly exponential (as expected) and follows the MB distribution pattern. The temperature T_MB of the corpus is therefore taken to be 1 K, for later comparison. The model equation we use for curve fitting is [11]:

$$ p(E) = A \exp\left(-\frac{E}{T_{MB}}\right) \qquad (5) $$

Next, we plot the probability (or frequency of occurrence) vs. energy graph for the experimental clips and derive T_MB in each case to compare with the present 'music corpus'.
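The corpus-building steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each 10 ms frame of the pitch profile has already been labelled with a note name (octave prefix as in Usa/Lre/ga) using the Sa-relative frequency ratios, and that unvoiced frames are labelled None; the function names are hypothetical. Energies follow from inverting eq. (2) with k = T = 1, i.e. E = -ln p, so the corpus itself sits at temperature 1 by construction.

```python
import math
from collections import Counter
from itertools import groupby

FRAME_MS = 10  # analysis window length used in the text


def note_duration_tokens(frame_labels, frame_ms=FRAME_MS):
    """Collapse runs of identical frame labels into tokens such as 'Usa50'.
    Frames labelled None (e.g. silence) are skipped: a hypothetical convention."""
    tokens = []
    for label, run in groupby(frame_labels):
        if label is None:
            continue
        duration = frame_ms * sum(1 for _ in run)
        tokens.append(f"{label}{duration}")
    return tokens


def corpus_energies(tokens):
    """Probability of each note-duration token and its 'energy' E = -ln p
    (eq. (2) inverted with k = T = 1, normalisation absorbed)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    probs = {tok: c / total for tok, c in counts.items()}
    energies = {tok: -math.log(p) for tok, p in probs.items()}
    return probs, energies


frames = ["Usa"] * 5 + [None] + ["Lre"] * 3 + ["Usa"] * 5
tokens = note_duration_tokens(frames)  # ['Usa50', 'Lre30', 'Usa50']
probs, energies = corpus_energies(tokens)
```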

Rank-frequency distribution
To apply the BE distribution, as discussed in the previous sections, the first step is to prepare the rank-frequency distribution of the combinations of notes and their durations. After segregating the note-duration combinations, the rank-frequency distribution is constructed from these data. The component with the highest number of occurrences is given rank 1, the second most frequent is given rank 2, and so on. Components with the same frequency are given a consecutive range of ranks, the ordering within which can be arbitrary.
The rank-frequency distribution of such a sample is shown in Figure 3. The horizontal plateaus in Figure 3, in the domain of high ranks and low frequencies, correspond to large numbers of components having the same frequency. The longest plateau corresponds to frequency 1.
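The ranking rule and the plateaus described above can be sketched in Python as below; the token names are illustrative and the functions are hypothetical. Ties receive consecutive ranks in arbitrary (here, first-seen) order, and each frequency value maps to the run of ranks forming one horizontal plateau.

```python
from collections import Counter
from itertools import groupby


def rank_frequency_table(tokens):
    """Rank tokens by decreasing frequency; ties get consecutive ranks in arbitrary order."""
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, tok, freq) for rank, (tok, freq) in enumerate(ordered, start=1)]


def plateaus(table):
    """Map each frequency to its run of ranks: the horizontal plateaus of the plot."""
    return {freq: [rank for rank, _, _ in group]
            for freq, group in groupby(table, key=lambda row: row[2])}


tokens = ["sa40"] * 4 + ["re30"] * 2 + ["ga20", "ma10", "dha50"]
table = rank_frequency_table(tokens)
print(plateaus(table))  # {4: [1], 2: [2], 1: [3, 4, 5]}
```

Even in this toy sample the longest plateau is the one at frequency 1, matching the observation in the text.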

Physical analogy of frequency structure and B-E distribution
Following the treatment in [16], we invert the rank-frequency distribution into a relation between the number of occupants N_j and their absolute frequency j. We identify the energy level j with the number of occurrences of a note-duration combination. Hence, the components occurring once are situated in energy level j = 1, the components occurring twice in energy level j = 2, and so on. Each energy level can have any number of occupants (N_j) without restriction. This is in accordance with the B-E distribution, where each energy level can be occupied by any number of particles, with no restriction such as the Pauli exclusion principle. Such a plot of N_j vs. j follows the B-E distribution pattern. For such a distribution, the relation between the occupancy N_j of the j-th energy level and j is [17]:

$$ N_j = \frac{1}{z^{-1} \exp\left(\varepsilon_j / T_{BE}\right) - 1} \qquad (6) $$

where z is the fugacity, ε_j is the energy of the j-th level and T_BE is the temperature. The spectrum of ε_j is given by:

$$ \varepsilon_j = (j - 1)^{\alpha} \qquad (7) $$

with α an exponent. Unity is subtracted to ensure that the lowest energy state, j = 1, has zero energy. The study focuses mainly on the lower-frequency data, since the energy spectrum can look different at higher energies, i.e., for states with more occurrences.
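A minimal Python sketch of the inversion from rank-frequency data to occupation numbers, together with the model of eqs. (6)-(7). The input is assumed to be the list of per-combination occurrence counts (for example, the values of a Counter over note-duration tokens); the function names are hypothetical and α > 0 is assumed.

```python
import math
from collections import Counter


def occupation_numbers(token_counts):
    """N_j = number of distinct note-duration combinations occurring exactly j times;
    j plays the role of the energy level."""
    return dict(sorted(Counter(token_counts).items()))


def be_level_occupancy(j, z, T_be, alpha):
    """Eqs. (6)-(7): N_j = 1 / (z**-1 * exp(eps_j / T_be) - 1), eps_j = (j - 1)**alpha.
    Assumes alpha > 0, so that eps_1 = 0 and N_1 = z / (1 - z)."""
    eps = (j - 1) ** alpha
    return 1.0 / (math.exp(eps / T_be) / z - 1.0)


# Three combinations occur once, two occur twice, one occurs five times.
print(occupation_numbers([1, 1, 1, 2, 2, 5]))  # {1: 3, 2: 2, 5: 1}
```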

Parameters to be determined
First, z is calculated from the occupancy N_1 of the lowest state, j = 1, using equation (6): since ε_1 = 0,

$$ N_1 = \frac{z}{1 - z} \quad\Longrightarrow\quad z = \frac{N_1}{1 + N_1} \qquad (8) $$

T_BE is then determined by fitting the data to equation (6).
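One possible way to carry out these two steps is sketched in Python below. The fugacity follows eq. (8) directly. For T_BE (and α), the sketch uses a linearisation that is an assumption of this illustration rather than the procedure of the study: rearranging eqs. (6)-(7) gives ln ln(z(1 + 1/N_j)) = α ln(j - 1) - ln T_BE, which is fitted by ordinary least squares over j ≥ 2. A direct nonlinear fit (for example with SciPy's curve_fit) is an equally valid alternative. Function names are hypothetical.

```python
import math


def fugacity(N1):
    """Eq. (8): since eps_1 = 0, N_1 = z / (1 - z), hence z = N_1 / (1 + N_1)."""
    return N1 / (1.0 + N1)


def fit_be(levels):
    """Estimate (z, T_BE, alpha) from occupation numbers {j: N_j}.

    z comes from N_1 via eq. (8). T_BE and alpha come from the (assumed)
    linearisation ln ln(z * (1 + 1/N_j)) = alpha * ln(j - 1) - ln T_BE,
    fitted by ordinary least squares over the levels j >= 2.
    """
    z = fugacity(levels[1])
    xs, ys = [], []
    for j, Nj in levels.items():
        if j < 2:
            continue
        arg = z * (1.0 + 1.0 / Nj)
        if arg > 1.0:  # the double logarithm is defined only here
            xs.append(math.log(j - 1))
            ys.append(math.log(math.log(arg)))
    if len(xs) < 2:
        raise ValueError("need at least two usable levels with j >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    alpha = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    # Intercept b = mean_y - alpha * mean_x equals -ln T_BE.
    T_be = math.exp(alpha * mean_x - mean_y)
    return z, T_be, alpha


# Synthetic occupation numbers generated from eq. (6) with z = 100/101, T_BE = 3, alpha = 0.8.
levels = {1: 100.0}
for j in range(2, 12):
    levels[j] = 1.0 / (math.exp((j - 1) ** 0.8 / 3.0) * 101 / 100 - 1.0)
z, T_be, alpha = fit_be(levels)
print(round(z, 4), round(T_be, 4), round(alpha, 4))  # 0.9901 3.0 0.8
```

The linearisation weights levels differently from a direct fit of N_j, so on noisy data the two approaches can give somewhat different T_BE values.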

Experimental details
For this pilot work, we have chosen three of the most popular and frequently performed ragas in North Indian classical music, namely Marwa, Puriya and Sohini, all three performed by Pandit Ajoy Chakrabarty, an Indian classical music maestro. They were chosen for an interesting commonality: all three use the same set of notes, Sa, Komal Re (Re1), Shuddha Ga (Ga2), Tivra Ma (Ma2), Shuddha Dha (Dha2) and Shuddha Ni (Ni2). All three belong to Thaat Marwa, yet they are widely believed to evoke three distinctly different categories of emotion in listeners. Table 1 summarizes the acoustic features of the three Ragas in the light of Indian classical music. Each sample clip was 3 min 30 s long, taken from the Alap part of the raga and selected by a classical music expert. The Alap conveys the essence of the raga, as the characteristic phrases, Vadi-samvadi pairs etc. are well represented in this part.

The M-B distribution
The probabilities of the note-duration combinations for each raga are plotted against the corpus energies of Figure 2, and by fitting a curve using eq. (5) we obtain the temperature T_MB for each Raga. The plots for the three ragas are given in Figure 4(a-c), where the red line indicates the best fit to the scattered data. The curve-fitting results are given later in Table 2.
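A minimal Python sketch of this fitting step, assuming the fitted model has the Maxwell-Boltzmann form p(E) = A exp(-E/T_MB) (eq. (3) with k = 1). It fits a straight line to ln p versus the corpus energy E, so the slope is -1/T_MB; least squares in log space is a simplification chosen here and can differ from a nonlinear fit on raw probabilities. Skipping raga tokens that are absent from the corpus is likewise a hypothetical choice. Function names are hypothetical.

```python
import math


def fit_mb_temperature(energies, probs):
    """Fit ln p = ln A - E / T_MB by ordinary least squares; returns (A, T_MB)."""
    ys = [math.log(p) for p in probs]
    n = len(energies)
    mean_x, mean_y = sum(energies) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in energies)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(energies, ys)) / sxx
    A = math.exp(mean_y - slope * mean_x)
    return A, -1.0 / slope  # slope = -1 / T_MB


def raga_points(raga_probs, corpus_energy):
    """Pair each raga token's probability with its corpus energy;
    tokens absent from the corpus are skipped (a hypothetical choice)."""
    shared = [t for t in raga_probs if t in corpus_energy]
    return [corpus_energy[t] for t in shared], [raga_probs[t] for t in shared]


# Sanity check: the corpus itself, with E = -ln p, recovers T_MB = 1 and A = 1.
corpus_p = [0.5, 0.3, 0.2]
A_c, T_c = fit_mb_temperature([-math.log(p) for p in corpus_p], corpus_p)
```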

The B-E distribution
The N_j vs. j plots of the three Raga performances are given in Figure 5(a-c).
As seen in Figures 4 and 5, both plots closely follow the respective distribution patterns. Next, we fit these data to equations (5)-(8) to find T_MB, z and T_BE. The results of the fits are given in Table 2. Table 2 indicates that among the three Ragas, Puriya has the highest number of note-duration combinations, closely followed by Marwa. Sohini, on the other hand, has fewer notes than the other two. Interestingly, however, the number of once-occurring note-duration combinations (N_1) is almost the same in the three Ragas.
The T_MB values show that the 'temperature' of Raga Sohini is the highest of the three (1.07 K), whereas that of Puriya is the lowest (even lower than the corpus temperature of 1 K). We interpret this as reflecting the kinetic nature of note usage in these ragas. Puriya, whose Vadi-Samvadi pair Ni2 and Ga2 are related by an exact fifth, moves slowly and is known for its serene nature. Sohini, on the other hand, has a faster tempo than the others and evokes joyfulness. In literary analysis, T_MB (or 'word energy') has been shown to act as a classifier based on word occurrence [12][18]. Similarly, in our case, T_MB is an indicator of the kinetic nature of the Raga (much as a higher temperature indicates a higher kinetic energy in thermodynamics). Another noteworthy observation is that the ratio of N_1 to N (N_1:N) follows the same trend as T_MB, i.e., Sohini > Marwa > Puriya. This result echoes similar trends seen in [13]. A possible explanation can be found by returning to the origin of the analogy, gas molecules in a vessel: the M-B distribution, when represented as a population vs. velocity graph, broadens and shifts toward higher velocities with increasing temperature (Figure 6). This indicates that the probability of finding a greater number of molecules with high velocity increases, i.e., if one molecule is picked at random, it is more likely to have the most probable velocity. In our experimental scenario, the ratio N_1:N represents the probability of finding a random note-duration combination in occurrence state 1 (Figure 3 shows that 1 is the most common frequency of occurrence). The positive relation between T_MB and N_1:N suggests that with a higher T_MB, more combinations are likely to be found in the most probable state, j = 1. This observation further supports the analogy between music and a statistical ensemble of particles.
In the case of the temperature calculated from the Bose-Einstein distribution (T_BE), Sohini has a significantly lower value. In the B-E distribution, a low T_BE means that the particles in the system show very little variation in energy (in the limiting case of a Bose-Einstein condensate, at the lowest T_BE, this variation vanishes). Similarly, a lower 'temperature' in our case implies less diversity in note occurrences and note durations. This is evident from the note distribution pattern given later in Figure 8: apart from the dominant Sa, all other notes are short-spanned in Sohini [13]. That is why Raga Sohini, despite having the fewest notes (only 1350), has the highest number of once-occurring note combinations (138). In each case the z-value is very close to unity; since z ≈ 1 is the ideal value for the B-E distribution [16], this supports the B-E analogy for the lower-frequency notes. Additionally, the parameter τ = ln T_BE / ln N has been found to be a useful variable in comparative studies [17], and the τ-N plane helps in comparing different genres by distributing them across the plane. The N-τ plane for this study is given in Figure 7. It has been reported that τ varies insignificantly with the size of the units considered [17], and this is replicated in our study as well. Also, in linguistic analysis, a lower τ value corresponds to a higher language 'analyticity', i.e., the tendency for each word to be single and isolated (in contrast to 'syntheticity', the tendency to have more syntactic relations via word inflection). Our study suggests that a lower τ indicates a higher usage of single note-duration combinations ('musical analyticity', if one may). Put simply, the higher the ratio N_1:N, the higher the analyticity.
This is evident from the note distribution structure of the Ragas given in Figure 8: in the same time span, the other two Ragas have more notes than Raga Sohini, but the latter has a higher N_1 value owing to its abundant use of short-spanned notes.
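The two 'analyticity' indicators discussed above reduce to simple arithmetic, sketched here in Python. The Sohini figures N_1 = 138 and N = 1350 are the ones quoted in the text; the T_BE value passed to tau is a placeholder, not a value from Table 2, and the function names are hypothetical.

```python
import math


def once_ratio(N1, N):
    """N_1 : N, the share of once-occurring note-duration combinations."""
    return N1 / N


def tau(T_be, N):
    """tau = ln T_BE / ln N, the coordinate used in the N-tau plane."""
    return math.log(T_be) / math.log(N)


# Sohini figures quoted in the text: N_1 = 138, N = 1350.
print(round(once_ratio(138, 1350), 4))  # 0.1022
# Placeholder T_BE, for illustration only.
print(round(tau(10.0, 1350), 4))
```

For a fixed N, tau grows with T_BE, so placing renditions in the N-tau plane separates them by their fitted B-E temperatures.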

Goodness-of-fit
The goodness of fit of the model can be tested using the coefficient of determination R², as argued in [20][21]. R² is defined as:

$$ R^2 = 1 - \frac{\sum_i \left(f_i - N P_i\right)^2}{\sum_i \left(f_i - \bar{f}\right)^2} $$

where f_i is the observed frequency of the value i, P_i is the theoretical probability of the value i, N is the sample size (the total number of observations) and $\bar{f}$ is the mean of the observed frequencies ($\bar{f} = \sum_i f_i / n$, where n is the number of values i). For the B-E data the summation runs from j = 2 (since j = 1 is fixed by z); for the rest it runs from j = 1. The R² values are given in Table 3. Generally, R² ≥ 0.9 is considered satisfactory, although R² ≥ 0.8 is also acceptable [21]. The fit is satisfactory in the case of the M-B distribution but falls short for B-E. Presumably, this happens because of the reduced number of data points once the rank-frequency data are converted into N_j vs. j. A larger sample could improve the fit significantly.
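The coefficient of determination above can be computed with a few lines of Python. This is a sketch under the stated definitions; it assumes the theoretical probabilities P_i are normalised over the same values that are passed in, and for the B-E case the j = 1 entry would simply be dropped from both lists before the call, as described in the text. The function name is hypothetical.

```python
def r_squared(observed, expected_probs):
    """R^2 = 1 - sum_i (f_i - N P_i)^2 / sum_i (f_i - fbar)^2,
    with N = sum_i f_i (sample size) and fbar the mean observed frequency."""
    N = sum(observed)
    fbar = N / len(observed)
    ss_res = sum((f - N * P) ** 2 for f, P in zip(observed, expected_probs))
    ss_tot = sum((f - fbar) ** 2 for f in observed)
    return 1.0 - ss_res / ss_tot


# A model that predicts every value equally likely does no better than the mean: R^2 = 0.
print(round(r_squared([50, 30, 20], [1 / 3, 1 / 3, 1 / 3]), 6))
```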

Conclusions
The main conclusions that can be drawn from this study are:
1) The fit of the probability vs. energy graphs indicates that significant results can be obtained even with a small corpus.
2) The parameter T_MB reflects the kinetic nature of the Raga (by analogy with thermodynamics). It has the potential to be used as a classification parameter in musical information research.
3) The consistency of the z-value indicates that the analogy between the B-E distribution and highly structured sequential data such as music is surprisingly strong.
4) The parameter T_BE can be used to indicate the diversity of note variations, and also as an improvisation parameter.
5) The N-τ plane can be used to categorize different Raga renditions, and the performance variations of artists, based on the nature of the note distribution over the whole performance.
6) Based on the τ value, performances (and different musical renditions) can be placed on a spectrum from 'musically analytic' to 'musically synthetic', indicating their note usage and note distribution patterns.

Overall, in this study we have presented an unconventional model for harvesting the musical information of Indian classical music using fundamental statistical tools, the MB and BE distributions, which are traditionally used in the domain of microscopic reality. The parameters it yields can categorize different Ragas and their structural features on the basis of note occurrence and note-duration variation. The use of such statistical methodologies as a classificatory algorithm in the music domain is novel. With larger data sets and more diverse Ragas, we believe that further correlations between the parameters and a finer categorization of musical information could be found. These early results indicate that the method could be used in fields other than music (such as speech) for categorization, style identification and further classification.