Music Tune Restoration Based on a Mother Wavelet Construction

We propose using a mother wavelet function obtained from a local part of the analyzed music signal. Requirements for the constructed function are stated, and the construction technique and its properties are described. The suggested approach allows construction of mother wavelet families with specified identifying properties. Consequently, it becomes possible to identify variations of a basic signal within complex music signals, including the local time-frequency characteristics of the basic signal.


Introduction
Discrete and continuous wavelet transforms are becoming an indispensable part of modern mathematics and its applications. Furthermore, the growth of computing hardware and software in recent years has contributed to the solution of many mathematical problems connected with pattern recognition. Modern high-performance computing makes it possible to solve a number of such problems, including recognition of speech and graphical objects and processing of seismic data, cardiograms, etc. Many of them require wavelet transforms. However, some problems of musical pattern recognition and their implementation in automated information systems have been insufficiently studied [10]. One such problem is identifying a separate note in one-voice and polyphonic melodies performed on certain musical instruments.

Music signal model
A simple mathematical model of a musical melody is a set of notes played at different times on a certain musical instrument: $F(t) = A_1 n_1(t-\theta_1) + A_2 n_2(t-\theta_2) + \dots + A_N n_N(t-\theta_N) + h(t)$, where $n_i(t)$ is the amplitude-time characteristic of a single note; $\theta_i$ is the time shift that determines when each note starts sounding; $A_i$ is the volume of the individual note; $h(t)$ is the disturbance introduced by the sound-recording equipment; and $t$ is time.
Most musical instruments possess a self-similarity property: the time function of any note $n_i(t)$ of an instrument can be obtained from the time function of a single note $n_0(t)$ of the same instrument by scaling $n_0(t)$ along the time axis: $n_i(t) = n_0(m_i t)$, where $m_i$ is the scale factor and $i$ is the position of note $n_i(t)$ in pitch relative to note $n_0(t)$, counted in semitones. For the equal-tempered tuning of European music, $m_i = 2^{i/12}$. For example, note «C» of the second octave lies 12 semitones above note «C» of the first octave and has twice its pitch frequency. The time function of note «C» of the second octave, $n_2(t)$, is therefore compressed by a factor of two relative to $n_1(t)$: $n_2(t) = n_1(2t)$. Any note in the range of the instrument may be selected as the base note $n_0(t)$, regardless of whether it occurs in the particular musical signal $F(t)$. For example, the time function of note «A» of the first octave, with pitch frequency $\upsilon = 440$ Hz, may be taken as $n_0(t)$.
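The scaling rule can be illustrated with a minimal Python sketch. The function names and the pure-sine stand-in for the base note are assumptions made for illustration; a real base note would be a recorded instrument waveform.

```python
import numpy as np

BASE_FREQ = 440.0  # pitch of base note "A" of the first octave, Hz (from the text)

def scale_factor(i):
    """Scale factor m_i for a note i semitones above the base note (equal temperament)."""
    return 2.0 ** (i / 12.0)

def n0(t):
    # Stand-in base note (assumption): a pure sine instead of a recorded waveform.
    return np.sin(2 * np.pi * BASE_FREQ * t)

def note_from_base(base, i):
    """Return n_i(t) = base(m_i * t), i.e. the base note compressed along the time axis."""
    m = scale_factor(i)
    return lambda t: base(m * t)

t = np.linspace(0.0, 0.01, 1000)
octave_up = note_from_base(n0, 12)
# Twelve semitones up doubles the pitch: n_12(t) equals n_0(2t).
assert np.allclose(octave_up(t), n0(2 * t))
```

Negative values of `i` give notes below the base note, because the same formula then stretches the waveform instead of compressing it.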
This property is exploited in all modern musical synthesizers that use the wave-table method. Such synthesizers store a data bank containing the voice of one (base) note for each musical instrument. When the tune of one instrument is formed, the base note signal is scaled in pitch by $m_i$, shifted in time by $\theta_i$, and scaled in amplitude by $A_i$ according to the part of this instrument. All functions formed for the instrument are then summed. As a result, the model of a musical signal formed by a synthesizer for a given musical instrument has the form $f(t) = \sum_i A_i\, n_0(m_i (t-\theta_i))$. Similarly, for $k$ different musical instruments simulated by the synthesizer, the musical signal may be represented by the sum of the signals of all notes played at different moments of time with different amplitudes: $f(t) = \sum_k \sum_i A_i\, n_0^k(m_i (t-\theta_i))$, where $n_0^k(t)$ is the time function of the base note signal of the $k$-th musical instrument; $\theta_i$ is the time shift of note $n_0^k(t)$; $m_i$ is the scale of note $n_i^k(t)$ relative to base note $n_0^k(t)$, which specifies the frequency of the main tone; and $A_i$ is the magnitude of note $n_i^k(t)$. The function $f(t)$ is an idealized model of a musical signal, analogous to a recording of several musical instruments in an orchestra or an ensemble.
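A minimal sketch of this wave-table model for a single instrument follows. The sampling rate, the decaying-sine base note, and the `(semitones, theta, amplitude)` note format are illustrative assumptions, not taken from the paper.

```python
import numpy as np

FS = 8000  # sampling rate, Hz (assumed)

def base_note(t, freq=440.0, decay=5.0):
    """Hypothetical base note n_0(t): a decaying sine that is zero before t = 0."""
    return np.where(t >= 0, np.exp(-decay * t) * np.sin(2 * np.pi * freq * t), 0.0)

def synthesize(notes, duration, fs=FS):
    """Sum A_i * n_0(m_i * (t - theta_i)) over (semitones, theta, amplitude) triples."""
    t = np.arange(int(duration * fs)) / fs
    f = np.zeros_like(t)
    for semitones, theta, amplitude in notes:
        m = 2.0 ** (semitones / 12.0)          # equal-tempered scale factor m_i
        f += amplitude * base_note(m * (t - theta))
    return t, f

# Three notes (base, +4 and +7 semitones) starting at 0, 0.25 and 0.5 s.
melody = [(0, 0.0, 1.0), (4, 0.25, 0.8), (7, 0.5, 0.6)]
t, f = synthesize(melody, duration=1.0)
```

For several instruments, the same function can be called once per instrument with its own base note and the results added, which mirrors the double sum in the model.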
According to the model $f(t)$, musical signal identification is the task of identifying the amplitudes of the notes with given scale $m_i$ and time shift $\theta_i$ for all $k$ musical instruments present in the analyzed signal $f(t)$. Identifying the melody of a single recorded instrument, or a melody generated by a musical synthesizer, is a particular case of this task.

Continuous wavelet transform
Because signal $f(t)$ is nonstationary and nonperiodic, the continuous wavelet transform (CWT) was chosen as the mathematical apparatus for its time-frequency analysis. The CWT coefficients $Wf_s(\tau)$ are obtained by integrating over time the product of $f(t)$ and the scaled and shifted wavelet $w_{s,\tau}(t)$, where $w(t)$ is the mother wavelet function; $s$ is the wavelet scaling coefficient; and $\tau$ is the wavelet shift coefficient.
A feature of the CWT is that the wavelet family $w_{s,\tau}(t)$ is formed by shifts $\tau$ and scalings $s$ of the mother wavelet $w(t)$. This is similar to the way the note family $n_i^k(t)$ is formed from one base note $n_0^k(t)$ by shifts $\theta_i$ and scalings $m_i$. Therefore, the use of the CWT in this task is well justified.
The procedure of selecting a mother wavelet is empirical for each particular task and reduces to trying different mother wavelet functions in the CWT until the desired result is achieved. Research on the properties of wavelet functions showed that the clearest graphical presentations of CWT results are obtained when the frequency spectra of signal $f(t)$ and wavelet $w(t)$ match.
For each scale $s$, function $Wf_s(\tau)$ is similar to the cross-correlation function of signals $w_s(t)$ and $f(t)$: it measures both the similarity of the two signal shapes and their relative position on the time axis.
It is known that the cross-correlation function is maximal where the two functions coincide. Here, $Wf_s(\tau)$ is maximal for those $\tau$ at which $f(t)$ and $w_s(t-\tau)$ are equal: $f(t) = w_s(t-\tau)$. Evidently, besides the right shift $\tau$, this condition requires the two functions to be equal at every point $t$.
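The correlation view of the CWT can be sketched as follows. The scaling convention $w_s(t) = \sqrt{s}\,w(st)$, chosen so that $s$ behaves like the note scale factor $m_i$, is an assumption; the paper's own normalization is not recoverable from the text. The function name `cwt_custom` and the test signal are likewise hypothetical.

```python
import numpy as np

def cwt_custom(f, wavelet, scales, fs):
    """Correlate f with time-compressed copies of a sampled mother wavelet.

    Assumed convention: w_s(t) = sqrt(s) * w(s * t), so s > 1 compresses the
    wavelet and raises its pitch. f must be longer than the wavelet.
    """
    duration = len(wavelet) / fs
    t_w = np.arange(len(wavelet)) / fs
    rows = []
    for s in scales:
        n = max(2, int(round(duration / s * fs)))      # samples in the scaled wavelet
        t_s = np.arange(n) / fs
        w_s = np.sqrt(s) * np.interp(s * t_s, t_w, wavelet)
        rows.append(np.correlate(f, w_s, mode="same") / fs)
    return np.array(rows)                              # shape: (len(scales), len(f))

fs = 8000
t = np.arange(int(0.5 * fs)) / fs
signal = np.sin(2 * np.pi * 440.0 * t)                 # one sustained 440 Hz tone
n_w = int(round(16 * fs / 220.0))                      # 16 periods of a 220 Hz sine
mother = np.sin(2 * np.pi * 220.0 * np.arange(n_w) / fs)

W = cwt_custom(signal, mother, scales=[1.0, 2.0, 4.0], fs=fs)
best = int(np.argmax(np.max(np.abs(W), axis=1)))       # index of the strongest scale
```

In this example the strongest response occurs at scale 2, where the compressed wavelet has the same pitch as the signal, in line with the coincidence argument above.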
The wavelet function used in the CWT should meet a number of conditions: 1. limitation (localization) in time: $w(t) \to 0$ as $t \to \pm\infty$; 2. piecewise continuity of function $w(t)$; 3. integrability with zero mean: $\int_{-\infty}^{\infty} w(t)\,dt = 0$. An example of a mother wavelet function that satisfies all these conditions is the Morlet wavelet (Figure 1).

Mother wavelet construction
Owing to the time-localization condition imposed on the mother wavelet, $Wf_s(\tau)$ reaches its maximum when wavelet $w_s(t-\tau)$ of scale $s$ coincides most closely with a local section of signal $f(t)$. For exact coincidence, the wavelet and the signal must be equal on that local time interval. In this paper we suggest using a mother wavelet formed from a local section of the analyzed musical signal. For the more restricted task of identifying the notes of a musical instrument, the mother wavelet may be formed from a local section of the base note signal $n_0(t)$ of that instrument (Figure 2). The formed wavelet must satisfy conditions 1, 2, and 3 as follows.
1. Time localization
The wavelet is defined as $w(t) = n_0(t)$ for $t_0 \le t \le t_0 + T$ and $w(t) = 0$ otherwise. Here, $t_0$ is the moment from which the values of wavelet $w(t)$ equal the values of signal $n_0(t)$, and $T$ is the duration of the fragment of $n_0(t)$ that coincides with wavelet $w(t)$. Values $t_0$ and $T$ are selected so as to satisfy the remaining conditions required of wavelet functions.
2. Sectional continuity
The base note function $n_0(t)$ is continuous over its whole interval of existence, because it describes the oscillations in time of a physical body with finite mass and therefore cannot have discontinuities.
To keep the mother wavelet continuous at the ends of the fragment, the initial and final values of $n_0(t)$ on the interval $[t_0, T+t_0]$ must be zero: $n_0(t_0) = 0$ and $n_0(T+t_0) = 0$.

3. Integrability with zero mean
One property of musical instruments is the absence of harmonic components with frequency lower than the pitch frequency of the note. The zero harmonic (constant component) is absent in musical instrument signals as well. This property ensures a zero mean of $n_0(t)$ on the interval $[t_0, T+t_0]$ when the interval contains an integer number of periods of $n_0(t)$.
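The three conditions suggest a simple construction procedure, sketched below under stated assumptions: the base note is available as a sampled array, its pitch is known, and the fragment starts at the first upward zero crossing. The function name `build_wavelet` and the two-harmonic test note are hypothetical. Because of sampling, the end value and the mean are only approximately zero.

```python
import numpy as np

def build_wavelet(n0, fs, pitch_hz, periods, start=0.0):
    """Cut `periods` whole pitch periods out of the sampled base note n0.

    The fragment begins at the first upward zero crossing at or after `start`
    seconds, so it starts and ends near zero and has a near-zero mean.
    """
    i0 = int(start * fs)
    crossings = np.where((n0[i0:-1] <= 0) & (n0[i0 + 1:] > 0))[0]
    if crossings.size == 0:
        raise ValueError("no upward zero crossing found")
    k0 = i0 + crossings[0]
    length = int(round(periods * fs / pitch_hz))       # T = periods / pitch, in samples
    if k0 + length > len(n0):
        raise ValueError("base note is too short for the requested wavelet")
    return n0[k0:k0 + length].astype(float)

fs = 44100
t = np.arange(fs) / fs
# Stand-in base note (assumption): fundamental at 440 Hz plus one overtone.
note = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 880.0 * t)

w = build_wavelet(note, fs, pitch_hz=440.0, periods=16)
print(len(w), abs(w[0]), abs(w[-1]), abs(w.mean()))    # end values and mean near zero
```

A higher sampling rate, or interpolation around the zero crossings, would bring the end values closer to exact zero.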
Therefore, to form the wavelet with the highest selectivity to signal $n_0(t)$, one should use a section of $n_0(t)$ that contains an integer number of periods and has zero initial value $n_0(t_0)$ and zero final value $n_0(T+t_0)$.
To detect elementary components (notes) in a musical signal, the wavelet family should satisfy two conditions: 1. the frequency resolution (the window height along the scale axis, $\Delta s$) should distinguish the frequencies of two adjacent notes; 2. the time resolution (the window width along the time axis, $\Delta\tau$) should allow identifying all notes of the minimum possible duration.
1. Frequency resolution
An experiment was carried out to estimate the resolution of the artificial mother wavelets. Its aim was to determine the number of periods in the wavelet that allows identifying the frequency scales $m_i$ of all notes present in the signal simultaneously (in the CWT, the wavelet scales $s$ relative to the base one are equivalent to $m_i$).
The chord «C major» of the first octave was used in the experiment. The note pitches of this chord correspond to harmonic signals with frequencies 261.6, 329.6, and 392 Hz. The test signal $f(t)$, the sum of these three harmonics, had a duration of 0.2 s. To study the test signal, mother wavelets $w_i(t)$ were constructed from a harmonic signal. The wavelets contained 1, 2, 4, 8, and 16 periods of the harmonic signal, respectively (Figure 5).
The CWT of test signal $f(t)$ was computed with each wavelet family $w_i(t)$, and the results were rendered as three-dimensional models (Figure 6). In each illustration the ordinate is the wavelet scale $s$ and the abscissa is the time shift $\tau$; $\tau_0$ and $\tau_1$ mark the beginning and the end of signal $f(t)$. The magnitude of the CWT results is shown in gray scale, with darker regions corresponding to higher magnitude.
Figure 6 shows that for the one-period wavelet the region of uncertain results along the time-shift axis, of duration $\Delta\tau$, is small; therefore, the error in estimating the signal timing is small. However, the frequency resolution is so low that for most scales $s$ the CWT results have nearly the same values, suggesting frequency components across the whole frequency range under study, although test signal $f(t)$ contains only three localized harmonics.
For the wavelet with 16 periods, on the interval from $\tau_0$ to $\tau_1$ the CWT results are high only for the three scales $s$ equivalent to the frequencies 261.6, 329.6, and 392 Hz of signal $f(t)$. For the remaining values of $s$, the CWT results are almost zero. The low time resolution, however, produces uncertain results at the beginning and the end of the signal, with intermediate magnitudes on intervals of duration $\Delta\tau$. This makes it impossible to judge how the magnitude of the test signal changes on these intervals.
Thus, the wavelet family with one period of the sine signal gives high time resolution but rather low frequency resolution in the CWT. In contrast, the wavelet family with sixteen periods gives high frequency resolution (all harmonics in the signal are identified unambiguously) but low time resolution.
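The frequency side of this trade-off can be reproduced numerically with the sketch below. It is not the authors' experiment: the sampling rate, unit chord amplitudes, unit-energy normalization, and the probe frequency of 294 Hz (a pitch lying between two chord notes) are assumptions chosen for illustration.

```python
import numpy as np

FS = 8000                                  # sampling rate, Hz (assumed)
CHORD = (261.6, 329.6, 392.0)              # C major pitches from the text, Hz

t = np.arange(int(0.2 * FS)) / FS          # 0.2 s test signal
f = sum(np.sin(2 * np.pi * v * t) for v in CHORD)

def response(freq, periods):
    """Peak |correlation| of f with a sine wavelet of `periods` periods at `freq`."""
    n = int(round(periods * FS / freq))
    w = np.sin(2 * np.pi * freq * np.arange(n) / FS)
    w /= np.sqrt(np.sum(w ** 2))           # unit energy, so wavelets are comparable
    return np.max(np.abs(np.correlate(f, w, mode="valid")))

def contrast(periods):
    """Response at a pitch absent from the chord relative to the response on a chord note."""
    return response(294.0, periods) / response(261.6, periods)

for p in (1, 2, 4, 8, 16):
    print(p, round(contrast(p), 2))
```

A ratio near 1 means the wavelet cannot tell a chord note from a pitch that is not in the signal; the ratio for the 16-period wavelet is much smaller than for the one-period wavelet.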
2. Time resolution
Musical notation uses note and rest symbols to mark the elements of a melody and the gaps in which voices do not sound. Note and rest durations are expressed as fractions of the duration $t_1$ of the semibreve («whole note»), the note of maximum possible duration. The system of musical notation, consisting of alternating notes and rests, imposes strict constraints on the moments at which notes and rests start. These moments are sampled with a period $t_d$ equal to the duration of the shortest note. In both classical and modern compositions, the shortest note is the hemidemisemiquaver with duration $t_{64} = t_1/64$. In practice, notes of duration $t_{64}$ occur rather seldom because they are technically difficult to perform, so the demisemiquaver with duration $t_{32} = t_1/32$ may be considered the shortest note. A fragment of a two-voice melody is shown in Figure 7. At any moment, no more than two notes sound simultaneously. Note 2 is the shortest one. Notes 1, 3, and 4 are equal in duration and twice as long as note 2 and the rest. If $t_d = t_{32}$, then note 2 and the rest have duration $t_{32}$, and notes 1, 3, and 4 have duration $t_{16}$.
For wavelet duration $T = t_d$, consider the cross-correlation function $Wf_s(\tau)$ of the wavelet and the signal of one note that coincides with it in form (for a particular value of $s$); it then degenerates into the autocorrelation function. Its envelope has the form of an isosceles triangle with its maximum at the centre of the note and width $2t_d$ (Figure 8, a). If all values of $Wf_s(\tau)$ smaller than a threshold $Wf_{mist}$ are rejected, the note identification time is $t_i$. If $Wf_{mist} = 0.5\,Wf_{max}$, then $t_i = 0.5\,t_d$. This means that the note identification time equals half of the note length.
If the wavelet duration is $T = 0.5\,t_d$, the envelope of the correlation function $Wf_s(\tau)$ of the wavelet and the signal of one note coinciding with it in form is an isosceles trapezium whose upper side has width $0.5\,t_d$ (Figure 8, b). If all values of $Wf_s(\tau)$ smaller than $Wf_{mist}$ are rejected, the note identification time is $t_i$. If $Wf_{mist} = 0.5\,Wf_{max}$, then $t_i = t_d$. This means that the note identification time equals the note length. Upon further decrease of the wavelet width $T$, with $Wf_{mist} = 0.5\,Wf_{max}$, the note identification time $t_i$ remains constant and equal to $t_d$.
Conclusion: a wavelet of length $T$ not greater than $t_d$ should be used for time identification of the note with the smallest length $t_d$.
The duration $T$ of the wavelet must therefore satisfy two conditions: 1. $T = 16/\upsilon$, where $\upsilon$ is the pitch frequency of the identified note; 2. $T \le t_d$. The tempo of most compositions varies, as a rule, in the range of 60-180 beats per minute (BPM), which corresponds to a semibreve (whole note) duration of $t_1 = 1.3 \ldots 4.0$ s. A note of duration $t_{32}$ at a quick tempo (180 BPM) therefore lasts $t_{32} = 1.3/32 = 0.041$ s. Consequently, the wavelet able to identify the time interval of the shortest note at a high tempo should be no longer than $t_d = t_{32} = 0.041$ s. The boundary frequency of note pitch identification is $\upsilon(t_d) = 16/0.041 = 390$ Hz. This means that accurate identification of the beginning and ending times of a note is possible only for notes with pitch frequency higher than 390 Hz (Figure 9), that is, starting with note «G» of the Middle octave, whose pitch frequency is 392 Hz. For a slower tempo, or for tasks of detecting notes longer than $t_{32}$, the boundary frequency decreases. For example, in modern club dance compositions the tempo is close to 120 BPM and the shortest note is $t_d = t_{16} = 0.125$ s. The boundary frequency of note pitch identification is then $\upsilon(t_d) = 16/0.125 = 128$ Hz. Identification of the beginning and ending times of a note is possible only for notes from «C» of the Bass octave, with pitch frequency 130.8 Hz, upwards. Most musical compositions use notes in the Middle octave and the octaves just below and above it, which lie higher than the Bass octave.
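The arithmetic behind these boundary frequencies can be checked with a short sketch. It assumes that one beat is a quarter note, so a whole note lasts four beats, which matches the 1.3 to 4.0 s range quoted above for 60-180 BPM; the function names are illustrative.

```python
def whole_note_seconds(bpm, beats_per_whole=4):
    """Duration t_1 of a whole note, assuming one beat is a quarter note."""
    return beats_per_whole * 60.0 / bpm

def boundary_frequency(bpm, divisor, periods=16):
    """Lowest pitch whose `periods`-period wavelet still fits into the shortest note.

    divisor = 32 selects t_32, divisor = 16 selects t_16, and so on.
    """
    t_d = whole_note_seconds(bpm) / divisor
    return periods / t_d

fast = boundary_frequency(180, 32)   # about 384 Hz; the text rounds t_32 to 0.041 s and gets 390 Hz
club = boundary_frequency(120, 16)   # 128.0 Hz, as in the text
print(fast, club)
```

Halving the number of periods in the wavelet would halve the boundary frequency, at the cost of the frequency resolution discussed earlier.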

Conclusion
The suggested approach makes it possible to build mother wavelet families with a defined selectivity. A wavelet developed from a fragment of a certain basic signal detects the time-frequency behavior of that basic signal as a whole, rather than separate time-frequency characteristics of the studied signal. Mother wavelets for the voices of various orchestra groups, such as piano, organ, violin, bells, and trumpet, were obtained experimentally. All wavelets contain 16 periods of the signal of the corresponding musical instrument with pitch frequency 55 Hz. When the CWT is calculated, the mother wavelet is scaled so that each successive wavelet $w_i(t)$ coincides in pitch frequency with note $n_i(t)$ of the musical instrument.
Construction of mother wavelet families on the basis of musical instrument notes demonstrated that the frequency and time parameters of the notes of a certain instrument can be detected in one-voice and polyphonic tunes. In a number of experiments it also made it possible to identify the tune of one musical instrument against the background of another.
The authors suggest that applying this technique to tasks that require detecting fragments of certain families of limited length in a signal, against the background of other signals or disturbances, should be researched further.