
A toy model for the auditory system that exploits stochastic resonance

Francesco Veronesi and Edoardo Milotti

Published 5 January 2022 © 2022 European Physical Society
Citation: Francesco Veronesi and Edoardo Milotti 2022 Eur. J. Phys. 43 025703, DOI 10.1088/1361-6404/ac4431

0143-0807/43/2/025703

Abstract

The transduction process that occurs in the inner ear of the auditory system is a complex mechanism which requires a non-linear dynamical description. In addition to this, the stochastic phenomena that naturally arise in the inner ear during the transduction of an external sound into an electro-chemical signal must also be taken into account. The presence of noise is usually undesirable, but in non-linear systems a moderate amount of noise can improve the system's performance and increase the signal-to-noise ratio. The phenomenon of stochastic resonance combines randomness with non-linearity and is a natural candidate to explain at least part of the hearing process which is observed in the inner ear. In this work, we present a toy model of the auditory system which shows how stochastic resonance can be instrumental to sound perception, and suggests an explanation of the frequency dependence of the hearing threshold.


1. Introduction

Mathematical modeling in biophysics is notoriously difficult, because the majority of biological systems cannot be subdivided into hierarchically separated subsystems. The internal correlations and non-linear interactions are often so strong that the reductionist approach that is so successful in physics cannot be applied to biology [1–3]. Still, in some fortunate cases, simple physical models can account for the main observed features. For example, in 1977 Edward Purcell published a beautiful, seminal paper under the title 'Life at low Reynolds number' that explained in simple terms the physical reasons underpinning the evolutionary development of some aquatic organisms [4]. This was followed a few years later by a similarly styled paper, 'The efficiency of propulsion by a rotating flagellum' [5], which further extended the considerations of the 1977 paper, again with simple and deep physical arguments. Other notable contributions of physics to biophysics that stand out for their simplicity and depth can be found, e.g., in the fields of biomechanics [6] and biophysical noise processes (see, e.g., [7–9], and for a modern perspective the beautiful book by Bialek [10]).

Here we try to follow these important leads while focusing on the complexity of the auditory system. The dynamical models used to describe the auditory systems are not analytically solvable, and the approximations used to predict the system's behavior may compromise the overall reliability of the solutions. The intrinsic stochasticity of the underlying biological processes adds another layer of complexity [11, 12].

However, under appropriate conditions, the presence of noise in non-linear systems can improve their performance [13]. In particular, signal detection can benefit from noise and display an enhancement of the signal-to-noise ratio (SNR). This is the result of the phenomenon known as stochastic resonance, which was first introduced by Benzi et al [14] in 1981 and was initially used to model the switching behavior of the Earth's climate that leads to the ice ages [15]. Since its introduction, stochastic resonance has been applied in a variety of fields, e.g., logic gates [16–18], with extensions reaching as far as biophysics; see, e.g., [19], which applies the concept to genetic networks.

In the context of the hearing system, stochastic resonance has been invoked as an explanation of tinnitus [20] or to describe the sensation of pitch [21], thanks to the fact that it is compatible with neural models [22] and their threshold-like behavior.

In this paper we describe a simple model of the auditory system which is based on stochastic resonance, as defined in [23, 24], that recreates to a good approximation the equal-loudness contours near the hearing threshold. The simplicity of the approach makes it well-suited as an introduction for BSc and MSc physics students both to stochastic resonance and to the auditory system.

To make the paper self-contained, it starts with a brief introduction to the auditory system (section 2) and to stochastic resonance (section 3), followed by the description of the model. In section 4 we demonstrate that the model provides a good qualitative description of the equal-loudness curves. Finally we place the results in a wider context in the concluding section.

2. Brief overview of the auditory system

The human auditory system is a sensory organ composed of the outer (or external) ear, the middle ear, the inner ear, and the central auditory nervous system, whose overall function is to perceive and process sounds. The first two elements of the auditory system that are involved in the process, as shown in figure 1, are the outer and the middle ear.

Figure 1. Peripheral view of the auditory system and its major parts.


The outer ear consists of the auricle and the ear canal. The former gathers and channels incident sound waves into the latter. Because the auricle is small compared to the wavelengths of most audible sounds, it works best in the middle-high frequency region [11]. At the center of the auricle we find the ear canal, a soft and rough body approximately cylindrical in shape. Its length varies with the age, gender and genetic makeup of the subject. The ear canal acts like a resonant band-pass filter, with a resonant frequency in the range from 2 kHz to 3 kHz [11].

Once the sound wave, properly conveyed by the auricle and the ear canal, reaches the eardrum, its acoustic energy is converted into mechanical energy by the middle ear organs. The middle ear acts as an impedance-matching stage between the outer ear (filled with air) and the inner ear (mainly filled with perilymph). The energy coming from the outer ear, as shown in figure 2, causes the vibration of the tympanic membrane which, in the middle ear, is transferred first to three small bones (malleus, incus and stapes) and then to the fluid that fills the anterior part of the cochlea, the main organ of the inner ear [25].

Figure 2. Schematic view of middle and inner ear. The basilar membrane divides the cochlea into two distinct compartments.


The cochlea is a spiral-shaped canal composed of several tunnels, each filled with a specific fluid (endolymph, perilymph). The motion of the stapes is transferred to those fluids through the oval window and then absorbed by the basilar membrane, the prime structural element of the cochlea. The basilar membrane, depending on how it oscillates when excited by the pressure waves of the liquid (figure 3), is known to ensure sound-intensity and frequency encoding. In fact, the basilar membrane vibrations within the cochlea and the stimulation of its receptors, called hair cells, are converted into electro-chemical signals that reach the brain through the auditory nerve [11].

Figure 3. Representation of a wave propagating within the cochlea.


The main theory that attempts to explain how the cochlea is able to encode the frequency of a signal is the place theory of hearing [11]. It assumes that a certain frequency is encoded by the position (place) along the basilar membrane where the amplitude of the vibration produced by the acoustic stimulus is at a maximum. Moreover, according to the theory, each hair cell reacts to stimuli of all frequencies, but with distinct threshold values. Place theory does not explain how sound intensity is perceived and encoded. Current understanding suggests that it is affected by [11]:

  • The number of hair cells that respond simultaneously to the same stimulus (since a high intensity sound stimulates a large number of hair cells);
  • Spontaneous activity of nerve fibers, which adds an additional degree of accuracy.

Graphically, sound-intensity perception is represented by equal-loudness contours (see figure 4). It is worth noting the presence of two minima (i.e., sensitivity maxima), one at a frequency just below 4 kHz and the other at about 12 kHz (values similar to the resonance frequencies of the ear canal), and the behavior at low frequencies (< 500 Hz).

Figure 4. Equal-loudness contours, as defined by standard 226 of the International Organization for Standardization [26]. The basis of the equal-loudness contour is the phon, a unit of loudness that represents the dB sound pressure level necessary for a tone to elicit the same loudness as a 1000 Hz reference tone.


Among all the aspects that have emerged in this overview of the auditory system we emphasize the fact that the inner ear is a nonlinear dynamic system with threshold operation, whose transduction process (responsible for the transformation of an acoustic signal into an electro-chemical one) could be enhanced by the internal noise related to the spontaneous activity of hair-cell neurons.

3. Introduction to stochastic resonance

The term noise describes random fluctuations or perturbations [13] that introduce irregularities in physical signals [27]. In systems with linear or weakly nonlinear dynamics, an increase in noise intensity leads to a reduction of the SNR, defined as the ratio between the mean signal power and the mean noise power, expressed in dB (see the first footnote). For a sinusoidal signal at a specific frequency fs, the SNR can be evaluated from the respective power spectral densities S and SN at the same frequency:

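$$\mathrm{SNR} = 10\,\log_{10}\left[\frac{S({f}_{\text{s}})}{{S}_{\text{N}}({f}_{\text{s}})}\right]\ \mathrm{dB} \qquad (1)$$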
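As a small numerical illustration of this definition, the following Python sketch converts a pair of PSD values at the signal frequency into an SNR in dB. It assumes the usual decibel convention for power ratios (ten times the base-10 logarithm); the function name `snr_db` is illustrative and does not come from the paper.

```python
import math

def snr_db(signal_psd: float, noise_psd: float) -> float:
    """SNR in dB from the signal PSD and the noise PSD at the same frequency.

    Assumption: standard decibel convention for power ratios, 10*log10(ratio).
    """
    if signal_psd <= 0.0 or noise_psd <= 0.0:
        raise ValueError("PSD values must be positive")
    return 10.0 * math.log10(signal_psd / noise_psd)

# A peak 100 times above the noise floor corresponds to 20 dB.
print(snr_db(100.0, 1.0))
```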

Surprisingly, in nonlinear systems there are circumstances where the presence of noise can lead to an increase of the SNR [30]: this is the phenomenon of stochastic resonance. One can loosely interpret stochastic resonance as 'randomness that makes nonlinearity less detrimental to a signal' [13].

Stochastic resonance was first introduced by Benzi [14] at the NATO International School of Climatology [13], where it was proposed as a possible explanation of some observed recurrences (approximately every 100 000 years) in the ice ages of the last 700 000 years [27, 30]. This phenomenon—although not a real resonance—was given the name of stochastic resonance because the SNR assumes its maximum value when the intensity of the input noise is 'tuned' to a specific value [13, 30].

Given the ubiquity of noise in nature—and more specifically in biophysical and physiological contexts, where nonlinearity is widespread—this property has prompted searches for the existence and the manifestation of stochastic resonance in neural and sensory models. The 'cooperation' that arises between signal and noise introduces a coherence in the system that is quantified very conveniently by means of the power spectral density (PSD) associated to the system [27, 31]. In fact, if stochastic resonance is realized between noise and a pure sinusoidal signal of frequency fs, then the power spectrum displays a peak at frequency fs (see figure 5). The height of this peak is both frequency- and noise-intensity-dependent [27, 31]. The dependence of SNR on noise amplitude also exhibits a similar behavior (see figure 6). Therefore, stochastic resonance is said to occur if this plot displays a maximum [27] or, equivalently, is characterized by an inverted U-shape [32]. This type of trend—typical of all types of stochastic resonance [13]—is considered the hallmark of the effect [31].

Figure 5. PSD of a generic system which exhibits stochastic resonance. The peak height of the signal is used to compute the SNR at the frequency fs of the signal. In this simulation a zero-mean white noise has been used. Note the representation in dB.

Figure 6. The plot shows the typical SNR—denoted here as output performance—vs noise magnitude for a system that exhibits stochastic resonance.


For historical reasons, it is customary to distinguish between dynamical stochastic resonance and non-dynamical stochastic resonance. In fact, the original stochastic resonance presented by Benzi et al [14] was a phenomenon that occurred only in bistable or multistable dynamical systems [23, 24], whose definition required the verification of precise conditions. Moreover, this definition made the word stochastic resonance inappropriate for nonlinear systems where the nonlinearity was due solely to a 'static threshold' [13]. For these reasons, nowadays, it is usual to differentiate between the original dynamical stochastic resonance and 'static', or non-dynamical, stochastic resonance, despite the fact that both types exhibit the same properties presented so far.

In this paper we deal only with non-dynamical stochastic resonance. Readers interested in the dynamical case can consult, for example, the articles by Wellens et al (2004) [27], by McNamara and Wiesenfeld (1989) [30] and by Bulsara and Gammaitoni (1996) [31].

A system that exhibits stochastic resonance is said to be 'static' when the nonlinear perturbations to which it is subjected, and which alter the nature of the input signal, are not governed by differential equations in time, but by simple rules that produce an output signal related to the instantaneous value of the input signal [13]. The simplest static system in which non-dynamical stochastic resonance occurs, as shown in the lower plot of figure 7, consists solely of a threshold detector and is called a level crossing detector (LCD) [23, 24].

Figure 7. Input (lower panel) and output (upper panel) representation of an asymmetric LCD system. Here the threshold is represented by the orange line. Whenever the input—given by the signal plus noise—exceeds the threshold, a short pulse of arbitrary amplitude is added to the output time series.


LCDs base their operation on the following rule: whenever the input given by the sum of signal plus noise crosses the threshold, a narrow pulse of arbitrary amplitude is reported in the time series, as shown in figure 7 for a pure sinusoidal signal. Depending on whether one chooses to subject the system to a single threshold (usually positive) or two (one positive and one negative), the LCD system is called asymmetric or symmetric, respectively.
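The rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a pulse is emitted for every sample that lies beyond the threshold, the pulse amplitude is fixed at an arbitrary value, and in the symmetric case crossings of the negative threshold produce negative pulses. The function name `lcd_pulses` and the parameter values in the usage lines are hypothetical.

```python
import math
import random

def lcd_pulses(samples, threshold, symmetric=False, pulse=1.0):
    """Pulse train of a level crossing detector (illustrative sketch).

    Assumption: one pulse of arbitrary amplitude `pulse` per sample beyond
    the threshold; the symmetric variant also marks the negative threshold.
    """
    out = []
    for v in samples:
        if v > threshold:
            out.append(pulse)
        elif symmetric and v < -threshold:
            out.append(-pulse)
        else:
            out.append(0.0)
    return out

# A sub-threshold sine alone produces no pulses; adding noise produces some.
rng = random.Random(0)
sine = [0.1 * math.sin(2.0 * math.pi * 4000.0 * k / 40000.0) for k in range(400)]
noisy = [s + rng.gauss(0.0, 0.3) for s in sine]
print(sum(p != 0.0 for p in lcd_pulses(sine, 0.45)))
print(sum(p != 0.0 for p in lcd_pulses(noisy, 0.45, symmetric=True)))
```

The two threshold variants differ only in the `symmetric` flag, which mirrors the asymmetric and symmetric naming used in the text.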

After the publication of the first work on stochastic resonance in neuronal models in 1991, which was soon followed in 1993 by experimental observations in crayfish mechanoreceptors [13, 22], the presence of stochastic resonance has been theorized in various biological contexts [20, 33]. To this day, it is still debated whether it can play a role in neuroscience, and in particular in the sensory functions of touch, hearing and vision [22].

It is not yet clear whether neurons make use of stochastic resonance [33], and the evidence that they actually exploit it is only indirect [13]. In fact, in most experimental settings the noise input to the sensory receptors or neurons comes from external sources. For this reason, any manifestation of stochastic resonance only allows one to deduce that sensory cells are nonlinear dynamical systems that could benefit from the presence of intrinsic noise in neural processing [13]. Despite this, stochastic resonance remains a phenomenon that is compatible with several neural models and some theories of neural processing [22]. Indeed, neurons are known to be intrinsically noisy, with a behavior that is similar to that of threshold systems [33]: whenever a certain internal threshold is exceeded, a neuron 'fires', generates a 'nerve impulse' (action potential) and returns to the resting state, waiting for a new supra-threshold event [32]. Dynamical stochastic resonance could thus play an important role in the functioning of neurons or sensory cells [21, 32, 34, 35]. However, studying such systems in the non-dynamical approximation makes the discussion simpler and equally valid [20, 22, 31].

It is interesting to note that an LCD produces both detection and a kind of pulse-train encoding similar to that found in dedicated electronic circuits [36]: the amplitude of sub-threshold stimuli is encoded into the frequency of threshold crossings. The incoming stimuli can be sub-threshold and therefore undetectable. If noise is added to the stimulus, threshold crossing occurs with higher probability when the stimulus is close to the threshold. The resulting spike train, despite being 'noisy', contains a large part of the information carried by the sub-threshold signal. Compared with the case in which noise is the only input, where threshold crossings occur purely at random, the spike train generated in the presence of a deterministic signal carries extra information, and this is what allows the sub-threshold stimulus to be characterized.

We can therefore say that noise activates a random sampling of the stimulus. For good information transmission, the 'sampling rate' should then be greater than the frequency of the sub-threshold signal. A convenient measure of the quality of the output signal (pulse train) from the threshold system, and thus of how well it is able to represent the sub-threshold signal, is precisely the SNR, which can be used to find the optimal threshold level for a given noise intensity [37].

The auditory system, as seen in the previous section, is very complex and is composed of several nonlinear sub-structures. Since noise is ubiquitous in the sensory systems [21, 22], it is plausible that the auditory system exploits stochastic resonance, either for its operation as a whole or in some of its parts [22]. Considerations of this kind have been studied and debated in several contexts [20, 22, 38, 39]. There is no real consensus. For example, Rufener et al [40] carried out experiments by applying external noise and did not find an enhanced sound perception. However, the application of external noise reduces the SNR in a well-tuned stochastic resonance system, so their results do not disprove the importance of stochastic resonance.

Here we focus only on the role that stochastic resonance can play in the transduction process that takes place in the inner ear, which involves the cochlea, the inner hair cells and the neurons of the auditory nerve. The signal detected by the cochlea and processed by the hair cells activates the neurons of the auditory nerve. At first glance, their extreme noisiness seems to hinder their ability to transmit precise sounds and acoustic signals (an ability that depends, in a decisive way, on exact timing and frequencies). However, stochastic resonance does help, and the presence of noise has beneficial effects [20, 21].

4. Toy model of the auditory system

In this section we present the basic features of a simple model of human hearing based on stochastic resonance. The main hypothesis behind the model is that stochastic resonance is a phenomenon continuously occurring in the human auditory system, which provides a simple transduction mechanism. The main element of the model is a symmetrical LCD system, which reproduces the behavior of human hearing in the context of loudness perception for sounds close to the hearing threshold.

The input signal is the sum of a sinusoidal waveform and noise [23, 24]:

$${V}_{\text{in}}(t) = \varepsilon \sin(2\pi {f}_{\text{s}} t) + G(t) \qquad (2)$$

where ɛ and fs are respectively the amplitude and frequency of the sub-threshold sinusoidal signal, while G(t) is the noise that is added to the process.

Unlike the LCD system presented in [23, 24], the output signal Vout(t) is equal to 0 when Vin does not exceed the threshold and it is equal to the deviation between the signal and the threshold in the other cases:

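$${V}_{\text{out}}(t) = \begin{cases} {V}_{\text{in}}(t) - \Delta U & \text{if } {V}_{\text{in}}(t) > \Delta U \\ {V}_{\text{in}}(t) + \Delta U & \text{if } {V}_{\text{in}}(t) < -\Delta U \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$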

The choice of a symmetric LCD system such as the one defined by equation (3) is, in our opinion, the most appropriate for a model that aims to simulate the threshold behavior of one or more neurons. The resulting LCD is simulated by generating evenly spaced Vin samples. We take the sampling rate fc = 40 kHz, so as to cover the audible frequency band up to the 20 kHz Nyquist frequency. The total sampling time is T = 0.1 s, so that signals with frequency fs < 10 Hz fail to complete a full cycle and must be rejected. This choice is consistent with the lower frequency limit of human hearing at about 12 Hz [41].

The signal of frequency fs combines, on its way to the auditory nerve, with various noise sources, some external, others internal, which together concur to produce stochastic resonance. In this toy model we choose white, Gaussian noise (zero mean and variance D²). This choice, as discussed in [21], can be considered acceptable although it is not always plausible.

For the model we do not use physiological values. The values of the threshold ΔU, the standard deviation D of the white Gaussian noise and the amplitude of the sinusoidal signal ɛ, with which the simulations are performed (see figure 8), are chosen, for convenience, to be of the order of 100 mV (see the second footnote), and are therefore two orders of magnitude larger than the typical values of the auditory system (mV) [32]. The amplitude ɛ is chosen in such a way that the sinusoidal signal is always sub-threshold (ɛ < ΔU).
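The model described so far can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes that the output is the signed excess of the input over the nearest threshold (the reading of the symmetric rule given in the text), uses the sampling rate, duration and parameter values quoted above, and the function names `v_in` and `v_out` are illustrative.

```python
import math
import random

def v_in(eps, f_s, noise_std, f_c=40_000.0, duration=0.1, rng=None):
    """Sampled input: sub-threshold sine plus zero-mean white Gaussian noise."""
    if rng is None:
        rng = random.Random()
    n = int(round(f_c * duration))
    return [eps * math.sin(2.0 * math.pi * f_s * k / f_c) + rng.gauss(0.0, noise_std)
            for k in range(n)]

def v_out(v, delta_u):
    """Symmetric threshold rule: excess over the threshold, zero inside the band.

    Assumption: the 'deviation between the signal and the threshold' keeps its sign.
    """
    if v > delta_u:
        return v - delta_u
    if v < -delta_u:
        return v + delta_u
    return 0.0

# Values of the example shown in the text: fs = 4 kHz, threshold 0.45 V,
# noise standard deviation 0.3 V, signal amplitude 0.1 V.
rng = random.Random(1)
vin = v_in(0.1, 4000.0, 0.3, rng=rng)
vout = [v_out(v, 0.45) for v in vin]
print(len(vin), sum(1 for v in vout if v != 0.0))
```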

Figure 8. Graphical representation of the LCD system for fs = 4 kHz, ΔU = 0.45 V, D = 0.3 V and ɛ = 0.1 V. Top: signal Vout(t) as defined in equation (3); bottom: signal Vin(t) (blue) as defined in equation (2), the thresholds ±ΔU (orange) and the signal ɛ sin(2πfs t) (black).


The PSD of the output signal is estimated by taking the scalar average of the PSD computed with the FFT algorithm for a number of simulations (preferably ≫ 10). This approach reduces the dispersion in each frequency bin (see figure 9) and provides a more accurate evaluation of the SNR.
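The averaging procedure can be sketched in Python under a few stated assumptions. To stay dependency-free, the sketch evaluates the discrete Fourier transform only at the bins it needs instead of running a full FFT; the noise floor is estimated from eight bins adjacent to the signal bin, which is a hypothetical choice and not necessarily the one used for the figures; all function names are illustrative.

```python
import cmath
import math
import random

F_C = 40_000.0  # sampling rate in Hz, as in the text
N = 4000        # number of samples in T = 0.1 s

def lcd_output(eps, f_s, delta_u, noise_std, rng):
    """One realization of the symmetric threshold output for sine plus white noise."""
    out = []
    for k in range(N):
        v = eps * math.sin(2.0 * math.pi * f_s * k / F_C) + rng.gauss(0.0, noise_std)
        if v > delta_u:
            out.append(v - delta_u)
        elif v < -delta_u:
            out.append(v + delta_u)
        else:
            out.append(0.0)
    return out

def bin_power(x, m):
    """Squared DFT magnitude of x at bin m (unnormalized periodogram value)."""
    w = cmath.exp(-2j * math.pi * m / len(x))
    acc, phase = 0j, 1 + 0j
    for v in x:
        acc += v * phase
        phase *= w
    return abs(acc) ** 2

def averaged_snr_db(eps, f_s, delta_u, noise_std, runs=20, seed=0):
    """SNR in dB from periodogram values averaged over several realizations."""
    rng = random.Random(seed)
    m_s = round(f_s * N / F_C)  # bin that contains the signal frequency
    neighbours = [m_s + d for d in (-5, -4, -3, -2, 2, 3, 4, 5)]  # assumed noise-floor bins
    sig = noise = 0.0
    for _ in range(runs):
        x = lcd_output(eps, f_s, delta_u, noise_std, rng)
        sig += bin_power(x, m_s)
        noise += sum(bin_power(x, m) for m in neighbours) / len(neighbours)
    return 10.0 * math.log10(sig / noise)

print(round(averaged_snr_db(0.1, 4000.0, 0.45, 0.3), 1))
```

Averaging the power values before taking the ratio, rather than averaging the dB values, keeps the estimate from being biased by the logarithm.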

Figure 9. PSD of Vout(t) as defined in equation (3) for fs = 4 kHz, ΔU = 0.45 V, D = 0.3 V and ɛ = 0.1 V. The dB representation of the PSD has been realised by choosing the average noise power as the reference parameter.


Plotting the SNR as a function of the noise standard deviation shows that stochastic resonance does occur in the system. This has been verified (see figures 10 and 11) by choosing values of D between Dmin = 0.1 V and Dmax = 1.0 V and for three different threshold values: ΔU0 = 0.30 V, ΔU1 = 0.45 V, ΔU2 = 0.60 V.
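The inverted-U behavior can also be made plausible with a short calculation. The Python sketch below is not taken from the paper: it relies on a small-signal approximation (an assumption introduced here) in which the coherent output amplitude is the input amplitude times the probability that the noise alone lies beyond the thresholds, and the noise floor is the variance of the output for noise alone. The resulting quantity is proportional to the SNR for a fixed small ɛ; the names `snr_shape`, `q` and `phi` are illustrative.

```python
import math

def q(a):
    """Upper tail probability of a standard normal variable."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def phi(a):
    """Standard normal probability density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def snr_shape(noise_std, delta_u):
    """Quantity proportional to the small-signal SNR (assumed approximation)."""
    a = delta_u / noise_std
    gain = 2.0 * q(a)  # probability that noise alone is beyond +/- delta_u
    var = 2.0 * noise_std ** 2 * ((1.0 + a * a) * q(a) - a * phi(a))  # output variance, noise only
    return gain ** 2 / var

grid = [i / 100 for i in range(10, 101, 5)]  # D from 0.10 V to 1.00 V
for delta_u in (0.30, 0.45, 0.60):
    best = max(grid, key=lambda d: snr_shape(d, delta_u))
    print(delta_u, best)
```

In this approximation the curve vanishes for both very small and very large D, so the maximum falls at an intermediate noise level that scales with the threshold.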

Figure 10. SNR at the frequency fs = 4 kHz as a function of noise standard deviation for three different threshold values (ΔUi ). We choose ɛ = 0.1 V for the amplitude of the input sinusoidal signal.

Figure 11. SNR at the frequency fs = 19 kHz as a function of noise standard deviation for three different threshold values (ΔUi ). We choose ɛ = 0.1 V for the amplitude of the input sinusoidal signal.


Observing the graphs of figures 10 and 11, it can be noted that at high frequencies the SNR is about half that at low frequencies. In LCD systems, optimal information transmission depends on the sampling frequency chosen to simulate the system. When the sinusoidal signal takes a time similar to $2\delta t={({f}_{\text{c}}/2)}^{-1}$ to complete a full cycle, where δt = 1/fc is the sampling interval, the noise amplitude varies with a frequency similar to that of the signal. This condition increases the frequency at which the threshold is exceeded. The result is a signal that is noisier than desired (but still exhibits stochastic resonance) and thus a lower SNR than the one observable for fs ≪ fc/2.

The sensitivity of the human ear to loudness, as seen previously in section 2, reaches a maximum in the medium-high frequency range (1–4 kHz), while it is lower at low frequencies (< 0.2 kHz). This characteristic does not depend on the hair cells or on the physiology of the inner ear, but on the shape of the auditory canal, inside which the pure signal, mixed with external noise, propagates. For this reason we assume that sound is filtered at low frequencies before reaching the cochlear membrane. Provided that the external noise is absent or negligible compared to the internal noise, in this toy model we add a high-pass filter to the LCD system that acts only on the sinusoidal signal, before it is combined with noise. The first-order IIR high-pass filter we use has a cut-off frequency of 30 Hz.
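One possible realization of such a filter is sketched below in Python. The text does not give the filter coefficients, so the sketch assumes the common discretization of an RC high-pass stage, y[k] = α (y[k−1] + x[k] − x[k−1]) with α = RC/(RC + δt); the function names are illustrative.

```python
import math

def high_pass(x, f_cut, f_c):
    """First-order IIR high-pass filter (assumed RC-type discretization)."""
    dt = 1.0 / f_c
    rc = 1.0 / (2.0 * math.pi * f_cut)
    alpha = rc / (rc + dt)
    y = [x[0]] if x else []
    for k in range(1, len(x)):
        y.append(alpha * (y[-1] + x[k] - x[k - 1]))
    return y

def filtered_amplitude(f_s, eps=1.0, f_cut=30.0, f_c=40_000.0, duration=0.1):
    """Peak amplitude of a filtered sine, measured after the initial transient."""
    n = int(round(f_c * duration))
    x = [eps * math.sin(2.0 * math.pi * f_s * k / f_c) for k in range(n)]
    y = high_pass(x, f_cut, f_c)
    return max(abs(v) for v in y[n // 2:])

# The sine is attenuated near the cut-off and passes almost unchanged at 1 kHz.
for f in (30.0, 100.0, 1000.0):
    print(f, round(filtered_amplitude(f), 3))
```

In the model the filter is applied to the sinusoid only, so the noise would be added to the returned samples afterwards.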

In order to produce equal-loudness contours with the available model and thus describe how the perception of sounds close to the threshold of hearing can occur, the following considerations were used:

  • Assuming that the sound field consists of free progressive plane waves, the sound intensity Iout depends on the sound amplitude ɛ: Iout ∝ ɛ². We take the intensity I0 at 1 kHz as the reference intensity.
  • We take the SNR associated with the output signal of the LCD system as a measure of the perceived sound intensity Iin. Thus, if we keep in mind that SNR ∝ ɛ² [23, 33, 42] for fixed values of noise intensity and threshold and we assume, by appealing to the plasticity of the auditory system [43], that the internal noise intensity is constantly optimal and therefore guarantees Iin ∝ ɛ² even after the LCD [20], then, at the output of the LCD system, the loudness of the sub-threshold signal is such that
    Equation (4)
    Thanks to this, it is reasonable to conclude that the set of ɛ values that correspond to a constant SNR defines a candidate equal-loudness contour.

Using these considerations we find the equal-SNR contours shown in figure 12. The behavior at low frequencies is determined by the high-pass IIR filter. The frequencies fs of the sinusoidal signal are chosen in such a way that they are equally spaced in the logarithmic scale graph and do not produce scalloping loss. This graph represents the most important result of the present work.
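The low-frequency rise of such contours can be illustrated with a short Python sketch. It isolates the contribution of the high-pass filter only, under assumptions that go beyond the text: the SNR is taken to depend solely on the filtered amplitude (SNR ∝ ɛ²), the filter is represented by the magnitude response of an ideal first-order high-pass stage, and the reduction of the SNR near the Nyquist frequency is ignored. Function names are illustrative.

```python
import math

def hp_gain(f, f_cut=30.0):
    """Magnitude response of an ideal first-order high-pass stage."""
    r = f / f_cut
    return r / math.sqrt(1.0 + r * r)

def contour_level_db(f, f_ref=1000.0, f_cut=30.0):
    """Input level (dB relative to 1 kHz) that keeps the filtered amplitude constant."""
    return -20.0 * math.log10(hp_gain(f, f_cut) / hp_gain(f_ref, f_cut))

for f in (20.0, 30.0, 50.0, 100.0, 200.0, 500.0, 1000.0, 5000.0):
    print(f, round(contour_level_db(f), 2))
```

At the cut-off frequency the required level is about 3 dB above the 1 kHz reference, and it keeps growing towards lower frequencies.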

Figure 12. Equal-SNR contours as a function of the external sound intensity level and the fs frequency of the sinusoidal signal. For the simulations we have selected ΔU = 0.45 V (threshold), D = 0.3 V (noise standard deviation) and f3dB = 30 Hz (filter cut-off frequency).


5. Conclusions

The recent scientific literature has explored in several ways the relevance of stochastic resonance in the functioning of the auditory system [20, 44–46]. In this paper we have shown how the mechanism of stochastic resonance coupled with a high-pass filter may hint at a straightforward, albeit partial, explanation of the equal loudness curves. As such, the model fulfills the educational goal that we stated in the introduction.

Although the model presented here is highly conjectural, it can be extended in many ways that can potentially be of interest in a more complex model of human hearing. Consider for instance figure 7: in the case of a white background noise, the number of noise spikes that pass the threshold is a Poisson process with a mean that depends on the root-mean-square (RMS) noise amplitude and on the threshold value. By adding a sinusoidal signal as in figure 8, we notice that the rate is slightly higher whenever a peak (either positive or negative) occurs. This behavior becomes more prominent for low-frequency deterministic signals, as shown in figure 13, where we see that, for a given threshold-crossing rate associated with a specific combination of RMS noise amplitude and threshold value, we could infer the period of the sine wave by counting the individual positive or negative pulses. Thus, with additional logical circuitry, this simple threshold detector could measure the dominant instantaneous frequency in a signal, greatly extending its reach.
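The dependence of the crossing rate on the instantaneous signal value can be quantified with a short Python sketch. It assumes independent Gaussian noise samples, so that each sample lies beyond one of the two thresholds with a probability given by Gaussian tail integrals; treating the signal as constant over a sample is an assumption that fits slowly varying signals. The parameter values are those quoted for the low-frequency example (ΔU = 0.45 V, D = 0.15 V, ɛ = 0.35 V), and the function names are illustrative.

```python
import math

def q(a):
    """Upper tail probability of a standard normal variable."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def crossing_probability(s, delta_u, noise_std):
    """Probability that s + noise lies beyond +delta_u or below -delta_u."""
    return q((delta_u - s) / noise_std) + q((delta_u + s) / noise_std)

f_c = 40_000.0  # samples per second, as in the simulations
p_zero = crossing_probability(0.0, 0.45, 0.15)   # signal at a zero crossing
p_peak = crossing_probability(0.35, 0.45, 0.15)  # signal at a positive peak
print(round(p_zero * f_c), round(p_peak * f_c))  # expected supra-threshold samples per second
```

The large ratio between the two rates is what makes the peaks of a slow sine wave visible as bursts of pulses.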

Figure 13. Graphical representation of the LCD system for fs = 5 Hz, ΔU = 0.45 V, D = 0.15 V and ɛ = 0.35 V.


Footnotes

  • Our choice to use the SNR as the proper figure of merit fits well with the convention used for equal loudness curves such as those shown in figure 4 and with the usual elementary treatments of stochastic resonance. For completeness, we remark that in recent years, when discussing the auditory system, an ever increasing emphasis is placed on its information theoretic properties [28, 29].

  • The resulting values are easier to read and the toy model—by its nature—is scale independent, so that the actual values do not matter here.
