Towards ASSR-based hearing assessment using natural sounds

Objective. The auditory steady-state response (ASSR) allows estimation of hearing thresholds. The ASSR can be estimated from electroencephalography (EEG) recordings from electrodes positioned both on the scalp and within the ear (ear-EEG). Ear-EEG can potentially be integrated into hearing aids, which would enable automatic fitting of the hearing device in daily life. The conventional stimuli for ASSR-based hearing assessment, such as pure tones and chirps, are monotonous and tiresome, making them inconvenient for repeated use in everyday situations. In this study, we investigate the use of natural speech sounds for ASSR estimation. Approach. EEG was recorded from 22 normal-hearing subjects from both scalp and ear electrodes. Subjects were stimulated monaurally with 180 min of speech stimulus modified by applying a 40 Hz amplitude modulation (AM) to an octave frequency sub-band centered at 1 kHz. Each 50 ms sub-interval in the AM sub-band was scaled to match one of 10 pre-defined levels (0–45 dB sensation level, 5 dB steps). The apparent latency for the ASSR was estimated as the lag of the maximum average cross-correlation between the envelope of the AM sub-band and the recorded EEG, and was used to align the EEG signal with the audio signal. The EEG was then split into sub-epochs of 50 ms length and sorted according to the stimulation level. The ASSR was estimated for each level for both scalp- and ear-EEG. Main results. Significant ASSRs with increasing amplitude as a function of presentation level were recorded from both scalp and ear electrode configurations. Significance. Utilizing natural sounds in ASSR estimation offers the potential for electrophysiological hearing assessment that is more comfortable and less fatiguing compared to existing ASSR methods. Combined with ear-EEG, this approach may allow convenient hearing threshold estimation in everyday life, utilizing ambient sounds.
Additionally, it may facilitate both initial fitting and subsequent adjustments of hearing aids outside of clinical settings.


Introduction
Hearing loss results in challenges with communication, which in turn may lead to social isolation, cognitive decline, and depression due to the lack of social interaction. The purpose of hearing aids is to compensate for hearing impairment. To do so, it is crucial that they are fitted in close accordance with the hearing abilities of the individual user. Hearing loss often develops over time, and this necessitates recurrent re-fitting of the hearing device to maintain optimal compensation for the hearing loss. Traditionally, fitting of hearing aids is carried out in the clinic and is based primarily on the estimation of frequency-specific hearing thresholds in the form of an audiogram. The audiogram is used to set the gain of the hearing aid at different audiometric frequencies.
Traditionally, the audiogram is estimated based on behavioral tests, such as pure tone audiometry. Alternatively, hearing thresholds can be estimated from electrophysiological measures, such as the auditory steady-state response (ASSR) [1, 2], i.e. neural activity evoked by amplitude- and/or frequency-modulated acoustic stimuli, which can be recorded from electroencephalography (EEG) electrodes placed on the scalp. Physiological hearing thresholds are found to be elevated 10-25 dB relative to behavioral hearing thresholds in normal-hearing subjects and 5-20 dB in hearing-impaired subjects, depending on recording length [2-7]. This offset is consistent across studies and can therefore be taken into account in a fitting procedure. ASSR recordings are typically performed in the clinic and require dedicated equipment and trained personnel.
Over the last decade, a new EEG recording approach called ear-EEG has been developed [8, 9]. Here, EEG electrodes are placed in or around the ear, allowing the recording platform to be more discreet. Christensen et al have shown that it is possible to estimate physiological thresholds using ear-EEG in both normal-hearing [10] and hearing-impaired subjects [11]. Combined with a mobile EEG recorder, ear-EEG allows for automatic and unsupervised estimation of hearing thresholds outside the clinic. Integrated into a hearing aid, this technology would therefore enable both initial and recurrent refitting of the hearing device in the daily life of a user.
The ASSR has a relatively small amplitude compared to the background noise (spontaneous EEG and other physiological and non-physiological sources). It is therefore necessary to average the EEG signal over a relatively long time in order to suppress noise and achieve a sufficiently high signal-to-noise ratio (SNR) [2]. Moreover, the same auditory stimulus must be presented several times at different sound pressure levels (SPL), making ASSR-based hearing tests a time-consuming procedure. Under ideal experimental conditions (recordings performed in a quiet laboratory on sleepy or relaxed subjects, using chirp stimuli [12] and multiple-band stimulation [13]), several studies reported an average time of about 20-30 min for accurate physiological threshold estimation [14-17]. However, in real-life settings, this time may be longer due to increased noise [18].
The traditional stimuli used in ASSR-based hearing tests, such as pure tones and chirps, are synthetic and monotonous, and are therefore rather dull and unpleasant to listen to. The combination of long testing time and unpleasant synthetic stimuli makes them unsuitable for the everyday use case described above.
An ASSR-based fitting procedure outside the clinic must be more convenient than those currently used in clinical practice; the procedure should require as little effort and engagement from the user as possible and should be as imperceptible as possible. Accordingly, stimuli based on naturally occurring sounds would be more suitable.
Several studies have shown that there is neural entrainment to nonperiodic stimuli, such as speech and music [19-21]. The auditory cortex follows the envelope of speech and consistently reacts to changes in the envelope [22]. Laugesen et al [23] recorded ASSRs to NB-chirps modified by imposing a speech envelope on the stimuli.
An alternative approach is to construct an ASSR stimulus where natural speech is used as a carrier signal. By applying an amplitude modulation to the speech signal, it is possible to create a sound stimulus that is able to evoke an ASSR. By filtering the speech signal into multiple frequency sub-bands and imposing the amplitude modulation on one or several of the frequency sub-bands (using different modulation frequencies), it is possible to evoke a frequency-specific ASSR. The stimulus can, in principle, be created in real time based on the ambient sounds in the user's environment, thereby further increasing the feasibility of implementing a hearing test into daily life.
The amplitude of the ASSR depends on the intensity of the stimulus, i.e. the ASSR increases with increasing SPL [1, 2, 24]. Naturally occurring sounds, such as speech, vary in intensity over time, and hence the intensity of each period of an amplitude-modulated (AM) speech signal also varies over time. Since the ASSR is an envelope-following response, the ASSR amplitude to an AM speech stimulus will also vary over time. The ASSR as a function of intensity can therefore be estimated by partitioning the recorded EEG signal according to the SPL of the corresponding AM speech stimulus. For accurate estimation it is crucial to compensate for the apparent latency of the ASSR, i.e. the time between stimulus onset and ASSR onset [2].
In this study, we present an approach where the ASSR vs. presentation level relation can be estimated using a partly AM running speech signal. We show that this relation can be estimated in both scalp- and ear-EEG.

Subjects
Twenty-two subjects (11 women, average age 32.7 years (SD = 8)) participated in the study. All subjects had normal hearing (<20 dB HL for octave frequencies between 500 and 4000 Hz) and no history of hearing diseases. Behavioral hearing thresholds were measured for each ear using the ascending method described in ISO 8253-1:2010 [25].
Exclusion criteria were: known hearing loss, use of medications that stimulate the central nervous system, and epilepsy or other brain disease. All test subjects gave written informed consent before inclusion in the study. The study was approved by the Institutional Review Board at Aarhus University (no. 2021-88).

Measurement setup
EEG was recorded concurrently from four scalp electrodes and 12 ear electrodes. The scalp electrodes were placed at the left (M1) and right (M2) mastoids, and at Fpz and AFz according to the 10-20 EEG electrode system, and attached to the skin using custom-designed electrode holders made of silicone with double adhesive pads. Alcohol swabs were used to clean the skin prior to the attachment of the scalp electrodes. A small amount of gel (Electro-Gel, Electro-Cap International, Inc., USA) was applied on the scalp electrodes.
The 12 ear electrodes, six in each ear, were placed on individually designed earpieces in positions according to the labeling scheme for ear-EEG electrodes described by Kidmose et al [8]. For the current study, the following ear-electrode positions were used: ExA, ExB1, ExB2 and ExC in the concha part of the ear, ExJ in the ear canal, and ExT on the tragus, where x denotes the left (L) or right (R) ear (see figure 1).
The earpieces were modeled based on individual ear casts using 3D software (EarMouldDesigner, 3Shape, Denmark) and made of biocompatible silicone (Detax softwear 2.0, Detax GmbH, Germany). Prior to insertion of the earpieces, the ears were cleaned with a water-soaked cotton swab.
The EEG recordings were acquired with a TMSi Refa16e EEG amplifier (TMSi, The Netherlands) with a sampling rate of 2500 Hz and an average reference. All recording electrodes were dry-contact Ag/AgCl electrodes with a diameter of 4 mm [26]. The ground electrode was placed on the neck using a disposable wet gel electrode (Ambu WS, Ambu A/S, Denmark). To maintain the active shielding provided by the amplifier, all electrodes were connected to the amplifier using coax cables.
Before the actual recordings, an initial check of the signal quality was performed by visual inspection of the signals in a live-viewer. The experimenter ensured that the signals resembled EEG and inspected for the presence of artifacts while the subject was instructed to perform eye blinks and facial muscle movements.
The auditory stimuli were presented to the test subjects using insert earphones (3M E-A-RTONE for ABR, 50 Ohm, 3M, USA) via an RME soundcard (Fireface UC, RME, Germany) with a sampling frequency of 48 kHz. The tubes from the earphones were inserted into the sound bore of the earpieces. The receiver was calibrated using an ear and cheek simulator (43AG, G.R.A.S. Sound and Vibration, Denmark) powered by a 12AA power module (G.R.A.S.) and a 42AA pistonphone (G.R.A.S.).
In order to synchronize the sound stimulus and the EEG data, a trig signal generated by the soundcard was fed to the trig input of the EEG amplifier via a trigger box (g.TRIGbox, g.tec medical engineering GmbH, Schiedlberg, Austria). The trig signal was generated by one of the audio channels of the soundcard and was periodic with a frequency of 0.1 Hz.

ASSR stimuli and recordings
Two types of sound stimuli were used in the current study: white Gaussian noise (hereafter the white noise stimulus or WNS) and an audiobook narrated in English by a male narrator (hereafter the speech stimulus or SS). Silent pauses in the speech stimulus were truncated to 0.1 s.
Both stimuli were designed using the following steps:
1. An 8th-order zero-phase filter with Butterworth characteristic (no passband ripple) was used to split the sound into six frequency sub-bands (see figure 2):
   a. Four one-octave-wide frequency sub-bands centered at 500 Hz, 1 kHz, 2 kHz and 4 kHz.
   b. A low-frequency sub-band (low-pass with cut-off 353 Hz) and a high-frequency sub-band (high-pass with cut-off 5656 Hz).
2. The 1 kHz sub-band was amplitude modulated with a 40 Hz sinusoid at a modulation depth of 100%. The amplitude modulation signal was 1 + sin(2πF_m t), where F_m is the modulation frequency. The 1 kHz sub-band was rescaled in order to keep the root-mean-square (RMS) unchanged after applying the modulation. The amplitude-modulated 1 kHz sub-band (hereafter the AM sub-band) was further amplified by 6 dB.
3. For SS, each 50 ms sub-interval (containing two periods of the 40 Hz modulation frequency) of the AM sub-band was adjusted in amplitude to match one of 10 levels uniformly distributed in 5 dB steps between 0 and 45 dB SPL (see figure 3). For WNS, the AM sub-band was adjusted in amplitude to 45 dB SPL. The level adjustment did not change the amplitude ratio between sub-bands in either SS or WNS.
4. All sub-bands were summed together.
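The filter-bank and modulation steps above can be sketched in Python. This is an illustrative sketch, not the authors' implementation: the 8th-order zero-phase response is assumed to come from applying a 4th-order Butterworth filter forward and backward, and all function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 48_000        # stimulus sampling rate (Hz)
FM = 40            # modulation frequency (Hz)

def octave_band(x, fc, fs=FS, order=4):
    """One-octave band-pass: fc/sqrt(2) .. fc*sqrt(2).
    A 4th-order Butterworth applied forward-backward with
    sosfiltfilt yields an 8th-order zero-phase response."""
    lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def modulate_subband(band, fm=FM, fs=FS):
    """100% AM with 1 + sin(2*pi*fm*t); rescale to preserve the RMS
    of the unmodulated band, then amplify by 6 dB."""
    t = np.arange(len(band)) / fs
    am = band * (1 + np.sin(2 * np.pi * fm * t))
    am *= np.sqrt(np.mean(band**2) / np.mean(am**2))  # keep RMS unchanged
    return am * 10 ** (6 / 20)                        # +6 dB

# example: create the AM sub-band from a stand-in "speech" signal
speech = np.random.default_rng(0).standard_normal(FS)
band_1k = octave_band(speech, 1000)
am_band = modulate_subband(band_1k)
```

The remaining sub-bands would be filtered in the same way and summed with the AM sub-band (step 4).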
The objective was to impose an AM on a sub-band of the speech signal, as illustrated in figure 2. In order to estimate the ASSR as a function of the stimulus level, the AM sub-band of the speech signal was divided into short sub-segments, and each sub-segment was then assigned to one of the predefined levels. This was done by first calculating the RMS for each 50 ms sub-segment of the signal, and second, by means of an empirical amplitude transformation, scaling each sub-segment to one of the predefined levels. The empirical amplitude transformation was made such that more sub-segments were assigned to the lower levels and fewer to the higher levels, so that the resulting noise level in the ASSR estimation followed approximately the shape of the ASSR, thereby maintaining an almost constant SNR across the different stimulation levels. The AM sub-band in SS contained 30, 24, 20, 20, 18, 18, 14, 12, 12 and 12 min at levels 0, 5, 10, 15, 20, 25, 30, 35, 40 and 45 dB SPL, respectively (see figure 3).
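The paper does not specify the empirical amplitude transformation itself. The sketch below therefore uses a hypothetical rank-based mapping that merely reproduces the published per-level durations: quieter sub-segments are assigned to lower levels, and each 50 ms sub-segment is scaled to its assigned level.

```python
import numpy as np

FS = 48_000
SEG = int(0.050 * FS)                      # samples per 50 ms sub-segment
LEVELS_DB = np.arange(0, 50, 5)            # 0..45 dB in 5 dB steps
# minutes per level from the paper, normalised to proportions
MINUTES = np.array([30, 24, 20, 20, 18, 18, 14, 12, 12, 12])
PROPS = MINUTES / MINUTES.sum()

def assign_levels(am_band, ref_rms=1.0):
    """Scale each 50 ms sub-segment of the AM sub-band to a level.
    Rank-based assignment (an assumption, not the paper's method):
    louder segments map to higher levels so that the per-level
    durations match the published distribution."""
    n = len(am_band) // SEG
    segs = am_band[: n * SEG].reshape(n, SEG).copy()
    rms = np.sqrt(np.mean(segs**2, axis=1))
    order = np.argsort(rms)                       # quiet -> loud
    counts = np.round(PROPS * n).astype(int)
    counts[-1] = n - counts[:-1].sum()            # make counts sum to n
    levels = np.empty(n, dtype=int)
    start = 0
    for lvl_db, c in zip(LEVELS_DB, counts):
        levels[order[start:start + c]] = lvl_db
        start += c
    target = ref_rms * 10 ** (levels / 20)        # dB re ref_rms
    segs *= (target / np.maximum(rms, 1e-12))[:, None]
    return segs.reshape(-1), levels
```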
The AM sub-band in the WNS contained 12 min at 45 dB SPL. The intention of using WNS was to create a reference stimulus that had design characteristics similar to SS (i.e. having an AM sub-band), but was presented as a traditional steady-state stimulus.
The stimuli were presented monaurally relative to the behavioral hearing threshold (i.e. at sensation level (SL)) at 1 kHz at the predefined levels as described in the stimulus design above. The stimulation side was randomly chosen and evenly distributed across subjects. WNS was presented first, followed by SS.
The EEG recordings were performed in a double-walled, sound-attenuated room. The room was equipped with a window through which the experimenter was able to monitor the subject throughout the recording.
During the EEG recordings, the subjects sat in a comfortable chair. They were instructed to relax during the experiment but to avoid falling asleep. They were offered a silent movie of their own choice, with subtitles, to watch.

Data analysis of EEG
Analysis of the EEG data was performed offline after the recordings were completed. ASSRs were estimated from three different electrode configurations: scalp-EEG, cross-ear-EEG and in-ear-EEG (see figure 1). For convenience, hereinafter we refer to these three configurations as Scalp, CrossEar and InEar. The Scalp datasets were created by re-referencing electrodes M1 and M2 to AFz (the Fpz electrode was included only as a backup in case of poor contact of the AFz electrode) for the left and right side configurations, respectively. For both ear-EEG configurations, the datasets were created by applying a spatial filter, taking a weighted combination of the electrodes (for more details, see section 'Spatial filter for ear-EEG'), thereby transforming multi-channel ear-EEG signals into a single-channel signal. For CrossEar, the spatial filter was applied to electrodes from both ears, resulting in one dataset for this configuration. For InEar, the spatial filter was applied to the electrodes from each ear individually, resulting in two datasets, one for each ear.
For all configurations, the data was band-pass filtered using an 8th-order, zero-phase Butterworth filter with a passband between 20 and 60 Hz. A notch filter was applied to remove 50 Hz line noise.
In order to align the recorded ASSR with the AM sub-band in SS, the EEG signal was shifted relative to the trigger signal by the number of samples corresponding to the apparent latency (for more details, see section 'Latency compensation'). The trigger signal was used to split the dataset into epochs of 10 s length. Each epoch was then split up into sub-epochs of 50 ms length. The sub-epochs were distributed among 10 datasets according to the levels in the AM sub-band (see figure 4).
Within each dataset, the sub-epochs were combined into four-second epochs, which were then averaged using weighted averaging as described by John et al [27].
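The sorting of 50 ms sub-epochs into level-specific datasets and their recombination into 4 s epochs might look as follows. This is a simplified sketch: `levels` is assumed to hold one stimulus level per sub-epoch, and the function name is illustrative.

```python
import numpy as np

FS_EEG = 2500
SUB = int(0.050 * FS_EEG)            # 125 samples per 50 ms sub-epoch

def sort_subepochs(eeg, levels, epoch_s=4.0):
    """Pool latency-compensated EEG sub-epochs per stimulation level
    and concatenate them into 4 s analysis epochs.  levels[i] holds
    the stimulus level of the i-th 50 ms sub-epoch."""
    levels = np.asarray(levels)
    n = min(len(eeg) // SUB, len(levels))
    subs = eeg[: n * SUB].reshape(n, SUB)
    per_epoch = int(epoch_s * FS_EEG) // SUB      # 80 sub-epochs per 4 s
    datasets = {}
    for lvl in np.unique(levels[:n]):
        pool = subs[levels[:n] == lvl]
        k = len(pool) // per_epoch
        datasets[lvl] = pool[: k * per_epoch].reshape(k, per_epoch * SUB)
    return datasets

# example: 1000 sub-epochs of synthetic EEG sorted across three levels
rng = np.random.default_rng(2)
eeg = rng.standard_normal(1000 * SUB)
lv = rng.choice([0, 5, 10], size=1000)
ds = sort_subepochs(eeg, lv)
```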
For the ASSR recorded to WNS, the dataset was split up into four-second epochs, and the epochs were averaged using weighted averaging [27].
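Weighted averaging in the spirit of John et al [27] down-weights noisy epochs. A minimal sketch, assuming the inverse epoch variance as the weight; the original method estimates the noise per epoch in a more refined way.

```python
import numpy as np

def weighted_average(epochs):
    """Weight each epoch by the inverse of its variance, so noisy
    epochs contribute less.  epochs: (n_epochs, n_samples)."""
    var = np.var(epochs, axis=1)
    w = 1.0 / np.maximum(var, 1e-20)
    return (w[:, None] * epochs).sum(axis=0) / w.sum()
```

With one very noisy epoch among otherwise clean repetitions, the weighted average stays much closer to the underlying response than the plain mean.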
The averaged epoch was then transformed into the frequency domain by means of a discrete Fourier transform (DFT). The amplitude of the ASSR was determined as the amplitude at the modulation frequency bin, and the background noise was calculated as the RMS in the frequency band ±8 Hz relative to the modulation frequency (excluding the modulation frequency) [2].
Statistical significance of the ASSR amplitude was determined based on an F-test as described by Zurek [28]. The F-ratio was calculated as the ratio between the power at the modulation frequency and the average power in the frequency band ±8 Hz relative to the modulation frequency (excluding the modulation frequency). An F-ratio with a p-value ⩽ 0.05 was regarded as statistically significant. Only statistically significant ASSRs were included in the grand average across the subjects. Both ipsilateral (IL) ASSRs (stimulation and recording on the same side) and contralateral (CL) ASSRs (stimulation on one side and recording from the opposite side) were analyzed for the Scalp and InEar configurations.
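The amplitude, noise and F-test computation can be illustrated as below. This is a sketch: with a 4 s epoch the bin spacing is 0.25 Hz, and the F-ratio is referred to an F(2, 2K) distribution with K noise bins, which is a common reading of the Zurek-style test.

```python
import numpy as np
from scipy.stats import f as f_dist

def assr_stats(avg_epoch, fs=2500, fm=40, half_bw=8):
    """ASSR amplitude, noise floor (RMS of +/-8 Hz neighbour bins)
    and F-test p-value from an averaged epoch."""
    n = len(avg_epoch)
    spec = np.fft.rfft(avg_epoch) / n
    freqs = np.fft.rfftfreq(n, 1 / fs)
    sig_bin = np.argmin(np.abs(freqs - fm))
    noise_bins = np.where((np.abs(freqs - fm) <= half_bw)
                          & (np.arange(len(freqs)) != sig_bin))[0]
    sig_pow = np.abs(spec[sig_bin]) ** 2
    noise_pow = np.mean(np.abs(spec[noise_bins]) ** 2)
    f_ratio = sig_pow / noise_pow
    # 2 dof for the signal bin, 2 per each of the K noise bins
    p = f_dist.sf(f_ratio, 2, 2 * len(noise_bins))
    return np.abs(spec[sig_bin]), np.sqrt(noise_pow), p
```

For a unit-amplitude 40 Hz sine the one-sided spectrum amplitude at the signal bin is 0.5; doubling it recovers the conventional single-sided amplitude.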

Latency compensation
In order to align the AM sub-band in the SS with the recorded EEG, it was necessary to estimate the apparent latency of the ASSR. The apparent latency can be estimated by calculating the cross-correlation between the analytic EEG signal x(t) and the envelope of the AM sub-band signal y(t):

c(τ) = |Σ_t [x(t + τ) + jH(x)(t + τ)] y(t)|,

where H(·) represents the Hilbert transform operator.
To calculate the envelope y(t), the AM sub-band s(t) was down-sampled to the EEG sampling frequency, and the envelope was calculated as the amplitude of the complex-valued analytic signal, y(t) = |s(t) + jH(s(t))|. The EEG data x(t) was epoched according to the trigger signal, and the epochs were combined again into one data segment. Both the envelope and the EEG were band-pass filtered using an 8th-order, zero-phase Butterworth filter with a passband between 20 and 60 Hz. A notch filter was applied to remove 50 Hz line noise in the EEG.
Cross-correlation curves were averaged across subjects, and the lag with the maximum average cross-correlation was considered the ASSR latency.
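A sketch of the latency estimation for a single subject. The exact correlation formula used in the study may differ; here the magnitude of the correlation between the analytic EEG signal and the stimulus envelope is maximized over lag, and both inputs are assumed to be band-pass filtered around the modulation frequency already.

```python
import numpy as np
from scipy.signal import hilbert

def apparent_latency(eeg, envelope, fs=2500, max_lag_s=0.1):
    """Lag (in seconds) maximising |cross-correlation| between the
    analytic EEG signal x + jH(x) and the AM sub-band envelope."""
    x = hilbert(eeg)                          # analytic EEG signal
    n = len(x)
    lags = np.arange(int(max_lag_s * fs))
    cc = np.array([np.abs(np.vdot(envelope[: n - lag], x[lag:]))
                   for lag in lags])
    return lags[np.argmax(cc)] / fs
```

Using the analytic signal makes the correlation magnitude insensitive to the carrier phase, so a non-stationary (speech-like) envelope is what localizes the peak.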

Spatial filter for ear-EEG
The spatial filter for the ear-EEG data was found by maximizing the SNR. The optimal spatial filter can thus be expressed as

ŵ = arg max_w (wᵀ R_s w) / (wᵀ R_n w),

where R_s and R_n are the signal and noise covariance matrices, respectively. The solution to this optimization problem can be found by applying the generalized eigenvalue decomposition to R_s and R_n; the optimal spatial filter is then the eigenvector associated with the largest eigenvalue [29-31].
To calculate the covariance matrices, the EEG data was transformed into the frequency domain using the DFT and point-wise multiplied by rectangular window functions. For R_n, the data was multiplied by a window which preserved the frequency bins in the range ±8 Hz centered at the modulation frequency (excluding the modulation frequency bin). For R_s, only the modulation frequency bin was preserved, whereas all other frequency bins were set to zero. The inverse Fourier transform was then applied to recover the time-domain band-pass-filtered signal. The covariance matrix R_n was calculated as the normalized sum of the covariance matrices for each epoch of 4 s length, multiplied by the weights estimated as in weighted averaging [27]. For better suppression of the noise in the modulation frequency bin, and thereby better signal estimation, the covariance matrix R_s was calculated based on the weighted averaged epoch. To mitigate numerical problems in the generalized eigenvalue decomposition, Ledoit-Wolf regularization was applied to each of the covariance matrices [32].
The spatial filter was estimated based on the entire 180 min of EEG data recorded during SS presentation, i.e. before data segmentation into the different levels, and was then applied to the EEG data for each level as well as to the EEG data from the WNS presentation. The spatial filter was normalized to have a 2-norm equal to √2, which is the same as the 2-norm of a pair of electrodes.
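The max-SNR filter can be sketched as a generalized eigenvalue problem. For simplicity a fixed shrinkage coefficient stands in for the Ledoit-Wolf coefficient, which the study estimated analytically from the data.

```python
import numpy as np
from scipy.linalg import eigh

def max_snr_filter(Rs, Rn, shrinkage=0.05):
    """Spatial filter w maximising (w'Rs w)/(w'Rn w): the leading
    generalized eigenvector of (Rs, Rn), after shrinkage
    regularization of both covariance matrices."""
    def reg(R):
        p = R.shape[0]
        return (1 - shrinkage) * R + shrinkage * np.trace(R) / p * np.eye(p)
    _, eigvecs = eigh(reg(Rs), reg(Rn))       # ascending eigenvalues
    w = eigvecs[:, -1]                        # largest generalized eigenvalue
    return w * np.sqrt(2) / np.linalg.norm(w)  # 2-norm = sqrt(2), as in the study
```

With R_n close to identity and R_s dominated by one source direction, the filter simply points along that direction.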

Statistical analysis of the recorded ASSR
Statistical analysis of the data was performed using mixed-effect models in R (lme4, version 1.1-26, RStudio R-4.0.3) [33]. The statistical model included Level (L: continuous variable), Level² (L²: continuous variable), Behavioral threshold (BT: continuous variable) and Measurement side (MS: IL, CL) as fixed effects and Subject (S) as a random effect. The decision to include the fixed effect L² was made retrospectively after inspection of the data, to allow the model to capture a nonlinear dependency between stimulus level and response. In retrospect, this is pertinent, as there is no consensus in the literature regarding a linear association between ASSR amplitude and sound intensity. Some studies have reported a monotonically increasing ASSR with rising intensity [24, 34, 35], while other studies have reported saturation of the ASSR [36-38].
Statistical significance of the different fixed effects was tested by model reduction based on a likelihood ratio test using the function anova() in R.
The difference between ASSRs at 45 dB SL derived from the two types of stimuli (SS and WNS) was tested using a paired t-test.

Latency
The average latencies for Scalp were estimated to be 43.6 and 42.4 ms for IL and CL, respectively. For CrossEar the average latency was found to be 22.4 ms, and for InEar it was 26.4 (IL) and 20.4 (CL) ms. These latencies were used in the subsequent ASSR analysis for each configuration/measurement side. For more details about the latency estimation, please refer to the supplementary material (figure S1).

A full mixed model was fitted to the measured ASSR amplitudes for each configuration. Visual inspection of the residuals did not reveal any obvious deviation from normality. Residual plots, Q-Q plots and histograms of the residuals, along with model control analysis, are included in the supplementary material (figures S5-S7). Model reduction tests are also included in the supplementary material (tables S2-S4). Coefficients of the final models, along with standard errors and 95% confidence intervals, are summarized in tables 1-3. The red lines in figure 5 show the fits from the mixed models.

ASSR vs. dB SL
The ASSR amplitude increased with increasing presentation level for all electrode configurations (see figure 5). The likelihood ratio tests also revealed significance of the fixed effect Level in all three configurations (Scalp: χ²(1) = 254.69, p < 0.001; CrossEar: χ²(1) = 33.89, p < 0.001; InEar: χ²(1) = 51.88, p < 0.001). For CrossEar and InEar the ASSR amplitudes were approximately 10 dB and 15 dB lower compared to Scalp, respectively. From figure 5 it appears that the slope of the ASSR amplitude is dependent on the presentation level. This was supported by the mixed model analysis, where the fixed effect Level² was found to have a significant effect for both Scalp (χ²(1) = 132.50, p < 0.001) and CrossEar (χ²(1) = 11.74, p < 0.001). For the Scalp configuration the slope of the ASSR amplitude was 0.42 dB/dB at 0-15 dB SL, 0.23 dB/dB at 15-30 dB SL, and practically 0 dB/dB at 30-45 dB SL. For CrossEar the slope of the ASSR amplitude curve was 0.26 dB/dB at 0-25 dB SL and 0.04 dB/dB at 30-45 dB SL. In contrast to Scalp and CrossEar, the curves for InEar were shallower and the fixed effect Level² was not found to have a significant effect (χ²(1) = 0.502, p = 0.479). However, the ASSR generally increased with increasing presentation level with a slope of 0.18 dB/dB.
The likelihood ratio test in the model reduction procedure for Scalp did not reveal significance of Measurement side (χ²(1) = 0.725, p = 0.394), which means there was no statistically significant difference between IL and CL measurements in Scalp. In contrast, Measurement side was found to be significant for InEar (χ²(1) = 28.693, p < 0.001). For InEar, IL measurements were found to be significantly larger compared to CL measurements.

Latency
The estimated apparent latencies for Scalp EEG of 43.6 (IL) and 42.4 (CL) ms found in the current study are generally in good agreement with the latencies previously reported in the literature when taking the differences in stimulation level into account. Several studies reported an average apparent latency of 32-48 ms for ASSRs with amplitude modulations of about 40 Hz [4, 34, 39-42].
The ASSR can be thought of as the summed activity of sources along the auditory pathway, weighted by the individual source-electrode transfer function [1, 43]. In this regard, the quite long latencies found in the current study correspond to the latencies expected for activation of the auditory cortex [44], suggesting that cortical sources dominate the responses measured using the scalp configuration.
The latencies estimated for CrossEar and InEar EEG were shorter than those found for Scalp EEG. This indicates that sources earlier in the auditory pathway have a larger weight in the ear ASSR as compared to the conventional scalp ASSR.

Data segmentation
As part of the ASSR analysis, the EEG data was segmented into 50 ms long sub-epochs (two periods of the modulation frequency), which were sorted according to the intensity of the corresponding speech stimuli and recombined into ten level-dependent datasets. Recombination of the EEG data may introduce distortions into the EEG time series every 50 ms, resulting in a 20 Hz artifact with a second harmonic at 40 Hz, which could interfere with the 40 Hz ASSR, leading to an increased false positive rate. To mitigate this, the EEG data was high-pass filtered before the segmentation-recombination process. In order to investigate the effect of the high-pass filter, we compared the ASSR amplitude for the overall EEG data recorded to SS with and without random shuffling of the 50 ms sub-epochs. For high-pass filters with cut-off frequencies above 10 Hz, the data segmentation-recombination did not change the amplitude at either 20 Hz or 40 Hz in the periodogram of the EEG signal (see figure S8 in the supplementary material). Consequently, in the current study, a cut-off frequency of 20 Hz was used.
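The shuffle control described above can be sketched as follows; the helper is hypothetical, shuffling the 50 ms sub-epochs and reading out the periodogram at 20 Hz and 40 Hz for comparison with the unshuffled signal.

```python
import numpy as np
from scipy.signal import periodogram

def shuffle_control(eeg, fs=2500, sub_s=0.050, seed=0):
    """Randomly permute 50 ms sub-epochs and return the periodogram
    power at 20 Hz and 40 Hz for the original and shuffled signals,
    exposing any segmentation artifact at those frequencies."""
    sub = int(sub_s * fs)
    n = len(eeg) // sub
    rng = np.random.default_rng(seed)
    shuffled = eeg[: n * sub].reshape(n, sub)[rng.permutation(n)].reshape(-1)
    f, pxx0 = periodogram(eeg[: n * sub], fs)
    _, pxx1 = periodogram(shuffled, fs)
    idx = [np.argmin(np.abs(f - h)) for h in (20, 40)]
    return pxx0[idx], pxx1[idx]
```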
As an additional control, the ASSR was recorded to WNS, which was designed in the same manner as the SS except that the level was held constant. WNS was therefore only presented at 45 dB SL, and the segmentation-recombination was not performed in the analysis of these data. The amplitude of the ASSR recorded to WNS was comparable to the amplitude of the ASSR recorded to SS at 45 dB SL, and the statistical analysis did not show any significant difference between ASSRs derived from SS and WNS.

Spatial filter
Previous ear-EEG studies have shown that the ASSR SNR is lower for ear-EEG compared to scalp-EEG [10, 45]. Therefore, a spatial filtering method was applied in the current study to improve the SNR of the ear-EEG recordings using a weighted combination of multiple electrodes [29]. Under the assumption that the underlying sources of the ASSR were the same across the presentation levels, one common spatial filter was calculated based on the whole EEG signal and applied to the data for each individual level. This approach was used because the estimation of a spatial filter generally becomes more robust when a larger amount of data is used for its estimation. Further, using the same spatial filter across all intensities ensures that ASSR differences are not due to differences between the spatial filters. However, this approach introduces a risk of overfitting, since the test data is thereby a part of the training data (6%-16% depending on the level). To estimate the degree of overfitting, results estimated using a leave-one-out cross-validation approach (where the spatial filter for the individual level was trained on all levels except the one to which the filter was applied) are included in the supplementary material for comparison (figure S9). This analysis found only very minor effects of the filter estimation method on the ASSR amplitude and the number of significant observations.

ASSR vs. dB SL
The ASSR amplitude increased with increasing presentation level for all electrode configurations (see figure 5). This observation was further supported by the statistical analysis, which showed a significant increase in ASSR with increasing presentation level. Some previous studies have reported a monotonically increasing ASSR amplitude with increasing intensity up to rather high levels of 80 dB SPL [24, 34, 35], while other studies have reported saturation of the ASSR at moderate-to-high levels [36-38].
In this study, the ASSR saturated at levels of 30-45 dB SL for the Scalp configuration. This could be attributed to an increasing masking effect from neighboring sub-bands, especially from the lower frequency sub-bands of the SS, as the spread of excitation tends to increase with rising intensity [46]. Earlier research has found that the ASSR amplitude decreases significantly when an IL noise masker is presented at the same level as the tone [47], as well as at levels 20 dB below the tone [35].
Nevertheless, the slope of the Scalp ASSR amplitude vs. presentation level relation at levels of 0-30 dB SL was found to be in agreement with previous studies [24,35].
The relation between the ASSR amplitude and the presentation level in CrossEar had a pattern similar to that seen on the Scalp. However, the ASSR amplitude was generally reduced by 10 dB compared to that recorded on the Scalp, while the background noise was reduced by only 6 dB. This corresponded to a decrease in SNR of 4 dB and resulted in a lower number of significant measurements.
For InEar, the SNR dropped by approximately 8 dB compared to Scalp, and in consequence, the number of significant measurements was even lower than for CrossEar. This result is consistent with previous ear-EEG studies [10, 45]. Since the grand average ASSRs at the low levels were calculated based on fewer recordings, most likely stemming from the subjects with the strongest ASSRs, the grand average ASSR at these levels was likely artificially elevated. This may explain why the slope was smaller for InEar as compared to Scalp and CrossEar.
No significant difference was found between IL and CL ASSRs in Scalp, which is consistent with several other studies [48-50]. However, IL ASSRs were found to be significantly larger than CL ASSRs for the InEar configuration, which may be due to a larger contribution from peripheral sources on the IL side.

ASSR vs. behavioral threshold
Although this study was conducted on normal-hearing subjects, there was still some variability in the behavioral threshold among the participants. The difference in hearing threshold was assumed to be compensated for by presenting the stimuli relative to the individual hearing thresholds (i.e. in dB SL). Nevertheless, for completeness, the behavioral threshold was still included as a fixed effect in the statistical analysis. In contrast to our expectations, the behavioral threshold showed a statistically significant effect on the Scalp ASSR amplitude: the larger the behavioral threshold, the larger the ASSR amplitude. The coefficient for the behavioral threshold in the mixed model was found to be 0.304 (see table 1), which means that if two subjects have a 20 dB difference in their behavioral thresholds, there is an expected 6 dB difference in ASSR amplitude. Although this was a surprising result, it is in agreement with earlier ASSR studies conducted on hearing-impaired subjects, showing higher ASSR amplitudes relative to the sensation level [3, 11, 51]. This phenomenon is typically attributed to increased neural recruitment in the hearing impaired, and it is likely that the same effect applies in this normal-hearing population, even though the subjects with the highest hearing thresholds were still characterized as normal-hearing.
It is noteworthy that, based on the proposed speech-based stimuli and the selected duration of recording, the ASSR could be detected down to the level of the behavioral threshold (0 dB SL) for a subset of the subjects.

Stimuli
Amplitude modulation applied to a sub-band of the running speech affected the sound quality of the original speech signal, but the speech was still perceived as natural and without any degradation of speech intelligibility. By reducing the modulation depth of the AM signal, the quality of the modified speech stimulus can be improved to the extent that the amplitude modulation is hardly noticeable.
In this study, we applied amplitude modulation to only one of the frequency sub-bands, keeping the other frequency sub-bands unmodified. In practice, each frequency sub-band can be amplitude modulated independently. This would allow hearing assessment at several audiometric frequencies simultaneously and thereby save time, but would further affect the perceived quality of the speech signal.
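For illustration, the single-band modification described above can be sketched as follows. The octave band around 1 kHz, the 40 Hz modulation rate and the raised-cosine modulator follow the stimulus description; the FFT-based band split, the function name and the default modulation depth are illustrative assumptions rather than the exact processing used in the study.

```python
import numpy as np

def am_subband(speech, fs, f_lo=707.0, f_hi=1414.0, fm=40.0, depth=1.0):
    """Apply AM at rate fm to one sub-band of `speech`, leaving the
    complementary signal unmodified (sketch, not the study's filter bank)."""
    # Split the signal into the target sub-band and its complement
    # via zero-phase FFT masking.
    spec = np.fft.rfft(speech)
    freqs = np.fft.rfftfreq(len(speech), 1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    band = np.fft.irfft(np.where(in_band, spec, 0.0), n=len(speech))
    rest = speech - band
    # Raised-cosine modulator oscillating between (1 - depth) and 1 at fm Hz;
    # a smaller `depth` makes the modulation less audible.
    t = np.arange(len(speech)) / fs
    mod = 1.0 - depth * (0.5 + 0.5 * np.cos(2.0 * np.pi * fm * t))
    return rest + band * mod
```

With `depth=0.0` the function returns the input unchanged, which makes the trade-off between modulation audibility and response strength explicit.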
The level adjustment step (step 3) in the stimulus design (see also figure 3) had two purposes. Firstly, to ensure that all sub-intervals had an intensity within the range of interest and that the distribution of intensities was inversely related to the ASSR amplitude. Secondly, to design the stimulus with exact levels in order to compare the ASSR-intensity relation derived using the modified speech signal with the existing literature, where traditional ASSR stimuli were used. In practice, this step is not needed. Instead, the EEG segments could simply be sorted according to an interval of levels, e.g. [17.5-22.5] dB instead of 20 dB.
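The interval-based sorting suggested above could be sketched as follows. The 50 ms sub-epoch length and the 0-45 dB SL grid in 5 dB steps come from the stimulus design; the function name and the nearest-bin assignment are illustrative assumptions.

```python
import numpy as np

def sort_subepochs_by_level(eeg, levels_db, fs, epoch_ms=50.0,
                            bin_centers=tuple(range(0, 50, 5))):
    """Split audio-aligned EEG into fixed-length sub-epochs and group them
    by the nearest stimulus-level bin. `levels_db` holds one measured level
    per sub-epoch of the AM sub-band."""
    centers = np.asarray(bin_centers, dtype=float)
    n = int(round(fs * epoch_ms / 1000.0))          # samples per sub-epoch
    n_epochs = min(len(eeg) // n, len(levels_db))
    epochs = eeg[: n_epochs * n].reshape(n_epochs, n)
    groups = {int(c): [] for c in centers}
    for ep, lvl in zip(epochs, levels_db[:n_epochs]):
        # e.g. any level in [17.5, 22.5) dB lands in the 20 dB bin
        c = int(centers[np.argmin(np.abs(centers - lvl))])
        groups[c].append(ep)
    return {c: np.array(g) for c, g in groups.items() if g}
```

The per-bin epoch arrays could then feed the ASSR estimation for each level, without requiring the stimulus itself to be quantized to exact levels.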

Application
The current study has shown that the ASSR as a function of presentation level can be estimated using a sub-band amplitude-modulated speech signal in both scalp- and ear-EEG. This forms the basis for translating the speech-based ASSR to a physiological threshold. Incorporated into a hearing device, the approach introduced in the current study would enable hearing assessment in everyday life. Naturally occurring sounds can be amplitude modulated by the hearing device and thus evoke an ASSR. By applying the amplitude modulation to one sub-band and using a shallower modulation pattern, it is possible to create stimuli from naturally occurring sounds without significant impact on perception. It is worth noting that, in principle, the ASSR could be estimated continuously throughout the entire time the hearing aid is in use. Therefore, using more natural, albeit less efficient, stimuli, the hearing thresholds can be monitored continuously without affecting the daily life of the user.

Conclusion
In this paper, we have proposed and evaluated a novel method for estimation of the ASSR based on sub-band amplitude modulation of natural sounds. The method was evaluated on normal-hearing subjects using a speech signal with an intensity level ranging from 0 to 45 dB SL. The study has demonstrated that the ASSR can be estimated as a function of intensity level in both scalp- and ear-EEG. Combining the ASSR method based on natural sounds with ear-EEG integrated into hearing aids has the potential to enable recurrent updates of the hearing device's fitting, offering continuous optimal hearing loss compensation.

Figure 1 .
Figure 1. Left: computer model of a left-ear earpiece illustrating the positions of the electrodes. Right: schematic illustration of the electrode configurations.

Figure 2 .
Figure 2. Schematic representation of the stimulus design.

Figure 3 .
Figure 3. Illustration of the level adjustment procedure. The lower left panel shows the input distribution of the audio signal in the AM sub-band. The upper left panel shows the empirical level adjustment function, which transforms the continuous input distribution into the discrete output distribution. The upper right panel shows the amplitude distribution after the level adjustment.

Figure 4 .
Figure 4. Schematic illustration of how EEG data was distributed among the different levels. First, the EEG data was aligned with the audio and then split up into 50 ms sub-epochs (corresponding to two periods of the modulation frequency). The sub-epochs were then distributed according to the levels in the AM sub-band.

Figure 5
shows the grand average ASSR and background noise as a function of dB SL derived from SS and WNS for the three different configurations: Scalp (a), CrossEar (b) and InEar (c). IL and CL measurements in Scalp and InEar are shown with solid and dashed lines, respectively. The numbers above and below the error bars (standard error of the mean) indicate the number of significant measurements in the grand average. The grand averages for Scalp ASSR and background noise are plotted with faded colors together with both CrossEar (b) and InEar (c) for comparison. The estimated grand average ASSR and background noise values for the three configurations are included in table S1 in the supplementary material. Moreover, the ASSR and background noise figures for each individual subject are included in figures S2-S4 in the supplementary material for Scalp, CrossEar and InEar, respectively.

Figure 5 .
Figure 5. Grand average ASSR and background noise as a function of dB SL for (a) Scalp, (b) CrossEar and (c) InEar configurations derived from SS and WNS. Error bars represent the standard error of the mean. Solid and dashed lines in (a) and (c) represent IL and CL measurements, respectively. The numbers above and below the error bars indicate the number of significant measurements included in the grand average. In (b) and (c), the grand average ASSR and background noise for the Scalp configuration are presented in the background with faded colors to ease comparison between the configurations. The curves are shifted slightly horizontally for better readability. Red curves represent the fit from the mixed models.

Table 1 .
Fixed and random effect coefficients of the final model ASSR ∼ L + L² + BT + (1|S) for Scalp.

Table 3 .
Fixed and random effect coefficients of the final model ASSR ∼ L + MS + (1|S) for InEar.