MEMS piezoelectric resonant microphone array for lung sound classification

Abstract This paper reports a highly sensitive piezoelectric microelectromechanical systems (MEMS) resonant microphone array (RMA) for detection and classification of wheezing in lung sounds. The RMA is composed of eight width-stepped cantilever resonant microphones with Mel-distributed resonance frequencies from 230 to 630 Hz, the main frequency range of wheezing. At the resonance frequencies, the unamplified sensitivities of the microphones in the RMA are between 86 and 265 mV Pa−1, while the signal-to-noise ratios (SNRs) for 1 Pa sound pressure are between 86.6 and 98.0 dBA. Over 200–650 Hz, the unamplified sensitivities are between 35 and 265 mV Pa−1, while the SNRs are between 79 and 98 dBA. The wheezing feature in lung sounds recorded by the RMA is more distinguishable than that in recordings by a reference microphone with conventional flat sensitivity. Consequently, the automatic classification accuracy for wheezing is higher with the RMA recordings than with the reference-microphone recordings, both with deep learning algorithms on a computer and with simple machine learning algorithms on a low-power wireless chip set for wearable applications.


Introduction
About 7.4% of the world population suffer from chronic respiratory diseases, among which asthma and chronic obstructive pulmonary disease (COPD) are most common [1,2]. As many as 262 million people are affected by asthma, and more than 1200 individuals die from asthma every day on average [3]. Wheezing, caused by narrowing of the lung airways due to asthma, is a common symptom and can be sensed by a stethoscope [4][5][6][7]. Thus, lung sound monitoring can be very helpful for asthma patients, especially children, who cannot carry out the well-established pulmonary function tests accurately because they cannot understand or follow the instructions on how to force air out of their lungs. Lung sounds can be monitored with electronic stethoscopes, but not for more than 1 h continuously because of their bulkiness and weight (stemming from the acoustic coupler needed to amplify faint lung sounds) [8][9][10]. Also, weak wheezing may be missed because of the low signal-to-noise ratio (SNR) of the microphone used in the stethoscope. The published sensitivities of commercial MEMS condenser microphones are between 5 (TDK INMP411) [11] and 25.12 mV Pa−1 (TDK ICS-40730) [12], depending on the applied bias voltage (typically about 10 V DC). Their SNRs for 1 Pa sound pressure are between 59 (Knowles SPH2430HR5H-B) [13] and 74 dBA (TDK ICS-40730) [12]. A MEMS condenser microphone with 22.39 mV Pa−1 sensitivity and 73 dBA SNR over 22 Hz-22 kHz has been reported, but with an applied bias voltage of 200 V DC [14]. Piezoelectric MEMS microphones do not need a bias voltage, and an unamplified sensitivity of 38 mV Pa−1 over 100-700 Hz with the fundamental resonance at 890 Hz has been reported [15].
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Microphone sensitivity is enhanced at the mechanical resonance of a microphone diaphragm when the quality factor (Q) of the resonance is greater than 1, and MEMS resonant microphones have been reported [16][17][18][19][20][21][22][23]. An array of such resonant microphones can mimic the human auditory system, which is based on resonances of 30 000 cochlear hairs at the basilar membrane [24,25]. A higher Q means a higher sensitivity at the resonance frequency, but over a narrower bandwidth. Thus, a diaphragm with multiple resonances or a resonant microphone array (RMA) consisting of multiple resonant microphones covering different frequencies is needed to cover a wide frequency range. A piezoelectric RMA whose resonant microphones have unamplified sensitivities of 34.6-131.4 mV Pa−1 at their resonance frequencies between 169 and 662 Hz was reported for lung sound monitoring [21]. However, the noise floor and SNR of that RMA were not reported in [21]. This paper presents the design, fabrication, characterization, and application of a highly sensitive piezoelectric MEMS RMA for detection and automatic classification of wheezing in lung sounds. The measured unamplified sensitivity and SNR of the RMA are presented along with machine learning algorithms developed and implemented on a computer and on a commercial wireless chip set, CYBLE-416045-02. Also presented are measured classification accuracies and speeds (directly related to energy consumption) for wheezing in lung sounds.

Design
Eight width-stepped Si cantilevers, each with two narrow beams supporting a rectangular plate, are used for the resonant microphones in the RMA (figure 1). The resonance frequencies are Mel-spaced (denser at lower frequencies, as humans distinguish lower frequencies better) between 200 and 800 Hz (the frequency range of wheezing). More cantilevers can cover more frequencies with highly sensitive resonances, albeit at the cost of a larger RMA. For wheezing detection over 200-800 Hz, eight cantilevers offer a good trade-off between performance and size. The piezoelectric ZnO thin film, which converts the cantilever bending stress (due to applied sound pressure) to voltage, is placed only over the narrow support beams for maximum average stress over the largest possible area. An electrical insulation layer of SiN encapsulates the ZnO to prevent charge transfer between the top and bottom electrodes through the ZnO, for good sensitivity at low frequencies, as the resistivity of ZnO (10⁷ Ω·cm) is relatively low. The air gap between the cantilever and the Si base is as narrow as 20 µm to minimize acoustic pressure leakage at low frequencies. The sizes of the cantilever resonant microphones range from 3.6 to 2.3 mm (table 1), while the thickness of the Si cantilever is 5 µm (table 2).
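The Mel spacing described above can be illustrated with a short sketch. It assumes the common Mel formula m = 2595 log10(1 + f/700), which the paper does not specify, and the function names are hypothetical. The resulting values are only nominal targets and need not coincide with the designed frequencies listed in table 1.

```python
import math

def hz_to_mel(f_hz):
    # Common Mel-scale formula (an assumption; the paper does not state which variant it uses).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_frequencies(f_low_hz, f_high_hz, count):
    """Return `count` frequencies equally spaced on the Mel scale."""
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (count - 1)
    return [mel_to_hz(m_low + i * step) for i in range(count)]

# Eight nominal targets over the 200-800 Hz design range quoted in the text.
targets = mel_spaced_frequencies(200.0, 800.0, 8)
for number, f in enumerate(targets, start=1):
    print(f"#{number}: {f:.0f} Hz")
```

Because the Mel scale is roughly logarithmic in this range, the gaps between neighbouring targets grow with frequency, which is the "denser at lower frequencies" behaviour mentioned in the text.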
The fundamental resonance frequency of a width-stepped cantilever can be calculated through the beam free-vibration equation for the cantilever displacements $W_1(x)$ and $W_2(x)$ of the two parts having different widths:

$$\frac{d^4 W_i(x)}{dx^4} - \beta_i^4 W_i(x) = 0, \qquad i = 1, 2 \qquad (1)$$

where $\beta_1^4 = m_1\omega^2/(E_1 I_1)$ and $\beta_2^4 = m_2\omega^2/(E_2 I_2)$, with $m_1$ and $m_2$ being the mass per unit length, $E_1$ and $E_2$ being the Young's modulus, and $I_1$ and $I_2$ being the moment of inertia, at part 1 and 2, respectively. Once $\beta_1$ and $\beta_2$ are solved through equation (1) [26], the resonance frequency $f$ can be obtained as follows:

$$f = \frac{\omega}{2\pi} = \frac{\beta_1^2}{2\pi}\sqrt{\frac{E_1 I_1}{m_1}} = \frac{\beta_2^2}{2\pi}\sqrt{\frac{E_2 I_2}{m_2}} \qquad (2)$$

Compared to a rectangular cantilever with one fixed and three free ends (figure 2(a)), a width-stepped cantilever with the same fundamental frequency has a higher bending-induced stress under an applied pressure and a smaller size (figures 2-4). Thus, a higher unamplified sensitivity is expected from a piezoelectric microphone built on a width-stepped cantilever than from one built on a standard cantilever, as the piezoelectric ZnO film is placed only on the support beams of the width-stepped cantilever (figure 1). The voltage $V$ produced across the ZnO thickness due to the average stress $\sigma$ induced by bending under an applied pressure is

$$V = \frac{d_{31}\,\sigma A}{C} = \frac{d_{31}\,\sigma\, t}{\epsilon_0 \epsilon_r} \qquad (3)$$

where $C$, $A$ and $t$ are the ZnO's capacitance, area, and thickness, respectively, while $d_{31}$ and $\epsilon_r$ are the piezoelectric coefficient and relative permittivity of the piezoelectric film, respectively, with $\epsilon_0$ being the vacuum permittivity. The resonance frequency of a width-stepped cantilever depends not only on the size of the whole cantilever but also on the size of the narrow segment, as shown in table 1. Therefore, the width-stepped cantilever offers more design flexibility. To make the RMA illustrated in figure 1 smaller, the width-stepped cantilever #4 (having a fundamental resonance frequency of 429 Hz) in the array is designed to have the same length l as #5 (having a fundamental resonance frequency of 495 Hz) but with a longer and narrower Narrow Part (table 1). This is why the curves of the length, average stress, and 2nd resonance frequency are not smooth for the width-stepped cantilever at #4 in figures 3, 4 and 6, respectively.

[Figure caption, partially recovered: ... (table 1) and a standard cantilever vs fundamental resonance frequency.]

[Figure 5 caption, partially recovered: ... (949 Hz) resonance frequencies, respectively, with one narrow support beam in the center; (c) and (d) the fundamental (436 Hz) and second-harmonic (2049 Hz) resonance frequencies, respectively, with two narrow support beams at the two ends. The width of the one narrow beam in the center is twice that of the narrow beam at the end, while the length is the same. The total width and length of the cantilevers are the same, 2.6 × 2.6 mm².]
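As a rough illustration of how the average bending stress in the ZnO translates into open-circuit voltage, the sketch below assumes the usual parallel-plate relations: charge Q = d31 σ A, capacitance C = ε0 εr A / t, and V = Q/C, so that the electrode area cancels. All numerical values (d31, εr, film thickness, stress) are hypothetical placeholders and are not taken from the paper.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m

def voltage_from_charge(d31, stress_pa, area_m2, thickness_m, eps_r):
    """V = Q / C with Q = d31 * stress * A and C = eps0 * eps_r * A / t."""
    charge = d31 * stress_pa * area_m2
    capacitance = EPS0 * eps_r * area_m2 / thickness_m
    return charge / capacitance

def voltage_closed_form(d31, stress_pa, thickness_m, eps_r):
    """Same voltage after cancelling the area: V = d31 * stress * t / (eps0 * eps_r)."""
    return d31 * stress_pa * thickness_m / (EPS0 * eps_r)

# Hypothetical illustration values (assumptions, not from the paper).
D31 = 5e-12        # C/N, magnitude of the piezoelectric coefficient
EPS_R = 10.0       # relative permittivity of the film
THICKNESS = 0.5e-6 # m
STRESS = 1e4       # Pa, average stress in the film

v_small_area = voltage_from_charge(D31, STRESS, 1e-8, THICKNESS, EPS_R)
v_large_area = voltage_from_charge(D31, STRESS, 1e-6, THICKNESS, EPS_R)
v_closed = voltage_closed_form(D31, STRESS, THICKNESS, EPS_R)
print(f"{v_closed * 1e3:.3f} mV")
```

Under these assumptions the voltage depends on the average stress rather than on the electrode area, which is consistent with placing the ZnO only where the stress is highest, on the narrow support beams.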
A width-stepped cantilever with one narrow support beam in the center (figure 5(a)) has the 2nd harmonic resonance frequency close to the fundamental resonance frequency (figures 5(b) and 6), which may result in the 2nd resonance overlapping the fundamental resonance of another cantilever in an RMA. As we would like to utilize the fundamental resonance of each resonant microphone in an RMA and avoid any interference of the harmonics, a width-stepped cantilever with two narrow support beams (figures 5(c), (d) and 6) is designed for each resonant microphone in the RMA.

Fabrication
The RMA is fabricated on a silicon-on-insulator wafer with a 5 µm thick Si device layer (figure 7(a)). First, 0.5 µm thick low-stress SiN is deposited by low-pressure chemical vapor deposition and patterned (figure 7(b)) as an etch mask for KOH etching of the Si (figure 7(c)). After etching the buried SiO2 in buffered HF, followed by removal of the top SiN by reactive ion etching (figure 7(d)), we sputter-deposit and pattern Al and piezoelectric ZnO, deposit SiN by plasma-enhanced chemical vapor deposition (PECVD) and pattern the SiN, and then sputter-deposit and pattern Al (figure 7(e)). After the RMAs are diced from the wafer, the cantilevers are released on each chip by etching the Si on the diaphragms of each RMA (figure 7(f)). Long cantilevers (particularly #1, #2 and #3) in the fabricated RMA (figure 8) show substantial downward warpage due to the relatively large compressive residual stress in the ZnO film (table 3). The residual stresses of the thin films in table 3 are average values calculated from the curvature of a 3″ wafer measured with a DektakXT profilometer before and after the film deposition.

Unamplified sensitivity
The measured capacitances and resistances of the resonant microphones (table 4) are close to the designed values. The fabricated RMA is placed over a slot (for sound input) in a printed circuit board (PCB) (figure 9) and connected to voltage amplifiers based on the LTC6244 op amp, which has an input resistance and capacitance of 10¹² Ω and 2.1 pF, respectively (figure 10). The signal from each resonant microphone of the RMA is amplified and recorded separately, without the microphones being connected to each other. The sensitivity measured at the amplifier output is divided by the amplification factor of 101 to obtain the unamplified sensitivity. A bias resistor of 1 GΩ (figure 10) is used for DC-biasing the op amp without affecting the low-frequency response of the piezoelectric microphone.
The PCB is mounted to a cover plate (with a slot for sound input) of a metal box which blocks electromagnetic interference (figure 11). The outputs of the microphone amplifiers are connected to a data acquisition system (ROGA Plug.n.DAQ). The sound input to the resonant microphone is calibrated with a reference measurement microphone (GRAS 40AO, noise floor 25 dBA, sensitivity 12.5 mV Pa−1, bandwidth 3.15 Hz-20 kHz), with the RMA and the reference microphone placed next to each other in a plane wave tube (PWT) in an anechoic chamber (figure 12). A loudspeaker placed at one end of the PWT delivers the same sound pressure to the RMA and the reference microphone.
The measured unamplified sensitivities of the eight resonant microphones in the RMA are between 86 and 265 mV Pa−1 at the eight resonance frequencies (figure 13). The sensitivity of the RMA at all frequencies between 200 and 650 Hz is above 35 mV Pa−1 (above the dashed line in figure 13). The measured resonance frequencies are lower than the designed ones (table 1), mainly because the Si cantilevers turn out to be 4 µm thick, rather than 5 µm thick. The sensitivity curve of each resonant microphone shows ripples near the resonance frequencies of the other resonant microphones due to electrical crosstalk among the resonant microphones in the RMA. This crosstalk can be reduced through better grounding of the microphones and amplification circuits. The quality factors (the resonance frequency f0 divided by the −3 dB bandwidth in figure 13) of the resonant microphones are between 13.5 and 22 (figure 14). The damping coefficient of a smaller resonant microphone is usually smaller because the damping comes mainly from the air surrounding the cantilever. Consequently, the quality factor is usually higher for the resonant microphones with higher resonance frequencies and smaller sizes.
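The quality-factor definition used above (resonance frequency divided by the −3 dB bandwidth) can be sketched as follows. The function name and the synthetic second-order response are assumptions for illustration; the paper's measured curves are not reproduced here.

```python
import math

def _crossing(f_a, s_a, f_b, s_b, level):
    # Linear interpolation of the frequency at which the curve crosses `level`.
    return f_a + (level - s_a) * (f_b - f_a) / (s_b - s_a)

def quality_factor(freqs_hz, sensitivity):
    """Return (f0, Q), with Q = f0 / (-3 dB bandwidth) of a single-peak curve."""
    peak = max(range(len(sensitivity)), key=sensitivity.__getitem__)
    level = sensitivity[peak] / math.sqrt(2.0)  # -3 dB in amplitude
    lo = peak
    while lo > 0 and sensitivity[lo] > level:
        lo -= 1
    hi = peak
    while hi < len(sensitivity) - 1 and sensitivity[hi] > level:
        hi += 1
    if sensitivity[lo] > level or sensitivity[hi] > level:
        raise ValueError("sweep does not reach the -3 dB points on both sides")
    f_lo = _crossing(freqs_hz[lo], sensitivity[lo],
                     freqs_hz[lo + 1], sensitivity[lo + 1], level)
    f_hi = _crossing(freqs_hz[hi - 1], sensitivity[hi - 1],
                     freqs_hz[hi], sensitivity[hi], level)
    f0 = freqs_hz[peak]
    return f0, f0 / (f_hi - f_lo)

# Synthetic second-order resonance (hypothetical: natural frequency 400 Hz, Q = 20).
freqs = [300.0 + 0.01 * i for i in range(20001)]
response = [1.0 / math.sqrt((1.0 - (f / 400.0) ** 2) ** 2 + (f / 400.0 / 20.0) ** 2)
            for f in freqs]
f0_est, q_est = quality_factor(freqs, response)
print(f"f0 = {f0_est:.1f} Hz, Q = {q_est:.1f}")
```

On measured data with crosstalk ripples, the sweep would need to be restricted to the neighbourhood of one resonance before applying such a function, since the simple walk outwards from the peak assumes a single smooth peak.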
Higher quality factor leads to higher unamplified sensitivity and lower noise floor of a resonant microphone while the bandwidth (over which the sensitivity is enhanced by the resonance) is narrower. Therefore, we need proper quality factors for the resonant microphones in the RMA so that the RMA has both high sensitivities to detect weak sound and enough bandwidth to cover the frequency range of interest.

Noise floor and SNR
To measure the noise without electromagnetic or acoustic interference, the RMA and amplification circuit are placed in a double metal box and powered by a battery (figure 15).
With the double metal box on a vibration isolation table, the output of the amplification circuit for each resonant microphone in the RMA is divided by the amplification factor (101) to obtain the input-referred noise of each resonant microphone and its amplifier. The measured input-referred root-mean-square (RMS) noise over a 20 Hz-20 kHz observation bandwidth is 8-10 µV before A-weighting and 3-4 µV after A-weighting. The noise floor in pressure is obtained by dividing the input-referred RMS noise voltage by the unamplified sensitivity, while the noise floor in dB is 20 log(noise floor in Pa/reference pressure), where the reference pressure is 2 × 10⁻⁵ Pa. The SNR for 1 Pa sound pressure input is obtained by subtracting the noise floor in dB from 94 dB, as 1 Pa sound pressure is 94 dB (= 20 log(1/(2 × 10⁻⁵))). The measured SNRs of the reson- [...] the external sound and vibration are isolated very well during the noise measurement.
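The noise-floor and SNR arithmetic described in this section can be written out as a small sketch. The function names are hypothetical, and the example pairs one A-weighted noise value from the quoted 3-4 µV range with one sensitivity from the quoted 86-265 mV Pa−1 range; the paper does not state which noise value belongs to which microphone, so the pairing is only illustrative.

```python
import math

P_REF = 2e-5  # reference pressure in Pa

def noise_floor_db_spl(noise_v_rms, sensitivity_v_per_pa):
    """Noise floor in dB: input-referred RMS noise voltage divided by the
    unamplified sensitivity, expressed relative to 2e-5 Pa."""
    noise_pa = noise_v_rms / sensitivity_v_per_pa
    return 20.0 * math.log10(noise_pa / P_REF)

def snr_db_at_1pa(noise_v_rms, sensitivity_v_per_pa):
    """SNR for 1 Pa input: level of 1 Pa (about 94 dB) minus the noise floor in dB."""
    level_1pa = 20.0 * math.log10(1.0 / P_REF)
    return level_1pa - noise_floor_db_spl(noise_v_rms, sensitivity_v_per_pa)

# Illustrative pairing (an assumption): 3 uV A-weighted noise with 265 mV/Pa sensitivity.
floor_db = noise_floor_db_spl(3e-6, 0.265)
snr_db = snr_db_at_1pa(3e-6, 0.265)
print(f"noise floor = {floor_db:.1f} dBA, SNR = {snr_db:.1f} dBA")
```

A noise floor below 0 dB simply means the equivalent input noise pressure is below the 2 × 10⁻⁵ Pa reference.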

Lung sounds detection and classification
Well-annotated lung sounds from the International Conference on Biomedical and Health Informatics (ICBHI) Respiratory Sound Database [27] are played by a loudspeaker and recorded with the RMA and the reference microphone in the set-up shown in figure 12. The recordings are analyzed in the time and frequency domains, and processed through deep learning and machine learning algorithms for automatic classification of wheezing in the lung sounds, to show the advantages of the RMA over a standard microphone.

Recorded signals in time and frequency domain
Wheezing in lung sounds is easily recognizable in the recording by the RMA in both the time waveform (figure 18(a)) and the spectrogram (figure 19(a)), while the recording by the reference microphone shows little wheezing feature in the time waveform (figure 18(b)) and only a weak feature in the spectrogram (figure 19(b)). Weak wheezing is not visible in the time waveforms recorded by either the RMA or the reference microphone (figures 20(a) and 21(a)). Such weak wheezing, though, can still be distinguished in the spectrogram of the recording by the RMA (figure 20(b)), but not in the spectrogram of the recording by the reference microphone (figure 21(b)). Thus, wheezing in the lung sounds can be recognized better in both the time and frequency domains with the RMA than with the conventional microphone.

Automatic wheeze classification with deep learning
Fifty lung sounds from the ICBHI database [27], twenty-five of them with wheezing, are played by a loudspeaker and recorded by the RMA and the reference microphone (figure 12). The recordings are classified by deep learning algorithms, and the classification accuracies are compared.
With temporal convolutional networks (TCNs) [28], the recorded lung sounds in the time domain are classified without any pre-processing. Twelve-layer networks are used in the TCN to extract a ten-dimensional feature vector for the classification. For convolutional neural networks (CNNs), on the other hand, pre-extracted Mel-frequency cepstral coefficients are used. K-fold cross-validation is applied so that all the data can be used for both training and testing. The recordings are divided randomly into K (5 in this case) groups. In each iteration, one group is used for testing while the other groups are used for training, until every group has been tested. The classification accuracy is the average accuracy over all K iterations. With K being 5, 40 recordings are used for training and 10 recordings are used for testing in each iteration, for a total of 5 iterations. The classification accuracies of the lung sounds recorded by the RMA are higher than those obtained with the reference microphone, with both TCN and CNN (figure 22).
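The K-fold procedure described above can be sketched in a few lines. The splitting function, the fixed seed, and the `evaluate` callback (standing in for training and testing a TCN or CNN) are hypothetical; the paper does not describe its implementation.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists; every item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)          # random assignment to groups
    folds = [indices[i::k] for i in range(k)]     # k groups of near-equal size
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def cross_validated_accuracy(n_items, k, evaluate, seed=0):
    """Average the accuracy returned by `evaluate(train, test)` over the k iterations."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_items, k, seed)]
    return sum(scores) / len(scores)

# 50 recordings and K = 5, as in the text: 40 for training and 10 for testing per iteration.
splits = list(k_fold_indices(50, 5))
for train, test in splits:
    print(len(train), len(test))
```

A stratified split, which keeps the proportion of wheezing recordings the same in every group, is a common refinement; the text only states that the groups are formed randomly.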

Automatic classification with machine learning on a chip set for wearable wireless communication
Deep learning algorithms cannot be implemented on a low-power chip set for wireless communication such as the Infineon CYBLE-416045-02 [29], which contains a microcontroller unit (MCU) and other components including an antenna. We have therefore developed and tested machine learning algorithms on the MCU (PSoC 63), which contains analog-to-digital converters, a central processing unit, memory and a Bluetooth Low Energy transceiver (figure 23).
Each of the recordings is divided into 7650 pieces, each 40 ms long, for feature extraction. Mel-spectrum features are extracted for classification, as the frequency spectra of the lung sounds with and without wheezing are quite different (figure 24). On the data recorded by the reference microphone, fast Fourier transform (FFT) and digital filtering are applied for the feature extraction (figure 25(b)). With the data recorded by the RMA, however, the features at different frequencies are obtained by calculating the energy of the recording from each individual resonant microphone in the RMA, which acoustically filters the audio signal (figure 25(a)). Thus, the feature extraction with the RMA is more than ten times faster than that with the reference microphone, as FFT is time consuming (figure 26). Though the idea of calculating the energy from individual channels of an RMA without FFT has been reported [30], the more powerful MCU in PSoC 63 and optimized algorithms have resulted in much faster signal processing.
[Figure 24 caption: Spectral signature averaged per frame for lung sounds recorded with and without wheezing. The shaded regions indicate the standard deviation at each frequency. The power spectral density (PSD) of the lung sounds with wheezing is higher than that without wheezing, especially between 300 and 600 Hz.]
Two machine learning models, Gaussian Naïve Bayes and support vector machine, are developed for the classification based on the extracted features. These classifiers are trained for single-frame prediction at each moment because temporal-variation and multi-frame analyses require more memory and processing speed from PSoC 63. The recorded data are split into 70% for training and 30% for testing. The training is implemented on a desktop computer, and the parameters of the algorithms obtained from the training are then transferred to PSoC 63 for the test. We have measured the classification accuracy, which is equal to (tp + tn)/(tp + fp + tn + fn), and the F1 score, which is equal to 2PR/(P + R), with tp ≡ true positives, tn ≡ true negatives, fp ≡ false positives, fn ≡ false negatives, P ≡ tp/(tp + fp) and R ≡ tp/(tp + fn). Both the classification accuracy and the F1 score are better with the recordings by the RMA than with those by the reference microphone (figure 27).
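The FFT-free feature extraction with the RMA amounts to one energy value per resonant-microphone channel for each 40 ms piece. The sketch below illustrates the idea; the function name, the mean-square definition of energy, the 4 kHz sampling rate, and the synthetic frame are all assumptions, not details from the paper.

```python
import math

def channel_energy_features(frame):
    """Return one mean-square energy per channel.

    `frame` is a list of channels (one per resonant microphone), each a list of
    samples covering one 40 ms piece. No FFT or digital filter is applied,
    because each resonant microphone has already band-filtered the sound.
    """
    return [sum(x * x for x in channel) / len(channel) for channel in frame]

# Synthetic 40 ms frame at an assumed 4 kHz sampling rate (160 samples per channel).
SAMPLE_RATE_HZ = 4000
N_SAMPLES = int(0.040 * SAMPLE_RATE_HZ)
TONE_HZ = 400.0
frame = []
for channel_index in range(8):
    # Hypothetical case: the tone excites channel 3 strongly and the others weakly.
    amplitude = 1.0 if channel_index == 3 else 0.1
    frame.append([amplitude * math.sin(2.0 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ)
                  for n in range(N_SAMPLES)])

features = channel_energy_features(frame)
print([round(e, 4) for e in features])
```

The cost per frame is one multiply-accumulate per sample per channel, which is consistent with the text's observation that avoiding the FFT makes feature extraction much faster on the MCU.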
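The accuracy and F1 definitions given above translate directly into code. The confusion counts in the example are hypothetical and are not results from the paper.

```python
def accuracy(tp, fp, tn, fn):
    """(tp + tn) / (tp + fp + tn + fn)."""
    return (tp + tn) / (tp + fp + tn + fn)

def f1_score(tp, fp, fn):
    """2PR / (P + R), with precision P = tp/(tp + fp) and recall R = tp/(tp + fn)."""
    if tp == 0:
        return 0.0  # avoids division by zero when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical counts for 100 frames (illustration only).
acc = accuracy(tp=40, fp=10, tn=45, fn=5)
f1 = f1_score(tp=40, fp=10, fn=5)
print(f"accuracy = {acc:.3f}, F1 = {f1:.3f}")
```

Unlike accuracy, the F1 score ignores true negatives, so it is the more informative of the two when frames without wheezing dominate the data.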

Discussion
The unamplified sensitivities of the RMA (35-265 mV Pa−1) over 200-650 Hz, where wheezing is prominent, are higher than those of other reported MEMS microphones (figure 28), albeit with a larger size. The noise floor of the RMA is also lower than that of other reported MEMS microphones over 200-650 Hz (figure 29). Thus, the RMA can detect a weaker wheezing signature in lung sounds. If a microphone is targeted for a limited frequency range, an RMA with multiple resonances over that range is shown to offer an unprecedented minimum detectable sound level. Although the sensitivity and noise floor of the RMA are not flat, we did not find any effect of this non-flatness on the lung sound classification. Design innovation and optimization are important to make the resonant microphones small and highly sensitive. The width-stepped cantilever design with two narrow beams supporting a rectangular plate is shown to be smaller and to offer higher sensitivity than a standard cantilever (figures 2-4), which has much less bending stiffness than a diaphragm with its four edges clamped. The size can be reduced further with a spiral structure or a cantilever with serpentine support beams [20,21], but such structures have exhibited lower sensitivity than the design in this paper.
With the conventional approach, the number of digital filters can be increased to improve the classification accuracy (figure 30), but at the cost of increased processing time and power consumption. The accuracy and F1 score with the reference microphone plus 40 filters are still lower than those with the proposed RMA. Therefore, more advanced algorithms applied to poor-quality data from microphones with a higher noise floor may not compete with simple algorithms applied to high-quality data from the RMA with its extremely low noise floor. Furthermore, the signal processing is much faster (figure 26) with the RMA, which inherently filters sounds into specific bandwidths. Thus, the current work shows the significant advantages of the RMA for real-time lung sound monitoring and classification with a wearable stethoscope.

Summary
An array of piezoelectric MEMS resonant microphones, with a novel width-stepped cantilever design in which two narrow beams support a rectangular plate, has been developed with Mel-distributed resonance frequencies to cover the frequency range where wheezing in lung sounds is prominent. The array is shown to have the highest unamplified sensitivity and SNR in this frequency range compared with other reported MEMS microphones. With the array, wheezing in lung sounds is shown to be detected and automatically classified better than with a reference microphone. The automatic classification accuracies for wheezing are higher with the RMA for both deep learning (performed on a computer) and machine learning (performed on a chip set for wearable wireless communication). In addition, the signal processing with the RMA is shown to be more than ten times faster and to consume 92% less energy than that with a traditional microphone on a low-power chip set for wearable wireless communication. Therefore, the current work paves the way for a wearable stethoscope that continuously monitors and automatically classifies lung sounds in real time, so that patients or caregivers may be alerted and medical professionals may have recordings of relevant lung sounds.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).