An audio-tactile interface based on dielectric elastomer actuators

This paper presents a concept of a dielectric elastomer actuator (DEA) user interface (smart button) that can sense a user’s touch and provide multi-sensory tactile and acoustic feedback through a single electrical input signal. The DEA relies on a multi-layer layout, in which one layer detects user-driven deformations (touches) via custom-built capacitance sensing electronics, and the remaining layers provide actuation (audio-tactile feedback). Building upon a recently presented principle, combined tactile and acoustic feedback is produced by concurrently exciting different vibration modes of the same active membrane over different frequency ranges. An integrated demonstrator setup is presented, which includes a DEA, an acoustic enclosure, and compact sensing and driving electronics. The prototype is characterised in terms of its sound pressure level, its force/stroke output at lower working frequencies, and its ability to sense deformations with different profiles and to produce combined audio-tactile outputs. Compared to previous works on multi-function DEAs, the system presented in this paper provides considerably improved sensing performance (with lower working voltage) and a deeper level of integration (with small-scale custom sensing electronics and logic embedded in scalable microcontrollers), and is thus specifically optimised for user-interaction applications. To this end, tests with users are presented here for the first time, which allowed evaluating the subjective perception of the interface’s feedback. With further optimisation and miniaturisation of the power/sensing electronics and structural components, the layout and multi-function DEA principle presented here might lead, in the future, to DEA-based smart buttons for active surfaces, or portable/wearable user interfaces and communicators.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction
Dielectric elastomers (DEs) are stretchable materials that can be used to build electrostatic actuators based on a variable-capacitance principle [1]. Because of their high actuation energy density, design flexibility, and architectural simplicity, DE actuators (DEAs) have received significant attention from the scientific community in the last decade, leading to application demonstrators in a variety of areas, ranging from soft robotics [2] to industrial [3] and space applications [4]. The actuation bandwidth of DEAs spans several orders of magnitude, from the sub-hertz regime [5] up to the kilohertz range in coil-less loudspeaker applications [6,7]. Furthermore, thanks to their self-sensing capability, DEAs can potentially be controlled in closed loop based solely on voltage/current measurements, with no need for dedicated mechanical sensors [8].
A very attractive field of application for DEAs is user interfaces, such as the haptic communicators shown by Zhao et al [9] or the Braille displays for visually impaired subjects developed by Frediani et al [10]. Marette et al [11], Zhao et al [9] and Lee et al [12] showed, for example, that DEs might provide a breakthrough in the field of user interfaces, as they could enable fully compliant units that can be rolled or folded and easily carried by users [11], or integrated into garments and textiles [9,12]. Several concepts of DE-based tactile communicators have been proposed in the past. Frediani et al [13] developed wearable hydrostatically coupled interfaces for virtual reality (VR) capable of providing fingertip stimulation. A similar approach was pursued by Ji et al [14], who developed an untethered, fully compliant haptic DE interface and tested it in pattern-recognition tasks with users. Phung et al [15] proposed a tactile DEA able to produce forces on the order of 100 mN with a diameter of 4 mm. Marette et al [11] developed a flexible haptic display based on buckling DEAs, whose elements can actuate even when bent or rolled against a substrate. Wood and co-workers [9,12] investigated the integration of haptic displays based on arrays of linear DEAs into garments such as arm bands.
A potential strength of DEs that has barely been explored is multifunctionality, i.e. the ability to implement different functions, such as sensing, linear actuation, and sound generation, with the same DE layout. In the field of user and haptic interfaces, combining these working modes might lead to advanced concepts of smart buttons and intelligent interfaces, able to provide on-demand multi-sensory feedback.
Audio-tactile interfaces aim to add an acoustic sensory layer on top of vibrotactile stimulation, so as to convey complex actuation patterns or interaction modes (e.g. the clicking feedback of buttons) in a virtual manner [16]. State-of-the-art solutions for audio-tactile interfaces typically require two separate actuation units for vibration and acoustic feedback, such as vibration motors and electrodynamic speakers [17]. These units are usually made of stiff and bulky components and might be difficult to install close to one another, in which case they offer limited co-location of the haptic and acoustic stimuli and limited potential for rendering realistic localized scenes [18].
In this paper, we present a proof-of-concept for an audio-tactile DE user interface, namely a DEA push-button, that combines sensing, vibrotactile actuation, and sound generation in a single active DE unit. The proposed DE interface generates combined tactile and acoustic feedback based on a principle recently presented by the authors of this work [19]. Specifically, the interface produces acoustic signals and lower-frequency tactile stimulation using different vibration modes of the same DEA, excited by means of a multi-chromatic voltage input. The considered actuator layout is a conical out-of-plane DEA (COP-DEA), which features a pumping linear actuation motion (here used to convey vibrotactile stimulation) in the low frequency (LF) range, and develops complex structural vibration modes of the DE membrane at high frequency (HF) [20], which can be used to generate sound. Besides reducing the number of actuation elements and driving electronics compared to multi-actuator audio-tactile interfaces, the proposed concept provides perfectly co-located feedback, as both the vibrotactile stimulation and the sound come from the same active DEA source. In addition to providing actuation, the interface can sense users' touches via capacitive sensing. This is achieved by giving the active DEA unit a multi-layer structure, where different groups of layers (sharing the same electrical ground (GND)) are dedicated to sensing and actuation tasks, respectively.
We developed an integrated demonstrator of the prototype, which includes a COP-DEA, a holding case (which allows safe interaction of the user with the device), custom sensing electronics, and a microcontroller that executes the sensing algorithms and drives the DEA button. We characterised the DEA performance in different operating modes (namely, LF pumping actuation, HF sound generation, multi-output working mode, and sensing), and we performed interaction tests with users aimed at evaluating the level and recognisability of the feedback provided by the interface.
Compared to our previous work on multi-mode actuation of DEAs [19], this paper features an advanced DEA layout (with a partitioned multi-layer structure for sensing and actuation, which enables advanced sensing tasks and reduces the operating voltage during the phases in which no actuation is produced), and focuses on the audio-tactile application (here systematically investigated through user tests) and on system integration (including sensing electronics and coordinated sensing/actuation algorithms).
We remark that the aim of this work is to provide a proof-of-concept of DEA user interfaces that combine different functions in a single active polymeric stack. Improvements in the device layout (e.g. the design of more compact or integrated interfaces) will be pursued in the future, together with the definition of more advanced testing scenarios and user case studies.
The paper is structured as follows. Section 2 describes the operating principle of the multi-mode DE interface. Section 3 presents the layout and components of the integrated device. Section 4 presents experimental results, which consist of characterisations of the LF, HF and multi-frequency actuation, and tests with users. Section 5 presents concluding remarks and future developments.

Layout and operating principle
The proposed user interface consists of a stack of DE membranes covered by compliant electrodes, according to the structure shown in figure 1. The stack has two independent portions: a subset of layers is used for actuation, whereas another layer (located on the upper face of the stack, on the user's side) is used for sensing. The sensing layer detects deformations impressed by a user, which are used to trigger a combined audio-tactile stimulation produced by the actuator. The layout of the device is based on a known actuator topology called COP-DEA [21,22]: the DE stack has a circular shape, is mounted with a certain radial pre-stretch on a rigid outer frame, and is pre-loaded out-of-plane by an elastic biasing element connected to a rigid end-effector disc (figure 1). Applying a voltage u to the electrodes of the actuator layers generates an electrostatic pressure p_em on the dielectric membrane surfaces, which is proportional to the square of the applied voltage according to the following equation:
$$p_{\mathrm{em}} = \varepsilon_0 \varepsilon_r \left(\frac{u}{h}\right)^2, \qquad (1)$$
where h is the thickness of the dielectric layers, while ε_0 and ε_r are the vacuum permittivity and the DE relative permittivity, respectively. This electrostatic pressure is responsible for voltage-driven deformations of the DE membrane [1]. Applying an LF voltage waveform results in a linear movement of the end effector along the device axis. We use a nonlinear biasing element (NBS) [23], consisting of a metallic buckled beam, as the pre-loading element. In the presence of HF voltage excitations, the motion of the end-effector becomes negligible as a result of the low-pass dynamics of the device. However, the DE is still able to generate sound by means of voltage-driven transverse structural vibrations of the membrane surface (figure 2).
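As a numerical illustration of p_em = ε_0 ε_r (u/h)², the sketch below evaluates the electrostatic pressure at the 1.5 kV and 2.5 kV levels used in the free-displacement tests. The relative permittivity ε_r = 2.8 and the layer thickness (the 50 µm film thinned by an assumed equibiaxial 20% pre-stretch under incompressibility, i.e. h = 50 µm/1.2²) are our assumptions, not values reported in this work.

```python
# Electrostatic (Maxwell) pressure of equation (1): p_em = eps0 * eps_r * (u / h)**2
EPS0 = 8.854e-12  # vacuum permittivity [F/m]


def maxwell_pressure(u, h, eps_r=2.8):
    """Return p_em [Pa] for a voltage u [V] across a layer of thickness h [m].

    eps_r = 2.8 is an ASSUMED relative permittivity for the silicone film.
    """
    return EPS0 * eps_r * (u / h) ** 2


# ASSUMPTION: 50 um film, 20 % equibiaxial pre-stretch, incompressible material,
# so the thickness shrinks by a factor 1.2**2.
h = 50e-6 / 1.2**2

for u in (1500.0, 2500.0):
    print(f"u = {u:6.0f} V -> p_em = {maxwell_pressure(u, h) / 1e3:6.1f} kPa")

# Quadratic scaling: 1.5 kV yields (1.5 / 2.5)**2 = 36 % of the pressure at 2.5 kV.
ratio = maxwell_pressure(1500.0, h) / maxwell_pressure(2500.0, h)
print(f"pressure ratio 1.5 kV / 2.5 kV: {ratio:.2f}")
```

The ratio does not depend on the assumed ε_r or h, which makes the quadratic trend a robust first-order explanation for the smaller displacements observed at lower excitation amplitudes.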
Exciting the actuation layers with a multi-chromatic voltage allows an LF linear actuation and sound to be produced concurrently, in accordance with the principle presented in [19] and briefly recalled in section 2.2. In our application, multi-mode actuation is used to provide users with vibrotactile feedback (LF actuation), acoustic feedback (HF actuation), or combinations of the two.
The interface can be used as an active push-button, in which axial movements of the end-effector generated by users' touches can be detected by means of the sensing layer, as further described in section 2.3, and used to trigger vibrotactile and acoustic stimulations by means of the actuation layers.
We designed a device consisting of three actuation layers and a single sensing layer. The structure of the DE layering and the connections are shown in figure 1. The system has two electrodes connected to GND, two electrodes connected to high voltage (HV), and an additional positive electrode for sensing. The actuation and sensing layers share a common GND. The connections (tracks) between the deformable electrodes and the circuit wires feature the layout shown in figure 1: HV and GND tracks run on diametrically opposite sides of the electrodes, whereas the positive sensing electrode track overlaps with the GND track. Thanks to this structure, the sensing electronics can be installed on the GND side and connected to the same GND as the HV electronics, so as to reduce the required number of GND electrodes.
Compared to the multi-function DEA concept introduced in [19], where sensing and actuation were performed through the same set of DE layers (using current measurements), the incorporation of a dedicated sensing layer in the active stack enables advanced sensing tasks (e.g. distinguishing rapid sequences of user-prescribed deformations) and improves the system reliability. In [19], a constant HV bias was applied to all DE layers of the DEA, with the aim of generating a readable current in response to applied deformations. In the concept proposed here, the sensing layer is operated at low voltage, potentially leading to a significant improvement in lifetime. The incorporation of the sensing layer (with its own elastic stiffness) comes at the cost of a moderate reduction in the strain that the active DE stack can produce (see supplementary material), but it preserves a simple and compact layout for the active DE interface, which still makes use of a single DE stack.

Multi-mode actuation
In [19,20], we showed that the deformation patterns followed by the COP-DEA vary significantly over different frequency intervals, ranging from a linear pumping motion of the end effector and the membrane surface at LF, to transverse membrane vibrations similar to those traditionally observed in circular or annular tensioned membranes at HF [20]. The frequency ranges where the COP-DEA exhibits a pumping behaviour and those where it generates sound through structural vibrations are largely decoupled, because of the large difference between the end-effector mass and the DE stack mass. Based on this result, in [19] we showed that independent linear actuation and sound generation can be produced using a single stack of DE membranes driven by a single electrical input (figure 2). This is achieved by driving the actuator with an input excitation u(t) that combines a large-amplitude signal with LF fundamental frequency (responsible for the excitation of the pumping mode) and a low-amplitude HF signal (responsible for the excitation of structural modes, corresponding to transverse vibrations of the DE membrane), according to equation (2). In (2), ū(t) and ŭ(t) are signals ranging between −1 and +1 that vary in time on different time scales: ū(t) has its fundamental frequency in the LF range (e.g. 10⁰-10² Hz), whereas ŭ(t) has most of its spectral content in the range 5·10²-5·10³ Hz; U_max and U_min are the maximum and minimum voltages to which the DEA is subject when ŭ = 0, while U_a is a measure of the amplitude of the HF voltage component in the time intervals where ū(t) ≈ 1.
The square root in equation (2) is introduced to render the system input (namely u², see (1)) linear with respect to ū and ŭ.
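The sketch below illustrates the properties of equation (2) stated above: u² is affine in ū and ŭ, and, with ŭ = 0, u swings between U_min and U_max. The LF terms follow directly from these two properties; the HF coefficient 2·U_max·U_a is our assumption, chosen so that u ≈ U_max + U_a·ŭ when ū ≈ 1 and U_a ≪ U_max, and it may differ from the exact form used in [19]. The parameters mimic the composite audio-tactile waveform of the multi-frequency tests (2 Hz square wave between 700 V and 2500 V, 850 Hz tone, U_a = 220 V or 66 V), and the 100 kHz sample rate is hypothetical.

```python
import math


def drive_voltage(u_bar, u_breve, U_max, U_min, U_a):
    """One sample of a multi-chromatic drive voltage.

    u_bar, u_breve: normalised LF and HF signals in [-1, 1].
    u**2 is affine in u_bar and u_breve. With u_breve = 0 the LF terms give
    u = U_max for u_bar = +1 and u = U_min for u_bar = -1. The HF coefficient
    2 * U_max * U_a is an ASSUMPTION (u ~ U_max + U_a * u_breve when u_bar ~ 1).
    """
    u_sq = (0.5 * (U_max**2 + U_min**2)
            + 0.5 * (U_max**2 - U_min**2) * u_bar
            + 2.0 * U_max * U_a * u_breve)
    if u_sq < 0.0:
        raise ValueError("HF amplitude too large for this LF level (u**2 < 0)")
    return math.sqrt(u_sq)


def composite_samples(duration=1.0, fs=100_000):
    """2 Hz square LF wave (2500 V / 700 V) plus an 850 Hz tone; fs is hypothetical."""
    samples = []
    for n in range(int(duration * fs)):
        t = n / fs  # the output value is recomputed at every DAC time step
        u_bar = 1.0 if math.sin(2.0 * math.pi * 2.0 * t) >= 0.0 else -1.0
        u_breve = math.sin(2.0 * math.pi * 850.0 * t)
        U_a = 220.0 if u_bar > 0 else 66.0  # HF amplitude follows the LF level
        samples.append(drive_voltage(u_bar, u_breve, 2500.0, 700.0, U_a))
    return samples


u = composite_samples(duration=0.5)
print(f"drive voltage range: {min(u):.0f} V to {max(u):.0f} V")
```

Because u² must stay non-negative, this assumed form rejects combinations such as U_a = 220 V at the 700 V level (u² would become negative when ŭ = −1), which is why the sketch lowers U_a together with ū.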

Deformation sensing
The sensing layer is used to detect touches impressed by a user onto the DE end effector. The sensing layer's capacitance C_DE depends on the DEA's geometry and out-of-plane displacement z according to a functional dependence with the following general form:
$$C_{\mathrm{DE}}(z) = \varepsilon\,\frac{A_l(z)}{t_l(z)} = \varepsilon\,\frac{A_l(z)^2}{\Omega_l},$$
where ε is the DE permittivity, A_l and t_l are the sensing layer's surface area and thickness, which are functions of z (A_l increases and t_l decreases with z), whereas Ω_l = A_l t_l is the constant volume of the (incompressible) layer. Approximate explicit forms for C_DE(z) have been proposed and validated in the literature, based on assumptions on the COP-DEA deformation kinematics (e.g. by assuming that the deformed COP-DEA has the shape of a truncated cone subject to pure-shear deformations) [21].
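Combining C_DE = εA_l/t_l with the constant volume Ω_l = A_l t_l gives C_DE = εA_l²/Ω_l, so a relative area change is amplified in the capacitance: +5% area yields about +10.25% capacitance. The sketch below evaluates this for a hypothetical sensing layer; the annular area between the 15 mm disc and the 30 mm outer diameter, the thickness 50 µm/1.2² and ε_r = 2.8 are illustrative assumptions, not measured data.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity [F/m]


def layer_capacitance(area, volume, eps_r=2.8):
    """Parallel-plate capacitance of an incompressible layer: eps * A**2 / Omega."""
    return EPS0 * eps_r * area**2 / volume


# Hypothetical sensing layer (illustrative numbers, see lead-in).
A0 = math.pi * (0.015**2 - 0.0075**2)  # annulus between r = 7.5 mm and 15 mm [m^2]
t0 = 50e-6 / 1.2**2                     # assumed pre-stretched thickness [m]
OMEGA = A0 * t0                         # constant layer volume [m^3]

c0 = layer_capacitance(A0, OMEGA)
c1 = layer_capacitance(1.05 * A0, OMEGA)  # 5 % larger area, same volume
print(f"C0 = {c0 * 1e9:.3f} nF, relative change for +5 % area: {c1 / c0 - 1:.2%}")
```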
Here, the capacitance change of the sensing layer is measured via custom electronics. Figure 3 shows the equivalent circuit of the sensing electronics and the sensing principle. In the figure, R_1 and R_2 are external resistors involved in the charging and discharging phases of the sensing cycle, respectively, and U_supp is a constant voltage provided by a source.
The capacitance sensing electronics measure the duration of a cyclic charge/discharge process. The sensing layer of the DE is cyclically charged up to voltage U_max and discharged down to voltage U_min (with U_min < U_max < U_supp), and the duration of the resulting cycle, t_p = t_c + t_d, given by the sum of the charging and discharging times (t_c and t_d, respectively), is measured. When the position of the DE end effector varies, the capacitance of the DE changes, and so does the time required for a full charge-discharge cycle. Assuming that the charging transient occurs in a sufficiently short time interval, during which C_DE can be considered approximately constant, the current flowing in the circuit is given by
$$i(t) = I_0\, e^{-t/(R_1 C_{\mathrm{DE}})},$$
where t is time, the voltage on the DE at t = 0 is U_min, and the initial current is
$$I_0 = \frac{U_{\mathrm{supp}} - U_{\min}}{R_1}.$$
The charging time is calculated by setting
$$i(t_c) = \frac{U_{\mathrm{supp}} - U_{\max}}{R_1},$$
corresponding to a condition where the voltage on the DE sensing layer is U_max, which leads to:
$$t_c = R_1 C_{\mathrm{DE}} \ln\frac{U_{\mathrm{supp}} - U_{\min}}{U_{\mathrm{supp}} - U_{\max}}.$$
Under similar assumptions, the discharging time during which the DE voltage is decreased from U_max to U_min through resistor R_2 is given by
$$t_d = R_2 C_{\mathrm{DE}} \ln\frac{U_{\max}}{U_{\min}}.$$
The total cycle time is thus proportional to C_DE:
$$t_p = t_c + t_d = C_{\mathrm{DE}}\left[R_1 \ln\frac{U_{\mathrm{supp}} - U_{\min}}{U_{\mathrm{supp}} - U_{\max}} + R_2 \ln\frac{U_{\max}}{U_{\min}}\right].$$
While in practice more complicated relationships between t_p and C_DE might apply (owing, e.g., to the resistance of the DE's compliant electrodes), a monotonic relationship between cycle time and capacitance is expected to hold.
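The charge/discharge relations can be evaluated numerically under the idealisations stated above: C_DE is constant during a cycle, charging from U_min to U_max through R_1 from the source U_supp gives t_c = R_1 C_DE ln[(U_supp − U_min)/(U_supp − U_max)], and discharging from U_max to U_min through R_2 gives t_d = R_2 C_DE ln(U_max/U_min), where we assume that the discharge goes towards 0 V. The resistor values, the 3 V supply, the 1 V/2 V thresholds and the two capacitance values are hypothetical.

```python
import math


def cycle_time(C, R1, R2, U_supp, U_max, U_min):
    """Idealised charge/discharge period t_p = t_c + t_d [s].

    Charging: U_min -> U_max through R1 from the source U_supp.
    Discharging: U_max -> U_min through R2 towards 0 V (ASSUMED).
    """
    if not 0.0 < U_min < U_max < U_supp:
        raise ValueError("require 0 < U_min < U_max < U_supp")
    t_c = R1 * C * math.log((U_supp - U_min) / (U_supp - U_max))
    t_d = R2 * C * math.log(U_max / U_min)
    return t_c + t_d


# Hypothetical component values.
R1 = R2 = 100e3
for C in (0.38e-9, 0.34e-9):  # e.g. capacitance before/after a press
    t_p = cycle_time(C, R1, R2, U_supp=3.0, U_max=2.0, U_min=1.0)
    print(f"C = {C * 1e9:.2f} nF -> t_p = {t_p * 1e6:.1f} us, output = {1 / t_p / 1e3:.1f} kHz")
```

Since t_p is proportional to C_DE in this model, the output 1/t_p rises when a press lowers the capacitance.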
In the application presented in this work, we use capacitance measurements to detect touches impressed by a user on the DE interface (which result in a decrease in z and, hence, in the capacitance C_DE). The DE sensing layer is combined with electronics (see section 3) that produce an output inversely proportional to the cycle time t_p. Integrated logic recognises whether (and how often) the sensor output surpasses a certain threshold (indicating that a certain minimum displacement has been reached), hence making it possible to detect and count separate touches performed by the user.

User interface integration and construction
A COP-DEA prototype was built using 50 µm Elastosil 2030 film from Wacker Chemie AG for the dielectric layers, coated with carbon-loaded silicone layers manufactured via a well-established screen-printing process, which produces stable low-resistivity electrodes [24]. The COP-DEA has an outer diameter of 30 mm, with a central 3D-printed end-effector disc of 15 mm diameter. The DE layers have a radial pre-stretch of 20% in the planar mounting configuration. The NBS features a design similar to that described in [19] and consists of a stack of two layers of 50 µm spring steel, cut using a cab XENO 1 laser. The out-of-plane equilibrium displacement of the end-effector with respect to the membrane base plane is 5.25 mm.
The assembly of the DEA, the NBS, the clamping system, and the connections is installed in a 3D-printed housing (figure 4). The housing prevents direct contact between the user and the DE membrane surface (for safety reasons and to guarantee the structural integrity of the DE), while also serving as an acoustic enclosure that prevents acoustic short circuits between the back and front faces of the membrane [25]. The top cover of the housing holds a printed grid that lets sound waves leave the box. A plastic spacer is installed on top of the DE's end-effector and connected to a circular layer of acoustic fabric (akustikstoff.com; item No.: eb150100), whose perimeter is attached (with no prestrain) to a circular aperture in the housing. In the equilibrium configuration, the textile layer is flat. Pushing on the acoustic textile causes a downward deformation of both the textile layer and the DE. The spacer between the DE and the textile surface is intended to create a significant distance (on the order of 10 mm) between the user's finger and the point where HV is applied. The mass of the axially moving parts (NBS, disc, spacer, screw) is 8 g, whereas the mass of the DE membrane is below 0.1 g.
An exploded view of the setup is shown in figure 4. After preparation, the four DE layers are clamped together via 3D-printed clamps, from which the electrical connections to the electrodes are led outside. The NBS is also clamped and connected to the DE with a 3D-printed spacer. The working out-of-plane equilibrium position of the membrane is adjusted by regulating the spacing between the DE and the NBS through a screw (see supplementary material). The prototype box is wired with 4 mm jack plugs that can be directly connected to the HV power supply and sensing electronics.
The logic and electronic architecture of the prototype are shown in figure 5. The audio-tactile interface (DE membrane with mounting case, as shown in figure 4) is the core element of the system. The sensing electronics used to detect users' touches feature a custom circuit design. A microcontroller (STM32 H743ZI2) is used to read the capacitance sensing electronics, provide a driving waveform to the HV amplifier, execute the logic, and drive built-in ancillary systems (LED lights). Driving voltage waveforms (u in (2)) are generated by the microcontroller and delivered to a power amplifier by means of an integrated digital-to-analog converter (DAC). The sampling frequency of the DAC depends on the clock frequency of the system, and the output value of the signal must be recalculated at every resulting DAC time step.
A commercial programmable HV amplifier (HA51U-3P5 by hivolt.de GmbH & Co. KG) with 3000 V maximum voltage and 10 mA maximum current is used as the power supply, and a custom amplifier board is included to match the maximum output voltage of the microcontroller (3.3 V) to the maximum input voltage of the amplifier (10 V). Since the actuation layers have a capacitance of around 1 nF and the maximum current of the amplifier is 10 mA at a maximum voltage of 3 kV, the audio-tactile interface is within the safety limits (20 mA continuous DC current and >100 nF) recommended for DEA applications [26]. Nevertheless, the housing is provided with redundant safety measures (distances from HV connections, hard housing walls) to prevent contact between the user, the membrane, and the HV tracks.
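Two back-of-the-envelope checks relate to the figures quoted above. The sketch maps a desired HV output to a DAC code through the two stages described (a 3.3 V → 10 V board and a 10 V → 3000 V amplifier), assuming ideal linear gains and a 12-bit DAC, and it estimates the peak current drawn by the roughly 1 nF actuation layers under a sinusoidal HF ripple, i = 2πf·C·U_a, together with the stored energy C·u²/2. The gains, the DAC resolution and the sinusoidal current model are assumptions; steep square-wave edges would draw larger transient currents.

```python
import math

# ASSUMPTIONS: ideal linear stages and a 12-bit DAC; ranges as stated in the text.
DAC_BITS = 12
V_DAC_MAX = 3.3      # microcontroller DAC full scale [V]
V_AMP_IN_MAX = 10.0  # HV amplifier input full scale [V]
U_HV_MAX = 3000.0    # HV amplifier output full scale [V]


def dac_code(u_target):
    """DAC code that should produce u_target [V] at the HV amplifier output."""
    if not 0.0 <= u_target <= U_HV_MAX:
        raise ValueError("target outside the amplifier range")
    v_amp_in = u_target / U_HV_MAX * V_AMP_IN_MAX  # required amplifier input [V]
    v_dac = v_amp_in / V_AMP_IN_MAX * V_DAC_MAX    # DAC output before the 3.3 V -> 10 V board [V]
    return round(v_dac / V_DAC_MAX * (2**DAC_BITS - 1))


def peak_current(C, f, U_a):
    """Peak current [A] into C [F] for a sinusoid of amplitude U_a [V] at f [Hz]."""
    return 2.0 * math.pi * f * C * U_a


def stored_energy(C, u):
    """Electrostatic energy [J] stored in C [F] at voltage u [V]."""
    return 0.5 * C * u**2


C_ACT = 1e-9  # approximate actuation-layer capacitance reported in the text [F]
for f in (1e3, 5e3):
    print(f"{f:.0f} Hz, 300 V amplitude: {peak_current(C_ACT, f, 300.0) * 1e3:.2f} mA")
print(f"energy stored at 3 kV: {stored_energy(C_ACT, 3000.0) * 1e3:.1f} mJ")
print("DAC code for 2500 V:", dac_code(2500.0))
```

Under these assumptions, a 300 V ripple at 5 kHz already requires about 9.4 mA, close to the amplifier's 10 mA limit, so the available current may constrain HF amplitudes at the upper end of the acoustic band.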
All the tests presented in the paper were carried out on the same DE membrane (plus an additional replica of the DE system, used for some of the tests described in section 4.5), which proved able to withstand multi-chromatic voltage loading and user-impressed deformations with the features described in section 4 over an estimated working time of 10-20 h.

Experimental results
In this section, we present results from the user interface characterisation. We first present measurements of the linear stroke/force at the end effector in the presence of LF excitations, so as to quantify the DEA's ability to convey vibrotactile stimulation, and of the acoustic response to HF excitations. We then investigate the sensing performance in terms of the device's ability to recognise user touches with different frequencies and intensities. We show that the DEA is able to provide combined audio-tactile feedback through a series of multi-frequency excitation profiles. Finally, we present user tests showing that the interface can provide perceivable and recognisable feedback.

LF force/stroke response
With the aim of characterising the device performance as a linear actuator, we measured the DEA free displacement and blocking force in a frequency range on the order of 0-10² Hz (figure 7). Free displacement tests were performed by applying a voltage sweep to the DEA and measuring the resulting end-effector displacement with a Micro-Epsilon optoNCDT 1402 laser sensor. Measurements were performed on the DEA assembly alone (without housing) and on the assembled system (with housing). Results in figure 7 (left) show the device response to a signal varying between 0 and 2.5 kV (top) or between 0 and 1.5 kV (bottom). As expected, the displacement is significantly reduced for lower excitation amplitudes (owing to the quadratic dependence of the Maxwell stress on the voltage, cf (1)). At low frequencies, the displacement of the assembled interface is lower than that of the free DEA. This is a consequence of the stiffness of the textile layer mounted on top of the DEA's end-effector, which is initially slack but gets stretched when the DEA moves out-of-plane. In static conditions, the displacement of the free DEA is 0.47 mm (at 2.5 kV peak-to-peak excitation), whereas the initial out-of-plane bias displacement of the assembled unit is roughly 5.25 mm.
The highest peak in the free displacement response corresponds to the natural frequency of the pumping motion of the end-effector (against the DE and end-effector stiffness), whereas secondary peaks are due to the DEA's nonlinear response. Because of the additional stiffness of the acoustic textile, the pumping motion has a higher natural frequency and a lower amplitude in the assembled version of the prototype. The displacement of the DE with acoustic textile in static conditions is 0.18 mm (at 2.5 kV peak-to-peak excitation).
The blocking force produced by the device in the LF range is measured using a load cell (ME-Meßsysteme KD40s, ±5 N) connected via a rigid spacer to the DEA end effector. Different measurements are performed by changing the axial distance between the load cell assembly and the DEA holding frame, so as to impose different pre-loads on the load cell. Figure 7 (right) shows that the amplitude of the generated force is constant over the considered frequency bandwidth (deviations from a flat trend in the final portion of the range are ascribable to mechanical resonances introduced by the load cell and are not representative of the actual device response). The output force is nearly constant over a relatively broad range of pre-loads (0.1-1 N), which cause a modest axial movement of the end-effector (<1 mm, see figure 6 (top)). Further increasing the pre-load (2 N) causes a significant downward motion of the end-effector, with a consequent reduction in the output force (as the membrane approaches the flat, mechanically singular configuration).

HF acoustic response
We measured the sound pressure level (SPL) produced in different operating conditions and at various locations by installing the DEA in a custom-built sound-absorbing box (see [19]) and using a calibrated RODE NT-USB microphone placed at a distance of 0.3 m from the device. Acoustic measurements were performed with the device installed on a support frame (with the device axis parallel to the horizontal plane) that allows varying the relative angle between device and microphone. The device was driven with HF voltage sweeps for different bias voltages (in the range 1.5-2.5 kV) and amplitudes (100-300 V, i.e. much lower than the bias voltage). An overview of the SPL response of the device is shown in figure 8. The device produces little or no acoustic output below ∼800 Hz. The SPL response shows a peak in the neighbourhood of 1 kHz. As observed in [19], this corresponds to the natural frequency of the first structural vibration mode of the DE membrane (characterised by little or no motion of the end effector and a bubble-like deformation of the membrane lateral surface; figure 2). This can be regarded as a 'cut-in frequency' for the device acoustic response, i.e. the device is able to steadily provide acoustic outputs above 70 dB at frequencies higher than the natural frequency of the first structural mode (except for some anti-resonances over narrow frequency ranges). This frequency range is well above the passband of the pumping motion used to generate tactile stimulation (see figure 7 (left)) and above typical perception thresholds for human fingers [27]. As a result, exciting the DEA in this HF range is expected to produce no perceptible tactile stimulation.
Increasing the bias voltage and/or the HF excitation amplitude increases the SPL accordingly. In particular, when the bias voltage increases, the elastic stresses in the membrane decrease (owing to the Maxwell stress), and the SPL peaks shift toward lower frequencies.
To assess the influence of the housing on the acoustic response of the interface, we compared the SPL produced by the assembled device with that of the bare COP-DEA installed outside its enclosure (figure 9 (left)). The SPL response of the free COP-DEA (without housing) differs from that of the assembled system in that: (a) the assembled interface features a higher SPL at low frequencies, and a similar (or lower) level in the HF range; (b) the response of the free DEA shows a clear, prominent peak at the (cut-in) natural frequency of the first structural mode, as opposed to a flatter resonance response for the assembled unit; (c) the abscissa (natural frequency) of the first resonance peak is lower for the free DEA.
These differences are due to the geometry of the housing, which creates a nearly closed volume of air underneath the DE, between the DEA bottom face and the internal walls of the enclosure. The housing behaves as an enclosure for the DEA membrane, preventing acoustic short circuits between the top and bottom faces of the vibrating membrane, hence increasing the SPL generated in the LF range. The increase in the natural frequency of the structural mode is consistent with additional stiffness contributions due to the compressibility of the enclosed air volume (see supplementary material). The loss of SPL at higher frequencies is due to shadowing effects caused by the top wall of the housing.
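The stiffening effect of the enclosed air can be estimated, to first order, with the textbook adiabatic air-spring expression k_air = ρc²S²/V, which treats the vibrating membrane as a rigid piston of effective area S closing a volume V. This lumped model, the air properties and the example dimensions below are illustrative assumptions (the actual housing volume is not reported here); the sketch only shows the scaling, i.e. that a smaller enclosed volume adds more stiffness and therefore tends to shift resonances upwards.

```python
import math


def air_spring_stiffness(area, volume, rho=1.2, c=343.0):
    """Adiabatic stiffness [N/m] of air trapped behind a rigid piston.

    k_air = rho * c**2 * area**2 / volume (lumped low-frequency model).
    area [m^2], volume [m^3]; rho [kg/m^3] and c [m/s] are assumed air properties.
    """
    return rho * c**2 * area**2 / volume


# Hypothetical example: piston area equal to the annulus between the 15 mm disc
# and the 30 mm outer diameter, and two assumed enclosed volumes (40 and 20 cm^3).
S = math.pi * (0.015**2 - 0.0075**2)
for V in (40e-6, 20e-6):
    print(f"V = {V * 1e6:.0f} cm^3 -> k_air = {air_spring_stiffness(S, V):.0f} N/m")
```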
Finally, a study of the device directivity was carried out. We measured the speaker's off-axis response by rotating the device with respect to the microphone (figure 9 (right)). The SPL decreases with the angle, showing a reduction of up to 10 dB at 90°. This result is in accordance with simulation studies presented in [28], which showed that COP-DEA speakers have a directional response, characterised by shadow regions at 90° radiation angles.

Multi-mode multi-frequency response
To show that the DE interface can concurrently produce multiple outputs (namely, LF tactile stimulation and HF acoustic outputs), we drove the device with different voltage inputs in the form given by equation (2), characterised by different combinations of LF and HF components. In figure 10 we show the device output in response to three different excitation waveforms: (a) a waveform consisting solely of an LF square wave, meant to provide users with a variable-frequency vibrotactile stimulation (top row); (b) a waveform that only includes HF harmonics, resulting in the playback of a jingle with no tactile stimulation (bottom row); and (c) a composite LF + HF signal, providing the user with a simple audio-tactile multi-sensory stimulation (central row). The tactile LF excitation waveform (top row) consists of a square wave signal ū with swept frequency (rising from 1 to 90 Hz and then decreasing back to 1 Hz), oscillating between U_min = 200 V and U_max = 2100 V (with no superposed HF signal, U_a = 0). The purely HF excitation signal (bottom row) is a waveform ŭ with bias voltage U_max = U_min = 2300 V and amplitude U_a = 300 V, rendering a soundtrack. The composite audio-tactile excitation waveform (middle row) is an LF square wave ū (fundamental frequency 2 Hz, duty cycle 50%, U_max = 2500 V, U_min = 700 V) with a superposed 850 Hz HF pure tone ŭ of slowly varying amplitude, which changes synchronously with ū: U_a = 220 V when ū is maximum, and U_a = 66 V when ū is minimum. The resulting output sound is a high-pitched siren alarm whose beat is synchronised with the tactile stimulation.
In addition to those discussed in figure 10, other multichromatic excitation waveforms were coded on the DE button prototype, as shown in the supplementary video.
We performed blocking tests, in which we measured the force produced by the interface with a load cell mounted against the device end-effector with a pre-load of 1 N (in the same fashion as figure 7, right), and free-displacement tests, in which we measured the end-effector displacement with a laser sensor (figure 7, left). The plots in figure 10 show the time series of the excitation voltage, the force produced in blocking conditions, and the free displacement and SPL generated in free-displacement conditions.
The interface consistently produces forces on the order of 0.2-0.4 N, free displacements of 0.1-0.2 mm, and acoustic pressure outputs of up to 1 Pa amplitude (roughly 90 dB). Note that in the purely tactile working mode (no HF component, U_a = 0), an acoustic output is also produced in addition to the LF force and displacement outputs.
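The "1 Pa amplitude (roughly 90 dB)" equivalence can be checked with the standard SPL definition, L = 20·log10(p_rms / 20 µPa). The sketch below assumes a pure sinusoid, so that p_rms = p_peak/√2; the composite and square-wave signals in figure 10 have different crest factors, so this is only an approximate conversion.

```python
import math

P_REF = 20e-6  # reference sound pressure in air: 20 µPa

def spl_from_amplitude(p_peak):
    """SPL in dB for a sinusoidal pressure with peak amplitude p_peak (Pa)."""
    p_rms = p_peak / math.sqrt(2)  # RMS value of a pure sinusoid
    return 20 * math.log10(p_rms / P_REF)

print(round(spl_from_amplitude(1.0), 1))  # 91.0
print(round(spl_from_amplitude(0.5), 1))  # 84.9
```

Using the peak value directly instead of the RMS value would give about 94 dB for 1 Pa, so the quoted "roughly 90 dB" is consistent with an RMS-based level.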
This acoustic output is less intense than in the other two cases (where sound is generated by a HF small-amplitude excitation), and it is ascribable to higher-order harmonics in the driving signal component ū, as well as to structural vibrations of the moving parts (e.g. the metallic NBS) caused by the steep rising/falling edges of the square excitation waveform. In the purely acoustic mode (bottom row), only a very small vibrational feedback (displacement and force variation) is measurable, as the excitation frequency is well above the resonance frequency of the pumping motion. Such residual axial movements/forces fall below the threshold of human tactile sensitivity [29]. The combined audio-tactile excitation scenario (middle row) consistently produces LF force variations close to 0.5 N and acoustic pressure with peak amplitude over 0.5 Pa.

Sensing performance
We assessed the sensing capability of the user interface (DE sensing layer + sensing electronics/logics) by evaluating its ability to detect and count sequences of user touches of different intensities and frequencies. Figure 11 shows the sensor output signal (which has the dimension of a frequency and is inversely proportional to the charging-discharging time of the sensing circuit in figure 3) in response to sequences of touches applied to the unit by two different users (left and right columns). Each plot reports two sequences of deformations applied by the same user (red and blue lines). The sensing logics were executed on the microprocessor using three different threshold values in parallel, which were used to count the number of touches. The logics count one touch each time the sensing signal crosses the threshold while decreasing. The results show that the sensing layer and electronics produce a smooth output signal, which in turn allows an accurate count of the number of threshold crossings. Different thresholds consistently lead to the detection of different numbers of touches: the lowest threshold value allows detecting touches of small intensity, but it does not allow distinguishing between consecutive touches unless the user completely releases the end-effector (letting it return to the initial position) between them. The highest threshold, in turn, allows recognising consecutive touches even when the user does not release the end-effector completely, provided that each touch has sufficient intensity. Wrong counts (in which an incorrect number of touches was estimated, even though the maximum/minimum intensity of the touch lay within the selected thresholds) occurred only in a few cases, in which the peak values of the sensing signal fell close to or at the threshold values.
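The counting rule described above (one touch per downward threshold crossing, several thresholds evaluated in parallel) can be sketched as follows. The function names and the sample values are hypothetical and synthetic, not measured data; the only assumption about the sensor is the one stated in the text, namely that a touch makes the frequency-like output decrease.

```python
def count_falling_crossings(signal, threshold):
    """Count touches as downward crossings of `threshold`.

    A touch increases the capacitance of the sensing layer, which lowers
    the frequency-like sensor output, so each touch appears as a dip.
    """
    count = 0
    for prev, curr in zip(signal, signal[1:]):
        if prev >= threshold > curr:  # crossed while decreasing
            count += 1
    return count

def count_all(signal, thresholds):
    """Evaluate several thresholds in parallel on the same signal."""
    return {th: count_falling_crossings(signal, th) for th in thresholds}

# Synthetic signal (arbitrary units): two deep consecutive touches with a
# partial release in between, then one light touch.
sig = [100, 88, 70, 85, 70, 95, 100, 87, 100]
print(count_all(sig, [90, 80, 60]))  # {90: 2, 80: 2, 60: 0}
```

The example reproduces the trade-off discussed above: the threshold closest to the untouched baseline (90) catches the light touch but merges the two consecutive touches, the deeper one (80) separates them but misses the light touch, and neither recovers the true count of three.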
For the tests described in sections 4.4 and 4.5, we set the threshold to the intermediate value among those shown in figure 11, and set a second threshold close to it to prevent wrong counts.
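The section does not detail how the second threshold is used. One plausible reading, sketched here as an assumption, is a hysteresis band (Schmitt-trigger style): a touch is counted when the signal drops below a "press" threshold, and the detector re-arms only after the signal rises back above a slightly higher "release" threshold. The function name, parameter names, and sample values are hypothetical.

```python
def count_touches_hysteresis(signal, press_th, release_th):
    """Count touches using two thresholds (press_th < release_th).

    Hypothetical hysteresis scheme: a touch is registered when the signal
    drops below press_th; the detector re-arms only once the signal rises
    above release_th, so noise around press_th cannot add extra counts.
    """
    if press_th >= release_th:
        raise ValueError("press_th must be below release_th")
    count, armed = 0, True
    for x in signal:
        if armed and x < press_th:
            count += 1
            armed = False
        elif not armed and x > release_th:
            armed = True
    return count

# Synthetic signal with noise around 80 during the first touch.
noisy = [100, 79, 81, 79, 81, 70, 90, 100, 75, 100]
print(count_touches_hysteresis(noisy, press_th=80, release_th=85))  # 2
```

On the same synthetic signal, a single falling-crossing threshold at 80 would count four touches instead of two, which illustrates the kind of wrong count that a second, nearby threshold can suppress.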

User tests
We combined the sensing and multi-sensory actuation capabilities by programming the DE interface so that it produces different outputs (namely, the three waveforms discussed in figure 10) in response to different user inputs (numbers of touches), which are detected using the logics discussed in section 4.4. Each output is triggered by pushing the interface end-effector a different number of times: the sensing logic recognises the number of touches applied by the user and triggers the corresponding feedback routine (see supplementary video).
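A minimal sketch of how a touch count could be mapped to one of the three routines is given below. The section does not state which count triggers which routine, nor how the end of a touch sequence is detected; the mapping, the 1 s inactivity timeout, and all names are hypothetical.

```python
from enum import Enum

class Routine(Enum):
    TACTILE = "LF swept square wave (figure 10, top)"
    AUDIO_TACTILE = "LF square wave + 850 Hz tone (figure 10, middle)"
    ACOUSTIC = "HF jingle (figure 10, bottom)"

# Hypothetical mapping from number of touches to feedback routine.
ROUTINE_BY_COUNT = {1: Routine.TACTILE, 2: Routine.AUDIO_TACTILE, 3: Routine.ACOUSTIC}

def select_routine(touch_times, now, timeout=1.0):
    """Return the routine to play once the touch sequence has ended.

    touch_times: timestamps (s) of the touches detected in the current sequence.
    The sequence is assumed finished when no touch arrived for `timeout` s.
    Returns None while the sequence is still open or the count is unmapped.
    """
    if not touch_times or now - touch_times[-1] < timeout:
        return None
    return ROUTINE_BY_COUNT.get(len(touch_times))
```

For example, two touches at 0.1 s and 0.5 s return `None` at `now=1.0` (the user may still be pressing) and `Routine.AUDIO_TACTILE` at `now=2.0`.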
To show the performance of the resulting integrated demonstrator, two different types of user tests (with 14 volunteer subjects) are carried out.
In a first test, the subjects are asked to push the button a prescribed number of times, so as to trigger a response, and to keep their index finger on the interface end-effector during the subsequent execution of the feedback routine. For each of the routines described in figure 10, users are then asked to rate the intensity of the tactile and acoustic feedback of the device on a scale from 0 (weak) to 10 (strong). Sensing was tested automatically by recording the actual number of touches detected by the interface in response to the number of touches intended by the user (a success rate of 100% was achieved).
Figure 12 shows boxplots of the intensity ratings perceived by users, with the median, the 25th and 75th percentiles, and the most extreme data points (determined using the iqr command in Matlab). The ratings (figure 12) demonstrate that the interface produces an appreciable level of tactile and acoustic feedback. Excitation with a LF variable-frequency signal with ŭ = 0 (same waveform as in figure 10, top) was also reported to produce a noticeable level of acoustic feedback (left column in figure 12). As already mentioned with reference to figure 10, this is due to high-amplitude higher-order harmonics in the excitation (which consists of a sequence of square pulses) and to vibrations of the spring and the housing. In the purely acoustic working mode (where only the HF excitation ŭ is applied), users consistently reported no or very low levels of haptic feedback, the latter possibly due to an illusory perception of motion triggered by the acoustic feedback (right column in figure 12). In the case of composite audio-tactile actuation (with LF tactile stimulation at 2 Hz and HF sound feedback at 850 Hz), both stimuli (tactile and acoustic) were reported to be clearly perceptible (central column in figure 12). The reported vibrotactile intensity was higher in the first case (tactile stimulation alone) because of the higher frequencies applied (up to 90 Hz), to which human tactile perception is more sensitive.
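The boxplot statistics can be reproduced with a short script. This sketch uses Python's `statistics.quantiles` with the `"inclusive"` method and the common 1.5 × IQR whisker rule; Matlab's percentile interpolation may differ slightly, so it approximates rather than replicates the authors' processing. The ratings in the usage example are hypothetical, not the study data.

```python
import statistics

def box_stats(ratings, whisker=1.5):
    """Quartiles, IQR and whisker ends for a boxplot.

    Whiskers reach the most extreme ratings within `whisker` * IQR of the
    box; ratings beyond those fences are returned as outliers.
    """
    q1, med, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [r for r in ratings if lo_fence <= r <= hi_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(r for r in ratings if r < lo_fence or r > hi_fence),
    }

# Hypothetical 0-10 ratings from ten users.
print(box_stats([5, 6, 6, 7, 7, 7, 8, 8, 9, 0]))
```

Here the single rating of 0 lies below the lower fence (3.375) and is reported as an outlier, so the lower whisker stops at 5.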
A second set of tests was performed with the aim of evaluating how well a user who interacts with different replicas of the DE user interface can localise the output (vibrotactile, acoustic, or combined) and, hence, of highlighting possible perceptual advantages of a co-located audio-tactile interface like the one studied in this paper.
For this purpose, two identical prototypes (DEA + NBS + housing) are built. The two units are placed on the same workbench at variable relative distances, and users are asked to place their index fingers on the end-effectors of the two interfaces (figure 13). The two devices are then excited with different (but synchronous) combinations of inputs, built starting from the composite voltage excitation waveform presented in figure 10, middle. We split the signal into its two components (LF and HF), which can be used to excite the two units separately. Each device can be excited with: a HF signal (resulting in an acoustic output); a LF signal (resulting in a tactile stimulation); a combined multi-frequency signal (audio-tactile feedback), as in figure 10, middle; or no signal. A correction factor is applied to the amplitude of the HF signal in the different scenarios, so that the measured sound output has the same intensity whether the HF signal is executed alone (acoustic output only) or in combination with a LF signal (audio-tactile output).
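The procedure used to obtain the HF correction factor is not described here. A simple sketch, under the assumption that the radiated pressure is proportional to the HF amplitude U_a at a given bias, converts a measured level mismatch into an amplitude gain: expanding (ū + U_a sin ωt)² shows that the electrostatic pressure term at ω is 2ūU_a sin ωt, linear in U_a, which motivates the assumption. The function name and the example levels are hypothetical.

```python
def hf_amplitude_correction(spl_alone_db, spl_combined_db):
    """Gain to apply to U_a in the combined signal so that its SPL matches
    the SPL of the HF-only signal.

    Assumes sound pressure proportional to U_a (small-signal, fixed bias),
    so a level difference of dL dB maps to an amplitude ratio 10**(dL / 20).
    """
    delta_db = spl_alone_db - spl_combined_db
    return 10 ** (delta_db / 20)
```

For instance, if the HF-only case measured 90 dB and the combined case 84 dB, the HF amplitude in the combined signal would be scaled by about 2.0; in practice the factor would be refined by re-measuring, since the bias voltage differs between the two cases.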
Users are presented with eight different repetitions (some of which are executed twice), in which the two devices are excited with different combinations of inputs, and are asked each time to recognise the type of output produced by each unit (purely haptic, purely acoustic, audio-tactile, no output). The eight permutations of applied inputs are shown in figure 14. The first repetition (row 1, column 1) corresponds to a case in which the device on the right produces no output and the device on the left produces a purely acoustic output; in the second repetition (row 1, column 2), the left DEA produces a purely acoustic output while the right DEA produces a purely tactile output, etc. All eight permutations are repeated (in random order) at three distances between the devices (10, 20 and 30 cm), the minimum distance (10 cm) corresponding to a scenario in which the sides of the two devices nearly touch one another.
The test results are processed and reported in figure 15 in the form of confusion matrices, which count the number of times a user perceived a certain type of stimulus (haptic, acoustic, no stimulus) on the correct side (left, right); different matrices correspond to tests with different distances between the two DEA button units. Entries in each matrix are cumulative values for the left and right units. Rows represent the actual stimulus supplied to the user (e.g. non_l means that no feedback was applied on the left device, haptic_r means that the stimulus applied on the right unit had a haptic component, etc.); columns represent the feedback reported by the user (e.g. non_l means that the user perceived no feedback on the left finger, haptic_r means that the user perceived a feedback containing a tactile component on the right side, etc.). The diagonal elements correspond to correct associations between the supplied stimulus and the stimulus reported by the user. Different signal combinations (among those in figure 14) produce different numbers of entries in the matrix because of the composite nature of the signals (which, in general, contain both tactile and acoustic components). Details on how the matrices were populated from the users' answers are reported in the supplementary materials. The percentages associated with the rows express how often a given stimulus was correctly classified by the users. Conversely, the percentages associated with the columns express how often a given answer provided by users corresponded to the actual supplied stimulus.
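The row and column percentages described above correspond to per-class recall (diagonal over row sum) and precision (diagonal over column sum). The sketch below computes them for any square confusion matrix laid out with supplied stimuli on the rows and reported stimuli on the columns; the 2 × 2 matrix in the usage note is hypothetical, not taken from figure 15.

```python
def row_col_percentages(matrix):
    """Per-class percentages for a square confusion matrix.

    Rows = supplied stimulus, columns = stimulus reported by the user.
    Row percentage:    diagonal / row sum    (stimulus correctly classified).
    Column percentage: diagonal / column sum (answer matched the stimulus).
    Also returns the overall share of diagonal (correct) entries.
    """
    n = len(matrix)
    rows = []
    for i in range(n):
        row_sum = sum(matrix[i])
        rows.append(100 * matrix[i][i] / row_sum if row_sum else float("nan"))
    cols = []
    for j in range(n):
        col_sum = sum(matrix[i][j] for i in range(n))
        cols.append(100 * matrix[j][j] / col_sum if col_sum else float("nan"))
    total = sum(sum(r) for r in matrix)
    overall = 100 * sum(matrix[k][k] for k in range(n)) / total
    return rows, cols, overall
```

For the hypothetical matrix `[[8, 2], [1, 9]]`, the row percentages are 80% and 90%, the column percentages are about 88.9% and 81.8%, and 85% of all entries lie on the diagonal.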
Despite the complexity of the task (users were asked to identify the stimuli received from two units at the same time), the average percentage of correct assignments (diagonal elements) is considerably higher than that of misinterpreted inputs. As expected, this holds especially true when the two devices are located farther apart, which makes the localisation task easier for the user. The most common confusion scenarios are related to the localisation of a sound source (left vs right), with results that consistently improve as the distance between the units is increased. In particular, in the scenarios where a vibrotactile feedback and an acoustic feedback are applied on different units (e.g. the top-right case in figure 14), users more often misassign the location of the acoustic stimulus (i.e. they locate the sound source on the same side as the tactile stimulation). This is a natural result of cross-modal associations between auditory and tactile perception, reported in the past by specialised perception studies [29,30].
Despite the above-mentioned effects, the presented results show that users are generally able to correctly localise different (or composite) stimuli in space. This offers an even stronger motivation in favour of the co-located feedback generation approach pursued by the presented DEA button interface. Being able to provide co-located vibrotactile and acoustic feedbacks (with both stimuli generated at the same spatial location), the proposed DE user interface might find applications in virtual reality (VR) for multi-sensory rendering of localised scenes.

Figure 15. Confusion matrices for the user tests with two replicas of the DEA button, located at a distance of 10 cm (a), 20 cm (b) and 30 cm (c) from one another. The matrix elements correspond to different types of stimuli (haptic, acoustic, no stimulus); elements on the rows stand for the actual feedbacks generated by the units, whereas elements on the columns represent stimuli perceived by the users. Elements on the diagonal denote a correct association between supplied stimulus and user's answer. Entries in the matrix correspond to the number of times a certain combination was reported by users. Percentages on the rows quantify how often a given stimulus was correctly identified by the user. Percentages on the columns indicate how often a certain feedback category reported by users matched the actual applied stimulus.

Conclusions
In this paper, we designed, built and characterised a multimode DE user interface which can recognise push inputs from users and produce tactile, acoustic, or combined audio-tactile stimuli. The core element of the interface is a multi-layer DE membrane, which comprises a set of layers devoted to actuation (tactile and audio feedbacks), and a layer that continuously monitors the capacitance to detect users' touches. Building upon a recently proposed multimode operating principle, tactile stimulations are generated by inducing a LF linear pumping motion of an end-effector connected to the pre-loaded DE membrane, whereas acoustic outputs are generated by exploiting HF higher-order structural vibration modes of the DE membrane in the kilohertz range. Combined audio-tactile stimulations can be produced by concurrently exciting different vibration modes of the DE membrane (LF pumping motion and HF structural modes) via a multichromatic voltage excitation.
Compared to our previous works, where we provided a general proof-of-concept of multi-modality in DEAs through the exploitation of different vibration modes, here we specifically set the focus on user-interface applications. In particular, we proposed a DEA embodiment that allows performing advanced sensing tasks (through a dedicated sensing layer working at low voltage), we improved the system acoustic design, and we validated its ability to provide audio-tactile feedbacks through interaction tests with users.
The developed DE interface has dimensions in the centimetre scale and is installed in a holding structure that serves the dual purpose of guaranteeing safe interaction with users and improving the midrange acoustic performance. We equipped the interface with custom sensing electronics, and embedded the sensing and driving logics onto a compact microcontroller.
We characterised the prototype performance in actuation tasks at LF, measuring blocking forces on the order of 10⁻¹ N, and its acoustic response to HF excitations, measuring SPLs over 90 dB at a distance of 0.3 m from the device. We then characterised the performance of the interface as a sensor, in combination with custom sensing logics and electronics, and showed that the device can detect and correctly count consecutive user touches (causing capacitance variations on the order of 100 pF) at a frequency higher than 1 Hz.
To showcase the ability of the device to concurrently perform sensing and multi-sensory stimulation tasks, we programmed it to execute different working routines (resulting in purely tactile stimulation, purely acoustic feedbacks, or a combination of the two) in response to different user inputs (number of touches applied by a user). We evaluated the subjective perception of such feedbacks via user tests, in which a set of users rated the intensity of the stimuli. In addition, we performed a set of tests aimed at highlighting the added value of producing acoustic and tactile stimulations in a co-located fashion (as achieved by the proposed DEA interface). In such tests, users were asked to interact concurrently with two replicas of the user interface and to recognise the different stimuli (tactile or acoustic) provided by each of the two devices. In the case of multiple combined stimulations coming from different devices, users were able to discern the different sources of the stimuli in more than 80% of the cases (even with the two devices located side by side, providing different stimuli at the same time), hence strengthening the motivation for a co-located design, in which audio and tactile stimulations are generated by a single active DE unit.
In future works, more advanced designs and applications of the proposed audio-tactile interface will be explored. Advanced designs might aim at developing more compact DE units, in which the structural elements, the biasing mechanism, and the acoustic enclosure are optimised and downscaled, or at alternative designs based on arrays of multiple DE elements. Possible applications include portable audio-tactile communicators that might be worn by users. This might be achieved by resorting to fully polymeric designs that make use of flexible structural and preloading mechanisms for the DEA, and compact flexible power/sensing electronics that would allow, among other things, integration onto garments.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.