An approach for automatic construction of the wavelet-domain de-noising procedure for THz pulsed spectroscopy signal processing

De-noising of terahertz pulsed spectroscopy (TPS) signals is an essential problem, since noise in the TPS data makes correct reconstruction of the spectral dielectric properties and internal structure of a sample challenging. It is especially important for the spectral regions where detector sensitivity is typically low. Many effective techniques for 1D and 2D signal de-noising based on wavelet-domain signal processing have been developed in recent years. The present work demonstrates that pulsed spectroscopy signals can be de-noised effectively using the Fast Wavelet Transform (FWT) algorithm. The results of optimal wavelet basis selection and of adaptive wavelet-domain filter selection are reported. The performance of the implemented wavelet-domain de-noising algorithm is also discussed. A technique for automatic construction of the wavelet-domain de-noising procedure is proposed.


Introduction and background
Terahertz (THz) pulsed spectroscopy (TPS) [1] is an essential and effective tool for measuring the THz dielectric properties of a sample, including the spectral dependencies of the complex dielectric permittivity of a homogeneous sample [2]-[4] and the permittivity and conductivity profiles characterizing the internal structure of an object [5]-[7]. The latter technique, first described in [7], is called THz tomography or T-ray tomography.
In TPS the sample of interest is irradiated with a short pulse of THz radiation whose form is close to a single oscillation of the electric field. After transmission through the sample or reflection from the sample surface, the electric field is detected with very high time resolution: it reaches for the TPS system utilized in the present work. Then the fast Fourier transform (FFT) algorithm is applied to analyze the spectra of the detected THz waveforms and to reconstruct the THz spectral dielectric properties of the sample.
Regardless of the type of data we want to extract from the detected THz waveforms, the accuracy of the result is strongly correlated with the signal-to-noise ratio. The time-domain signal-to-noise ratio is constant for all time-domain delays , but the signal-to-noise ratio in the Fourier domain depends significantly on the frequency of the signal harmonics [8]. The power of the detected signal is high in the medium frequency range (for our time-domain system (TDS) [5] it corresponds to the range from to ), and the signal-to-noise ratio there is rather high. Depending on the THz pulse source and detector types, and on the number of averaged waveforms, the signal-to-noise ratio can be . In contrast, for the low- and high-frequency ranges (for the utilized TDS system these ranges are and , respectively) the signal-to-noise ratio is smaller than . Therefore the results of the spectroscopic analysis of the sample have high accuracy in the region of the spectrum corresponding to to and lower accuracy in the and regions.
The purpose of the present work is to develop an effective algorithm for THz waveform de-noising, which can help to study the sample properties with maximal accuracy over the entire range of TPS spectral sensitivity. Many recent works are dedicated to the wavelet-domain processing of 1D and 2D data [9], [10] and to the wavelet de-noising of TPS signals [11]-[13]. A peculiarity of the wavelet transform is the existence of a large number of wavelet bases and wavelet spectrum processing techniques that can be used for noise suppression. The selection of the optimal wavelet basis and the selection of the most effective noise suppression technique are important problems that need to be solved during the development of any de-noising algorithm. An approach to the solution of these problems is described in the present work.
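The frequency dependence of the Fourier-domain signal-to-noise ratio described above can be illustrated with a minimal sketch. It assumes a synthetic single-cycle pulse (the derivative of a Gaussian) and white Gaussian noise; the sampling step `dt_ps`, the pulse width `tau`, the number of samples, and the noise level are hypothetical values chosen for illustration and are not the parameters of the TDS system used in this work.

```python
import numpy as np

def fourier_snr(signal, noise_sigma):
    """Per-frequency power SNR of a real signal in white noise of std noise_sigma."""
    n = signal.size
    spectrum_power = np.abs(np.fft.rfft(signal)) ** 2
    # For white noise the expected power in every FFT bin is n * sigma^2,
    # i.e. the noise floor is flat in frequency.
    noise_power = n * noise_sigma ** 2
    return spectrum_power / noise_power

# Hypothetical sampling and pulse parameters (assumptions, not from the paper).
n_samples = 1024
dt_ps = 0.05                                   # time step in picoseconds
tau = 0.3                                      # pulse width in picoseconds
t = (np.arange(n_samples) - n_samples // 2) * dt_ps
pulse = -t / tau * np.exp(-t**2 / (2 * tau**2))  # single-cycle test pulse

freqs_thz = np.fft.rfftfreq(n_samples, d=dt_ps)  # cycles per ps, i.e. THz
snr = fourier_snr(pulse, noise_sigma=0.01)
peak_bin = int(np.argmax(snr))
print(f"SNR peaks near {freqs_thz[peak_bin]:.2f} THz and falls off on both sides")
```

Because the noise floor is flat while the pulse spectrum is band-limited, the SNR in this sketch is highest in a middle band and drops toward both the low- and high-frequency ends, which is the behaviour the text attributes to TPS data.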
Wavelet analysis is a generalization of spectral analysis techniques [9]. The most common technique for spectral analysis of signals is Fourier analysis, which includes methods for implementing the direct and inverse Fourier transforms and for signal processing. The Fourier transform can be described as a projection of a signal onto a basis of periodic functions with different frequencies and infinite width:
The main drawback of Fourier analysis is poor time-domain localization. For instance, if we consider that the useful part of the signal is present only in the time interval , and the whole time interval of analysis is , then the Fourier-domain signal-to-noise ratio can be described as follows: Here we assume white noise, so the noise power in the frequency domain is constant and proportional to . Obviously, the more local the information we deal with, the lower the signal-to-noise ratio we obtain.
To overcome this drawback, we could use the windowed Fourier transform, but it is redundant and difficult to invert. A better solution is to use wavelets. In contrast to Fourier analysis, wavelet-domain signal processing provides time-domain resolution, since the wavelet transform is equivalent to the projection of a signal onto a basis of local functions (wavelets).
The discrete wavelet transform (DWT) can be described as the following series representation of a function [9]: where the coefficients of the wavelet spectrum are defined by the equation: The form of the wavelet kernels can vary depending on the type of basis we use for signal analysis. In the DWT the number of kernels is strongly restricted. All of the kernels are orthogonal to each other: where is the Kronecker symbol.
Also, the kernels should satisfy the following normalization condition: Note that this condition shows that the projection of a function onto the wavelet kernels is equivalent to band-pass filtering of the signal in the Fourier domain, with the frequency of maximal filter transparency equal to . For the DWT and the fast wavelet transform (FWT) algorithms the relationship between scales is given by the following equation: . A solution for the discrete step of the wavelets can be derived from the basis orthogonality condition. More details about the DWT and the FWT implementation can be found in [9].
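The orthogonality and normalization properties discussed above can be checked numerically with a minimal FWT sketch. It uses the Haar basis and dyadic scales purely for simplicity; Haar is an assumption made for illustration and is not one of the bases selected in this work, and the function names `haar_fwt` and `haar_ifwt` are hypothetical.

```python
import numpy as np

def haar_fwt(signal, levels):
    """Orthonormal Haar FWT; returns [approx, detail_coarsest, ..., detail_finest].

    The signal length must be divisible by 2**levels.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # band-pass (detail) branch
        approx = (even + odd) / np.sqrt(2.0)         # low-pass (approximation) branch
    return [approx] + details[::-1]

def haar_ifwt(coeffs):
    """Inverse of haar_fwt: rebuild the signal from coarsest to finest level."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
coeffs = haar_fwt(x, levels=3)
# Orthonormal basis: perfect reconstruction and conservation of signal energy.
print(np.allclose(haar_ifwt(coeffs), x))
print(np.isclose(sum(np.sum(c**2) for c in coeffs), np.sum(x**2)))
```

Each level halves the number of coefficients, so the whole transform costs a number of operations proportional to the signal length, which is what makes the FWT attractive for routine processing of waveforms.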
The FWT can use only a limited set of wavelet bases because of specific restrictions [9], [10]. The wavelet-domain signal-to-noise ratio is proportional to the following constant:

Here is defined in the same way as in equation (2), and is the size of the wavelet kernel. We use the same assumption of white noise. Comparing equations (2) and (7), and noting that the width of the wavelet kernel is smaller than the whole time interval of analysis , we can conclude that the wavelet-domain signal-to-noise ratio is higher than the Fourier-domain signal-to-noise ratio . This is the source of the advantage of wavelet-domain de-noising over de-noising in the Fourier domain.
The next part of the paper presents the results of implementing the wavelet-domain de-noising procedure for typical data samples acquired with TDS systems.

Selection of the optimal wavelet basis for TDS system signal representation
The problem of optimal wavelet basis selection is an essential one, since the use of a wavelet basis that is incompatible with the TDS signals can cause non-linear signal distortions.
An approach to optimal wavelet basis selection for TDS signal processing has been presented in [11]. The proposed technique utilizes several criteria to evaluate the compatibility of the signal and the wavelet kernels , including: The results of calculating the listed criteria for the TDS system signal, presented in [11], show that the wavelet basis 'bior3.3' is optimal for the analysis of the considered TDS signals. In the present work we consider a TDS system based on an LT-GaAs photoconductive antenna source of THz pulses and a ZnTe electro-optical detector of the THz field. Our system utilizes ultrashort optical pulses with duration. Obviously, the optimal wavelet basis for the signals of our TDS system can differ from the one described in [11].
We use the following generalized criterion to select several wavelet bases that are best suited to processing the signals of the present TDS system: where , , and are coefficients regulating the impact of each described criterion, and ( ), ( ), and ( ) are the normalized criteria for the target wavelet basis: Here we use the following relation between the criterion impacts: . After calculating the generalized criterion (8) for different wavelet bases, we selected the following set of optimal wavelet bases for our TDS system.
The test signal used for the optimal wavelet basis selection is presented in figure 1. All the bases of the described set show high compatibility with the test signal and can provide high de-noising accuracy. We compare this set of wavelets again at the step of the de-noising technique implementation.
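The idea of combining several normalized criteria into one score per wavelet basis can be sketched as follows. This is only an illustration under stated assumptions: the combination is taken to be a weighted sum, the normalization is taken to be division by the maximum over all candidate bases, larger values are taken to mean better compatibility, and the basis names and criterion values are made up. None of these choices is confirmed by the text.

```python
import numpy as np

def generalized_criterion(criteria, weights):
    """Combine raw criteria into one score per basis.

    criteria: dict mapping a basis name to a sequence of raw criterion values
              (assumed: larger is better).
    weights:  impact coefficient for each criterion (assumed to sum to 1).
    """
    names = list(criteria)
    raw = np.array([criteria[name] for name in names], dtype=float)
    # Assumption: each criterion is normalized by its maximum over all bases.
    normalized = raw / raw.max(axis=0)
    scores = normalized @ np.asarray(weights, dtype=float)
    return dict(zip(names, scores))

# Hypothetical criterion values for three hypothetical bases.
example = {
    "basis_A": [1.0, 0.5, 0.8],
    "basis_B": [0.5, 1.0, 0.4],
    "basis_C": [0.2, 0.2, 0.2],
}
scores = generalized_criterion(example, weights=[1 / 3, 1 / 3, 1 / 3])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Ranking the bases by such a score and keeping the top few gives a shortlist rather than a single winner, which matches the way a set of bases is carried forward to the de-noising comparison.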

Selection of the wavelet-domain de-noising method
Many wavelet-domain de-noising techniques exist. The wavelet de-noising procedure is associated with thresholding of the wavelet-domain spectrum coefficients representing a signal : where is a certain threshold value. Several common techniques for calculating are described in [9], [10]. In the present paper we consider and compare the following thresholding procedures [9]:
• Minimax thresholding ('MiniMax');
• Heuristic threshold ('Heursure');
• √(2 ln N) threshold ('SqTwoLog'), where N is the length of the corresponding wavelet level array.
Since the extraction of useful information from the registered signal (determination of the sample spectral properties and material parameters) involves processing the TDS signal in the Fourier domain, the accuracy of spectral characteristic reconstruction depends directly on the accuracy with which a single TDS signal is determined in the Fourier domain. The wavelet-domain de-noising technique should suppress the noise in the time domain and, correspondingly, in the Fourier domain, but it should not introduce non-linear distortions into the Fourier-domain TDS signal. The criterion for comparing filtering procedures should therefore be based on Fourier-domain data analysis.
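The thresholding step can be sketched in a few lines. The example below computes the 'SqTwoLog' (universal) threshold for one array of detail coefficients and applies it with the two common rules, hard and soft thresholding. Which rule is used in this work is not stated here, so both are shown; the median-based noise estimate and the function names are assumptions made for illustration.

```python
import numpy as np

def universal_threshold(detail, sigma=None):
    """'SqTwoLog' threshold sigma * sqrt(2 ln N) for one wavelet level array."""
    detail = np.asarray(detail, dtype=float)
    if sigma is None:
        # Assumption: robust noise estimate from the median absolute deviation.
        sigma = np.median(np.abs(detail)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(detail.size))

def hard_threshold(coeffs, thr):
    """Keep coefficients whose magnitude exceeds thr, zero the rest."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) > thr, coeffs, 0.0)

def soft_threshold(coeffs, thr):
    """Zero small coefficients and shrink the remaining ones toward zero by thr."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)

detail = np.array([0.1, -0.2, 3.0, -4.0])
print(hard_threshold(detail, 1.0))
print(soft_threshold(detail, 1.0))
```

Hard thresholding preserves the amplitude of the retained coefficients, while soft thresholding gives a smoother result at the cost of a systematic amplitude reduction; either effect may matter when the de-noised waveform is later analyzed in the Fourier domain.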
To compare the thresholding techniques, white Gaussian noise with a certain standard deviation was added to the test signal . The noisy signal and its spectrum are presented in figure 2. One can notice that the water vapor absorption lines located between the frequencies and (figure 1) cannot be detected in the spectrum of the noisy signal (figure 2). We use several criteria to evaluate the quality of signal reconstruction and to compare the results of de-noising with different wavelet bases and different thresholding techniques:
• The standard deviation criterion: where is the frequency in the Fourier domain, is the Fourier spectrum of the initial waveform before noise addition, and is the Fourier spectrum of the reconstructed waveform.
• The line minimum location criterion: where is the location of the water vapor absorption line in the Fourier spectrum of the initial signal, and is the location of this line in the Fourier spectrum of the de-noised signal.
• The generalized optimal filtration criterion: where and are coefficients regulating the impact of criteria ( ) and ( ), respectively. We consider equal coefficients here: .
The listed criteria have been calculated for the listed threshold types and for several optimal (table 1) and non-optimal wavelet bases. The results of the calculations are presented in figure 3.
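The criteria listed above can be sketched as Fourier-domain comparisons between the initial and the de-noised waveform. The exact definitions and normalizations used in this work are not reproduced here: the root-mean-square form of the deviation, the search for the line as the spectral minimum inside a user-supplied band, the equal weights of 0.5, and all function names are assumptions made for illustration.

```python
import numpy as np

def spectral_std_criterion(reference, denoised):
    """RMS deviation between the amplitude spectra of two waveforms."""
    ref_spec = np.abs(np.fft.rfft(reference))
    den_spec = np.abs(np.fft.rfft(denoised))
    return float(np.sqrt(np.mean((ref_spec - den_spec) ** 2)))

def line_shift_criterion(reference, denoised, dt, band):
    """Absolute shift of the spectral minimum inside band = (f_low, f_high)."""
    freqs = np.fft.rfftfreq(len(reference), d=dt)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    f_band = freqs[mask]
    ref_min = f_band[np.argmin(np.abs(np.fft.rfft(reference))[mask])]
    den_min = f_band[np.argmin(np.abs(np.fft.rfft(denoised))[mask])]
    return float(abs(ref_min - den_min))

def generalized_filtration_criterion(c_std, c_line, w_std=0.5, w_line=0.5):
    """Weighted sum of the two criteria (assumed equal weights; smaller is better)."""
    return w_std * c_std + w_line * c_line

# Hypothetical usage: a smooth test waveform and a noisy copy of it.
rng = np.random.default_rng(0)
t = np.arange(256) * 0.05
clean = np.exp(-((t - 6.0) ** 2)) * np.cos(4.0 * t)
noisy = clean + 0.05 * rng.standard_normal(clean.size)
print(spectral_std_criterion(clean, noisy))
print(line_shift_criterion(clean, noisy, dt=0.05, band=(1.0, 3.0)))
```

In practice the two criteria have different units, so they would need to be normalized to comparable ranges before being combined; the sketch leaves that step to the caller.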