Pre-processing data and window function testing on wave spectrum analysis

Data pre-processing is an important step in turning oceanographic measurements into the required information. This phase is often time-consuming and demands rigor to produce correct data. Data pre-processing involves selecting the data and modifying them so that they can be read by a computational algorithm. The data selection process includes selecting the number of samples, correcting the data, and filling data gaps. The data used in the spectrum analysis are recordings of the wave water surface profile at a resolution of 1 second. The data need to be corrected by detrending against the tidal height, which serves as the still water level. The spectral analysis is processed using the pwelch function in MATLAB. The frequency spectrum curve of the measurements is obtained by calculating the power spectral density. To obtain a stable power spectral density curve, the Hann window function is used with a number of window elements of nfft/16.


Introduction
The data read by a measuring device are not always directly usable. Sometimes noise is introduced during measurement. The noise may be caused by disturbances from activity around the recording device, by environmental anomalies, or by the device failing to function properly because of power instability or limitations of the sensor. Several methods are commonly used to measure waves at sea [1], among others: an accelerometer on a buoy; a wave staff that measures electrical capacitance; an upward-looking echosounder installed at the bottom of the water column; an acoustic Doppler current profiler (ADCP) that identifies the orbital motion of water particles caused by wave movement; a pressure sensor [2]; or ultrasonic sensors, such as the device developed by Adrianto [3].
In this article, the author shares the early stages of data processing performed for the spectral analysis of wave measurement data collected by Adrianto in 2016 in the West Java Sea [3]. The wave recording device is a modified ultrasonic device (MUSD), which operates by sending ultrasonic acoustic pulses from a transducer to the water surface. The height of the water surface profile is then determined from the distance between the water surface and the transducer, computed from the travel time of the acoustic pulse back to the receiver. The MUSD is designed to record data at a sampling frequency of 6 Hz, meaning that it records 6 samples per second at equal time intervals. The objective is to capture the wave cycles in the high-frequency component, or more precisely the frequency components of wind-generated waves. In FFT analysis, the Nyquist criterion states that to identify a particular frequency component, the sampling frequency must be at least twice the highest frequency of interest. For example, to examine a wave with a frequency of 1 cycle per second (1 Hz), a sampling frequency of at least 2 samples per second (2 Hz, i.e. a sampling interval of 0.5 s) is required. Thus, to resolve sea wave components of higher frequency, a higher sampling frequency (a shorter recording interval) is required. The recording carried out by Adrianto [3] with a sampling frequency of 6 Hz would ideally be able to identify frequencies up to 3 Hz. Waves at 3 Hz lie at the short, high-frequency end of the spectrum, approaching the capillary range in which surface tension, rather than gravity, is the main force restoring the water surface to equilibrium. However, before the data are used, it is necessary to check whether the entire data set can be used directly or whether modifications such as smoothing, cutting, and filling need to be applied.
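The sampling argument above can be checked numerically. The sketch below is plain Python with illustrative function names (not part of the original processing); it computes the Nyquist frequency for the 6 Hz MUSD record and for the 1-second series used later in this article, and shows a component above the Nyquist frequency folding back (aliasing) onto a lower one.

```python
import math


def nyquist_frequency(fs_hz):
    """Highest frequency (Hz) that a sampling rate fs_hz can represent."""
    return fs_hz / 2.0


def aliased_frequency(f_hz, fs_hz):
    """Apparent frequency (0 .. fs/2) of a real sinusoid at f_hz sampled at fs_hz."""
    f = math.fmod(f_hz, fs_hz)                      # fold into [0, fs)
    return fs_hz - f if f > fs_hz / 2.0 else f      # reflect into [0, fs/2]


fs_musd = 6.0   # MUSD sampling frequency (Hz), from the text
fs_1s = 1.0     # the 1-second series used in the later analysis

print(nyquist_frequency(fs_musd))        # 3.0
print(nyquist_frequency(fs_1s))          # 0.5
print(aliased_frequency(4.0, fs_musd))   # 2.0: a 4 Hz ripple would masquerade as 2 Hz

# Direct check on the sample values: at 6 Hz, 4 Hz and 2 Hz cosines are indistinguishable.
samples_4hz = [math.cos(2 * math.pi * 4.0 * n / fs_musd) for n in range(12)]
samples_2hz = [math.cos(2 * math.pi * 2.0 * n / fs_musd) for n in range(12)]
```

The last check shows why the sampling rate matters: sampled at 6 Hz, a 4 Hz ripple produces the same sample values as a 2 Hz ripple, so the two cannot be told apart. The 1-second series, by the same rule, resolves frequencies only up to 0.5 Hz.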

Data Inspection and Editing
The data are recordings of the wave water surface at a resolution of 6 Hz, so that each second contains 6 samples at equal time intervals (1/6 s). However, much of the data shows recording gaps at irregular intervals, as shown in Figure 2. To maintain the reliability of the data and to avoid misinformation about the presence of spurious wave frequency components, only data at 1-second intervals are used. To obtain a value at each whole second from the recorded samples around seconds i and i+1, linear interpolation is performed along the time series. If consecutive samples are more than 1 second apart, the times without information are set to the still water level (zero elevation in terms of water level deviation). The data are then arranged into a water surface height vector and saved in a *.csv file.
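The gap-handling rules above can be sketched as follows. This is a minimal sketch, assuming one reading of those rules: a whole second bracketed by two samples no more than 1 s apart is linearly interpolated, a whole second inside a longer gap is set to the still water level (0 m in deviation terms), and the result is written as a two-column CSV. The function names, column names, and sample values are hypothetical.

```python
import bisect
import csv
import io
import math


def resample_to_seconds(times_s, eta_m, still_level=0.0, max_gap_s=1.0):
    """Put irregular (time, elevation) samples on a whole-second grid.

    Assumed rules (one reading of the text, not the author's code):
    - a second that coincides with a sample takes that sample's value;
    - a second bracketed by samples at most max_gap_s apart is linearly interpolated;
    - a second inside a longer gap is set to still_level.
    times_s must be sorted in ascending order.
    """
    n = len(times_s)
    grid, values = [], []
    for t in range(math.ceil(times_s[0]), math.floor(times_s[-1]) + 1):
        k = bisect.bisect_left(times_s, t)        # first sample at or after t
        if k < n and times_s[k] == t:
            value = eta_m[k]
        else:
            t0, t1 = times_s[k - 1], times_s[k]   # samples bracketing t
            if t1 - t0 <= max_gap_s:
                frac = (t - t0) / (t1 - t0)
                value = (1.0 - frac) * eta_m[k - 1] + frac * eta_m[k]
            else:
                value = still_level               # gap too long: still water level
        grid.append(t)
        values.append(value)
    return grid, values


def to_csv(grid, values):
    """Serialise the 1-second series as CSV text (write it to a *.csv file as needed)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["time_s", "eta_m"])
    writer.writerows(zip(grid, values))
    return buf.getvalue()


# Hypothetical irregular record with a 2.5 s gap between 1.9 s and 4.4 s.
times = [0.0, 0.17, 0.5, 0.83, 1.2, 1.9, 4.4, 5.0]
eta = [0.10, 0.12, 0.20, 0.15, 0.05, -0.10, 0.30, 0.20]
grid, values = resample_to_seconds(times, eta)
print(to_csv(grid, values))
```

In this example, seconds 2, 3, and 4 fall inside the long gap and are therefore set to the still water level, while second 1 is interpolated between the samples at 0.83 s and 1.2 s.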

Detrending Process of the Wave Surface Profile
It should be remembered that wave spectrum analysis is based on the deviation of the water level at each time from the mean water level. However, the water level constantly varies with the tidal cycle. If the sea were to become calm, the height of the water surface would be the sea level due to tidal factors alone. Thus, the level midway between the crests and troughs of the up-and-down wave motion indicates the still water level more accurately than the average water level does. This is especially true for a data set with a duration of more than 4 hours, considering the periods of the shortest tidal constituents, which in shallow waters are influenced by the sun's gravitational force [4].
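The difference between the mid-level and the mean can be illustrated with a short synthetic record that contains one and a half wave cycles on top of a hypothetical still water level of 1.2 m. This is only an illustration of the argument, not the procedure used in the study; a midrange estimate is sensitive to isolated spikes, so in practice it would be applied to cleaned data.

```python
import math


def midrange_level(eta):
    """Still-water estimate: midpoint between the highest crest and the lowest trough."""
    return (max(eta) + min(eta)) / 2.0


def mean_level(eta):
    """Arithmetic mean of the record."""
    return sum(eta) / len(eta)


offset = 1.2   # hypothetical still water level (m)
# 12 samples of a wave with an 8-sample period: one and a half cycles.
record = [offset + math.sin(2 * math.pi * n / 8) for n in range(12)]

print(midrange_level(record))   # about 1.2: unaffected by the incomplete cycle
print(mean_level(record))       # about 1.40: biased upward by the extra half cycle
```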
Figure 4 shows a plot of the measured water surface profiles at the Pabelokan wave measurement station [5], overlaid with tidal water level data from (i) the Geospatial Information Agency [6] and (ii) Naotide tide predictions (credit: National Astronomical Observatory of Japan). Ideally, tidal data would be provided by the measurement station itself. However, the nearest available tidal data come from Cilegon Station, which is far from the wave measurement location at Pabelokan Station and lies in an area that cannot geographically represent it. Therefore, the water level recorded at the Cilegon tidal station does not represent the still water level at Pabelokan Station. The tidal prediction from the Naotide program, in contrast, places the water level exactly on the center line between the rising and falling wave water surface profiles. This also increases confidence in the water surface profile data used for the wave analysis in this study. Another use of aligning the recorded wave data with the tidal water level is to define the water level in parts of the time series where data are missing. This is illustrated in Figure 5, where there are no measurements from 16:00 to 17:00. To maintain the length of the record, the missing interval must be filled with elevation data. The most realistic choice is to fill it with the still water level, so that no spurious information is introduced into the data.

Figure 5. Plot of the wave water surface profile data and tidal conditions at 1-minute intervals over a duration of one day
The detrending process is performed on the data time series to eliminate the trends and periodic cycles contained in the data. It yields the deviation of the water surface height from the mean value, or from the reference still water level. The result of detrending the wave water surface measurements against the tidal data is presented in Figure 6b. For short records (less than 1 hour long), detrending can be performed against the mean value, accepting the small error that this may introduce.
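The three detrending options discussed here (against the tide, against the mean for short records, and against a straight-line trend as in Figure 6a) can be sketched in plain Python. The sketch assumes that the tidal still water level, for example hourly Naotide predictions, is linearly interpolated to the 1-second time stamps; the interpolation choice, function names, and sample values are assumptions for illustration.

```python
import bisect
import math


def interp_linear(x, xp, fp):
    """Piecewise-linear interpolation of the points (xp, fp) at x (xp ascending)."""
    k = bisect.bisect_right(xp, x) - 1
    k = min(max(k, 0), len(xp) - 2)          # clamp to a valid interval
    frac = (x - xp[k]) / (xp[k + 1] - xp[k])
    return fp[k] + frac * (fp[k + 1] - fp[k])


def detrend_against_tide(times_s, eta_m, tide_times_s, tide_m):
    """Subtract the interpolated tidal still water level from each sample."""
    return [e - interp_linear(t, tide_times_s, tide_m) for t, e in zip(times_s, eta_m)]


def detrend_mean(eta_m):
    """Short records (< 1 h): use the record mean as the reference level."""
    mean = sum(eta_m) / len(eta_m)
    return [e - mean for e in eta_m]


def detrend_linear(times_s, eta_m):
    """Remove a least-squares straight line (the uptrend case)."""
    n = len(eta_m)
    t_mean = sum(times_s) / n
    e_mean = sum(eta_m) / n
    sxx = sum((t - t_mean) ** 2 for t in times_s)
    sxy = sum((t - t_mean) * (e - e_mean) for t, e in zip(times_s, eta_m))
    slope = sxy / sxx
    return [e - (e_mean + slope * (t - t_mean)) for t, e in zip(times_s, eta_m)]


# Hypothetical example: hourly tide levels and a 6 s wave riding on the tide.
tide_times = [0.0, 3600.0, 7200.0]
tide_levels = [1.00, 1.30, 1.10]
times = list(range(0, 7201))
waves = [0.2 * math.sin(2 * math.pi * t / 6.0) for t in times]
eta = [interp_linear(t, tide_times, tide_levels) + w for t, w in zip(times, waves)]
residual = detrend_against_tide(times, eta, tide_times, tide_levels)
```

After detrending against the tide, any gap that was filled with the still water level becomes exactly zero, consistent with the filling rule described in the previous section.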

Pwelch Function on Power Spectral Density
Wave spectrum analysis is based on the power spectral density (PSD) [2], computed using the Ocealyz program [7]. The power spectral density describes how the power of the wave signal is distributed over its frequency components, expressed in units of "power per frequency" or "power per radian". Mathematically, the PSD can be written as $S(f) = \frac{|X(f)|^2}{\Delta f}$, where $X(f)$ is the amplitude of the wave signal in the frequency domain, so that the numerator gives the power (amplitude squared) of the wave in each frequency bin of width $\Delta f$. The conversion of the wave signal from the time domain to the frequency domain is done using the Fast Fourier Transform (FFT). One of the algorithms commonly used to calculate the power spectral density is the pwelch function [8]. The pwelch function uses the Welch method, which divides the wave signal into several segments. To reduce the loss of spectral information at the edges of each window, adjacent segments overlap. Each segment is transformed into a periodogram by the discrete Fourier transform, computed with the FFT algorithm, and the periodograms are then averaged to obtain the PSD estimate. The advantage of the Welch method is that averaging windowed, overlapping segments reduces the variance of the estimate, while the windowing limits the spectral leakage into other frequency bins that arises when a signal of limited duration is analyzed. The Welch method therefore gives better stability to the resulting PSD curve.
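The Welch procedure described above (overlapping segments, a window on each segment, FFT periodograms, then averaging) can be sketched without MATLAB. This is a minimal plain-Python sketch, not the pwelch implementation: it assumes a Hann window, 50% overlap, a power-of-two segment length, and the common one-sided density scaling (periodogram divided by fs times the sum of squared window weights). In practice, MATLAB's pwelch or SciPy's scipy.signal.welch would be used.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def welch_psd(x, fs, seg_len, overlap=0.5):
    """One-sided Welch PSD estimate (units of x**2 per Hz).

    x is assumed to be detrended already (no per-segment detrending here).
    """
    if seg_len < 2 or seg_len & (seg_len - 1):
        raise ValueError("seg_len must be a power of two >= 2")
    step = max(1, int(seg_len * (1 - overlap)))
    w = hann(seg_len)
    scale = 1.0 / (fs * sum(v * v for v in w))   # density normalisation
    nbins = seg_len // 2 + 1
    acc = [0.0] * nbins
    n_seg = 0
    for start in range(0, len(x) - seg_len + 1, step):
        spec = fft([w[i] * x[start + i] for i in range(seg_len)])
        for k in range(nbins):
            p = abs(spec[k]) ** 2 * scale
            if 0 < k < seg_len // 2:
                p *= 2.0   # fold the negative-frequency half into the one-sided PSD
            acc[k] += p
        n_seg += 1
    if n_seg == 0:
        raise ValueError("record shorter than one segment")
    freqs = [k * fs / seg_len for k in range(nbins)]
    return freqs, [a / n_seg for a in acc]


# Usage: a 0.125 Hz sinusoid of amplitude 0.5 m sampled at 1 Hz (15 overlapping segments).
fs = 1.0
x = [0.5 * math.sin(2 * math.pi * 0.125 * n / fs) for n in range(512)]
freqs, psd = welch_psd(x, fs, seg_len=64)
peak_freq = freqs[psd.index(max(psd))]
area = sum(psd) * (freqs[1] - freqs[0])   # area under the PSD = signal variance
print(peak_freq, area)
```

Because the test signal sits exactly on a frequency bin, the area under the estimated PSD equals the signal variance, amplitude²/2 = 0.125 m², which is a convenient check on the scaling.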

Window Function on Power Spectral Density
The wave profile is recorded as a series of pulses over a limited time. The Fourier transform implicitly assumes that the recorded segment repeats periodically beyond the observation span. This can cause discontinuities in the wave signal at the edges of each window segment. Such discontinuities cause the energy in each frequency bin to leak into other frequency bins, a phenomenon called spectral leakage. The window function is applied to a time series signal as a consequence of using samples of limited duration. The window function acts as a weighting function that gradually reduces the signal amplitude to zero at the edges of the window, so that the abrupt changes at the segment edges do not introduce spurious frequency components.
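The leakage mechanism can be seen directly by transforming a segment that does not contain a whole number of wave cycles. The sketch below (plain Python, with a direct DFT for clarity; all values hypothetical) compares the spectrum level about 20 bins away from a 10.5-cycle sinusoid with and without a Hann taper.

```python
import cmath
import math


def dft_magnitude(x):
    """|DFT| of x at bins 0 .. n/2 (direct O(n^2) sum, fine for short examples)."""
    n = len(x)
    return [abs(sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)))
            for k in range(n // 2 + 1)]


def hann(n):
    """Periodic Hann window: starts at 0 and tapers smoothly at both edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * j / n) for j in range(n)]


N = 64
# 10.5 cycles per segment: the periodic extension of this segment jumps at its edges.
segment = [math.sin(2 * math.pi * 10.5 * j / N) for j in range(N)]
w = hann(N)

rect_spec = dft_magnitude(segment)
hann_spec = dft_magnitude([wj * xj for wj, xj in zip(w, segment)])

far_bin = 30   # about 20 bins away from the true frequency
rect_leak = rect_spec[far_bin] / max(rect_spec)
hann_leak = hann_spec[far_bin] / max(hann_spec)
print(f"leakage at bin {far_bin}: rectangular {rect_leak:.1e}, Hann {hann_leak:.1e}")
```

Without weighting, the leaked level far from the true frequency stays at a few percent of the peak; with the Hann taper it falls by more than an order of magnitude.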
Many window functions have been developed, each providing different frequency response characteristics. In simple terms, window functions can be classified into three main categories: rectangular window functions, decreasing functions called tapering windows, and cosine-shaped functions. The advantage of a rectangular window is that it preserves the amplitude and energy of the wave, while the advantage of cosine and tapering windows is that they reduce the side lobes in the frequency response and thus limit spectral leakage. On the other hand, cosine and tapering windows generally give smaller amplitude values in the spectrum and lose some detail of the power spectrum and some frequency resolution. One basis for selecting a window function is the resolution and dynamic range it produces. Resolution is the ability of the frequency response to distinguish between two adjacent frequencies. Dynamic range is the ability to detect weak frequency components in the presence of strong ones. The rectangular window has the highest resolution and the lowest dynamic range. This means that it is good at distinguishing components of the same amplitude even at very close frequencies, but poor at distinguishing components of different amplitudes even at distant frequencies. In contrast, cosine windows have lower frequency resolution but higher dynamic range. An example of a tapering window is the Bartlett window, while cosine-type windows have a bell-like shape, such as the Gauss, Hamming, and Hann windows.
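The resolution side of this trade-off can be illustrated with two equal-amplitude tones two bins apart. In this hypothetical plain-Python example, the rectangular window leaves a clear gap between the two peaks, whereas the wider main lobe of the Hann window fills it.

```python
import cmath
import math


def dft_magnitude(x):
    """|DFT| of x at bins 0 .. n/2 (direct sum, for short examples)."""
    n = len(x)
    return [abs(sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)))
            for k in range(n // 2 + 1)]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * j / n) for j in range(n)]


def resolved(spec, k1, k2):
    """True if the spectrum dips clearly between the peaks at bins k1 < k2."""
    mid = (k1 + k2) // 2
    return spec[mid] < 0.5 * min(spec[k1], spec[k2])


N = 64
two_tones = [math.sin(2 * math.pi * 10 * j / N) + math.sin(2 * math.pi * 12 * j / N)
             for j in range(N)]
w = hann(N)
rect_spec = dft_magnitude(two_tones)
hann_spec = dft_magnitude([wj * xj for wj, xj in zip(w, two_tones)])
print(resolved(rect_spec, 10, 12))   # True: clear gap at bin 11
print(resolved(hann_spec, 10, 12))   # False: bins 10, 11 and 12 have equal height
```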
In addition to resolution and dynamic range, the selection of the window type should also consider the stability of the spectrum estimate and its spectral leakage [9]. Stability refers to the extent to which the spectrum estimates obtained from different segments agree with one another. Stability also implies smoothing out irrelevant fine structure. Resolution and stability are conflicting requirements: high stability is generally obtained by averaging a large number of periodograms, which reduces resolution. The last consideration in choosing a window type is how far energy leakage can be minimized.
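The resolution-stability conflict can be illustrated with white noise, whose true PSD is flat, so any scatter in the estimate is pure estimation noise. The sketch below (plain Python; the seed, record length, and segment length are arbitrary choices) compares one periodogram of the whole record with the average of eight non-overlapping segment periodograms. Non-overlapping segments are used for simplicity; the Welch method additionally overlaps them.

```python
import cmath
import math
import random


def periodogram(x):
    """Raw periodogram (|DFT|^2 / n) at bins 1 .. n/2 - 1 (DC and Nyquist skipped)."""
    n = len(x)
    return [abs(sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))) ** 2 / n
            for k in range(1, n // 2)]


def averaged_periodogram(x, seg_len):
    """Average the periodograms of consecutive, non-overlapping segments."""
    segments = [x[s:s + seg_len] for s in range(0, len(x) - seg_len + 1, seg_len)]
    pgrams = [periodogram(seg) for seg in segments]
    return [sum(col) / len(pgrams) for col in zip(*pgrams)], len(pgrams)


def coefficient_of_variation(values):
    """Standard deviation divided by mean: the relative scatter of the estimate."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean


rng = random.Random(2016)                               # fixed seed for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]       # white noise: true PSD is flat
single = periodogram(noise)                             # 127 bins, fine resolution
averaged, n_segments = averaged_periodogram(noise, 32)  # 15 bins, 8 segments averaged
cv_single = coefficient_of_variation(single)
cv_averaged = coefficient_of_variation(averaged)
print(f"{n_segments} segments: CV {cv_single:.2f} -> {cv_averaged:.2f}")
```

Averaging reduces the scatter roughly in proportion to one over the square root of the number of segments, but the averaged estimate has 15 frequency bins instead of 127, which is the loss of resolution described above.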
In spectrum analysis with power spectral density, the window function plays an important role in obtaining the PSD curves. The pwelch method divides the data into several segments that are later averaged. Each segment is weighted by a window function to limit spectral leakage and obtain the expected PSD information. In this test, six window functions were applied to examine whether the choice of window function has a significant influence on the spectral analysis of the wave surface profile. The six windows tested are the Rectangular, Gauss, Bartlett, Barthann, Hamming, and Hann windows. The frequency responses of the six window types are shown in Figure 8. The window weighting functions are as follows [10] [11], for a window of length $L = N + 1$ with $0 \le n \le N$:
a. Rectangular: $w(n) = 1$
b. Gauss: $w(n) = \exp\left[-\frac{1}{2}\left(\frac{\alpha\,(n - N/2)}{N/2}\right)^2\right]$, where $\alpha$ is inversely proportional to the standard deviation, $\sigma = N/(2\alpha)$
c. Bartlett: $w(n) = 2n/N$ for $0 \le n \le N/2$, and $w(n) = 2 - 2n/N$ for $N/2 \le n \le N$
d. Barthann: $w(n) = 0.62 - 0.48\left|\frac{n}{N} - 0.5\right| + 0.38\cos\left[2\pi\left(\frac{n}{N} - 0.5\right)\right]$
e. Hamming: $w(n) = 0.54 - 0.46\cos(2\pi n/N)$
f. Hann: $w(n) = 0.5\left[1 - \cos(2\pi n/N)\right]$
In general, the characteristics of each window type are as follows [4]. In the rectangular window, the level of the first side lobe is 22% of the main lobe. The Hann window has much lower side lobes than the rectangular window, at the cost of a wider main lobe. The Hamming window reduces the first side lobe to a very low level (< 1% of the main lobe); however, after this large reduction of the first side lobe, the subsequent side lobes are not reduced much further. The Bartlett window has a wider main lobe than the rectangular window but much lower side lobes, about 5% of the main lobe height. Following [9], the effects of the window types can be compared through the following characteristics: (i) normalized half main-lobe width, (ii) first side-lobe level, (iii) maximum side-lobe level, (iv) ratio of main-lobe energy to total energy, and (v) side-lobe fall-off rate.
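The window definitions listed for this test and the first comparison criteria from [9] can be evaluated numerically. The sketch below implements those formulas with a window length of L = 64 and a Gauss parameter alpha = 2.5 (an assumed value), and estimates each window's half main-lobe width (in DFT bins) and peak side-lobe level from a zero-padded DFT. The printed figures are approximations for this finite length, not values from the study.

```python
import cmath
import math


def rectangular(L):
    return [1.0] * L


def bartlett(L):
    N = L - 1
    return [1.0 - abs(2.0 * n / N - 1.0) for n in range(L)]


def hann(L):
    N = L - 1
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / N)) for n in range(L)]


def hamming(L):
    N = L - 1
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / N) for n in range(L)]


def barthann(L):
    N = L - 1
    return [0.62 - 0.48 * abs(n / N - 0.5) + 0.38 * math.cos(2.0 * math.pi * (n / N - 0.5))
            for n in range(L)]


def gauss(L, alpha=2.5):
    half = (L - 1) / 2.0
    return [math.exp(-0.5 * (alpha * (n - half) / half) ** 2) for n in range(L)]


def lobe_stats(w, pad=2048):
    """(half main-lobe width in DFT bins, peak side-lobe level relative to the peak)."""
    L = len(w)
    resp = [abs(sum(wn * cmath.exp(-2j * math.pi * k * n / pad) for n, wn in enumerate(w)))
            for k in range(pad // 2 + 1)]
    resp = [r / resp[0] for r in resp]
    k = 1   # walk down the main lobe to its first local minimum (the first null)
    while k < len(resp) - 1 and not (resp[k] <= resp[k - 1] and resp[k] <= resp[k + 1]):
        k += 1
    return k * L / pad, max(resp[k:])


L = 64
stats = {}
for name, make in [("Rectangular", rectangular), ("Gauss", gauss), ("Bartlett", bartlett),
                   ("Barthann", barthann), ("Hamming", hamming), ("Hann", hann)]:
    half_width, sidelobe = lobe_stats(make(L))
    stats[name] = (half_width, sidelobe)
    print(f"{name:12s} half main lobe {half_width:4.2f} bins, "
          f"peak side lobe {20 * math.log10(sidelobe):6.1f} dB")
```

For long windows the textbook values are about -13 dB for the rectangular window (22%), -27 dB for Bartlett (about 5%), -31 dB for Hann, and -43 dB for Hamming (below 1%), with the rectangular main lobe about half as wide as the others; the sketch should reproduce these approximately.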

Application of Window Function Settings in Power Spectral Density
The pwelch function used to calculate the power spectral density has the following default settings: each segment is weighted by a Hamming window; the number of FFT points (nfft) is 256 or the next power of 2 greater than the segment length, whichever is larger; and consecutive segments overlap by 50% of the window length.
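The segmentation arithmetic behind the default and nfft/16 settings can be sketched as follows. The sketch assumes MATLAB's documented pwelch defaults (a segment length chosen so that eight segments fit with 50% overlap, and nfft = max(256, next power of two of the segment length), with the overlap rounded down); it also assumes a one-hour record of 1-second data (3600 samples) and reads "nfft/16" as a window length of one-sixteenth of the record. These assumptions should be checked against the documentation of the MATLAB release in use.

```python
import math


def next_pow2(n):
    """Smallest power of two that is >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def welch_segmentation(n_samples, seg_len, overlap_frac=0.5, nfft_min=256):
    """Segment length, overlap, number of segments and FFT length for a Welch estimate."""
    noverlap = math.floor(seg_len * overlap_frac)          # assumed: rounded down
    n_segments = (n_samples - noverlap) // (seg_len - noverlap)
    nfft = max(nfft_min, next_pow2(seg_len))
    return {"seg_len": seg_len, "noverlap": noverlap,
            "n_segments": n_segments, "nfft": nfft}


def pwelch_default_segmentation(n_samples):
    """Assumed default: 8 segments with 50% overlap span 4.5 segment lengths."""
    return welch_segmentation(n_samples, math.floor(n_samples / 4.5))


fs = 1.0        # Hz, 1-second data
n_hour = 3600   # assumption: one hour of data per PSD curve
default = pwelch_default_segmentation(n_hour)
custom = welch_segmentation(n_hour, n_hour // 16)   # "nfft/16" read as record length / 16
for label, cfg in (("default", default), ("nfft/16", custom)):
    print(label, cfg, "bin spacing", fs / cfg["nfft"], "Hz")
```

Under these assumptions, the nfft/16 setting averages 30 overlapping segments instead of 8, while the frequency bin spacing widens from fs/1024 to fs/256: the smoothing-versus-resolution trade-off tested in this section.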
Figure 9 shows the PSD curves obtained by applying the pwelch function with its default settings to the wave surface profile data. The two panels show the PSD curves for each day, August 23 and August 24. The PSD curves of each day are then represented by the average of the 24 hourly curves of that day, shown as a thick red line. Evidently, the fine frequency resolution of the default settings does not produce a smooth and well-defined PSD curve. A PSD curve suitable for formulating an ideal spectrum needs a simple but well-defined shape. Therefore, tests are needed to select the window type and the number of elements in each averaging window.
The PSD curve shape was tested for different window types and numbers of elements per window segment, using the PSD curves with the most dynamic values, namely those of 23/08/2016 at 21:00 and 24/08/2016 at 22:00. A smooth and well-defined PSD curve is expected to be the most reliable representation of the spectrum estimate. To obtain a smooth PSD curve from the pwelch function, a smoothing process is needed, which here uses the moving-average technique. Figure 10 shows that the fewer elements each window has, the smoother the PSD becomes. This is because reducing the number of window elements increases the number of averaged windows and lowers the frequency resolution. This technique achieves smoothness and statistical stability of the spectrum estimate [9]. A window of nfft/16 elements means that the window length is one-sixteenth of the data record, whereas by default the pwelch function divides the record into 8 segments. Reducing the number of window elements lowers the frequency resolution and can reduce the height of the PSD in each frequency bin. Therefore, the number of window elements cannot simply be minimized; it must be chosen so that it emphasizes the shape of the PSD curve without causing a major loss of power amplitude. Based on the tests in Figure 10, a window of nfft/16 elements was found to give the most realistic shape.
Figures 11 and 12 compare the PSD results for the data of August 23, 2016 (21:00) and August 24, 2016 (22:00). Each data set was processed with the six window types tested: Rectangular, Gauss, Bartlett, Barthann, Hamming, and Hann. The six window types were chosen arbitrarily from the many available; the only consideration was to represent the general shapes of window types, for example the triangular group represented by Bartlett, the power-cosine group represented by Hann and Hamming, and the parabolic group represented by the Gaussian window. The test showed that the rectangular window gives the most deviating results among the six types. This is expected, because the rectangular window does not weight the amplitude values of the limited data; accordingly, its frequency response shows the smallest power reduction. The other five types give similar PSD curves. When these PSD curves are enlarged, as shown in Figure 11b and Figure 12b, small differences between the window types become visible. Nevertheless, none of the five types differs significantly in how much it reduces the power spectrum value. Therefore, in spectral analysis applications that form frequency spectrum functions of random waves, the choice among these window types has only a minor effect. In essence, a spectral shape with low resolution is needed so that the spectrum curve is more stable and more reliable as a spectral estimate.
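The amplitude differences between window types can be summarized by two standard figures of merit: the coherent gain (the mean of the window weights, i.e. the factor by which a windowed sinusoid's spectral amplitude shrinks) and the equivalent noise bandwidth (ENBW, in bins). The sketch below uses the periodic forms of four of the six tested windows for brevity; it is an illustration, not part of the original analysis.

```python
import math


def periodic_windows(L):
    """Periodic forms (denominator L) of four of the six tested windows."""
    n = range(L)
    return {
        "Rectangular": [1.0 for _ in n],
        "Bartlett": [1.0 - abs(2.0 * j / L - 1.0) for j in n],
        "Hann": [0.5 - 0.5 * math.cos(2.0 * math.pi * j / L) for j in n],
        "Hamming": [0.54 - 0.46 * math.cos(2.0 * math.pi * j / L) for j in n],
    }


def coherent_gain(w):
    """Mean window weight: amplitude factor applied to a windowed sinusoid."""
    return sum(w) / len(w)


def enbw_bins(w):
    """Equivalent noise bandwidth in DFT bins: L * sum(w^2) / sum(w)^2."""
    return len(w) * sum(v * v for v in w) / sum(w) ** 2


results = {name: (coherent_gain(w), enbw_bins(w))
           for name, w in periodic_windows(64).items()}
for name, (cg, enbw) in results.items():
    print(f"{name:12s} coherent gain {cg:.3f}   ENBW {enbw:.3f} bins")
```

A coherent gain of 1 for the rectangular window against about 0.5 for the tapered windows is consistent with the rectangular window showing the smallest power reduction, while the similar values of the tapered windows are consistent with their similar PSD curves.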
The selection of the window type generally involves a trade-off between two conflicting requirements: frequency resolution and spectral leakage. The effect of a window function can be estimated from its characteristics. A window function with a narrow main lobe gives high (fine) frequency resolution, whereas a window function with a wider main lobe gives lower (coarser) frequency resolution. To obtain low spectral leakage, a window function with a low side-lobe level and a high roll-off rate should be selected, whereas a window function with high side lobes and a low roll-off rate results in high spectral leakage [12]. Good stability of the spectrum curve requires a low frequency resolution, so a window type with a wider main lobe than the rectangular window can be selected. Among the window types applied in this study, the Hamming, Hann, Bartlett, and Barthann windows have the same main-lobe width, so these four window types can be considered. In line with this, the Hamming or Hann window is generally used in the spectrum analysis of random ocean waves [13], [14], [15]. The results of applying the Hann window function to the data of August 23, 2016 and August 24, 2016 are presented in Figure 13. The PSD curves become smoother and better defined, so they can serve as a reference for estimating power spectrum values. Even so, some PSD curves deviate significantly from the shape of the others, such as the PSD of 23/08/2016 (21:00), 24/08/2016 (21:00), 24/08/2016 (22:00), and 24/08/2016 (23:00). A curve that deviates substantially from the others should be considered an outlier that does not represent the PSD under normal conditions, and it can be excluded from the subsequent analysis.
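The daily averaging and the outlier screening described above can be sketched as follows. The screening rule here (RMS deviation from the per-frequency median curve, in dB, compared with a 3 dB threshold) is a hypothetical criterion chosen for illustration; the study does not state a numerical rule, and the example curves are synthetic.

```python
import math
import statistics


def mean_psd(curves):
    """Frequency-by-frequency mean of several PSD curves (e.g. 24 hourly curves)."""
    return [sum(col) / len(col) for col in zip(*curves)]


def flag_outliers(curves, threshold_db=3.0):
    """Flag curves whose RMS deviation (dB) from the median curve exceeds threshold_db.

    threshold_db is a hypothetical tuning value, not a value from the study.
    """
    median_curve = [statistics.median(col) for col in zip(*curves)]
    flags = []
    for curve in curves:
        dev_db = [10.0 * math.log10(p / m) for p, m in zip(curve, median_curve)]
        rms_db = math.sqrt(sum(d * d for d in dev_db) / len(dev_db))
        flags.append(rms_db > threshold_db)
    return flags


# Synthetic example: four similar hourly curves and one that is five times higher.
base = [1.0, 4.0, 2.0, 0.5]
curves = [[b * s for b in base] for s in (1.0, 1.1, 0.9, 1.05, 5.0)]
flags = flag_outliers(curves)
kept = [c for c, bad in zip(curves, flags) if not bad]
clean_mean = mean_psd(kept)
print(flags, clean_mean)
```

Using the median as the reference keeps a single deviating curve from dragging the reference toward itself, which a plain mean would do.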

Conclusion
The window function plays an important role in preventing spectral leakage into other frequency bins. It is applied because wave data are measured over a limited duration. To obtain a stable PSD curve, the PSD estimate needs to be smooth and well defined so that it does not fluctuate strongly across the frequency space. Nevertheless, varying the window type does not produce significant differences between the tapered windows. The number of elements in each window has a considerable influence on producing PSD values suited to the purposes of the research. In this test, the best PSD result was obtained with the Hann window function and a window of nfft/16 elements, that is, the amount of data in the FFT processing divided by 16.
1298 (2024) 012037 IOP Publishing doi:10.1088/1755-1315/1298/1/012037
The high-frequency component generated by the wind is important in the analysis of random waves over short durations, because wind-generated waves dominate wave events at sea. As shown in Figure 1, wind-generated waves occupy frequency components of around 0.05 Hz to 8 Hz.

Figure 2. Loss of data from measurement

Figure 4. Plot of the wave water surface profile data and tidal conditions at 1-hour intervals over a duration of six days

Figure 6. a) Illustration of detrending on data with an uptrend; b) detrending of the wave water surface profile data against the tidal elevation

Figure 7. Illustration of the use of window functions

Figure 8. Frequency response for varying window function type

Figure 10. Test of variations in the window type and the number of window elements

Figure 13. a) PSD curves using the Hann window function with nfft/16 window elements for data on August 23, 2016; b) for data on August 24, 2016