Data-driven innovations in structural health monitoring

At present, substantial investments are being allocated to civil infrastructure, which constitutes a valuable asset at both national and global scales. Structural Health Monitoring (SHM) is an indispensable tool for ensuring the performance and safety of these structures based on measured response parameters. Research on damage assessment has tended to focus on wireless sensor networks (WSNs), which have proven to be a superior alternative to traditional visual inspections and to tethered or wired systems. Over the last decade, the structural health and behaviour of innumerable pieces of infrastructure have been measured and evaluated through several successful deployments of these sensor networks. Modern monitoring systems can rapidly measure, transmit, and store large volumes of data. The amount of data collected from these networks has, however, become unmanageable, raising related issues of data quality, relevance, re-use, and decision support. There is an increasing need to integrate new technologies in order to automate evaluation processes and to enhance the objectivity of data assessment routines. This paper aims to identify feasible methodologies for applying time-series analysis techniques to judiciously exploit the vast amount of readily available, and continually growing, data resources. It continues the momentum of a broader effort to collect and archive SHM approaches that will serve as data-driven innovations for damage assessment through efficient algorithms and data analytics.


Introduction
Throughout the world, civil infrastructures such as buildings, bridges and tunnels involve substantial investments, as they have become significant pillars of modern civilization. These structures eventually deteriorate or disintegrate as they are incessantly subjected to weathering effects and operational loads. Defective and unstable operation of these infrastructures can cause economic burdens and pose major threats to public health and safety. Hence, the serviceability of these assets should be consistently and accurately monitored and evaluated to minimize risk. For example, localization and detection of structural damage should be performed at the onset of its emergence using precision equipment. Damage is defined as deviations introduced into a system that can potentially cause harmful effects on both the current and future performance of that system [1]. It has also been suggested that damage is the occurrence of an imperceptible quantity of micro-cracks in a structural element [2]. Traditionally, visual inspections are performed to assess the health of civil structures. However, this method has proven labor-intensive, costly, and time-consuming. Visual assessments can also be highly subjective: some damage may be below the surface and less visible, producing varying results even among trained inspectors [3].
Structural Health Monitoring (SHM) is a complementary approach to visual inspections, providing precise real-time information about the condition of a civil structure based on quantifiable response parameters such as humidity, temperature, strain and acceleration, which are converted into digital form and delivered to a central data collection station [4]. Central to the discipline of SHM is monitoring and assessing the health of a structure based on measurable response parameters, since structures can ultimately become unstable or faulty due to weathering effects, excessive loads and corrosion [3]. The deployment of SHM for monitoring civil infrastructures can prolong the serviceability of a structure and guarantee life-safety standards over its operational life [5]. Evolving wireless technologies and applications, in conjunction with micro-electromechanical systems (MEMS), laid the foundation of wireless sensor networks (WSNs). Interest in using WSNs for structural health monitoring has increased as more research demonstrates successful application to increasingly complex structures.
According to [20], accurate and efficient damage detection in long-term health monitoring still encounters many difficulties, and the recorded big data require efficient new damage detection algorithms. In a novel algorithm described in [19,20], transmissibility together with the Mahalanobis distance and Hotelling's T-square was used in a study involving a numerically simulated beam and an experimentally tested laboratory structure. A more advanced damage detection and localization technique, based on the BAT algorithm, is found in [21]; it enables the detection not only of single but also of multiple damage positions, along with the extent of damage. Two recent studies on composite beam structures based on vibration analysis can be found in [22,23].
A common view is that the interpretation of the monitoring data in terms of the integrity of the structure is the key function of an SHM system. Data management involves filtering out extraneous data, direct and remote data collection, conversion of the data into responses, interpretation of these responses to state the condition of the structure, and recommendations for future data collection and interpretation [6]. From a structural monitoring perspective, the captured data constitute a rich resource for detecting and assessing incipient behavior, damage and deterioration within the infrastructure under regular or extreme loadings. The massive amount of data produced by long-term SHM systems presents many issues, such as data quality, relevance, re-use, and decision support. Additional challenges relate to data scrubbing and robust data management, as well as the requirements for increased memory, storage and computing power [7].
Over the past decade, numerous SHM systems have become increasingly versatile, capable of handling larger data volumes and achieving faster transmission, measurement and storage of the data acquired from the monitored infrastructure. The integration of data analytics in SHM brings several gains: reduced analysis time, increased objectivity, automated pre-processing, identification of meaningful datasets and utilization of the knowledge gained from historical datasets [8]. Fundamentally, damage cannot be declared significant without comparing two different states of the system, one being the undamaged or baseline state that represents the initial state of that system [9]. Thus, the need for a methodical approach to make optimal use of the information at hand is self-evident. This paper focuses on determining a methodological approach for applying data science to productively manage and analyze the huge sensor datasets generated by an existing WSN setup. The feature extraction techniques used in this paper are outlier detection techniques based on time-domain statistical analysis, namely the autoregressive (AR) and autoregressive with exogenous input (ARX) models [15,16]. The Mahalanobis distance [15] is the feature discrimination algorithm employed to demonstrate the signal deviations occurring in the damaged states relative to the undamaged conditions.

Experimental setup
The study is primarily designed to quantify and experiment on accessible data collected from an existing wireless sensor networking system within a building. The emphasis of the paper is on the simulation work performed for SHM using existing laboratory test data obtained from an instrumented three-story test structure reported in [10]. The basic dimensions and a schematic depiction of the test structure setup are shown in Figure 1.

Data acquisition
Along the centre line of the structure, an electro-dynamic shaker was installed to provide lateral excitation. Both the shaker and the test structure are mounted on an aluminium baseplate (76.2 x 30.5 x 2.5 cm) [10]. The entire structural setup rests on rigid foam, which reduces the effect of extraneous excitation sources that might otherwise be introduced into the system through its base. The input force from the shaker is measured using a load cell, and the overall system response is measured using four accelerometers. The analog sensor signals output from the data acquisition system were discretized into time histories of 8,192 data points sampled at 320 Hz (3.125 ms intervals), yielding records approximately 25.6 s long. The resulting frequency-domain spectra contained 3,600 lines, covering frequencies up to 140.6 Hz at a 0.0391 Hz resolution. The structure was excited using a band-limited random excitation in the 20-150 Hz range. The lower bound of 20 Hz was chosen to avoid the rigid-body modes of the structure, which typically lie below 20 Hz [10]. Table 1 shows the different states considered in the experiment and the collection of force and acceleration histories for the different structural state conditions. For example, state #4 describes a condition with an "87.5% stiffness reduction in column 1BD", meaning that the stiffness of the column at the intersection of planes B and D, between the base and the first floor, is reduced by 87.5% [10].
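The sampling parameters above follow directly from the sampling frequency and record length; the short check below (an illustrative sketch, not part of the original study) reproduces them.

```python
# Sanity check of the sampling parameters reported in [10]:
# 8,192 points sampled at 320 Hz.
fs = 320.0           # sampling frequency, Hz
n = 8192             # data points per time history

dt = 1.0 / fs        # sampling interval, s
duration = n * dt    # record length, s
df = fs / n          # frequency resolution of the spectrum, Hz

print(dt)        # 0.003125 s  (3.125 ms intervals)
print(duration)  # 25.6 s time histories
print(df)        # 0.0390625 Hz (~0.0391 Hz resolution)
```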

Dataset Description
The experiment was conducted under structural state conditions divided into four main groups. The first category, the baseline condition or state #1, denotes the reference undamaged structural state. The second group, states #2 to #9, comprises conditions in which either the mass or the stiffness properties of the columns were altered; it is intended to simulate operational and environmental variability through varying stiffness levels and mass changes. The mass change in undamaged states #2 and #3, denoted by m, involved adding 1.2 kg, approximately 19% of the overall mass of each floor [10].
The stiffness changes, in states #4 to #9, were introduced by reducing the rigidity of one or more columns in the test-bed by 87.5%. The third group (states #10 to #14) comprises damaged state conditions replicated through nonlinearities introduced into the structure via a suspended column and a bumper, with varying gaps between them (0.05, 0.10, 0.13, 0.15 and 0.20 mm) for a given level of excitation. Lastly, the fourth group, damaged states #15 to #17, involves differing gaps between the bumper and the suspended column combined with mass and stiffness deviations, representing the environmental and operational variability of real-world structures.
The variability in the data is taken into account by performing ten tests for every state condition, so that ten time histories were measured for each of the five transducers. Furthermore, the coherence functions and frequency response functions (FRFs) at the accelerometers (channels 2 to 5), measured relative to the input from the load cell (channel 1), were recorded in the data acquisition system.

Table 1. Structural state conditions used in the experimental set up in [10].

Damage detection using time series analysis
The underlying principle of vibration-based damage detection is that the presence of damage will considerably change the properties of the system, such as energy dissipation, mass and stiffness, which in turn change the measured dynamic response of that system [9]. A damage detection algorithm should satisfy the following general requirements: detect even the smallest damage, be robust to environmental conditions, and require minimal wireless data transmission. It is also desirable that the algorithm be computationally light and able to accommodate energy-saving strategies such as sleep/awake states [11].
A time series is a succession of data points measured at successive, regularly spaced time intervals. Time-series analysis exploits the trend, seasonal variation and autocorrelation of data points taken over time to extract Damage Sensitive Features (DSFs) that can be applied directly to the measured data for damage detection.
Regression analysis builds on time-series analysis and is a form of supervised learning classification that can be employed in the damage detection process. Regression models that have been successfully applied include radial basis functions and neural networks [12]. Regression techniques have proven capable of detecting, quantifying and localizing damage in a structure through the use of modal data. To determine the strengths and limitations of the different statistical procedures used for distinguishing damage within civil structures, the data set obtained from the laboratory test structure is subjected to LANL's statistical pattern recognition paradigm.

Autoregression (AR) model
In the AR model formulation, the model order is defined by the largest time lag used when the response at one location is regressed onto a delayed version of itself [13]. To reliably infer that damage is indeed present, a threshold on how far apart the AR coefficients must be needs to be defined [14]. This requires some form of supervised learning in which the damage state is known beforehand: the damage detection algorithm uses training data to recognize and discriminate between undamaged and damaged data sets and thereby determine a threshold value. Damage is declared when this threshold is exceeded. In an autoregressive model, the DSFs are the AR coefficients and the structural response measurements are the acceleration time histories.

The AR model is summarized as follows. Let x_i(t) represent the acceleration data from sensor i. It is subdivided into segments x_ij(t), where i denotes the sensor number and j the segment. These segments are first subjected to data pre-processing, i.e. standardization and normalization of the acceleration time series, which minimizes or eliminates dependence on variable environmental conditions [15]. The AR model order is then estimated [13]. The time signal is modelled as

x(t) = Σ_{k=1}^{p} φ_k x(t − k) + e(t),    (1)

where x(t) is the signal measured at discrete time index t, φ_k is the k-th AR coefficient (used as a DSF), p is the AR model order and e(t) is the residual term, or unobservable random error. Reference [15] proposed using the Burg algorithm in conjunction with Eq. (1) from [13] for the estimation of the AR coefficients.
The AR model is run on each segment, producing N vectors of AR coefficients, where N equals the total number of segments obtained from one sensor record. The mean and standard deviation of the AR coefficients are calculated at various segment lengths to determine the optimal segment length, identified as the point at which these statistics begin to stabilize.
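The segment-wise extraction of AR coefficient vectors described above can be sketched as follows. Note that the study estimates the coefficients with the Burg algorithm [15]; this minimal sketch substitutes an ordinary least-squares fit, which gives comparable estimates for long records, and the function names are illustrative only.

```python
import numpy as np

def ar_coefficients(x, p):
    """Fit an AR(p) model x(t) = sum_k phi_k * x(t-k) + e(t) by
    ordinary least squares; the returned coefficient vector phi is
    used as the damage-sensitive feature (DSF) vector."""
    x = np.asarray(x, dtype=float)
    # Lagged design matrix: row for time t holds x(t-1), ..., x(t-p).
    X = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    y = x[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi

def segment_features(x, seg_len, p):
    """Split one sensor record into N non-overlapping segments and
    return the N x p matrix of AR coefficient vectors."""
    n_seg = len(x) // seg_len
    return np.array([ar_coefficients(x[j * seg_len:(j + 1) * seg_len], p)
                     for j in range(n_seg)])
```

Applying `segment_features` to one channel's 8,192-point record with, say, `seg_len = 1024` yields eight coefficient vectors whose mean and standard deviation can then be tracked while varying the segment length, as described above.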
In general, the optimum AR model order is not known a priori. A higher-order model may represent or match the data completely but lose generality when applied to other datasets, whereas a lower-order model may not capture the fundamental system response of the physical entity being observed [17].
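One common way to resolve this bias-variance trade-off, which the paper itself does not prescribe, is to score candidate orders with the Akaike Information Criterion (AIC). The sketch below assumes that choice and again uses an ordinary least-squares fit; both function names are hypothetical.

```python
import numpy as np

def ar_residual_variance(x, p):
    """Residual variance of an ordinary least-squares AR(p) fit."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    y = x[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ phi
    return float(np.mean(r ** 2))

def select_order_aic(x, p_max):
    """Return the AR order in 1..p_max with the lowest AIC score,
    AIC(p) = n * ln(sigma_p^2) + 2p."""
    n = len(x)
    aic = [n * np.log(ar_residual_variance(x, p)) + 2 * p
           for p in range(1, p_max + 1)]
    return int(np.argmin(aic)) + 1
```

The 2p penalty term discourages exactly the over-fitting described above: an order increase is accepted only if it reduces the residual variance enough to pay for the extra coefficients.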

Autoregressive with Exogenous (ARX) Inputs Model
In a study conducted by Lei et al. (2003), the damage detection approach was modified to account for excitation variability and for the effect of the prediction model orders on the originally proposed damage detection feature. The residual error of a new signal from an unknown structural condition, predicted by a model referenced to the undamaged structure, is compared against the signals generated from the undamaged structure. The ARX model is constructed from the selected reference database as

x(t) = Σ_{k=1}^{na} a_k x(t − k) + Σ_{k=1}^{nb} b_k f(t − k) + e_x(t),    (3)

where na and nb are the ARX model order numbers, a_k and b_k are the AR and exogenous-input coefficients, f(t) is the measured input and e_x(t) is the ARX(na, nb) residual error. The resulting model is then employed to predict a new signal y(t), giving

e_y(t) = y(t) − Σ_{k=1}^{na} a_k y(t − k) − Σ_{k=1}^{nb} b_k f_y(t − k),    (4)

wherein a_k and b_k are the coefficients derived from Eq. (3) and e_y(t) is the residual error of the new signal.
In the formulation of this model, the responses at two locations with different time lags are employed in the regression analysis. The ARX algorithm obtains the DSFs through model identification and filtering by means of a least-squares approach: the data are fitted to a defined ARX model to determine the relationship between the input and output parameters. This procedure is intended to enhance the damage detection process by exploiting the information associated with a measured input provided by the sensing system [13].
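A least-squares ARX fit of the kind just described can be sketched as follows; this is a minimal illustration under the notation above, and the function names are not from the study.

```python
import numpy as np

def arx_fit(x, f, na, nb):
    """Fit an ARX(na, nb) model
        x(t) = sum_k a_k x(t-k) + sum_k b_k f(t-k) + e(t)
    by least squares on a reference record; return the coefficient
    vectors (a, b) and the residual series e."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    p = max(na, nb)
    # Each row collects the lagged outputs and lagged inputs at time t.
    rows = [np.concatenate([x[t - na:t][::-1], f[t - nb:t][::-1]])
            for t in range(p, len(x))]
    X = np.array(rows)
    theta, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    a, b = theta[:na], theta[na:]
    e = x[p:] - X @ theta
    return a, b, e

def arx_residual(y, f, a, b):
    """Residual of a new signal y under the reference (a, b) model,
    i.e. the damage-sensitive quantity e_y(t)."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    na, nb = len(a), len(b)
    p = max(na, nb)
    rows = [np.concatenate([y[t - na:t][::-1], f[t - nb:t][::-1]])
            for t in range(p, len(y))]
    return y[p:] - np.array(rows) @ np.concatenate([a, b])
```

Fitting on an undamaged record and then evaluating `arx_residual` on a signal from an unknown condition gives the residual error whose growth signals departure from the baseline model.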

Damage Localization
The statistical model used to perform feature discrimination and quantify the extent of damage is the Mahalanobis distance, which was introduced to isolate outliers for the purpose of damage detection in a signal [18]. The primary goal of this algorithm is to separate the features extracted from the normal condition from those of the potentially damaged states that produce outliers. The feature discrimination training phase involves learning the mean and covariance matrix of the AR model coefficients, which serve as the feature vectors, to train a Mahalanobis distance-based detector. The features are normalized by the machine learning algorithm, which reduces each feature vector to a scalar score. In multivariate statistics, the Mahalanobis distance is used both to identify and to quantify outliers. The striking difference from the Euclidean distance is that the Mahalanobis distance is not dependent on the scale of the observations but instead accounts for the correlation between the variables. Given a group of m p-dimensional real-valued patterns in R^p with multivariate mean vector μ = (μ_1, μ_2, …, μ_p) and covariance matrix Σ, the Mahalanobis distance between a new pattern x = (x_1, x_2, …, x_p) and that group is

D_M(x) = sqrt( (x − μ)^T Σ^{−1} (x − μ) ),

where the covariance matrix Σ and the mean vector μ represent the normal operational condition and x represents the potentially damaged condition.
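The training and scoring steps above can be sketched with a small detector class. This is a minimal sketch of the approach, not the study's own code, and the class and method names are illustrative.

```python
import numpy as np

class MahalanobisDetector:
    """Learn the mean vector and covariance matrix of baseline
    feature vectors (e.g. AR coefficients) and score new vectors
    by their squared Mahalanobis distance to the baseline group."""

    def fit(self, features):
        # features: m x p matrix of undamaged-state feature vectors.
        self.mu = features.mean(axis=0)
        self.cov_inv = np.linalg.inv(np.cov(features, rowvar=False))
        return self

    def score(self, x):
        # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu).
        d = x - self.mu
        return float(d @ self.cov_inv @ d)
```

A feature vector from a damaged state, whose AR coefficients have shifted, scores far from the baseline cloud, while undamaged vectors stay near the learned mean.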
The key steps of this study, in order, are as follows. First, a series of five-second blocks of acceleration time histories was extracted from the sampled data set for each undamaged state condition. Next, AR-based feature vectors were obtained by fitting a linear autoregressive model to each time history. A Mahalanobis distance-based detector was then trained by learning the mean and covariance matrix of the feature vectors from the undamaged structure. Finally, the trained detector was run on test cases, one at a time, with the structure in different undamaged and damaged states; the results were used to determine a threshold, based on 99% confidence, for classifying the extent of damage in the structure [10].

Results and discussion
This paper implemented two major stages of data processing to undertake supervised statistical modelling: training and live testing. The acquired data first undergo the training phase, in which a model of the baseline or undamaged system is learned. The data are then segmented into smaller fragments that are processed in sequence during the live testing stage and classified as either "undamaged" or "damaged" with reference to the learned model. The classification threshold is derived statistically.
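One simple statistical realization of such a threshold, assuming (the paper does not fully specify the procedure) that it is taken as the 99th percentile of the damage-indicator scores on the undamaged training data, is sketched below; both function names are hypothetical.

```python
import numpy as np

def damage_threshold(baseline_scores, confidence=99.0):
    """Set the classification threshold at the given percentile of
    the damage-indicator scores computed on undamaged training data."""
    return float(np.percentile(baseline_scores, confidence))

def classify(test_scores, threshold):
    """Label each test segment relative to the learned threshold."""
    return ["damaged" if s > threshold else "undamaged"
            for s in test_scores]
```

Segments whose Mahalanobis score exceeds the learned threshold are flagged as damaged; everything below it is treated as normal variability of the baseline.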

AR Method
The auto-regressive (AR) model is an outlier/novelty detection method that can be used to localize the source of damage in the structure. As mentioned before, the coefficients of the AR model act as the feature vectors from an array of sensors and are used as damage-sensitive features (DSFs). The values of the AR coefficients of the acceleration time histories change when damage reduces the stiffness of the structure under ambient vibrations, so they are appropriate DSFs. As Fig. 2 shows, damage induces changes in these parameters that are related to the level of damage; this feature can therefore also be used to evaluate the severity of the damage in the structure.

ARX Method
The ARX damage detection algorithm can be utilized for the detection, localization and quantification of damage present in the structure. ARX is another statistical outlier/novelty detection method in which the input parameters are also used as damage-sensitive features, obtained through a least-squares approach. The ARX analysis and the determination of the model order were carried out for one acceleration time history of the reference condition (state #1) from Channel 5. Fig. 3 shows that the ARX parameters discriminate the state conditions better than the AR parameters, especially in channel #3, for the same model order of 15. The location of the damage can be identified by comparing the number of outliers per channel. The plot makes apparent that the magnitude of the outliers tends to increase near the damage source, and is therefore likely to indicate the location where the damage is present, or at least proximate.
A comparison of the AR and ARX models in Figure 4 reveals significant differences. One striking dissimilarity is their computational cost in generating the damage detection features: the AR modelling required less computational time, whereas the ARX computation time was nearly three times greater, especially at higher model orders. On the other hand, ARX outperformed AR modelling in damage localization [13]. The tested data consist of one time history acquired from each condition. The damage indicators recorded for the feature vectors from the undamaged condition (black) remain constant, in contrast to the feature vectors from the damaged states, which display an escalating trend. From the Mahalanobis distance plot, it can be seen that the incremental change in the damage measure captures the change from one damage state to the next.
It can also be seen that the features extracted from Channels 2 and 3 are less sensitive than those from Channels 4 and 5 in discriminating the undamaged (states #1-9) from the damaged (states #10-17) conditions. This indicates that the source of damage is located near Channels 4 and 5.

Conclusions
In summary, a comprehensive literature review was carried out covering the theoretical background and importance of SHM, the prevalence of WSNs in SHM, and the motivation for the study. The applicability of data analytics in the context of SHM was also examined, following a succinct discussion of the existing statistical classifiers and feature discrimination methods used in damage detection algorithms. The methodology in this paper utilized the large amount of experimental data collected from a suitable test model designed by LANL. The laboratory test structure duplicates the dynamic response of actual structures by subjecting the test bed to controlled damage tests. Damage was successfully detected in the simulated models through damage detection algorithms founded on statistical pattern recognition techniques.
Regarding the damage detection methods, the ARX model provided better discrimination of the state conditions than the AR model, as seen in the magnitude of the outliers per channel used to locate the damage incurred in the structure. The ARX model also returned more distinguishing features than AR when the Mahalanobis distance learning algorithm was used to accomplish the feature classification. The dependability of the Mahalanobis distance learning algorithm was likewise demonstrated in this study in conjunction with both the AR and ARX damage detection algorithms.
If appropriately utilised, damage detection technologies can extend the lifetime of buildings by allowing deterioration or damage to be recognized earlier, permitting comparatively minor remedial actions to be taken before the deterioration or damage matures to a state where major intervention becomes mandatory. A number of challenges and opportunities remain worth pursuing. Future work will involve experimental studies and will focus on using statistical pattern recognition approaches to identify outliers and the different levels of damage within the structure. New data analysis and data-mining tools must be developed, or existing tools revisited, to address the data management issues identified. Future data mining research should also focus on providing a further level of damage identification: estimating the remaining service life of a structure.