A Deep Learning Approach to Reduce False Alarms for Optical Smoke Detectors

Optical smoke detectors (OSDs) are fire-fighting equipment that detect fire by sensing the light scattered by smoke particles. This operating principle makes them vulnerable to false alarms caused by dust or water steam. To reduce false alarms and make OSDs more reliable, we present a deep learning approach that trains a classifier to distinguish fire events from non-fire ones based on time series data. The classifier is modelled with a 1-D convolutional neural network, and a generative adversarial network is used to augment and balance the training data. Experiments show that our classifier can filter more than 50% of the false alarms caused by water steam while maintaining sensitivity to fire events.


Introduction
Optical smoke detectors (OSDs) are fire-fighting equipment that detect fire by sensing the smoke produced by burning material. Commercial OSDs, which dominate the market, are mainly composed of a light source, a light sensor, an optical chamber, a data processing circuit, and a shell. The optical chamber blocks the direct path from the light source to the light sensor, so that under normal conditions the sensor gives no significant reading. When smoke particles produced by a fire enter the optical chamber, they scatter the light beam from the source towards the sensor, which produces an electrical signal. The intensity of the signal indicates the concentration of particles in the air. In other words, OSDs identify fire events by detecting the scattering phenomenon in the optical chamber.
However, particles other than smoke may also cause scattering, such as dust and water steam, which are quite common in our living environment. A naive threshold-based classifier cannot distinguish the signal caused by smoke from that caused by dust or water steam, which makes OSDs extremely vulnerable to false alarms. It is therefore of great importance to investigate methods that reduce false alarms and make OSDs more reliable.
The core issue behind false alarms lies in the limited information gathered by OSDs. An intuitive solution is to add more sensors of various kinds; with the help of multi-sensor fusion technology, OSDs can be made more robust to disturbances. However, integrating more sensors means higher manufacturing cost, which is fatal to the popularization of commercial OSDs. An alternative is to expand the information along the time dimension. Compared to a single data point of particle concentration sampled by an OSD, a curve recording how the particle concentration changes over a period of time contains much more information. The problem then becomes how to extract useful information from such time series data.
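As a toy illustration of this point (all curve shapes and the threshold below are invented for the sketch, not measured OSD data), a single reading cannot tell a slow smoke build-up from a steam burst, while even a simple feature computed over a window of readings already can:

```python
import numpy as np

# Hypothetical 7-point concentration curves: a gradual rise for fire,
# an abrupt burst for steam. Both end above the alarm threshold.
fire = np.array([0.0, 0.1, 0.2, 0.35, 0.5, 0.65, 0.8])
steam = np.array([0.0, 0.0, 0.1, 0.8, 0.8, 0.8, 0.8])

THRESHOLD = 0.5  # illustrative value

def naive_alarm(latest_reading):
    """Single-point threshold classifier: alarms on both curves."""
    return latest_reading > THRESHOLD

def window_feature(series):
    """A time-series feature (maximum one-step jump) that differs
    between the gradual and the abrupt curve."""
    return float(np.max(np.diff(series)))

print(naive_alarm(fire[-1]), naive_alarm(steam[-1]))  # both True
print(window_feature(fire), window_feature(steam))    # ~0.15 vs ~0.7
```

The naive detector cannot separate the two events, but the jump feature is several times larger for the steam burst, which is the kind of temporal structure the CNN classifier learns automatically.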
In this paper we present a deep learning approach to reduce false alarms for OSDs. A 1-D convolutional neural network (CNN) is designed and trained as a classifier to extract features from the sampled curve and decide whether or not the curve comes from a fire event. CNN training is notoriously data hungry, yet it is expensive to sample and label data with OSDs; to bridge this gap we apply a generative adversarial network (GAN) as a data augmentation method. Experiments show that the CNN classifier can filter more than 50% of the false alarms caused by water steam while maintaining its sensitivity to fire events. The main contributions of this work can be summarized as follows.
• We propose a pipeline for OSD false alarm reduction which filters more than 50% of the false alarms caused by water steam.
• We design a 1-D CNN that can run on a small embedded system to extract information from OSD data.
• We use a conditional Wasserstein GAN with gradient penalty to expand the OSD dataset with high quality artificial data.

Related Work
The subject of our work intersects three research domains, namely OSD optimization, CNN design & training, and GAN-based data augmentation.

OSDs Optimization
Kitsak et al. [1] use a laser rather than an LED as the light source and split the light into two coherent beams; under the combined effect of scattering and interference, the light sensor receives a time-varying signal that contains enough information to distinguish false alarms from fire events. To et al. [2] use a hand-designed feature extraction method to convert the time series into a gradient histogram, and a dissimilarity measure such as Euclidean distance is used to classify these histograms. Chen et al. [3] add a gas sensor to the smoke sensing system to identify false alarms. Cheon et al. [4] add a temperature sensor, which not only senses heat from a fire but also compensates the smoke detector. Aspey et al. [5] use a polychromatic LED to produce light of variable wavelength, and spectral analysis is applied to uniquely identify the combustion material. Cashdollar et al. [6] use a white light source and three light sensors sensitive to different wavelengths to obtain more information about the scattering particles, from which the size and mass of the particles can be measured. Wang et al. [7] combine a temperature sensor, an infrared sensor, and an OSD to gather more information, and use a BP neural network to identify false alarms.

CNN Designing & Training
CNN is a successful architecture in a variety of domains such as computer vision, natural language processing, speech recognition, and machine translation. LeCun et al. [8] design a network to recognize handwritten digits; the network scans the input image with neurons that have local receptive fields. Such a network is known as a CNN and is much more expressive and efficient than a conventional fully connected neural network. Nair et al. [9] introduce the ReLU activation function into deep learning, which is faster and less vulnerable to vanishing gradients than the conventional sigmoid activation. Ioffe et al. [10] propose Batch Normalization to deal with the covariate shift problem, which may be responsible for a network losing its learning capability; Batch Normalization makes deep CNNs more trainable. Kingma et al. [11] introduce the Adam optimizer, which has become one of the most popular optimization algorithms in the deep learning community.

GAN Based Data Augmentation
Goodfellow et al. [12] propose GAN, an elegant architecture with two parts, a Generator and a Discriminator: the Generator tries to generate realistic but non-existent data while the Discriminator tries to distinguish real data from generated data. The two networks progress in an adversarial way until the Generator can produce sufficiently realistic data. Ever since the original GAN was proposed it has been a hot topic in deep learning research, and a large body of work has been done in this domain. Arjovsky et al. [13] propose WGAN, which replaces the JS divergence objective of the original GAN with the Wasserstein distance. The advantage of WGAN is that it is free from mode collapse and more stable in training; however, WGAN uses weight clipping to enforce the Lipschitz constraint, which is 'a clearly terrible way' according to the authors, who encourage further investigation of the constraint. Gulrajani et al. [14] introduce a softer way to enforce the Lipschitz constraint by penalizing the norm of the gradient; the resulting WGAN-GP algorithm is more robust and performs better than WGAN. Mirza et al. [15] propose cGAN, which can generate data belonging to a given label. Frid-Adar et al. [16] use GANs as a data augmentation method to improve the performance of a liver lesion classifier.

Data Collection
We collect OSD data according to the following rules. The sampling frequency is adaptively adjusted according to the current sample value: a higher sample value leads to a higher sampling frequency. Each sample value is stored in a first-in-first-out queue whose capacity is set to 24, large enough to cover the intensity ascending process. An event is defined as the sample value exceeding the pre-set threshold 3 times in a row. When an event takes place, 4 extra data points are sampled and then the whole queue, as illustrated in figure 1, is saved as a complete data record along with a label indicating whether it is a fire or non-fire event. We sample data in 7 scenes, namely beech stick smouldering, cotton rope smouldering, polyurethane foam (without flame retardant) burning, n-heptane/toluene mixture burning, joss stick smouldering, smoke chamber testing, and water steam blast. The first 6 scenes belong to the fire event class and the last to the non-fire class; more details can be found in table 1.
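The collection rule above can be sketched as follows. This is a minimal illustration, not the authors' firmware: the threshold value is invented, and the adaptive sampling frequency is omitted since it does not change the queueing logic.

```python
from collections import deque

QUEUE_LEN = 24       # FIFO capacity, as in the paper
TRIGGER_COUNT = 3    # consecutive over-threshold samples define an event
EXTRA_SAMPLES = 4    # extra points sampled after the trigger
THRESHOLD = 100      # pre-set alarm threshold (illustrative value)

def record_event(sample_stream, label):
    """Keep a 24-point FIFO of samples, declare an event after 3
    consecutive over-threshold readings, then take 4 more samples and
    save the whole queue together with its fire/non-fire label."""
    queue = deque(maxlen=QUEUE_LEN)
    consecutive = 0
    stream = iter(sample_stream)
    for value in stream:
        queue.append(value)
        consecutive = consecutive + 1 if value > THRESHOLD else 0
        if consecutive >= TRIGGER_COUNT:
            for _ in range(EXTRA_SAMPLES):
                try:
                    queue.append(next(stream))
                except StopIteration:
                    break
            return list(queue), label
    return None  # no event in this stream

# Example: a rising curve that crosses the threshold
stream = [10, 20, 40, 80, 120, 150, 180, 200, 210, 215, 220, 225]
result = record_event(stream, "fire")
```

With the example stream, the trigger fires on the third consecutive over-threshold reading (180), four extra points are appended, and an 11-point record is saved with its label.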

Data Augmentation
Data augmentation is of great importance in deep learning. Popular augmentation methods in computer vision, such as rotation [17] or flipping [18], do not make sense on OSD data. Meanwhile, OSD data is sampled at such a low frequency that augmentation cannot be performed in the frequency domain as [19] do in speech recognition. We therefore use two kinds of augmentation, namely Gaussian noise and GAN.
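The Gaussian-noise variant is straightforward; a minimal sketch is shown below, where the noise scale `sigma` and the number of copies are illustrative choices, not values reported in the paper.

```python
import numpy as np

def gaussian_augment(series, n_copies=5, sigma=0.02, seed=0):
    """Create noisy copies of a 24-point OSD curve by adding
    zero-mean Gaussian noise (sigma is an assumed noise scale)."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    noise = rng.normal(0.0, sigma, size=(n_copies, series.size))
    return series + noise  # broadcasts to shape (n_copies, 24)

curve = np.linspace(0.0, 1.0, 24)  # stand-in for a sampled curve
augmented = gaussian_augment(curve)
print(augmented.shape)             # (5, 24)
```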
As presented in figure 2, our GAN combines the features of cGAN [15] and WGAN-GP [14]. The Generator takes a piece of noise which sampled from a Gaussian distribution and a label as input, and output a piece of data that matches the given label. While the Discriminator take a piece of data as input, and output the prediction of label and Wasserstein distance. Loss of Discriminator consists cross entropy loss, adversarial loss, and gradient penalty. Loss of Generator consists cross entropy loss and adversarial loss. It is worth noting that in this architecture Generator and Discriminator cooperate together to avoid ambiguity in classification.
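These loss components can be written out as follows; the split of the Discriminator into a Wasserstein head $D_w$ and a classification head $D_c$, and the exact arrangement of terms, are our notation for the description above rather than formulas given explicitly in the paper:

```latex
% Discriminator: adversarial (Wasserstein) term, classification term,
% and the WGAN-GP gradient penalty on interpolates \hat{x}
L_D = \mathbb{E}_{z,y}\!\left[D_w(G(z,y))\right]
    - \mathbb{E}_{x}\!\left[D_w(x)\right]
    + L_{\mathrm{CE}}\!\left(D_c(x),\, y\right)
    + \lambda\, \mathbb{E}_{\hat{x}}\!\left[\big(\lVert \nabla_{\hat{x}} D_w(\hat{x}) \rVert_2 - 1\big)^2\right]

% Generator: adversarial term plus the same classification loss,
% so both networks push generated data toward the requested label
L_G = -\,\mathbb{E}_{z,y}\!\left[D_w(G(z,y))\right]
    + L_{\mathrm{CE}}\!\left(D_c(G(z,y)),\, y\right)
```

Here $\hat{x}$ is sampled uniformly along straight lines between real and generated samples, following WGAN-GP [14], and the shared cross entropy term is what makes the Generator and Discriminator cooperate on classification.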
We use the Adam algorithm [11] for both the Generator and the Discriminator with a learning rate of 2e-4. To balance their progress, we update the two networks at a ratio of 1:5. A weight of 10 is assigned to the gradient penalty and 1 to the other losses. We train the GAN on all of our real data for 1e4 epochs. Figure 3 shows data produced by the Generator, compared with real data collected in various scenes.

CNN Classifier
To deploy our classifier on commercial OSDs, which have strictly limited storage and computing resources, we must restrict the depth and width of the network. We build the classifier from 4 convolution blocks, each containing a 1-D convolutional layer, a ReLU layer, and a batch normalization layer. The maximum width of the network is set to 2, as shown in figure 4. The classifier takes a vector of length 24 as input and outputs a scalar indicating the probability that the input data belongs to a fire event.
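To give a sense of how small such a network is, the following numpy sketch runs a forward pass of a 4-block, width-2 1-D CNN on a 24-point input. The kernel size, channel layout, and global-average readout are our assumptions (figure 4 is not reproduced here), and batch normalization is folded away as an inference-time scale/shift absorbed into the weights.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D cross-correlation: x is (C_in, L), w is (C_out, C_in, K)."""
    c_out, c_in, k = w.shape
    length_out = x.shape[1] - k + 1
    out = np.zeros((c_out, length_out))
    for o in range(c_out):
        for i in range(c_in):
            for t in range(length_out):
                out[o, t] += np.dot(x[i, t:t + k], w[o, i])
        out[o] += b[o]
    return out

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def tiny_cnn(x, params):
    """Forward pass: 4 conv+ReLU blocks, then a global average
    squashed to a fire probability."""
    h = x[None, :]                    # (1, 24)
    for w, b in params:
        h = relu(conv1d(h, w, b))
    return float(sigmoid(h.mean()))

rng = np.random.default_rng(0)
# Assumed channel layout 1->2->2->2->1 with kernel size 3
shapes = [(2, 1, 3), (2, 2, 3), (2, 2, 3), (1, 2, 3)]
params = [(0.1 * rng.standard_normal(s), np.zeros(s[0])) for s in shapes]

p_fire = tiny_cnn(np.linspace(0.0, 1.0, 24), params)
n_params = sum(w.size + b.size for w, b in params)
print(f"P(fire) = {p_fire:.3f}, parameters = {n_params}")
```

Under these assumptions the whole classifier has only a few dozen parameters, which is consistent with the embedded-deployment constraint even if the real figure-4 architecture differs in detail.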
We use the Adam algorithm [11] with a learning rate of 3e-4 to train the classifier. The real data is split into a training set and a validation set at a ratio of 4:1. Within each mini-batch, the ratio of real to artificial data is 1:1. For artificial data generation, we sample random labels from a uniform distribution to balance positive and negative samples, and sample random noise from a Gaussian distribution. Binary cross entropy loss is used to guide the training.
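The data protocol described above can be sketched as follows; the batch size is arbitrary, and `fake_batch` stands in for Generator output since the GAN itself is not reproduced here.

```python
import numpy as np

def split_train_val(data, ratio=4, seed=0):
    """Shuffle and split real data 4:1 into train and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    cut = len(data) * ratio // (ratio + 1)
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

def bce(p, y, eps=1e-7):
    """Binary cross entropy between predicted probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

rng = np.random.default_rng(0)

# Mini-batch assembly at a 1:1 real-to-artificial ratio; labels for the
# artificial half are drawn uniformly to balance positives and negatives.
real_batch = [rng.random(24) for _ in range(8)]
fake_labels = rng.integers(0, 2, size=8)          # uniform 0/1 labels
fake_batch = [rng.random(24) for _ in fake_labels]  # stand-in for G output
batch = real_batch + fake_batch                   # 8 real + 8 artificial

train, val = split_train_val(list(range(10)))
print(len(train), len(val), len(batch))           # 8 2 16
```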

Experiment
To verify the performance of our classifier, we conduct controlled experiments in the same scenes used to collect the training data. The first four experiments are carried out in a fire laboratory following the Chinese national standard GB 4715-2005. The joss stick is lit in a pipe to concentrate the smoke and guide it to the OSDs under test. The smoke chamber is a standard piece of equipment for testing OSDs, as stipulated in GB 4715-2005. The water steam experiment is carried out in a box with an electric stove to heat water; when the lid is removed, a burst of water steam pounds the OSDs mounted on the box ceiling, as shown in figure 5.

Figure 3. Real data versus artificial data: row a is real data from fire events, row b is artificial data from fire events, row c is real data from non-fire events, and row d is artificial data from non-fire events.

The experimental results are recorded in table 2; a record of 6/15 means we repeated that experiment 15 times and the OSD under test gave an alarm 6 times. The first 6 scenes are fire events, in which we expect the OSDs to give alarms; both OSDs pass these tests with a 100% alert rate. In the water steam scene, where we expect the OSDs to keep silent, the OSD without our classifier experiences 13 false alarms while the one with our classifier has 6, which means more than 50% of the false alarms are filtered by our classifier.
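The headline figure follows directly from the water steam counts in table 2:

```latex
\text{false alarm reduction} = \frac{13 - 6}{13} \approx 53.8\% > 50\%
```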
From the false alarm records of our classifier, we find that in some cases the time series curve of a water steam blast appears as a step signal, which is quite similar to the curve of the joss stick. Under the principle of 'zero leakage alarm', we prefer to give alarms for all step signals. Such a step signal arises when a large amount of particles enters the optical chamber within a short time, say a few sample periods, so that the light sensor reaches saturation. A higher sampling frequency may relieve the problem; however, we must also balance performance against battery lifetime. More investigation will be made into this aspect.

Figure 5. The steam source we used is not strong enough to trigger OSDs in an empty room, so we use a box to concentrate the steam. An OSD with our classifier and a common OSD are mounted side by side on the box ceiling, about 50 cm above the electric stove used to produce steam. To exclude the influence of the mounting location, we frequently exchanged the positions of the two OSDs during the experiments. After the experiments, condensation was found on the inner surface of the optical chamber.

Conclusions
In this work we present a deep learning approach to reduce false alarms for OSDs. We use a conditional Wasserstein GAN with gradient penalty to augment and balance the training data. A compact and efficient 1-D CNN classifier is designed and trained to distinguish fire events from non-fire events. Our experiments demonstrate that the classifier filters more than 50% of the false alarms caused by water steam.