HDCDS-CDG: A Hierarchically Diffused Connected Dominating Sets based Compressed Data Gathering Scheme

Reducing energy consumption and prolonging network lifetime are among the most important issues in Wireless Sensor Networks (WSNs). This paper proposes a Compressed Data Gathering (CDG) scheme based on Hierarchically Diffused Connected Dominating Sets (HDCDS). In the proposed scheme, a connected dominating set is constructed as the backbone of the network. All other nodes, i.e., non-backbone nodes, are connected to the backbone. Nodes in the backbone collect data from non-backbone nodes using the traditional data collection scheme. Data within the backbone are transmitted to the base station in the CDG manner. Theoretical analysis and simulations show that, compared with the LEACH methods and traditional compressive sensing-based data collection methods, HDCDS-CDG not only reduces the consumption of network resources but also increases the lifetime of the network.


Introduction
A Wireless Sensor Network (WSN) is composed of a large number of stationary or moving sensors organized in a self-organizing, multi-hop manner [1]. In WSNs, reducing energy consumption and balancing network loads is a very important problem. The Low Energy Adaptive Clustering Hierarchy (LEACH) scheme is widely used in WSNs [2][3]. Compared to common routing algorithms, LEACH has several advantages: 1. Cluster heads are randomly selected, which distributes the network energy consumption and hence balances network loads; 2. A clustering mechanism is adopted to manage cluster heads. As nodes in a higher level of the network, cluster heads are responsible for managing cluster members, which allows free selection of transmission paths and storage of routes; 3. The clustering mechanism scales well. The self-organizing ability and the sequential election of cluster heads extend the lifetime of the network. However, the LEACH scheme still has disadvantages: cluster heads are not uniformly distributed in the network, the selection does not consider the location and residual energy of nodes, and cluster heads consume excessive energy in the stable phase. To address these disadvantages, reference [4] proposed the LEACH-EDH method, which fully considers the energy and geographical location of nodes during the clustering process, and adopts a probabilistic hybrid routing algorithm to reduce energy consumption in the transmission phase. Reference [5] proposed the LEACH-improved method, which improves the calculation of the threshold by adding a spacing factor, a residual energy factor and a node density factor, and thus reduces the energy consumption of nodes.
This paper considers the energy consumption and network lifetime of data acquisition and transmission in WSNs. Traditional methods generally compress data in the network by exploiting the correlation of data, and then send it to the base station [6][7]. Compressive Sensing (CS) technology, which is a new direction in mathematics and information science, is very popular [8]. The idea is to preserve the useful information and discard the useless part of the data. Reference [9] applied the CS technology to the data collection process in WSNs, and proposed the Compressive Data Gathering (CDG) method. Unlike the traditional data collection methods, CDG collects linear combinations of the original data instead of the original data itself. The base station is able to recover the original data as long as a sufficient number of linear combinations are collected. CDG not only reduces the energy consumption but also balances the network loads.
In this paper, we propose a Hierarchically Diffused Connected Dominating Set-based Compressed Data Gathering (HDCDS-CDG) method by combining CDG and the Connected Dominating Set (CDS). It has the following advantages: 1. In HDCDS-CDG, hierarchical diffusion is used to find cluster heads, i.e., nodes in the connected dominating set. The cluster heads are reasonably distributed, and the residual energy of each node is considered when selecting cluster heads. This avoids the problem that a node is selected as a cluster head several times due to randomness, and thus extends the network lifetime. Since HDCDS-CDG uses the idea of CDG to transmit linear combinations of the original data, the number of transmissions on each link within the CDS is fixed to m. The data can finally be transmitted to the aggregation node with relatively few transmissions.
2. Theoretical analysis and numerical experiments show that, compared with the LEACH algorithm, the selection and distribution of cluster heads are more reasonable. During data transmission, the energy consumption of cluster heads is reduced and the efficiency of data transmission is improved. Overall, the HDCDS-CDG algorithm achieves lower energy consumption, more efficient data transmission and a longer network lifetime.

Hierarchically Diffused Connected Dominating Set-based Compressed Data Gathering
The HDCDS-CDG scheme first constructs a backbone consisting of nodes with large transmission gains. Nodes in the backbone can transmit data to the sink within a relatively small number of hops. We divide the data collection into two phases: Phase 1: non-backbone nodes transmit original data to the backbone; Phase 2: nodes in the backbone transmit linearly combined data to the sink using the idea of pipelining.

2.1 The network model
Suppose the network can be represented as an undirected graph G=(V, E), where V is the set of nodes with |V|=n, and E is the set of edges. Suppose nodes are randomly distributed in a unit square, and the cluster heads and cluster members are fixed after the deployment. Moreover, we assume that each node has the same initial energy E_0 and communication range R_c. Each node is able to fuse data and sense its own residual energy. A unique identifier, i.e., an ID from (1, 2, ..., n), is used to identify each node in the network. Before introducing the proposed scheme, we first introduce some definitions: An important step is to construct the backbone of the network. In this paper, the backbone is a connected dominating set. A dominating set is called a Connected Dominating Set (CDS) if every pair of its nodes is connected by a path within the set. The fewer nodes in the CDS, the higher the efficiency of data transmission. A key point is therefore to find a CDS of small size.
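The network model and the CDS definition above can be illustrated with a minimal Python sketch. It assumes a unit-disk connectivity rule (two nodes are linked when their distance is at most R_c), which the text implies but does not state explicitly; the function names `build_unit_disk_graph` and `is_connected_dominating_set` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import deque

def build_unit_disk_graph(n, comm_range, seed=0):
    """Place n nodes (IDs 1..n) uniformly in the unit square.

    Assumption: two nodes share an edge when their Euclidean
    distance is at most comm_range (R_c in the text).
    """
    rng = random.Random(seed)
    pos = {i: (rng.random(), rng.random()) for i in range(1, n + 1)}
    adj = {i: set() for i in pos}
    for u in pos:
        for v in pos:
            if u < v and math.dist(pos[u], pos[v]) <= comm_range:
                adj[u].add(v)
                adj[v].add(u)
    return pos, adj

def is_connected_dominating_set(adj, backbone):
    """Check both CDS conditions for a candidate backbone."""
    backbone = set(backbone)
    if not backbone:
        return False
    # Domination: every node is in the backbone or adjacent to it.
    for u in adj:
        if u not in backbone and not (adj[u] & backbone):
            return False
    # Connectivity: BFS that only walks through backbone nodes.
    start = next(iter(backbone))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] & backbone:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == backbone
```

On a four-node path 1-2-3-4, for instance, the set {2, 3} passes both checks, while {1, 4} dominates every node but is not connected.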

2.2 Selection of cluster heads
The construction of the CDS is based on a Maximal Independent Set (MIS). Initially, all nodes are white, and finally these nodes will be converted to black or gray. Black nodes represent cluster heads. During the process of
2. After receiving the M_Dominatee message, white nodes mark themselves as green. Meanwhile, they enter the election state and transmit the M_Active message to their neighbors. This message carries information about the number of white neighbors and the residual energy. A white node marks itself as gray if both the M_Dominator message and the M_Dominatee message are received. Then it broadcasts the M_Dominatee message. If a green node receives the M_Active message, all of its green neighbors join the election. The green neighbor with the maximum residual energy wins and marks itself as black. Then the winner broadcasts the M_Dominator message. After receiving the M_Dominator message, a green node marks itself as gray and then broadcasts the M_Dominatee message.
3. If a gray node receives the M_Dominator (respectively M_Dominatee) message, the color of the transmitting node in its information table is marked as black (respectively gray). This process is repeated until no white nodes remain in the network.
Stage 3: Constructing the CDS.
1. Each gray node marks itself as blue if it has at least 2 black neighbors which are in different levels. If all of the black neighbors connect to at least one blue node, the gray node remains unchanged.
2. After all of the gray nodes are checked, each blue node marks itself as black. At this point, all of the black nodes constitute a CDS.
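The following Python sketch shows one way an MIS-based CDS can be built with residual energy taken into account. It is a centralized simplification and not the message-passing protocol described above: it assumes a connected graph, assigns levels by breadth-first search from a hypothetical `root` node, elects MIS nodes greedily by (level, residual energy, ID), and uses BFS parents as connectors instead of the blue-node rule of Stage 3. The function name `build_cds` and the tie-breaking order are assumptions made for illustration.

```python
from collections import deque

def build_cds(adj, energy, root):
    """Centralized sketch: energy-aware MIS plus BFS-parent connectors.

    adj:    dict node -> set of neighbors (assumed connected graph)
    energy: dict node -> residual energy
    root:   starting node for the level assignment (assumption)
    Returns (heads, connectors); their union is the backbone.
    """
    # BFS from the root gives every node a level and a tree parent.
    level, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in level:
                level[v], parent[v] = level[u] + 1, u
                queue.append(v)

    # Greedy MIS: lower level first, then higher energy, then smaller ID.
    color = {u: "white" for u in level}
    order = sorted(level, key=lambda u: (level[u], -energy[u], u))
    for u in order:
        if color[u] == "white":
            color[u] = "black"            # u becomes a cluster head
            for v in adj[u]:
                if color.get(v) == "white":
                    color[v] = "gray"     # v is now dominated

    heads = {u for u in color if color[u] == "black"}
    # The BFS parent of a non-root head is gray and is dominated by a
    # head that was elected earlier, so adding it links the two heads.
    connectors = {parent[u] for u in heads if parent[u] is not None}
    return heads, connectors
```

Because each connector joins a head to a head elected earlier in the order, every head is linked back to the root, so the union of the two returned sets is connected as well as dominating.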

2.3 The clustering
After the selection of cluster heads, each cluster member needs to choose its corresponding cluster head. Each node in the CDS sends an M_Head message with its own residual energy to its neighbors, indicating that it is a cluster head. After receiving M_Head messages from several cluster heads, the cluster member searches its information table for the cluster head with the largest residual energy and sends an M_Join message to it. Finally, each cluster head deletes the nodes from which no M_Join message is received, and updates the corresponding information.
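The member-side choice described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `assign_clusters` is hypothetical, message exchange is replaced by direct lookups, and ties in residual energy are broken by the smaller node ID, which the text does not specify.

```python
def assign_clusters(adj, heads, energy):
    """Each non-head node joins its neighboring head with the most energy.

    adj:    dict node -> set of neighbors
    heads:  iterable of cluster-head IDs (nodes in the CDS)
    energy: dict node -> residual energy
    Returns dict head -> set of members that joined it.
    """
    clusters = {h: set() for h in heads}
    for u in adj:
        if u in clusters:
            continue  # heads do not join other clusters
        candidates = [h for h in adj[u] if h in clusters]
        if not candidates:
            continue  # cannot happen when heads form a dominating set
        # Largest residual energy wins; smaller ID breaks ties (assumption).
        best = max(candidates, key=lambda h: (energy[h], -h))
        clusters[best].add(u)
    return clusters
```

A head whose entry stays empty received no join request, which corresponds to the final clean-up step in which each cluster head keeps only the nodes that actually joined it.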

2.4 Data transmissions in the stable state
Data transmissions in the stable state are divided into two phases. In phase 1, each cluster member transmits the original data to its corresponding cluster head. In phase 2, data are transmitted to the sink within the backbone by adopting the CDG scheme. In the CDG scheme, each node sends a linear combination of the original data instead of the original data itself. Figure 2(a) shows the CDG scheme, where each node linearly combines its own data with the data received from the previous hop and transmits the result to the next hop. Figure 2(b) shows the idea of the proposed scheme in a linear network. It can be seen that, compared with the pure CDG scheme, the proposed scheme reduces the number of transmissions.
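The hop-by-hop combination used by CDG can be sketched as follows for a chain of nodes. The sketch assumes random coefficients drawn from {-1, +1}, which is one common choice and is not specified in the text; the name `cdg_chain` is hypothetical. Recovery of the original readings at the sink (sparse reconstruction) is outside the scope of this example.

```python
import random

def cdg_chain(readings, m, seed=0):
    """Simulate CDG along a chain that ends at the sink.

    readings: list of node readings, ordered from farthest node to sink
    m:        number of linear combinations (measurements)
    Returns (phi, y, transmissions), where y[j] = sum_i phi[j][i] * x_i.
    """
    rng = random.Random(seed)
    n = len(readings)
    # phi[j][i] is node i's coefficient for measurement j
    # (assumed to be +1 or -1 with equal probability).
    phi = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]

    partial = [0.0] * m   # running sums handed from hop to hop
    transmissions = 0
    for i, x in enumerate(readings):
        # Node i adds its own weighted reading to each running sum ...
        partial = [partial[j] + phi[j][i] * x for j in range(m)]
        # ... and forwards exactly m values to the next hop.
        transmissions += m
    return phi, partial, transmissions
```

Every node forwards m values regardless of its position in the chain, which is why the per-link load under CDG is fixed at m.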

Theoretical analysis
In this paper, we only consider the energy consumed by data transmissions, because the energy consumed by computation is negligible. In phase 1, each cluster member transmits its original data to the corresponding cluster head. Therefore, the number of transmissions in phase 1 is given by |D|, where D is the set of cluster members. In phase 2, each backbone node transmits the fused data m times. Therefore, the number of transmissions in phase 2 is given by m|I∪W|, where I is the set of cluster heads and W is the set of nodes in the CDS excluding cluster heads. Next, we use Lemma 1 proposed by Oler [10] to determine |I∪W|.
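The transmission counts of the two phases can be compared with pure CDG using a short Python sketch. It assumes that every backbone node transmits exactly m values in phase 2, as stated above, and that in pure CDG every one of the n nodes transmits m values; the function names and the example numbers are hypothetical.

```python
def hdcds_cdg_transmissions(num_members, num_backbone, m):
    """Phase 1: one raw transmission per cluster member.
    Phase 2: m transmissions per backbone node (heads plus connectors)."""
    return num_members + m * num_backbone

def pure_cdg_transmissions(n, m):
    """Pure CDG: every node forwards m linear combinations."""
    return n * m

# Hypothetical example: 100 nodes, 20 of them in the backbone, m = 10.
n, backbone, m = 100, 20, 10
members = n - backbone
saving = pure_cdg_transmissions(n, m) - hdcds_cdg_transmissions(members, backbone, m)
```

The saving grows with the number of non-backbone nodes, since each of them sends one value instead of m.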
Lemma 1: Suppose a compact convex area C contains non-overlapping disks with a diameter of 1. Then the total number of disks does not exceed

$$\frac{2}{\sqrt{3}}A(C) + \frac{1}{2}P(C) + 1 \qquad (1)$$

In equation (1), A(C) is the area of C, and P(C) is the perimeter of C. According to Lemma 1, the maximum number of non-overlapping circles with radius r is given . Since I is an MIS, we have that ( By using results in the reference [11], we can get equation (3) where * i represents the number of neighbors for any cluster head node.

Numerical experiments
In this paper, MATLAB is used for the simulation experiments. We assume that n nodes are randomly deployed in a 100 m × 100 m square area. The initial energy of each node is E_0 = 5 J, the free-space attenuation coefficient is ε_fs = 10 pJ/bit/m^2, the multipath channel attenuation coefficient is ε_mp = 0.011 pJ/bit/m^4, the data packet size is set to DM = 4000 bit, the distance threshold is r = 30 m, the matrix sparsity in the compressed data collection is set to k = 100, and the probability of a node becoming a cluster head in LEACH is set to p = 10%. To smooth out randomness, each result is averaged over 50 experiments.
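The parameters above match those of the first-order radio model commonly used in LEACH studies, although the text does not name the model. Assuming that model, the energy of one transmission can be sketched in Python as below. The electronics energy `E_ELEC` is not given in the text and is purely an assumed placeholder; the function name `tx_energy` is hypothetical.

```python
E_FS = 10e-12        # J/bit/m^2, free-space coefficient (from the text)
E_MP = 0.011e-12     # J/bit/m^4, multipath coefficient (from the text)
D_THRESHOLD = 30.0   # m, distance threshold r (from the text)
PACKET_BITS = 4000   # bits per data packet (from the text)
E_ELEC = 50e-9       # J/bit, ASSUMED value; not stated in the text

def tx_energy(bits, distance, e_elec=E_ELEC):
    """Energy in joules to transmit `bits` over `distance` meters.

    Assumed first-order radio model: free-space loss (d^2) below
    the threshold, multipath loss (d^4) at or above it.
    """
    if distance < D_THRESHOLD:
        amplifier = E_FS * distance ** 2
    else:
        amplifier = E_MP * distance ** 4
    return bits * (e_elec + amplifier)
```

Under this model the two amplifier terms are equal at a distance of sqrt(ε_fs/ε_mp), about 30.2 m, which is consistent with the stated threshold of 30 m.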
E_min and E_ave represent the minimum residual energy and the average proportion of the residual energy of each node, respectively. Figure 3 shows the ratio of the normalized E_min between the HDCDS-CDG scheme and the LEACH scheme. It can be observed that the node with the least residual energy in the LEACH scheme consumes more energy than that in the HDCDS-CDG scheme. Moreover, as the network size increases, the least residual energy in the LEACH scheme decreases faster than that in the HDCDS-CDG scheme. This shows that the HDCDS-CDG scheme performs better than the LEACH scheme from the perspective of network lifetime. Figure 4 compares the performance of the HDCDS-CDG scheme and the LEACH scheme in terms of total energy consumption. It can be observed that, as the size of the network increases, the gap in total energy consumption between the HDCDS-CDG scheme and the LEACH scheme increases. This implies that the proposed scheme is more suitable for large-scale networks. Figure 5 compares the lifetime of the HDCDS-CDG scheme and the LEACH scheme. The lifetime of a network is defined as the time until the first node dies. It can be observed that, as the network size increases, the network lifetime decreases. However, the network lifetime in the HDCDS-CDG scheme decreases more slowly than that in the LEACH scheme. This implies that the HDCDS-CDG scheme is able to prolong the network lifetime.

Summary
This paper introduces a Hierarchically Diffused Connected Dominating Set-based Compressed Data Gathering method. In traditional data collection methods for wireless sensor networks, the nodes near the base station need to forward the data from the nodes far away from the base station. This often causes load imbalance across the whole network, so that the nodes near the base station consume energy faster than the nodes far away from it. The HDCDS-CDG algorithm adopts a hierarchically diffused connected dominating set election method, which makes the distribution of cluster heads more balanced and their selection more reasonable, and thereby saves energy and improves efficiency.