Research on Federated Learning Algorithms under Communication Constraints

Federated learning (FL) is a form of distributed machine learning. Because Internet of Things devices have limited communication resources, training neural network models in the FL setting is often inefficient. In this paper, we propose two new communication-efficient federated learning algorithms: the static quantization federated averaging algorithm (SQFedAvg) and the dynamic quantization federated averaging algorithm (DQFedAvg). Both consist of two stages: the model is first optimized locally on the client side, and the updated parameters are then quantized and sent to the server, reducing computation, storage, and communication costs and speeding up training. Finally, two models, a convolutional neural network (CNN) and a two-layer perceptron (2NN), together with two standard benchmark data sets (CIFAR-10 and MNIST), are used to verify the effectiveness and efficiency of the proposed federated learning algorithms.


Introduction
In the past few decades, artificial intelligence technology has made great progress in various real-world applications. The development of these AI-based applications relies heavily on large amounts of data (also known as big data). In current implementations, the collected real-time data is uploaded to a cloud server and processed there in a centralized manner [1]. However, centralized processing on cloud servers has raised public concern about sharing privacy-sensitive data and may incur unacceptably long delays and heavy traffic burdens. To alleviate these challenges, it is preferable to decouple model training from the remote collection and centralized processing of raw data. This paper studies a learning setting whose goal is to train a high-quality centralized model while the training data remains distributed across a large number of clients; this setting is called federated learning (FL) [2], [3]. In FL, each client computes an update to the local model based on its local training data set and then uploads the updated local model parameters to the FL server. The FL server aggregates the updates from all clients to improve the global model and sends the newly updated global model back to the clients. These steps are repeated until convergence or until the required accuracy is reached. However, because the clients and the central server constantly communicate to exchange a large number of model parameters, the communication time, the number of communication rounds, and the total number of bits transmitted are all high, resulting in high communication overhead. In fact, communication overhead and communication efficiency have become the bottleneck in making full use of distributed computing resources to accelerate machine learning model training [5], [6].
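The client–server round described above can be sketched as follows. This is a toy illustration, not the paper's implementation: `local_update`, `server_aggregate`, and the simple linear model standing in for a neural network are our own illustrative names and assumptions.

```python
import numpy as np

def local_update(weights, data, lr=0.01, epochs=1):
    """Client step: gradient descent on the client's local data.

    `data` is a list of (x, y) pairs; the 'model' is a toy linear
    regressor w @ x trained with a squared-error loss."""
    w = weights.copy()
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w @ x - y) * x  # gradient of (w @ x - y)^2 w.r.t. w
            w -= lr * grad
    return w

def server_aggregate(client_weights, client_sizes):
    """Server step (FedAvg): average client models weighted by data size."""
    total = sum(client_sizes)
    return sum(n / total * w for w, n in zip(client_weights, client_sizes))

# One communication round with two toy clients.
rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [[(rng.standard_normal(3), 1.0) for _ in range(5)],
           [(rng.standard_normal(3), -1.0) for _ in range(8)]]
updated = [local_update(global_w, d) for d in clients]
global_w = server_aggregate(updated, [len(d) for d in clients])
```

In a full FL system this round repeats until convergence, and it is the repeated upload of `updated` that the quantization methods below compress.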
In this context, communication-efficient federated learning methods have recently attracted increasing attention, and more and more researchers are focusing on reducing the communication cost of federated learning.

Quantization method
2.1.1. Static quantization. To reduce communication overhead, this paper adopts quantization as a means of improving communication efficiency. The simplest form is scalar quantization, which can be described as follows: given a real number x (such as a gradient or a weight in a neural network) and a quantization step Δ, the quantizer produces the integer i = ⌊x/Δ⌋, where ⌊·⌋ denotes the floor (round-down) function. The integer i is sometimes called the quantization index.
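A minimal sketch of scalar quantization, assuming the step size Δ is given; reconstructing each value at the midpoint of its quantization cell is one common convention, not something the paper specifies.

```python
import math

def quantize_index(x, step):
    """Scalar quantization: map real x to the integer index i = floor(x / step)."""
    return math.floor(x / step)

def dequantize(i, step):
    """Reconstruct a representative value (midpoint of cell i)."""
    return (i + 0.5) * step
```

For example, `quantize_index(0.37, 0.1)` yields the index 3, and `dequantize(3, 0.1)` reconstructs 0.35, so only the small integer index needs to be transmitted.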
In computer applications, a well-known quantization scheme is uniform quantization (static quantization), in which M bits determine the quantization precision. When M = 8, a 32-bit floating-point value can be quantized to an 8-bit integer by q = ⌊(x − x_min)/Δ + 0.5⌋, where Δ = (x_max − x_min)/(2^M − 1) is the step size. In this formula, 0.5 is the offset; adding 0.5 places each quantized representation in the middle of its input interval, ensuring quantization to the nearest level and thereby reducing the quantization error. Since the distributions of gradient values and weights encountered in practice are generally not uniform, uniform quantization may incur a relatively large loss of accuracy.
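The uniform M-bit quantizer above can be sketched as follows; the function names and the use of the tensor's own min/max as the input range are our assumptions for illustration.

```python
import numpy as np

def uniform_quantize(x, M=8):
    """Uniform (static) M-bit quantization of a float array.

    The +0.5 offset implements round-to-nearest, as described above."""
    lo, hi = x.min(), x.max()
    delta = (hi - lo) / (2**M - 1)          # step size over the input range
    q = np.floor((x - lo) / delta + 0.5)    # integer levels in [0, 2^M - 1]
    return q.astype(np.uint8), lo, delta

def uniform_dequantize(q, lo, delta):
    return lo + q.astype(np.float32) * delta

w = np.linspace(-1.0, 1.0, 5, dtype=np.float32)
q, lo, delta = uniform_quantize(w)
w_hat = uniform_dequantize(q, lo, delta)    # within delta/2 of the original w
```

With M = 8, each 32-bit weight is transmitted as a single byte (plus the per-tensor `lo` and `delta`), a 4x reduction in payload.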
2.1.2. Dynamic quantization. Alternatively, the weights can be quantized from 32-bit floating point (FP32) to 8-bit integer (int8) using a scale s and zero point z computed from each tensor's actual value range, q = round(x/s) + z (formula (3)). The participant sends the quantized weights to the server. Conversely, after receiving the quantized weights, the server can decode int8 back to FP32 by inverting the mapping, x ≈ s·(q − z). The dynamic quantization process is therefore equivalent to an encoding–decoding process [11]. Because the central server performs this decoding step, the dynamic quantization error is greatly reduced.
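The encode/decode pair can be sketched as below. Since formula (3) is not reproduced here, this uses a standard affine int8 mapping with a per-tensor scale and zero point derived from the tensor's own min/max ("dynamic"); the function names are ours.

```python
import numpy as np

def encode_int8(w):
    """Client side: map FP32 weights to int8 using a scale and zero point
    computed dynamically from this tensor's own min/max."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0                     # 255 levels span [lo, hi]
    zero_point = np.round(-lo / scale) - 128      # align lo with the int8 minimum
    q = np.clip(np.round(w / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def decode_fp32(q, scale, zero_point):
    """Server side: invert the mapping to recover approximate FP32 weights."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.default_rng(1).standard_normal(10).astype(np.float32)
q, s, z = encode_int8(w)
w_hat = decode_fp32(q, s, z)    # reconstruction error is on the order of s/2
```

Only `q`, `s`, and `z` travel over the network; the server's decode step is what distinguishes this scheme from simply training on low-precision weights.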

Results and Discussion
In this section, the algorithms proposed in this paper are compared, through numerical simulation, with several classic federated learning algorithms, namely the federated learning algorithm based on stochastic gradient descent (FedSGD), the federated learning algorithm based on the Adam optimizer (FedAdam), and the federated averaging algorithm (FedAvg), in order to demonstrate the advantages of the proposed algorithms.

Experimental setup
The simulation experiments in this paper are based on two classic benchmark data sets (MNIST and CIFAR-10) and two models. Model 1 is a multilayer perceptron with two hidden layers of 200 units each, abbreviated 2NN. Model 2 is a CNN with two convolutional layers, a fully connected layer, and a final softmax output layer. For the learning process, the static learning rate in all settings is 0.01, the number of quantization bits b is 8, the number of local client epochs E is set to 10, the mini-batch size B used for client updates is set to 10, the SGD momentum is 0.5, and Adam's weight decay is set to 1e-4. This paper uses two metrics to compare the proposed algorithms with the baselines: the first is the convergence accuracy of the global model trained on each data set; the second is the number of communication rounds required to reach that convergence accuracy.
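For reference, the setup above can be collected into a single configuration; the key names here are our own shorthand, not identifiers from the paper's code.

```python
# Hyperparameters from the experimental setup (key names are illustrative).
config = {
    "datasets": ["MNIST", "CIFAR-10"],
    "models": ["2NN", "CNN"],       # 2NN: MLP with two 200-unit hidden layers
    "learning_rate": 0.01,          # static learning rate, all settings
    "quantization_bits": 8,         # b
    "local_epochs": 10,             # E, client-side iterations per round
    "batch_size": 10,               # B, client mini-batch size
    "sgd_momentum": 0.5,
    "adam_weight_decay": 1e-4,
}
```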

Analysis of experimental results
Test accuracy: in terms of final convergence accuracy, DQFedAvg is close to FedAvg. Although the accuracy of the quantized model is not optimal, the accuracy achieved by our optimization method is sufficient for our needs, because our goal is to evaluate the optimization method, not to reach the highest possible accuracy on this task. Communication rounds required to reach convergence accuracy: as can be observed in the figure, when testing the CNN model on the MNIST data set, SQFedAvg and DQFedAvg reduce the number of communication rounds required to reach convergence accuracy by 17.69% and 20.14%, respectively, compared with the FedAvg baseline. Compared with FedSGD and FedAdam, SQFedAvg reduces the required communication rounds by 48.95% and 54.17%, respectively, and DQFedAvg reduces them by 50.63% and 55.68%, respectively.
In addition, DQFedAvg improves accuracy more than SQFedAvg does. For example, when testing the CNN model on the MNIST data set, the convergence accuracy of DQFedAvg is 10.75% higher than that of SQFedAvg. Similarly, the figure clearly shows that on the other data sets and models, with essentially the same number of communication rounds, DQFedAvg reaches a significantly higher accuracy at convergence than SQFedAvg.

Conclusions
In this article, for federated learning under communication constraints, two efficient quantization methods are proposed: static quantization and dynamic quantization. The experimental results show that, compared with traditional federated learning algorithms, the two quantization methods can significantly reduce the number of communication rounds required for convergence while maintaining comparable model accuracy.