Integrated Energy System Operation Optimization Based on Reinforcement Learning

Each subject in an integrated energy system has its own interests and demands, so energy dispatch must be optimized with the help of multi-subject game theory. To address this problem, this paper proposes a reinforcement learning-based multi-subject operation optimization method for integrated energy systems. First, a multi-subject integrated energy system model comprising energy suppliers, park service providers and users is constructed. Second, a game search method based on reinforcement signals is proposed to speed up the solution of the multi-subject game. Finally, a simulation of an example integrated energy system verifies the effectiveness and speed of the proposed method.


Introduction
An integrated energy system is a multi-energy coupled and complementary energy supply system. It breaks with the original mode in which each energy supply system is planned and operated on its own, integrates multiple types of energy sources under joint dispatch, and can significantly improve the utilization efficiency of the various energy sources [1]. The study of integrated energy systems is therefore of great significance to the development of modern society.
Reference [2] takes full account of user-side demand response and realizes multi-time-scale, multi-energy coupled optimization of integrated energy systems. Reference [3] aims to reduce the operation cost and flexibly dispatches the equipment inside the park to achieve optimal economic dispatch of the integrated energy system. Reference [4] proposes an expression for energy utilization efficiency that is suitable for evaluating integrated energy systems. Reference [5] establishes a park integrated energy system (DIES) model to realize joint intra-day dispatch of the park integrated energy system. Reference [6] proposes an economic optimal dispatch model based on risk quantification and verifies its effectiveness on a typical summer day.

Integrated Energy Systems
This paper adopts an integrated energy system model that contains the following parts:
(1) Supply side: the power grid, the heat source plant and the energy supplier. The heat source plant provides only heat, whereas the energy supplier provides three kinds of energy: electricity, heat and gas.
(2) Park service provider: responsible for purchasing energy from the energy supplier, selectively dispatching the various types of equipment inside the park, and supplying energy to users.
(3) Users: electric, thermal and gas loads. Users can take part in demand response, which takes the form of interruptible load.

Energy Suppliers
As the supplier of energy to the park, the energy supplier mainly dispatches its energy production equipment and interacts with the park service provider in the game over each type of energy. Its profit is the difference between the revenue from energy sales to the park service provider and the cost of energy supply, as shown in equation (1). In that formula, the satisfaction coefficients take positive values.
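The profit definition above can be illustrated with a minimal sketch. It assumes the profit is simply sales revenue minus supply cost, summed over the energy types; the satisfaction term is left out because its functional form is not given here. The function name, the dictionary layout and all numbers are hypothetical and serve only as an illustration.

```python
def supplier_profit(prices, sales, unit_costs):
    """Revenue from energy sales minus the cost of supplying that energy.

    All arguments are dicts keyed by energy type (hypothetical layout):
    prices and unit_costs in CNY/MWh, sales in MWh.
    """
    revenue = sum(prices[k] * sales[k] for k in sales)
    cost = sum(unit_costs[k] * sales[k] for k in sales)
    return revenue - cost


# Illustrative values only, not taken from the paper.
prices = {"electricity": 700.0, "heat": 600.0, "gas": 300.0}
sales = {"electricity": 2.0, "heat": 1.0, "gas": 0.5}
unit_costs = {"electricity": 500.0, "heat": 400.0, "gas": 200.0}
profit = supplier_profit(prices, sales, unit_costs)
```

With these made-up inputs the revenue is 2150 CNY and the supply cost is 1500 CNY, so the sketch returns a profit of 650 CNY.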

Service Providers
The comprehensive cost of the service provider includes the demand response compensation paid to the park users. $C_t^{DR}$ denotes the compensation cost paid by the park service provider to the users participating in demand response, which is calculated as follows.
In the formula, $C_t^{ES}$ denotes the total cost of energy purchased by the park service provider from the energy supplier, and $C_t^{E}$ denotes the total cost of electricity purchased by the park service provider from the grid. The energy allocation relationship of the service provider is represented by the following equation.

User
Users weigh their energy purchase cost against their comfort function to decide the amount of their interruptible load, and the objective function of the users is as follows. In the formula, $y_k$ denotes the preference coefficient of the user for energy type $k$, which takes a positive value.

Case Study
In this paper, an integrated energy system model is used for simulation verification. The simulation duration is set to 24 h. The price of electricity purchased from the grid is 715 CNY/MWh, the price of heat purchased from the heat source plant is 650 CNY/MWh, the grid crossing fee is 65 CNY/MWh, the load shedding compensation is 32.5 CNY/MWh, and the unit penalty cost for environmental pollution is 19.5 CNY/MWh. The transformer efficiency is set to 0.95 and the P2G equipment efficiency to 0.7; the CHP unit has an electrical production efficiency of 0.25 and a thermal production efficiency of 0.65; and the gas boiler efficiency is set to 0.7. The simulation solves the whole game process with the game search method combined with the Nash-Q algorithm, with a learning rate of 0.01 and a discount factor of 0.9.
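To make the role of the learning rate and the discount factor concrete, the following is a minimal sketch of the standard Nash-Q value update for one agent. It is not the paper's game search method: the Nash equilibrium value of the next state is assumed to be computed elsewhere and passed in, and the function and argument names are hypothetical.

```python
ALPHA = 0.01  # learning rate from the case study
GAMMA = 0.9   # discount factor from the case study


def nash_q_update(q_value, reward, nash_value_next, alpha=ALPHA, gamma=GAMMA):
    """One Nash-Q update for a single agent and one (state, joint action) entry.

    nash_value_next is the agent's payoff in a Nash equilibrium of the
    stage game defined by the Q-tables at the next state. Computing that
    equilibrium is outside this sketch and is supplied by the caller.
    """
    target = reward + gamma * nash_value_next
    return (1.0 - alpha) * q_value + alpha * target


# Illustrative numbers only: a fresh entry moves 1% of the way to the target.
q_new = nash_q_update(q_value=0.0, reward=1.0, nash_value_next=10.0)
```

With a learning rate of 0.01, each update moves the stored value only 1% of the way toward the target, which makes learning stable but slow. This is one reason a faster game search procedure is attractive.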

Analysis of game results
The results of the game among energy suppliers, park service providers and users are shown in the figures. The price curves of the energy supplier and the park service provider in Figures 1 and 2 show that both tend to choose higher energy prices when user loads are higher, because at those times the gain in energy sales revenue from raising prices outweighs the loss in the satisfaction function.

Scene Analysis
The following four scenarios are set up in the paper for comparative analysis.
Scenario 1: a multi-subject game over electricity, heat and gas among energy suppliers, park service providers and users, with users considering demand response.
Scenario 2: a game over electricity and heat only among energy suppliers, park service providers and users, with all gas prices fixed and users considering demand response.
Scenario 3: a game over electricity only among energy suppliers, park service providers and users, with all heat and gas prices fixed and users considering demand response.
Table 1 shows the revenue results in the different scenarios. As more energy types take part in the game, the revenue of the service provider and the supplier increases. At the same time, the multi-subject game leads the upper-level service provider to take the users' autonomy into account and set energy prices appropriately, so that users voluntarily purchase more energy, which increases the users' revenue as well. The multi-subject game thus improves the interests of all parties.
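The scenarios described above differ only in which energy prices are decided in the game and which are held fixed. A minimal sketch of that configuration follows. The class and field names are hypothetical, and only the three scenarios spelled out in the text are encoded.

```python
from dataclasses import dataclass

CARRIERS = ("electricity", "heat", "gas")


@dataclass(frozen=True)
class Scenario:
    name: str
    gamed: tuple  # carriers whose prices are decided in the game
    demand_response: bool = True  # users consider demand response

    @property
    def fixed(self):
        """Carriers whose prices stay fixed in this scenario."""
        return tuple(c for c in CARRIERS if c not in self.gamed)


SCENARIOS = [
    Scenario("Scenario 1", ("electricity", "heat", "gas")),
    Scenario("Scenario 2", ("electricity", "heat")),
    Scenario("Scenario 3", ("electricity",)),
]
```

Deriving the fixed-price carriers from the gamed ones keeps each scenario definition to a single list and avoids the two sets drifting apart.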

Conclusion
This paper establishes an integrated energy system model containing multiple subjects and, on that basis, applies hierarchical control to the park integrated energy system. To solve the multi-subject game quickly, a game search method based on reinforcement signals is proposed, and conclusions are drawn from the simulation results. Introducing multi-subject game theory into the park integrated energy system allows the interests of each subject to be fully considered and thereby significantly improves the benefits of each subject.