Development of ultra-low emission prediction model for steel enterprises based on ARIMA

The iron and steel industry is an important raw material industry in the country, but it is also a major source of air pollutant emissions. As national environmental standards and requirements become stricter, implementing ultra-low emissions in the steel industry, significantly reducing pollutant emissions, promoting continuous improvement of environmental quality [1], and accelerating green, low-carbon transformation and high-quality development are both the general trend and an industry need. In line with the requirements for advancing ultra-low emission work at Baoshan Base, an ultra-low emission application system was developed to meet the ultra-low emission control requirements. Using the ultra-low emission monitoring data of Nanjing Meishan, the parameters p, d and q were first determined, and then an autoregressive integrated moving average (ARIMA) prediction model for ultra-low emissions was established.


Introduction
A. Environmental policy requirements: In April 2019, the National Ministry of Ecology and Environment and five other ministries issued the Opinions on Promoting the Implementation of Ultra-Low Emissions in the Steel Industry (Huanair [2019] No. 35), and the ecological environment authorities in Hebei, Shandong, Shanxi, Jiangsu and other provinces have successively issued local documents to promote ultra-low emissions in steel enterprises. On October 14, 2020, the Jiangsu Provincial Development and Reform Commission and the Department of Ecology and Environment issued the Notice on the Implementation of Ultra-Low Emission Differential Electricity Price Policy for Steel Enterprises. The notice requires differentiated electricity prices to be implemented from 2021 to 2025 in three areas: organized emissions, unorganized emissions and clean transportation (for clean transportation, the price increases by 0.3 min/kWh in 2021, 0.5 min/kWh in 2022, and 1 min/kWh in 2023-2025).
B. Status quo of Baosteel: The ultra-low emission ratios of exhaust gas at the four bases of Baosteel are: Baoshan Base 68%, Qingshan Base 73%, Dongshan Base 76% and Meishan Base 80%. On this index, Meishan Base is in a leading position among the four bases. However, the assessment and evaluation of ultra-low emissions currently rely on offline surveys, statistics and analysis. On the one hand, this is time-consuming and laborious, with high labor costs; on the other hand, it is difficult to make accurate and timely, and it cannot provide real-time, dynamic data. At the end of

2.1. Model Introduction
The full name of the ARIMA model is the Autoregressive Integrated Moving Average model. It is mainly composed of three parts: the autoregressive model (AR), the differencing process (I) and the moving average model (MA) [2].
The basic idea behind the ARIMA model is to use historical information about the data itself to predict the future. The value of a label at a point in time is affected both by past values of the label and by past chance events. That is, the ARIMA model assumes that the label value fluctuates around a general trend over time, where the trend is formed by the influence of historical labels, the fluctuation is formed by the influence of accidental events within a period of time, and the general trend itself is not necessarily stable [3]. In short, the ARIMA model attempts to extract the time series patterns hidden in the data by means of autocorrelation and differencing, and then uses these patterns to predict future data. The ARIMA model can both capture trends in data and handle data subject to temporary, sudden, or noisy changes. Therefore, it performs well in many time series prediction problems. Although the word "difference" does not appear in the English name of ARIMA, differencing is a key step [4].

2.2. Principle of model
The ARIMA model cannot be described without the AR, MA, and ARMA models, so these three models are described below.

AR model (autoregressive):
Autoregression is only applicable to predicting phenomena that are related to their own previous values. The mathematical expression of the model is as follows:

$$y_t = \mu + \sum_{i=1}^{p} r_i y_{t-i} + \epsilon_t \quad (1)$$

where $y_t$ is the current value, $\mu$ is the constant term, $p$ is the order, $r_i$ is the autocorrelation coefficient, and $\epsilon_t$ is the error, which must follow a normal distribution. The model reflects a linear relationship between the target value at time t and the previous p target values (times t-1 to t-p), namely:

$$y_t = \mu + r_1 y_{t-1} + r_2 y_{t-2} + \cdots + r_p y_{t-p} + \epsilon_t \quad (2)$$

MA model (moving average):
The moving average model focuses on the accumulation of the error terms in the autoregressive model. The mathematical expression of the model is as follows:

$$y_t = \mu + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} \quad (3)$$

where $\theta_i$ is the moving average coefficient and $q$ is the order. The model reflects a linear relationship between the target value at time t and the previous q error values (times t-1 to t-q), namely:

$$y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} \quad (4)$$

ARMA model (autoregressive moving average):
This model describes the combination of the autoregressive and moving average models [5], and the specific mathematical model is as follows:

$$y_t = \mu + \sum_{i=1}^{p} r_i y_{t-i} + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} \quad (5)$$

ARIMA model:
The data is first transformed into stationary data by differencing, and then the dependent variable is regressed only on its own lagged values and on the present and lagged values of the random error term. AR is the autoregressive part and p is the number of autoregressive terms; MA is the moving average part and q is the number of moving average terms; d is the number of differences needed to make the time series stationary. Generally, first-order differencing is used, and second-order differencing is rarely needed.
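The AR, MA and ARMA recursions and the differencing step described above can be made concrete with a short simulation. The following is only an illustrative sketch in plain Python; the function names `simulate_arma` and `difference` and all coefficient values are assumptions chosen for the example, not part of the system described in this paper.

```python
import random


def simulate_arma(n, mu, ar, ma, sigma=1.0, seed=0):
    """Generate n values of y_t = mu + sum(ar_i * y_{t-i}) + e_t + sum(ma_i * e_{t-i})."""
    rng = random.Random(seed)
    y, e = [], []
    for t in range(n):
        e_t = rng.gauss(0.0, sigma)  # normally distributed error term
        # Autoregressive part: weighted sum of the available past values.
        ar_part = sum(r * y[t - 1 - i] for i, r in enumerate(ar) if t - 1 - i >= 0)
        # Moving average part: weighted sum of the available past errors.
        ma_part = sum(th * e[t - 1 - j] for j, th in enumerate(ma) if t - 1 - j >= 0)
        y.append(mu + ar_part + e_t + ma_part)
        e.append(e_t)
    return y


def difference(series, d=1):
    """Apply d rounds of first-order differencing (y_t - y_{t-1})."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


ar1 = simulate_arma(200, mu=0.0, ar=[0.6], ma=[])        # AR(1), hypothetical coefficient
arma11 = simulate_arma(200, mu=0.0, ar=[0.6], ma=[0.4])  # ARMA(1,1), hypothetical coefficients

print(difference([1, 4, 9, 16, 25], d=1))  # [3, 5, 7, 9]
print(difference([1, 4, 9, 16, 25], d=2))  # [2, 2, 2]
```

Setting `ma=[]` gives a pure AR model and `ar=[]` a pure MA model; an ARIMA(p,d,q) model corresponds to fitting an ARMA(p,q) model to the output of `difference(series, d)`.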
The ACF (autocorrelation function) calculates the correlation between a sequence's current value and its past values, taking into account trends, seasonality, periodicity, and residuals [6]. It provides autocorrelation values for different lags.
The PACF (partial autocorrelation function) examines the direct association between an observed value and its k-th lagged term, while accounting for the impact of the shorter lagged terms in between (y_{t-1}, y_{t-2}, ..., y_{t-k+1}).
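The two functions can be computed directly from their definitions. The sketch below is a minimal plain-Python illustration, assuming the usual sample autocorrelation estimator and the Durbin-Levinson recursion for the partial autocorrelation; the function names `acf` and `pacf` are chosen for the example, and a library such as statsmodels would normally be used instead.

```python
def acf(x, nlags):
    """Sample autocorrelation of x for lags 0..nlags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares (normalizer)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(nlags + 1)]


def pacf(x, nlags):
    """Sample partial autocorrelation for lags 0..nlags (Durbin-Levinson recursion)."""
    rho = acf(x, nlags)
    result = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, nlags + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        # Update the lower-order coefficients for the next step.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        result.append(phi_kk)
    return result


toy = [1, 2, 3, 4, 5]  # toy series for illustration only
print([round(v, 3) for v in acf(toy, 2)])  # [1.0, 0.4, -0.1]
print([round(v, 3) for v in pacf(toy, 2)])
```

At lag 1 the partial autocorrelation equals the autocorrelation, because there are no shorter lags to account for; from lag 2 onward the two generally differ.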
Truncation and trailing
Truncation: the function rapidly approaches 0 after some constant lag k; this is called truncation at order k.
Trailing: the function always has non-zero values and does not become identically zero (or merely fluctuate randomly around 0) after k exceeds some constant.

ARIMA(p,d,q) order
Since an AR (autoregressive) model requires stationarity, stationarity is also required when establishing the ARIMA model. The data can be made stationary by differencing: the first-order difference is the difference between the values at t and t-1, the second-order difference is the difference of the first-order difference, and the number of differences needed to make the data stationary is the parameter d to be determined [7].

TABLE I. Criteria for determining the orders p and q

Model | ACF | PACF
AR(p) | Gradually diminishes towards zero (in either a geometric or an oscillatory manner) | Cuts off after order p
MA(q) | Cuts off after order q | Gradually diminishes towards zero (in either a geometric or an oscillatory manner)
ARMA(p,q) | Gradually diminishes towards zero after order q (in either a geometric or an oscillatory manner) | Gradually diminishes towards zero after order p (in either a geometric or an oscillatory manner)
If the PACF is truncated after order p, the model takes p as the autoregressive order. Similarly, if the ACF is truncated after order q, the model takes q as the moving average order [8].
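The truncation rule can be expressed as a small helper. The sketch below is an assumption-laden illustration: it treats a correlation as "non-zero" when it lies outside the approximate 95% band of ±1.96/√n, which is a common convention but is not stated in this paper, and the correlation values and the name `cutoff_lag` are hypothetical, not taken from the Meishan data.

```python
import math


def cutoff_lag(corr, n_obs, z=1.96):
    """Return the last lag whose correlation lies outside the band +/- z/sqrt(n_obs).

    corr[0] is the lag-0 value and is ignored. Returns 0 if no lag is significant.
    """
    band = z / math.sqrt(n_obs)
    significant = [k for k in range(1, len(corr)) if abs(corr[k]) > band]
    return max(significant) if significant else 0


# Hypothetical PACF values of a series with 180 observations (illustration only).
pacf_values = [1.0, 0.62, 0.05, -0.03, 0.02]
p = cutoff_lag(pacf_values, n_obs=180)
print(p)  # 1
```

The helper is only meaningful for the function that truncates: applied to the PACF it suggests p, and applied to the ACF it suggests q. When a function trails off instead, the last significant lag depends mostly on how many lags are inspected and should not be read as an order.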

Basic flow of modeling
The data must first be preprocessed: the smoke emission data from January to June used in this modeling are extracted, differenced, and then tested for stationarity. If the data fail the test, the process returns and applies the next order of differencing; if they pass, the next step follows: the parameters (p, d, q) are determined, the model is constructed, and the model prediction is carried out. The flowchart of this modeling is as follows.
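The difference, fit and predict steps of this flow can be sketched end to end in a few lines. The example below is a hand-written illustration only, not the implementation used in this paper: it assumes the order (1, 1, 0), fits the AR(1) part to the first differences by ordinary least squares, and uses made-up numbers. In practice a library implementation, such as the `ARIMA` class in statsmodels, would typically replace these helpers.

```python
def difference(series):
    """First-order difference: y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(x):
    """Least-squares fit of x_t = c + phi * x_{t-1} + e_t; returns (c, phi)."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    phi = sxy / sxx if sxx != 0 else 0.0  # constant input: no AR signal to fit
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast with ARIMA(1,1,0): AR(1) on the differences, then undo the differencing."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predicted next difference
        level = level + last_diff        # cumulate back to the original scale
        forecasts.append(level)
    return forecasts


# Made-up series whose differences follow d_t = 1 + 0.5 * d_{t-1} exactly.
series = [0, 1, 2.5, 4.25, 6.125]
print(forecast_arima_110(series, steps=2))
```

Because the differences of the made-up series obey the AR(1) relation exactly, the fit recovers c ≈ 1 and phi ≈ 0.5, and the forecasts continue the same pattern.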

Start modeling
After the above introduction, the specific modeling process begins.

4.1. Source data presentation
There are many categories in the source data, such as sulfur dioxide concentration, sulfur dioxide emission, nitrogen oxide concentration, smoke concentration, etc. This modeling selects part of the smoke emission data. (Data source: detailed online daily data of Nanjing Meishan Iron and Steel Enterprise from January to June 2023.) Figure 3 shows the source data of this modeling, and Figure 4 is the line chart of the source data. It can be seen from Figure 5 that the data is much more stable after the first-order and second-order differences, and that there is little difference between the first-order and second-order results.
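The visual comparison of differencing orders can be complemented by a simple numeric check. The sketch below uses a rule-of-thumb heuristic, which is an assumption of this example and not the method of this paper: choose the order of differencing that gives the smallest standard deviation, since over-differencing tends to increase the spread again. The series is synthetic and the names `spread_by_order` and `pick_d` are hypothetical.

```python
import statistics


def difference(series, d=1):
    """Apply d rounds of first-order differencing."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def spread_by_order(series, max_d=2):
    """Standard deviation of the series after 0..max_d rounds of differencing."""
    return {d: statistics.pstdev(difference(series, d)) for d in range(max_d + 1)}


def pick_d(series, max_d=2):
    """Heuristic: the differencing order with the smallest standard deviation."""
    spreads = spread_by_order(series, max_d)
    return min(spreads, key=spreads.get)


# Synthetic series: a linear trend plus an alternating disturbance (not real emission data).
series = [2 * t + (1 if t % 2 == 0 else -1) for t in range(20)]
for d, s in spread_by_order(series).items():
    print(d, round(s, 2))
print(pick_d(series))  # 1
```

For this synthetic series the trend dominates the undifferenced spread, one round of differencing removes it, and a second round only amplifies the disturbance, so the heuristic selects d = 1. A formal stationarity test would still be needed to confirm the choice on real data.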
3) Determination of parameters p and q: The ACF and PACF of the non-stationary source data and of the first-order difference data are calculated and plotted respectively. The ACF plot of the non-stationary data is compared with its PACF plot, and likewise the ACF plot of the first-order difference data is compared with its PACF plot, as shown in Figure 6 and Figure 7.

Other functions
To increase practicality, a new functional module can also be added to the system: a front-end display module. A display interface has been built in Qt, which shows the main forecast-data interface, system settings, debugging help and other panels. More detailed classification can be added later according to actual requirements, for example dividing the ultra-low emission data into three types (solid, liquid and gas), with gas further divided into sulfur dioxide, nitrogen oxides, etc. The function of this module has not been fully opened and awaits subsequent additions.

Conclusion
This modeling is based on the Python language, and the ARIMA model is established in PyCharm. It can effectively predict the trends of various data of iron and steel enterprises, which has certain theoretical and practical value for establishing ultra-low emission models for iron and steel enterprises. However, the system's functions are still limited; an early warning module can be added later, and porting it to a development board is also under consideration. The prediction comparison chart shows that the rise and fall trend of the forecast data is basically accurate, but the accuracy needs to be optimized. In future work, I will try to improve its prediction accuracy and add other functions.

Figure 1. Introduction to ARIMA model parameters

Figure 3. Soot emission from the tail outlet of No. 5 sintering machine

Figure 4. Line chart of soot emissions from January to June

4.2. Parameter determination (p, d, q)
1) Determination of parameter d: The stationarity of the source data is examined using first- and second-order differences.

Figure 5. Source data and its first- and second-order differences

Figure 6. ACF and PACF of the non-stationary data

Figure 8. Comparison of source data and forecast data

Figure 9. The front-end display interface