Stochastic modeling of aircraft flight parameters in terminal control area based on automatic dependent surveillance-broadcast (ADS-B) data

A critical step in developing good traffic management (ATM) system simulation model is to build a kinematic model of an aircraft movement that can represent the actual conditions. This study aims to obtain flight parameters for aircraft operating in the terminal control area of Soekarno-Hatta International Airport using Automatic Dependent Surveillance-Broadcast (ADS-B) data, which is openly available on Flight Radar24. Flight parameters were measured for several flight phases, such as take-off, initial climb, climb, cruise, descend, and final approach. In addition to these flight phases, flight parameters at waypoints on the arrival and departure were also measured. Furthermore, all flight parameters are modeled stochastically through the probability distribution function (PDF) approach. The best model for each parameter is obtained using the Maximum Likelihood method (MLE) with some relevant Kosmolgrove Smirnov Test criteria. The use of ADS-B data along with the stochastic model presented in this paper provides comprehensive information on flight behavior, which can be further utilized for an aircraft simulation model of an Air Traffic Management system.


Introduction
Aircraft Traffic Management (ATM) gives safe, economic, and efficient air traffic and airspace during all phases of aircraft operations through integrated air traffic services, airspace management, and air traffic flow management. The integrated management and collaboration with all parties and involving airborne and ground-based functions through facilities and seamless services [1]. One effective method for studying and analyzing complex systems such as ATM systems is a simulation model that provided visualization to support the validation process [2]. The flight parameters model is required to develop an aircraft kinematics model as part of the ATM simulation model. Estimating the aircraft flight parameter using stochastic modeling based on ADS-B data is the advanced method used in this research.
Open access to flight data from ADS-B has provided researchers with more insights into air traffic management. Specific flight data such as position, altitude, velocity, and vertical speed of common aircraft types are broadcast and continuously received around the world by these ground receiver IOP Publishing doi: 10.1088/1757-899X/1173/1/012053 2 networks. Nowadays, as recommended by the International Civil Aviation Organization (ICAO) [3], almost all commercial aircraft are equipped with ADS-B that makes it easier to collect ADS-B data. The biggest challenge is that ADS-B data are in large quantities. Machine learning and data mining methods have provided efficient ways to process these data into organized segments based on its flight phases.
Aircraft flight parameters from the flight phases observed include take-off, initial climb, climb, cruise, descend, and final approach. The significance of the performance indicators differs from one flight phase to another. In this research, five aircraft types are used to analyze flight parameters at specific flight phases and waypoints. The paper aims to obtain the stochastic flight parameters model for aircraft operating in the terminal control area of Soekarno-Hatta International Airport. The step of obtaining the flight parameters at the specified flight phases and waypoints will be explained.

Literature Review
The ATM System's simulation model helps analyze operational, tactical, and strategic levels to improve the ATM system performance. The ability to represent social and technical factor provide a simulation model of an ATM system a comprehensive and complete perspective of the system [4] [5]. The ATM simulation model needs a good model of aircraft kinematics to represent the valid aircrafts traffic and movement in the control area. The kinematic flight model allows the simple generation of stochastic representations for crucial parameters. It is easily incorporated into various simulations with these representations changed to reflect empirical data describing the performance of various aircraft [6].
The proper aircraft kinematics model needs a stochastic flight parameter model representing any aircraft type that operates on the ATM control area. Commonly, flight parameter can be gathered from aircraft performance data provided by aircraft manufactures and flight performance model that developed by research/commercial institution such as Eurocontrol Experimental Centre that develops Base of Aircraft Data (BADA) [7] and Lissys Ltd. that develops PIANO-X [8]. These flight parameter models need additional cost and have license limitations.
Collecting and extracting data from ADS-B gives an alternative method to obtain a valid stochastic flight parameter model. The advantage of this method is that ADS-B data can be accessed easily at a lower cost. ADS-B data can be gathered freely using off-the-shelf receivers. There are also ADS-B data providers that provide ADS-B data worldwide for a lower cost and can be downloaded through the internet, such as Flightradar24, FlightAware, and ADS-B Exchange [9]. In order to process large amounts of ADS-B data, machine learning methods can be used, such as identifying flight phases [10]. The stochastic modeling applied in this method is used to handle dynamic and large-scale problems whose main characteristics are quite challenging to quantify or identify [11]. Stochastic modeling has been used in present aviation-related research, such as the queuing model of air traffic [12], airport operation [13], and air traffic management [14].
Sun et al. utilize large amounts of ADS-B data to construct accurate aircraft flight parameter models. A comprehensive set of methods is used for extracting complete operational and limitation models solely based on the open data approach. The method uses machine learning and data mining to efficiently process loose data into organized segments based upon trajectories and respective flight phases. The Maximum Likelihood Estimation (MLE) method is used to estimate the parameter's best value based on the observation. The results are compared with the existing models such as BADA and the Eurocontrol Aircraft Performance Database [15] to validate the results [16]. Wind speed, wind direction, and temperature may be extracted from massive ADS-B and Mode S data using the Meteo-Particle model [17] or from local METeorological Aerodrome Reports (METARs) data.

Data collection
The ADS-B data are downloaded from the FlightRadar24 database in Comma Separated Values (CSV) format. Data are classified based on arrival/departure from the specific airport and aircraft types. In this research, the data collection and processing based on fights departed and arrived at Jakarta Soekarno-Hatta International Airport (WIII) [18]. Data from FlightRadar24 contains information of timestamp, Data preparation used wind data model compiled from METARs data that contains weather reports coming from airports or permanent weather observation stations. METARs data used in this research is taken from Aviation Meteorological Information System in Meteorological, Climatology, and Geophysical Agency (BMKG) website (aviation.bmkg.go.id) on time range from May 2019-May 2020 and recorded on WIII weather station [19]. METARs data is collected automatically using Python scripts and Selenium module as a proxy browser to connect to the website. Retrieved data is a two-line METARs standard that will not be entirely processed, and only wind speed and direction data will be used to make the atmospheric model in an attempt to retrieve true airspeed.
Some data preparation and assumption was made in order to anticipate some limitations on retrieved METARs data. Limitations of METARs data and corresponding data preparation, including: -Low data frequency, METARs data is recorded twice an hour, so the assumption on wind data is adjusted to the first recorded METARs data prior to recorded flight. -The occurrence of variable wind direction and gust, some of the recordings, including natural weather phenomena in gust (G) and variable wind direction (VRB). Data preparation used for this limitation is to make the data value the same as prior recorded wind direction and speed. The variable wind is omitted as null and later imputed using K-Nearest Neighbors (KNN). -Data units, standard METARs are using knots and degrees as wind speed and direction unit. For the simplifying parameter calculations, it will be converted to SI units.

Data preparation
The purpose of data preparation is to obtain some parameters that not available in the original ADS-B data. This parameter is required for further analysis, such as probability distribution function (PDF) estimation. Input data required for data preparation are ADS-B data available in flightradar24.com and METARs database for the weather analysis. METARs database is used for calculating TAS and Mach numbers. Both input data will be prepared for flight metadata, time point database for flight phase analysis, and flight parameter database for PDF estimation. The original ADS-B data contains only altitude (in ft), ground speed (in knots), and direction (in degree). To get the precision parameter flight data, especially for flight phase analysis, interpolating is necessary. This action is required due to the non-uniform of each timestamp interval. The interpolation procedure will follow the linear interpolation based on the interval for each row. The calculating parameters and interpolating procedures will be done parallel. Some parameters could be obtained based on the original ADS-B data, such as vertical rate (rate of climb) and flight path angle.
PDF estimation requires a lot of data, such as 3000 data. Hence, classification is a must for better distribution. Hence, it is classified into two categories, such as depart from WIII and arrive at WIII. Besides the departure and arrival category, each data will also be classified by aircraft type-this required metadata for ease of data management. The metadata will consist of naming convention, original name file, category departure or arrival, aircraft type, departure and arrival airport, and departure and arrival runway. The original ADS-B data does not inform the departure and arrival airport and its runway. K-Nearest Neighbors (KNN) algorithm is necessary to detect airports and runways in use.
KNN is a non-parametric classification algorithm that did not make presumptions of the dataset and classify data based on the number of closest neighbors dataset given [20]. For a given dataset that consists of two categories with two specific parameters. In this research, the dataset comes from an airport or runway location. To detect which airport and runway from each ADS-B data, it is required the latitude and longitude of take-off and landing position. For runway detection, it is required additional data such as aircraft heading and runway heading.
This research analyzes flight parameters for each aircraft type. However, each flight parameter for take-off, climb, cruise, and others are not identical. It is required to detect the flight phase for ease of analyzing segmentation. The result for each flight phase will be the index number. The index number is shown in the first column (0,1,2,3,…). In this research, six flight phases must be analyzed: initial climb, climb, cruise, descent, approach, and final approach. Each flight phase will be detected based on its criteria, as shown in Table 1. Figure 1 shows the result of phase-detection for a specific flight. The red dot indicates the start point of the flight phase.  The preparation of data gives ease of use for flight parameter modeling. There are three probability distribution functions (PDF) that are used to estimate the flight performance parameter value by applying the maximum likelihood estimation (MLE) method. Each flight parameter will be fitted to three distribution, i.e., Normal Distribution, Beta Distribution, and Gamma Distribution. Kolmogorov-Smirnov Test (K-S Test) is intended to find the best distribution. The D score results from K-S Test will be the largest distance of two functions, as shown in Equation 1. The smaller D score will give a better fit for the statistical data [21].  Figure 2 shows the example of three distributions that fit with the statistical model. The best distribution is shown as Beta Distribution, which has the smallest D value from its figure. This research also analyzes flight parameters for specific waypoints. This process will filter which flights that went through the intended waypoint. First of all, the waypoints near Soekarno-Hatta International Airport need to be classified by runway and category (Departure or Arrival). This classifying process is listed in the Soekarno Hatta International Airport Terminal Chart published by the Indonesia Directorate General of Civil Aviation (DGCA) [22]. The flight parameter that will be analyzed is the closest position to the waypoint, and the closest position is less than 1 NM to its waypoint using the KNN algorithm.
For example, flight_27698 is intended to land at Runway 25R, as shown in Table 2. This flight went through waypoint BUNIK, RAMAL, PRIOK, ARKAP. This flight trajectory is shown in Figure 3. The closest position to its waypoint will be row numbers 5091, 5301, 6135, and 6263. Flight parameters for these row numbers are then extracted to get PDF estimation.  Figure 3. Flight_27698 trajectory with its waypoints while arriving at Runway 25R

Result and Discussion
Estimated flight performance parameters are categorized and classified into arrival/departure, aircraft type, and flight phase. Each parameter is plotted for analyzing and validating the process. Figure 4 shows a sample of the estimated flight parameters of the Airbus A320. It can be shown there are three types of the distribution function, Normal distribution function (red color), Gamma distribution function (blue color), and Beta distribution function (green color). Table 3 shows the mean and standard deviation of flight parameters from five types of aircraft.  An analysis of estimated parameters on all five aircraft types shows a good fit for overall aircraft climb and cruise parameters. Estimation on the cruise Mach number differs below ~3% across all aircraft types, while estimation on cruise altitude differs ~15% compared to a database value. Estimation on climb parameter also shows adequate measures on the rate of climb, constant Mach number, and constant CAS number estimation. The estimation of descending flight phase parameters is inaccurate compared to a database value. Generally, estimated descend parameters resulting in less descend rate, lower constant Mach number, longer descend time, and longer distance compared to the value in the database. Parameter estimation also shows the struggle in the initial climb and final approach phase, which results in high error across all aircraft types. Similar to Airbus A320 results, parameter estimation from other aircraft types in distance and time on the climb and descend phase yields high standard deviation value.
Based on the waypoint detection and aircraft types, Table 4 below indicates 16 waypoints for arrival on Runway 25R/L as a sample of estimated flight parameters on a specific waypoint using Airbus A320. These estimated parameters are compared with the reference parameter from WIII Terminal Chart issued by Indonesia DGCA [22].  Table 4. The PDF parameters, mean and standard deviation of estimated flight parameters on specific waypoint for arrival Runway 25L/R compare to reference parameters from WIII Terminal Chart From Table 4, it can be seen that the estimated parameters for altitude are close enough to the reference altitude. For airspeed, calibrated airspeed from estimated parameters has a value that closes enough to the reference speed limit note in indicated airspeed. The estimated parameter has a significant difference in the track angle. This difference may occur because aircraft movement did not exactly fly over the waypoint, so the track angle did not have a track angle as specified in the terminal chart. Some waypoints did not have enough data because many aircraft fly directly (with the Air Traffic Controller's approval) without passing these waypoints.

Conclusion
This research utilizes ADS-B data that are obtained from FlightRadar24 to estimate valid stochastic flight parameters. Flight parameters are extracted for specific flight phases and waypoints. The flight data consist of information on the aircraft that arrive and depart from Soekarno Hatta International Airport. Each data file has been equipped with metadata to facilitate file classification and search. The ADS-B and METARs data processing step is explained to get flight performance parameters for a specific flight phase. It is challenging to work on raw data because every bit of information provided may not be useful. Data preparation will manage data into a homogenized form to be easily matched to those significant figures. Managing data will be resulting in well-structured data and reduces timeconsuming with a low risk of losing essential data. Data preparation filters a large amount of raw data collected from FlightRadar24 into detailed data that will be useful in the future analysis of the flight trajectory. Data management and data preparation are essential to make analysis easier. The validation process confirms that most of the estimated flight parameters at any flight phase close to value when to compare to EuroControl Aircraft Performance Database. The estimation of the vertical rate parameter needs accuracy by developing advanced methods. Estimated flight parameters for specific waypoints show that altitude and airspeed have a slight difference with the terminal chart's reference values.