Origin-destination matrix estimation in Solo agglomeration with the maximum likelihood method

The movement of vehicles is one form of fulfilling community needs. High vehicle movement can cause transportation problems, one of which is congestion. To overcome this problem, planning a transportation system is essential. Using the EMME/4 program, we analyzed the distribution of movements in the Origin-Destination Matrix (MAT) using a gravity model. Using the User's Equilibrium method, the study aims to determine the parameter β, the data validity value (R²), and movement patterns. The study area is Solo Agglomeration, divided into 19 zones. The results indicate a β value of 0.0796 (R² = 0.817), with an estimated Solo Agglomeration movement of 34428 passenger car units (pcu) per hour in 2023. Internal movements made up the largest share of the distribution at 62.85%, while external-external movements made up the smallest at only 4.01%.


Introduction
The growth of urban activity likely triggers land redevelopment. This kind of development is believed to be a significant component of city evolution [1]. Besides its positive effects, it also has negative impacts, such as the explosion of traffic movement. Traffic is inseparable from the needs of people's lives. According to [2], the demand for movement is a derived need: every movement occurs to fulfill community needs. If movement is not controlled correctly, it can cause transportation problems. These problems can be minimized or prevented through transportation modeling, in particular by estimating the movement distribution represented by the Origin-Destination Matrix (OD matrix). There are different methods for estimating OD matrices. The maximum likelihood model is used by [3] when the total trip attractions and trip productions of the study zones are known. [4] develops a new statistical method for estimating the matrix with random link-choosing proportions. Moreover, [5] and [6] use observed link flows and a prior to estimate the matrix with a Bayesian network model. The goal of all these methods is to predict travel demand.
This study aims to determine how traffic moves throughout the Solo Agglomeration area. As the center of the national administration, commercial hub, trade core, and service center, Solo Agglomeration acts as a pull factor that draws people to the area [7]. In addition, the area is highly urbanized and has distinct characteristics in each region [8]. This research is expected to support a long-term transportation system and to serve as a guideline for determining strategies that prevent or overcome transportation problems related to vehicle movement.
To achieve the objectives of this study, a gravity parameter is needed, namely the β parameter. This parameter is used to generate the cost function for estimating the trip distribution. The β parameter is obtained by calibration using the maximum likelihood approach with a negative exponential impedance function and the trip generation-attraction constraint.
IOP Publishing, doi:10.1088/1755-1315/1294/1/012005

Study Method
The study is located in the Solo Agglomeration area, which covers 5677 km². This region comprises seven districts/cities: Surakarta City and the regencies of Klaten, Boyolali, Sukoharjo, Sragen, Karanganyar, and Wonogiri. The road networks involved in this study are the toll road, arterial, and collector road networks with primary access to regional movements. The Solo Agglomeration area is divided into 13 internal and six external zones. The internal zones are divided based on the breakdown of districts or cities. In contrast, the external zones are divided based on towns or communities following the administrative area of Solo Agglomeration.
The primary data come from traffic flow surveys carried out during observation, while secondary data are collected from related agencies or other sources. Both data types feed an estimation process assisted by the EMME/4 program ('Equilibre Multimodal, Multimodal Equilibrium'). EMME/4 is a program with high CPU performance in modeling road networks and multimodal demand [9]. In EMME/4, the road network model is backed by a road network database that includes capacity, free-flow speed, and free-flow travel time. The road network database was built by referring to the 'Manual Kapasitas Jalan Indonesia' (MKJI) 1997 [10]. The EMME/4 program yields a matrix as its final output, which is used for MAT estimation through the Maximum Likelihood method and the negative exponential resistance (impedance) function. Before that, it is necessary to calibrate the β parameter, which describes the average travel cost [2]. This β parameter is used to compute the resistance function with a negative exponential form within the gravity formula. After the MAT model results are obtained, the accuracy of the data is tested with the coefficient of determination (R²) based on a comparison of observed and modeled traffic flows. Figure 1 provides a more thorough explanation of the research flow.
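The gravity formula with a negative exponential resistance function and the generation-attraction constraint can be illustrated with a minimal sketch. It assumes the common doubly constrained form T_id = A_i O_i B_d D_d exp(-β C_id), solved by alternately updating the balancing factors A_i and B_d. The function name, the three-zone totals, and the cost matrix are hypothetical and are not taken from the study; only the value β = 0.0796 comes from the text.

```python
import math

def gravity_model(origins, destinations, cost, beta, iterations=500, tol=1e-12):
    """Doubly constrained gravity model (illustrative sketch).

    T[i][d] = A_i * O_i * B_d * D_d * exp(-beta * C_id), where the balancing
    factors A_i and B_d are updated in turn until they stop changing.
    Assumes sum(origins) == sum(destinations).
    """
    n, m = len(origins), len(destinations)
    # Negative exponential resistance function f(C_id) = exp(-beta * C_id)
    f = [[math.exp(-beta * cost[i][d]) for d in range(m)] for i in range(n)]
    a, b = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        a_new = [1.0 / sum(b[d] * destinations[d] * f[i][d] for d in range(m))
                 for i in range(n)]
        b = [1.0 / sum(a_new[i] * origins[i] * f[i][d] for i in range(n))
             for d in range(m)]
        converged = max(abs(x - y) for x, y in zip(a, a_new)) < tol
        a = a_new
        if converged:
            break
    return [[a[i] * origins[i] * b[d] * destinations[d] * f[i][d]
             for d in range(m)] for i in range(n)]

# Hypothetical three-zone example (totals in pcu/hour, costs in minutes)
origins = [300.0, 500.0, 200.0]        # trip generation per zone
destinations = [400.0, 350.0, 250.0]   # trip attraction per zone
cost = [[5, 20, 30], [20, 5, 15], [30, 15, 5]]
trips = gravity_model(origins, destinations, cost, beta=0.0796)
for row in trips:
    print([round(t, 1) for t in row])
```

In this sketch the row sums of the result reproduce the generation totals and the column sums reproduce the attraction totals, which is what the generation-attraction constraint requires.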

Maximum Likelihood (ML) Formulation
This study used the ML method to calibrate the β parameter. Following joint probability density theory, consider random variables $x_1, x_2, x_3, \ldots, x_n$, each a stochastic variable characterized by its probability density function $f(x; \theta)$ [2], [11]. The value of $\theta$ is an unknown parameter. The likelihood of obtaining $x_1, x_2, x_3, \ldots, x_n$ is stated in Equation 1 below.
$$L(x_1, x_2, \ldots, x_n; \theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta) \tag{1}$$
The function L is multiplicative, so it is easier to handle in logarithmic form, as in Equation 2.
$$\log_e L = \sum_{j=1}^{n} \log_e f(x_j; \theta) \tag{2}$$
The value of $\theta$ is obtained by setting the derivative of the (log-)likelihood with respect to $\theta$ equal to zero, as in Equation 3.
$$\frac{\partial \log_e L}{\partial \theta} = 0 \tag{3}$$
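For outcomes that follow a multinomial distribution, the likelihood used in the ML method is stated in Equation 4 below.
$$L = c \prod_{j} p_j^{x_j} \tag{4}$$
Here $x_j$ is the number of observations in category j, $p_j$ is the probability of that category, and c is a constant that does not depend on $p_j$. When the probability p is expressed through a parameter θ, the ML function of θ follows the multinomial distribution stated in Equation 5:
$$L(\theta) = c \prod_{j} \left[p_j(\theta)\right]^{x_j} \tag{5}$$
If, for example, $p_{id}$ is the probability of obtaining a specific observation from the distribution of trips between zones, $p_{id}$ can be stated as in Equation 6:
$$p_{id} = \frac{T_{id}}{\hat{T}} \tag{6}$$
where $T_{id}$ is the modeled number of trips from origin zone i to destination zone d and $\hat{T}$ is the total observed movement. The ML function for the observed values $\hat{T}_{id}$ over every origin zone (i) and destination zone (d) is formulated in Equation 8.
$$L = c \prod_{i} \prod_{d} p_{id}^{\hat{T}_{id}} \tag{8}$$
The objective of the ML method is to maximize Equation 8. The logarithmic form of Equation 2 is then maximized subject to the total movement constraint in Equation 9.
$$\sum_{i} \sum_{d} T_{id} - \hat{T} = 0 \tag{9}$$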
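In logarithmic form, and with the total movement constraint attached through the multiplier θ, the function to be maximized, $L_1$, becomes:
$$L_1 = \sum_{i=1}^{N} \sum_{d=1}^{N} \left( \hat{T}_{id} \log_e p_{id} - \theta\, T_{id} \right) + \theta\, \hat{T} + \log_e c \tag{10}$$
where N is the number of zones, $\hat{T}_{id}$ and $T_{id}$ are the observed and modeled trips from origin zone i to destination zone d, $\hat{T} = \sum_i \sum_d \hat{T}_{id}$ is the total observed movement, and c is the constant of the multinomial likelihood. Substituting the definition of $p_{id}$ from Equation 6, $p_{id} = T_{id}/\hat{T}$, into Equation 10 gives:
$$L_1 = \sum_{i=1}^{N} \sum_{d=1}^{N} \left( \hat{T}_{id} \log_e T_{id} - \theta\, T_{id} \right) + \theta\, \hat{T} + k \tag{11}$$
where $k = \log_e c - \hat{T} \log_e \hat{T}$ collects the terms that do not depend on $T_{id}$, including the constant carried over from Equation 5. Under the total movement constraint, the value of θ is taken to be 1. Dropping the constant terms, the objective function of the ML method is expressed in Equation 12 as follows.
$$\text{maximize } L_2 = \sum_{i=1}^{N} \sum_{d=1}^{N} \left( \hat{T}_{id} \log_e T_{id} - T_{id} \right) \tag{12}$$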
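The calibration idea can be sketched in a few lines: for each trial β, build the gravity-model matrix and evaluate the ML objective, the sum over all zone pairs of (observed trips × ln(modeled trips) − modeled trips), then keep the β that maximizes it. The sketch below uses a golden-section search as a simple stand-in, because the source does not state which iterative solver was used. All function names, the search bounds, and the three-zone data are hypothetical.

```python
import math

def gravity_trips(origins, destinations, cost, beta, iterations=500, tol=1e-12):
    """Doubly constrained gravity model with f(C) = exp(-beta * C) (sketch)."""
    n, m = len(origins), len(destinations)
    f = [[math.exp(-beta * cost[i][d]) for d in range(m)] for i in range(n)]
    a, b = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        a_new = [1.0 / sum(b[d] * destinations[d] * f[i][d] for d in range(m))
                 for i in range(n)]
        b = [1.0 / sum(a_new[i] * origins[i] * f[i][d] for i in range(n))
             for d in range(m)]
        converged = max(abs(x - y) for x, y in zip(a, a_new)) < tol
        a = a_new
        if converged:
            break
    return [[a[i] * origins[i] * b[d] * destinations[d] * f[i][d]
             for d in range(m)] for i in range(n)]

def ml_objective(observed, modeled):
    """Sum over zone pairs of observed * ln(modeled) - modeled."""
    return sum(obs * math.log(mod) - mod
               for obs_row, mod_row in zip(observed, modeled)
               for obs, mod in zip(obs_row, mod_row))

def golden_section_max(func, lo, hi, tol=1e-7):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    f1, f2 = func(x1), func(x2)
    while hi - lo > tol:
        if f1 < f2:   # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = func(x2)
        else:         # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = func(x1)
    return (lo + hi) / 2

def calibrate_beta(observed, cost, lo=0.001, hi=0.5):
    """Find the beta that maximizes the ML objective (bounds are assumptions)."""
    origins = [sum(row) for row in observed]
    destinations = [sum(col) for col in zip(*observed)]
    return golden_section_max(
        lambda beta: ml_objective(
            observed, gravity_trips(origins, destinations, cost, beta)),
        lo, hi)

# Hypothetical check: generate "observed" trips with a known beta, then recover it
cost = [[5, 20, 30], [20, 5, 15], [30, 15, 5]]
observed = gravity_trips([300.0, 500.0, 200.0], [400.0, 350.0, 250.0], cost, 0.08)
print(round(calibrate_beta(observed, cost), 4))
```

Because the generation-attraction constraint fixes the total of the modeled trips, the second term of the objective is constant across β here, so the search effectively maximizes the sum of observed trips × ln(modeled trips).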

Analysis Findings and Discussion
The EMME/4 program and Microsoft Excel were used to support the estimation of Tid in this investigation. The β parameter was calculated using the negative exponential resistance function and the 6th iteration of the ML method, which results in a value of 0.0796. The value of the parameter indicates the typical trip cost in the resistance function f(Cid). Transportation planners and policymakers can use the β parameter to estimate the MAT for the following year and, with an overview of the Solo Agglomeration area, identify which road sections require handling as accessibility changes.
The 2023 MAT estimation gives a total movement of 34428 pcu/hour in the Solo Agglomeration study area. The movements obtained comprise internal-internal, internal-external, external-internal, and external-external movements. Figure 2 shows the number of movements in detail.
Among the movements across zones, internal-internal movement is the largest. This is caused by how widely movements are distributed among the districts and cities. Any movement that takes place within the Solo Agglomeration study area is regarded as an internal movement of Solo Agglomeration. A trip from zone D to zone C is one example: from the perspective of zone C, a trip arriving from zone D is an external-internal movement, but from the perspective of the agglomeration as a whole it is an internal movement. Given that Solo Agglomeration comprises seven districts/cities, the most significant district- or city-level movements are external-internal trips of this kind, so internal movement dominates at the agglomeration level. The internal-external zones experience less movement than the external-internal zones. This is because zone C contains large- and small-scale industrial and office spaces, attracting workers from inside and outside Solo Agglomeration. Even though this increased the number of movements, they were still less frequent than those between internal zones.
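The four movement categories can be tallied directly from an OD matrix once each zone is labeled internal or external. The following is a minimal sketch under that assumption. The function name and the three-zone matrix are hypothetical, whereas the study itself uses 13 internal and six external zones.

```python
def movement_shares(od, internal):
    """Total trips and percentage share per movement category.

    od       : square OD matrix, od[i][d] = trips from zone i to zone d
    internal : set of zone indices that lie inside the study area
    """
    totals = {
        "internal-internal": 0.0,
        "internal-external": 0.0,
        "external-internal": 0.0,
        "external-external": 0.0,
    }
    for i, row in enumerate(od):
        for d, trips in enumerate(row):
            origin = "internal" if i in internal else "external"
            destination = "internal" if d in internal else "external"
            totals[f"{origin}-{destination}"] += trips
    grand_total = sum(totals.values())
    shares = {key: 100.0 * value / grand_total for key, value in totals.items()}
    return totals, shares

# Hypothetical matrix: zones 0 and 1 are internal, zone 2 is external
od = [[0, 50, 10],
      [40, 0, 5],
      [8, 4, 3]]
totals, shares = movement_shares(od, internal={0, 1})
for category in totals:
    print(category, totals[category], round(shares[category], 2))
```

Applied to the reported figures, the same share calculation gives 21639 / 34428 ≈ 62.85% for internal-internal movement.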
The smallest movement occurs between external zones. This can be observed in zones G, H, I, and J, all external zones with low generation and attraction values. This is attributed to people in these zones not engaging in activities during the morning peak hour (06.30-07.30 WIB). To simplify the study of the movement distribution, given the size of Solo Agglomeration, the initial 19 zones were aggregated into 10 zones (zones A-J). As seen in Figure 3, desire lines represent the movement results across zones: the thicker the line, the greater the movement, and vice versa. According to Figure 3, the highest flows occur mainly between zones D (Sukoharjo Regency) and C (Surakarta City). Zone C's sizable central business district is the reason for this. This is reinforced by an earlier study [12], which indicates that Surakarta City has a burgeoning health services industry and social activities that significantly impact the neighborhood. Moreover, zone D contains a substantial residential sector, especially new housing construction, with as many as 1294 settlements in 2019-2021 [13].
A validity test is performed after obtaining the MAT modeling result, comparing the observed traffic flows with the traffic flows from the modeling results. An R² value of 0.817 was obtained, with an error of 18.3% (1 - R²). Figure 4 displays the R² value derived from the EMME/4 program.
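The validity test can be sketched as a coefficient of determination between observed and modeled link flows. The source does not state which definition of R² was applied, so this sketch assumes the squared Pearson correlation, which is the R² of a simple linear regression between the two series. The function name and the flow values are hypothetical.

```python
def r_squared(observed, modeled):
    """Squared Pearson correlation between observed and modeled flows."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_mod = sum(modeled) / n
    s_xy = sum((o - mean_obs) * (m - mean_mod) for o, m in zip(observed, modeled))
    s_xx = sum((o - mean_obs) ** 2 for o in observed)
    s_yy = sum((m - mean_mod) ** 2 for m in modeled)
    return (s_xy * s_xy) / (s_xx * s_yy)

# Hypothetical link flows in pcu/hour (not data from the study)
observed_flows = [1200.0, 850.0, 430.0, 1900.0, 610.0]
modeled_flows = [1100.0, 900.0, 520.0, 1750.0, 700.0]
r2 = r_squared(observed_flows, modeled_flows)
print(f"R2 = {r2:.3f}, unexplained share = {100 * (1 - r2):.1f}%")
```

The unexplained share printed by the sketch corresponds to the way the text pairs R² = 0.817 with an error of 18.3%.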

Conclusion
Based on the estimation of the movement distribution in Solo Agglomeration using the ML approach and the negative exponential resistance function, the value of the β parameter is 0.0796. The data validity value, expressed as the coefficient of determination (R²), is 0.817 based on the MAT gravity model. The overall movement per hour with the generation-attraction constraint is 34428 pcu. The internal-internal zone movements, which account for 21639 pcu/hour (62.85%) of this total, have the largest estimated share of the movement distribution. This movement is high because of the sizeable central business district (CBD) in zone C and the sizable residential area in zone D. Further studies that include the collector and local road networks are required to strengthen the accuracy of this research and bring the findings closer to reality.

Figure 2. Movement Across Zones in Solo Agglomeration Region

Figure 3. Desire Line, Estimated Results for Solo Agglomeration Movement in 2023