A review of accident data for traffic safety studies in Indonesia

Accident data is a crucial indicator of traffic safety. This study investigates the use of accident data in traffic safety studies in Indonesia. The analysis was conducted on studies that met the eligibility criteria, which included using traffic accident data in Indonesia and articles published in Indonesian and English language journals or proceedings. The search was conducted on four databases: Garuda, Neliti, Google Scholar, and Scopus. The final selection resulted in 50 articles. The data analysis indicates the need for improvement in the number of studies, the utilization of data elements in investigations, supplement data, and data analysis techniques. In addition to providing recommendations to optimize the use of traffic accident data in future studies, this study also emphasizes the importance of improving the accuracy of traffic accident data.


Introduction
Traffic safety remains a serious concern worldwide. Traffic accidents cause 1.35 million fatalities yearly, making them the leading cause of death among children and young adults [1]. The same concern is present in Indonesia. After decreasing from 116,411 in 2019 to 100,028 in 2020, the number of traffic accidents rose again to 103,645 in 2021 [2]. The number of traffic accidents is an indicator of traffic safety [3]. Although they are prone to errors [4] and have other limitations [3], data on traffic accidents form the basis for most decisions to improve traffic safety in various countries [4]. Traffic accident data are used to determine accident-prone locations and to identify factors contributing to accidents. This information is then used to implement interventions for enhancing traffic safety.
Kweon [5] mentioned two types of traffic accident data: traffic safety data and supplement data. Generally, police accident data is used as the primary source of traffic safety data. Hospital accident data and safety survey data also fall into this category. In addition to the traffic safety data, supplement data can be used to enhance the understanding of factors contributing to the occurrence and outcomes of accidents. Supplement data include roadway and traffic data, license and registration data, travel survey data, and sociodemographic and economic data.
Referring to the classification provided by Kweon [5], this review investigates how accident data is employed in traffic safety studies in Indonesia. These studies were obtained through a literature search conducted in several databases. The findings are expected to offer valuable insights for future research that uses traffic accident data to enhance traffic safety.
This study employed the traditional review approach. A traditional review aims to determine what has been achieved, enabling the consolidation and summarization of past efforts and the identification of gaps [6]. Several traffic safety studies have used this approach [7]-[9]. However, traditional reviews differ from systematic literature reviews in several characteristics [10]. These characteristics can be considered weaknesses, as they make traditional reviews prone to publication bias and the file-drawer problem [6], [11]. Nevertheless, when executed meticulously, this type of review can offer an extensive overview of the current state of research within a short time frame and at a low cost [11]. To address these weaknesses, this review follows Weed's guidelines [12].

Literature search
The present study aimed to answer how previous studies have utilized traffic accident data in Indonesia. This review was conducted following the guidelines from Weed [12], which include determining the statement of purpose, search methods, criteria for evaluating the quality of studies, methods for summarizing evidence, and criteria for conclusions and recommendations. As mentioned above, using guidelines in writing a traditional review is necessary to avoid publication bias and the file-drawer problem.
In the literature search, several eligibility criteria were established. Studies had to involve utilizing traffic accident data within the Indonesian region. The traffic accident data classification referred to Kweon [5]. Studies needed to be published in journals or proceedings and written in Indonesian or English. The requirement for studies to be published in journals or proceedings served as a means to assess the quality of the research. No specific timeframe criteria were applied in the search.
The literature search was conducted in two stages. The first stage was conducted in October 2022 in the Garuda, Neliti, and Google Scholar databases. The keyword used was "data kecelakaan lalu lintas" (traffic accident data). For the search on Google Scholar, only results from the first five pages were considered. The second stage was conducted in the Scopus database in July 2023. The keyword used was "traffic accident" AND "Indonesia". The purpose of the second stage was to complement the results of the first stage.

Data analysis
The articles obtained from the search were then selected based on the eligibility criteria. The relevant articles were abstracted and analyzed descriptively. The results were then depicted in tables and graphics. Furthermore, a narrative summary was provided to explain and relate the selected studies to the review objective. Conclusions and recommendations were then provided based on the findings.

Traffic accident data
Figure 3 illustrates the sources of traffic accident data used in the selected studies. Traffic accident data was obtained from six institutions: the police, the Ministry of Transportation, the Komite Nasional Keselamatan Transportasi (KNKT, National Transportation Safety Committee), the Badan Pusat Statistik (BPS, Central Statistics Agency), toll road companies, and hospitals. Most of the data was sourced from accidents recorded by the police (n=38). Two studies referred to BPS data, but it should be noted that the traffic accident data in BPS publications is itself documented by the police. Interestingly, some studies utilized specific accident data, such as accidents on toll roads (n=3) and accidents involving public transportation handled by the KNKT (n=1). The KNKT is a non-structural institution tasked with investigating traffic accidents involving public transportation that meet specific criteria, including a minimum of eight fatalities, widespread public attention, significant damage to infrastructure, repetition in vehicle types or specific locations, and environmental pollution [63].
One study [20] utilized the Health and Demographic Surveillance System (HDSS) survey data. The HDSS is a surveillance system that has periodically collected data on population transitions, health status, and social transitions since 2015 [64]. It is conducted by the Faculty of Medicine, Public Health, and Nursing at Universitas Gadjah Mada in collaboration with the Sleman District Government. The study [20] used HDSS data to investigate the relationship between demographic characteristics and accident injuries. The results revealed that age is associated with accident injury status.
Figure 4 shows the data duration in the selected studies. Most studies utilized traffic accident data within a relatively long timeframe, ranging from three to five years. Some studies used data with a timeframe of more than five years (n=11), indicating consistency in data recording.
Referring to Kweon [5], the data elements are divided into information regarding location, environment, vehicle, driver, injury severity, and others. Location information includes province/regency names, road names, and location coordinates. Environment information includes time and date, weather conditions, lighting, crash types, road conditions and classifications, land use, and traffic conditions. Vehicle information includes vehicle types, brands, vehicle movements and maneuvers before the crash, and an estimate of property damage. Driver information describes the individuals involved in the accident, including drivers, motorcyclists, pedestrians, and cyclists. These data include gender, age, profession, address, education, marital status, economic status, role, and types of violations. Injury severity includes minor, moderate, and severe injuries and fatalities. Lastly, the "others" category includes the number of accidents, chronology, and causes of the accidents.
Table 2 displays the data elements used in the selected studies. Injury severity is the most common data element in the selected studies, while driver information is the least common.
Supplement data can be used to enhance understanding of the factors contributing to the occurrence and outcomes of accidents [5]. Seventeen (34%) studies utilized supplement data. Supplemental secondary data included annual traffic volume and geometric data obtained from toll road company databases (e.g., [48]). Supplemental observational data included traffic volume, road geometry, and speed (e.g., [24]).

Studies' outcomes
As shown in Figure 5, most studies utilized traffic accident data to determine the characteristics of traffic accidents (n=13) and identify black spots (n=15). Previous studies also frequently used traffic accident data to identify patterns of accidents (n=11). Studies that generate accident models to investigate factors related to accidents, or that calculate accident costs, have received less attention. Previous studies described accident characteristics to provide an overview of the frequency of accidents based on specific criteria, such as the day or month of occurrence, time of occurrence, collision type, severity level, vehicle type, and the age and gender of road users involved (e.g., [16]). To determine black spots or black sites, previous studies generally used a few common methods, including Z-score and Cusum (e.g., [31]) or Equivalent Accident Number and Upper Control Limit (UCL) (e.g., [61]). Previous studies attempted to identify accident patterns using data mining analysis, primarily K-means clustering. These analyses were conducted not only to cluster locations based on vulnerability (e.g., [14]) but also to cluster the occurrence of accidents based on time (e.g., [15]).
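To make the Z-score screening mentioned above concrete, the following is a minimal sketch rather than the procedure of any reviewed study. It standardizes accident counts per road segment and flags segments whose score exceeds a threshold. The segment names, the counts, the use of the population standard deviation, and the threshold of 0 are all assumptions made for illustration; the reviewed studies may define these differently.

```python
from statistics import mean, pstdev


def z_scores(counts):
    """Standardize accident counts per segment: z = (x - mean) / std."""
    mu = mean(counts.values())
    sigma = pstdev(counts.values())  # population standard deviation (assumption)
    if sigma == 0:
        # All segments have the same count, so none stands out.
        return {segment: 0.0 for segment in counts}
    return {segment: (x - mu) / sigma for segment, x in counts.items()}


def flag_black_spots(counts, threshold=0.0):
    """Return segments whose z-score exceeds the (assumed) threshold."""
    scores = z_scores(counts)
    return sorted(segment for segment, z in scores.items() if z > threshold)


# Hypothetical accident counts per road segment (not data from the review).
counts = {"KM 1": 4, "KM 2": 10, "KM 3": 6, "KM 4": 4}
scores = z_scores(counts)
black_spots = flag_black_spots(counts)
print(black_spots)
```

With these hypothetical counts, only the segment with a count above the mean is flagged. In practice the counts would usually be weighted by severity or normalized by traffic volume before screening.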

Discussion
This study investigates the utilization of accident data in traffic safety studies in Indonesia. The results show that researchers' interest in using traffic accident data is still limited. The first publication appeared in 2008; the number of publications peaked in 2017 and then declined until 2022. The coverage areas of previous studies were mainly local, while studies using national data were minimal. As in other countries, the police are the primary source of traffic accident data [5]. The KNKT and toll road companies provide accident data according to their respective authorities. The use of survey data needs to be improved, as only one study utilizing HDSS data has been found so far [20]. Moreover, this survey is regional in scale, not national, and not explicitly focused on traffic accidents [64]. The presence of studies using data with a timeframe of more than ten years indicates that the documentation has been sustained. It is worth noting that traffic accident data recorded in BPS publications date back to 1992 [2].
Furthermore, only a few studies utilize supplement data and make full use of the various data elements available in accident data. Supplement data is typically used in studies that identify black spots, which involve observations of traffic and road conditions at specific locations to determine appropriate interventions. The utilization of driver data is relatively low compared to other data elements. Although the choice of supplement data and data elements is driven by the research questions, considering both in data analysis may provide a more comprehensive understanding.
Moreover, only a limited number of studies have specifically analyzed accidents involving certain vehicle types, with four studies focusing on motorcycles [20], [41], [59], [62] and one study on buses [43]. This result indicates a need for more attention to specific traffic accident cases to enable in-depth analysis. No specific analysis was found for other vulnerable road users, such as pedestrians.
Based on the review results, several recommendations are suggested to optimize the use of traffic accident data in future studies. First, there is a need for an accessible database for various traffic safety studies, for example, one similar to the Highway Safety Information System (HSIS) in the United States [5]. This database should meet the data needs of various studies, from policy studies to engineering and behavioral studies. It should be nationally integrated and incorporate multiple sources.
Traffic accident data documented by the police is already accessible to the public through publications by BPS [2]. The KNKT regularly provides investigation reports on its website [65]. These publications should be further enhanced to be more "researcher-friendly," such as by providing comprehensive data elements to which researchers can request access. The provision of disaggregated datasets should, of course, comply with research ethics regulations. The relevant agency's website can also display basic aggregate data analysis, such as frequency, rate, and trend analysis [5]. These analyses can be added to the traffic accident statistics displayed on several agency websites [66]-[68]. This way, researchers and the public can quickly obtain a general overview of traffic safety.
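As an illustration of the kind of basic trend analysis meant here, the sketch below computes the year-over-year percentage change in accident frequency. It uses the national counts for 2019 to 2021 cited in the Introduction [2]; the function name and output format are assumptions for illustration, and a rate analysis would additionally require exposure data (such as population or vehicle-kilometers) that is not given in this review.

```python
def year_over_year_change(counts_by_year):
    """Percentage change in accident counts relative to the previous year."""
    years = sorted(counts_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        delta = counts_by_year[curr] - counts_by_year[prev]
        changes[curr] = 100.0 * delta / counts_by_year[prev]
    return changes


# Annual accident counts cited in the Introduction [2].
accidents = {2019: 116_411, 2020: 100_028, 2021: 103_645}
changes = year_over_year_change(accidents)

for year, pct in changes.items():
    print(f"{year}: {pct:+.2f}% compared with {year - 1}")
```

The result shows a drop of roughly 14% in 2020 followed by a rise of roughly 3.6% in 2021, which matches the decrease and subsequent increase described in the Introduction.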
Moreover, the utilization of traffic survey data also needs to be optimized. In addition to HDSS, there is the Indonesian Family Life Survey (IFLS), a national-scale longitudinal survey whose sample represents approximately 83% of the Indonesian population [69]. Questions related to traffic accident experiences can be added to the survey questionnaire. Furthermore, accidents recorded in self-reports can be linked to official databases to enhance data validity [70]. However, researchers need to be mindful of the limitations of self-reporting, such as reporting biases. They must take appropriate measures to address social desirability and incorrect memory recall.
Second, considering that traffic accident data is rich in the information needed as a basis for improving traffic safety, its utilization needs to be optimized. Future studies could analyze specific collision types involving specific road users (such as vulnerable road users like pedestrians or cyclists) or specific locations (such as urban areas or intersections). Generally, vulnerable road users have received less attention from researchers [70]. Some previous studies addressed motorcycle accidents, as this type of accident is dominant in Indonesia [1].
The use of supplement data should be encouraged. For example, in studies on interventions in black spots, the observation of traffic conflicts can be used as supplement data along with observations of road and traffic conditions. The traffic conflict technique shows promise as a complementary indicator to accident data, and it has also been applied in several studies in Indonesia [71].
Third, there is a need for appropriate data analysis techniques. Various studies have employed data mining techniques for examining accident data, encompassing classification techniques like decision trees, clustering techniques, and association rule mining [72]. For example, to tackle heterogeneity in accident data, which may conceal specific patterns, Depaire et al. [73] applied latent class clustering to detect more homogeneous traffic accident categories. After the clusters are formed, injury analysis is performed for each cluster. De Oña et al. [74] also recommend using two or more data analysis techniques. Data heterogeneity can also be addressed by improving data quality through noise reduction, such as by using an algorithm called NoiseCleaner, suggested by Deb and Lew [75]. The review by Gutierrez-Osorio and Pedraza [7] on analytic algorithms and machine learning methods in traffic safety studies can serve as a further reference.
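Among the clustering techniques mentioned, K-means (the method most often used in the reviewed studies) is the simplest to sketch. The example below is a one-dimensional K-means that groups accidents by hour of occurrence. It is not latent class clustering and not the implementation of any cited study; the accident hours are hypothetical, the deterministic initialization is an assumption chosen to keep the example reproducible, and the hour of day is treated as a plain number, which ignores that 23:00 and 00:00 are adjacent.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster numbers into k groups by repeatedly moving centroids to cluster means."""
    ordered = sorted(values)
    # Deterministic initialization (assumption): pick evenly spaced sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, clusters


# Hypothetical hours of occurrence (0-23) for eight accidents.
hours = [6, 7, 7, 8, 17, 18, 18, 19]
centroids, clusters = kmeans_1d(hours, k=2)
print(centroids, clusters)
```

With these hypothetical hours, the two clusters correspond to a morning group and an evening group. A per-cluster analysis of severity, in the spirit of Depaire et al. [73], would then be run on each group separately.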
Lastly, caution is needed when utilizing traffic accident data. Several studies have raised issues regarding the accuracy of accident data [4], [70], [76], [77]. There are two types of errors in accident data: reporting errors and recording errors [4], [77]. Several studies have found that the most inaccurate data pertains to road characteristics, severity levels of accidents, and accident locations. Since traffic accident data forms the backbone of safety systems, errors can render safety improvement efforts ineffective [4]. With these errors, accident patterns become more challenging to identify, the identification of accident-prone locations becomes inaccurate, and the determination of parameters responsible for accidents becomes imprecise.
Several efforts can be made to address the potential errors in accident data. The reporting and recording system can be improved by enhancing the skills and motivation of personnel, automating the recording process, utilizing modern tools, and increasing public awareness of the need to report accidents to the police [4]. Additionally, black boxes in vehicles can be used to assess the accuracy of accident data [77].
The study has provided an overview of how accident data is used in traffic safety studies in Indonesia and offers insights for future research. However, the study has limitations, primarily due to its non-systematic nature. It is an initial exploration of utilizing accident data and should be followed up with a systematic literature review. Additionally, insights from using accident data in other countries should be sought.
The databases accessed, the keywords chosen, and the exclusion of gray literature also limit this study. These constraints have resulted in a limited number of analyzed studies. Nevertheless, the findings of this study offer valuable insights and can be followed up with a systematic literature review, as suggested above.

Conclusion
This study has provided an overview of the use of accident data in traffic safety studies in Indonesia. The findings show the need for improvement in the number of studies, the use of data elements in investigations, supplement data, and data analysis techniques. This study has also provided recommendations, particularly regarding the need for a comprehensive database that is easily accessible to the public and researchers and the need to optimize the utilization of traffic accident data. Finally, this study emphasizes the importance of improving the accuracy of traffic accident data.

Figure 2. Area of the studies.
identify black spots or black sites also investigated these accident characteristics.

Table 1. The distribution of selected article sources.