Community perceptions classification towards the development of transportation access at Darmo Corridor Surabaya using SVM and naive bayes methods

This research classifies public perceptions of transportation access development in the Darmo corridor, Surabaya, using the SVM and Naïve Bayes methods. Increasing mobility demands necessitate adequate public transportation facilities, such as the Surabaya bus, which has been operational since 2018. The Darmo corridor has witnessed significant development in public transportation facilities and infrastructure, serving as a transit point between two Surabaya bus routes. The purpose of this study is to evaluate public perception of the implementation of the Transit Oriented Development (TOD) concept in the corridor. A qualitative descriptive analysis was conducted on various aspects of TOD implementation, such as pedestrian paths, bicycle lanes, and green open spaces. Furthermore, public perception was analyzed using the Naïve Bayes Classifier and SVM methods by collecting data from Twitter accounts and assigning weights based on the frequency of words in the tweets. From the analysis it can be concluded that Naïve Bayes is not suitable for this classification because it is not stable enough, whereas the SVM method reaches a 95% match and remains stable when the process is repeated five more times. The results show that the Darmo corridor has implemented the TOD concept well, with 90% of the community giving a positive response to the TOD implementation in the area. This finding indicates the success of TOD initiatives in improving transportation facilities and infrastructure in the Darmo corridor.


Introduction
Transportation is a major component of living and life systems, government systems, and social systems. The socio-demographic conditions of a region influence the performance of transportation in that region [1]. The phenomenon of public transportation is related to the logic of modernization and capitalism. The public transportation problems now emerging in big cities in Indonesia cannot be solved by technical means alone. The shift in people's behavior patterns brought about by mass transportation, for example busways and trains, can be interpreted as a significant change in the community's choice of transportation modes. For users of transportation services, the existence of mass transit means changes in the pattern of population mobility and in patterns of transportation behavior [2].
For the government, the implementation of public transportation means making policies for the procurement of transportation that range from technical and sociological to political in nature, such as land acquisition, spatial planning, capital, and so on. This policy continues in the government's interaction with the power of capital. To build a sustainable public transportation system, it is necessary to revitalize all aspects related to public transportation [3]. Municipal governments play an important role in planning and implementing public transport policies. Accessibility is pursued by planning a transportation network and a variety of means of transportation with a high degree of integration with one another. Equality is pursued through the provision of affordable transportation for all levels of society, upholding fair business competition, sharing the use of space and infrastructure in a fair manner, and transparency in every policy-making process [4].
IOP Publishing, doi:10.1088/1755-1315/1263/1/012031
Transit oriented development is the concept of developing an area that has a variety of mixed land use functions, such as residential, commercial, office and other public facilities, with a high level of density, connected to pedestrian paths, bicycle lanes and parking zones. It thus supports the movement of people using public transportation, which can address urban congestion problems and optimize the value of land functions in the area [8]. The TOD concept can assist the development of transit areas in the Jalan Darmo corridor, creating a transit area that has various land functions and provides pedestrian-friendly facilities, so that the area has good accessibility.
The Darmo Surabaya Corridor is one of the main street corridors of Surabaya City and connects South Surabaya with Central Surabaya. Darmo has an important role in the mobility of the people of Surabaya because it is included in the development project plan for the route of the Surabaya Tram and is already served by the Surabaya Bus. The area around Darmo is planned as a trade and service area [5]. The rapid development of strategic areas in the city will certainly have spatial consequences as well [6].
The scope of the research area is the corridor of Jalan Darmo, which is located between two sub-districts, namely Tegalsari and Wonokromo; details can be seen in Figure 1. In this study, the community's perspective is taken from social media, specifically the Twitter application. Taking a point of view from social media regarding the image of a city is based on the statement of [7] that social media is a user-centered platform for forming "history", so that sharing photos on social media allows the creation of shared heritage and collective memory. This public perception can be classified into positive and negative perceptions using the SVM and Naïve Bayes methods. Given today's technology and the abundance of data available as big data, analyzing the community's perspective on the application of the Transit Oriented Development concept in the Darmo corridor, Surabaya City, from social media is an appropriate way to conduct this research. It would also help to create better urban development policies in the future that are more closely aligned with the needs of the community.

Material and methods
The data used in this research is primary data, consisting of tweets in the Indonesian language found on Twitter. The tweets contain the opinions of the Indonesian people on transportation and transit-oriented development in the Darmo Surabaya corridor area. A total of 267 tweets were used as data.
Several methods are used in the data analysis process:
1. Text pre-processing: case folding (converting all text characters to lowercase and removing punctuation and numbers).
2. Tokenizing (breaking the original sentence into words, or breaking a sequence of strings into pieces).
3. Stopwords (vocabulary that is not a unique word or feature of a document, or that does not convey any significant message in the text or sentence) [10].
4. Stemming (obtaining root words by removing prefixes, suffixes, inserts, and confixes, i.e. combinations of prefixes and suffixes).
5. Naive Bayes: a statistical classification method that can be used to predict the probability of membership of a class. It is used here to find out how successful the NBC (Naive Bayes Classifier) algorithm is in classifying Indonesian-language texts.
6. Confusion matrix: the accuracy of the classification is measured to assess the performance of the classification that has been done. Measuring accuracy requires the counts in each predicted class and actual class, consisting of TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative).
7. SVM: a classification method that can be used to assign sentences to groups of negative or positive sentences.
8. Comparison of the performance of the NBC and SVM methods based on the average levels of accuracy, recall, precision and F-measure.
9. Word cloud: a graphical representation of a document made by plotting the words that appear often in the document in a two-dimensional space.
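The first three pre-processing steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the stopword list `STOPWORDS` is a tiny hypothetical sample, the example tweet is invented, and stemming is omitted because it would need an Indonesian stemmer that the paper does not name.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption for illustration);
# a real study would use a full Indonesian stopword list.
STOPWORDS = {"yang", "di", "ke", "dan", "ini", "itu"}


def case_fold(text):
    """Lowercase the text and replace digits and punctuation with spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split the text into word tokens on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def preprocess(text):
    """Case folding, then tokenizing, then stopword removal."""
    return remove_stopwords(tokenize(case_fold(text)))


# Invented example tweet, not taken from the study's data.
tweet = "Taman Bungkul RAME banget, jalur pedestrian di Jalan Darmo nyaman!!! 100%"
print(preprocess(tweet))
```

Each step is kept as a separate function so that the intermediate results (comparable to the case folding, stopword and tokenizing tables) can be inspected one at a time.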

Naïve Bayes classification
The Naive Bayes Classifier (NBC), also known as Bayesian classification, is a statistical classification method that can be used to predict the probability of membership of a class. NBC is based on Bayes' theorem and has classification capabilities similar to decision trees and neural networks. In addition, NBC has been shown to have high accuracy and speed when applied to large databases. It is used here to find out how successful the NBC algorithm is in classifying Indonesian-language texts. The Bayesian method is a statistical approach to performing inductive inference on classification problems. We first present the basic concepts and definitions of Bayes' theorem and then use the theorem for classification in data mining. Bayes' theorem has the following general form (Equation 1):

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)} \qquad (1)$$

where $Y$ is the class, $X$ is the observed data, $P(Y)$ is the prior probability of the class, $P(X \mid Y)$ is the likelihood of the data given the class, and $P(Y \mid X)$ is the posterior probability of the class given the data.

In the support vector machine, data $x_i$ is included in class 2 ($y_i = -1$) when $w^T x_i + b \le -1$; because the training data can be separated linearly, no data point satisfies $w^T x + b = 0$.
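To make the use of Bayes' theorem for text classification concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier over word tokens. It is an illustration under stated assumptions rather than the implementation used in the study: add-one (Laplace) smoothing is assumed, the function names `train_nb` and `predict_nb` are hypothetical, and the four training documents are invented.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Estimate class priors and per-class word counts from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict_nb(tokens, priors, word_counts, vocab):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing (an assumption, not stated in the paper).
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


# Invented toy data for illustration only.
docs = [["taman", "nyaman"], ["jalur", "sepeda", "aman"],
        ["macet", "parah"], ["taman", "bagus"]]
labels = ["positive", "positive", "negative", "positive"]

priors, word_counts, vocab = train_nb(docs, labels)
print(predict_nb(["taman", "nyaman"], priors, word_counts, vocab))
print(predict_nb(["macet", "parah"], priors, word_counts, vocab))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the evidence term P(X) is dropped because it is the same for every class.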

Results and Discussions
The results of this study classify people's perceptions of transportation on Darmo street, Surabaya. Two classification methods are used, namely support vector machine and Naïve Bayes. The public perception data of the Darmo Surabaya Corridor are divided into training and testing data using 5-fold cross-validation.

Characteristics of the perception of Darmo Surabaya Corridor
The Darmo Surabaya corridor area falls within the administrative areas of two sub-districts, namely Tegalsari and Wonokromo. The Darmo Surabaya Corridor is one of the main street corridors in the city of Surabaya and connects South Surabaya with Central Surabaya. The area around the Darmo road corridor is dominated by conservation areas, trade and services, offices, education, health, green open space, and settlements. There is also a religious tourism area (Tomb of Sunan Bungkul). Based on the RTRW of Surabaya City for 2010-2030, the Jalan Darmo corridor area is planned as a trade and service area. The study area, which is limited by the physical boundaries of the research area, covers 121 ha and is divided into several blocks to facilitate the research process, as can be seen in Figure 2. The availability of pedestrian paths in the Darmo road corridor area is one of the important elements in the development of the TOD area. Pedestrian paths are lanes designated for pedestrian infrastructure and facilities that connect activity centers so as to provide smoothness, safety and comfort for pedestrians.
In the research area, pedestrian paths or sidewalks around the Darmo road corridor area are only available on the main road, namely Jalan Darmo, which can be seen in more detail in Table 1. Tactile paving is available on the pedestrian path on Jalan Raya Darmo but not on Jalan Darmo Kali. Shade trees are available.
The availability of pedestrian paths is also inseparable from their dimensions. The dimensions of pedestrian paths or sidewalks are an important part of the provision of pedestrian infrastructure, namely to avoid physical contact between pedestrians and collisions with motorized vehicles. The Darmo street corridor has a wide pedestrian way, ranging from 2.5 m to 3.5 m. Several modes of transportation support accessibility in the Darmo road corridor: Suroboyo Bus, Damri Bus, Mikrolet V, and Trans Semanggi. Accessibility can also be achieved by cycling or walking. Pedestrian connectivity in this case is the ease of walking and the accessibility with which activity centers can be reached quickly and easily from transit points, or vice versa.
Cycling is another choice of emission-free, healthy and affordable mode of transportation.Cycling can also be a solution in reducing congestion problems and enlivening urban streets, as well as increasing the coverage of public transport station services.In supporting these activities, infrastructure is needed in the form of special bicycle lanes on the main road.
Bicycle lane facilities in the study area are only found on the main road, Jalan Darmo. The lane is available on the left and right of the road with a width of approximately 1.5 meters, and it is still shared with motorized vehicles because it is located on the shoulder of the road (Figure 3).
The main attraction and icon of the Darmo road corridor is Bungkul Park, a city-scale park located in the center of Surabaya. One of the analyses that follows from the TOD analysis concerns the availability of parking zones in Bungkul Park. In the study area, the parking zones around Bungkul Park are neatly arranged and equipped with automatic tickets, but they do not yet have a dedicated parking space inside or near the park, unlike the rainbow park on Jl. Ahmad Yani, which already has a parking zone that does not take up the road. Bungkul Park is also a public green space located within the study area block. Overall, the park is well-organized and well-maintained. It has designated selling stands, but many informal street vendors still sell their goods outside the designated zone.
After the study of several TOD variables in the Darmo road corridor, data processing was carried out to review public perceptions of accessibility and transportation in the study area, in order to relate the TOD analysis results to the opinions generated in real time. Figure 6 is a word cloud of words that often appear in the perceptions of the people of Surabaya regarding the layout and transportation of the Darmo Surabaya corridor. The words that often appear are parks, bungkul, pedestrians, paths, bicycle paths, pedestrian paths, and facilities. This suggests that the accessibility of Jalan Darmo is quite good. The connectivity of pedestrian paths in the Jalan Darmo Corridor area is assessed based on the average walking time from the transit point to the center of activity in each block. In the Darmo Surabaya street corridor, 100% of the pedestrian paths have tactile paving on the pedestrian surface, although it is not evenly distributed. The average footpath width in Darmo is around 2.5 m, which is in accordance with the Transit Oriented Development criteria.
The case folding process changes the uppercase letters in words to lowercase letters. Punctuation marks, special characters and emoticons are also removed at this stage, as shown in Table 2. Table 2 shows that most of the tweets are about activities at Bungkul Park: the first indicates that Bungkul Park is very crowded, and the rest propose a meet-up at the park. However, the sentences are still not 'pure', as they contain many symbols and prefixes and still need to be cleaned through several further processes.
The stopword and stemming process is carried out with the aim of obtaining the basic word. It also removes conjunctions and affixes such as "yang", "men", "lah". The results are shown in Table 3. The text in Table 3 consists of sentences about Bungkul Park, but it still contains spaces, symbols and suffixes. Tokenizing separates sentences word by word based on spaces. Each resulting word becomes a variable, which is then weighted with TF-IDF based on how frequently the word occurs. The results of tokenizing can be seen in Table 4.
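The TF-IDF weighting of tokenized tweets can be illustrated with a short sketch. The paper does not state which TF-IDF variant was used, so the formula below (relative term frequency multiplied by the natural logarithm of the inverse document frequency) is an assumption, and the three token lists are invented examples.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            w: (count / len(tokens)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        })
    return weights


# Invented tokenized tweets for illustration only.
docs = [["taman", "bungkul", "ramai"],
        ["jalur", "sepeda", "taman"],
        ["jalur", "pedestrian", "nyaman"]]

for doc_weights in tf_idf(docs):
    print(doc_weights)
```

With this variant, a word that appears in every document receives a weight of zero, while words concentrated in few documents receive higher weights.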

Classification of community perceptions with support vector machine
The public perception data about the Darmo Surabaya road corridor are separated into training and testing data using 5-fold cross-validation, which is equivalent to an 80:20 split of the total data in each fold. SVM has three kernel functions, namely linear, polynomial, and Gaussian RBF. The public perception data about the Darmo road corridor are classified using the Gaussian radial basis function (RBF) kernel. This kernel function works well when the data do not have a linear pattern. The classification performance is reported as the values of accuracy, precision, recall and F-measure. The best classification results on the training data come from the 2nd, 3rd, 4th, and 5th folds, which have the same value. The resulting classifier is then applied to the testing data, as shown in Figure 7 and the confusion matrix in Table 5. Based on Figure 7 and Table 5, the SVM with the Gaussian kernel classifies the data correctly: in the testing data, no actual positive data are classified as negative, and vice versa.
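The 80:20 split implied by 5-fold cross-validation and the Gaussian RBF kernel can be sketched as follows. This is a minimal sketch under assumptions: the helper names `k_fold_indices` and `rbf_kernel` are hypothetical, the folds are taken in order without shuffling, and the kernel parameter `gamma=0.5` is an arbitrary illustrative value, since the paper does not report its settings. Only the total of 267 tweets comes from the study.

```python
import math


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# 267 is the number of tweets reported in the study.
folds = list(k_fold_indices(267, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))

# Identical vectors give a kernel value of 1; the value decays with distance.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(rbf_kernel([1.0, 0.0], [0.0, 0.0]))
```

Each fold holds out 53 or 54 of the 267 tweets for testing and trains on the remaining 213 or 214, which is where the approximate 80:20 ratio comes from.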

Classification of public perceptions with naïve bayes
The public's perception of the Darmo road Surabaya corridor concept can be classified using Naïve Bayes with the concept of probability. First, the probability of each existing word category is calculated. Of the 1219 keywords, 985 were classified as positive and 234 as negative. Naïve Bayes should not be used on data that has a linear pattern when the data are divided using k-fold. NBC can classify correctly when the data split is 90:10. Figure 8 and Table 8 show the results of classification on the testing data. The Naïve Bayes method classifies with an accuracy of 75%. In the testing data, some actual positive data are still classified as negative, and vice versa. This whole process was carried out to check the validity of the data by comparing the two methods and finding which one better classifies the text-mining data related to the TOD variables discussed earlier. It is a continuation intended to gauge the substance of the words identified previously. As seen in the analysis, Naïve Bayes is not suitable for this classification because it is not stable enough. In contrast, the SVM method reaches a 95% match, and it proves to be stable when the process is repeated five more times. In Table 5, the Darmo corridor receives a positive perception of 95.74%. It can be concluded that the TOD concept applied in the area matches community needs and receives a positive perception.
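The accuracy, precision, recall and F-measure used to compare the two classifiers all derive from the four confusion matrix counts. The sketch below shows the standard formulas; the function name `classification_metrics` is hypothetical and the counts passed to it are invented for illustration, not the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F-measure from confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only; they are not taken from the paper.
metrics = classification_metrics(tp=40, tn=5, fp=3, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four values matters here because the classes are imbalanced (far more positive than negative perceptions), so accuracy alone can look high even when the negative class is poorly recognized.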

Conclusion
Based on the analysis of several TOD variables in the Darmo road corridor, it is concluded that the accessibility of the corridor is quite good. For the pedestrian-friendly variable, the availability of pedestrian paths in the study area reached 55.2%, so availability should be increased by the remaining 44.8%. The Darmo road corridor area has bicycle lanes only on Darmo road, with a width of 1.5 meters, and these lanes are still not safe from motorized vehicles. Facilities and infrastructure that support cycling should therefore be built, such as bicycle lanes that are safe from motorized vehicles and bicycle racks placed in activity centres. In the public perception data about the Surabaya Darmo road corridor, 90% of the public has a positive perception. Of SVM and NBC, the more suitable classification method for this text-mining data is SVM. Thus, TOD as an urban planning tool, especially in the Darmo Corridor, appears to be effective in meeting community needs, and urban development in other areas should implement comparable accessibility.

Figure 1. Map of the boundaries of the research area.

Figure 3. Special bicycle lanes on Darmo street.

Figure 4. The condition of the parking zone in Bungkul Park.

Figure 5. Violation of trading outside the zone that has been provided.

Table 1. Availability of pedestrian ways in Darmo Corridor.

Table 2. Case folding on TOD concepts of community perception at Darmo Corridor Surabaya.

Table 3. Stopword on TOD concepts of community perception at Darmo Corridor Surabaya.

Table 4. Tokenizing on TOD concepts of community perception at Darmo Corridor Surabaya.

Table 5. Performance of support vector machine classification results on community perception data of

[…] is the probability of negative perception and P(Y1) is the probability of positive perception. Based on the calculations that have been done, the probability that the public has a positive perception is 0.810 and the probability that tweets are in the negative category is 0.192. The classification performance of the Naïve Bayes Classifier method on public perception data of the Surabaya Darmo road corridor concept is obtained from the classification results on the testing data.

Table 7. Performance of Naïve Bayes classification results on community perception data of the Darmo Surabaya street corridor.