Travel path and transport mode identification method using "less-frequently-detected" position data

This study seeks a method for identifying travel paths and transport modes when the positions of travellers are detected at low frequency. A survey was conducted in which ten test travellers carrying GPS loggers moved around Tokyo city centre. From the survey data, travel path datasets were produced for each traveller by selecting position data every five minutes. A coverage index analysis based on buffer analysis in GIS software was then conducted. The conditions under which a path and the transport mode used can be identified, and the possibility of doing so, are discussed.


Introduction
In the tourism research field, GPS tracking data have recently been regarded as a powerful tool for identifying the locations and travel paths of travellers. Shimizu obtained travel path data of foreign rental car drivers using GPS loggers in the Central Hokkaido region, Japan, in order to assess the availability of rental car travel by foreigners [1]. McKercher et al. used GPS data loggers to analyze the difference between the behavioural patterns of first-time visitors and repeat visitors in Hong Kong [2]. Hallo et al. examined GPS technology for tracking the walking paths of nature-based tourists in Virginia [3]. Yabe et al. reviewed analytical methods for tourists' activities and behaviours using GPS technology [4]. Despite these studies, methods for applying GPS technology to understand travellers' behaviour in tourism areas have not yet been established.
The studies above gave dedicated GPS loggers to sampled travellers. With this approach, it is impossible to obtain data from an unspecified number of travellers. Mobile phone companies have recently collected a huge volume of position data from their users through GPS-equipped mobile phones in order to use it in various marketing analyses. The use of such data in the tourism industry is also anticipated.
From positions and their time stamps obtained by GPS technology, the location of an activity, including the time spent there, and the travel path between activity locations can be estimated effectively. This study focuses on the latter, especially on how to identify the transport mode. Zenji et al. and Nakayama et al. developed methods for transport mode identification using GPS loggers [5] [6]. In these studies, positions were detected at high frequency, every five or ten seconds. If we use a dataset provided by mobile phone companies, positions may not be detected frequently (e.g. only every five minutes) due to the limitation of communication capacity. In this case, the methods proposed in these studies cannot be applied effectively.
This study seeks a method for identifying travel paths and transport modes when the positions of travellers are detected at low frequency. Ono et al. have already studied a trip pattern estimation method using less-frequently-detected GPS data [7]. However, the transport mode was not identified. The outline of the survey is explained in Chapter 2. Data processing and the analytical method are explained in Chapter 3. Some results of the analysis are explained in Chapter 4. Chapter 5 contains the conclusions.

Survey
A survey in which ten test travellers carrying GPS loggers (Black Gold 1300, Qstarz International Co. Ltd.) moved around Tokyo city centre was conducted in December 2012 and January 2013. Each traveller was asked to visit designated zones by designated transport modes and to take a tour in the zones for several hours. The GPS logger recorded position data every second (hereinafter called the "one second travel path"). Fourteen zones and three transport modes (walking, railway (surface and underground) and bus) were selected. Each test traveller was asked to record the travel path and the transport modes used on a designated map. From this map, the travel path and transport modes can be identified exactly.

Data processing and analytical method
For each traveller, 30 different travel path datasets in which position data are selected every five minutes (hereinafter called "five minutes travel paths") were produced. Each one second travel path and each five minutes travel path was then separated by transport mode.
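The subsampling step can be illustrated with a short sketch. The text does not state how the 30 datasets differ from one another; the sketch below assumes, purely for illustration, that they are obtained by shifting the start of the five-minute sampling by a constant offset (10 seconds when 30 variants share a 300 second interval). The names `subsample` and `make_variants` and the `(t, x, y)` record layout are hypothetical.

```python
from typing import List, Tuple

# One GPS fix: (time in seconds, x, y). The coordinate system is irrelevant here.
Fix = Tuple[float, float, float]


def subsample(track: List[Fix], interval_s: float = 300.0,
              offset_s: float = 0.0) -> List[Fix]:
    """Keep the first fix at or after each instant t0 + offset_s + k * interval_s."""
    if not track:
        return []
    next_t = track[0][0] + offset_s
    picked: List[Fix] = []
    for fix in track:
        if fix[0] >= next_t:
            picked.append(fix)
            # Advance to the first sampling instant strictly after this fix,
            # so gaps in the logger output do not produce duplicate picks.
            while next_t <= fix[0]:
                next_t += interval_s
    return picked


def make_variants(track: List[Fix], n_variants: int = 30,
                  interval_s: float = 300.0) -> List[List[Fix]]:
    """ASSUMPTION: variants differ only by an evenly spaced start offset."""
    step = interval_s / n_variants
    return [subsample(track, interval_s, k * step) for k in range(n_variants)]
```

For a one second travel path lasting several hours, `make_variants(track)` would return 30 sparse paths under this assumption, each containing one fix per five minutes.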
In the GIS software (ArcGIS), buffers of seven widths (5m, 10m, 20m, 50m, 100m, 200m and 500m) were created around each five minutes travel path (Figure 1). Here, one evaluation index, coverage, is introduced. Coverage is the percentage of the position data of the one second travel path that fall within the buffer of a given width. If the coverage of a five minutes travel path for one transport mode is close to 100%, the travel path and the transport mode used can definitely be identified.
Figure 2 shows the distribution of the 30 five minutes travel paths of one traveller. This traveller starts in the Kichijyoji zone, takes a tour in the Kichijyoji and Ikebukuro zones, and moves between the zones by surface railway. There is a long curved section on the surface railway, and the travel paths differ from each other in this section. However, from the travel speed between detected positions we can guess that this traveller may have used the railway. Figure 3 shows the coverage index by buffer width. About 90% of the position data of the one second travel path are included in the 100m buffer width.
Figure 4 shows the distribution of the 30 five minutes travel paths of another traveller. This traveller starts in Aoba ward, Yokohama, takes a tour in the Shinjuku and Kawasaki zones, and moves between the zones by surface and underground railway. Figure 5 shows the coverage index by buffer width. Compared with the previous traveller, less than 80% of the position data of the one second travel [...]
Figures 6, 7 and 8 show the coverage index by buffer width for bus, railway and walking respectively. The black solid line is the average over all five minutes travel paths of all travellers. On average, 70% of the position data of the one second travel path by bus are included in the 100m buffer width. However, there are many cases in which the coverage index is less than 50%. These low coverage indexes are mainly caused by tortuous bus routes. If we consider the speed between positions, we can distinguish bus from walking. However, it is hard to distinguish bus from taxi or passenger car in reality.
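The coverage index can also be sketched without GIS software. The example below is a minimal illustration, not the ArcGIS procedure used in the survey. It assumes that coordinates are already projected to a planar system in metres and that the buffer width is the distance measured from the five minutes travel path to each side. The function names are hypothetical. A point lies inside the buffer exactly when its distance to the nearest segment of the sparse path does not exceed the width.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # planar (x, y) in metres (assumed projection)

BUFFER_WIDTHS_M = (5, 10, 20, 50, 100, 200, 500)  # widths listed in the text


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection parameter of p onto the line through a and b, clamped to [0, 1].
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_path(p: Point, path: List[Point]) -> float:
    """Distance from p to a polyline (or to a single point)."""
    if len(path) == 1:
        return math.hypot(p[0] - path[0][0], p[1] - path[0][1])
    return min(point_segment_distance(p, a, b) for a, b in zip(path, path[1:]))


def coverage(dense: List[Point], sparse: List[Point], width_m: float) -> float:
    """Percentage of dense-path points within width_m of the sparse path."""
    if not dense or not sparse:
        raise ValueError("both paths must contain at least one point")
    inside = sum(1 for p in dense if distance_to_path(p, sparse) <= width_m)
    return 100.0 * inside / len(dense)


def coverage_profile(dense: List[Point], sparse: List[Point]) -> Dict[int, float]:
    """Coverage for each of the seven buffer widths."""
    return {w: coverage(dense, sparse, w) for w in BUFFER_WIDTHS_M}
```

Here `dense` stands for the one second travel path and `sparse` for one five minutes travel path of the same trip segment. Because a wider buffer contains every narrower one, the resulting profile never decreases as the width grows.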

Coverage analysis by transport mode
On average, only 40% of the position data of the one second travel path by railway are included in the 100m buffer width. In addition, the coverage index varies according to the shape of the route. However, the speed between positions should be higher for railway. In the end, the use of railway and its path can be identified from speed information even if the coverage index is low.
In contrast, more than 85% of the position data of the one second travel path by walking are included in the 100m buffer width on average. The lower coverage index in some cases is caused by the effect of high-rise buildings. Despite this higher coverage index, it is hard to identify the exact travel path.
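The speed between detected positions, which the discussion above relies on to separate railway and bus from walking, can be sketched as follows. This is an illustrative calculation only. It assumes fixes given as `(time in seconds, latitude, longitude)` in decimal degrees and uses the haversine great-circle distance on a spherical Earth. The walking threshold `WALKING_MAX_KMH` is a hypothetical placeholder and is not a value taken from the survey.

```python
import math
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (time in seconds, latitude, longitude)

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation
WALKING_MAX_KMH = 7.0       # HYPOTHETICAL threshold, not from the survey


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp guards against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def segment_speeds_kmh(fixes: List[Fix]) -> List[float]:
    """Straight-line speed between consecutive fixes, in km/h."""
    speeds: List[float] = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(fixes, fixes[1:]):
        dt = t2 - t1
        if dt <= 0:  # skip duplicated or out-of-order time stamps
            continue
        speeds.append(haversine_m(la1, lo1, la2, lo2) / dt * 3.6)
    return speeds


def faster_than_walking(fixes: List[Fix],
                        threshold_kmh: float = WALKING_MAX_KMH) -> List[bool]:
    """Flag segments whose straight-line speed exceeds the walking threshold."""
    return [v > threshold_kmh for v in segment_speeds_kmh(fixes)]
```

With five-minute sampling, the straight-line distance between two fixes can never exceed the distance actually travelled, so these values are lower bounds on the true average speed. A segment flagged as faster than walking is therefore a conservative indication of vehicle use, while an unflagged segment on a winding route remains ambiguous.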
Figure 6. Coverage index by buffer width (bus). Axis: buffer width (m).
Figure 8. Coverage index by buffer width (railway).

Conclusions
This study proposed a method for identifying travel paths and transport modes when the positions of travellers are detected at low frequency. Through an original survey and the proposed coverage index, the conditions under which a path and the transport mode used can be identified, and the possibility of doing so, were discussed. In further study, the conditions for identification should be clarified in more detail.