Individual station estimation from smart card transactions
Abstract: With the rapid development of urban public transportation networks and the widespread use of smart cards, smart card transaction data contain increasingly rich information about individual and group mobility behavior. However, the smart cards of many cities are designed mainly for fare collection and do not record where and when a passenger boards or alights a bus. This makes it difficult to analyze and mine transaction data and to provide precise location-based services. Using a Shanghai smart card dataset whose bus records lack boarding and alighting stations, and taking advantage of the known stations in subway records, this paper analyzes each card's complete transaction history and proposes a basic Space-Time Adjacency algorithm (STA) and an improved Historical Trip Based algorithm (HTB) to estimate the bus station of each transaction record. Specifically, STA performs an initial reconstruction based on the space-time proximity of the lines in adjacent transaction records. HTB then splits each card's records by their temporal and spatial attributes into trips with explicit trip purposes, extracts the lines taken and the frequent transfer lines from the historical trips, generates a weighted list of candidate stations for each line from the spatial relations between lines, and uses these lists to estimate the stations again. Experiments show that the proposed algorithms effectively narrow the range of candidate boarding and alighting stations of bus lines and are time-efficient.
Key words:
- smart card
- incomplete data
- card mining
- station estimation
Tab. 1 Statistics of transaction data by type

| Type | Transactions | Proportion/% |
|---|---|---|
| Subway | 248 million | 60.07 |
| Bus | 155 million | 37.55 |
| Taxi | 7.85 million | 1.9 |
| Other | 1.98 million | 0.48 |

Tab. 2 Examples of transaction records
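| Card ID | Date | Time | Type | Line name |
|---|---|---|---|---|
| 2603642602 | 2014-04-06 | 11:45:31 | Bus | Line 451 |
| 3000706373 | 2014-04-18 | 11:22:26 | Subway | Line 2, Chuansha |
| 3000706373 | 2014-04-18 | 11:50:21 | Subway | Line 2, Jinke Road |
| 2002816084 | 2014-04-26 | 20:55:00 | Taxi | None |

Algorithm 1 STA: station estimation based on the space-time adjacency of transaction records
Input: transaction record set $D$, distance thresholds $d_{1}$, $d_{2}$
Output: result set ResultMap of the boarding and alighting stations of the lines taken by passengers
1 PartitionMap$<$CardID, RecordList$>$ $\leftarrow$ partition($D$)
2 FOR each card in PartitionMap DO
3   sortedList $\leftarrow$ sortList(PartitionMap.get(card))
4   FOR i = 1 to $\vert$sortedList$\vert$-1 DO
5     ResultMap.update(findStation(sortedList[i], sortedList[i-1], $d_{1}$, $d_{2}$))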
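To make the steps of Algorithm 1 concrete, here is a minimal Python sketch. The section does not define findStation or the exact roles of $d_{1}$ and $d_{2}$, so the sketch assumes, hypothetically, that $d_{1}$ bounds the walking distance between a subway station and a bus stop, that $d_{2}$ bounds the distance between stops of two consecutive bus lines, that stations are given as planar coordinates in metres, and that subway records carry a known station location. The names Record, find_station and sta are illustrative only, not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import hypot
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]
Key = Tuple[str, int]  # (card_id, time) identifies one tap
# Hypothetical bus network: line name -> {station name: (x, y) in metres}
Network = Dict[str, Dict[str, Point]]


@dataclass
class Record:
    card_id: str
    time: int                         # seconds; assumed unique per card
    mode: str                         # "bus" or "subway"
    line: str
    location: Optional[Point] = None  # known only for subway taps


def dist(a: Point, b: Point) -> float:
    return hypot(a[0] - b[0], a[1] - b[1])


def find_station(prev: Record, cur: Record, net: Network, d1: float, d2: float,
                 result: Dict[Key, Dict[str, Set[str]]]) -> None:
    """Adds candidate stations implied by one pair of adjacent taps (assumed rules)."""
    prev_key, cur_key = (prev.card_id, prev.time), (cur.card_id, cur.time)
    if prev.mode == "bus" and cur.mode == "subway" and cur.location is not None:
        # Bus -> subway: the bus was left at a stop near the subway station.
        for name, pos in net.get(prev.line, {}).items():
            if dist(pos, cur.location) <= d1:
                result[prev_key]["alight"].add(name)
    elif prev.mode == "subway" and cur.mode == "bus" and prev.location is not None:
        # Subway -> bus: the bus was boarded at a stop near the subway station.
        for name, pos in net.get(cur.line, {}).items():
            if dist(pos, prev.location) <= d1:
                result[cur_key]["board"].add(name)
    elif prev.mode == cur.mode == "bus" and prev.line != cur.line:
        # Bus -> bus: transfer between stops of the two lines that lie close together.
        for a_name, a_pos in net.get(prev.line, {}).items():
            for b_name, b_pos in net.get(cur.line, {}).items():
                if dist(a_pos, b_pos) <= d2:
                    result[prev_key]["alight"].add(a_name)
                    result[cur_key]["board"].add(b_name)


def sta(records, net: Network, d1: float, d2: float):
    # Step 1: partition the records by card.
    partition = defaultdict(list)
    for r in records:
        partition[r.card_id].append(r)
    result = defaultdict(lambda: {"board": set(), "alight": set()})
    for card_records in partition.values():
        # Step 3: sort each card's records by time.
        ordered = sorted(card_records, key=lambda r: r.time)
        # Steps 4-5: examine each pair of adjacent records.
        for i in range(1, len(ordered)):
            find_station(ordered[i - 1], ordered[i], net, d1, d2, result)
    return result
```

With a subway tap followed by two bus rides on different lines, the first bus ride gets a boarding candidate near the subway station and an alighting candidate near the second line, and the second ride gets the matching boarding candidate.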
Tab. 3 Statistics of consecutive rides
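| Ride pattern | Lines far apart/% | Same line/% | Lines close/% |
|---|---|---|---|
| Bus-subway | 13.15 | - | 86.85 |
| Bus-bus | 3.68 | 34.30 | 41.52 (common stations $\le$ 3); 20.50 (common stations > 3) |

Algorithm 2 Trip segmentation
Input: set $E$ of all transaction records of one card, transfer time threshold $T_{1}$, ride time threshold $T_{2}$
Output: trip set $L$
1 Initialize the trip set $L$
2 Initialize the start position pos of the next trip to 0
3 FOR i = 1 to $\vert E \vert$-1 DO
4   IF (E[i] is not an even-numbered record within a run of consecutive subway records of the card)
5     IF (cut(E[i-1], E[i], $T_{1}$, $T_{2}$)) /* the two records satisfy the segmentation rule */
6       Form a new trip from E[pos] to E[i-1] and add it to $L$
7       pos $\leftarrow$ i
8 Form the last trip from E[pos] to E[$\vert E \vert$-1] and add it to $L$

Algorithm 3 Extraction of the lines taken and the frequent transfer lines
Input: set $L$ of all trips of one card, frequency threshold freq for frequent transfer lines
Output: set LineMap of the lines taken with their frequencies, set TransferMap of the frequent transfer lines with their frequencies
1 Initialize LineMap and TransferMap
2 FOR i = 0 to $\vert L \vert$-1 DO
3   IF (the label of $L_{i}$ is ONESTART or START)
4     LineMap.update($L_{i}.r_{0}$.line)
5     IF ($L_{i}.r_{0}$ is a bus record && $L_{i}.r_{1}$ is a subway or bus record)
6       TransferMap.update($L_{i}.r_{1}$.line)
7   ELSE IF (the label of $L_{i}$ is ONEEND or END)
8     LineMap.update($L_{i}.r_{last}$.line)
9     IF ($L_{i}.r_{last}$ is a bus record && $L_{i}.r_{last-1}$ is a subway or bus record)
10      TransferMap.update($L_{i}.r_{last-1}$.line)
11 FOR each $l$ in TransferMap DO
12   IF (the frequency of $l$ $<$ freq)
13     Remove $l$ from TransferMap

Algorithm 4 Generation of candidate boarding and alighting stations for bus lines
Input: list of lines taken LineMap, list of frequent transfer lines TransferMap
Output: candidate station list of the transfer lines CandidateMap$<$line name, Map$<$station name, weight$>>$
1 Initialize the candidate station list CandidateMap of the bus lines
2 FOR each $l$ in LineMap DO
3   IF (line $l$ appears in TransferMap)
4     Remove $l$ from LineMap
5 FOR each line$_{i}$ in LineMap DO
6   FOR each line$_{j}$ in LineMap DO
7     IF (line$_{i} \ne$ line$_{j}$)
8       List$<$Station$>$ stations = findCandidateStations(line$_{i}$, line$_{j}$)
9       updateCandidateMap(stations, CandidateMap)
10 FOR each lineCandidate in CandidateMap DO
11   removeStation(CandidateMap, LineMap)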
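Algorithm 2 leaves the rule cut() unspecified, so the following Python sketch assumes one hypothetical rule: because a bus tap records boarding only, the gap after a bus tap may span the ride ($T_{2}$) plus a transfer ($T_{1}$), whereas the gap after a subway exit tap spans only the transfer. It also assumes the taps are already sorted by time and treats every even-numbered tap in a run of consecutive subway taps as the exit of the preceding entry, which is why such taps are never cut points. Tap, cut and segment_trips are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tap:
    time: int   # seconds since midnight
    mode: str   # "bus" or "subway"
    line: str


def cut(prev: Tap, cur: Tap, t1: int, t2: int) -> bool:
    """Hypothetical segmentation rule (the section does not define cut())."""
    gap = cur.time - prev.time
    if prev.mode == "bus":
        # A bus tap marks boarding, so the gap also contains the ride itself.
        return gap > t1 + t2
    # A subway exit tap marks leaving the system, so only transfer time counts.
    return gap > t1


def segment_trips(taps: List[Tap], t1: int, t2: int) -> List[List[Tap]]:
    if not taps:
        return []
    trips: List[List[Tap]] = []
    pos = 0
    # Position of the current tap within a run of consecutive subway taps (0 if not subway).
    run = 1 if taps[0].mode == "subway" else 0
    for i in range(1, len(taps)):
        run = run + 1 if taps[i].mode == "subway" else 0
        if run > 0 and run % 2 == 0:
            continue  # even-numbered subway tap: exit of the same ride, never a cut point
        if cut(taps[i - 1], taps[i], t1, t2):
            trips.append(taps[pos:i])
            pos = i
    trips.append(taps[pos:])  # close the last trip after the loop
    return trips
```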
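A compact Python sketch of Algorithm 3 follows, using collections.Counter to hold LineMap and TransferMap. The trip labels (START, ONESTART, MID, END, ONEEND) are taken as given, and the Ride and Trip classes are hypothetical containers. One deviation from the pseudocode is marked in the code: a length check before reading the second (or second-to-last) record, since a trip may contain a single record.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ride:
    mode: str   # "bus" or "subway"
    line: str


@dataclass
class Trip:
    label: str          # START, ONESTART, MID, END or ONEEND
    rides: List[Ride]   # records of the trip in time order


TRANSIT = ("bus", "subway")


def extract_lines(trips: List[Trip], freq: int) -> Tuple[Counter, Counter]:
    line_map: Counter = Counter()
    transfer_map: Counter = Counter()
    for trip in trips:
        r = trip.rides
        if trip.label in ("ONESTART", "START"):
            line_map[r[0].line] += 1
            # Length guard added here: a trip may contain a single record.
            if len(r) > 1 and r[0].mode == "bus" and r[1].mode in TRANSIT:
                transfer_map[r[1].line] += 1
        elif trip.label in ("ONEEND", "END"):
            line_map[r[-1].line] += 1
            if len(r) > 1 and r[-1].mode == "bus" and r[-2].mode in TRANSIT:
                transfer_map[r[-2].line] += 1
    # Steps 11-13: keep only transfer lines seen at least `freq` times.
    frequent = Counter({line: c for line, c in transfer_map.items() if c >= freq})
    return line_map, frequent
```

Building a new Counter for the frequent lines avoids removing keys from a dictionary while iterating over it, which Python does not allow.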
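Algorithm 4 does not spell out findCandidateStations, updateCandidateMap or removeStation. The Python sketch below assumes, hypothetically, that a station of line_i becomes a candidate when it lies within a given radius of some station of line_j (stations as planar coordinates in metres), and that each such pairing adds one unit of weight to that station. Steps 10-11 (removeStation) are omitted because their rule is not given in this section; the function names are illustrative.

```python
from collections import defaultdict
from math import hypot
from typing import Dict, Iterable, List, Mapping, Tuple

Point = Tuple[float, float]
# Hypothetical bus network: line name -> {station name: (x, y) in metres}
Network = Dict[str, Dict[str, Point]]


def find_candidate_stations(line_i: str, line_j: str, net: Network,
                            radius: float) -> List[str]:
    """Assumed rule: stations of line_i lying within `radius` of any station of line_j."""
    others = list(net.get(line_j, {}).values())
    return [name for name, (x, y) in net.get(line_i, {}).items()
            if any(hypot(x - qx, y - qy) <= radius for qx, qy in others)]


def generate_candidates(line_map: Iterable[str], transfer_map: Mapping[str, int],
                        net: Network, radius: float) -> Dict[str, Dict[str, int]]:
    # Steps 2-4: drop lines that are also frequent transfer lines.
    lines = [line for line in line_map if line not in transfer_map]
    weights: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    # Steps 5-9: every station of line_i near another taken line gains one unit of weight.
    for li in lines:
        for lj in lines:
            if li != lj:
                for station in find_candidate_stations(li, lj, net, radius):
                    weights[li][station] += 1
    return {line: dict(w) for line, w in weights.items()}
```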
Tab. 4 Distribution of the number of trips per card in the manually labeled data

| Number of trips | Percentage/% |
|---|---|
| 1-5 | 24 |
| 6-10 | 21 |
| 11-15 | 20 |
| 16-20 | 9 |
| >20 | 26 |
Tab. 5 Distribution of the number of lines taken in the manually labeled data

| Number of lines taken | Percentage/% |
|---|---|
| 1-2 | 22 |
| 3-4 | 32 |
| 5-6 | 20 |
| >6 | 26 |
Tab. 6 Proportions of trip labels

| Trip label | Proportion/% |
|---|---|
| MID | 14 |
| START | 36 |
| ONESTART | 7.3 |
| END | 36.1 |
| ONEEND | 6.6 |
Tab. 7 Performance comparison of the algorithms

| Algorithm | Precision/% | Recall/% | $F_{1}$ |
|---|---|---|---|
| STA | 48.6 | 34.7 | 0.41 |
| HTB | 78.9 | 85.1 | 0.82 |
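The $F_{1}$ column is the harmonic mean of precision $P$ and recall $R$, $F_{1} = 2PR/(P+R)$. The small helper below recomputes it from the rounded table entries: for HTB it gives 0.82, matching the table, while for STA it gives about 0.40 rather than the reported 0.41, a small gap that may come from rounding or from how the scores were aggregated.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.789, 0.851), 2))  # HTB
print(round(f1_score(0.486, 0.347), 2))  # STA
```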