基于反向最远邻的商品推荐算法研究

王凯; 李博涵; 万朔; 张安曼; 关东海

doi:10.3969/j.issn.1000-5641.2019.03.008

基于反向最远邻的商品推荐算法研究

doi: 10.3969/j.issn.1000-5641.2019.03.008

王凯^1,,
李博涵^1,2,3, ,,
万朔¹,
张安曼¹,
关东海¹

1.
南京航空航天大学计算机科学与技术学院, 南京 211106
2.
软件新技术与产业化协同创新中心, 南京 211106
3.
江苏易图地理信息科技股份有限公司, 江苏扬州 225000

基金项目:

国家自然科学基金 61728204

国家自然科学基金 61672284

国家自然科学基金 41301407

南京航空航天大学科研基地创新基金 NJ20160028

南京航空航天大学青年科技创新基金 NT2018028

高安全系统的软件开发与验证技术工业和信息化部重点实验室项目 NJ2018014

详细信息

作者简介:
王凯, 男, 硕士研究生, 主要研究方向为时空数据库、推荐系统.E-mail:hxyrqx123@sina.com

通讯作者:
李博涵, 男, 博士, 副教授, 硕士生导师, 研究方向为空间与时空数据库、地理信息系统等.E-mail:bhli@nuaa.edu.cn

中图分类号: TP391
计量
- 文章访问数: 99
- HTML全文浏览量: 37
- PDF下载量: 123
- 被引次数: 0
出版历程
- 收稿日期: 2018-08-03
- 刊出日期: 2019-05-25

Research on a commodity recommendation algorithm based on reverse furthest neighbor

WANG Kai^1
,,
LI Bo-han^{1,2,3
, ,},
WAN Shuo¹,
ZHANG An-man¹,
GUAN Dong-hai¹

1.
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2.
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 211106, China
3.
Jiangsu Easymap Geographic Information Technology Corp. Ltd, Yangzhou Jiangsu 225000, China

摘要

摘要: 有效的推荐算法可以最大限度地发掘商品的价值.通过研究用户的偏好，分析了从海量商品信息中为用户推荐感兴趣内容的方法.目前大多数推荐系统向用户推荐的是较为流行的商品，而忽略了那些当下不"热门"，却有着巨大潜力的商品.以发掘小众中的大众商品为目的，提出了一种基于反向最远邻（ReverseFurthest Neighbor，RFN）查询的商品推荐算法：基于专家用户的信任协同过滤算法，替代传统用户相似匹配的协同过滤推荐算法；利用幂律对商品进行范围缩减，优化系统筛选的效率，实现了对有潜在价值商品的推荐，使小众商品属性的分布得到更深层次的挖掘.实验结果表明本文推荐算法输出结果质量较高，适用于解决部分"长尾问题".
- 推荐系统 /
- 协同过滤 /
- 反向最远邻 /
- 幂律
Abstract: Effective recommendation algorithms can help maximize the value of a product. By studying the user's preferences, we can recommend underlying contents for the user from mass merchandise information. However, most recommendation systems focus on popular products, ignoring those products that are currently not popular but have huge potential. Our recommendation system, based on reverse furthest neighbor (RFN) queries, uses the idea of mining popular products in niche markets. We improve the traditional collaborative filtering recommendation algorithm and adopt a collaborative filtering algorithm based on expert users. The modified algorithm can recommend products with potential value based on the power law, which makes the distribution of minority mined products more visible to users. Experimental results show that the quality of the proposed algorithm is high and is suitable for partially addressing the long tail problem.
- recommendation system /
- collaborative filtering /
- reverse furthest neighbor (RFN) /
- power law

HTML全文

图 1 幂律分布曲线

Fig. 1 Power law distribution curve

下载: 全尺寸图片幻灯片

图 2 准确性随$k$值变化趋势

Fig. 2 Variation trend of accuracy with value $k$

下载: 全尺寸图片幻灯片

图 3 准确性随$m$值变化趋势

Fig. 3 Variation trend of accuracy with value $m$

下载: 全尺寸图片幻灯片

图 4 经幂律过滤后准确性变化趋势

Fig. 4 Trend of accuracy change after power law filtering

下载: 全尺寸图片幻灯片

图 5 经幂律过滤后准确性变化趋势

Fig. 5 Trend of accuracy change after power law filtering

下载: 全尺寸图片幻灯片

图 6 本文模型(RUPRA)与Pearson模型准确性对比

Fig. 6 Accuracy comparison between the proposed model (RUPRA) and the Pearson model

下载: 全尺寸图片幻灯片

图 7 不同样本数量的推荐算法准确性变化趋势

Fig. 7 Accuracy trend of the recommendation algorithm with different sample sizes

下载: 全尺寸图片幻灯片

表 1 符号定义

Tab. 1 Symbol definition

符号	定义
$U$, $I$	分别表示用户集合和商品集合
sim$(u, e)$	用户$u$和$e$的相似度
ACT$(u)$	用户$u$的活跃度
$C(u)$	用户$u$的评分准确度
$R(u)$	用户$u$的声誉值
$R_{u, i}$	用户$u$对商品$i$的评分
$\overline{R}_{i}$	商品$i$的平均得分
$a_{u, i, a}$	用户$u$对于物品$i$的$a$属性的专家评分
ca$_{i, a}$	物品$i$关于属性标签$a$的连续值
DAV$_{i}$	商品$i$的离散型向量
CAV$_{i}$	商品$i$的连续性向量
$p(x)$	幂律概率密度函数
Total$(u, i)$	用户$u$评论过带有$i$属性商品的数量

下载: 导出CSV

算法1: EEA
输入:用户$-$物品评分矩阵$R$, 用户$U$
输出:专家用户集合EU
1. EU$\leftarrow \varnothing $//将EU集合初始化为空
2. for $i\leftarrow $0 to U.numberdo
3. ACT$\leftarrow $act$(u)$//计算用户$u$的活跃度
4. accuracy_rate$\leftarrow E(u)$//计算用户$u$的准确度
5. if EVAL(u)$>t$//如果用户最终得分大于阈值
6. EU.add$(u)$//将用户$u$加入到专家集合EU
7. end if
8. end for
9. return EU//专家用户集合返回

下载: 导出CSV

表 2 UserA的电影评分记录

Tab. 2 UserA's movie rating matrix

电影名	评分
Star Wars	5
Titanic	4
Lion	3
Heat	2
Spawn	3

下载: 导出CSV

表 3 UserA的用户类型矩阵UTM

Tab. 3 UserA's user type matrix UTM

电影名	用户类型
电影名	Action	Adventure	Children's	Comedy
Star Wars	1	1	0	0
Heat	1	0	0	0
Titanic	1	0	0	0
Spawn	1	1	0	0
Lion	0	0	1	0
Total	4	2	1	0

下载: 导出CSV

算法2: DASC算法
输入:用户-物品评分矩阵$R$, 需要被连续化的物品$i$, 物品$i$的离散化属性向量DAV$_{i}$
输出:物品$i$的连续化属性向量CAV$_{i}$
1. U$\leftarrow $getUserSet(i)//获取所有对$i$有过评分记录的用户集合
2. for a$\leftarrow $0 to $\vert $DAV$_{i}$$\vert $ do//对于DAV$_{i}$的每一个属性
3. if DAV$_{i, a} == 0 $
4. CAV$_{i, a}$$\leftarrow $ 0//如果DAV$_{i}$的属性为0, 那么CAV$_{i}$的该属性也为0
5. else
6. expert_set$_{i, a}\leftarrow $extract_expert_a($i)$//获取该物品属性专家用户集合
7. if $\vert $ expert_set$_{i, a}$$\vert $==0
8. $ca_{i, a}$$\leftarrow $ 0//对$ca_{i, a}$赋初值
9. for $u$ in $U$ do
10. ca$_{i, a}\leftarrow$ca$_{i, a} + R_{u, i}$
11. ca$_{i, a}\leftarrow$ca$_{i, a}$/U.number//将对ca$_{i, a}$设为所有用户对$a$属性的均值
12. else
13. ca$_{i, a }$$\leftarrow $0//对ca$_{i, a}$赋初值
14. for every $u$ in expert_set$_{i, a}$ do
15. ca$_{i, a}\leftarrow$ca${\_}{i, a} + R_{u, i}/(\max{\_}{\rm rating}{\_}a^{\ast}$ave$(R_{u})^{\ast}\vert {\rm expert}{\_}{\rm set}_{i, a}\vert$)//通过专家用户计算ca$_{i, a}$
16. end for
17. end if
18. CAV$_{i, a}\leftarrow$ca$_{i, a}$//将ca$_{i, a}$赋值到对应的属性向量位置上
19. end if
20. end for
21. return CAV$_{i}$//返回物品$i$的连续化属性向量CAV$_{i}$

下载: 导出CSV

表 4 电影的连续型属性向量CAV

Tab. 4 The continuous attribute vector (CAV) of the film

电影名	用户类型
电影名	Action	Adventure	Children's	Comedy
Star Wars	0.73	0.59	0	0

下载: 导出CSV

算法3: RUPRA算法
输入:经连续空间化的用户集合$U$、商品集合$I$
输出:推荐列表$L$集合
1. L$\leftarrow $$\emptyset $//将$L$列表集合初始化为空
2. for i$\leftarrow $0 to U.number do
3. UPC($u$)//对每个用户进行偏好连续化
4. end for
5. for i$\leftarrow $0 to I.number do
6. Ii.knn$\leftarrow $Compute_knn(Ii)//针对用户集合$U$计算商品$Ii$的knn
7. for k$\leftarrow $0 to U.number do
8. if Uk $\in $Ii.knn//如果用户属于商品$Ii$的knn
9. Lk.add($Ii$)//将商品$Ii$加入到列表集合的第$k$个列表项中
10. end if
11. end for
12. end for
13. return $L$//推荐列表集合返回

下载: 导出CSV

参考文献(28)

[1]	BOBADILLA J, ORTEGA F, HERNANDO A. Recommender systems survey[J]. Knowledge-Based Systems, 2013, 46(1):109-132. http://d.old.wanfangdata.com.cn/Periodical/rjxb201301008
[2]	XU H L, WU X, LI X D, et al. Comparison study of Internet recommendation system[J]. Journal of Software, 2009, 20(2):350-363. doi: 10.3724-SP.J.1001.2009.00350/
[3]	HUSSEIN T. Context-Aware Recommender Systems 2011[C]//ACM Conference on Recommender Systems. ACM, 2011:349-350.
[4]	MENG X W, XUN H U, WANG L C, et al. Mobile recommender systems and their applications[J]. Journal of Software, 2013, 24(1):91-108. http://d.old.wanfangdata.com.cn/Periodical/rjxb201301008
[5]	ANDERSON C. The long tail[J]. Wired Magazine, 2004, 12(10):170-177. http://d.old.wanfangdata.com.cn/OAPaper/oai_pubmedcentral.nih.gov_210215
[6]	NEWMAN M E J. Power laws, Pareto distributions and Zipf's law[J]. Contemporary Physics, 2005, 46(5):323-351. doi: 10.1080/00107510500052444
[7]	YANG X W, STECK H, GUO Y, et al. On top-k recommendation using social networks[C]//Proceedings of the 6th ACM Conference on Recommender Systems. ACM, 2012:67-74.
[8]	KHRIBI M K, JEMNI M, NASRAOUI O. Automatic recommendations for e-learning personalization based on web usage mining techniques and information retrieval[C]//Advanced Learning Technologies, 2008, ICALT'08, 8th IEEE International Conference on. IEEE, 2008:241-245. http://www.freepatentsonline.com/article/212768683.html
[9]	ABEL F, GAO Q, HOUBEN G J, et al. Analyzing user modeling on twitter for personalized news recommendations[C]//International Conference on User Modeling, Adaptation, and Personalization. Berlin:Springer, 2011:1-12. doi: 10.1007%2F978-3-642-22362-4_1
[10]	YAO B, LI F F, KUMAR P. Reverse furthest neighbors in spatial databases[C]//IEEE, International Conference on Data Engineering. IEEE, 2009:664-675. https://www.researchgate.net/publication/220965425_Reverse_Furthest_Neighbors_in_Spatial_Databases
[11]	LIU J Q, CHEN H X, FURUSE K, et al. An efficient algorithm for arbitrary reverse furthest neighbor queries[C]//Asia-Pacific Web Conference. Berlin:Springer, 2012:60-72. https://www.researchgate.net/publication/261859969_An_Efficient_Algorithm_for_Arbitrary_Reverse_Furthest_Neighbor_Queries
[12]	WANG S L, CHEEMA M A, LIN X M, et al. Efficiently computing reverse k furthest neighbors[C]//Data Engineering (ICDE), 2016 IEEE 32nd International Conference on. IEEE, 2016:1110-1121. https://www.researchgate.net/publication/304456743_Efficiently_computing_reverse_k_furthest_neighbors
[13]	GOLDBERG D, NICHOLS D, OKI B M, et al. Using collaborative filtering to weave an information tapestry[J]. Communications of the ACM, 1992, 35(12):61-70. doi: 10.1145/138859.138867
[14]	SARWAR B, KARYPIS G, KONSTAN J, et al. Item-based collaborative filtering recommendation algorithms[C]//Proceedings of the 10th international conference on World Wide Web. ACM, 2001:285-295. https://www.researchgate.net/publication/2369002_Item-based_Collaborative_Filtering_Recommendation_Algorithms
[15]	XIAO P, SHAO L S, LI X R. Improved collaborative filtering algorithm in the research and application of personalized movie recommendations[C]//Intelligent Systems Design and Engineering Applications, 20134th International Conference on. IEEE, 2013:349-352. https://www.researchgate.net/publication/271472705_Improved_Collaborative_Filtering_Algorithm_in_the_Research_and_Application_of_Personalized_Movie_Recommendations
[16]	ZHAO Z D, SHANG M S. User-based collaborative-filtering recommendation algorithms on hadoop[C]//20103rd International Conference on Knowledge Discovery and Data Mining. IEEE, 2010:478-481. https://www.researchgate.net/publication/221306166_User-Based_Collaborative-Filtering_Recommendation_Algorithms_on_Hadoop
[17]	PIRASTEH P, JUNG J J, HWANG D. Item-based collaborative filtering with attribute correlation:A case study on movie recommendation[C]//Asian Conference on Intelligent Information and Database Systems. Cham:Springer, 2014:245-252. doi: 10.1007%2F978-3-319-05458-2_26
[18]	VOZALIS E G, MARGARITIS K G. Recommender systems:An experimental comparison of two filtering algorithms[C]//Proceedings of the 9th Panhellenic Conference in Informatics-PCI 2003. 2003.
[19]	MA H F, JIA M H Z, ZHANG D, et al. Combining tag correlation and user social relation for microblog recommendation[J]. Information Sciences, 2017, 385/386:325-337. doi: 10.1016/j.ins.2016.12.047
[20]	郭娣, 赵海燕.融合标签流行度和时间权重的矩阵分解推荐算法[J].小型微型计算机系统, 2016, 37(2):293-297. doi: 10.3969/j.issn.1000-1220.2016.02.019
[21]	FALOUTSOS M, FALOUTSOS P, FALOUTSOS C. On power-law relationships of the internet topology[J]. ACM SIGCOMM Computer Communication Review, 1999, 29(4):251-262. doi: 10.1145/316194
[22]	CLEMENTI F, GALLEGATI M. Power law tails in the Italian personal income distribution[J]. Physica A:Statistical Mechanics and its Applications, 2005, 350(2/3/4):427-438. http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_cond-mat%2f0408067
[23]	NEWMAN M E J. Power laws, Pareto distributions and Zipf's law[J]. Contemporary physics, 2005, 46(5):323-351. doi: 10.1080/00107510500052444
[24]	CLAUSET A, SHALIZI C R, NEWMAN M E J. Power-law distributions in empirical data[J]. SIAM review, 2009, 51(4):661-703. doi: 10.1137/070710111
[25]	LI B H, ZHANG C, CHEN W H, et al. Dynamic reverse furthest neighbor querying algorithm of moving objects[C]//ADMA 2016:Advanced Data Mining and Applications. Cham:Springer, 2016:266-279. http://en.cnki.com.cn/Article_en/CJFDTotal-XXWX201606003.htm
[26]	郑伟, 李博涵, 王雅楠, 等.融合偏好交互的组推荐算法模型[J].小型微型计算机系统, 2018, 2(39):372-378. http://d.old.wanfangdata.com.cn/Periodical/xxwxjsjxt201802034
[27]	CLAUSET A, SHALIZI C R, NEWMAN M E J. Power law distributions in empirical data[J]. SIAM Review, 2009, 51(4):661-703. doi: 10.1137/070710111
[28]	高珂. 基于消费者行为分析的电子商务市场研究[D]. 山东青岛: 青岛理工大学, 2010. http://cdmd.cnki.com.cn/Article/CDMD-10429-1011302212.htm