Research on a commodity recommendation algorithm based on reverse furthest neighbor
-
摘要: 有效的推荐算法可以最大限度地发掘商品的价值.通过研究用户的偏好,分析了从海量商品信息中为用户推荐感兴趣内容的方法.目前大多数推荐系统向用户推荐的是较为流行的商品,而忽略了那些当下不"热门",却有着巨大潜力的商品.以发掘小众中的大众商品为目的,提出了一种基于反向最远邻(ReverseFurthest Neighbor,RFN)查询的商品推荐算法:基于专家用户的信任协同过滤算法,替代传统用户相似匹配的协同过滤推荐算法;利用幂律对商品进行范围缩减,优化系统筛选的效率,实现了对有潜在价值商品的推荐,使小众商品属性的分布得到更深层次的挖掘.实验结果表明本文推荐算法输出结果质量较高,适用于解决部分"长尾问题".Abstract: Effective recommendation algorithms can help maximize the value of a product. By studying the user's preferences, we can recommend underlying contents for the user from mass merchandise information. However, most recommendation systems focus on popular products, ignoring those products that are currently not popular but have huge potential. Our recommendation system, based on reverse furthest neighbor (RFN) queries, uses the idea of mining popular products in niche markets. We improve the traditional collaborative filtering recommendation algorithm and adopt a collaborative filtering algorithm based on expert users. The modified algorithm can recommend products with potential value based on the power law, which makes the distribution of minority mined products more visible to users. Experimental results show that the quality of the proposed algorithm is high and is suitable for partially addressing the long tail problem.
-
表 1 符号定义
Tab. 1 Symbol definition
符号 定义 $U$, $I$ 分别表示用户集合和商品集合 sim$(u, e)$ 用户$u$和$e$的相似度 ACT$(u)$ 用户$u$的活跃度 $C(u)$ 用户$u$的评分准确度 $R(u)$ 用户$u$的声誉值 $R_{u, i}$ 用户$u$对商品$i$的评分 $\overline{R}_{i}$ 商品$i$的平均得分 $a_{u, i, a}$ 用户$u$对于物品$i$的$a$属性的专家评分 ca$_{i, a}$ 物品$i$关于属性标签$a$的连续值 DAV$_{i}$ 商品$i$的离散型向量 CAV$_{i}$ 商品$i$的连续性向量 $p(x)$ 幂律概率密度函数 Total$(u, i)$ 用户$u$评论过带有$i$属性商品的数量 算法1: EEA 输入:用户$-$物品评分矩阵$R$, 用户$U$ 输出:专家用户集合EU 1. EU$\leftarrow \varnothing $//将EU集合初始化为空 2. for $i\leftarrow $0 to U.numberdo 3. ACT$\leftarrow $act$(u)$//计算用户$u$的活跃度 4. accuracy_rate$\leftarrow E(u)$//计算用户$u$的准确度 5. if EVAL(u)$>t$//如果用户最终得分大于阈值 6. EU.add$(u)$//将用户$u$加入到专家集合EU 7. end if 8. end for 9. return EU//专家用户集合返回 表 2 UserA的电影评分记录
Tab. 2 UserA's movie rating matrix
电影名 评分 Star Wars 5 Titanic 4 Lion 3 Heat 2 Spawn 3 表 3 UserA的用户类型矩阵UTM
Tab. 3 UserA's user type matrix UTM
电影名 用户类型 Action Adventure Children's Comedy Star Wars 1 1 0 0 Heat 1 0 0 0 Titanic 1 0 0 0 Spawn 1 1 0 0 Lion 0 0 1 0 Total 4 2 1 0 算法2: DASC算法 输入:用户-物品评分矩阵$R$, 需要被连续化的物品$i$, 物品$i$的离散化属性向量DAV$_{i}$ 输出:物品$i$的连续化属性向量CAV$_{i}$ 1. U$\leftarrow $getUserSet(i)//获取所有对$i$有过评分记录的用户集合 2. for a$\leftarrow $0 to $\vert $DAV$_{i}$$\vert $ do//对于DAV$_{i}$的每一个属性 3. if DAV$_{i, a} == 0 $ 4. CAV$_{i, a}$$\leftarrow $ 0//如果DAV$_{i}$的属性为0, 那么CAV$_{i}$的该属性也为0 5. else 6. expert_set$_{i, a}\leftarrow $extract_expert_a($i)$//获取该物品属性专家用户集合 7. if $\vert $ expert_set$_{i, a}$$\vert $==0 8. $ca_{i, a}$$\leftarrow $ 0//对$ca_{i, a}$赋初值 9. for $u$ in $U$ do 10. ca$_{i, a}\leftarrow$ca$_{i, a} + R_{u, i}$ 11. ca$_{i, a}\leftarrow$ca$_{i, a}$/U.number//将对ca$_{i, a}$设为所有用户对$a$属性的均值 12. else 13. ca$_{i, a }$$\leftarrow $0//对ca$_{i, a}$赋初值 14. for every $u$ in expert_set$_{i, a}$ do 15. ca$_{i, a}\leftarrow$ca${\_}{i, a} + R_{u, i}/(\max{\_}{\rm rating}{\_}a^{\ast}$ave$(R_{u})^{\ast}\vert {\rm expert}{\_}{\rm set}_{i, a}\vert$)//通过专家用户计算ca$_{i, a}$ 16. end for 17. end if 18. CAV$_{i, a}\leftarrow$ca$_{i, a}$//将ca$_{i, a}$赋值到对应的属性向量位置上 19. end if 20. end for 21. return CAV$_{i}$//返回物品$i$的连续化属性向量CAV$_{i}$ 表 4 电影的连续型属性向量CAV
Tab. 4 The continuous attribute vector (CAV) of the film
电影名 用户类型 Action Adventure Children's Comedy Star Wars 0.73 0.59 0 0 算法3: RUPRA算法 输入:经连续空间化的用户集合$U$、商品集合$I$ 输出:推荐列表$L$集合 1. L$\leftarrow $$\emptyset $//将$L$列表集合初始化为空 2. for i$\leftarrow $0 to U.number do 3. UPC($u$)//对每个用户进行偏好连续化 4. end for 5. for i$\leftarrow $0 to I.number do 6. Ii.knn$\leftarrow $Compute_knn(Ii)//针对用户集合$U$计算商品$Ii$的knn 7. for k$\leftarrow $0 to U.number do 8. if Uk $\in $Ii.knn//如果用户属于商品$Ii$的knn 9. Lk.add($Ii$)//将商品$Ii$加入到列表集合的第$k$个列表项中 10. end if 11. end for 12. end for 13. return $L$//推荐列表集合返回 -
[1] BOBADILLA J, ORTEGA F, HERNANDO A. Recommender systems survey[J]. Knowledge-Based Systems, 2013, 46(1):109-132. http://d.old.wanfangdata.com.cn/Periodical/rjxb201301008 [2] XU H L, WU X, LI X D, et al. Comparison study of Internet recommendation system[J]. Journal of Software, 2009, 20(2):350-363. doi: 10.3724-SP.J.1001.2009.00350/ [3] HUSSEIN T. Context-Aware Recommender Systems 2011[C]//ACM Conference on Recommender Systems. ACM, 2011:349-350. [4] MENG X W, XUN H U, WANG L C, et al. Mobile recommender systems and their applications[J]. Journal of Software, 2013, 24(1):91-108. http://d.old.wanfangdata.com.cn/Periodical/rjxb201301008 [5] ANDERSON C. The long tail[J]. Wired Magazine, 2004, 12(10):170-177. http://d.old.wanfangdata.com.cn/OAPaper/oai_pubmedcentral.nih.gov_210215 [6] NEWMAN M E J. Power laws, Pareto distributions and Zipf's law[J]. Contemporary Physics, 2005, 46(5):323-351. doi: 10.1080/00107510500052444 [7] YANG X W, STECK H, GUO Y, et al. On top-k recommendation using social networks[C]//Proceedings of the 6th ACM Conference on Recommender Systems. ACM, 2012:67-74. [8] KHRIBI M K, JEMNI M, NASRAOUI O. Automatic recommendations for e-learning personalization based on web usage mining techniques and information retrieval[C]//Advanced Learning Technologies, 2008, ICALT'08, 8th IEEE International Conference on. IEEE, 2008:241-245. http://www.freepatentsonline.com/article/212768683.html [9] ABEL F, GAO Q, HOUBEN G J, et al. Analyzing user modeling on twitter for personalized news recommendations[C]//International Conference on User Modeling, Adaptation, and Personalization. Berlin:Springer, 2011:1-12. doi: 10.1007%2F978-3-642-22362-4_1 [10] YAO B, LI F F, KUMAR P. Reverse furthest neighbors in spatial databases[C]//IEEE, International Conference on Data Engineering. IEEE, 2009:664-675. https://www.researchgate.net/publication/220965425_Reverse_Furthest_Neighbors_in_Spatial_Databases [11] LIU J Q, CHEN H X, FURUSE K, et al. An efficient algorithm for arbitrary reverse furthest neighbor queries[C]//Asia-Pacific Web Conference. Berlin:Springer, 2012:60-72. https://www.researchgate.net/publication/261859969_An_Efficient_Algorithm_for_Arbitrary_Reverse_Furthest_Neighbor_Queries [12] WANG S L, CHEEMA M A, LIN X M, et al. Efficiently computing reverse k furthest neighbors[C]//Data Engineering (ICDE), 2016 IEEE 32nd International Conference on. IEEE, 2016:1110-1121. https://www.researchgate.net/publication/304456743_Efficiently_computing_reverse_k_furthest_neighbors [13] GOLDBERG D, NICHOLS D, OKI B M, et al. Using collaborative filtering to weave an information tapestry[J]. Communications of the ACM, 1992, 35(12):61-70. doi: 10.1145/138859.138867 [14] SARWAR B, KARYPIS G, KONSTAN J, et al. Item-based collaborative filtering recommendation algorithms[C]//Proceedings of the 10th international conference on World Wide Web. ACM, 2001:285-295. https://www.researchgate.net/publication/2369002_Item-based_Collaborative_Filtering_Recommendation_Algorithms [15] XIAO P, SHAO L S, LI X R. Improved collaborative filtering algorithm in the research and application of personalized movie recommendations[C]//Intelligent Systems Design and Engineering Applications, 20134th International Conference on. IEEE, 2013:349-352. https://www.researchgate.net/publication/271472705_Improved_Collaborative_Filtering_Algorithm_in_the_Research_and_Application_of_Personalized_Movie_Recommendations [16] ZHAO Z D, SHANG M S. User-based collaborative-filtering recommendation algorithms on hadoop[C]//20103rd International Conference on Knowledge Discovery and Data Mining. IEEE, 2010:478-481. https://www.researchgate.net/publication/221306166_User-Based_Collaborative-Filtering_Recommendation_Algorithms_on_Hadoop [17] PIRASTEH P, JUNG J J, HWANG D. Item-based collaborative filtering with attribute correlation:A case study on movie recommendation[C]//Asian Conference on Intelligent Information and Database Systems. Cham:Springer, 2014:245-252. doi: 10.1007%2F978-3-319-05458-2_26 [18] VOZALIS E G, MARGARITIS K G. Recommender systems:An experimental comparison of two filtering algorithms[C]//Proceedings of the 9th Panhellenic Conference in Informatics-PCI 2003. 2003. [19] MA H F, JIA M H Z, ZHANG D, et al. Combining tag correlation and user social relation for microblog recommendation[J]. Information Sciences, 2017, 385/386:325-337. doi: 10.1016/j.ins.2016.12.047 [20] 郭娣, 赵海燕.融合标签流行度和时间权重的矩阵分解推荐算法[J].小型微型计算机系统, 2016, 37(2):293-297. doi: 10.3969/j.issn.1000-1220.2016.02.019 [21] FALOUTSOS M, FALOUTSOS P, FALOUTSOS C. On power-law relationships of the internet topology[J]. ACM SIGCOMM Computer Communication Review, 1999, 29(4):251-262. doi: 10.1145/316194 [22] CLEMENTI F, GALLEGATI M. Power law tails in the Italian personal income distribution[J]. Physica A:Statistical Mechanics and its Applications, 2005, 350(2/3/4):427-438. http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_cond-mat%2f0408067 [23] NEWMAN M E J. Power laws, Pareto distributions and Zipf's law[J]. Contemporary physics, 2005, 46(5):323-351. doi: 10.1080/00107510500052444 [24] CLAUSET A, SHALIZI C R, NEWMAN M E J. Power-law distributions in empirical data[J]. SIAM review, 2009, 51(4):661-703. doi: 10.1137/070710111 [25] LI B H, ZHANG C, CHEN W H, et al. Dynamic reverse furthest neighbor querying algorithm of moving objects[C]//ADMA 2016:Advanced Data Mining and Applications. Cham:Springer, 2016:266-279. http://en.cnki.com.cn/Article_en/CJFDTotal-XXWX201606003.htm [26] 郑伟, 李博涵, 王雅楠, 等.融合偏好交互的组推荐算法模型[J].小型微型计算机系统, 2018, 2(39):372-378. http://d.old.wanfangdata.com.cn/Periodical/xxwxjsjxt201802034 [27] CLAUSET A, SHALIZI C R, NEWMAN M E J. Power law distributions in empirical data[J]. SIAM Review, 2009, 51(4):661-703. doi: 10.1137/070710111 [28] 高珂. 基于消费者行为分析的电子商务市场研究[D]. 山东青岛: 青岛理工大学, 2010. http://cdmd.cnki.com.cn/Article/CDMD-10429-1011302212.htm