Study of Click-Through Rate Prediction in Online Advertising
Abstract: With the development of the Internet and the growth of its user base, the advertising industry is gradually shifting from the traditional offline model to an online one. Thanks to big data analytics, online advertising also offers substantial advantages over traditional advertising. Advertisers compete with one another in auctions to place their ads in the ad slots of media platforms. It is therefore important to predict, before an ad is served, the probability that a user will click it (the click-through rate, CTR), because this helps advertisers reduce costs and increase potential revenue. Building on a survey of commonly used ad CTR prediction models, this paper uses information about advertisers, ads, and the media platforms that serve them as model features, and uses a real-world dataset to compare the strengths and weaknesses of the models and to measure the impact of different features on CTR prediction.
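As a point of reference for the CTR models the abstract mentions, the sketch below shows logistic regression, one of the most widely used CTR baselines, trained by stochastic gradient descent on log loss over sparse one-hot features. It is not necessarily the model evaluated in this paper; the class name `LogisticCTR`, the `field=value` feature strings, and the learning-rate and L2 defaults are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Tuple

# One impression: the list of active one-hot features plus a click label (1/0).
Sample = Tuple[List[str], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticCTR:
    """Minimal logistic-regression CTR model over binary one-hot features (sketch)."""

    def __init__(self, lr: float = 0.1, l2: float = 1e-4) -> None:
        self.lr = lr  # SGD step size (hypothetical default)
        self.l2 = l2  # L2 regularization strength (hypothetical default)
        self.bias = 0.0
        self.w: Dict[str, float] = {}  # weights for features seen so far

    def predict(self, features: List[str]) -> float:
        # Predicted click probability: sigmoid(bias + sum of active feature weights).
        z = self.bias + sum(self.w.get(f, 0.0) for f in features)
        return sigmoid(z)

    def update(self, features: List[str], label: int) -> None:
        # The gradient of log loss with respect to the logit is (p - y).
        g = self.predict(features) - label
        self.bias -= self.lr * g
        for f in features:
            wf = self.w.get(f, 0.0)
            self.w[f] = wf - self.lr * (g + self.l2 * wf)

    def fit(self, data: List[Sample], epochs: int = 20) -> None:
        # Plain SGD: one update per impression, repeated for a fixed number of epochs.
        for _ in range(epochs):
            for features, label in data:
                self.update(features, label)
```

Because each impression activates only a handful of one-hot features, the weights live in a dictionary and each SGD step touches only the active ones, which keeps updates cheap even when the feature space has millions of possible values.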
Key words:
- computational advertising
- CTR
- machine learning
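Tab. 1 Log fields corresponding to each feature

| Feature | Log field |
|---|---|
| Time period of day | tis |
| Region | ip |
| Bidding platform (ad exchange) | adx |
| Traffic type | devicetype |
| Platform | platform |
| Browser | browser |
| Operating system | os |
| Ad slot ID | adslot_id |
| Ad slot position | adslot_pos |
| Campaign ID | campaign_id |
| Campaign group ID | group_id |
| Creative ID | creative_id |
| Creative size | creative_size |
| Advertiser ID | advertiser_id |
| Ad agency ID | ad_agent_id |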
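The log fields in Table 1 are categorical, so one common way to feed them to a linear or factorization model is one-hot encoding via the hashing trick. The sketch below assumes each log record is a dict keyed by the field names in Table 1; the bucket count, the `<missing>` placeholder, and the choice of MD5 as the hash are illustrative assumptions, and the IP-to-region lookup implied by the `ip` field is not shown.

```python
import hashlib
from typing import Dict, List

# Log field names as listed in Table 1.
FIELDS = [
    "tis", "ip", "adx", "devicetype", "platform", "browser", "os",
    "adslot_id", "adslot_pos", "campaign_id", "group_id",
    "creative_id", "creative_size", "advertiser_id", "ad_agent_id",
]


def hash_index(token: str, num_buckets: int) -> int:
    # Stable across runs, unlike Python's built-in hash(), which is salted for str.
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def encode(record: Dict[str, str], num_buckets: int = 2 ** 20) -> List[int]:
    """Map one log record to the sorted indices of its active one-hot features."""
    indices = set()
    for field in FIELDS:
        value = record.get(field)
        if value is None or value == "":
            value = "<missing>"  # hypothetical placeholder for absent fields
        # Prefix with the field name so equal values in different fields
        # (e.g. the same number in adslot_id and creative_id) hash separately.
        indices.add(hash_index(f"{field}={value}", num_buckets))
    return sorted(indices)
```

Prefixing each value with its field name keeps, for example, `adslot_id=42` and `creative_id=42` distinct before hashing; the collisions that remain are the accepted price of a fixed-size weight vector.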
Tab. 2 Impact of different features on model prediction

| Evaluation metric | Ad info | Ad + user info | Ad + user + media platform info |
|---|---|---|---|
| Precision | 0.7054 | 0.7644 | 0.7744 |
| AUC | 0.7049 | 0.8303 | 0.8369 |
| Logloss | 0.5985 | 0.4808 | 0.4749 |
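The three metrics in Table 2 can be computed from click labels and predicted probabilities roughly as follows. This is a sketch: the 0.5 decision threshold for Precision is an assumption (the threshold behind Table 2 is not stated), and the pairwise AUC loop is quadratic, so it suits small samples only.

```python
import math
from typing import List


def log_loss(y_true: List[int], p_pred: List[float], eps: float = 1e-15) -> float:
    # Mean binary cross-entropy; probabilities are clipped to avoid log(0).
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def precision(y_true: List[int], p_pred: List[float], threshold: float = 0.5) -> float:
    # Share of impressions predicted as clicks (p >= threshold) that were clicked.
    tp = fp = 0
    for y, p in zip(y_true, p_pred):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
    return tp / (tp + fp) if tp + fp else 0.0


def auc(y_true: List[int], p_pred: List[float]) -> float:
    # Probability that a random clicked impression scores above a random
    # non-clicked one; ties count as one half.
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Lower Logloss and higher AUC are better, which is the direction in which all three metrics move in Table 2 as user and media platform information are added.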