Self-attention based neural networks for product title compression
-
Abstract: To attract users' attention, most e-commerce websites fold many product attributes into product titles, which leaves the titles with redundant information and introduces inconsistencies. Product title compression has therefore received significant attention in recent years, since it can provide more specific information for cross-platform knowledge alignment and multi-source data fusion. To address this problem, we propose a self-attention based neural network for product title compression. Because a pure self-attention network cannot directly capture the sequential features of a product title, we enhance it with the temporal modeling of a gated recurrent unit (GRU): the queries and key-value pairs of the dot-product attention are computed from the hidden states of a GRU-based recurrent network. This temporal enhancement improves the overall performance on the title compression task at a small additional computational cost. On the basis of the public short-title dataset LESD4EC, we construct two product title compression datasets, LESD4EC_L and LESD4EC_S, and validate the model on both. A series of experiments shows that the proposed method achieves a clear improvement over existing product title compression techniques.
-
Algorithm 1 ERS-NET product title compression
Input: training-set product titles T_train; training-set word labels Y; test-set product titles T_test; number of training iterations Steps (loop counter st); number of self-attention heads m
Output: compressed short titles S for the test set
1: train:
2: for all network parameters do
3:     initialize the parameter randomly from a truncated normal distribution
4: end for
5: /* load data */
6: T' ← word-embedding network(T_train)
7: for st ← 0 to Steps do
8:     for t in T' do
9:         h_t ← BiGRU(t, h_{t-1})
10:    end for
11:    for i ← 0 to m-1 do
12:        compute the semantic relations between title words with Eq. (6)
13:        compute the single-head attention score with Eq. (8)
14:    end for
15:    H ← concat(ATT)
16:    Y' ← Softmax(H)
17:    L ← -Σ y log y', where y ∈ Y and y' ∈ Y'
18:    if st == 0 or L < minL then
19:        model.save()
20:        minL ← L
21:    update the network parameters with Adam so as to minimize the cross-entropy loss L
22: end for
23: test:
24: model.load()
25: Y_predict ← model(T_test)
26: S ← Y_predict ∩ T_test
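To make the training step in lines 8-16 concrete, the following is a minimal PyTorch sketch of the model core: a BiGRU injects word order into the token states, multi-head dot-product self-attention runs over those states, and a per-token softmax decides whether each word is kept. The layer sizes, head count, and the use of torch.nn.MultiheadAttention in place of Eqs. (6) and (8) are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class ERSNetSketch(nn.Module):
    """Illustrative sketch of a temporally enhanced self-attention
    title compressor (sizes and modules are assumptions)."""

    def __init__(self, vocab_size, emb_dim=128, hidden=128, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional GRU: injects sequence order (lines 8-10).
        self.bigru = nn.GRU(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        # Multi-head dot-product self-attention over the GRU states
        # (stand-in for the per-head scores of lines 11-15).
        self.attn = nn.MultiheadAttention(2 * hidden, heads,
                                          batch_first=True)
        # Per-token binary decision: keep the word or drop it (line 16).
        self.classify = nn.Linear(2 * hidden, 2)

    def forward(self, token_ids):
        x = self.embed(token_ids)   # (batch, seq, emb_dim)
        h, _ = self.bigru(x)        # (batch, seq, 2*hidden)
        a, _ = self.attn(h, h, h)   # queries = keys = values = h
        return self.classify(a)     # (batch, seq, 2) keep/drop logits

# One optimization step with Adam and cross-entropy (lines 17 and 21).
model = ERSNetSketch(vocab_size=10000)
opt = torch.optim.Adam(model.parameters())
titles = torch.randint(0, 10000, (8, 12))  # toy batch: 8 titles, 12 words
labels = torch.randint(0, 2, (8, 12))      # 1 = keep the word, 0 = drop
loss = nn.functional.cross_entropy(model(titles).reshape(-1, 2),
                                   labels.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()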
Tab. 1 An example of merging product title annotations into the compression datasets
Product title:  谭木匠, 新品, 礼盒, 小, 可爱, 木梳, 年轻款, 送, 孩子, 女生, 儿童节, 礼品
Annotator 1:    小, 可爱, 木梳, 礼盒
Annotator 2:    谭木匠, 小, 可爱, 木梳
Annotator 3:    谭木匠, 可爱, 木梳
Annotator 4:    谭木匠, 可爱, 木梳
Annotator 5:    卡通, 趣味, 小, 可爱, 木梳
Annotator 6:    礼盒, 可爱, 木梳
Annotator 7:    礼盒, 小, 可爱, 木梳
Annotator 8:    谭木匠, 可爱, 木梳
LESD4EC_L:      谭木匠, 礼盒, 小, 可爱, 木梳
LESD4EC_S:      谭木匠, 礼盒, 木梳
Algorithm 2 Product title compression dataset merging
Input: product dataset LESD4EC; maximum number of keywords γ
Output: product title compression datasets LESD4EC_L and LESD4EC_S
1: for title in LESD4EC do
2:     repeat
3:         merge the product labels Y that share the same ID as title;
4:         update the result into LESD4EC_L;
5:     until no data remain to be merged;
6:     assign a weight to each label of title according to its NER information, and compute the label score Score from the weights;
7:     sort the words of title by Score and remove the words whose Score is 0;
8:     if the number of valid labeled words < γ then
9:         update all valid labeled words into LESD4EC_S;
10:    else
11:        update the top γ labeled words into LESD4EC_S;
12: end for
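Algorithm 2 can be read as a two-stage reduction over annotations grouped by product ID. The sketch below is a minimal Python rendering under stated assumptions: the NER-derived weights arrive as a ready-made word-to-weight mapping, and scoring a word by its annotation frequency times that weight is an illustrative reading of lines 6-7, not the authors' exact procedure.

from collections import Counter

def build_datasets(annotations, ner_weight, gamma=5):
    """annotations: {product_id: (title_words, [annotator_word_lists])};
    ner_weight: assumed word -> weight mapping derived from NER.
    Returns the LESD4EC_L and LESD4EC_S keyword lists per product."""
    long_set, short_set = {}, {}
    for pid, (title, labels) in annotations.items():
        # Lines 2-5: merge all annotators' labels for the same ID,
        # preserving the original word order of the title.
        merged = set().union(*map(set, labels))
        long_set[pid] = [w for w in title if w in merged]
        # Lines 6-7: score words, drop zero scores, rank the rest.
        votes = Counter(w for lab in labels for w in lab)
        score = {w: votes[w] * ner_weight.get(w, 0) for w in merged}
        ranked = sorted((w for w in merged if score[w] > 0),
                        key=lambda w: -score[w])
        # Lines 8-11: keep at most gamma highest-scoring words.
        short_set[pid] = ranked[:gamma]
    return long_set, short_set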
Tab. 2 Product title compression results on different datasets
Model      LESD4EC_L                           LESD4EC_S
           ROUGE_P/%  ROUGE_R/%  ROUGE_F1/%    ROUGE_P/%  ROUGE_R/%  ROUGE_F1/%
Seq2Seq    60.22      72.71      65.88         76.40      78.37      77.37
Self-ATT   73.36      74.10      73.73         79.18      82.17      80.65
FE-NET     73.02      74.84      73.92         81.73      85.30      83.48
ERS-NET    74.86      76.62      75.55         82.94      86.85      84.85
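The ROUGE precision, recall, and F1 values above measure the unigram overlap between a predicted short title and its reference keyword list. As a reading aid, here is a generic clipped-count computation in the style of [20]; it is a sketch of the metric, not the paper's evaluation script.

from collections import Counter

def rouge1_prf(predicted, reference):
    """Unigram overlap with clipped counts between a predicted and a
    reference token list; returns (precision, recall, F1)."""
    pred, ref = Counter(predicted), Counter(reference)
    overlap = sum((pred & ref).values())   # clipped co-occurrence count
    p = overlap / max(sum(pred.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy check with the case-2 short title from Tab. 4:
# yields P = R = F1 = 2/3, since 2 of 3 words overlap.
print(rouge1_prf(["jabra", "蓝牙", "耳机"], ["jabra", "捷波朗", "耳机"]))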
Tab. 3 Computational time per ten thousand data items on different GPUs
GPU           Self-ATT time/s   ERS-NET time/s
Quadro P600   11.179            10.365
GTX 1060      3.957             3.831
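GPU timings such as those in Tab. 3 are easy to distort because CUDA kernels execute asynchronously, so the clock must only be read after the device has been synchronized. Below is a minimal PyTorch timing harness under that assumption; the model and batch shapes are placeholders, and the item count is approximate when the batch size does not divide 10,000.

import time
import torch

def time_per_10k(model, token_ids, device="cuda"):
    """Wall-clock seconds to process roughly 10,000 titles,
    synchronizing the GPU so queued kernels are included."""
    model = model.to(device).eval()
    batch = token_ids.to(device)
    n_batches = 10_000 // batch.shape[0]
    with torch.no_grad():
        model(batch)                      # warm-up pass
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_batches):
            model(batch)
        if device == "cuda":
            torch.cuda.synchronize()
    return time.perf_counter() - start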
Tab. 4 Cases of redundant product title compression in two real application scenarios
Case 1
    Product title:  geras, 童装, 男童, 圆领, 套头, 卫衣, 套装, 春秋, 新品, 儿童, 运动, 纯棉, 两件套
    Label_L:        geras, 男童, 圆领, 套头, 卫衣, 套装
    Prediction_L:   geras, 男童, 圆领, 卫衣, 套装
    Label_S:        geras, 卫衣
    Prediction_S:   geras, 卫衣
Case 2
    Product title:  jabra, 捷波朗, elite, 运动, 臻, 跃, 心率, 无线, 蓝牙, 跑步, 防水, 耳机, 新品
    Label_L:        jabra, 捷波朗, elite, 运动, 心率, 无线, 耳机
    Prediction_L:   捷波朗, 运动, 心率, 无线, 蓝牙, 防水, 耳机
    Label_S:        jabra, 捷波朗, 耳机
    Prediction_S:   jabra, 蓝牙, 耳机
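The predictions in Tab. 4 come from the last step of Algorithm 1, S ← Y_predict ∩ T_test: the short title is the original title filtered by the per-word keep decisions, so word order is preserved and no new words can be introduced. A minimal sketch of this extractive decoding step (the function name and toy logits are illustrative):

import torch

def decode_short_title(title_words, logits):
    """Keep the words whose 'keep' logit wins, preserving title order
    (the extractive step S <- Y_predict ∩ T_test in Algorithm 1)."""
    keep = logits.argmax(dim=-1)          # (seq,) with 1 = keep, 0 = drop
    return [w for w, k in zip(title_words, keep.tolist()) if k == 1]

# Toy usage with hand-made logits favouring 'jabra' and '耳机'.
words = ["jabra", "捷波朗", "蓝牙", "耳机"]
logits = torch.tensor([[0., 2.], [1., 0.], [2., 0.], [0., 3.]])
print(decode_short_title(words, logits))  # ['jabra', '耳机']

-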
[1] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. 2017: 5998-6008.
[2] GONG Y, LUO X S, ZHU K Q, et al. Automatic generation of Chinese short product titles for mobile display[J]. arXiv preprint arXiv:1803.11359, 2018.
[3] LIU Z Y, HUANG W Y, ZHENG Y B, et al. Automatic keyphrase extraction via topic decomposition[C]//Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2010: 366-376.
[4] ROSE S, ENGEL D, CRAMER N, et al. Automatic keyword extraction from individual documents[M]//Text Mining: Applications and Theory. Hoboken: John Wiley & Sons, Ltd., 2010: 1-20.
[5] MIHALCEA R, TARAU P. TextRank: Bringing order into text[C]//Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004.
[6] ZHAO W X, JIANG J, HE J, et al. Topical keyphrase extraction from Twitter[C]//Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2011: 379-388.
[7] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
[8] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1724-1734.
[9] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint arXiv:1409.0473, 2014.
[10] LUONG T, PHAM H, MANNING C D. Effective approaches to attention-based neural machine translation[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 1412-1421.
[11] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Advances in Neural Information Processing Systems. 2014: 3104-3112.
[12] NALLAPATI R, ZHOU B W, DOS SANTOS C, et al. Abstractive text summarization using sequence-to-sequence RNNs and beyond[C]//Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. 2016: 280-290.
[13] NALLAPATI R, ZHAI F F, ZHOU B W. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI-17). 2017: 3075-3081.
[14] SEE A, LIU P J, MANNING C D. Get to the point: Summarization with pointer-generator networks[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017: 1073-1083.
[15] VINYALS O, FORTUNATO M, JAITLY N. Pointer networks[C]//Advances in Neural Information Processing Systems 28 (NIPS 2015). 2015: 2692-2700.
[16] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Advances in Neural Information Processing Systems 27 (NIPS 2014). 2014: 2672-2680.
[17] ZHANG J, ZOU P, LI Z, et al. Multi-modal generative adversarial network for short product title generation in mobile e-commerce[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers). 2019: 64-72.
[18] WANG J G, TIAN J F, QIU L, et al. A multi-task learning approach for improving product title compression with user search log data[C]//32nd AAAI Conference on Artificial Intelligence. 2018: 451-458.
[19] KINGMA D P, BA J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.
[20] LIN C Y, HOVY E. Automatic evaluation of summaries using n-gram co-occurrence statistics[C]//Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. 2003: 150-157.