Methods and progress in deep neural network model compression
-
Abstract: Deep neural network (DNN) models achieve strong performance at the cost of substantial memory consumption and high computational load, which makes them difficult to deploy on hardware platforms with limited resources. Reducing memory cost and accelerating computation through model compression has therefore become a hot research topic, and a large body of work has emerged in recent years. This paper introduces four representative compression methods for deep neural networks: network pruning, quantization, knowledge distillation, and compact network design; in particular, it focuses on representative recent compression models and their characteristics. Finally, the evaluation criteria and research prospects of model compression are summarized.
-
Tab. 1 Summary of neural network compression methods
| Method | Description | Pros and cons | Applicable scenarios |
| --- | --- | --- | --- |
| Pruning | Removes redundant, low-information weights from a trained model | Reduces network complexity and alleviates overfitting; however, requires dedicated computation libraries and has high computational complexity | Needs a pretrained model and the original dataset for fine-tuning; suited to devices with low compute memory and storage capacity |
| Quantization | Reduces the number of bits needed to represent each weight | Combined with hardware support, greatly improves inference speed; however, the accuracy drop is noticeable | Suited to scenarios requiring fast real-time inference with low compute memory |
| Knowledge distillation | A student model learns the knowledge of a large teacher model | Greatly reduces computation and storage; mainly used for classification tasks, so its scope is narrow, and "knowledge" is hard to define | Needs a pretrained teacher model; suited to cases with a small dataset or no dataset |
| Compact network design | Designs more compact convolution kernels or convolution schemes | General-purpose convolutional networks with fewer parameters; special convolution kernels compute more slowly | End-to-end training of the compressed model, with complete training and test datasets |
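The pruning row above can be made concrete with a short sketch. The following is an illustrative magnitude-based weight-pruning routine (a hypothetical minimal example, not a specific method from the surveyed papers): the smallest-magnitude fraction of weights is set to zero, after which the model would normally be fine-tuned on the original dataset.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest |w|.

    Illustrative only: real pruning operates on tensors per layer and is
    followed by fine-tuning.
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # k-th smallest absolute value acts as the pruning threshold
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.8, -0.05, 0.3, -0.9, 0.02, 0.4, -0.1, 0.7]
pruned = prune_by_magnitude(weights, sparsity=0.5)
# the four smallest-magnitude weights are zeroed; the rest are kept
```

Filter- or channel-level pruning (e.g. ThiNet[23]) applies the same idea to whole structures instead of individual weights, which avoids the need for sparse computation libraries.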
Tab. 2 Performance comparison of different quantization methods on the CIFAR10 and ImageNet datasets
| Network (dataset) | Method | W/bit | A/bit | accTop-1/% | accTop-5/% |
| --- | --- | --- | --- | --- | --- |
| VGG-Small (CIFAR10) | Full precision | 32 | 32 | 93.8 | |
| | BNN[22] | 1 | 1 | 89.9 | |
| | XNOR-Net[38] | 1 | 1 | 89.8 | |
| | IR-Net[34] | 1 | 1 | 90.4 | |
| | TSQ[35] | 3 | 2 | 93.5 | |
| | BWN[38] | 1 | 32 | 90.1 | |
| ResNet-18 (ImageNet) | Full precision | 32 | 32 | 69.6 | 89.20 |
| | BWN[38] | 1 | 32 | 60.8 | 83.00 |
| | Bi-Real[21] | 1 | 1 | 56.4 | 79.50 |
| | TWN[23] | 2 | 32 | 61.8 | 84.20 |
| | IR-Net[34] | 1 | 32 | 62.9 | 84.10 |
| | IR-Net[34] | 1 | 1 | 58.1 | 80.00 |
| | BENN[30] | 1 | 1 | 61.0 | |
| | CI-BCNN[31] | 1 | 1 | 59.9 | 84.18 |
| | CBCN[32] | 1 | 1 | 61.4 | 82.80 |
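The 1-bit entries above (W/bit = 1) rest on binarizing each weight to a sign while keeping a per-tensor scaling factor. A minimal sketch of this BWN/XNOR-Net-style scaled binarization, assuming sign(0) maps to +1 (function and variable names are illustrative, not from any cited implementation):

```python
def binarize_bwn(weights):
    """Approximate W by alpha * sign(W), with alpha = mean(|W|).

    This is the closed-form scaling factor used by BWN-style methods;
    here it is applied to a flat list for illustration.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    binarized = [alpha if w >= 0 else -alpha for w in weights]
    return alpha, binarized

alpha, wb = binarize_bwn([0.5, -0.25, 0.75, -1.0])
# alpha = 0.625; each weight is stored as a single sign bit plus
# one shared 32-bit scale, giving roughly a 32x weight-storage saving
```

When activations are also binarized (A/bit = 1), convolutions reduce to XNOR and popcount operations, which is where the large inference speedups come from, at the cost of the accuracy drops visible in the table.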
Tab. 3 Performance comparison of different compact neural network methods on the ImageNet dataset
| Model | Param/(×10⁶) | accTop-1/% | FLOPs/(×10⁹) | Inference latency/ms |
| --- | --- | --- | --- | --- |
| SqueezeNet[52] | 1.25 | 57.5 | 1.70 | |
| MobileNetV2[54] | 3.40 | 70.6 | 0.30 | 75 |
| MobileNetV3[55] | 5.40 | 75.2 | 0.22 | |
| ShuffleNetV1[56] | 3.40 | 71.5 | 0.53 | 108 |
| ShuffleNetV2[57] | 5.30 | 73.7 | 0.30 | |
| GhostNet[59] | 5.20 | 73.9 | 0.14 | |
| Oct-MobileNetV2[60] | 3.50 | 72.0 | 0.27 | 53 |
| CondenseNet[62] | 4.80 | 73.2 | 0.53 | 1 890 |
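Much of the parameter saving in the MobileNet-family networks above comes from replacing a standard k×k convolution with a depthwise convolution followed by a 1×1 pointwise convolution. A small sketch of the parameter counts (illustrative helper names; bias terms omitted):

```python
def standard_conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution layer."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per channel) + 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# Example layer: 128 input channels, 256 output channels, 3x3 kernel
std = standard_conv_params(128, 256, 3)        # 294 912 parameters
dws = depthwise_separable_params(128, 256, 3)  # 33 920 parameters
ratio = std / dws                              # roughly 8.7x fewer parameters
```

In general the saving factor is about 1/c_out + 1/k², i.e. close to k² (here 9×) when the output channel count is large, which is why these architectures reach a few million parameters instead of tens of millions.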
Tab. 4 Performance comparison of different compression methods on the ImageNet dataset
| Network | Method | Param/(×10⁶) | $\varphi$ | accTop-1/% | accTop-5/% | FLOPs/(×10⁹) | $\phi$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AlexNet | Original | 61 | 1× | 57.22 | 80.27 | 0.72 | 1.0× |
| | Han et al.[33] | 1.70 | 35× | 57.22 | 80.30 | | 3.0× |
| | Zhang et al.[18] | 2.90 | 21× | | 80.20 | | |
| VGG-16 | Original | 138.00 | 1× | 68.50 | 88.68 | 15.50 | 1.0× |
| | Luo et al.[23] | 8.32 | 16.63× | 67.34 | 87.92 | 9.34 | 2.3× |
| | Han et al.[33] | 11.30 | 49× | 68.83 | 89.09 | | (3.0~4.0)× |
| | Yu et al.[66] | 9.70 | 15× | 68.75 | 89.06 | | |
| | Cheng et al.[67] | 28.00 | 19.6× | 67.37 | 88.23 | | 4.9× |
| | Hu et al.[68] | 9.20 | 15× | 64.78 | 86.03 | 4.40 | 2.5× |
| ResNet-50 | Original | 25.56 | 1× | 72.88 | 91.14 | 7.72 | 1.0× |
| | Luo et al.[23] | 8.66 | 2.60× | 68.42 | 88.30 | 2.20 | |
| | Zhuang et al.[24] | 12.38 | 2.06× | 71.82 | 90.53 | 3.41 | |
| | Pierre et al.[35] | 5.09 | 19× | 73.79 | | | |

Note: "×" denotes the speedup (compression) ratio relative to the original speed (model size).
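The ratios $\varphi$ and $\phi$ in Table 4 are simply original-to-compressed ratios of parameter count and computation. A one-line illustration using the AlexNet row (the helper name is ours, not from the survey):

```python
def compression_ratio(original, compressed):
    """Ratio of the original quantity to the compressed quantity.

    Applied to parameter counts it gives varphi; applied to FLOPs
    (or latency) it gives the speedup phi.
    """
    return original / compressed

# AlexNet: 61M parameters originally, 1.70M after Deep Compression [33]
phi = compression_ratio(61, 1.70)
# phi is about 35.9, matching the ~35x reported in Table 4
```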
Tab. 5 Performance comparison of different compression methods on the Microsoft COCO dataset
| Model | Backbone | Input size | Param/(×10⁶) | FLOPs/(×10⁹) | AP | AP0.5 | AP0.75 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Yolov3-Tiny[69] | Tiny-Darknet | 416 × 416 | 12.30 | 3.49 | | 33.1 | |
| Pelee[11] | PeleeNet | 304 × 304 | 5.98 | 1.39 | 22.4 | 38.3 | 22.9 |
| SSD[53] | MobileNetV1 | 300 × 300 | 6.80 | 1.20 | 19.3 | | |
| SSD-lite[54] | MobileNetV2 | 320 × 320 | 4.30 | 0.80 | 22.1 | | |
| Tiny-DSOD[70] | DDB-Net+D-FPN | 300 × 300 | | 1.12 | 23.2 | 40.4 | 22.8 |
| ThunderNet[14] | SNet146 | 320 × 320 | | 0.47 | 23.7 | 40.3 | 24.6 |
| EfficientDet*[71] | EfficientNet-B0 | 512 × 512 | 3.90 | 2.50 | 33.8 | 52.2 | 35.8 |
| FQN-INT4*[72] | RetinaNet18 | 800 × 800 | | | 28.6 | 46.9 | 29.9 |

Note: entries marked * are evaluated on Microsoft COCO 2017; unmarked entries use Microsoft COCO 2015.
-
[1] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge [J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[2] HE Y, SAINATH T N, PRABHAVALKAR R, et al. Streaming end-to-end speech recognition for mobile devices [C]//ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019: 6381-6385.
[3] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2019-05-24)[2020-07-02]. https://arxiv.org/pdf/1810.04805.pdf.
[4] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. (2015-04-10)[2020-07-02]. https://arxiv.org/pdf/1409.1556.pdf.
[5] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 770-778. DOI: 10.1109/CVPR.2016.90.
[6] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2261-2269. DOI: 10.1109/CVPR.2017.243.
[7] CHENG Y, WANG D, ZHOU P, et al. A survey of model compression and acceleration for deep neural networks [EB/OL]. (2020-06-14)[2020-07-02]. https://arxiv.org/pdf/1710.09282.pdf.
[8] LEI J, GAO X, SONG J, et al. Survey of deep neural network model compression [J]. Journal of Software, 2018, 29(2): 251-266. (in Chinese)
[9] CHOUDHARY T, MISHRA V, GOSWAMI A, et al. A comprehensive survey on model compression and acceleration [J/OL]. Artificial Intelligence Review, 2020. (2020-02-08)[2020-07-02]. https://doi.org/10.1007/s10462-020-09816-7.
[10] LI J Y, ZHAO Y K, XUE Z E, et al. A survey of model compression for deep neural networks [J]. Chinese Journal of Engineering, 2019, 41(10): 1229-1239. (in Chinese)
[11] WANG R J, LI X, LING C X. Pelee: A real-time object detection system on mobile devices [C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2018: 1967-1976.
[12] CHEN X L, GIRSHICK R, HE K M, et al. TensorMask: A foundation for dense object segmentation [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 2061-2069.
[13] SANH V, DEBUT L, CHAUMOND J, et al. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter [EB/OL]. (2020-01-24)[2020-07-01]. https://arxiv.org/pdf/1910.01108v3.pdf.
[14] QIN Z, LI Z, ZHANG Z, et al. ThunderNet: Towards real-time generic object detection on mobile devices [C]//Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2019: 6718-6727.
[15] ANWAR S, SUNG W. Coarse pruning of convolutional neural networks with random masks [EB/OL]. [2020-07-02]. https://openreview.net/pdf?id=HkvS3Mqxe.
[16] LECUN Y, DENKER J S, SOLLA S A. Optimal brain damage [C]//Advances in Neural Information Processing Systems. 1989: 598-605.
[17] HASSIBI B, STORK D G. Second order derivatives for network pruning: Optimal brain surgeon [C]//Advances in Neural Information Processing Systems. 1993: 164-171.
[18] ZHANG T, YE S, ZHANG K, et al. A systematic DNN weight pruning framework using alternating direction method of multipliers [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 184-199.
[19] MA X L, GUO F M, NIU W, et al. PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices [C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20). 2020: 5117-5124.
[20] HE Y, ZHANG X, SUN J. Channel pruning for accelerating very deep neural networks [C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 1389-1397.
[21] CHIN T W, DING R, ZHANG C, et al. Towards efficient model compression via learned global ranking [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1518-1528.
[22] MOLCHANOV P, MALLYA A, TYREE S, et al. Importance estimation for neural network pruning [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 11264-11272.
[23] LUO J H, WU J, LIN W. ThiNet: A filter level pruning method for deep neural network compression [C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 5058-5066.
[24] ZHUANG Z W, TAN M K, ZHUANG B, et al. Discrimination-aware channel pruning for deep neural networks [C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18). New York: Curran Associates Inc., 2018: 883-894.
[25] HE Y, LIU P, WANG Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4340-4349.
[26] LIN M, JI R, WANG Y, et al. HRank: Filter pruning using high-rank feature map [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1529-1538.
[27] LIN X, ZHAO C, PAN W. Towards accurate binary convolutional neural network [C]//Advances in Neural Information Processing Systems. 2017: 345-353.
[28] LIU Z, WU B, LUO W, et al. Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 722-737.
[29] HUBARA I, COURBARIAUX M, SOUDRY D, et al. Binarized neural networks [C]//Advances in Neural Information Processing Systems. 2016: 4107-4115.
[30] LI F F, ZHANG B, LIU B. Ternary weight networks [EB/OL]. (2016-11-19)[2020-07-03]. https://arxiv.org/pdf/1605.04711.pdf.
[31] WANG P, CHENG J. Fixed-point factorized networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 4012-4020.
[32] BOROUMAND A, GHOSE S, KIM Y, et al. Google workloads for consumer devices: Mitigating data movement bottlenecks [C]//Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 2018: 316-331.
[33] HAN S, MAO H Z, DALLY W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding [EB/OL]. (2015-11-20)[2020-07-03]. https://arxiv.org/pdf/1510.00149v3.pdf.
[34] CHEN W, WILSON J, TYREE S, et al. Compressing neural networks with the hashing trick [C]//International Conference on Machine Learning. 2015: 2285-2294.
[35] STOCK P, JOULIN A, GRIBONVAL R, et al. And the bit goes down: Revisiting the quantization of neural networks [EB/OL]. (2019-12-20)[2020-07-02]. https://arxiv.org/pdf/1907.05686.pdf.
[36] CARREIRA-PERPIÑÁN M A, IDELBAYEV Y. Model compression as constrained optimization, with application to neural nets. Part II: Quantization [EB/OL]. (2017-07-13)[2020-07-03]. https://arxiv.org/pdf/1707.04319.pdf.
[37] ZHU S, DONG X, SU H. Binary ensemble neural network: More bits per network or more networks per bit? [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4923-4932.
[38] WANG Z, LU J, TAO C, et al. Learning channel-wise interactions for binary convolutional neural networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 568-577.
[39] LIU C, DING W, XIA X, et al. Circulant binary convolutional networks: Enhancing the performance of 1-bit DCNNs with circulant back propagation [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 2691-2699.
[40] COURBARIAUX M, BENGIO Y, DAVID J P. BinaryConnect: Training deep neural networks with binary weights during propagations [C]//Advances in Neural Information Processing Systems. 2015: 3123-3131.
[41] QIN H T, GONG R H, LIU X L, et al. Forward and backward information retention for accurate binary neural networks [C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 2247-2256.
[42] WANG P, HU Q, ZHANG Y, et al. Two-step quantization for low-bit neural networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4376-4384.
[43] MELLEMPUDI N, KUNDU A, MUDIGERE D, et al. Ternary neural networks with fine-grained quantization [EB/OL]. (2017-05-30)[2020-07-03]. https://arxiv.org/pdf/1705.01462.pdf.
[44] ZHU F, GONG R, YU F, et al. Towards unified INT8 training for convolutional neural network [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1969-1979.
[45] RASTEGARI M, ORDONEZ V, REDMON J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks [C]//European Conference on Computer Vision. Cham: Springer, 2016: 525-542.
[46] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network [EB/OL]. (2015-03-09)[2020-07-04]. https://arxiv.org/pdf/1503.02531.pdf.
[47] TIAN Y L, KRISHNAN D, ISOLA P. Contrastive representation distillation [EB/OL]. (2020-01-18)[2020-07-04]. https://arxiv.org/pdf/1910.10699.pdf.
[48] FURLANELLO T, LIPTON Z C, TSCHANNEN M, et al. Born again neural networks [C]//Proceedings of the 35th International Conference on Machine Learning. 2018: 1602-1611.
[49] GAO M Y, SHEN Y J, LI Q Q, et al. Residual knowledge distillation [EB/OL]. (2020-02-21)[2020-07-04]. https://arxiv.org/pdf/2002.09168.pdf.
[50] HE T, SHEN C, TIAN Z, et al. Knowledge adaptation for efficient semantic segmentation [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 578-587.
[51] LIN M, CHEN Q, YAN S C. Network in network [EB/OL]. (2014-03-04)[2020-07-04]. https://arxiv.org/pdf/1312.4400/.
[52] IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size [EB/OL]. (2016-11-04)[2020-07-04]. https://arxiv.org/pdf/1602.07360.pdf.
[53] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications [EB/OL]. (2017-04-17)[2020-07-04]. https://arxiv.org/pdf/1704.04861.pdf.
[54] SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: Inverted residuals and linear bottlenecks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4510-4520.
[55] HOWARD A, SANDLER M, CHU G, et al. Searching for MobileNetV3 [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1314-1324.
[56] ZHANG X, ZHOU X, LIN M, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6848-6856.
[57] MA N, ZHANG X, ZHENG H T, et al. ShuffleNet V2: Practical guidelines for efficient CNN architecture design [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 116-131.
[58] HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132-7141.
[59] HAN K, WANG Y, TIAN Q, et al. GhostNet: More features from cheap operations [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1580-1589.
[60] CHEN Y, FAN H, XU B, et al. Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 3435-3444.
[61] CAMPBELL F W, ROBSON J G. Application of Fourier analysis to the visibility of gratings [J]. The Journal of Physiology, 1968, 197(3): 551-566.
[62] HUANG G, LIU S, VAN DER MAATEN L, et al. CondenseNet: An efficient DenseNet using learned group convolutions [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 2752-2761.
[63] JADERBERG M, VEDALDI A, ZISSERMAN A. Speeding up convolutional neural networks with low rank expansions [EB/OL]. (2014-05-15)[2020-07-04]. https://arxiv.org/pdf/1405.3866.pdf.
[64] POLINO A, PASCANU R, ALISTARH D. Model compression via distillation and quantization [EB/OL]. (2018-02-15)[2020-07-04]. https://arxiv.org/pdf/1802.05668.pdf.
[65] CAI R C, ZHONG C R, YU Y, et al. Quantization and compression of convolutional neural networks for "edge" applications [J]. Journal of Computer Applications, 2018, 38(9): 2449-2454. (in Chinese)
[66] YU X Y, LIU T L, WANG X C, et al. On compressing deep models by low rank and sparse decomposition [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7370-7379.
[67] CHENG J, WU J X, LENG C, et al. Quantized CNN: A unified approach to accelerate and compress convolutional networks [J]. IEEE Transactions on Neural Networks and Learning Systems, 2017, 29(10): 4730-4743.
[68] HU H Y, PENG R, TAI Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures [EB/OL]. (2016-07-12)[2020-07-04]. https://arxiv.org/pdf/1607.03250.pdf.
[69] WANG R J, LI X, LING C X. Pelee: A real-time object detection system on mobile devices [C]//Advances in Neural Information Processing Systems. 2018: 1963-1972.
[70] LI Y, LI J, LIN W, et al. Tiny-DSOD: Lightweight object detection for resource-restricted usages [EB/OL]. (2018-07-29)[2020-07-04]. https://arxiv.org/pdf/1807.11013.pdf.
[71] TAN M, PANG R, LE Q V. EfficientDet: Scalable and efficient object detection [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10781-10790.
[72] LI R, WANG Y, LIANG F, et al. Fully quantized network for object detection [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 2810-2819.