Methods and Progress in Deep Neural Network Model Compression

LAI Yejing, HAO Shanfeng, HUANG Dingjiang

Citation: LAI Yejing, HAO Shanfeng, HUANG Dingjiang. Methods and progress in deep neural network model compression [J]. Journal of East China Normal University (Natural Sciences), 2020, (5): 68-82. doi: 10.3969/j.issn.1000-5641.202091001


doi: 10.3969/j.issn.1000-5641.202091001
Funding: National Natural Science Foundation of China (11501204, U1711262)
Corresponding author: HUANG Dingjiang, male, professor, doctoral supervisor; research field: machine learning and artificial intelligence. E-mail: djhuang@dase.ecnu.edu.cn

  • CLC number: TP391


  • Abstract: Deep neural network (DNN) models achieve strong performance at the cost of enormous memory consumption and heavy computation, which makes them hard to deploy on hardware platforms with limited resources. Reducing memory cost and accelerating inference through model compression has therefore become a hot research topic, with a large body of work in recent years. This paper introduces four representative DNN compression methods: network pruning, quantization, knowledge distillation, and compact network design. It highlights representative recent compression methods and their characteristics, and concludes with a summary of the evaluation criteria and research prospects for model compression.
  • Fig. 1  Four pruning granularities
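The finest granularity in Fig. 1, element-wise (unstructured) pruning, is commonly implemented as magnitude thresholding: weights whose absolute value falls below a threshold are zeroed. A minimal NumPy sketch (the function name and the 50% sparsity target are illustrative, not from the paper):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured pruning: zero out the smallest-magnitude fraction of weights."""
    w = weights.copy()
    k = int(sparsity * w.size)          # number of weights to remove
    if k == 0:
        return w
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    w[np.abs(w) <= threshold] = 0.0     # mask out low-magnitude weights
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))             # toy weight matrix
pruned = magnitude_prune(w, 0.5)        # 50% of entries set to zero
```

The structured variants in Fig. 1 (vector-, kernel-, and filter-level) instead zero whole rows, kernels, or filters, trading some accuracy for speedups that do not require dedicated sparse-compute libraries.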

    Fig. 2  The DCP framework [24]

    Fig. 3  Weight sharing by scalar quantization (top) and fine-tuning of centroids (bottom) [33]
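The scheme in Fig. 3 shares weights via k-means centroids [33]. A simpler relative, which many of the low-bit methods in Table 2 build on, is symmetric uniform quantization; the sketch below (function name and weight values are illustrative) maps floats to signed integers plus a single scale:

```python
import numpy as np

def quantize_uniform(w: np.ndarray, n_bits: int = 8):
    """Symmetric uniform quantization: floats -> signed integers + one scale.

    int8 storage below assumes n_bits <= 8."""
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax                     # quantization step size
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.array([0.61, -0.98, 1.48, 0.09, 0.05, -0.14, -1.08, 2.12])
q, scale = quantize_uniform(w)
w_hat = q.astype(np.float64) * scale                   # dequantized approximation
```

Storing 8-bit integers instead of 32-bit floats already gives a 4x size reduction; the binary and ternary methods in Table 2 (W/bit = 1 or 2) push this much further, at the visible accuracy cost the table shows.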

    Fig. 4  Knowledge distillation
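The setup in Fig. 4 trains the student to match the teacher's temperature-softened output distribution [46]. A minimal sketch of the soft-label loss (the logit values are made up for illustration):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T: float = 4.0) -> float:
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    as suggested in [46]."""
    p = softmax(np.asarray(teacher_logits) / T)
    q = softmax(np.asarray(student_logits) / T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([8.0, 2.0, 0.5])         # teacher logits for one sample
good_student = np.array([7.5, 2.2, 0.4])    # close to the teacher
bad_student = np.array([0.5, 2.0, 8.0])     # far from the teacher
```

In practice this term is mixed with the ordinary cross-entropy on hard labels, so the student sees both the ground truth and the teacher's "dark knowledge" about relative class similarities.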

    Fig. 5  Structure of MobileNetV1
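MobileNetV1 (Fig. 5) replaces each standard convolution with a depthwise convolution followed by a 1x1 pointwise convolution. The parameter saving is easy to check; the layer sizes below are illustrative, not taken from the paper:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameters of a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out

def separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k (one filter per input channel) + 1 x 1 pointwise."""
    return k * k * c_in + c_in * c_out

standard = conv_params(128, 256, 3)        # 294912 parameters
separable = separable_params(128, 256, 3)  # 33920 parameters, ~8.7x fewer
```

For a k x k kernel the reduction factor is roughly 1/c_out + 1/k^2, i.e. close to k^2 = 9 when c_out is large, which is where MobileNet's compactness comes from.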

    Tab. 1  Summary of neural network compression methods

    Method | Description | Pros and cons | Applicable scenarios
    Pruning | Remove redundant, low-information weights from a trained model | Lowers network complexity and alleviates overfitting, but requires dedicated compute libraries and is computationally expensive | Needs a pretrained model and the original dataset for fine-tuning; suited to devices with limited compute memory and storage
    Quantization | Reduce the number of bits used to represent each weight | Combined with hardware support, greatly speeds up inference, but accuracy drops noticeably | Suited to scenarios requiring fast real-time inference with low compute memory
    Knowledge distillation | A small student model learns the knowledge of a large teacher model | Greatly reduces computation and storage, but is mainly used for classification, has a narrow range of application, and the "knowledge" is hard to define | Needs a pretrained teacher model; suited to cases with a small dataset or none at all
    Compact network design | Design more compact convolution kernels or convolution schemes | General-purpose convolutional networks with far fewer parameters, but special kernels can be slow to compute | End-to-end training of a compressed model, with complete training and test datasets

    Tab. 2  Performance comparison of different quantization methods on the CIFAR10 and ImageNet datasets

    Network (dataset) | Method | W/bit | A/bit | Top-1 acc/% | Top-5 acc/%
    VGG-Small (CIFAR10) | full precision | 32 | 32 | 93.8 | –
     | BNN [22] | 1 | 1 | 89.9 | –
     | XNOR-Net [38] | 1 | 1 | 89.8 | –
     | IR-Net [34] | 1 | 1 | 90.4 | –
     | TSQ [35] | 3 | 2 | 93.5 | –
     | BWN [38] | 1 | 32 | 90.1 | –
    ResNet-18 (ImageNet) | full precision | 32 | 32 | 69.6 | 89.20
     | BWN [38] | 1 | 32 | 60.8 | 83.00
     | Bi-Real [21] | 1 | 1 | 56.4 | 79.50
     | TWN [23] | 2 | 32 | 61.8 | 84.20
     | IR-Net [34] | 1 | 32 | 62.9 | 84.10
     | IR-Net [34] | 1 | 1 | 58.1 | 80.00
     | BENN [30] | 1 | 1 | 61.0 | –
     | CI-BCNN [31] | 1 | 1 | 59.9 | 84.18
     | CBCN [32] | 1 | 1 | 61.4 | 82.80

    Tab. 3  Performance comparison of different compact neural network methods on the ImageNet dataset

    Model | Param/(×10^6) | Top-1 acc/% | FLOPs/(×10^9) | Inference latency/ms
    SqueezeNet [52] | 1.25 | 57.5 | 1.70 | –
    MobileNetV2 [54] | 3.40 | 70.6 | 0.30 | 75
    MobileNetV3 [55] | 5.40 | 75.2 | 0.22 | –
    ShuffleNetV1 [56] | 3.40 | 71.5 | 0.53 | 108
    ShuffleNetV2 [57] | 5.30 | 73.7 | 0.30 | –
    GhostNet [59] | 5.20 | 73.9 | 0.14 | –
    Oct-MobileNetV2 [60] | 3.50 | 72.0 | 0.27 | 53
    CondenseNet [62] | 4.80 | 73.2 | 0.53 | 1890

    Tab. 4  Performance comparison of different compression methods on the ImageNet dataset

    Network / method | Param/(×10^6) | φ | Top-1 acc/% | Top-5 acc/% | FLOPs/(×10^9) | ϕ
    AlexNet | 61 | 1.0× | 57.22 | 80.27 | 0.72 | 1.0×
    Han et al. [33] | 1.70 | 35× | 57.22 | 80.30 | – | 3.0×
    Zhang et al. [18] | 2.90 | 21× | – | 80.20 | – | –
    VGG-16 | 138.00 | 1.0× | 68.50 | 88.68 | 15.50 | 1.0×
    Luo et al. [23] | 8.32 | 16.63× | 67.34 | 87.92 | 9.34 | 2.3×
    Han et al. [33] | 11.30 | 49× | 68.83 | 89.09 | – | (3.0~4.0)×
    Yu et al. [66] | 9.70 | 15× | 68.75 | 89.06 | – | –
    Cheng et al. [67] | 28.00 | 19.6× | 67.37 | 88.23 | – | 4.9×
    Hu et al. [68] | 9.20 | 15× | 64.78 | 86.03 | 4.40 | 2.5×
    ResNet-50 | 25.56 | 1.0× | 72.88 | 91.14 | 7.72 | 1.0×
    Luo et al. [23] | 8.66 | 2.60× | 68.42 | 88.30 | 2.20 | –
    Zhuang et al. [24] | 12.38 | 2.06× | 71.82 | 90.53 | 3.41 | –
    Pierre et al. [35] | 5.09 | 19× | 73.79 | – | – | –
    Note: "×" denotes the compression (speedup) factor relative to the original model size (speed); φ is the parameter compression ratio, ϕ the FLOPs speedup.
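The φ and ϕ columns above are simply ratios against the uncompressed baseline. For example, for the AlexNet entry of Han et al. [33], using the parameter counts from Table 4 (in millions):

```python
def ratio(original: float, compressed: float) -> float:
    """Compression (or speedup) factor: how many times smaller (or faster)."""
    return original / compressed

phi = ratio(61, 1.70)    # parameter compression: ~35.9x, reported as 35x
```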

    Tab. 5  Performance comparison of different compression methods on the Microsoft COCO dataset

    Model | Backbone | Input size | Param/(×10^6) | FLOPs/(×10^9) | AP | AP0.5 | AP0.75
    Yolov3-Tiny [69] | Tiny-Darknet | 416×416 | 12.30 | 3.49 | – | 33.1 | –
    Pelee [11] | PeleeNet | 304×304 | 5.98 | 1.39 | 22.4 | 38.3 | 22.9
    SSD [53] | MobileNetV1 | 300×300 | 6.80 | 1.20 | 19.3 | – | –
    SSD-lite [54] | MobileNetV2 | 320×320 | 4.30 | 0.80 | 22.1 | – | –
    Tiny-DSOD [70] | DDB-Net+D-FPN | 300×300 | – | 1.12 | 23.2 | 40.4 | 22.8
    ThunderNet [14] | SNet146 | 320×320 | – | 0.47 | 23.7 | 40.3 | 24.6
    EfficientDet* [71] | EfficientNet-B0 | 512×512 | 3.90 | 2.50 | 33.8 | 52.2 | 35.8
    FQN-INT4* [72] | RetinaNet18 | 800×800 | – | – | 28.6 | 46.9 | 29.9
    Note: * indicates results on Microsoft COCO 2017; entries without * are on Microsoft COCO 2015.
  • [1] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge [J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
    [2] HE Y, SAINATH T N, PRABHAVALKAR R, et al. Streaming end-to-end speech recognition for mobile devices [C]//ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019: 6381-6385.
    [3] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2019-05-24)[2020-07-02]. https://arxiv.org/pdf/1810.04805.pdf.
    [4] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. (2015-04-10)[2020-07-02]. https://arxiv.org/pdf/1409.1556.pdf.
    [5] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 770-778. DOI: 10.1109/CVPR.2016.90.
    [6] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2261-2269. DOI: 10.1109/CVPR.2017.243.
    [7] CHENG Y, WANG D, ZHOU P, et al. A survey of model compression and acceleration for deep neural networks [EB/OL]. (2020-06-14)[2020-07-02]. https://arxiv.org/pdf/1710.09282.pdf.
    [8] LEI J, GAO X, SONG J, et al. Survey of deep network model compression [J]. Journal of Software, 2018, 29(2): 251-266. (in Chinese)
    [9] CHOUDHARY T, MISHRA V, GOSWAMI A, et al. A comprehensive survey on model compression and acceleration [J/OL]. Artificial Intelligence Review, 2020. (2020-02-08)[2020-07-02]. https://doi.org/10.1007/s10462-020-09816-7.
    [10] LI J Y, ZHAO Y K, XUE Z E, et al. Overview of deep neural network model compression [J]. Chinese Journal of Engineering, 2019, 41(10): 1229-1239. (in Chinese)
    [11] WANG R J, LI X, LING C X. Pelee: A real-time object detection system on mobile devices [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2018: 1967-1976.
    [12] CHEN X L, GIRSHICK R, HE K M, et al. TensorMask: A foundation for dense object segmentation [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 2061-2069.
    [13] SANH V, DEBUT L, CHAUMOND J, et al. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter [EB/OL]. (2020-01-24)[2020-07-01]. https://arxiv.org/pdf/1910.01108v3.pdf.
    [14] QIN Z, LI Z, ZHANG Z, et al. ThunderNet: Towards real-time generic object detection on mobile devices [C]//Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2019: 6718-6727.
    [15] ANWAR S, SUNG W. Coarse pruning of convolutional neural networks with random masks[EB/OL]. [2020-07-02]. https://openreview.net/pdf?id=HkvS3Mqxe.
    [16] LECUN Y, DENKER J S, SOLLA S A. Optimal brain damage [C]//Advances in Neural Information Processing Systems. 1989: 598-605.
    [17] HASSIBI B, STORK D G. Second order derivatives for network pruning: Optimal brain surgeon [C]//Advances in Neural Information Processing Systems. 1993: 164-171.
    [18] ZHANG T, YE S, ZHANG K, et al. A systematic dnn weight pruning framework using alternating direction method of multipliers [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 184-199.
    [19] MA X L, GUO F M, NIU W, et al. PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices [C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20). 2020: 5117-5124.
    [20] HE Y, ZHANG X, SUN J. Channel pruning for accelerating very deep neural networks [C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 1389-1397.
    [21] CHIN T W, DING R, ZHANG C, et al. Towards efficient model compression via learned global ranking [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1518-1528.
    [22] MOLCHANOV P, MALLYA A, TYREE S, et al. Importance estimation for neural network pruning [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 11264-11272.
    [23] LUO J H, WU J, LIN W. Thinet: A filter level pruning method for deep neural network compression [C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 5058-5066.
    [24] ZHUANG Z W, TAN M K, ZHUANG B, et al. Discrimination-aware channel pruning for deep neural networks [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems(NIPS’18). New York: Curran Associates Inc., 2018: 883–894.
    [25] HE Y, LIU P, WANG Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4340-4349.
    [26] LIN M, JI R, WANG Y, et al. HRank: Filter pruning using high-rank feature map [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1529-1538.
    [27] LIN X, ZHAO C, PAN W. Towards accurate binary convolutional neural network [C]//Advances in Neural Information Processing Systems. 2017: 345-353.
    [28] LIU Z, WU B, LUO W, et al. Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 722-737.
    [29] HUBARA I, COURBARIAUX M, SOUDRY D, et al. Binarized neural networks [C]//Advances in Neural Information Processing Systems. 2016: 4107-4115.
    [30] LI F F, ZHANG B, LIU B. Ternary weight networks [EB/OL]. (2016-11-19)[2020-07-03]. https://arxiv.org/pdf/1605.04711.pdf.
    [31] WANG P, CHENG J. Fixed-point factorized networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 4012-4020.
    [32] BOROUMAND A, GHOSE S, KIM Y, et al. Google workloads for consumer devices: Mitigating data movement bottlenecks [C]//Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 2018: 316-331.
    [33] HAN S, MAO H Z, DALLY W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding [EB/OL]. (2015-11-20)[2020-07-03]. https://arxiv.org/pdf/1510.00149v3.pdf.
    [34] CHEN W, WILSON J, TYREE S, et al. Compressing neural networks with the hashing trick [C]// International Conference on Machine Learning. 2015: 2285-2294.
    [35] STOCK P, JOULIN A, GRIBONVAL R, et al. And the bit goes down: Revisiting the quantization of neural networks [EB/OL]. (2019-12-20)[2020-07-02]. https://arxiv.org/pdf/1907.05686.pdf.
    [36] CARREIRA-PERPINÁN M A, IDELBAYEV Y. Model compression as constrained optimization, with application to neural nets. Part II: Quantization [EB/OL]. (2017-07-13)[2020-07-03]. https://arxiv.org/pdf/1707.04319.pdf.
    [37] ZHU S, DONG X, SU H. Binary ensemble neural network: More bits per network or more networks per bit? [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4923-4932.
    [38] WANG Z, LU J, TAO C, et al. Learning channel-wise interactions for binary convolutional neural networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 568-577.
    [39] LIU C, DING W, XIA X, et al. Circulant binary convolutional networks: Enhancing the performance of 1-bit dcnns with circulant back propagation [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 2691-2699.
    [40] COURBARIAUX M, BENGIO Y, DAVID J P. BinaryConnect: Training deep neural networks with binary weights during propagations [C]//Advances in Neural Information Processing Systems. 2015: 3123-3131.
    [41] QIN H T, GONG R H, LIU X L, et al. Forward and backward information retention for accurate binary neural networks [C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 2247-2256.
    [42] WANG P, HU Q, ZHANG Y, et al. Two-step quantization for low-bit neural networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4376-4384.
    [43] MELLEMPUDI N, KUNDU A, MUDIGERE D, et al. Ternary neural networks with fine-grained quantization [EB/OL]. (2017-05-30)[2020-07-03]. https://arxiv.org/pdf/1705.01462.pdf.
    [44] ZHU F, GONG R, YU F, et al. Towards unified int8 training for convolutional neural network [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1969-1979.
    [45] RASTEGARI M, ORDONEZ V, REDMON J, et al. Xnor-net: Imagenet classification using binary convolutional neural networks [C]//European Conference on Computer Vision. Cham: Springer, 2016: 525-542.
    [46] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network [EB/OL]. (2015-03-09)[2020-07-04]. https://arxiv.org/pdf/1503.02531.pdf.
    [47] TIAN Y L, KRISHNAN D, ISOLA P. Contrastive representation distillation [EB/OL]. (2020-01-18)[2020-07-04]. https://arxiv.org/pdf/1910.10699.pdf.
    [48] FURLANELLO T, LIPTON Z C, TSCHANNEN M, et al. Born again neural networks [C]//Proceedings of the 35th International Conference on Machine Learning. 2020: 1602-1611.
    [49] GAO M Y, SHEN Y J, LI Q Q, et al. Residual knowledge distillation [EB/OL]. (2020-02-21)[2020-07-04]. https://arxiv.org/pdf/2002.09168.pdf.
    [50] HE T, SHEN C, TIAN Z, et al. Knowledge adaptation for efficient semantic segmentation [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 578-587.
    [51] LIN M, CHEN Q, YAN S C. Network in network [EB/OL]. (2014-03-04)[2020-07-04]. https://arxiv.org/pdf/1312.4400/.
    [52] IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size [EB/OL]. (2016-11-04)[2020-07-04]. https://arxiv.org/pdf/1602.07360.pdf.
    [53] HOWARD A G, ZHU M, CHEN B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications [EB/OL]. (2017-04-17)[2020-07-04]. https://arxiv.org/pdf/1704.04861.pdf.
    [54] SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: Inverted residuals and linear bottlenecks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4510-4520.
    [55] HOWARD A, SANDLER M, CHU G, et al. Searching for mobilenetv3 [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1314-1324.
    [56] ZHANG X, ZHOU X, LIN M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6848-6856.
    [57] MA N, ZHANG X, ZHENG H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 116-131.
    [58] HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132-7141.
    [59] HAN K, WANG Y, TIAN Q, et al. GhostNet: More features from cheap operations [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1580-1589.
    [60] CHEN Y, FAN H, XU B, et al. Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 3435-3444.
    [61] CAMPBELL F W, ROBSON J G. Application of Fourier analysis to the visibility of gratings [J]. The Journal of Physiology, 1968, 197(3): 551-566.
    [62] HUANG G, LIU S, VAN DER MAATEN L, et al. CondenseNet: An efficient DenseNet using learned group convolutions [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 2752-2761.
    [63] JADERBERG M, VEDALDI A, ZISSERMAN A. Speeding up convolutional neural networks with low rank expansions [EB/OL]. (2014-05-15)[2020-07-04]. https://arxiv.org/pdf/1405.3866.pdf.
    [64] POLINO A, PASCANU R, ALISTARH D. Model compression via distillation and quantization [EB/OL]. (2018-02-15)[2020-07-04]. https://arxiv.org/pdf/1802.05668.pdf.
    [65] CAI R C, ZHONG C R, YU Y, et al. Quantization and compression methods of convolutional neural networks for "edge" applications [J]. Journal of Computer Applications, 2018, 38(9): 2449-2454. (in Chinese)
    [66] YU X Y, LIU T L, WANG X C, et al. On compressing deep models by low rank and sparse decomposition [C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7370-7379.
    [67] CHENG J, WU J X, LENG C, et al. Quantized CNN: A unified approach to accelerate and compress convolutional networks [J]. IEEE Transactions on Neural Networks and Learning Systems, 2017, 29(10): 4730-4743.
    [68] HU H Y, PENG R, TAI Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures [EB/OL]. (2016-07-12)[2020-07-04]. https://arxiv.org/pdf/1607.03250.pdf.
    [69] WANG R J, LI X, LING C X. Pelee: A real-time object detection system on mobile devices [C]//Advances in Neural Information Processing Systems. 2018: 1963-1972.
    [70] LI Y, LI J, LIN W, et al. Tiny-DSOD: Lightweight object detection for resource-restricted usages[EB/OL]. (2018-07-29)[2020-07-04]. https://arxiv.org/pdf/1807.11013.pdf.
    [71] TAN M, PANG R, LE Q V. EfficientDet: Scalable and efficient object detection [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10781-10790.
    [72] LI R, WANG Y, LIANG F, et al. Fully quantized network for object detection [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 2810-2819.
Publication history
  • Received: 2020-08-02
  • Available online: 2020-09-24
  • Published: 2020-09-24
