Chinese core journal in comprehensive science and technology (Peking University core journal list)

Source journal of the Chinese Science Citation Database (CSCD)

Indexed by Chemical Abstracts (CA, USA)

Indexed by Mathematical Reviews (MR, USA)

Indexed by Referativny Zhurnal (Abstract Journal, Russia)

Issue 5
Sep.  2020
LAI Yejing, HAO Shanfeng, HUANG Dingjiang. Methods and progress in deep neural network model compression[J]. Journal of East China Normal University (Natural Sciences), 2020, (5): 68-82. doi: 10.3969/j.issn.1000-5641.202091001

Methods and progress in deep neural network model compression

doi: 10.3969/j.issn.1000-5641.202091001
  • Received Date: 2020-08-02
  • Available Online: 2020-09-24
  • Publish Date: 2020-09-24
  • Deep neural network (DNN) models achieve strong performance at the cost of substantial memory consumption and high computational demands, which makes them difficult to deploy on hardware platforms with limited resources. To meet these challenges, researchers have made great strides in this field, producing a rich body of methods and literature. This paper introduces four representative compression methods for deep neural networks used in recent years: network pruning, quantization, knowledge distillation, and compact network design; in particular, the article focuses on the characteristics of these representative models. Finally, evaluation criteria and research prospects for model compression are summarized.
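As a minimal, hypothetical illustration of the first of these four techniques (a NumPy sketch, not code from the paper), magnitude-based weight pruning zeroes out the smallest-magnitude weights in a layer until a requested fraction of the entries is removed:

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries of `weights` so that
    roughly a `sparsity` fraction of them becomes zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of entries to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune 50% of a random 4x4 weight matrix
w = np.random.randn(4, 4)
pruned = prune_by_magnitude(w, 0.5)
```

In practice, such unstructured pruning is usually followed by fine-tuning to recover accuracy, and structured variants (channel or filter pruning) are preferred when hardware-friendly speedups are required.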

    Figures(5)  / Tables(5)
