Survey on scene text detection based on deep learning
Abstract: Driven by big-data applications and by improvements in computer hardware performance, deep-learning-based object detection and image segmentation algorithms have broken through the bottlenecks of traditional algorithms and become the mainstream algorithms in computer vision. Scene text detection has been shaped by these advances and has itself made great breakthroughs in recent years. This survey has three objectives: to introduce the progress of scene text detection over the past 5 years, to compare and analyze the advantages and limitations of advanced algorithms, and to summarize the relevant benchmark datasets and evaluation methods in the field.
Key words:
- text detection
- deep learning
- natural scene
- object detection
- image segmentation
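Tab. 1 Advantages and limitations of scene text detection algorithms in recent years

| Method category | Algorithm | Advantages | Limitations |
| --- | --- | --- | --- |
| Object-detection-based | Tian et al. (CTPN)[29] | Fast, good performance | Handles horizontal text only |
| Object-detection-based | Zhong et al. (DeepText)[30] | Fast, good performance, synthetic data, few training samples | Handles horizontal text only |
| Object-detection-based | Zhang et al. (FEN)[33] | Fast, good performance | Handles horizontal text only |
| Object-detection-based | Jiang et al. (R2CNN)[31] | Multi-oriented | Slow |
| Object-detection-based | Ma et al. (RRPN)[32] | Multi-oriented | Slow |
| Object-detection-based | Shi et al. (SegLink)[36] | Multi-oriented, long text, fast | Cannot detect widely spaced text lines; cannot detect deformed or curved text |
| Object-detection-based | Tian et al. (WeText)[37] | Fast, expands the training dataset | Handles horizontal text only |
| Image-segmentation-based | Zhang et al. (Text-Block FCN)[40] | Multi-oriented, multilingual, multi-font | Slow |
| Image-segmentation-based | He et al. (CCTN)[41] | Multi-oriented, multi-scale, robust | Slow |
| Image-segmentation-based | Yao et al. (HED-based)[42] | Multi-oriented, multilingual, curved text | Slow; unsuitable for severely blurred or highlighted text |
| Image-segmentation-based | Polzounov et al. (Wordfence)[43] | Multi-oriented, multilingual, multi-scale | Slow |
| Image-segmentation-based | Deng et al. (PixelLink)[44] | Multi-oriented, good performance, robust | Poor on curved text |
| Image-segmentation-based | Yang et al. (IncepText)[45] | Multi-oriented, good performance, already deployed in an OCR product | Slow |
| Hybrid | Dai et al. (FTSN)[46] | Multi-oriented, curved text, good performance | Slow |
| Hybrid | He et al. (DDRN)[47] | Multi-oriented, direct, efficient, one-step post-processing | Poor on text lines with very large spacing, single characters, or complex backgrounds |
| Hybrid | Jiang et al. (CCP)[48] | Multi-oriented, multilingual | Slow |
| Hybrid | Zhou et al. (EAST)[49] | Multi-oriented, fast, efficient | Poor on longer text |
| Hybrid | Qin and Manduchi (CSDN)[38] | Multi-oriented, no post-processing required | Poor on curved text |
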
Tab. 2 Advantages and limitations of end-to-end text detection systems in recent years
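
| Algorithm | Year | Advantages | Limitations |
| --- | --- | --- | --- |
| Wang et al.[6] | 2012 | First to use deep learning in a scene text detection system; good performance; robust | Handles horizontal text only |
| Jaderberg et al.[5] | 2014 | Robust; applicable to images of different resolutions | Handles horizontal text only; requires large amounts of training data; inefficient |
| Jaderberg et al.[51] | 2016 | Synthetic scene text images; high recall | Applies only to a given language; handles horizontal text only |
| Gupta et al. (FCRN)[52] | 2016 | Synthetic scene text images; fast | Handles horizontal text only |
| Liao et al. (TextBoxes)[53] | 2017 | Fast; high accuracy; supports multi-scale input; one-step post-processing | Handles horizontal text only; poor on widely spaced characters and vertical text |
| Li et al.[54] | 2017 | First model to attempt integrating scene text detection and recognition into a single network; handles multi-scale text | Handles horizontal text only; poor recognition of small text |
| Busta et al. (Deep TextSpotter)[55] | 2017 | Fast; good performance; handles multi-oriented text | Poor on single characters or short digit and character fragments |
| Liao et al. (TextBoxes++)[56] | 2018 | Robust; fast; handles multi-oriented text | Poor on widely spaced characters and vertical text |
| Bartz et al. (SEE)[57] | 2018 | Simple network architecture; the network learns text detection automatically | Handles horizontal text only; difficult to train |
| Liu et al. (FOTS)[58] | 2018 | Robust; fast; handles multi-oriented text and long text | Unsuitable when the text region has large internal variance or when text regions and background share similar patterns |
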
Tab. 3 Common datasets for scene text detection
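
| Dataset and year | Number of images (train, test) | Orientation (curved) | Language |
| --- | --- | --- | --- |
| ICDAR 2003[63]/ICDAR 2005[64] | 509 (258, 251) | Horizontal | English |
| ICDAR 2011[65] | 484 (229, 255) | Horizontal | English |
| ICDAR 2013[66]/ICDAR 2015-Focused[67] | 462 (229, 233) | Horizontal | English |
| ICDAR 2015-Incidental[67] | 1500 (1000, 500) | Multi-oriented | English |
| ICDAR 2017-MLT[68] | 9000 (7200, 1800) | Multi-oriented | Multilingual |
| KAIST 2010[69] | 3000 | Horizontal | English, Korean |
| SVT 2010[3] | 350 (100, 250) | Horizontal | English |
| NEOCR 2011[70] | 659 | Multi-oriented (curved) | Multilingual |
| OSTD 2011[71] | 89 | Multi-oriented | English |
| MSRA-TD500 2012[9] | 500 (300, 200) | Multi-oriented | Chinese, English |
| CUTE80 2014[72] | 80 | Curved | English |
| HUST-TR400 2014[73] | 400 | Multi-oriented | English |
| USTB-SV1K 2015[74] | 1000 (500, 500) | Multi-oriented | English |
| SCUT-FORU-DB 2016[75] | 3931 | Horizontal | Chinese, English |
| COCO-Text 2016[76] | 63686 (43686, 20000) | Multi-oriented | English |
| RCTW-17 2017[77] | 12263 (8034, 4229) | Multi-oriented | Chinese |
| Total-Text 2017[78] | 1555 (1255, 300) | Multi-oriented (curved) | English |
| CTW1500 2017[79] | 1500 (1000, 500) | Multi-oriented (curved) | Chinese, English |
| CTW 2018[80] | 32285 | Multi-oriented (curved) | Chinese |
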
Tab. 4 Performance comparison of scene text detection algorithms
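
| Algorithm | ICDAR2013 P/% | ICDAR2013 R/% | ICDAR2013 F/% | ICDAR2015 Incidental P/% | ICDAR2015 Incidental R/% | ICDAR2015 Incidental F/% | MSRA-TD500 P/% | MSRA-TD500 R/% | MSRA-TD500 F/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tian et al. (CTPN)[29] | 93.00 | 83.00 | 87.70 | 74.22 | 51.56 | 60.85 | / | / | / |
| Zhong et al. (DeepText)[30] | 87.17 | 82.79 | 84.93 | / | / | / | / | / | / |
| Zhang et al. (FEN)[33] | 89.30 | 94.10 | 91.60 | / | / | / | / | / | / |
| Jiang et al. (R2CNN)[31] | 93.55 | 82.59 | 87.73 | 85.62 | 79.63 | 82.54 | / | / | / |
| Ma et al. (RRPN)[32] | 90.22 | 71.89 | 80.02 | 73.23 | 82.17 | 77.44 | 82.00 | 68.00 | 74.00 |
| Shi et al. (SegLink)[36] | 87.70 | 83.00 | 85.30 | 73.10 | 76.80 | 75.00 | 86.00 | 70.00 | 77.00 |
| Tian et al. (WeText)[37] | 84.20 | 80.70 | 82.30 | / | / | / | / | / | / |
| Zhang et al. (Text-Block FCN)[40] | 88.14 | 74.00 | 80.00 | 70.81 | 43.09 | 53.58 | 83.00 | 67.00 | 74.00 |
| He et al. (CCTN)[41] | 90.00 | 83.00 | 86.00 | / | / | / | 79.00 | 65.00 | 71.00 |
| Yao et al. (HED-based)[42] | 89.00 | 80.00 | 84.00 | 72.00 | 59.00 | 65.00 | 63.00 | 62.00 | 60.00 |
| Polzounov et al. (Wordfence)[43] | 65.00 | 92.00 | 76.00 | / | / | / | / | / | / |
| Deng et al. (PixelLink)[44] | 87.50 | 88.60 | 88.10 | 85.50 | 82.00 | 83.70 | 83.00 | 73.20 | 77.80 |
| Yang et al. (IncepText)[45] | / | / | / | 93.80 | 87.30 | 90.50 | 87.50 | 79.00 | 83.00 |
| Dai et al. (FTSN)[46] | / | / | / | 88.60 | 80.00 | 84.10 | 87.60 | 77.10 | 82.00 |
| He et al. (DDRN)[47] | 92.00 | 81.00 | 86.00 | 82.00 | 80.00 | 81.00 | 77.00 | 70.00 | 74.00 |
| Jiang et al. (CCP)[48] | 92.20 | 91.50 | 91.90 | / | / | / | / | / | / |
| Zhou et al. (EAST)[49] | / | / | / | 83.27 | 78.33 | 80.72 | 87.30 | 67.40 | 76.10 |
| Qin and Manduchi (CSDN)[38] | 90.00 | 83.00 | 86.00 | 79.00 | 65.00 | 71.00 | / | / | / |

Note: "$P$", "$R$", and "$F$" denote precision, recall, and $F$-measure, respectively.
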
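
The relationship between the three reported metrics can be illustrated with a minimal sketch. It assumes that the $F$-measure is the usual harmonic mean of precision and recall, $F = 2PR/(P+R)$; the page itself does not state the formula, and individual papers may use different matching protocols or rounding, so not every row of Tab. 4 need reproduce exactly. The function name `f_measure` is hypothetical and chosen only for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Assumption: F = 2PR / (P + R). Both inputs must use the same unit
    (here: percent), and the result is returned in that unit.
    """
    if precision + recall == 0:
        # Avoid division by zero when nothing was detected or matched.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R for CTPN on ICDAR2013, taken from Tab. 4.
ctpn_f = f_measure(93.00, 83.00)
print(f"CTPN F-measure on ICDAR2013: {ctpn_f:.2f}%")
```

For this row the computed value is about 87.7, in line with the $F$ value listed for CTPN on ICDAR2013.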
[1] ZHU Y, YAO C, BAI X. Scene text detection and recognition: Recent advances and future trends[J]. Front Comput Sci, 2014, 10(1):19-36. http://d.old.wanfangdata.com.cn/Periodical/zggdxxxswz-jsjkx201601003
[2] YE Q, DOERMANN D. Text detection and recognition in imagery: A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(7):1480-1500. doi: 10.1109/TPAMI.2014.2366765
[3] WANG K, BELONGIE S. Word spotting in the wild[C]//Computer Vision-ECCV 2010. Berlin: Springer, 2010: 591-604.
[4] NEUMANN L, MATAS J. Scene text localization and recognition with oriented stroke detection[C]//2013 IEEE International Conference on Computer Vision. IEEE, 2013: 97-104.
[5] JADERBERG M, VEDALDI A, ZISSERMAN A. Deep features for text spotting[C]//Computer Vision-ECCV 2014. Cham: Springer, 2014: 512-528.
[6] WANG T, WU D J, COATES A, et al. End-to-end text recognition with convolutional neural networks[C]//Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). 2012: 3304-3308.
[7] EPSHTEIN B, OFEK E, WEXLER Y. Detecting text in natural scenes with stroke width transform[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2010: 2963-2970.
[8] MATAS J, CHUM O, URBAN M, et al. Robust wide baseline stereo from maximally stable extremal regions[J]. Image and Vision Computing, 2004, 22:761-767. doi: 10.1016/j.imavis.2004.02.006
[9] YAO C, BAI X, LIU W, et al. Detecting texts of arbitrary orientations in natural images[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. 2012: 1083-1090.
[10] KANG L, LI Y, DOERMANN D. Orientation robust text line detection in natural images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 4034-4041.
[11] YIN X C, YIN X, HUANG K, et al. Robust text detection in natural scene images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5):970-983. doi: 10.1109/TPAMI.2013.182
[12] YIN X C, PEI W Y, ZHANG J, et al. Multi-orientation scene text detection with adaptive clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1930-1937. doi: 10.1109/TPAMI.2014.2388210
[13] CHO H, SUNG M, JUN B. Canny text detector: Fast and robust scene text localization algorithm[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2016: 3566-3573.
[14] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2014: 580-587.
[15] GIRSHICK R. Fast R-CNN[C]//2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015: 1440-1448.
[16] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017(6):1137-1149. http://www.ncbi.nlm.nih.gov/pubmed/27295650
[17] DAI J, LI Y, HE K, et al. R-FCN: Object detection via region-based fully convolutional networks[C]//Advances in Neural Information Processing Systems 29. NIPS, 2016: 379-387.
[18] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 779-788.
[19] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot MultiBox detector[C]//European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
[20] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems 25. NIPS, 2012: 1097-1105.
[21] UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2):154-171. doi: 10.1007/s11263-013-0620-5
[22] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]//Computer Vision-ECCV 2014. Cham: Springer, 2014: 346-361.
[23] REDMON J, FARHADI A. YOLO9000: Better, faster, stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 6517-6525.
[24] REDMON J, FARHADI A. YOLOv3: An incremental improvement[J]. arXiv preprint, arXiv: 1804.02767v1 [cs.CV] 8 Apr 2018. http://cn.arxiv.org/abs/1804.02767
[25] CIRESAN D, GIUSTI A, GAMBARDELLA L M, et al. Deep neural networks segment neuronal membranes in electron microscopy images[G]//Advances in Neural Information Processing Systems 25. Curran Associates, Inc, 2012: 2843-2851.
[26] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015: 3431-3440.
[27] LI Y, QI H, DAI J, et al. Fully convolutional instance-aware semantic segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017: 4438-4446.
[28] HE K, GKIOXARI G, DOLLAR P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 2980-2988.
[29] TIAN Z, HUANG W, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]//European Conference on Computer Vision. Cham: Springer, 2016: 56-72.
[30] ZHONG Z, JIN L, ZHANG S, et al. DeepText: A unified framework for text proposal generation and text detection in natural images[J]. arXiv preprint, arXiv: 1605.07314v1 [cs.CV] 24 May 2016.
[31] JIANG Y, ZHU X, WANG X, et al. R2CNN: Rotational region CNN for orientation robust scene text detection[J]. arXiv preprint, arXiv: 1706.09579v2 [cs.CV] 30 Jun 2017. http://cn.arxiv.org/abs/1706.09579
[32] MA J, SHAO W, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. arXiv preprint, arXiv: 1703.01086v3 [cs.CV] 15 Mar 2018. http://cn.arxiv.org/abs/1703.01086
[33] ZHANG S, LIU Y, JIN L, et al. Feature enhancement network: A refined scene text detector[J]. arXiv preprint, arXiv: 1711.04249v1 [cs.CV] 12 Nov 2017. http://cn.arxiv.org/abs/1711.04249
[34] GRAVES A, SCHMIDHUBER J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures[J]. Neural Networks, 2005, 18(5/6):602-610. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=JJ029013030
[35] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[J]. arXiv preprint, arXiv: 1409.4842v1 [cs.CV] 17 Sep 2014. http://cn.arxiv.org/abs/1409.4842
[36] SHI B, BAI X, BELONGIE S. Detecting oriented text in natural images by linking segments[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3482-3490.
[37] TIAN S, LU S, LI C. WeText: Scene text detection under weak supervision[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 1501-1509.
[38] QIN S, MANDUCHI R. Cascaded segmentation-detection networks for word-level text spotting[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017: 1275-1282.
[39] HU H, ZHANG C, LUO Y, et al. WordSup: Exploiting word annotations for character based text detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 4950-4959.
[40] ZHANG Z, ZHANG C, SHEN W, et al. Multi-oriented text detection with fully convolutional networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 4159-4167.
[41] HE T, HUANG W, QIAO Y, et al. Accurate text localization in natural image with cascaded convolutional text network[J]. arXiv preprint, arXiv: 1603.09423v1 [cs.CV] 31 Mar 2016. http://cn.arxiv.org/abs/1603.09423
[42] YAO C, BAI X, SANG N, et al. Scene text detection via holistic, multi-channel prediction[J]. arXiv preprint, arXiv: 1606.09002v2 [cs.CV] 5 Jul 2016. http://cn.arxiv.org/abs/1606.09002
[43] POLZOUNOV A, ABLAVATSKI A, ESCALERA S, et al. Wordfence: Text detection in natural images with border awareness[C]//2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017: 1222-1226.
[44] DENG D, LIU H, LI X, et al. PixelLink: Detecting scene text via instance segmentation[J]. arXiv preprint, arXiv: 1801.01315v1 [cs.CV] 4 Jan 2018. http://cn.arxiv.org/abs/1801.01315
[45] YANG Q, CHENG M, ZHOU W, et al. IncepText: A new inception-text module with deformable PSROI pooling for multi-oriented scene text detection[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI). 2018: 1071-1077.
[46] DAI Y, HUANG Z, GAO Y, et al. Fused text segmentation networks for multi-oriented scene text detection[J]. arXiv preprint, arXiv: 1709.03272v4 [cs.CV] 7 May 2018. http://cn.arxiv.org/abs/1709.03272
[47] HE W, ZHANG X Y, YIN F, et al. Deep direct regression for multi-oriented scene text detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 745-753.
[48] JIANG F, HAO Z, LIU X. Deep scene text detection with connected component proposals[J]. arXiv preprint, arXiv: 1708.05133v1 [cs.CV] 17 Aug 2017. http://cn.arxiv.org/abs/1708.05133
[49] ZHOU X, YAO C, WEN H, et al. EAST: An efficient and accurate scene text detector[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2642-2651.
[50] KIM K H, HONG S, ROH B, et al. PVANET: Deep but lightweight neural networks for real-time object detection[J]. arXiv preprint, arXiv: 1608.08021v3 [cs.CV] 30 Sep 2016. http://cn.arxiv.org/abs/1608.08021
[51] JADERBERG M, SIMONYAN K, VEDALDI A, et al. Reading text in the wild with convolutional neural networks[J]. International Journal of Computer Vision, 2016, 116(1):1-20. doi: 10.1007/s11263-015-0823-z
[52] GUPTA A, VEDALDI A, ZISSERMAN A. Synthetic data for text localisation in natural images[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 2315-2324.
[53] LIAO M, SHI B, BAI X, et al. TextBoxes: A fast text detector with a single deep neural network[C]//31st AAAI Conference on Artificial Intelligence. 2017: 4161-4167.
[54] LI H, WANG P, SHEN C. Towards end-to-end text spotting with convolutional recurrent neural networks[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 5248-5256.
[55] BUSTA M, NEUMANN L, MATAS J. Deep TextSpotter: An end-to-end trainable scene text localization and recognition framework[C]//Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017: 2223-2231.
[56] LIAO M, SHI B, BAI X. TextBoxes++: A single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8):3676-3690. doi: 10.1109/TIP.2018.2825107
[57] BARTZ C, YANG H, MEINEL C. SEE: Towards semi-supervised end-to-end scene text recognition[J]. arXiv preprint, arXiv: 1712.05404v1 [cs.CV] 14 Dec 2017. http://cn.arxiv.org/abs/1712.05404
[58] LIU X, LIANG D, YAN S, et al. FOTS: Fast oriented text spotting with a unified network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2018: 5676-5685.
[59] JADERBERG M, SIMONYAN K, VEDALDI A, et al. Synthetic data and artificial neural networks for natural scene text recognition[J]. arXiv preprint, arXiv: 1406.2227v4 [cs.CV] 9 Dec 2014. http://cn.arxiv.org/abs/1406.2227
[60] SHI B, BAI X, YAO C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11):2298-2304. doi: 10.1109/TPAMI.2016.2646371
[61] GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks[C]//Proceedings of the 23rd International Conference on Machine Learning. New York: ACM, 2006: 369-376.
[62] JADERBERG M, SIMONYAN K, ZISSERMAN A. Spatial transformer networks[C]//Advances in Neural Information Processing Systems 27. NIPS, 2015: 2017-2025.
[63] LUCAS S M, PANARETOS A, SOSA L, et al. ICDAR 2003 robust reading competitions: Entries, results, and future directions[J]. International Journal of Document Analysis and Recognition (IJDAR), 2005, 7(2/3):105-122. http://d.old.wanfangdata.com.cn/NSTLQK/NSTL_QKJJ021047811/
[64] LUCAS S M. ICDAR 2005 text locating competition results[C]//8th International Conference on Document Analysis and Recognition (ICDAR'05). 2005: 80-84.
[65] SHAHAB A, SHAFAIT F, DENGEL A. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images[C]//Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011: 1491-1496.
[66] KARATZAS D, SHAFAIT F, UCHIDA S, et al. ICDAR 2013 robust reading competition[C]//International Conference on Document Analysis and Recognition. IEEE Computer Society, 2013: 1484-1493.
[67] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]//International Conference on Document Analysis and Recognition. IEEE, 2015: 1156-1160.
[68] NAYEF N, YIN F, BIZID I, et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017: 1454-1459.
[69] LEE S, CHO M S, JUNG K, et al. Scene text extraction with edge constraint and text collinearity[C]//2010 20th International Conference on Pattern Recognition. 2010: 3983-3986.
[70] NAGY R, DICKER A, MEYER-WEGENER K. NEOCR: A configurable dataset for natural image text recognition[C]//Camera-Based Document Analysis and Recognition. Berlin: Springer, 2011: 150-163.
[71] YI C, TIAN Y. Text string detection from natural scenes by structure-based partition and grouping[J]. IEEE Transactions on Image Processing, 2011, 20(9):2594-2605. doi: 10.1109/TIP.2011.2126586
[72] RISNUMAWAN A, SHIVAKUMARA P, CHAN C S, et al. A robust arbitrary text detection system for natural scene images[J]. Expert Systems with Applications, 2014, 41(18):8027-8048. doi: 10.1016/j.eswa.2014.07.008
[73] YAO C, BAI X, LIU W. A unified framework for multioriented text detection and recognition[J]. IEEE Transactions on Image Processing, 2014, 23(11):4737-4749. doi: 10.1109/TIP.2014.2353813
[74] YIN X C, PEI W Y, ZHANG J, et al. Multi-orientation scene text detection with adaptive clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1930-1937. doi: 10.1109/TPAMI.2014.2388210
[75] ZHANG S Y. Deep models and their application in visual text analysis[D]. Guangzhou: South China University of Technology, 2016. (in Chinese) http://cdmd.cnki.com.cn/Article/CDMD-10561-1016770438.htm
[76] VEIT A, MATERA T, NEUMANN L, et al. COCO-Text: Dataset and benchmark for text detection and recognition in natural images[J]. arXiv preprint, arXiv: 1601.07140v2 [cs.CV] 19 Jun 2016.
[77] SHI B, YAO C, LIAO M, et al. ICDAR2017 competition on reading chinese text in the wild (RCTW-17)[C]//Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017: 1429-1434.
[78] CHNG C K, CHAN C S. Total-text: A comprehensive dataset for scene text detection and recognition[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017: 935-942.
[79] LIU Y L, JIN L W, ZHANG S T, et al. Detecting curve text in the wild: New dataset and new solution[J]. arXiv preprint, arXiv: 1712.02170v1 [cs.CV] 6 Dec 2017. http://cn.arxiv.org/abs/1712.02170
[80] YUAN T L, ZHU Z, XU K, et al. Chinese text in the wild[J]. arXiv preprint, arXiv: 1803.00085v1 [cs.CV] 28 Feb 2018. http://cn.arxiv.org/abs/1803.00085
[81] HUA X S, LIU W Y, ZHANG H J. An automatic performance evaluation protocol for video text detection algorithms[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2004, 14(4):498-507. doi: 10.1109/TCSVT.2004.825538
[82] WOLF C, JOLION J M. Object count/area graphs for the evaluation of object detection and segmentation algorithms[J]. International Journal of Document Analysis and Recognition (IJDAR), 2006, 8(4):280-296. doi: 10.1007/s10032-006-0014-0
[83] EVERINGHAM M, ESLAMI S M A, GOOL L V, et al. The pascal visual object classes challenge: A retrospective[J]. International Journal of Computer Vision, 2015, 111(1):98-136. doi: 10.1007/s11263-014-0733-5