Survey on scene text detection based on deep learning

YU Ruonan; HUANG Dingjiang; DONG Qiwen

doi:10.3969/j.issn.1000-5641.2018.05.001

School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

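Funding:

National Natural Science Foundation of China (11501204)

Joint Fund of the National Natural Science Foundation of China and Guangdong Province (U1711262)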

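About the authors:
YU Ruonan, female, master's student; research interests: deep learning and object detection. E-mail: yrn130814232@163.com

Corresponding author:
HUANG Dingjiang, male, professor; research interests: machine learning and artificial intelligence, and their application to big-data analysis in cross-disciplinary fields such as computational finance. E-mail: djhuang@dase.ecnu.edu.cn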

CLC number: TP391
Publication history
- Received: 2018-06-27
- Published: 2018-09-25

Abstract: Driven by big-data applications and by improvements in computer hardware, deep-learning-based object detection and image segmentation algorithms have broken through the bottlenecks of traditional algorithms and become the mainstream approaches in computer vision. Influenced by these advances, scene text detection has also made major breakthroughs in recent years. This survey has three objectives: to introduce the progress of scene text detection over the past five years, to compare and analyze the advantages and limitations of state-of-the-art algorithms, and to summarize the benchmark datasets and evaluation methods used in the field.
Keywords: text detection; deep learning; natural scene; object detection; image segmentation

Fig. 1 Architecture of R-CNN

Fig. 2 Architecture of the CTPN (Connectionist Text Proposal Network)

Fig. 3 Examples of scene text detection algorithms based on image segmentation

Fig. 4 Examples from the ICDAR 2015 Incidental scene text dataset

Tab. 1 Advantages and limitations of scene text detection algorithms in recent years

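Method	Algorithm	Advantages	Limitations
Object-detection-based methods	Tian et al. (CTPN)^[29]	Fast; good performance	Horizontal text only
	Zhong et al. (DeepText)^[30]	Fast; good performance; synthetic data; few training samples	Horizontal text only
	Zhang et al. (FEN)^[33]	Fast; good performance	Horizontal text only
	Jiang et al. (R2CNN)^[31]	Multi-oriented	Slow
	Ma et al. (RRPN)^[32]	Multi-oriented	Slow
	Shi et al. (SegLink)^[36]	Multi-oriented; long text; fast	Cannot detect text lines with large character spacing; cannot detect deformed or curved text
	Tian et al. (WeText)^[37]	Fast; augmented training data	Horizontal text only
Segmentation-based methods	Zhang et al. (Text-Block FCN)^[40]	Multi-oriented; multilingual; multiple fonts	Slow
	He et al. (CCTN)^[41]	Multi-oriented; multi-scale; robust	Slow
	Yao et al. (HED-based)^[42]	Multi-oriented; multilingual; curved text	Slow; unsuitable for severely blurred text or text under strong highlights
	Polzounov et al. (Wordfence)^[43]	Multi-oriented; multilingual; multi-scale	Slow
	Deng et al. (PixelLink)^[44]	Multi-oriented; good performance; robust	Poor on curved text
	Yang et al. (IncepText)^[45]	Multi-oriented; good performance; deployed in OCR products	Slow
Hybrid methods	Dai et al. (FTSN)^[46]	Multi-oriented; curved text; good performance	Slow
	He et al. (DDRN)^[47]	Multi-oriented; direct; efficient; single-step post-processing	Poor on text lines with very large spacing, single characters, or complex backgrounds
	Jiang et al. (CCP)^[48]	Multi-oriented; multilingual	Slow
	Zhou et al. (EAST)^[49]	Multi-oriented; fast; efficient	Poor on long text
	Qin and Manduchi (CSDN)^[38]	Multi-oriented; no post-processing required	Poor on curved text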

Tab. 2 Advantages and limitations of end-to-end text detection systems in recent years

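Algorithm	Year	Advantages	Limitations
Wang et al.^[6]	2012	First to apply deep learning in a scene text detection system; good performance; robust	Horizontal text only
Jaderberg et al.^[5]	2014	Robust; works on images of different resolutions	Horizontal text only; requires a large amount of training data; inefficient
Jaderberg et al.^[51]	2016	Synthesizes scene text images; high recall	Only for a given language; horizontal text only
Gupta et al. (FCRN)^[52]	2016	Synthesizes scene text images; fast	Horizontal text only
Liao et al. (TextBoxes)^[53]	2017	Fast; high accuracy; supports multi-scale input; single-step post-processing	Horizontal text only; poor on widely spaced characters and vertical text
Li et al.^[54]	2017	First model to attempt integrating scene text detection and recognition into a single network; handles multi-scale text	Horizontal text only; poor recognition of small text
Busta et al. (Deep TextSpotter)^[55]	2017	Fast; good performance; handles multi-oriented text	Poor on single characters or short fragments of digits and characters
Liao et al. (TextBoxes++)^[56]	2018	Robust; fast; handles multi-oriented text	Poor on widely spaced characters and vertical text
Bartz et al. (SEE)^[57]	2018	Simple network architecture; the network learns text detection automatically	Horizontal text only; hard to train
Liu et al. (FOTS)^[58]	2018	Robust; fast; handles multi-oriented and long text	Unsuitable when the text region has large internal variance or when text and background share similar patterns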

Tab. 3 Common datasets for scene text detection

Dataset (year)	Images (train, test)	Orientation (curved)	Language
ICDAR 2003^[63]/ICDAR 2005^[64]	509 (258, 251)	Horizontal	English
ICDAR 2011^[65]	484 (229, 255)	Horizontal	English
ICDAR 2013^[66]/ICDAR 2015-Focused^[67]	462 (229, 233)	Horizontal	English
ICDAR 2015-Incidental^[67]	1500 (1000, 500)	Multi-oriented	English
ICDAR 2017-MLT^[68]	9000 (7200, 1800)	Multi-oriented	Multilingual
KAIST 2010^[69]	3000	Horizontal	English, Korean
SVT 2010^[3]	350 (100, 250)	Horizontal	English
NEOCR 2011^[70]	659	Multi-oriented (curved)	Multilingual
OSTD 2011^[71]	89	Multi-oriented	English
MSRA-TD500 2012^[9]	500 (300, 200)	Multi-oriented	Chinese, English
CUTE80 2014^[72]	80	Curved	English
HUST-TR400 2014^[73]	400	Multi-oriented	English
USTB-SV1K 2015^[74]	1000 (500, 500)	Multi-oriented	English
SCUT-FORU-DB 2016^[75]	3931	Horizontal	Chinese, English
COCO-Text 2016^[76]	63686 (43686, 20000)	Multi-oriented	English
RCTW-17 2017^[77]	12263 (8034, 4229)	Multi-oriented	Chinese
Total-Text 2017^[78]	1555 (1255, 300)	Multi-oriented (curved)	English
CTW1500 2017^[79]	1500 (1000, 500)	Multi-oriented (curved)	Chinese, English
CTW 2018^[80]	32285	Multi-oriented (curved)	Chinese

Tab. 4 Performance comparison of scene text detection algorithms

Algorithm	ICDAR 2013			ICDAR 2015 Incidental			MSRA-TD500
	P/%	R/%	F/%	P/%	R/%	F/%	P/%	R/%	F/%
Tian et al. (CTPN)^[29]	93.00	83.00	87.70	74.22	51.56	60.85	/	/	/
Zhong et al. (DeepText)^[30]	87.17	82.79	84.93	/	/	/	/	/	/
Zhang et al. (FEN)^[33]	89.30	94.10	91.60	/	/	/	/	/	/
Jiang et al. (R2CNN)^[31]	93.55	82.59	87.73	85.62	79.63	82.54	/	/	/
Ma et al. (RRPN)^[32]	90.22	71.89	80.02	73.23	82.17	77.44	82.00	68.00	74.00
Shi et al. (SegLink)^[36]	87.70	83.00	85.30	73.10	76.80	75.00	86.00	70.00	77.00
Tian et al. (WeText)^[37]	84.20	80.70	82.30	/	/	/	/	/	/
Zhang et al. (Text-Block FCN)^[40]	88.14	74.00	80.00	70.81	43.09	53.58	83.00	67.00	74.00
He et al. (CCTN)^[41]	90.00	83.00	86.00	/	/	/	79.00	65.00	71.00
Yao et al. (HED-based)^[42]	89.00	80.00	84.00	72.00	59.00	65.00	63.00	62.00	60.00
Polzounov et al. (Wordfence)^[43]	65.00	92.00	76.00	/	/	/	/	/	/
Deng et al. (PixelLink)^[44]	87.50	88.60	88.10	85.50	82.00	83.70	83.00	73.20	77.80
Yang et al. (IncepText)^[45]	/	/	/	93.80	87.30	90.50	87.50	79.00	83.00
Dai et al. (FTSN)^[46]	/	/	/	88.60	80.00	84.10	87.60	77.10	82.00
He et al. (DDRN)^[47]	92.00	81.00	86.00	82.00	80.00	81.00	77.00	70.00	74.00
Jiang et al. (CCP)^[48]	92.20	91.50	91.90	/	/	/	/	/	/
Zhou et al. (EAST)^[49]	/	/	/	83.27	78.33	80.72	87.30	67.40	76.10
Qin et al. (CSDN)^[38]	90.00	83.00	86.00	79.00	65.00	71.00	/	/	/
Note: $P$, $R$, and $F$ denote precision, recall, and F-measure, respectively; "/" indicates that no result is reported.
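The note above defines $P$, $R$, and $F$ without showing how they relate. The Python sketch below illustrates the usual relationship, $F = 2PR/(P+R)$ (the harmonic mean of precision and recall), and, as an assumed simplification, derives $P$ and $R$ from a greedy one-to-one matching of axis-aligned detected boxes to ground-truth boxes with an IoU strictly greater than 0.5, in the spirit of PASCAL VOC-style criteria^[83]. The function names `f_measure`, `iou`, and `evaluate`, the axis-aligned box format, and the input-order greedy matching are all hypothetical choices for illustration. The official ICDAR evaluation protocols differ in detail (for instance, they handle rotated quadrilaterals and many-to-one matches), so this is not a reimplementation of any benchmark's scoring script.

```python
from typing import List, Tuple

# Hypothetical box format (assumption): axis-aligned (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r (both given in %)."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(dets: List[Box], gts: List[Box],
             thresh: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching (assumed simplification); returns (P, R, F) in %.

    Detections are processed in input order; each one is matched to the
    still-unmatched ground-truth box with the highest IoU above `thresh`.
    """
    matched_gt = set()
    tp = 0
    for d in dets:
        best_j, best_iou = -1, thresh
        for j, g in enumerate(gts):
            if j in matched_gt:
                continue
            v = iou(d, g)
            if v > best_iou:  # strictly greater than the threshold
                best_j, best_iou = j, v
        if best_j >= 0:
            matched_gt.add(best_j)
            tp += 1
    p = 100.0 * tp / len(dets) if dets else 0.0
    r = 100.0 * tp / len(gts) if gts else 0.0
    return p, r, f_measure(p, r)
```

For example, `f_measure(83.27, 78.33)` gives about 80.72, consistent with the EAST row for ICDAR 2015 Incidental in Tab. 4. A real evaluation would normally sort detections by confidence before matching and use the benchmark's own geometry and matching rules.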

References (83)

[1]	ZHU Y, YAO C, BAI X. Scene text detection and recognition:Recent advances and future trends[J]. Front Comput Sci, 2016, 10(1):19-36. http://d.old.wanfangdata.com.cn/Periodical/zggdxxxswz-jsjkx201601003
[2]	YE Q, DOERMANN D. Text detection and recognition in imagery:A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(7):1480-1500. doi: 10.1109/TPAMI.2014.2366765
[3]	WANG K, BELONGIE S. Word spotting in the wild[C]//Computer Vision-ECCV 2010. Berlin: Springer, 2010: 591-604.
[4]	NEUMANN L, MATAS J. Scene text localization and recognition with oriented stroke detection[C]//2013 IEEE International Conference on Computer Vision. IEEE, 2013: 97-104.
[5]	JADERBERG M, VEDALDI A, ZISSERMAN A. Deep features for text spotting[C]//Computer Vision-ECCV 2014. Cham: Springer, 2014: 512-528.
[6]	WANG T, WU D J, COATES A, et al. End-to-end text recognition with convolutional neural networks[C]//Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). 2012: 3304-3308.
[7]	EPSHTEIN B, OFEK E, WEXLER Y. Detecting text in natural scenes with stroke width transform[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2010: 2963-2970.
[8]	MATAS J, CHUM O, URBAN M, et al. Robust wide baseline stereo from maximally stable extremal regions[J]. Image and Vision Computing, 2004, 22:761-767. doi: 10.1016/j.imavis.2004.02.006
[9]	YAO C, BAI X, LIU W, et al. Detecting texts of arbitrary orientations in natural images[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. 2012: 1083-1090.
[10]	KANG L, LI Y, DOERMANN D. Orientation robust text line detection in natural images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 4034-4041.
[11]	YIN X C, YIN X, HUANG K, et al. Robust text detection in natural scene images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5):970-983. doi: 10.1109/TPAMI.2013.182
[12]	YIN X C, PEI W Y, ZHANG J, et al. Multi-orientation scene text detection with adaptive clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1930-1937. doi: 10.1109/TPAMI.2014.2388210
[13]	CHO H, SUNG M, JUN B. Canny text detector: Fast and robust scene text localization algorithm[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2016: 3566-3573.
[14]	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2014: 580-587.
[15]	GIRSHICK R. Fast R-CNN[C]//2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015: 1440-1448.
[16]	REN S, HE K, GIRSHICK R, et al. Faster R-CNN:Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6):1137-1149. http://www.ncbi.nlm.nih.gov/pubmed/27295650
[17]	DAI J, LI Y, HE K, et al. R-FCN: Object detection via region-based fully convolutional networks[C]//Advances in Neural Information Processing Systems 29. NIPS, 2016: 379-387.
[18]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 779-788.
[19]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot MultiBox detector[C]//European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
[20]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems 25. NIPS, 2012: 1097-1105.
[21]	UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2):154-171. doi: 10.1007/s11263-013-0620-5
[22]	HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]//Computer Vision-ECCV 2014. Cham: Springer, 2014: 346-361.
[23]	REDMON J, FARHADI A. YOLO9000: Better, faster, stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 6517-6525.
[24]	REDMON J, FARHADI A. YOLOv3: An incremental improvement[J]. arXiv preprint, arXiv:1804.02767v1 [cs.CV] 8 Apr 2018. http://cn.arxiv.org/abs/1804.02767
[25]	CIRESAN D, GIUSTI A, GAMBARDELLA L M, et al. Deep neural networks segment neuronal membranes in electron microscopy images[G]//Advances in Neural Information Processing Systems 25. Curran Associates, Inc, 2012: 2843-2851.
[26]	LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015: 3431-3440.
[27]	LI Y, QI H, DAI J, et al. Fully convolutional instance-aware semantic segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017: 4438-4446.
[28]	HE K, GKIOXARI G, DOLLAR P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 2980-2988.
[29]	TIAN Z, HUANG W, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]//European Conference on Computer Vision. Cham: Springer, 2016: 56-72.
[30]	ZHONG Z, JIN L, ZHANG S, et al. DeepText: A unified framework for text proposal generation and text detection in natural images[J]. arXiv preprint, arXiv:1605.07314v1 [cs.CV] 24 May 2016.
[31]	JIANG Y, ZHU X, WANG X, et al. R2CNN: Rotational region CNN for orientation robust scene text detection[J]. arXiv preprint, arXiv:1706.09579v2 [cs.CV] 30 Jun 2017. http://cn.arxiv.org/abs/1706.09579
[32]	MA J, SHAO W, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. arXiv preprint, arXiv:1703.01086v3 [cs.CV] 15 Mar 2018. http://cn.arxiv.org/abs/1703.01086
[33]	ZHANG S, LIU Y, JIN L, et al. Feature enhancement network: A refined scene text detector[J]. arXiv preprint, arXiv:1711.04249v1 [cs.CV] 12 Nov 2017. http://cn.arxiv.org/abs/1711.04249
[34]	GRAVES A, SCHMIDHUBER J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures[J]. Neural Networks, 2005, 18(5/6):602-610. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=JJ029013030
[35]	SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[J]. arXiv preprint, arXiv:1409.4842v1 [cs.CV] 17 Sep 2014. http://cn.arxiv.org/abs/1409.4842
[36]	SHI B, BAI X, BELONGIE S. Detecting oriented text in natural images by linking segments[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3482-3490.
[37]	TIAN S, LU S, LI C. WeText: Scene text detection under weak supervision[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 1501-1509.
[38]	QIN S, MANDUCHI R. Cascaded segmentation-detection networks for word-level text spotting[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017: 1275-1282.
[39]	HU H, ZHANG C, LUO Y, et al. WordSup: Exploiting word annotations for character based text detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 4950-4959.
[40]	ZHANG Z, ZHANG C, SHEN W, et al. Multi-oriented text detection with fully convolutional networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 4159-4167.
[41]	HE T, HUANG W, QIAO Y, et al. Accurate text localization in natural image with cascaded convolutional text network[J]. arXiv preprint, arXiv:1603.09423v1 [cs.CV] 31 Mar 2016. http://cn.arxiv.org/abs/1603.09423
[42]	YAO C, BAI X, SANG N, et al. Scene text detection via holistic, multi-channel prediction[J]. arXiv preprint, arXiv:1606.09002v2 [cs.CV] 5 Jul 2016. http://cn.arxiv.org/abs/1606.09002
[43]	POLZOUNOV A, ABLAVATSKI A, ESCALERA S, et al. Wordfence: Text detection in natural images with border awareness[C]//2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017: 1222-1226.
[44]	DENG D, LIU H, LI X, et al. PixelLink: Detecting scene text via instance segmentation[J]. arXiv preprint, arXiv:1801.01315v1 [cs.CV] 4 Jan 2018. http://cn.arxiv.org/abs/1801.01315
[45]	YANG Q, CHENG M, ZHOU W, et al. IncepText: A new inception-text module with deformable PSROI pooling for multi-oriented scene text detection[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI). 2018: 1071-1077.
[46]	DAI Y, HUANG Z, GAO Y, et al. Fused text segmentation networks for multi-oriented scene text detection[J]. arXiv preprint, arXiv:1709.03272v4 [cs.CV] 7 May 2018. http://cn.arxiv.org/abs/1709.03272
[47]	HE W, ZHANG X Y, YIN F, et al. Deep direct regression for multi-oriented scene text detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 745-753.
[48]	JIANG F, HAO Z, LIU X. Deep scene text detection with connected component proposals[J]. arXiv preprint, arXiv:1708.05133v1 [cs.CV] 17 Aug 2017. http://cn.arxiv.org/abs/1708.05133
[49]	ZHOU X, YAO C, WEN H, et al. EAST: An efficient and accurate scene text detector[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2642-2651.
[50]	KIM K H, HONG S, ROH B, et al. PVANET: Deep but lightweight neural networks for real-time object detection[J]. arXiv preprint, arXiv:1608.08021v3 [cs.CV] 30 Sep 2016. http://cn.arxiv.org/abs/1608.08021
[51]	JADERBERG M, SIMONYAN K, VEDALDI A, et al. Reading text in the wild with convolutional neural networks[J]. International Journal of Computer Vision, 2016, 116(1):1-20. doi: 10.1007/s11263-015-0823-z
[52]	GUPTA A, VEDALDI A, ZISSERMAN A. Synthetic data for text localisation in natural images[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 2315-2324.
[53]	LIAO M, SHI B, BAI X, et al. TextBoxes: A fast text detector with a single deep neural network[C]//31st AAAI Conference on Artificial Intelligence. 2017: 4161-4167.
[54]	LI H, WANG P, SHEN C. Towards end-to-end text spotting with convolutional recurrent neural networks[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017: 5248-5256.
[55]	BUSTA M, NEUMANN L, MATAS J. Deep textspotter: An end-to-end trainable scene text localization and recognition framework[C]//Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017: 2223-2231.
[56]	LIAO M, SHI B, BAI X. TextBoxes++:A single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8):3676-3690. doi: 10.1109/TIP.2018.2825107
[57]	BARTZ C, YANG H, MEINEL C. SEE: Towards semi-supervised end-to-end scene text recognition[J]. arXiv preprint, arXiv:1712.05404v1 [cs.CV] 14 Dec 2017. http://cn.arxiv.org/abs/1712.05404
[58]	LIU X, LIANG D, YAN S, et al. FOTS: Fast oriented text spotting with a unified network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2018: 5676-5685.
[59]	JADERBERG M, SIMONYAN K, VEDALDI A, et al. Synthetic data and artificial neural networks for natural scene text recognition[J]. arXiv preprint, arXiv:1406.2227v4 [cs.CV] 9 Dec 2014. http://cn.arxiv.org/abs/1406.2227
[60]	SHI B, BAI X, YAO C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11):2298-2304. doi: 10.1109/TPAMI.2016.2646371
[61]	GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks[C]//Proceedings of the 23rd International Conference on Machine Learning. New York: ACM, 2006: 369-376.
[62]	JADERBERG M, SIMONYAN K, ZISSERMAN A. Spatial transformer networks[C]//Advances in Neural Information Processing Systems 28. NIPS, 2015: 2017-2025.
[63]	LUCAS S M, PANARETOS A, SOSA L, et al. ICDAR 2003 robust reading competitions:Entries, results, and future directions[J]. International Journal of Document Analysis and Recognition (IJDAR), 2005, 7(2/3):105-122. http://d.old.wanfangdata.com.cn/NSTLQK/NSTL_QKJJ021047811/
[64]	LUCAS S M. ICDAR 2005 text locating competition results[C]//8th International Conference on Document Analysis and Recognition (ICDAR'05). 2005: 80-84.
[65]	SHAHAB A, SHAFAIT F, DENGEL A. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images[C]//Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011: 1491-1496.
[66]	KARATZAS D, SHAFAIT F, UCHIDA S, et al. ICDAR 2013 robust reading competition[C]//International Conference on Document Analysis and Recognition. IEEE Computer Society, 2013: 1484-1493.
[67]	KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]//International Conference on Document Analysis and Recognition. IEEE, 2015: 1156-1160.
[68]	NAYEF N, YIN F, BIZID I, et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017: 1454-1459.
[69]	LEE S, CHO M S, JUNG K, et al. Scene text extraction with edge constraint and text collinearity[C]//2010 20th International Conference on Pattern Recognition. 2010: 3983-3986.
[70]	NAGY R, DICKER A, MEYER-WEGENER K. NEOCR: A configurable dataset for natural image text recognition[C]//Camera-Based Document Analysis and Recognition. Berlin: Springer, 2011: 150-163.
[71]	YI C, TIAN Y. Text string detection from natural scenes by structure-based partition and grouping[J]. IEEE Transactions on Image Processing, 2011, 20(9):2594-2605. doi: 10.1109/TIP.2011.2126586
[72]	RISNUMAWAN A, SHIVAKUMARA P, CHAN C S, et al. A robust arbitrary text detection system for natural scene images[J]. Expert Systems with Applications, 2014, 41(18):8027-8048. doi: 10.1016/j.eswa.2014.07.008
[73]	YAO C, BAI X, LIU W. A unified framework for multioriented text detection and recognition[J]. IEEE Transactions on Image Processing, 2014, 23(11):4737-4749. doi: 10.1109/TIP.2014.2353813
[74]	YIN X C, PEI W Y, ZHANG J, et al. Multi-orientation scene text detection with adaptive clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1930-1937. doi: 10.1109/TPAMI.2014.2388210
[75]	ZHANG S Y. Deep models and their applications in visual text analysis[D]. Guangzhou: South China University of Technology, 2016. http://cdmd.cnki.com.cn/Article/CDMD-10561-1016770438.htm
[76]	VEIT A, MATERA T, NEUMANN L, et al. COCO-Text: Dataset and benchmark for text detection and recognition in natural images[J]. arXiv preprint, arXiv:1601.07140v2 [cs.CV] 19 Jun 2016.
[77]	SHI B, YAO C, LIAO M, et al. ICDAR2017 competition on reading Chinese text in the wild (RCTW-17)[C]//Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017: 1429-1434.
[78]	CHNG C K, CHAN C S. Total-text: A comprehensive dataset for scene text detection and recognition[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017: 935-942.
[79]	LIU Y L, JIN L W, ZHANG S T, et al. Detecting curve text in the wild: New dataset and new solution[J]. arXiv preprint, arXiv:1712.02170v1 [cs.CV] 6 Dec 2017. http://cn.arxiv.org/abs/1712.02170
[80]	YUAN T L, ZHU Z, XU K, et al. Chinese text in the wild[J]. arXiv preprint, arXiv:1803.00085v1 [cs.CV] 28 Feb 2018. http://cn.arxiv.org/abs/1803.00085
[81]	HUA X S, LIU W Y, ZHANG H J. An automatic performance evaluation protocol for video text detection algorithms[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2004, 14(4):498-507. doi: 10.1109/TCSVT.2004.825538
[82]	WOLF C, JOLION J M. Object count/area graphs for the evaluation of object detection and segmentation algorithms[J]. International Journal of Document Analysis and Recognition (IJDAR), 2006, 8(4):280-296. doi: 10.1007/s10032-006-0014-0
[83]	EVERINGHAM M, ESLAMI S M A, GOOL L V, et al. The pascal visual object classes challenge:A retrospective[J]. International Journal of Computer Vision, 2015, 111(1):98-136. doi: 10.1007/s11263-014-0733-5