[This Week's Study] Optical Character Recognition (OCR)

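Optical character recognition originally referred to a technique for printed characters: paper documents are optically converted into black-and-white bitmap image files, and recognition software then converts the text in those images into a text format that word-processing software can edit further. It has since broadened to cover detecting the character content of an image with techniques such as deep learning and returning both the text and its position in the image, usually as the coordinates of four boundary points (this latter part is my own understanding).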
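To make the "text plus position" output concrete, here is a minimal sketch of how one detection could be represented. The class name TextDetection, its fields, and the sample values are all hypothetical and chosen for illustration; they are not part of any OCR library.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # assumed (x, y) order, in image pixel coordinates


@dataclass
class TextDetection:
    """Hypothetical container for one detected text block."""
    text: str             # recognized text content
    corners: List[Point]  # four boundary points of the detection box

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Return the axis-aligned box (x_min, y_min, x_max, y_max) enclosing the corners."""
        xs = [x for x, _ in self.corners]
        ys = [y for _, y in self.corners]
        return min(xs), min(ys), max(xs), max(ys)


# Made-up example values, for illustration only.
detection = TextDetection(
    text="hello",
    corners=[(10.0, 20.0), (110.0, 22.0), (109.0, 52.0), (9.0, 50.0)],
)
print(detection.text, detection.bounding_box())
```

Storing all four points rather than a single rectangle keeps slightly rotated or skewed text boxes intact; the axis-aligned box can always be derived from them when a simple crop is enough.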

Original image (left) and visualization of the recognition result (right)

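This post uses the PaddleOCR toolkit from Baidu PaddlePaddle, for the following reasons:

1. The project is developed by a Chinese company and provides extensive Chinese-language usage and learning documentation, which makes it easy to use and learn from; it is beginner-friendly.

2. It is highly extensible: the interfaces are already exposed and can be called directly, and it provides lightweight networks and development modules suited to a variety of deployment scenarios; it is developer-friendly.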

GitHub - PaddlePaddle/PaddleOCR: Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices) https://github.com/PaddlePaddle/PaddleOCR

It can be used from the terminal: cd into the project root directory and enter the command.

paddleocr.py is the main module; --image_dir is the path of the image to recognize; --type, --table and --layout together select the recognition mode.

Screenshot of the interface

I wrote a simple recognition module, predict.py, by calling the API.

Input

Input path: Input/emotion/ocr13.jpg

Raw output

The key line is result = table_engine(img): it takes the image path img as input and returns the result in result.
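The call above could be wrapped roughly as follows. This is only a sketch, not the actual predict.py, which is not reproduced in this post. It assumes a paddleocr 2.x installation that exposes the PPStructure class; the function name recognize and its engine parameter are hypothetical, added so the wrapper can be exercised without loading any model.

```python
from typing import Any, Callable, List, Optional


def recognize(
    image_path: str,
    engine: Optional[Callable[[Any], List[dict]]] = None,
) -> List[dict]:
    """Run a structure/OCR engine on one image and return its raw result."""
    if engine is None:
        # Assumption: paddleocr 2.x is installed and provides PPStructure.
        # Imported lazily so this module loads even without paddleocr.
        from paddleocr import PPStructure

        engine = PPStructure(show_log=False)
    # As in the post, the image path is handed straight to the engine.
    return engine(image_path)


# Intended usage (requires paddleocr and its model files):
#     result = recognize("Input/emotion/ocr13.jpg")
#     print(result)
```

If the installed version expects a pixel array rather than a path, the image can be loaded first (for example with cv2.imread) and passed in the same way.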

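1. result is a list of length 1

2. result[0] is a dict of length 4

3. result[0]['res'] is a list of length 2 (the number of text blocks recognized)

4. result[0]['res'][0] is a dict of length 3 that holds all the information about the first recognized text block

4.1 result[0]['res'][0]['text']: the recognized text of the first text block

4.2 result[0]['res'][0]['confidence']: the recognition confidence of the first text block

4.3 result[0]['res'][0]['text_region']: the coordinates of the four boundary points of the first text block's rotated-rectangle detection box

4.3.1 The coordinates of a boundary point can be accessed via result[0]['res'][0]['text_region'][0][0] and result[0]['res'][0]['text_region'][0][1]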
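The nesting described above can be walked with two loops. The sketch below uses a made-up sample_result that only mimics the described shape (the other keys of result[0] are omitted, and the texts, confidences and coordinates are invented); the helper name iter_text_blocks is hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple

TextBlock = Tuple[str, float, List[List[float]]]


def iter_text_blocks(result: List[Dict[str, Any]]) -> Iterator[TextBlock]:
    """Yield (text, confidence, text_region) for every text block in the result."""
    for region in result:            # one dict per entry of the outer list
        for block in region["res"]:  # one dict per recognized text block
            yield block["text"], block["confidence"], block["text_region"]


# Invented values that only reproduce the structure; not real output.
sample_result = [
    {
        "res": [
            {"text": "first block", "confidence": 0.98,
             "text_region": [[12.0, 8.0], [96.0, 8.0], [96.0, 30.0], [12.0, 30.0]]},
            {"text": "second block", "confidence": 0.91,
             "text_region": [[12.0, 40.0], [120.0, 40.0], [120.0, 62.0], [12.0, 62.0]]},
        ],
    }
]

for text, confidence, text_region in iter_text_blocks(sample_result):
    x, y = text_region[0]  # first boundary point, assumed to be in (x, y) order
    print(f"{text!r} (confidence {confidence:.2f}) first point at ({x}, {y})")
```

Iterating instead of indexing with fixed positions such as [0] means the same code keeps working when an image yields more or fewer than two text blocks.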