# PaddleOCR **Repository Path**: xu_pengjun/PaddleOCR ## Basic Information - **Project Name**: PaddleOCR - **Description**: 基于飞桨的OCR和文档解析工具库，包含文字识别PP-OCR系列模型、文档解析PP-Structure系列方案和关键信息抽取PP-ChatOCR系列方案 - **Primary Language**: Python - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: https://paddlepaddle.github.io/PaddleOCR/latest/ - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 1056 - **Created**: 2025-07-03 - **Last Updated**: 2025-07-03 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README

[中文](./readme_c.md)| English [![stars](https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf)](https://github.com/PaddlePaddle/PaddleOCR) [![Downloads](https://img.shields.io/pypi/dm/paddleocr)](https://pypi.org/project/PaddleOCR/) ![python](https://img.shields.io/badge/python-3.8～3.12-aff.svg) ![os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg) ![hardware](https://img.shields.io/badge/hardware-cpu%2C%20gpu%2C%20xpu%2C%20npu-yellow.svg) [![Website](https://img.shields.io/badge/Website-PaddleOCR-blue?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAABmmRkdj0AAAAASUVORK5CYII=)](https://www.paddleocr.ai/) [![AI Studio](https://img.shields.io/badge/PP_OCRv5-AI_Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI) [![AI Studio](https://img.shields.io/badge/PP_StructureV3-AI_Studio-green)](https://aistudio.baidu.com/community/app/518494/webUI) [![AI Studio](https://img.shields.io/badge/PP_ChatOCRv4-AI_Studio-green)](https://aistudio.baidu.com/community/app/518493/webUI)

## 🚀 Introduction Since its initial release, PaddleOCR has gained widespread acclaim across academia, industry, and research communities, thanks to its cutting-edge algorithms and proven performance in real-world applications. It’s already powering popular open-source projects like Umi-OCR, OmniParser, MinerU, and RAGFlow, making it the go-to OCR toolkit for developers worldwide. On May 20, 2025, the PaddlePaddle team unveiled PaddleOCR 3.0, fully compatible with the official release of the **PaddlePaddle 3.0** framework. This update further **boosts text-recognition accuracy**, adds support for **multiple text-type recognition** and **handwriting recognition**, and meets the growing demand from large-model applications for **high-precision parsing of complex documents**. When combined with the **ERNIE 4.5T**, it significantly enhances key-information extraction accuracy. PaddleOCR 3.0 also introduces support for domestic hardware platforms such as **KUNLUNXIN** and **Ascend**. For the complete usage documentation, please refer to the [PaddleOCR 3.0 Documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html). Three Major New Features in PaddleOCR 3.0: - Universal-Scene Text Recognition Model [PP-OCRv5](./docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5.en.md): A single model that handles five different text types plus complex handwriting. Overall recognition accuracy has increased by 13 percentage points over the previous generation. [Online Demo](https://aistudio.baidu.com/community/app/91660/webUI) - General Document-Parsing Solution [PP-StructureV3](./docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.en.md): Delivers high-precision parsing of multi-layout, multi-scene PDFs, outperforming many open- and closed-source solutions on public benchmarks. [Online Demo](https://aistudio.baidu.com/community/app/518494/webUI) - Intelligent Document-Understanding Solution [PP-ChatOCRv4](./docs/version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.en.md): Natively powered by the WenXin large model 4.5T, achieving 15 percentage points higher accuracy than its predecessor. [Online Demo](https://aistudio.baidu.com/community/app/518493/webUI) In addition to providing an outstanding model library, PaddleOCR 3.0 also offers user-friendly tools covering model training, inference, and service deployment, so developers can rapidly bring AI applications to production.

PaddleOCR Architecture

## 📣 Recent updates #### 🔥🔥**2025.06.19: Release of PaddleOCR 3.0.2, includes:** - **New Features:** - The default download source has been changed from `BOS` to `HuggingFace`. Users can also change the environment variable `PADDLE_PDX_MODEL_SOURCE` to `BOS` to set the model download source back to Baidu Object Storage (BOS). - Added service invocation examples for six languages—C++, Java, Go, C#, Node.js, and PHP—for pipelines like PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4. - Improved the layout partition sorting algorithm in the PP-StructureV3 pipeline, enhancing the sorting logic for complex vertical layouts to deliver better results. - Enhanced model selection logic: when a language is specified but a model version is not, the system will automatically select the latest model version supporting that language. - Set a default upper limit for MKL-DNN cache size to prevent unlimited growth, while also allowing users to configure cache capacity. - Updated default configurations for high-performance inference to support Paddle MKL-DNN acceleration and optimized the logic for automatic configuration selection for smarter choices. - Adjusted the logic for obtaining the default device to consider the actual support for computing devices by the installed Paddle framework, making program behavior more intuitive. - Added Android example for PP-OCRv5. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/deployment/on_device_deployment.html). - **Bug Fixes:** - Fixed an issue with some CLI parameters in PP-StructureV3 not taking effect. - Resolved an issue where `export_paddlex_config_to_yaml` would not function correctly in certain cases. - Corrected the discrepancy between the actual behavior of `save_path` and its documentation description. - Fixed potential multithreading errors when using MKL-DNN in basic service deployment. - Corrected channel order errors in image preprocessing for the Latex-OCR model. - Fixed channel order errors in saving visualized images within the text recognition module. - Resolved channel order errors in visualized table results within PP-StructureV3 pipeline. - Fixed an overflow issue in the calculation of `overlap_ratio` under extremely special circumstances in the PP-StructureV3 pipeline. - **Documentation Improvements:** - Updated the description of the `enable_mkldnn` parameter in the documentation to accurately reflect the program's actual behavior. - Fixed errors in the documentation regarding the `lang` and `ocr_version` parameters. - Added instructions for exporting production line configuration files via CLI. - Fixed missing columns in the performance data table for PP-OCRv5. - Refined benchmark metrics for PP-StructureV3 across different configurations. - **Others:** - Relaxed version restrictions on dependencies like numpy and pandas, restoring support for Python 3.12.

History Log

#### **2025.06.05: Release of PaddleOCR 3.0.1, includes:** - **Optimisation of certain models and model configurations:** - Updated the default model configuration for PP-OCRv5, changing both detection and recognition from mobile to server models. To improve default performance in most scenarios, the parameter `limit_side_len` in the configuration has been changed from 736 to 64. - Added a new text line orientation classification model `PP-LCNet_x1_0_textline_ori` with an accuracy of 99.42%. The default text line orientation classifier for OCR, PP-StructureV3, and PP-ChatOCRv4 pipelines has been updated to this model. - Optimised the text line orientation classification model `PP-LCNet_x0_25_textline_ori`, improving accuracy by 3.3 percentage points to a current accuracy of 98.85%. - **Optimizations and fixes for some issues in version 3.0.0, [details](https://paddlepaddle.github.io/PaddleOCR/latest/en/update/update.html)** 🔥🔥2025.05.20: Official Release of **PaddleOCR v3.0**, including: - **PP-OCRv5**: High-Accuracy Text Recognition Model for All Scenarios - Instant Text from Images/PDFs. 1. 🌐 Single-model support for **five** text types - Seamlessly process **Simplified Chinese, Traditional Chinese, Simplified Chinese Pinyin, English** and **Japanese** within a single model. 2. ✍️ Improved **handwriting recognition**: Significantly better at complex cursive scripts and non-standard handwriting. 3. 🎯 **13-point accuracy gain** over PP-OCRv4, achieving state-of-the-art performance across a variety of real-world scenarios. - **PP-StructureV3**: General-Purpose Document Parsing – Unleash SOTA Images/PDFs Parsing for Real-World Scenarios! 1. 🧮 **High-Accuracy multi-scene PDF parsing**, leading both open- and closed-source solutions on the OmniDocBench benchmark. 2. 🧠 Specialized capabilities include **seal recognition**, **chart-to-table conversion**, **table recognition with nested formulas/images**, **vertical text document parsing**, and **complex table structure analysis**. - **PP-ChatOCRv4**: Intelligent Document Understanding – Extract Key Information, not just text from Images/PDFs. 1. 🔥 **15-point accuracy gain** in key-information extraction on PDF/PNG/JPG files over the previous generation. 2. 💻 Native support for **ERINE4.5 Turbo**, with compatibility for large-model deployments via PaddleNLP, Ollama, vLLM, and more. 3. 🤝 Integrated [PP-DocBee2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2), enabling extraction and understanding of printed text, handwriting, seals, tables, charts, and other common elements in complex documents. [History Log](https://paddlepaddle.github.io/PaddleOCR/latest/en/update/update.html)

## ⚡ Quick Start ### 1. Run online demo [![AI Studio](https://img.shields.io/badge/PP_OCRv5-AI_Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI) [![AI Studio](https://img.shields.io/badge/PP_StructureV3-AI_Studio-green)](https://aistudio.baidu.com/community/app/518494/webUI) [![AI Studio](https://img.shields.io/badge/PP_ChatOCRv4-AI_Studio-green)](https://aistudio.baidu.com/community/app/518493/webUI) ### 2. Installation Install PaddlePaddle refer to [Installation Guide](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html), after then, install the PaddleOCR toolkit. ```bash # Install paddleocr pip install paddleocr ``` ### 3. Run inference by CLI ```bash # Run PP-OCRv5 inference paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False # Run PP-StructureV3 inference paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png --use_doc_orientation_classify False --use_doc_unwarping False # Get the Qianfan API Key at first, and then run PP-ChatOCRv4 inference paddleocr pp_chatocrv4_doc -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key --use_doc_orientation_classify False --use_doc_unwarping False # Get more information about "paddleocr ocr" paddleocr ocr --help ``` ### 4. Run inference by API **4.1 PP-OCRv5 Example** ```python # Initialize PaddleOCR instance ocr = PaddleOCR( use_doc_orientation_classify=False, use_doc_unwarping=False, use_textline_orientation=False) # Run OCR inference on a sample image result = ocr.predict( input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png") # Visualize the results and save the JSON results for res in result: res.print() res.save_to_img("output") res.save_to_json("output") ```

4.2 PP-StructureV3 Example

```python from pathlib import Path from paddleocr import PPStructureV3 pipeline = PPStructureV3() # For Image output = pipeline.predict( input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png", use_doc_orientation_classify=False, use_doc_unwarping=False ) # Visualize the results and save the JSON results for res in output: res.print() res.save_to_json(save_path="output") res.save_to_markdown(save_path="output") ```

4.3 PP-ChatOCRv4 Example

```python from paddleocr import PPChatOCRv4Doc chat_bot_config = { "module_name": "chat_bot", "model_name": "ernie-3.5-8k", "base_url": "https://qianfan.baidubce.com/v2", "api_type": "openai", "api_key": "api_key", # your api_key } retriever_config = { "module_name": "retriever", "model_name": "embedding-v1", "base_url": "https://qianfan.baidubce.com/v2", "api_type": "qianfan", "api_key": "api_key", # your api_key } pipeline = PPChatOCRv4Doc( use_doc_orientation_classify=False, use_doc_unwarping=False ) visual_predict_res = pipeline.visual_predict( input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png", use_common_ocr=True, use_seal_recognition=True, use_table_recognition=True, ) mllm_predict_info = None use_mllm = False # If a multimodal large model is used, the local mllm service needs to be started. You can refer to the documentation: https://github.com/PaddlePaddle/PaddleX/blob/release/3.0/docs/pipeline_usage/tutorials/vlm_pipelines/doc_understanding.m d performs deployment and updates the mllm_chat_bot_config configuration. if use_mllm: mllm_chat_bot_config = { "module_name": "chat_bot", "model_name": "PP-DocBee", "base_url": "http://127.0.0.1:8080/", # your local mllm service url "api_type": "openai", "api_key": "api_key", # your api_key } mllm_predict_res = pipeline.mllm_pred( input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png", key_list=["驾驶室准乘人数"], mllm_chat_bot_config=mllm_chat_bot_config, ) mllm_predict_info = mllm_predict_res["mllm_res"] visual_info_list = [] for res in visual_predict_res: visual_info_list.append(res["visual_info"]) layout_parsing_result = res["layout_parsing_result"] vector_info = pipeline.build_vector( visual_info_list, flag_save_bytes_vector=True, retriever_config=retriever_config ) chat_result = pipeline.chat( key_list=["驾驶室准乘人数"], visual_info=visual_info_list, vector_info=vector_info, mllm_predict_info=mllm_predict_info, chat_bot_config=chat_bot_config, retriever_config=retriever_config, ) print(chat_result) ```

### 5. Domestic AI Accelerators - [Huawei Ascend](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/other_devices_support/paddlepaddle_install_NPU.html) - [KUNLUNXIN](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/other_devices_support/paddlepaddle_install_XPU.html) ## ⛰️ Advanced Tutorials - [PP-OCRv5 Tutorial](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/pipeline_usage/OCR.html) - [PP-StructureV3 Tutorial](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/pipeline_usage/PP-StructureV3.html) - [PP-ChatOCRv4 Tutorial](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/pipeline_usage/PP-ChatOCRv4.html) ## 🔄 Quick Overview of Execution Results

PP-OCRv5 Demo

PP-StructureV3 Demo

## 👩‍👩‍👧‍👦 Community | PaddlePaddle WeChat official account | Join the tech discussion group | | :---: | :---: | |

|

| ## 😃 Awesome Projects Leveraging PaddleOCR PaddleOCR wouldn’t be where it is today without its incredible community! 💗 A massive thank you to all our longtime partners, new collaborators, and everyone who’s poured their passion into PaddleOCR — whether we’ve named you or not. Your support fuels our fire! | Project Name | Description | | ------------ | ----------- | | [RAGFlow](https://github.com/infiniflow/ragflow)

|RAG engine based on deep document understanding.| | [MinerU](https://github.com/opendatalab/MinerU)

|Multi-type Document to Markdown Conversion Tool| | [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR)

|Free, Open-source, Batch Offline OCR Software.| | [OmniParser](https://github.com/microsoft/OmniParser)

|OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent.| | [QAnything](https://github.com/netease-youdao/QAnything)

|Question and Answer based on Anything.| | [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit)

|A powerful open-source toolkit designed to efficiently extract high-quality content from complex and diverse PDF documents.| | [Dango-Translator](https://github.com/PantsuDango/Dango-Translator)

|Recognize text on the screen, translate it and show the translation results in real time.| | [Learn more projects](./awesome_projects.md) | [More projects based on PaddleOCR](./awesome_projects.md)| ## 👩‍👩‍👧‍👦 Contributors

## 🌟 Star [![Star History Chart](https://api.star-history.com/svg?repos=PaddlePaddle/PaddleOCR&type=Date)](https://star-history.com/#PaddlePaddle/PaddleOCR&Date) ## 📄 License This project is released under the [Apache 2.0 license](LICENSE). ## 🎓 Citation ``` @misc{paddleocr2020, title={PaddleOCR, Awesome multilingual OCR toolkits based on PaddlePaddle.}, author={PaddlePaddle Authors}, howpublished = {\url{https://github.com/PaddlePaddle/PaddleOCR}}, year={2020} } ```