LLM-aided OCR – Correcting Tesseract OCR errors with LLMs

The LLM-Aided OCR project improves the quality of Tesseract OCR output using large language models. It corrects OCR errors, reformats the text, removes duplicate content, and offers configurable options such as header suppression. It supports both local and API-based LLMs, asynchronous processing, token management, and quality assessment, with settings configured in a .env file. Installation requires Python 3.12, the Tesseract OCR engine, and the necessary Python libraries. The pipeline converts a PDF to images, runs OCR on each image, splits the text into chunks, corrects errors in each chunk with an LLM, and formats the result as Markdown. Performance optimizations include concurrent processing and adaptive token management.
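The chunk-then-correct step with concurrent processing can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `split_into_chunks`, `correct_document`, and `demo_corrector` are hypothetical names, the character-based chunk limit stands in for real token counting, and `demo_corrector` is a placeholder for a real LLM call.

```python
import asyncio
import re
from typing import Awaitable, Callable


def split_into_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


async def demo_corrector(chunk: str) -> str:
    """Hypothetical stand-in for an LLM call: fixes one known OCR confusion."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return chunk.replace("tbe", "the")


async def correct_document(
    text: str,
    corrector: Callable[[str], Awaitable[str]],
    max_chars: int = 500,
    max_concurrency: int = 4,
) -> str:
    """Correct all chunks concurrently, limiting the number of calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(chunk: str) -> str:
        async with semaphore:
            return await corrector(chunk)

    chunks = split_into_chunks(text, max_chars)
    # asyncio.gather returns results in the order the chunks were submitted.
    corrected = await asyncio.gather(*(guarded(c) for c in chunks))
    return "\n\n".join(corrected)


if __name__ == "__main__":
    raw = "This is tbe first sentence. This is tbe second one."
    print(asyncio.run(correct_document(raw, demo_corrector, max_chars=30)))
```

Splitting on sentence boundaries keeps each request within a model's context limit without cutting words in half, and the semaphore caps how many correction requests run at once, which matters when an API enforces rate limits.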

https://github.com/Dicklesworthstone/llm_aided_ocr
