Donut (Document understanding transformer) is a new document understanding method built on an OCR-free, end-to-end Transformer model. It achieves state-of-the-art performance on a range of visual document understanding tasks. Because it does not rely on off-the-shelf OCR engines or APIs, it adapts more flexibly to different languages and domains. The accompanying academic paper describes the method in detail and reports full experimental results and analyses. The project also introduces SynthDoG, a synthetic document generator that can be used to create synthetic datasets. Pre-trained models and web demos are available on Gradio and Google Colab, with links provided in the article. The code is open source under the MIT license.
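Being end-to-end means the decoder emits the extracted fields directly as a token sequence, which is then converted into JSON instead of being assembled from OCR output. The sketch below is a hypothetical, simplified converter, not the repository's own code. It assumes that fields are wrapped in tags such as <s_nm>...</s_nm>, that repeated groups or values are separated by <sep/>, and that the task prompt has already been stripped. Any other tag (for example an end-of-sequence </s>) is dropped. The function names `token2json`, `_tokenize` and `_parse` are illustrative.

```python
from __future__ import annotations

import re
from typing import Any

# Assumed token shapes: opening field tag, closing field tag,
# the <sep/> separator, any other tag (e.g. </s>), or plain text.
_TOKEN = re.compile(r"<s_([^>/]+)>|</s_([^>]+)>|<sep/>|<[^>]*>|[^<]+")


def _tokenize(seq: str) -> list[tuple[str, str]]:
    """Split a tag sequence into (kind, value) tokens."""
    tokens = []
    for m in _TOKEN.finditer(seq):
        if m.group(1) is not None:
            tokens.append(("open", m.group(1)))
        elif m.group(2) is not None:
            tokens.append(("close", m.group(2)))
        elif m.group(0) == "<sep/>":
            tokens.append(("sep", ""))
        elif m.group(0).startswith("<"):
            tokens.append(("other", m.group(0)))  # skipped by the parser
        else:
            tokens.append(("text", m.group(0)))
    return tokens


def _parse(tokens: list[tuple[str, str]], i: int, closing: str | None) -> tuple[Any, int]:
    """Parse until the closing tag named `closing`; return (value, next index)."""
    items: list[Any] = []
    fields: dict[str, Any] = {}
    text = ""
    while i < len(tokens):
        kind, val = tokens[i]
        i += 1
        if kind == "open":
            fields[val], i = _parse(tokens, i, val)  # recurse into a nested field
        elif kind == "close" and val == closing:
            break
        elif kind == "sep":
            # <sep/> ends one repeated group (or one leaf value)
            items.append(fields if fields else text.strip())
            fields, text = {}, ""
        elif kind == "text":
            text += val
        # stray closing tags and "other" tags are ignored
    items.append(fields if fields else text.strip())
    return (items[0] if len(items) == 1 else items), i


def token2json(seq: str) -> Any:
    """Convert a Donut-style tag sequence into nested dicts, lists and strings."""
    value, _ = _parse(_tokenize(seq), 0, None)
    return value


if __name__ == "__main__":
    out = ("<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>"
           "<s_nm>Tea</s_nm><s_price>3.00</s_price></s_menu>")
    print(token2json(out))
    # {'menu': [{'nm': 'Latte', 'price': '4.50'}, {'nm': 'Tea', 'price': '3.00'}]}
```

Every leaf value is kept as a string, so converting prices or dates to typed values is left to downstream code.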
https://github.com/clovaai/donut