NVIDIA-Ingest: Multi-modal data extraction

NVIDIA-Ingest is a microservice for extracting data and metadata from various document types. It supports PDFs, Word documents, PowerPoint files, and images, and lets users extract text, tables, charts, and images for downstream generative applications. It offers multiple extraction methods for each document type so that users can balance throughput against accuracy; for PDFs, the options include pdfium, Unstructured.io, and Adobe Content Extraction Services. The service is not a fixed pipeline: it can be adapted to different document parsing libraries. Running it requires specific hardware and software prerequisites, which are detailed in the linked repository. To use NVIDIA-Ingest, users follow the step-by-step process outlined there, which includes starting the containers and installing the Python dependencies.
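The idea of offering several extraction methods per document type can be illustrated with a minimal Python sketch. This is not the NVIDIA-Ingest API: the `EXTRACTION_METHODS` table, the `choose_method` function, and the `"default"` placeholder are hypothetical names invented for illustration, and the order of the PDF methods is arbitrary rather than a ranking of speed or accuracy. Only the three PDF method names come from the description above.

```python
from pathlib import Path

# Hypothetical lookup table (an assumption, not part of nv-ingest):
# file extension -> extraction methods a caller could choose from.
# Only the PDF methods are named in the summary; the other entries
# use a placeholder because their methods are not listed there.
EXTRACTION_METHODS = {
    ".pdf": ["pdfium", "unstructured_io", "adobe"],
    ".docx": ["default"],
    ".pptx": ["default"],
    ".png": ["default"],
}


def choose_method(filename, preferred=None):
    """Return the extraction method to use for a file.

    If `preferred` is given, it must be one of the methods registered
    for that document type; otherwise the first registered method is
    used as an arbitrary default.
    """
    suffix = Path(filename).suffix.lower()
    methods = EXTRACTION_METHODS.get(suffix)
    if methods is None:
        raise ValueError(f"unsupported document type: {suffix!r}")
    if preferred is None:
        return methods[0]
    if preferred not in methods:
        raise ValueError(
            f"method {preferred!r} is not available for {suffix!r} files"
        )
    return preferred
```

A caller would pick a method per document, for example `choose_method("report.pdf", preferred="adobe")`, and fall back to the default by omitting `preferred`. Keeping the choice in a table rather than in the extraction code itself mirrors the point that the service is adaptable rather than a fixed pipeline.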

https://github.com/NVIDIA/nv-ingest
