VGGT: Visual Geometry Grounded Transformer

The Visual Geometry Grounded Transformer (VGGT) is a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, depth maps, point maps, and 3D point tracks, from one or multiple views within seconds. After cloning the repository and installing its dependencies, users can run the model with just a few lines of code. Notably, VGGT also performs well on single-view reconstruction, even though it was never explicitly trained for that task, and its results there compare favorably with state-of-the-art monocular approaches. The repository reports benchmarks of the model's runtime and GPU memory usage for different numbers of input frames, illustrating its efficiency. It also provides interactive 3D visualization tools and supports tracking points across images. The work builds on earlier projects, and the authors acknowledge several repositories for their contributions.
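Among the attributes VGGT predicts, depth maps and camera parameters are geometrically linked to point maps: a per-pixel depth plus a camera's intrinsics and extrinsics determines a 3D point for every pixel. The NumPy sketch that follows illustrates that relationship for a generic pinhole camera. It is not code from the VGGT repository; the function name `unproject_depth`, the 3x4 camera-from-world extrinsic layout `[R | t]`, and the OpenCV-style axis convention (x right, y down, z forward) are all assumptions made for illustration.

```python
import numpy as np


def unproject_depth(depth: np.ndarray, K: np.ndarray, extrinsic: np.ndarray) -> np.ndarray:
    """Lift an (H, W) depth map to an (H, W, 3) world-space point map.

    Assumptions (illustrative, not VGGT's own helper):
      * K is a 3x3 pinhole intrinsic matrix.
      * extrinsic is a 3x4 camera-from-world matrix [R | t],
        i.e. X_cam = R @ X_world + t (OpenCV-style axes).
    """
    H, W = depth.shape
    # Pixel grid: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (H, W, 3)

    # Back-project homogeneous pixels to camera-space rays (z = 1),
    # then scale each ray by its depth to get camera-space points.
    rays = pix @ np.linalg.inv(K).T        # (H, W, 3)
    cam_pts = rays * depth[..., None]      # (H, W, 3)

    # Invert camera-from-world: X_world = R^T (X_cam - t).
    # For row vectors, R^T y corresponds to y @ R.
    R, t = extrinsic[:, :3], extrinsic[:, 3]
    return (cam_pts - t) @ R


if __name__ == "__main__":
    # Toy example (hypothetical values): a flat 3x5 depth map at 2 units
    # seen by a camera with an identity pose.
    K = np.array([[100.0, 0.0, 2.0],
                  [0.0, 100.0, 1.0],
                  [0.0, 0.0, 1.0]])
    identity_pose = np.hstack([np.eye(3), np.zeros((3, 1))])
    points = unproject_depth(np.full((3, 5), 2.0), K, identity_pose)
    print(points.shape)  # (3, 5, 3)
```

The computation is vectorized over the whole pixel grid, so it avoids per-pixel Python loops; for multi-view predictions the same math would simply be applied frame by frame with each frame's own camera.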

https://github.com/facebookresearch/vggt