Spann3R, a 3D reconstruction system from University College London, uses a transformer-based architecture to build dense 3D models from images without prior scene or camera information. Unlike previous methods, Spann3R maintains an external spatial memory that tracks relevant 3D information and predicts each new frame's pointmap directly in a global coordinate system. Initialized with pre-trained DUSt3R weights and fine-tuned on a subset of datasets, Spann3R performs strongly across benchmarks and can reconstruct ordered image collections in real time. The release includes a video pipeline that maps images to pointmaps, plus attention-map visualizations of patch correlations.
https://hengyiwang.github.io/projects/spanner
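To make the memory mechanism concrete, here is a minimal numpy sketch of the idea: each incoming frame's features cross-attend to an external memory of past keys/values, the retrieved geometry is fused in, a pointmap in global coordinates is regressed, and the memory is updated for the next frame. All names, shapes, and projections here are illustrative stand-ins, not Spann3R's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, P = 16, 8                                   # toy feature dim, patches per frame
W_q = rng.normal(size=(D, D)) / np.sqrt(D)     # query projection (stand-in)
W_head = rng.normal(size=(D, 3)) / np.sqrt(D)  # pointmap head: feature -> XYZ

mem_keys, mem_vals = [], []                    # external spatial memory

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def process_frame(feat):
    """feat: (P, D) patch features. Returns a (P, 3) pointmap in global coords."""
    q = feat @ W_q
    if mem_keys:
        K = np.concatenate(mem_keys)           # all memory keys so far
        V = np.concatenate(mem_vals)
        attn = softmax(q @ K.T / np.sqrt(D))   # cross-attend to memory
        fused = feat + attn @ V                # fuse retrieved geometry
    else:
        fused = feat                           # first frame anchors the global frame
    mem_keys.append(feat)                      # write this frame into memory
    mem_vals.append(fused)
    return fused @ W_head

points = [process_frame(rng.normal(size=(P, D))) for _ in range(5)]
cloud = np.concatenate(points)                 # accumulated global point cloud
```

Because every frame queries the same growing memory, each pointmap lands in one shared coordinate system without per-pair alignment, which is what lets the real system process ordered streams incrementally.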