LoCUS: Learning Multiscale 3D-consistent Features from Posed Images

dc.contributor.author: Kloepfer, Dominik A.
dc.contributor.author: Campbell, Dylan
dc.contributor.author: Henriques, João F.
dc.date.accessioned: 2025-06-19T04:31:58Z
dc.date.available: 2025-06-19T04:31:58Z
dc.date.issued: 2023-01-15
dc.description.abstract: An important challenge for autonomous agents such as robots is to maintain a spatially and temporally consistent model of the world. It must be maintained through occlusions, previously-unseen views, and long time horizons (e.g., loop closure and re-identification). It is still an open question how to train such a versatile neural representation without supervision. We start from the idea that the training objective can be framed as a patch retrieval problem: given an image patch in one view of a scene, we would like to retrieve (with high precision and recall) all patches in other views that map to the same real-world location. One drawback is that this objective does not promote reusability of features: by being unique to a scene (achieving perfect precision/recall), a representation will not be useful in the context of other scenes. We find that it is possible to balance retrieval and reusability by constructing the retrieval set carefully, leaving out patches that map to far-away locations. Similarly, we can easily regulate the scale of the learned features (e.g., points, objects, or rooms) by adjusting the spatial tolerance for considering a retrieval to be positive. We optimize for (smooth) Average Precision (AP), in a single unified ranking-based objective. This objective also doubles as a criterion for choosing landmarks or keypoints, as patches with high AP. We show results creating sparse, multi-scale, semantic spatial maps composed of highly identifiable landmarks, with applications in landmark retrieval, localization, semantic segmentation and instance segmentation.
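To make the abstract's objective concrete, below is a minimal PyTorch sketch of a smooth-AP patch-retrieval loss in the spirit it describes: positives are patches within a spatial tolerance of the anchor's 3D location, patches beyond a larger cutoff are dropped from the retrieval set entirely (the reusability mechanism the abstract mentions), and a sigmoid relaxation makes the AP ranking differentiable. This is an assumption-laden illustration, not the authors' implementation: the names smooth_ap_patch_retrieval_loss, r_pos, r_ignore, and tau, and all default values, are hypothetical.

```python
import torch

def smooth_ap_patch_retrieval_loss(features, world_xyz,
                                    r_pos=0.5, r_ignore=5.0, tau=0.01):
    """Sketch of a smooth-AP loss over patch retrieval (illustrative only).

    features:  (N, D) patch embeddings.
    world_xyz: (N, 3) 3D world locations the patches map to.
    Positives: other patches within r_pos of the anchor's location; the
    tolerance controls the scale of the learned features. Patches farther
    than r_ignore are excluded from the retrieval set (neither positive nor
    negative), which the abstract argues preserves reusability across scenes.
    """
    feats = torch.nn.functional.normalize(features, dim=-1)
    sims = feats @ feats.t()                    # (N, N) cosine similarity scores
    dists = torch.cdist(world_xyz, world_xyz)   # (N, N) 3D distances
    n = feats.shape[0]
    eye = torch.eye(n, dtype=torch.bool, device=feats.device)

    pos = (dists <= r_pos) & ~eye               # positives within spatial tolerance
    valid = (dists <= r_ignore) & ~eye          # retrieval set; far patches left out

    # Smooth rank: a sigmoid replaces the step function when counting how many
    # retrieval-set items outscore each positive. diff[i, p, o] = s(i,o) - s(i,p).
    # O(N^3) memory, acceptable only for a small-batch sketch like this.
    diff = torch.sigmoid((sims.unsqueeze(1) - sims.unsqueeze(2)) / tau)
    diff = diff.masked_fill(eye.unsqueeze(0), 0.0)  # drop o == p terms

    rank_pos = 1 + (diff * pos.unsqueeze(1).float()).sum(-1)    # rank among positives
    rank_all = 1 + (diff * valid.unsqueeze(1).float()).sum(-1)  # rank in retrieval set

    # Per-anchor AP; since positives are a subset of the retrieval set, each
    # term is <= 1 and AP lies in [0, 1].
    ap = (pos.float() * rank_pos / rank_all).sum(-1) / pos.sum(-1).clamp(min=1)
    has_pos = pos.any(-1)
    loss = 1.0 - ap[has_pos].mean()
    return loss, ap

# Toy usage with random patches; per-patch AP also serves as the abstract's
# landmark-selection criterion (patches with high AP are good landmarks).
feats = torch.randn(64, 128, requires_grad=True)
xyz = torch.rand(64, 3) * 2.0
loss, ap = smooth_ap_patch_retrieval_loss(feats, xyz)
loss.backward()
```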
dc.description.sponsorship: Acknowledgements. We are grateful for funding from EPSRC AIMS CDT EP/S024050/1 (D.K.), Continental AG (D.C.), and the Royal Academy of Engineering (RF/201819/18/163, J.H.).
dc.description.status: Peer-reviewed
dc.format.extent: 11
dc.identifier.isbn: 9798350307184
dc.identifier.issn: 1550-5499
dc.identifier.other: Scopus:85185876179
dc.identifier.other: ARIES:a383154xPUB47146
dc.identifier.other: ORCID:/0000-0002-4717-6850/work/162523125
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85185876179&partnerID=8YFLogxK
dc.identifier.uri: https://hdl.handle.net/1885/733764478
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartofseries: 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
dc.relation.ispartofseries: Proceedings of the IEEE International Conference on Computer Vision
dc.rights: Publisher Copyright: © 2023 IEEE.
dc.title: LoCUS: Learning Multiscale 3D-consistent Features from Posed Images
dc.type: Conference paper
local.bibliographicCitation.startpage: 16588
local.bibliographicCitation.lastpage: 16598
local.contributor.affiliation: Kloepfer, Dominik A.; University of Oxford
local.contributor.affiliation: Campbell, Dylan; School of Computing, ANU College of Systems and Society, The Australian National University
local.contributor.affiliation: Henriques, João F.; University of Oxford
local.identifier.doi: 10.1109/ICCV51070.2023.01525
local.identifier.pure: 3c2bc6cd-317d-43c7-bb09-b039f8f48a19
local.type.status: Published
