Single Reference View 6D Pose Estimation for Novel Objects

Estimating the 6D Pose of Novel Objects with Only a Single Reference View
Estimating the 6D pose of an object, that is, its position and orientation in three-dimensional space, is a central task in robotics, augmented reality, and many other fields. Traditional methods for 6D pose estimation of novel objects typically require CAD models or a large number of reference views. Both requirements are a practical hurdle: creating CAD models is time-consuming, and capturing many reference views under real-world conditions is often impractical.
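To make the term concrete: a 6D pose bundles three rotational and three translational degrees of freedom and is usually written as a rigid transform. The following minimal NumPy sketch, independent of SinRef-6D, shows how such a pose maps a point from object coordinates into camera coordinates.

```python
import numpy as np

# A 6D pose combines a 3D rotation (3 degrees of freedom) with a 3D translation
# (3 degrees of freedom), commonly written as a 4x4 rigid transform T that maps
# object coordinates to camera coordinates.
def make_pose(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Assemble a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Example: an object rotated 90 degrees about the camera's z-axis,
# placed half a metre in front of the camera.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.0, 0.0, 0.5])
T = make_pose(R, t)

# Transform an object-frame point into the camera frame (homogeneous coordinates).
p_obj = np.array([0.1, 0.0, 0.0, 1.0])
p_cam = T @ p_obj
print(p_cam[:3])  # approximately [0.0, 0.1, 0.5]
```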
A new approach, presented in the paper "Novel Object 6D Pose Estimation with a Single Reference View," promises a solution. The method, called SinRef-6D (Single-Reference-based novel object 6D), enables the estimation of the 6D pose using only a single reference view of the object. This significantly simplifies the process and expands the applicability of 6D pose estimation to scenarios where CAD models or multiple reference views are not available.
SinRef-6D is based on iterative point-to-point alignment in the camera coordinate system, driven by State Space Models (SSMs). Two dedicated SSMs, one operating on RGB data and one on point data, extract long-range dependencies and spatial information from the single reference view, covering both color and depth. Because the alignment is refined iteratively in the camera coordinate system, SinRef-6D can handle even large pose deviations effectively.
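The learned SSM components of SinRef-6D are not reproduced here, but the underlying idea of iterative point-to-point alignment in the camera coordinate system can be illustrated with a classical ICP-style loop: match points, solve for the best rigid transform, repeat. The NumPy sketch below (Kabsch solve, brute-force nearest neighbours, all names chosen for illustration) stands in for the alignment that SinRef-6D performs with learned features.

```python
import numpy as np

def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t aligning src onto dst (Kabsch/SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def iterative_point_alignment(ref_pts: np.ndarray, scene_pts: np.ndarray, n_iters: int = 20):
    """ICP-style loop: match closest points, re-solve the rigid transform, repeat.
    SinRef-6D replaces this hand-crafted matching with learned SSM features,
    but the principle of iteratively refining the pose is the same."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(n_iters):
        moved = ref_pts @ R.T + t
        dists = np.linalg.norm(moved[:, None, :] - scene_pts[None, :, :], axis=2)
        matches = scene_pts[dists.argmin(axis=1)]  # brute-force nearest neighbours
        R, t = best_rigid_transform(ref_pts, matches)
    return R, t

# Example: align a toy point cloud to a rotated and shifted copy of itself.
rng = np.random.default_rng(0)
ref = rng.normal(size=(200, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
scene = ref @ R_true.T + np.array([0.1, -0.2, 0.3])
R_est, t_est = iterative_point_alignment(ref, scene)
```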
A crucial advantage of SinRef-6D is that it can be trained on synthetic data. Once trained, the model estimates the 6D pose of novel objects from a single reference view, without retraining and without a CAD model. This generalization capability is particularly important in robotics, where robots frequently have to interact with unknown objects.
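As a rough picture of how such a model would be used in practice, the following sketch outlines a hypothetical inference interface; the class and method names (SinRef6D, estimate_pose) are illustrative placeholders and do not correspond to the published code.

```python
import numpy as np

class SinRef6D:
    """Hypothetical wrapper: trained once on synthetic data, reused for novel objects."""

    def __init__(self, weights_path: str):
        # Pretrained weights are loaded once; no per-object retraining is needed.
        self.weights_path = weights_path

    def estimate_pose(self, ref_rgbd, query_rgbd) -> np.ndarray:
        """Return a 4x4 object-to-camera pose for the query view,
        given a single RGB-D reference view of the previously unseen object."""
        raise NotImplementedError("placeholder for the learned SSM-based alignment")

# Intended workflow (illustrative only): one reference view, no CAD model.
# model = SinRef6D("sinref6d_synthetic.pth")
# T_cam_obj = model.estimate_pose(ref_rgbd, query_rgbd)
```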
To evaluate the performance of SinRef-6D, extensive experiments were conducted on six common datasets as well as in real-world robot scenarios. The results show that SinRef-6D achieves comparable performance to methods based on CAD models or multiple reference views, despite using only a single reference view. This underscores the potential of SinRef-6D for practical applications in various fields.
The development of SinRef-6D represents a significant advancement in the field of 6D pose estimation. By reducing the required information to a single reference view, the applicability of the technology is significantly expanded, paving the way for new applications in robotics, augmented reality, and other areas. The publication of the code allows the research community to build upon this work and optimize the method for specific use cases.
Bibliography:
Corsetti, L., et al. "Open-Vocabulary Object 6D Pose Estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Nguyen, N., et al. "NOPE: Novel Object Pose Estimation from a Single Image." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Liu, J., et al. "Novel Object 6D Pose Estimation with a Single Reference View." arXiv preprint arXiv:2503.05578, 2025.
Manhardt, F., et al. "GigaPose: A 10,000+ Object 6D Pose Dataset of Real and Synthetic Scenes." European Conference on Computer Vision (ECCV), 2022.
Labbé, M., et al. "POPE: 6-DoF Promptable Pose Estimation of Any Object in Any Scene with One Reference." arXiv preprint arXiv:2308.12399, 2023.