P S, A., Melnik, A., & Nandi, G. C. (2024). SplatR: Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching (arXiv:2411.14322v1). arXiv. https://doi.org/10.48550/ARXIV.2411.14322
This paper presents SplatR, a framework that uses 3D Gaussian Splatting for experience goal visual rearrangement tasks in Embodied AI. The study aims to address the limitations of existing scene representation methods for this task and to improve the agent's ability to restore a shuffled scene to its original configuration.
SplatR employs a two-phase approach. In the Walkthrough phase, the agent explores the environment in its goal state and builds a 3D Gaussian Splat representation. In the Unshuffle phase, the agent navigates the same environment with shuffled objects, using the Splat to render consistent views of the goal configuration. The agent then compares these views with the current observations using patch-wise dense feature matching with DINOv2 to detect changes and identify misplaced objects. Finally, a category-agnostic matching method aligns objects in the shuffled scene with their goal configurations, enabling the agent to perform the rearrangement.
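The change-detection step described above can be illustrated with a minimal sketch. It assumes each view has already been encoded into a grid of per-patch feature vectors; random arrays stand in here for DINOv2 patch features. The function name `patch_change_mask` and the 0.5 cosine-similarity threshold are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def patch_change_mask(goal_feats, current_feats, threshold=0.5):
    """Flag patches whose features differ between two views.

    goal_feats, current_feats: arrays of shape (H, W, D), holding one
    D-dimensional feature vector per image patch.
    Returns a boolean (H, W) mask that is True where the cosine
    similarity between corresponding patches falls below `threshold`.
    """
    eps = 1e-8  # guards against division by zero for all-zero features
    g = goal_feats / (np.linalg.norm(goal_feats, axis=-1, keepdims=True) + eps)
    c = current_feats / (np.linalg.norm(current_feats, axis=-1, keepdims=True) + eps)
    similarity = np.sum(g * c, axis=-1)  # per-patch cosine similarity
    return similarity < threshold

# Stand-in features (assumption): a real pipeline would encode the view
# rendered from the Splat and the current camera observation instead.
rng = np.random.default_rng(0)
goal = rng.normal(size=(4, 4, 8))
current = goal.copy()
current[1, 2] = -goal[1, 2]  # simulate a change at one patch

mask = patch_change_mask(goal, current)
changed = np.argwhere(mask).tolist()
print(changed)  # [[1, 2]]
```

Patches flagged by the mask mark candidate regions where an object may have moved; grouping those regions into objects and matching them to their goal configurations is a separate step that this sketch does not cover.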
The paper reports that SplatR outperforms state-of-the-art methods on the AI2-THOR Rearrangement Challenge benchmark on several metrics, including % Fixed Strict, % Misplaced, and % Energy Remaining. This indicates that SplatR restores more objects to their goal configurations while causing minimal disruption to objects that were already correctly placed.
The study concludes that 3D Gaussian Splatting is a promising approach for scene representation in experience goal visual rearrangement tasks. The authors suggest that SplatR's ability to provide consistent views and robust change detection contributes to its superior performance.
This research contributes to Embodied AI by introducing a framework for visual rearrangement that uses a 3D Gaussian Splat as the agent's world model, a direction that may lead to more robust and efficient agents capable of interacting with complex environments.
The paper acknowledges limitations in handling small objects and the high memory cost of Gaussian Splatting. Future research could incorporate semantic features into the Splat and develop more efficient exploration strategies to address these limitations.