
Where2Explore: Few-shot Affordance Learning for Unseen Novel Categories of Articulated Objects
Chuanruo Ning,
Ruihai Wu,
Haoran Lu,
Kaichun Mo and
Hao Dong
NeurIPS 2023
We introduce an affordance learning framework that effectively explores novel categories with minimal interactions on a limited number of instances.
Our framework explicitly estimates geometric similarity across categories: it identifies local areas that differ from training-category shapes for efficient exploration, while concurrently transferring affordance knowledge to geometrically similar parts of the objects.
Extensive experiments in simulated and real-world environments demonstrate our framework's capacity for efficient few-shot exploration and generalization.
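To make the exploration rule concrete, here is a minimal sketch, assuming hypothetical per-point similarity and affordance scores rather than the paper's actual networks: low-similarity points are routed to interaction, while high-similarity points reuse transferred affordance.

```python
import torch

def route_points(similarity, affordance, k=10, tau=0.5):
    # Interact where estimated cross-category similarity is low (novel local
    # geometry); trust the transferred affordance elsewhere. Illustrative only.
    explore_idx = torch.topk(-similarity, k).indices  # k least-similar points
    transfer_mask = similarity > tau                  # confident-transfer region
    return explore_idx, affordance * transfer_mask

sim = torch.rand(2048)   # per-point similarity to training-category shapes
aff = torch.rand(2048)   # per-point affordance transferred from training
explore_idx, trusted_aff = route_points(sim, aff)
```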
[Paper]
[BibTex]

STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots
Yi Li,
Muru Zhang,
Markus Grotz,
Kaichun Mo and
Dieter Fox
CoRL 2023
Segmentation and tracking of unseen object instances in discrete frames pose a significant challenge in dynamic industrial robotic contexts, such as distribution warehouses.
Our task involves working with a discrete set of frames separated by indefinite periods, during which substantial changes to the scene may occur, such as object rearrangement, including shifting, removal, and partial occlusion by new items.
To address these demanding challenges, we introduce new synthetic and real-world datasets that replicate these industrial and household scenarios.
Furthermore, we propose a novel paradigm for joint segmentation and tracking in discrete frames, alongside a transformer module that facilitates efficient inter-frame communication.
Our approach significantly outperforms recent methods in our experiments.
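The inter-frame communication idea can be sketched as follows; this is a hypothetical module and a greedy matcher for illustration, not the released STOW architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterFrameLink(nn.Module):
    # Cross-frame attention over per-frame object queries: flatten the frame
    # axis so every query can exchange information with every frame.
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, queries):                       # (frames, n_queries, dim)
        f, q, d = queries.shape
        flat = queries.reshape(1, f * q, d)
        out, _ = self.attn(flat, flat, flat)
        return out.reshape(f, q, d)

def match_tracks(prev_q, cur_q):
    # Greedy cosine-similarity association of object queries across frames.
    sim = F.cosine_similarity(prev_q.unsqueeze(1), cur_q.unsqueeze(0), dim=-1)
    return sim.argmax(dim=1)                          # prev track i -> current id

queries = InterFrameLink()(torch.randn(3, 10, 256))
ids = match_tracks(queries[0], queries[1])
```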
[Paper]
[Project]
[BibTex]

COPILOT: Human Collision Prediction and Localization from Multi-view Egocentric Videos
Boxiao Pan,
Bokui Shen*,
Davis Rempe*,
Despoina Paschalidou,
Kaichun Mo,
Yanchao Yang and
Leonidas J. Guibas
ICCV 2023
We propose the problem of predicting human-scene collisions from multi-view egocentric RGB videos captured from an exoskeleton.
Specifically, the problem consists of predicting: (1) whether a collision will happen in the next H seconds; (2) which body joints might be involved in the collision; and (3) where in the scene the collision might occur, in the form of a spatial heatmap.
To solve this problem, we present COPILOT, a COllision PredIction and LOcalization Transformer that tackles all three sub-tasks in a multi-task setting, effectively leveraging multi-view video inputs through a proposed 4D attention operation across space, time, and viewpoint.
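One simple reading of the 4D attention is to flatten the view, time, and spatial-patch axes so every token attends to every other; the sketch below assumes this reading, with illustrative head sizes (e.g., 21 joints) and the heatmap head omitted for brevity. It is not the released model.

```python
import torch
import torch.nn as nn

class FourDAttentionSketch(nn.Module):
    # Fuse view/time/patch axes for joint attention, then pool the tokens
    # into a global descriptor for the multi-task prediction heads.
    def __init__(self, dim=256, heads=8, n_joints=21):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.collision_head = nn.Linear(dim, 1)       # collision in next H s?
        self.joint_head = nn.Linear(dim, n_joints)    # which joints involved?

    def forward(self, tokens):                        # (B, views, T, patches, dim)
        b, v, t, p, d = tokens.shape
        flat = tokens.reshape(b, v * t * p, d)        # fuse the three axes
        fused, _ = self.attn(flat, flat, flat)
        pooled = fused.mean(dim=1)                    # global video descriptor
        return (torch.sigmoid(self.collision_head(pooled)),
                torch.sigmoid(self.joint_head(pooled)))

model = FourDAttentionSketch()
collision_prob, joint_probs = model(torch.randn(1, 4, 8, 16, 256))
```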
[Paper]
[Project]
[BibTex]

Toward Learning Geometric Eigen-Lengths Crucial for Fitting Tasks
Yijia Weng,
Kaichun Mo,
Ruoxi Shi,
Yanchao Yang and
Leonidas J. Guibas
ICML 2023
A small set of extremely low-dimensional yet crucial geometric eigen-lengths often determines the success of geometric tasks.
For example, an object's height determines whether it can fit between the shelves of a cabinet, while a couch's width determines whether it can be moved through a doorway.
In this work, we propose a novel problem of discovering key geometric concepts (e.g., height, width, radius) of objects for robotic fitting tasks.
We explore potential solutions and demonstrate the feasibility of learning eigen-lengths from simply observing successful and failed fitting trials.
We also attempt geometric grounding for more accurate eigen-length measurement and study the reusability of the learned geometric eigen-lengths across multiple tasks.
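A minimal sketch of this learning setup, under the assumption that a point-cloud encoder distills an object into a few bottleneck scalars and a classifier predicts fit success from them (all names and sizes are illustrative):

```python
import torch
import torch.nn as nn

class EigenLengthSketch(nn.Module):
    # Distill a point cloud into a few scalar "eigen-lengths", then predict
    # fitting success from those scalars plus a task descriptor.
    def __init__(self, n_lengths=2, task_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                     nn.Linear(64, n_lengths))
        self.classifier = nn.Sequential(nn.Linear(n_lengths + task_dim, 64),
                                        nn.ReLU(), nn.Linear(64, 1))

    def forward(self, points, task):                       # points: (B, N, 3)
        lengths = self.encoder(points).max(dim=1).values   # pooled scalars
        return torch.sigmoid(self.classifier(torch.cat([lengths, task], -1)))

net = EigenLengthSketch()
p_fit = net(torch.randn(4, 1024, 3), torch.randn(4, 8))
```

Trained end-to-end with a binary success/failure loss, the scalar bottleneck is pushed to encode whichever lengths the fitting task actually depends on.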
[Project]
[BibTex]

JacobiNeRF: NeRF Shaping with Mutual Information Gradients
Xiaomeng Xu*,
Yanchao Yang*,
Kaichun Mo,
Boxiao Pan,
Li Yi and
Leonidas J. Guibas
CVPR 2023
We propose a method that trains a neural radiance field (NeRF) to encode not only the appearance of the scene but also mutual correlations between scene points, regions, or entities – aiming to capture their co-variation patterns.
In contrast to the traditional first-order photometric reconstruction objective, our method explicitly regularizes the learning dynamics to align the Jacobians of highly correlated entities, which provably maximizes the mutual information between them under random scene perturbations.
Experiments show that JacobiNeRF is more efficient in propagating annotations among 2D pixels and 3D points compared to NeRFs without mutual information shaping, especially in extremely sparse label regimes – thus reducing annotation burden.
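The shaping objective can be sketched as aligning the parameter Jacobians of two correlated points; the snippet below is a simplification (the paper uses a contrastive form over many point pairs and the rendered outputs), shown only to convey the core idea.

```python
import torch
import torch.nn as nn

def jacobian_alignment_loss(model, x_i, x_j):
    # Align the parameter Jacobians of two correlated scene points so that a
    # random weight perturbation co-varies their outputs.
    params = [p for p in model.parameters() if p.requires_grad]
    g_i = torch.autograd.grad(model(x_i).sum(), params, create_graph=True)
    g_j = torch.autograd.grad(model(x_j).sum(), params, create_graph=True)
    flat_i = torch.cat([g.reshape(-1) for g in g_i])
    flat_j = torch.cat([g.reshape(-1) for g in g_j])
    return 1.0 - torch.nn.functional.cosine_similarity(flat_i, flat_j, dim=0)

mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
loss = jacobian_alignment_loss(mlp, torch.randn(1, 3), torch.randn(1, 3))
```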
[Paper]
[Code]
[Poster]
[Slides]
[BibTex]

DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Object Manipulation
Yan Zhao*,
Ruihai Wu*,
Zhehuan Chen,
Yourong Zhang,
Qingnan Fan,
Kaichun Mo and
Hao Dong
ICLR 2023
We propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks.
The core design of the approach is to reduce the quadratic problem over the two grippers' combined actions to two disentangled yet interconnected subtasks for efficient learning.
Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation.
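The disentanglement can be sketched as two sequential conditioned modules; this is a toy illustration with hypothetical feature and action sizes, not the released DualAfford networks.

```python
import torch
import torch.nn as nn

class DualAffordSketch(nn.Module):
    # A first module proposes gripper 1's action; a second module proposes
    # gripper 2's action conditioned on it, replacing a joint (quadratic)
    # search over action pairs with two tractable single-gripper predictions.
    def __init__(self, feat_dim=128, act_dim=6):
        super().__init__()
        self.first = nn.Linear(feat_dim, act_dim)
        self.second = nn.Linear(feat_dim + act_dim, act_dim)

    def forward(self, scene_feat):                    # (B, feat_dim)
        a1 = self.first(scene_feat)
        a2 = self.second(torch.cat([scene_feat, a1], dim=-1))
        return a1, a2

a1, a2 = DualAffordSketch()(torch.randn(2, 128))
```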
[Paper]
[Project]
[BibTex]

SceneHGN: Hierarchical Graph Networks for 3D Indoor Scene Generation with Fine-Grained Geometry
Lin Gao,
Jia-Mu Sun,
Kaichun Mo,
Yu-Kun Lai,
Leonidas J. Guibas and
Jie Yang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2023
We propose a hierarchical graph network for 3D indoor scenes that takes into account the full hierarchy from the room level to the object level, then finally to the object part level.
As a result, our method is able, for the first time, to directly generate plausible 3D room content, including furniture objects with fine-grained geometry, and their layout.
Our generation network is a conditional recursive neural network (RvNN) based variational autoencoder (VAE) that learns to generate detailed content with fine-grained geometry for a room, given the room boundary as the condition.
Extensive experiments demonstrate that our method produces superior generation results.
We also demonstrate that our method is effective for various applications such as part-level room editing, room interpolation, and room generation by arbitrary room boundaries.
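A toy sketch of the recursive decoding follows, with fixed fan-out and depth for brevity; the actual RvNN handles variable hierarchies, relationships, and geometry decoding.

```python
import torch
import torch.nn as nn

class RecursiveDecoderSketch(nn.Module):
    # Toy RvNN-style decoder: one latent code is recursively split into
    # child codes (room -> objects -> parts). Fan-out/depth are illustrative.
    def __init__(self, dim=128, fan_out=4):
        super().__init__()
        self.split = nn.Linear(dim, dim * fan_out)
        self.fan_out = fan_out

    def decode(self, z, depth):          # z: (dim,) latent for one node
        if depth == 0:
            return [z]                   # leaf code, e.g., part geometry
        kids = self.split(z).chunk(self.fan_out)
        return [leaf for k in kids for leaf in self.decode(k, depth - 1)]

dec = RecursiveDecoderSketch()
leaves = dec.decode(torch.randn(128), depth=2)   # 16 part-level codes
```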
[Paper]
[Project]
[BibTex]

Seg&Struct: The Interplay between Part Segmentation and Structure Inference for 3D Shape Parsing
Jeonghyun Kim,
Kaichun Mo,
Minhyuk Sung* and
Woontack Woo*
WACV 2023
We propose Seg&Struct, a supervised learning framework that leverages the interplay between part segmentation and structure inference, demonstrating their synergy in an integrated framework.
Our framework first decomposes a raw input shape into part segments using an off-the-shelf algorithm, whose outputs are then mapped to nodes in a part hierarchy, establishing point-to-part associations.
It then predicts structural information, e.g., part bounding boxes and part relationships.
Lastly, the segmentation is rectified by examining the confusion of part boundaries using the structure-based part features.
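The four stages read naturally as a pipeline; in the sketch below every stage is a placeholder callable, with hypothetical names rather than the authors' released modules.

```python
def parse_shape(points, segmenter, hierarchy_mapper, structure_net, rectifier):
    segments = segmenter(points)                     # off-the-shelf part segments
    nodes = hierarchy_mapper(segments)               # point-to-part associations
    boxes, relations = structure_net(nodes)          # structure inference
    refined = rectifier(segments, boxes, relations)  # fix confused part boundaries
    return refined, boxes, relations
```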
[Paper]
[Project]
[BibTex]