Interaction Understanding
This challenge is based on the Articulate3D dataset, which will be presented at ICCV 2025. For the challenge we provide access to the train and validation sets, along with a data loader to get you started. We will also publicly release the code of the USDNet baseline from the Articulate3D paper, which serves as a baseline for this challenge.
Task: Given a 3D indoor scene, the objective is to identify all movable parts and predict their interaction specifications. These include the part's motion characteristics—such as axis, origin, and motion type (rotation or translation)—as well as the specific graspable region that enables interaction (e.g., a door knob or window handle).
Input: A 3D point cloud of the scene.
Output:
1. Segmentation of all movable (articulated) parts.
2. For each movable part: its motion type (rotation or translation), the motion axis and origin, and the graspable region that enables interaction (e.g., a door knob or window handle).
The challenge is divided into two main phases:
Development Phase: Participants are encouraged to use the training and validation splits of the Articulate3D dataset for experimentation and method development. All annotations for these splits are publicly available.
Test Phase: Participants will submit their predictions on the test set to the evaluation server. Ground truth annotations for the test split will remain private. (Server link coming soon.)
Participants will be able to submit their predictions on the evaluation server (eval.ai) from August 1st to October 15th, 2025.
⚠️ Note: Articulate3D annotations are defined on ScanNet++ scenes. Articulate3D provides only the annotations from the paper; you must obtain the ScanNet++ scans separately.
Articulate3D Annotations: Articulate3D offers diverse per-scene annotations; the following are the ones relevant for this challenge.
Each scene contains two JSON files named using the following convention:
{scannetpp_scan_id}_parts.json
{scannetpp_scan_id}_artic.json
{scannetpp_scan_id}_parts.json contains the part segmentation. Each part stores the indices of its mesh triangles in the triIndices field, and part labels follow the convention {obj_id}.{parent_hierarchy}.{own_hierarchy}.{label}.
Example:
3.1.cabinet
3.1.2_1.door
3.1.2_2.door
3.1.2_1.3_1.handle
Explanation:
3.1.cabinet: A cabinet object.
3.1.2_1.door, 3.1.2_2.door: Two doors of the cabinet.
3.1.2_1.3_1.handle: Handle on the first door.
{scannetpp_scan_id}_artic.json contains the articulation annotations. Each articulated part is referenced by its pid, which corresponds to a partId in parts.json. The base field denotes the static reference part (e.g., the door frame).
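The sketch below shows one way to read these files with Python's json module and resolve each articulation back to its part. Only the fields named above (triIndices, partId, pid, base) come from this description; the top-level keys ("parts", "articulations"), the "label" field, and the scan id are assumptions and may need adjusting to the actual file layout.

import json

scan_id = "SCANNETPP_SCAN_ID"  # placeholder scan id

# Load the two per-scene annotation files.
with open(f"{scan_id}_parts.json") as f:
    parts_json = json.load(f)
with open(f"{scan_id}_artic.json") as f:
    artic_json = json.load(f)

# Index parts by id so each articulation's pid can be resolved to its part.
parts_by_id = {p["partId"]: p for p in parts_json["parts"]}  # "parts" key is an assumption

for art in artic_json["articulations"]:           # "articulations" key is an assumption
    movable = parts_by_id[art["pid"]]             # movable part referenced by pid
    base = parts_by_id.get(art.get("base"))       # static reference part, assuming base holds a part id
    label = movable["label"]                      # e.g. "3.1.2_1.door"; "label" field is an assumption
    obj_id = label.split(".")[0]                  # leading object id from the label convention
    print(obj_id, label,
          "triangles:", len(movable["triIndices"]),
          "base:", base["label"] if base else None)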
Data Loader: A Python-based scene iterator that returns each scene together with its part and articulation annotations.
Participants must submit a Pickle file (.pkl) containing predictions for each scan. Each prediction must include all detected movable and interactable instances in that scan.
An example file will be provided for download here by July 15th.
The file should be structured as a dictionary where:
{
"scene_id_1": {
"pred_masks": numpy.Array, // binary mask over mesh vertices (Num_vertices, num_pred_parts)
"pred_scores": numpy.Array, // confidence scores for each predicted part (num_pred_parts)
"pred_classes": numpy.Array, // class labels for each predicted part (num_pred_parts), 1: rotation, 2: translation
"pred_origins": numpy.Array, // axis origin for each predicted part (num_pred_parts, 3)
"pred_axes": numpy.Array, // axis direction for each predicted part (num_pred_parts, 3)
},
...
"scene_id_2": {
...
}
}
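As a reference, here is a minimal sketch for assembling and saving such a submission. The array shapes follow the structure above; the scene id, counts, and values are placeholders standing in for your model's predictions.

import pickle
import numpy as np

num_vertices = 100000   # number of mesh vertices in the scene (placeholder)
num_pred_parts = 5      # number of predicted movable parts (placeholder)

submission = {
    "scene_id_1": {
        # binary mask over mesh vertices, one column per predicted part
        "pred_masks": np.zeros((num_vertices, num_pred_parts), dtype=bool),
        # confidence score per predicted part
        "pred_scores": np.random.rand(num_pred_parts).astype(np.float32),
        # 1: rotation, 2: translation
        "pred_classes": np.random.randint(1, 3, size=num_pred_parts),
        # axis origin and direction per predicted part
        "pred_origins": np.zeros((num_pred_parts, 3), dtype=np.float32),
        "pred_axes": np.tile(np.array([0.0, 0.0, 1.0], dtype=np.float32), (num_pred_parts, 1)),
    },
    # ... one entry per scan in the test split
}

with open("submission.pkl", "wb") as f:
    pickle.dump(submission, f)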
The following evaluation metrics will be computed on your submission:
@article{halacheva2024articulate3d,
title={Holistic Understanding of 3D Scenes as Universal Scene Description},
author={Anna-Maria Halacheva and Yang Miao and Jan-Nico Zaech and Xi Wang and Luc Van Gool and Danda Pani Paudel},
year={2024},
journal={arXiv preprint arXiv:2412.01398},
}