Interaction Understanding
This challenge is based on the Articulate3D dataset, which will be presented at ICCV 2025. For the challenge, we provide access to the train and validation sets, as well as a data loader to get you started. We will also publicly release the code for USDNet, the baseline method from the Articulate3D paper, which serves as the baseline for this challenge.
Participants can submit their predictions to the evaluation server.
The challenge platform is based on the open-source EvalAI software and is hosted on our own server. Contact us here if you have any questions or problems regarding the baseline or the submission platform.
Task: Given a 3D indoor scene, the objective is to identify all movable parts and predict their interaction specifications. These include the part's motion characteristics—such as axis, origin, and motion type (rotation or translation)—as well as the specific graspable region that enables interaction (e.g., a door knob or window handle).
Input: A 3D point cloud of the scene.
Output:
1. Segmentation of all movable (articulated) parts.
2. For each movable part: its articulation parameters, i.e., the motion type (rotation or translation), axis direction, and axis origin, together with the segmentation of its interactable region (e.g., a handle or knob).
The challenge is divided into 4 main phases:
1. Movable Part Segmentation and Articulation Prediction - Development Phase: Participants are encouraged to use the training and validation splits of the Articulate3D dataset for experimentation and method development. All annotations for these splits are publicly available.
2. Movable Part Segmentation and Articulation Prediction - Test Phase: Participants will submit their predictions on the test set to the evaluation server. Ground truth annotations for the test split will remain private.
3. Interactable Part Segmentation - Development Phase: Participants are encouraged to use the training and validation splits of the Articulate3D dataset for experimentation and method development. All annotations for these splits are publicly available.
4. Interactable Part Segmentation - Test Phase: Participants will submit their predictions on the test set to the evaluation server. Ground truth annotations for the test split will remain private.
⚠️ Note: Articulate3D annotations are based on ScanNet++ scenes. You must obtain the ScanNet++ scans separately; Articulate3D provides only the annotations introduced in the Articulate3D paper.
Articulate3D Annotations: Articulate3D offers diverse per-scene annotations. The annotations relevant for this challenge are the following:
Each scene contains two JSON files named using the following convention:
{scannetpp_scan_id}_parts.json
{scannetpp_scan_id}_artic.json
The parts file stores the part segmentation: each part lists the mesh triangles it covers in its triIndices field, and its label follows the hierarchy {obj_id}.{parent_hierarchy}.{own_hierarchy}.{label}.
Example:
3.1.cabinet
3.1.2_1.door
3.1.2_2.door
3.1.2_1.3_1.handle
Explanation:
3.1.cabinet: A cabinet object.
3.1.2_1.door, 3.1.2_2.door: Two doors of the cabinet.
3.1.2_1.3_1.handle: Handle on the first door.
The articulation file references each movable part via pid, which corresponds to a partId in parts.json. Its base field denotes the static reference part (e.g., door frame).
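As an illustration of this naming convention only, the following small Python helper (hypothetical, not part of the official dataset tooling) splits a hierarchical part label into its components:

def split_part_label(label: str):
    """Split a label of the form {obj_id}.{parent_hierarchy}.{own_hierarchy}.{label},
    e.g. '3.1.2_1.3_1.handle', into its components."""
    tokens = label.split(".")
    obj_id = tokens[0]           # object instance id, e.g. '3'
    semantic_label = tokens[-1]  # semantic name, e.g. 'handle'
    hierarchy = tokens[1:-1]     # hierarchy nodes, e.g. ['1', '2_1', '3_1']
    return obj_id, hierarchy, semantic_label

# '3.1.2_1.3_1.handle' -> ('3', ['1', '2_1', '3_1'], 'handle')
print(split_part_label("3.1.2_1.3_1.handle"))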
Data Loader: A Python-based scene iterator that returns each scene together with its part and articulation annotations.
Example submission files are available here.
For the phase of movable part segmentation and articulation prediction, participants must submit zip files (.zip) containing predictions for each scan. Each prediction must include all detected movable and interactable instances in that scan. The baseline outputs predictions as a .pickle file in the following format:
{
    "scene_id_1": {
        "pred_masks": numpy.ndarray,   // binary mask over mesh vertices (num_vertices, num_pred_parts)
        "pred_scores": numpy.ndarray,  // confidence scores for each predicted part (num_pred_parts)
        "pred_classes": numpy.ndarray, // class labels for each predicted part (num_pred_parts), 1: rotation, 2: translation
        "pred_origins": numpy.ndarray, // axis origins for each predicted part (num_pred_parts, 3)
        "pred_axes": numpy.ndarray,    // axis directions for each predicted part (num_pred_parts, 3)
    },
    ...
    "scene_id_2": {
        ...
    }
}
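For concreteness, the snippet below is a minimal sketch of how such a dictionary could be assembled with random placeholder predictions and written to disk; the scene ID, array sizes, and file name are made up for illustration:

import pickle
import numpy as np

num_vertices = 100000   # number of mesh vertices in the scene (placeholder)
num_pred_parts = 5      # number of predicted movable parts (placeholder)

predictions = {
    "scene_id_1": {  # replace with the actual scan ID
        # binary masks over mesh vertices, one column per predicted part
        "pred_masks": (np.random.rand(num_vertices, num_pred_parts) > 0.5),
        # confidence score for each predicted part
        "pred_scores": np.random.rand(num_pred_parts),
        # articulation type per part: 1 = rotation, 2 = translation
        "pred_classes": np.random.randint(1, 3, size=num_pred_parts),
        # axis origin and axis direction per part
        "pred_origins": np.random.rand(num_pred_parts, 3),
        "pred_axes": np.random.rand(num_pred_parts, 3),
    },
}

with open("movable_part_predictions.pickle", "wb") as f:
    pickle.dump(predictions, f)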
For the phase of interactable part segmentation, participants must submit pickle files (.pkl) containing the interactable part segmentation for each scan.
The file should be structured as a dictionary as follows:
{
    "scene_id_1": {
        "pred_masks": numpy.ndarray,   // binary mask over mesh vertices (num_vertices, num_pred_parts)
        "pred_scores": numpy.ndarray,  // confidence scores for each predicted part (num_pred_parts)
        "pred_classes": numpy.ndarray, // class labels for each predicted part (num_pred_parts), 1: rotation, 2: translation; note that the class label refers to the articulation type of the parent movable part
    },
    ...
    "scene_id_2": {
        ...
    }
}
However, pickle files take a lot of storage space, and submitting them can overwhelm the server. Therefore, predictions are converted to .txt files and then zipped for submission. Check the code and the script for details.
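The exact .txt layout is defined by the provided conversion script, so the sketch below only illustrates the final bundling step: it collects per-scene .txt files (assumed to already exist in a placeholder directory) into a single .zip archive for upload. The directory and archive names are assumptions for illustration.

import zipfile
from pathlib import Path

pred_dir = Path("txt_predictions")    # directory with per-scene .txt files (placeholder)
archive_path = Path("submission.zip") # archive to upload (placeholder name)

with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for txt_file in sorted(pred_dir.rglob("*.txt")):
        # store paths relative to the prediction directory so the archive
        # root contains the per-scene files directly
        zf.write(txt_file, arcname=txt_file.relative_to(pred_dir))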
The following evaluation metrics will be computed on your submission:
@article{halacheva2024articulate3d,
title={Holistic Understanding of 3D Scenes as Universal Scene Description},
author={Anna-Maria Halacheva and Yang Miao and Jan-Nico Zaech and Xi Wang and Luc Van Gool and Danda Pani Paudel},
year={2024},
journal={arXiv preprint arXiv:2412.01398},
}