
Articulate3D Challenge

Interaction Understanding

Challenge contacts:

Anna-Maria Halacheva, Yang Miao,
INSAIT, Sofia University "St. Kliment Ohridski"

This challenge is based on the Articulate3D dataset, which will be presented at ICCV 2025. For the challenge, we provide access to the train and validation sets, along with a data loader to get you started. We will also publicly release the code behind the USDNet baseline from the Articulate3D paper, which serves as the baseline for this challenge.

Participants will be able to submit their predictions to the evaluation server (eval.ai) from August 1st to October 15th, 2025.

Task Description

Task: Given a 3D indoor scene, the objective is to identify all movable parts and predict their interaction specifications. These include the part's motion characteristics—such as axis, origin, and motion type (rotation or translation)—as well as the specific graspable region that enables interaction (e.g., a door knob or window handle).

Input: A 3D point cloud of the scene.

Output:
1. Segmentation of all movable (articulated) parts.
2. For each movable part:

  • (a) Predicted motion specification: axis, origin, and motion type.
  • (b) The mask of the associated interactable region (e.g., handle, knob, button, switch).
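
For concreteness, the sketch below shows one hypothetical way to represent a single predicted part with these outputs; the field names are illustrative only and are not the required submission format (see Submission Instructions below).

# Hypothetical per-part prediction record; field names are illustrative only,
# not the official submission format (see Submission Instructions).
from dataclasses import dataclass
import numpy as np

@dataclass
class MovablePartPrediction:
    part_mask: np.ndarray          # (num_vertices,) bool, segmentation of the movable part
    motion_type: str               # "rotation" or "translation"
    axis: np.ndarray               # (3,) unit vector, motion axis direction
    origin: np.ndarray             # (3,) point the axis passes through
    interactable_mask: np.ndarray  # (num_vertices,) bool, graspable region (handle, knob, ...)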

Challenge Phases

The challenge is divided into two main phases:

Development Phase: Participants are encouraged to use the training and validation splits of the Articulate3D dataset for experimentation and method development. All annotations for these splits are publicly available.

Test Phase: Participants will submit their predictions on the test set to the evaluation server. Ground truth annotations for the test split will remain private. (Server link coming soon.)


Working with Articulate3D

⚠️ Note: Articulate3D annotations are based on ScanNet++ scenes. You must obtain the ScanNet++ scans separately; Articulate3D provides only the annotations introduced in the Articulate3D paper.

Articulate3D Annotations: Articulate3D offers diverse per-scene annotations; the ones relevant to this challenge are:

  • (a) Segmentation masks for the parts of all articulated objects. The parts include fixed, movable and interactable (graspable) parts.
  • (b) Connectivity graphs of the parts (e.g., to which door a certain knob belongs, and to which cabinet the door belongs).
  • (c) Motion specifications for each movable part (origin, axis, motion range, and type).


Dataset Structure

Each scene contains two JSON files named using the following convention:

{scannetpp_scan_id}_parts.json  
{scannetpp_scan_id}_artic.json
  • {scannetpp_scan_id}: The ID of the ScanNet++ scene.
  • parts.json: Contains part segmentation annotations.
  • artic.json: Contains articulation (motion) annotations.
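
As a minimal sketch, the two annotation files of a scene could be loaded as follows; the directory layout is an assumption, and the provided data loader remains the authoritative reference.

# Minimal sketch for loading the two annotation files of one scene.
# The directory layout is an assumption; the field names inside the JSONs
# are documented below and in the provided data loader.
import json
from pathlib import Path

def load_scene_annotations(annotation_dir, scannetpp_scan_id):
    parts_path = Path(annotation_dir) / f"{scannetpp_scan_id}_parts.json"
    artic_path = Path(annotation_dir) / f"{scannetpp_scan_id}_artic.json"
    with open(parts_path) as f:
        parts = json.load(f)   # part segmentation annotations
    with open(artic_path) as f:
        artic = json.load(f)   # articulation (motion) annotations
    return parts, artic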

Part Segmentation Annotations

  • Face-based segmentation is provided via the triIndices field.
  • Vertex-based segmentation can be derived by majority voting over the labels of each vertex's incident faces (see the sketch after this list).
  • Hierarchy Representation: Encoded in the part label using a dot-separated string:
    {obj_id}.{parent_hierarchy}.{own_hierarchy}.{label}
    Example:
    3.1.cabinet  
    3.1.2_1.door  
    3.1.2_2.door  
    3.1.2_1.3_1.handle
    Explanation:
    • 3.1.cabinet: A cabinet object.
    • 3.1.2_1.door, 3.1.2_2.door: Two doors of the cabinet.
    • 3.1.2_1.3_1.handle: Handle on the first door.
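
The sketch below illustrates the two points above: parsing a dot-separated hierarchy label and deriving per-vertex labels by majority voting over incident faces. The (F, 3) face array and per-face label array are assumptions about how you unpack triIndices; adapt them to your mesh representation.

# Sketch: parse a hierarchy label and derive per-vertex labels from per-face labels.
# Assumes `faces` is an (F, 3) array of vertex indices and `face_labels` an (F,)
# array of integer part ids (e.g., built from each part's triIndices).
import numpy as np

def parse_part_label(label):
    """Split e.g. '3.1.2_1.door' into (object id, hierarchy tokens, semantic label)."""
    tokens = label.split(".")
    return tokens[0], tokens[1:-1], tokens[-1]

def vertex_labels_by_voting(faces, face_labels, num_vertices):
    """Assign each vertex the most frequent label among its incident faces."""
    votes = {}  # vertex index -> {part id: count}
    for face, label in zip(faces, face_labels):
        for v in face:
            counts = votes.setdefault(int(v), {})
            counts[int(label)] = counts.get(int(label), 0) + 1
    vertex_labels = np.full(num_vertices, -1, dtype=np.int64)  # -1 = unlabeled
    for v, counts in votes.items():
        vertex_labels[v] = max(counts, key=counts.get)
    return vertex_labels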

Articulation Annotations

  • Each movable part is indexed by its pid, which corresponds to a partId in parts.json.
  • The base field denotes the static reference part (e.g., door frame).
  • The base can also be inferred from the label hierarchy as the direct parent part.
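
A minimal sketch of linking articulation entries to their part annotations; only the pid, partId, and base field names come from the description above, and the surrounding JSON nesting is an assumption.

# Sketch: link articulation entries to part annotations via pid <-> partId.
# Only the pid/partId/base field names come from the documentation above;
# the surrounding JSON nesting is an assumption.
def index_parts_by_id(part_entries):
    """part_entries: iterable of part dicts, each carrying a 'partId' field."""
    return {p["partId"]: p for p in part_entries}

def resolve_articulations(artic_entries, parts_by_id):
    """Attach each movable part and its base (static reference) part to the articulation."""
    resolved = []
    for art in artic_entries:
        resolved.append({
            "articulation": art,
            "part": parts_by_id.get(art["pid"]),
            "base": parts_by_id.get(art.get("base")),  # may also be inferred from the label hierarchy
        })
    return resolved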


Data Loader: A Python-based scene iterator that returns:

  • (1) A dictionary of movable parts, each with its motion annotation and the list of its interactable parts.
  • (2) A face-level scene mask marking all movable and interactable segments.

📤 Submission Instructions

1. What to Submit

Participants must submit a Pickle file (.pkl) containing predictions for each scan. Each prediction must include all detected movable and interactable instances in that scan.

An example file will be provided for download here by July 15th.

The file should be structured as a dictionary as follows:

{
  "scene_id_1": {
    "pred_masks": numpy.ndarray,    # binary masks over mesh vertices, shape (num_vertices, num_pred_parts)
    "pred_scores": numpy.ndarray,   # confidence score per predicted part, shape (num_pred_parts,)
    "pred_classes": numpy.ndarray,  # class label per predicted part, shape (num_pred_parts,); 1: rotation, 2: translation
    "pred_origins": numpy.ndarray,  # axis origin per predicted part, shape (num_pred_parts, 3)
    "pred_axes": numpy.ndarray,     # axis direction per predicted part, shape (num_pred_parts, 3)
  },
  ...
  "scene_id_2": {
    ...
  }
}
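
As a sketch, a submission in this format could be assembled and saved as follows; the intermediate per-scene prediction structure and the output filename are assumptions.

# Sketch: assemble and save a submission file in the format above.
# The per-part input dictionaries and the output filename are assumptions.
import pickle
import numpy as np

def build_submission(per_scene_preds, out_path="submission.pkl"):
    """per_scene_preds: {scene_id: [per-part dicts with 'mask', 'score', 'cls', 'origin', 'axis']}."""
    submission = {}
    for scene_id, preds in per_scene_preds.items():
        submission[scene_id] = {
            "pred_masks": np.stack([p["mask"] for p in preds], axis=1).astype(bool),    # (num_vertices, num_pred_parts)
            "pred_scores": np.asarray([p["score"] for p in preds], dtype=np.float32),   # (num_pred_parts,)
            "pred_classes": np.asarray([p["cls"] for p in preds], dtype=np.int64),      # 1: rotation, 2: translation
            "pred_origins": np.stack([p["origin"] for p in preds]).astype(np.float32),  # (num_pred_parts, 3)
            "pred_axes": np.stack([p["axis"] for p in preds]).astype(np.float32),       # (num_pred_parts, 3)
        }
    with open(out_path, "wb") as f:
        pickle.dump(submission, f)
    return out_path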

2. Metrics Computed

The following evaluation metrics will be computed on your submission:

  • AP@50%: Average Precision at 50% IoU threshold (standard semantic instance segmentation)
  • Articulation-specific metrics:
    • MA: Match with correct Axis
    • MO: Match with correct Origin
    • MAO-ST: Match with both Axis and Origin
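
For orientation, the sketch below shows commonly used error measures behind such axis/origin matching: the angle between (undirected) axis directions and the distance from the predicted origin to the ground-truth axis line. The exact matching procedure and thresholds applied by the evaluation server are not specified here.

# Sketch of common axis/origin error measures for matched part instances.
# These formulas are a typical choice, not the official evaluation code;
# the server's matching rule and thresholds may differ.
import numpy as np

def axis_angle_error_deg(pred_axis, gt_axis):
    """Angle between axis directions in degrees, ignoring sign (axes are undirected)."""
    p = np.asarray(pred_axis) / np.linalg.norm(pred_axis)
    g = np.asarray(gt_axis) / np.linalg.norm(gt_axis)
    cos = np.clip(abs(np.dot(p, g)), 0.0, 1.0)
    return np.degrees(np.arccos(cos))

def origin_error(pred_origin, gt_origin, gt_axis):
    """Distance from the predicted origin to the ground-truth axis line
    (any point on the rotation axis is an equally valid origin)."""
    g = np.asarray(gt_axis) / np.linalg.norm(gt_axis)
    diff = np.asarray(pred_origin) - np.asarray(gt_origin)
    return float(np.linalg.norm(diff - np.dot(diff, g) * g))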

BibTeX


    @article{halacheva2024articulate3d,
      title={Holistic Understanding of 3D Scenes as Universal Scene Description},
      author={Anna-Maria Halacheva and Yang Miao and Jan-Nico Zaech and Xi Wang and Luc Van Gool and Danda Pani Paudel},
      year={2024},
      journal={arXiv preprint arXiv:2412.01398},
    }