MPI BABEL Action Recognition Challenge

Welcome to the BABEL Action Recognition Challenge

Data

The motion-capture (mocap) sequences come from the AMASS[1] dataset, and the action labels come from the BABEL dataset.
Download the test data from the BABEL project webpage.

Action Recognition Task

Given a motion-capture span of 5 seconds or less, predict the likelihood (score) of each action occurring in it. See the BABEL paper for details regarding the task. The action recognition task has four different "challenge settings", formed by combining two variables:

  1. Number of Classes
    • BABEL-60 - a subset containing spans of mocap sequences belonging to 60 action classes.
    • BABEL-120 - a subset containing spans of mocap sequences belonging to 120 action classes.
  2. Type of Labels
    • Dense labels only - the submission uses only two files, train.json and val.json, for training and validation.
    • Dense + Extra labels - the submission uses train.json and val.json, and in addition extra_train.json and extra_val.json, during training and validation.
For details regarding the Dense and Extra labels, please visit our Data page.
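The two label settings can be combined by merging the label files before training. The sketch below is illustrative only: the real schemas are documented on the Data page, and the field names and toy file contents here are assumptions, not the actual BABEL format.

```python
import json

# Hypothetical loader for the "Dense + Extra" setting. The actual JSON schema
# is documented on the BABEL Data page; the structure assumed here
# (sample id -> annotation dict) is an illustrative guess.
def load_labels(paths):
    merged = {}
    for p in paths:
        with open(p) as f:
            merged.update(json.load(f))  # later files add to / override earlier ones
    return merged

# Toy stand-ins for train.json / extra_train.json
json.dump({"seq_001": {"action": "walk"}}, open("train.json", "w"))
json.dump({"seq_002": {"action": "jump"}}, open("extra_train.json", "w"))

train = load_labels(["train.json", "extra_train.json"])
print(len(train))  # 2
```

For the "Dense labels only" setting, the same loader would simply be called with `["train.json"]`.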

Metrics

  1. Top 1 measures the accuracy of the highest-scoring prediction.
  2. Top 5 evaluates whether the ground-truth category is present among the top 5 highest-scoring predictions.
    • It accounts for labeling noise and inherent label ambiguity.
    • It also accounts for the possible association of multiple action categories with a single input movement sequence. For instance, a person walking in a circle is mapped to the two action categories walk and circular movement. Ideal models will predict high scores for all the categories relevant to the movement sample.
  3. Top 1 Normalized is the mean Top 1 accuracy across categories. The difference between Top 1 Normalized and Top 1 indicates the class-specific bias in model performance. In BABEL, it reflects the impact of class imbalance on learning.
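The three metrics above can be computed directly from a per-sample score matrix. The following is a minimal sketch (not the official evaluation code) assuming `scores` has shape (num_samples, num_classes) and `labels` holds the ground-truth class index per sample:

```python
import numpy as np

def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    topk = np.argsort(scores, axis=1)[:, -k:]        # indices of the k best scores per sample
    return (topk == labels[:, None]).any(axis=1).mean()

def top1_normalized(scores, labels):
    """Mean of per-class Top 1 accuracy; insensitive to class imbalance."""
    preds = scores.argmax(axis=1)
    per_class = [(preds[labels == c] == c).mean() for c in np.unique(labels)]
    return float(np.mean(per_class))

# Toy example: 4 samples, 3 classes
scores = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 0, 2])
```

Here `topk_accuracy(scores, labels, 1)` gives Top 1 and `topk_accuracy(scores, labels, 5)` gives Top 5 on the real 60- or 120-class problem.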

Predictions

Please refer to the action_recognition/challenge folder in the BABEL GitHub repository for details regarding how to generate a submission file on the test set.
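The exact submission format is defined in that folder; as a generic illustration only, a submission boils down to one score vector per test sample. The file name and layout below are assumptions, not the official format:

```python
import numpy as np

# Hypothetical submission writer. The authoritative format lives in the
# action_recognition/challenge folder of the BABEL repo; this sketch only
# shows the general shape: a (num_samples, num_classes) score matrix.
def write_submission(scores, path="preds.npy"):
    scores = np.asarray(scores, dtype=np.float32)
    assert scores.ndim == 2, "expected one score vector per test sample"
    np.save(path, scores)

# e.g. 5 test samples scored over the 60 BABEL-60 classes
write_submission(np.random.rand(5, 60), "preds.npy")
```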

You will need to log in to upload your submissions and get the evaluation results.

Leaderboard

BABEL-120

Algorithm                                     | DENSE LABELS                              | DENSE + EXTRA LABELS
                                              | Top 1 (%)  Top 5 (%)  Top 1 Norm. (%)     | Top 1 (%)  Top 5 (%)  Top 1 Norm. (%)
BABEL's baseline 2s-AGCN (CE loss) [babel-2]  | 38.41      70.49      17.56               | -          -          -

BABEL-60

Algorithm                                     | DENSE LABELS                              | DENSE + EXTRA LABELS
                                              | Top 1 (%)  Top 5 (%)  Top 1 Norm. (%)     | Top 1 (%)  Top 5 (%)  Top 1 Norm. (%)
BABEL's baseline 2s-AGCN (CE loss) [babel-2]  | 41.14      73.18      24.46               | -          -          -

References


Contact & Citation

If you use this work, please cite:

@inproceedings{BABEL:CVPR:2021,
    title = {{BABEL}: Bodies, Action and Behavior with English Labels},
    author = {Punnakkal, Abhinanda R. and Chandrasekaran, Arjun and Athanasiou, Nikos and Quiros-Ramirez, Alejandra and Black, Michael J.},
    booktitle = {Proceedings IEEE/CVF Conf.~on Computer Vision and Pattern Recognition ({CVPR})},
    pages = {722--731},
    month = jun,
    year = {2021}
}
    

If you have any questions or problems regarding this dataset, please do not hesitate to contact us.


Acknowledgement

This website was built by the Software Workshop at the Max Planck Institute for Intelligent Systems.