12.08.2025


Tracking Actions in Space and Time: ICCV 2025 Challenge & Workshop

MCML Research Insight - With Tanveer Hannan, Mark Weber, and Thomas Seidl

Tracking actions, not just objects: The Spatiotemporal Action Grounding Challenge and Workshop at ICCV 2025 focuses on detecting and localizing actions in both space and time within complex, real-world videos.

Unlike standard action recognition, this task requires identifying when and where an action occurs, pushing models to handle long, diverse, and occlusion-heavy video content. Applications range from sports analytics to autonomous systems and large-scale video search.

The workshop is organized by MCML Junior Members Tanveer Hannan and Mark Weber, MCML Director Thomas Seidl, and collaborators.


«We believe this challenge will bring the community together to push the boundaries of spatiotemporal video understanding.»


Tanveer Hannan

MCML Junior Member, Co-Organizer

Challenge Overview

The challenge provides:

  • A large-scale dataset with dense spatiotemporal annotations.
  • Standardized evaluation for fair comparison.
  • Baseline models & open-source code for quick experimentation.
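The post does not say which metric the evaluation server uses. As an illustration of what a standardized spatiotemporal score can look like, here is a minimal sketch of video IoU (vIoU), a measure commonly used in spatio-temporal video grounding. The function names and the box format are assumptions made for this example and are not taken from the challenge's evaluation package.

```python
from typing import Dict, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def video_iou(pred: Dict[int, Box], gt: Dict[int, Box]) -> float:
    """Sum of per-frame IoU over frames present in both tubes,
    divided by the number of frames present in either tube."""
    union_frames = pred.keys() | gt.keys()
    if not union_frames:
        return 0.0
    shared_frames = pred.keys() & gt.keys()
    total = sum(box_iou(pred[t], gt[t]) for t in shared_frames)
    return total / len(union_frames)


# Hypothetical example: perfect boxes, but the predicted time span is shifted.
gt_tube = {t: (0.0, 0.0, 10.0, 10.0) for t in range(10, 14)}    # frames 10..13
pred_tube = {t: (0.0, 0.0, 10.0, 10.0) for t in range(12, 16)}  # frames 12..15
score = video_iou(pred_tube, gt_tube)
```

In this example the two tubes share 2 of the 6 frames they cover together, so the score is 2/6 even though every predicted box is spatially perfect. A score of this kind penalizes errors in when an action is localized as well as where.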

The Evaluation Server is open until September 19, 2025, allowing teams to submit results and receive immediate feedback.


Check out an example video from the dataset

It features dense spatiotemporal annotations, with user queries paired with corresponding object bounding boxes in every frame. At the top, you’ll see the current frame number and a visual summary of the number of queries (shown as boxes). At the bottom, the specific textual user query is displayed.
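To make the annotation layout described above concrete, the following is a minimal sketch of how one query and its per-frame boxes could be held in memory. The class name, field names, and box format are hypothetical and chosen for illustration. The actual dataset schema is documented on the workshop website and may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class QueryAnnotation:
    """One textual user query with a bounding box for each annotated frame."""
    query: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    def span(self) -> Tuple[int, int]:
        """First and last annotated frame (the temporal extent of the query)."""
        if not self.boxes:
            raise ValueError("annotation has no boxes")
        return min(self.boxes), max(self.boxes)


def active_queries(annotations: List[QueryAnnotation], frame: int) -> List[str]:
    """Queries that have a box in the given frame, in input order."""
    return [a.query for a in annotations if frame in a.boxes]


# Hypothetical video with two overlapping queries.
annotations = [
    QueryAnnotation("the person opening the door",
                    {3: (0.0, 0.0, 5.0, 5.0), 4: (1.0, 0.0, 6.0, 5.0)}),
    QueryAnnotation("the dog jumping", {4: (2.0, 2.0, 4.0, 4.0)}),
]
queries_at_frame_4 = active_queries(annotations, 4)
```

Counting the entries returned by `active_queries` for a frame corresponds to the per-frame summary of queries shown at the top of the example video.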


«This is an exciting opportunity to connect research and real-world applications.»


Rajat Koner

Amazon, MCML Alumni, Co-Organizer

Workshop Highlights

The challenge will conclude at our ICCV 2025 Workshop in Hawaii, featuring:

  • Invited talks from leading video understanding researchers.
  • Panel discussions on future directions.
  • Top team presentations detailing innovative solutions.

Full details, including the evaluation package, dataset description, and baseline code, are available on the Workshop Website.



«We want to see methods that not only achieve high accuracy but also work in real-world scenarios.»


Organizers

Why Now?

The explosion of online video makes robust spatiotemporal grounding essential. The complexity of actions, camera changes, and temporal reasoning demands methods that are accurate, generalizable, and efficient. Advances in Vision-Language Models (VLMs) and multimodal learning open new possibilities, and this challenge aims to accelerate progress.


Get Involved

We invite the computer vision community to join us.

Participate in the challenge

Evaluation Server

Explore resources

Workshop Website

Meet us at ICCV 2025

Conference Website

We look forward to your submissions and discussions at ICCV 2025 in Hawaii.


