
From Video Understanding to Embodied Agents

Cordelia Schmid, Inria / Google Research

   25.01.2024

   5:00 pm - 7:00 pm

   TUM Campus Munich, Room 0790, Arcisstraße 21, 80333 München

On behalf of our partners at the Bavarian AI network baiosphere, the MCML cordially invites you to the Munich AI Lectures.

Cordelia Schmid is a pioneer in AI research. She developed methods in image recognition that enable computers to semantically interpret image and video content. Her computer vision algorithms are key to the development of robotic assistants that can, in the future, recognize their surroundings and respond to spoken commands. Her work has been honored with major awards, most recently the Körber Prize, endowed with one million euros.

In this talk, she presents Vid2Seq, a model for dense video captioning that predicts temporal boundaries and textual descriptions from video and speech, and a retrieval-augmented visual language model that achieves state-of-the-art results in video question answering and image captioning. She also introduces the History Aware Multimodal Transformer (HAMT) for vision-guided navigation and robot manipulation, demonstrating its superior performance on benchmarks and in real-world applications with the Tiago robot and UR5 arm.

Organized by:

baiosphere

Bavarian Academy of Sciences and Humanities

Helmholtz Munich

LMU Munich

TUM

AI-HUB LMU

ELLIS Munich Unit

Konrad Zuse School of Excellence in Reliable AI

MCML

Munich Data Science Institute TUM

Munich Institute of Robotics and Machine Intelligence TUM


Related


Munich AI Lectures  •  22.07.2024  •  TUM Garching Campus, FMI Building, Hörsaal 2 (00.04.011), Boltzmannstr. 3, 85748 Garching bei München or online via Livestream

Multi-Modal and Multi-Robot Coordination in Challenging Environments

Munich AI Lecture with Sebastian Scherer from CMU outlining some of their approaches, progress, and results on multi-modal sensing.



Munich AI Lectures  •  17.07.2024  •  Bayerische Akademie der Wissenschaften, Plenarsaal, 1. Stock, Alfons-Goppel-Straße 11, 80539 München

We are (still?) not giving data enough credit

Munich AI Highlight Lecture with Alexei A. Efros from UC Berkeley, emphasizing the critical role of data in Computer Vision with historical examples and recent work.



Munich AI Lectures  •  15.07.2024  •  TU Munich, Institute for Advanced Study, Auditorium (Ground floor), Lichtenbergstraße 2a, 85748 Garching

Physical AI: Promises and Challenges

Munich AI Lecture with Daniela Rus from MIT discussing recent developments in ML and robotics.



Munich AI Lectures  •  25.06.2024  •  TUM Campus Munich, Room 0790, Arcisstraße 21, 80333 München

From Video Understanding to Embodied Agents

The MCML invites you to the Munich AI Lectures with Ivan Laptev discussing AI models that make reliable predictions from explanatory videos.



Munich AI Lectures  •  18.06.2024  •  TUM, Arcisstr. 21, 80333 Munich, Room 0790 (ground floor)

Learning complex robotic behaviors with optimal control

Munich AI Lecture with Ludovic Righetti from NYU presenting their recent work with a particular eye towards unifying learning and numerical optimal control.