B | Perception, Vision, and Natural Language Processing


B1 | Computer Vision

Computer Vision has entered a golden era in which algorithms are moving from prototypes to real-world applications. Much of this success is due to the rise of deep learning algorithms, which have successfully tackled new Computer Vision tasks ranging from object detection to semantic segmentation. Most models rely on the supervised learning paradigm, in which a convolutional neural network architecture is trained on very large datasets. While this approach has been successful, key challenges remain that MCML researchers address in this research area: going beyond convolutional neural networks by developing novel models that encode both low-level pixel relationships and high-level object interactions; going beyond supervised learning by proposing new techniques to learn from unlabeled data, focusing on other learning paradigms such as self-supervised, semi-supervised, or active learning; and going beyond 2D by moving from semantic analysis of images and videos to analyzing and reasoning about the shape, appearance, and motion of the 3D world perceived through the camera.

TU München

Computer Vision & Artificial Intelligence

TU München

Machine Learning of 3D Scene Geometry

TU München

Visual Computing

LMU München

Machine Vision & Learning

TU München

Physics-based Simulation

TU München

Computer Graphics & Visualization


B2 | Natural Language Processing

Natural Language Processing (NLP) is the subarea of computer science concerned with the understanding and generation of natural language text. The field has been revolutionized in the last 5+ years by the advent of deep learning. Despite this impressive progress, the gap between the current state of the technology and human-level performance is still very large. Our MCML researchers tackle three challenges. The first is that deep language understanding requires understanding the relationships between the words in a sentence. There are opportunities to address this problem by infusing deep learning models with structural biases, both new ones and those from the previous generation of NLP ML models. The second challenge is that current models do not possess common sense. There is an opportunity here to create experimental environments in which multimodal models can learn about the world by interacting with it. The third challenge is sample efficiency. NLP models are usually trained on large training sets, yet there is a vast discrepancy between what a truly intelligent being could learn from that much data and what our current models actually learn.

LMU München

Machine Translation and Multilingual NLP

LMU München

Artificial Intelligence and Computational Linguistics

LMU München

Statistical NLP and Deep Learning


B3 | Multimodal Perception

The ability of an intelligent, mobile actor to understand its ego-motion as well as its surroundings is a fundamental prerequisite for choosing which actions to take. However, vast challenges remain in achieving the necessary levels of safety, and these are deeply rooted in the research that MCML aims to carry out: multi-sensor ego-motion estimation and environment mapping, scene representations suitable for interaction in an open-ended environment, understanding and forecasting motion and events, and the role of uncertainty in ML blocks as modular elements.

TU München

Cyber Physical Systems

TU München

Machine Learning for Robotics