Areas of Competence
Spatial and spatio-temporal data, i.e. data with a dynamic spatial or time component such as sensor data or camera recordings, play an important role in many applications today.
In this competence group, machine learning methods will be developed that address the challenges inherent to spatio-temporal data, such as integrating data from different sources and at different resolutions or processing streaming data in real time. In particular, we will develop new deep learning methods for time series analysis, Bayesian methods for image analysis, and reinforcement learning techniques for finding optimal strategies in data-driven environments.
This competence field encompasses the following areas of focus:
In many data-intensive applications, from social media and genome research to mobility, attributed graphs and networks have proven to be a powerful and highly informative data source. In real applications, however, the stored networks are often subject to errors, contain outliers and are highly noisy.
Handling such data requires machine learning techniques on graphs that cope with impure and inaccurate data and are robust to errors. In particular, if the edges of a network are stochastic rather than deterministic, statistical models that were originally developed in the social sciences to model social networks can be applied.
In general, statistical models and machine learning methods for the analysis of relational data have experienced a significant upswing in the past 5 to 10 years. However, scaling these models and methods to today’s high-dimensional networks is still in its infancy.
Combining new data sources, e.g. text analyses, results in high-dimensional and complex relations (networks) that connect texts (patents, websites, research papers) with actors (inventors, authors, …).
Knowledge graphs are a particularly important instance: they have emerged from the tradition of knowledge modelling and the semantic web and represent a significant breakthrough.
A particular methodological challenge is the exchange of information and the interaction with unstructured data such as signals, texts, images and video. Knowledge graphs are increasingly applied in medicine and in industry, where they support the communication of agents in order to gain a holistic view of the information sources in a company, as well as in supply chain management and the Internet of Things (IoT). In our work we combine knowledge graphs with machine learning.
The research field of representation learning deals with the automated generation of meaningful features from high-dimensional data sets, in order to open up fields of application in which too little expert knowledge is available to create such features manually. The performance of machine learning methods is primarily determined by the availability of meaningful features. Generating such features from high-dimensional observations originally relied on domain expertise. With the progress made in deep learning and the exponential growth of available data in recent years, the automated identification of meaningful features in different domains has become possible.
The focus of this research group is to develop such methods, to further improve their performance and to open up new fields of application for machine learning. The challenges include generating features for heterogeneous data sets, making the methods robust to non-representative observations, and reducing the amount of data required while maintaining the performance and interpretability of the computed features. The success of the learned representations can be demonstrated by the new scientific insights they yield in the fields of application. The development of novel methods will therefore be linked to specific applications, which facilitates the transfer of theoretical developments into practical use.
This competence field encompasses the following areas of focus:
Valid benchmarking of machine learning methods is essential to gain robust guarantees for the practical use of models. Successful machine learning involves much more than efficient optimization of a risk function within a given model: preprocessing, hyperparameter tuning, model selection, and feature generation and selection are central aspects of the modeling process and are often critical to the success of a project. For the development of fully automated systems, statistically valid benchmarking is especially important.
After a model has been selected and validated, the interpretation of the model is of crucial importance.
After optimal model selection, models are often complex, and ways have to be found to make them understandable. Ideally, this should be done model-agnostically, so that a model diagnosis can be performed independently of the actual (automatic) model selection.
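As an illustration of what a model-agnostic diagnosis can look like, the following minimal sketch computes permutation feature importance, one common technique of this kind (it is not named in the text above and is used here only as an example). The model is treated purely as a black-box prediction function; the names `permutation_importance` and `black_box` as well as the toy data are assumptions made for this sketch.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Per feature: mean increase in error after shuffling that feature's column.

    `predict` can be any callable mapping one row to one prediction, so the
    diagnosis does not depend on how the model was selected or trained.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Break the link between feature j and the target by shuffling it.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mean_squared_error(targets, [predict(r) for r in shuffled])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def black_box(row):
    """Hypothetical fitted model: it only uses the first feature."""
    return 3.0 * row[0]


# Toy data (assumption): two random features, target depends on the first only.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(50)]
targets = [3.0 * r[0] for r in rows]

importances = permutation_importance(black_box, rows, targets)
print(importances)
```

Because only `predict` is called, the same function can be applied unchanged to whatever model an automatic selection procedure returns.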
This competence field is divided into three areas of focus:
Large-scale machine learning covers supervised as well as unsupervised analysis of Big Data. The amount of data to be analyzed and the number of dimensions are increasing steadily, and new basic technologies such as distributed computing and parallel processing on graphics cards provide a plethora of new possibilities to learn from large amounts of data. While some machine learning algorithms are easily parallelizable, many architectures have not yet been investigated thoroughly regarding their applicability to large-scale and high-dimensional data. In particular, unsupervised learning methods such as clustering of high-dimensional data (e.g., subspace clustering or correlation clustering) or community detection in graphs were often developed without a focus on Big Data.
As the amount of data grows, so does the demand for explainable analysis results. Interactive approaches can support explainability and make use of expert knowledge by exposing hyperparameters. Applications that let users choose between different underlying statistical models could draw on expert knowledge in an even more nuanced way. Results that are available permanently and quickly are also becoming more and more important, while the time needed to process and analyze data increases with its amount and dimensionality. Developing anytime algorithms, which can deliver a result at any time, is therefore another goal.
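The idea of an anytime algorithm can be sketched with a simple iterative clustering procedure. The example below is only an illustration under stated assumptions: it uses one-dimensional k-means (Lloyd iterations) as a stand-in for a clustering method, and the names `anytime_kmeans_1d` and `run_with_budget`, the toy points and the time budget are all hypothetical. After every iteration a complete, usable result is available, and further iterations can only keep or improve its quality.

```python
import itertools
import time


def anytime_kmeans_1d(points, centroids):
    """Yield (centroids, cost) after every Lloyd iteration.

    Each yielded pair is a complete result; the cost (sum of squared
    distances to the nearest centroid) never increases between iterations.
    """
    centroids = list(centroids)
    while True:
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else old
                     for c, old in zip(clusters, centroids)]
        cost = sum(min((p - c) ** 2 for c in centroids) for p in points)
        yield list(centroids), cost


def run_with_budget(points, centroids, budget_seconds):
    """Refine until the time budget is spent, then return the latest result."""
    deadline = time.monotonic() + budget_seconds
    best = None
    for result in anytime_kmeans_1d(points, centroids):
        best = result  # always at least one result is available
        if time.monotonic() >= deadline:
            break
    return best


# Toy data (assumption): two well-separated groups on a line.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

# Interrupting after any number of iterations yields a valid clustering.
history = list(itertools.islice(anytime_kmeans_1d(points, [0.0, 1.0]), 5))
for centroids, cost in history:
    print(centroids, cost)

# With a zero budget the caller still receives the first available result.
quick = run_with_budget(points, [0.0, 1.0], budget_seconds=0.0)
```

Writing the procedure as a generator keeps the decision of when to stop with the caller, for example an interactive application, rather than inside the algorithm.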
Summarizing, this competence field encompasses the following areas of focus: