is Professor at the Chair of Database Systems & Data Mining and head of 'AI-beyond: Research Group for Spatial AI' at LMU Munich.
Reinforcement learning can learn powerful policies that enable autonomous systems to dynamically adapt to unknown situations while still maximizing expected rewards. His group develops novel solutions for spatial mobility tasks such as resource collection and allocation in highly dynamic environments. They aim to make their agents versatile enough to adapt to changed conditions and variations of the environment. They further investigate risk and constraints to enforce stable outcomes in financial settings such as portfolio allocation.
Allocation tasks represent a class of problems where a limited amount of resources must be allocated to a set of entities at each time step. Prominent examples of this task include portfolio optimization and distributing computational workloads across servers. Allocation tasks are typically bound by linear constraints describing practical requirements that have to be strictly fulfilled at all times. In portfolio optimization, for example, investors may be obligated to allocate less than 30% of the funds to a certain industrial sector in any investment period. Such constraints restrict the action space of allowed allocations in intricate ways, which makes learning a policy that avoids constraint violations difficult. In this paper, we propose a new method for constrained allocation tasks based on an autoregressive process that sequentially samples allocations for each entity. In addition, we introduce a novel de-biasing mechanism to counter the initial bias caused by sequential sampling. We demonstrate the superior performance of our approach compared to a variety of Constrained Reinforcement Learning (CRL) methods on three distinct constrained allocation tasks: portfolio optimization, computational workload distribution, and a synthetic allocation benchmark.
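The following minimal Python sketch illustrates the sequential-sampling idea under the simplifying assumption of per-entity upper-bound constraints only; the de-biasing mechanism mentioned above is not reproduced, and the function name and bounds are illustrative rather than taken from the paper.

```python
import numpy as np

def sample_allocation(upper_bounds, rng=None):
    """Autoregressively sample an allocation that sums to 1 while respecting
    per-entity upper bounds (a simplified stand-in for general linear constraints)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(upper_bounds)
    weights = np.zeros(n)
    remaining = 1.0
    for i in range(n):
        # The remaining entities must still be able to absorb what is left,
        # which yields a lower bound on the current entity's share.
        max_rest = sum(upper_bounds[i + 1:])
        lo = max(0.0, remaining - max_rest)
        hi = min(upper_bounds[i], remaining)
        weights[i] = hi if i == n - 1 else rng.uniform(lo, hi)
        remaining -= weights[i]
    return weights

# e.g. portfolio weights with 30% / 50% / 40% / 60% sector caps
print(sample_allocation([0.3, 0.5, 0.4, 0.6]))
```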
Remote sensing projects typically generate large amounts of imagery that can be used to train powerful deep neural networks. However, the amount of labeled images is often small, as remote sensing applications generally require expert labelers. Thus, semi-supervised learning (SSL), i.e., learning with a small pool of labeled and a larger pool of unlabeled data, is particularly useful in this domain. Current SSL approaches generate pseudo-labels from model predictions for unlabeled samples. As the quality of these pseudo-labels is crucial for performance, utilizing additional information to improve pseudo-label quality is a promising direction. For remote sensing images, geolocation and recording time are generally available and provide a valuable source of information as semantic concepts, such as land cover, are highly dependent on spatiotemporal context, e.g., due to seasonal effects and vegetation zones. In this paper, we propose to exploit spatiotemporal metainformation in SSL to improve the quality of pseudo-labels and, therefore, the final model performance. We show that directly adding the available metadata to the input of the predictor at test time degrades the prediction quality for metadata outside the spatiotemporal distribution of the training set. Thus, we propose a teacher-student SSL framework where only the teacher network uses metainformation to improve the quality of pseudo-labels on the training set. Correspondingly, our student network benefits from the improved pseudo-labels but does not receive metadata as input, making it invariant to spatiotemporal shifts at test time. Furthermore, we propose methods for encoding and injecting spatiotemporal information into the model and introduce a novel distillation mechanism to enhance the knowledge transfer between teacher and student. Our framework, dubbed Spatiotemporal SSL, can easily be combined with several state-of-the-art SSL methods, resulting in significant and consistent improvements on the BigEarthNet and EuroSAT benchmarks.
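As a rough sketch of the teacher-student split described above, assuming a simple cyclic metadata encoding and a confidence-thresholded pseudo-labeling step; the module sizes, encoding, and threshold are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

def encode_metadata(lat, lon, day_of_year):
    """Cyclic encoding of geolocation and recording time (illustrative)."""
    two_pi = 2 * torch.pi
    return torch.stack([
        torch.sin(two_pi * lat / 180), torch.cos(two_pi * lat / 180),
        torch.sin(two_pi * lon / 360), torch.cos(two_pi * lon / 360),
        torch.sin(two_pi * day_of_year / 365), torch.cos(two_pi * day_of_year / 365),
    ], dim=-1)

class Teacher(nn.Module):
    """Sees image features *and* metadata; only used to create pseudo-labels."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.backbone = nn.Linear(feat_dim, 128)   # stands in for a CNN backbone
        self.meta_proj = nn.Linear(6, 128)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x, meta):
        return self.head(torch.relu(self.backbone(x) + self.meta_proj(meta)))

class Student(nn.Module):
    """Sees only the image; invariant to spatiotemporal shifts at test time."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, num_classes))

    def forward(self, x):
        return self.net(x)

# Pseudo-labeling step on an unlabeled batch (confidence-thresholded).
teacher, student = Teacher(512, 19), Student(512, 19)
x_u = torch.randn(8, 512)
meta = encode_metadata(torch.rand(8) * 180 - 90, torch.rand(8) * 360 - 180,
                       torch.randint(1, 366, (8,)).float())
with torch.no_grad():
    probs = torch.softmax(teacher(x_u, meta), dim=-1)
conf, pseudo = probs.max(dim=-1)
mask = conf > 0.9                                  # keep only confident pseudo-labels
loss = (nn.functional.cross_entropy(student(x_u)[mask], pseudo[mask])
        if mask.any() else torch.tensor(0.0))
```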
Semantic segmentation represents a fundamental task in computer vision with various application areas such as autonomous driving, medical imaging, or remote sensing. For evaluating and comparing semantic segmentation models, the mean intersection over union (mIoU) is currently the gold standard. However, while mIoU serves as a valuable benchmark, it does not offer insights into the types of errors incurred by a model. Moreover, different types of errors may have different impacts on downstream applications. To address this issue, we propose an intuitive method for the systematic categorization of errors, thereby enabling a fine-grained analysis of semantic segmentation models. Since we assign each erroneous pixel to precisely one error type, our method seamlessly extends the popular IoU-based evaluation by shedding more light on the false positive and false negative predictions. Our approach is model- and dataset-agnostic, as it does not rely on additional information besides the predicted and ground-truth segmentation masks. In our experiments, we demonstrate that our method accurately assesses model strengths and weaknesses on a quantitative basis, thus reducing the dependence on time-consuming qualitative model inspection. We analyze a variety of state-of-the-art semantic segmentation models, revealing systematic differences across various architectural paradigms. Exploiting the gained insights, we showcase that combining two models with complementary strengths in a straightforward way is sufficient to consistently improve mIoU, even for models setting the current state of the art on ADE20K.
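A minimal sketch of how such an error categorization can extend per-class IoU, assuming an illustrative boundary-versus-interior taxonomy rather than the paper's exact error types; each erroneous pixel receives exactly one category.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def categorize_errors(pred, gt, cls, boundary_width=3):
    """Split a class's false positives / false negatives into boundary and
    interior errors (an illustrative taxonomy, not the paper's exact one)."""
    p, g = pred == cls, gt == cls
    tp, fp, fn = p & g, p & ~g, ~p & g
    # Band of pixels around the ground-truth boundary of the class.
    boundary = (binary_dilation(g, iterations=boundary_width) &
                binary_dilation(~g, iterations=boundary_width))
    return {
        "IoU": tp.sum() / max((p | g).sum(), 1),
        "FP_boundary": (fp & boundary).sum(), "FP_interior": (fp & ~boundary).sum(),
        "FN_boundary": (fn & boundary).sum(), "FN_interior": (fn & ~boundary).sum(),
    }

pred = np.random.randint(0, 3, (64, 64))
gt = np.random.randint(0, 3, (64, 64))
print(categorize_errors(pred, gt, cls=1))
```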
Quantum Machine Learning (QML) is a recent and rapidly evolving field where the theoretical framework and logic of quantum mechanics are employed to solve machine learning tasks. A variety of techniques with different levels of quantum-classical hybridization have been presented. Here we focus on variational quantum circuits (VQC), which have emerged as the most promising candidates for the quantum counterpart of neural networks in the noisy intermediate-scale quantum (NISQ) era. Although they show promising results, VQCs can be hard to train because of several issues, e.g., barren plateaus, the periodicity of the weights, or the choice of architecture. In this paper we focus on this last problem, and to address it we propose a gradient-free algorithm inspired by natural evolution to optimise both the weights and the architecture of the VQC. In particular, we present a version of the well-known NeuroEvolution of Augmenting Topologies (NEAT) algorithm adapted to the case of quantum variational circuits. We test the algorithm on benchmark problems from classical fields of machine learning, namely reinforcement learning and optimization.
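The following toy Python sketch conveys the evolutionary idea on a two-qubit state-vector simulator, mutating both rotation angles (weights) and the gate sequence (architecture); it is a heavily simplified illustration with a made-up fitness target, not the quantum NEAT algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([0, 0, 0, 1], dtype=complex)      # toy objective: prepare |11>
I2 = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def simulate(genome):
    """Apply the gate sequence encoded by the genome to |00>."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0
    for gate, qubit, theta in genome:
        U = CNOT if gate == "cnot" else (
            np.kron(ry(theta), I2) if qubit == 0 else np.kron(I2, ry(theta)))
        state = U @ state
    return state

def fitness(genome):
    return abs(np.vdot(TARGET, simulate(genome))) ** 2

def mutate(genome):
    """Either perturb a rotation angle (weights) or grow the circuit (architecture)."""
    child = [list(g) for g in genome]
    if child and rng.random() < 0.5:
        idx = int(rng.integers(len(child)))
        if child[idx][0] == "ry":
            child[idx][2] += rng.normal(0, 0.3)
    elif rng.random() < 0.7:
        child.append(["ry", int(rng.integers(2)), float(rng.normal(0, 1))])
    else:
        child.append(["cnot", 0, 0.0])
    return [tuple(g) for g in child]

# Simple (mu + lambda) evolution: keep the 10 fittest, refill with mutated copies.
population = [[("ry", 0, 0.1)] for _ in range(20)]
for _ in range(150):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    population = parents + [mutate(parents[int(rng.integers(10))]) for _ in range(10)]

best = max(population, key=fitness)
print(f"best fitness: {fitness(best):.3f}, circuit length: {len(best)}")
```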
Financial portfolio managers typically face multi-period optimization tasks subject to requirements such as restrictions on short-selling or investing at least a particular portion of the portfolio in a specific industry sector. A common approach to tackling these problems is to use constrained Markov decision process (CMDP) methods, which may suffer from sample inefficiency, extensive hyperparameter tuning, and a lack of guarantees on constraint violations. In this paper, we propose Action Space Decomposition Based Optimization (ADBO) for optimizing a more straightforward surrogate task whose actions can be mapped back to the original task. We examine our method on two portfolio construction tasks based on real-world data. The results show that our new approach consistently outperforms state-of-the-art benchmark approaches for general CMDPs.
Modern Emergency Medical Services (EMS) benefit from real-time sensor information in various ways, as it provides up-to-date location information and helps assess current local emergency risks. A critical part of EMS is dynamic ambulance redeployment, i.e., the task of assigning idle ambulances to base stations throughout a community. Although there has been considerable effort on methods to optimize emergency response systems, comparing proposed methods is generally difficult as reported results are mostly based on artificial and proprietary test beds. In this paper, we present a benchmark simulation environment for dynamic ambulance redeployment based on real emergency data from the city of San Francisco. Our proposed simulation environment is highly scalable and is compatible with modern reinforcement learning frameworks. We provide a comparative study of several state-of-the-art methods across various metrics. Results indicate that even simple baseline algorithms can perform remarkably well in close-to-realistic settings.
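The gym-style interface such a simulation environment typically exposes can be sketched as follows; the class name, dynamics, and reward below are illustrative placeholders, not the benchmark released with the paper.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class AmbulanceRedeploymentEnv(gym.Env):
    """Illustrative gym-style interface for dynamic ambulance redeployment
    (a sketch with toy dynamics, not the benchmark released with the paper)."""

    def __init__(self, n_ambulances=10, n_stations=25):
        super().__init__()
        self.n_ambulances, self.n_stations = n_ambulances, n_stations
        # Observation: per-station counts of idle ambulances and open incidents.
        self.observation_space = spaces.Box(0, n_ambulances, shape=(2 * n_stations,),
                                            dtype=np.float32)
        # Action: station index to which an idle ambulance is redeployed.
        self.action_space = spaces.Discrete(n_stations)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.idle = np.zeros(self.n_stations, dtype=np.float32)
        self.idle[: self.n_ambulances] = 1
        self.incidents = np.zeros(self.n_stations, dtype=np.float32)
        return np.concatenate([self.idle, self.incidents]), {}

    def step(self, action):
        # Move one idle ambulance (here simply from the fullest station) to the chosen station.
        source = int(np.argmax(self.idle))
        self.idle[source] -= 1
        self.idle[action] += 1
        # Toy emergency arrivals; the reward penalizes incidents at uncovered stations
        # as a crude proxy for response time.
        self.incidents = self.np_random.poisson(0.1, self.n_stations).astype(np.float32)
        reward = -float((self.incidents * (self.idle == 0)).sum())
        obs = np.concatenate([self.idle, self.incidents])
        return obs, reward, False, False, {}

env = AmbulanceRedeploymentEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```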
In recent years, the availability of remote sensing imagery from aerial vehicles and satellites has constantly improved. For an automated interpretation of such data, deep-learning-based object detectors achieve state-of-the-art performance. However, established object detectors require complete, precise, and correct bounding box annotations for training. In order to create the necessary training annotations for object detectors, imagery can be georeferenced and combined with data from other sources, such as points of interest localized by GPS sensors. Unfortunately, this combination often leads to poor object localization and missing annotations. Therefore, training object detectors with such data often results in insufficient detection performance. In this paper, we present a novel approach for training object detectors with extremely noisy and incomplete annotations. Our method is based on a teacher-student learning framework and a correction module accounting for imprecise and missing annotations. Thus, our method is easy to use and can be combined with arbitrary object detectors. We demonstrate that our approach improves standard detectors by 37.1% $AP_{50}$ on a noisy real-world remote-sensing dataset. Furthermore, our method achieves substantial performance gains on two datasets with synthetic noise.
A common problem in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes' representations of the input graph align and become indiscernible. The latest models employing attention mechanisms with Graph Transformer Layers (GTLs) are still restricted to the layer-wise computational workflow of a GNN and therefore cannot prevent such effects. In our work, we relax the GNN architecture by implementing a routing heuristic. Specifically, the nodes' representations are routed to dedicated experts. Each expert calculates the representations according to its respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node's representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets, showing competitive results while drastically reducing the number of parameters compared to state-of-the-art models.
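A condensed PyTorch sketch of the routing idea follows, with one expert per neighborhood extent and a per-node softmax router; it assumes a dense adjacency matrix and simplifies each expert to mean aggregation, so it illustrates the concept rather than the SEA architecture itself.

```python
import torch
import torch.nn as nn

class ShellExpert(nn.Module):
    """One expert: aggregates features over the k-hop view around each node."""
    def __init__(self, dim, k):
        super().__init__()
        self.k, self.lin = k, nn.Linear(dim, dim)

    def forward(self, x, adj):
        reach = torch.linalg.matrix_power(adj + torch.eye(adj.size(0)), self.k)
        hop_mask = (reach > 0).float()                  # nodes reachable within k hops
        agg = hop_mask @ x / hop_mask.sum(-1, keepdim=True)   # mean over the k-hop view
        return torch.relu(self.lin(agg))

class SEALayer(nn.Module):
    """Routes each node's representation to experts with different receptive fields
    (a sketch of the shell-attention idea, not the paper's exact architecture)."""
    def __init__(self, dim, hops=(1, 2, 3)):
        super().__init__()
        self.experts = nn.ModuleList(ShellExpert(dim, k) for k in hops)
        self.router = nn.Linear(dim, len(hops))

    def forward(self, x, adj):
        gate = torch.softmax(self.router(x), dim=-1)    # per-node routing weights
        outs = torch.stack([e(x, adj) for e in self.experts], dim=1)
        return (gate.unsqueeze(-1) * outs).sum(dim=1)

x, adj = torch.randn(6, 16), (torch.rand(6, 6) > 0.6).float()
print(SEALayer(16)(x, adj).shape)
```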
Stochastic Resource Collection (SRC) describes tasks where an agent tries to collect a maximal amount of dynamic resources while navigating through a road network. An instance of SRC is the traveling officer problem (TOP), where a parking officer tries to maximize the number of fined parking violations. In contrast to vehicular routing problems, in SRC tasks, resources might appear and disappear by an unknown stochastic process, and thus, the task is inherently more dynamic. In most applications of SRC, such as TOP, covering realistic scenarios requires more than one agent. However, directly applying multi-agent approaches to SRC poses challenges regarding temporal abstraction and inter-agent coordination. In this paper, we propose a novel multi-agent reinforcement learning method for the task of Multi-Agent Stochastic Resource Collection (MASRC). To this end, we formalize MASRC as a Semi-Markov Game, which allows the use of temporal abstraction and asynchronous actions by various agents. In addition, we propose a novel architecture trained with independent learning, which integrates the information about collaborating agents and allows us to take advantage of temporal abstractions. Our agents are evaluated on the multiple traveling officer problem, an instance of MASRC where multiple officers try to maximize the number of fined parking violations. Our simulation environment is based on real-world sensor data. Results demonstrate that our proposed agent can beat various state-of-the-art approaches.
Object detection on aerial and satellite imagery is an important tool for image analysis in remote sensing and has many areas of application. As modern object detectors require accurate annotations for training, manual and labor-intensive labeling is necessary. In situations where GPS coordinates for the objects of interest are already available, there is potential to avoid the cumbersome annotation process. Unfortunately, GPS coordinates are often not well-aligned with georectified imagery. These spatial errors can be seen as noise regarding the object locations, which may critically harm the training of object detectors and, ultimately, limit their practical applicability. To overcome this issue, we propose a co-correction technique that allows us to robustly train a neural network with noisy object locations and to transform them toward the true locations. When applied as a preprocessing step on noisy annotations, our method greatly improves the performance of existing object detectors. Our method is applicable in scenarios where the images are only annotated with points roughly indicating object locations, instead of entire bounding boxes providing precise information on the object locations and extents. We test our method on three datasets and achieve a substantial improvement (e.g., 29.6% mAP on the COWC dataset) over existing methods for noise-robust object detection.
Finding an available on-street parking spot is a relevant problem in day-to-day life. In recent years, several cities began providing real-time parking occupancy data. Finding a free parking spot in such a smart environment can be modeled and solved as a Markov decision process (MDP). The solver has to consider uncertainty as available parking spots might not remain available until arrival due to other vehicles claiming spots in the meantime. Knowing the parking intention of every vehicle in the environment would eliminate this uncertainty but is currently not realistic. In contrast, acquiring data from a subset of vehicles appears feasible and could at least reduce uncertainty. In this paper, we examine how sharing data within a vehicle fleet might lower parking search times. We use this data to better estimate the availability of parking spots at arrival. Since optimal solutions for large scenarios are computationally infeasible, we base our methods on approximations shown to perform well in single-agent settings. Our evaluation features a simulation of a part of Melbourne and indicates that fleet data can significantly reduce the time spent searching for a free parking bay.
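A toy example of how shared fleet intentions can sharpen the estimate of availability at arrival, assuming an exponential model for spots being claimed by non-fleet vehicles; the rates and the hard-exclusion rule for fleet vehicles are illustrative assumptions, not the estimators used in the paper.

```python
import numpy as np

def availability_at_arrival(travel_time, occupancy_rate, fleet_arrivals_before):
    """Probability that a currently free spot is still free when we arrive.

    travel_time:           our estimated travel time to the spot (minutes)
    occupancy_rate:        rate at which non-fleet vehicles claim the spot (1/min),
                           i.e. a simple exponential occupancy model (assumption)
    fleet_arrivals_before: number of fleet vehicles known to target this spot
                           and expected to arrive before us
    """
    p_not_taken_by_others = np.exp(-occupancy_rate * travel_time)
    # Assume a fleet vehicle arriving earlier takes the spot with certainty.
    p_not_taken_by_fleet = 0.0 if fleet_arrivals_before > 0 else 1.0
    return p_not_taken_by_others * p_not_taken_by_fleet

print(availability_at_arrival(travel_time=4.0, occupancy_rate=0.05, fleet_arrivals_before=0))
```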
We show that the task of collecting stochastic, spatially distributed resources (Stochastic Resource Collection, SRC) may be considered a Semi-Markov Decision Process. Our Deep Q-Network (DQN) based approach uses a novel scalable and transferable artificial neural network architecture. The concrete use case of SRC is an officer (single agent) trying to maximize the amount of fined parking violations in their area. We evaluate our approach on an environment based on real-world parking data of the city of Melbourne. In small, hence simple, settings with short distances between resources and few simultaneous violations, our approach is comparable to previous work. When the size of the network grows (and hence the number of resources), our solution significantly outperforms preceding methods. Moreover, applying a trained agent to a non-overlapping new area outperforms existing approaches.
Subspace clustering has established itself as a state-of-the-art approach to clustering high-dimensional data. In particular, methods relying on the self-expressiveness property have recently proved especially successful. However, they suffer from two major shortcomings: First, a quadratic-size coefficient matrix is learned directly, preventing these methods from scaling beyond small datasets. Second, the trained models are transductive and thus cannot be used to cluster out-of-sample data unseen during training. Instead of learning self-expression coefficients directly, we propose a novel metric learning approach that learns a subspace affinity function using a siamese neural network architecture. Consequently, our model benefits from a constant number of parameters and a constant-size memory footprint, allowing it to scale to considerably larger datasets. In addition, we can formally show that our model is still able to exactly recover subspace clusters given an independence assumption. The siamese architecture in combination with a novel geometric classifier further makes our model inductive, allowing it to cluster out-of-sample data. Additionally, non-linear clusters can be detected by simply adding an auto-encoder module to the architecture. The whole model can then be trained end-to-end in a self-supervised manner. This work in progress reports promising preliminary results on the MNIST dataset. In the spirit of reproducible research, we make all code publicly available. In future work, we plan to investigate several extensions of our model and to expand the experimental evaluation.
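A minimal PyTorch sketch of the siamese affinity idea follows, using an absolute-cosine score on normalized embeddings so that the affinity respects the sign symmetry of linear subspaces; the layer sizes and scoring function are assumptions, and the geometric classifier and auto-encoder module are omitted.

```python
import torch
import torch.nn as nn

class SiameseAffinity(nn.Module):
    """Siamese embedding network with a symmetric subspace-affinity score
    (a sketch; the paper's geometric classifier is not reproduced here)."""
    def __init__(self, in_dim, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, emb_dim))

    def embed(self, x):
        z = self.encoder(x)
        return z / z.norm(dim=-1, keepdim=True)

    def forward(self, x1, x2):
        # |cos| is invariant to sign flips, matching the symmetry of linear subspaces.
        return (self.embed(x1) * self.embed(x2)).sum(-1).abs()

model = SiameseAffinity(in_dim=784)       # parameter count independent of dataset size
x1, x2 = torch.randn(16, 784), torch.randn(16, 784)
print(model(x1, x2).shape)                # one affinity in [0, 1] per pair
```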
In this work, we present SMART-Env (Spatial Multi-Agent Resource search Training Environment), a spatio-temporal multi-agent environment for evaluating and training different kinds of agents on resource search tasks. We explain how to simulate arbitrary spawning distributions on real-world street graphs, compare agents’ behavior and evaluate their performance over time. Finally, we demonstrate SMART-Env in a taxi dispatching scenario with three different kinds of agents.
In this work, we address the problem of graph node alignment using the example of Map Fusion (MF). Given two partly overlapping road networks, the goal is to match nodes that represent the same locations in both networks. For this task, we propose a new model based on Graph Neural Networks (GNNs). Existing GNN approaches, which have recently been applied successfully to various tasks on graph-based data, show poor performance for the MF task. We hypothesize that this is mainly caused by graph regions from the non-overlapping areas, as information from those areas negatively affects the learned node representations. Therefore, our model has an additional inductive bias and learns to ignore the effects of nodes that do not have a match in the other graph. Our new model can easily be extended to other graph alignment problems, e.g., for calculating graph similarities or for aligning entities in knowledge graphs.
Spatial interpolation is the task of predicting a measurement for any location in a given geographical region. To train a prediction model, we assume point-wise measurements for various locations in the region to be available. In addition, it is often beneficial to consider historic measurements for these locations when training an interpolation model. Typical use cases are the interpolation of weather, pollution, or traffic information. In this paper, we introduce a new type of model with a strong relational inductive bias based on Message Passing Networks. In addition, we extend our new model to take geomorphological characteristics into account to improve the prediction quality. We provide an extensive evaluation based on a large real-world weather dataset and compare our new approach with classical statistical interpolation techniques and neural networks without inductive bias.
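A minimal sketch of the relational inductive bias in PyTorch: the target location aggregates station measurements with weights computed from relative positions. This is a single, simplified message-passing step with assumed layer sizes and attention form, not the full model from the paper.

```python
import torch
import torch.nn as nn

class InterpolationMPN(nn.Module):
    """One message-passing step for spatial interpolation: the target location
    aggregates measurements from nearby stations, with edge weights computed
    from relative positions (a minimal sketch of the relational inductive bias)."""
    def __init__(self, hidden=32):
        super().__init__()
        # Edge input: (dx, dy, measured value) of a neighboring station.
        self.edge_mlp = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, target_xy, station_xy, station_values):
        rel = station_xy - target_xy                  # relative positions, shape (N, 2)
        edge_in = torch.cat([rel, station_values.unsqueeze(-1)], dim=-1)
        scores = self.edge_mlp(edge_in).squeeze(-1)
        weights = torch.softmax(-rel.norm(dim=-1) + scores, dim=0)   # distance-aware attention
        return (weights * station_values).sum()

model = InterpolationMPN()
stations = torch.rand(20, 2)
values = torch.sin(stations[:, 0] * 3)                # synthetic measurements
print(model(torch.tensor([0.5, 0.5]), stations, values))
```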
Monitoring the restoration of natural habitats after human intervention is an important task in the field of remote sensing. Currently, this requires extensive field studies entailing considerable costs. Unmanned Aerial Vehicles (UAVs, a.k.a. drones) have the potential to reduce these costs, but generate immense amounts of data that have to be evaluated automatically with special techniques. The automated detection of tree seedlings in particular poses a big challenge, as their size and shape vary greatly across images. In addition, there is a trade-off between different flying altitudes. Given the same camera equipment, a lower flying altitude yields higher-resolution images and thus makes achieving high detection rates easier; however, the imagery will only cover a limited area. On the other hand, flying at higher altitudes allows for covering larger areas but makes seedling detection more challenging due to the coarser images. In this paper, we investigate the usability of super-resolution (SR) networks for the case in which we can collect a large amount of coarse imagery at higher flying altitudes but only a small amount of high-resolution images from lower flying altitudes. We use a collection of high-resolution images taken by a drone at 5 m altitude. After training the SR models on these data, we evaluate their applicability to low-quality images taken at 30 m altitude (in-domain). In addition, we investigate and compare whether approaches trained on highly diverse, large datasets can be transferred to these data (cross-domain). We also evaluate the usability of the SR results based on their influence on the detection rate of different object detectors. We find that the features acquired from training on standard SR datasets are transferable to the drone footage. Furthermore, we demonstrate that the detection rate of common object detectors can be improved by SR techniques in both settings, in-domain and cross-domain.
In many applications, it is required to analyze a graph merely based on its topology. In these cases, nodes can only be distinguished based on their structural neighborhoods and it is common that nodes having the same functionality or role yield similar neighborhood structures. In this work, we investigate two problems: (1) how to create structural node embeddings which describe a node’s role and (2) how important the nodes’ roles are for characterizing entire graphs. To describe the role of a node, we explore the structure within the local neighborhood (or multiple local neighborhoods of various extents) of the node in the vertex domain, compute the visiting probability distribution of nodes in the local neighborhoods and summarize each distribution to a single number by computing its entropy. Furthermore, we argue that the roles of nodes are important to characterize the entire graph. Therefore, we propose to aggregate the role representations to describe whole graphs for graph classification tasks. Our experiments show that our new role descriptors outperform state-of-the-art structural node representations that are usually more expensive to compute. Additionally, we achieve promising results compared to advanced state-of-the-art approaches for graph classification on various benchmark datasets, often outperforming these approaches.
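The role descriptor can be sketched as follows with NetworkX and NumPy, using the entropy of the t-step random-walk visiting distribution for several neighborhood extents and a simple mean aggregation per graph; the exact neighborhood definition and aggregation in the paper may differ.

```python
import numpy as np
import networkx as nx

def role_descriptor(G, node, max_hops=3):
    """Entropy of the random-walk visiting distribution within neighborhoods of
    increasing extent around `node` (a sketch of the role-descriptor idea)."""
    nodelist = sorted(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodelist)
    P = A / A.sum(axis=1, keepdims=True)              # random-walk transition matrix
    dist = np.zeros(len(G))
    dist[nodelist.index(node)] = 1.0
    entropies = []
    for _ in range(max_hops):
        dist = dist @ P                               # one more random-walk step
        nonzero = dist[dist > 0]
        entropies.append(float(-(nonzero * np.log(nonzero)).sum()))
    return np.array(entropies)

G = nx.karate_club_graph()
descriptors = np.stack([role_descriptor(G, v) for v in G.nodes()])
graph_repr = descriptors.mean(axis=0)                 # aggregate node roles to a graph representation
print(graph_repr)
```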
Routing to a resource (e.g., a parking spot or charging station) is a probabilistic search problem due to the uncertainty as to whether the resource will be available at the time of arrival. In recent years, more and more real-time information about the current state of resources has become available to facilitate this task. Therefore, we consider the case of a driver receiving online updates about the current situation. In this setting, the problem can be described as a fully observable Markov Decision Process (MDP) which can be used to compute an optimal policy minimizing the expected search time. However, current approaches do not scale beyond a dozen resources in a query. In this paper, we suggest adapting common approximate solutions for solving MDPs. We propose new re-planning and hindsight-planning algorithms that redefine the state space and rely on novel cost estimations to find close-to-optimal results. Unlike exact solutions for MDPs, our approximate planners can scale up to hundreds of resources without prohibitive computational costs. We demonstrate the result quality and the scalability of our approaches in two settings describing the search for parking spots and charging stations in an urban environment.
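As a toy illustration of re-planning with online updates, the sketch below scores each resource by a simple expected-cost estimate (travel time divided by the probability of still being free at arrival, under an assumed exponential decay) and re-evaluates whenever new state information arrives; this is not the cost estimation proposed in the paper.

```python
import numpy as np

def replan(travel_times, p_free_now, decay_rates):
    """Pick the next resource to head to under a simple re-planning heuristic:
    expected search time ~ travel time / probability the resource is still free
    on arrival (a toy cost estimate, not the paper's estimator)."""
    p_free_at_arrival = p_free_now * np.exp(-decay_rates * travel_times)
    expected_cost = travel_times / np.clip(p_free_at_arrival, 1e-6, None)
    return int(np.argmin(expected_cost))

# Re-plan whenever an online update about resource states arrives.
travel = np.array([3.0, 6.0, 9.0])        # minutes to each parking spot / charger
p_free = np.array([1.0, 1.0, 0.0])        # latest observed availability
decay = np.array([0.2, 0.05, 0.05])       # how quickly each spot tends to get taken
print("next target:", replan(travel, p_free, decay))
```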