is Assistant Professor at the Chair of Human-Computer Interaction and Artificial Intelligence at LMU Munich.
His research sits at the intersection of Human-Computer Interaction and Artificial Intelligence, where he focuses on the next generation of computing systems. He uses artificial intelligence to design, build, and evaluate future human-centered interfaces. In particular, he envisions enabling humans, in collaboration with machines, to surpass what they could achieve on their own. His work spans areas such as augmented and virtual reality, mobile scenarios, and robotics.
Users frequently use their smartphones in combination with other smart devices, for example, when streaming music to smart speakers or controlling smart appliances. During these interconnected interactions, user data is handled and processed by several entities that employ different data protection practices or are subject to different regulations. Users need to understand these processes to seek information in the right places and make informed privacy decisions. We conducted an online survey (N=120) to investigate whether users have accurate mental models of interconnected interactions. We found that users consider scenarios more privacy-concerning when multiple devices are involved. Yet, we also found that most users do not fully comprehend the privacy-relevant processes in interconnected interactions. Our results show that current privacy information methods are insufficient and that users must be better educated to make informed privacy decisions. Finally, we advocate for restricting data processing to the app layer and for better encryption to reduce users’ data protection responsibilities.
As artificial intelligence becomes increasingly pervasive, it is essential that we understand the implications of bias in machine learning. Many developers rely on crowd workers to generate and annotate datasets for machine learning applications. However, this step risks embedding labeler bias in the training data, leading to biased decision-making in systems trained on these datasets. To characterize labeler bias, we created a face dataset and conducted two studies in which labelers of different ethnicities and sexes completed annotation tasks. In the first study, labelers annotated subjective characteristics of faces. In the second, they annotated images using bounding boxes. Our results demonstrate that labeler demographics significantly impact both subjective and accuracy-based annotations, indicating that merely recruiting a diverse set of labelers may not be enough to solve the problem. We discuss the consequences of these findings for current machine learning practices aiming to create fair and unbiased systems.
Images and videos are widely used to elicit emotions; however, their visual appearance differs from real-world experiences. With virtual reality becoming more realistic, immersive, and interactive, we envision virtual environments that elicit emotions effectively, rapidly, and with high ecological validity. This work presents the first interactive virtual reality dataset for emotion elicitation. We created five interactive virtual environments based on corresponding validated 360° videos and validated their effectiveness with 160 participants. Our results show that our virtual environments successfully elicit the targeted emotions. Compared with existing methods using images or videos, our dataset allows virtual reality researchers and practitioners to integrate emotion elicitation into their designs in an immersive and interactive way.
Future domestic robots will become integral parts of our homes. They will have various sensors that continuously collect data, as well as varying locomotion and interaction capabilities that enable them to access all rooms and physically manipulate the environment. This raises many privacy concerns. We investigate how such concerns can be mitigated using the possibilities enabled by the robot’s novel locomotion and interaction abilities. First, through an online survey (N=90), we found that privacy concerns increase with advanced locomotion and interaction capabilities. Second, we conducted three focus groups (N=22) to construct 86 patterns for communicating the states of a domestic robot’s microphones, cameras, and internet connectivity. Lastly, we conducted a large-scale online survey (N=1720) to understand which patterns perform best regarding trust, privacy, understandability, notification qualities, and user preference. Our final set of communication patterns will guide developers and researchers in ensuring a privacy-preserving future with domestic robots.
In a world increasingly reliant on artificial intelligence, it is more important than ever to consider its ethical implications. One key yet under-explored challenge is labeler bias: bias introduced by the individuals who label datasets, which can create inherently biased training datasets and subsequently lead to inaccurate or unfair decisions in healthcare, employment, education, and law enforcement. Hence, we conducted a study (N=98) to investigate and measure labeler bias using images of people of different ethnicities and sexes in a labeling task. Our results show that participants hold stereotypes that influence their decision-making process and that labeler demographics impact the assigned labels. We also discuss how labeler bias influences datasets and, subsequently, the models trained on them. Overall, a high degree of transparency must be maintained throughout the entire artificial intelligence training process to identify and correct biases in the data as early as possible.
Physiological sensing enables advanced adaptive functionalities that use physiological data (e.g., eye tracking) to adapt system conditions to the user. In this work, we investigate the impact of infilling methods on the performance of LSTM models handling missing eye tracking data, specifically during blinks and gaps in recording. We conducted experiments using infilling techniques recommended in previous work on an openly available eye tracking dataset and LSTM model structure. Our findings indicate that the infilling method significantly influences LSTM prediction accuracy. These results underscore the importance of standardized infilling approaches for enhancing the reliability and reproducibility of LSTM-based eye tracking applications at scale. Future work should examine these infilling methods on larger datasets to assess their generalizability.
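As an illustration of what such infilling choices look like in practice, here is a minimal Python sketch (not the paper's code) comparing three common strategies for filling gaze gaps caused by blinks; the column names, sampling rate, and synthetic data are assumptions made for the example.

```python
# Hypothetical sketch: three common infilling strategies for gaze gaps
# (e.g., during blinks) before windowing the signal for an LSTM.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
gaze = pd.DataFrame({
    "t": np.arange(0, 1.0, 0.01),              # 100 Hz timestamps (s)
    "x": rng.normal(0.5, 0.05, 100),           # normalized gaze x
    "y": rng.normal(0.5, 0.05, 100),           # normalized gaze y
})
gaze.loc[40:52, ["x", "y"]] = np.nan           # simulate a ~130 ms blink

def infill(df, method):
    """Return a copy of df with missing gaze samples filled by `method`."""
    out = df.copy()
    if method == "zero":                       # naive zero fill
        out[["x", "y"]] = out[["x", "y"]].fillna(0.0)
    elif method == "ffill":                    # last observation carried forward
        out[["x", "y"]] = out[["x", "y"]].ffill()
    elif method == "linear":                   # linear interpolation across the gap
        out[["x", "y"]] = out[["x", "y"]].interpolate(method="linear")
    return out

for m in ["zero", "ffill", "linear"]:
    filled = infill(gaze, m)
    print(m, filled.loc[45, ["x", "y"]].round(3).to_dict())
```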
Currently, interactive systems use physiological sensing to enable advanced functionalities. While eye tracking is a promising means to understand the user, eye tracking data inherently suffers from missing samples due to blinks, which may reduce system performance. We conducted a literature review to understand how researchers deal with this issue. We found that researchers often implement use-case-specific pipelines to overcome it, ranging from ignoring missing data to artificial interpolation. With these first insights, we ran a large-scale analysis on 11 publicly available datasets to understand the impact of the various approaches on data quality and accuracy. In doing so, we highlight the pitfalls in data processing and which methods work best. Based on our results, we provide guidelines for handling eye tracking data in interactive systems. Further, we propose a standard data processing pipeline that allows researchers and practitioners to pre-process and standardize their data efficiently.
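To make the idea of a standardized pre-processing step concrete, the sketch below shows one plausible gap-aware approach, assuming a pandas DataFrame with x/y gaze columns, a 100 Hz sampling rate, and a 300 ms gap threshold; these choices are illustrative and not taken from the paper.

```python
# Hypothetical gap-aware pre-processing: short gaps (typical blinks) are
# interpolated, long gaps (tracking loss) stay missing and are flagged so
# downstream models can mask them.
import numpy as np
import pandas as pd

def preprocess_gaze(df, sample_rate_hz=100, max_gap_ms=300):
    out = df.copy()
    missing = out["x"].isna()
    # Label consecutive runs of (non-)missing samples and measure run length.
    run_id = (missing != missing.shift()).cumsum()
    run_len = missing.groupby(run_id).transform("size")
    short_gap = missing & (run_len * 1000 / sample_rate_hz <= max_gap_ms)
    # Interpolate across all gaps, but keep the result only for short gaps.
    interp = out[["x", "y"]].interpolate(method="linear", limit_direction="both")
    out.loc[short_gap, ["x", "y"]] = interp.loc[short_gap, ["x", "y"]]
    out["long_gap"] = missing & ~short_gap
    return out

# Example: a 100 Hz recording with a 150 ms blink and a 600 ms tracking loss.
t = np.arange(0, 3.0, 0.01)
df = pd.DataFrame({"t": t, "x": np.sin(t), "y": np.cos(t)})
df.loc[50:64, ["x", "y"]] = np.nan       # short gap -> interpolated
df.loc[150:209, ["x", "y"]] = np.nan     # long gap -> flagged, left missing
clean = preprocess_gaze(df)
print(clean["long_gap"].sum())           # 60 samples flagged as a long gap
```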
Sedentary behavior is endemic in modern workplaces, contributing to negative physical and mental health outcomes. Although adjustable standing desks are increasing in popularity, people still avoid standing. We developed an open-source plug-and-play system to remotely control standing desks and investigated three system modes in a three-week in-the-wild user study (N=15). Interval mode forces users to stand once per hour, causing frustration. Adaptive mode nudges users to stand every hour unless they have already stood. Smart mode, which raises the desk during breaks, was rated best, contributing to increased standing time and receiving the most positive qualitative feedback. However, non-computer activities need to be accounted for in the future. Our results therefore indicate that a smart standing desk that shifts modes at opportune times has the most potential to reduce sedentary behavior in the workplace. We contribute our open-source system and insights for future intelligent workplace well-being systems.
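To illustrate the kind of decision logic an activity-aware mode might use, here is a hypothetical Python sketch; the thresholds, the idle-time signal, and the function names are assumptions and not the published system.

```python
# Hypothetical sketch of a "smart"-style rule: raise the desk only when the
# user appears to be on a break and has been sitting longer than a target.
from dataclasses import dataclass

@dataclass
class DeskState:
    sitting_minutes: float = 0.0   # continuous sitting time
    idle_minutes: float = 0.0      # time since last keyboard/mouse input

def smart_mode_should_raise(state: DeskState,
                            max_sitting: float = 50.0,
                            min_break_idle: float = 3.0) -> bool:
    """Raise the desk during an apparent break after prolonged sitting."""
    on_break = state.idle_minutes >= min_break_idle
    sat_too_long = state.sitting_minutes >= max_sitting
    return on_break and sat_too_long

# Example: 55 min of sitting, user stepped away 4 min ago -> raise the desk.
print(smart_mode_should_raise(DeskState(sitting_minutes=55, idle_minutes=4)))
```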
Security indicators, such as the padlock icon indicating SSL encryption in browsers, are established mechanisms for conveying secure connections. Currently, such indicators mainly exist for browsers and mobile environments. With the rise of the metaverse, we investigate how to mark secure transitions in virtual reality between applications, so-called sub-metaverses. For this, we first conducted in-depth interviews with domain experts (N=8) to understand the general design dimensions of security indicators in virtual reality (VR). Using these insights and considering additional design constraints, we implemented the five most promising indicators and evaluated them in a user study (N=25). While the blinking visual indicator placed in the periphery performed best regarding accuracy and task completion time, participants subjectively preferred the static visual indicator above the portal. Moreover, the latter received high scores for understandability while being rated low on intrusiveness and disturbance. Our findings contribute to a more secure and enjoyable metaverse experience.
Eye tracking is the basis for many intelligent systems that predict user actions. A core challenge with eye tracking data is that it inherently suffers from missing samples due to blinks. Approaches such as intent prediction and user state recognition process gaze data using neural networks; however, these often have difficulty handling missing information. Reviewing how prior work deals with this issue, we found that researchers often simply ignore missing data or adopt use-case-specific approaches, such as artificially filling in the gaps. This inconsistency in handling missing eye tracking data hinders the development of effective intelligent systems for predicting user actions, limits reproducibility, and can even lead to incorrect results. This lack of standardization calls for investigating possible solutions to improve the consistency and effectiveness of processing eye tracking data for user action prediction.
Over the last few years, we have seen many approaches that use tangibles to address the limited expressiveness of touchscreens. Mainstream tangible detection relies on fiducial markers embedded in the tangibles. However, the coarse sensor resolution of capacitive touchscreens makes such tangibles bulky, limiting their usefulness. We propose a novel deep-learning super-resolution network to better support fiducial tangibles on capacitive touchscreens. In detail, our network super-resolves the markers, enabling off-the-shelf detection algorithms to track tangibles reliably. Our network generalizes to unseen marker sets, such as AprilTag, ArUco, and ARToolKit. Therefore, we are not limited to a fixed number of distinguishable objects and do not require data collection and network training for new fiducial markers. With an extensive evaluation, including real-world users and five showcases, we demonstrate the applicability of our open-source approach on commodity mobile devices and further highlight the potential of tangibles on capacitive touchscreens.
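For intuition, the following is a minimal SRCNN-style sketch of how a super-resolution network for coarse capacitive frames could be structured; the layer sizes, scale factor, and 27×15 input grid are assumptions, not the architecture from the paper.

```python
# Minimal SRCNN-style sketch (not the paper's architecture): upscale a
# coarse capacitive frame so that standard fiducial-marker detectors can
# operate on the enlarged image.
import torch
import torch.nn as nn

class CapacitiveSR(nn.Module):
    def __init__(self, scale: int = 4):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode="bilinear",
                                    align_corners=False)
        self.refine = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):               # x: (B, 1, H, W) raw capacitive frame
        return self.refine(self.upsample(x))

frame = torch.rand(1, 1, 27, 15)        # assumed coarse capacitive grid size
sr = CapacitiveSR()(frame)              # (1, 1, 108, 60) super-resolved image
print(sr.shape)
```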
We are constantly surrounded by technology that collects and processes sensitive data, paving the way for privacy violations. Yet, current research on technology-facilitated privacy violations in the physical world is scattered, focused on specific scenarios, or investigates such violations purely from an expert’s perspective. Informed by a large-scale online survey, we first construct a scenario taxonomy based on privacy violations that users have experienced in the physical world through technology. We then validate our taxonomy and establish mitigation strategies using interviews and co-design sessions with privacy and security experts. In summary, this work contributes (1) a refined scenario taxonomy for technology-facilitated privacy violations in the physical world, (2) an understanding of how privacy violations manifest in the physical world, (3) a decision tree on how to inform users, and (4) a design space for creating notices whenever appropriate. With this, we contribute a conceptual framework to enable a privacy-preserving, technology-connected world.
Today, touchscreens are among the most common input devices for everyday ubiquitous interaction. Yet, capacitive touchscreens are limited in expressiveness; thus, a large body of work has focused on extending their input capabilities. One promising approach is to use the index finger’s orientation; however, this requires two-handed interaction and poses ergonomic constraints. We propose using the thumb’s pitch as an additional input dimension to counteract these limitations, enabling one-handed interaction scenarios. Our deep convolutional neural network detecting the thumb’s pitch is trained on more than 230,000 ground truth images recorded using a motion tracking system. We highlight the potential of ThumbPitch by proposing several use cases that exploit the higher expressiveness, especially for one-handed scenarios. We tested three use cases in a validation study and validated our model, which achieved a mean error of only 11.9°.
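As a rough illustration, the sketch below shows a small convolutional regressor that maps a capacitive frame to a single pitch angle; the layer sizes and input resolution are assumptions and do not reflect the published ThumbPitch model.

```python
# Hypothetical sketch of a convolutional regressor: capacitive frame in,
# estimated thumb pitch angle (degrees) out.
import torch
import torch.nn as nn

class PitchRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6 * 3, 64), nn.ReLU(),
            nn.Linear(64, 1),           # predicted pitch angle in degrees
        )

    def forward(self, x):               # x: (B, 1, 27, 15) capacitive frame
        return self.head(self.features(x))

pitch = PitchRegressor()(torch.rand(8, 1, 27, 15))
print(pitch.shape)                      # torch.Size([8, 1])
```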
As ubiquitous computing brings sensors and actuators directly into our homes, these devices introduce privacy concerns for owners and bystanders. However, privacy concerns may vary among devices and depend on the bystanders’ social relation to the owner. In this work, we hypothesize 1) that bystanders assign more privacy concerns to smart home devices than to personal computing devices, such as smartphones, even though they have the same capabilities, and 2) that a stronger social relationship mitigates some of the bystanders’ privacy concerns. Through an online survey (N=170), we found that personal computing devices are perceived as significantly less privacy-concerning than smart home devices despite having equal capabilities. By varying the assumed social relationship, we further found that a stronger connection to the owner reduces privacy concerns. Thus, as bystanders underestimate the risk of personal computing devices and are generally concerned about smart home devices, it is essential to alert them to the presence of both. We argue that bystanders have to be informed about privacy risks when entering a new space, ideally already in the entrance area.
© all images: LMU | TUM