Computer Vision & Artificial Intelligence
Xi Wang leads the MCML Junior Research Group ‘Egocentric Vision’ at TU Munich.
Xi Wang and her team conduct cutting-edge research in egocentric vision, focusing on learning from first-person human videos to understand behavior patterns and extract information valuable for applications in robotics. Their ongoing projects include 3D reconstruction using Gaussian splatting and multimodal learning with vision-language models. Funded as a BMBF project, the group maintains close ties with MCML and actively seeks collaborations that bridge egocentric vision with other research domains, extending beyond the group's own focus.
Despite the astonishing progress in generative AI, 4D dynamic object generation remains an open challenge. With limited high-quality training data and heavy computing requirements, hallucinating unseen geometry together with unseen movement poses great challenges to generative models. In this work, we propose TwoSquared, a method that obtains a physically plausible 4D sequence from only two 2D RGB images corresponding to the beginning and end of the action. Instead of directly solving the 4D generation problem, TwoSquared decomposes it into two steps: 1) an image-to-3D generation module built on existing generative models trained on high-quality 3D assets, and 2) a physically inspired deformation module that predicts intermediate movements. As a result, our method requires neither templates nor object-class-specific prior knowledge and can take in-the-wild images as input. In our experiments, we demonstrate that TwoSquared produces texture- and geometry-consistent 4D sequences given only 2D images.
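To make the two-step decomposition concrete, the following is a minimal, hypothetical PyTorch sketch. The interface of image_to_3d, the edge-based mesh format, and the edge-length-preserving refinement are illustrative assumptions standing in for the pretrained image-to-3D generator and the physically inspired deformation module; none of it is the authors' code.

```python
# Hypothetical sketch of a TwoSquared-style two-step decomposition.
# Stage 1 (image -> 3D) is a user-supplied pretrained generator; Stage 2
# is approximated by linear vertex interpolation refined with an
# edge-length-preservation penalty, a crude stand-in for the paper's
# physically inspired deformation module.
import torch

def edge_lengths(verts, edges):
    # verts: [V, 3]; edges: [E, 2] long tensor of vertex index pairs.
    return (verts[edges[:, 0]] - verts[edges[:, 1]]).norm(dim=-1)

def interpolate_physically(v0, v1, edges, t, steps=50, lam=1.0):
    """Predict intermediate vertices at time t in [0, 1]."""
    v = ((1 - t) * v0 + t * v1).detach().clone().requires_grad_(True)
    # Blend the rest edge lengths of the two endpoint shapes.
    rest = ((1 - t) * edge_lengths(v0, edges)
            + t * edge_lengths(v1, edges)).detach()
    opt = torch.optim.Adam([v], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = lam * ((edge_lengths(v, edges) - rest) ** 2).mean()
        loss.backward()
        opt.step()
    return v.detach()

def two_squared_pipeline(img_start, img_end, image_to_3d, num_frames=16):
    # image_to_3d: assumed interface returning (vertices, faces, edges).
    v0, faces, edges = image_to_3d(img_start)  # 3D asset at t = 0
    v1, _, _ = image_to_3d(img_end)            # 3D asset at t = 1
    ts = torch.linspace(0.0, 1.0, num_frames)
    frames = [interpolate_physically(v0, v1, edges, float(t)) for t in ts]
    return frames, faces
```

Note that this toy version assumes the two generated meshes share vertex correspondence; establishing that correspondence is part of what the actual method would have to solve.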
Current 3D stylization techniques primarily focus on static scenes, while our world is inherently dynamic, filled with moving objects and changing environments. Existing style transfer methods mainly target appearance, such as color and texture transformation, but often neglect the geometric characteristics of the style image, which are crucial for a complete and coherent stylization effect. To overcome these shortcomings, we propose GAS-NeRF, a novel approach for joint appearance and geometry stylization in dynamic radiance fields. Our method leverages depth maps to extract and transfer geometric details into the radiance field, followed by appearance transfer. Experimental results on synthetic and real-world datasets demonstrate that our approach significantly enhances stylization quality while maintaining temporal coherence in dynamic scenes.
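A joint objective of this kind could plausibly be assembled as below, assuming a differentiable renderer that yields RGB and depth per view and a pretrained feature extractor such as VGG. The Gram-matrix style losses and all function names are assumptions for illustration, not the GAS-NeRF implementation.

```python
# Illustrative sketch of a joint geometry-plus-appearance stylization loss.
# feat_net is any pretrained feature extractor (e.g., VGG features);
# style_*_feats are its responses to the style image and its depth map.
import torch
import torch.nn.functional as F

def gram(feats):
    # Gram matrix over spatial positions: the classic style statistic.
    b, c, h, w = feats.shape
    f = feats.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def geometry_style_loss(rendered_depth, style_depth_feats, feat_net):
    # Push the radiance field's rendered depth toward the geometric
    # detail statistics of the style image's depth map.
    rd = rendered_depth.expand(-1, 3, -1, -1)   # [B,1,H,W] -> [B,3,H,W]
    return F.mse_loss(gram(feat_net(rd)), gram(style_depth_feats))

def appearance_style_loss(rendered_rgb, style_rgb_feats, feat_net):
    return F.mse_loss(gram(feat_net(rendered_rgb)), gram(style_rgb_feats))

def stylization_loss(rendered_rgb, rendered_depth, content_rgb,
                     style_rgb_feats, style_depth_feats, feat_net,
                     w_geo=1.0, w_app=1.0, w_content=1.0):
    # Content term keeps the scene recognizable; the two style terms
    # transfer geometric detail alongside color and texture.
    return (w_geo * geometry_style_loss(rendered_depth, style_depth_feats, feat_net)
            + w_app * appearance_style_loss(rendered_rgb, style_rgb_feats, feat_net)
            + w_content * F.mse_loss(feat_net(rendered_rgb),
                                     feat_net(content_rgb)))
```

Per the abstract, the method applies geometry transfer before appearance transfer; a single weighted loss as above is one simplified reading of that pipeline.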
Neural implicits are a widely used surface representation because they offer adaptive resolution and support arbitrary topology changes. While previous works rely on ground-truth point clouds or meshes, they rarely discuss data acquisition and ignore the effect of input quality and sampling methods on reconstruction. In this paper, we introduce an uncertainty-augmented implicit surface representation together with a sampling method that accounts for the geometric characteristics of the input. To this end, we present a strategy that efficiently computes differentiable geometric features, namely mean curvatures, to guide the sampling phase during training. The uncertainty augmentation offers insights into the occupancy and reliability of the output signed distance value, thereby extending the representation to open surfaces. Finally, we demonstrate that our method improves reconstruction on both synthetic and real-world data.
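A minimal sketch of both ingredients, assuming a PyTorch SDF network: the mean curvature of the level set, H = 0.5 * div(∇f/|∇f|), is differentiable and obtainable with autograd, and a second output head predicts an uncertainty alongside the signed distance. The network architecture and the curvature-proportional sampling weights are illustrative assumptions, not the paper's implementation.

```python
# Sketch: curvature-guided sampling plus an uncertainty-augmented SDF.
import torch

class UncertainSDF(torch.nn.Module):
    """Maps a 3D point to (signed distance, log-variance)."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3, hidden), torch.nn.Softplus(beta=100),
            torch.nn.Linear(hidden, hidden), torch.nn.Softplus(beta=100),
            torch.nn.Linear(hidden, 2),
        )

    def forward(self, x):
        out = self.net(x)
        return out[:, :1], out[:, 1:]  # sdf value, uncertainty (log-var)

def mean_curvature(model, x):
    """H = 0.5 * div(grad f / |grad f|), via two autograd passes."""
    x = x.detach().clone().requires_grad_(True)
    sdf, _ = model(x)
    (grad,) = torch.autograd.grad(sdf.sum(), x, create_graph=True)
    n = grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)  # unit normals
    div = torch.zeros(x.shape[0], device=x.device)
    for i in range(3):  # divergence = sum of d n_i / d x_i
        div = div + torch.autograd.grad(
            n[:, i].sum(), x, create_graph=True)[0][:, i]
    return 0.5 * div

def curvature_weighted_sample(model, candidates, k):
    """Draw k training points, favoring high-curvature regions."""
    with torch.enable_grad():
        h = mean_curvature(model, candidates).abs().detach()
    probs = (h + 1e-6) / (h + 1e-6).sum()
    return candidates[torch.multinomial(probs, k, replacement=True)]
```

The predicted log-variance could then weight the distance loss (e.g., via a Gaussian negative log-likelihood), one common way to let a network flag unreliable signed-distance values near open boundaries.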
Last modified: 2025-03-14