Programming Languages and Artificial Intelligence
holds the Chair of Programming Languages and Artificial Intelligence at LMU Munich.
His research focuses on securing software through automated methods. His team builds systems to analyze software and understand its properties and purpose, and to harden software against malicious attacks.
Function names can greatly aid human reverse engineers, which has spurred the development of machine learning-based approaches to predicting function names in stripped binaries. Much current work in this area now uses transformers, applying a metaphor of machine translation from code to function names. Still, function naming models face challenges in generalizing to projects unrelated to the training set. In this paper, we take a completely new approach by transferring advances in automated image captioning to the domain of binary reverse engineering, such that different parts of a binary function can be associated with parts of its name. We propose BLens, which combines multiple binary function embeddings into a new ensemble representation, aligns it with the name representation latent space via a contrastive learning approach, and generates function names with a transformer architecture tailored for function names. Our experiments demonstrate that BLens significantly outperforms the state of the art. In the usual setting of splitting per binary, we achieve an F1 score of 0.79 compared to 0.70. In the cross-project setting, which emphasizes generalizability, we achieve an F1 score of 0.46 compared to 0.29. Finally, in an experimental setting reducing shared components across projects, we achieve an F1 score of 0.32 compared to 0.19.
Programming Languages and Artificial Intelligence
Programming Languages and Artificial Intelligence
Programming Languages and Artificial Intelligence
Browser extensions put millions of users at risk when misusing their elevated privileges. Despite the current practices of semi-automated code vetting, privacy-violating extensions still thrive in the official stores. We propose an approach for tracking contextual flows from browser-specific sensitive sources like cookies, browsing history, bookmarks, and search terms to suspicious network sinks through network requests. We demonstrate the effectiveness of the approach by a prototype called CodeX that leverages the power of CodeQL while breaking away from the conservativeness of bug-finding flavors of the traditional CodeQL taint analysis. Applying CodeX to the extensions published on the Chrome Web Store between March 2021 and March 2024 identified 1,588 extensions with risky flows. Manual verification of 339 of those extensions resulted in flagging 212 as privacy-violating, impacting up to 3.6M users.
Programming Languages and Artificial Intelligence
tbd
Programming Languages and Artificial Intelligence
Programming Languages and Artificial Intelligence
Programming Languages and Artificial Intelligence
©all images: LMU | TUM
2024-12-27 - Last modified: 2024-12-27