Current deep learning models for binary code struggle with explainability, since it is often unclear which input factors are important for a given output. In this paper, we apply occlusion-based saliency analysis as an explainability method to binary code embedding models. We conduct experiments on two state-of-the-art Transformer-based models that take preprocessed assembly code as input and compute an embedding vector for each function. We show that, during training, the models learn the importance of different instructions. From the results, we observe that call instructions and the names of external call targets are important. This observation confirms the intuition that function calls strongly influence the semantics of a function and should therefore also have a large impact on its learned embedding. This motivates the development of model architectures that integrate stronger analysis into the preprocessing step to further exploit call relationships.
inproceedings
BibTeXKey: DVK+25
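As a rough illustration of the occlusion idea described in the abstract, the sketch below masks one instruction at a time, re-embeds the function, and scores each instruction by the cosine distance between the occluded and the original embedding. The `embed` interface, the `<mask>` token, and the toy embedder are assumptions made for illustration, not the authors' models or implementation.

```python
# Minimal occlusion-saliency sketch for a function-embedding model.
# The embed() interface, the "<mask>" occlusion token, and the cosine-distance
# scoring are illustrative assumptions, not the setup used in the paper.
import numpy as np


class ToyEmbedder:
    """Stand-in for a binary code embedding model (illustration only)."""

    def embed(self, tokens):
        # Deterministic pseudo-embedding within one run, just to make the
        # sketch runnable end to end.
        rng = np.random.default_rng(abs(hash(tuple(tokens))) % (2**32))
        return rng.standard_normal(8)


def occlusion_saliency(model, tokens, mask_token="<mask>"):
    """Score each token by how far occluding it shifts the function embedding."""
    base = model.embed(tokens)
    scores = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask_token] + tokens[i + 1:]
        emb = model.embed(occluded)
        cos = np.dot(base, emb) / (np.linalg.norm(base) * np.linalg.norm(emb))
        scores.append(1.0 - cos)  # larger embedding shift => more salient token
    return scores


if __name__ == "__main__":
    instrs = ["push rbp", "mov rbp, rsp", "call strlen", "pop rbp", "ret"]
    for instr, score in zip(instrs, occlusion_saliency(ToyEmbedder(), instrs)):
        print(f"{score:6.3f}  {instr}")
```

With a trained embedding model in place of the toy embedder, instructions whose occlusion causes the largest embedding shift (here, ideally the `call strlen` line) would be reported as the most important for the learned representation.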