In the area of binary code analysis, research on binary function embedding models has recently risen rapidly. These models are trained to encode the semantics of binary code so that the resulting embeddings generalize to a variety of reverse engineering tasks such as binary code search, vulnerability detection, and malware classification. While many models take only the function in question as input, there have been successful attempts to improve function embeddings by leveraging contextual information from the call graph. In this study, we dissect the implications of these embedding refinements. We conduct experiments with a range of graph-based models on the embeddings generated by two state-of-the-art binary function embedding models. In the process, we show that improvements on binary code similarity detection (BCSD) do not necessarily generalize to downstream tasks, whether of a semantic or a syntactic nature. More generally, we find that training on semantic tasks correlates with worse performance on syntactic tasks. Through an explanatory analysis of the dataset, we find that call graph-based enhancements significantly improve the robustness of embeddings, particularly in scenarios where the initial models struggle. We encourage future research to build upon these findings to further explore the best methods for leveraging inter-function information in binary analysis.
BibTeX: inproceedings, key VK26