kTrans: Knowledge-Aware Transformer for Binary Code Embedding

Summary

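Binary code embedding (BCE) has important applications in a variety of reverse engineering tasks, such as binary code similarity detection, type recovery, control-flow recovery, and data-flow analysis.
Recent work has shown that Transformer models can capture the semantics of binary code well enough to support these downstream tasks.
However, existing models ignore prior knowledge about assembly language.
In this paper, we propose kTrans, a new Transformer-based approach for generating knowledge-aware binary code embeddings.
By feeding explicit knowledge to the Transformer as additional input and fusing implicit knowledge through a new pre-training task, kTrans offers a new perspective on incorporating domain knowledge into a Transformer framework.
We inspect the generated embeddings with outlier detection and visualization, and apply kTrans to three downstream tasks: binary code similarity detection (BCSD), function type recovery (FTR), and indirect call recognition (ICR).
The evaluation shows that kTrans generates high-quality binary code embeddings and outperforms state-of-the-art (SOTA) approaches on these downstream tasks by 5.2%, 6.8%, and 12.6%, respectively.
kTrans is publicly available at https://github.com/Learner0x5a/kTrans-release.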
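The summary does not spell out how explicit knowledge is fed to the model. What follows is a minimal sketch under our own assumptions, not kTrans's implementation. Each assembly token gets a token embedding plus one embedding per hypothetical knowledge field (token role, operand type, read/write access), and the vectors are summed in the style of BERT's additive segment and position embeddings. The field names, the toy `EmbeddingTable` (random vectors standing in for learned weights), and the simplified read/write rule are all hypothetical.

```python
import random

DIM = 8  # toy embedding width (hypothetical; real models are far wider)
FIELDS = ("role", "type", "access")  # hypothetical explicit-knowledge fields


class EmbeddingTable:
    """Maps each key to a fixed random vector; stands in for a learned embedding layer."""

    def __init__(self, dim, seed):
        self.dim = dim
        self.rng = random.Random(seed)
        self.vectors = {}

    def __getitem__(self, key):
        if key not in self.vectors:
            self.vectors[key] = [self.rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return self.vectors[key]


def operand_type(tok):
    """Hypothetical rule: classify an Intel-syntax operand as memory, immediate or register."""
    if tok.startswith("["):
        return "mem"
    if tok.startswith("0x") or tok.lstrip("-").isdigit():
        return "imm"
    return "reg"


def tokenize(insn):
    """Split e.g. 'mov eax, [rbp-0x8]' into (token, knowledge) pairs."""
    opcode, _, rest = insn.partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    pairs = [(opcode, {"role": "opcode", "type": "none", "access": "none"})]
    for i, op in enumerate(operands):
        # Simplification: treat the first operand as written and the others as read.
        access = "write" if i == 0 else "read"
        pairs.append((op, {"role": "operand", "type": operand_type(op), "access": access}))
    return pairs


token_emb = EmbeddingTable(DIM, seed=0)
knowledge_emb = {field: EmbeddingTable(DIM, seed=i + 1) for i, field in enumerate(FIELDS)}


def embed(insn):
    """Input vector per token = token embedding + one embedding per knowledge field."""
    vectors = []
    for tok, know in tokenize(insn):
        vec = list(token_emb[tok])
        for field in FIELDS:
            vec = [a + b for a, b in zip(vec, knowledge_emb[field][know[field]])]
        vectors.append(vec)
    return vectors


print(len(embed("mov eax, [rbp-0x8]")))  # 3 tokens: mov, eax, [rbp-0x8]
```

For example, `embed("mov eax, 1")[1]` and `embed("add ebx, eax")[2]` share the token `eax` but differ in the hypothetical access field, so they receive different input vectors. Without such knowledge inputs, the model would have to infer that distinction from context alone.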

Summary (original)

Binary Code Embedding (BCE) has important applications in various reverse engineering tasks such as binary code similarity detection, type recovery, control-flow recovery and data-flow analysis. Recent studies have shown that the Transformer model can comprehend the semantics of binary code to support downstream tasks. However, existing models overlooked the prior knowledge of assembly language. In this paper, we propose a novel Transformer-based approach, namely kTrans, to generate knowledge-aware binary code embedding. By feeding explicit knowledge as additional inputs to the Transformer, and fusing implicit knowledge with a novel pre-training task, kTrans provides a new perspective to incorporating domain knowledge into a Transformer framework. We inspect the generated embeddings with outlier detection and visualization, and also apply kTrans to 3 downstream tasks: Binary Code Similarity Detection (BCSD), Function Type Recovery (FTR) and Indirect Call Recognition (ICR). Evaluation results show that kTrans can generate high-quality binary code embeddings, and outperforms state-of-the-art (SOTA) approaches on downstream tasks by 5.2%, 6.8%, and 12.6% respectively. kTrans is publicly available at: https://github.com/Learner0x5a/kTrans-release
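BCSD with embeddings is commonly framed as retrieval. A query function is embedded, a pool of candidate functions is ranked by cosine similarity, and the result is scored by how high the true match lands, for example with reciprocal rank. The sketch below shows that generic setup only. The function names and three-dimensional vectors are made up for illustration, and we do not claim this matches kTrans's evaluation protocol.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query, pool):
    """Return pool names sorted from most to least similar to the query embedding."""
    return sorted(pool, key=lambda name: cosine(query, pool[name]), reverse=True)


def reciprocal_rank(query, pool, true_match):
    """1 / (rank of the true match); 1.0 when it is ranked first."""
    return 1.0 / (rank_candidates(query, pool).index(true_match) + 1)


# Hypothetical embeddings: candidates compiled at -O2, query is memcpy compiled at -O0.
pool = {
    "memcpy@O2": [0.9, 0.1, 0.0],
    "strlen@O2": [0.1, 0.9, 0.2],
    "qsort@O2": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

print(rank_candidates(query, pool))               # ['memcpy@O2', 'strlen@O2', 'qsort@O2']
print(reciprocal_rank(query, pool, "memcpy@O2"))  # 1.0
```

In practice the vectors would come from the embedding model, and the reciprocal rank would be averaged over many queries (mean reciprocal rank).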

arXiv information

Authors	Wenyu Zhu, Hao Wang, Yuchen Zhou, Jiaming Wang, Zihan Sha, Zeyu Gao, Chao Zhang
Published	2023-08-24 09:07:11+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
