A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

Summary

[Title]
A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

[Summary]
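・File fragment classification (FFC) on small chunks of memory is important in memory forensics and Internet security. Most existing methods, however, treat a file fragment as a 1D byte signal and classify it from the inter-byte features they capture; the bit-level information inside each byte (intra-byte information) is rarely considered. That is a poor fit for files that use variable-length coding, whose symbols are represented by a variable number of bits. The authors therefore propose Byte2Image, a new data augmentation technique that introduces the neglected intra-byte information into file fragments and re-treats them as 2D grayscale images. This lets a convolutional neural network (CNN) capture inter-byte and intra-byte correlations at the same time. Specifically, to convert a file fragment into a 2D image, a sliding byte window exposes the neglected intra-byte information, and the resulting n-gram features are stacked row by row. The authors further propose a byte sequence & image fusion network as the classifier, which jointly models the raw 1D byte sequence and the converted 2D image. Experiments on the FFT-75 dataset show notable accuracy improvements over state-of-the-art methods in nearly all scenarios. The authors state that the code will be released at https://github.com/wenyang001/Byte2Image.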
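
The idea of exposing intra-byte information with a sliding window can be illustrated with a small sketch. This is only one possible reading of the abstract, not the authors' implementation: the function name `bit_shifted_rows`, the choice of eight one-bit shifts, and the 16-bit window over two adjacent bytes are all assumptions made for illustration, and the n-gram embedding step is left out entirely.

```python
def bit_shifted_rows(fragment: bytes, shifts: int = 8) -> list[list[int]]:
    """Hypothetical sketch: re-read a byte fragment at several bit offsets.

    Row s holds the 8-bit values seen when the bit stream is read starting
    s bits later than the byte boundary. Stacking the rows gives a 2D grid
    of gray levels (0..255). This is an assumed simplification, not the
    Byte2Image algorithm from the paper.
    """
    if not 1 <= shifts <= 8:
        raise ValueError("shifts must be between 1 and 8")
    if len(fragment) < 2:
        raise ValueError("fragment must contain at least two bytes")

    rows = []
    for s in range(shifts):
        row = []
        for i in range(len(fragment) - 1):
            # Join two adjacent bytes into one 16-bit window.
            window = (fragment[i] << 8) | fragment[i + 1]
            # Keep the 8 bits that start s bits into the window.
            row.append((window >> (8 - s)) & 0xFF)
        rows.append(row)
    return rows


if __name__ == "__main__":
    image = bit_shifted_rows(bytes([0xAB, 0xCD, 0xEF]))
    for s, row in enumerate(image):
        print(s, [f"{v:02X}" for v in row])
```

With a shift of 0 the row reproduces the original bytes (minus the last one), while the other rows contain values that straddle byte boundaries, which is the kind of information a purely byte-level 1D model never sees.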

Abstract (original)

File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This is inherently inapt for classifying variable-length coding files whose symbols are represented as the variable number of bits. Conversely, we propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and re-treat them as 2d gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2d images, we employ a sliding byte window to expose the neglected intra-byte information and stack their n-gram features row by row. We further propose a byte sequence & image fusion network as a classifier, which can jointly model the raw 1d byte sequence and the converted 2d image to perform FFC. Experiments on FFT-75 dataset validate that our proposed method can achieve notable accuracy improvements over state-of-the-art methods in nearly all scenarios. The code will be released at https://github.com/wenyang001/Byte2Image.

arXiv information

Authors	Wenyang Liu, Yi Wang, Kejun Wu, Kim-Hui Yap, Lap-Pui Chau
Published	2023-04-14 08:06:52+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
