MHAFF: Multi-Head Attention Feature Fusion of CNN and Transformer for Cattle Identification

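Summary

Convolutional neural networks (CNNs) have drawn researchers' attention for identifying cattle from muzzle images.
However, CNNs often fail to capture long-range dependencies within the complex patterns of the muzzle.
Transformers address this limitation.
This inspired us to fuse the strengths of CNNs and transformers for muzzle-based cattle identification.
Addition and concatenation are the most commonly used techniques for feature fusion.
However, addition fails to preserve discriminative information, while concatenation increases dimensionality.
Both are simple operations that cannot discover the relationships or interactions between the features being fused.
This research aims to overcome the problems that addition and concatenation face.
It introduces a novel approach called Multi-Head Attention Feature Fusion (MHAFF), applied for the first time to cattle identification.
MHAFF captures the relationships between the different types of features being fused while preserving their originality.
Experiments on two publicly available cattle datasets show that MHAFF is more accurate than addition, concatenation, and existing cattle identification methods.
MHAFF performs strongly and converges quickly, reaching best accuracies of 99.88% and 99.52% on the two cattle datasets.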

Abstract (original)

Convolutional Neural Networks (CNNs) have drawn researchers’ attention to identifying cattle using muzzle images. However, CNNs often fail to capture long-range dependencies within the complex patterns of the muzzle. The transformers handle these challenges. This inspired us to fuse the strengths of CNNs and transformers in muzzle-based cattle identification. Addition and concatenation have been the most commonly used techniques for feature fusion. However, addition fails to preserve discriminative information, while concatenation results in an increase in dimensionality. Both methods are simple operations and cannot discover the relationships or interactions between fusing features. This research aims to overcome the issues faced by addition and concatenation. This research introduces a novel approach called Multi-Head Attention Feature Fusion (MHAFF) for the first time in cattle identification. MHAFF captures relations between the different types of fusing features while preserving their originality. The experiments show that MHAFF outperformed addition and concatenation techniques and the existing cattle identification methods in accuracy on two publicly available cattle datasets. MHAFF demonstrates excellent performance and quickly converges to achieve optimum accuracy of 99.88% and 99.52% in two cattle datasets simultaneously.
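
The contrast the abstract draws between addition, concatenation, and attention-based fusion can be illustrated with a minimal NumPy sketch. This is not the authors' MHAFF architecture, which the abstract does not specify. The function names, the feature shapes (49 tokens of width 64), the choice of the CNN branch as the query and the transformer branch as key and value, and the random projection matrices standing in for learned weights are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_add(cnn_feats, vit_feats):
    """Element-wise addition: width is unchanged, but the two sources are mixed."""
    return cnn_feats + vit_feats


def fuse_concat(cnn_feats, vit_feats):
    """Concatenation along the feature axis: both sources are kept, width doubles."""
    return np.concatenate([cnn_feats, vit_feats], axis=-1)


def fuse_multi_head_attention(cnn_feats, vit_feats, num_heads, rng):
    """Multi-head cross-attention fusion (illustrative only).

    Assumption: queries come from the CNN branch, keys and values from the
    transformer branch. Random matrices stand in for learned projections.
    """
    n_query, dim = cnn_feats.shape
    assert dim % num_heads == 0, "dim must be divisible by num_heads"
    head_dim = dim // num_heads
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4)
    )

    def split_heads(x):
        # (tokens, dim) -> (heads, tokens, head_dim)
        return x.reshape(x.shape[0], num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(cnn_feats @ w_q)
    k = split_heads(vit_feats @ w_k)
    v = split_heads(vit_feats @ w_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)  # (heads, n_query, n_key)
    attended = weights @ v              # (heads, n_query, head_dim)

    # Merge the heads back to (n_query, dim) and apply the output projection.
    merged = attended.transpose(1, 0, 2).reshape(n_query, dim)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
cnn_feats = rng.standard_normal((49, 64))  # hypothetical CNN feature tokens
vit_feats = rng.standard_normal((49, 64))  # hypothetical transformer tokens

added = fuse_add(cnn_feats, vit_feats)
concatenated = fuse_concat(cnn_feats, vit_feats)
fused, attn = fuse_multi_head_attention(cnn_feats, vit_feats, num_heads=8, rng=rng)

print(added.shape, concatenated.shape, fused.shape)  # (49, 64) (49, 128) (49, 64)
```

In this sketch the attention output keeps the input width, unlike concatenation, and each output token is a data-dependent weighted mix of the other branch's tokens, unlike plain addition. The `attn` array holds one weight matrix per head and can be inspected to see which tokens each head relates.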

arXiv information

Authors	Rabin Dulal, Lihong Zheng, Muhammad Ashad Kabir
Published	2025-01-09 13:00:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
