Decoupling Semantic Similarity from Spatial Alignment for Neural Networks

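Summary

What representations do deep neural networks learn?
How similar are images to one another from a neural network's point of view?
Despite the overwhelming success of deep learning methods, key questions about their inner workings remain largely unanswered because of their internal high dimensionality and complexity.
One approach to addressing this is to measure the similarity of activation responses to different inputs.
Representational similarity matrices (RSMs) distill this similarity into a scalar value for each pair of inputs.
These matrices encapsulate the entire similarity structure of a system and indicate which inputs lead to similar responses.
Although similarity between images is ambiguous, we argue that the spatial location of semantic objects influences neither human perception nor deep learning classifiers.
This should therefore be reflected in how similarity between image responses is defined for computer vision systems.
Revisiting the established similarity calculations for RSMs, we expose their sensitivity to spatial alignment.
In this paper, we propose to resolve this through semantic RSMs, which are invariant to spatial permutation.
We measure the semantic similarity between input responses by formulating it as a set-matching problem.
Further, we quantify the superiority of semantic RSMs over spatio-semantic RSMs through image retrieval and by comparing the similarity between representations with the similarity between predicted class probabilities.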
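
To make the distinction between spatio-semantic and semantic RSMs concrete, the following is a minimal sketch and not the authors' implementation. It assumes each image response is a list of per-location feature vectors, uses a plain inner product as the similarity kernel, and solves the set-matching step by brute force over permutations, which is only feasible for a handful of locations. The names `aligned_similarity`, `matched_similarity`, and `rsm` are hypothetical; a practical version would likely use an assignment solver such as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def dot(u, v):
    """Inner product of two feature vectors (assumed similarity kernel)."""
    return sum(a * b for a, b in zip(u, v))


def aligned_similarity(x, y):
    """Spatio-semantic similarity: location i in x is compared with location i in y."""
    return sum(dot(u, v) for u, v in zip(x, y))


def matched_similarity(x, y):
    """Semantic similarity: best one-to-one matching between locations of x and y.

    Brute force over all permutations of y's locations, for illustration only.
    """
    return max(
        aligned_similarity(x, [y[j] for j in perm])
        for perm in permutations(range(len(y)))
    )


def rsm(responses, similarity):
    """Representational similarity matrix: one scalar per pair of inputs."""
    return [[similarity(a, b) for b in responses] for a in responses]


# Two toy responses with three spatial locations and two feature channels.
# image_b holds the same feature vectors as image_a, shifted by one location.
image_a = [[1, 0], [0, 1], [0, 0]]
image_b = [[0, 0], [1, 0], [0, 1]]

print(rsm([image_a, image_b], aligned_similarity))  # [[2, 0], [0, 2]]
print(rsm([image_a, image_b], matched_similarity))  # [[2, 2], [2, 2]]
```

In this toy case the aligned RSM rates the shifted copy as completely dissimilar, while the matched RSM rates it as identical to the original, which is the permutation invariance the summary describes.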

Summary (original)

What representation do deep neural networks learn? How similar are images to each other for neural networks? Despite the overwhelming success of deep learning methods, key questions about their internal workings still remain largely unanswered, due to their internal high dimensionality and complexity. To address this, one approach is to measure the similarity of activation responses to various inputs. Representational Similarity Matrices (RSMs) distill this similarity into scalar values for each input pair. These matrices encapsulate the entire similarity structure of a system, indicating which input leads to similar responses. While the similarity between images is ambiguous, we argue that the spatial location of semantic objects influences neither human perception nor deep learning classifiers. Thus, this should be reflected in the definition of similarity between image responses for computer vision systems. Revisiting the established similarity calculations for RSMs, we expose their sensitivity to spatial alignment. In this paper, we propose to solve this through semantic RSMs, which are invariant to spatial permutation. We measure semantic similarity between input responses by formulating it as a set-matching problem. Further, we quantify the superiority of semantic RSMs over spatio-semantic RSMs through image retrieval and by comparing the similarity between representations to the similarity between predicted class probabilities.

arXiv information

Authors	Tassilo Wald, Constantin Ulrich, Gregor Köhler, David Zimmerer, Stefan Denner, Michael Baumgartner, Fabian Isensee, Priyank Jaini, Klaus H. Maier-Hein
Published	2024-10-30 15:17:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
