REOrdering Patches Improves Vision Models

Summary

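Sequence models such as transformers take their input as a one-dimensional sequence.
For images, that usually means flattening the patch grid in a fixed row-major (raster-scan) order.
Full self-attention is permutation-equivariant, but modern long-sequence transformers increasingly rely on architectural approximations that break this property and make the model sensitive to patch order.
The authors show that patch order significantly affects performance in these settings: simple alternatives such as column-major order or Hilbert curves produce notable shifts in accuracy.
Motivated by this, they propose REOrder, a two-stage framework for discovering task-optimal patch orderings.
The first stage derives an information-theoretic prior by evaluating the compressibility of various patch sequences.
The second stage learns a policy over permutations by optimizing a Plackett-Luce policy with REINFORCE.
This makes learning efficient in the combinatorial space of permutations.
REOrder improves top-1 accuracy over row-major ordering by up to 3.01% on ImageNet-1K and by 13.35% on Functional Map of the World.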
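
To make the idea of a patch ordering concrete, the sketch below builds row-major and column-major orderings for a small patch grid and compares how well each resulting byte stream compresses. It is only an illustration: the use of zlib as the compressor, the toy 4x4 grid, and the function names are assumptions made here, not details taken from the paper.

```python
import zlib

def row_major_order(rows, cols):
    """Patch indices in raster-scan order: left to right, then top to bottom."""
    return [r * cols + c for r in range(rows) for c in range(cols)]

def column_major_order(rows, cols):
    """Patch indices read top to bottom, one column at a time."""
    return [r * cols + c for c in range(cols) for r in range(rows)]

def compressed_size(patches, order):
    """Size in bytes of the zlib-compressed patch stream for a given ordering."""
    stream = b"".join(patches[i] for i in order)
    return len(zlib.compress(stream, 9))

# Toy data (hypothetical): a 4x4 grid of 16-byte patches whose content
# depends only on the column, so each column is internally uniform.
rows, cols = 4, 4
patches = [bytes([17 * (i % cols)]) * 16 for i in range(rows * cols)]

for name, order in [("row-major", row_major_order(rows, cols)),
                    ("column-major", column_major_order(rows, cols))]:
    print(name, order, compressed_size(patches, order))
```

Any lossless compressor could stand in for zlib here; the point is only that the same patches yield different sequences, and potentially different compressed sizes, under different orderings.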

Abstract (original)

Sequence models such as transformers require inputs to be represented as one-dimensional sequences. In vision, this typically involves flattening images using a fixed row-major (raster-scan) order. While full self-attention is permutation-equivariant, modern long-sequence transformers increasingly rely on architectural approximations that break this invariance and introduce sensitivity to patch ordering. We show that patch order significantly affects model performance in such settings, with simple alternatives like column-major or Hilbert curves yielding notable accuracy shifts. Motivated by this, we propose REOrder, a two-stage framework for discovering task-optimal patch orderings. First, we derive an information-theoretic prior by evaluating the compressibility of various patch sequences. Then, we learn a policy over permutations by optimizing a Plackett-Luce policy using REINFORCE. This approach enables efficient learning in a combinatorial permutation space. REOrder improves top-1 accuracy over row-major ordering on ImageNet-1K by up to 3.01% and Functional Map of the World by 13.35%.
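
The abstract names a Plackett-Luce policy optimized with REINFORCE but gives no implementation details. As a rough sketch of how those two pieces generally fit together: a Plackett-Luce distribution assigns one logit per patch and builds a permutation by repeatedly drawing from the remaining patches with probability proportional to the exponential of their logits, and REINFORCE scales the gradient of the sampled permutation's log-probability by the reward. The function names, the learning rate, the constant baseline, and the toy reward below are all placeholders chosen for illustration, not the authors' code or settings.

```python
import math
import random

def sample_permutation(logits, rng):
    """Sample a permutation from a Plackett-Luce distribution over item logits."""
    remaining = list(range(len(logits)))
    perm = []
    while remaining:
        weights = [math.exp(logits[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        perm.append(pick)
        remaining.remove(pick)
    return perm

def log_prob_and_grad(logits, perm):
    """Log-probability of perm and its gradient with respect to the logits."""
    grad = [0.0] * len(logits)
    log_p = 0.0
    for t in range(len(perm)):
        remaining = perm[t:]
        m = max(logits[i] for i in remaining)
        log_z = m + math.log(sum(math.exp(logits[i] - m) for i in remaining))
        log_p += logits[perm[t]] - log_z
        grad[perm[t]] += 1.0                         # chosen item
        for i in remaining:
            grad[i] -= math.exp(logits[i] - log_z)   # softmax over remaining items
    return log_p, grad

def reinforce_step(logits, reward_fn, rng, lr=0.1, baseline=0.0):
    """One REINFORCE update: sample, score, and move logits along the estimate."""
    perm = sample_permutation(logits, rng)
    reward = reward_fn(perm)
    _, grad = log_prob_and_grad(logits, perm)
    new_logits = [s + lr * (reward - baseline) * g for s, g in zip(logits, grad)]
    return new_logits, perm, reward

# Toy reward (hypothetical): 1 if patch 0 comes first, else 0.
rng = random.Random(0)
logits = [0.0] * 4
for _ in range(200):
    logits, perm, reward = reinforce_step(
        logits, lambda p: 1.0 if p[0] == 0 else 0.0, rng, lr=0.5)
print(logits)
```

In a real training setup the reward would come from the downstream model rather than a hand-written rule, and a learned or running-average baseline would typically replace the constant one to reduce variance.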

arXiv information

Authors	Declan Kutscher, David M. Chan, Yutong Bai, Trevor Darrell, Ritwik Gupta
Published	2025-05-29 17:59:30+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
