ChiTransformer: Towards Reliable Stereo from Cues

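Summary

Current stereo matching techniques are limited by a restricted search space, occluded regions, and sheer size. Single-image depth estimation avoids these problems and can achieve satisfactory results from extracted monocular cues, but without a stereoscopic relationship the monocular prediction is less reliable on its own, especially in highly dynamic or cluttered environments. To address both cases, this work proposes a self-supervised binocular depth estimation method inspired by the optic chiasm. It uses a vision transformer (ViT) with gated positional cross-attention (GPCA) layers to enable feature-sensitive pattern retrieval between views while retaining the broad context information aggregated through self-attention. Monocular cues from a single view are then conditionally rectified by a blending layer using the retrieved pattern pairs. Because this crossover design is biologically analogous to the optic chiasm in the human visual system, the method is named ChiTransformer. According to the experiments, the architecture improves on state-of-the-art self-supervised stereo approaches by 11% and works on both rectilinear and non-rectilinear (e.g., fisheye) images. The project is available at https://github.com/ISL-CV/ChiTransformer.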

Abstract (original)

Current stereo matching techniques are challenged by restricted searching space, occluded regions and sheer size. While single image depth estimation is spared from these challenges and can achieve satisfactory results with the extracted monocular cues, the lack of stereoscopic relationship renders the monocular prediction less reliable on its own, especially in highly dynamic or cluttered environments. To address these issues in both scenarios, we present an optic-chiasm-inspired self-supervised binocular depth estimation method, wherein a vision transformer (ViT) with gated positional cross-attention (GPCA) layers is designed to enable feature-sensitive pattern retrieval between views while retaining the extensive context information aggregated through self-attentions. Monocular cues from a single view are thereafter conditionally rectified by a blending layer with the retrieved pattern pairs. This crossover design is biologically analogous to the optic-chiasm structure in the human visual system and hence the name, ChiTransformer. Our experiments show that this architecture yields substantial improvements over state-of-the-art self-supervised stereo approaches by 11%, and can be used on both rectilinear and non-rectilinear (e.g., fisheye) images. Project is available at https://github.com/ISL-CV/ChiTransformer.
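The abstract does not spell out how a GPCA layer is computed, so the following is only a rough sketch of the general idea of gated cross-attention between two views. It assumes a scalar gate that blends content-based attention weights with position-based ones; this blending form, the function name `gated_cross_attention`, and the `pos_scores` and `gate_logit` parameters are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def gated_cross_attention(queries, keys, values, pos_scores, gate_logit):
    """Blend content-based and position-based cross-attention (assumed form).

    queries:    tokens from the reference view, shape [n_q][d]
    keys:       tokens from the second view, shape [n_k][d]
    values:     tokens from the second view, shape [n_k][d_v]
    pos_scores: hypothetical positional logits, shape [n_q][n_k]
    gate_logit: scalar; sigmoid(gate_logit) weights the positional term
    """
    d = len(queries[0])
    d_v = len(values[0])
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    output = []
    for q, pos_row in zip(queries, pos_scores):
        # Content term: scaled dot product between one query and every key.
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        content = softmax(logits)
        positional = softmax(pos_row)
        # The gate interpolates between the two attention distributions.
        weights = [(1.0 - gate) * c + gate * p
                   for c, p in zip(content, positional)]
        output.append([sum(w * v[j] for w, v in zip(weights, values))
                       for j in range(d_v)])
    return output
```

With a strongly positive `gate_logit` the layer attends almost purely by position, and with a strongly negative one it attends almost purely by content, which is one way to keep a positional prior while still retrieving matching patterns from the other view.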

arXiv information

Authors	Qing Su, Shihao Ji
Published	2022-08-09 14:04:30+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
