Geometry-biased Transformers for Novel View Synthesis

Summary

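Given a few input images and their associated camera viewpoints, we tackle the task of synthesizing novel views of an object.
Our work is inspired by recent "geometry-free" approaches, in which multi-view images are encoded as a (global) set-latent representation that is then used to predict the color of arbitrary query rays.
This representation yields (coarsely) accurate images for novel viewpoints, but the lack of geometric reasoning limits the quality of these outputs.
To overcome this limitation, we propose "Geometry-biased Transformers" (GBTs), which incorporate geometric inductive biases into set-latent representation-based inference to encourage multi-view geometric consistency.
We induce the geometric bias by augmenting the dot-product attention mechanism to also incorporate the 3D distances between the rays associated with tokens as a learnable bias.
We find that this, together with camera-aware embeddings as input, allows our models to generate significantly more accurate outputs.
We validate our approach on the real-world CO3D dataset, training our system over 10 categories and evaluating its view-synthesis ability on novel objects as well as unseen categories.
We empirically validate the benefits of the proposed geometric biases and show that our approach significantly improves over prior work.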

Summary (original)

We tackle the task of synthesizing novel views of an object given a few input images and associated camera viewpoints. Our work is inspired by recent ‘geometry-free’ approaches where multi-view images are encoded as a (global) set-latent representation, which is then used to predict the color for arbitrary query rays. While this representation yields (coarsely) accurate images corresponding to novel viewpoints, the lack of geometric reasoning limits the quality of these outputs. To overcome this limitation, we propose ‘Geometry-biased Transformers’ (GBTs) that incorporate geometric inductive biases in the set-latent representation-based inference to encourage multi-view geometric consistency. We induce the geometric bias by augmenting the dot-product attention mechanism to also incorporate 3D distances between rays associated with tokens as a learnable bias. We find that this, along with camera-aware embeddings as input, allows our models to generate significantly more accurate outputs. We validate our approach on the real-world CO3D dataset, where we train our system over 10 categories and evaluate its view-synthesis ability for novel objects as well as unseen categories. We empirically validate the benefits of the proposed geometric biases and show that our approach significantly improves over prior works.
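
The abstract describes the geometric bias only in words: dot-product attention is augmented with the 3D distances between the rays associated with tokens, used as a learnable bias. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names `ray_distances` and `geometry_biased_attention`, the parameter `gamma`, and the specific bias form (subtracting `gamma**2 * distance**2` from the attention logits) are assumptions made for illustration; the abstract does not state the exact formula. The sketch also treats each ray as an infinite line and uses a single attention head without learned projections.

```python
import numpy as np


def ray_distances(origins_a, dirs_a, origins_b, dirs_b, eps=1e-8):
    """Pairwise shortest distance between two sets of 3D rays.

    origins_a, dirs_a: (N, 3) ray origins and directions.
    origins_b, dirs_b: (M, 3) ray origins and directions.
    Rays are treated as infinite lines (an assumption of this sketch).
    Returns an (N, M) matrix of distances.
    """
    da = dirs_a / np.linalg.norm(dirs_a, axis=-1, keepdims=True)
    db = dirs_b / np.linalg.norm(dirs_b, axis=-1, keepdims=True)
    diff = origins_b[None, :, :] - origins_a[:, None, :]      # (N, M, 3)
    cross = np.cross(da[:, None, :], db[None, :, :])          # (N, M, 3)
    cross_norm = np.linalg.norm(cross, axis=-1)               # (N, M)
    # Skew or intersecting lines: project the offset onto the common normal.
    skew = np.abs(np.sum(diff * cross, axis=-1)) / np.maximum(cross_norm, eps)
    # (Near-)parallel lines: distance from one origin to the other line.
    parallel = np.linalg.norm(np.cross(diff, da[:, None, :]), axis=-1)
    return np.where(cross_norm < eps, parallel, skew)


def geometry_biased_attention(q, k, v, dist, gamma):
    """Scaled dot-product attention with a distance-based bias.

    q: (N, d) queries, k: (M, d) keys, v: (M, d_v) values.
    dist: (N, M) ray-to-ray distances, e.g. from ray_distances.
    gamma: scalar that would be learned in a real model (hypothetical here).
    Assumed bias form: logits = q k^T / sqrt(d) - gamma^2 * dist^2.
    Returns the attended values (N, d_v) and the attention weights (N, M).
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) - (gamma ** 2) * dist ** 2
    logits = logits - logits.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

With `gamma = 0` this reduces to ordinary scaled dot-product attention. As `gamma` grows, each query attends mostly to tokens whose rays pass close to its own ray, which is one way such a bias could encourage multi-view geometric consistency.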

arXiv information

Authors	Naveen Venkat, Mayank Agarwal, Maneesh Singh, Shubham Tulsiani
Published	2023-01-11 18:59:56+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
