R3D-SWIN:Use Shifted Window Attention for Single-View 3D Reconstruction

Summary

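Vision transformers have recently performed well on a range of computer vision tasks, including voxel-based 3D reconstruction.
However, their attention windows are not multi-scale and are not connected to one another, which limits the accuracy of voxel 3D reconstruction.
We therefore propose a voxel 3D reconstruction network based on shifted window attention.
To the best of our knowledge, this is the first work to apply shifted window attention to voxel 3D reconstruction.
Experiments on ShapeNet show that the method achieves state-of-the-art (SOTA) accuracy in single-view reconstruction.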

Summary (original)

Recently, vision transformers have performed well in various computer vision tasks, including voxel 3D reconstruction. However, the windows of the vision transformer are not multi-scale, and there is no connection between the windows, which limits the accuracy of voxel 3D reconstruction. Therefore, we propose a voxel 3D reconstruction network based on shifted window attention. To the best of our knowledge, this is the first work to apply shifted window attention to voxel 3D reconstruction. Experimental results on ShapeNet verify that our method achieves SOTA accuracy in single-view reconstruction.
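The abstract's central point is that shifting the windows connects neighbouring windows. As a hedged illustration, not the authors' R3D-SWIN code, the sketch below implements generic Swin-style shifted-window self-attention on a 2D feature map with NumPy. The grid size, window size, helper names (`window_partition`, `window_reverse`, `window_attention`, `shifted_window_attention`) and the parameter-free single-head attention are all assumptions made for illustration. It omits the attention mask Swin uses to stop wrapped-around tokens from attending across the roll seam, and it does not model the multi-scale (hierarchical) part of the network.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) map into non-overlapping ws x ws windows -> (num_windows, ws*ws, C)."""
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be divisible by the window size"
    x = x.reshape(H // ws, ws, W // ws, ws, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: (num_windows, ws*ws, C) -> (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def window_attention(windows):
    """Hypothetical parameter-free single-head self-attention, computed inside each window only."""
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ windows

def shifted_window_attention(x, ws):
    """Cyclically shift by ws//2, attend within windows, then undo the shift.
    Assumption: no attention mask for wrapped-around tokens (real Swin adds one)."""
    H, W, _ = x.shape
    s = ws // 2
    rolled = np.roll(x, shift=(-s, -s), axis=(0, 1))
    out = window_reverse(window_attention(window_partition(rolled, ws)), ws, H, W)
    return np.roll(out, shift=(s, s), axis=(0, 1))

# Demo: tag every token with the id of the regular window it starts in.
H = W = 8
ws = 4
ids = np.zeros((H, W, 1))
for i in range(H):
    for j in range(W):
        ids[i, j, 0] = (i // ws) * (W // ws) + (j // ws)

regular = window_partition(ids, ws)
shifted = window_partition(np.roll(ids, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1)), ws)
print([sorted({int(v) for v in w[:, 0]}) for w in regular])  # [[0], [1], [2], [3]]
print([sorted({int(v) for v in w[:, 0]}) for w in shifted])  # each window: [0, 1, 2, 3]
```

In the regular partition each window holds tokens from a single original window, while every shifted window mixes tokens from all four. Alternating regular and shifted layers is how this family of models lets information flow between windows.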

arXiv information

Authors	Chenhuan Li, Meihua Xiao, zehuan li, Fangping Chen, Shanshan Qiao, Dingli Wang, Mengxi Gao, Siyi Zhang
Published	2024-03-06 12:48:33+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
