PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition

Abstract

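Point cloud video recognition has become an essential task in 3D vision.
Current 4D representation learning techniques typically rely on iterative processing combined with dense query operations.
Although effective at capturing temporal features, this approach introduces substantial computational redundancy.
In this work, we propose PvNeXt, a framework for effective yet efficient point cloud video recognition based on a personalized one-shot query operation.
Specifically, PvNeXt consists of two key modules: the Motion Imitator and the Single-Step Motion Encoder.
The Motion Imitator is designed to capture the temporal dynamics inherent in point cloud sequences, and it generates a virtual motion corresponding to each frame.
The Single-Step Motion Encoder performs a one-step query operation that associates each frame's point cloud with its corresponding virtual motion frame, thereby extracting motion cues from the point cloud sequence and capturing temporal dynamics across the entire sequence.
By integrating these two modules, PvNeXt enables a personalized one-shot query for each frame, effectively eliminating the need for frame-specific looping and intensive query processes.
Extensive experiments on multiple benchmarks demonstrate the effectiveness of our method.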

Abstract (original)

Point cloud video perception has become an essential task for the realm of 3D vision. Current 4D representation learning techniques typically engage in iterative processing coupled with dense query operations. Although effective in capturing temporal features, this approach leads to substantial computational redundancy. In this work, we propose a framework, named as PvNeXt, for effective yet efficient point cloud video recognition, via personalized one-shot query operation. Specially, PvNeXt consists of two key modules, the Motion Imitator and the Single-Step Motion Encoder. The former module, the Motion Imitator, is designed to capture the temporal dynamics inherent in sequences of point clouds, thus generating the virtual motion corresponding to each frame. The Single-Step Motion Encoder performs a one-step query operation, associating point cloud of each frame with its corresponding virtual motion frame, thereby extracting motion cues from point cloud sequences and capturing temporal dynamics across the entire sequence. Through the integration of these two modules, PvNeXt enables personalized one-shot queries for each frame, effectively eliminating the need for frame-specific looping and intensive query processes. Extensive experiments on multiple benchmarks demonstrate the effectiveness of our method.
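
The abstract describes the data flow only at a high level: one virtual motion frame is produced per real frame, and each real frame is then queried against its own virtual frame exactly once. The toy sketch below illustrates that flow and nothing more. It is not the authors' implementation: in the paper both modules are learned networks, whereas here `imitate_motion` is a hypothetical stand-in that simply shifts each frame by the centroid displacement to its neighbour, and `single_step_query` is a hypothetical nearest-neighbour lookup. All function names are assumptions made for illustration.

```python
from math import dist

Point = tuple[float, float, float]
Frame = list[Point]


def centroid(frame: Frame) -> Point:
    """Mean position of all points in a frame."""
    n = len(frame)
    return tuple(sum(p[i] for p in frame) / n for i in range(3))


def imitate_motion(frames: list[Frame]) -> list[Frame]:
    """Hypothetical stand-in for the Motion Imitator (assumption, not the paper's module).

    Returns one virtual frame per real frame by shifting the frame along the
    centroid displacement to the next frame. The last frame reuses the
    displacement from its predecessor.
    """
    assert len(frames) >= 2, "need at least two frames to estimate motion"
    centers = [centroid(f) for f in frames]
    virtual = []
    for t, frame in enumerate(frames):
        a, b = (t, t + 1) if t + 1 < len(frames) else (t - 1, t)
        shift = tuple(centers[b][i] - centers[a][i] for i in range(3))
        virtual.append([tuple(p[i] + shift[i] for i in range(3)) for p in frame])
    return virtual


def single_step_query(frame: Frame, virtual_frame: Frame) -> list[Point]:
    """Hypothetical one-shot query: each point looks up its nearest virtual point once.

    The returned offset (virtual point minus real point) plays the role of a motion cue.
    """
    cues = []
    for p in frame:
        q = min(virtual_frame, key=lambda v: dist(p, v))
        cues.append(tuple(q[i] - p[i] for i in range(3)))
    return cues


def encode_sequence(frames: list[Frame]) -> list[list[Point]]:
    """One query per frame, with no loop over neighbouring frames inside the query."""
    virtual = imitate_motion(frames)
    return [single_step_query(f, v) for f, v in zip(frames, virtual)]


# Two frames of a rigid two-point object moving 0.25 units along x.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.25, 0.0, 0.0), (1.25, 0.0, 0.0)],
]
cues = encode_sequence(frames)
```

In this sketch each frame issues a single query against a single virtual frame, so the number of queries grows linearly with the number of frames. A scheme in which every frame queries several temporal neighbours would multiply that count by the window size, which is the kind of redundancy the abstract refers to.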

arXiv information

Authors	Jie Wang, Tingfa Xu, Lihe Ding, Xinjie Zhang, Long Bai, Jianan Li
Published	2025-04-07 13:43:51+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
