"cs.CV" Category Archive

Estimating Body and Hand Motion in an Ego-sensed World

Posted on December 18, 2024 by jarxiv

Abstract: EgoAllo, a system that estimates human motion from a head-mounted device, …

Categories: cs.AI, cs.CV

BanglishRev: A Large-Scale Bangla-English and Code-mixed Dataset of Product Reviews in E-Commerce

Posted on December 18, 2024 by jarxiv

Abstract: This work introduces the BanglishRev dataset, which …

Categories: cs.CL, cs.CV, cs.LG

Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild

Posted on December 18, 2024 by jarxiv

Abstract: When recognizing emotion-related expressions, in-the-wild dynamic facial expression recognition (DFER) faces a major …

Categories: cs.AI, cs.CV

Locate n’ Rotate: Two-stage Openable Part Detection with Foundation Model Priors

Posted on December 18, 2024 by jarxiv

Abstract: Detecting the openable parts of articulated objects, such as when pulling out a drawer, …

Categories: cs.CV

ORFormer: Occlusion-Robust Transformer for Accurate Facial Landmark Detection

Posted on December 18, 2024 by jarxiv

Abstract: Facial landmark detection (FLD) has made significant progress, but existing FL …

Categories: cs.AI, cs.CV, cs.LG

NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment

Posted on December 18, 2024 by jarxiv

Abstract: Simultaneous localization and mapping (SLAM) from monocular endoscopy videos enables autonomous …

Categories: cs.CV

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Posted on December 18, 2024 by jarxiv

Abstract: In recent work on accelerating vision-language models, visual information is highly compressed …

Categories: cs.CV

Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures

Posted on December 18, 2024 by jarxiv

Abstract: Real-time free-view human … from sparse-view RGB inputs …

Categories: cs.CV

Move-in-2D: 2D-Conditioned Human Motion Generation

Posted on December 18, 2024 by jarxiv

Abstract: Generating realistic human videos remains a challenging task, with the most effective current …

Categories: cs.CV

HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction

Posted on December 18, 2024 by jarxiv

Abstract: In a scene with a high-level, colloquial task specification given in natural language, the human hand …

Categories: cs.CV, cs.LG
