Monthly Archives: March 2025

REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder

Posted on March 12, 2025 by jarxiv

Abstract: We present a new perspective on learning video embedders for generative modeling …

Categories: cs.AI, cs.CV, cs.LG

SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting

Posted on March 12, 2025 by jarxiv

Abstract: Vector quantization (VQ), particularly in extreme compression scenarios, across diverse models …

Categories: cs.CV

Keypoint Detection and Description for Raw Bayer Images

Posted on March 12, 2025 by jarxiv

Abstract: Keypoint detection and local feature description are fundamental tasks in robotic perception, and …

Categories: cs.CV

Language-Depth Navigated Thermal and Visible Image Fusion

Posted on March 12, 2025 by jarxiv

Abstract: Depth-guided multimodal fusion combines depth information from visible and infrared images …

Categories: cs.CV

OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

Posted on March 12, 2025 by jarxiv

Abstract: Diffusion-based generative models have revolutionized object-oriented image editing …

Categories: cs.CV

GarmentCrafter: Progressive Novel View Synthesis for Single-View 3D Garment Reconstruction and Editing

Posted on March 12, 2025 by jarxiv

Abstract: We introduce GarmentCrafter, with which non-professional users …

Categories: cs.AI, cs.CV, cs.GR

CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving

Posted on March 12, 2025 by jarxiv

Abstract: Vehicle-to-vehicle (V2V) cooperative autonomous driving … inherent to single-agent systems …

Categories: cs.AI, cs.CV, cs.MA

‘Principal Components’ Enable A New Language of Images

Posted on March 12, 2025 by jarxiv

Abstract: A new visual tokenization that embeds a provable PCA-like structure into the latent token space …

Categories: cs.CV

OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models

Posted on March 12, 2025 by jarxiv

Abstract: Unified multimodal understanding and visual generation (or multimodal generation) models …

Categories: cs.CV

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension

Posted on March 12, 2025 by jarxiv

Abstract: Recent advances in long video understanding typically involve visual token pruning based on attention distribution …

Categories: cs.CV
