Category archive: "cs.CV"

Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding

Posted: June 6, 2025 | Author: jarxiv

Summary: Embodied 3D grounding deals with targets described in human instructions from an egocentric viewpoint …

Categories: cs.CV

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

Posted: June 6, 2025 | Author: jarxiv

Summary: Multimodal large language models (MLLMs), through the integration of visual and textual data, …

Categories: cs.CL, cs.CV

OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View

Posted: June 6, 2025 | Author: jarxiv

Summary: Reconstructing semantic-aware 3D scenes from sparse views is …

Categories: cs.CV

Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning

Posted: June 6, 2025 | Author: jarxiv

Summary: Recently, breakthroughs in video diffusion transformers have shown remarkable capabilities in diverse motion generation …

Categories: cs.CV

Towards Vision-Language-Garment Models For Web Knowledge Garment Understanding and Generation

Posted: June 6, 2025 | Author: jarxiv

Summary: Multimodal foundation models have demonstrated strong generalization, but garment …

Categories: cs.CV

DSG-World: Learning a 3D Gaussian World Model from Dual State Videos

Posted: June 6, 2025 | Author: jarxiv

Summary: Building an efficient, physically consistent world model from limited observations is …

Categories: cs.CV

MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

Posted: June 6, 2025 | Author: jarxiv

Summary: By leveraging the Structure-Recognition-Relation (SRR) triplet paradigm, the latest …

Categories: cs.CV

SAM-aware Test-time Adaptation for Universal Medical Image Segmentation

Posted: June 6, 2025 | Author: jarxiv

Summary: Universal medical image segmentation using the Segment Anything …

Categories: cs.CV

MAC-Gaze: Motion-Aware Continual Calibration for Mobile Gaze Tracking

Posted: June 6, 2025 | Author: jarxiv

Summary: Mobile gaze tracking faces a fundamental challenge: as users naturally change their posture and …

Categories: 68T10, 68U35, C.2.4, cs.CV, cs.HC

Stochastic Poisson Surface Reconstruction with One Solve using Geometric Gaussian Processes

Posted: June 6, 2025 | Author: jarxiv

Summary: Poisson surface reconstruction is a widely used method for reconstructing surfaces from oriented point clouds …

Categories: cs.CV, cs.GR, cs.LG, stat.ML
