Category Archives: cs.CV

Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models

Posted on May 8, 2025 by jarxiv

Abstract: Text-to-image (T2I) models … impactful real-world applicat…

Categories: cs.CV

Uncertainty for SVBRDF Acquisition using Frequency Analysis

Posted on May 8, 2025 by jarxiv

Abstract: This paper quantifies the uncertainty of SVBRDF acquisition from multi-view captures …

Categories: cs.CV, cs.GR

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Posted on May 8, 2025 by jarxiv

Abstract: OpenAI's CLIP, released in early 2021, … multimodal …

Categories: cs.CV

FastMap: Revisiting Dense and Scalable Structure from Motion

Posted on May 8, 2025 by jarxiv

Abstract: A new global structure-from-motion method focused on speed and simplicity …

Categories: cs.CV

Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait

Posted on May 8, 2025 by jarxiv

Abstract: We address the problem of whole-body person recognition in unconstrained environments. This problem … altitude and …

Categories: cs.CV

Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation

Posted on May 8, 2025 by jarxiv

Abstract: Vision is well known for its use in manipulation, especially through visual servoing. …

Categories: cs.CV, cs.LG, cs.RO

On Path to Multimodal Generalist: General-Level and General-Bench

Posted on May 8, 2025 by jarxiv

Abstract: Multimodal large language models (MLLMs) … L…

Categories: cs.CV

PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

Posted on May 8, 2025 by jarxiv

Abstract: Decomposing complex 3D shapes into simple geometric elements … in human visual cognition, an important ro…

Categories: cs.CV, cs.GR

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning

Posted on May 8, 2025 by jarxiv

Abstract: Multimodal large language models (MLLMs) … text, vision, and audio …

Categories: cs.AI, cs.CV, cs.MM, cs.SD, eess.AS

Vision-Language Models Create Cross-Modal Task Representations

Posted on May 8, 2025 by jarxiv

Abstract: Autoregressive vision-language models (VLMs) handle many tasks within a single model …

Categories: cs.CL, cs.CV, cs.LG
