Monthly Archives: February 2025

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Posted on February 4, 2025 by jarxiv

Abstract: With advances in large language models (LLMs), controllers for invoking external tools … Continue reading →

Categories: cs.AI, cs.CV | Comments off

Reflective Gaussian Splatting

Posted on February 4, 2025 by jarxiv

Abstract: With the improved performance of methods based on NeRF and 3DGS, novel view synthesis has greatly … Continue reading →

Categories: cs.CV | Comments off

GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers

Posted on February 4, 2025 by jarxiv

Abstract: Understanding deep models, in safety-critical applications … Continue reading →

Categories: cs.CV | Comments off

Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications

Posted on February 4, 2025 by jarxiv

Abstract: This technical report introduces Prithvi-EO-2.0. Pri … Continue reading →

Categories: cs.CV | Comments off

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Posted on February 4, 2025 by jarxiv

Abstract: Digital agents, on web pages, software applications, opera … Continue reading →

Categories: cs.AI, cs.CV | Comments off

A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models

Posted on February 4, 2025 by jarxiv

Abstract: Out-of-distribution (OOD) detection identifies OOD samples during inference, and the deployed model's … Continue reading →

Categories: cs.CV | Comments off

ViewpointDepth: A New Dataset for Monocular Depth Estimation Under Viewpoint Shifts

Posted on February 4, 2025 by jarxiv

Abstract: Monocular depth estimation, in autonomous driving and many other computer vision applica … Continue reading →

Categories: cs.CV | Comments off

The Master Key Filters Hypothesis: Deep Filters Are General

Posted on February 4, 2025 by jarxiv

Abstract: In this paper, convolutional neural network (CNN) filters, in deeper … Continue reading →

Categories: cs.AI, cs.CV | Comments off

SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions

Posted on February 4, 2025 by jarxiv

Abstract: In this work, integrating speech and text as inputs to a large language model (LLM) … Continue reading →

Categories: cs.CL, cs.LG, cs.SD, eess.AS | Comments off

What is causal about causal models and representations?

Posted on February 4, 2025 by jarxiv

Abstract: Because causal Bayesian networks make predictions about interventional distributions, "causal" models … Continue reading →

Categories: cs.AI, cs.LG, math.ST, stat.ML, stat.TH | Comments off
