"cs.AI" Category Archive

Large Language Models Empowered Personalized Web Agents

Posted: March 25, 2025 | Author: jarxiv

Abstract: Web agents automate the completion of web tasks based on user instructions …

Categories: cs.AI, cs.CL, cs.IR

Lightweight Embedded FPGA Deployment of Learned Image Compression with Knowledge Distillation and Hybrid Quantization

Posted: March 25, 2025 | Author: jarxiv

Abstract: Learned image compression (LIC) surpasses standardized video codecs in RD efficiency …

Categories: cs.AI, cs.CV

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Posted: March 25, 2025 | Author: jarxiv

Abstract: Prior research on out-of-distribution detection (OoDD) has focused primarily on single-modality models …

Categories: cs.AI, cs.CV

Dual-domain Multi-path Self-supervised Diffusion Model for Accelerated MRI Reconstruction

Posted: March 25, 2025 | Author: jarxiv

Abstract: Magnetic resonance imaging (MRI) is an important diagnostic tool, but its inherently long acquisition time …

Categories: cs.AI, cs.CV, eess.IV

MC-LLaVA: Multi-Concept Personalized Vision-Language Model

Posted: March 25, 2025 | Author: jarxiv

Abstract: Current vision-language models (VLMs), on a variety of tasks such as visual question answering, …

Categories: cs.AI, cs.CV

STEVE: A Step Verification Pipeline for Computer-use Agent Training

Posted: March 25, 2025 | Author: jarxiv

Abstract: To operate graphical user interfaces autonomously, AI agents …

Categories: cs.AI, cs.CV

Visual Position Prompt for MLLM based Visual Grounding

Posted: March 25, 2025 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs), on a variety of image-related tasks, …

Categories: cs.AI, cs.CV

Exploring the Integration of Key-Value Attention Into Pure and Hybrid Transformers for Semantic Segmentation

Posted: March 25, 2025 | Author: jarxiv

Abstract: CNNs were long regarded as the state of the art in image processing, but transformer architectures …

Categories: cs.AI, cs.CV, cs.LG

AdaWorld: Learning Adaptable World Models with Latent Actions

Posted: March 25, 2025 | Author: jarxiv

Abstract: World models aim to learn action-controlled prediction models, and …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Video-T1: Test-Time Scaling for Video Generation

Posted: March 25, 2025 | Author: jarxiv

Abstract: With the capability to scale up training data, model size, and computational cost, …

Categories: cs.AI, cs.CV

