Category Archives: cs.AI

HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation

Posted on June 4, 2025 by jarxiv

Summary: With the advancement of language models, unified multimodal understanding and generation … model archi…

Categories: cs.AI, cs.CV

Deep Learning for Retinal Degeneration Assessment: A Comprehensive Analysis of the MARIO AMD Progression Challenge

Posted on June 4, 2025 by jarxiv

Summary: The MARIO challenge, held at MICCAI 2024, … optical coherence tomography (O…

Categories: cs.AI, cs.CV

Smartflow: Enabling Scalable Spatiotemporal Geospatial Research

Posted on June 4, 2025 by jarxiv

Summary: BlackSky … built on open-source tools and technologies …

Categories: cs.AI, cs.CV

Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers

Posted on June 4, 2025 by jarxiv

Summary: Diffusion Transformers (DiTs) have achieved breakthroughs in video generation, but this long seq…

Categories: cs.AI, cs.CV, cs.LG

SASP: Strip-Aware Spatial Perception for Fine-Grained Bird Image Classification

Posted on June 4, 2025 by jarxiv

Summary: Fine-grained bird image classification (FBIC) is of great … for ecological monitoring and species identification …

Categories: cs.AI, cs.CV

Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification

Posted on June 4, 2025 by jarxiv

Summary: In recent years, the performance of convolutional neural networks (CNNs) has improved significantly. …

Categories: cs.AI, cs.CV, cs.LG

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

Posted on June 4, 2025 by jarxiv

Summary: With recent advances in multimodal large language models (MLLMs), … for autonomous driving …

Categories: cs.AI, cs.CV

DPO Learning with LLMs-Judge Signal for Computer Use Agents

Posted on June 4, 2025 by jarxiv

Summary: Computer use agents (CUAs) … graphical user i…

Categories: cs.AI, cs.CV

Can’t See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs

Posted on June 4, 2025 by jarxiv

Summary: Multimodal large language models (MLLMs) … through both text and images …

Categories: cs.AI, cs.CL, cs.CV, cs.MM

EgoVLM: Policy Optimization for Egocentric Video Understanding

Posted on June 4, 2025 by jarxiv

Summary: Emerging embodied AI applications, such as wearable cameras and autonomous agents, …

Categories: cs.AI, cs.CV
