Monthly Archives: May 2025

Remote Sensing Spatio-Temporal Vision-Language Models: A Comprehensive Survey

Posted on: May 23, 2025 | Author: jarxiv

Abstract: The interpretation of multi-temporal remote sensing images, binary or semantic masks …

Categories: cs.CV

RealEngine: Simulating Autonomous Driving in Realistic Context

Posted on: May 23, 2025 | Author: jarxiv

Abstract: Driving simulation, by providing a controlled evaluation environment, reliable …

Categories: cs.CV, cs.RO

DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?

Posted on: May 23, 2025 | Author: jarxiv

Abstract: Recent text-to-image (T2I) models synthesize images from brief descriptions …

Categories: cs.AI, cs.CV

Backdoor Cleaning without External Guidance in MLLM Fine-tuning

Posted on: May 23, 2025 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs), with user-submitted datasets …

Categories: cs.CR, cs.CV

L2RDaS: Synthesizing 4D Radar Tensors for Model Generalization via Dataset Expansion

Posted on: May 23, 2025 | Author: jarxiv

Abstract: Four-dimensional (4D) radar, owing to its robustness under adverse weather conditions, for perception tasks …

Categories: cs.CV, eess.IV

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

Posted on: May 23, 2025 | Author: jarxiv

Abstract: In this work, the autoregressive paradigm that dominates current multimodal approaches …

Categories: cs.CL, cs.CV, cs.LG

NovelSeek: When Agent Becomes the Scientist — Building Closed-Loop System from Hypothesis to Verification

Posted on: May 23, 2025 | Author: jarxiv

Abstract: Artificial intelligence (AI) is accelerating the transformation of scientific research paradigms, enhancing research efficiency …

Categories: cs.AI, cs.CL, cs.CV

Efficient Correlation Volume Sampling for Ultra-High-Resolution Optical Flow Estimation

Posted on: May 23, 2025 | Author: jarxiv

Abstract: Recent optical flow estimation methods often, from dense all-pairs correlation volumes, local …

Categories: cs.CV, cs.LG

Motion by Queries: Identity-Motion Trade-offs in Text-to-Video Generation

Posted on: May 23, 2025 | Author: jarxiv

Abstract: Text-to-video diffusion models, coherent video clips from text descriptions …

Categories: cs.CV

MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning

Posted on: May 23, 2025 | Author: jarxiv

Abstract: Existing medical VQA benchmarks focus mainly on single-image analysis …

Categories: cs.CL, cs.CV

