Archive of posts by "jarxiv"

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

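Posted: April 10, 2025 | Author: jarxiv

Abstract: The rapid development of vision language models (VLMs) requires rigorous and reliable evaluation … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.CY, cs.LG | Comments are closed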

Beyond the Hype: A dispassionate look at vision-language models in medical scenario

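Posted: April 10, 2025 | Author: jarxiv

Abstract: Recent advances in large vision-language models (LVLMs) have been remarkable across diverse tasks … Continue reading →

Categories: cs.AI, cs.CV | Comments are closed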

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

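Posted: April 10, 2025 | Author: jarxiv

Abstract: The evaluation of vision language models (VLMs) has relied mainly on English-language benchmarks … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed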

LUDO: Low-Latency Understanding of Deformable Objects using Point Cloud Occupancy Functions

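Posted: April 10, 2025 | Author: jarxiv

Abstract: Accurately determining the shape of an object and the location of internal structures within deformable objects … Continue reading →

Categories: cs.CV, cs.RO | Comments are closed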

Detecting AI-generated Artwork

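Posted: April 10, 2025 | Author: jarxiv

Abstract: As for the efficiency and quality of artwork generated by artificial intelligence (AI), human artists … Continue reading →

Categories: cs.CV, cs.LG | Comments are closed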

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

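Posted: April 10, 2025 | Author: jarxiv

Abstract: To survive and thrive in complex environments, humans use environment exploration, hierarchical abstraction of experience … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments are closed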

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

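Posted: April 10, 2025 | Author: jarxiv

Abstract: Camera trajectory design plays a crucial role in video production, conveying the director's intent … Continue reading →

Categories: cs.CV | Comments are closed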

OmniCaptioner: One Captioner to Rule Them All

Posted: April 10, 2025 | Author: jarxiv

Abstract: We propose OmniCaptioner, which spans a wide variety of visual domains … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed

Are We Done with Object-Centric Learning?

Posted: April 10, 2025 | Author: jarxiv

Abstract: In object-centric learning (OCL), other objects in the scene or background … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments are closed

FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution

Posted: April 10, 2025 | Author: jarxiv

Abstract: A versatile video depth estimation model is (1) accurate and consistent across frames, … Continue reading →

Categories: cs.CV | Comments are closed
