"cs.AI" Category Archive

More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

Posted on: May 23, 2025 | Author: jarxiv

Summary: Enabling large language models (LLMs) to understand the 3D physical world is … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments closed

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Posted on: May 23, 2025 | Author: jarxiv

Summary: Reinforcement learning (RL), for enhancing the reasoning of vision-language models (VLMs), is an effective … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

T2I-ConBench: Text-to-Image Benchmark for Continual Post-training

Posted on: May 23, 2025 | Author: jarxiv

Summary: Continual post-training adapts a text-to-image diffusion model so that separate … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments closed

MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning?

Posted on: May 23, 2025 | Author: jarxiv

Summary: Large foundation models, particularly with rigid templates or crowd-annotated instruction … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments closed

DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?

Posted on: May 23, 2025 | Author: jarxiv

Summary: Recent text-to-image (T2I) models synthesize images from brief descriptions … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

NovelSeek: When Agent Becomes the Scientist — Building Closed-Loop System from Hypothesis to Verification

Posted on: May 23, 2025 | Author: jarxiv

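Summary: Artificial intelligence (AI) is accelerating the transformation of scientific research paradigms, not only enhancing research efficiency … Continue reading →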

Categories: cs.AI, cs.CL, cs.CV | Comments closed

Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation

Posted on: May 23, 2025 | Author: jarxiv

Summary: Out-of-distribution (OOD) detection and segmentation, in areas such as autonomous driving and robot-assisted surgery, … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO | Comments closed

PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association

Posted on: May 23, 2025 | Author: jarxiv

Summary: Learning between faces and voices, which has recently attracted interest in the multimodal community, … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

Posted on: May 23, 2025 | Author: jarxiv

Summary: Multimodal large language models (MLLMs) have been impressive on question-answering tasks … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

Interactive Post-Training for Vision-Language-Action Models

Posted on: May 23, 2025 | Author: jarxiv

Summary: We introduce RIPT-VLA, which uses only sparse binary success rewards to … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO | Comments closed
