Archive of posts by "jarxiv"

High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

Posted: June 5, 2025 | Author: jarxiv

Abstract: Large language models (LLMs) currently respond to every prompt …

Categories: cs.AI, cs.CL

Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results

Posted: June 5, 2025 | Author: jarxiv

Abstract: Uncertainty quantification (UQ) in language models (LMs) improves safety and reliability …

Categories: cs.AI, cs.CL, cs.LG

AI and the Dynamic Supply of Training Data

Posted: June 5, 2025 | Author: jarxiv

Abstract: Artificial intelligence (AI) systems rely heavily on human-generated data, but …

Categories: cs.AI, cs.CY, cs.LG, econ.GN, q-fin.EC

REAL: Response Embedding-based Alignment for LLMs

Posted: June 5, 2025 | Author: jarxiv

Abstract: To align large language models (LLMs) with human preferences, typically supervised …

Categories: cs.AI, cs.CL

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Posted: June 5, 2025 | Author: jarxiv

Abstract: Evaluating medical large language models (LLMs) is important because medical …

Categories: cs.AI, cs.CL

EuroLLM-9B: Technical Report

Posted: June 5, 2025 | Author: jarxiv

Abstract: In this report, covering all 24 official European Union languages and 11 additional languages …

Categories: cs.AI, cs.CL, cs.LG

AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

Posted: June 5, 2025 | Author: jarxiv

Abstract: As part of an embodied agent, taking into account natural-language instructions from users …

Categories: cs.AI, cs.CL, cs.LG, cs.RO

Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace

Posted: June 5, 2025 | Author: jarxiv

Abstract: Humans perceive the world through multiple senses, build a comprehensive representation of their surroundings, and domain …

Categories: cs.AI

Optimizing Sensory Neurons: Nonlinear Attention Mechanisms for Accelerated Convergence in Permutation-Invariant Neural Networks for Reinforcement Learning

Posted: June 5, 2025 | Author: jarxiv

Abstract: For training reinforcement learning (RL) agents, significant computational resources often …

Categories: cs.AI, cs.LG

TextAtari: 100K Frames Game Playing with Language Agents

Posted: June 5, 2025 | Author: jarxiv

Abstract: TextAtari, for very long-horizon decision-making spanning up to 100,000 steps, …

Categories: cs.AI, cs.CL, cs.LG
