Archive of posts by "jarxiv"

Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning

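Posted: June 13, 2025 | Author: jarxiv

Abstract: Autoformalization, by enabling the automatic translation of natural-language statements into formal languages, … Continue reading →

Categories: cs.CL | Comments are closed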

Magistral

Posted: June 13, 2025 | Author: jarxiv

Abstract: Mistral's first reasoning model, and its own scalable reinforcement learning (RL … Continue reading →

Categories: cs.CL | Comments are closed

Efficiently Identifying Watermarked Segments in Mixed-Source Texts

Posted: June 13, 2025 | Author: jarxiv

Abstract: Text watermarks for large language models (LLMs), to detect synthetic text, … Continue reading →

Categories: cs.CL | Comments are closed

Weak-to-Strong Jailbreaking on Large Language Models

Posted: June 13, 2025 | Author: jarxiv

Abstract: Large language models (LLMs) are vulnerable to jailbreak attacks, and harmful, unethical … Continue reading →

Categories: cs.CL | Comments are closed

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

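Posted: June 13, 2025 | Author: jarxiv

Abstract: A central goal of mechanistic interpretability is to causally explain the outputs of large language models … Continue reading →

Categories: cs.CL, cs.LG | Comments are closed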

Improving LLM Safety Alignment with Dual-Objective Optimization

Posted: June 13, 2025 | Author: jarxiv

Abstract: Existing training-time safety alignment methods for large language models (LLMs) … Continue reading →

Categories: cs.CL, cs.CR, cs.LG | Comments are closed

Dynamic Epistemic Friction in Dialogue

Posted: June 13, 2025 | Author: jarxiv

Abstract: With recent developments in aligning large language models (LLMs) with human preferences, human … Continue reading →

Categories: cs.CL | Comments are closed

Build the web for agents, not agents for the web

Posted: June 13, 2025 | Author: jarxiv

Abstract: Recent advances in large language models (LLMs) and their multimodal counterparts … Continue reading →

Categories: cs.CL, cs.LG | Comments are closed

How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?

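Posted: June 13, 2025 | Author: jarxiv

Abstract: The ability of recent reasoning models to reflect on, backtrack, and self-verify their reasoning … Continue reading →

Categories: cs.CL | Comments are closed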

On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves

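Posted: June 13, 2025 | Author: jarxiv

Abstract: In binary classification problems, receiver operating characteristic (ROC) and precision-recall (PR) … Continue reading →

Categories: cs.AI, cs.LG, math.ST, stat.ML, stat.TH | Comments are closed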
