"cs.CL" Category Archive

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Posted: June 6, 2025 | Author: jarxiv

Abstract: Process reward models (PRMs) identify and mitigate intermediate errors in the reasoning process …

Categories: cs.AI, cs.CL, cs.LG

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

Posted: June 6, 2025 | Author: jarxiv

Abstract: Sequence modeling currently … causal Transformers that use softmax self-attention …

Categories: cs.AI, cs.CL, cs.LG

From Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors

Posted: June 6, 2025 | Author: jarxiv

Abstract: Current research … large language models generating harmful content through jailbreak attacks …

Categories: cs.AI, cs.CL, cs.CR

Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning

Posted: June 6, 2025 | Author: jarxiv

Abstract: Retrieval-augmented generation (RAG) systems commonly suffer from knowledge conflicts. …

Categories: cs.AI, cs.CL

ProRefine: Inference-time Prompt Refinement with Textual Feedback

Posted: June 6, 2025 | Author: jarxiv

Abstract: … in which multiple AI agents collaborate to accomplish complex tasks such as reasoning and planning …

Categories: cs.AI, cs.CL, cs.LG

Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

Posted: June 6, 2025 | Author: jarxiv

Abstract: LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns …

Categories: cs.AI, cs.CL, cs.MA

Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models

Posted: June 6, 2025 | Author: jarxiv

Abstract: Large language models (LLMs) deployed in real-world settings … sensitive, outdated, …

Categories: cs.AI, cs.CL, cs.LG

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

Posted: June 6, 2025 | Author: jarxiv

Abstract: Reinforcement learning (RL) … large language models (LLMs), particularly to enhance their reasoning capabilities …

Categories: cs.AI, cs.CL, cs.LG

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Posted: June 6, 2025 | Author: jarxiv

Abstract: Despite recent progress in large-scale reinforcement learning (RL) for reasoning, high-performance …

Categories: cs.AI, cs.CL, cs.LG

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

Posted: June 6, 2025 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs) … visual and textual data …

Categories: cs.CL, cs.CV
