Monthly Archives: February 2025

MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces

Posted on February 12, 2025 by jarxiv

Abstract: Open-ended learning agents focus on whatever maximizes learning progress (LP) …

Categories: cs.AI

pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning

Posted on February 12, 2025 by jarxiv

Abstract: In Federated Learning (FL), data remains local …

Categories: cs.AI, cs.LG

DPO Meets PPO: Reinforced Token Optimization for RLHF

Posted on February 12, 2025 by jarxiv

Abstract: In the classical Reinforcement Learning from Human Feedback (RLHF) framework, …

Categories: cs.AI, cs.CL, cs.LG, stat.ML

Large Continual Instruction Assistant

Posted on February 12, 2025 by jarxiv

Abstract: In Continual Instruction Tuning (CIT), to follow human intent data by data, …

Categories: cs.AI, cs.CL, cs.LG

TMLC-Net: Transferable Meta Label Correction for Noisy Label Learning

Posted on February 12, 2025 by jarxiv

Abstract: The prevalence of noisy labels in real-world datasets affects how effectively deep learning models …

Categories: cs.AI, cs.LG

Novelty Detection in Reinforcement Learning with World Models

Posted on February 12, 2025 by jarxiv

Abstract: Reinforcement learning (RL) using world models has recently found significant success. …

Categories: cs.AI, cs.LG, cs.SY, eess.SY

Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art

Posted on February 12, 2025 by jarxiv

Abstract: Autonomous systems will soon, in manufacturing, agriculture, healthcare, entertainment, and other …

Categories: cs.AI, cs.CL, cs.RO

Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK

Posted on February 12, 2025 by jarxiv

Abstract: Large language models (LLMs) have demonstrated remarkable code generation capabilities, but …

Categories: cs.AI, cs.SE

TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks

Posted on February 12, 2025 by jarxiv

Abstract: Graph neural networks (GNNs), which preserve the symmetries of the graph domain, …

Categories: cs.AI, cs.LG

What makes math problems hard for reinforcement learning: a case study

Posted on February 12, 2025 by jarxiv

Abstract: Using a long-standing conjecture from combinatorial group theory, from multiple perspectives, disproportionately …

Categories: cs.AI, cs.LG, math.CO, math.GR, math.GT
