Author Archives: jarxiv

EvalAgent: Discovering Implicit Evaluation Criteria from the Web

Posted on April 22, 2025 by jarxiv

Abstract: Evaluating language model outputs on structured writing tasks typically involves human evaluators …

Categories: cs.CL

Fully Bayesian Approaches to Topics over Time

Posted on April 22, 2025 by jarxiv

Abstract: The Topics over Time (TOT) model, jointly with word co-occurrence patterns, …

Categories: cs.CL, cs.LG

MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning

Posted on April 22, 2025 by jarxiv

Abstract: Large language models (LLMs) are susceptible to adversarial attacks such as jailbreaking, and …

Categories: cs.CL

Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet

Posted on April 22, 2025 by jarxiv

Abstract: Because large language models (LLMs) are ubiquitous, understanding their risks and limitations is important …

Categories: cs.CL

Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

Posted on April 22, 2025 by jarxiv

Abstract: Scaling test-time computation, or, during inference, a generator large language model (LL …

Categories: cs.CL, cs.LG

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

Posted on April 22, 2025 by jarxiv

Abstract: C-to-Rust transpilation, while improving safety and interoperability with the modern Rust ecosystem, …

Categories: cs.CL, cs.SE

DataComp-LM: In search of the next generation of training sets for language models

Posted on April 22, 2025 by jarxiv

Abstract: A testbed for controlled dataset experiments aimed at improving language models …

Categories: cs.CL, cs.LG

Federated Latent Factor Model for Bias-Aware Recommendation with Privacy-Preserving

Posted on April 22, 2025 by jarxiv

Abstract: Recommender systems (RS) provide users with personalized item recommendations …

Categories: cs.AI, cs.LG

Rethinking the Potential of Multimodality in Collaborative Problem Solving Diagnosis with Large Language Models

Posted on April 22, 2025 by jarxiv

Abstract: To interpret students' collaborative problem solving (CPS) competency, digital trace …

Categories: cs.AI, cs.CL, cs.LG

Fast-Slow Co-advancing Optimizer: Toward Harmonious Adversarial Training of GAN

Posted on April 22, 2025 by jarxiv

Abstract: Up to now, especially when the overall variance of the training set is large, typical generative adversarial …

Categories: cs.AI, cs.LG
