Category Archives: cs.CL

Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs

Posted on October 7, 2024 by jarxiv

Summary: Human evaluation continues to be the gold standard for open-domain dialogue evaluation …

Categories: cs.CL

To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity

Posted on October 7, 2024 by jarxiv

Summary: One of the key aspects contributing to the remarkable performance of large language models (LLMs) is pre-train …

Categories: cs.CL, cs.LG

Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores

Posted on October 7, 2024 by jarxiv

Summary: Large language models (LLMs) are stochastic, and setting the temperature to zero with a fixed random seed …

Categories: cs.CL

KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students

Posted on October 7, 2024 by jarxiv

Summary: Flashcard schedulers 1) predict which flashcards a student knows …

Categories: cs.CL

Jailbreaking as a Reward Misspecification Problem

Posted on October 7, 2024 by jarxiv

Summary: With the widespread adoption of large language models (LLMs), their safety and reliability, especially against adversarial attacks …

Categories: cs.CL, cs.LG

A SMART Mnemonic Sounds like ‘Glue Tonic’: Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick

Posted on October 7, 2024 by jarxiv

Summary: Keyword mnemonics link new terms to simpler keywords …

Categories: cs.CL

CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios

Posted on October 7, 2024 by jarxiv

Summary: Across various domains, large language models …

Categories: cs.CL

‘Seeing the Big through the Small’: Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

Posted on October 7, 2024 by jarxiv

Summary: Human label variation (HLV) occurs when multiple human annotators, for valid reasons, …

Categories: cs.CL

To Err Is Human, but Llamas Can Learn It Too

Posted on October 7, 2024 by jarxiv

Summary: In this study, through artificial error generation (AEG) using language models (LMs), …

Categories: cs.CL

Steering Large Language Models between Code Execution and Textual Reasoning

Posted on October 7, 2024 by jarxiv

Summary: Many recent studies optimize multi-agent frameworks and reasoning chains …

Categories: cs.CL
