Monthly Archives: March 2025

IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval

Posted on: March 7, 2025 by jarxiv

Abstract: To evaluate instruction-following information retrieval (IR) in expert domains, … Continue reading →

Categories: cs.CL, cs.IR | Comments Off

Get my drift? Catching LLM Task Drift with Activation Deltas

Posted on: March 7, 2025 by jarxiv

Abstract: To carry out user instructions based on data from external sources, LLMs … Continue reading →

Categories: cs.CL, cs.CR, cs.CY | Comments Off

Diagnosing Moral Reasoning Acquisition in Language Models: Pragmatics and Generalization

Posted on: March 7, 2025 by jarxiv

Abstract: That large language models (LLMs) give only responses that adhere to social values … Continue reading →

Categories: cs.CL | Comments Off

An Information-theoretic Multi-task Representation Learning Framework for Natural Language Understanding

Posted on: March 7, 2025 by jarxiv

Abstract: In this paper, to extract noise-invariant representations for all tasks, a new … Continue reading →

Categories: cs.CL, cs.IT, cs.LG, math.IT | Comments Off

LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue

Posted on: March 7, 2025 by jarxiv

Abstract: User satisfaction with conversational systems, known as user satisfaction estimation (USE), … Continue reading →

Categories: cs.CL | Comments Off

DIMSUM: Discourse in Mathematical Reasoning as a Supervision Module

Posted on: March 7, 2025 by jarxiv

Abstract: A dataset of short texts presenting grade-school math problems, GS … Continue reading →

Categories: cs.CL | Comments Off

Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases

Posted on: March 7, 2025 by jarxiv

Abstract: The latest reasoning-enhanced large models, such as DeepSeek-R1 and OpenAI-o3, … Continue reading →

Categories: cs.CL | Comments Off

UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets

Posted on: March 7, 2025 by jarxiv

Abstract: During training on large datasets, large language models (LLMs) … Continue reading →

Categories: cs.CL | Comments Off

Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities

Posted on: March 7, 2025 by jarxiv

Abstract: Spoken dialogue modeling, beyond text-based language modeling, unique … Continue reading →

Categories: cs.CL, eess.AS | Comments Off

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Posted on: March 7, 2025 by jarxiv

Abstract: Recent advances in speech-to-speech dialogue systems, multimodal interaction … Continue reading →

Categories: cs.CL | Comments Off
