Category archive: cs.CL

JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System

Posted: May 1, 2025 | Author: jarxiv

Abstract: In this paper, evaluating the performance of judgment document generation in the Chinese legal system …

Categories: cs.AI, cs.CL, cs.IR

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Posted: May 1, 2025 | Author: jarxiv

Abstract: Large reasoning models (LRMs) such as OpenAI-o1 and DeepSeek-R1 …

Categories: cs.AI, cs.CL, cs.IR

Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers

Posted: May 1, 2025 | Author: jarxiv

Abstract: Hallucinations are a persistent problem for large language models (LLMs). These models …

Categories: cs.AI, cs.CL, cs.LG

SWE-smith: Scaling Data for Software Engineering Agents

Posted: May 1, 2025 | Author: jarxiv

Abstract: Despite recent advances in language models (LMs) for software engineering …

Categories: cs.AI, cs.CL, cs.SE

How Real Are Synthetic Therapy Conversations? Evaluating Fidelity in Prolonged Exposure Dialogues

Posted: May 1, 2025 | Author: jarxiv

Abstract: The growing adoption of synthetic data in healthcare: privacy concerns, real-world …

Categories: 68T50, cs.AI, cs.CL, cs.CY, cs.HC, I.2.7

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Posted: May 1, 2025 | Author: jarxiv

Abstract: An open-source large language model designed for formal theorem proving in Lean 4 …

Categories: cs.AI, cs.CL

TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments

Posted: May 1, 2025 | Author: jarxiv

Abstract: Objective: To assist clinicians and support patients, large language models (LLMs) …

Categories: cs.AI, cs.CL

MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation

Posted: April 30, 2025 | Author: jarxiv

Abstract: Automatic evaluation of retrieval-augmented generation (RAG) systems, by expert annotators …

Categories: cs.AI, cs.CL

UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation

Posted: April 30, 2025 | Author: jarxiv

Abstract: Designed to mitigate toxicity across a variety of large language models (LLMs) …

Categories: cs.CL, cs.LG

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

Posted: April 30, 2025 | Author: jarxiv

Abstract: The evaluation of vision-language models (VLMs) relies primarily on English-language benchmarks …

Categories: cs.CL, cs.CV

