Archive of posts by "jarxiv"

Stochastic interior-point methods for smooth conic optimization with applications

Posted: June 3, 2025 | Author: jarxiv

Summary: Conic optimization plays an important role in many machine learning (ML) problems. However …

Categories: 90C25, 90C30, cs.AI, cs.LG, math.OC

Causally Reliable Concept Bottleneck Models

Posted: June 3, 2025 | Author: jarxiv

Summary: In concept-based models, the inference process operates through human-interpretable variables …

Categories: cs.AI, cs.LG

How well do LLMs reason over tabular data, really?

Posted: June 3, 2025 | Author: jarxiv

Summary: Large language models (LLMs) excel at natural language tasks, but on tabular data …

Categories: cs.AI

HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation

Posted: June 3, 2025 | Author: jarxiv

Summary: For the external knowledge problem of large language models (LLMs), retrieval-augmented generation (RAG) …

Categories: cs.AI, cs.CL

Value Compass Benchmarks: A Platform for Fundamental and Validated Evaluation of LLMs Values

Posted: June 3, 2025 | Author: jarxiv

Summary: As large language models (LLMs) achieve remarkable breakthroughs, human …

Categories: cs.AI

A Dual-Directional Context-Aware Test-Time Learning for Text Classification

Posted: June 3, 2025 | Author: jarxiv

Summary: Text classification assigns text to predefined categories. Traditional …

Categories: cs.AI, cs.CL

(Im)possibility of Automated Hallucination Detection in Large Language Models

Posted: June 3, 2025 | Author: jarxiv

Summary: Is automated hallucination detection possible? In this work, for large language models (LLMs) …

Categories: cs.AI, cs.CL, cs.LG, stat.ML

Estimating LLM Consistency: A User Baseline vs Surrogate Metrics

Posted: June 3, 2025 | Author: jarxiv

Summary: Large language models (LLMs) are prone to hallucination and sensitive to prompt perturbations, …

Categories: cs.AI, cs.CL, cs.HC, cs.LG

Improving Transformer World Models for Data-Efficient RL

Posted: June 3, 2025 | Author: jarxiv

Summary: We present an approach to model-based RL that, on the challenging Crafta …

Categories: cs.AI, cs.LG

Universal Value-Function Uncertainties

Posted: June 3, 2025 | Author: jarxiv

Summary: Estimating epistemic uncertainty in value functions, for efficient exploration, safe decision-making, …

Categories: cs.AI, cs.LG, stat.ML

