Category archive: cs.CL

First-Person Fairness in Chatbots

Posted: March 4, 2025 | Author: jarxiv

Abstract: Given the rapid spread of chatbots, evaluating the fairness of chatbots … Read more →

Categories: cs.AI, cs.CL, cs.CY | Comments closed

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?

Posted: March 4, 2025 | Author: jarxiv

Abstract: In large language models (LLMs), exemplified by OpenAI's o1 series, … Read more →

Categories: cs.AI, cs.CL, cs.LG | Comments closed

InductionBench: LLMs Fail in the Simplest Complexity Class

Posted: March 4, 2025 | Author: jarxiv

Abstract: Large language models (LLMs) have shown remarkable improvement in reasoning, and many existing … Read more →

Categories: cs.AI, cs.CL, cs.FL, cs.LG | Comments closed

On Memory Construction and Retrieval for Personalized Conversational Agents

Posted: March 4, 2025 | Author: jarxiv

Abstract: This paper presents two key findings: (1) the granularity of memory units matters … Read more →

Categories: cs.AI, cs.CL | Comments closed

SensorQA: A Question Answering Benchmark for Daily-Life Monitoring

Posted: March 4, 2025 | Author: jarxiv

Abstract: With the rapid growth of sensor data, in a way that humans can easily understand, these data … Read more →

Categories: cs.AI, cs.CL | Comments closed

Forecasting Frontier Language Model Agent Capabilities

Posted: March 4, 2025 | Author: jarxiv

Abstract: As language models (LMs) are increasingly deployed as autonomous agents, … Read more →

Categories: cs.AI, cs.CL | Comments closed

Structural-Entropy-Based Sample Selection for Efficient and Effective Learning

Posted: March 4, 2025 | Author: jarxiv

Abstract: Sample selection, by providing informative and representative samples, … of machine learning models … Read more →

Categories: cs.AI, cs.CL, cs.CV, cs.LG | Comments closed

Representation Engineering: A Top-Down Approach to AI Transparency

Posted: March 4, 2025 | Author: jarxiv

Abstract: Drawing on insights from cognitive neuroscience, this paper … improving the transparency of AI systems … Read more →

Categories: cs.AI, cs.CL, cs.CV, cs.CY, cs.LG | Comments closed

NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM

Posted: March 4, 2025 | Author: jarxiv

Abstract: Vision-and-Language Navigation (VLN) … Read more →

Categories: cs.AI, cs.CL, cs.CV | Comments closed

Evaluating Intelligence via Trial and Error

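Posted: March 4, 2025 | Author: jarxiv

Abstract: Intelligence is a crucial trait for species to find solutions within a limited number of trial-and-error attempts … Read more →

Categories: cs.AI, cs.CL, cs.CV, cs.IR | Comments closed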
