ResearchArena: Benchmarking Large Language Models’ Ability to Collect and Organize Information as Research Agents

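Summary

Large language models (LLMs) excel at many natural language processing tasks but face challenges in domain-specific analytical tasks such as conducting research surveys.
This study introduces ResearchArena, a benchmark designed to evaluate the ability of LLMs to conduct academic surveys, a foundational step in academic research.
ResearchArena models the process in three stages: (1) information discovery, identifying relevant literature;
(2) information selection, evaluating papers' relevance and impact;
(3) information organization, structuring knowledge into hierarchical frameworks such as mind-maps.
Notably, mind-map construction is treated as a bonus task, reflecting its supplementary role in survey writing.
To support these evaluations, we construct an offline environment of 12M full-text academic papers and 7.9K survey papers.
To ensure ethical compliance, we do not redistribute copyrighted materials.
Instead, we provide code to construct the environment from the Semantic Scholar Open Research Corpus (S2ORC).
Preliminary evaluations reveal that LLM-based approaches underperform simpler keyword-based retrieval methods, underscoring significant opportunities for advancing LLMs in autonomous research.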

Summary (original)

Large language models (LLMs) excel across many natural language processing tasks but face challenges in domain-specific, analytical tasks such as conducting research surveys. This study introduces ResearchArena, a benchmark designed to evaluate LLMs’ capabilities in conducting academic surveys, a foundational step in academic research. ResearchArena models the process in three stages: (1) information discovery, identifying relevant literature; (2) information selection, evaluating papers’ relevance and impact; and (3) information organization, structuring knowledge into hierarchical frameworks such as mind-maps. Notably, mind-map construction is treated as a bonus task, reflecting its supplementary role in survey-writing. To support these evaluations, we construct an offline environment of 12M full-text academic papers and 7.9K survey papers. To ensure ethical compliance, we do not redistribute copyrighted materials; instead, we provide code to construct the environment from the Semantic Scholar Open Research Corpus (S2ORC). Preliminary evaluations reveal that LLM-based approaches underperform compared to simpler keyword-based retrieval methods, underscoring significant opportunities for advancing LLMs in autonomous research.

arXiv information

Authors	Hao Kang, Chenyan Xiong
Published	2025-02-14 17:37:35+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
