QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture

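Summary

We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and strengthen language models' understanding of computer architecture. The dataset covers areas such as processor design, memory systems, and performance optimization. Our analysis reveals a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. Models struggle most noticeably with memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small-model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are available at https://harvard-edge.github.io/QuArch/.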

Summary (original)

We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models’ understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. We observe notable struggles in memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are at https://harvard-edge.github.io/QuArch/.

arXiv information

Authors	Shvetank Prakash, Andrew Cheng, Jason Yik, Arya Tschand, Radhika Ghosal, Ikechukwu Uchendu, Jessica Quaye, Jeffrey Ma, Shreyas Grampurohit, Sofia Giannuzzi, Arnav Balyan, Fin Amin, Aadya Pipersenia, Yash Choudhary, Ankita Nayak, Amir Yazdanbakhsh, Vijay Janapa Reddi
Published	2025-01-03 16:55:53+00:00

Source and services used

arxiv.jp, DeepL
