QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture

Summary

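QuArch is a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models' understanding of computer architecture.
The dataset covers areas such as processor design, memory systems, and performance optimization.
Our analysis reveals a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%.
Models struggle notably with memory systems, interconnection networks, and benchmarking.
Fine-tuning with QuArch improves small-model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research.
The dataset and leaderboard are available at https://harvard-edge.github.io/QuArch/.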

Summary (original)

We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models’ understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. We observe notable struggles in memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are at https://harvard-edge.github.io/QuArch/.

arXiv information

Authors	Shvetank Prakash, Andrew Cheng, Jason Yik, Arya Tschand, Radhika Ghosal, Ikechukwu Uchendu, Jessica Quaye, Jeffrey Ma, Shreyas Grampurohit, Sofia Giannuzzi, Arnav Balyan, Fin Amin, Aadya Pipersenia, Yash Choudhary, Ankita Nayak, Amir Yazdanbakhsh, Vijay Janapa Reddi
Published	2025-01-06 17:48:05+00:00

Source and services used

arxiv.jp, Google
