HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks

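Summary

The rapid scaling of large language model (LLM) training and inference is driving the adoption of LLMs in semiconductor design across academia and industry.
Most prior work evaluates LLMs on hardware description language (HDL) tasks, particularly Verilog, yet designers increasingly use high-level synthesis (HLS) to build domain-specific accelerators and complex hardware systems.
However, benchmarks and tools for comprehensively evaluating LLMs on HLS design tasks remain scarce.
To address this, we introduce HLS-Eval, the first complete benchmark and evaluation framework for LLM-driven HLS design.
HLS-Eval targets two core tasks: (1) generating HLS code from natural language descriptions, and (2) performing HLS-specific code edits to optimize performance and hardware efficiency.
The benchmark contains 94 unique designs drawn from standard HLS benchmarks and novel sources.
Each case is prepared through a semi-automated flow that produces a natural language description and a paired testbench for C-simulation and synthesis validation, ensuring that each task is "LLM-ready."
Beyond the benchmark, HLS-Eval provides a modular Python framework for automated, parallel evaluation of both local and hosted LLMs.
It includes a parallel evaluation engine, direct HLS tool integration, and abstractions that support different LLM interaction paradigms, enabling rapid prototyping of new benchmarks, tasks, and LLM methods.
We demonstrate HLS-Eval through baseline evaluations of open-source LLMs on Vitis HLS, measuring outputs across four key metrics (parseability, compilability, runnability, and synthesizability) that reflect the iterative HLS design cycle.
We also report pass@k metrics, establishing clear baselines and reusable infrastructure for the broader LLM-for-hardware community.
All benchmarks, framework code, and results are open-sourced at https://github.com/stefanpie/hls-eval.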

Abstract (original)

The rapid scaling of large language model (LLM) training and inference has driven their adoption in semiconductor design across academia and industry. While most prior work evaluates LLMs on hardware description language (HDL) tasks, particularly Verilog, designers are increasingly using high-level synthesis (HLS) to build domain-specific accelerators and complex hardware systems. However, benchmarks and tooling to comprehensively evaluate LLMs for HLS design tasks remain scarce. To address this, we introduce HLS-Eval, the first complete benchmark and evaluation framework for LLM-driven HLS design. HLS-Eval targets two core tasks: (1) generating HLS code from natural language descriptions, and (2) performing HLS-specific code edits to optimize performance and hardware efficiency. The benchmark includes 94 unique designs drawn from standard HLS benchmarks and novel sources. Each case is prepared via a semi-automated flow that produces a natural language description and a paired testbench for C-simulation and synthesis validation, ensuring each task is ‘LLM-ready.’ Beyond the benchmark, HLS-Eval offers a modular Python framework for automated, parallel evaluation of both local and hosted LLMs. It includes a parallel evaluation engine, direct HLS tool integration, and abstractions to support different LLM interaction paradigms, enabling rapid prototyping of new benchmarks, tasks, and LLM methods. We demonstrate HLS-Eval through baseline evaluations of open-source LLMs on Vitis HLS, measuring outputs across four key metrics – parseability, compilability, runnability, and synthesizability – reflecting the iterative HLS design cycle. We also report pass@k metrics, establishing clear baselines and reusable infrastructure for the broader LLM-for-hardware community. All benchmarks, framework code, and results are open-sourced at https://github.com/stefanpie/hls-eval.
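The abstract does not state how pass@k is computed. As a minimal sketch, assuming the unbiased estimator commonly used in code-generation evaluation, pass@k = 1 - C(n-c, k) / C(n, k) for n samples per design of which c pass a given stage, the metric could be computed as below. The function names `pass_at_k` and `mean_pass_at_k` and the sample counts are hypothetical and are not taken from the HLS-Eval codebase.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one design (assumed estimator).

    n: number of LLM samples generated for the design
    c: number of those samples that pass the stage being scored
       (for example, synthesizability)
    k: sample budget being evaluated
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over designs, given (n, c) pairs per design."""
    return sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)


if __name__ == "__main__":
    # Illustrative counts only: three designs, 10 samples each.
    example_counts = [(10, 3), (10, 0), (10, 10)]
    print(mean_pass_at_k(example_counts, k=1))
    print(mean_pass_at_k(example_counts, k=5))
```

Applying the same function once per stage (parseability, compilability, runnability, synthesizability) would give one pass@k value per metric. Whether HLS-Eval aggregates results this way is an assumption here.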

arXiv information

Authors	Stefan Abi-Karam, Cong Hao
Published	2025-04-16 17:30:36+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
