LangProp: A code optimization framework using Large Language Models applied to driving

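Abstract

We propose LangProp, a framework for iteratively optimizing code generated by large language models (LLMs) in both supervised and reinforcement learning settings. Although LLMs can generate sensible coding solutions zero-shot, these solutions are often sub-optimal. In code generation tasks in particular, the initial code is likely to fail on certain edge cases. LangProp automatically evaluates the code's performance on a dataset of input-output pairs, catches exceptions, and feeds the results back to the LLM within the training loop. Adopting a metric- and data-driven training paradigm for this code optimization procedure makes it easy to adapt findings from traditional machine learning techniques such as imitation learning, DAgger, and reinforcement learning. We demonstrate LangProp's applicability to general domains such as Sudoku and CartPole, and present the first proof of concept of automated code optimization for autonomous driving in CARLA. We show that LangProp can generate interpretable and transparent policies that can be verified and improved in a metric- and data-driven way. Our code is available at https://github.com/shuishida/LangProp.

Abstract (original)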

We propose LangProp, a framework for iteratively optimizing code generated by large language models (LLMs), in both supervised and reinforcement learning settings. While LLMs can generate sensible coding solutions zero-shot, they are often sub-optimal. Especially for code generation tasks, it is likely that the initial code will fail on certain edge cases. LangProp automatically evaluates the code performance on a dataset of input-output pairs, catches any exceptions, and feeds the results back to the LLM in the training loop, so that the LLM can iteratively improve the code it generates. By adopting a metric- and data-driven training paradigm for this code optimization procedure, one could easily adapt findings from traditional machine learning techniques such as imitation learning, DAgger, and reinforcement learning. We show LangProp’s applicability to general domains such as Sudoku and CartPole, as well as demonstrate the first proof of concept of automated code optimization for autonomous driving in CARLA. We show that LangProp can generate interpretable and transparent policies that can be verified and improved in a metric- and data-driven way. Our code is available at https://github.com/shuishida/LangProp.
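The evaluation-and-feedback loop described in the abstract can be illustrated with a small Python sketch. This is a hypothetical illustration, not the LangProp implementation or its API: the names `Candidate`, `compile_policy`, `evaluate`, `mock_llm_rewrite`, and `train` are invented for this example, the LLM call is replaced by a stub that returns a fixed rewrite, and keeping the top `keep` candidates each round is an assumed selection rule. It shows the three steps the abstract names: scoring code on input-output pairs, catching exceptions, and turning failures into feedback for the next rewrite.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sketch of a LangProp-style loop; not the LangProp API.


@dataclass
class Candidate:
    """A piece of generated code plus its latest evaluation record."""
    source: str
    score: float = 0.0
    feedback: List[str] = field(default_factory=list)


def compile_policy(source: str, name: str = "solve") -> Callable:
    """Run the code string and return the function named `name`.
    NOTE: exec on untrusted LLM output should be sandboxed in practice."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace[name]


def evaluate(cand: Candidate, dataset: List[Tuple[object, object]]) -> Candidate:
    """Score code by accuracy on input-output pairs, catching exceptions
    and recording every failure as text feedback for the LLM."""
    cand.feedback.clear()
    try:
        fn = compile_policy(cand.source)
    except Exception as exc:  # syntax errors, missing function, ...
        cand.score = 0.0
        cand.feedback.append(f"compile error: {exc!r}")
        return cand
    correct = 0
    for x, expected in dataset:
        try:
            got = fn(x)
        except Exception as exc:  # runtime failure on this input
            cand.feedback.append(f"input {x!r}: raised {exc!r}")
            continue
        if got == expected:
            correct += 1
        else:
            cand.feedback.append(f"input {x!r}: expected {expected!r}, got {got!r}")
    cand.score = correct / len(dataset)
    return cand


def mock_llm_rewrite(cand: Candidate) -> Candidate:
    """Stub standing in for an LLM call (assumption). A real system would
    prompt the model with cand.source and cand.feedback and parse new code."""
    if any("ZeroDivisionError" in msg for msg in cand.feedback):
        return Candidate("def solve(x):\n    return 0.0 if x == 0 else 1.0 / x\n")
    return Candidate(cand.source)  # no actionable feedback: resubmit unchanged


def train(initial: List[str], dataset, rounds: int = 3, keep: int = 2) -> Candidate:
    """Keep the top-`keep` candidates each round (assumed selection rule)
    and add one feedback-driven rewrite of each."""
    population = [evaluate(Candidate(s), dataset) for s in initial]
    for _ in range(rounds):
        population.sort(key=lambda c: c.score, reverse=True)
        survivors = population[:keep]
        children = [evaluate(mock_llm_rewrite(c), dataset) for c in survivors]
        population = survivors + children
    return max(population, key=lambda c: c.score)


# Usage: 1/x with an edge case at x = 0 that the initial code misses.
dataset = [(1, 1.0), (2, 0.5), (4, 0.25), (0, 0.0)]
best = train(["def solve(x):\n    return 1.0 / x\n"], dataset)
print(best.score)  # 1.0
print(best.source)
```

Here the initial code raises `ZeroDivisionError` on the `x = 0` edge case and scores 0.75. The caught exception becomes feedback, the stubbed rewrite handles the edge case, and the loop returns a candidate that scores 1.0. Treating generated code as a population that is scored against data is what lets ideas from conventional training loops carry over.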

arXiv information

Authors	Shu Ishida, Gianluca Corrado, George Fedoseev, Hudson Yeo, Lloyd Russell, Jamie Shotton, João F. Henriques, Anthony Hu
Published	2024-05-03 16:15:45+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL