Understanding How CodeLLMs (Mis)Predict Types with Activation Steering

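Summary

CodeLLMs are transforming software development as we know it.
This is especially true for tasks where rule-based approaches fall short, such as type prediction.
The type prediction task consists of adding a new type annotation to a partially typed program, so that the resulting program is closer to being fully typed.
Because rule-based approaches are intractable and manual annotation is costly, CodeLLMs are an attractive solution to this problem.
However, CodeLLMs are still far from being deployed at scale because of doubts about their reliability.
To shed light on how CodeLLMs approach type prediction, we investigate what happens when a model mispredicts a type.
We show that applying semantics-preserving edits to code eventually misleads CodeLLMs into mispredicting type annotations.
However, by leveraging activation steering, we can "steer" the model back to the correct prediction, which makes models more robust against semantically irrelevant prompt features.
We show that steering achieves performance comparable to fine-tuning directly on the type prediction task.
Furthermore, we find that steering vectors computed from Python code are effective at correcting TypeScript mispredictions, and vice versa.
To our knowledge, this is the first evidence of its kind to suggest that CodeLLMs learn task representations that transfer across languages.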
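
To make the notion of a semantics-preserving edit concrete, here is a minimal sketch in Python. It assumes variable renaming as the edit; the abstract does not list which edits the authors actually apply, so this choice, the `count_words` example, and the helper names `RenameVariable`, `rename`, and `run` are all hypothetical. The point is only that the edited program behaves identically while its surface form, which is what a model sees in the prompt, changes.

```python
import ast

# Toy program (hypothetical). In a type prediction prompt, the annotation
# on the parameter would be the slot the model is asked to fill in.
SOURCE = "def count_words(text: str) -> int:\n    return len(text.split())\n"


class RenameVariable(ast.NodeTransformer):
    """Rename one identifier wherever it appears as a parameter or a name."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_arg(self, node):
        if node.arg == self.old:
            node.arg = self.new
        return self.generic_visit(node)

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


def rename(source, old, new):
    """Return the source with `old` renamed to `new` (requires Python 3.9+)."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def run(source, func_name, *args):
    """Execute the source and call one of the functions it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


# Assumed edit: rename the parameter `text` to the less informative `x`.
edited = rename(SOURCE, "text", "x")

# Same behaviour before and after the edit, different surface form.
same_result = run(SOURCE, "count_words", "a b c") == run(edited, "count_words", "a b c")
```

The parameter name `text` hints at the type `str`, while `x` does not, which illustrates how an edit that leaves program behaviour untouched can still remove a cue a model might rely on.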

Abstract (original)

CodeLLMs are transforming software development as we know it. This is especially true for tasks where rule-based approaches fall short, like type prediction. The type prediction task consists in adding a new type annotation to a partially typed program, such that the resulting program is closer to being fully typed. The intractability of rule-based approaches and high cost of manual annotation make CodeLLMs an attractive solution to the problem. However, CodeLLMs are still far from being deployed on the large-scale due to doubts surrounding their reliability. To shed some light on how CodeLLMs approach type prediction, we investigate what happens when a model mispredicts a type. We show that by applying semantics-preserving edits to code, CodeLLMs are eventually misled into mispredicting type annotations. However, by leveraging activation steering we are able to ‘steer’ the model back to the correct prediction, making models more robust against semantically irrelevant prompt features. We show that steering achieves comparable performance to fine-tuning directly on the type prediction task. Furthermore, we find that steering vectors computed from Python code are effective at correcting TypeScript mispredictions, and vice versa. To our knowledge, this is the first evidence of its kind to suggest that CodeLLMs learn task representations that transfer across languages.
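
As a rough illustration of activation steering, the sketch below uses one common formulation: the steering vector is the difference between the mean activation on prompts the model handles correctly and the mean activation on prompts it mispredicts, and it is added to a hidden state at inference time. This formulation is an assumption; the abstract does not say how the authors compute or apply their vectors. The activations are toy three-dimensional lists, and the names `mean_vector`, `steering_vector`, `steer`, and `alpha` are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(positive, negative):
    """Mean activation on positive examples minus mean on negative examples."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def steer(hidden, vector, alpha=1.0):
    """Shift a hidden state along the steering vector, scaled by alpha."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations (made up for illustration, not taken from any model).
correct_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
mispredicted_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(correct_acts, mispredicted_acts)

# Apply the vector to a hidden state that resembles the mispredicted group.
steered = steer([0.0, 2.0, 2.0], v)
```

In a real model the same arithmetic would run on the activations of a chosen layer, and the cross-language result in the abstract corresponds to computing `v` from prompts in one language and applying it to prompts in another.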

arXiv information

Authors	Francesca Lucchetti, Arjun Guha
Published	2024-09-13 14:56:46+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
