A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers

Summary

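This paper proposes a methodology for generating and perturbing detailed derivations of equations at scale, with the help of a symbolic engine, in order to evaluate how well Transformers generalise to out-of-distribution mathematical reasoning problems.
The framework is instantiated for sequence classification tasks and used to compare GPT-4, GPT-3.5, and a canon of fine-tuned BERT models, examining how specific operators relate to generalisation failure when reasoning aspects such as symmetry and variable surface forms are perturbed.
Surprisingly, the empirical evaluation shows that the average in-distribution performance of the fine-tuned models surpasses GPT-3.5 and rivals GPT-4.
However, perturbations to the input reasoning can reduce their performance by up to 80 F1 points.
Overall, the results suggest that smaller open-source models may rival GPT in-distribution when appropriately structured derivation dependencies are incorporated during training, and they highlight a weakness shared by BERT and GPT: a relative inability to decode indirect references to mathematical entities.
The authors release the full codebase, the constructed datasets, and the fine-tuned models to encourage future progress in the field.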

Abstract (original)

This paper proposes a methodology for generating and perturbing detailed derivations of equations at scale, aided by a symbolic engine, to evaluate the generalisability of Transformers to out-of-distribution mathematical reasoning problems. Instantiating the framework in the context of sequence classification tasks, we compare the capabilities of GPT-4, GPT-3.5, and a canon of fine-tuned BERT models, exploring the relationship between specific operators and generalisation failure via the perturbation of reasoning aspects such as symmetry and variable surface forms. Surprisingly, our empirical evaluation reveals that the average in-distribution performance of fine-tuned models surpasses GPT-3.5, and rivals GPT-4. However, perturbations to input reasoning can reduce their performance by up to 80 F1 points. Overall, the results suggest that the in-distribution performance of smaller open-source models may potentially rival GPT by incorporating appropriately structured derivation dependencies during training, and highlight a shared weakness between BERT and GPT involving a relative inability to decode indirect references to mathematical entities. We release the full codebase, constructed datasets, and fine-tuned models to encourage future progress in the field.
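
To make the two perturbations named in the abstract concrete, the following is a toy sketch of changing variable surface forms and applying equality symmetry to the steps of a derivation. It is not the authors' implementation: the paper relies on a symbolic engine, whereas this sketch uses plain string manipulation, and the function names `rename_variables` and `swap_sides`, as well as the example derivation, are hypothetical.

```python
import re


def rename_variables(equation: str, mapping: dict[str, str]) -> str:
    """Replace variable names simultaneously, matching whole tokens only."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], equation)


def swap_sides(equation: str) -> str:
    """Apply the symmetry of equality: 'lhs = rhs' becomes 'rhs = lhs'."""
    lhs, rhs = equation.split("=", 1)
    return f"{rhs.strip()} = {lhs.strip()}"


# A hypothetical two-step derivation (assumed for illustration only).
derivation = ["y = x + a", "y - a = x"]

# Perturbation 1: change the surface form of the variables.
renamed = [rename_variables(step, {"x": "u", "y": "v"}) for step in derivation]

# Perturbation 2: swap the two sides of every equation.
swapped = [swap_sides(step) for step in derivation]

print(renamed)
print(swapped)
```

Both perturbations leave the mathematical content of each step unchanged, which is why a drop in classification performance on the perturbed inputs can be read as a generalisation failure rather than a change in the task.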

arXiv information

Authors	Jordan Meadows, Marco Valentino, Damien Teney, Andre Freitas
Published	2024-04-08 14:29:06+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
