MaxPoolBERT: Enhancing BERT Classification via Layer- and Token-Wise Aggregation

Abstract

The [CLS] token in BERT is commonly used as a fixed-length representation for classification tasks, yet prior work has shown that both other tokens and intermediate layers encode valuable contextual information. In this work, we propose MaxPoolBERT, a lightweight extension to BERT that refines the [CLS] representation by aggregating information across layers and tokens. Specifically, we explore three modifications: (i) max-pooling the [CLS] token across multiple layers, (ii) enabling the [CLS] token to attend over the entire final layer using an additional multi-head attention (MHA) layer, and (iii) combining max-pooling across the full sequence with MHA. Our approach enhances BERT’s classification accuracy (especially on low-resource tasks) without requiring pre-training or significantly increasing model size. Experiments on the GLUE benchmark show that MaxPoolBERT consistently outperforms the standard BERT-base model.
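The three aggregation variants named in the abstract can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: hidden states are plain nested lists (layers × tokens × hidden size) with [CLS] at token index 0, the number of pooled layers `k` is a hypothetical parameter, and the multi-head attention uses identity query/key/value projections in place of the learned weights a real MHA layer would have. In a real model the per-layer states would come from a BERT encoder run with all hidden states returned, and the resulting vector would feed a classification head (not shown).

```python
import math
from typing import List

Vector = List[float]
Layer = List[Vector]  # one vector per token; token 0 is assumed to be [CLS]


def max_pool(vectors: List[Vector]) -> Vector:
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(values) for values in zip(*vectors)]


def layerwise_cls_max_pool(hidden_states: List[Layer], k: int) -> Vector:
    """(i) Max-pool the [CLS] vector over the last k layers (k is hypothetical)."""
    return max_pool([layer[0] for layer in hidden_states[-k:]])


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_multi_head_attention(sequence: Layer, num_heads: int) -> Vector:
    """(ii) The [CLS] vector queries every token of `sequence`, per head.

    Assumption: identity Q/K/V projections stand in for learned weights,
    so each head is scaled dot-product attention on a slice of the vector.
    """
    d = len(sequence[0])
    assert d % num_heads == 0, "hidden size must divide evenly into heads"
    head_dim = d // num_heads
    query = sequence[0]
    output: Vector = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = query[lo:hi]
        # Scaled dot-product scores of the [CLS] query against each token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, tok[lo:hi])) / math.sqrt(head_dim)
            for tok in sequence
        ]
        weights = softmax(scores)
        # Weighted sum of this head's value slices; heads are concatenated in order.
        for j in range(lo, hi):
            output.append(sum(w * tok[j] for w, tok in zip(weights, sequence)))
    return output


def sequence_max_pool_then_mha(hidden_states: List[Layer], k: int, num_heads: int) -> Vector:
    """(iii) Max-pool every token over the last k layers, then apply (ii)."""
    last = hidden_states[-k:]
    num_tokens = len(last[0])
    pooled = [max_pool([layer[t] for layer in last]) for t in range(num_tokens)]
    return cls_multi_head_attention(pooled, num_heads)


# Toy input: 3 layers, 2 tokens ([CLS] plus one word), hidden size 4.
hidden = [
    [[0.1, 0.5, -0.2, 0.0], [1.0, 0.0, 0.3, 0.2]],
    [[0.4, 0.2, 0.1, -0.3], [0.0, 0.7, 0.5, 0.1]],
    [[0.3, 0.6, -0.1, 0.2], [0.2, 0.1, 0.9, 0.4]],
]
print(layerwise_cls_max_pool(hidden, k=3))  # [0.4, 0.6, 0.1, 0.2]
print(sequence_max_pool_then_mha(hidden, k=2, num_heads=2))
```

Max-pooling keeps, for each hidden dimension, the strongest activation seen across layers (or tokens), so variant (i) adds no parameters; only the MHA-based variants (ii) and (iii) would add trainable weights once real projections replace the identity ones.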

arXiv information

Authors	Maike Behrendt, Stefan Sylvius Wagner, Stefan Harmeling
Published	2025-05-21 16:10:02+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
