TaCo: Targeted Concept Removal in Output Embeddings for NLP via Information Theory and Explainability

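Abstract

The fairness of natural language processing (NLP) models has emerged as a major concern.
According to information theory, achieving fairness requires that a model be unable to predict sensitive variables such as gender, ethnicity, and age.
However, information related to these variables often appears implicitly in language, which makes it difficult to identify and mitigate bias effectively.
To address this problem, we present a new approach that operates at the embedding level of an NLP model and does not depend on a specific architecture.
Our method draws on insights from recent advances in XAI techniques and applies an embedding transformation to remove implicit information about a selected variable.
By directly manipulating the final-layer embeddings, our approach can be integrated seamlessly into existing models without significant modification or retraining.
In our evaluation, we show that the proposed post-hoc approach significantly reduces gender-related associations in NLP models while preserving the models' overall performance and functionality.
An implementation of our method is available: https://github.com/fanny-jourdan/TaCo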
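
To make the idea of an embedding transformation that removes information about a selected variable more concrete, here is a minimal sketch. It is not the TaCo algorithm, whose details are not given in this abstract. It assumes a much simpler stand-in technique: estimate one direction as the difference between the mean embeddings of two groups, then project that direction out of every final-layer embedding. The function names, the toy vectors, and the binary labels are all hypothetical and exist only for illustration.

```python
# Illustrative sketch only: mean-difference projection, NOT the TaCo algorithm.
from math import sqrt


def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_direction(embeddings, labels):
    """Unit vector pointing from the mean of group 0 to the mean of group 1."""
    group0 = [e for e, y in zip(embeddings, labels) if y == 0]
    group1 = [e for e, y in zip(embeddings, labels) if y == 1]
    diff = [b - a for a, b in zip(mean_vector(group0), mean_vector(group1))]
    norm = sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("group means coincide; no direction to remove")
    return [d / norm for d in diff]


def remove_direction(embedding, direction):
    """Subtract the component of `embedding` along the unit vector `direction`."""
    coeff = sum(e * d for e, d in zip(embedding, direction))
    return [e - coeff * d for e, d in zip(embedding, direction)]


# Hypothetical toy data: 3-dimensional "final-layer embeddings" and a binary
# sensitive attribute. These numbers are made up for illustration.
embeddings = [
    [1.0, 0.2, 0.5],
    [0.9, 0.1, 0.4],
    [-1.0, 0.3, 0.5],
    [-1.1, 0.2, 0.6],
]
labels = [1, 1, 0, 0]

direction = concept_direction(embeddings, labels)
cleaned = [remove_direction(e, direction) for e in embeddings]
```

After the projection, every cleaned embedding has (numerically) zero component along the estimated direction, so a classifier that relied only on that direction could no longer separate the two groups. A single linear direction is a strong simplification: implicit information can be spread across many directions, which is the difficulty the abstract points to.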

Abstract (original)

The fairness of Natural Language Processing (NLP) models has emerged as a crucial concern. Information theory indicates that to achieve fairness, a model should not be able to predict sensitive variables, such as gender, ethnicity, and age. However, information related to these variables often appears implicitly in language, posing a challenge in identifying and mitigating biases effectively. To tackle this issue, we present a novel approach that operates at the embedding level of an NLP model, independent of the specific architecture. Our method leverages insights from recent advances in XAI techniques and employs an embedding transformation to eliminate implicit information from a selected variable. By directly manipulating the embeddings in the final layer, our approach enables a seamless integration into existing models without requiring significant modifications or retraining. In evaluation, we show that the proposed post-hoc approach significantly reduces gender-related associations in NLP models while preserving the overall performance and functionality of the models. An implementation of our method is available: https://github.com/fanny-jourdan/TaCo

arXiv information

Authors	Fanny Jourdan, Louis Béthune, Agustin Picard, Laurent Risser, Nicholas Asher
Published	2024-04-12 15:50:14+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
