Speech language models lack important brain-relevant semantics

Summary

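Although reading and listening are known to differ in the brain, recent studies show that text-based language models predict both text-evoked and speech-evoked brain activity remarkably well.
This raises the question of what kinds of information language models are actually predicting in the brain.
We examine this question with a direct approach: we remove information tied to specific low-level stimulus features (textual, speech, and visual) from the language model representations and observe how this intervention changes their alignment with fMRI brain recordings.
The recordings were acquired while participants either read or listened to the same naturalistic stories.
We also contrast these findings with speech-based language models, which should predict speech-evoked brain activity more accurately if they model language processing in the brain well.
With this approach, we find that text-based and speech-based language models both align well with early sensory regions, and that this alignment is due to shared low-level features.
Text-based models still align well with later language regions after these features are removed, whereas speech-based models, surprisingly, lose most of their alignment.
These findings suggest that speech-based models can be improved further to better reflect brain-like language processing.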

Abstract (original)

Despite known differences between reading and listening in the brain, recent work has shown that text-based language models predict both text-evoked and speech-evoked brain activity to an impressive degree. This poses the question of what types of information language models truly predict in the brain. We investigate this question via a direct approach, in which we eliminate information related to specific low-level stimulus features (textual, speech, and visual) in the language model representations, and observe how this intervention affects the alignment with fMRI brain recordings acquired while participants read versus listened to the same naturalistic stories. We further contrast our findings with speech-based language models, which would be expected to predict speech-evoked brain activity better, provided they model language processing in the brain well. Using our direct approach, we find that both text-based and speech-based language models align well with early sensory regions due to shared low-level features. Text-based models continue to align well with later language regions even after removing these features, while, surprisingly, speech-based models lose most of their alignment. These findings suggest that speech-based models can be further improved to better reflect brain-like language processing.
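
The intervention described in the abstract can be illustrated with a small sketch. The abstract does not state how the low-level information is removed or how alignment is scored, so the code below assumes one plausible realization: the low-level features are regressed out of the model representations with ridge regression, and alignment is the held-out Pearson correlation of a ridge encoding model. All function names, the regularization strength, and the synthetic data are hypothetical and for illustration only; they are not the authors' pipeline or results.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights minimizing ||XW - Y||^2 + alpha * ||W||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def remove_features(model_repr, low_level, alpha=1.0):
    """Return the part of model_repr that low_level cannot linearly predict.

    Assumption: "eliminating" low-level information is done by residualizing.
    For brevity the regression is fit on all samples at once.
    """
    W = ridge_fit(low_level, model_repr, alpha)
    return model_repr - low_level @ W


def brain_alignment(features, voxels, n_train, alpha=1.0):
    """Fit an encoding model on the first n_train samples and return the
    per-voxel Pearson correlation on the remaining held-out samples."""
    W = ridge_fit(features[:n_train], voxels[:n_train], alpha)
    pred = features[n_train:] @ W
    true = voxels[n_train:]
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    denom = np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0)
    return (pred_c * true_c).sum(axis=0) / denom


# Synthetic toy data (hypothetical, not fMRI): 400 time points.
rng = np.random.default_rng(0)
n, n_train = 400, 300
low_level = rng.standard_normal((n, 3))   # stand-in for low-level stimulus features
semantic = rng.standard_normal((n, 3))    # stand-in for higher-level information

# Toy "language model" representation carrying both kinds of information.
model_repr = np.hstack([low_level, semantic]) + 0.1 * rng.standard_normal((n, 6))

# Two toy voxels: column 0 is driven by low-level features only,
# column 1 is driven by the semantic features only.
sensory_voxel = low_level @ np.array([1.0, -1.0, 0.5])
language_voxel = semantic @ np.array([0.5, 1.0, -1.0])
voxels = np.column_stack([sensory_voxel, language_voxel])
voxels = voxels + 0.1 * rng.standard_normal((n, 2))

r_before = brain_alignment(model_repr, voxels, n_train)
residual_repr = remove_features(model_repr, low_level)
r_after = brain_alignment(residual_repr, voxels, n_train)

print("alignment before removal (sensory, language):", r_before)
print("alignment after removal  (sensory, language):", r_after)
```

In this toy setup, the sensory-like voxel loses its alignment once the low-level features are regressed out, while the language-like voxel keeps it. That is the pattern the abstract reports for text-based models in later language regions; a representation whose alignment rested mostly on low-level features would instead lose alignment everywhere.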

arXiv information

Authors	Subba Reddy Oota, Emin Çelik, Fatma Deniz, Mariya Toneva
Published	2023-11-08 13:11:48+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
