Are Lexicon-Based Tools Still the Gold Standard for Valence Analysis in Low-Resource Flemish?

Abstract

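Understanding the nuances of everyday language is pivotal for progress in computational linguistics and emotion research.
Traditional lexicon-based tools such as LIWC and Pattern have long served as foundational instruments in this domain.
LIWC is the most extensively validated word-count-based text analysis tool in the social sciences, and Pattern is an open-source Python library offering NLP functionality.
However, everyday language is inherently spontaneous, richly expressive, and deeply context dependent.
To explore how well LLMs capture the valence of daily narratives in Flemish, we first conducted a study that collected approximately 25,000 textual responses from 102 Dutch-speaking participants.
Each participant provided narratives prompted by the question "What is happening right now and how do you feel about it?", accompanied by self-assessed valence ratings on a continuous scale from -50 to +50.
We then assessed the performance of three Dutch-specific LLMs in predicting these valence scores and compared their outputs with those generated by LIWC and Pattern.
Our findings indicate that, despite advances in LLM architectures, these Dutch-tuned models currently fall short of accurately capturing the emotional valence present in spontaneous, real-world narratives.
This study underscores the need to develop culturally and linguistically tailored models and tools that can handle the complexities of natural language use.
Improving automated valence analysis is pivotal for advancing computational methodology, and it also holds significant promise for psychological research by offering ecologically valid insights into people's daily experiences.
We advocate increased efforts to create comprehensive datasets and fine-tune LLMs for low-resource languages such as Flemish, with the aim of bridging the gap between computational linguistics and emotion research.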

Abstract (original)

Understanding the nuances in everyday language is pivotal for advancements in computational linguistics & emotions research. Traditional lexicon-based tools such as LIWC and Pattern have long served as foundational instruments in this domain. LIWC is the most extensively validated word count based text analysis tool in the social sciences and Pattern is an open source Python library offering functionalities for NLP. However, everyday language is inherently spontaneous, richly expressive, & deeply context dependent. To explore the capabilities of LLMs in capturing the valences of daily narratives in Flemish, we first conducted a study involving approximately 25,000 textual responses from 102 Dutch-speaking participants. Each participant provided narratives prompted by the question, ‘What is happening right now and how do you feel about it?’, accompanied by self-assessed valence ratings on a continuous scale from -50 to +50. We then assessed the performance of three Dutch-specific LLMs in predicting these valence scores, and compared their outputs to those generated by LIWC and Pattern. Our findings indicate that, despite advancements in LLM architectures, these Dutch tuned models currently fall short in accurately capturing the emotional valence present in spontaneous, real-world narratives. This study underscores the imperative for developing culturally and linguistically tailored models/tools that can adeptly handle the complexities of natural language use. Enhancing automated valence analysis is not only pivotal for advancing computational methodologies but also holds significant promise for psychological research with ecologically valid insights into human daily experiences. We advocate for increased efforts in creating comprehensive datasets & finetuning LLMs for low-resource languages like Flemish, aiming to bridge the gap between computational linguistics & emotion research.
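
As a rough illustration of what a word-count-based valence tool does, the sketch below averages the polarity of the lexicon words found in a narrative and rescales the result to the study's -50 to +50 range. It is not the LIWC or Pattern implementation: the lexicon entries, their polarity values, the function name `lexicon_valence`, and the choice to score texts with no matches as neutral are all assumptions made for this example.

```python
import re

# Hypothetical toy lexicon (NOT taken from LIWC or Pattern):
# Dutch word -> assumed polarity in [-1, 1].
TOY_LEXICON = {
    "blij": 1.0,         # happy
    "leuk": 0.75,        # nice, fun
    "moe": -0.5,         # tired
    "boos": -1.0,        # angry
    "verdrietig": -1.0,  # sad
}


def lexicon_valence(text, lexicon=TOY_LEXICON, scale=50.0):
    """Average the polarity of matched words, rescaled to [-scale, +scale]."""
    tokens = re.findall(r"\w+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        # Assumption: a text with no lexicon words is treated as neutral.
        return 0.0
    return scale * sum(hits) / len(hits)


if __name__ == "__main__":
    for narrative in ["Ik ben blij", "Ik ben moe en boos", "Ik zit in de trein"]:
        print(narrative, "->", lexicon_valence(narrative))
```

Because each word is scored in isolation, negation ("niet blij"), irony, and context are invisible to a scorer of this kind, which is the limitation of lexicon-based tools that the abstract points to for spontaneous everyday language.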

arXiv information

Authors	Ratna Kandala, Katie Hoemann
Published	2025-06-04 16:31:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
