Author-Specific Linguistic Patterns Unveiled: A Deep Learning Study on Word Class Distributions

Abstract

Deep learning methods have been increasingly applied to computational linguistics to uncover patterns in text data. This study investigates author-specific word class distributions using part-of-speech (POS) tagging and bigram analysis. By leveraging deep neural networks, we classify literary authors based on POS tag vectors and bigram frequency matrices derived from their works. We employ fully connected and convolutional neural network architectures to explore the efficacy of unigram and bigram-based representations. Our results demonstrate that while unigram features achieve moderate classification accuracy, bigram-based models significantly improve performance, suggesting that sequential word class patterns are more distinctive of authorial style. Multi-dimensional scaling (MDS) visualizations reveal meaningful clustering of authors’ works, supporting the hypothesis that stylistic nuances can be captured through computational methods. These findings highlight the potential of deep learning and linguistic feature analysis for author profiling and literary studies.
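
The unigram and bigram representations mentioned in the abstract can be illustrated with a minimal sketch. It assumes the text has already been POS-tagged. The tag set `TAGS`, the helper names `unigram_vector` and `bigram_matrix`, and the normalization to relative frequencies are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

# Hypothetical coarse tag set; the paper's actual tag inventory is not stated here.
TAGS = ["NOUN", "VERB", "ADJ", "DET", "OTHER"]

def unigram_vector(tags, tagset=TAGS):
    """Relative frequency of each POS tag, in tagset order."""
    counts = Counter(tags)
    total = len(tags)
    return [counts[t] / total if total else 0.0 for t in tagset]

def bigram_matrix(tags, tagset=TAGS):
    """M[i][j] = relative frequency of tag i immediately followed by tag j."""
    counts = Counter(zip(tags, tags[1:]))  # consecutive tag pairs
    total = max(len(tags) - 1, 0)          # number of bigrams in the sequence
    return [
        [counts[(a, b)] / total if total else 0.0 for b in tagset]
        for a in tagset
    ]

# Toy pre-tagged sequence for "the old dog chased the cat" (assumed tagging).
sample = ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]
uni = unigram_vector(sample)  # one value per tag
bi = bigram_matrix(sample)    # len(TAGS) x len(TAGS) matrix
```

A vector like `uni` would suit a fully connected network, while a matrix like `bi` has a two-dimensional layout that a convolutional network can consume. The unigram vector discards tag order, whereas the bigram matrix retains which word class follows which, consistent with the abstract's suggestion that sequential patterns are more distinctive of authorial style.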

arXiv information

Authors	Patrick Krauss, Achim Schilling
Published	2025-01-17 09:43:49+00:00

