Are Information Retrieval Approaches Good at Harmonising Longitudinal Survey Questions in Social Science?

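Summary

Automated detection of semantically equivalent questions in longitudinal social science surveys is crucial for long-term studies that inform empirical research in the social, economic, and health sciences.
Retrieving equivalent questions faces two challenges: theoretical constructs (i.e. concepts and sub-concepts) are represented inconsistently across studies and between questions and their response options, and the vocabulary and structure of longitudinal text evolve over time.
To address these challenges, a multi-disciplinary collaboration of computer scientists and survey specialists presents a new information retrieval (IR) task: identifying concept equivalence (e.g. Housing, Job) across questions and response options in order to harmonise longitudinal population studies.
The paper investigates multiple unsupervised approaches on a survey dataset spanning 1946 to 2020, including probabilistic models, linear probing of language models, and pre-trained neural networks specialised for IR.
The IR-specialised neural models achieve the highest overall performance, with the other approaches performing comparably.
In addition, re-ranking the probabilistic model's results with neural models yields only a modest improvement in F1-score, 0.07 at most.
A qualitative post-hoc evaluation by survey specialists shows that the models generally have low sensitivity to questions with high lexical overlap, particularly when the sub-concepts are mismatched.
Altogether, the analysis serves to further research on harmonising longitudinal studies in social science.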

Summary (original)

Automated detection of semantically equivalent questions in longitudinal social science surveys is crucial for long-term studies informing empirical research in the social, economic, and health sciences. Retrieving equivalent questions faces dual challenges: inconsistent representation of theoretical constructs (i.e. concept/sub-concept) across studies as well as between question and response options, and the evolution of vocabulary and structure in longitudinal text. To address these challenges, our multi-disciplinary collaboration of computer scientists and survey specialists presents a new information retrieval (IR) task of identifying concept (e.g. Housing, Job, etc.) equivalence across question and response options to harmonise longitudinal population studies. This paper investigates multiple unsupervised approaches on a survey dataset spanning 1946-2020, including probabilistic models, linear probing of language models, and pre-trained neural networks specialised for IR. We show that IR-specialised neural models achieve the highest overall performance with other approaches performing comparably. Additionally, the re-ranking of the probabilistic model’s results with neural models only introduces modest improvements of 0.07 at most in F1-score. Qualitative post-hoc evaluation by survey specialists shows that models generally have a low sensitivity to questions with high lexical overlap, particularly in cases where sub-concepts are mismatched. Altogether, our analysis serves to further research on harmonising longitudinal studies in social science.
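
The retrieve-and-evaluate setup described in the abstract can be illustrated with a minimal sketch. It assumes BM25 as the probabilistic model (the abstract does not name one), uses four invented survey questions rather than the paper's dataset, and scores the top-2 retrieval with F1. The function names `tokenize`, `bm25_scores`, and `f1_score` are hypothetical helpers written for this illustration, not the authors' code.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    tokens = (t.strip(".,?!").lower() for t in text.split())
    return [t for t in tokens if t]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a plain BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def f1_score(retrieved, relevant):
    """F1 between a retrieved set and a relevant set of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(retrieved)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Invented toy questions; ids 0 and 1 are treated as the same concept.
questions = [
    "Do you own or rent your home?",
    "Is your accommodation owned or rented?",
    "What is your current job title?",
    "How many hours do you work each week?",
]
relevant_ids = [0, 1]

query = "Do you rent or own the home you live in?"
scores = bm25_scores(query, questions)
top2 = sorted(range(len(questions)), key=lambda i: scores[i], reverse=True)[:2]
print("top-2 ids:", top2, "F1:", f1_score(top2, relevant_ids))
```

In a re-ranking setup, the candidates returned by this first stage would be rescored by a neural model before F1 is computed; that stage is omitted here because it depends on a specific pre-trained model that the abstract does not identify.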

arXiv information

Authors	Wing Yan Li, Zeqiang Wang, Jon Johnson, Suparna De
Published	2025-04-29 12:00:33+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
