Predicting the Geolocation of Tweets Using transformer models on Customized Data

Summary

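This research addresses the task of predicting the geolocation of tweets and users, and provides a flexible methodology for geotagging large volumes of text data.
The proposed approach uses neural networks for natural language processing (NLP) to estimate locations as coordinate pairs (longitude, latitude) and as two-dimensional Gaussian Mixture Models (GMMs).
The proposed models are fine-tuned on a Twitter dataset, with pretrained Bidirectional Encoder Representations from Transformers (BERT) as base models.
For models trained and evaluated on the text features of tweet content and metadata context, performance metrics show a median error of less than 30 km on the worldwide dataset and less than 15 km on the US dataset.
The source code and data are available at https://github.com/K4TEL/geo-twitter.git.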
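
The headline metric above is a median error in kilometers between predicted and true locations. As a rough illustration only, the sketch below computes such a median with the haversine great-circle formula. The summary does not say which distance formula the authors use, so the haversine choice, the Earth radius constant, and the function names `haversine_km` and `median_error_km` are all assumptions of this example, not the paper's implementation.

```python
import math
from statistics import median

# Mean Earth radius in km; an assumption of this sketch, not a value from the paper.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(predicted, actual):
    """Median distance in km over paired lists of (longitude, latitude) tuples."""
    return median(haversine_km(*p, *t) for p, t in zip(predicted, actual))
```

Coordinates are passed in (longitude, latitude) order to match the convention stated in the summary. A median is less sensitive than a mean to a few predictions that land on the wrong continent, which is one plausible reason to report it for this kind of task.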

Abstract (original)

This research is aimed to solve the tweet/user geolocation prediction task and provide a flexible methodology for the geotagging of textual big data. The suggested approach implements neural networks for natural language processing (NLP) to estimate the location as coordinate pairs (longitude, latitude) and two-dimensional Gaussian Mixture Models (GMMs). The scope of proposed models has been finetuned on a Twitter dataset using pretrained Bidirectional Encoder Representations from Transformers (BERT) as base models. Performance metrics show a median error of fewer than 30 km on a worldwide-level, and fewer than 15 km on the US-level datasets for the models trained and evaluated on text features of tweets’ content and metadata context. Our source code and data are available at https://github.com/K4TEL/geo-twitter.git

arXiv information

Authors	Kateryna Lutsai, Christoph H. Lampert
Published	2024-08-01 16:14:04+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
