Multi-Modal Video Feature Extraction for Popularity Prediction

Summary

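This study aims to predict the popularity of short videos from the videos themselves and their associated features. Popularity is measured by four key engagement metrics: view count, like count, comment count, and share count. Video classification models with different architectures and training methods serve as backbone networks for extracting video-modality features. In parallel, cleaned video captions are combined with the video in a carefully designed prompt framework and fed to video-to-text generation models, which produce detailed textual descriptions of the video content. These texts are encoded into vectors with a pre-trained BERT model. From the resulting six sets of vectors, one neural network is trained for each of the four target metrics. The study also performs data mining and feature engineering on the video and tabular data, constructing practical features such as the total frequency of hashtag appearances, the total frequency of mention appearances, video duration, frame count, frame rate, and total time online. Several machine learning models are trained on these features, and XGBoost is selected as the most stable. Finally, the neural network and XGBoost predictions are averaged to give the final result.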
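The caption-derived features named above (total frequency of hashtag and mention appearances) could be computed along the following lines. This is only a minimal sketch under one possible reading of the abstract: each caption's feature is taken to be the sum of the corpus-wide frequencies of the hashtags (or mentions) it contains. The function name `tag_frequency_features`, the regular expressions, and the output keys are hypothetical and do not come from the paper.

```python
import re
from collections import Counter

# Assumed token patterns; the paper does not specify how tags are parsed.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def tag_frequency_features(captions):
    """For each caption, sum the corpus-wide frequency of its hashtags and mentions.

    This is one interpretation of "total frequency of hashtag/mention
    appearances", not the authors' confirmed definition.
    """
    hashtag_lists = [[h.lower() for h in HASHTAG_RE.findall(c)] for c in captions]
    mention_lists = [[m.lower() for m in MENTION_RE.findall(c)] for c in captions]

    # Count how often each tag occurs across the whole caption collection.
    hashtag_freq = Counter(h for tags in hashtag_lists for h in tags)
    mention_freq = Counter(m for ms in mention_lists for m in ms)

    return [
        {
            "hashtag_freq_sum": sum(hashtag_freq[h] for h in tags),
            "mention_freq_sum": sum(mention_freq[m] for m in ms),
        }
        for tags, ms in zip(hashtag_lists, mention_lists)
    ]
```

Features of this kind would sit alongside the video-level ones (duration, frame count, frame rate, total time online) as tabular inputs to a tree-based model such as XGBoost.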

Abstract (original)

This work aims to predict the popularity of short videos using the videos themselves and their related features. Popularity is measured by four key engagement metrics: view count, like count, comment count, and share count. This study employs video classification models with different architectures and training methods as backbone networks to extract video modality features. Meanwhile, the cleaned video captions are incorporated into a carefully designed prompt framework, along with the video, as input for video-to-text generation models, which generate detailed text-based video content understanding. These texts are then encoded into vectors using a pre-trained BERT model. Based on the six sets of vectors mentioned above, a neural network is trained for each of the four prediction metrics. Moreover, the study conducts data mining and feature engineering based on the video and tabular data, constructing practical features such as the total frequency of hashtag appearances, the total frequency of mention appearances, video duration, frame count, frame rate, and total time online. Multiple machine learning models are trained, and the most stable model, XGBoost, is selected. Finally, the predictions from the neural network and XGBoost models are averaged to obtain the final result.
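The final step, averaging the neural network and XGBoost predictions, can be illustrated with a short sketch. It assumes an unweighted mean computed independently for each of the four metrics; the abstract says only that the predictions are averaged, so the equal weighting, the metric key names, and the function names below are illustrative assumptions.

```python
# Hypothetical keys for the four engagement metrics named in the abstract.
METRICS = ("view_count", "like_count", "comment_count", "share_count")


def average_predictions(nn_preds, xgb_preds):
    """Element-wise unweighted mean of two equally long prediction lists."""
    if len(nn_preds) != len(xgb_preds):
        raise ValueError("prediction lists must have the same length")
    return [(a + b) / 2.0 for a, b in zip(nn_preds, xgb_preds)]


def ensemble(nn_by_metric, xgb_by_metric):
    """Average the two models' predictions separately for each metric."""
    return {
        metric: average_predictions(nn_by_metric[metric], xgb_by_metric[metric])
        for metric in METRICS
    }
```

For instance, `average_predictions([1.0, 3.0], [3.0, 5.0])` returns `[2.0, 4.0]`.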

arXiv information

Authors	Haixu Liu, Wenning Wang, Haoxiang Zheng, Penghao Jiang, Qirui Wang, Ruiqing Yan, Qiuzhuang Sun
Published	2025-01-02 18:59:36+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
