Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023)

Summary

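Video captioning in Nepali, a language written in the Devanagari script, poses a particular challenge because there is little existing academic work in this area.
To address this, the study develops a new encoder-decoder approach for Nepali video captioning.
The model uses LSTM and GRU sequence-to-sequence models to generate relevant textual descriptions from features extracted from video frames with a CNN.
The Nepali video captioning dataset is created from the Microsoft Research Video Description Corpus (MSVD) using Google Translate followed by manual post-editing.
The effectiveness of the model for Devanagari-script video captioning is demonstrated by evaluating its performance with BLEU, METEOR, and ROUGE.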
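The summary says a CNN extracts features from video frames and LSTM/GRU models decode them into text, and the title says the model is attention-based, but the attention variant is not specified. As a rough illustration only, the sketch below assumes plain dot-product attention between the decoder's hidden state and one feature vector per sampled frame. The function names `attend`, `softmax`, and `dot` are hypothetical, and the numbers are toy values, not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(hidden: Vector, frame_features: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention over per-frame CNN features (assumed variant).

    hidden         -- current decoder (LSTM/GRU) hidden state, length d
    frame_features -- one CNN feature vector per sampled frame, T vectors of length d
    Returns (weights, context): weights sum to 1 over the T frames, and context
    is the attention-weighted sum of frame features for the next decoder step.
    """
    if not frame_features:
        raise ValueError("need at least one frame feature vector")
    scores = [dot(hidden, f) for f in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    context = [
        sum(w * f[i] for w, f in zip(weights, frame_features)) for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy example: 3 frames with made-up 2-dimensional "CNN features".
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    h = [2.0, 0.0]
    w, c = attend(h, frames)
    print([round(x, 3) for x in w], [round(x, 3) for x in c])
```

In a full model the context vector would typically be concatenated with the previous word embedding and fed to the LSTM or GRU cell at each decoding step; frames whose features align with the current hidden state receive larger weights.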

Summary (original)

Video captioning in Nepali, a language written in the Devanagari script, presents a unique challenge due to the lack of existing academic work in this domain. This work develops a novel encoder-decoder paradigm for Nepali video captioning to tackle this difficulty. LSTM and GRU sequence-to-sequence models are used to generate relevant textual descriptions from features extracted from video frames with CNNs. A Nepali video captioning dataset is generated from the Microsoft Research Video Description Corpus (MSVD) using Google Translate followed by manual post-editing. The efficacy of the model for Devanagari-script video captioning is demonstrated using BLEU, METEOR, and ROUGE scores.
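The abstract names BLEU, METEOR, and ROUGE but not the tool or settings used to compute them. To show roughly what BLEU measures, here is a minimal sentence-level BLEU sketch, assuming whitespace tokenization (which also splits Devanagari captions on spaces), uniform weights up to 4-grams, clipping against multiple reference captions, and no smoothing. `sentence_bleu` and `ngrams` are hypothetical names; this is not the authors' evaluation code.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, references: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (sketch)."""
    cand = candidate.split()  # assumed whitespace tokenization
    refs = [r.split() for r in references]
    if not cand or not refs:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram count by its max count in any single reference.
        max_ref: Counter = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty uses the reference length closest to the candidate length
    # (ties broken toward the shorter reference).
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    # Made-up Nepali caption and references, for illustration only.
    refs = ["एक मानिस गितार बजाउँदैछ", "एक केटा गितार बजाउँदैछ"]
    print(sentence_bleu("एक मानिस गितार बजाउँदैछ", refs))
```

Without smoothing, a short caption with no matching 4-gram scores 0, which is one reason reported results are usually computed at corpus level or with a smoothed variant in a standard toolkit.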

arXiv information

Authors	Kabita Parajuli, Shashidhar Ram Joshi
Published	2024-01-02 12:24:25+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
