Image captioning for Brazilian Portuguese using GRIT model

Summary

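This study presents the early development of an image captioning model for Brazilian Portuguese.
To do this, we used the GRIT (Grid- and Region-based Image captioning Transformer) model.
GRIT is a Transformer-only neural architecture that makes effective use of two kinds of visual features to generate better captions.
The GRIT method was proposed as a more efficient way to generate image captions.
In this work, we adapt the GRIT model so that it can be trained on a Brazilian Portuguese dataset, yielding an image captioning method for Brazilian Portuguese.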
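
The summary says GRIT draws on two kinds of visual features, which its name identifies as grid and region features. As a rough illustration of what attending over two feature sets can look like, here is a minimal pure-Python sketch. It is not GRIT's actual architecture: the function names, the concatenation-based fusion, and the toy numbers are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """Scaled dot-product attention of one query vector over feature vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Weighted sum of the feature vectors, one output value per dimension.
    context = [sum(w * feat[i] for w, feat in zip(weights, features)) for i in range(dim)]
    return context, weights

def attend_grid_and_region(query, grid_feats, region_feats):
    """Hypothetical fusion: attend over both feature sets by concatenating them."""
    return attend(query, grid_feats + region_feats)

# Toy 2-dimensional features (made-up numbers, for illustration only).
grid = [[1.0, 0.0], [0.0, 1.0]]
region = [[1.0, 1.0]]
context, weights = attend_grid_and_region([1.0, 0.0], grid, region)
```

Here `weights` holds one attention weight per feature vector (two grid, one region), and `context` is the resulting mixed feature that a caption decoder could consume at one decoding step.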

Abstract (original)

This work presents the early development of a model of image captioning for the Brazilian Portuguese language. We used the GRIT (Grid – and Region-based Image captioning Transformer) model to accomplish this work. GRIT is a Transformer-only neural architecture that effectively utilizes two visual features to generate better captions. The GRIT method emerged as a proposal to be a more efficient way to generate image captioning. In this work, we adapt the GRIT model to be trained in a Brazilian Portuguese dataset to have an image captioning method for the Brazilian Portuguese Language.

arXiv information

Authors	Rafael Silva de Alencar,William Alberto Cruz Castañeda,Marcellus Amadeus
Published	2024-02-07 18:57:37+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
