LaVy: Vietnamese Multimodal Large Language Model

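Summary

Large language models (LLMs) and multimodal large language models (MLLMs) have taken the world by storm with their impressive abilities in complex reasoning and language understanding.
Although there is a large body of work on Vietnamese large language models, the lack of high-quality multimodal resources has limited progress on Vietnamese MLLMs.
This paper takes a first step toward addressing the problem by introducing LaVy, a state-of-the-art Vietnamese MLLM, together with LaVy-Bench, a benchmark designed to evaluate how well MLLMs understand Vietnamese visual language tasks.
Our project is publicly available at https://github.com/baochi0212/LaVy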

Summary (original)

Large Language Models (LLMs) and Multimodal Large language models (MLLMs) have taken the world by storm with impressive abilities in complex reasoning and linguistic comprehension. Meanwhile there are plethora of works related to Vietnamese Large Language Models, the lack of high-quality resources in multimodality limits the progress of Vietnamese MLLMs. In this paper, we pioneer in address this by introducing LaVy, a state-of-the-art Vietnamese MLLM, and we also introduce LaVy-Bench benchmark designated for evaluating MLLMs’s understanding on Vietnamese visual language tasks. Our project is public at https://github.com/baochi0212/LaVy

arXiv information

Authors	Chi Tran, Huong Le Thanh
Published	2024-04-16 15:33:45+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
