PaliGemma: A versatile 3B VLM for transfer

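Summary

PaliGemma is an open vision-language model (VLM) based on the SigLIP-So400m vision encoder and the Gemma-2B language model.
It is trained to be a versatile, broadly knowledgeable base model that transfers effectively.
It achieves strong performance on a wide variety of open-world tasks.
We evaluate PaliGemma on nearly 40 diverse tasks, including standard VLM benchmarks as well as more specialized tasks such as remote sensing and segmentation.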

Abstract (original)

PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.

arXiv information

Authors	Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bauer, Matko Bošnjak, Xi Chen, Matthias Minderer, Paul Voigtlaender, Ioana Bica, Ivana Balazevic, Joan Puigcerver, Pinelopi Papalampidi, Olivier Henaff, Xi Xiong, Radu Soricut, Jeremiah Harmsen, Xiaohua Zhai
Published	2024-07-10 14:57:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
