LaSagnA: Language-based Segmentation Assistant for Complex Queries

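Abstract

Recent advances have enabled Large Language Models for Vision (vLLMs) to generate detailed perceptual outputs such as bounding boxes and masks.
Nevertheless, two limitations restrict the further application of these vLLMs: they cannot handle multiple targets per query, and they fail to recognize when a queried object is absent from the image.
In this study, we identify the insufficient complexity of training queries as the main cause of these problems.
We therefore define a general sequence format for complex queries.
We then incorporate a semantic segmentation task into the current pipeline to meet the training data requirements.
Furthermore, we present three novel strategies to effectively address the challenges that arise from directly integrating the proposed format.
The model's effectiveness in processing complex queries is validated by results comparable to those of conventional methods on both closed-set and open-set semantic segmentation datasets.
In addition, the model outperforms a series of vLLMs on reasoning and referring segmentation, demonstrating its strong capabilities.
The code is released at https://github.com/congvvc/LaSagnA.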

Abstract (original)

Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks. Nonetheless, there are two constraints that restrict the further application of these vLLMs: the incapability of handling multiple targets per query and the failure to identify the absence of query objects in the image. In this study, we acknowledge that the main cause of these problems is the insufficient complexity of training queries. Consequently, we define the general sequence format for complex queries. Then we incorporate a semantic segmentation task in the current pipeline to fulfill the requirements of training data. Furthermore, we present three novel strategies to effectively handle the challenges arising from the direct integration of the proposed format. The effectiveness of our model in processing complex queries is validated by the comparable results with conventional methods on both close-set and open-set semantic segmentation datasets. Additionally, we outperform a series of vLLMs in reasoning and referring segmentation, showcasing our model’s remarkable capabilities. We release the code at https://github.com/congvvc/LaSagnA.

arXiv information

Authors	Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma
Published	2024-04-12 14:40:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
