ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain

Summary

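The Transformer architecture is the de facto standard for natural language processing tasks.
Its success in natural language processing has recently drawn growing interest from computer vision researchers.
Relative to convolutional neural networks (CNNs), Vision Transformers (ViTs) are becoming a popular and dominant solution for many vision problems.
Transformer-based models outperform other network types, such as convolutional and recurrent neural networks, on a range of visual benchmarks.
This work evaluates various vision transformer models by grouping them by task and examining their advantages and drawbacks.
ViTs can overcome several difficulties that can arise with convolutional neural networks (CNNs).
The goal of this survey is to show the first uses of ViTs in computer vision (CV).
In the first phase, CV applications for which ViTs are appropriate are categorized.
Image classification, object identification, image segmentation, video transformers, image denoising, and neural architecture search (NAS) are all CV applications.
The next step is to analyze the state of the art in each area and identify the models currently available.
In addition, the survey outlines numerous open research challenges and prospective research directions.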

Summary (original)

Transformer design is the de facto standard for natural language processing tasks. The success of the transformer design in natural language processing has lately piqued the interest of researchers in the domain of computer vision. When compared to Convolutional Neural Networks (CNNs), Vision Transformers (ViTs) are becoming more popular and dominant solutions for many vision problems. Transformer-based models outperform other types of networks, such as convolutional and recurrent neural networks, in a range of visual benchmarks. We evaluate various vision transformer models in this work by dividing them into distinct jobs and examining their benefits and drawbacks. ViTs can overcome several possible difficulties with convolutional neural networks (CNNs). The goal of this survey is to show the first use of ViTs in CV. In the first phase, we categorize various CV applications where ViTs are appropriate. Image classification, object identification, image segmentation, video transformer, image denoising, and NAS are all CV applications. Our next step will be to analyze the state-of-the-art in each area and identify the models that are currently available. In addition, we outline numerous open research difficulties as well as prospective research possibilities.

arXiv information

Authors	Md Sohag Mia, Abu Bakor Hayat Arnob, Abdu Naim, Abdullah Al Bary Voban, Md Shariful Islam
Published	2023-10-13 14:36:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
