"cs.AI" Category Archive

WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization

Posted: May 29, 2024 | Author: jarxiv

Abstract: With language, and without empirical discovery in the training domain, the vision encoder …

Categories: cs.AI, cs.CV

RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives

Posted: May 29, 2024 | Author: jarxiv

Abstract: Recent video generation models mainly, for specific tasks such as inpainting and style editing, …

Categories: cs.AI, cs.CL, cs.CV

Why are Visually-Grounded Language Models Bad at Image Classification?

Posted: May 29, 2024 | Author: jarxiv

Abstract: Image classification is one of the most fundamental capabilities of machine vision intelligence …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

Posted: May 29, 2024 | Author: jarxiv

Abstract: Recently, linear-complexity sequence modeling networks, FLOPs and memory …

Categories: cs.AI, cs.CV

GFlow: Recovering 4D World from Monocular Video

Posted: May 29, 2024 | Author: jarxiv

Abstract: Reconstructing 4D scenes from video input is an important yet challenging task. …

Categories: cs.AI, cs.CV

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

Posted: May 29, 2024 | Author: jarxiv

Abstract: Diffusion models with large-scale pre-training, particularly Diffusion Transformers …

Categories: cs.AI, cs.CV

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

Posted: May 29, 2024 | Author: jarxiv

Abstract: With the ever-growing ecosystem of LLMs, among the vast number of options, fine-tuning …

Categories: cs.AI, cs.CL, cs.LG

Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation

Posted: May 29, 2024 | Author: jarxiv

Abstract: Using a low-fidelity simulator, reinforcement learning for constrained crowd navigation …

Categories: cs.AI, cs.LG, cs.RO

LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding

Posted: May 29, 2024 | Author: jarxiv

Abstract: Visual grounding, for a user-specified text query, within the image …

Categories: cs.AI, cs.CL, cs.CV

Double Correction Framework for Denoising Recommendation

Posted: May 29, 2024 | Author: jarxiv

Abstract: Owing to its availability and generality in online services, implicit feedback …

Categories: cs.AI, cs.IR
