Monthly Archive: June 2024

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

Posted: June 7, 2024 | Author: jarxiv

Summary: In this study, when transformers face multi-step decision tasks, the training loss rapidly … Continue reading →

Categories: cs.AI, cs.CV, cs.LG

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Posted: June 7, 2024 | Author: jarxiv

Summary: We present the ShareGPT4Video series. With dense and accurate … Continue reading →

Categories: cs.CV

Parameter-Inverted Image Pyramid Networks

Posted: June 7, 2024 | Author: jarxiv

Summary: To obtain multi-scale features for accurate image understanding, image pyramids … Continue reading →

Categories: cs.CV

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Posted: June 7, 2024 | Author: jarxiv

Summary: Diffusion-based image generation models, by demonstrating the ability to synthesize high-quality content, … Continue reading →

Categories: cs.CV

Coarse-To-Fine Tensor Trains for Compact Visual Representations

Posted: June 7, 2024 | Author: jarxiv

Summary: The ability to learn compact, high-quality, and easy-to-optimize representations of visual data … Continue reading →

Categories: cs.CV, cs.LG

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Posted: June 7, 2024 | Author: jarxiv

Summary: In most large multimodal models (LMMs), visual tokens … Continue reading →

Categories: cs.CV

Coherent Zero-Shot Visual Instruction Generation

Posted: June 7, 2024 | Author: jarxiv

Summary: Despite advances in text-to-image synthesis, particularly with diffusion models, a sequence of steps … Continue reading →

Categories: cs.AI, cs.CV

RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation

Posted: June 7, 2024 | Author: jarxiv

Summary: The fundamental goal of robot manipulation is for models to understand visual scenes and, with respect to actions, … Continue reading →

Categories: cs.CV

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Posted: June 7, 2024 | Author: jarxiv

Summary: In recent years, the development of 3D generative models has advanced rapidly, and the dynamic motion of 3D objects … Continue reading →

Categories: cs.AI, cs.CV, cs.GR

GLACE: Global Local Accelerated Coordinate Encoding

Posted: June 7, 2024 | Author: jarxiv

Summary: Scene coordinate regression (SCR) methods, for camera pose estimation, 2D-3 … Continue reading →

Categories: cs.CV
