Monthly Archives: July 2024

LightStereo: Channel Boost Is All Your Need for Efficient 2D Cost Aggregation

Posted on July 1, 2024 by jarxiv

Abstract: Built to accelerate the matching process, our state-of-the-art stereo match …

Categories: cs.CV

Kandinsky 3.0 Technical Report

Posted on July 1, 2024 by jarxiv

Abstract: Our large-scale text-to-image generation model based on latent diffusion, Ka …

Categories: cs.CV, cs.LG, cs.MM

StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction

Posted on July 1, 2024 by jarxiv

Abstract: In autonomous driving systems, 3D multi-object tracking and trajectory prediction are two …

Categories: cs.CV, cs.RO

FootBots: A Transformer-based Architecture for Motion Prediction in Soccer

Posted on July 1, 2024 by jarxiv

Abstract: For motion prediction in soccer, the complex dynamics arising from interactions between players and the ball …

Categories: cs.CV, cs.MA

Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning

Posted on July 1, 2024 by jarxiv

Abstract: Contrastive Vision-Language Pre-train …

Categories: cs.CV

InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding

Posted on July 1, 2024 by jarxiv

Abstract: Understanding long videos that span tens of minutes to several hours poses, for video understanding, unique …

Categories: cs.CV

On the Value of PHH3 for Mitotic Figure Detection on H&E-stained Images

Posted on July 1, 2024 by jarxiv

Abstract: On slides stained with hematoxylin and eosin (H&E), observ …

Categories: cs.CV, q-bio.QM

LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models

Posted on July 1, 2024 by jarxiv

Abstract: Deep generative models such as VAEs and diffusion models leverage latent variables so that the data distribution …

Categories: cs.CL, cs.CV, cs.LG

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

Posted on July 1, 2024 by jarxiv

Abstract: Mixture-of-Experts (MoE), in research on large vision-language models (LVLMs), …

Categories: cs.CV

Parallax-tolerant Image Stitching via Segmentation-guided Multi-homography Warping

Posted on July 1, 2024 by jarxiv

Abstract: Large parallax between images is an intractable problem in image stitching. …

Categories: cs.CV
