"cs.CV" Category Archive

Retrieval-Based Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios

Posted: April 9, 2025 | Author: jarxiv

Abstract: Chain-of-thought (CoT) prompting for large language model reasoning, with textual cues and …

Categories: cs.AI, cs.CV

Privacy Attacks on Image AutoRegressive Models

Posted: April 9, 2025 | Author: jarxiv

Abstract: In image autoregressive generation, image autoregressive models (IARs), in image quality (FID: 1 …

Categories: cs.CV, cs.LG

HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

Posted: April 9, 2025 | Author: jarxiv

Abstract: Text-to-image (T2I) diffusion/flow models, for flexible visual creations, …

Categories: cs.CV

Monitoring Viewer Attention During Online Ads

Posted: April 9, 2025 | Author: jarxiv

Abstract: Today, video advertisements have spread across numerous online platforms, and millions worldwide …

Categories: cs.CV

Transfer between Modalities with MetaQueries

Posted: April 9, 2025 | Author: jarxiv

Abstract: Unified multimodal models, for understanding (text output) and generation (pixel output …

Categories: cs.CV

PainNet: Statistical Relation Network with Episode-Based Training for Pain Estimation

Posted: April 9, 2025 | Author: jarxiv

Abstract: Despite the span of work on estimating pain from facial expressions, limited works, by patients …

Categories: cs.CV

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Posted: April 9, 2025 | Author: jarxiv

Abstract: Scalable Vector Graphics (SVG), with resolution independence …

Categories: cs.CV

D^2USt3R: Enhancing 3D Reconstruction with 4D Pointmaps for Dynamic Scenes

Posted: April 9, 2025 | Author: jarxiv

Abstract: We address the task of 3D reconstruction in dynamic scenes. Object motion, originally …

Categories: cs.CV

A Taxonomy of Self-Handover

Posted: April 9, 2025 | Author: jarxiv

Abstract: Self-handover, the transfer of an object between one's own hands, is common but …

Categories: cs.AI, cs.CV, cs.RO

Advancing Egocentric Video Question Answering with Multimodal Large Language Models

Posted: April 8, 2025 | Author: jarxiv

Abstract: In egocentric video question answering (QA), models face long-range temporal reasoning, …

Categories: cs.AI, cs.CV, cs.LG, cs.RO
