Monthly Archive: July 2024

Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images

Posted on July 12, 2024 by jarxiv

Summary: Visual Question Answering for Remote …

Category: cs.CV

High-resolution open-vocabulary object 6D pose estimation

Posted on July 12, 2024 by jarxiv

Summary: Generalizing to unseen objects in the 6D pose estimation task is very challenging …

Category: cs.CV

NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning

Posted on July 12, 2024 by jarxiv

Summary: This paper considers the problem of prototype-based vision-language reasoning. Existing methods …

Category: cs.CV

Still-Moving: Customized Video Generation without Customized Video Data

Posted on July 12, 2024 by jarxiv

Summary: Customization of Text-to-Image (T2I) models has recently, particularly …

Category: cs.CV

Generalizable Implicit Motion Modeling for Video Frame Interpolation

Posted on July 12, 2024 by jarxiv

Summary: Motion modeling, in flow-based video frame interpolation (VFI), …

Category: cs.CV

SEED-Story: Multimodal Long Story Generation with Large Language Model

Posted on July 12, 2024 by jarxiv

Summary: Thanks to remarkable advances in image generation and open-form text generation, interlea…

Category: cs.CV

SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic

Posted on July 12, 2024 by jarxiv

Summary: SLEDGE, trained on real-world driving logs, is … for vehicle motion planning …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models

Posted on July 12, 2024 by jarxiv

Summary: Large language models model the correlation between the current token and previous tokens …

Category: cs.CV

Towards Efficient Deployment of Hybrid SNNs on Neuromorphic and Edge AI Hardware

Posted on July 12, 2024 by jarxiv

Summary: In this paper, data captured by dynamic vision sensors …

Categories: cs.AI, cs.AR, cs.CV, cs.LG, cs.NE

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Posted on July 12, 2024 by jarxiv

Summary: With high-resolution inputs, Large Vision-Language Models …

Category: cs.CV
