Monthly archive: March 2024

Embodied Understanding of Driving Scenarios

Posted on March 8, 2024 by jarxiv

Summary: Embodied scene understanding … for autonomous agents to perceive open driving scenarios …

Categories: cs.CV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Posted on March 8, 2024 by jarxiv

Summary: Transformers have revolutionized computer vision and natural language processing …

Categories: cs.CV

A Domain Translation Framework with an Adversarial Denoising Diffusion Model to Generate Synthetic Datasets of Echocardiography Images

Posted on March 8, 2024 by jarxiv

Summary: Medical image domain translation is currently in high demand among researchers and clinicians …

Categories: cs.AI, cs.CV, eess.IV

High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention

Posted on March 8, 2024 by jarxiv

Summary: This paper … dynamic inference cost and top-down attention mechanisms …

Categories: cs.CV

MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder

Posted on March 8, 2024 by jarxiv

Summary: In the field of medical analysis, masked autoencoders (MAE) and multimo…

Categories: cs.CL, cs.CV, cs.LG, eess.IV

Pix2Gif: Motion-Guided Diffusion for GIF Generation

Posted on March 8, 2024 by jarxiv

Summary: For image-to-GIF (video) generation, we … a motion-guided …

Categories: cs.AI, cs.CV

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Posted on March 8, 2024 by jarxiv

Summary: This paper … with rich, complex, and dynamic audio-visual components …

Categories: cs.CV

Geometry-Guided Ray Augmentation for Neural Surface Reconstruction with Sparse Views

Posted on March 8, 2024 by jarxiv

Summary: This paper … reconstructing 3D scenes and objects from sparse multi-view images …

Categories: cs.CV

Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention

Posted on March 8, 2024 by jarxiv

Summary: Because faces and voices are closely related to each other, person or identity verification using audio-visual fusion …

Categories: cs.CV, cs.SD, eess.AS

Dynamic Cross Attention for Audio-Visual Person Verification

Posted on March 8, 2024 by jarxiv

Summary: Person or identity verification has mainly been studied using individual modalities such as face and voice …

Categories: cs.CV, cs.LG, cs.SD, eess.AS
