Archive of posts by "jarxiv"

SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer

Posted: May 8, 2025 | Author: jarxiv

Abstract: This paper introduces an efficient visual speech encoder for lip reading … Continue reading →

Categories: cs.CV, eess.AS | Comments off

Deep residual learning with product units

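Posted: May 8, 2025 | Author: jarxiv

Abstract: Product units are integrated into residual blocks, targeting the expressiveness of deep convolutional networks and … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments off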

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

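Posted: May 8, 2025 | Author: jarxiv

Abstract: In recent years, remarkable progress has been made in both multimodal understanding models and image generation models … Continue reading →

Categories: cs.CV | Comments off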

MFSeg: Efficient Multi-frame 3D Semantic Segmentation

Posted: May 8, 2025 | Author: jarxiv

Abstract: An efficient multi-frame 3D semantic segmentation framework … Continue reading →

Categories: cs.CV | Comments off

DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

Posted: May 8, 2025 | Author: jarxiv

Abstract: Dense visual prediction tasks are constrained by their reliance on predefined categories … Continue reading →

Categories: cs.CV | Comments off

RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation

Posted: May 8, 2025 | Author: jarxiv

Abstract: Arbitrary style transfer applies the style of a given artistic image to another content image … Continue reading →

Categories: cs.CV | Comments off

Illumination and Shadows in Head Rotation: experiments with Denoising Diffusion Models

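Posted: May 8, 2025 | Author: jarxiv

Abstract: Accurately modeling the effects of illumination and shadows during head rotation, to enhance image realism, … Continue reading →

Categories: cs.CV, I.2.10 | Comments off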

Deep Learning for Sea Surface Temperature Reconstruction under Cloud Occlusion

Posted: May 8, 2025 | Author: jarxiv

Abstract: Reconstructing sea surface temperature (SST) from satellite imagery affected by cloud gaps has, in past … Continue reading →

Categories: cs.CV, I.4.5 | Comments off

Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks

Posted: May 8, 2025 | Author: jarxiv

Abstract: Sharpness-Aware Minimization (SAM) … Continue reading →

Categories: cs.AI, cs.CV, cs.IT, cs.LG, cs.NE, math.IT | Comments off

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

Posted: May 8, 2025 | Author: jarxiv

Abstract: Recent breakthroughs in large vision-language models such as Bard and GPT-4 … Continue reading →

Categories: cs.CV | Comments off
