Monthly Archives: June 2025

Exploring Diffusion Transformer Designs via Grafting

Posted on: June 6, 2025 | Author: jarxiv

Abstract: Model architecture design involves operators (attention, convolution, etc.) and configurations ( … Continue reading →

Categories: cs.AI, cs.LG | Comments are closed

Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis

Posted on: June 6, 2025 | Author: jarxiv

Abstract: Pathology foundation models (PFMs), for whole slide image … Continue reading →

Categories: cs.CV | Comments are closed

MokA: Multimodal Low-Rank Adaptation for MLLMs

Posted on: June 6, 2025 | Author: jarxiv

Abstract: In this paper, current state-of-the-art multimodal fine-tuning methods are hindered by a key limitation … Continue reading →

Categories: cs.CV | Comments are closed

Vision-Based Autonomous MM-Wave Reflector Using ArUco-Driven Angle-of-Arrival Estimation

Posted on: June 6, 2025 | Author: jarxiv

Abstract: Reliable millimeter-wave (mmWave) communication in non-line-of-sight (NLOS) conditions … Continue reading →

Categories: cs.CV | Comments are closed

Quantifying Cross-Modality Memorization in Vision-Language Models

Posted on: June 6, 2025 | Author: jarxiv

Abstract: What and how neural networks memorize during training … Continue reading →

Categories: cs.CV, cs.LG | Comments are closed

Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding

Posted on: June 6, 2025 | Author: jarxiv

Abstract: In embodied 3D grounding, the target described in a human instruction, seen from an egocentric viewpoint, … Continue reading →

Categories: cs.CV | Comments are closed

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

Posted on: June 6, 2025 | Author: jarxiv

Abstract: For multimodal large language models (MLLMs), the integration of visual and textual data … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed

OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View

Posted on: June 6, 2025 | Author: jarxiv

Abstract: Reconstructing semantic-aware 3D scenes from sparse views, in virtual … Continue reading →

Categories: cs.CV | Comments are closed

Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning

Posted on: June 6, 2025 | Author: jarxiv

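Abstract: Recently, breakthroughs in video diffusion transformers have shown remarkable capabilities in diverse motion generation … Continue reading →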

Categories: cs.CV | Comments are closed

Towards Vision-Language-Garment Models For Web Knowledge Garment Understanding and Generation

Posted on: June 6, 2025 | Author: jarxiv

Abstract: Multimodal foundation models have demonstrated strong generalization, but garment … Continue reading →

Categories: cs.CV | Comments are closed

