Archive of posts by "jarxiv"

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

Posted: June 4, 2025 | Author: jarxiv

Summary: To comprehensively understand real-world 3D scenes, arbitrary or previously …

Category: cs.CV

Effective Dual-Region Augmentation for Reduced Reliance on Large Amounts of Labeled Data

Posted: June 4, 2025 | Author: jarxiv

Summary: In this paper, while reducing reliance on large-scale labeled datasets, source …

Category: cs.CV

EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models

Posted: June 4, 2025 | Author: jarxiv

Summary: Text-to-image generation models (such as Stable Diffusion) …

Category: cs.CV

SASP: Strip-Aware Spatial Perception for Fine-Grained Bird Image Classification

Posted: June 4, 2025 | Author: jarxiv

Summary: For ecological monitoring and species identification, fine-grained bird image classification (FBIC) is of great …

Categories: cs.AI, cs.CV

LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM

Posted: June 4, 2025 | Author: jarxiv

Summary: For real-time photorealistic rendering of 3D scenes, modern Gaussian Splatting methods …

Category: cs.CV

Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification

Posted: June 4, 2025 | Author: jarxiv

Summary: In recent years, the performance of convolutional neural networks (CNNs) has improved substantially. …

Categories: cs.AI, cs.CV, cs.LG

ORV: 4D Occupancy-centric Robot Video Generation

Posted: June 4, 2025 | Author: jarxiv

Summary: The time and labor of acquiring real-world robot simulation data through teleoperation …

Category: cs.CV

SG2VID: Scene Graphs Enable Fine-Grained Control for Video Synthesis

Posted: June 4, 2025 | Author: jarxiv

Summary: Surgical simulation trains novice surgeons, accelerates the learning curve, and intraoperative …

Category: cs.CV

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

Posted: June 4, 2025 | Author: jarxiv

Summary: With recent advances in multimodal large language models (MLLMs), autonomous driving …

Categories: cs.AI, cs.CV

InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba

Posted: June 4, 2025 | Author: jarxiv

Summary: Human-human interaction generation is important for understanding humans as social beings …

Category: cs.CV
