Archive of posts by "jarxiv"

FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning

Posted: May 20, 2025 | Author: jarxiv

Abstract: Facial emotion analysis (FEA), the inference of a person's emotional state from facial data, …

Category: cs.CV

G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning

Posted: May 20, 2025 | Author: jarxiv

Abstract: Vision-language models (VLMs) excel at many direct multimodal tasks …

Category: cs.CV

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

Posted: May 20, 2025 | Author: jarxiv

Abstract: In vision-language understanding, multimodal large language models (MLLMs) show impressive …

Categories: cs.AI, cs.CV

Understanding Complexity in VideoQA via Visual Program Generation

Posted: May 20, 2025 | Author: jarxiv

Abstract: For analyzing query complexity in video question answering (VideoQA), a data-driven …

Category: cs.CV

Fine-tuning Quantized Neural Networks with Zeroth-order Optimization

Posted: May 20, 2025 | Author: jarxiv

Abstract: As the size of large language models grows exponentially, GPU memory …

Categories: cs.CL, cs.CV, cs.LG

KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture

Posted: May 20, 2025 | Author: jarxiv

Abstract: Broader access to high-quality movement analysis, with more detailed characterization of movement impairments and …

Category: cs.CV

FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance

Posted: May 20, 2025 | Author: jarxiv

Abstract: Despite significant advances in video generation, especially fine-grained semantics and complex temporal …

Categories: cs.AI, cs.CV

VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation

Posted: May 20, 2025 | Author: jarxiv

Abstract: Autoregressive (AR) models have recently shown strong performance in image generation …

Categories: cs.AI, cs.CV, cs.LG

Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos

Posted: May 20, 2025 | Author: jarxiv

Abstract: Currently, for almost all state-of-the-art novel view synthesis and reconstruction models, calibrated cameras …

Category: cs.CV

Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Posted: May 20, 2025 | Author: jarxiv

Abstract: Physical AI systems perceive, understand, and perform complex actions in the physical world …

Categories: cs.AI, cs.CV, cs.LG, cs.RO
