Monthly Archives: June 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Posted on June 3, 2024 by jarxiv

Abstract: In the quest for artificial general intelligence, multimodal large language models (MLLMs) …

Categories: cs.CL, cs.CV

URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images

Posted on June 3, 2024 by jarxiv

Abstract: Constructing simulation scenes that are both visually and physically realistic is, for robo …

Categories: cs.AI, cs.RO

A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction

Posted on June 3, 2024 by jarxiv

Abstract: Learning a 3D scene representation from a single-view image is, from the input view, …

Categories: cs.CV

Anatomical Region Recognition and Real-time Bone Tracking Methods by Dynamically Decoding A-Mode Ultrasound Signals

Posted on June 3, 2024 by jarxiv

Abstract: For kinematic analysis in orthopedics and prosthetic robotics, accurate bone tracking is …

Categories: cs.LG, cs.RO, eess.SP

Iterative Feature Boosting for Explainable Speech Emotion Recognition

Posted on June 3, 2024 by jarxiv

Abstract: In speech emotion recognition (SER), features predefined without regard to their actual importance …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS, I.2.1

Visual Attention Analysis in Online Learning

Posted on June 3, 2024 by jarxiv

Abstract: This paper presents an approach in the field of multimodal learning analytics …

Categories: cs.CV, cs.HC, cs.LG

Scaling White-Box Transformers for Vision

Posted on June 3, 2024 by jarxiv

Abstract: CRATE, designed to learn compressed and sparse representations, is a white-bo …

Categories: cs.CV

ParSEL: Parameterized Shape Editing with Language

Posted on June 3, 2024 by jarxiv

Abstract: The ability to edit 3D assets from natural language is, for 3D content creation, democra …

Categories: cs.AI, cs.CV, cs.GR, cs.HC, cs.SC

4DHands: Reconstructing Interactive Hands in 4D with Transformers

Posted on June 3, 2024 by jarxiv

Abstract: In this paper, interacting hand meshes and their relative motion, from monocular in …

Categories: cs.AI, cs.CV, cs.GR

KerasCV and KerasNLP: Vision and Language Power-Ups

Posted on June 3, 2024 by jarxiv

Abstract: For computer vision and natural language processing workflows, the Keras A …

Categories: cs.AI, cs.CV, cs.LG, cs.SE, I.2.10
