Category Archive: cs.AI

DifFRelight: Diffusion-Based Facial Performance Relighting

Posted: October 11, 2024 | Author: jarxiv

Summary: Free-viewpoint facial performance relighting using diffusion-based image-to-image translation …

Categories: cs.AI, cs.CV, cs.GR

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Posted: October 11, 2024 | Author: jarxiv

Summary: Code, owing to its precision and accuracy, enhances the mathematical reasoning abilities of large language models …

Categories: cs.AI, cs.CL, cs.CV

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Posted: October 11, 2024 | Author: jarxiv

Summary: This paper emphasizes the importance of 3D spatial awareness in embodied AI …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision

Posted: October 11, 2024 | Author: jarxiv

Summary: Current large multimodal models (LMMs) … the model … language components …

Categories: cs.AI, cs.CV, cs.LG

PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection

Posted: October 11, 2024 | Author: jarxiv

Summary: Single point supervised oriented object detection has gained attention and made initial progress within the community …

Categories: cs.AI, cs.CV

LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts

Posted: October 11, 2024 | Author: jarxiv

Summary: Large-scale vision-language pre-trained (VLP) models (e.g., CLIP) …

Categories: cs.AI, cs.CL, cs.CV

Identifying and Addressing Delusions for Target-Directed Decision-Making

Posted: October 11, 2024 | Author: jarxiv

Summary: We … produce targets during decision-time planning, guide behavior, and, during evaluation, better …

Categories: cs.AI

Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond

Posted: October 11, 2024 | Author: jarxiv

Summary: In recent years, training data attribution (TDA) methods …

Categories: cs.AI, cs.LG

Context-Aware Command Understanding for Tabletop Scenarios

Posted: October 11, 2024 | Author: jarxiv

Summary: This paper … designed to interpret natural human commands in tabletop scenarios …

Categories: cs.AI, cs.RO

Grounding Robot Policies with Visuomotor Language Guidance

Posted: October 11, 2024 | Author: jarxiv

Summary: Recent advances in the fields of natural language processing and computer vision …

Categories: cs.AI, cs.RO
