Category archive: cs.AI

ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet

Posted: December 10, 2024 | Author: jarxiv

Summary: Owing to its exceptional effectiveness and its applicability to many fields, deep learning is widely …

Categories: cs.AI, cs.CV

Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty

Posted: December 10, 2024 | Author: jarxiv

Summary: User prompts for generative AI models are often underspecified …

Categories: cs.AI, cs.CV, cs.LG

Visual Lexicon: Rich Image Features in Language Space

Posted: December 10, 2024 | Author: jarxiv

Summary: We, while preserving complex visual details that are difficult to convey in natural language, rich …

Categories: cs.AI, cs.CV, cs.LG

Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models

Posted: December 10, 2024 | Author: jarxiv

Summary: Large vision-language models (LVLMs), correlating with the input visual content, …

Categories: cs.AI, cs.CL, cs.CV

Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving

Posted: December 10, 2024 | Author: jarxiv

Summary: For autonomous driving perception, real-time 4D reconstruction of dynamic scenes remains …

Categories: cs.AI, cs.CV, cs.LG

P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies

Posted: December 10, 2024 | Author: jarxiv

Summary: Able to robustly handle varied environmental conditions and object instances, generalizable …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

[MASK] is All You Need

Posted: December 10, 2024 | Author: jarxiv

Summary: In generative models, masked generative models based on next-set prediction and next-noise-prediction-based …

Categories: cs.AI, cs.CV

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Posted: December 10, 2024 | Author: jarxiv

Summary: In closed-loop robotic systems, automatic detection and prevention of open-set failures is highly …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

APOLLO: SGD-like Memory, AdamW-level Performance

Posted: December 10, 2024 | Author: jarxiv

Summary: Large language models (LLMs), particularly with the popular AdamW optimizer …

Categories: cs.AI, cs.LG, cs.PF

Enhancing FKG.in: automating Indian food composition analysis

Posted: December 10, 2024 | Author: jarxiv

Summary: In this paper, a knowledge graph for Indian food (FKG.in) and LLM …

Categories: cs.AI, cs.CL, cs.IR
