Archive of posts by "jarxiv"

GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents

Posted: April 16, 2025 | Author: jarxiv

Abstract: In building graphical user interface (GUI) agents, existing … Continue reading →

Categories: cs.CL, cs.CV, cs.HC | Comments closed

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Posted: April 16, 2025 | Author: jarxiv

Abstract: Equipped with a Native Multimodal Pre-Training paradigm … Continue reading →

Categories: cs.CV | Comments closed

VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge

Posted: April 16, 2025 | Author: jarxiv

Abstract: In current multimodal benchmarks, reasoning and domain-specific knowledge are often … Continue reading →

Categories: cs.CL | Comments closed

Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA

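Posted: April 16, 2025 | Author: jarxiv

Abstract: For checkboxes, the presence or absence of a tick directly informs data extraction and decision-making processes … Continue reading →

Categories: cs.CL | Comments closed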

Opinion: Revisiting synthetic data classifications from a privacy perspective

Posted: April 16, 2025 | Author: jarxiv

Abstract: Synthetic data is either generated from existing knowledge or derived from real data, and AI … Continue reading →

Categories: cs.AI, cs.LG | Comments closed

Trade-offs in Privacy-Preserving Eye Tracking through Iris Obfuscation: A Benchmarking Study

Posted: April 16, 2025 | Author: jarxiv

Abstract: With recent developments in hardware, computer graphics, and AI, AR/ … Continue reading →

Categories: cs.CV | Comments closed

Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments

Posted: April 16, 2025 | Author: jarxiv

Abstract: Under a zero-shot setting, vision-language navigation in continuous environments (VLN-CE) … Continue reading →

Categories: cs.CV, cs.RO | Comments closed

GPS: Distilling Compact Memories via Grid-based Patch Sampling for Efficient Online Class-Incremental Learning

Posted: April 16, 2025 | Author: jarxiv

Abstract: Online class-incremental learning mitigates catastrophic forgetting while … past … Continue reading →

Categories: cs.CV | Comments closed

Bi-directional Momentum-based Haptic Feedback and Control System for In-Hand Dexterous Telemanipulation

Posted: April 15, 2025 | Author: jarxiv

Abstract: In-hand dexterous telemanipulation requires the robot's precise remote motion … Continue reading →

Categories: cs.RO | Comments closed

UruBots RoboCup Work Team Description Paper

Posted: April 15, 2025 | Author: jarxiv

Abstract: This work presents a team description paper for the RoboCup Work League … Continue reading →

Categories: cs.RO, cs.SY, eess.SY | Comments closed
