Archive of posts by "jarxiv"

Explicitly Modeling Subcortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness

Posted: June 4, 2025 | Author: jarxiv

Summary: Convolutional neural networks (CNNs) trained for object recognition have high …

Categories: cs.CV, q-bio.NC

DPO Learning with LLMs-Judge Signal for Computer Use Agents

Posted: June 4, 2025 | Author: jarxiv

Summary: Computer use agents (CUAs), graphical user i…

Categories: cs.AI, cs.CV

FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens

Posted: June 4, 2025 | Author: jarxiv

Summary: Contrastive language-image pre-training, through separate encoders for each modality, …

Categories: cs.CV, cs.LG

Can’t See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs

Posted: June 4, 2025 | Author: jarxiv

Summary: Multimodal large language models (MLLMs), through both text and images, …

Categories: cs.AI, cs.CL, cs.CV, cs.MM

EgoVLM: Policy Optimization for Egocentric Video Understanding

Posted: June 4, 2025 | Author: jarxiv

Summary: Emerging embodied AI applications, such as wearable cameras and autonomous agents, …

Categories: cs.AI, cs.CV

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step

Posted: June 4, 2025 | Author: jarxiv

Summary: Text-based … such as Stable Diffusion and DALL-E 3 …

Categories: cs.AI, cs.CL, cs.CR, cs.CV, cs.MM

DyTact: Capturing Dynamic Contacts in Hand-Object Manipulation

Posted: June 4, 2025 | Author: jarxiv

Summary: In AI character animation, XR, and robotics, dynamic hand-object …

Categories: cs.CV

ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions

Posted: June 4, 2025 | Author: jarxiv

Summary: Non-rigid motions, camera viewpoint shifts, object deformations, human articulations, complex interac…

Categories: cs.CV

Revisiting Continuity of Image Tokens for Cross-domain Few-shot Learning

Posted: June 4, 2025 | Author: jarxiv

Summary: Vision Transformer (ViT), large-scale … on general domains …

Categories: cs.CV

Zero-Shot Tree Detection and Segmentation from Aerial Forest Imagery

Posted: June 4, 2025 | Author: jarxiv

Summary: Extracting individual trees at scale from remote sensing imagery, especially climate change …

Categories: cs.CV
