Monthly Archive: July 2024

LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

Posted: July 15, 2024 | Author: jarxiv

Summary: Camouflaged visual perception is an important vision task with many practical applications …

Categories: cs.CV

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

Posted: July 15, 2024 | Author: jarxiv

Summary: Searching long scientific research papers for answers to questions, so that readers can quickly address their queries …

Categories: cs.AI, cs.CL, cs.CV

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Posted: July 15, 2024 | Author: jarxiv

Summary: For modern text-to-image generation models, visual text rendering …

Categories: cs.CV

OmniSat: Self-Supervised Modality Fusion for Earth Observation

Posted: July 15, 2024 | Author: jarxiv

Summary: In the field of Earth observation (EO), a wealth of data from a variety of sensors is provided …

Categories: cs.CV

Improving Alignment and Robustness with Circuit Breakers

Posted: July 15, 2024 | Author: jarxiv

Summary: AI systems can take harmful actions and are highly vulnerable to adversarial attacks …

Categories: cs.AI, cs.CL, cs.CV, cs.CY, cs.LG

Rethinking temporal self-similarity for repetitive action counting

Posted: July 15, 2024 | Author: jarxiv

Summary: Counting repetitive actions in long, untrimmed videos is …

Categories: cs.CV

D2S: Representing sparse descriptors and 3D coordinates for camera relocalization

Posted: July 15, 2024 | Author: jarxiv

Summary: State-of-the-art visual localization methods mainly involve matching local descriptors with 3D point clouds …

Categories: cs.CV, cs.RO

Let Me DeCode You: Decoder Conditioning with Tabular Data

Posted: July 15, 2024 | Author: jarxiv

Summary: Deep neural networks for 3D segmentation tasks …

Categories: cs.AI, cs.CV, eess.IV

GraspXL: Generating Grasping Motions for Diverse Objects at Scale

Posted: July 15, 2024 | Author: jarxiv

Summary: The human hand, for instance when grasping a specific part of an object or approaching it from a desired direction, …

Categories: cs.CV, cs.RO

Facial Affective Behavior Analysis with Instruction Tuning

Posted: July 15, 2024 | Author: jarxiv

Summary: Facial affective behavior analysis (FABA), for understanding human mental states from images, …

Categories: cs.CV
