Monthly Archive: August 2023

Whose Emotion Matters? Speaking Activity Localisation without Prior Knowledge

Posted: August 16, 2023 | Author: jarxiv

Summary: The task of emotion recognition in conversations (ERC), for example in the video-based Mult …

Categories: 68T20, cs.CV, cs.LG, cs.NE, cs.SD, eess.AS, I.2.0

Memory-and-Anticipation Transformer for Online Action Understanding

Posted: August 16, 2023 | Author: jarxiv

Summary: Most existing forecasting systems use various memory mechanisms for human …

Categories: cs.CV

Tirtha — An Automated Platform to Crowdsource Images and Create 3D Models of Heritage Sites

Posted: August 16, 2023 | Author: jarxiv

Summary: Digital preservation of cultural heritage (CH), against damage from natural disasters and human activity, …

Categories: cs.CV, cs.HC, cs.LG, I.4.5

A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert knowledge in text supervision

Posted: August 16, 2023 | Author: jarxiv

Summary: Foundation vision-language models are currently transforming computer vision, and …

Categories: cs.CV

Relightable and Animatable Neural Avatar from Sparse-View Video

Posted: August 16, 2023 | Author: jarxiv

Summary: This paper considers dynamic humans under unknown illumination, captured in sparse-view (or monocular) …

Categories: cs.AI, cs.CV, cs.GR

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

Posted: August 16, 2023 | Author: jarxiv

Summary: To improve the performance of spatio-temporal representations on egocentric videos, we …

Categories: cs.CV

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Posted: August 16, 2023 | Author: jarxiv

Summary: In large language models (LLMs) such as GPT-4 and PaLM-2, …

Categories: cs.AI, cs.CL, cs.CV

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Posted: August 16, 2023 | Author: jarxiv

Summary: As a new type of video representation, the content deformation field CoDeF …

Categories: cs.CV

Multiscale Attention via Wavelet Neural Operators for Vision Transformers

Posted: August 16, 2023 | Author: jarxiv

Summary: Transformers have achieved widespread success in computer vision …

Categories: cs.CV, cs.LG

Large Language Models for Information Retrieval: A Survey

Posted: August 16, 2023 | Author: jarxiv

Summary: As a primary means of information acquisition, information retrieval (IR) systems such as search engines …

Categories: cs.CL, cs.IR
