"eess.AS" Category Archive

SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation

Posted: August 12, 2024 | Author: jarxiv

Abstract: In the Sound Event Localization and Detection (SELD) task, Transf …

Categories: cs.AI, cs.SD, eess.AS

U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

Posted: August 9, 2024 | Author: jarxiv

Abstract: Scale has opened new frontiers in natural language processing, but it comes with a high …

Categories: cs.CL, eess.AS, I.2.7

HydraFormer: One Encoder For All Subsampling Rates

Posted: August 9, 2024 | Author: jarxiv

Abstract: In automatic speech recognition, subsampling is essential for tackling diverse scenarios …

Categories: cs.CL, eess.AS

Simulating Articulatory Trajectories with Phonological Feature Interpolation

Posted: August 9, 2024 | Author: jarxiv

Abstract: As a first step toward a complete computational model of speech learning that includes a perception-production loop …

Categories: cs.CL, eess.AS

Articulatory Configurations across Genders and Periods in French Radio and TV archives

Posted: August 9, 2024 | Author: jarxiv

Abstract: In this paper, using inversion from acoustic parameters to articulatory parameters, gender and …

Categories: cs.CL, cs.CY, cs.SD, eess.AS

BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization

Posted: August 8, 2024 | Author: jarxiv

Abstract: Accurate sound localization in reverberant environments is essential for human hearing. Recently, …

Categories: cs.LG, cs.SD, eess.AS, I.2

Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond

Posted: August 8, 2024 | Author: jarxiv

Abstract: Constituting a speech counterpart to part of the MASSIVE text corpus, a multilingual …

Categories: cs.CL, cs.SD, eess.AS

EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

Posted: August 7, 2024 | Author: jarxiv

Abstract: In this paper, based on audio expressions, specific objects in a video sequence are dynamically …

Categories: cs.CV, cs.RO, eess.AS, eess.IV

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Posted: August 7, 2024 | Author: jarxiv

Abstract: Regarding the success of large video-language models, large-scale multi-modality datasets …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation

Posted: August 6, 2024 | Author: jarxiv

Abstract: In this paper, audio recordings of a single instrument that blend well with a given musical context are …

Categories: cs.LG, cs.SD, eess.AS
