Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements

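Summary

Prolonged Exposure (PE) therapy is an effective treatment for post-traumatic stress disorder (PTSD), but assessing therapist fidelity remains labor-intensive because session recordings must be reviewed manually.
We present a method for the automatic temporal localization of key PE fidelity elements, identifying their start and stop times directly from session audio and transcripts.
Our approach uses Low-Rank Adaptation (LoRA) to fine-tune Qwen2-Audio, a large pre-trained audio-language model, to process focused 30-second windows of audio-transcript input.
Fidelity labels for the three core protocol phases, therapist orientation (P1), imaginal exposure (P2), and post-imaginal processing (P3), are generated via LLM-based prompting and verified by trained raters.
The model is trained to predict normalized boundary offsets using soft supervision guided by task-specific prompts.
On a dataset of 313 real PE sessions, the best configuration (LoRA rank 8, 30 s windows) achieves a mean absolute error (MAE) of 5.3 seconds across tasks.
We further analyze the effects of window size and LoRA rank, highlighting the importance of context granularity and model adaptation.
This work introduces a scalable framework for fidelity tracking in PE therapy, with the potential to support clinician training, supervision, and quality assurance.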

Abstract (original)

Prolonged Exposure (PE) therapy is an effective treatment for post-traumatic stress disorder (PTSD), but evaluating therapist fidelity remains labor-intensive due to the need for manual review of session recordings. We present a method for the automatic temporal localization of key PE fidelity elements — identifying their start and stop times — directly from session audio and transcripts. Our approach fine-tunes a large pre-trained audio-language model, Qwen2-Audio, using Low-Rank Adaptation (LoRA) to process focused 30-second windows of audio-transcript input. Fidelity labels for three core protocol phases — therapist orientation (P1), imaginal exposure (P2), and post-imaginal processing (P3) — are generated via LLM-based prompting and verified by trained raters. The model is trained to predict normalized boundary offsets using soft supervision guided by task-specific prompts. On a dataset of 313 real PE sessions, our best configuration (LoRA rank 8, 30s windows) achieves a mean absolute error (MAE) of 5.3 seconds across tasks. We further analyze the effects of window size and LoRA rank, highlighting the importance of context granularity and model adaptation. This work introduces a scalable framework for fidelity tracking in PE therapy, with potential to support clinician training, supervision, and quality assurance.
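
The abstract mentions normalized boundary offsets within 30-second windows and an MAE reported in seconds, but it does not give the exact formulation. The following is a minimal sketch of one plausible reading, assuming an offset is the boundary's position inside its window scaled to [0, 1]. The function names and the scaling convention are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence

WINDOW_SECONDS = 30.0  # window length stated in the abstract


def normalize_offset(boundary_s: float, window_start_s: float,
                     window_s: float = WINDOW_SECONDS) -> float:
    """Map an absolute boundary time to a [0, 1] offset inside its window.

    Assumed convention: 0.0 is the window start, 1.0 is the window end.
    """
    offset = (boundary_s - window_start_s) / window_s
    if not 0.0 <= offset <= 1.0:
        raise ValueError("boundary lies outside the window")
    return offset


def denormalize_offset(offset: float, window_start_s: float,
                       window_s: float = WINDOW_SECONDS) -> float:
    """Map a predicted [0, 1] offset back to absolute session time in seconds."""
    return window_start_s + offset * window_s


def mean_absolute_error(pred_s: Sequence[float], true_s: Sequence[float]) -> float:
    """MAE in seconds between predicted and reference boundary times."""
    if len(pred_s) != len(true_s) or len(pred_s) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred_s, true_s)) / len(pred_s)
```

Under this convention, a boundary at 135 s in a window starting at 120 s maps to an offset of 0.5, and an offset error of 0.1 corresponds to 3 seconds of session time.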

arXiv information

Authors	Suhas BN, Andrew M. Sherrill, Jyoti Alaparthi, Dominik Mattioli, Rosa I. Arriaga, Chris W. Wiese, Saeed Abdullah
Published	2025-06-11 13:21:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
