What A Situated Language-Using Agent Must be Able to Do: A Top-Down Analysis


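Situated interaction is also the final frontier of natural language processing. Compared with text processing, it has seen very little progress over the past decade, and countless practical applications are waiting to be unlocked.
Specifically, I discuss representational demands (building and applying a world model, language model, situation model, discourse model, and agent model) and what I call anchoring processes (incremental processing, incremental learning, conversational grounding, and multimodal grounding).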


Even in our increasingly text-intensive times, the primary site of language use is situated, co-present interaction. It is primary ontogenetically and phylogenetically, and it is arguably also still primary in negotiating everyday social situations. Situated interaction is also the final frontier of Natural Language Processing, where, compared to the area of text processing, very little progress has been made in the past decade, and where a myriad of practical applications is waiting to be unlocked. While the usual approach in the field is to reach, bottom-up, for the ever next ‘adjacent possible’, in this paper I attempt a top-down analysis of what the demands are that unrestricted situated interaction makes on the participating agent, and suggest ways in which this analysis can structure computational models and research on them. Specifically, I discuss representational demands (the building up and application of world model, language model, situation model, discourse model, and agent model) and what I call anchoring processes (incremental processing, incremental learning, conversational grounding, multimodal grounding) that bind the agent to the here, now, and us.


Author: David Schlangen
Published: 2023-02-16 21:30:26+00:00
Category: cs.CL