ReactGenie: An Object-Oriented State Abstraction for Complex Multimodal Interactions Using Large Language Models

Abstract (original)

Multimodal interactions have been shown to be more flexible, efficient, and adaptable for diverse users and tasks than traditional graphical interfaces. However, existing multimodal development frameworks either do not handle the complexity and compositionality of multimodal commands well or require developers to write a substantial amount of code to support these multimodal interactions. In this paper, we present ReactGenie, a programming framework that uses a shared object-oriented state abstraction to support building complex multimodal mobile applications. Having different modalities share the same state abstraction allows developers using ReactGenie to seamlessly integrate and compose these modalities to deliver multimodal interaction. ReactGenie is a natural extension to the existing workflow of building a graphical app, like the workflow with React-Redux. Developers only have to add a few annotations and examples to indicate how natural language is mapped to the user-accessible functions in the program. ReactGenie automatically handles the complex problem of understanding natural language by generating a parser that leverages large language models. We evaluated the ReactGenie framework by using it to build three demo apps. We evaluated the accuracy of the language parser using elicited commands from crowd workers and evaluated the usability of the generated multimodal app with 16 participants. Our results show that ReactGenie can be used to build versatile multimodal applications with highly accurate language parsers, and the multimodal app can lower users’ cognitive load and task completion time.
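The abstract's central idea is that graphical and natural-language input act on one shared object-oriented state, so a voice command and a button tap end up calling the same user-accessible functions. The minimal TypeScript sketch below illustrates only that idea. Every name in it (`Order`, `ExposedFunction`, `registry`, `buildPrompt`, `stubParse`, `runVoiceCommand`) is hypothetical and is not ReactGenie's API. The large-language-model parser is replaced by a stub that returns a fixed structured call, so the sketch does not show how ReactGenie actually parses language.

```typescript
// Hypothetical sketch of a shared object-oriented state: the GUI and a
// natural-language command path call the same methods. None of these
// names are ReactGenie's actual API.

interface Example {
  utterance: string; // what a user might say
  call: string;      // the function call it should map to
}

// Hypothetical metadata a developer attaches to a user-accessible function.
interface ExposedFunction {
  name: string;
  examples: Example[];
  invoke: (args: unknown[]) => void;
}

class Order {
  private items: Record<string, number> = {};

  addItem(name: string, quantity: number): void {
    this.items[name] = (this.items[name] ?? 0) + quantity;
  }

  count(name: string): number {
    return this.items[name] ?? 0;
  }
}

const order = new Order();

// Functions the language layer is allowed to call, each with an example.
const registry: ExposedFunction[] = [
  {
    name: "addItem",
    examples: [{ utterance: "add two burgers", call: 'addItem("burger", 2)' }],
    invoke: (args) => order.addItem(args[0] as string, args[1] as number),
  },
];

// Build a few-shot prompt from the developer-supplied examples.
function buildPrompt(utterance: string): string {
  const shots: string[] = [];
  for (const fn of registry) {
    for (const ex of fn.examples) {
      shots.push(`User: ${ex.utterance}\nCall: ${ex.call}`);
    }
  }
  return `${shots.join("\n")}\nUser: ${utterance}\nCall:`;
}

// Stand-in for the LLM: returns a fixed structured call instead of
// sending buildPrompt(utterance) to a model.
function stubParse(utterance: string): { name: string; args: unknown[] } {
  if (utterance === "add three fries") {
    return { name: "addItem", args: ["fries", 3] };
  }
  throw new Error(`stub parser cannot handle: ${utterance}`);
}

// Dispatch a parsed command to the same method the GUI uses.
function runVoiceCommand(utterance: string): void {
  const parsed = stubParse(utterance);
  const matches = registry.filter((f) => f.name === parsed.name);
  if (matches.length === 0) throw new Error(`unknown function: ${parsed.name}`);
  matches[0].invoke(parsed.args);
}

// GUI path: a button's onClick handler would call the method directly.
const onAddBurgerClick = () => order.addItem("burger", 1);

onAddBurgerClick();
runVoiceCommand("add three fries");
console.log(order.count("burger"), order.count("fries")); // 1 3
```

Because both paths end in `order.addItem`, a change made by voice is visible to the GUI through the same state object. In this toy version, the developer's only extra work for the language path is the `examples` entry, which loosely mirrors the abstract's point that developers add annotations and examples rather than writing a separate handler per modality.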

arXiv information

Authors	Jackie Yang, Karina Li, Daniel Wan Rosli, Shuning Zhang, Yuhan Zhang, Monica S. Lam, James A. Landay
Published	2023-06-16 06:53:26+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google