The Evolution of LLM Adoption in Industry Data Curation Practices

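Abstract

Large language models (LLMs) are becoming increasingly adept at processing unstructured text data, which opens new opportunities to enhance data curation workflows.
This paper examines how LLM adoption has evolved among practitioners at a large technology company, and evaluates the impact of LLMs on data curation tasks through participants' perceptions, integration strategies, and reported usage scenarios.
Through a series of surveys, interviews, and user studies, we offer a timely snapshot of how organizations are navigating a pivotal moment in LLM evolution.
In Q2 2023, we ran a survey on industry LLM adoption for development tasks (N=84); in Q3 2023, we conducted expert interviews to assess evolving data needs (N=10).
In Q2 2024, we explored practitioners' current and anticipated LLM usage through a user study involving two LLM-based prototypes (N=12).
Although each study addressed a distinct research goal, together they revealed a broader narrative about evolving LLM usage.
We found an emerging shift in data understanding, from heuristic-first, bottom-up approaches to insights-first, top-down workflows supported by LLMs.
In addition, to cope with a more complex data landscape, data practitioners now supplement traditional "golden datasets" created by subject experts with LLM-generated "silver" datasets and rigorously validated "super golden" datasets curated by diverse experts.
This research sheds light on the transformative role of LLMs in large-scale analysis of unstructured data and highlights opportunities for further tool development.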

Abstract (original)

As large language models (LLMs) grow increasingly adept at processing unstructured text data, they offer new opportunities to enhance data curation workflows. This paper explores the evolution of LLM adoption among practitioners at a large technology company, evaluating the impact of LLMs in data curation tasks through participants’ perceptions, integration strategies, and reported usage scenarios. Through a series of surveys, interviews, and user studies, we provide a timely snapshot of how organizations are navigating a pivotal moment in LLM evolution. In Q2 2023, we conducted a survey to assess LLM adoption in industry for development tasks (N=84), and facilitated expert interviews to assess evolving data needs (N=10) in Q3 2023. In Q2 2024, we explored practitioners’ current and anticipated LLM usage through a user study involving two LLM-based prototypes (N=12). While each study addressed distinct research goals, they revealed a broader narrative about evolving LLM usage in aggregate. We discovered an emerging shift in data understanding from heuristic-first, bottom-up approaches to insights-first, top-down workflows supported by LLMs. Furthermore, to respond to a more complex data landscape, data practitioners now supplement traditional subject-expert-created ‘golden datasets’ with LLM-generated ‘silver’ datasets and rigorously validated ‘super golden’ datasets curated by diverse experts. This research sheds light on the transformative role of LLMs in large-scale analysis of unstructured data and highlights opportunities for further tool development.

arXiv information

Authors	Crystal Qian, Michael Xieyang Liu, Emily Reif, Grady Simon, Nada Hussein, Nathan Clement, James Wexler, Carrie J. Cai, Michael Terry, Minsuk Kahng
Published	2024-12-20 17:34:16+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google