AI and the Dynamic Supply of Training Data

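Summary

Artificial intelligence (AI) systems rely heavily on human-generated data, yet the people behind that data are often overlooked.
Human behavior can play a major role in AI training datasets, whether by limiting access to existing works or by deciding which types of new works to create, or whether to create any at all.
We examine how creators' behavior changes when their works become training data for commercial AI.
Specifically, we focus on contributors to Unsplash, a popular stock image platform with about 6 million high-quality photos and illustrations.
In the summer of 2020, Unsplash launched a research program and released a dataset of 25,000 images for commercial AI use.
We study contributors' reactions by comparing contributors whose works were included in this dataset with contributors whose works were not.
Our results suggest that treated contributors left the platform at a higher-than-usual rate and substantially slowed their rate of new uploads.
Professional photographers and more heavily affected users reacted more strongly than amateurs and less affected users.
We also show that affected users changed the variety and novelty of their contributions to the platform, which can potentially lead to lower-quality AI outputs in the long run.
Our findings highlight a critical trade-off: the drive to expand AI capabilities versus the incentives of those who produce training data.
We conclude with policy proposals, including dynamic compensation schemes and structured data markets, to realign incentives at the data frontier.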

Abstract (original)

Artificial intelligence (AI) systems rely heavily on human-generated data, yet the people behind that data are often overlooked. Human behavior can play a major role in AI training datasets, be it in limiting access to existing works or in deciding which types of new works to create or whether to create any at all. We examine creators’ behavioral change when their works become training data for commercial AI. Specifically, we focus on contributors on Unsplash, a popular stock image platform with about 6 million high-quality photos and illustrations. In the summer of 2020, Unsplash launched a research program and released a dataset of 25,000 images for commercial AI use. We study contributors’ reactions, comparing contributors whose works were included in this dataset to contributors whose works were not. Our results suggest that treated contributors left the platform at a higher-than-usual rate and substantially slowed down the rate of new uploads. Professional photographers and more heavily affected users had a stronger reaction than amateurs and less affected users. We also show that affected users changed the variety and novelty of contributions to the platform, which can potentially lead to lower-quality AI outputs in the long run. Our findings highlight a critical trade-off: the drive to expand AI capabilities versus the incentives of those producing training data. We conclude with policy proposals, including dynamic compensation schemes and structured data markets, to realign incentives at the data frontier.

arXiv information

Authors	Christian Peukert, Florian Abeillon, Jérémie Haese, Franziska Kaiser, Alexander Staub
Published	2025-06-04 15:28:50+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google