The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings

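Summary

The five-dollar model is a lightweight text-to-image architecture that generates low-dimensional images from an encoded text prompt.
Trained on limited data, it generates accurate and visually appealing content in low-dimensional domains.
Although both the model and the datasets are small, the generated images preserve the semantic meaning encoded in the text prompt.
The authors apply the model to three small datasets (pixel-art video game maps, video game sprite images, and downscaled emoji images) and use novel augmentation strategies to improve its performance on these limited datasets.
Performance is evaluated with cosine similarity scores between text-image pairs, computed with the CLIP ViT-B/32 model.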
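
The evaluation metric mentioned above, cosine similarity between a text embedding and an image embedding, can be illustrated with a minimal sketch. It assumes the two embeddings have already been produced by an encoder such as CLIP; the vectors `text_embedding` and `image_embedding` below are hypothetical toy values, not outputs of the paper's pipeline, and the helper `cosine_similarity` is written here for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption: real CLIP embeddings are much longer).
text_embedding = [0.2, 0.7, 0.1, 0.4]
image_embedding = [0.1, 0.8, 0.0, 0.5]

score = cosine_similarity(text_embedding, image_embedding)
print(f"text-image similarity: {score:.3f}")
```

A score close to 1 means the two embeddings point in nearly the same direction, which is read as the generated image matching its prompt; a score near 0 means the embeddings are close to orthogonal.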

Abstract (original)

The five-dollar model is a lightweight text-to-image generative architecture that generates low-dimensional images from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low-dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images, and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our model's performance using cosine similarity scores between text-image pairs generated by the CLIP ViT-B/32 model.

arXiv information

Authors	Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius
Published	2023-08-08 05:16:51+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
