LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

要約

この研究では、テキストで事前トレーニングされた大規模言語モデル (LLM) の機能を拡張して、統合モデル内で 3D メッシュを生成することを検討しています。
これにより、(1) 3D チュートリアルなどのテキストソースから派生した、LLM にすでに組み込まれている空間知識を活用する、(2) 会話型 3D 生成とメッシュの理解を可能にするという重要な利点が得られます。
主な課題は、3D メッシュデータを、LLM がシームレスに処理できる個別のトークンに効果的にトークン化することです。
これに対処するために、3D メッシュの頂点座標と面定義をプレーンテキストとして表現する新しいアプローチである LLaMA-Mesh を導入します。これにより、語彙を拡張することなく LLM と直接統合できます。
私たちは、事前トレーニング済み LLM が (1) テキストプロンプトから 3D メッシュを生成し、(2) 必要に応じてインターリーブされたテキストと 3D メッシュ出力を生成し、(3) 3D メッシュを理解して解釈できるようにする教師あり微調整 (SFT) データセットを構築します。
私たちの研究は、LLM を微調整してテキストベース形式で 3D メッシュ生成のための複雑な空間知識を取得し、3D モダリティとテキストモダリティを効果的に統合できることを初めて実証しました。
LLaMA-Mesh は、強力なテキスト生成パフォーマンスを維持しながら、ゼロからトレーニングしたモデルと同等のメッシュ生成品質を実現します。

要約(オリジナル)

This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model. This offers key advantages of (1) leveraging spatial knowledge already embedded in LLMs, derived from textual sources like 3D tutorials, and (2) enabling conversational 3D generation and mesh understanding. A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly. To address this, we introduce LLaMA-Mesh, a novel approach that represents the vertex coordinates and face definitions of 3D meshes as plain text, allowing direct integration with LLMs without expanding the vocabulary. We construct a supervised fine-tuning (SFT) dataset enabling pretrained LLMs to (1) generate 3D meshes from text prompts, (2) produce interleaved text and 3D mesh outputs as required, and (3) understand and interpret 3D meshes. Our work is the first to demonstrate that LLMs can be fine-tuned to acquire complex spatial knowledge for 3D mesh generation in a text-based format, effectively unifying the 3D and text modalities. LLaMA-Mesh achieves mesh generation quality on par with models trained from scratch while maintaining strong text generation performance.

arxiv情報

著者	Zhengyi Wang,Jonathan Lorraine,Yikai Wang,Hang Su,Jun Zhu,Sanja Fidler,Xiaohui Zeng
発行日	2024-11-14 17:08:23+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー