VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning

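Abstract

Natural language processing (NLP) has had a major impact on scientific domains beyond human language. One such domain is protein engineering, where pre-trained protein language models (PLMs) have shown remarkable success.
However, interdisciplinary adoption remains limited because of challenges in data collection, task benchmarking, and application.
This work presents VenusFactory, a versatile engine that integrates biological data retrieval, standardized task benchmarking, and modular fine-tuning of PLMs.
VenusFactory supports both the computer science and biology communities. Users can choose between command-line execution and a Gradio-based no-code interface, and the platform integrates 40+ protein-related datasets and 40+ popular PLMs.
All implementations are open-sourced at https://github.com/tyang816/VenusFactory.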

Abstract (original)

Natural language processing (NLP) has significantly influenced scientific domains beyond human language, including protein engineering, where pre-trained protein language models (PLMs) have demonstrated remarkable success. However, interdisciplinary adoption remains limited due to challenges in data collection, task benchmarking, and application. This work presents VenusFactory, a versatile engine that integrates biological data retrieval, standardized task benchmarking, and modular fine-tuning of PLMs. VenusFactory supports both computer science and biology communities with choices of both a command-line execution and a Gradio-based no-code interface, integrating 40+ protein-related datasets and 40+ popular PLMs. All implementations are open-sourced on https://github.com/tyang816/VenusFactory.

arXiv information

Authors: Yang Tan, Chen Liu, Jingyuan Gao, Banghao Wu, Mingchen Li, Ruilin Wang, Lingrong Zhang, Huiqun Yu, Guisheng Fan, Liang Hong, Bingxin Zhou
Published: 2025-03-19 17:19:07+00:00

Source and services used

arxiv.jp, Google
