AstroCLIP: A Cross-Modal Foundation Model for Galaxies

Summary

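We present AstroCLIP, a single, versatile model that embeds both galaxy images and spectra into a shared, physically meaningful latent space.
Without any model fine-tuning, these embeddings can be used for a variety of downstream tasks, including (1) accurate in-modality and cross-modality semantic similarity search, (2) photometric redshift estimation, (3) galaxy property estimation from both images and spectra, and (4) morphology classification.
Our approach to implementing AstroCLIP has two parts.
First, we embed galaxy images and spectra separately by pretraining separate transformer-based image and spectrum encoders in self-supervised settings.
We then align the encoders using a contrastive loss.
We apply this method to spectra from the Dark Energy Spectroscopic Instrument and images from its corresponding Legacy Imaging Survey.
Overall, we find remarkable performance on all downstream tasks, even relative to supervised baselines.
For example, on photometric redshift prediction we find performance similar to a specifically trained ResNet18, and on additional tasks such as physical property estimation (stellar mass, age, metallicity, and sSFR) we beat this supervised baseline by 19% in terms of $R^2$.
We also compare our results to a state-of-the-art self-supervised single-modal model for galaxy images, and find that our approach outperforms this benchmark by roughly a factor of two in terms of $R^2$ on photometric redshift estimation and physical property prediction, while remaining roughly in line with it on morphology classification.
Ultimately, our approach represents the first cross-modal self-supervised model for galaxies, and the first self-supervised transformer-based architectures for galaxy images and spectra.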
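
The alignment step (training the two encoders with a contrastive loss) can be illustrated with a small sketch. The abstract does not state the exact form of the loss, so the code below assumes a CLIP-style symmetric cross-entropy over cosine similarities. The function names, the toy embeddings, and the temperature of 0.1 are all illustrative assumptions, not details taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(image_emb, spectrum_emb, temperature=0.1):
    """Assumed CLIP-style loss: image i and spectrum i come from the same galaxy."""
    imgs = [l2_normalize(v) for v in image_emb]
    specs = [l2_normalize(v) for v in spectrum_emb]
    # logits[i][j] = cosine similarity of image i and spectrum j, over temperature
    logits = [[sum(a * b for a, b in zip(i, s)) / temperature for s in specs]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # spectrum-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy example: two galaxies with 2-dimensional embeddings (hypothetical values).
images = [[1.0, 0.0], [0.0, 1.0]]
matched_spectra = [[1.0, 0.0], [0.0, 1.0]]
swapped_spectra = [[0.0, 1.0], [1.0, 0.0]]
print(symmetric_contrastive_loss(images, matched_spectra))
print(symmetric_contrastive_loss(images, swapped_spectra))
```

In this toy case the loss is close to zero when each image embedding points in the same direction as its own spectrum embedding, and large when the pairs are swapped. This is the behavior that pulls matching image and spectrum embeddings together in a shared space.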

Abstract (original)

We present AstroCLIP, a single, versatile model that can embed both galaxy images and spectra into a shared, physically meaningful latent space. These embeddings can then be used – without any model fine-tuning – for a variety of downstream tasks including (1) accurate in-modality and cross-modality semantic similarity search, (2) photometric redshift estimation, (3) galaxy property estimation from both images and spectra, and (4) morphology classification. Our approach to implementing AstroCLIP consists of two parts. First, we embed galaxy images and spectra separately by pretraining separate transformer-based image and spectrum encoders in self-supervised settings. We then align the encoders using a contrastive loss. We apply our method to spectra from the Dark Energy Spectroscopic Instrument and images from its corresponding Legacy Imaging Survey. Overall, we find remarkable performance on all downstream tasks, even relative to supervised baselines. For example, for a task like photometric redshift prediction, we find similar performance to a specifically-trained ResNet18, and for additional tasks like physical property estimation (stellar mass, age, metallicity, and sSFR), we beat this supervised baseline by 19\% in terms of $R^2$. We also compare our results to a state-of-the-art self-supervised single-modal model for galaxy images, and find that our approach outperforms this benchmark by roughly a factor of two on photometric redshift estimation and physical property prediction in terms of $R^2$, while remaining roughly in-line in terms of morphology classification. Ultimately, our approach represents the first cross-modal self-supervised model for galaxies, and the first self-supervised transformer-based architectures for galaxy images and spectra.
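
Because images and spectra share one latent space, cross-modality similarity search reduces to a nearest-neighbor lookup between embeddings. The sketch below is a minimal illustration under that assumption. It ranks by cosine similarity, which the abstract does not explicitly name as the metric, and the function names and toy embeddings are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_matches(query, database, k=2):
    """Return indices of the k database embeddings most similar to the query."""
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(database)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Hypothetical 3-dimensional embeddings: one image query, three stored spectra.
spectrum_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 1.0],
]
image_query = [1.0, 0.2, 0.0]

print(top_k_matches(image_query, spectrum_embeddings, k=2))  # prints [0, 1]
```

The same function covers in-modality search if the query and the database come from the same encoder. For a large catalog, an approximate nearest-neighbor index would likely replace the linear scan used here.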

arXiv information

Authors	Liam Parker, Francois Lanusse, Siavash Golkar, Leopoldo Sarra, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Geraud Krawezik, Michael McCabe, Ruben Ohana, Mariel Pettee, Bruno Regaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
Published	2024-06-14 17:19:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
