jarxiv | Japanese arxiv | Page 243

A Comparative Study of SMT and MILP for the Nurse Rostering Problem

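Posted: June 3, 2025 | Author: jarxiv

Summary

The effects of personnel scheduling on the quality of care and on working conditions for healthcare personnel are thoroughly documented.
However, ever-present demand and widely varying constraints make healthcare scheduling particularly challenging.
The problem has been studied for decades, but little of that research has applied Satisfiability Modulo Theories (SMT).
SMT has gained momentum in the formal verification community over recent decades, driving advances in SMT solvers that have been shown to outperform standard mathematical programming techniques.
In this work, we propose generic constraint formulations that can model a wide range of real-world scheduling constraints.
The generic constraints are then formulated as SMT and MILP problems and used to compare the respective state-of-the-art solvers, Z3 and Gurobi, on academic and real-world-inspired rostering problems.
The experimental results show how each solver excels on certain types of problems.
The MILP solver generally performs better when the problem is highly constrained or infeasible, while the SMT solver performs better otherwise.
On real-world-inspired problems with a more varied set of shifts and personnel, the SMT solver excels.
The experiments also showed that the SMT solver is more sensitive to how the generic constraints are formulated, so careful consideration and experimentation are needed to improve its performance.
We conclude that SMT-based methods are a promising avenue for future research in personnel scheduling.

Abstract (original)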

The effects of personnel scheduling on the quality of care and working conditions for healthcare personnel have been thoroughly documented. However, the ever-present demand and large variation of constraints make healthcare scheduling particularly challenging. This problem has been studied for decades, with limited research aimed at applying Satisfiability Modulo Theories (SMT). SMT has gained momentum within the formal verification community in the last decades, leading to the advancement of SMT solvers that have been shown to outperform standard mathematical programming techniques. In this work, we propose generic constraint formulations that can model a wide range of real-world scheduling constraints. Then, the generic constraints are formulated as SMT and MILP problems and used to compare the respective state-of-the-art solvers, Z3 and Gurobi, on academic and real-world inspired rostering problems. Experimental results show how each solver excels for certain types of problems; the MILP solver generally performs better when the problem is highly constrained or infeasible, while the SMT solver performs better otherwise. On real-world inspired problems containing a more varied set of shifts and personnel, the SMT solver excels. Additionally, it was noted during experimentation that the SMT solver was more sensitive to the way the generic constraints were formulated, requiring careful consideration and experimentation to achieve better performance. We conclude that SMT-based methods present a promising avenue for future research within the domain of personnel scheduling.

arXiv information

Authors	Alvin Combrink, Stephie Do, Kristofer Bengtsson, Sabino Francesco Roselli, Martin Fabian
Published	2025-06-02 06:55:12+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.SY, eess.SY

ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction

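Posted: June 3, 2025 | Author: jarxiv

Summary

Recent efforts in LLM alignment have focused on building large-scale preference datasets with human or artificial intelligence (AI) annotators.
However, such approaches rely on instance-wise supervision, which incurs substantial annotation cost and limits interpretability.
In this paper, we propose ZEBRA, a model behavior-wise zero-annotation framework that constructs preference data by leveraging model behavior knowledge derived from benchmark performance.
ZEBRA binarizes response pairs by evaluating the quality and similarity of the models that produced them, bypassing instance-level annotation entirely.
This enables scalable, controllable, and cost-effective generation of alignment data.
Empirical results show that ZEBRA achieves alignment performance comparable to instance-supervised methods even though it requires no manual or model-based labeling.

Abstract (original)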

Recent efforts in LLM alignment have focused on constructing large-scale preference datasets via human or Artificial Intelligence (AI) annotators. However, such approaches rely on instance-wise supervision, incurring substantial annotation cost and limited interpretability. In this paper, we propose ZEBRA – a model behavior-wise zero-annotation framework that constructs preference data by leveraging model behavior knowledge derived from benchmark performances. ZEBRA binarizes response pairs by evaluating the quality and similarity of their origin models, entirely bypassing instance-level annotation. This allows scalable, controllable, and cost-effective alignment data generation. Empirical results show that ZEBRA achieves alignment performance comparable to instance-supervised methods, despite requiring no manual or model-based labeling.

arXiv information

Authors	Jeesu Jung, Chanjun Park, Sangkeun Jung
Published	2025-06-02 07:16:11+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.CL

Prediction hubs are context-informed frequent tokens in LLMs

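Posted: June 3, 2025 | Author: jarxiv

Summary

Hubness, the tendency of a few points to be among the nearest neighbours of a disproportionate number of other points, commonly arises when standard distance measures are applied to high-dimensional data, and it often harms distance-based analysis.
Because autoregressive large language models (LLMs) operate on high-dimensional representations, we ask whether they too are affected by hubness.
We first prove that the only large-scale representation comparison LLMs perform, the one between context and unembedding vectors that determines continuation probabilities, is not characterized by the concentration-of-distances phenomenon that typically causes nuisance hubness.
We then show empirically that this comparison still produces a high degree of hubness, but that the hubs in this case are not a disturbance.
Rather, they result from context-modulated frequent tokens that often appear in the pool of likely candidates for next-token prediction.
When other distances are used to compare LLM representations, however, the same theoretical guarantees do not hold, and nuisance hubs do appear.
There are two main takeaways.
First, although hubness is omnipresent in high-dimensional spaces, it is not a negative property that needs mitigating when LLMs are used for next-token prediction.
Second, comparing LLM representations with Euclidean or cosine distance carries a high risk of nuisance hubs, and practitioners should apply mitigation techniques where relevant.

Abstract (original)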

Hubness, the tendency for a few points to be among the nearest neighbours of a disproportionate number of other points, commonly arises when applying standard distance measures to high-dimensional data, often negatively impacting distance-based analysis. As autoregressive large language models (LLMs) operate on high-dimensional representations, we ask whether they are also affected by hubness. We first prove that the only large-scale representation comparison operation performed by LLMs, namely that between context and unembedding vectors to determine continuation probabilities, is not characterized by the concentration of distances phenomenon that typically causes the appearance of nuisance hubness. We then empirically show that this comparison still leads to a high degree of hubness, but the hubs in this case do not constitute a disturbance. They are rather the result of context-modulated frequent tokens often appearing in the pool of likely candidates for next token prediction. However, when other distances are used to compare LLM representations, we do not have the same theoretical guarantees, and, indeed, we see nuisance hubs appear. There are two main takeaways. First, hubness, while omnipresent in high-dimensional spaces, is not a negative property that needs to be mitigated when LLMs are being used for next token prediction. Second, when comparing representations from LLMs using Euclidean or cosine distance, there is a high risk of nuisance hubs and practitioners should use mitigation techniques if relevant.
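
To make the definition of hubness in this abstract concrete, here is a minimal Python sketch that counts how often each point appears among the k nearest neighbours of the other points (the k-occurrence). It also computes the skewness of those counts, a common way to summarize hubness. The function names, the toy points, and the choice of Euclidean distance are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def k_occurrence(points, k):
    """Count, for each point, how many other points have it among their k nearest neighbours."""
    counts = Counter()
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to point i.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in neighbours[:k]:
            counts[j] += 1
    return [counts[j] for j in range(len(points))]


def skewness(values):
    """Skewness of the counts; a strongly positive value means a few points collect most neighbour slots."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


# Toy one-dimensional data (hypothetical); real studies use high-dimensional vectors.
points = [(0.0,), (1.0,), (2.5,), (10.0,)]
n_k = k_occurrence(points, k=1)
print(n_k, skewness(n_k))
```

A point with a large k-occurrence is a hub. Whether such a hub is a nuisance or, as the abstract argues for next-token prediction, simply a frequent token is a separate question that this sketch does not answer.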

arXiv information

Authors	Beatrix M. G. Nielsen, Iuri Macocco, Marco Baroni
Published	2025-06-02 07:26:26+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.CL

CCFC: Bridging Federated Clustering and Contrastive Learning

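Posted: June 3, 2025 | Author: jarxiv

Summary

Federated clustering, an essential extension of centralized clustering to federated scenarios, lets multiple data-holding clients group data collaboratively while keeping their data local.
In centralized scenarios, clustering driven by representation learning has made significant advances in handling high-dimensional complex data.
However, the combination of federated clustering and representation learning remains underexplored.
To bridge this gap, we first tailor a cluster-contrastive model for learning clustering-friendly representations.
We then use this model as the foundation for a new federated clustering method, cluster-contrastive federated clustering (CCFC).
Thanks to representation learning, CCFC's clustering performance is in some cases double that of the best baseline methods.
Compared with the most closely related baseline, this yields NMI score improvements of up to 0.4155 in the most conspicuous case.
CCFC also shows superior performance in handling device failures, which matters in practice.

Abstract (original)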

Federated clustering, an essential extension of centralized clustering for federated scenarios, enables multiple data-holding clients to collaboratively group data while keeping their data locally. In centralized scenarios, clustering driven by representation learning has made significant advancements in handling high-dimensional complex data. However, the combination of federated clustering and representation learning remains underexplored. To bridge this, we first tailor a cluster-contrastive model for learning clustering-friendly representations. Then, we harness this model as the foundation for proposing a new federated clustering method, named cluster-contrastive federated clustering (CCFC). Benefiting from representation learning, the clustering performance of CCFC even double those of the best baseline methods in some cases. Compared to the most related baseline, the benefit results in substantial NMI score improvements of up to 0.4155 on the most conspicuous case. Moreover, CCFC also shows superior performance in handling device failures from a practical viewpoint.
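
The abstract reports its gains as NMI (normalized mutual information) scores. As a reading aid, the following Python sketch computes NMI between two label assignments. It assumes normalization by the arithmetic mean of the two entropies. Other normalizations exist, and the abstract does not say which one the paper uses, so treat this as one plausible variant and not as the authors' evaluation code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels):
    """Normalized mutual information, normalized by the mean of the two entropies (an assumption)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    # Mutual information: sum over observed label pairs of p(a,b) * log(p(a,b) / (p(a) p(b))).
    mi = sum(
        (c / n) * math.log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in joint.items()
    )
    mean_entropy = (entropy(true_labels) + entropy(pred_labels)) / 2
    # If both assignments are a single cluster, they agree trivially.
    return mi / mean_entropy if mean_entropy > 0 else 1.0


# Cluster ids are arbitrary, so a relabelled but identical partition scores 1.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
# Two independent partitions score 0.
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the score ranges from 0 to 1, an absolute improvement of 0.4155 covers a large part of the scale.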

arXiv information

Authors	Jing Liu, Jie Yan, Zhong-Yuan Zhang
Published	2025-06-02 07:36:47+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.LG

TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification

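Posted: June 3, 2025 | Author: jarxiv

Summary

Product Attribute Value Identification (PAVI) is the task of identifying attribute values from product profiles, a key task for improving product search, recommendation, and business analytics on e-commerce platforms.
However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs.
To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI.
TACLR formulates PAVI as an information retrieval task: it encodes product profiles and candidate values into embeddings and retrieves values based on their similarity.
It uses contrastive training with taxonomy-aware hard negative sampling and adaptive inference with dynamic thresholds.
TACLR offers three key advantages: (1) it handles implicit and OOD values effectively while producing normalized outputs;
(2) it scales to thousands of categories, tens of thousands of attributes, and millions of values;
(3) it supports efficient inference for high-load industrial deployment.
Extensive experiments on proprietary and public datasets validate TACLR's effectiveness and efficiency.
It has also been deployed successfully on the real-world e-commerce platform Xianyu, where it processes millions of product listings daily with frequently updated, large-scale attribute taxonomies.
We release the code at https://github.com/SuYindu/TACLR to support reproducibility and future research.

Abstract (original)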

Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendation, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task by encoding product profiles and candidate values into embeddings and retrieving values based on their similarity. It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial deployment. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. Further, it has been successfully deployed on the real-world e-commerce platform Xianyu, processing millions of product listings daily with frequently updated, large-scale attribute taxonomies. We release the code to facilitate reproducibility and future research at https://github.com/SuYindu/TACLR.
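
The retrieval formulation described above can be pictured with a small Python sketch: embed the product profile and each candidate value, pick the most similar candidate, and return nothing when no candidate is similar enough. This is not TACLR's actual inference procedure. The embeddings are hand-written toy vectors, and the threshold is a fixed hypothetical number, whereas the paper uses a trained encoder and dynamic thresholds whose details the abstract does not give.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_value(profile_vec, candidates, threshold):
    """Return the most similar candidate value, or None if none reaches the threshold."""
    best_value, best_score = None, -1.0
    for value, vec in candidates.items():
        score = cosine(profile_vec, vec)
        if score > best_score:
            best_value, best_score = value, score
    return best_value if best_score >= threshold else None


# Hypothetical candidate values for a "color" attribute with toy 2-d embeddings.
candidates = {"red": [1.0, 0.0], "blue": [0.0, 1.0]}
print(retrieve_value([0.9, 0.1], candidates, threshold=0.8))
print(retrieve_value([1.0, 1.0], candidates, threshold=0.8))
```

Returning None when nothing clears the threshold is how this sketch represents a profile that carries no value for the attribute. Because every answer comes from the candidate set, the output is normalized by construction.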

arXiv information

Authors	Yindu Su, Huike Zou, Lin Sun, Ting Zhang, Haiyang Yang, Liyu Chen, David Lo, Qingheng Zhang, Shuguang Han, Jufeng Chen
Published	2025-06-02 07:43:02+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.IR

Diversity-oriented Data Augmentation with Large Language Models

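Posted: June 3, 2025 | Author: jarxiv

Summary

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples.
This process is crucial for improving the robustness and generalization capabilities of NLP models.
However, a significant challenge remains: insufficient attention to sample distribution diversity.
Most existing methods focus on increasing the number of samples while neglecting the diversity of the sample distribution, which can lead to model overfitting.
In response, we explore the impact of data augmentation on dataset diversity and propose a Diversity-oriented data Augmentation framework (DoAug).
Specifically, we use a diversity-oriented fine-tuning approach to train an LLM as a diverse paraphraser that can augment textual datasets by generating diversified paraphrases.
We then apply the LLM paraphraser to a selected coreset of highly informative samples and integrate the paraphrases with the original data to create a more diverse augmented dataset.
Finally, we conduct extensive experiments on 12 real-world textual datasets.
The results show that the fine-tuned LLM augmenter improves diversity while preserving label consistency, which enhances the robustness and performance of downstream tasks.
Specifically, it achieves an average performance gain of 10.52%, surpassing the runner-up baseline by more than three percentage points.

Abstract (original)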

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples. This process is crucial for improving the robustness and generalization capabilities of NLP models. However, a significant challenge remains: Insufficient Attention to Sample Distribution Diversity. Most existing methods focus on increasing the sample numbers while neglecting the sample distribution diversity, which can lead to model overfitting. In response, we explore data augmentation’s impact on dataset diversity and propose a Diversity-oriented data Augmentation framework (DoAug). Specifically, we utilize a diversity-oriented fine-tuning approach to train an LLM as a diverse paraphraser, which is capable of augmenting textual datasets by generating diversified paraphrases. Then, we apply the LLM paraphraser to a selected coreset of highly informative samples and integrate the paraphrases with the original data to create a more diverse augmented dataset. Finally, we conduct extensive experiments on 12 real-world textual datasets. The results show that our fine-tuned LLM augmenter improves diversity while preserving label consistency, thereby enhancing the robustness and performance of downstream tasks. Specifically, it achieves an average performance gain of 10.52%, surpassing the runner-up baseline with more than three percentage points.

arXiv information

Authors	Zaitian Wang, Jinghan Zhang, Xinhao Zhang, Kunpeng Liu, Pengfei Wang, Yuanchun Zhou
Published	2025-06-02 07:51:20+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

RL-SPH: Learning to Achieve Feasible Solutions for Integer Linear Programs

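Posted: June 3, 2025 | Author: jarxiv

Summary

Integer linear programming (ILP) is widely used for a variety of combinatorial optimization problems.
Primal heuristics play a crucial role in quickly finding feasible solutions to NP-hard ILPs.
End-to-end learning-based primal heuristics (E2EPH) have been proposed recently, but they typically cannot generate feasible solutions independently and focus mainly on binary variables.
Ensuring feasibility is critical, especially when handling non-binary integer variables.
To address this challenge, we propose RL-SPH, a novel reinforcement learning-based start primal heuristic that can generate feasible solutions independently, even for ILPs involving non-binary integers.
Experimental results show that RL-SPH rapidly obtains high-quality feasible solutions, achieving on average a 44x lower primal gap and a 2.3x lower primal integral than existing primal heuristics.

Abstract (original)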

Integer linear programming (ILP) is widely utilized for various combinatorial optimization problems. Primal heuristics play a crucial role in quickly finding feasible solutions for NP-hard ILP. Although end-to-end learning-based primal heuristics (E2EPH) have recently been proposed, they are typically unable to independently generate feasible solutions and mainly focus on binary variables. Ensuring feasibility is critical, especially when handling non-binary integer variables. To address this challenge, we propose RL-SPH, a novel reinforcement learning-based start primal heuristic capable of independently generating feasible solutions, even for ILP involving non-binary integers. Experimental results demonstrate that RL-SPH rapidly obtains high-quality feasible solutions, achieving on average a 44x lower primal gap and a 2.3x lower primal integral compared to existing primal heuristics.
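
Two notions in this abstract, feasibility and the primal gap, are easy to state in code. The Python sketch below checks whether a candidate assignment is integral and satisfies constraints of the form A x <= b, and computes the primal gap in the form commonly used in the MILP literature. Both the constraint form and the gap definition are assumptions made for illustration, since the abstract does not spell out the paper's exact definitions. This is not the RL-SPH method itself.

```python
def is_feasible(A, b, x):
    """True if every entry of x is integral and A x <= b holds row by row."""
    if not all(float(v).is_integer() for v in x):
        return False
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
        for row, b_i in zip(A, b)
    )


def primal_gap(objective, best_known):
    """Relative distance in [0, 1] between a solution's objective and the best known objective."""
    if objective == 0 and best_known == 0:
        return 0.0
    if objective * best_known < 0:
        # Opposite signs: by convention the gap is at its maximum.
        return 1.0
    return abs(objective - best_known) / max(abs(objective), abs(best_known))


# Hypothetical toy instance: x0 + x1 <= 4 and x0 >= 0 (written as -x0 <= 0).
A = [[1, 1], [-1, 0]]
b = [4, 0]
print(is_feasible(A, b, [1, 3]))    # satisfies both rows
print(is_feasible(A, b, [3, 2]))    # violates the first row
print(is_feasible(A, b, [1.5, 2]))  # not integral
print(primal_gap(10, 8))
```

A heuristic that cannot guarantee the first check has to rely on some other component to repair its output, which is the limitation of earlier learned heuristics that the abstract points to.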

arXiv information

Authors	Tae-Hoon Lee, Min-Soo Kim
Published	2025-06-02 08:21:03+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.LG

Disentangling Likes and Dislikes in Personalized Generative Explainable Recommendation

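Posted: June 3, 2025 | Author: jarxiv

Summary

Recent research on explainable recommendation generally frames the task as a standard text generation problem and evaluates models simply on the textual similarity between predicted and ground-truth explanations.
However, this approach fails to consider one crucial aspect of such systems: whether their outputs accurately reflect users' (post-purchase) sentiments, that is, whether and why they would like or dislike the recommended items.
To shed light on this issue, we introduce new datasets and evaluation methods that focus on users' sentiments.
Specifically, we construct the datasets by using an LLM to explicitly extract users' positive and negative opinions from their post-purchase reviews, and we propose evaluating systems on whether the generated explanations (1) align well with the users' sentiments and (2) accurately identify both the positive and the negative opinions of users on the target items.
We benchmark several recent models on our datasets and show that strong performance on existing metrics does not ensure that the generated explanations align well with users' sentiments.
Lastly, we find that existing models provide more sentiment-aware explanations when the users' (predicted) ratings for the target items are fed directly into the models as input.
The datasets and benchmark implementation are available at https://github.com/jchanxtarov/sent_xrec.

Abstract (original)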

Recent research on explainable recommendation generally frames the task as a standard text generation problem, and evaluates models simply based on the textual similarity between the predicted and ground-truth explanations. However, this approach fails to consider one crucial aspect of the systems: whether their outputs accurately reflect the users’ (post-purchase) sentiments, i.e., whether and why they would like and/or dislike the recommended items. To shed light on this issue, we introduce new datasets and evaluation methods that focus on the users’ sentiments. Specifically, we construct the datasets by explicitly extracting users’ positive and negative opinions from their post-purchase reviews using an LLM, and propose to evaluate systems based on whether the generated explanations 1) align well with the users’ sentiments, and 2) accurately identify both positive and negative opinions of users on the target items. We benchmark several recent models on our datasets and demonstrate that achieving strong performance on existing metrics does not ensure that the generated explanations align well with the users’ sentiments. Lastly, we find that existing models can provide more sentiment-aware explanations when the users’ (predicted) ratings for the target items are directly fed into the models as input. The datasets and benchmark implementation are available at: https://github.com/jchanxtarov/sent_xrec.

arXiv information

Authors	Ryotaro Shimizu, Takashi Wada, Yu Wang, Johannes Kruse, Sean O’Brien, Sai HtaungKham, Linxin Song, Yuya Yoshikawa, Yuki Saito, Fugee Tsung, Masayuki Goto, Julian McAuley
Published	2025-06-02 08:41:09+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.IR, cs.LG

GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification

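Posted: June 3, 2025 | Author: jarxiv

Summary

Integrating structured graph data with rich textual information from nodes poses a significant challenge, particularly for heterophilic node classification.
Current approaches often struggle with computational cost or with effectively fusing disparate modalities.
We propose the Graph Masked Language Model (GMLM), a novel architecture that efficiently combines Graph Neural Networks (GNNs) with Pre-trained Language Models (PLMs).
GMLM introduces three key innovations: (i) a dynamic active node selection strategy for scalable PLM text processing;
(ii) a GNN-specific contrastive pretraining stage that uses soft masking with a learnable graph [MASK] token to obtain robust structural representations;
(iii) a dedicated fusion module that integrates RGCN-based GNN embeddings with PLM (GTE-Small and DistilBERT) embeddings.
Extensive experiments on heterophilic benchmarks (Cornell, Wisconsin, Texas) demonstrate GMLM's superiority.
Notably, GMLM (DistilBERT) achieves significant performance gains, improving accuracy by over 4.7% on Cornell and over 2.0% on Texas compared with the previous best-performing baselines.
This work underscores the benefits of targeted PLM engagement and modality-specific pretraining for improved, efficient learning on text-rich graphs.

Abstract (original)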

Integrating structured graph data with rich textual information from nodes poses a significant challenge, particularly for heterophilic node classification. Current approaches often struggle with computational costs or effective fusion of disparate modalities. We propose Graph Masked Language Model (GMLM), a novel architecture efficiently combining Graph Neural Networks (GNNs) with Pre-trained Language Models (PLMs). GMLM introduces three key innovations: (i) a dynamic active node selection strategy for scalable PLM text processing; (ii) a GNN-specific contrastive pretraining stage using soft masking with a learnable graph [MASK] token for robust structural representations; and (iii) a dedicated fusion module integrating RGCN-based GNN embeddings with PLM (GTE-Small & DistilBERT) embeddings. Extensive experiments on heterophilic benchmarks (Cornell, Wisconsin, Texas) demonstrate GMLM’s superiority. Notably, GMLM(DistilBERT) achieves significant performance gains, improving accuracy by over 4.7% on Cornell and over 2.0% on Texas compared to the previous best-performing baselines. This work underscores the benefits of targeted PLM engagement and modality-specific pretraining for improved, efficient learning on text-rich graphs.

arXiv information

Authors	Aarush Sinha, OM Kumar CU
Published	2025-06-02 08:42:48+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

POPGym Arcade: Parallel Pixelated POMDPs

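Posted: June 3, 2025 | Author: jarxiv

Summary

We present POPGym Arcade, a collection of hardware-accelerated, pixel-based environments with shared observation and action spaces.
Each environment includes fully and partially observable variants, enabling counterfactual studies of partial observability.
We also introduce mathematical tools for analyzing policies under partial observability, which reveal how agents recall past information to make decisions.
Our analysis shows (1) that controlling for partial observability is critical and (2) that agents with long-term memory learn brittle policies that struggle to generalize.
Finally, we demonstrate that recurrent policies can be "poisoned" by old, out-of-distribution observations, with implications for sim-to-real transfer, imitation learning, and offline reinforcement learning.

Abstract (original)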

We present the POPGym Arcade, a collection of hardware-accelerated, pixel-based environments with shared observation and action spaces. Each environment includes fully and partially observable variants, enabling counterfactual studies on partial observability. We also introduce mathematical tools for analyzing policies under partial observability, which reveal how agents recall past information to make decisions. Our analysis shows (1) that controlling for partial observability is critical and (2) that agents with long-term memory learn brittle policies that struggle to generalize. Finally, we demonstrate that recurrent policies can be ‘poisoned’ by old, out-of-distribution observations, with implications for sim-to-real transfer, imitation learning, and offline reinforcement learning.

arXiv information

Authors	Zekang Wang, Zhe He, Borong Zhang, Edan Toledo, Steven Morad
Published	2025-06-02 09:04:43+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.LG, cs.RO
