jarxiv | Japanese arxiv | Page 1564

CryptoPulse: Short-Term Cryptocurrency Forecasting with Dual-Prediction and Cross-Correlated Market Indicators

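Posted: February 28, 2025 | Author: jarxiv

Summary

Cryptocurrencies trade in highly volatile markets, which poses significant challenges for investors.
Systems that predict cryptocurrency market movements have been developed to support informed decisions, and they typically focus on historical patterns.
However, these methods often overlook three factors that shape market dynamics: 1) the macro investing environment, reflected in fluctuations of major cryptocurrencies that affect collaborative investor behavior;
2) overall market sentiment, which is heavily influenced by news that affects investor strategies; and
3) technical indicators, which offer insight into overbought or oversold conditions, momentum, and market trends, and are crucial for short-term price movements.
This paper proposes a dual-prediction mechanism that forecasts the next day's closing price by incorporating macroeconomic fluctuations, technical indicators, and price changes of individual cryptocurrencies.
In addition, a novel refinement mechanism improves the predictions through market-sentiment-based rescaling and fusion.
Experiments show that the proposed model achieves state-of-the-art performance and consistently outperforms ten comparison methods.

Abstract (original)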

Cryptocurrencies fluctuate in markets with high price volatility, posing significant challenges for investors. To aid in informed decision-making, systems predicting cryptocurrency market movements have been developed, typically focusing on historical patterns. However, these methods often overlook three critical factors influencing market dynamics: 1) the macro investing environment, reflected in major cryptocurrency fluctuations affecting collaborative investor behaviors; 2) overall market sentiment, heavily influenced by news impacting investor strategies; and 3) technical indicators, offering insights into overbought or oversold conditions, momentum, and market trends, which are crucial for short-term price movements. This paper proposes a dual prediction mechanism that forecasts the next day’s closing price by incorporating macroeconomic fluctuations, technical indicators, and individual cryptocurrency price changes. Additionally, a novel refinement mechanism enhances predictions through market sentiment-based rescaling and fusion. Experiments demonstrate that the proposed model achieves state-of-the-art performance, consistently outperforming ten comparison methods.
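The abstract mentions technical indicators that signal overbought or oversold conditions but does not say which ones the model uses. As an illustration only, the minimal sketch below computes one widely used indicator of that kind, the Relative Strength Index (RSI), in its simple-average form. The function name `simple_rsi` and the 14-day default window are assumptions for this example, not details taken from the paper.

```python
def simple_rsi(closes, period=14):
    """Simple-average RSI over the most recent `period` price changes.

    Illustrative only: the paper's actual indicator set is not stated
    in the abstract, and `period=14` is just a common convention.
    """
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")

    # Day-over-day changes for the last `period` days.
    recent = closes[-(period + 1):]
    changes = [later - earlier for earlier, later in zip(recent, recent[1:])]

    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period

    if avg_loss == 0:
        # No losing days in the window: RSI saturates at 100.
        return 100.0

    relative_strength = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + relative_strength)


if __name__ == "__main__":
    prices = [100, 102, 101, 103, 104, 103, 105]
    print(simple_rsi(prices, period=6))
```

By convention, readings near 100 are interpreted as overbought and readings near 0 as oversold; 70 and 30 are common thresholds.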

arXiv information

Authors	Amit Kumar, Taoran Ji
Published	2025-02-27 17:38:49+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG, q-fin.PR

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

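Posted: February 28, 2025 | Author: jarxiv

Summary

Recently, o1-like models have drawn significant attention; they generate long Chain-of-Thought (CoT) reasoning steps to improve the reasoning abilities of existing large language models (LLMs).
To understand the quality of these long CoTs and to measure how well existing LLMs can critique them, we introduce DeltaBench, which contains long CoTs generated by different o1-like models (e.g., QwQ, DeepSeek-R1) for different reasoning tasks (e.g., math, code, general reasoning) and measures the ability to detect errors in long CoT reasoning.
Based on DeltaBench, we first perform a fine-grained analysis of the generated long CoTs to assess the effectiveness and efficiency of different o1-like models.
We then extensively evaluate existing process reward models (PRMs) and critic models on detecting the errors in each annotated process, with the aim of investigating the boundaries and limitations of these models.
Finally, we hope DeltaBench can guide developers toward a better understanding of their models' long CoT reasoning abilities.

Abstract (original)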

Recently, o1-like models have drawn significant attention, where these models produce the long Chain-of-Thought (CoT) reasoning steps to improve the reasoning abilities of existing Large Language Models (LLMs). In this paper, to understand the qualities of these long CoTs and measure the critique abilities of existing LLMs on these long CoTs, we introduce the DeltaBench, including the generated long CoTs from different o1-like models (e.g., QwQ, DeepSeek-R1) for different reasoning tasks (e.g., Math, Code, General Reasoning), to measure the ability to detect errors in long CoT reasoning. Based on DeltaBench, we first perform fine-grained analysis of the generated long CoTs to discover the effectiveness and efficiency of different o1-like models. Then, we conduct extensive evaluations of existing process reward models (PRMs) and critic models to detect the errors of each annotated process, which aims to investigate the boundaries and limitations of existing PRMs and critic models. Finally, we hope that DeltaBench could guide developers to better understand the long CoT reasoning abilities of their models.

arXiv information

Authors	Yancheng He, Shilong Li, Jiaheng Liu, Weixun Wang, Xingyuan Bu, Ge Zhang, Zhongyuan Peng, Zhaoxiang Zhang, Zhicheng Zheng, Wenbo Su, Bo Zheng
Published	2025-02-27 16:34:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CL

ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding

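Posted: February 28, 2025 | Author: jarxiv

Summary

Embodied intelligence requires agents to interact with 3D environments in real time based on language instructions.
A foundational task in this domain is ego-centric 3D visual grounding.
However, point clouds rendered from RGB-D images retain a large amount of redundant background data and inherent noise, both of which can interfere with the manifold structure of the target regions.
Existing point cloud enhancement methods often require a tedious process to improve the manifold, which makes them unsuitable for real-time tasks.
We propose Proxy Transformation, a method suited to multimodal tasks, to improve the point cloud manifold efficiently.
Our method first uses Deformable Point Clustering to identify the point cloud sub-manifolds in target regions.
We then propose a Proxy Attention module that uses multimodal proxies to guide the point cloud transformation.
Building on Proxy Attention, we design a submanifold transformation generation module in which textual information globally guides the translation vectors for different submanifolds, optimizing the relative spatial relationships of target regions.
At the same time, image information guides the linear transformations within each submanifold, refining the local point cloud manifold of target regions.
Extensive experiments show that Proxy Transformation significantly outperforms all existing methods, improving results by 7.49% on easy targets and 4.60% on hard targets while reducing the computational overhead of attention blocks by 40.6%.
These results demonstrate the effectiveness and robustness of our approach and establish a new state of the art (SOTA) in ego-centric 3D visual grounding.

Abstract (original)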

Embodied intelligence requires agents to interact with 3D environments in real time based on language instructions. A foundational task in this domain is ego-centric 3D visual grounding. However, the point clouds rendered from RGB-D images retain a large amount of redundant background data and inherent noise, both of which can interfere with the manifold structure of the target regions. Existing point cloud enhancement methods often require a tedious process to improve the manifold, which is not suitable for real-time tasks. We propose Proxy Transformation suitable for multimodal task to efficiently improve the point cloud manifold. Our method first leverages Deformable Point Clustering to identify the point cloud sub-manifolds in target regions. Then, we propose a Proxy Attention module that utilizes multimodal proxies to guide point cloud transformation. Built upon Proxy Attention, we design a submanifold transformation generation module where textual information globally guides translation vectors for different submanifolds, optimizing relative spatial relationships of target regions. Simultaneously, image information guides linear transformations within each submanifold, refining the local point cloud manifold of target regions. Extensive experiments demonstrate that Proxy Transformation significantly outperforms all existing methods, achieving an impressive improvement of 7.49% on easy targets and 4.60% on hard targets, while reducing the computational overhead of attention blocks by 40.6%. These results establish a new SOTA in ego-centric 3D visual grounding, showcasing the effectiveness and robustness of our approach.

arXiv information

Authors	Qihang Peng, Henry Zheng, Gao Huang
Published	2025-02-27 09:22:15+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation

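Posted: February 28, 2025 | Author: jarxiv

Summary

Vision-language models in pathology enable multimodal case retrieval and automated report generation.
However, many models developed so far were trained on pathology reports containing information that cannot be inferred from the paired whole slide images (e.g., patient history), which can lead to hallucinated sentences in generated reports.
We therefore investigate how the selection of information from pathology reports for vision-language modeling affects the quality of the multimodal representations and the generated reports.
More concretely, we compare a model trained on full reports with a model trained on preprocessed reports that include only sentences describing cell and tissue appearance based on the H&E-stained slides.
For the experiments, we built on the BLIP-2 framework and used a cutaneous melanocytic lesion dataset of 42,433 H&E-stained whole slide images and 19,636 corresponding pathology reports.
Model performance was assessed using image-to-text and text-to-image retrieval, together with a qualitative evaluation of the generated reports by an expert pathologist.
Our results show that text preprocessing prevents hallucination in report generation.
Although preprocessing improved the quality of the generated reports, training the vision-language model on full reports gave better cross-modal retrieval performance.

Abstract (original)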

Vision-language models in pathology enable multimodal case retrieval and automated report generation. Many of the models developed so far, however, have been trained on pathology reports that include information which cannot be inferred from paired whole slide images (e.g., patient history), potentially leading to hallucinated sentences in generated reports. To this end, we investigate how the selection of information from pathology reports for vision-language modeling affects the quality of the multimodal representations and generated reports. More concretely, we compare a model trained on full reports against a model trained on preprocessed reports that only include sentences describing the cell and tissue appearances based on the H&E-stained slides. For the experiments, we built upon the BLIP-2 framework and used a cutaneous melanocytic lesion dataset of 42,433 H&E-stained whole slide images and 19,636 corresponding pathology reports. Model performance was assessed using image-to-text and text-to-image retrieval, as well as qualitative evaluation of the generated reports by an expert pathologist. Our results demonstrate that text preprocessing prevents hallucination in report generation. Despite the improvement in the quality of the generated reports, training the vision-language model on full reports showed better cross-modal retrieval performance.

arXiv information

Authors	Ruben T. Lucassen, Tijn van de Luijtgaarden, Sander P. J. Moonemans, Gerben E. Breimer, Willeke A. M. Blokx, Mitko Veta
Published	2025-02-27 09:06:34+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

GHOST 2.0: generative high-fidelity one shot transfer of heads

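Posted: February 28, 2025 | Author: jarxiv

Summary

Face swapping has recently gained attention in the research community, but the related problem of head swapping remains largely unexplored.
Beyond skin color transfer, head swapping poses extra challenges, such as preserving the structural information of the whole head during synthesis and inpainting gaps between the swapped head and the background.
In this paper, we address these concerns with GHOST 2.0, which consists of two problem-specific modules.
First, we introduce an enhanced Aligner model for head reenactment, which preserves identity information at multiple scales and is robust to extreme pose variations.
Second, we use a Blender module that seamlessly integrates the reenacted head into the target background by transferring skin color and inpainting mismatched regions.
Both modules outperform the baselines on their respective tasks, which yields state-of-the-art results in head swapping.
We also tackle complex cases, such as large differences between the hairstyles of the source and the target.
Code is available at https://github.com/ai-forever/ghost-2.0

Abstract (original)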

While the task of face swapping has recently gained attention in the research community, a related problem of head swapping remains largely unexplored. In addition to skin color transfer, head swap poses extra challenges, such as the need to preserve structural information of the whole head during synthesis and inpaint gaps between swapped head and background. In this paper, we address these concerns with GHOST 2.0, which consists of two problem-specific modules. First, we introduce enhanced Aligner model for head reenactment, which preserves identity information at multiple scales and is robust to extreme pose variations. Secondly, we use a Blender module that seamlessly integrates the reenacted head into the target background by transferring skin color and inpainting mismatched regions. Both modules outperform the baselines on the corresponding tasks, allowing to achieve state of the art results in head swapping. We also tackle complex cases, such as large difference in hair styles of source and target. Code is available at https://github.com/ai-forever/ghost-2.0

arXiv information

Authors	Alexander Groshev, Anastasiia Iashchenko, Pavel Paramonov, Denis Dimitrov, Andrey Kuznetsov
Published	2025-02-27 11:45:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions

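Posted: February 28, 2025 | Author: jarxiv

Summary

Millions of melanocytic skin lesions are examined by pathologists each year, and the majority are common nevi (i.e., ordinary moles).
Most of these lesions can be diagnosed in seconds, but writing the corresponding pathology report takes much longer.
Automating part of the report writing could therefore alleviate the increasing workload of pathologists.
In this work, we develop a vision-language model specifically for the pathology domain of cutaneous melanocytic lesions.
The model follows the Contrastive Captioner framework and was trained and evaluated on a melanocytic lesion dataset of 42,512 H&E-stained whole slide images and 19,645 corresponding pathology reports.
Our results show that, for common nevi, the quality scores of model-generated reports were on par with those of pathologist-written reports, as assessed by an expert pathologist in a reader study.
Report generation proved more difficult for rare melanocytic lesion subtypes, but cross-modal retrieval performance for these cases was considerably better.

Abstract (original)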

Millions of melanocytic skin lesions are examined by pathologists each year, the majority of which concern common nevi (i.e., ordinary moles). While most of these lesions can be diagnosed in seconds, writing the corresponding pathology report is much more time-consuming. Automating part of the report writing could, therefore, alleviate the increasing workload of pathologists. In this work, we develop a vision-language model specifically for the pathology domain of cutaneous melanocytic lesions. The model follows the Contrastive Captioner framework and was trained and evaluated using a melanocytic lesion dataset of 42,512 H&E-stained whole slide images and 19,645 corresponding pathology reports. Our results show that the quality scores of model-generated reports were on par with pathologist-written reports for common nevi, assessed by an expert pathologist in a reader study. While report generation revealed to be more difficult for rare melanocytic lesion subtypes, the cross-modal retrieval performance for these cases was considerably better.

arXiv information

Authors	Ruben T. Lucassen, Sander P. J. Moonemans, Tijn van de Luijtgaarden, Gerben E. Breimer, Willeke A. M. Blokx, Mitko Veta
Published	2025-02-27 09:09:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

Gaze-Guided Task Decomposition for Imitation Learning in Robotic Manipulation

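Posted: February 28, 2025 | Author: jarxiv

Summary

In imitation learning for robotic manipulation, decomposing object manipulation tasks into sub-tasks makes it possible to reuse learned skills and combine learned behaviors to perform novel tasks, rather than simply replicating demonstrated motions.
Human gaze is closely linked to hand movements during object manipulation.
We hypothesize that an imitating agent's gaze control, which fixates on specific landmarks and transitions between them, simultaneously segments demonstrated manipulations into sub-tasks.
This study proposes a simple yet robust task decomposition method based on gaze transitions.
We use teleoperation, a common modality for collecting demonstrations in robotic manipulation, in which the human operator's gaze is measured and used for task decomposition as a substitute for the imitating agent's gaze.
Our approach ensures consistent task decomposition across all demonstrations of each task, which is desirable in contexts such as machine learning.
We evaluated the method on demonstrations of various tasks, assessing the characteristics and consistency of the resulting sub-tasks.
Furthermore, extensive testing across different hyperparameter settings confirmed its robustness, making it adaptable to diverse robotic systems.
Our code is available at https://github.com/crumbyRobotics/GazeTaskDecomp.

Abstract (original)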

In imitation learning for robotic manipulation, decomposing object manipulation tasks into sub-tasks enables the reuse of learned skills and the combination of learned behaviors to perform novel tasks, rather than simply replicating demonstrated motions. Human gaze is closely linked to hand movements during object manipulation. We hypothesize that an imitating agent’s gaze control, fixating on specific landmarks and transitioning between them, simultaneously segments demonstrated manipulations into sub-tasks. This study proposes a simple yet robust task decomposition method based on gaze transitions. Using teleoperation, a common modality in robotic manipulation for collecting demonstrations, in which a human operator’s gaze is measured and used for task decomposition as a substitute for an imitating agent’s gaze. Our approach ensures consistent task decomposition across all demonstrations for each task, which is desirable in contexts such as machine learning. We evaluated the method across demonstrations of various tasks, assessing the characteristics and consistency of the resulting sub-tasks. Furthermore, extensive testing across different hyperparameter settings confirmed its robustness, making it adaptable to diverse robotic systems. Our code is available at https://github.com/crumbyRobotics/GazeTaskDecomp.
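The abstract describes splitting a demonstration into sub-tasks at gaze transitions but gives no algorithmic detail. The sketch below shows the general idea under strong assumptions: each timestep already carries a landmark label for where the operator is looking, and a new segment starts whenever that label changes. The function `segment_by_gaze`, the label strings, and the `min_len` filter are all hypothetical and are not taken from the authors' code.

```python
from itertools import groupby


def segment_by_gaze(landmarks, min_len=1):
    """Split a per-timestep gaze-landmark sequence into segments.

    Returns (label, start, end) tuples with `end` exclusive. Runs
    shorter than `min_len` timesteps are dropped as brief glances.
    Hypothetical helper for illustration, not the paper's method.
    """
    segments = []
    start = 0
    for label, run in groupby(landmarks):
        length = len(list(run))
        if length >= min_len:
            segments.append((label, start, start + length))
        start += length
    return segments


if __name__ == "__main__":
    gaze = ["cup", "cup", "cup", "plate", "plate", "cup"]
    for label, begin, end in segment_by_gaze(gaze):
        print(f"{label}: timesteps {begin}..{end - 1}")
```

Because the boundaries depend only on the gaze labels, two demonstrations with the same gaze order produce the same sequence of segment labels, which is the kind of consistency the abstract highlights.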

arXiv information

Authors	Ryo Takizawa, Yoshiyuki Ohmura, Yasuo Kuniyoshi
Published	2025-02-27 03:41:27+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.RO

SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models

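Posted: February 28, 2025 | Author: jarxiv

Summary

With the rapid development of large language models (LLMs), fully fine-tuning (FT) these models has become increasingly impractical because of the high computational demands.
In addition, FT can lead to catastrophic forgetting.
As an alternative, Low-Rank Adaptation (LoRA) has been proposed; it fine-tunes only a small subset of parameters and achieves performance similar to FT while significantly reducing resource requirements.
However, because LoRA inherits FT's design, the problem of catastrophic forgetting remains.
To address these challenges, we propose SECURA: Sigmoid-Enhanced CUR Decomposition LoRA, a novel parameter-efficient fine-tuning (PEFT) variant that mitigates catastrophic forgetting while improving fine-tuning performance.
Our method introduces a new normalization technique, SigNorm, to enhance parameter retention and overall performance.
SECURA has been evaluated on a variety of tasks, including mathematical problem-solving (GSM8K), challenging question-answering (CNNDM), translation (NewsDE), and complex multiple-choice reasoning (LogiQA).
Experimental results show that, compared to DoRA, SECURA achieves an average fine-tuning improvement of 3.59% across four multiple-choice question (MCQ) tasks and a 2.51% improvement across five question-answering (QA) tasks on models such as Gemma2 2b, Qwen2 1.5b, Qwen 2 7b, Llama3 8b, and Llama3.1 8b.
Moreover, SECURA demonstrates superior knowledge retention, maintaining more than 70% accuracy on basic LLM knowledge across 16 continual learning tests and outperforming Experience Replay (ER), Sequential Learning (SEQ), EWC, I-LoRA, and CUR-LoRA.

Abstract (original)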

With the rapid development of large language models (LLMs), fully fine-tuning (FT) these models has become increasingly impractical due to the high computational demands. Additionally, FT can lead to catastrophic forgetting. As an alternative, Low-Rank Adaptation (LoRA) has been proposed, which fine-tunes only a small subset of parameters, achieving similar performance to FT while significantly reducing resource requirements. However, since LoRA inherits FT’s design, the issue of catastrophic forgetting remains. To address these challenges, we propose SECURA: Sigmoid-Enhanced CUR Decomposition LoRA, a novel parameter-efficient fine-tuning (PEFT) variant that mitigates catastrophic forgetting while improving fine-tuning performance. Our method introduces a new normalization technique, SigNorm, to enhance parameter retention and overall performance. SECURA has been evaluated on a variety of tasks, including mathematical problem-solving (GSM8K), challenging question-answering (CNNDM), translation (NewsDE), and complex multiple-choice reasoning (LogiQA). Experimental results show that SECURA achieves an average fine-tuning improvement of 3.59% across four multiple-choice question (MCQ) tasks and a 2.51% improvement across five question-answering (QA) tasks on models such as Gemma2 2b, Qwen2 1.5b, Qwen 2 7b, Llama3 8b, and Llama3.1 8b, compared to DoRA. Moreover, SECURA demonstrates superior knowledge retention capabilities, maintaining more than 70% accuracy on basic LLM knowledge across 16 continual learning tests, outperforming Experience Replay (ER), Sequential Learning (SEQ), EWC, I-LoRA, and CUR-LoRA.
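To make the resource argument for LoRA concrete, the sketch below compares trainable parameter counts for one weight matrix. It assumes the standard LoRA formulation, in which a d x k weight is frozen and a rank-r update B A (B of shape d x r, A of shape r x k) is trained. It does not model SECURA's CUR decomposition or SigNorm, and the layer size and rank in the example are illustrative assumptions rather than settings from the paper.

```python
def lora_param_counts(d, k, r):
    """Trainable parameters for full fine-tuning vs. a rank-r LoRA update.

    Full fine-tuning trains all d * k entries of the weight matrix.
    Standard LoRA trains B (d x r) and A (r x k), i.e. r * (d + k).
    """
    if r <= 0 or r > min(d, k):
        raise ValueError("rank must satisfy 1 <= r <= min(d, k)")
    full = d * k
    lora = r * (d + k)
    return full, lora


if __name__ == "__main__":
    # Illustrative sizes only: a 4096 x 4096 projection with rank 8.
    full, lora = lora_param_counts(4096, 4096, 8)
    print(f"full fine-tuning: {full} parameters")
    print(f"LoRA (r=8):       {lora} parameters ({lora / full:.4%} of full)")
```

For this example the low-rank update trains well under one percent of the matrix's parameters, which is the source of the reduced resource requirements mentioned in the abstract.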

arXiv information

Authors	Yuxuan Zhang
Published	2025-02-27 02:57:34+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, I.2.6

WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies

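Posted: February 28, 2025 | Author: jarxiv

Summary

We introduce WOFOSTGym, a novel crop simulation environment designed to train reinforcement learning (RL) agents to optimize agromanagement decisions for annual and perennial crops in single-farm and multi-farm settings.
Effective crop management requires optimizing yield and economic returns while minimizing environmental impact, a complex sequential decision-making problem that is well suited to RL.
However, the lack of simulators for perennial crops in multi-farm contexts has hindered RL applications in this domain.
Existing crop simulators also do not support multiple annual crops.
WOFOSTGym addresses these gaps by supporting 23 annual crops and two perennial crops, enabling RL agents to learn diverse agromanagement strategies in multi-year, multi-crop, and multi-farm settings.
Our simulator offers a suite of challenging tasks for learning under partial observability, non-Markovian dynamics, and delayed feedback.
WOFOSTGym's standard RL interface allows researchers without agricultural expertise to explore a wide range of agromanagement problems.
Our experiments demonstrate the learned behaviors across various crop varieties and soil types, highlighting WOFOSTGym's potential for advancing RL-driven decision support in agriculture.

Abstract (original)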

We introduce WOFOSTGym, a novel crop simulation environment designed to train reinforcement learning (RL) agents to optimize agromanagement decisions for annual and perennial crops in single and multi-farm settings. Effective crop management requires optimizing yield and economic returns while minimizing environmental impact, a complex sequential decision-making problem well suited for RL. However, the lack of simulators for perennial crops in multi-farm contexts has hindered RL applications in this domain. Existing crop simulators also do not support multiple annual crops. WOFOSTGym addresses these gaps by supporting 23 annual crops and two perennial crops, enabling RL agents to learn diverse agromanagement strategies in multi-year, multi-crop, and multi-farm settings. Our simulator offers a suite of challenging tasks for learning under partial observability, non-Markovian dynamics, and delayed feedback. WOFOSTGym’s standard RL interface allows researchers without agricultural expertise to explore a wide range of agromanagement problems. Our experiments demonstrate the learned behaviors across various crop varieties and soil types, highlighting WOFOSTGym’s potential for advancing RL-driven decision support in agriculture.

arXiv information

Authors	William Solow, Sandhya Saisubramanian, Alan Fern
Published	2025-02-27 03:35:09+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI

RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images

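Posted: February 28, 2025 | Author: jarxiv

Summary

Fundus image quality is crucial for diagnosing eye diseases, but real-world conditions often produce blurred or unreadable images, which increases diagnostic uncertainty.
To address these challenges, this study proposes RetinaRegen, a hybrid model for retinal image restoration that integrates a readability classification model, a Diffusion Model, and a Variational Autoencoder (VAE).
Experiments on the SynFundus-1M dataset show that the proposed method achieves a PSNR of 27.4521, an SSIM of 0.9556, and an LPIPS of 0.1911 for the readability labels of the optic disc (RO) region.
These results demonstrate superior performance in restoring key regions and offer an effective solution for enhancing fundus image quality and supporting clinical diagnosis.

Abstract (original)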

Fundus image quality is crucial for diagnosing eye diseases, but real-world conditions often result in blurred or unreadable images, increasing diagnostic uncertainty. To address these challenges, this study proposes RetinaRegen, a hybrid model for retinal image restoration that integrates a readability classification model, a Diffusion Model, and a Variational Autoencoder (VAE). Experiments on the SynFundus-1M dataset show that the proposed method achieves a PSNR of 27.4521, an SSIM of 0.9556, and an LPIPS of 0.1911 for the readability labels of the optic disc (RO) region. These results demonstrate superior performance in restoring key regions, offering an effective solution to enhance fundus image quality and support clinical diagnosis.
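For readers unfamiliar with the first reported metric, the sketch below computes peak signal-to-noise ratio (PSNR) from its standard definition, 10 * log10(MAX^2 / MSE), over two equal-length sequences of pixel values. The 8-bit peak value of 255 and the flat-list input format are assumptions made for this example; the abstract does not describe the authors' evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two pixel sequences.

    Uses the standard definition 10 * log10(max_value**2 / MSE).
    Inputs are flat sequences of numbers; this layout is assumed
    for illustration only.
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clean = [0, 0, 0, 0]
    noisy = [0, 0, 0, 10]
    print(f"PSNR: {psnr(clean, noisy):.2f} dB")
```

Higher PSNR means the restored image is closer to the reference in a mean-squared-error sense. SSIM and LPIPS measure structural and perceptual similarity respectively and are not covered by this sketch.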

arXiv information

Authors	Yuhan Tang, Yudian Wang, Weizhen Li, Ye Yue, Chengchang Pan, Honggang Qi
Published	2025-02-27 06:41:58+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, cs.LG, eess.IV
