Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations




We investigate the internal representations of vision-language models (VLMs) to address hallucinations, a persistent challenge despite advances in model size and training. We project VLMs’ internal image representations to their language vocabulary and observe more confident output probabilities on real objects than on hallucinated objects. We additionally use these output probabilities to spatially localize real objects. Building on this approach, we introduce a knowledge erasure algorithm that removes hallucinations by linearly orthogonalizing image features with respect to hallucinated object features. We show that targeted edits to a model’s latent representations can reduce hallucinations by up to 25.7% on the COCO2014 dataset while preserving performance. Our findings demonstrate how a deeper understanding of VLMs’ latent representations can enhance reliability and enable novel capabilities, such as zero-shot segmentation.
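The erasure step above removes the part of an image feature that points along a hallucinated object's direction, i.e. it computes v' = v - (⟨v, t⟩ / ⟨t, t⟩) t, which is orthogonal to t. A minimal sketch, assuming each image feature v and the hallucinated object t are vectors in the same space (for example, t could be the object's text embedding); `project_out` is a hypothetical name, and the paper's choice of layers and vectors is not reproduced here:

```python
def project_out(image_feature, object_feature):
    """Remove the component of image_feature along object_feature.

    Returns v - (<v, t> / <t, t>) * t, which is orthogonal to t.
    Both arguments are assumed to be equal-length lists of floats.
    """
    dot_vt = sum(v * t for v, t in zip(image_feature, object_feature))
    dot_tt = sum(t * t for t in object_feature)
    if dot_tt == 0.0:
        # A zero direction carries no information to erase.
        return list(image_feature)
    scale = dot_vt / dot_tt
    return [v - scale * t for v, t in zip(image_feature, object_feature)]


# Erasing the x-axis direction from (3, 4) leaves only the y component.
print(project_out([3.0, 4.0], [1.0, 0.0]))  # [0.0, 4.0]
```

In a VLM the same projection would be applied to every image-token feature, so the edited features no longer carry a linear signal for the hallucinated object.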


Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats




We introduce a simple yet effective approach for separating transmitted and reflected light. Our key insight is that the powerful novel view synthesis capabilities provided by modern inverse rendering methods (e.g., 3D Gaussian splatting) allow one to perform flash/no-flash reflection separation using unpaired measurements; this relaxation dramatically simplifies image acquisition compared with conventional paired flash/no-flash reflection separation methods. Through extensive real-world experiments, we demonstrate that our method, Flash-Splat, accurately reconstructs both transmitted and reflected scenes in 3D. Our method outperforms existing 3D reflection separation methods, which do not leverage illumination control, by a large margin. Our project webpage is at


Achieving Fairness in Predictive Process Analytics via Adversarial Learning




Predictive business process analytics has become important for organizations, offering real-time operational support for their processes. However, these algorithms often make unfair predictions because they rely on biased variables (e.g., gender or nationality), namely variables embodying discrimination. This paper addresses the challenge of integrating a debiasing phase into predictive business process analytics to ensure that predictions are not influenced by biased variables. Our framework, which leverages adversarial debiasing, is evaluated on four case studies, showing a significant reduction in the contribution of biased variables to the predicted value. The proposed technique is also compared with the state of the art in fairness in process mining, showing that our framework achieves a higher level of fairness while retaining better prediction quality.
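A common way to write an adversarial debiasing objective is shown below as a generic sketch, not necessarily the paper's exact formulation. A predictor $f_\theta$ is trained on the process outcome $y$ while an adversary $g_\phi$ tries to recover the biased variable $s$ from the predictor's output; the symbols $\theta$, $\phi$, $\lambda$, $\mathcal{L}_{\text{pred}}$ and $\mathcal{L}_{\text{adv}}$ are introduced here for illustration only:

```latex
\begin{aligned}
\phi^\star &= \arg\min_{\phi}\; \mathcal{L}_{\text{adv}}\big(g_\phi(f_\theta(x)),\, s\big),\\
\theta^\star &= \arg\min_{\theta}\; \mathcal{L}_{\text{pred}}\big(f_\theta(x),\, y\big)
  \;-\; \lambda\, \mathcal{L}_{\text{adv}}\big(g_{\phi^\star}(f_\theta(x)),\, s\big).
\end{aligned}
```

The subtracted adversary loss rewards the predictor for making $s$ hard to recover; increasing $\lambda$ trades some prediction quality for less dependence on the biased variable, which is the fairness/accuracy balance the abstract evaluates.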


PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation?


In this paper, we present Paramanu-Ayn, a collection of legal language models trained exclusively on Indian legal case documents. This 97-million-parameter auto-regressive (AR) decoder-only model was pretrained from scratch with a context size of 8192 on a single GPU in just 185 hours, achieving an efficient MFU of 41.35. We also developed a BPE tokenizer specialized for the legal domain. We evaluated our model using perplexity and two zero-shot tasks: case judgment prediction with explanation and abstractive case summarization. Paramanu-Ayn outperformed Llama-2 7B and Gemini-Pro on the case judgment prediction with explanation task by nearly 2 percentage points in test accuracy, despite being 72 times smaller. In zero-shot abstractive summarization, it surpassed decoder-only LLMs generating fixed-length summaries (5000 tokens) by over 10 percentage points in BLEU and METEOR, and by nearly 4 percentage points in BERTScore. Further evaluations on zero-shot commonsense and mathematical benchmarks showed that Paramanu-Ayn performed strongly despite being trained exclusively on legal documents, outperforming Llama-1, Llama-2, and Falcon on the AGIEVAL-AQuA-RAT and AGIEVAL-SAT-Math tasks. We also instruction-tuned our model on 10,763 diverse legal tasks, including legal clause generation, legal drafting, and case summarization. The Paramanu-Ayn-instruct model scored above 8 out of 10 on clarity, relevance, completeness, and legal reasoning, as judged by GPT-3.5-Turbo. We found that our models were able to learn drafting knowledge and generalize to drafting legal contracts and legal clauses with limited instruction tuning. Hence, we conclude that for building a strong domain-specialized generative language model (e.g., for the legal domain), domain-specialized pretraining from scratch is more cost-effective and environmentally friendly, and remains competitive with larger models or is even better than adapting LLMs to legal-domain tasks.
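The abstract mentions a BPE tokenizer specialized for legal text. As background on how byte-pair encoding learns its vocabulary, here is a minimal sketch of the classic merge-learning loop; it is the textbook algorithm, not the authors' tokenizer code, and the function names are hypothetical:

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with a single concatenated symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merges, starting from single characters."""
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(vocab, best)
    return merges


print(train_bpe(["ab", "ab", "ac"], 1))  # [('a', 'b')]
```

Because the most frequent pairs are merged first, a tokenizer trained only on legal documents spends its vocabulary on character sequences that are frequent in legal text.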


A Methodological Report on Anomaly Detection on Dynamic Knowledge Graphs


In this paper, we explore different approaches to anomaly detection on dynamic knowledge graphs, specifically in a microservices environment for Kubernetes applications. Our approach explores three dynamic knowledge graph representations: sequential data, one-hop graph structure, and two-hop graph structure, with each representation incorporating increasingly complex structural information. For each representation, we apply different machine learning and deep learning models. We empirically analyse their performance and propose an approach based on ensemble learning of these models. Our approach significantly outperforms the baseline on the ISWC 2024 Dynamic Knowledge Graph Anomaly Detection dataset, providing a robust solution for anomaly detection in dynamic complex data.
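To make the one-hop and two-hop representations and the ensemble step concrete, the sketch below gathers the k-hop neighbourhood of a node and combines per-model anomaly scores by a weighted average. Both pieces are assumptions for illustration: the paper's exact feature construction and ensemble scheme are not given in the abstract, averaging is just one simple choice, and all names and the 0.5 threshold are hypothetical:

```python
from collections import defaultdict


def k_hop_neighbors(edges, node, k):
    """Return the nodes reachable from `node` within k hops (edges treated as undirected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    frontier, seen = {node}, {node}
    for _ in range(k):
        # Expand one hop and drop nodes already visited.
        frontier = {n for f in frontier for n in adj[f]} - seen
        seen |= frontier
    return seen - {node}


def ensemble_score(scores, weights=None):
    """Weighted average of per-model anomaly scores (each assumed to lie in [0, 1])."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def is_anomalous(scores, threshold=0.5, weights=None):
    """Flag an event when the ensemble score reaches the (hypothetical) threshold."""
    return ensemble_score(scores, weights) >= threshold


edges = [("pod-a", "svc-b"), ("svc-b", "node-c"), ("node-c", "pod-d")]
print(sorted(k_hop_neighbors(edges, "pod-a", 2)))  # ['node-c', 'svc-b']
```

Each representation (sequence, one-hop, two-hop) would feed its own model, and the ensemble combines their scores so that no single view has to capture every kind of anomaly.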


PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling


Prompt optimization aims to find the best prompt for a large language model (LLM) on a given task. LLMs have been successfully used to help find and improve prompt candidates for single-step tasks. However, realistic tasks for agents are multi-step and introduce new challenges: (1) prompt content is likely to be more extensive and complex, making it more difficult for LLMs to analyze errors; (2) the impact of an individual step is difficult to evaluate; and (3) different people may have varied preferences about task execution. While humans struggle to optimize prompts, they are good at providing feedback about LLM outputs; we therefore introduce a new LLM-driven discrete prompt optimization framework, PRompt Optimization in Multi-Step Tasks (PROMST), that incorporates human-designed feedback rules to automatically offer direct suggestions for improvement. We also use an additional learned heuristic model that predicts prompt performance in order to sample efficiently from prompt candidates. This approach significantly outperforms both human-engineered prompts and several other prompt optimization methods across 11 representative multi-step tasks (average improvements of 10.6% to 29.3% over the current best methods on five LLMs, respectively). We believe our work can serve as a benchmark for automatic prompt optimization for LLM-driven multi-step tasks. Datasets and Codes are available at Project Page is available at
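The heuristic model's role is to avoid running every candidate prompt on the expensive multi-step task. A minimal sketch of that filtering idea, assuming a cheap scoring function stands in for the learned heuristic and only the shortlisted prompts are evaluated for real; `propose`, `predict_score` and `evaluate` are hypothetical placeholders, and PROMST's actual sampling scheme may differ (for example, it may sample stochastically instead of taking the top-k):

```python
import heapq


def select_candidates(candidates, predict_score, k):
    """Keep the k candidates with the highest predicted score (ties keep input order)."""
    return heapq.nlargest(k, candidates, key=predict_score)


def optimize_step(current_best, propose, predict_score, evaluate, k):
    """One hypothetical optimization round.

    propose:       generates new candidate prompts from the current best one
    predict_score: cheap learned heuristic estimating a prompt's performance
    evaluate:      expensive true score obtained by running the task
    """
    candidates = propose(current_best)
    shortlisted = select_candidates(candidates, predict_score, k)
    scored = [(evaluate(p), p) for p in shortlisted]
    scored.append((evaluate(current_best), current_best))  # never regress
    best_score, best_prompt = max(scored)
    return best_prompt, best_score


# Toy usage: the "heuristic" is prompt length, the "task score" counts '!'.
print(optimize_step("go",
                    propose=lambda p: [p + "!", p + "!!", p + "?"],
                    predict_score=len,
                    evaluate=lambda p: p.count("!"),
                    k=2))  # ('go!!', 2)
```

Only k + 1 task evaluations happen per round regardless of how many candidates are proposed, which is where the efficiency gain from a learned heuristic would come from.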


Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization


Learning conditional distributions $\pi^*(\cdot|x)$ is a central problem in machine learning, which is typically approached via supervised methods with paired data $(x,y) \sim \pi^*$. However, acquiring paired data samples is often challenging, especially in problems such as domain translation. This necessitates the development of $\textit{semi-supervised}$ models that utilize both limited paired data and additional unpaired i.i.d. samples $x \sim \pi^*_x$ and $y \sim \pi^*_y$ from the marginal distributions. Using such combined data is complex and often relies on heuristic approaches. To tackle this issue, we propose a new learning paradigm that integrates both paired and unpaired data $\textbf{seamlessly}$ through data likelihood maximization. We demonstrate that our approach also connects intriguingly with inverse entropic optimal transport (OT). This finding allows us to apply recent advances in computational OT to establish a $\textbf{light}$ learning algorithm to obtain $\pi^*(\cdot|x)$. Furthermore, we demonstrate through empirical tests that our method effectively learns conditional distributions using paired and unpaired data simultaneously.
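As background on the computational-OT side, the sketch below runs the textbook Sinkhorn algorithm for entropic OT between two discrete marginals and then reads a conditional off the resulting plan via $\pi(y|x_i) = \pi(x_i, y) / \pi_x(x_i)$. This is forward OT with a known cost, included only to illustrate the objects involved; it is not the paper's light learning algorithm, which works with continuous distributions and the inverse problem (learning the cost from data). Function names are hypothetical:

```python
import math


def sinkhorn(a, b, cost, eps, n_iters=500):
    """Entropic OT plan between discrete marginals a and b.

    Solves min_P <P, C> - eps * H(P) subject to P 1 = a and P^T 1 = b
    by alternately rescaling the rows and columns of K = exp(-C / eps).
    """
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]


def conditional(plan, i):
    """pi(y | x_i): row i of the joint plan normalized to sum to 1."""
    row = plan[i]
    total = sum(row)
    return [p / total for p in row]


plan = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], eps=0.1)
print(conditional(plan, 0))  # close to [1, 0]: x_0 is almost surely sent to y_0
```

A smaller `eps` makes the plan closer to unregularized OT but risks numerical underflow in `exp(-c / eps)`; practical implementations usually work in the log domain.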


A deep learning-enabled smart garment for accurate and versatile sleep conditions monitoring in daily life




In wearable smart systems, continuous monitoring and accurate classification of different sleep-related conditions are critical for enhancing sleep quality and preventing sleep-related chronic conditions. However, the device-skin coupling quality required by electrophysiological sleep monitoring systems hinders the comfort and reliability of overnight wear. Here, we report a washable, skin-compatible smart garment sleep monitoring system that captures local skin strain signals under weak device-skin coupling conditions, with no positioning or skin preparation required. A printed textile-based strain sensor array responds to strain from 0.1% to 10% with a gauge factor as high as 100, and its strain-isolating printed pattern design makes it insensitive to extrinsic motion artefacts. Through a reversible starching treatment, ink penetration depth during direct printing on garments is controlled, achieving a batch-to-batch performance variation below 10%. Combined with deep learning, explainable artificial intelligence (XAI), and transfer-learning-based data processing, the smart garment classifies six sleep states with an accuracy of 98.6% while maintaining excellent explainability (classification with low bias) and generalization (95% accuracy on new users with few-shot learning on fewer than 15 samples per class) in practical applications, paving the way for next-generation daily sleep healthcare management.
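For context on the gauge factor figure: for a resistive strain sensor (an assumption, since the abstract does not state the transduction mechanism), the gauge factor is GF = (ΔR / R₀) / ε, where ε is the strain as a fraction. A short sketch of that arithmetic; the function names are hypothetical:

```python
def gauge_factor(r0, r, strain):
    """GF = (delta R / R0) / strain, with strain as a fraction (0.01 means 1%)."""
    return ((r - r0) / r0) / strain


def relative_resistance_change(gf, strain):
    """Expected delta R / R0 for a given gauge factor, assuming a linear response."""
    return gf * strain


# A 100-ohm sensor reading 110 ohms at 0.1% strain has GF of about 100.
print(gauge_factor(100.0, 110.0, 0.001))
```

If the gauge factor stayed at 100 across the range, a 0.1% strain would give ΔR/R₀ = 0.1 (a 10% resistance change) and a 1% strain would give ΔR/R₀ = 1.0. The phrase "as high as 100" suggests the gauge factor varies with strain, so these numbers are only an order-of-magnitude illustration.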


Beyond principlism: Practical strategies for ethical AI use in research practices




The rapid adoption of generative artificial intelligence (AI) in scientific research, particularly large language models (LLMs), has outpaced the development of ethical guidelines, leading to a Triple-Too problem: too many high-level ethical initiatives, too abstract principles lacking contextual and practical relevance, and too much focus on restrictions and risks over benefits and utilities. Existing approaches, including principlism (reliance on abstract ethical principles), formalism (rigid application of rules), and technical solutionism (overemphasis on technological fixes), offer little practical guidance for addressing ethical challenges of AI in scientific research practices. To bridge the gap between abstract principles and day-to-day research practices, a user-centered, realism-inspired approach is proposed here. It outlines five specific goals for ethical AI use: 1) understanding model training and output, including bias mitigation strategies; 2) respecting privacy, confidentiality, and copyright; 3) avoiding plagiarism and policy violations; 4) applying AI beneficially compared to alternatives; and 5) using AI transparently and reproducibly. Each goal is accompanied by actionable strategies and realistic cases of misuse and corrective measures. I argue that ethical AI application requires evaluating its utility against existing alternatives rather than isolated performance metrics. Additionally, I propose documentation guidelines to enhance transparency and reproducibility in AI-assisted research. Moving forward, we need targeted professional development, training programs, and balanced enforcement mechanisms to promote responsible AI use while fostering innovation. By refining these ethical guidelines and adapting them to emerging AI capabilities, we can accelerate scientific progress without compromising research integrity.


Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions




Designing sample-efficient and computationally feasible reinforcement learning (RL) algorithms is particularly challenging in environments with large or infinite state and action spaces. In this paper, we advance this effort by presenting an efficient algorithm for Markov Decision Processes (MDPs) where the state-action value function of any policy is linear in a given feature map. This challenging setting can model environments with infinite states and actions, strictly generalizes classic linear MDPs, and currently lacks a computationally efficient algorithm under online access to the MDP. Specifically, we introduce a new RL algorithm that efficiently finds a near-optimal policy in this setting, using a number of episodes and calls to a cost-sensitive classification (CSC) oracle that are both polynomial in the problem parameters. Notably, our CSC oracle can be efficiently implemented when the feature dimension is constant, representing a clear improvement over state-of-the-art methods, which require solving non-convex problems with horizon-many variables and can incur computational costs that are exponential in the horizon.
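To make the cost-sensitive classification (CSC) oracle concrete: it receives contexts, each paired with a cost for every action, and returns the policy from a given class with the lowest total cost. The brute-force version below, over a small finite policy class, only illustrates this interface; it is not the efficient implementation the abstract refers to (which exploits a constant feature dimension), and the names are hypothetical:

```python
def csc_oracle(policies, data):
    """Return the policy with the lowest total cost on `data`.

    data:     list of (context, costs) pairs, where costs[a] is the cost
              of choosing action a in that context
    policies: finite list of callables mapping a context to an action index
    """
    def total_cost(policy):
        return sum(costs[policy(x)] for x, costs in data)

    return min(policies, key=total_cost)


# Two contexts: action 0 is free for x < 0, action 1 is free otherwise.
policies = [lambda x: 0, lambda x: 1, lambda x: 0 if x < 0 else 1]
data = [(-1.0, [0.0, 1.0]), (2.0, [1.0, 0.0])]
best = csc_oracle(policies, data)
print(best(-1.0), best(2.0))  # 0 1
```

Brute force scales with the size of the policy class, which is exactly why the number of oracle calls, and the cost of implementing each call, matter for the polynomial guarantees stated above.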


