Question Answering as Programming for Solving Time-Sensitive Questions

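Summary

Question answering plays a pivotal role in daily life because it is how people acquire knowledge about the world.
However, real-world facts are dynamic and change constantly, so the answer to a question can be completely different when its time constraint changes.
Large language models (LLMs) have recently shown remarkable ability in question answering, but our experiments reveal that this problem still poses a significant challenge to existing LLMs.
We attribute this to the LLMs' inability to perform rigorous reasoning from surface-level text semantics.
To overcome this limitation, rather than requiring LLMs to answer the question directly, we propose a new approach that reframes the $\textbf{Q}$uestion $\textbf{A}$nswering task $\textbf{a}$s $\textbf{P}$rogramming ($\textbf{QAaP}$).
Concretely, we leverage modern LLMs' strong ability to understand both natural language and programming languages: the LLM represents diversely expressed text as well-structured code, and the best-matching answer is then selected from multiple candidates through programming.
We evaluate the QAaP framework on several time-sensitive question answering datasets and achieve an improvement of up to $14.5$% over strong baselines.
Code and data are available at https://github.com/TianHongZXY/qaap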

Abstract (original)

Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world. However, due to the dynamic and ever-changing nature of real-world facts, the answer can be completely different when the time constraint in the question changes. Recently, Large Language Models (LLMs) have shown remarkable intelligence in question answering, while our experiments reveal that the aforementioned problems still pose a significant challenge to existing LLMs. This can be attributed to the LLMs’ inability to perform rigorous reasoning based on surface-level text semantics. To overcome this limitation, rather than requiring LLMs to directly answer the question, we propose a novel approach where we reframe the $\textbf{Q}$uestion $\textbf{A}$nswering task $\textbf{a}$s $\textbf{P}$rogramming ($\textbf{QAaP}$). Concretely, by leveraging modern LLMs’ superior capability in understanding both natural language and programming language, we endeavor to harness LLMs to represent diversely expressed text as well-structured code and select the best matching answer from multiple candidates through programming. We evaluate our QAaP framework on several time-sensitive question answering datasets and achieve decent improvement, up to $14.5$% over strong baselines. Our codes and data are available at https://github.com/TianHongZXY/qaap
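
The abstract describes representing text as structured code and then picking the best-matching candidate programmatically. A minimal sketch of that idea follows. It is an illustration only, not the authors' implementation: the `Fact` and `Query` schemas, the year-overlap scoring, and the example data (Alice, Acme, Globex) are all hypothetical assumptions, and the parsing step that an LLM would perform is replaced by hand-written objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    """One candidate fact extracted from text (hypothetical schema)."""
    subject: str
    relation: str
    answer: str
    start: int  # first year in which the fact holds
    end: int    # last year in which the fact holds


@dataclass
class Query:
    """A time-constrained question in structured form (hypothetical schema)."""
    subject: str
    relation: str
    start: int
    end: int


def overlap_years(fact: Fact, query: Query) -> int:
    """Count the years shared by the fact's span and the query's span."""
    return max(0, min(fact.end, query.end) - max(fact.start, query.start) + 1)


def select_answer(query: Query, candidates: list[Fact]) -> Optional[str]:
    """Return the answer of the candidate that overlaps the query span most."""
    matching = [
        f for f in candidates
        if f.subject == query.subject and f.relation == query.relation
    ]
    best = max(matching, key=lambda f: overlap_years(f, query), default=None)
    if best is None or overlap_years(best, query) == 0:
        return None  # no candidate satisfies the time constraint
    return best.answer


# Made-up candidates, standing in for facts an LLM would extract from text.
candidates = [
    Fact("Alice", "works_for", "Acme", 2005, 2010),
    Fact("Alice", "works_for", "Globex", 2011, 2018),
]

# "Which employer did Alice work for between 2012 and 2013?"
in_range = select_answer(Query("Alice", "works_for", 2012, 2013), candidates)
# Same question for a period not covered by any candidate.
out_of_range = select_answer(Query("Alice", "works_for", 2020, 2021), candidates)
print(in_range, out_of_range)
```

The point of the sketch is that once the question and the candidate facts are structured, the time constraint is checked by ordinary code rather than by the model's free-form reasoning.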

arXiv information

Authors	Xinyu Zhu, Cheng Yang, Bei Chen, Siheng Li, Jian-Guang Lou, Yujiu Yang
Published	2023-10-18 12:44:58+00:00

Source and services used

arxiv.jp, Google
