Corrections of Zipf’s and Heaps’ Laws Derived from Hapax Rate Models

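Summary

This article introduces corrections to Zipf's law and Heaps' law based on systematic models of the hapax rate.
The derivation rests on two assumptions. The first is the standard urn model, which predicts that the marginal frequency distribution of a shorter text looks as if its word tokens had been sampled blindly from a given longer text.
The second assumption is that the hapax rate is a simple function of the text size.
Four such functions are discussed: the constant model, the Davis model, the linear model, and the logistic model.
The logistic model is shown to yield the best fit.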

Summary (original)

The article introduces corrections to Zipf’s and Heaps’ laws based on systematic models of the hapax rate. The derivation rests on two assumptions: The first one is the standard urn model which predicts that marginal frequency distributions for shorter texts look as if word tokens were sampled blindly from a given longer text. The second assumption posits that the rate of hapaxes is a simple function of the text size. Four such functions are discussed: the constant model, the Davis model, the linear model, and the logistic model. It is shown that the logistic model yields the best fit.
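The urn-model assumption can be illustrated with a small experiment: draw word tokens blindly (without replacement, ignoring order) from a longer text and measure the hapax rate of each shorter sample. The following is a minimal sketch and not the paper's method. It assumes that "hapax rate" means the share of distinct word types occurring exactly once, which the abstract does not define, and the names `hapax_rate` and `urn_sample` as well as the toy text are hypothetical.

```python
import random
from collections import Counter


def hapax_rate(tokens):
    """Share of distinct word types occurring exactly once (assumed definition)."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)


def urn_sample(tokens, n, seed=0):
    """Draw n tokens without replacement, ignoring word order (urn model)."""
    rng = random.Random(seed)
    return rng.sample(tokens, n)


if __name__ == "__main__":
    # Toy "long text"; a real experiment would use a tokenized corpus.
    long_text = ("the cat sat on the mat and the dog sat on the log "
                 "while a bird sang in a tree").split()
    for n in (5, 10, len(long_text)):
        sample = urn_sample(long_text, n)
        print(n, round(hapax_rate(sample), 3))
```

Plotting such rates against the sample size n on a real corpus gives the empirical curve that a hapax rate model (constant, Davis, linear, or logistic) would then be fitted to.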

arXiv information

Author	Łukasz Dębowski
Published	2023-09-28 10:12:07+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
