Heaps’ Law and Heaps functions in tagged texts: evidences of their linguistic relevance

We study the relationship between vocabulary size and text length in a corpus of 75 literary works in English, authored by six writers, distinguishing between the contributions of three grammatical classes (or ‘tags’, namely nouns, verbs and others), and analyse the progressive appearance of new words of each tag along each individual text. We find that, as prescribed by Heaps’ Law, vocabulary sizes and text lengths follow a well-defined power-law relation. Meanwhile, the appearance of new words in each text does not obey a power law, and is on the whole well described by the average of random shufflings of the text. Deviations from this average, however, are statistically significant and show systematic trends across the corpus. Specifically, we find that the appearance of new words along each text is predominantly retarded with respect to the average of random shufflings. Moreover, different tags add systematically distinct contributions to this tendency, with verbs and others being respectively more and less retarded than the mean trend, and nouns following instead the overall mean. These statistical systematicities are likely to point to the existence of linguistically relevant information stored in the different variants of Heaps’ Law, a feature that is still in need of extensive assessment.
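The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a text is given as a list of (word, tag) pairs, that the Heaps function counts the distinct words seen among the first n tokens, and that the reference curve is a plain average over uniformly shuffled copies of the text. The function names, the tag labels, and the default of 200 shuffles are hypothetical choices made for illustration. Heaps' Law itself is usually written as V ≈ K·N^β, with V the vocabulary size, N the text length, and K and β fitted constants.

```python
import random


def heaps_function(tagged_tokens, tag=None):
    """Vocabulary growth curve: distinct words seen among the first n tokens.

    tagged_tokens is a list of (word, tag) pairs (an assumed input format).
    If tag is given, only words carrying that tag are counted, but the
    position n still runs over all tokens of the text.
    """
    seen = set()
    curve = []
    for word, t in tagged_tokens:
        if tag is None or t == tag:
            seen.add(word)
        curve.append(len(seen))
    return curve


def shuffled_average(tagged_tokens, tag=None, n_shuffles=200, seed=0):
    """Average Heaps function over random shufflings of the same tokens."""
    rng = random.Random(seed)
    work = list(tagged_tokens)  # copy, so the caller's text is not reordered
    totals = [0] * len(work)
    for _ in range(n_shuffles):
        rng.shuffle(work)
        for i, v in enumerate(heaps_function(work, tag)):
            totals[i] += v
    return [total / n_shuffles for total in totals]


def deviation(tagged_tokens, tag=None, n_shuffles=200, seed=0):
    """Real curve minus shuffled average, position by position.

    Negative values mean new words appear later in the real text than in
    its shuffled versions (the 'retarded' case in the abstract).
    """
    real = heaps_function(tagged_tokens, tag)
    avg = shuffled_average(tagged_tokens, tag, n_shuffles, seed)
    return [r - a for r, a in zip(real, avg)]
```

Calling `deviation(text, tag="verb")` on a tagged text would give the verb-only curve relative to its shuffled baseline. Whether a deviation is statistically significant needs a separate test, which this sketch does not attempt.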

