Zipf extensions and their applications for modeling the degree sequences of real networks
The Zipf distribution, also known as the discrete Pareto distribution, attracts considerable attention because it helps describe skewed data from many natural and man-made systems. Under the Zipf distribution, the frequency of a given value is a power function of its size. Consequently, plotting the frequencies against the sizes on a log-log scale for data following this distribution yields a straight line. For many data sets, however, the linearity is observed only in the tail, and in that case the Zipf distribution is fitted only to values larger than a given threshold. This procedure implies a loss of information, and unless one is interested only in the tail of the distribution, more flexible alternative distributions are needed.
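As a minimal illustration of the straight-line property described above, the sketch below evaluates the Zipf probabilities P(X = k) = k^(-alpha) / zeta(alpha) and measures the slope between two points on the log-log scale. The function names (`zeta`, `zipf_pmf`, `loglog_slope`) and the symbol `alpha` are assumptions made for this example and are not taken from the thesis; the zeta function is approximated with a truncated sum plus a standard Euler-Maclaurin tail correction.

```python
import math


def zeta(alpha, n_terms=1000):
    """Approximate the Riemann zeta function for alpha > 1.

    Sums the first n_terms terms exactly and adds an Euler-Maclaurin
    correction for the remaining tail.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    n = n_terms
    head = sum(k ** -alpha for k in range(1, n + 1))
    tail = (
        n ** (1 - alpha) / (alpha - 1)
        - 0.5 * n ** -alpha
        + alpha * n ** (-alpha - 1) / 12
    )
    return head + tail


def zipf_pmf(k, alpha):
    """P(X = k) for a Zipf distribution with exponent alpha, k = 1, 2, ..."""
    return k ** -alpha / zeta(alpha)


def loglog_slope(alpha, k1=1, k2=100):
    """Slope of log P(X = k) against log k between the points k1 and k2."""
    dy = math.log(zipf_pmf(k2, alpha)) - math.log(zipf_pmf(k1, alpha))
    dx = math.log(k2) - math.log(k1)
    return dy / dx


if __name__ == "__main__":
    # For an exact Zipf law the slope equals -alpha for any pair of points.
    print(loglog_slope(2.5))
```

Because log P(X = k) = -alpha * log(k) - log(zeta(alpha)), the slope is -alpha regardless of which two points are chosen; real degree sequences typically depart from this line at small values, which is the motivation given above for more flexible models.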
The work conducted in this thesis revolves around four two-parameter extensions of the Zipf distribution. The first two belong to the class of Random Stopped Extreme distributions. The third results from applying the Poisson-Stopped-Sum concept to the Zipf distribution, and the last is obtained by including an additional parameter in the probability generating function of the Zipf distribution. An interesting characteristic of three of the models presented is that their parameters admit an interpretation that gives some insight into the mechanism that generates the data. To analyze the performance of these models, we have fitted the degree sequences of real networks from different areas, such as social networks, protein interaction networks and collaboration networks. The fits obtained have been compared with those of other two-parameter models, such as the Zipf-Mandelbrot, the discrete Weibull and the negative binomial. To facilitate the use of the models presented, they have been implemented in the zipfextR package, available in the Comprehensive R Archive Network.
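The Poisson-Stopped-Sum idea mentioned above can be sketched generically: draw N from a Poisson distribution with rate `lam`, then add up N independent Zipf(`alpha`) variables. The probabilities of the resulting sum can be computed with the standard Panjer recursion for compound Poisson distributions. This is only a sketch under stated assumptions: the names `zipf_pss_pmf`, `alpha` and `lam` are hypothetical, and the parameterization used in the thesis and in the zipfextR package may differ.

```python
import math


def zeta(alpha, n_terms=1000):
    """Approximate the Riemann zeta function for alpha > 1 (truncated sum
    plus an Euler-Maclaurin tail correction)."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    n = n_terms
    head = sum(k ** -alpha for k in range(1, n + 1))
    tail = (
        n ** (1 - alpha) / (alpha - 1)
        - 0.5 * n ** -alpha
        + alpha * n ** (-alpha - 1) / 12
    )
    return head + tail


def zipf_pss_pmf(max_y, alpha, lam):
    """Return [P(Y=0), ..., P(Y=max_y)] for Y = X_1 + ... + X_N,
    where N ~ Poisson(lam) and the X_i are i.i.d. Zipf(alpha).

    Uses the Panjer recursion for a compound Poisson distribution:
        P(Y=0) = exp(-lam)
        P(Y=y) = (lam / y) * sum_{j=1..y} j * f(j) * P(Y=y-j)
    with f the Zipf probability function.
    """
    z = zeta(alpha)
    # f[0] is a placeholder: the Zipf distribution has no mass at 0.
    f = [0.0] + [k ** -alpha / z for k in range(1, max_y + 1)]
    p = [math.exp(-lam)]
    for y in range(1, max_y + 1):
        s = sum(j * f[j] * p[y - j] for j in range(1, y + 1))
        p.append(lam / y * s)
    return p


if __name__ == "__main__":
    probs = zipf_pss_pmf(10, alpha=2.5, lam=1.5)
    for y, prob in enumerate(probs):
        print(y, round(prob, 6))
```

Note that this generic construction puts probability exp(-lam) on zero, since the sum is empty when N = 0; a model for degree sequences with no isolated nodes would need a zero-truncated variant, which is not shown here.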
The Zipf distribution, also known as the discrete Pareto distribution, attracts considerable attention because of its versatility in describing skewed data from different natural and artificial settings. Under the Zipf distribution, the probability of a given value is proportional to a negative power of that value. Consequently, when the frequencies of data drawn from this distribution are plotted against their size on a double logarithmic scale, a straight line is obtained. In many data sets, however, this linearity is observed only in the tail, and when this happens the Zipf distribution is fitted only to values larger than a given threshold. This procedure implies a loss of information, and unless one is interested only in the tail of the distribution, it highlights the need for alternative distributions with greater flexibility. The work carried out in this thesis revolves around four two-parameter extensions of the Zipf distribution. The first two belong to the family of Random Stopped Extreme distributions. The third extension results from applying the Poisson-Stopped-Sum concept to the Zipf distribution, and the last family of distributions is obtained by including an additional parameter in the probability generating function of the Zipf distribution. A characteristic of three of the models presented is that they provide a direct interpretation of their parameters, which makes it possible to draw some insights about the underlying mechanism that generated the data. To analyze the applicability of these models, we have fitted degree sequences of real networks from different areas, such as social networks, protein interaction networks and collaboration networks. The fits obtained have been compared with those of other two-parameter models, such as the Zipf-Mandelbrot, the discrete Weibull distribution and the negative binomial.
To facilitate the use of the models presented, they have been implemented in the R package zipfextR, available in the Comprehensive R Archive Network.

