Zipf extensions and their applications for modeling the degree sequences of real networks
The Zipf distribution, also known as the discrete Pareto distribution, attracts considerable attention because it describes skewed data from many natural and man-made systems. Under the Zipf distribution, the frequency of a given value is a power function of its size. Consequently, plotting frequency against size on a log-log scale for data that follow this distribution yields a straight line. For many data sets, however, the linearity is observed only in the tail, and in that case the Zipf distribution is fitted only to values larger than a given threshold. This procedure discards information, so unless one is interested only in the tail of the distribution, more flexible alternative distributions are needed.
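The log-log linearity described above can be checked numerically. The following is a minimal sketch, assuming the standard parameterization P(X = x) = x^(-alpha) / zeta(alpha) for x = 1, 2, ... with alpha > 1; the function names (`zeta`, `zipf_pmf`, `loglog_slope`) are illustrative and are not taken from the thesis or from the zipfextR package.

```python
import math


def zeta(alpha, n_terms=1000):
    """Approximate the Riemann zeta function for alpha > 1.

    Sums the first n_terms - 1 terms exactly and adds an
    Euler-Maclaurin correction for the remaining tail.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    head = sum(k ** -alpha for k in range(1, n_terms))
    n = float(n_terms)
    tail = (
        n ** (1 - alpha) / (alpha - 1)      # integral of the tail
        + 0.5 * n ** -alpha                 # half of the first tail term
        + alpha * n ** (-alpha - 1) / 12.0  # first derivative correction
    )
    return head + tail


def zipf_pmf(x, alpha):
    """Zipf probability of the positive integer x (assumed parameterization)."""
    return x ** -alpha / zeta(alpha)


def loglog_slope(x1, x2, alpha):
    """Slope of log P(X = x) against log x between the points x1 and x2."""
    dy = math.log(zipf_pmf(x2, alpha)) - math.log(zipf_pmf(x1, alpha))
    dx = math.log(x2) - math.log(x1)
    return dy / dx
```

Under this parameterization, `loglog_slope` returns approximately -alpha for any pair of points, which is the straight line seen on the log-log plot. Real degree sequences that bend away from this line at small values are the case the extensions are meant to address.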
This thesis studies four bi-parametric extensions of the Zipf distribution. The first two belong to the class of Random Stopped Extreme distributions. The third results from applying the Poisson-Stopped-Sum concept to the Zipf distribution, and the last is obtained by adding a parameter to the probability generating function of the Zipf distribution. Three of the models allow a parameter interpretation that gives some insight into the mechanism that generates the data. To analyze the performance of these models, we fitted the degree sequences of real networks from different areas, such as social networks, protein interaction networks, and collaboration networks. The fits were compared with those obtained from other bi-parametric models, such as the Zipf-Mandelbrot, the discrete Weibull, and the negative binomial. To facilitate their use, the models have been implemented in the zipfextR package, available on the Comprehensive R Archive Network.
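To illustrate the Poisson-Stopped-Sum idea, the sketch below computes the probabilities of Y = X_1 + ... + X_N, where N is Poisson with mean lam and the X_i are independent Zipf(alpha) variables, using the textbook Panjer recursion for compound Poisson sums. This is an assumption-laden illustration: the abstract does not give the thesis's exact parameterization, and the names `zeta` and `zipf_pss_pmf` are hypothetical and do not reflect the zipfextR API.

```python
import math


def zeta(alpha, n_terms=1000):
    """Approximate the Riemann zeta function for alpha > 1
    (partial sum plus an Euler-Maclaurin tail correction)."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    head = sum(k ** -alpha for k in range(1, n_terms))
    n = float(n_terms)
    tail = (
        n ** (1 - alpha) / (alpha - 1)
        + 0.5 * n ** -alpha
        + alpha * n ** (-alpha - 1) / 12.0
    )
    return head + tail


def zipf_pss_pmf(alpha, lam, n_max):
    """Return [P(Y = 0), ..., P(Y = n_max)] for a Poisson-stopped sum
    of Zipf(alpha) variables, with a Poisson(lam) number of summands.

    Panjer recursion for a compound Poisson sum:
        g[0] = exp(-lam)            (Zipf puts no mass on 0)
        g[n] = (lam / n) * sum_{j=1..n} j * f[j] * g[n - j]
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    z = zeta(alpha)
    # f[j] is the Zipf probability of j; index 0 is a placeholder.
    f = [0.0] + [j ** -alpha / z for j in range(1, n_max + 1)]
    g = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        s = sum(j * f[j] * g[n - j] for j in range(1, n + 1))
        g.append(lam / n * s)
    return g
```

In this construction the sum takes the value 0 when no summands are drawn, so a model for node degrees starting at 1 would need a zero-truncated version; that adjustment is omitted here.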

