A Comparative Analysis of Discrete Entropy Estimators for Large-Alphabet Problems
This paper presents a comparative study of entropy estimation in a large-alphabet regime. A variety of entropy estimators have been proposed over the years, each designed for a different setup and each with its own strengths and caveats. As a consequence, no estimator is known to be universally better than the others. This work addresses this gap by comparing twenty-one entropy estimators in the studied regime, starting with the simplest plug-in estimator and leading up to the most recent neural network-based and polynomial approximate estimators. Our findings show that estimator performance depends strongly on the underlying distribution. Specifically, we distinguish between three types of distributions, ranging from uniform to degenerate distributions. For each class of distribution, we recommend the most suitable estimator. Further, we propose a sample-dependent approach, which again considers three classes of distribution, and report the top-performing estimators in each class. This approach provides a data-dependent framework for choosing an estimator in practical setups.
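As a rough illustration of the plug-in estimator named above, the sketch below computes the entropy of the empirical distribution, that is, minus the sum over observed symbols of (c/n) log(c/n), where c is a symbol's count and n is the sample size. The function names are hypothetical and are not taken from the paper. The Miller-Madow correction is included only as a well-known example of a bias-corrected alternative; the abstract does not say whether it is among the twenty-one estimators compared.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (maximum-likelihood) entropy estimate, in nats.

    Replaces the unknown probabilities with empirical frequencies.
    Hypothetical helper name, not from the paper.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one sample")
    # Only observed symbols contribute; unseen symbols have frequency 0.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(samples):
    """Plug-in estimate plus the Miller-Madow bias correction (m - 1) / (2n),
    where m is the number of distinct observed symbols.

    Shown as an assumption-labeled example of a corrected estimator.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one sample")
    return plugin_entropy(samples) + (len(counts) - 1) / (2 * n)
```

When the sample size is small relative to the alphabet, many symbols go unobserved and the plug-in estimate tends to fall below the true entropy, which is one reason corrected and more elaborate estimators are studied in the large-alphabet regime.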

