Genetic Programming for Symbolic Regression on Incomplete Data

<p><b>Symbolic regression is the process of constructing mathematical expressions that best fit given data sets, where a target variable is expressed in terms of input variables. Unlike traditional regression methods, which optimise the parameters of pre-defined models, symbolic regression learns both the model structure and its parameters simultaneously.</b></p>
<p>Genetic programming (GP) is a biologically inspired evolutionary algorithm that automatically generates computer programs to solve a given task. The flexible representation of GP, along with its "white-box" nature, makes it a dominant method for symbolic regression. Moreover, GP has been successfully employed for different learning tasks such as feature selection and transfer learning.</p>
<p>Data incompleteness is a pervasive problem in symbolic regression, and in machine learning in general, especially when dealing with real-world data sets. One common approach to handling missing data is data imputation, the process of estimating missing values based on existing data. Another approach is to build learning algorithms that work directly with missing values.</p>
<p>Although a number of methods have been proposed to tackle the data missingness issue in machine learning, most studies focus on classification tasks. Little attention has been paid to symbolic regression on incomplete data. Existing symbolic regression methods are applicable only when the given data set is complete.</p>
<p>The overall goal of the thesis is to improve the performance of symbolic regression on incomplete data by using GP for data imputation, instance selection, feature selection, and transfer learning.</p>
<p>This thesis develops an imputation method to handle missing values for symbolic regression. The method integrates the instance-based similarity of the k-nearest neighbour method with the feature-based predictability of GP to estimate the missing values. The results show that the proposed method outperforms existing popular imputation methods.</p>
<p>This thesis develops an instance selection method for improving imputation for symbolic regression on incomplete data. The proposed method builds the imputation and symbolic regression models simultaneously so that performance is improved. The results show that combining instance selection with imputation performs better than using imputation alone.</p>
<p>High dimensionality is a serious data challenge, and it is even harder to handle when the data are incomplete. To address this problem in symbolic regression tasks, this thesis develops a feature selection method that can select a good set of features directly from incomplete data. The method not only improves the regression accuracy, but also enhances the efficiency of symbolic regression on high-dimensional incomplete data.</p>
<p>Another challenging problem is data shortage, which is even more challenging when the data are incomplete. To handle this situation, this thesis develops transfer learning methods to improve symbolic regression in domains with incomplete and limited data. These methods utilise two powerful abilities of GP: feature construction and feature selection. The results show that these methods achieve positive transfer learning from domains with complete data to different (but related) domains with incomplete data.</p>
<p>In summary, the thesis develops a range of GP-based methods that improve the effectiveness and efficiency of symbolic regression on incomplete data. The methods are evaluated using different types of data sets, considering various missingness and learning scenarios.</p>
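To make the idea of instance-based imputation concrete, the following is a minimal sketch of plain k-nearest-neighbour imputation. It is not the thesis's method, which also uses GP-based predictors; the function name `knn_impute`, the use of `None` for missing entries, and the distance over shared features are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is computed only over features observed in both rows; this is an
    assumption of the sketch, not a detail taken from the thesis.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        # Root mean squared difference, so rows sharing fewer features
        # remain comparable with rows sharing more.
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical toy data: the second row is missing its middle feature.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [0.9, 2.2, 2.9],
]
filled = knn_impute(data, k=2)
print(filled[1])
```

In this toy example the missing entry is filled from the two rows most similar to the incomplete one, while the distant third row is ignored. A symbolic regression model could then be trained on the completed data.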
Victoria University of Wellington Library
