Javascript must be enabled to continue!
Evolutionary Grammatical Inference
View through CrossRef
Grammatical Inference (also known as grammar induction) is the problem of learning a grammar for a language from a set of examples. In a broad sense, some data is presented to the learner that should return a grammar capable of explaining to some extent the input data. The grammar inferred from data can then be used to classify unseen data or provide some suitable model for it. The classical formalization of Grammatical Inference (GI) is known as Language Identification in the Limit (Gold, 1967). Here, there are a finite set S+ of strings known to belong to the language L (the positive examples) and another finite set S- of strings not belonging to L (the negative examples). The language L is said to be identifiable in the limit if there exists a procedure to find a grammar G such that S+ ? L(G), S- ? L(G) and, in the limit, for sufficiently large S+ and S-, L = L(G). The disjoint sets S+ and S- are given to provide clues for the inference of the production rules P of the unknown grammar G used to generate the language L. Grammatical inference include such diverse fields as speech and natural language processing, gene analysis, pattern recognition, image processing, sequence prediction, information retrieval, cryptography, and many more. An excellent source for a state-of-the art overview of the subject is provided in (de la Higuera, 2005). Traditionally, most work in GI has been focused on the inference of regular grammars trying to induce finite-state automata, which can be efficiently learned. For context free languages some recent approaches have shown limited success (Starckie, Costie & Zaanen, 2004), because the search space of possible grammars is infinite. Basically, the parenthesis and palindrome languages are common test cases for the effectiveness of grammatical inference methods. Both languages are context-free. The parenthesis language is deterministic but the palindrome language is nondeterministic (de la Higuera, 2005). The use of evolutionary methods for context-free grammatical inference are not new, but only a few attempts have been successful. Wyard (1991) used Genetic Algorithm (GA) to infer grammars for the language of correctly balanced and nested parentheses with success, but fails on the language of sentences containing the same number of a’s and b’s (anbn language). In another attempt (Wyard, 1994), he obtained positive results on the inference of two classes of context-free grammars: the class of n-symbol palindromes with 2 = n = 4 and a class of small natural language grammars. Sen and Janakiraman (1992) applied a GA using a pushdown automata to the inference and successfully learned the anbn language and the parentheses balancing problem. But their approach does not scale well. Huijsen (1994) applied GA to infer context-free grammars for the parentheses balancing problem, the language of equal numbers of a’s and b’s and the even-length 2-symbol palindromes. Huijsen uses a “markerbased” encoding scheme with has the main advantage of allowing variable length chromosomes. The inference of regular grammars was successful but the inference of context-free grammars failed. Those results obtained in earlier attempts using GA to context-free grammatical inference were limited. The first attempt to use Genetic Programming (GP) for grammatical inference used a pushdown automata (Dunay, 1994) and successfully learned the parenthesis language, but failed for the anbn language. Korkmaz and Ucoluk (2001) also presented a GP approach using a prototype theory, which provides a way to recognize similarity between the grammars in the population. With this representation, it is possible to recognize the so-called building blocks but the results are preliminary. Javed and his colleagues (2004) proposed a Genetic Programming (GP) approach with grammar-specific heuristic operators with non-random construction of the initial grammar population. Their approach succeeded in inducing small context-free grammars. More recently, Rodrigues and Lopes (2006) proposed a hybrid GP approach that uses a confusion matrix to compute the fitness. They also proposed a local search mechanism that uses information obtained from the sentence parsing to generate a set of useful productions. The system was used for the parenthesis and palindromes languages with success.
Title: Evolutionary Grammatical Inference
Description:
Grammatical Inference (also known as grammar induction) is the problem of learning a grammar for a language from a set of examples.
In a broad sense, some data is presented to the learner that should return a grammar capable of explaining to some extent the input data.
The grammar inferred from data can then be used to classify unseen data or provide some suitable model for it.
The classical formalization of Grammatical Inference (GI) is known as Language Identification in the Limit (Gold, 1967).
Here, there are a finite set S+ of strings known to belong to the language L (the positive examples) and another finite set S- of strings not belonging to L (the negative examples).
The language L is said to be identifiable in the limit if there exists a procedure to find a grammar G such that S+ ? L(G), S- ? L(G) and, in the limit, for sufficiently large S+ and S-, L = L(G).
The disjoint sets S+ and S- are given to provide clues for the inference of the production rules P of the unknown grammar G used to generate the language L.
Grammatical inference include such diverse fields as speech and natural language processing, gene analysis, pattern recognition, image processing, sequence prediction, information retrieval, cryptography, and many more.
An excellent source for a state-of-the art overview of the subject is provided in (de la Higuera, 2005).
Traditionally, most work in GI has been focused on the inference of regular grammars trying to induce finite-state automata, which can be efficiently learned.
For context free languages some recent approaches have shown limited success (Starckie, Costie & Zaanen, 2004), because the search space of possible grammars is infinite.
Basically, the parenthesis and palindrome languages are common test cases for the effectiveness of grammatical inference methods.
Both languages are context-free.
The parenthesis language is deterministic but the palindrome language is nondeterministic (de la Higuera, 2005).
The use of evolutionary methods for context-free grammatical inference are not new, but only a few attempts have been successful.
Wyard (1991) used Genetic Algorithm (GA) to infer grammars for the language of correctly balanced and nested parentheses with success, but fails on the language of sentences containing the same number of a’s and b’s (anbn language).
In another attempt (Wyard, 1994), he obtained positive results on the inference of two classes of context-free grammars: the class of n-symbol palindromes with 2 = n = 4 and a class of small natural language grammars.
Sen and Janakiraman (1992) applied a GA using a pushdown automata to the inference and successfully learned the anbn language and the parentheses balancing problem.
But their approach does not scale well.
Huijsen (1994) applied GA to infer context-free grammars for the parentheses balancing problem, the language of equal numbers of a’s and b’s and the even-length 2-symbol palindromes.
Huijsen uses a “markerbased” encoding scheme with has the main advantage of allowing variable length chromosomes.
The inference of regular grammars was successful but the inference of context-free grammars failed.
Those results obtained in earlier attempts using GA to context-free grammatical inference were limited.
The first attempt to use Genetic Programming (GP) for grammatical inference used a pushdown automata (Dunay, 1994) and successfully learned the parenthesis language, but failed for the anbn language.
Korkmaz and Ucoluk (2001) also presented a GP approach using a prototype theory, which provides a way to recognize similarity between the grammars in the population.
With this representation, it is possible to recognize the so-called building blocks but the results are preliminary.
Javed and his colleagues (2004) proposed a Genetic Programming (GP) approach with grammar-specific heuristic operators with non-random construction of the initial grammar population.
Their approach succeeded in inducing small context-free grammars.
More recently, Rodrigues and Lopes (2006) proposed a hybrid GP approach that uses a confusion matrix to compute the fitness.
They also proposed a local search mechanism that uses information obtained from the sentence parsing to generate a set of useful productions.
The system was used for the parenthesis and palindromes languages with success.
Related Results
Evolution and the cell
Evolution and the cell
Genotype to phenotype, and back again
Evolution is intimately linked to biology at the cellular scale- evolutionary processes act on the very genetic material that is carried and ...
العلة النحوية عند المتقدمين والمتأخرين
العلة النحوية عند المتقدمين والمتأخرين
الملخصتُعدّ العلة سمةً من أهم السمات في الفكر النحوي. فما من قاعدة نحوية إلا وقد علّلها النحاة، فكل حكم يُعلَّل، وكل ظاهرة نحوية لا بد لها من علة. فهذا ما جعل كثيرين من غير العرب ي...
GRAMMATICAL VARIATION IN THE HUTSUL DIALECTS OF NORTHERN BUKOVYNA
GRAMMATICAL VARIATION IN THE HUTSUL DIALECTS OF NORTHERN BUKOVYNA
The article is devoted to the study of grammatical variation in the modern Hutsul dialects of Northern Bukovyna. The aim of the study is to provide a detailed analysis of the varia...
ẨN DỤ NGỮ PHÁP TRONG THI CA
ẨN DỤ NGỮ PHÁP TRONG THI CA
Halliday and Martin (1985, 1992) confirmed that grammatical metaphor is a resource for expressing meaning with the function of packing information, creating interpersonal links in...
Evolutionary Biomechanics
Evolutionary Biomechanics
Life has diversified on Earth in many stunning ways. Understanding how this diversity arose and has been maintained is a common interest for many evolutionary biologists. One appro...
Evolutionary Medicine
Evolutionary Medicine
Abstract
Evolutionary medicine is a fast‐growing research field providing biomedical scientists with evolutionary perspective for the comprehens...
The phonological and grammatical status of Murui ‘word’
The phonological and grammatical status of Murui ‘word’
Different sorts of phonological and grammatical criteria can be used to identify wordhood in Murui, a Witotoan language from Northwest Amazonia. A phonological word is determined o...
The Philosophy of Evolutionary Biology
The Philosophy of Evolutionary Biology
Philosophy of evolutionary biology is a major subfield of philosophy of biology concerned with the methods, conceptual foundations, and implications of evolutionary biology. It als...

