Javascript must be enabled to continue!
Semantics and algorithms for data-dependent grammars
View through CrossRef
We present the design and theory of a new parsing engine, YAKKER, capable of satisfying the many needs of modern programmers and modern data processing applications. In particular, our new parsing engine handles (1) full scannerless context-free grammars with (2) regular expressions as right-hand sides for defining nonterminals. YAKKER also includes (3) facilities for binding variables to intermediate parse results and (4) using such bindings within arbitrary constraints to control parsing. These facilities allow the kind of data-dependent parsing commonly needed in systems applications, particularly those that operate over binary data. In addition, (5) nonterminals may be parameterized by arbitrary values, which gives the system good modularity and abstraction properties in the presence of data-dependent parsing. Finally, (6) legacy parsing libraries,such as sophisticated libraries for dates and times, may be directly incorporated into parser specifications. We illustrate the importance and utility of this rich collection of features by presenting its use on examples ranging from difficult programming language grammars to web server logs to binary data specification. We also show that our grammars have important compositionality properties and explain why such properties areimportant in modern applications such as automatic grammar induction.
In terms of technical contributions, we provide a traditional high-level semantics for our new grammar formalization and show how to compile grammars into non deterministic automata. These automata are stack-based, somewhat like conventional push-down automata,but are also equipped with environments to track data-dependent parsing state. We prove the correctness of our translation of data-dependent grammars into these new automata and then show how to implement the automata efficiently using a variation of Earley's parsing algorithm.
Association for Computing Machinery (ACM)
Title: Semantics and algorithms for data-dependent grammars
Description:
We present the design and theory of a new parsing engine, YAKKER, capable of satisfying the many needs of modern programmers and modern data processing applications.
In particular, our new parsing engine handles (1) full scannerless context-free grammars with (2) regular expressions as right-hand sides for defining nonterminals.
YAKKER also includes (3) facilities for binding variables to intermediate parse results and (4) using such bindings within arbitrary constraints to control parsing.
These facilities allow the kind of data-dependent parsing commonly needed in systems applications, particularly those that operate over binary data.
In addition, (5) nonterminals may be parameterized by arbitrary values, which gives the system good modularity and abstraction properties in the presence of data-dependent parsing.
Finally, (6) legacy parsing libraries,such as sophisticated libraries for dates and times, may be directly incorporated into parser specifications.
We illustrate the importance and utility of this rich collection of features by presenting its use on examples ranging from difficult programming language grammars to web server logs to binary data specification.
We also show that our grammars have important compositionality properties and explain why such properties areimportant in modern applications such as automatic grammar induction.
In terms of technical contributions, we provide a traditional high-level semantics for our new grammar formalization and show how to compile grammars into non deterministic automata.
These automata are stack-based, somewhat like conventional push-down automata,but are also equipped with environments to track data-dependent parsing state.
We prove the correctness of our translation of data-dependent grammars into these new automata and then show how to implement the automata efficiently using a variation of Earley's parsing algorithm.
Related Results
Evolutionary Grammatical Inference
Evolutionary Grammatical Inference
Grammatical Inference (also known as grammar induction) is the problem of learning a grammar for a language from a set of examples. In a broad sense, some data is presented to the ...
Jezik i gramatološki prinos Lanosovićeve slavonske gramatike u kontekstu standardizacije hrvatskoga jezika
Jezik i gramatološki prinos Lanosovićeve slavonske gramatike u kontekstu standardizacije hrvatskoga jezika
The primary task of this paper was to present, describe and analyze all three editions of Fr. Marijan Lanosović’s grammar as comprehensively and systematically as possible. The gra...
Phrase Structure Grammars
Phrase Structure Grammars
Phrase structure grammars model the internal structure of a sentence in terms of a hierarchically organized representation. The sentence Every boy has a bike, for instance, is take...
ON A SUPERCLASS OF A-GRAMMARS
ON A SUPERCLASS OF A-GRAMMARS
In this paper we consider a superclass of automaton grammars that can be represented in terms of paths on graphs. With this approach, we assume that vertices of graph are labeled b...
History of European Vernacular Grammar Writing
History of European Vernacular Grammar Writing
The grammatization of European vernacular languages began in the Late Middle Ages and Renaissance and continued up until the end of the 18th century. Through this process, grammars...
Remote attribute grammars
Remote attribute grammars
Describing the static semantics of programming languages with attribute grammars is eased when the formalism allows direct dependencies to be induced between rules for nodes arbitr...
Structurally and Arithmetically Controlled Grammars
Structurally and Arithmetically Controlled Grammars
Over the quarter century, it is gratifying to note that the significance of regulated or controlledgrammars (i.e. grammars with regulated rewriting) has been recognized by many par...
ON FORMAL AND COGNITIVE SEMANTICS FOR SEMANTIC COMPUTING
ON FORMAL AND COGNITIVE SEMANTICS FOR SEMANTIC COMPUTING
Semantics is the meaning of symbols, notations, concepts, functions, and behaviors, as well as their relations that can be deduced onto a set of predefined entities and/or known co...

