Classp

a classier way to parse

Welcome to Classp

Parser generators are a great help in writing parsers, but they all have one major drawback: they are based on grammars. So what's wrong with grammars? Well ... nothing as an expository tool. When you want to diagram a complex rule for deriving tree-like structures within a string of symbols, nothing beats a good old-fashioned grammar. But when you want to write a computer program to convert a string into the logical structure that it represents and then use that logical structure for further processing, then grammars are not ideal.

There are a couple of problems with grammars as a computer parsing language. First, grammars describe the syntactic structure of a language rather than the logical structure. These two structures are often similar, but seldom identical, and converting the syntactic structure represented by the grammar into the logical structure that a programmer wants to work with can be tedious and error-prone. Second, grammars do not fit in well with programming languages. They are a special formalism that has to be painfully welded together with a programming language to produce a working parser.

In addition to these general problems with grammars, there is a more specific problem that goes with the restricted sorts of grammars offered in most parser-generator systems. These grammars are often less powerful than full context-free grammars and often cannot deal with ambiguity. Programmers have to modify the grammar to get around the problems, often leading to more awkwardness. There has been a lot of interest lately in using general parsing algorithms to avoid this issue, but these systems are still based on grammars with the weaknesses I described above.

So how does Classp avoid these problems? Classp does not use a grammar. With Classp, you start with the abstract syntax tree (AST) of a language. The AST represents the logical structure of the language, not the surface syntactic structure. The language for designing the AST uses common object-oriented design with inheritance. C++ and Java programmers should find it very familiar. Then you add to each class in the AST some special annotations called class patterns that describe how to format that class, that is, how to write it out in the target language. Writing formatters is generally much simpler than writing parsers. Classp then inverts the formatter to give you a parser.
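To make the idea of inverting a formatter concrete, here is a minimal sketch in Python. It is not Classp syntax and not how Classp is implemented: the `Assignment` class, the `{field}` pattern notation, the `TOKEN` table, and the functions `format_node` and `parse_node` are all assumptions made up for this illustration. It handles a single flat class with no nesting or inheritance, but it shows one pattern driving both directions.

```python
import re
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class Assignment:
    """Hypothetical AST class: the fields are the logical structure."""
    target: str
    value: int
    # Hypothetical pattern notation (not Classp's): space-separated items,
    # where {name} is a field and anything else is a literal.
    pattern: ClassVar[str] = "{target} = {value} ;"


# Assumed lexical form for each field type used in this sketch.
TOKEN = {str: r"[A-Za-z_]\w*", int: r"-?\d+"}


def format_node(node):
    """Formatter: fill the class pattern with the node's field values."""
    return node.pattern.format(**vars(node))


def parse_node(cls, text):
    """Parser derived from the same pattern: literals must match,
    fields are captured and converted to their declared types."""
    regex = ""
    for part in cls.pattern.split():
        if part.startswith("{"):
            name = part[1:-1]
            regex += f"(?P<{name}>{TOKEN[cls.__annotations__[name]]})"
        else:
            regex += re.escape(part)
        regex += r"\s*"  # allow optional whitespace between items
    match = re.fullmatch(regex, text.strip())
    if match is None:
        raise ValueError(f"cannot parse {text!r} as {cls.__name__}")
    fields = {n: cls.__annotations__[n](v) for n, v in match.groupdict().items()}
    return cls(**fields)


node = Assignment("x", 42)
text = format_node(node)                      # "x = 42 ;"
round_trip = parse_node(Assignment, text)     # Assignment(target='x', value=42)
```

The point of the sketch is that only the formatter side was written by hand. The parser is computed from the pattern, so the two cannot drift apart.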

Future

Right now, Classp doesn't do much besides parsing and formatting, but it is intended to evolve into a general language processing system. Some future plans include:

  1. A translation feature: specify two or more sets of class patterns, one set per language, and Classp can generate automatic translations between the languages.

  2. Attribute evaluation: Attributes in Classp are associated with AST classes rather than with grammar productions. We believe that this will make attribute evaluation more powerful and easier to understand than attribute grammars.

  3. Logical constraints to guide parsing.

  4. A built-in lexical analyzer.

The original blog post on these ideas might give a better idea of where Classp is going.

Status

Classp is still in the demo phase of software development. It works for quite a few examples, but it has not been thoroughly tested and lacks some features that would be needed for real development work. So far it has only been built on an Ubuntu-based Linux system.

If you want to try it out, please download the system and read the README.md file for directions.

Development discussions will be carried out on the blog Unobtainabol.

Authors and Contributors

Classp was written by David Gudeman (@david-gudeman). If you would like to discuss Classp or possibly contribute, you can write me at dgudeman@google.com.

Happy Parsing