Creating a parser combinator library to parse JSON

Prev: IntroductionContentsNext: Displaying the parse result

Choosing a combinator interface

In a combinator library, all the functions to be combined need to be able to be called in the same way so that they can be plugged into the combinators, and the functions returned by the combinators also need to have that same interface, so that they in turn can be plugged in to more combinators to build up a complete application. Thus the choice of what interface to use is a key decision for a combinator library.

For parsing, depending on what kind of parsing we’re doing, there are several things that a parser combinator interface might need to support:

For the JSON library, I needed to be able to compose return values, as JSON values can recursively contain other JSON values; but I didn’t need backtracking (because JSON is simple enough that it can be parsed without it). So I chose an interface where a parser is passed a parse position, and if it matches the input, returns a new parse position and a return value, and returns nil if it doesn't match the input.

The parse position can also be represented in different ways. One choice is an integer position within the input string. For my convenience I convert the input string to a list of characters, and then use the cons cell within the list itself for the parse position. So that I can easily see when I’m returning a result, I’ll make return a synonym for list:

(def return (new-parse-position return-value)
  (list new-parse-position return-value))

Here’s a parser that matches the letter A, and has "found A!" as its return value if it succeeds:

(def match-a (p)
  (if (is car.p #\a)
       (return cdr.p "found A!")))
arc> (match-a '(#\a #\b #\c))
((#\b #\c) "found A!")
arc> (match-a '(#\d #\a #\e #\f))

Arc note: to a programmer familar with Lisp’s car, it might look like match-a would raise an error if we tried to match at the end of the input. In fact it works fine, returning nil to indicate that there’s no “a” found at the end of the input:

arc> (match-a '())

In Arc, lists are terminated with nil, (car nil) returns nil instead of raising an error, nil is not equal to #\a, and the “else” clause of if defaults to nil. Thus the result is the same as if we had explicitly checked for the end of the list and returned nil.

By providing defaults like this, Arc lets us write less code when what we need to implement is already the default.

The higher order parser combinator functions just pass the parser position around without looking at it, so they can be used whether p is a cons cell, an integer position in the input string, or something else.

Prev: IntroductionContentsNext: Displaying the parse result

Questions? Comments? Email me andrew.wilcox [at]