Creating a parser combinator library to parse JSON
Prev: Introduction | Contents | Next: Displaying the parse result |
In a combinator library, all the functions to be combined need to be able to be called in the same way so that they can be plugged into the combinators, and the functions returned by the combinators also need to have that same interface, so that they in turn can be plugged in to more combinators to build up a complete application. Thus the choice of what interface to use is a key decision for a combinator library.
For parsing, depending on what kind of parsing we’re doing, there are several things that a parser combinator interface might need to support:
For the JSON library, I needed to be able to compose return values, as JSON values can recursively contain other JSON values; but I didn’t need backtracking (because JSON is simple enough that it can be parsed without it). So I chose an interface where a parser is passed a parse position, and if it matches the input, returns a new parse position and a return value, and returns nil
if it doesn't match the input.
The parse position can also be represented in different ways. One choice is an integer position within the input string. For my convenience I convert the input string to a list of characters, and then use the cons cell within the list itself for the parse position. So that I can easily see when I’m returning a result, I’ll make return
a synonym for list
:
(def return (new-parse-position return-value) (list new-parse-position return-value))
Here’s a parser that matches the letter A, and has "found A!"
as its return value if it succeeds:
(def match-a (p) (if (is car.p #\a) (return cdr.p "found A!")))
arc> (match-a '(#\a #\b #\c)) ((#\b #\c) "found A!")arc> (match-a '(#\d #\a #\e #\f)) nil
Arc note: to a programmer familar with Lisp’s car
, it might look like match-a
would raise an error if we tried to match at the end of the input. In fact it works fine, returning nil
to indicate that there’s no “a” found at the end of the input:
arc> (match-a '()) nil
In Arc, lists are terminated with nil
, (car nil)
returns nil
instead of raising an error, nil
is not equal to #\a
, and the “else” clause of if
defaults to nil
. Thus the result is the same as if we had explicitly checked for the end of the list and returned nil
.
By providing defaults like this, Arc lets us write less code when what we need to implement is already the default.
The higher order parser combinator functions just pass the parser position around without looking at it, so they can be used whether p
is a cons cell, an integer position in the input string, or something else.
Prev: Introduction | Contents | Next: Displaying the parse result |
Questions? Comments? Email me andrew.wilcox [at] gmail.com