Creating a parser combinator library to parse JSON
Prev: Better error reporting | Contents | Next: JSON objects |
How do I match a JSON array?
After the first value, any comma has to be followed by another value. I don’t care about the return value of matching the comma, so I’ll use seq2 to get just the return value of the JSON value:
(seq2 (skipwhite:match-is #\,) json-value)
There can be many (zero or more) of those “comma followed by a value” pairs:
(many (seq2 (skipwhite:match-is #\,) json-value))
There does have to be a value before the first comma:
(seq json-value (many (seq2 (skipwhite:match-is #\,) json-value)))
And the whole thing is optional, because the JSON array might be empty:
(optional (seq json-value (many (seq2 (skipwhite:match-is #\,) json-value))))
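The build-up above can be sketched outside Arc as well. The following Python is a minimal sketch, not the library from this series: match_is, seq, many and optional are hypothetical stand-ins that only report how far they matched (so seq and seq2 collapse into one seq), whitespace skipping is left out, and a single digit stands in for json-value.

```python
# Hypothetical recognizers: each takes (text, index) and returns the index
# just past the match, or None when it does not match.

def match_is(ch):
    return lambda s, i: i + 1 if s.startswith(ch, i) else None

def value(s, i):
    # Assumption: a single digit stands in for json-value.
    return i + 1 if i < len(s) and s[i].isdigit() else None

def seq(*parsers):
    # Every parser must match, one after the other.
    def parse(s, i):
        for p in parsers:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse

def many(parser):
    # Zero or more matches; stops at the first failure and never fails itself.
    def parse(s, i):
        while True:
            j = parser(s, i)
            if j is None or j == i:
                return i
            i = j
    return parse

def optional(parser):
    # Match if possible, otherwise succeed without consuming anything.
    def parse(s, i):
        j = parser(s, i)
        return i if j is None else j
    return parse

# A value, then zero or more (comma value) pairs, the whole thing optional.
elements = optional(seq(value, many(seq(match_is(","), value))))

def matches(text):
    # True when the element list accounts for the entire text.
    return elements(text, 0) == len(text)

for text in ["", "1", "1,2,3", "1,", ",1"]:
    print(repr(text), matches(text))
```

In this sketch a trailing comma is rejected because the (comma value) pair fails as a whole, which leaves the comma unconsumed.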
So this is pretty close for matching a JSON array:
(= json-array
  (seq2 (match-is #\[)
        (optional (seq json-value
                       (many (seq2 (skipwhite:match-is #\,) json-value))))
        (skipwhite:match-is #\])))
Just a couple of problems. JSON arrays contain JSON values, which in turn can be or contain JSON arrays...
(= json-value (skipwhite:alt json-true json-false json-null json-number json-string json-array))
But when I’m defining json-array, I haven’t defined json-value yet...
arc> (= json-array
       (seq2 (match-is #\[)
             (optional (seq json-value
                            (many (seq2 (skipwhite:match-is #\,) json-value))))
             (skipwhite:match-is #\])))
reference to undefined identifier: _json-value
Putting json-value first doesn’t help of course, since then it will be json-array that isn’t defined yet. So, I’ll need to wrap the reference to json-value in a function, so that json-value isn’t looked up until the parser actually runs:
(= json-array
  (seq2 (match-is #\[)
        (optional (seq (fn (p) (json-value p))
                       (many (seq2 (skipwhite:match-is #\,)
                                   (fn (p) (json-value p))))))
        (skipwhite:match-is #\])))
Which I can make shorter with a macro:
(mac forward (parser) (w/uniq p `(fn (,p) (,parser ,p))))
Now I can easily have forward references:
(= json-array
  (seq2 (match-is #\[)
        (optional (seq forward.json-value
                       (many (seq2 (skipwhite:match-is #\,) forward.json-value))))
        (skipwhite:match-is #\])))
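The same define-before-use problem, and the same fix, can be shown in Python, where a lambda plays the role of the forward macro. This is only a sketch under assumptions: the combinators are simplified stand-ins that return (value, next_index) or None, a digit stands in for the non-array JSON values, and the array holds exactly one value to keep the example short.

```python
def match_is(ch):
    return lambda s, i: (ch, i + 1) if s.startswith(ch, i) else None

def digit(s, i):
    # Assumption: a single digit stands in for the other JSON values.
    return (int(s[i]), i + 1) if i < len(s) and s[i].isdigit() else None

def alt(*parsers):
    # The first parser that matches wins.
    def parse(s, i):
        for p in parsers:
            r = p(s, i)
            if r is not None:
                return r
        return None
    return parse

def seq2(*parsers):
    # Match all parsers in order, keep only the second one's value.
    def parse(s, i):
        values = []
        for p in parsers:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            values.append(v)
        return (values[1], i)
    return parse

# Referring to json_value directly fails: the name is looked up right now.
try:
    json_array = seq2(match_is("["), json_value, match_is("]"))
except NameError as e:
    print("error:", e)

# Wrapping the reference in a function delays the lookup until parse time.
json_array = seq2(match_is("["), lambda s, i: json_value(s, i), match_is("]"))
json_value = alt(digit, json_array)

print(json_value("[[7]]", 0))  # (7, 5)
```

By the time the lambda is called, json_value has been assigned, so the recursion resolves without either definition having to come first.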
Next I need to fix the return value:
arc> (show-parse json-value "[1,2,3]")
returning: ((1 (2 3)))
remaining: nil
Back when I wrote optional, if the parser matched, I put its return value in a list. Now that I’m actually using optional for the first time, it turns out I don’t want that; I want just the value. An easy fix:
(def optional (parser) (fn (p) (iflet (p2 r) (parser p) (return p2 r) (return p nil))))
But now optional is just returning what the parser returns, so I could write it as:
(def optional (parser) (alt parser (fn (p) (return p nil))))
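The alt version reads as "try the parser, otherwise succeed with nil without consuming any input". Here is a small Python sketch of both forms side by side. The names are hypothetical stand-ins, not the Arc library's definitions: parsers return (value, next_index) or None, and None as a value plays the part of nil.

```python
def digit(s, i):
    return (int(s[i]), i + 1) if i < len(s) and s[i].isdigit() else None

def alt(*parsers):
    # The first parser that matches wins.
    def parse(s, i):
        for p in parsers:
            r = p(s, i)
            if r is not None:
                return r
        return None
    return parse

def optional_explicit(parser):
    # Spelled out: pass the result through, or succeed in place with None.
    def parse(s, i):
        r = parser(s, i)
        return r if r is not None else (None, i)
    return parse

def optional(parser):
    # Same behavior via alt: fall back to a parser that always succeeds.
    return alt(parser, lambda s, i: (None, i))

for text in ["5", "x"]:
    print(optional_explicit(digit)(text, 0), optional(digit)(text, 0))
```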
Now I get:
arc> (show-parse json-value "[1,2,3]")
returning: (1 (2 3))
remaining: nil
This is the same pattern I had before with many1: a sequence of A followed by B, and I want to cons the single item returned by A together with the list of items returned by B. I can extract a cons-seq function for that:
(def cons-seq (a b) (with-seq (r a rs b) (cons r rs)))
Now many1 is:
(def many1 (parser) (cons-seq parser (many parser)))
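To see the difference in return values, it helps to compare a plain seq with cons-seq on the same input. This Python sketch uses hypothetical two-parser stand-ins (returning (value, next_index) or None) rather than the Arc definitions, with Python lists in place of cons cells.

```python
def digit(s, i):
    return (int(s[i]), i + 1) if i < len(s) and s[i].isdigit() else None

def many(parser):
    # Zero or more matches, collected into a list.
    # Assumes the parser consumes input whenever it succeeds.
    def parse(s, i):
        values = []
        while True:
            r = parser(s, i)
            if r is None:
                return (values, i)
            v, i = r
            values.append(v)
    return parse

def seq(a, b):
    # Returns both values as a two-element list: [A, B].
    def parse(s, i):
        ra = a(s, i)
        if ra is None:
            return None
        rb = b(s, ra[1])
        if rb is None:
            return None
        return ([ra[0], rb[0]], rb[1])
    return parse

def cons_seq(a, b):
    # Puts A's single value on the front of B's list of values.
    def parse(s, i):
        ra = a(s, i)
        if ra is None:
            return None
        rb = b(s, ra[1])
        if rb is None:
            return None
        return ([ra[0]] + rb[0], rb[1])
    return parse

def many1(parser):
    # One or more matches, as a flat list.
    return cons_seq(parser, many(parser))

print(seq(digit, many(digit))("123", 0))  # ([1, [2, 3]], 3)
print(many1(digit)("123", 0))             # ([1, 2, 3], 3)
print(many1(digit)("x", 0))               # None
```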
And I get the right return value from a JSON array:
(= json-array
  (seq2 (match-is #\[)
        (optional (cons-seq forward.json-value
                            (many (seq2 (skipwhite:match-is #\,) forward.json-value))))
        (skipwhite:match-is #\])))
arc> (show-parse json-value "[1,2,3]")
returning: (1 2 3)
remaining: nil
arc> (fromjson «[1, ["apple", true], 3.14159]»)
(1 ("apple" t) 3.14159)
Finally, the error messages can be improved.
arc> (fromjson «[»)
not a JSON value: [
arc> (fromjson «[1,]»)
not a JSON value: [1,]
Once we see the opening bracket, we know there has to be a closing bracket, and when we see a comma, we know it has to be followed by a value:
(= json-array
  (seq2 (match-is #\[)
        (optional (cons-seq forward.json-value
                            (many (seq2 (skipwhite:match-is #\,)
                                        (must "a comma must be followed by a value"
                                              forward.json-value)))))
        (must "a JSON array must be terminated with a closing ]"
              (skipwhite:match-is #\]))))
arc> (fromjson «[»)
a JSON array must be terminated with a closing ]
arc> (fromjson «[1,]»)
a comma must be followed by a value
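Putting the pieces together, here is a rough Python sketch of the finished array parser, including must. Everything in it is a hypothetical stand-in for the Arc code: parsers return (value, next_index) or None, a single digit stands in for the non-array JSON values, whitespace skipping is omitted, and must is assumed to abort the whole parse by raising an exception carrying its message.

```python
class ParseError(Exception):
    pass

def match_is(ch):
    return lambda s, i: (ch, i + 1) if s.startswith(ch, i) else None

def digit(s, i):
    # Assumption: a single digit stands in for the other JSON values.
    return (int(s[i]), i + 1) if i < len(s) and s[i].isdigit() else None

def alt(*parsers):
    def parse(s, i):
        for p in parsers:
            r = p(s, i)
            if r is not None:
                return r
        return None
    return parse

def seq2(*parsers):
    # Match all parsers in order, keep only the second one's value.
    def parse(s, i):
        values = []
        for p in parsers:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            values.append(v)
        return (values[1], i)
    return parse

def many(parser):
    def parse(s, i):
        values = []
        while True:
            r = parser(s, i)
            if r is None:
                return (values, i)
            v, i = r
            values.append(v)
    return parse

def cons_seq(a, b):
    def parse(s, i):
        ra = a(s, i)
        if ra is None:
            return None
        rb = b(s, ra[1])
        return None if rb is None else ([ra[0]] + rb[0], rb[1])
    return parse

def optional(parser):
    # Arc's nil doubles as the empty list; [] plays that role here.
    return alt(parser, lambda s, i: ([], i))

def must(message, parser):
    # Assumption: a failed must aborts the whole parse with its message.
    def parse(s, i):
        r = parser(s, i)
        if r is None:
            raise ParseError(message)
        return r
    return parse

def value(s, i):
    # Forward reference: json_value is looked up only when this is called.
    return json_value(s, i)

json_array = seq2(
    match_is("["),
    optional(cons_seq(value, many(seq2(
        match_is(","),
        must("a comma must be followed by a value", value))))),
    must("a JSON array must be terminated with a closing ]", match_is("]")))
json_value = alt(digit, json_array)

def fromjson(text):
    r = json_value(text, 0)
    if r is None or r[1] != len(text):
        raise ParseError("not a JSON value: " + text)
    return r[0]

for text in ["[1,[2,3]]", "[", "[1,]"]:
    try:
        print(fromjson(text))
    except ParseError as e:
        print(e)
```

The two must calls sit at the points where the parser has already committed: after an opening bracket, and after a comma. Anywhere else a failure still just means "try the next alternative".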
Questions? Comments? Email me andrew.wilcox [at] gmail.com