awwx.ws

Creating a parser combinator library to parse JSON

Prev: Better error reporting | Contents | Next: JSON objects

JSON arrays


A JSON array is an opening [, then zero or more values separated by commas, then a closing ]. How to match that?

After the first value, any comma has to be followed by another value. I don’t care about the return value of matching the comma, so I’ll use seq2 to get just the return value of the JSON value:

(seq2 (skipwhite:match-is #\,)
      json-value)

There can be many (zero or more) of those “comma followed by a value” pairs:

(many (seq2 (skipwhite:match-is #\,)
            json-value))

There does have to be a value before the first comma:

(seq json-value
     (many (seq2 (skipwhite:match-is #\,)
                 json-value)))

And the whole thing is optional, because the JSON array might be empty:

(optional (seq json-value
               (many (seq2 (skipwhite:match-is #\,)
                           json-value))))
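
To see which inputs this pattern accepts, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: return, match-is, seq, seq2, many and optional are simplified stand-ins, not the library's definitions from the earlier chapters. A parser here takes a list of characters and returns (list remaining result), or nil on failure. A single digit (the hypothetical digit-value) stands in for json-value, and whitespace skipping is left out.

```arc
; Stand-in definitions (assumptions, not the library's own code).
; A parser takes a list of characters and returns (list remaining result),
; or nil if it does not match.

(def return (p r) (list p r))

; match one specific character
(def match-is (c)
  (fn (p)
    (if (and p (is (car p) c))
        (return (cdr p) c))))

; hypothetical stand-in for json-value: a single digit character
(def digit-value (p)
  (if (and p (<= #\0 (car p) #\9))
      (return (cdr p) (car p))))

; run the parsers in order, collecting their results in a list
(def seq parsers
  (fn (p)
    ((afn (p ps acc)
       (if (no ps)
           (return p (rev acc))
           (iflet (p2 r) ((car ps) p)
             (self p2 (cdr ps) (cons r acc)))))
     p parsers nil)))

; like seq, but return only the second parser's result
(def seq2 parsers
  (fn (p)
    (iflet (p2 rs) ((apply seq parsers) p)
      (return p2 (cadr rs)))))

; zero or more matches, collected in a list
(def many (parser)
  (fn (p)
    ((afn (p acc)
       (iflet (p2 r) (parser p)
         (self p2 (cons r acc))
         (return p (rev acc))))
     p nil)))

; optional that wraps a successful result in a list
(def optional (parser)
  (fn (p)
    (iflet (p2 r) (parser p)
      (return p2 (list r))
      (return p nil))))

; the pattern from the text, with digit-value in place of json-value
(= elements
  (optional (seq digit-value
                 (many (seq2 (match-is #\,)
                             digit-value)))))

; hypothetical helper: the input left over after matching elements
(def remaining (s)
  (car (elements (coerce s 'cons))))

(remaining "1,2,3]")   ; (#\]) -- all three values are consumed
(remaining "]")        ; (#\]) -- no values; optional still succeeds
```

In both cases the closing bracket is left for whatever comes next to match.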

So this is pretty close for matching a JSON array:

(= json-array
  (seq2 (match-is #\[)
        (optional (seq json-value
                       (many (seq2 (skipwhite:match-is #\,)
                                   json-value))))
        (skipwhite:match-is #\])))

Just a couple of problems. The first is that JSON arrays contain JSON values, which in turn can be, or contain, JSON arrays...

(= json-value
  (skipwhite:alt json-true
                 json-false
                 json-null
                 json-number
                 json-string
                 json-array))

But when I’m defining json-array, I haven’t defined json-value yet...

arc> (= json-array
       (seq2 (match-is #\[)
             (optional (seq json-value
                            (many (seq2 (skipwhite:match-is #\,)
                                        json-value))))
             (skipwhite:match-is #\])))
reference to undefined identifier: _json-value

Putting json-value first doesn’t help, of course, since then it will be json-array that isn’t defined yet. So I’ll need to wrap the reference to json-value in a function:

(= json-array
  (seq2 (match-is #\[)
        (optional (seq (fn (p) (json-value p))
                       (many (seq2 (skipwhite:match-is #\,)
                                   (fn (p) (json-value p))))))
        (skipwhite:match-is #\])))
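
The wrapper works because the body of a fn isn’t evaluated until the function is called, so the global variable is looked up when the parser runs rather than when it is defined. A tiny sketch of that, using the hypothetical names use-later and later, which have nothing to do with the parser library:

```arc
; (= use-now (list later)) would fail at this point:
; `later` does not have a value yet.

; Wrapping the reference in a function delays the lookup until call time.
(= use-later (fn (x) (later x)))

; Only now define `later`.
(= later (fn (x) (+ x 1)))

(use-later 41)   ; 42 -- `later` is looked up here, when it exists
```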

Which I can make shorter with a macro:

(mac forward (parser)
  (w/uniq p
    `(fn (,p) (,parser ,p))))

Now I can easily have forward references:

(= json-array
  (seq2 (match-is #\[)
        (optional (seq forward.json-value
                       (many (seq2 (skipwhite:match-is #\,)
                                   forward.json-value))))
        (skipwhite:match-is #\])))
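
The forward.json-value spelling relies on Arc’s ssyntax: a.b is read as (a b), so the macro receives the symbol json-value, and a:b is function composition. Here is a small sketch for checking that at the REPL with the built-in ssexpand and macex1. The forward macro is repeated so the snippet stands alone.

```arc
(mac forward (parser)
  (w/uniq p
    `(fn (,p) (,parser ,p))))

(ssexpand 'forward.json-value)    ; (forward json-value)
(ssexpand 'skipwhite:match-is)    ; (compose skipwhite match-is)

; The expansion is a one-argument function that calls json-value.
; Its parameter is a fresh symbol from w/uniq, so the exact name varies.
(macex1 '(forward json-value))
```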

Next I need to fix the return value:

arc> (show-parse json-value "[1,2,3]")
returning: ((1 (2 3))) remaining: 
nil

Back when I wrote optional, if the parser matched, I put its return value in a list. Now that I’m actually using optional for the first time, it turns out I don’t want that; I want just the value. An easy fix:

(def optional (parser)
  (fn (p)
    (iflet (p2 r) (parser p)
      (return p2 r)
      (return p nil))))

But now optional is just returning what the parser returns, so I could write it as:

(def optional (parser)
  (alt parser
       (fn (p)
         (return p nil))))
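
A self-contained sketch of how the alt version behaves. The definitions of return, match-is and alt below are simplified stand-ins I’m assuming for the illustration (a parser takes a list of characters and returns (list remaining result), or nil on failure); they aren’t the library’s own code.

```arc
; Stand-ins (assumptions, not the library's code)
(def return (p r) (list p r))

(def match-is (c)
  (fn (p)
    (if (and p (is (car p) c))
        (return (cdr p) c))))

; try each parser in turn; the first one that succeeds wins
(def alt parsers
  (fn (p)
    (some [_ p] parsers)))

; optional as in the text: the parser, or else succeed with nil
(def optional (parser)
  (alt parser
       (fn (p)
         (return p nil))))

(= maybe-x (optional (match-is #\x)))

(maybe-x (coerce "xy" 'cons))   ; ((#\y) #\x)  -- the parser matched
(maybe-x (coerce "y" 'cons))    ; ((#\y) nil)  -- fallback, nothing consumed
```

The second alternative always succeeds, which is what makes the first one optional.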

Now I get:

arc> (show-parse json-value "[1,2,3]")
returning: (1 (2 3)) remaining: 
nil

This is the same pattern I had before with many1: a sequence of A followed by B, and I want to cons the single item returned by A together with the list of items returned by B. I can extract a cons-seq function for that:

(def cons-seq (a b)
  (with-seq (r  a
             rs b)
    (cons r rs)))

Now many1 is:

(def many1 (parser)
  (cons-seq parser
            (many parser)))
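
For comparison, here is cons-seq written out by hand, without with-seq. This is a sketch under my own assumptions: return, match-is and many are simplified stand-ins rather than the library’s definitions, and I’m assuming with-seq does the equivalent of this chaining.

```arc
; Stand-ins (assumptions, not the library's code)
(def return (p r) (list p r))

(def match-is (c)
  (fn (p)
    (if (and p (is (car p) c))
        (return (cdr p) c))))

(def many (parser)
  (fn (p)
    ((afn (p acc)
       (iflet (p2 r) (parser p)
         (self p2 (cons r acc))
         (return p (rev acc))))
     p nil)))

; run a, then b on what is left, and cons a's item onto b's list
(def cons-seq (a b)
  (fn (p)
    (iflet (p2 r) (a p)
      (iflet (p3 rs) (b p2)
        (return p3 (cons r rs))))))

(def many1 (parser)
  (cons-seq parser
            (many parser)))

(= xs (many1 (match-is #\x)))

(xs (coerce "xxy" 'cons))   ; ((#\y) (#\x #\x))
(xs (coerce "y" 'cons))     ; nil -- at least one match is required
```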

And I get the right return value from a JSON array:

(= json-array
  (seq2 (match-is #\[)
        (optional (cons-seq forward.json-value
                            (many (seq2 (skipwhite:match-is #\,)
                                        forward.json-value))))
        (skipwhite:match-is #\])))

arc> (show-parse json-value "[1,2,3]")
returning: (1 2 3) remaining: 
nil
arc> (fromjson «[1, ["apple", true], 3.14159]»)
(1 ("apple" t) 3.14159)

Finally, the error messages can be improved.

arc> (fromjson «[»)
not a JSON value: [
arc> (fromjson «[1,]»)
not a JSON value: [1,]

Once we see the opening bracket, we know there has to be a closing bracket, and when we see a comma, we know it has to be followed by a value:

(= json-array
  (seq2 (match-is #\[)
        (optional (cons-seq forward.json-value
                            (many (seq2 (skipwhite:match-is #\,)
                                        (must "a comma must be followed by a value"
                                              forward.json-value)))))
        (must "a JSON array must be terminated with a closing ]"
              (skipwhite:match-is #\]))))

arc> (fromjson «[»)
a JSON array must be terminated with a closing ]
arc> (fromjson «[1,]»)
a comma must be followed by a value
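
The messages get better because a failed match inside must becomes an error instead of an ordinary failure that the enclosing alt would turn into “not a JSON value”. A minimal sketch of that idea follows. The must here is a hypothetical stand-in that I’m assuming behaves roughly like the one from the previous chapter, and return and match-is are simplified stand-ins too.

```arc
; Stand-ins (assumptions, not the library's code)
(def return (p r) (list p r))

(def match-is (c)
  (fn (p)
    (if (and p (is (car p) c))
        (return (cdr p) c))))

; hypothetical must: once we are committed, a failed match is an error
(def must (message parser)
  (fn (p)
    (or (parser p)
        (err message))))

(= closing (must "a JSON array must be terminated with a closing ]"
                 (match-is #\])))

(closing (coerce "]" 'cons))   ; (nil #\]) -- matched, nothing remaining

; on-err catches the error, and details extracts its message
(on-err (fn (c) (details c))
        (fn () (closing (coerce "" 'cons))))
```

The last expression returns the message string rather than nil, so the failure can’t be mistaken for “this alternative didn’t match”.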



Questions? Comments? Email me andrew.wilcox [at] gmail.com