Creating a parser combinator library to parse JSON
Prev: JSON strings | Contents | Next: JSON arrays |
Let’s say I forget to include the closing quote in my JSON string:
arc> (fromjson «"abc»)
not a JSON value: "abc
I get an error, but it’s not very informative. Taking another look at json-string:
(= json-string
  (on-result string
    (seq2 (match-is #\")
          (many (alt json-backslash-escape (match [isnt _ #\"])))
          (match-is #\"))))
if there isn’t a closing quote, then the whole match fails, and we get the generic “not a JSON value” error when none of the JSON parsers match.
However, once we’ve seen an opening quote we know that we’re parsing a string and that there must be a closing quote, so we can provide a better error message. Here’s a function that turns a failed match into an error:
(def must (errmsg parser) (fn (p) (or (parser p) (err errmsg))))
so for the matching part, I want something like...
(= json-string-match
  (seq2 (match-is #\")
        (must "missing closing quote in JSON string"
          (seq (many (alt json-backslash-escape (match [isnt _ #\"])))
               (match-is #\")))))
this does the error handling right, though I haven’t fixed up the return value yet:
arc> (show-parse json-string-match «"abc»)
missing closing quote in JSON string
arc> (show-parse json-string-match «"abc"»)
returning: ((#\a #\b #\c) #\")
remaining: nil
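For readers who don't use Arc, the idea behind must can be sketched in Python. This is only an analogue under stated assumptions: the parser representation (a function from the remaining input string to a (rest, value) pair, or None on failure), the ParseError class, and the Python names are all hypothetical and are not the article's Arc definitions.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed parser model (not the article's Arc representation): a parser takes
# the remaining input and returns (rest, value), or None when it fails.
Parser = Callable[[str], Optional[Tuple[str, Any]]]


class ParseError(Exception):
    """Raised when a parser that is required to match does not."""


def match_is(ch: str) -> Parser:
    # Match exactly one given character at the start of the input.
    def parse(s: str):
        return (s[1:], ch) if s[:1] == ch else None
    return parse


def must(errmsg: str, parser: Parser) -> Parser:
    # Wrap a parser so that a failed match raises an error
    # instead of quietly returning None.
    def parse(s: str):
        result = parser(s)
        if result is None:
            raise ParseError(errmsg)
        return result
    return parse


closing_quote = must("missing closing quote in JSON string", match_is('"'))
```

With these definitions, closing_quote('"rest') returns ('rest', '"'), while closing_quote('abc') raises ParseError carrying the message. The design point is the same as in the Arc version: a plain failure lets an enclosing alternative try something else, whereas must says "we are committed to this parse, so failure here is an error".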
Now I need to return the value of the first parser in a sequence. From my definition of seq2
(def seq2 parsers (with-result results (apply seq parsers) (results 1)))
I’ll extract which result to return:
(def seqi (i parsers) (with-result results (apply seq parsers) (results i)))
(def seq1 p (seqi 0 p))
(def seq2 p (seqi 1 p))
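The same refactoring can be sketched in Python, as an analogue only. The parser model (a function returning a (rest, value) pair or None) and the Python versions of seq, seqi, seq1 and seq2 are assumptions made for illustration and are not the article's Arc code.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed parser model: input string -> (rest, value), or None on failure.
Parser = Callable[[str], Optional[Tuple[str, Any]]]


def match_is(ch: str) -> Parser:
    # Match exactly one given character at the start of the input.
    def parse(s: str):
        return (s[1:], ch) if s[:1] == ch else None
    return parse


def seq(*parsers: Parser) -> Parser:
    # Run the parsers in order; succeed with the list of all their values.
    def parse(s: str):
        values = []
        for parser in parsers:
            result = parser(s)
            if result is None:
                return None
            s, value = result
            values.append(value)
        return (s, values)
    return parse


def seqi(i: int, *parsers: Parser) -> Parser:
    # Like seq, but keep only the value of the i-th parser (0-based).
    def parse(s: str):
        result = seq(*parsers)(s)
        if result is None:
            return None
        rest, values = result
        return (rest, values[i])
    return parse


def seq1(*parsers: Parser) -> Parser:
    return seqi(0, *parsers)


def seq2(*parsers: Parser) -> Parser:
    return seqi(1, *parsers)
```

For example, with a = match_is('a') and b = match_is('b'), seq(a, b)('abc') gives ('c', ['a', 'b']), seq1(a, b)('abc') gives ('c', 'a'), and seq2(a, b)('abc') gives ('c', 'b').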
And now for json-string I have:
(= json-string
  (on-result string
    (seq2 (match-is #\")
          (must "missing closing quote in JSON string"
            (seq1 (many (alt json-backslash-escape (match [isnt _ #\"])))
                  (match-is #\"))))))
arc> (fromjson «"abc»)
missing closing quote in JSON string
arc> (fromjson «"abc"»)
"abc"
We can do the same thing for a backslash Unicode escape:
arc> (fromjson «"\u»)
invalid backslash char #\u
My definition for a backslash-escape was that it could be either a Unicode escape (\u) followed by four hex digits (fourhex), or one of the other single-character backslash escapes (json-backslash-char):
(= json-backslash-escape
  (seq2 (match-is #\\)
        (alt (seq2 (match-is #\u) fourhex)
             (fn (p) (return cdr.p (json-backslash-char car.p))))))
Since fourhex doesn’t match, we fall through the alt to json-backslash-char, which gives us the incorrect error message that \u is not one of JSON’s backslash escapes. Once we see the \u we know that the four hexadecimal digits of a Unicode escape have to follow, so the fix is to add must to fourhex:
(= fourhex
  (must "four hex digits required after \\u"
    (with-seq (h1 (match hexdigit)
               h2 (match hexdigit)
               h3 (match hexdigit)
               h4 (match hexdigit))
      (coerce (int (coerce (list h1 h2 h3 h4) 'string) 16) 'char))))
Now we get a correct error message:
arc> (fromjson «"\u»)
four hex digits required after \u
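To see why the fall-through through alt produced the misleading message, and why wrapping fourhex in must fixes it, here is a small Python analogue. Everything in it is an assumption made for illustration: the parser model (a function returning a (rest, value) pair or None), the helper named then (standing in for a two-parser seq2), the two-entry escape table, and the wording of the stand-in error are hypothetical and are not the article's Arc code.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed parser model: input string -> (rest, value), or None on failure.
Parser = Callable[[str], Optional[Tuple[str, Any]]]


class ParseError(Exception):
    pass


def match_is(ch: str) -> Parser:
    def parse(s: str):
        return (s[1:], ch) if s[:1] == ch else None
    return parse


def alt(*parsers: Parser) -> Parser:
    # Try each alternative in turn; the first one that matches wins.
    def parse(s: str):
        for parser in parsers:
            result = parser(s)
            if result is not None:
                return result
        return None
    return parse


def then(first: Parser, second: Parser) -> Parser:
    # Run two parsers in sequence and keep the second value (like seq2).
    def parse(s: str):
        r1 = first(s)
        if r1 is None:
            return None
        return second(r1[0])
    return parse


def must(errmsg: str, parser: Parser) -> Parser:
    # A failed match becomes an error instead of None.
    def parse(s: str):
        result = parser(s)
        if result is None:
            raise ParseError(errmsg)
        return result
    return parse


HEX = "0123456789abcdefABCDEF"


def fourhex(s: str):
    # Four hex digits, converted to the character with that code point.
    digits = s[:4]
    if len(digits) == 4 and all(c in HEX for c in digits):
        return (s[4:], chr(int(digits, 16)))
    return None


def backslash_char(s: str):
    # Stand-in for json-backslash-char; only two escapes, for brevity.
    table = {"n": "\n", "t": "\t"}
    if s[:1] in table:
        return (s[1:], table[s[:1]])
    raise ParseError("invalid backslash char " + s[:1])


def make_escape(hex_parser: Parser) -> Parser:
    return then(match_is("\\"),
                alt(then(match_is("u"), hex_parser), backslash_char))


loose = make_escape(fourhex)
strict = make_escape(must("four hex digits required after \\u", fourhex))
```

Given a backslash followed by u and nothing else, loose fails inside the first alternative, falls through to backslash_char, and raises "invalid backslash char u". The strict parser raises "four hex digits required after \u" as soon as the hex digits are missing, so the second alternative is never consulted. Both parsers still accept a well-formed escape such as a backslash followed by u0041.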
Questions? Comments? Email me andrew.wilcox [at] gmail.com