Creating a parser combinator library to parse JSON
Prev: JSON numbers, implemented | Contents | Next: Better error reporting |
Here I run into a problem with how to present this material. If I have a JSON string like "\u0041" inside an Arc string literal, I need to escape the quotes and backslashes by prefixing each one with a backslash:
arc> (fromjson "\"\\u0041\"")
That’s pretty ugly, and it’s hard to see what the JSON string actually is, so I’m going to pretend that Arc has a literal string syntax like this:
arc> (fromjson «"\u0041"»)
where the text between the guillemets «...» becomes the contents of the Arc string as-is, without any escaping. If you’re following along and want to type one of these examples into Arc, just change the guillemets into double quotes and prefix any double quotes or backslashes inside with a backslash.
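Converting from the «...» notation back to something you can type at the Arc prompt is mechanical, so here is a small Python sketch of it. `to_arc_literal` is a hypothetical helper for illustration, not part of the Arc library. Backslashes have to be doubled before the quotes are escaped; otherwise the backslashes added in front of the quotes would get doubled too.

```python
def to_arc_literal(text: str) -> str:
    """Hypothetical helper: turn the contents of «...» into an Arc string literal."""
    # Double the backslashes first, so the ones added for quotes are not doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(to_arc_literal(r'"\u0041"'))  # prints "\"\\u0041\""
```

Applied to «"\u0041"», it produces exactly the literal typed at the arc> prompt in the first example.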
OK, so I’ll need to be able to parse the four hexadecimal digits that follow the \u of a Unicode escape sequence and turn them into a character:
(def hexdigit (c)
  (and (isa c 'char)
       (or (<= #\a c #\f)
           (<= #\A c #\F)
           (<= #\0 c #\9))))
(= fourhex
  (with-seq (h1 (match hexdigit)
             h2 (match hexdigit)
             h3 (match hexdigit)
             h4 (match hexdigit))
    (coerce (int (coerce (list h1 h2 h3 h4) 'string) 16) 'char)))
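For readers who want to experiment outside Arc, here is a rough Python analogue of hexdigit and fourhex. It is a sketch, not the author's code. It assumes a parser is a function that takes the remaining input as a string and returns a (value, remaining) pair on success or None on failure, and the name `satisfy` is a stand-in for Arc's match.

```python
from typing import Callable, Optional, Tuple

# Assumed parser shape: remaining input (str) -> (value, remaining) or None.
Parser = Callable[[str], Optional[Tuple[object, str]]]


def satisfy(pred) -> Parser:
    """Stand-in for Arc's (match pred): consume one char that pred accepts."""
    def parser(rest):
        if rest and pred(rest[0]):
            return rest[0], rest[1:]
        return None
    return parser


hexdigit = satisfy(lambda c: c in "0123456789abcdefABCDEF")


def four_hex(rest):
    """Four hexdigits in sequence, converted to the char with that code."""
    digits = []
    for _ in range(4):
        r = hexdigit(rest)
        if r is None:
            return None
        d, rest = r
        digits.append(d)
    # Same idea as (coerce (int ... 16) 'char): base-16 integer, then a char.
    return chr(int("".join(digits), 16)), rest


print(four_hex("0041xyz"))  # ('A', 'xyz')
print(four_hex("00g1"))     # None
```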
Yup, with-seq turned out to be useful.
Let’s see, I’ll need to parse the other JSON backslash escape sequences:
(def json-backslash-char (c)
  (case c
    #\" #\"
    #\\ #\\
    #\/ #\/
    #\b #\backspace
    #\f #\page
    #\n #\newline
    #\r #\return
    #\t #\tab
    (err "invalid backslash char" c)))
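The same escape table in Python, as a sketch: a dict lookup replaces Arc's case, and a ValueError stands in for err. The names `JSON_ESCAPES` and `json_backslash_char` are chosen here for illustration.

```python
# Character after the backslash -> character it stands for in a JSON string.
JSON_ESCAPES = {
    '"': '"', "\\": "\\", "/": "/",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}


def json_backslash_char(c: str) -> str:
    """Return the character that backslash-c denotes, or raise on an unknown escape."""
    if c not in JSON_ESCAPES:
        raise ValueError(f"invalid backslash char {c!r}")
    return JSON_ESCAPES[c]


print(repr(json_backslash_char("n")))  # '\n'
```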
A JSON backslash escape sequence is a backslash followed by either a \u with four hex digits or one of the single-character escapes:
(= json-backslash-escape
  (seq (match [is _ #\\])
       (alt (seq (match [is _ #\u]) fourhex)
            (fn (p) (return cdr.p (json-backslash-char car.p))))))
But oops, seq is giving me lists when all I want is the character:
arc> (show-parse json-backslash-escape «\u0041»)
returning: (#\\ (#\u #\A))
remaining: nil
In both cases I want just the return value of the second parser in the sequence, so I’ll make a combinator to do that:
(def seq2 parsers (with-result results (apply seq parsers) (results 1)))
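To see why seq2 helps, here is a Python sketch of seq and seq2 under an assumed parser shape (a function from the remaining input string to a (value, remaining) pair, or None on failure). Like the Arc version, seq collects every parser's value into a list, and seq2 runs the same sequence but keeps only the value at index 1. `satisfy` is a stand-in for Arc's match.

```python
from typing import Callable, Optional, Tuple

# Assumed parser shape: remaining input (str) -> (value, remaining) or None.
Parser = Callable[[str], Optional[Tuple[object, str]]]


def satisfy(pred) -> Parser:
    """Stand-in for Arc's (match pred): consume one char that pred accepts."""
    def parser(rest):
        if rest and pred(rest[0]):
            return rest[0], rest[1:]
        return None
    return parser


def seq(*parsers) -> Parser:
    """Run parsers one after another; succeed with the list of their values."""
    def parser(rest):
        values = []
        for p in parsers:
            r = p(rest)
            if r is None:
                return None
            value, rest = r
            values.append(value)
        return values, rest
    return parser


def seq2(*parsers) -> Parser:
    """Like seq, but keep only the value of the second parser (index 1)."""
    inner = seq(*parsers)
    def parser(rest):
        r = inner(rest)
        if r is None:
            return None
        values, remaining = r
        return values[1], remaining
    return parser


backslash = satisfy(lambda c: c == "\\")
letter_u = satisfy(lambda c: c == "u")
print(seq(backslash, letter_u)(r"\u0041"))   # (['\\', 'u'], '0041')
print(seq2(backslash, letter_u)(r"\u0041"))  # ('u', '0041')
```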
I can also extract a match-is:
(def match-is (x) (match [is x _]))
Now I have:
(= json-backslash-escape
  (seq2 (match-is #\\)
        (alt (seq2 (match-is #\u) fourhex)
             (fn (p) (return cdr.p (json-backslash-char car.p))))))
That’s better:
arc> (show-parse json-backslash-escape «\u0041»)
returning: #\A
remaining: nil
arc> (show-parse json-backslash-escape «\/»)
returning: #\/
remaining: nil
arc> (show-parse json-backslash-escape «\"»)
returning: #\"
remaining: nil
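Putting these pieces together, here is a self-contained Python sketch of json-backslash-escape. It is a translation under assumptions, not the author's Arc code: parsers are functions from the remaining input string to a (value, remaining) pair or None, `satisfy` stands in for match, and `escaped_char` plays the part of the anonymous (fn (p) ...) parser. As in the Arc version, an unknown escape character raises an error instead of quietly failing.

```python
from typing import Callable, Optional, Tuple

# Assumed parser shape: remaining input (str) -> (value, remaining) or None.
Parser = Callable[[str], Optional[Tuple[object, str]]]


def satisfy(pred) -> Parser:
    """Stand-in for Arc's (match pred)."""
    def parser(rest):
        if rest and pred(rest[0]):
            return rest[0], rest[1:]
        return None
    return parser


def match_is(x) -> Parser:
    return satisfy(lambda c: c == x)


def seq2(*parsers) -> Parser:
    """Run parsers in order; succeed with the second parser's value."""
    def parser(rest):
        values = []
        for p in parsers:
            r = p(rest)
            if r is None:
                return None
            value, rest = r
            values.append(value)
        return values[1], rest
    return parser


def alt(*parsers) -> Parser:
    """Succeed with the result of the first parser that succeeds."""
    def parser(rest):
        for p in parsers:
            r = p(rest)
            if r is not None:
                return r
        return None
    return parser


def four_hex(rest):
    digits = rest[:4]
    if len(digits) == 4 and all(c in "0123456789abcdefABCDEF" for c in digits):
        return chr(int(digits, 16)), rest[4:]
    return None


ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
           "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def escaped_char(rest):
    # Mirrors json-backslash-char's err: an unknown escape is an error.
    if not rest or rest[0] not in ESCAPES:
        raise ValueError(f"invalid backslash char {rest[:1]!r}")
    return ESCAPES[rest[0]], rest[1:]


json_backslash_escape = seq2(match_is("\\"),
                             alt(seq2(match_is("u"), four_hex), escaped_char))

print(json_backslash_escape(r"\u0041"))  # ('A', '')
print(json_backslash_escape(r"\/"))      # ('/', '')
print(json_backslash_escape(r'\"'))      # ('"', '')
```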
Other characters in the string can be anything that isn’t a closing quote:
(match [isnt _ #\"])
Now I have an implementation for json-string:
(= json-string
  (on-result string
    (seq2 (match-is #\")
          (many (alt json-backslash-escape
                     (match [isnt _ #\"])))
          (match-is #\"))))
arc> (show-parse json-string «"\u0041b\\c"»)
returning: "Ab\\c"
remaining: nil
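A self-contained Python sketch of json-string follows, again assuming that a parser maps the remaining input string to a (value, remaining) pair or None. The helpers `many` and `on_result` are written to match how the Arc combinators of the same names are used here; their exact Arc definitions may differ. `"".join` does the job of Arc's string on the list of characters.

```python
from typing import Callable, Optional, Tuple

# Assumed parser shape: remaining input (str) -> (value, remaining) or None.
Parser = Callable[[str], Optional[Tuple[object, str]]]


def satisfy(pred) -> Parser:
    """Stand-in for Arc's (match pred)."""
    def parser(rest):
        if rest and pred(rest[0]):
            return rest[0], rest[1:]
        return None
    return parser


def match_is(x) -> Parser:
    return satisfy(lambda c: c == x)


def seq2(*parsers) -> Parser:
    """Run parsers in order; succeed with the second parser's value."""
    def parser(rest):
        values = []
        for p in parsers:
            r = p(rest)
            if r is None:
                return None
            value, rest = r
            values.append(value)
        return values[1], rest
    return parser


def alt(*parsers) -> Parser:
    """Succeed with the result of the first parser that succeeds."""
    def parser(rest):
        for p in parsers:
            r = p(rest)
            if r is not None:
                return r
        return None
    return parser


def many(p) -> Parser:
    """Zero or more p, collecting the values into a list (never fails)."""
    def parser(rest):
        values = []
        while (r := p(rest)) is not None:
            value, rest = r
            values.append(value)
        return values, rest
    return parser


def on_result(f, p) -> Parser:
    """Apply f to the value of p when p succeeds."""
    def parser(rest):
        r = p(rest)
        return None if r is None else (f(r[0]), r[1])
    return parser


ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
           "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def four_hex(rest):
    digits = rest[:4]
    if len(digits) == 4 and all(c in "0123456789abcdefABCDEF" for c in digits):
        return chr(int(digits, 16)), rest[4:]
    return None


def escaped_char(rest):
    # Mirrors json-backslash-char's err: an unknown escape is an error.
    if not rest or rest[0] not in ESCAPES:
        raise ValueError(f"invalid backslash char {rest[:1]!r}")
    return ESCAPES[rest[0]], rest[1:]


json_backslash_escape = seq2(match_is("\\"),
                             alt(seq2(match_is("u"), four_hex), escaped_char))

json_string = on_result(
    "".join,  # plays the role of Arc's `string` on the list of chars
    seq2(match_is('"'),
         many(alt(json_backslash_escape, satisfy(lambda c: c != '"'))),
         match_is('"')),
)

print(json_string(r'"\u0041b\\c"'))  # ('Ab\\c', '')
```

Because json_backslash_escape is tried first inside the alt, a backslash never reaches the catch-all "anything but a quote" parser.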
And I add json-string as one more alternative in json-value:
(= json-value
  (skipwhite:alt json-true json-false json-null json-number json-string))
arc> (fromjson «"greetings"»)
"greetings"
Questions? Comments? Email me andrew.wilcox [at] gmail.com