Back

Parsing JSON with FSharp

Fri Nov 05 2021

This is an overview of parsing JSON with FSharp - it is not a step-by-step tutorial. For that I recommend reading Scott Wlaschin's four-part tutorial on how to use Parser Combinators to parse JSON.

JSON is just a format to represent arbritrary data. But JSON is just text at the end of the day. How do we take a simple string and parse it into a complex in-memory representation of arbritrary data?

Final Results

Final Result: ✨ https://fsharp-json-parser.netlify.app

Source: https://github.com/VivekRajagopal/fsharp-json-parser

Parser Combinators

Parser Combinators are the logical building blocks in comining smaller, atomic units of parsing logic into more complex data structures. This JSON parser assumes that the input is a whole piece of text representing the raw JSON. This makes the atomic unit to parse a single character.

Take such an atomic parser;

let parse_n input =
  let firstChar = input.[0]

  if firstChar = 'n' then
    Ok ('n')
  else
    Error "Expected input to be 'n'"

The signature being fun input:string -> Result<char, string>

Let's make the parser a bit more flexible.

let parseF predicate input =
  let firstChar = input.[0]

  if predicate(firstChar) then
    Ok (firstChar)
  else
    Error $"Expected input to be '{firstChar}'"

Here predicate has a signature of fun char -> bool, which allows us easily create more atomic parser.

Parse this AND that

Okay, something a bit more challenging. Let's try and parse an entire JSON null value. We need to parse each letter sequentially, and we can do that by combining atomic parsers. Let's also make things easier by returning the remaining unparsed input in the Ok path in parseF. And why not check to see if input is empty while we're at it.

We need to introduce our first andThen combinator. This combines two parser into one, and requires the first one to succeed followed by the second.

open System

let parseF predicate input =
  if String.IsNullOrEmpty(input) then
    Error "input is null or empty"
  else
    let firstChar = input.[0]

    if predicate(firstChar) then
      let remainingChars = input.[1..]
      Ok (firstChar, remainingChars)
    else
      Error $"Expected input to be '{firstChar}'"

let parse_n = parseF (fun c -> c = 'n')
let parse_u = parseF (fun c -> c = 'u')
let parse_l = parseF (fun c -> c = 'l')

let andThen p1 p2 =
  let parser input =
    input
    |> p1
    |> Result.bind (
      fun (v1, r1) ->
        p2 r1
        |> Result.map (fun (v2,r2) -> (v1, v2, r2))
    )
    |> Result.bind (fun (v1, v2, r2) -> Ok ((v1, v2), r2))

  parser

let (>>>) = andThen

let parse_null =
  parse_n >>> parse_u >>> parse_l >>> parse_l

Try it out on the Fable REPL here.

Expanded Parser

The null parser we created above is fairly simple. If we write out the expanded code in english it would be something like;

Parse "n",

then in the remaining input, parse "u"

then in the remaining input, parse "l"

then in the remaining input, parse "l"

Parse this OR that

Our next useful parser is the orElse parser. This takes two parsers and requires one to succeed.

let orElse p1 p2 =
  let parser input =
    let result1 = input |> p1
    let result2 = input |> p2

    match result1, result2 with
    | Ok (v1, r1), Ok _ -> Ok (v1, r1)
    | _, Ok (v2, r2) -> Ok (v2, r2)
    | Ok (v1, r1), _ -> Ok (v1, r1)
    | Error err1, Error _ -> Error err1

  parser

let (<|>) = orElse

Armed with the andThen and orElse parser we can build fairly complex parsers.

A single JSON value can be many different types; null, true, false.. etc. We can use the orElse parser to choose between valid fully parsed JSON values.

In FSharp we can represent our JSON model like;

type JValue =
  | JString of string
  | JNumber of float
  | JBool of bool
  | JNull
  | JObject of Map<string, JValue>
  | JArray of JValue list

Mutually Recursive Parser

One slightly tricky part of parsing JSON is the fact that model is recursive. A JArray is an array of JValue, which itself can be a JArray. So how can we parse a JArray if we can't first parse JValue... which needs the JArray parser?!

The slightly "hacky" way to get around this is to declare a dummy parser for JValue, that JValue and JObject can still reference in their parsing logic. JValue parser is finally defined at the end to close off the loop.

let createForwardRefParser<'T> () =
  let dummyParser = fun _ -> failwith "Null forward parser"

  let parserRef = ref dummyParser

  let actualParser input =
    input
    |> Parser.run !parserRef

  actualParser, parserRef

let jValue, jValueRef = createForwardRefParser<JValue>()

// ... use jValue in other parsers

// ... finally update value in jValueRef to actual jValue parser

jValueRef := jValue

I wasn't a total fan of this method, and instead would prefer to use mutually recursive functions. This is totally valid code in FSharp (copied from StackOverflow);

let rec f x =
  if x > 0 then
    g (x-1)
  else
    x

and g x =
  if x > 0 then
    f (x-1)
  else
    x

Deploying this on the Web 🚀

If you don't release something you develop, does it really exist?

Faux rhetoric aside, I was keen to deploy this JSON parser in an easily consumable format. There's no production use-case for this parser, but I wanted to deploy it for demo and showcasing purposes.

Fable (not the video game) is a FSharp to JavaScript transpiler. This enables us to code our app in FSharp and use Fable to spit out JavaScript that can run on a user's browser (or purely as a NodeJS app). The idea of a functionally programmed, strongly typed "JavaScript" app is not new. Check out Elm if you're interested in an alternative.

From here there was a little trial and error to deploy the built JS web app to Netlify. Netlify does not support using the Dotnet SDK during the build process. So intead, Github builds the FSharp Dotnet solution into a JS app and pushes that into Netlify.

After all that, we get https://fsharp-json-parser.netlify.app.

Future Improvements

The final solution is the most basic implementation clocking in at a pretty light ~290 lines of code for the core business logic. Performance was not really considered for this project. But I'm still curious about how well (or rather poorly) it performs.

If I ever get time in the future, I'd like to benchmark this parser and see if I can improve on it. The first performance improvement opportunity to explore is to try and parse more than one character at a time. This would allow an early fail, or short-circuiting into a more specific JValue type.