Newline delimited JSON explained
If you’ve been dealing with JSON data lately, you may have heard about Newline delimited JSON (NDJSON). This is a new data format that lets you stream data as opposed to reading an entire file.
NDJSON vs JSON
Well, think of NDJSON as a collection of JSON objects separated by ‘\n’ instead of commas or tabs.
# NDJSON example
{"some":"thing"}
{"foo":17,"bar":false,"quux":true}
{"may":{"include":"nested","objects":["and","arrays"]}} # JSON example
{
"some":"thing",
"foo":17,
"bar":false,
"quux":true,
"may":{
"include":"nested",
"objects":[
"and",
"arrays"
]
}
}
How is it better than normal JSON?
Each line is considered a valid JSON value — so your query engine can parse each line independently of any other lines in the file.
This also makes NDJSON-formatted files easy to read: just open them up in your favorite text editor and start reading!
NDJSON is a convenient format for storing or streaming structured data that may be processed one record at a time. This leads to super fast processing. It works well with UNIX-style text processing tools and shell pipelines. It is also supported by query engines such as Athena.
Further reading
You can find out more at http://ndjson.org and https://github.com/ndjson/ndjson-spec