If you want CSV-ish, enforce an array of strings for each record. Or go further with actual objects and non-string types.
You can even jump to an arbitrary point and then scan until you see an actual newline, since a newline is always a record boundary.
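A minimal Python sketch of that seek-to-boundary trick on line-delimited JSON. The function name `record_at` is made up for illustration; a `StringIO` stands in for a real file (for a real file you'd open in binary mode and seek by byte):

```python
import io
import json

def record_at(f, offset):
    """Seek to an arbitrary offset, discard the partial line, return the next full record."""
    f.seek(offset)
    if offset > 0:
        f.readline()          # throw away the (possibly partial) current line
    line = f.readline()
    return json.loads(line) if line else None

f = io.StringIO('[1, "a"]\n[2, "b"]\n[3, "c"]\n')
record_at(f, 5)   # offset 5 lands mid-record 1, so we get record 2
```

This is the same trick that makes tools like parallel log processors work on JSON Lines: any byte offset can be turned into a valid record boundary by skipping at most one line.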
It’s not that CSV is an invalid format. It’s that libraries and tools to parse CSV tend to suck. Whereas JSON is the lingua franca of data.
This isn't the case. An incredible amount of effort and ingenuity has gone into CSV parsing because of its ubiquity. Despite the lack of any sort of specification, it's easily the most widely supported data format in existence in terms of tools and language support.
Yeah, and it's still a partially-parseable shit show with guessed values. We could have, and should have, done better by simply defining a format to use.
I went looking at some of the more niche languages like Prolog, COBOL, RPG, APL, Eiffel, Maple, MATLAB, tcl, and a few others. All of these and more had JSON libraries (most had one baked into the standard library).
The exceptions I found (though I didn't look too far) were: Bash (use jq with it), J (an APL variant), Scratch (not exposed to users, but Scratch code itself is encoded in JSON), and Forth (I could find implementations, but it's very hard to pin down Forth dialects).
People keep saying this but RFC 4180 exists.
Even better, the majority of the time I write/read CSV these days I don't need to use a library or tools at all. It'd be overkill. CSV libraries are best saved for when you're dealing with random CSV files (especially from multiple sources) since the library will handle the minor differences/issues that can pop up in the wild.
It's just that people tend to use specialized tools for encoding and decoding it instead of like ",".join(row) and row.split(",")
I have seen people try to build up JSON strings like that too, and then you have all the same problems.
So there is no problem with CSV except that maybe it's too deceptively simple. We also see people trying to build things like URLs and query strings without using a proper library.
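To make the parallel concrete, here's a small Python sketch (names invented for illustration) of the hand-built-JSON failure mode the comment above describes, next to the library call that avoids it:

```python
import json

name = 'He said "hi"'
naive = '{"name": "' + name + '"}'     # naive concatenation: the inner quotes are not escaped
try:
    json.loads(naive)
    broken = False
except json.JSONDecodeError:
    broken = True                      # the unescaped quotes make it invalid JSON

safe = json.dumps({"name": name})      # the library escapes the quotes for you
```

Exactly the same shape of bug as `",".join(row)` on a field that contains a comma.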
If that sounds like a lot of edge-case work keep in mind that people have been doing this for more than half a century. Lots of examples and notes you can steal.
You really super can't just split on commas for CSV. You need to handle quoting, since fields can contain commas inside quoted strings, and you need to know where a quoted string ends even though it may contain escaped internal quote characters. For either format, unless you know your data super well, you need to use a library.
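A quick Python demonstration of that failure, using the standard-library csv module for the correct roundtrip:

```python
import csv
import io

row = ["a", "hello, world", 'say "hi"']
buf = io.StringIO()
csv.writer(buf).writerow(row)          # proper quoting: the comma and quotes get escaped
line = buf.getvalue()

naive = line.strip().split(",")        # naive split breaks on the embedded comma
parsed = next(csv.reader(io.StringIO(line)))  # the library roundtrips correctly
```

`naive` ends up with four pieces instead of three, and the quote characters are still embedded in them; `parsed` recovers the original row exactly.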
A couple of standards that I know of do this, primarily intended for logging:
Really easy to work with in my experience.
Sure some space is usually wasted on keys but compression takes care of that.
["foo","bar",123]
That's as tabular as CSV, but you now have optional types. You can even have lists of lists, lists of objects, lists of lists of objects…

["id", "species", "nickname"]
[1, "Chicken", "Chunky cheesecakes"]
[2, "Dog", "Wagging wonders"]
[3, "Bunny", "Hopping heroes"]
[4, "Bat", "Soaring shadows"]

Remember that this does not allow arbitrary representation of serialized JSON data. But it allows for any and all JSON data, as you can always roundtrip valid JSON to a compact one-line representation without extra whitespace.
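The roundtrip claim is easy to check in Python: `json.dumps` escapes any newlines inside strings, so every value serializes to a single line, and `json.loads` gets the original back:

```python
import json

value = {"note": "line one\nline two", "rows": [[1, 2], [3, 4]]}
line = json.dumps(value)               # newlines inside strings become the two characters \n
compact = json.dumps(value, separators=(",", ":"))  # drop the optional whitespace too
```

`line` contains no literal newline characters, so it is a valid JSON Lines record, and `json.loads(line)` reproduces `value` exactly.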
That is[0], if a string s of length n is valid JSON, then there is no prefix s[0..i] for i < n that is valid JSON.
So you could just consume as many bytes as you need to produce a JSON document, then start a new one when that one is complete. To handle malformed data you just throw out the partial data on a syntax error and restart from the following byte (and likely throw away data a few more times if the error was in the middle of a document).
That is, [][]""[][]""[] is unambiguous to parse[1]
[0] Again, assuming we restrict ourselves to strings, null, booleans, arrays, and objects at the root
[1] Still, this is not a good format, as a single missing " can destroy the entire document.
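The consume-and-resync loop described above can be sketched with the standard library's `json.JSONDecoder.raw_decode`, which parses one value from a given index and reports where it stopped (the function name `parse_stream` is invented for illustration):

```python
import json

def parse_stream(s):
    """Yield JSON values from concatenated text, skipping one byte forward on errors."""
    dec = json.JSONDecoder()
    idx = 0
    while idx < len(s):
        try:
            value, idx = dec.raw_decode(s, idx)
            yield value
        except json.JSONDecodeError:
            idx += 1   # throw away a byte and try again
```

On `'[][]""[][]""[]'` this yields seven values; on malformed input like `'[1][oops][2]'` it discards the garbage and resynchronizes at the next valid document, illustrating both the prefix-free property and footnote [1]'s caveat that errors cost you data.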
How do you do this simply? You read each line, and if there's an odd number of ", you have an incomplete record, so you keep accumulating lines until the total number of " is even. After you have the complete string, parsing the fields correctly is harder, but you can do it with a regex, a PEG, or a disgusting state machine.
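A minimal Python sketch of that quote-counting accumulator (the name `read_records` is made up; the field parsing is delegated to the stdlib csv module rather than a hand-rolled state machine). The parity trick works because RFC 4180 escapes a quote by doubling it, so escaped quotes always come in pairs:

```python
import csv
import io

def read_records(lines):
    """Accumulate physical lines until the quote count is even, then parse one record."""
    buf = []
    for line in lines:
        buf.append(line)
        joined = "\n".join(buf)
        if joined.count('"') % 2 == 0:   # odd count means an unterminated quoted field
            yield next(csv.reader(io.StringIO(joined)))
            buf = []
```

Feeding it `['a,"hello', 'world",b', 'x,y,z']` stitches the first two physical lines back into one logical record with an embedded newline.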