If you want CSV-ish, enforce an array of strings for each record. Or go further with actual objects and non-string types.
You can even jump to an arbitrary point and then scan until you see an actual newline, since a newline is always a record boundary.
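A minimal Python sketch of that seek-to-boundary trick on line-delimited JSON. The function name `record_at` is made up for illustration; a `StringIO` stands in for a real file (for a real file you'd open in binary mode and seek by byte):

```python
import io
import json

def record_at(f, offset):
    """Seek to an arbitrary offset, discard the partial line, return the next full record."""
    f.seek(offset)
    if offset > 0:
        f.readline()          # throw away the (possibly partial) current line
    line = f.readline()
    return json.loads(line) if line else None

f = io.StringIO('[1, "a"]\n[2, "b"]\n[3, "c"]\n')
record_at(f, 5)   # offset 5 lands mid-record 1, so we get record 2
```

This is the same trick that makes tools like parallel log processors work on JSON Lines: any byte offset can be turned into a valid record boundary by skipping at most one line.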
It’s not that CSV is an invalid format. It’s that libraries and tools to parse CSV tend to suck. Whereas JSON is the lingua franca of data.
This isn't the case. An incredible amount of effort and ingenuity has gone into CSV parsing because of its ubiquity. Despite the lack of any sort of specification, it's easily the most widely supported data format in existence in terms of tools and language support.
Yeah, and it's still a partially-parseable shit show with guessed values. We could have, and should have, done better by simply defining a format to use.
I went looking at some of the more niche languages like Prolog, COBOL, RPG, APL, Eiffel, Maple, MATLAB, tcl, and a few others. All of these and more had JSON libraries (most had one baked into the standard library).
The exceptions I found (though I didn't look too far) were: Bash (use jq with it), J (an APL variant), Scratch (not exposed to users, but Scratch code itself is encoded in JSON), and Forth (I could find implementations, but it's very hard to pin down Forth dialects).
People keep saying this but RFC 4180 exists.
Even better, the majority of the time I write/read CSV these days I don't need to use a library or tools at all. It'd be overkill. CSV libraries are best saved for when you're dealing with random CSV files (especially from multiple sources) since the library will handle the minor differences/issues that can pop up in the wild.
It's just that people tend to use specialized tools for encoding and decoding it instead of like ",".join(row) and row.split(",")
I have seen people try to build up JSON strings like that too, and then you have all the same problems.
So there is no problem with CSV except that maybe it's too deceptively simple. We also see people trying to build things like URLs and query strings without using a proper library.
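To make the parallel concrete, here's a small Python sketch (names invented for illustration) of the hand-built-JSON failure mode the comment above describes, next to the library call that avoids it:

```python
import json

name = 'He said "hi"'
naive = '{"name": "' + name + '"}'     # naive concatenation: the inner quotes are not escaped
try:
    json.loads(naive)
    broken = False
except json.JSONDecodeError:
    broken = True                      # the unescaped quotes make it invalid JSON

safe = json.dumps({"name": name})      # the library escapes the quotes for you
```

Exactly the same shape of bug as `",".join(row)` on a field that contains a comma.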
If that sounds like a lot of edge-case work keep in mind that people have been doing this for more than half a century. Lots of examples and notes you can steal.
You really super can't just split on commas for CSV. You need to handle quoting, since fields can contain commas inside quoted strings, and you need to know where a quoted string ends even though it may contain escaped internal quote characters. For either format, unless you know your data super well, you need to use a library.
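A quick Python demonstration of that failure, using the standard-library csv module for the correct roundtrip:

```python
import csv
import io

row = ["a", "hello, world", 'say "hi"']
buf = io.StringIO()
csv.writer(buf).writerow(row)          # proper quoting: the comma and quotes get escaped
line = buf.getvalue()

naive = line.strip().split(",")        # naive split breaks on the embedded comma
parsed = next(csv.reader(io.StringIO(line)))  # the library roundtrips correctly
```

`naive` ends up with four pieces instead of three, and the quote characters are still embedded in them; `parsed` recovers the original row exactly.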
A couple of standards that I know of do this, primarily intended for logging:
Really easy to work with in my experience.
Sure some space is usually wasted on keys but compression takes care of that.
["foo","bar",123]
That's as tabular as CSV, but you now have optional types. You can even have lists of lists, lists of objects, lists of lists of objects…

["id", "species", "nickname"]
[1, "Chicken", "Chunky cheesecakes"]
[2, "Dog", "Wagging wonders"]
[3, "Bunny", "Hopping heroes"]
[4, "Bat", "Soaring shadows"]

Remember that this does not allow arbitrary representation of serialized JSON data. But it allows for any and all JSON data, as you can always roundtrip valid JSON to a compact one-line representation without extra whitespace.
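The roundtrip claim is easy to check in Python: `json.dumps` escapes any newlines inside strings, so every value serializes to a single line, and `json.loads` gets the original back:

```python
import json

value = {"note": "line one\nline two", "rows": [[1, 2], [3, 4]]}
line = json.dumps(value)               # newlines inside strings become the two characters \n
compact = json.dumps(value, separators=(",", ":"))  # drop the optional whitespace too
```

`line` contains no literal newline characters, so it is a valid JSON Lines record, and `json.loads(line)` reproduces `value` exactly.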
That is[0], if a string s of length n is valid JSON, then there is no prefix s[0..i] for i < n that is valid JSON.
So you could just consume as many bytes as you need to produce a JSON document, then start a new one when that one is complete. To handle malformed data you just throw out the partial data on a syntax error and restart from the following byte (and likely throw away data a few more times if the error was in the middle of a document).
That is, [][]""[][]""[] is unambiguous to parse[1]
[0] Again, assuming we restrict ourselves to strings, null, booleans, arrays, and objects at the root
[1] Still, this is not a good format, as a single missing " can destroy the entire document.
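The consume-and-resync loop described above can be sketched with the standard library's `json.JSONDecoder.raw_decode`, which parses one value from a given index and reports where it stopped (the function name `parse_stream` is invented for illustration):

```python
import json

def parse_stream(s):
    """Yield JSON values from concatenated text, skipping one byte forward on errors."""
    dec = json.JSONDecoder()
    idx = 0
    while idx < len(s):
        try:
            value, idx = dec.raw_decode(s, idx)
            yield value
        except json.JSONDecodeError:
            idx += 1   # throw away a byte and try again
```

On `'[][]""[][]""[]'` this yields seven values; on malformed input like `'[1][oops][2]'` it discards the garbage and resynchronizes at the next valid document, illustrating both the prefix-free property and footnote [1]'s caveat that errors cost you data.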
How do you do this simply? You read each line, and if there's an odd number of ", you have an incomplete record, so you keep accumulating lines until the total number of " is even. After you have the complete string, parsing the fields correctly is harder, but you can do it with a regex, a PEG, or a disgusting state machine.
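A minimal Python sketch of that quote-counting accumulator (the name `read_records` is made up; the field parsing is delegated to the stdlib csv module rather than a hand-rolled state machine). The parity trick works because RFC 4180 escapes a quote by doubling it, so escaped quotes always come in pairs:

```python
import csv
import io

def read_records(lines):
    """Accumulate physical lines until the quote count is even, then parse one record."""
    buf = []
    for line in lines:
        buf.append(line)
        joined = "\n".join(buf)
        if joined.count('"') % 2 == 0:   # odd count means an unterminated quoted field
            yield next(csv.reader(io.StringIO(joined)))
            buf = []
```

Feeding it `['a,"hello', 'world",b', 'x,y,z']` stitches the first two physical lines back into one logical record with an embedded newline.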