To quote the topmost comment (by user slg): "CSVs are a headache. Like the article says, RFC4180 doesn't necessarily represent the real world. However, sometimes you just have to reject things that aren't to spec.
Not too long ago I was struggling with one of these CSV issues and received some good advice from Hans Passant [1] on a Stack Overflow question pertaining to my problem (emphasis mine):
'It is pretty important that you don't try to fix it. That will make you responsible for bad data for a long time. Reject the file for being improperly formatted. If they hassle you about it then point out that it is not RFC-4180 compatible. There's another programmer somewhere that can easily fix this.'
It makes perfect sense in hindsight. If you accept a malformed CSV file, people will expect you to accept any malformed data that has a CSV extension. You are taking on a lot of extra responsibility to cover for the lack of work by another programmer. Odds are they can make a change to fix the problem that takes a fraction of the time it would take you to work around it. You just have to raise the issue.
I realize that rejecting bad files isn't really possible in every circumstance. But I have a feeling it is an option more times than you might initially think."
sep=,<newline>
It took roughly ten seconds to find huge problems with this approach: https://stackoverflow.com/questions/20395699/sep-statement-b...
And we haven't even gotten to file encodings yet.
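On the reading side, here is a minimal Python sketch that tolerates both things mentioned above: a leading UTF-8 byte-order mark and an Excel-style `sep=` hint on the first line. The function name `read_csv_with_sep_hint` is hypothetical, it assumes the file is UTF-8, and it assumes the hint is exactly `sep=` followed by one delimiter character. It is not a description of how Excel itself parses the file.

```python
import csv
import io


def read_csv_with_sep_hint(raw: bytes, default_delimiter: str = ",") -> list[list[str]]:
    # Assumption: the file is UTF-8. "utf-8-sig" drops a leading BOM if there is
    # one and otherwise decodes exactly like plain "utf-8".
    text = raw.decode("utf-8-sig")
    delimiter = default_delimiter
    first_line, _, rest = text.partition("\n")
    hint = first_line.rstrip("\r")
    # Treat the first line as a delimiter hint only if it is exactly "sep=X".
    if hint.startswith("sep=") and len(hint) == 5:
        delimiter = hint[4]
        text = rest
    # newline="" leaves \r\n and quoted line breaks for the csv module to handle.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```

Decoding with `utf-8-sig` instead of `utf-8` is a cheap way to keep the BOM from ending up glued to the first header name.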
The 'fun' thing is that Excel for OS X does not do this; it uses commas.
We used to always just generate CSV files with semicolons since most of our clients were using Dutch Excel on Windows. As some of them moved to OS X, we've mostly been guessing what format to use.
A semicolon is generally used as the default list separator when the region/locale uses a comma as the decimal separator for numbers. For example, Dutch (Netherlands) uses a comma as the decimal separator (e.g. 3,14), whereas in English (US) we use a decimal point (e.g. 3.14). If a comma were used as the default list separator in such a region, then all floating-point numbers would need to be quoted (e.g. "3,14"), which would make the CSV file larger and less human-readable.
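To make the quoting argument concrete, here is a small Python sketch using the standard `csv` module. The helper `write_rows` is hypothetical, and it formats floats Dutch-style by simply swapping `.` for `,`; a real program would use locale-aware formatting instead.

```python
import csv
import io


def write_rows(rows: list[list[object]], delimiter: str) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        # Simplified Dutch-style number formatting: comma as decimal separator.
        writer.writerow([str(v).replace(".", ",") if isinstance(v, float) else v
                         for v in row])
    return buf.getvalue()


rows = [["pi", 3.14], ["e", 2.72]]
print(write_rows(rows, ";"))  # pi;3,14 and e;2,72: no quoting needed
print(write_rows(rows, ","))  # pi,"3,14" and e,"2,72": every number gets quoted
```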
Not breaking established behaviour?
The complex array use case is where an opinionated conversion tool is particularly needed, but I wonder why it behaves like this:
{
  name: 'Robert',
  lastname: 'Miller',
  family: null,
  location: [1231,3214,4214]
}
lastname,name,family.type,family.name,nickname,location
Miller,Robert,,,,1231,3214,4214
Why not have `location_1, location_2, location_3` instead of a single location column? The latter makes the data difficult to use quickly in a program (like a spreadsheet).
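One way to get the `location_1, location_2, location_3` layout suggested above, sketched in Python with the standard `csv` module. `spread_list` is a hypothetical helper, not part of any library mentioned in this thread.

```python
import csv
import io


def spread_list(record: dict, key: str) -> dict:
    """Replace record[key] (a list) with key_1, key_2, ... columns."""
    out = {k: v for k, v in record.items() if k != key}
    for i, item in enumerate(record.get(key) or [], start=1):
        out[f"{key}_{i}"] = item
    return out


record = {"name": "Robert", "lastname": "Miller", "family": None,
          "location": [1231, 3214, 4214]}
row = spread_list(record, "location")

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row), lineterminator="\n")
writer.writeheader()
writer.writerow(row)  # None is written as an empty field
print(buf.getvalue())
```

With several records whose arrays differ in length, the header would have to be computed over all rows (with `DictWriter`'s `restval` filling the gaps), which is exactly the kind of decision an opinionated tool has to make for you.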
{
  name: 'Robert',
  lastname: 'Miller',
  family: null,
  location: [{city: 'A'}, {city: 'B'}]
}
You would get 2 CSV lines, one with location.city A and the other with B.
As to the last, I got nothing. It's how I would have rolled.
But I am not sure what name to use for this option.
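The "one CSV line per array element" behaviour described above can be sketched in Python like this. `unwind` is a name chosen only for this sketch, and it assumes every array element is a flat object whose fields become `location.city`-style columns.

```python
import csv
import io


def unwind(record: dict, key: str) -> list[dict]:
    """Emit one flat row per element of record[key]."""
    base = {k: v for k, v in record.items() if k != key}
    # An empty or missing array still yields one row with the other fields.
    items = record.get(key) or [{}]
    rows = []
    for item in items:
        row = dict(base)
        for field, value in item.items():
            row[f"{key}.{field}"] = value
        rows.append(row)
    return rows


record = {"name": "Robert", "lastname": "Miller", "family": None,
          "location": [{"city": "A"}, {"city": "B"}]}
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "lastname", "family", "location.city"],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(unwind(record, "location"))
print(buf.getvalue())
```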
http://npmcharts.com/compare/json2csv,jsonexport,csv,fast-cs...
/shameless plug
Comparing against Numbers (probably a bad comparison?), I am seeing one slightly different result:
If I have a field whose value is "hehe" (quotes included), it's encoded as ""hehe"".
It looks like Numbers adds the enclosing quotes:
"""hehe"""
I only see enclosing quotes if there's a comma:
"hehe,hehe" -> """hehe,hehe"""
Anyhow, I'm not sure what the "correct" thing to do here is, if there is one, just a heads up!
* Double quotes aren't allowed inside a field that isn't enclosed in double quotes.
* Double quotes that do appear in a field have to be escaped by preceding them with another double quote.
So by RFC-4180, I'm pretty sure ""hehe"" shouldn't be possible, and the way to represent "hehe" with the quotes is """hehe""".
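Python's standard `csv` module applies the same rules, so it can serve as a quick cross-check. With its default `QUOTE_MINIMAL` and `doublequote=True` settings, a field containing a double quote is enclosed in quotes and each inner quote is doubled. `encode_row` is just a small wrapper for this sketch.

```python
import csv
import io


def encode_row(fields: list[str]) -> str:
    """Serialize one row with the stdlib writer's default quoting rules."""
    buf = io.StringIO()
    # QUOTE_MINIMAL encloses a field only if it contains the delimiter, the
    # quote character, or a line-break character; doublequote=True escapes an
    # embedded " as "".
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL, doublequote=True).writerow(fields)
    return buf.getvalue()


print(repr(encode_row(['"hehe"'])))       # '"""hehe"""\r\n'
print(repr(encode_row(['"hehe,hehe"'])))  # '"""hehe,hehe"""\r\n'
print(repr(encode_row(['hehe'])))         # 'hehe\r\n'

# Reading it back recovers the original value, quotes included.
print(next(csv.reader(['"""hehe"""'])))   # ['"hehe"']
```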
Honestly, you could probably write a quick pipeline to dump your JSON data into Mixpanel and then use JQL. It would be a little hacky, but if you have fewer than a few million rows it shouldn't be too much work (and it would still be free at that volume).
- Wrap each line with [ and ]
- eval the file
;)
1) a CSV writer (and reader?) that takes care of all the CSV dialect craziness; 2) a library that "flattens" nested objects/arrays?
(1) is not opinionated (besides a few API choices); it just has to be correct, and it doesn't make much sense to re-implement (1) everywhere.
(2) can be more opinionated: it is easy to disagree with its design choices, and there is more room for personal preference.
For example, in Python there is the csv stdlib module for (1), and for (2) there are libraries like https://github.com/scrapinghub/flatson. Why put both into the same library? Is this something idiomatic in the Node.js world, with a deeper reason to design libraries this way (e.g. download size), or is it just an oversight?
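Here is roughly what the proposed split might look like in Python: the opinionated flattening step (2) is a small standalone function, and the correctness-critical CSV writing (1) is delegated entirely to the stdlib `csv` module. `flatten` and `to_csv` are hypothetical names, and the dotted-key convention just mirrors headers like `family.type` earlier in the thread.

```python
import csv
import io
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """(2) The opinionated part: nested dicts become dotted keys.

    Lists are left untouched here, which is only one of several defensible choices.
    """
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_csv(records: list[dict[str, Any]]) -> str:
    """(1) The unopinionated part: hand flat rows to the stdlib writer."""
    rows = [flatten(r) for r in records]
    # Header = union of keys, in first-seen order; missing cells become "".
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv([{"name": "Robert", "family": {"type": "Father", "name": "Jack"}},
              {"name": "Ann"}]))
```

Swapping in a different flattening policy (numbered array columns, one row per array element) then only touches `flatten`, never the quoting and escaping logic.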
Everything you said makes sense if those are the goals, but if they aren't... well, why should they be?
The design you suggest would be done by some module authors for sure, but there's no reason it needs to be that design.
Either way, neat addition.
When you want to easily provide something Excel/G Suite/... will take in without using a heavier Excel-compatible library, CSV can be quite decent.